regex - Need a breakdown of the following regular expression

I have some trouble understanding how the following regular expression is working.
,(?=([^\"]*\"[^\"]*\")*[^\"]*$)
The expression basically matches all the commas that are NOT enclosed in quotes.
For example:
apple, banana, pineapple, "tropical fruits like mango, guava, key lime", peaches
Will be split into:
apple
banana
pineapple
"tropical fruits like mango, guava, key lime"
peaches
Can someone provide me with a good breakdown of the expression? I don't understand how positive look-ahead is working.
                It might be easier for you to understand without the unnecessary escapes (\). You can use just a plain " unless this regex is itself wrapped in " quotes.
– Phil Perry
                Dec 24, 2013 at 14:29
                Positive lookahead means matching a pattern and then checking that it is followed by another pattern, which does not itself become part of the match. There are also positive lookbehind, negative lookahead and negative lookbehind. Take a look at regular-expressions.info/lookaround.html
– Angela
                Dec 24, 2013 at 14:31
                @gtgaxiola I would match instead of splitting the string. Here's the pattern /("|').*?\1|[^,\s]+/s; it should work in Java if you double the backslashes in \1, \s, etc. See this demo. I've added an explanation here
– HamZa
                Dec 24, 2013 at 15:08
If you represent your regex visually, you get (thanks to RegExpr)
You could use ^(([^'",]+|'[^']*'|"[^"]*")|,)+$; the second capture group would then match each of your elements.
Note: I have no clue what programming language you are using, which makes it harder to give a good example, because I do not know exactly what you want to match. If your programming language is able to store each of the group #2 matches in an array, you have a solution.
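A minimal sketch of this matching approach, assuming Python (the question does not name a language). Python's re module keeps only the last iteration of a repeated capture group, so the sketch does not read group #2 of the anchored pattern. Instead it applies the inner alternation with findall. The name match_fields is hypothetical, and stripping whitespace and dropping blank pieces is an added assumption about the desired output.

```python
import re

# The anchored pattern from the answer; group 2 sits inside a repeated group.
FULL = re.compile(r'''^(([^'",]+|'[^']*'|"[^"]*")|,)+$''')

# Only the inner alternation: an unquoted run, or a single- or double-quoted string.
FIELD = re.compile(r'''[^'",]+|'[^']*'|"[^"]*"''')


def match_fields(line):
    """Return every field found by FIELD, trimmed, skipping whitespace-only pieces."""
    return [piece.strip() for piece in FIELD.findall(line) if piece.strip()]


line = 'apple, banana, "mango, guava", peaches'

# In Python, group 2 holds just the final element after the whole match.
last_only = FULL.match(line).group(2)

# findall collects every element instead.
fields = match_fields(line)
print(fields)
```

Engines that record every capture of a repeated group (.NET, for example) can use the anchored pattern directly.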
Look-around assertions
Look-around assertions (including positive look-ahead) are zero-width checks. They do not consume anything from the input; if an assertion is not satisfied, the match fails at that point and the regex engine backtracks.
Positive look-ahead remembers the current position in the input and tries to match its sub-pattern from there to the right. If the sub-pattern does not match, the regex engine backtracks; otherwise it returns to the remembered position and continues with whatever follows the look-ahead.
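The zero-width behaviour can be checked with a small sketch, here in Python as an assumption about the language. The strings "foobar" and "foobaz" are made up for illustration.

```python
import re

# "foo" is consumed; "bar" is only checked by the look-ahead.
hit = re.search(r"foo(?=bar)", "foobar")
matched_text = hit.group()   # the look-ahead adds nothing to the matched text
resume_at = hit.end()        # the engine is back right after "foo"

# The text that was looked at is still available to the rest of the pattern.
both = re.search(r"foo(?=bar)bar", "foobar")

# If the look-ahead cannot match, the overall match fails at that position.
miss = re.search(r"foo(?=bar)", "foobaz")

print(matched_text, resume_at, both.group(), miss)
```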
The regex deconstructed
This regex consumes a comma and ensures that the rest of the input matches ([^\"]*\"[^\"]*\")*[^\"]*$.
[^\"] means “one character, not a double-quote”.
* means the preceding element can be repeated zero or more times.
The parentheses form a group; its content matches “any string containing exactly two double-quotes, ending with one”.
When * is applied to this group, it means “any string containing an even number of double-quotes (possibly none), ending with one if it is not empty”.
The “ending with one [double-quote]” part of the description is a problem: you don’t want such a constraint. So you append [^\"]* to allow trailing non-double-quote characters.
$ matches the end of string.
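So all in all, the look-ahead checks that there is an even number of double-quotes between the comma and the end of the string. Provided the quotes in the string are balanced, a comma followed by an even number of double-quotes lies outside any quoted section.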
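The even-count reading can be checked against the regex itself. The following is a sketch in Python (an assumption, since the question does not name a language). The helper names split_outside_quotes and comma_report are hypothetical. The group is written as non-capturing, (?:...), because Python's re.split would otherwise also return the captured text, and the backslashes before the quotes are dropped as unnecessary.

```python
import re

# The question's regex without the unneeded escapes and with a non-capturing group.
COMMA_OUTSIDE_QUOTES = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')


def split_outside_quotes(line):
    """Split on commas that are followed by an even number of double-quotes."""
    return [part.strip() for part in COMMA_OUTSIDE_QUOTES.split(line)]


def comma_report(line):
    """For each comma: (index, quotes after it, whether the regex matches there)."""
    report = []
    for i, ch in enumerate(line):
        if ch == ",":
            quotes_after = line.count('"', i + 1)
            matched = COMMA_OUTSIDE_QUOTES.match(line, i) is not None
            report.append((i, quotes_after, matched))
    return report


line = 'apple, banana, pineapple, "tropical fruits like mango, guava, key lime", peaches'
parts = split_outside_quotes(line)
report = comma_report(line)

for part in parts:
    print(part)
```

In comma_report, the regex matches at a comma exactly when the number of double-quotes after it is even, which is the parity check described above. The leading spaces are trimmed by strip(); the regex itself leaves them in place.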
Match the character "," literally «,»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=([^\"]*\"[^\"]*\")*[^\"]*$)»
   Match the regular expression below and capture its match into backreference number 1 «([^\"]*\"[^\"]*\")*»
      Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
      Note: You repeated the capturing group itself.  The group will capture only the last iteration.  Put a capturing group around the repeated group to capture all iterations. «*»
      Match any character that is NOT a “"” character «[^\"]*»
         Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
      Match the character “"” literally «\"»
      Match any character that is NOT a “"” character «[^\"]*»
         Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
      Match the character “"” literally «\"»
   Match any character that is NOT a “"” character «[^\"]*»
      Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
   Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
(I'm not affiliated with RegexBuddy or its author in any way. Just a user of the software product.)
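The note about the repeated capturing group can be seen in practice. This is a sketch in Python (an assumption about the language); the sample text is made up. In Python the capture has a second visible effect: re.split inserts the captured text, or None, between the pieces, so a non-capturing group is the safer choice when splitting.

```python
import re

text = 'a, "b, c", "d", e'

capturing = re.compile(r',(?=([^"]*"[^"]*")*[^"]*$)')
non_capturing = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

# After the first comma the group iterates twice; only the last iteration is kept.
first = capturing.search(text)
last_iteration = first.group(1)

# With a capturing group, split() interleaves the captures with the pieces.
with_groups = capturing.split(text)

# With a non-capturing group, split() returns only the pieces.
clean = non_capturing.split(text)

print(repr(last_iteration))
print(with_groups)
print(clean)
```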