I am coding in Java and have a method that returns a string that looks something like this:
0, 2, 23131312,"This, is a message", 1212312
and I would like the string to be split like this:
["0", "2", "23131312", "This, is a message", "1212312"]
When I use String.split on the comma, it also splits "This, is a message", which I don't want. I would like it to ignore that particular comma and, if possible, get rid of the double quotes.
I looked up some answers and CSV seems to be the way to do it. However, I don't understand it properly.
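I think you can use the following regex, which matches only commas that are followed by an even number of double quotes, i.e. commas outside a quoted section:
    ,(?=(?:[^"]*"[^"]*")*[^"]*$)
It comes from here: Splitting on comma outside quotes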
You can test the pattern here: http://regexr.com/3cddl
Java code example:
    // Requires: import java.util.Arrays;
    public static void main(String[] args) {
        String txt = "0, 2, 23131312,\"This, is a message\", 1212312";
        System.out.println(Arrays.toString(txt.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)")));
    }
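Note that the regex split alone keeps the leading spaces and the surrounding double quotes in each piece. A minimal sketch of one way to clean the pieces up afterwards, assuming quoted fields never contain a double quote themselves (the class name RegexCsvSplit and the method split are made up for this illustration):

```java
import java.util.ArrayList;
import java.util.List;

final class RegexCsvSplit {
    // Same pattern as above: a comma followed by an even number of double quotes.
    private static final String COMMA_OUTSIDE_QUOTES = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        for (String raw : line.split(COMMA_OUTSIDE_QUOTES, -1)) {
            String field = raw.trim();
            // Strip one pair of surrounding quotes, if present.
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                field = field.substring(1, field.length() - 1);
            }
            fields.add(field);
        }
        return fields;
    }

    public static void main(String[] args) {
        for (String f : split("0, 2, 23131312,\"This, is a message\", 1212312")) {
            System.out.println("[" + f + "]");
        }
    }
}
```

The lookahead rescans the rest of the string for every comma, so this is probably fine for short lines but may get slow on very long ones.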
Another way of doing it would be to iterate through the string while keeping the index where the current field starts. When you hit a comma, take the String.substring from that index, add it to the array, and update the index. When you hit a double quote ("), look for the matching closing double quote first, so that any commas in between are skipped, then add the substring to the array and update the index.
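The index-based approach could be sketched roughly as follows. This is only one possible reading of the description: the names IndexSplit, split and clean are hypothetical, and it assumes quotes are never escaped inside a quoted field.

```java
import java.util.ArrayList;
import java.util.List;

final class IndexSplit {
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        int start = 0; // index where the current field begins
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // Jump to the closing quote so commas in between are ignored.
                int close = line.indexOf('"', i + 1);
                if (close < 0) {
                    throw new IllegalArgumentException("Unterminated quote at index " + i);
                }
                i = close;
            } else if (c == ',') {
                fields.add(clean(line.substring(start, i)));
                start = i + 1;
            }
        }
        fields.add(clean(line.substring(start))); // last field has no trailing comma
        return fields;
    }

    // Trims spaces and removes one pair of surrounding double quotes.
    private static String clean(String raw) {
        String field = raw.trim();
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            field = field.substring(1, field.length() - 1);
        }
        return field;
    }
}
```

Calling IndexSplit.split on the string from the question should give the five fields without quotes or padding.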
I'll comment on solutions based on programming an algorithm from scratch without the help of any library. I'm not saying that this is better than using a library.
First, this problem has more quirks than it would seem at first glance. I mean:
Spaces around commas must be removed.
Syntax errors are possible, e.g. 0,1,"string"notcomma,hi
I wonder how double quotes within a string would be escaped. I guess double quotes would be doubled (e.g. "This, is a ""message"""). These should be parsed correctly too.
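If doubled double quotes really are the escaping convention (that is a guess, as said above), reading one quoted field could look something like this. QuotedField and read are hypothetical names used only for this sketch.

```java
final class QuotedField {
    /**
     * Reads the quoted field that starts at line.charAt(start) and returns
     * its content with every doubled quote ("") turned into a single quote.
     */
    static String read(String line, int start) {
        if (start >= line.length() || line.charAt(start) != '"') {
            throw new IllegalArgumentException("No opening quote at index " + start);
        }
        StringBuilder sb = new StringBuilder();
        int i = start + 1;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c != '"') {
                sb.append(c); // ordinary character, commas included
                i++;
            } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                sb.append('"'); // doubled quote: an escaped literal quote
                i += 2;
            } else {
                return sb.toString(); // single quote: end of the field
            }
        }
        throw new IllegalArgumentException("Unterminated string");
    }
}
```

A caller that needs to continue parsing after the field would also have to get the end position back, which this sketch leaves out to stay short.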
If (as it seems) non-quoted values are always numbers (or, at least, whitespace-free), I'd go for a solution which scans the string:
    class StringScanner
    {
        private final String s;
        private int currentPosition;

        public StringScanner (String s)
        {
            this.s = s;
            this.currentPosition = 0;
            skipWhitespace ();
        }

        private void skipWhitespace ()
        {
            while (currentPosition < s.length () && s.charAt (currentPosition) == ' ')
                currentPosition++;
        }

        // Reads an unquoted field: everything up to the next space or comma
        private String nextNumber ()
        {
            final int start = currentPosition;
            while (currentPosition < s.length ()
                    && s.charAt (currentPosition) != ' '
                    && s.charAt (currentPosition) != ',')
                currentPosition++;
            return s.substring (start, currentPosition);
        }

        // Reads a quoted field and returns its content without the quotes
        private String nextString ()
        {
            if (s.charAt (currentPosition) != '\"')
                throw new Error ("You should NEVER see this error, no matter what the input string is");
            currentPosition++;
            final int start = currentPosition;
            // Modify the following loop to test for escaped quotes if necessary
            while (currentPosition < s.length () && s.charAt (currentPosition) != '\"')
                currentPosition++;
            if (currentPosition >= s.length () || s.charAt (currentPosition) != '\"')
                throw new Error ("Parse error: Unterminated string");
            final String r = s.substring (start, currentPosition);
            currentPosition++;
            return r;
        }

        // Returns the next field, or null when the whole string has been consumed
        public String nextField ()
        {
            if (currentPosition >= s.length ())
                return null;
            final String r;
            if (s.charAt (currentPosition) == '\"')
                r = nextString ();
            else
                r = nextNumber ();
            skipWhitespace ();
            if (currentPosition < s.length ())
            {
                // More input follows, so it must be a comma and then another field
                if (s.charAt (currentPosition) != ',')
                    throw new Error ("Parse error: no comma at end of field");
                currentPosition++;
                skipWhitespace ();
                if (currentPosition >= s.length ())
                    throw new Error ("Parse error: string ends with comma");
            }
            return r;
        }
    }
Then, split the string with something like:
    String s = "0, 1, \"Message, ok?\", 55";
    StringScanner ss = new StringScanner (s);
    String field = ss.nextField ();
    while (field != null)
    {
        System.out.println ("Field found: \"" + field + "\"");
        field = ss.nextField ();
    }