    String test1[] = string.split(",,");
    String test2[] = StringUtils.splitPreserveAllTokens(string , ",,");

test2 has five elements

    [Hello, , l, , World]

with two empty elements. test1 has 3

    [Hello, l, World]

which is the expected behavior.

    According to the javadoc of splitPreserveAllTokens, the following is logical:

         * StringUtils.splitPreserveAllTokens("::cd:ef", ":")    = ["", "", cd", "ef"]
         * StringUtils.splitPreserveAllTokens(":cd:ef:", ":")    = ["", cd", "ef", ""]
    

    But the test2 output is still not clear to me. Please explain the additional empty elements in test2.

    You could look at the source code which might explain things: commons.apache.org/proper/commons-lang/apidocs/src-html/org/… – Tim Biegeleisen Oct 11, 2017 at 7:16

    Yep, it explains it: it splits the string by each char of the separator. Still doesn't make sense to me though. – Asiri Liyana Arachchi Oct 11, 2017 at 7:26

    meaning it should not make any difference if you use "," or ",," as second argument.

    In combination with the first quote and the examples, I assume that the beginning and end of the string are also treated as separators:

    StringUtils.splitPreserveAllTokens(":cd:ef:", ":")

    One (empty) token between the beginning and the first colon, one token between the first and the second colon ("cd"), one between the second and third ("ef") and one (again empty) between the last colon and the end of the string, leading to the shown result from the docs: ["", "cd", "ef", ""] (with corrected typo).

    In your case the second quote above is the more relevant one. ",," is not treated as the separator but as a set of separator chars, meaning ",," is equivalent to "," in this case. Following the first quote, you can then explain the result you get:
    Beginning of the string to the first comma: "Hello"
    first comma to second: ""
    second comma to third: "l"
    third to fourth: ""
    fourth to end of the string: " World"
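
    The walk-through above can be reproduced with a minimal sketch in plain Java. This is a simplified, hypothetical re-implementation for illustration, not the Commons Lang source: the class and method names are made up, the input "Hello,,l,, World" is inferred from the tokens listed above, and the max limit and null-separator cases of the real method are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch for illustration only; not the Commons Lang implementation.
final class CharSetSplitSketch {

    // Splits str at every character that occurs in separatorChars and
    // keeps the empty tokens that sit between adjacent separators.
    static String[] splitPreserveAll(String str, String separatorChars) {
        if (str == null) {
            return null;                 // null in, null out
        }
        if (str.isEmpty()) {
            return new String[0];        // empty input yields no tokens
        }
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < str.length(); i++) {
            if (separatorChars.indexOf(str.charAt(i)) >= 0) {
                tokens.add(str.substring(start, i)); // may be the empty string
                start = i + 1;
            }
        }
        tokens.add(str.substring(start)); // token after the last separator
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String input = "Hello,,l,, World"; // assumed input, inferred from the answer
        // ",," and "," describe the same set of separator characters here.
        System.out.println(Arrays.toString(splitPreserveAll(input, ",,")));
        System.out.println(Arrays.toString(splitPreserveAll(input, ",")));
        // String.split treats ",," as a regex matching two consecutive commas.
        System.out.println(Arrays.toString(input.split(",,")));
    }
}
```

    Under these assumptions both sketch calls return five tokens, two of them empty, while String.split(",,") returns three.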

    String test1[] = string.split("$$");
    String test2[] = StringUtils.splitPreserveAllTokens(string , "$$");

    Output:

      Test2  [Hello, l,  World]
      Test1  [Hello$l$ World]
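
    The Test1 line is probably explained by String.split treating its argument as a regular expression, in which "$" is the end-of-input anchor rather than a literal dollar sign. The sketch below uses only the JDK; the class and method names are hypothetical and the input "Hello$l$ World" is an assumption inferred from the output above.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; only JDK calls are used.
final class DollarSplitDemo {

    // String.split interprets its argument as a regular expression.
    static String[] splitAsRegex(String s, String regex) {
        return s.split(regex);
    }

    // Pattern.quote makes the separator match literally.
    static String[] splitLiteral(String s, String separator) {
        return s.split(Pattern.quote(separator));
    }

    public static void main(String[] args) {
        String input = "Hello$l$ World"; // assumed input, inferred from the output
        // "$$" as a regex only matches at the end of the input, so nothing is split off.
        System.out.println(Arrays.toString(splitAsRegex(input, "$$")));
        // A quoted "$" splits at each literal dollar sign.
        System.out.println(Arrays.toString(splitLiteral(input, "$")));
    }
}
```

    With that input, the regex call returns the whole string as a single element, and the quoted variant returns three tokens.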
    

    The following is the relevant part of the code behind splitPreserveAllTokens:

            // standard case: every char in separatorChars acts as a delimiter
            while (i < len) {
                if (separatorChars.indexOf(str.charAt(i)) >= 0) {
                    if (match || preserveAllTokens) {
                        lastMatch = true;
                        if (sizePlus1++ == max) {
                            i = len;
                            lastMatch = false;
                        }
                        list.add(str.substring(start, i));
                        match = false;
                    }
                    start = ++i;
                    continue;
                }
                lastMatch = false;
                match = true;
                i++;
            }
    

    This means that separatorChars is treated as a set of individual separator characters: whenever any of those characters is found in the input string, the string is split at that position.

    One advantage of this method over the usual split is that splitPreserveAllTokens handles null implicitly.
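
    To illustrate the null point without the library on the classpath, here is a small plain-Java sketch. The helper splitOrNull is hypothetical; it only mirrors the documented null-in, null-out behaviour of StringUtils.splitPreserveAllTokens and is not how the library implements it.

```java
// Hypothetical demo class; names are illustrative assumptions.
final class NullSafeSplitDemo {

    // Calling split on a null reference throws NullPointerException.
    static boolean plainSplitThrowsOnNull() {
        String s = null;
        try {
            s.split(",");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Null-safe wrapper: returns null for null input instead of throwing.
    static String[] splitOrNull(String s, String regex) {
        return s == null ? null : s.split(regex);
    }

    public static void main(String[] args) {
        System.out.println(plainSplitThrowsOnNull());
        System.out.println(splitOrNull(null, ",") == null);
    }
}
```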

    And as mentioned in "Difference between splitByWholeSeparatorPreserveAllTokens and split", StringUtils uses its own method, splitWorker(String str, char separatorChar, boolean preserveAllTokens), which is a performance tune for 2.0 (JDK 1.4).
