
I have an ... odd issue with a bash shell script that I was hoping to get some insight on.

My team is working on a script that iterates through lines in a file and checks for content in each one. We had a bug where, when run via the automated process that sequences different scripts together, the last line wasn't being seen.

The code used to iterate over the lines in the file (whose name is stored in DATAFILE) was

cat "$DATAFILE" | while read line 

We could run the script from the command line and it would see every line in the file, including the last one, just fine. However, when run by the automated process (which runs the script that generates the DATAFILE just prior to the script in question), the last line is never seen.

We updated the code to use the following to iterate over the lines, and the problem cleared up:

for line in `cat "$DATAFILE"` 

Note: DATAFILE has no newline ever written at the end of the file.
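A minimal reproduction of the symptom, assuming a file like the one described (the filename here is illustrative):

```shell
# a data file whose last line, like the real DATAFILE, has no trailing newline
printf 'one\ntwo\nlast' > datafile

# the original construct: "last" is never printed, because read returns a
# failure status when EOF arrives before a newline, ending the loop early
cat datafile | while read line; do echo "got: $line"; done

rm -f datafile
```

Run under bash or any POSIX sh, the loop prints only the first two lines; "last" never appears.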

My question is two part... Why would the last line not be seen by the original code, and why would this change make a difference?

The only explanation I could come up with as to why the last line would not be seen was:

  • The previous process, which writes the file, was relying on its own exit to close the file descriptor.
  • The problem script was starting up and opening the file quickly enough that, while the previous process had "ended", it hadn't "shut down/cleaned up" enough for the system to close the file descriptor for it automatically.
  • That being said, it seems like, if you have 2 commands in a shell script, the first one should be completely shut down by the time the script runs the second one.

    Any insight into the questions, especially the first one, would be very much appreciated.

    BTW, note that cat somefile | while read causes any variables set in the while loop to be destroyed when the loop exits. You probably want while read ...; done <somefile instead; see BashFAQ #24. – Charles Duffy Nov 26, 2019 at 15:42

    As a workaround: add a blank line after cat, executed in a subshell: (cat "$DATAFILE"; echo "") | while read line – G. C. Feb 8, 2022 at 15:11
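    Charles Duffy's point about variables can be demonstrated directly; in a POSIX shell (without bash's lastpipe option enabled), each pipeline stage runs in a subshell, so changes made inside the loop are lost:

```shell
# piping into the loop: the assignments happen in a subshell and vanish
count=0
printf 'a\nb\n' | while read line; do count=$((count + 1)); done
echo "count=$count"    # count=0, not 2

# feeding the loop by redirection keeps it in the current shell
count=0
while read line; do count=$((count + 1)); done <<'EOF'
a
b
EOF
echo "count=$count"    # count=2
```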

    The C standard says that text files must end with a newline or the data after the last newline may not be read properly.

    ISO/IEC 9899:2011 §7.21.2 Streams

    A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined. Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment. Thus, there need not be a one-to-one correspondence between the characters in a stream and those in the external representation. Data read in from a text stream will necessarily compare equal to the data that were earlier written out to that stream only if: the data consist only of printing characters and the control characters horizontal tab and new-line; no new-line character is immediately preceded by space characters; and the last character is a new-line character. Whether space characters that are written out immediately before a new-line character appear when read in is implementation-defined.

    I would not have expected a missing newline at the end of file to cause trouble in bash (or any Unix shell), but that does seem to be the problem reproducibly ($ is the prompt in this output):

    $ echo xxx\\c
    xxx$ { echo abc; echo def; echo ghi; echo xxx\\c; } > y
    $ cat y
    abc
    def
    ghi
    xxx$ while read line; do echo $line; done < y
    abc
    def
    ghi
    $ bash -c 'while read line; do echo $line; done < y'
    abc
    def
    ghi
    $ ksh -c 'while read line; do echo $line; done < y'
    abc
    def
    ghi
    $ zsh -c 'while read line; do echo $line; done < y'
    abc
    def
    ghi
    $ for line in $(<y); do echo $line; done      # Preferred notation in bash
    abc
    def
    ghi
    xxx
    $ for line in $(cat y); do echo $line; done   # UUOC Award pending
    abc
    def
    ghi
    xxx
    $
    

    It is also not limited to bash — Korn shell (ksh) and zsh behave like that too. I live, I learn; thanks for raising the issue.

    As demonstrated in the code above, the cat command reads the whole file. The for line in `cat $DATAFILE` technique collects all the output and replaces arbitrary sequences of white space with a single blank (I conclude that each line in the file contains no blanks).

    Tested on Mac OS X 10.7.5.

    What does POSIX say?

    The POSIX read command specification says:

    The read utility shall read a single line from standard input.

    By default, unless the -r option is specified, <backslash> shall act as an escape character. An unescaped <backslash> shall preserve the literal value of the following character, with the exception of a <newline>. If a <newline> follows the <backslash>, the read utility shall interpret this as line continuation. The <backslash> and <newline> shall be removed before splitting the input into fields. All other unescaped <backslash> characters shall be removed after splitting the input into fields.

    If standard input is a terminal device and the invoking shell is interactive, read shall prompt for a continuation line when it reads an input line ending with a <backslash> <newline>, unless the -r option is specified.

    The terminating <newline> (if any) shall be removed from the input and the results shall be split into fields as in the shell for the results of parameter expansion (see Field Splitting); [...]

    Note that '(if any)'! It seems to me that if there is no newline, it should still read the result. On the other hand, it also says:

    STDIN

    The standard input shall be a text file.

    and then you get back to the debate about whether a file that does not end with a newline is a text file or not.

    However, the rationale on the same page documents:

    Although the standard input is required to be a text file, and therefore will always end with a <newline> (unless it is an empty file), the processing of continuation lines when the -r option is not used can result in the input not ending with a <newline>. This occurs if the last line of the input file ends with a <backslash> <newline>. It is for this reason that "if any" is used in "The terminating <newline> (if any) shall be removed from the input" in the description. It is not a relaxation of the requirement for standard input to be a text file.

    That rationale must mean that the text file is supposed to end with a newline.
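    The continuation behaviour the rationale describes is easy to observe; without -r, a <backslash> <newline> pair is consumed by read as a line continuation (printf is used for output so no shell's echo mangles the backslash):

```shell
# two physical lines joined by backslash-newline are read as one logical line
printf 'abc\\\nxyz\n' | { read line; printf '%s\n' "$line"; }     # abcxyz

# with -r the backslash is literal, so only the first physical line is read
printf 'abc\\\nxyz\n' | { read -r line; printf '%s\n' "$line"; }  # abc\
```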

    The POSIX definition of a text file is:

    3.395 Text File

    A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. Although POSIX.1-2008 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.

    This does not stipulate 'ends with a <newline>' directly, but does defer to the C standard and it does say "A file that contains characters organized into zero or more lines" and when we look at the POSIX definition of a "Line" it says:

    3.206 Line

    A sequence of zero or more non-<newline> characters plus a terminating <newline> character.

    so per the POSIX definition a file must end in a terminating newline because it's made up of lines and each line must end in a terminating newline.

    A solution to the 'no terminal newline' problem

    Note Gordon Davisson's answer. A simple test shows that his observation is accurate:

    $ while read line; do echo $line; done < y; echo $line
    abc
    def
    ghi
    xxx
    $

    Therefore, his technique of:

    while read line || [ -n "$line" ]; do echo $line; done < y
    
    cat y | while read line || [ -n "$line" ]; do echo $line; done
    

    will work for files without a newline at the end (at least on my machine).

    I'm still surprised to find that the shells drop the last segment (it can't be called a line because it doesn't end with a newline) of the input, but there might be sufficient justification in POSIX to do so. And clearly it is best to ensure that your text files really are text files ending with a newline.

    Thanks for the extensive writeup. I think the difference between the two commands' behavior is described very well. I'm still a little confused about why the first command fails when run as part of a pipeline that generates the file but not when run independently. Also worth noting is that its behavior seems to conflict with your experiences of the no-newline behavior of read. I may need to go back to the script and make sure I haven't misinterpreted its results. – RHSeeger Oct 16, 2012 at 16:35

    @adrelanos: I use read because it worked fine 30 years ago and still does for me. Modern style is to use read -r because read got butchered by the POSIX-ization process. Your call — I'm not going to be offended if you use read -r, as long as you can explain what it protects you from compared to using read, and you can explain why you care about that protection. – Jonathan Leffler Aug 29, 2015 at 18:10

    One way to work around this restriction is printf '\n' | cat myfile.txt - | while IFS= read -r VAR; do echo "$VAR"; done – XMB5 Jun 1, 2020 at 16:29

    @JonathanLeffler regarding "I'm still surprised to find that the shells drop the last segment" at the end of your answer - a file that doesn't end in a newline isn't a text file so any tool designed to read text files can do whatever it likes with it, including discard it. Regarding using read -r instead of read in your comment - the latter will convert foo\tbar into foo<literal tab>bar, etc. as it expands escape sequences and so should only be used if you specifically want that behavior. – Ed Morton Sep 28, 2021 at 12:47

    According to the POSIX spec for the read command, it should return a nonzero status if "End-of-file was detected or an error occurred." Since EOF is detected as it reads the last "line", it sets $line and then returns an error status, and the error status prevents the loop from executing on that last "line". The solution is easy: make the loop execute if the read command succeeds OR if anything was read into $line.
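    A quick sketch of that behaviour (the input string is arbitrary):

```shell
# read hits EOF before seeing a newline: it returns a nonzero status
# (1 in bash), but everything it read is still stored in the variable
printf 'last-bit' | { read line; echo "status=$? line=$line"; }   # status=1 line=last-bit
```

    So read fails on the final fragment even though $line holds it, which is exactly what the || [ -n "$line" ] guard exploits.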

    while read line || [ -n "$line" ]; do

    +1: Interesting observation, Gordon. Using my example file y, I ran: while read line; do echo $line; done < y; echo $line and indeed got four different values echoed. I'm not sure it's a particularly helpful or intuitive behaviour, but ... – Jonathan Leffler Oct 16, 2012 at 17:04

    This solved my problem of reading words from a text file without having a newline at the end of the text file. – tauseef_CuriousGuy May 3, 2018 at 15:55

    Thank you @gordon-davisson, it works as expected. But I'm not comfortable with the [ -n "$line" ] expression. How is it supposed to recover the last line content? – Martin Tovmassian Nov 9, 2021 at 8:25

    @MartinTovmassian It doesn't recover the last line; it just prevents the loop from being skipped if the last line ends with end-of-file rather than a newline character. – Gordon Davisson Nov 9, 2021 at 8:28

    Adding some additional info:

  • There's no need to use cat with a while loop; while ...; do something; done <file is enough.
  • Don't read lines with for.
  • When using a while loop to read lines:

  • Set IFS properly (you may lose indentation otherwise).
  • You should almost always use the -r option with read.
  • A while loop meeting the above requirements looks like this:

    while IFS= read -r line; do
      echo "$line"
    done <file
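    The effect of IFS= and -r in that template can be seen with a couple of probes (printf is used for output so no shell's echo mangles backslashes):

```shell
# without -r, read strips unescaped backslashes: \t becomes plain t
printf 'foo\\tbar\n' | { read line; printf '%s\n' "$line"; }      # footbar
# with -r, the line comes through byte-for-byte
printf 'foo\\tbar\n' | { read -r line; printf '%s\n' "$line"; }   # foo\tbar

# without IFS=, leading and trailing whitespace is trimmed
printf '  indented\n' | { read -r line; printf '%s\n' "$line"; }       # indented
# IFS= preserves the indentation
printf '  indented\n' | { IFS= read -r line; printf '%s\n' "$line"; }  # "  indented" (spaces kept)
```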
    

    And to make it work with files without a newline at end (reposting my solution from here):

    while IFS= read -r line || [ -n "$line" ]; do
      echo "$line"
    done <file
    

    Or using grep with while loop:

    while IFS= read -r line; do
      echo "$line"
    done < <(grep "" file)
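    The grep "" trick works because grep terminates every line it prints with a newline, even when the input's final line lacked one (true at least of GNU and BSD grep):

```shell
printf 'a\nb' > f   # 3 bytes, no trailing newline
grep "" f | wc -c   # 4 bytes: grep supplied the missing newline
rm -f f
```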
    

    As a workaround, before reading from the text file a newline can be appended to the file.

    echo -e "\n" >> $file_path
    

    This ensures that all the lines that were previously in the file will be read. The -e argument makes echo interpret escape sequences; note that echo -e "\n" actually appends two newlines (the escaped one plus the one echo adds itself), so a plain echo "" >> "$file_path" is enough to terminate the last line. https://superuser.com/questions/313938/shell-script-echo-new-line-to-file
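    If appending unconditionally is undesirable (rerunning the script would keep adding blank lines), here is a sketch that appends only when the final byte is not already a newline. It relies on tail -c1 (POSIX) and on command substitution stripping a trailing newline, so the check is empty exactly when a newline is present; the filename is illustrative:

```shell
file_path="data.txt"                 # illustrative name
printf 'a\nb' > "$file_path"         # demo file without a final newline

# append a newline only if the last byte is not one; safe to run repeatedly
if [ -n "$(tail -c1 "$file_path")" ]; then
  echo >> "$file_path"
fi

rm -f "$file_path"
```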

    # create dummy file. last line doesn't end with newline
    printf "%i\n%i\nNo-newline-here" 1 2 >testing
    

    Test with your first form (piping to while-loop)

    cat testing | while read line; do echo $line; done
    

    This misses the last line, which makes sense since read only gets input that ends with a newline.

    Test with your second form (command substitution)

    for line in `cat testing` ; do echo $line; done
    

    This gets the last line as well

    read only gets input if it's terminated by newline, that's why you miss the last line.

    On the other hand, in the second form

    `cat testing` 
    

    expands to the form of

    line1\nline2\n...lineM 
    

    which is separated by the shell into multiple fields using IFS, so you get

    line1 line2 line3 ... lineM 
    

    That's why you still get the last line.
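    The field splitting can be seen directly; spaces and newlines in the substituted text are treated identically:

```shell
printf 'a b\nc' > f                       # two "lines", the second unterminated
for w in $(cat f); do echo "$w"; done     # prints a, b and c on separate lines
rm -f f
```

    Note that the loop iterates over words, not lines, which is why earlier answers advise against reading lines with for.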

    p/s: What I don't understand is how you get the first form working...

    I'll go back to the script and make sure I'm not mis-interpreting something. This was all done as part of some work I'm helping with and it's possible we mis-read something in our haste to get it working. – RHSeeger Oct 16, 2012 at 16:41

    Use sed to match the last line of the file; if that line lacks a newline, sed will append one, doing an in-place replacement of the file:

    sed -i '' -e '$a\' file

    The code is from this stackexchange link.

    Note: I have added empty single quotes to -i '' because, at least in OS X, -i was using -e as a file extension for the backup file. I would have gladly commented on the original post but lacked 50 points. Perhaps this will gain me a few in this thread, thanks.

    I had a similar issue. I was doing a cat of a file, piping it to a sort, and then piping the result to a while read var1 var2 var3:

    cat $FILE | sort -k3 | while read Count IP Name

    The work under the "do" was an if statement that identified changing data in the $Name field and, based on change or no change, did sums of $Count or printed the summed line to the report. I also ran into the issue where I couldn't get the last line to print to the report. I went with the simple expedient of redirecting the cat/sort to a new file, echoing a newline to that new file, and THEN running my "while read Count IP Name" on the new file, with successful results:

    cat $FILE | sort -k3 > NEWFILE
    echo "\n" >> NEWFILE
    cat NEWFILE | while read Count IP Name

    Sometimes the simple, inelegant way is the best way to go.