
I ran a find command piped through tee to a log file and through xargs to process the output; by accident I forgot to add xargs in the second pipe, and that led to this question.

The example:

% tree
├── a.sh
└── home
    └── localdir
        ├── abc_3
        ├── abc_6
        ├── mydir_1
        ├── mydir_2
        └── mydir_3
7 directories, 1 file

and the content of a.sh is:

% cat a.sh
#!/bin/bash
LOG="/tmp/abc.log"
find home/localdir -name "mydir*" -type d  -print | tee $LOG | echo

If I add the second pipe with some command, such as echo or ls, writing to the log file occasionally fails.

These are some examples when I ran ./a.sh many times:

% bash -x ./a.sh; cat /tmp/abc.log  // this tee failed
+ LOG=/tmp/abc.log
+ find home/localdir -name 'mydir*' -type d -print
+ tee /tmp/abc.log
+ echo
% bash -x ./a.sh; cat /tmp/abc.log  // this tee ok
+ LOG=/tmp/abc.log
+ find home/localdir -name 'mydir*' -type d -print
+ tee /tmp/abc.log
+ echo
home/localdir/mydir_2  // this is cat /tmp/abc.log output
home/localdir/mydir_3
home/localdir/mydir_1

Why is it that, if I add a second pipe with some command (forgetting xargs), the tee command occasionally fails?

@LeeHoYo echo is only an example; I ran this command with xargs, but by accident I forgot to write xargs and found this problem, so I want to know what causes it. – Tanky Woo Jul 24, 2016 at 6:29

@TankyWoo, but echo does not read stdin, so once the pipe fills up, it will block the whole pipeline (or, if you use the echo builtin of bash, you'll probably get an error, as echo doesn't read anything). – Luis Colorado Jul 25, 2016 at 12:37

@LuisColorado blocking the pipeline is one possible result, e.g. if you replace echo with sleep 1, but in that case tee can still write to the file. See my answer. – Tanky Woo Jul 26, 2016 at 2:35

@TankyWoo, if you replace echo with sleep 1 (which also doesn't read data from stdin), then, as has been explained, as soon as it finishes, the process that feeds it input will be signalled by the kernel and exit with a broken-pipe message. – Luis Colorado Jul 27, 2016 at 19:23

The problem is that, by default, tee exits when a write to a pipe fails. So, consider:

find home/localdir -name "mydir*" -type d  -print | tee $LOG | echo

If echo exits first, tee's next write to the pipe fails and tee exits. The exact timing, though, is unpredictable: every command in the pipeline runs as a separate process, concurrently, and there are also the vagaries of buffering. So sometimes the log file is written before tee exits and sometimes it isn't.

For clarity, let's consider a simpler pipeline:

$ seq 10 | tee abc.log | true; declare -p PIPESTATUS; cat abc.log
declare -a PIPESTATUS='([0]="0" [1]="0" [2]="0")'
$ seq 10 | tee abc.log | true; declare -p PIPESTATUS; cat abc.log
declare -a PIPESTATUS='([0]="0" [1]="141" [2]="0")'

In the first execution, each process in the pipeline exits with a success status and the log file is written. In the second execution of the same command, tee fails with exit code 141 and the log file is not written.

I used true in place of echo to illustrate the point that there is nothing special here about echo. The problem exists for any command that follows tee that might reject input.
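As a contrasting sketch (a variation on the example above, not part of the original answer): if the command after tee actually consumes its input, tee never sees a broken pipe and the log is complete every time:

```shell
# cat reads everything tee writes, so the pipe stays open until tee
# has finished; the log file is therefore always complete.
seq 10 | tee abc.log | cat > /dev/null
wc -l abc.log   # always reports 10 lines
```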

Documentation

Very recent versions of tee have an option to control the pipe-fail-exit behavior. From man tee from coreutils-8.25:

--output-error[=MODE]
set behavior on write error. See MODE below

The possibilities for MODE are:

MODE determines behavior with write errors on the outputs:

   'warn' diagnose errors writing to any output
   'warn-nopipe'
          diagnose errors writing to any output not a pipe
   'exit' exit on error writing to any output
   'exit-nopipe'
          exit on error writing to any output not a pipe
  

The default MODE for the -p option is 'warn-nopipe'. The default operation when --output-error is not specified, is to exit immediately on error writing to a pipe, and diagnose errors writing to non pipe outputs.

As you can see, the default behavior is "to exit immediately on error writing to a pipe". Thus, if the attempt to write to the process that follows tee fails before tee wrote the log file, then tee will exit without writing the log file.
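Given that documentation, one fix (assuming coreutils 8.25 or later; older tee versions lack the option) is tee's -p flag, which selects warn-nopipe mode so tee survives the broken pipe and still writes the file:

```shell
# With -p, a write error on the pipe is tolerated: tee keeps running
# and writes the log file even though true never reads its input.
seq 10 | tee -p abc.log | true
cat abc.log   # always contains 1 through 10
```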

The writing to echo's standard input won't fail until echo exits and tee attempts to write again. The echo may exit rather quickly, and the tee may spend time waiting for input before writing to standard output. – Jonathan Leffler Jul 24, 2016 at 7:01

@JonathanLeffler OK. That is likely the case here, and I updated the answer. I also tested seq 100000 | tee abc.log | sleep 10 and obtained the same intermittent results even though sleep is long-lived. In this case, it appears to me that, when the input buffer for sleep fills, tee can no longer write even though sleep is still active, and this triggers the fail. – John1024 Jul 24, 2016 at 7:12

When the pipe fills, tee is prevented from writing any more to the pipe (it is blocked in the last write call) until the 'reading' process (sleep or echo, neither of which actually reads its standard input) reads something, or terminates (or otherwise closes its standard input, so there is no process left reading the standard output of tee). – Jonathan Leffler Jul 24, 2016 at 7:15

@TankyWoo Thank you for the information on output-error and the coreutils version. Answer updated. As for pipes, on Unix anyway, no, processes are not run in succession: they run in parallel. This is important when pipes are used to provide continuous output. It also allows pipes to handle very large quantities of data without having to write it to disk at each step. – John1024 Jul 24, 2016 at 8:54

@TankyWoo: No — a pipeline is not serialized execution. A pipeline uses concurrent execution of the programs. It must; the pipes have a finite and quite small capacity (traditionally 5 KiB, but often 64 KiB on modern systems), so if the data flowing down a pipe is more than that size, it is crucial that the programs execute concurrently as otherwise, the writing process will be blocked. – Jonathan Leffler Jul 24, 2016 at 15:11

Right, piping from tee to something that exits early (in your case, something that does not read the input from tee) will cause intermittent errors. For a summary of this gotcha see:

http://www.pixelbeat.org/docs/coreutils-gotchas.html#tee
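To connect this back to the original script: the intended xargs reads all of tee's output, so the pipe never breaks. A self-contained sketch (using seq in place of the question's find, so it runs anywhere):

```shell
# xargs consumes tee's entire output before invoking echo, so tee
# never writes to a closed pipe and the log file is always written.
seq 5 | tee abc.log | xargs echo
```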

I debugged the tee source code, but I'm not familiar with C on Linux, so there may be mistakes.

tee belongs to the coreutils package, in src/tee.c.

First, it sets the buffering with:

setvbuf (stdout, NULL, _IONBF, 0); // for standard output
setvbuf (descriptors[i], NULL, _IONBF, 0);  // for file descriptor

So the output is unbuffered (_IONBF)?

Second, tee puts stdout as the first item in its descriptor array, and writes to each descriptor in a for loop:

/* In the array of NFILES + 1 descriptors, make
   the first one correspond to standard output.   */
descriptors[0] = stdout;
files[0] = _("standard output");
setvbuf (stdout, NULL, _IONBF, 0);

for (i = 0; i <= nfiles; i++)
  {
    if (descriptors[i]
        && fwrite (buffer, bytes_read, 1, descriptors[i]) != 1)  // failed!!!
      {
        error (0, errno, "%s", files[i]);
        descriptors[i] = NULL;
        ok = false;
      }
  }

For example, with tee a.log, descriptors[0] is stdout and descriptors[1] is a.log.

As @John1024 said, the pipeline runs in parallel (which I had misunderstood before). The second command in the pipeline, such as echo, ls, or true, does not accept input, so it does not wait for input; if it finishes faster, it closes the read end of the pipe before tee is done writing to the write end, so in the code above the marked fwrite line fails and tee does not go on writing to the file descriptor.
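The 141 seen earlier in PIPESTATUS matches this: it is 128 plus SIGPIPE's signal number 13. One way to observe the failure deterministically (an illustrative demo, not from the answers above: the data is made large enough to overflow the pipe buffer, so tee must still be writing after the reader exits):

```shell
# seq 100000 produces far more data than a 64 KiB pipe buffer holds,
# so after true exits, tee's next write raises SIGPIPE (128+13=141).
seq 100000 | tee big.log | true
echo "${PIPESTATUS[1]}"   # prints 141 in bash
```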

Supplement:

The strace result when tee is killed by SIGPIPE:

write(1, "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n", 21) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=22649, si_uid=1000} ---
+++ killed by SIGPIPE +++
I'm tempted to downvote this, as it's mostly a red herring, but the insight at the end that this was a waste of effort is valuable, and I'm sure it was spent with the best of intentions. – tripleee Jul 24, 2016 at 11:03
