Stack Overflow for Teams is now called
Stack Internal
. Bring the best of human thought and AI automation together at your work.
Try for free
Learn more
Collectives™ on Stack Overflow
Find centralized, trusted content and collaborate around the technologies you use most.
Learn more about Collectives
Stack Internal
Knowledge at work
Bring the best of human thought and AI automation together at your work.
Explore Stack Internal
Previously I had asked a
question
regarding how to terminate thread blocked for I/O. I have used
pthread_kill()
instead of
pthread_cancel()
or writing to pipes, considering few advantages. After implementing the code to send signal (SIGUSR2) to the target thread using
pthread_kill()
, initially it worked fine and didn't face any issues.
static void signalHandler(int signum) {
LogInfo("SIGNAL HANDLER : %d",signum); //print info message using logger utility
void setSignalHandler() {
struct sigaction actions;
memset(&actions, 0, sizeof(actions));
sigemptyset(&actions.sa_mask);
actions.sa_flags = 0;
actions.sa_handler = signalHandler;
int rc = sigaction(SIGUSR2,&actions,NULL);
if(rc < 0) {
LogError("Error in sigaction"); //print error using logger utility
void stopThread() {
fStop = 1; //set the flag to stop the thread
if( ftid ) { //ftid is pthread_t variable of thread that needs to be terminated.
LogInfo("Before pthread_kill()");
int kill_result = pthread_kill(ftid,SIGUSR2); // send signal to interrupt the thread if blocked for poll() I/O
LogInfo("pthread_kill() returned : %d ( %s )",kill_result,strerror(kill_result));
int result = pthread_join( ftid, NULL );
LogInfo("Event loop stopped for %s", fStreamName);
LogInfo( "pthread_join() -> %d ( %s )\n", result, strerror(result) );
if( result == 0 ) ftid = 0;
Recently I have noticed a deadlock issue where few threads (which tried to log statements) blocked with below stacktrace
#0 0x00007fc1b2dc565c in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x00007fc1b2d70f0c in _L_lock_2462 () from /lib64/libc.so.6
#2 0x00007fc1b2d70d47 in __tz_convert () from /lib64/libc.so.6
#3 0x00000000004708f7 in Logger::getCurrentTimestamp(char*) ()
getCurrentTimestamp(char*) function is called from our logger module when LogInfo() or LogError() functions are invoked. This function internally calls gettimeofday() to print the current time in logs. So I'm suspecting LogInfo() function (which calls gettimeofday()) called inside signalHandler() might be causing the deadlock issue. But i'm not confident of this as this issue is not getting reproduced frequently and i did not get any information stating that gettimeofday() is not async signal safe.
stack trace of the thread that was interrupted by signal.
#0 0x00007fc1b2dc565c in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x00007fc1b2d70f0c in _L_lock_2462 () from /lib64/libc.so.6
#2 0x00007fc1b2d70d47 in __tz_convert () from /lib64/libc.so.6
#3 0x00000000004708f7 in Logger::getCurrentTimestamp(char*) ()
#4 0x000000000046e80f in Log::write(Logger*, LogType, std::string const&, char const*) ()
#5 0x000000000046df74 in Log::i(Logger*, char const*, std::string const&, std::string const&, char const*, ...) ()
#6 0x0000000000456b67 in ?? ()
#7 <signal handler called>
#8 0x00007fc1b2da8dc5 in _xstat () from /lib64/libc.so.6
#9 0x00007fc1b2d71056 in __tzfile_read () from /lib64/libc.so.6
#10 0x00007fc1b2d703b4 in tzset_internal () from /lib64/libc.so.6
#11 0x00007fc1b2d70d63 in __tz_convert () from /lib64/libc.so.6
#12 0x00000000004708f7 in Logger::getCurrentTimestamp(char*) ()
#13 0x000000000047030e in Logger::writeLog(int, std::string&, std::string&) ()
#14 0x0000000000470633 in Logger::Write(Logger*, int, std::string, std::string const&) ()
#15 0x000000000046eb02 in Log::write(Logger*, LogType, std::string const&, char const*) ()
#16 0x000000000046df74 in Log::i(Logger*, char const*, std::string const&, std::string const&, char const*, ...) ()
You might also want to check rest of the implementation of LogError to verify that it is async safe completely. Hint: printf, fprintf, sprintf or any C++ streaming API won't be async safe. The POSIX write function is safe.
CommentedNov 5, 2018 at 15:45@user2079303 s[n]printf() can be async-signal-safe. It's not mandated by POSIX, but some implementations of s[n]printf() are async-signal-safe, such as Solaris. So it can be possible to use s[n]printf() to create a string and then use write() to emit the string.
CommentedNov 5, 2018 at 17:28If you're wondering how s[n]printf could not be async-signal-safe, any C library function that isn't mandated to be async-signal-safe is allowed to use malloc internally for scratch space (and I can attest that GNU libc's *printf implementation definitely does do that).
2018-11-05 17:32:31 +00:00
CommentedNov 5, 2018 at 17:32@AndrewHenle but it's not portable to use s[n]printf because not all systems implement it async safely. What would be safe is to copy the source of s[n]printf from a system that does implement it async safely, compile it with a different name, and use that instead of the system provided one. First one must verify that copying the source is license-safe operation :)
CommentedNov 5, 2018 at 17:35@zwol glibc does much worse than just merely using malloc().. That's the underlying implementation of glibc's snprintf(). It uses a static memory buffer probably obtained with something like fmemopen() - with locks. So it's not only non-async-signal-safe, it'll contend with itself if even just used from multiple threads. Ooof, that's ugly.
You mean gettimeofday? If it not explicitly marked as such, anything can happen. It's UB (undefined behavoir).
gettimeofday is not defined as async-signal-safe, but if you pass 0 for the second (timezone) argument, it doesn't have to do anything that time and clock_gettime (both of which are officially async-signal-safe) don't also do, so as a matter of QoI it should be async-signal-safe in that case.
Your deadlock is inside the internal libc function __tz_convert.
#2 0x00007fc1b2d70d47 in __tz_convert () from /lib64/libc.so.6
#3 0x00000000004708f7 in Logger::getCurrentTimestamp(char*) ()
It appears to have been called directly from Logger::getCurrentTimestamp, but that's because it was "tail called" from a documented API function. There are only four functions in GNU libc that call __tz_convert (I grepped the source code): localtime, localtime_r, gmtime, and gmtime_r. Therefore, your problem is not that you are calling gettimeofday, but that you are calling one of those functions.
localtime and gmtime are obviously not async-signal-safe since they write to a global variable. localtime_r and gmtime_r are not async-signal-safe either, because they have to look at the global database of timezone information (yes, even gmtime_r does this — it might be possible to change it not to need to do that, but it still wouldn't be a thing you could rely on cross-platform).
I don't think there's a good workaround. Formatted output from an async signal handler is going to trip over all kinds of other problems; my advice is to restructure your code so that you never need to call logging functions from async signal handlers.
gettimeofday (in glibc at least) calls localtime_r which as you point out calls __tz_convert (which has a lock and is therefore very much async unsafe). The problem is in fact (very likely) the call to gettimeofday.
@user2079303 That can't be OP's immediate problem, because then gettimeofday would be on the stack, since that's not a tail call. However, it's a good point anyway; I forgot about the timezone out-parameter to gettimeofday (because I've never seen anyone pass anything other than NULL -- you will notice that localtime_r is only called when tz is not NULL). Will adjust answer.
Is the shown stack trace from the thread of execution, or from the signal handler? I think that the execution is in __tz_convert one way or another (not by the call to gettimeofday necessarily, as it is not in the call stack after all) and the lock has been acquired, and at that moment the execution is interrupted and the signal handler calls gettimeofday specifically, which also tries to acquire the lock, and blocks indefinitely.
@user2079303 I think you're right that the deadlock is on the lock acquired inside __tz_convert, but, again, gettimeofday doesn't call localtime_r unless its timezone argument is nonzero, and that call isn't a tail call,. Therefore, if Logger::getCurrentTimestamp called gettimeofday with a nonzero timezone argument, we would see gettimeofday on the stack. We don't. Therefore I deduce that Logger::getCurrentTimestamp called one of localtime/localtime_r/gmtime/gmtime_r directly.
@zwol You have pointed me in the right direction. I have added definition of Logger::getCurrentTimestamp and stack trace of thread that received signal. It looks like call to localtime() might be taking lock and causing deadlock. So I have decided to remove LogInfo() statement from signal handler as logging is not important for me. But if I want to print logs in future within signal handler, how can I do it ?
|
Thanks for contributing an answer to Stack Overflow!
Please be sure to answer the question. Provide details and share your research!
But avoid …
Asking for help, clarification, or responding to other answers.
Making statements based on opinion; back them up with references or personal experience.