1 Background

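A backend system had an HTTP interface that ran in a single thread. To handle more requests concurrently, we changed it to run in several threads in parallel. The interface did respond faster after the change, but the server crashed about once every 3 minutes. The cause was that the system used the curl-7.21.1 (August 11 2010) library, which is not thread-safe. We replaced it with the latest curl-7.34.0 (December 12 2013) library. Frustratingly, the server still crashed occasionally, a few hours apart, so I went back and read the official documentation more carefully.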

The official documentation explains the Multi-threading Issues of the latest libcurl as follows [1]:

The first basic rule is that you must never simultaneously share a libcurl handle (be it easy or multi or whatever) between multiple threads. Only use one handle in one thread at any time. You can pass the handles around among threads, but you must never use a single handle from more than one thread at any given time.

libcurl is completely thread safe, except for two issues: signals and SSL/TLS handlers. Signals are used for timing out name resolves (during DNS lookup) - when built without c-ares support and not on Windows.

When using multiple threads you should set the CURLOPT_NOSIGNAL option to 1 for all handles. Everything will or might work fine except that timeouts are not honored during the DNS lookup - which you can work around by building libcurl with c-ares support. c-ares is a library that provides asynchronous name resolves. On some platforms, libcurl simply will not function properly multi-threaded unless this option is set.

Also, note that CURLOPT_DNS_USE_GLOBAL_CACHE is not thread-safe.

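This interface does not use SSL/TLS, so could signals be causing the crash? The documentation recommends setting CURLOPT_NOSIGNAL in multi-threaded programs, because otherwise libcurl uses signals to time out DNS lookups, and that is where things go "wrong". It also offers a workaround for the one drawback of the option (DNS lookups can no longer time out): build libcurl with c-ares [2] for asynchronous name resolution. Its last sentence, however, still warns that on some platforms a multi-threaded program will not work properly unless CURLOPT_NOSIGNAL is set. From this description, it is reasonable to guess that the crash came from not setting this option. Here is the official description of the option [3]: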

CURLOPT_NOSIGNAL

Pass a long. If it is 1, libcurl will not use any functions that install signal handlers or any functions that cause signals to be sent to the process. This option is mainly here to allow multi-threaded unix applications to still set/use all timeout options etc, without risking getting signals. The default value for this parameter is 0. (Added in 7.10)

If this option is set and libcurl has been built with the standard name resolver, timeouts will not occur while the name resolve takes place. Consider building libcurl with c-ares support to enable asynchronous DNS lookups, which enables nice timeouts for name resolves without signals.

Setting CURLOPT_NOSIGNAL to 1 makes libcurl NOT ask the system to ignore SIGPIPE signals, which otherwise are sent by the system when trying to send data to a socket which is closed in the other end. libcurl makes an effort to never cause such SIGPIPEs to trigger, but some operating systems have no way to avoid them and even on those that have there are some corner cases when they may still happen, contrary to our desire. In addition, using CURLAUTH_NTLM_WB authentication could cause a SIGCHLD signal to be raised.
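Because CURLOPT_NOSIGNAL=1 stops libcurl from asking the system to ignore SIGPIPE, a write to a socket that the peer has closed can still raise SIGPIPE, whose default action terminates the process. A common mitigation, sketched here for a POSIX system, is to ignore SIGPIPE for the whole process once in main() before any worker thread is created. `ignore_sigpipe()` is a hypothetical helper name, not a libcurl API.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical helper: ignore SIGPIPE process-wide. Call it once from
 * main() before any worker thread is created; with CURLOPT_NOSIGNAL set,
 * libcurl no longer asks the system to ignore SIGPIPE itself.
 * Returns 0 on success, -1 on failure (like sigaction()). */
int ignore_sigpipe(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPIPE, &sa, NULL);
}
```

With SIGPIPE ignored, a write to a closed socket fails with EPIPE instead, and libcurl reports that as an ordinary transfer error.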

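CURLOPT_NOSIGNAL lets a multi-threaded program use the timeout options without libcurl installing signal handlers. However, the documentation also concedes, somewhat helplessly, that libcurl only makes an effort to avoid signals: on some operating systems, and in rare corner cases, signals can still be raised. In other words, a small chance of a crash remains, and that can only be verified on the production machines.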

A closer look at the backend interface's implementation shows that it does set timeout options:

        curl_easy_setopt(curl,   CURLOPT_CONNECTTIMEOUT,   timeout);
        curl_easy_setopt(curl,   CURLOPT_TIMEOUT,          timeout);

The official documentation describes these two options as follows:

CURLOPT_CONNECTTIMEOUT

Pass a long. It should contain the maximum time in seconds that you allow the connection to the server to take. This only limits the connection phase, once it has connected, this option is of no more use. Set to zero to switch to the default built-in connection timeout - 300 seconds. See also the CURLOPT_TIMEOUT option.

In unix-like systems, this might cause signals to be used unless CURLOPT_NOSIGNAL is set.

CURLOPT_TIMEOUT

Pass a long as parameter containing the maximum time in seconds that you allow the libcurl transfer operation to take. Normally, name lookups can take a considerable time and limiting operations to less than a few minutes risk aborting perfectly normal operations. This option will cause curl to use the SIGALRM to enable time-outing system calls.

In unix-like systems, this might cause signals to be used unless CURLOPT_NOSIGNAL is set.

Default timeout is 0 (zero) which means it never times out.

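So even after switching to the latest thread-safe libcurl, these two timeout settings still make libcurl use signals, which is not thread-safe, and that is why the server still crashed occasionally.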

2 Remaining issue

The official Multi-threading Issues description does not mention the thread safety of curl_global_init [4-5], but the curl_global_init(3) man page states that curl_global_init is not thread-safe:

This function sets up the program environment that libcurl needs. Think of it as an extension of the library loader.

This function must be called at least once within a program (a program is all the code that shares a memory space) before the program calls any other function in libcurl. The environment it sets up is constant for the life of the program and is the same for every program, so multiple calls have the same effect as one call.

The flags option is a bit pattern that tells libcurl exactly what features to init, as described below. Set the desired bits by ORing the values together. In normal operation, you must specify CURL_GLOBAL_ALL. Don't use any other value unless you are familiar with it and mean to control internal operations of libcurl.

This function is not thread safe. You must not call it when any other thread in the program (i.e. a thread sharing the same memory) is running. This doesn't just mean no other thread that is using libcurl. Because curl_global_init() calls functions of other libraries that are similarly thread unsafe, it could conflict with any other thread that uses these other libraries.

See the description in libcurl(3) of global environment requirements for details of how to use this function.

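Therefore, in a multi-threaded program, curl_global_init should be called explicitly once at startup, before any worker thread runs. When worker threads later call curl_easy_init() for each request, the global initialization has already been done, so curl_global_init is not called again (curl_easy_init() calls it automatically if it has not been called yet), which reduces the chance of conflicts. For example, initialization could look like this: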

    static bool bInit = false;   /* C needs <stdbool.h> */
    if (!bInit) {                /* not thread-safe: run in the main thread first */
        bInit = true;
        curl_global_init(CURL_GLOBAL_ALL);
    }
    CURL *curl = curl_easy_init();
    if (!curl) {
        /* error handling */
    }

3 A multi-threaded example from the official site

    /* A multi-threaded example that uses pthreads extensively to fetch
     * X remote files at once */
    #include <stdio.h>
    #include <pthread.h>
    #include <curl/curl.h>

    #define NUMT 4

    /*
      List of URLs to fetch.
      If you intend to use a SSL-based protocol here you MUST setup the OpenSSL
      callback functions as described here:
      http://www.openssl.org/docs/crypto/threads.html#DESCRIPTION
    */
    const char * const urls[NUMT]= {
      "http://curl.haxx.se/",
      "ftp://cool.haxx.se/",
      "http://www.contactor.se/",
      "www.haxx.se"
    };

    static void *pull_one_url(void *url)
    {
      CURL *curl;

      curl = curl_easy_init();
      curl_easy_setopt(curl, CURLOPT_URL, url);
      curl_easy_perform(curl); /* ignores error */
      curl_easy_cleanup(curl);

      return NULL;
    }

    /*
       int pthread_create(pthread_t *new_thread_ID,
       const pthread_attr_t *attr,
       void * (*start_func)(void *), void *arg);
    */

    int main(int argc, char **argv)
    {
      pthread_t tid[NUMT];
      int i;
      int error;

      /* Must initialize libcurl before any threads are started */
      curl_global_init(CURL_GLOBAL_ALL);

      for(i=0; i< NUMT; i++) {
        error = pthread_create(&tid[i],
                               NULL, /* default attributes please */
                               pull_one_url,
                               (void *)urls[i]);
        if(0 != error)
          fprintf(stderr, "Couldn't run thread number %d, errno %d\n", i, error);
        else
          fprintf(stderr, "Thread %d, gets %s\n", i, urls[i]);
      }

      /* now wait for all threads to terminate */
      for(i=0; i< NUMT; i++) {
        error = pthread_join(tid[i], NULL);
        fprintf(stderr, "Thread %d terminated\n", i);
      }

      return 0;
    }

More examples: http://curl.haxx.se/libcurl/c/multithread.html

[1] http://curl.haxx.se/libcurl/c/libcurl-tutorial.html

[2] http://curl.haxx.se/mail/lib-2010-11/0188.html

[3] http://curl.haxx.se/libcurl/c/curl_easy_setopt.html#CURLOPTNOSIGNAL

[4] http://curl.haxx.se/libcurl/c/curl_global_init.html

[5] http://code.lovemiao.com/?tag=multi-thread
