How does the size of the recv and send buffers affect TCP performance?

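I have a question about how the size of the recv() and send() buffers affects TCP performance. Consider the following fully working C++ example, which transfers 1 GB of (arbitrary) data from the client to the server over TCP.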

#include <unistd.h>
#include <netdb.h>
#include <errno.h>
#include <netinet/tcp.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/ioctl.h>
#include <iostream>
#include <memory>
#include <cstring>
#include <cstdlib>
#include <stdexcept>
#include <algorithm>
#include <string>
#include <sstream>
typedef unsigned long long TimePoint;
typedef unsigned long long Duration;
inline TimePoint getTimePoint() {
    struct ::timeval tv;
    ::gettimeofday(&tv, nullptr);
    return tv.tv_sec * 1000000ULL + tv.tv_usec;
}
const size_t totalSize = 1024 * 1024 * 1024;
const int one = 1;
void server(const size_t blockSize, const std::string& serviceName) {
    std::unique_ptr<char[]> block(new char[blockSize]);
    const size_t atLeastReads = totalSize / blockSize;
    std::cout << "Starting server. Receiving block size is " << blockSize << ", which requires at least " << atLeastReads << " reads." << std::endl;
    addrinfo hints;
    memset(&hints, 0, sizeof(addrinfo));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;
    hints.ai_protocol = 0;
    addrinfo* firstAddress;
    int result = getaddrinfo(nullptr, serviceName.c_str(), &hints, &firstAddress);
    if (result != 0) return;
    int listener = socket(firstAddress->ai_family, firstAddress->ai_socktype, firstAddress->ai_protocol);
    if (listener == -1) return;
    if (setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) != 0) return;
    if (bind(listener, firstAddress->ai_addr, firstAddress->ai_addrlen) != 0) return;
    freeaddrinfo(firstAddress);
    if (listen(listener, 1) != 0) return;
    while (true) {
        int server = accept(listener, nullptr, nullptr);
        if (server == -1) return;
        u_long mode = 1;
        if (::ioctl(server, FIONBIO, &mode) != 0) return;
//        if (setsockopt(server, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) != 0) return;
//        int size = 64000;
//        if (setsockopt(server, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size)) != 0) return;
//        if (setsockopt(server, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size)) != 0) return;
        std::cout << "Server accepted connection." << std::endl;
        size_t leftToRead = totalSize;
        size_t numberOfReads = 0;
        size_t numberOfIncompleteReads = 0;
        const TimePoint totalStart = ::getTimePoint();
        Duration selectDuration = 0;
        Duration readDuration = 0;
        while (leftToRead > 0) {
            fd_set readSet;
            FD_ZERO(&readSet);
            FD_SET(server, &readSet);
            TimePoint selectStart = ::getTimePoint();
            if (select(server + 1, &readSet, nullptr, nullptr, nullptr) == -1) return;
            selectDuration += ::getTimePoint() - selectStart;
            if (FD_ISSET(server, &readSet) != 0) {
                const size_t toRead = std::min(leftToRead, blockSize);
                TimePoint readStart = ::getTimePoint();
                const ssize_t actuallyRead = recv(server, block.get(), toRead, 0);
                readDuration += ::getTimePoint() - readStart;
                if (actuallyRead == -1)
                    return;
                else if (actuallyRead == 0) {
                    std::cout << "Got 0 bytes, which signals that the client closed the socket." << std::endl;
                    break;
                }
                else if (toRead != static_cast<size_t>(actuallyRead))
                    ++numberOfIncompleteReads;
                ++numberOfReads;
                leftToRead -= actuallyRead;
            }
        }
        const Duration totalDuration = ::getTimePoint() - totalStart;
        std::cout << "Receiving took " << totalDuration << " us, transfer rate was " << totalSize / (totalDuration / 1000000.0) << " bytes/s." << std::endl;
        std::cout << "Selects took " << selectDuration << " us, while reads took " << readDuration << " us." << std::endl;
        std::cout << "There were " << numberOfReads << " reads (factor " << numberOfReads / ((double)atLeastReads) << "), of which " << numberOfIncompleteReads << " (" << (numberOfIncompleteReads / ((double)numberOfReads)) * 100.0 << "%) were incomplete." << std::endl << std::endl;
        close(server);
    }
}
bool client(const size_t blockSize, const std::string& hostName, const std::string& serviceName) {
    std::unique_ptr<char[]> block(new char[blockSize]);
    const size_t atLeastWrites = totalSize / blockSize;
    std::cout << "Starting client... " << std::endl;
    addrinfo hints;
    memset(&hints, 0, sizeof(addrinfo));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = 0;
    hints.ai_protocol = 0;
    addrinfo* firstAddress;
    if (getaddrinfo(hostName.c_str(), serviceName.c_str(), &hints, &firstAddress) != 0) return false;
    int client = socket(firstAddress->ai_family, firstAddress->ai_socktype, firstAddress->ai_protocol);
    if (client == -1) return false;
    if (connect(client, firstAddress->ai_addr, firstAddress->ai_addrlen) != 0) return false;
    freeaddrinfo(firstAddress);
    u_long mode = 1;
    if (::ioctl(client, FIONBIO, &mode) != 0) return false;
//    if (setsockopt(client, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) != 0) return false;
//    int size = 64000;
//    if (setsockopt(client, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size)) != 0) return false;
//    if (setsockopt(client, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size)) != 0) return false;
    std::cout << "Client connected. Sending block size is " << blockSize << ", which requires at least " << atLeastWrites << " writes." << std::endl;
    size_t leftToWrite = totalSize;
    size_t numberOfWrites = 0;
    size_t numberOfIncompleteWrites = 0;
    const TimePoint totalStart = ::getTimePoint();
    Duration selectDuration = 0;
    Duration writeDuration = 0;
    while (leftToWrite > 0) {
        fd_set writeSet;
        FD_ZERO(&writeSet);
        FD_SET(client, &writeSet);
        TimePoint selectStart = ::getTimePoint();
        if (select(client + 1, nullptr, &writeSet, nullptr, nullptr) == -1) return false;
        selectDuration += ::getTimePoint() - selectStart;
        if (FD_ISSET(client, &writeSet) != 0) {
            const size_t toWrite = std::min(leftToWrite, blockSize);
            TimePoint writeStart = ::getTimePoint();
            const ssize_t actuallyWritten = send(client, block.get(), toWrite, 0);
            writeDuration += ::getTimePoint() - writeStart;
            if (actuallyWritten == -1)
                return false;
            else if (actuallyWritten == 0) {
                std::cout << "Got 0 bytes, which shouldn't happen!" << std::endl;
                break;
            }
            else if (toWrite != static_cast<size_t>(actuallyWritten))
                ++numberOfIncompleteWrites;
            ++numberOfWrites;
            leftToWrite -= actuallyWritten;
        }
    }
    const Duration totalDuration = ::getTimePoint() - totalStart;
    std::cout << "Writing took " << totalDuration << " us, transfer rate was " << totalSize / (totalDuration / 1000000.0) << " bytes/s." << std::endl;
    std::cout << "Selects took " << selectDuration << " us, while writes took " << writeDuration << " us." << std::endl;
    std::cout << "There were " << numberOfWrites << " writes (factor " << numberOfWrites / ((double)atLeastWrites) << "), of which " << numberOfIncompleteWrites << " (" << (numberOfIncompleteWrites / ((double)numberOfWrites)) * 100.0 << "%) were incomplete." << std::endl << std::endl;
    if (shutdown(client, SHUT_WR) != 0) return false;
    if (close(client) != 0) return false;
    return true;
}
int main(int argc, char* argv[]) {
    if (argc < 2)
        std::cout << "Block size is missing." << std::endl;
    else {
        const size_t blockSize = static_cast<size_t>(std::atoll(argv[argc - 1]));
        if (blockSize > 1024 * 1024)
            std::cout << "Block size " << blockSize << " is suspicious." << std::endl;
        else {
            if (argc >= 3) {
                if (!client(blockSize, argv[1], "12000"))
                    std::cout << "The client encountered an error." << std::endl;
            }
            else {
                server(blockSize, "12000");
                std::cout << "The server encountered an error." << std::endl;
            }
        }
    }
    return 0;
}

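I ran this example on two Linux machines (kernel version 4.1.10-200.fc22.x86_64) connected by a 1 Gbit/s LAN, and I see the following behavior on both: if the recv() and send() system calls use buffers of 40 bytes or more, I use all the available bandwidth; if I use smaller buffers on either the server or the client, throughput drops. This behavior does not seem to be affected by the commented-out socket options (Nagle's algorithm and/or the send/receive buffer sizes).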

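I can understand that sending data in small chunks may be inefficient: if Nagle's algorithm is turned off and the chunks are small, the TCP and IP header sizes may dominate the useful payload. However, I did not expect the receive buffer size to affect the transfer rate. I would expect the cost of a recv() system call to be cheap compared with the cost of actually sending the data over the LAN. So if I send data in 5000-byte chunks, I would expect the transfer rate to be largely independent of the receive buffer size, because the rate at which I call recv() should still exceed the LAN transfer rate. That, however, is not the case.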
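To put rough numbers on the two costs mentioned above, here is a small sketch of the arithmetic. It assumes 20-byte IPv4 and 20-byte TCP headers without options and ignores Ethernet framing; the function names are made up for illustration.

```cpp
#include <cstddef>

// Assumed header sizes: IPv4 and TCP without options, 20 bytes each.
constexpr std::size_t kIpHeaderBytes = 20;
constexpr std::size_t kTcpHeaderBytes = 20;

// Fraction of each packet that is useful payload when every packet
// carries exactly `payloadBytes` bytes of application data.
constexpr double payloadEfficiency(std::size_t payloadBytes) {
    return static_cast<double>(payloadBytes) /
           static_cast<double>(payloadBytes + kIpHeaderBytes + kTcpHeaderBytes);
}

// Number of recv() calls per second needed to drain a link running at
// `linkBitsPerSecond` when each call returns at most `blockBytes` bytes.
constexpr double callsPerSecond(double linkBitsPerSecond, std::size_t blockBytes) {
    return (linkBitsPerSecond / 8.0) / static_cast<double>(blockBytes);
}
```

Under these assumptions, a 40-byte payload is only half of each packet, and keeping up with 1 Gbit/s takes about 3.1 million recv() calls per second with 40-byte blocks versus 25,000 with 5000-byte blocks.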

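I would be very grateful if someone could explain what causes the slowdown: is it merely the cost of the system calls, or is something happening at the protocol level?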

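I ran into this problem while writing a message-based cloud application, and I would appreciate it if someone could tell me how, in their opinion, this issue should affect the system's architecture. For various reasons I am not using a messaging library such as ZeroMQ but am writing the message-passing interface myself. The computation in the cloud is such that the message flow between servers is not symmetric (that is, depending on the workload, server A may send far more data to server B than the other way around), the messages are asynchronous (that is, the time between messages is unpredictable, but many messages can be sent in a burst), and the messages are of variable size and usually small (10 to 20 bytes). Furthermore, messages could in principle be delivered out of order, but it is important that they are not dropped, and some flow/congestion control is needed as well; therefore I use TCP rather than UDP. Since messages vary in size, each message starts with an integer specifying the message size, followed by the message payload. To read a message from the socket, I first read the message size and then the payload; thus, reading one message takes at least two recv() calls (and possibly more, since recv() can return less data than requested). Because both the message size and the payload are small, I end up with many small recv() requests, which, as my example demonstrates, does not let me use the available bandwidth fully. Does anyone have advice on the "right" way to structure message passing in such a setting?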
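For concreteness, the framing described above (a size integer followed by the payload) might be sketched on the sending side as follows. The 4-byte width, the network byte order, and the name appendFrame are assumptions made for this illustration, not details from the question.

```cpp
#include <arpa/inet.h>  // htonl
#include <cstdint>
#include <string>
#include <vector>

// Appends one frame (4-byte big-endian length, then the payload) to `out`.
// The length width and byte order are assumptions of this sketch.
void appendFrame(std::vector<char>& out, const std::string& payload) {
    const std::uint32_t length = htonl(static_cast<std::uint32_t>(payload.size()));
    const char* lengthBytes = reinterpret_cast<const char*>(&length);
    out.insert(out.end(), lengthBytes, lengthBytes + sizeof(length));
    out.insert(out.end(), payload.begin(), payload.end());
}
```

Appending several small messages to the same vector before handing it to a single send() call is one possible way to avoid issuing one system call per 10 to 20 byte message.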

Thanks in advance for your help.

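5 comments
"Is it merely the cost of the system calls": that seems quite likely. Can't you run your two versions under a profiler and see? Good luck.
The number of bytes handled by each recv() call does not affect the number of bytes sent across the network. A recv() call just copies the next N bytes from the kernel's receive buffer to user space. Every system call has overhead (context switches, etc.), though, so reducing the number of recv() calls per second does make things more CPU-efficient. (I think calling recv() twice per incoming message is a reasonable tradeoff between ease of coding and efficiency.)
I'm not sure I understand how to isolate the cost of the system call from the cost of the underlying work: in a profiler I would see the total time spent in the system call, but I could not tell how much of it is spent in the kernel and how much is due to protocol issues (for example, data not having arrived yet). Is there a way to measure the cost of the kernel switch?
If you set the socket to non-blocking, you are guaranteed that recv() calls always return immediately. (Of course, you will then need to block somewhere else, e.g. in select(), to avoid spinning the CPU, but by doing so you can separate the time spent waiting for incoming data from the time spent in recv().)
That's a good idea. I have updated the test code as Jeremy suggested. After some experiments, it turns out that the time spent in select() is about ten times longer than the time spent in recv()/send(). To me this suggests that the problem is not due to system call overhead; rather, something is happening at the protocol level.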
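As a complement to timing select() and recv() with wall-clock time, one possible way to see how much CPU time the process spends in the kernel is getrusage(), which reports user and system CPU time separately. The sketch below is only an illustration; CpuTimes, toMicros and getCpuTimes are hypothetical names. Time spent sleeping in select() while waiting for data is not CPU time, so it does not appear in either figure.

```cpp
#include <sys/resource.h>
#include <sys/time.h>

// User-mode and kernel-mode CPU time of the calling process, in microseconds.
struct CpuTimes {
    unsigned long long userMicros;
    unsigned long long systemMicros;
};

inline unsigned long long toMicros(const timeval& tv) {
    return tv.tv_sec * 1000000ULL + tv.tv_usec;
}

// Fills `out` with the CPU time consumed so far; returns false on failure.
inline bool getCpuTimes(CpuTimes& out) {
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0) return false;
    out.userMicros = toMicros(usage.ru_utime);
    out.systemMicros = toMicros(usage.ru_stime);
    return true;
}
```

Calling getCpuTimes() before and after the transfer loop and subtracting gives the kernel-mode CPU time for the whole run, which can then be compared across block sizes.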
linux
performance
sockets
tcp
Boris
Posted on 2015-12-05
2 answers
user207421
Posted on 2016-09-01
Accepted
0 upvotes
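  • You don't need two recv() calls to read the data you describe. Smarter code, or recvmsg(), will solve that. You just need to be able to cope with the fact that some of the data of the next message may already have been read.

  • The socket receive buffer should be at least as large as the bandwidth-delay product of the link. Normally that will be many kilobytes.

  • The socket send buffer should be at least as large as the peer's socket receive buffer.

  • Otherwise you cannot use all the available bandwidth.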
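The first point can be sketched as a small decoder that accepts whatever a single large recv() returns and hands out complete messages, keeping any bytes of the next message for later. This is a minimal illustration, not the answerer's code: the class name FrameDecoder and the 4-byte big-endian length prefix are assumptions.

```cpp
#include <arpa/inet.h>  // ntohl
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

class FrameDecoder {
public:
    // Feeds the bytes returned by one recv() call of any size.
    void feed(const char* data, std::size_t size) {
        // Drop bytes that were already consumed, then append the new ones.
        buffer_.erase(buffer_.begin(),
                      buffer_.begin() + static_cast<std::ptrdiff_t>(offset_));
        offset_ = 0;
        buffer_.insert(buffer_.end(), data, data + size);
    }

    // Extracts the next complete message, if one is buffered.
    // Returns false when more bytes are needed.
    bool next(std::string& message) {
        std::uint32_t netLength;
        if (buffer_.size() - offset_ < sizeof(netLength)) return false;
        std::memcpy(&netLength, buffer_.data() + offset_, sizeof(netLength));
        const std::size_t length = ntohl(netLength);
        if (buffer_.size() - offset_ - sizeof(netLength) < length) return false;
        message.assign(buffer_.data() + offset_ + sizeof(netLength), length);
        offset_ += sizeof(netLength) + length;
        return true;
    }

private:
    std::vector<char> buffer_;
    std::size_t offset_ = 0;
};
```

A receive loop would call recv() once with a large buffer (for example several kilobytes), pass the result to feed(), and then call next() repeatedly until it returns false, so many small messages cost a single system call.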

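    EDIT in response to your comment below.

    "I don't understand why the size of the recv()/send() buffers in user space should affect the throughput."

    It affects the throughput because it affects the amount of data that can be in flight, the maximum of which is given by the bandwidth-delay product of the link.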
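    As a rough illustration of the bandwidth-delay product, the sketch below computes it for a given link and requests a socket receive buffer of a given size. The helper names are hypothetical, and any round-trip time passed in is an assumed figure that would have to be measured on the actual link. On Linux, the value read back by getsockopt() is typically double the requested one, because the kernel reserves space for bookkeeping.

```cpp
#include <sys/socket.h>
#include <cstdint>

// Bandwidth-delay product in bytes for a link speed in bits per second
// and a round-trip time in microseconds.
constexpr std::uint64_t bandwidthDelayProduct(std::uint64_t bitsPerSecond,
                                              std::uint64_t rttMicros) {
    return bitsPerSecond * rttMicros / 8 / 1000000;
}

// Requests a receive buffer of `bytes` on `socketFd` and returns the size
// the kernel reports afterwards, or -1 on error.
int requestReceiveBuffer(int socketFd, int bytes) {
    if (setsockopt(socketFd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) != 0)
        return -1;
    int actual = 0;
    socklen_t length = sizeof(actual);
    if (getsockopt(socketFd, SOL_SOCKET, SO_RCVBUF, &actual, &length) != 0)
        return -1;
    return actual;
}
```

    For a 1 Gbit/s link with an assumed 1 ms round trip, bandwidthDelayProduct(1000000000, 1000) gives 125000 bytes, which is the "many kilobytes" order of magnitude mentioned above.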

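    "As people have said above, requests to recv()/send() do not affect the protocol."

    This is rubbish. Requests to send() cause data to be sent, which affects the protocol by causing it to engage in sending, and requests to recv() cause data to be removed from the receive buffer, which affects the protocol by changing the receive window advertised with the next ACK.

    "Hence, I would expect that, as long as the kernel's buffer has enough space and I read the data quickly enough, there should be no problem. This, however, is not what I observe: (i) changing the kernel buffer sizes has no effect, and (ii) I already use the available bandwidth with 40-byte buffers."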