Performance Characterization of the FreeBSD Network Stack
This paper analyzes the behavior of high-performance web servers along three axes: packet rate, number of connections, and communication latency. Modern high-performance servers spend a significant fraction of their time executing the operating system's network stack: over 80% of the time for a web server. These servers must handle increasing packet rates, increasing numbers of connections, and the long round-trip times of the Internet. Low-overhead, non-statistical profiling shows that a large number of connections and long latencies significantly degrade the instruction throughput of the operating system network stack. This degradation results from a dramatic increase in L2 cache capacity misses: the working set of connection data structures grows in proportion to the number of connections, and its reuse decreases as communication latency increases. For instance, L2 cache misses increase the number of cycles spent executing the TCP layer of the network stack by over 300%, from 1312 cycles per packet to 5364. The obvious solutions of increasing the L2 cache size or using prefetching to reduce the number of misses are surprisingly ineffective.
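As a rough illustration of the arithmetic behind these claims, the sketch below checks the reported TCP-layer increase and models the working set as growing linearly with the number of connections. Only the cycle counts (1312 and 5364) come from the text. The per-connection state size and the L2 cache size are hypothetical values chosen for illustration, and the function names are not taken from the paper.

```python
def percent_increase(before, after):
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def working_set_bytes(connections, bytes_per_connection):
    """Working set under the assumption that it scales linearly
    with the number of connections."""
    return connections * bytes_per_connection


def fits_in_cache(connections, bytes_per_connection, cache_bytes):
    """True if the modeled working set fits within the cache capacity."""
    return working_set_bytes(connections, bytes_per_connection) <= cache_bytes


# Cycle counts per packet for the TCP layer, as reported in the text.
tcp_increase = percent_increase(1312, 5364)

# Hypothetical values, NOT from the paper: 1 KiB of connection state
# per connection and a 1 MiB L2 cache.
BYTES_PER_CONNECTION = 1024
L2_CACHE_BYTES = 1024 * 1024

few_fit = fits_in_cache(100, BYTES_PER_CONNECTION, L2_CACHE_BYTES)
many_fit = fits_in_cache(10_000, BYTES_PER_CONNECTION, L2_CACHE_BYTES)

print(f"TCP-layer cycle increase: {tcp_increase:.1f}%")
print(f"100 connections fit in L2: {few_fit}")
print(f"10,000 connections fit in L2: {many_fit}")
```

Under these assumed sizes, the modeled working set exceeds the cache once the connection count passes about a thousand, which is the kind of capacity-miss regime the text describes. The model ignores reuse distance, so it does not capture the effect of latency.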