I have implemented an application protocol stack that moves an incoming stream of data upward through several layers, as follows:
- copies a TCP segment from an OS buffer to `my_buffer`
- after identifying a record boundary, splits the record in `my_buffer` into tab-separated strings, which are copied into `deque<string> my_deque` (rather than into a vector, because I then have to immediately pop a couple of fields from the front)
- copies `my_deque` to `vector<string> my_records[n]`, where it is presented without further copying to the application (a rough sketch of where the copies happen follows below).
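For context, here is a rough sketch of where the copies happen in that design. `my_buffer`, `my_deque`, and `my_records` are the names from the list above; `promote_record` and the use of globals are just for illustration, and the splitting step is omitted:

```cpp
#include <deque>
#include <string>
#include <vector>

// Layer 1: segment bytes copied out of the OS buffer (copy #1).
std::string my_buffer;

// Layer 2: a complete record from my_buffer split into tab-separated
// fields, each field copied into its own string (copy #2).
std::deque<std::string> my_deque;

// Layer 3: the leading fields are popped, then the rest is copied into
// the container handed to the application (copy #3).
std::vector<std::string> promote_record() {
    my_deque.pop_front();            // the couple of fields popped up front
    my_deque.pop_front();
    std::vector<std::string> record(my_deque.begin(), my_deque.end());
    my_deque.clear();
    return record;
}
```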
I’m wondering whether it’s best to stick to a clean architecture (layering) and pay the price (whatever that is) of copying payload from layer to layer, or if there are some simple optimizations that people use.
Is it customary to use separate buffers for separate layers, or is this something which can easily be optimized away without dangerously compromising the independence of the layers?
I’ve had some experience optimizing network code, both native and .NET. Things that I’ve found to be important include:
- How many messages/second are being transferred?
- How many megabytes/second are being transferred?
- Is this a garbage-collected language? (In your case, no)
- What are your app requirements?
If 1 is “small” (50 per second or less?), then your network code frankly isn’t what anyone will notice. You should instead optimize for touch-friendliness, or improve your customer’s workflows, or something; you’ll get more bang for the buck. For server scale, you’ll need to worry more about thread affinity than about memory copies. If it’s large, a number of OSes now include special network APIs to handle the sheer number of calls; Windows, for example, has the RIO APIs in Winsock for just this.
If 2 is “big” (> 100 megabytes/second), then you need to worry about extra copying on servers, but only because of the memory pressure: every extra copy of a 100 MB/s stream is another 100 MB/s of memory traffic on top of the payload itself. For a VM environment especially, that memory pressure will often be what keeps your VM from handling more clients, which instantly leads to having to run more VMs.
If 3 is true and #1 is “big” and you’re on a server, you have to worry about memory pinning and object lifetime. But don’t let that drive you to avoid GC languages; there are lots of big programs that do networking in GC languages.
And #4 is truly important. Do you have extreme latency requirements like high-speed stock traders? Have you measured your performance?
I rarely come across client-side programs where optimizing away buffer copies made any difference that users could see. For servers, I generally find that some unexpected bottleneck hits long before memory copies do. There are a bunch of techniques that help: receive-side scaling, various async techniques, and special buffer-management calls.
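As one Linux-flavored illustration of the async style (not something the original code necessarily needs), here is a minimal sketch of an epoll-driven receive loop that drains ready sockets into a single reused buffer instead of allocating per read. Error handling and socket setup are omitted, and the sockets are assumed to be non-blocking:

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <array>

void receive_loop(int epoll_fd) {
    std::array<char, 64 * 1024> buf;        // one reused receive buffer
    std::array<epoll_event, 64> events;
    for (;;) {
        int n = epoll_wait(epoll_fd, events.data(),
                           static_cast<int>(events.size()), -1);
        for (int i = 0; i < n; ++i) {
            int fd = events[i].data.fd;
            ssize_t got;
            // Non-blocking sockets: keep reading until EAGAIN.
            while ((got = recv(fd, buf.data(), buf.size(), 0)) > 0) {
                // hand buf[0 .. got) to the next layer here
            }
        }
    }
}
```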
TL;DR: if you’re not a server, keep your code clean. If you are, measure first, optimize second. And “optimize” includes lots of techniques, of which “copy less memory” is one.
Rather than solve this problem again, I would highly recommend trying out http://kentonv.github.io/capnproto/. It is being developed by the person who maintained https://developers.google.com/protocol-buffers/ internally at Google, and it is specifically designed around the good ideas he had there about how to make network communication faster. (Protocol buffers themselves are already far more efficient than what most people do over the wire, and are extensively used internally at Google for exactly that reason.)
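As a rough sketch of what that looks like on the receiving side, assuming a hypothetical `Record` schema with a `fields @0 :List(Text);` member and its generated `record.capnp.h` header: the point is that fields are read in place out of the message buffer, with no per-field string copies.

```cpp
#include <capnp/message.h>
#include <capnp/serialize.h>
#include "record.capnp.h"  // hypothetical: generated from a schema such as
                           //   struct Record { fields @0 :List(Text); }

void handle_message(int fd) {
    // Reads one framed message from the socket; field accessors then
    // view the message buffer directly rather than copying into strings.
    ::capnp::StreamFdMessageReader reader(fd);
    Record::Reader record = reader.getRoot<Record>();
    for (capnp::Text::Reader field : record.getFields()) {
        (void)field;  // field acts as a read-only view into the message
    }
}
```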
Avoid copying at all costs. If you look into the history of network protocol implementations, you’ll find that, time and again, the real high-bandwidth optimizations come from building “zero copy” protocols and algorithms. Modern OSes work very hard to pass packets upstream from the interrupt handler through to the IP and sometimes higher layers without copying the data, for very good reasons.
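Applied to the pipeline in the question, the lowest-effort version of this idea is to keep the record bytes in one buffer and hand the upper layers views into it rather than copied strings. A minimal C++17 sketch, where `split_fields` is an illustrative name and the caller must keep the record buffer alive for as long as the fields are used:

```cpp
#include <string_view>
#include <vector>

// Split a record into tab-separated fields without copying any payload:
// each string_view points back into `record`, so `record` must outlive
// the returned vector.
std::vector<std::string_view> split_fields(std::string_view record) {
    std::vector<std::string_view> fields;
    std::size_t start = 0;
    while (start <= record.size()) {
        std::size_t tab = record.find('\t', start);
        if (tab == std::string_view::npos) tab = record.size();
        fields.push_back(record.substr(start, tab - start));
        start = tab + 1;
    }
    return fields;
}
```

With views, the deque in the middle layer also becomes unnecessary: instead of popping the leading fields, the upper layer can simply index past them.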