Why is multithreading often preferred for improving performance?

I have a question about why programmers seem to love concurrency and multi-threaded programs in general.

I’m considering two main approaches here:

an async approach, basically based on signals (or simply “async”, as many papers and languages such as the new C# 5.0 call it), with a “companion thread” that manages the policy of your pipeline
a concurrent approach or multi-threading approach

I’m thinking about the hardware and the worst-case scenario here. I have tested these two paradigms myself, and the async paradigm wins, to the point that I don’t understand why, 90% of the time, people talk about multi-threading when they want to speed things up or make good use of their resources.

I have tested multi-threaded and async programs on an old machine with an Intel quad-core that has no memory controller inside the CPU; the memory is managed entirely by the motherboard. In this case performance is horrible with a multi-threaded application: even a relatively low number of threads, like 3, 4, or 5, can be a problem, and the application is unresponsive and is just slow and unpleasant.

A good async approach, on the other hand, is probably not faster, but it’s not worse either: my application just waits for the result and doesn’t hang, it stays responsive, and it scales much better.

I have also discovered that a context switch in the threading world is not that cheap in real-world scenarios; it’s in fact quite expensive, especially when you have more than two threads that need to cycle and swap among each other to be computed.

On modern CPUs the situation is not really that different. The memory controller is integrated, but my point is that an x86 CPU is basically a serial machine, and the memory controller works the same way as on the old machine with an external memory controller on the motherboard. The context switch is still a relevant cost in my application, and the fact that the memory controller is integrated, or that newer CPUs have more than two cores, is no real gain for me.

From what I have experienced, the concurrent approach is good in theory but not that good in practice: with the memory model imposed by the hardware, it’s hard to make good use of this paradigm, and it also introduces a lot of issues, ranging from how I use my data structures to joining multiple threads.

Also, neither paradigm offers any guarantee about when a task or job will be done by a certain point in time, which makes them really similar from a functional point of view.

Given the x86 memory model, why do the majority of people suggest using concurrency with C++ and not just an async approach? Also, why not consider the worst-case scenario of a computer where the context switch is probably more expensive than the computation itself?

You have multiple cores/processors, use them

Async is best for doing heavy IO bound processing but what about heavy CPU bound processing?

The problem arises when single-threaded code blocks (i.e., gets stuck) on a long-running process. For instance, remember back when printing a word processor document would make the whole application freeze until the job was sent? Application freezing is a side effect of a single-threaded application blocking during a CPU-intensive task.

In a multi-threaded application, CPU-intensive tasks (e.g., a print job) can be sent to a background worker thread, thereby freeing up the UI thread.
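As a rough sketch of that idea (Python is used here only for brevity; `slow_print_job` and `run_ui_with_background_job` are made-up names, and `time.sleep` merely stands in for the real work), the main loop below keeps "handling events" while a worker thread runs the long job:

```python
import threading
import time


def slow_print_job(pages, done):
    """Hypothetical long-running job; sleeping stands in for the real work."""
    for _ in range(pages):
        time.sleep(0.01)
    done.set()  # tell the UI loop that the job has finished


def run_ui_with_background_job(pages=5):
    """Start the job on a worker thread and count UI iterations meanwhile."""
    done = threading.Event()
    worker = threading.Thread(target=slow_print_job, args=(pages, done))
    worker.start()

    ui_ticks = 0
    while not done.is_set():
        ui_ticks += 1      # a real UI would process input and repaint here
        time.sleep(0.001)
    worker.join()
    return ui_ticks


ticks = run_ui_with_background_job()
print(f"UI loop ran {ticks} times while the job was in progress")
```

Had `slow_print_job` been called directly from the loop, the counter would not have advanced at all until the job returned.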

Likewise, in a multi-process application the job can be sent via messaging (e.g., IPC, sockets, etc.) to a subprocess designed specifically to process jobs.

In practice, async and multi-threaded/process code each have their benefits and drawbacks.

You can see the trend in the major cloud platforms, as they will offer instances specialized for CPU bound processing and instances specialized for IO bound processing.

Examples:

Storage (e.g., Amazon S3, Google Cloud Drive) is CPU bound
Web Servers are IO bound (Amazon EC2, Google App Engine)
Databases are both, CPU bound for writes/indexing and IO bound for reads

To put it into perspective…

A web server is a perfect example of a platform that is strongly IO bound. A multi-threaded web server that assigns one thread per connection doesn’t scale well, because every additional thread adds overhead from context switching and from locking on shared resources. An async web server, by contrast, can serve many connections from a single thread.
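To make the single-threaded model concrete, here is a minimal sketch using Python’s `asyncio`. It is not a real web server: `handle_connection` and `serve` are hypothetical names, and `asyncio.sleep` stands in for waiting on the network. It only shows that many pending connections can be handled without a thread per connection.

```python
import asyncio
import threading


async def handle_connection(conn_id, thread_ids):
    """Pretend to serve one connection; record which thread ran us."""
    thread_ids.add(threading.get_ident())
    await asyncio.sleep(0.01)  # stand-in for waiting on network IO
    return f"response {conn_id}"


async def serve(n_connections):
    """Handle all connections concurrently on the event loop's thread."""
    thread_ids = set()
    responses = await asyncio.gather(
        *(handle_connection(i, thread_ids) for i in range(n_connections))
    )
    return responses, thread_ids


responses, thread_ids = asyncio.run(serve(100))
print(len(responses), "connections served on", len(thread_ids), "thread(s)")
```

All the waits overlap, so the whole batch takes roughly as long as one connection, with no locks and no per-connection thread stacks.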

Likewise, an application specialized for encoding video would work much better in a multi-threaded environment, because the heavy processing involved would otherwise block the main thread until the work was done. There are ways to mitigate this, but it’s much easier to have a single thread managing a queue, a second thread managing cleanup, and a pool of threads managing the heavy processing. Communication between threads happens only when tasks are assigned or completed, so thread-locking overhead is kept to a bare minimum.
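A simplified sketch of that layout, assuming a hypothetical `encode` function as the heavy step (the cleanup thread is left out for brevity). The workers touch shared state only when they take a task from the queue or put a result back:

```python
import queue
import threading


def encode(chunk):
    """Hypothetical stand-in for the heavy per-chunk processing."""
    return sum(b * b for b in chunk)


def worker(tasks, results):
    """Pull tasks until the sentinel None arrives."""
    while True:
        item = tasks.get()
        if item is None:
            break
        index, chunk = item
        results.put((index, encode(chunk)))


def process_all(chunks, n_workers=4):
    """Queue-managing side: hand out chunks, then collect results in order."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for item in enumerate(chunks):
        tasks.put(item)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()

    ordered = [None] * len(chunks)
    while not results.empty():
        index, value = results.get()
        ordered[index] = value
    return ordered


print(process_all([[1, 2], [3], []]))
```

One caveat for this particular sketch: in CPython the global interpreter lock keeps pure-Python computation from running on several cores at once, so the structure carries over to other languages better than the speedup does.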

The best application often uses a combination of both. A web app, for instance, may use nginx (i.e., async single-threaded) as a load balancer to manage the torrent of incoming requests, a similar async web server (e.g., Node.js) to handle HTTP requests, and a set of multi-threaded servers to handle uploading, streaming, encoding content, etc.

There have been a lot of religious wars over the years between multi-threaded, multi-process, and async models. As with most things, the best answer really should be, “it depends.”

It follows the same line of thinking that justifies using GPU and CPU architectures in parallel. Two specialized systems running in concert can yield a much greater improvement than a single monolithic approach.

Neither is better, because both have their uses. Use the best tool for the job.

Update:

I removed the reference to Apache and made a minor correction. Apache uses a multiprocess model which forks a process for every request increasing the amount of context switching at the kernel level. In addition, since the memory can’t be shared across processes, each request incurs an additional memory cost.

Multi-threading gets around requiring additional memory because it relies on a shared memory between threads. Shared memory removes the additional memory overhead but still incurs the penalty of increased context switching. In addition — to ensure that race conditions don’t happen — thread locks (that ensure exclusive access to only one thread at a time) are required for any resources that are shared across threads.
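For illustration, a minimal sketch of such a lock guarding a shared counter (the `Counter` class and helper names are invented for this example):

```python
import threading


class Counter:
    """Shared counter whose updates are serialized by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run the read-modify-write below.
        with self._lock:
            self.value += 1


def hammer(counter, n):
    for _ in range(n):
        counter.increment()


def count_with_threads(n_threads=4, n=10_000):
    counter = Counter()
    threads = [
        threading.Thread(target=hammer, args=(counter, n))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


print(count_with_threads())
```

Without the lock, two threads could read the same old value and each write back the same incremented value, losing an update. With it, correctness is bought at the price of threads occasionally waiting on each other.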

It’s funny that you say, “programmers seem to love concurrency and multi-threaded programs in general.” Multi-threaded programming is universally dreaded by anybody who has done any substantial amount of it in their time. Deadlocks (a bug that happens when two threads each hold a lock the other one needs, blocking both from ever finishing) and race conditions (where the program randomly outputs the wrong result due to incorrect sequencing) are some of the most difficult bugs to track down and fix.
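One common way to rule out the classic two-lock deadlock is to always acquire locks in the same global order. The sketch below is only an illustration with invented names (`Account`, `transfer`); it runs two threads transferring in opposite directions, which is exactly the pattern that can deadlock if each thread locks “its own” account first:

```python
import threading


class Account:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    # Lock in a fixed order (by name) regardless of transfer direction,
    # so two opposite transfers can never each hold one lock and wait
    # for the other.
    first, second = sorted((src, dst), key=lambda account: account.name)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


def repeat_transfer(src, dst, rounds):
    for _ in range(rounds):
        transfer(src, dst, 1)


def run_opposing_transfers(rounds=1000):
    a, b = Account("a", 100), Account("b", 100)
    t1 = threading.Thread(target=repeat_transfer, args=(a, b, rounds))
    t2 = threading.Thread(target=repeat_transfer, args=(b, a, rounds))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return a.balance, b.balance


print(run_opposing_transfers())
```

Both balances end where they started, because each thread moves the same total in opposite directions and every update happens with both locks held.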

Update2:

Contrary to the blanket statement that IPC is faster than network (i.e., socket) communication, that’s not always the case. Keep in mind that these are generalizations, and implementation-specific details may have a huge impact on the result.

Microsoft’s asynchronous approach is a good substitute for the most common purpose of multithreaded programming: improving responsiveness with respect to IO tasks.

However, it’s important to realize that the asynchronous approach is not capable of improving performance at all, or improving responsiveness with respect to CPU intensive tasks.

Multithreading for Responsiveness

Multithreading for responsiveness is the traditional way to keep a program responsive during heavy IO tasks or heavy computation tasks. You save files on a background thread, so that the user can continue their work, without having to wait for the hard drive to finish its task. The IO thread often blocks waiting for some portion of a write to finish, so context switches are frequent.

Similarly, when performing a complex calculation, you want to allow regular context switching so the UI can remain responsive, and the user doesn’t think the program has crashed.

The goal here is not, in general, to get the multiple threads to run on different CPUs. Instead, we’re just interested in getting context switches to happen between the long-running background task and the UI, so that the UI is able to update and respond to the user while the background task is running. In general, the UI won’t take up much CPU power, and the threading framework or OS will usually decide to run them on the same CPU.

We actually lose overall performance due to the extra cost of context switching, but we don’t care because performance of the CPU wasn’t our goal. We know that we usually have more CPU power than we need, and so our goal with regard to multithreading is to get a task done for the user without wasting the user’s time.

The “Asynchronous” Alternative

The “asynchronous approach” changes this picture by enabling context switches within a single thread. This guarantees that all of our tasks will run on a single CPU, and may provide some modest performance improvements in terms of less thread creation/cleanup and fewer real context switches between threads.

Instead of creating a new thread to await the receipt of a network resource (e.g. downloading an image), an async method is used, which awaits the image becoming available, and, in the meantime, yields to the calling method.

The main advantage here is that you don’t have to worry about threading issues like avoiding deadlock, as you aren’t using locks and synchronization at all, and there’s a bit less work for the programmer setting up the background thread, and getting back on the UI thread when the result comes back in order to update the UI safely.

I haven’t looked too deeply into the technical details, but my impression is that managing the download with occasional light CPU activity becomes a task not for a separate thread, but rather something more like a task on the UI event queue, and when the download completes, the asynchronous method is resumed from that event queue. In other words, await means something akin to “check whether the result I need is available, if not, put me back in this thread’s task queue”.
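That reading of `await` can be checked with a small experiment. The sketch uses Python’s `asyncio` as a stand-in for C#’s async/await (the two differ in details, so treat this as an analogy); `download` is a made-up coroutine and `asyncio.sleep` replaces the real network wait:

```python
import asyncio


async def download(name, delay, log):
    """Pretend download: log, wait without blocking the thread, log again."""
    log.append(f"{name}: start")
    await asyncio.sleep(delay)  # hands control back to the event loop
    log.append(f"{name}: done")
    return name


async def main():
    log = []
    # Both coroutines share one thread; each runs until its next await.
    await asyncio.gather(
        download("image", 0.02, log),
        download("icon", 0.01, log),
    )
    return log


log = asyncio.run(main())
print(log)
```

The log shows both downloads starting before either finishes, and the shorter one completing first: each `await` parks the method and lets the loop run whatever else is ready.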

Note that this approach would not solve the problem of a CPU-intensive task: there’s no data to await, so we can’t get the context switches we need to happen without creating an actual background worker thread. Of course, it might still be convenient to use an asynchronous method to start the background thread and return the result, in a program that pervasively uses the asynchronous approach.
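A sketch of that combination, again in Python’s `asyncio` rather than C#: `heavy` and `heartbeat` are hypothetical names, and `run_in_executor(None, ...)` hands the function to the event loop’s default thread pool. The computation runs on a real worker thread, while the calling code simply awaits the result:

```python
import asyncio


def heavy(n):
    """CPU-bound work with nothing to await inside it."""
    return sum(i * i for i in range(n))


async def heartbeat(ticks):
    """Stands in for the UI: keeps running while the worker computes."""
    while True:
        ticks.append(1)
        await asyncio.sleep(0.001)


async def main(n):
    loop = asyncio.get_running_loop()
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks))
    # Run heavy() on a worker thread; awaiting keeps the loop free.
    result = await loop.run_in_executor(None, heavy, n)
    beat.cancel()
    return result, len(ticks)


result, ui_ticks = asyncio.run(main(100_000))
print(result, "computed while the loop ticked", ui_ticks, "time(s)")
```

Calling `heavy(n)` directly inside `main` would give the same result but stall the heartbeat for the duration, which is the limitation described above.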

Multithreading for Performance

Since you talk about “performance”, I’d also like to discuss how multithreading can be used for performance gains, something that’s entirely impossible with the single-threaded asynchronous approach.

When you’re actually in a situation where you don’t have enough CPU power on a single CPU, and want to use multithreading for performance, it’s actually often difficult to do. On the other hand, if one CPU isn’t enough processing power, it’s also often the only solution that could enable your program to do what you’d like to accomplish in a reasonable timeframe, which is what makes the work worthwhile.

Trivial Parallelism

Of course, sometimes it can be easy to get real speedup from multithreading.

If you happen to have a large number of independent computation-intensive tasks (that is, tasks whose input and output data are very small with respect to the calculations that must be performed to determine the result), then you can often get significant speedup by creating a pool of threads (sized appropriately based on the number of available CPUs), and having a master thread distribute the work and collect the results.
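A minimal sketch of that pattern, with an invented `count_primes` task whose input and output are tiny compared to the work done. The pool is sized from `os.cpu_count()`, and the calling thread plays the master:

```python
import os
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """Independent, computation-heavy task: count the primes below limit."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_independent_tasks(limits):
    """Distribute the tasks over a pool and collect results in input order."""
    workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


print(run_independent_tasks([10, 100, 1000]))
```

A caveat specific to this choice of language: CPython’s global interpreter lock stops threads from running pure-Python computation in parallel, so a real speedup there would need `ProcessPoolExecutor` (same interface) or native code. In languages without such a lock, the thread pool alone is enough.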

Practical Multithreading for Performance

I don’t want to put myself forward as too much of an expert, but my impression is that, in general, most practical multithreading for performance that happens these days is looking for places in an application that have trivial parallelism, and using multiple threads to reap the benefits.

As with any optimization, it’s usually better to optimize after you’ve profiled your program’s performance, and identified the hot spots: it’s easy to slow down a program by deciding arbitrarily that this part should run in one thread and that part in another, without first determining whether both parts are taking up a significant portion of CPU time.

An extra thread means more setup/teardown costs, and either more context switches or more inter-CPU communication costs. If the thread doesn’t do enough work to make up for those costs even when it runs on a separate CPU, and it doesn’t need to be a separate thread for responsiveness reasons, it will slow things down for no benefit.
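A quick way to see this for yourself is to time the same tiny task run inline and run with one thread per task. The harness below is only a sketch with made-up names; the actual numbers depend entirely on the machine and runtime, so none are quoted here:

```python
import threading
import time


def tiny_task(x):
    """Far too little work to justify its own thread."""
    return x + 1


def run_inline(n):
    return [tiny_task(i) for i in range(n)]


def run_thread_per_task(n):
    results = [None] * n

    def work(i):
        results[i] = tiny_task(i)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def timed(fn, n):
    """Return (result, elapsed seconds) for one call of fn(n)."""
    start = time.perf_counter()
    out = fn(n)
    return out, time.perf_counter() - start


inline_out, inline_s = timed(run_inline, 500)
thread_out, thread_s = timed(run_thread_per_task, 500)
print(f"inline: {inline_s:.6f}s, thread per task: {thread_s:.6f}s")
```

Both versions produce the same list; the difference in elapsed time is the setup/teardown and switching cost that the thread has to earn back.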

Look for tasks that have few interdependencies, and that are taking up a significant portion of the runtime of your program.

If they have no interdependencies, then it’s a case of trivial parallelism: you can easily set up each with a thread and enjoy the benefits.

If you can find tasks with limited interdependence, so that locking and synchronization to exchange information doesn’t slow them down significantly, then multithreading can give some speedup, provided you’re careful to avoid the dangers of deadlock due to faulty logic when synchronizing or incorrect results due to not synchronizing when it’s necessary.

Alternatively, some of the more common applications for multithreading aren’t (in a sense) looking for speedup of a predetermined algorithm, but instead for a larger budget for the algorithm they’re planning to write: if you’re writing a game engine, and your AI has to make a decision within your frame rate, you can often give your AI a bigger CPU cycle budget if you can give it its own CPU.

However, be sure to profile the threads and ensure that they’re doing enough work to make up for the cost at some point.

Parallel Algorithms

There are also a lot of problems that can be sped up using multiple processors, but that are too monolithic to simply split between CPUs.

Parallel algorithms have to be carefully analyzed for their big-O runtimes with respect to the best available non-parallel algorithm, as it’s very easy for the inter-CPU communication cost to eliminate any benefits from using multiple CPUs. In general, they must use less inter-CPU communication (in big-O terms) than they use calculations on each CPU.
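One rough way to write that condition down (the notation is my own assumption, not taken from a particular source): let $T_1(n)$ be the running time of the best sequential algorithm, $W(n)$ the total computation done by the parallel algorithm, $p$ the number of CPUs, and $C(n, p)$ the time spent on inter-CPU communication.

```latex
% Parallel running time and speedup under this simplified model
T_p(n) \approx \frac{W(n)}{p} + C(n, p),
\qquad
S(n, p) = \frac{T_1(n)}{T_p(n)}

% Communication must grow more slowly than the computation done per CPU
C(n, p) = o\!\left(\frac{W(n)}{p}\right)
```

If the communication term dominates instead, the running time is governed by $C(n, p)$ rather than by the computation, and adding CPUs stops paying off.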

At the moment, it’s still largely a space for academic research, in part because of the complex analysis required, in part because trivial parallelism is quite common, in part because we don’t yet have so many CPU cores on our computers that problems which can’t be solved in a reasonable time frame on one CPU could be solved in a reasonable time frame using all of our CPUs.

“the application is unresponsive and is just slow and unpleasant.”

And there’s your problem. A responsive UI does not make a performant application. Often the opposite. A bunch of time is spent checking UI input rather than having the worker threads do their job.

As far as ‘just’ having an async approach, that’s multithreading as well although tweaked for that one particular use case in most environments. In others, that async is done via coroutines that are… not always concurrent.

Frankly, I find async operations to be more difficult to reason about and use in a way that actually provides benefit (performance, robustness, maintainability) even compared to… more manual approaches.


Filed under: softwareengineering - @ 13:07

Tags: concurrency, multithreading, x86
