Can you explain why multiple threads need locks on a single-core CPU?

Assume these threads run on a single-core CPU. A CPU runs only one instruction per cycle, so even though the threads share the CPU, the hardware guarantees that only one instruction executes at a time. Doesn't that make locks unnecessary for multithreading?


This is best illustrated with an example.

Suppose we have a simple task that we want to perform multiple times in parallel, and we want to keep track globally of the number of times that the task has been performed, for example, counting hits on a web page.

When each thread gets to the point at which it’s incrementing the count, its execution will look like this:

  1. Read the number of hits from memory into a processor register.
  2. Increment that number.
  3. Write that number back to memory.

Remember that every thread can be suspended at any point in this process. So if thread A performs step 1 and then gets suspended, followed by thread B performing all three steps, then when thread A resumes, its registers will hold the wrong number of hits: its registers will be restored, it will happily increment the stale number of hits, and store that incremented number.

In addition, any number of other threads could have run during the time that thread A was suspended, so the count thread A writes at the end might be well below the correct count.

For that reason, it’s necessary to ensure that if a thread performs step 1, it performs step 3 before any other thread is allowed to perform step 1. This can be accomplished by having all threads acquire a single lock before they begin this process, and release the lock only after the process is complete, so that this “critical section” of code cannot be incorrectly interleaved, producing a wrong count.
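The lock-based fix described above can be sketched in Java using the `synchronized` keyword (the class and method names here are illustrative, not from any particular library):

```java
// Minimal sketch of a lock-protected hit counter. The intrinsic lock
// on the instance guarantees that the read-increment-write sequence
// runs as one uninterruptible unit relative to other threads.
class HitCounter {
    private long hits = 0;

    // Only one thread at a time may execute a synchronized method on a
    // given instance, so steps 1-3 cannot interleave across threads.
    public synchronized void increment() {
        hits = hits + 1; // read, increment, write -- now one critical section
    }

    public synchronized long get() {
        return hits;
    }
}
```

Any thread that reaches `increment()` while another holds the lock simply waits, which is exactly the serialization of the critical section described above.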

But what if the operation were atomic?

Yes, in the land of magical unicorns and rainbows, where the increment operation is atomic, then locking would not be necessary for the above example.

It’s important to realize, however, that we spend very little time in the world of magical unicorns and rainbows. In almost every programming language, the increment operation is broken down into the above three steps. That’s because, even if the processor supports an atomic increment operation, that operation is significantly more expensive: it has to read from memory, modify the number, and write it back to memory…and usually the atomic increment operation is an operation that can fail, meaning the simple sequence above has to be replaced with a loop (as we’ll see below).

Since, even in multithreaded code, many variables are kept local to a single thread, programs are much more efficient if they assume each variable is local to a single thread, and let the programmers take care of protecting shared state between threads. Especially given that atomic operations are not usually enough to solve threading issues, as we’ll see later.

Volatile variables

If we want to avoid locks for this particular problem, we first have to realize that the steps depicted in our first example aren’t actually what happens in modern compiled code. Because compilers assume only one thread is modifying the variable, each thread will keep its own cached copy of the variable until the processor register is needed for something else. As long as it has the cached copy, it assumes it doesn’t need to go back to memory and read it again (which would be expensive). It also won’t write the variable back to memory as long as the value is kept in a register.

We can get back to the situation we gave in the first example (with all the same threading problems we identified above) by marking the variable as volatile, which tells the compiler that this variable is being modified by others, and so must be read from or written to memory whenever it’s accessed or modified.
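In Java terms, marking the field `volatile` looks like this (a sketch with illustrative names); note that the increment is still three separate steps, so all the interleaving problems from the first example remain:

```java
// volatile forces every access to go through memory rather than a
// cached register copy -- but it does NOT make the increment atomic.
class VolatileCounter {
    private volatile int hits = 0;

    public void increment() {
        // Even on a volatile field, this is still read, add, write;
        // another thread can run between those steps.
        hits++;
    }

    public int get() {
        return hits;
    }
}
```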

So a variable marked as volatile will not take us to the land of atomic increment operations; it only gets us as close as we thought we were already.

Making the increment atomic

Once we’re using a volatile variable, we can make our increment operation atomic by using a low-level conditional set operation that most modern CPUs support (often called compare and set or compare and swap). This approach is taken, for example, in Java’s AtomicInteger class:

      /**
       * Atomically increments by one the current value.
       *
       * @return the updated value
       */
      public final int incrementAndGet() {
          for (;;) {
              int current = get();
              int next = current + 1;
              if (compareAndSet(current, next))
                  return next;
          }
      }

The above loop repeatedly performs the following steps, until step 3 succeeds:

  1. Read the value of a volatile variable directly from memory.
  2. Increment that value.
  3. Change the value (in main memory) if and only if its current value in main memory is the same as the value we initially read, using a special atomic operation.

If step 3 fails (because the value was changed by a different thread after step 1), it again reads the variable directly from main memory and tries again.

While the compare-and-swap operation is expensive, it’s slightly better than using locking in this case, because if a thread is suspended after step 1, other threads that reach step 1 do not have to block and wait for the first thread, which can prevent costly context switching. When the first thread resumes, it will fail in its first attempt to write the variable, but will be able to continue by re-reading the variable, which again is likely less expensive than the context switch that would have been necessary with locking.

So, we can get to the land of atomic increments (or other operations on a single variable) without using actual locks, via compare and swap.
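Using the class above from `java.util.concurrent.atomic`, the hit counter from the first example becomes a one-liner, with the CAS retry loop hidden inside the library:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Lock-free hit counting: incrementAndGet runs the CAS loop shown
// above, so the read-increment-write sequence cannot lose updates.
AtomicInteger hits = new AtomicInteger(0);

int updated = hits.incrementAndGet(); // atomically 0 -> 1, no lock taken
```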

So when is locking strictly necessary?

If you need to modify more than one variable in a single atomic operation, then locking will be necessary; you won’t find a special processor instruction for that.

As long as you’re working on a single variable, and you’re prepared for whatever work you’ve done to fail and to have to read the variable and start over again, compare-and-swap will be good enough, however.

Let’s consider an example where each thread first adds 2 to the variable X, and then multiplies X by two.

If X is initially one, and two threads run, we expect the result to be (((1 + 2) * 2) + 2) * 2 = 16.

However, if the threads interleave, we could, even with all operations being atomic, instead have both additions occur first, and the multiplications come after, resulting in (1 + 2 + 2) * 2 * 2 = 20.

This happens because addition and multiplication do not commute with each other: the result depends on the order in which the two operations are applied.

So, the operations themselves being atomic is not enough; we must make the combination of operations atomic.

We can do that either by using locking to serialize the process, or we could use one local variable to store the value of X when we started our calculation, a second local variable for the intermediate steps, and then use compare-and-swap to set a new value only if the current value of X is the same as the original value of X. If we fail, we would have to start over again by reading X and performing the calculations again.
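The compare-and-swap variant can be sketched like this (the method name is ours, purely for illustration): one local variable holds the value of X when we started, another holds the intermediate result, and the CAS publishes only if X is unchanged.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Atomically replace X with (X + 2) * 2 using a CAS retry loop.
int addTwoThenDouble(AtomicInteger x) {
    for (;;) {
        int original = x.get();           // local copy of X at the start
        int result = (original + 2) * 2;  // intermediate calculation
        // Publish only if X is still what we originally read;
        // otherwise another thread intervened, so start over.
        if (x.compareAndSet(original, result))
            return result;
    }
}
```

With X starting at one, two calls to this method yield (((1 + 2) * 2) + 2) * 2 = 16, as in the lock-based version, even under contention.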

There are several trade-offs involved: as the calculations become longer, it becomes much more likely that the running thread will be suspended, and the value will be modified by another thread before we resume, meaning failures become much more likely, leading to wasted processor time. In the extreme case of large numbers of threads with very long running calculations, we might have 100 threads read the variable and be engaged in calculations, in which case only the first to finish will succeed in writing the new value, the other 99 will still complete their calculations, but discover upon completion that they can’t update the value…at which point they’ll each read the value and start the calculation over. We’d likely have the remaining 99 threads repeat the same problem, wasting vast quantities of processor time.

Full serialization of the critical section via locks would be much better in that situation: 99 threads would suspend when they didn’t get the lock, and we’d run each thread in order of arrival at the locking point.

If serialization is not critical (as in our incrementing case), and the calculations that would be lost if updating the number fails are minimal, there may be a significant advantage to be gained from using the compare-and-swap operation, because that operation is less expensive than locking.


Consider this quote:

Some people, when confronted with a problem, think, “I know, I’ll use
threads,” and then two they haver poblesms

You see, even if only one instruction runs on a CPU at any given time, computer programs comprise a lot more than just atomic assembly instructions. So, for example, writing to the console (or a file) means you have to lock to ensure the output comes out the way you want.
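As a sketch of that point in Java: each individual `println` call is internally synchronized, but a multi-line write is several calls, so two threads can interleave their lines unless the whole sequence is guarded (the lock object and method here are illustrative):

```java
// One shared lock guarding multi-line console output.
static final Object consoleLock = new Object();

// Without the synchronized block, another thread's output could land
// between these two println calls, garbling the report.
static void printReport(String name, int value) {
    synchronized (consoleLock) {
        System.out.println("report for " + name);
        System.out.println("value: " + value);
    }
}
```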


It seems many answers have attempted to explain locking, but I think what the OP needs is an explanation of what multitasking actually is.

When you have more than one thread running on a system, even with one CPU, there are two main methodologies that dictate how these threads will be scheduled (i.e. placed onto your single-core CPU):

  • Cooperative Multitasking – Used in Win9x; it required each application to explicitly give up control. In this case, you wouldn’t need to worry about locking, since as long as Thread A is executing some algorithm, you are guaranteed that it will never be interrupted.
  • Preemptive Multitasking – Used in most modern OSs (Win2k and later). This uses timeslices and will interrupt threads even if they are still doing work. This is much more robust, because a single thread can never hang your entire machine, which was a real possibility with cooperative multitasking. On the other hand, you now need to worry about locks, because at any given time one of your threads could be interrupted (i.e. preempted) and the OS might schedule a different thread to run. When coding multithreaded applications with this behavior, you MUST consider that between every line of code (or even every instruction) a different thread might run. Now, even with a single core, locking becomes very important to ensure a consistent state of your data.

The problem does not lie with individual operations, but the larger tasks the operations carry out.

Many algorithms are written with the assumption that they are in full control of the state they operate on. With an interleaved execution model like the one you describe, the operations may be arbitrarily interleaved with each other, and if they share state, there is a risk that the state ends up in an inconsistent shape.

You can compare it with functions that may temporarily break an invariant in order to do what they do. As long as the intermediary state is not observable from the outside, they can do whatever they want to achieve their task.

When you write concurrent code, you need to ensure that contended state is considered unsafe unless you have exclusive access to it. The common way to achieve exclusive access is synchronizing on a synchronization primitive, like holding a lock.

Another thing that synchronization primitives tend to result in on some platforms is that they emit memory barriers, which ensure inter-CPU consistency of memory.

Except perhaps for setting a ‘bool’, there is no guarantee (at least in C) that reading or writing a variable takes only one instruction – or rather, that it can’t be interrupted in the middle of the read or write.


Shared memory.

It’s the definition of… threads: a bunch of concurrent processes, with shared memory.

If there is no shared memory, they are usually referred to as old-school UNIX processes.
They might need a lock now and then, when accessing a shared file, though.

(shared memory in UNIX-like kernels was indeed usually implemented using a fake file descriptor representing the shared memory address)

A CPU runs one instruction at a time, but what if you have two or more CPUs?

You are right in that locks are not needed, if you can write the program such that it takes advantage of atomic instructions: instructions whose execution is not interruptible on the given processor, and free from interference by other processors.

Locks are required when several instructions need to be protected from interference, and there is no equivalent atomic instruction.

For instance, inserting a node into a doubly-linked list requires the update of several memory locations. Prior to the insertion, and after the insertion, certain invariants hold about the structure of the list. However, during the insertion, those invariants are temporarily broken: the list is in an “under construction” state.

If another thread marches through the list while the invariants are broken, or tries to modify it while it is in such a state, the data structure will probably become corrupted and the behavior will be unpredictable: maybe the software will crash, or continue with incorrect results. It is therefore necessary for threads to somehow agree to stay out of each other’s way while the list is being updated.
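A minimal sketch of that idea in Java (class and field names are illustrative): the insertion updates several pointers, so the whole sequence is guarded by one lock, and no other thread may observe the list mid-update.

```java
// Node of a doubly-linked list: insertion must update several of
// these pointers, and the list's invariants are broken in between.
class Node {
    int value;
    Node prev, next;
    Node(int value) { this.value = value; }
}

class LockedList {
    private final Object lock = new Object();
    private final Node head = new Node(0); // sentinel node

    // Insert right after the sentinel. Between the pointer updates the
    // invariants do not hold, so the lock keeps other threads out.
    void insertFirst(int value) {
        synchronized (lock) {
            Node n = new Node(value);
            n.next = head.next;
            n.prev = head;
            if (head.next != null) head.next.prev = n;
            head.next = n;
        }
    }

    // Readers take the same lock so they never see a half-built list.
    int firstValue() {
        synchronized (lock) { return head.next.value; }
    }
}
```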

Suitably designed lists can be manipulated with atomic instructions, so that locks are not needed. Algorithms for this are called “lock-free”. However, note that atomic instructions are actually a form of locking. They are specially implemented in hardware, and work via communication among processors. They are more expensive than similar instructions which are not atomic.

On multiprocessors that lack the luxury of atomic instructions, primitives for mutual exclusion have to be built up of simple memory accesses and polling loops. Such problems have been worked on by the likes of Edsger Dijkstra and Leslie Lamport.

