Why would a program require a specific minimum number of CPU cores?

Is it possible to write code (or complete software, rather than a piece of code) that won’t work properly when run on a CPU that has fewer than N cores? Without checking explicitly and failing on purpose:

IF (noOfCores < 4) THEN don’t run properly on purpose

I’m looking at a game’s (Dragon Age: Inquisition) minimum system requirements, and it states a minimum of a four-core CPU. Many players say it does NOT run on two-core CPUs, not even on Intel Core i3s with two physical and two logical cores. And it’s NOT a problem of computing power.

From my understanding, threads are completely abstracted away from the CPU by the OS, so a program shouldn’t even be able to depend on which physical core its threads run on.

Just to clear things up:

I am NOT asking “Can I find out the number of CPU cores from code, and fail on purpose?” … Such code would be ill-intentioned (it forces you to buy a more expensive CPU to run the program, without any need for extra computational power). I am asking whether code that, say, has four threads could fail when two of those threads are run on the same physical core (without explicitly checking system information and purposely failing).

In short, can there be software that requires multiple cores, without needing additional computing power that comes from multiple cores? It would just require N separate physical cores.

18

It may be possible to do this “by accident” with careless use of core affinity. Consider the following pseudocode:

  • start a thread
  • in that thread, find out which core it is running on
  • set its CPU affinity to that core
  • start doing something computationally intensive / loop forever

If you start four of those on a two-core CPU, then either something goes wrong with the core affinity setting or you end up with two threads hogging the available cores and two threads that never get scheduled. At no point has it explicitly asked how many cores there are in total.
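A minimal sketch of those steps, assuming Linux with glibc (it uses glibc’s `sched_getcpu` via ctypes and `os.sched_setaffinity`, both real Linux APIs). Under CPython’s GIL the spinning threads won’t actually saturate separate cores, so this illustrates the affinity calls themselves, not the starvation:

```python
import ctypes
import os
import threading
import time

libc = ctypes.CDLL("libc.so.6", use_errno=True)  # glibc (Linux only)

def pinned_worker(stop: threading.Event) -> None:
    tid = threading.get_native_id()      # kernel thread id (Python 3.8+)
    core = libc.sched_getcpu()           # which core is this thread on right now?
    os.sched_setaffinity(tid, {core})    # pin this thread to that core
    while not stop.is_set():             # stand-in for "computationally intensive"
        pass

stop = threading.Event()
threads = [threading.Thread(target=pinned_worker, args=(stop,)) for _ in range(4)]
for t in threads:
    t.start()
time.sleep(0.05)
stop.set()
for t in threads:
    t.join()
```

With four such workers on a two-core machine, two of them inevitably pin themselves to the same core, exactly the accidental oversubscription described above.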

(If you have long-running threads, setting CPU affinity generally improves throughput)

The idea that game companies are “forcing” people to buy more expensive hardware for no good reason is not very plausible. It can only lose them customers.

Edit: this post has now got 33 upvotes, which is quite a lot given that it’s based on educated guesswork!

It seems that people have got DA:I to run, badly, on dual-core systems: http://www.dsogaming.com/pc-performance-analyses/dragon-age-inquisition-pc-performance-analysis/ That analysis mentions that the situation greatly improves if hyperthreading is turned on. Given that HT does not add any more instruction issue units or cache (it merely allows one thread to run while another is stalled on a cache miss), this strongly suggests the problem is linked purely to the number of threads.

Another poster claims that changing the graphics drivers works: http://answers.ea.com/t5/Dragon-Age-Inquisition/Working-solution-for-Intel-dual-core-CPUs/td-p/3994141 ; given that graphics drivers tend to be a wretched hive of scum and villainy, this isn’t surprising. One notorious set of drivers had a “correct&slow” versus “fast&incorrect” mode that was selected if called from QUAKE.EXE. It’s entirely possible that the drivers behave differently for different numbers of apparent CPUs. Perhaps (back to speculation) a different synchronisation mechanism is used. Misuse of spinlocks?

“Misuse of locking and synchronisation primitives” is a very, very common source of bugs. (The bug I’m supposed to be looking at at work while writing this is “crash if changing printer settings at same time as print job finishes”).

Edit 2: comments mention OS attempting to avoid thread starvation. Note that the game may have its own internal quasi-scheduler for assigning work to threads, and there will be a similar mechanism in the graphics card itself (which is effectively a multitasking system of its own). Chances of a bug in one of those or the interaction between them are quite high.

www.ecsl.cs.sunysb.edu/tr/ashok.pdf (2008) is a graduate thesis on better scheduling for graphics cards which explicitly mentions that they normally use first-come-first-served scheduling, which is easy to implement in non-preemptive systems. Has the situation improved? Probably not.

10

It could be necessary to have 4 cores because the application runs four tasks in parallel threads and expects them to finish almost simultaneously.

When every thread is executed by a separate core and all threads have exactly the same computational workload, they are quite likely (but far from guaranteed) to finish at roughly the same time. But when two threads run on one core, the timing will be a lot less predictable, because the core will be switching context between the two threads all the time.

Bugs which occur because of unexpected thread timing are referred to as “race conditions”.

In the context of game development, one plausible architecture with this kind of problem could be one where different features of the game are simulated in real-time by different CPU threads. When each feature runs on its own core, they are all simulated at roughly the same speed. But when two features run on one core, both will be simulated only half as fast as the rest of the game world, which could cause all kinds of weird behaviors.

Note that a software architecture which depends on independent threads running with specific timings is extremely fragile and a sign of very bad understanding of concurrent programming. There are features available in practically all multithreading APIs to synchronize threads explicitly to prevent these kinds of problems.
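One such synchronization feature is a barrier. A sketch of per-frame lockstep simulation using Python’s `threading.Barrier` (the feature/frame structure here is illustrative, not anything from the game): no thread may begin frame N+1 until every thread has finished frame N, regardless of how many cores are available:

```python
import threading

N_FEATURES = 4
FRAMES = 3
frame_barrier = threading.Barrier(N_FEATURES)
log = []  # list.append is thread-safe in CPython

def simulate_feature(feature_id: int) -> None:
    for frame in range(FRAMES):
        log.append((frame, feature_id))  # one frame's worth of simulation work
        frame_barrier.wait()             # block until all features finish this frame

threads = [threading.Thread(target=simulate_feature, args=(i,))
           for i in range(N_FEATURES)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After the run, all of frame 0’s entries appear in `log` before any of frame 1’s, even on a single core, which is exactly the guarantee the fragile timing-based design lacks.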

11

It is unlikely that these “minimum requirements” represent something below which the game will not run. Far more likely is that they represent something below which the game will not run with acceptable performance. No game company wants to deal with lots of customers complaining about crappy performance when they are running it on a single-core 1 GHz box, even if the software could technically run. So they probably deliberately design to fail hard on boxes with fewer cores than would give them acceptable performance.

One important metric in game performance is the frame rate. Typically they run at either 30 or 60 frames per second. This means that the game engine has to render the current view from the game state in a fixed amount of time. To achieve 60 fps, it has just a bit more than 16 msecs to do this. Games with high-end graphics are extremely CPU bound and so there’s a huge give-and-take between trying to push higher quality (which takes more time) and the need to stay in this time budget. Thus, the time budget for each frame is extremely tight.
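The budget arithmetic (1000 ms / 60 ≈ 16.7 ms) can be sketched as a minimal fixed-timestep loop; `run_frames` and `frame_work` are hypothetical names for illustration, not anything from an actual engine:

```python
import time

FPS = 60
BUDGET = 1.0 / FPS  # ~16.7 ms per frame at 60 fps

def run_frames(n_frames, frame_work=lambda: None):
    """Run n_frames, returning how many blew the per-frame time budget."""
    missed = 0
    for _ in range(n_frames):
        start = time.perf_counter()
        frame_work()                          # simulate + render one frame
        elapsed = time.perf_counter() - start
        if elapsed > BUDGET:
            missed += 1                       # too slow: a visible hitch
        else:
            time.sleep(BUDGET - elapsed)      # idle out the rest of the budget
    return missed
```

Any frame whose work exceeds the budget produces a hitch the player can see, which is why engines fight so hard for exclusive use of cores.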

Because the time budget is tight, the developer ideally wants exclusive access to one or more cores. They also likely want to do the rendering work on one core exclusively, since that is what has to fit in the time budget, while other work, like calculating the world state, happens somewhere it won’t intrude.

You could, in theory, cram all this onto a single core, but then everything becomes much harder. Suddenly you have to make sure all that game state stuff happens fast enough, and allows your rendering to happen. You can’t just make them two software threads because there’s no way to make the OS understand “thread A must complete X amount of work in 16 msecs regardless of what thread B does”.

Game developers have zero interest in making you buy new hardware. The reason they have system requirements is that the cost of supporting lower end machines is not worth it.

8

Three realtime threads that never sleep, plus one other thread. If there are fewer than four cores, the fourth thread never runs. If the fourth thread needs to communicate with one of the realtime threads for that realtime thread to finish, the code will never finish with fewer than four cores.

Obviously if realtime threads are waiting on something that doesn’t allow them to sleep (such as a spinlock) the program designer screwed up.

11

First of all, software threads have nothing to do with hardware threads, and the two are often mixed up. Software threads are pieces of code that can be dispatched and run on their own within the process context. Hardware threads are mostly managed by the OS and, for regular programs, are dispatched to the processor’s cores. These hardware threads are dispatched based on load; the thread dispatcher acts more or less like a load balancer.

However, when it comes to gaming, especially high-end gaming, the hardware threads are sometimes managed by the game itself, or the game instructs the thread dispatcher what to do. That is because not every task or group of tasks has the same priority as in a normal program. Because Dragon Age comes from a high-end game studio using high-end game engines, I can imagine that it uses “manual” dispatch, in which case the number of cores becomes a minimum system requirement. A program would crash if it sent a piece of code to the third physical core of a machine that only has one or two cores.

12

Since it is possible to use virtualization to present more virtual cores than there are physical ones, and the software would not know it was running in a virtual machine and would instead believe it had that many physical cores, I would say such software is not possible.

That is to say, it is not possible to write software that will always stop on fewer than N cores.

As others have pointed out, there is software that can fail this way, especially if the OS and the code in use have little protection against race conditions when N threads run on fewer than N processors. The real trick is code that will fail when you have fewer than N processors, but won’t fail when you do have N processors and an OS that may still assign work to fewer than N of them.

0

It could be that there are three threads doing something (generating backgrounds or generating NPC movement) and passing events to a fourth, which is supposed to aggregate/filter the events and update the view model. If the fourth thread doesn’t get all the events (because it’s not scheduled on a core) then the view model doesn’t get updated correctly. This may only happen sporadically, but those cores need to be available at any point. This might explain why you’re not seeing high CPU usage all the time, but the game is failing to work properly anyway.
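A sketch of how a blocking queue removes that scheduling dependency: the aggregator waits for events instead of assuming the producer threads got CPU time (producer names, event counts, and sentinels here are illustrative):

```python
import queue
import threading

events = queue.Queue()
N_PRODUCERS = 3
EVENTS_EACH = 5

def producer(name: str) -> None:
    for i in range(EVENTS_EACH):
        events.put((name, i))    # e.g. background generation, NPC movement, ...
    events.put((name, None))     # sentinel: this producer is finished

def aggregate() -> list:
    view_model, finished = [], 0
    while finished < N_PRODUCERS:
        name, payload = events.get()   # blocks until an event arrives
        if payload is None:
            finished += 1
        else:
            view_model.append((name, payload))
    return view_model

producers = [threading.Thread(target=producer, args=(f"p{i}",))
             for i in range(N_PRODUCERS)]
for t in producers:
    t.start()
view = aggregate()
for t in producers:
    t.join()
```

Because `get()` blocks, the aggregator never processes a partial picture; it just runs late if a producer is starved, instead of producing a wrong view model.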

1

I think Joshua is heading down the right path, just not to its conclusion.

Suppose you have an architecture where there are three threads that are written to do as much as they can–when they finish what they are doing they do it again. To keep performance up these threads do not release control for anything–they don’t want to risk the lag from the Windows task scheduler. So long as there are 4 or more cores this works fine, it fails badly if there aren’t.

In general this would be bad programming, but games are another matter. When faced with a choice between a design that’s inferior on all hardware and a design that is superior on sufficiently good hardware but fails on inferior hardware, game developers usually choose to require the hardware.

3

Is it possible to write code (or complete software, rather than a piece of code) that won't work properly when run on a CPU that has less than N number of cores?

Absolutely. The use of real-time threads would be a good example of a situation in which this is, not only possible, but the desired way (and often, the only correct way) to get the job done. However, real-time threads are usually limited to the OS kernel, usually for drivers which need to be able to guarantee that a hardware event of some sort is handled within some defined period of time. You should not have real-time threads in normal user applications and I’m not sure that it’s even possible to have one in a Windows user-mode application. Generally, operating systems make it intentionally impossible to do this from user land precisely because it does allow a given application to take over control of the system.

Regarding user-land applications: Your assumption that checking for a given number of threads in order to run is necessarily malicious in intent is not correct. For instance, you could have 2 long-running, performance-intensive tasks that need a core to themselves. Regardless of CPU core speed, sharing a core with other threads could be a serious and unacceptable performance degradation due to cache thrashing along with the normal penalties incurred by thread switching (which are pretty substantial.) In this case, it would be perfectly reasonable, especially for a game, to set each of these threads to have an affinity only on one particular core for each of them and then set all of your other threads to not have affinity on those 2 cores. In order to do this, though, you’d have to add a check that the system has more than 2 cores and fail if it doesn’t.
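A hedged sketch of such a check; `reserve_exclusive_cores` and the particular core split are hypothetical names for illustration, and real code would then apply the returned sets with a platform affinity API (`SetThreadAffinityMask` on Windows, `pthread_setaffinity_np` on Linux):

```python
import os

REQUIRED_CORES = 3  # hypothetical: 2 exclusive cores + at least 1 for everything else

def reserve_exclusive_cores(required: int = REQUIRED_CORES):
    """Fail fast if the machine is too small, else plan the core split."""
    available = os.cpu_count() or 1
    if available < required:
        raise RuntimeError(
            f"This program needs {required} cores; only {available} found.")
    # Hypothetical split: cores 0 and 1 for the two hot threads,
    # the remaining cores for all other threads.
    return {0}, {1}, set(range(2, available))
```

The up-front check is the honest part of the design: failing immediately with a clear message beats failing mysteriously when the two hot threads end up sharing a core.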

0

Any code using spinlocks with any noticeable amount of lock contention will perform terribly (to an extent where — for an application like a game — you can say “doesn’t work”) if the number of threads exceeds the number of cores.

Imagine for example a producer thread submitting tasks to a queue which serves 4 consumer threads. There are only two cores:

  • The producer tries to obtain the spinlock, but it is held by a consumer running on the other core. Both cores are now busy: one running a consumer, the other spinning in the producer while it waits for the lock to be released. This is already bad, but not as bad as it will get.
  • Unluckily, the consumer thread reaches the end of its time quantum and is preempted, and another consumer thread is scheduled. It tries to take the lock, but of course the lock is held, so now two cores are spinning on a lock that cannot possibly be released, because the thread holding it isn’t even running.
  • The producer thread reaches the end of its time slice and is preempted, and another consumer wakes up. Again, two threads are spinning on a held lock, and the release cannot happen before two more time quanta have passed.
  • […]
  • Finally the consumer that was holding the spinlock releases it. It is immediately taken by whichever thread is spinning on the other core. There is a 75% chance (3 to 1) that this is another consumer thread; in other words, it is 75% likely that the producer is still stalled. Of course this stalls the consumers, too: without the producer submitting tasks, they have nothing to do.

Note that this works in principle with any kind of lock, not just spinlocks — but the devastating effect is much more prominent with spinlocks because the CPU keeps burning cycles while it achieves nothing.

Now imagine that in addition to the above some programmer had the brilliant idea to use a dedicated thread with affinity set to the first core, so RDTSC will give reliable results on all processors (it won’t anyway, but some people think so).
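The naive spinlock itself is easy to sketch. This Python version busy-waits with `Lock.acquire(blocking=False)`; under CPython’s GIL only one thread executes bytecode at a time, so this shows the shape of the construct and its correctness, not the multi-core cycle-burning pathology described above:

```python
import threading

class SpinLock:
    """Naive spinlock: busy-waits instead of sleeping while the lock is held."""
    def __init__(self):
        self._lock = threading.Lock()

    def acquire(self):
        while not self._lock.acquire(blocking=False):
            pass  # the core burns cycles here, achieving nothing

    def release(self):
        self._lock.release()

lock = SpinLock()
counter = 0

def worker():
    global counter
    for _ in range(1000):
        lock.acquire()
        counter += 1   # critical section
        lock.release()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The key design flaw: if the holder is preempted, every waiter keeps a core 100% busy doing nothing, which is why spinlocks only make sense when the holder is guaranteed to be running on another core.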

1

If I understand what you’re asking, it’s possible, but it’s a very, very bad thing.

The canonical example of what you’re describing would be maintaining a counter which is incremented by multiple threads. This requires almost nothing in terms of computing power but requires careful coordination among the threads. As long as only one thread at a time does an increment (which is actually a read followed by an addition followed by a write), its value will always be correct. This is because one thread will always read the correct “previous” value, add one and write the correct “next” value. Get two threads into the action at the same time and both will read the same “previous” value, get the same result from the increment and write the same “next” value. The counter will effectively have been incremented only once even though two threads think they each did it.

This dependency between timing and correctness is what computer science calls a race condition.

Race conditions are often avoided by using synchronization mechanisms to make sure threads wanting to operate on a piece of shared data have to get in line for access. The counter described above might use a read-write lock for this.
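The counter example can be sketched directly. A plain mutex stands in for the read-write lock, and the `time.sleep(0)` between read and write is an artificial yield that widens the race window so the lost update is easy to reproduce:

```python
import threading
import time

class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        v = self.value        # read
        time.sleep(0)         # yield: another thread may now read the same value
        self.value = v + 1    # write: can silently discard a concurrent increment

    def safe_increment(self):
        with self._lock:      # one thread at a time through the read-modify-write
            self.value += 1

def run(safe: bool, n_threads: int = 4, n_iters: int = 500) -> int:
    c = Counter()
    fn = c.safe_increment if safe else c.unsafe_increment
    threads = [threading.Thread(target=lambda: [fn() for _ in range(n_iters)])
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return c.value
```

`run(safe=True)` always returns exactly `n_threads * n_iters`; `run(safe=False)` usually returns less, and never more, because increments are lost, not invented.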

Without access to the internal design of Dragon Age: Inquisition, all anyone can do is speculate about why it behaves the way it does. But I’ll have a go based on some things I’ve seen done in my own experience:

It might be that the program is based around four threads that have been tuned so everything works when the threads run mostly-uninterrupted on their own physical cores. The “tuning” could come in the form of rearranging code or inserting sleeps in strategic places to mitigate race-condition-induced bugs that cropped up during development. Again, this is all conjecture, but I’ve seen race conditions “resolved” that way more times than I care to count.

Running a program like that on anything less capable than the environment for which it was tuned introduces timing changes, resulting either from the code not running as quickly or, more likely, from context switches. Context switches happen both physically (the CPU’s physical cores switching between the work their logical cores are holding) and logically (the OS assigning work to the cores), but either is a significant divergence from the “expected” execution timing. That can bring out bad behavior.

If Dragon Age: Inquisition doesn’t take the simple step of making sure there are enough physical cores available before proceeding, that’s EA’s fault. They’re probably spending a small fortune fielding support calls and emails from people who tried to run the game on too little hardware.

4

Windows has built-in functionality for this: the function GetLogicalProcessorInformation is in the Windows API. You can call it from your program to get information about cores, virtual cores, and hyperthreading.

So the answer to your question would be: Yes.
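`GetLogicalProcessorInformation` itself is Windows-only and reports per-core relationship records (cores, hyperthreads, caches). A rough cross-platform sketch of just the counts, using Python’s standard library (the `sched_getaffinity` branch assumes Linux, where that function exists):

```python
import os

logical = os.cpu_count() or 1               # logical processors, incl. hyperthreads
try:
    usable = len(os.sched_getaffinity(0))   # processors this process may run on (Linux)
except AttributeError:
    usable = logical                        # not available on Windows/macOS
print(f"{logical} logical processors, {usable} usable by this process")
```

Note that `logical` counts hyperthreads as processors, so a check against it alone cannot distinguish two physical cores with HT from four physical cores, which is exactly the distinction this question cares about.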

5
