Understanding the inconsistent size of HashMap in a multithreaded Java program


I have a multithreaded Java program where each thread is inserting unique keys into a shared HashMap. However, when I check the size of the HashMap after all threads have finished executing, the size is not consistent with the expected number of insertions.

Here’s the code:

import java.util.HashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CurrentHashMapDemo {
    private static final int NUM_THREADS = 5;
    private static final int NUM_INSERTIONS = 100;
    private static HashMap<String, Integer> hashMap = new HashMap<>();

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executorService = Executors.newFixedThreadPool(NUM_THREADS);
        for (int i = 0; i < NUM_THREADS; i++) {
            executorService.execute(insertRecord());
        }
        executorService.shutdown();
        // Wait for the workers to actually finish
        // (a single Thread.sleep(1000) does not guarantee termination).
        executorService.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("Size of the hashmap=" + hashMap.size());
    }

    private static Runnable insertRecord() {
        return () -> {
            for (int i = 0; i < NUM_INSERTIONS; i++) {
                System.out.println("Key:" + i + Thread.currentThread().getName());
                hashMap.put(i + Thread.currentThread().getName(), i);
            }
        };
    }
}

In this code, each thread inserts NUM_INSERTIONS records into the HashMap. The key for each record is a combination of the loop index i and the name of the current thread, so it should be unique across threads.

However, when I print the size of the HashMap after all threads have finished executing, the size is not consistently NUM_THREADS * NUM_INSERTIONS. I understand that HashMap is not thread-safe, and I know that I could use ConcurrentHashMap to solve this issue. But I’m interested in understanding where exactly the race condition is occurring in the HashMap implementation that’s causing this inconsistent behavior.

I’m using OpenJDK 17. Can anyone explain where this race condition might be happening in this specific JDK’s HashMap implementation and how it’s affecting the size of the HashMap?


Thanks @user85421 for your comment; this is exactly what I was looking for:

One possibility: the HashMap is created with a default capacity of 16; when that is exhausted, it must be reorganized (resized). If two threads do that concurrently, something might (will) get messed up.
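As a sanity check (a sketch with made-up class and method names), the same insertion pattern run from a single thread always yields the full count, which confirms the shortfall comes from the concurrent update/resize races and not from key collisions:

```java
import java.util.HashMap;

public class SingleThreadCheck {
    static final int NUM_THREADS = 5;      // same constants as the question
    static final int NUM_INSERTIONS = 100;

    // Build the same 500 keys on one thread; with no concurrency,
    // every put lands and the size is exactly NUM_THREADS * NUM_INSERTIONS.
    static int run() {
        HashMap<String, Integer> map = new HashMap<>();
        for (int t = 0; t < NUM_THREADS; t++) {
            for (int i = 0; i < NUM_INSERTIONS; i++) {
                map.put(i + "thread-" + t, i);  // mimics i + threadName
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println("size=" + run()); // prints size=500
    }
}
```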


It’s... everywhere, really.

Here’s the simple and seemingly surprising rule:

Accessing a field from 2 threads is completely broken and results in an application whose behaviour is arbitrary and untestable.

That sounds drastic, doesn’t it? And yet, this really is how it works. So, what ‘causes’ this behaviour of HashMap? Well, just open the source of HashMap. Look at all those fields. Look at how the methods you are invoking, such as put, touch those fields. As per the rule above: voilà, the behaviour is arbitrary, and thus untestable. But it will fail on you. If not sooner, then later; fail it will.

This drastic state of affairs isn’t there to annoy you. It’s because that’s how computers really work. These days, a CPU core cannot even access main memory directly. No ability to do that, whatsoever. Instead, it works through its local caches, which hold small lines (typically 64 bytes or so) of contiguous memory. Each cached line is a copy, independent from main memory: modify main memory and the cached line will not change; modify the cache and main memory does not change (until the line is written back). This is for simple and obvious physics reasons. Go look at a CPU. Look at where the memory is. Get out a ruler. Oh deary me, that’s a humongous ocean of a gap between those two. The speed of light isn’t infinite, you know! Just sending a signal alllllllll the way from the CPU to that memory bank? Good grief. That CPU could be doing hundreds of instructions in that time. Hold on, we even have to wait for that signal to travel allll the way back? Unacceptable!

So how does a CPU access main memory? It asks the memory controller to evict one of its cache lines (write the whole line back into main memory), then copy a line of main memory into that slot. Again, the whole line at once; the hardware does that as fast as it could move a single byte, so there is no point building a system to let you control the copying more fine-grainedly. Then the CPU core will find some thumbs and intently twiddle them for hundreds of cycles, because that round trip is going to take oodles of time.

Of course, CPUs aren’t guaranteed to work this way, and in practice the number of cache lines available is quite limited; a CPU may have to ‘evict’ a line halfway through an operation. That causes slowdowns, and causes something to be written into main memory. It’s extremely difficult to predict exactly what, if multiple cores each hold different versions of the same region of main memory. Which core evicts first, which evicts last? Who knows?

Java couldn’t possibly manage to run fast unless it adopted a memory model that is compatible with all this. Hence, the simple but seemingly drastic rule: Accessing a field from 2 threads is completely broken and results in an application whose behaviour is arbitrary and untestable.

One field can have 10 different values simultaneously; each thread has its own copy. Or not. And you have no control nor any idea when a thread will make its copy, when (or if!) it will flush that copy back to main mem.
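The same hazard shows up with something as small as an int counter. A minimal sketch (the class name and the thread/iteration counts are my own choices): two threads bump a plain field and an AtomicInteger side by side; the atomic total is guaranteed, the plain total is not.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CounterRace {
    static int plain = 0;                          // unsynchronized shared field
    static final AtomicInteger atomic = new AtomicInteger();

    // Two threads each bump both counters 100_000 times. The AtomicInteger
    // total is guaranteed to be 200_000; the plain int total usually isn't,
    // because plain++ is an unsynchronized read-modify-write and increments
    // can be lost. (join() gives us HB, so reading the fields here is safe.)
    static int[] run() throws InterruptedException {
        plain = 0;
        atomic.set(0);
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                plain++;                  // racy: no happens-before
                atomic.incrementAndGet(); // atomic, establishes HB
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join();  t2.join();
        return new int[] { plain, atomic.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = run();
        System.out.println("plain=" + r[0] + " atomic=" + r[1]);
    }
}
```

Run it a few times: the atomic count never budges, while the plain count drifts unpredictably.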

But wait! The JMM’s Happens Before rules!

The JMM is the Java Memory Model. It describes not so much ‘how Java works’ (because locking down how it works would mean that, if that particular way of working is inefficient on some architecture, Java either cannot run on that architecture or runs badly, and we can’t have that!) but rather: ‘these limited things are guaranteed by the JVM. The rest isn’t; you’re going to have to make your code work reliably using solely the guarantees provided here in the JMM section.’

The above section describes mostly what is not guaranteed. So let’s get to what is guaranteed: Happens-Before relationships.

This is an explicitly enumerated list that establishes ‘HB’ between 2 bytecode instructions. If HB(A, B) is established, where A and B are bytecode instructions:

  • B cannot observe any field in a state that it was prior to A having finished. In other words, if A writes a value to a field, B cannot observe the old value.
  • Note that it’s not about ‘it guarantees B happens after A’. It merely guarantees that B cannot observe previous state. For example, if B doesn’t look at any field A modifies, then the JVM is free to run B first and A later.
  • It doesn’t necessarily work the other way around.
  • Timing is explicitly excluded. If you can figure out how certain things went based on how long some segment of code takes to run, that’s not a breach of the guarantees stipulated in the JMM.

So, what establishes HB?

  • The ‘natural’ / ‘obvious’ one: if A and B are executed by the same thread, then the code itself establishes an order in which that code is executed, and HB(A, B) exists for all of it. In other words, given int x = 5; x = 10; int z = x;, z cannot possibly be 5, because all 3 statements are ordered by HB, all being run by the same thread. This is, of course, obvious; but note that in int x = 5; int y = 10; int z = x; int a = y;, the system is free to execute int y = 10; first, then int a = y;, then int x = 5;, then int z = x;. Will it? Probably not. But a JVM that does wouldn’t be breaking the rules: because int a = y; doesn’t observe the fact that int x = 5; hasn’t run yet, the JMM is adhered to.
  • Thread management: t.start(); is HB relative to the first line in the run() block of that thread. Also, the last line executed by a thread is HB relative to another thread executing t.join().
  • synchronized. 2 blocks of code each having a synchronized (x) where x refers to the same object will figure out some ordering. The last statement executed in that block by whatever thread ‘goes first’ is HB relative to the first line in the other thread.
  • volatile also does, but it can be difficult to figure out which thread was ‘first’ in this story.
  • A bunch of other more exotic things that you tend not to run into.
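Two of these rules, start() and join(), are already enough to pass a value through a plain, non-volatile field safely. A small sketch (names are mine):

```java
public class HappensBeforeDemo {
    static int payload = 0;  // a plain field: deliberately NOT volatile

    // start() establishes HB(writes-before-start, first action of the thread),
    // and join() establishes HB(last action of the thread, return-from-join).
    // So this method is *guaranteed* to return 42, no volatile needed.
    static int run() throws InterruptedException {
        payload = 1;                                // written before start()
        Thread t = new Thread(() -> payload += 41); // guaranteed to see 1
        t.start();
        t.join();                                   // t's write is now visible
        return payload;                             // 42
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```

Remove the join() (say, poll the field in a loop instead) and all bets are off: the main thread may never observe the update.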

So, how do you even ‘chat’ between threads, given that you can’t touch shared fields safely? By ensuring that there is a clear HB for every such interaction. Anytime one thread writes and another reads, you really, really want HB somewhere in that interaction; otherwise what you read is a coin toss. And not a fair coin either: an evil coin. Your code works every time for 3 weeks and then starts failing just as you are giving that important demo to the big customer.

Of course, most of the classes in the java.util.concurrent package establish HB all over the place. Read the javadoc carefully; it is pointless to try to work out what they guarantee by inspecting the code, reasoning out how the JMM applies to it, and gleaning guarantees from that. Just read the javadoc: they list the guarantees in their own terms. That is more reliable (code can change; javadoc guarantees as a rule don’t, as that would be a breaking change) and much simpler to understand. The point is, the code delivers on those guarantees by establishing HB.

This is just to turn what everyone is (correctly!) saying in the comments into an Answer. (Since you seem to want one.)

The problem with your demo code is that it is not thread-safe. You have multiple threads updating a shared object (a HashMap instance) which is specified to be non-thread-safe. That will lead to unspecified behavior.

The exact (detailed) cause of the inconsistent behavior that you are seeing is immaterial. The real answer would require a detailed analysis of a ~2600 line HashMap class to identify the point(s) at which memory anomalies would occur in the map when used in the way that you are using it. And even that wouldn’t give a certain answer.

Your arguments that the demo “should work because …” are flying in the face of the evidence that it doesn’t work! And they ignore the fact that the javadoc for HashMap is (in effect) saying that it may not work.

The fix (in a real application) would be to either synchronize the HashMap externally, or use a ConcurrentHashMap.
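A sketch of the ConcurrentHashMap variant of the demo (class and method names are mine; the keys use an explicit task id rather than the thread name, so uniqueness does not depend on how the pool assigns tasks to threads):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentMapFix {
    static final int NUM_THREADS = 5;
    static final int NUM_INSERTIONS = 100;
    static final ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();

    // Same workload as the question, but with a thread-safe map and a real
    // wait for termination. The size is now reliably 5 * 100 = 500.
    static int run() throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(NUM_THREADS);
        for (int t = 0; t < NUM_THREADS; t++) {
            final int taskId = t;   // structural uniqueness, per task not per thread
            pool.execute(() -> {
                for (int i = 0; i < NUM_INSERTIONS; i++) {
                    map.put(i + "-task-" + taskId, i);
                }
            });
        }
        pool.shutdown();
        // Actually wait for the workers; awaitTermination also establishes
        // happens-before between the workers' writes and the read below.
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return map.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Size of the map=" + run());
    }
}
```

Note that replacing the map is not enough on its own: the original code also only slept for a second instead of waiting for the pool to terminate, so even a correct map could have been read too early.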
