Why are my Java multi-thread I/O slower than sequential?

While testing thread creation/parallel processing, I met this strange situation where the multi-thread code is much slower than sequential version.

The code is simply counting words in files, then summing the result.

Sequential code :

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;

public class WordCount {

    public static int countWords(String filename) throws IOException {
        long startTime = System.currentTimeMillis();
        try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
            int total = 0;
            for (String line = br.readLine(); line != null; line = br.readLine()) {
                total += line.split("\s+").length;
            }
            System.out.println("Time for file "+filename+" : "+(System.currentTimeMillis()-startTime) + " ms for "+ total + " words");
            return total;
        }
    }
    
    public static void main(String[] args) {
        long startTime = System.currentTimeMillis();
        int [] wordCount = new int[args.length];
        for (int i = 0; i < args.length; i++) {
            try {
                wordCount[i] = countWords(args[i]);
            } catch (IOException e) {
                System.err.println("Error reading file: " + args[i]);
                e.printStackTrace();
            }
        }
        System.out.println("Word count:" + Arrays.toString(wordCount));
        int total = 0;
        for (int count : wordCount) {
            total += count;
        }
        System.out.println("Total word count:" + total);
        System.out.println("Total time "+(System.currentTimeMillis()-startTime) + " ms");
    }
}

This is the version that runs one thread per file.

public class WordCountMT {

    public static int countWords(String filename) throws IOException {
        // same code
    }

    private static class CounterWorker implements Runnable {
        private String filename;
        private int index;
        private int[] wordCount;

        public CounterWorker(String filename, int index, int[] wordCount) {
            this.filename = filename;
            this.index = index;
            this.wordCount = wordCount;
        }

        @Override
        public void run() {
            try {
                wordCount[index] = countWords(filename);
            } catch (IOException e) {
                System.err.println("Error reading file: " + filename);
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) {
        long startTime = System.currentTimeMillis();
        int[] wordCount = new int[args.length];
        List<Thread> th = new ArrayList<>();

        for (int i = 0; i < args.length; i++) {
            th.add(new Thread(new CounterWorker(args[i], i, wordCount)));
            th.get(i).start();
        }
        try {
            for (Thread t : th) {
                t.join();

            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        System.out.println("Word count:" + Arrays.toString(wordCount));
        int total = 0;
        for (int count : wordCount) {
            total += count;
        }
        System.out.println("Total word count:" + total);
        
        System.out.println("Total time "+(System.currentTimeMillis()-startTime) + " ms");
    }
}

Ok, so it’s a very basic example used for teaching. Now I could understand that creating a lot of threads is sometimes counter productive, but these traces are strange :

Non MT

    Time for file /tmp/data/data6-5.txt : 14 ms for 1468 words
    Time for file /tmp/data/data5-6.txt : 3 ms for 1468 words
    Time for file /tmp/data/data5.txt : 1 ms for 390 words
    Time for file /tmp/data/data5-5.txt : 1 ms for 780 words
    Time for file /tmp/data/data7.txt : 1 ms for 1078 words
    Time for file /tmp/data/data6.txt : 2 ms for 1078 words
    Time for file /tmp/data/data6-6.txt : 2 ms for 2156 words
    Time for file /tmp/data/data8.txt : 1 ms for 1078 words
    Time for file /tmp/data/data3.txt : 0 ms for 390 words
    Time for file /tmp/data/data6-7.txt : 1 ms for 2156 words
    Time for file /tmp/data/data4.txt : 1 ms for 390 words
    Time for file /tmp/data/data5-4.txt : 1 ms for 780 words
    Time for file /tmp/data/data1.txt : 0 ms for 390 words
    Time for file /tmp/data/data2.txt : 0 ms for 390 words
    Time for file /tmp/data/data9.txt : 1 ms for 1078 words
    Time for file /tmp/data/data5-7.txt : 0 ms for 1468 words
    Time for file /tmp/data/data6-4.txt : 1 ms for 1468 words
    Word count:[1468, 1468, 390, 780, 1078, 1078, 2156, 1078, 390, 2156, 390, 780, 390, 390, 1078, 1468, 1468]
    Total word count:18006
    Total time 54 ms

With MT

    Time for file /tmp/data/data3.txt : 46 ms for 390 words
    Time for file /tmp/data/data5-4.txt : 54 ms for 780 words
    Time for file /tmp/data/data5-5.txt : 58 ms for 780 words
    Time for file /tmp/data/data1.txt : 45 ms for 390 words
    Time for file /tmp/data/data6-5.txt : 70 ms for 1468 words
    Time for file /tmp/data/data5-7.txt : 68 ms for 1468 words
    Time for file /tmp/data/data5.txt : 45 ms for 390 words
    Time for file /tmp/data/data7.txt : 65 ms for 1078 words
    Time for file /tmp/data/data2.txt : 43 ms for 390 words
    Time for file /tmp/data/data6.txt : 64 ms for 1078 words
    Time for file /tmp/data/data5-6.txt : 70 ms for 1468 words
    Time for file /tmp/data/data9.txt : 58 ms for 1078 words
    Time for file /tmp/data/data6-7.txt : 71 ms for 2156 words
    Time for file /tmp/data/data8.txt : 66 ms for 1078 words
    Time for file /tmp/data/data6-6.txt : 72 ms for 2156 words
    Time for file /tmp/data/data4.txt : 46 ms for 390 words
    Time for file /tmp/data/data6-4.txt : 68 ms for 1468 words
    Word count:[1468, 1468, 390, 780, 1078, 1078, 2156, 1078, 390, 2156, 390, 780, 390, 390, 1078, 1468, 1468]
    Total word count:18006
    Total time 98 ms

So total time is worse, but also time per file has completely exploded e.g. data8 was taking 1ms now takes 66ms !

How can I better diagnose why ? Can I measure CPU time consumption instead of wall time ?
Is it because parallel IO requests actually are slower ?

Thank you for an explanation or any additional tests/instrumentation I could add to better pinpoint the issue.

12

So total time is worse, but also time per file has completely exploded
e.g. data8 was taking 1ms now takes 66ms !

Yes, but the total time for all 17 files has only doubled, so much of that explosion is incurred in parallel.

Thread startup cost, such as you speculate in your own answer, is one conceivable explanation, but that’s testable: thread startup time is independent of the thread’s workload, so we can get a better bound on it by running the same number of threads with little or no workload. As I described in a comment, I performed such a test by modifying the MT version of your code such that countWords() does nothing but return 0. That ran 17 threads on my machine in about 2ms according to its own measurement (and it ran 50 threads in about 3ms), so that puts a cap of about 2ms on the per-thread startup time. I expect it actually to be only a fraction of that, and I don’t expect the cost to be too far different on your machine.

You’re more likely to be seeing a combination of other factors, but especially:

  • you have about four times as many threads as physical cores in the MT case, so if you’re getting much concurrency then that program contends with itself for CPU cache space and memory bus bandwidth. The ST version has much less contention for those resources, especially when the system is lightly loaded.

Other, likely less significant factors include:

  • context switching costs
  • general thread-management overhead
  • delays related to contention for System.out

How can I better diagnose why ? Can I measure CPU time consumption
instead of wall time ?

Yes. Provided that your system supports JMX, you can use the VM’s ThreadMXBean to measure the CPU time consumed by your threads.

Is it because parallel IO requests actually are
slower ?

Most disk devices are single-channel. Some can queue multiple read requests at a time, but a single-channel device can only deliver data for a single request at a time. The only way that parallelizing requests to such a device could produce a performance gain is if threads were unable individually to saturate the net bandwidth of device, interconnect, and memory bus, but that’s not typically the case.

For some kinds of devices (e.g. spinning disks), there are additional considerations for sequential (faster) versus random (much slower) access patterns. With such a device, the parallel I/O might be expected really to be slower, but that does not apply to SSDs. Also, it probably wouldn’t be a large effect for your particular test workload, given that it looks like few of the files occupy more than two disk blocks, and some only one.

The I/O in your particular MT program is unlikely to be faster than the I/O in your ST program, given that the latter probably saturates available I/O bandwith already. But the disk –> memory data transfer is probably not itself slower for your MT case, either.

Parallel I/O (or parallel anything) is not automatically faster than serial. You stand to gain from parallel I/O to the extent that additional threads can make use of system resources that otherwise would go unused. That’s not usually the case for concurrent disk I/O involving a single disk. Network I/O is the canonical use case, but not the only one.

So what we expect (hope for)

Suppose IO for file takes 10ms, and counting words takes 5 ms. Thread creation is free.

Because IO happens outside CPU, using memory bus, the sequential version for 10 files takes 150 ms

in parallel version, the IO bus is occupied continuously, for 100 ms, all except the last count could run in parallel with IO, we take roughly 105 ms total (good)

The issue is that thread creation is not free, here it seems to take roughly 70 ms

Each thread starts it’s run with an IO, hence is commuted by force

So most likely scheduling is threads sample wall clock time, then are switched, then main finishes creating threads, then the threads get to count. They report a time that includes the main time used to create threads as well as their own.

This is visible as the summed runtimes reported by threads is much greater than total wall time, so we are overlapping (more or less ok) but also counting in our time actions of other threads (most probably main creating the threads)

In conclusion while we do speed up slightly due to parallel IO requests (we end up with 50 ms penalty instead of 70ms) the time to create so many threads makes it not worth it.

6

Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa Dịch vụ tổ chức sự kiện 5 sao Thông tin về chúng tôi Dịch vụ sinh nhật bé trai Dịch vụ sinh nhật bé gái Sự kiện trọn gói Các tiết mục giải trí Dịch vụ bổ trợ Tiệc cưới sang trọng Dịch vụ khai trương Tư vấn tổ chức sự kiện Hình ảnh sự kiện Cập nhật tin tức Liên hệ ngay Thuê chú hề chuyên nghiệp Tiệc tất niên cho công ty Trang trí tiệc cuối năm Tiệc tất niên độc đáo Sinh nhật bé Hải Đăng Sinh nhật đáng yêu bé Khánh Vân Sinh nhật sang trọng Bích Ngân Tiệc sinh nhật bé Thanh Trang Dịch vụ ông già Noel Xiếc thú vui nhộn Biểu diễn xiếc quay đĩa Dịch vụ tổ chức tiệc uy tín Khám phá dịch vụ của chúng tôi Tiệc sinh nhật cho bé trai Trang trí tiệc cho bé gái Gói sự kiện chuyên nghiệp Chương trình giải trí hấp dẫn Dịch vụ hỗ trợ sự kiện Trang trí tiệc cưới đẹp Khởi đầu thành công với khai trương Chuyên gia tư vấn sự kiện Xem ảnh các sự kiện đẹp Tin mới về sự kiện Kết nối với đội ngũ chuyên gia Chú hề vui nhộn cho tiệc sinh nhật Ý tưởng tiệc cuối năm Tất niên độc đáo Trang trí tiệc hiện đại Tổ chức sinh nhật cho Hải Đăng Sinh nhật độc quyền Khánh Vân Phong cách tiệc Bích Ngân Trang trí tiệc bé Thanh Trang Thuê dịch vụ ông già Noel chuyên nghiệp Xem xiếc khỉ đặc sắc Xiếc quay đĩa thú vị
Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa
Thiết kế website Thiết kế website Thiết kế website Cách kháng tài khoản quảng cáo Mua bán Fanpage Facebook Dịch vụ SEO Tổ chức sinh nhật