While testing thread creation/parallel processing, I ran into this strange situation where the multi-threaded code is much slower than the sequential version.
The code simply counts words in files, then sums the results.
Sequential code :
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;

public class WordCount {

    // Count the words in one file and report the wall-clock time it took.
    public static int countWords(String filename) throws IOException {
        long startTime = System.currentTimeMillis();
        try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
            int total = 0;
            for (String line = br.readLine(); line != null; line = br.readLine()) {
                total += line.split("\\s+").length;
            }
            System.out.println("Time for file " + filename + " : "
                    + (System.currentTimeMillis() - startTime) + " ms for " + total + " words");
            return total;
        }
    }

    public static void main(String[] args) {
        long startTime = System.currentTimeMillis();
        int[] wordCount = new int[args.length];
        for (int i = 0; i < args.length; i++) {
            try {
                wordCount[i] = countWords(args[i]);
            } catch (IOException e) {
                System.err.println("Error reading file: " + args[i]);
                e.printStackTrace();
            }
        }
        System.out.println("Word count:" + Arrays.toString(wordCount));
        int total = 0;
        for (int count : wordCount) {
            total += count;
        }
        System.out.println("Total word count:" + total);
        System.out.println("Total time " + (System.currentTimeMillis() - startTime) + " ms");
    }
}
This is the version that runs one thread per file.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WordCountMT {

    public static int countWords(String filename) throws IOException {
        // same code as in the sequential version
    }

    // One worker per file: writes its result into its own slot of the shared array.
    private static class CounterWorker implements Runnable {
        private String filename;
        private int index;
        private int[] wordCount;

        public CounterWorker(String filename, int index, int[] wordCount) {
            this.filename = filename;
            this.index = index;
            this.wordCount = wordCount;
        }

        @Override
        public void run() {
            try {
                wordCount[index] = countWords(filename);
            } catch (IOException e) {
                System.err.println("Error reading file: " + filename);
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) {
        long startTime = System.currentTimeMillis();
        int[] wordCount = new int[args.length];
        List<Thread> th = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            th.add(new Thread(new CounterWorker(args[i], i, wordCount)));
            th.get(i).start();
        }
        try {
            for (Thread t : th) {
                t.join();
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        System.out.println("Word count:" + Arrays.toString(wordCount));
        int total = 0;
        for (int count : wordCount) {
            total += count;
        }
        System.out.println("Total word count:" + total);
        System.out.println("Total time " + (System.currentTimeMillis() - startTime) + " ms");
    }
}
Ok, so it’s a very basic example used for teaching. Now I could understand that creating a lot of threads is sometimes counterproductive, but these traces are strange :
Non MT
Time for file /tmp/data/data6-5.txt : 14 ms for 1468 words
Time for file /tmp/data/data5-6.txt : 3 ms for 1468 words
Time for file /tmp/data/data5.txt : 1 ms for 390 words
Time for file /tmp/data/data5-5.txt : 1 ms for 780 words
Time for file /tmp/data/data7.txt : 1 ms for 1078 words
Time for file /tmp/data/data6.txt : 2 ms for 1078 words
Time for file /tmp/data/data6-6.txt : 2 ms for 2156 words
Time for file /tmp/data/data8.txt : 1 ms for 1078 words
Time for file /tmp/data/data3.txt : 0 ms for 390 words
Time for file /tmp/data/data6-7.txt : 1 ms for 2156 words
Time for file /tmp/data/data4.txt : 1 ms for 390 words
Time for file /tmp/data/data5-4.txt : 1 ms for 780 words
Time for file /tmp/data/data1.txt : 0 ms for 390 words
Time for file /tmp/data/data2.txt : 0 ms for 390 words
Time for file /tmp/data/data9.txt : 1 ms for 1078 words
Time for file /tmp/data/data5-7.txt : 0 ms for 1468 words
Time for file /tmp/data/data6-4.txt : 1 ms for 1468 words
Word count:[1468, 1468, 390, 780, 1078, 1078, 2156, 1078, 390, 2156, 390, 780, 390, 390, 1078, 1468, 1468]
Total word count:18006
Total time 54 ms
With MT
Time for file /tmp/data/data3.txt : 46 ms for 390 words
Time for file /tmp/data/data5-4.txt : 54 ms for 780 words
Time for file /tmp/data/data5-5.txt : 58 ms for 780 words
Time for file /tmp/data/data1.txt : 45 ms for 390 words
Time for file /tmp/data/data6-5.txt : 70 ms for 1468 words
Time for file /tmp/data/data5-7.txt : 68 ms for 1468 words
Time for file /tmp/data/data5.txt : 45 ms for 390 words
Time for file /tmp/data/data7.txt : 65 ms for 1078 words
Time for file /tmp/data/data2.txt : 43 ms for 390 words
Time for file /tmp/data/data6.txt : 64 ms for 1078 words
Time for file /tmp/data/data5-6.txt : 70 ms for 1468 words
Time for file /tmp/data/data9.txt : 58 ms for 1078 words
Time for file /tmp/data/data6-7.txt : 71 ms for 2156 words
Time for file /tmp/data/data8.txt : 66 ms for 1078 words
Time for file /tmp/data/data6-6.txt : 72 ms for 2156 words
Time for file /tmp/data/data4.txt : 46 ms for 390 words
Time for file /tmp/data/data6-4.txt : 68 ms for 1468 words
Word count:[1468, 1468, 390, 780, 1078, 1078, 2156, 1078, 390, 2156, 390, 780, 390, 390, 1078, 1468, 1468]
Total word count:18006
Total time 98 ms
So total time is worse, but the time per file has also completely exploded, e.g. data8 was taking 1ms and now takes 66ms !
How can I better diagnose why ? Can I measure CPU time consumption instead of wall time ?
Is it because parallel IO requests actually are slower ?
Thank you for an explanation or any additional tests/instrumentation I could add to better pinpoint the issue.
So total time is worse, but the time per file has also completely exploded, e.g. data8 was taking 1ms and now takes 66ms !
Yes, but the total time for all 17 files has only doubled, so much of that explosion is incurred in parallel.
Thread startup cost, such as you speculate in your own answer, is one conceivable explanation, but that’s testable: thread startup time is independent of the thread’s workload, so we can get a better bound on it by running the same number of threads with little or no workload. As I described in a comment, I performed such a test by modifying the MT version of your code such that countWords() does nothing but return 0. That ran 17 threads on my machine in about 2ms according to its own measurement (and it ran 50 threads in about 3ms), so that puts a cap of about 2ms on the per-thread startup time. I expect it actually to be only a fraction of that, and I don’t expect the cost to be too far different on your machine.
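For reference, the stub used for that test amounts to nothing more than this (a sketch of the modification just described, keeping the original signature):

// Stub used for the startup-time test: same signature, but no I/O and no counting.
public static int countWords(String filename) throws IOException {
    return 0;
}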
You’re more likely to be seeing a combination of other factors, but especially:
- you have about four times as many threads as physical cores in the MT case, so if you’re getting much concurrency then that program contends with itself for CPU cache space and memory bus bandwidth. The ST version has much less contention for those resources, especially when the system is lightly loaded. (See the thread-pool sketch below for one way to test this.)
Other, likely less significant factors include:
- context switching costs
- general thread-management overhead
- delays related to contention for System.out
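To test the first point, one simple experiment is to cap the number of concurrent workers at the number of processors instead of starting one thread per file. A minimal sketch, reusing the CounterWorker class from your code and an arbitrary one-minute wait cap, which would replace the per-file Thread loop and the join loop in WordCountMT.main():

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Run the same workers, but on at most one thread per logical processor.
int nCores = Runtime.getRuntime().availableProcessors();
ExecutorService pool = Executors.newFixedThreadPool(nCores);
for (int i = 0; i < args.length; i++) {
    pool.submit(new CounterWorker(args[i], i, wordCount));
}
pool.shutdown();                                 // no new tasks; queued ones still run
try {
    pool.awaitTermination(1, TimeUnit.MINUTES);  // wait for all counts to finish
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}

If the per-file times drop back toward the sequential numbers when the worker count is capped this way, that would suggest over-subscription and scheduling, rather than the I/O itself, account for most of the difference.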
How can I better diagnose why ? Can I measure CPU time consumption instead of wall time ?
Yes. Provided that your system supports JMX, you can use the VM’s ThreadMXBean to measure the CPU time consumed by your threads.
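For example, a small helper (CpuTimer is just an illustrative name) that prints both wall-clock time and the calling thread’s CPU time around a task:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuTimer {
    // Runs the task on the calling thread and prints wall-clock vs. per-thread CPU time.
    public static void timed(String label, Runnable task) {
        ThreadMXBean tmx = ManagementFactory.getThreadMXBean();
        boolean cpuOk = tmx.isCurrentThreadCpuTimeSupported();
        long wallStart = System.currentTimeMillis();
        long cpuStart = cpuOk ? tmx.getCurrentThreadCpuTime() : 0L;  // nanoseconds

        task.run();

        long wallMs = System.currentTimeMillis() - wallStart;
        long cpuMs = cpuOk ? (tmx.getCurrentThreadCpuTime() - cpuStart) / 1_000_000 : -1;
        System.out.println(label + " : wall " + wallMs + " ms, cpu " + cpuMs + " ms");
    }
}

You could wrap the body of CounterWorker.run() in CpuTimer.timed(filename, () -> { ... }) and compare the two numbers: a wall-clock time much larger than the CPU time means the thread spent most of its time waiting (for the scheduler, the disk, or System.out) rather than counting.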
Is it because parallel IO requests actually are slower ?
Most disk devices are single-channel. Some can queue multiple read requests at a time, but a single-channel device can only deliver data for a single request at a time. The only way that parallelizing requests to such a device could produce a performance gain is if threads were unable individually to saturate the net bandwidth of device, interconnect, and memory bus, but that’s not typically the case.
For some kinds of devices (e.g. spinning disks), there are additional considerations for sequential (faster) versus random (much slower) access patterns. With such a device, parallel I/O really might be expected to be slower, but that does not apply to SSDs. Also, it probably wouldn’t be a large effect for your particular test workload, given that it looks like few of the files occupy more than two disk blocks, and some only one.
The I/O in your particular MT program is unlikely to be faster than the I/O in your ST program, given that the latter probably saturates the available I/O bandwidth already. But the disk -> memory data transfer is probably not itself slower in your MT case, either.
Parallel I/O (or parallel anything) is not automatically faster than serial. You stand to gain from parallel I/O to the extent that additional threads can make use of system resources that otherwise would go unused. That’s not usually the case for concurrent disk I/O involving a single disk. Network I/O is the canonical use case, but not the only one.
So what we expect (hope for)
Suppose IO for a file takes 10 ms, counting the words takes 5 ms, and thread creation is free.
Because IO happens outside the CPU, using the memory bus, the sequential version for 10 files takes 150 ms.
In the parallel version the IO bus is occupied continuously for 100 ms, and all counts except the last can run in parallel with IO, so we take roughly 105 ms total (good).
The issue is that thread creation is not free; here it seems to cost roughly 70 ms.
Each thread starts its run with an IO, hence it is immediately switched out (descheduled).
So most likely the scheduling goes like this: the threads sample the wall-clock time, then are switched out, then main finishes creating the threads, and only then do the threads get to count. They therefore report a time that includes the time main spent creating threads as well as their own work.
This is visible in the fact that the sum of the runtimes reported by the threads is much greater than the total wall time: we are overlapping (more or less OK), but each thread is also counting in its time the actions of other threads (most probably main creating the threads).
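One way to check this directly is to timestamp the worker when it is created in main and again when run() actually begins; the gap is the startup/scheduling delay that gets folded into the per-file times. A sketch, with a hypothetical createdAt field added to CounterWorker:

// Hypothetical instrumentation: measure how long the thread waited between
// its worker being created in main() and actually entering run().
private final long createdAt = System.nanoTime();

@Override
public void run() {
    long startDelayMs = (System.nanoTime() - createdAt) / 1_000_000;
    System.out.println(filename + " : started " + startDelayMs
            + " ms after its worker was created");
    try {
        wordCount[index] = countWords(filename);
    } catch (IOException e) {
        System.err.println("Error reading file: " + filename);
        e.printStackTrace();
    }
}

If the reported delays are large and grow with the thread index, that supports the idea that the per-file times mostly measure waiting for main and the scheduler rather than the counting itself.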
In conclusion, while we do speed up slightly thanks to the parallel IO requests (we end up with a 50 ms penalty instead of 70 ms), the time to create so many threads makes it not worth it.