Why is multi-threaded, chunked writing of a large file slower when writing from many cores rather than the same core?

Note: this question has been edited several times to incorporate suggestions and findings from the comments, so some of those comments may now appear outdated. It initially focused on the number of threads, whereas the actual problem seems to be the threads' core affinities.

(I bet your intuitive answer is “synchronization” – bear with me as I reason why this is not necessarily the answer.)

The code below compares writing exactly the same data to a file: the same 1 MB chunk, written repeatedly until the 10 GB target file size is reached. It does so from a variable number of threads (1 through 56), starting all threads simultaneously and measuring the time spent in filebuf::sputn calls. Access to the file is serialized with a mutex, and each thread writes whole chunks until it has written its share (10 GB divided by the number of threads).
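Because each thread keeps writing whole 1 MB chunks until it reaches its share, the total written for some thread counts slightly exceeds 10 GB (for example, 10.002 GB with 3 threads and 10.024 GB with 56 threads in the output). A small sketch of that arithmetic, assuming the loop logic of the benchmark code; the function name `totalBytesWritten` is hypothetical and not part of the benchmark:

```cpp
#include <cassert>
#include <cstdint>

// Total number of bytes the benchmark writes for a given thread count.
// Each thread gets fileSize / nThreads bytes (integer division, as in the
// benchmark) and keeps writing whole chunks until it has reached that share,
// so its share is effectively rounded up to a multiple of chunkSize.
std::int64_t totalBytesWritten(std::int64_t fileSize, std::int64_t chunkSize, std::int64_t nThreads) {
    assert(chunkSize > 0 && nThreads > 0);
    const std::int64_t threadSize = fileSize / nThreads;
    // Ceiling division: number of whole chunks needed to reach threadSize.
    const std::int64_t chunksPerThread = (threadSize + chunkSize - 1) / chunkSize;
    return nThreads * chunksPerThread * chunkSize;
}
```

For example, 3 threads write 3 × 3,334 chunks = 10,002,000,000 bytes, and 56 threads write 56 × 179 chunks = 10,024,000,000 bytes, matching the 10.002 GB and 10.024 GB rows.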

(Because this has come up in the comments several times: the aim is not to write data faster than a single thread can. The aim is to write the data produced by several independent data-generation threads, ideally without requiring an additional, dedicated writing thread.)

Below is the output of the code, generated on Windows using MSVC, a Samsung MZWL63T8HFLT-00AW7 SSD, and an Intel Xeon w9-3495X CPU (Hyper-Threading disabled, hence the limit of 56 threads), which I graphed using https://www.desmos.com/calculator. In essence, if each thread is pinned to its own core, the time spent writing the file grows with the number of threads calling filebuf::sputn (from about 1.8 s with one thread to about 2.8 s with 56 threads). I am at a loss to explain this: locking the mutex is excluded from the writing-time measurement, and comparing the file-writing and overall durations shows that everything other than writing, locking included, accounts for only a small fraction of the total time (around 1% in most runs and at most about 6%).

If all threads are pinned to the same core, the problem does not occur. Unfortunately, while that may be a solution for this toy example, it is not applicable to a real-world scenario in which each thread generates its own data using expensive CPU operations.

Is this an expected result? What strategies are there to avoid this slowdown when writing from multiple cores, apart from the obvious one (having all threads push their data into a queue that a separate thread empties into the file)?

Further testing showed that this effect strongly depends on the Windows power plan. The results below (a 40% reduction in write rate) were collected under the “Balanced” plan; under the “Ultimate” plan, the effect is still visible, although smaller (a 15% reduction). The same results are obtained when writing to an ImDisk RAM disk, so the effect seems to be independent of the SSD.
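The 40% figure can be reproduced from the output with the same arithmetic the benchmark uses for `rate_mb_s` (bytes / milliseconds / 1000). A small sketch with hypothetical helper names; the sample values are taken from the 1-thread and 53-thread "diff. cores" rows, with the byte count for 53 threads reconstructed from the chunking logic (10,017,000,000 bytes):

```cpp
#include <cassert>
#include <cstdint>

// Write rate in MB/s, computed like rate_mb_s in the benchmark:
// bytes / milliseconds / 1000, truncated to an integer.
int rateMBps(std::int64_t bytes, std::int64_t milliseconds) {
    assert(milliseconds > 0);
    return static_cast<int>(static_cast<double>(bytes) / static_cast<double>(milliseconds) / 1000);
}

// Relative drop in write rate, e.g. 0.4 for a 40% reduction.
double rateReduction(int baselineMBps, int slowerMBps) {
    assert(baselineMBps > 0);
    return 1.0 - static_cast<double>(slowerMBps) / baselineMBps;
}
```

With these rows, `rateMBps` gives 5479 MB/s and 3326 MB/s, and `rateReduction(5479, 3326)` comes out at about 0.39, i.e. roughly the 40% reduction quoted above.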

Figure: time spent in filebuf::sputn calls vs. number of threads (blue: each thread on its own core; red: all threads on the same core).

Code

#include <Windows.h>

#include <chrono>
#include <format>
#include <fstream>
#include <iosfwd>
#include <iostream>
#include <latch>
#include <mutex>
#include <ranges>
#include <ratio>
#include <thread>
#include <vector>

using namespace std;
using namespace chrono;
using double_milliseconds = duration<long double, milli>;

int main() {
    constexpr auto maxNThreads = 56;
    constexpr auto fileSize = 10'000'000'000;
    constexpr auto chunkSize = 1'000'000;

    mutex mutex;
    vector<jthread> threads;
    for (const auto sameCore : {false, true}) {
        for (const auto nThreads : ranges::iota_view(1, maxNThreads + 1)) {
            filebuf file;
            file.open("out.tmp", ios::out | ios::binary);

            latch commonStart(nThreads + 1);

            streamsize written = 0;
            nanoseconds writing{0};
            for (const auto i : ranges::iota_view(0, nThreads)) {
                const auto threadSize = fileSize / nThreads;
                threads.emplace_back([&commonStart, &mutex, &file, &written, &writing, threadSize, sameCore, i] {
                    // Pin the thread either to core 0 (sameCore) or to core i.
                    const auto mask = static_cast<DWORD_PTR>(1) << (sameCore ? 0 : i);
                    if (!::SetThreadAffinityMask(::GetCurrentThread(), mask)) {
                        // Still count down the latch; otherwise the main thread would wait forever.
                        commonStart.count_down();
                        return;
                    }

                    const vector<char> chunk(chunkSize);
                    streamsize writtenThread = 0;
                    commonStart.arrive_and_wait();  // start all threads at the same time
                    while (writtenThread < threadSize) {
                        const lock_guard lock(mutex);
                        // Timestamps are taken after the mutex has been acquired,
                        // so waiting for the lock is excluded from `writing`.
                        const auto write_start = steady_clock::now();
                        const auto writtenIteration = file.sputn(chunk.data(), chunk.size());
                        const auto write_stop = steady_clock::now();

                        writtenThread += writtenIteration;
                        written += writtenIteration;  // shared counters, protected by the mutex
                        writing += write_stop - write_start;
                    }
                });
            }

            commonStart.arrive_and_wait();
            const auto start = steady_clock::now();
            threads.clear();
            file.close();
            const auto stop = steady_clock::now();

            const auto cores = sameCore ? "same core" : "diff. cores";
            const auto written_gb = static_cast<double>(written) / 1'000'000'000;
            const auto duration = duration_cast<milliseconds>(stop - start);
            const auto rate_mb_s = static_cast<int>(static_cast<double>(written) / (double)duration.count() / 1000);
            const auto writing_ms = duration_cast<milliseconds>(writing);
            const auto writing_pct = static_cast<int>(duration_cast<double_milliseconds>(writing) / duration * 100);
            cout << format(
                "{:2d} threads(s), {}: {:.03f} GB / {} = {} MB/s ({} or {}% writing)",
                nThreads,
                cores,
                written_gb,
                duration,
                rate_mb_s,
                writing_ms,
                writing_pct
            ) << endl;
        }
    }
}

Output

 1 threads(s), diff. cores: 10.000 GB / 1825ms = 5479 MB/s (1824ms or 99% writing)
 2 threads(s), diff. cores: 10.000 GB / 1826ms = 5476 MB/s (1819ms or 99% writing)
 3 threads(s), diff. cores: 10.002 GB / 1897ms = 5272 MB/s (1887ms or 99% writing)
 4 threads(s), diff. cores: 10.000 GB / 1838ms = 5440 MB/s (1826ms or 99% writing)
 5 threads(s), diff. cores: 10.000 GB / 1893ms = 5282 MB/s (1880ms or 99% writing)
 6 threads(s), diff. cores: 10.002 GB / 1999ms = 5003 MB/s (1885ms or 94% writing)
 7 threads(s), diff. cores: 10.003 GB / 1919ms = 5212 MB/s (1903ms or 99% writing)
 8 threads(s), diff. cores: 10.000 GB / 2013ms = 4967 MB/s (1927ms or 95% writing)
 9 threads(s), diff. cores: 10.008 GB / 1969ms = 5082 MB/s (1953ms or 99% writing)
10 threads(s), diff. cores: 10.000 GB / 1972ms = 5070 MB/s (1956ms or 99% writing)
11 threads(s), diff. cores: 10.010 GB / 1982ms = 5050 MB/s (1966ms or 99% writing)
12 threads(s), diff. cores: 10.008 GB / 1986ms = 5039 MB/s (1969ms or 99% writing)
13 threads(s), diff. cores: 10.010 GB / 2116ms = 4730 MB/s (2099ms or 99% writing)
14 threads(s), diff. cores: 10.010 GB / 2086ms = 4798 MB/s (2055ms or 98% writing)
15 threads(s), diff. cores: 10.005 GB / 2080ms = 4810 MB/s (1997ms or 96% writing)
16 threads(s), diff. cores: 10.000 GB / 2185ms = 4576 MB/s (2095ms or 95% writing)
17 threads(s), diff. cores: 10.013 GB / 2126ms = 4709 MB/s (2109ms or 99% writing)
18 threads(s), diff. cores: 10.008 GB / 2236ms = 4475 MB/s (2181ms or 97% writing)
19 threads(s), diff. cores: 10.013 GB / 2212ms = 4526 MB/s (2133ms or 96% writing)
20 threads(s), diff. cores: 10.000 GB / 2185ms = 4576 MB/s (2168ms or 99% writing)
21 threads(s), diff. cores: 10.017 GB / 2192ms = 4569 MB/s (2174ms or 99% writing)
22 threads(s), diff. cores: 10.010 GB / 2171ms = 4610 MB/s (2152ms or 99% writing)
23 threads(s), diff. cores: 10.005 GB / 2172ms = 4606 MB/s (2154ms or 99% writing)
24 threads(s), diff. cores: 10.008 GB / 2290ms = 4370 MB/s (2271ms or 99% writing)
25 threads(s), diff. cores: 10.000 GB / 2281ms = 4384 MB/s (2262ms or 99% writing)
26 threads(s), diff. cores: 10.010 GB / 2372ms = 4220 MB/s (2352ms or 99% writing)
27 threads(s), diff. cores: 10.017 GB / 2368ms = 4230 MB/s (2349ms or 99% writing)
28 threads(s), diff. cores: 10.024 GB / 2362ms = 4243 MB/s (2343ms or 99% writing)
29 threads(s), diff. cores: 10.005 GB / 2361ms = 4237 MB/s (2341ms or 99% writing)
30 threads(s), diff. cores: 10.020 GB / 2388ms = 4195 MB/s (2369ms or 99% writing)
31 threads(s), diff. cores: 10.013 GB / 2297ms = 4359 MB/s (2277ms or 99% writing)
32 threads(s), diff. cores: 10.016 GB / 2274ms = 4404 MB/s (2255ms or 99% writing)
33 threads(s), diff. cores: 10.032 GB / 2306ms = 4350 MB/s (2286ms or 99% writing)
34 threads(s), diff. cores: 10.030 GB / 2341ms = 4284 MB/s (2321ms or 99% writing)
35 threads(s), diff. cores: 10.010 GB / 2404ms = 4163 MB/s (2383ms or 99% writing)
36 threads(s), diff. cores: 10.008 GB / 2555ms = 3917 MB/s (2446ms or 95% writing)
37 threads(s), diff. cores: 10.027 GB / 2461ms = 4074 MB/s (2440ms or 99% writing)
38 threads(s), diff. cores: 10.032 GB / 2508ms = 4000 MB/s (2488ms or 99% writing)
39 threads(s), diff. cores: 10.023 GB / 2470ms = 4057 MB/s (2449ms or 99% writing)
40 threads(s), diff. cores: 10.000 GB / 2543ms = 3932 MB/s (2521ms or 99% writing)
41 threads(s), diff. cores: 10.004 GB / 2549ms = 3924 MB/s (2528ms or 99% writing)
42 threads(s), diff. cores: 10.038 GB / 2592ms = 3872 MB/s (2570ms or 99% writing)
43 threads(s), diff. cores: 10.019 GB / 2672ms = 3749 MB/s (2651ms or 99% writing)
44 threads(s), diff. cores: 10.032 GB / 2790ms = 3595 MB/s (2700ms or 96% writing)
45 threads(s), diff. cores: 10.035 GB / 2701ms = 3715 MB/s (2676ms or 99% writing)
46 threads(s), diff. cores: 10.028 GB / 2746ms = 3651 MB/s (2720ms or 99% writing)
47 threads(s), diff. cores: 10.011 GB / 2794ms = 3583 MB/s (2770ms or 99% writing)
48 threads(s), diff. cores: 10.032 GB / 2850ms = 3520 MB/s (2823ms or 99% writing)
49 threads(s), diff. cores: 10.045 GB / 2983ms = 3367 MB/s (2936ms or 98% writing)
50 threads(s), diff. cores: 10.000 GB / 2965ms = 3372 MB/s (2939ms or 99% writing)
51 threads(s), diff. cores: 10.047 GB / 2904ms = 3459 MB/s (2879ms or 99% writing)
52 threads(s), diff. cores: 10.036 GB / 2893ms = 3469 MB/s (2866ms or 99% writing)
53 threads(s), diff. cores: 10.017 GB / 3011ms = 3326 MB/s (2981ms or 99% writing)
54 threads(s), diff. cores: 10.044 GB / 2856ms = 3516 MB/s (2830ms or 99% writing)
55 threads(s), diff. cores: 10.010 GB / 2917ms = 3431 MB/s (2891ms or 99% writing)
56 threads(s), diff. cores: 10.024 GB / 2857ms = 3508 MB/s (2829ms or 99% writing)
 1 threads(s), same core: 10.000 GB / 1805ms = 5540 MB/s (1804ms or 99% writing)
 2 threads(s), same core: 10.000 GB / 1796ms = 5567 MB/s (1817ms or 101% writing)
 3 threads(s), same core: 10.002 GB / 1814ms = 5513 MB/s (1813ms or 99% writing)
 4 threads(s), same core: 10.000 GB / 1800ms = 5555 MB/s (1798ms or 99% writing)
 5 threads(s), same core: 10.000 GB / 1798ms = 5561 MB/s (1822ms or 101% writing)
 6 threads(s), same core: 10.002 GB / 1777ms = 5628 MB/s (1814ms or 102% writing)
 7 threads(s), same core: 10.003 GB / 1789ms = 5591 MB/s (1786ms or 99% writing)
 8 threads(s), same core: 10.000 GB / 1789ms = 5589 MB/s (1830ms or 102% writing)
 9 threads(s), same core: 10.008 GB / 1825ms = 5483 MB/s (1809ms or 99% writing)
10 threads(s), same core: 10.000 GB / 1810ms = 5524 MB/s (1804ms or 99% writing)
11 threads(s), same core: 10.010 GB / 1797ms = 5570 MB/s (1795ms or 99% writing)
12 threads(s), same core: 10.008 GB / 1848ms = 5415 MB/s (1845ms or 99% writing)
13 threads(s), same core: 10.010 GB / 1779ms = 5626 MB/s (1806ms or 101% writing)
14 threads(s), same core: 10.010 GB / 1786ms = 5604 MB/s (1816ms or 101% writing)
15 threads(s), same core: 10.005 GB / 1833ms = 5458 MB/s (1830ms or 99% writing)
16 threads(s), same core: 10.000 GB / 1829ms = 5467 MB/s (1826ms or 99% writing)
17 threads(s), same core: 10.013 GB / 1785ms = 5609 MB/s (1815ms or 101% writing)
18 threads(s), same core: 10.008 GB / 1789ms = 5594 MB/s (1825ms or 102% writing)
19 threads(s), same core: 10.013 GB / 1781ms = 5622 MB/s (1814ms or 101% writing)
20 threads(s), same core: 10.000 GB / 1768ms = 5656 MB/s (1803ms or 101% writing)
21 threads(s), same core: 10.017 GB / 1844ms = 5432 MB/s (1834ms or 99% writing)
22 threads(s), same core: 10.010 GB / 1822ms = 5493 MB/s (1818ms or 99% writing)
23 threads(s), same core: 10.005 GB / 1801ms = 5555 MB/s (1797ms or 99% writing)
24 threads(s), same core: 10.008 GB / 1796ms = 5572 MB/s (1832ms or 102% writing)
25 threads(s), same core: 10.000 GB / 1859ms = 5379 MB/s (1806ms or 97% writing)
26 threads(s), same core: 10.010 GB / 1791ms = 5589 MB/s (1827ms or 102% writing)
27 threads(s), same core: 10.017 GB / 1775ms = 5643 MB/s (1813ms or 102% writing)
28 threads(s), same core: 10.024 GB / 1798ms = 5575 MB/s (1830ms or 101% writing)
29 threads(s), same core: 10.005 GB / 1890ms = 5293 MB/s (1850ms or 97% writing)
30 threads(s), same core: 10.020 GB / 1755ms = 5709 MB/s (1785ms or 101% writing)
31 threads(s), same core: 10.013 GB / 1806ms = 5544 MB/s (1844ms or 102% writing)
32 threads(s), same core: 10.016 GB / 1799ms = 5567 MB/s (1826ms or 101% writing)
33 threads(s), same core: 10.032 GB / 1762ms = 5693 MB/s (1815ms or 103% writing)
34 threads(s), same core: 10.030 GB / 1776ms = 5647 MB/s (1813ms or 102% writing)
35 threads(s), same core: 10.010 GB / 1773ms = 5645 MB/s (1812ms or 102% writing)
36 threads(s), same core: 10.008 GB / 1826ms = 5480 MB/s (1863ms or 102% writing)
37 threads(s), same core: 10.027 GB / 1815ms = 5524 MB/s (1846ms or 101% writing)
38 threads(s), same core: 10.032 GB / 1823ms = 5503 MB/s (1830ms or 100% writing)
39 threads(s), same core: 10.023 GB / 1776ms = 5643 MB/s (1811ms or 102% writing)
40 threads(s), same core: 10.000 GB / 1769ms = 5652 MB/s (1807ms or 102% writing)
41 threads(s), same core: 10.004 GB / 1803ms = 5548 MB/s (1841ms or 102% writing)
42 threads(s), same core: 10.038 GB / 1852ms = 5420 MB/s (1841ms or 99% writing)
43 threads(s), same core: 10.019 GB / 1827ms = 5483 MB/s (1844ms or 100% writing)
44 threads(s), same core: 10.032 GB / 1787ms = 5613 MB/s (1817ms or 101% writing)
45 threads(s), same core: 10.035 GB / 1821ms = 5510 MB/s (1852ms or 101% writing)
46 threads(s), same core: 10.028 GB / 1814ms = 5528 MB/s (1842ms or 101% writing)
47 threads(s), same core: 10.011 GB / 1788ms = 5598 MB/s (1816ms or 101% writing)
48 threads(s), same core: 10.032 GB / 1794ms = 5591 MB/s (1820ms or 101% writing)
49 threads(s), same core: 10.045 GB / 1780ms = 5643 MB/s (1809ms or 101% writing)
50 threads(s), same core: 10.000 GB / 1776ms = 5630 MB/s (1841ms or 103% writing)
51 threads(s), same core: 10.047 GB / 1836ms = 5472 MB/s (1824ms or 99% writing)
52 threads(s), same core: 10.036 GB / 1890ms = 5310 MB/s (1835ms or 97% writing)
53 threads(s), same core: 10.017 GB / 1810ms = 5534 MB/s (1836ms or 101% writing)
54 threads(s), same core: 10.044 GB / 1783ms = 5633 MB/s (1815ms or 101% writing)
55 threads(s), same core: 10.010 GB / 1771ms = 5652 MB/s (1831ms or 103% writing)
56 threads(s), same core: 10.024 GB / 1793ms = 5590 MB/s (1820ms or 101% writing)


One of the real issues I see is that you're using a single mutex shared by every thread. That essentially creates a queue in which the threads take turns writing, rather than writing independently. Secondly, even without the mutex, having multiple threads write to the same file would create a bottleneck. So of course your timing grows with the number of threads N, because in truth they are not all writing at the same time.
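To illustrate the point about the shared mutex, here is a minimal sketch. It is not the question's code, and the function name `max_concurrent_writers` and the file names are hypothetical. Every thread locks one `std::mutex` before calling `filebuf::sputn`, and an atomic counter records how many threads are ever inside the locked region at the same time. With a single shared mutex that number can never exceed 1, i.e. the writes are queued rather than concurrent.

```cpp
#include <atomic>
#include <cassert>
#include <fstream>
#include <ios>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper (not the question's code): n_threads threads each append
// chunks_per_thread copies of `chunk` to ONE std::filebuf, guarded by ONE mutex.
// Returns the largest number of threads ever seen inside the locked region at once.
int max_concurrent_writers(const std::string& path, int n_threads,
                           int chunks_per_thread, const std::string& chunk) {
    assert(n_threads > 0 && chunks_per_thread >= 0);
    std::filebuf file;
    file.open(path, std::ios::out | std::ios::binary | std::ios::trunc);

    std::mutex file_mutex;       // the single mutex shared by every thread
    std::atomic<int> inside{0};  // threads currently holding the lock
    std::atomic<int> peak{0};    // highest value `inside` has ever reached

    auto worker = [&] {
        for (int i = 0; i < chunks_per_thread; ++i) {
            // Every thread queues here; only the lock holder may call sputn.
            std::lock_guard<std::mutex> lock(file_mutex);
            const int now = ++inside;
            int seen = peak.load();
            while (now > seen && !peak.compare_exchange_weak(seen, now)) {
                // `seen` is refreshed on failure; retry until peak >= now.
            }
            file.sputn(chunk.data(), static_cast<std::streamsize>(chunk.size()));
            --inside;
        }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
    file.close();
    // 1 whenever at least one chunk was written: the writes were serialized.
    return peak.load();
}
```

For example, `max_concurrent_writers("demo.bin", 8, 16, std::string(1024, 'x'))` returns 1 no matter how many threads are started, which is the "queue" described above.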

A fairer test, and a solution for writing a single data source from multiple threads, would look something like this:

  1. Split the data source into N parts, one per thread.
  2. Have each thread write its part to its own temporary file.
  3. Finally, concatenate the temporary files sequentially, in the order in which the threads were spawned.
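The three steps above can be sketched roughly as follows. This assumes the whole data source is already in memory as one `std::string`; the function name `write_via_temp_files` and the `.partN` temporary-file naming are hypothetical, since the answer does not specify them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical sketch of the three steps; names and the ".partN" suffix are made up.
void write_via_temp_files(const std::string& data, const std::string& out_path,
                          std::size_t n_threads) {
    assert(n_threads > 0);
    // Step 1: split the data into n_threads contiguous parts (ceiling division,
    // so trailing parts may be shorter or even empty).
    const std::size_t part = (data.size() + n_threads - 1) / n_threads;

    std::vector<std::string> temp_paths;
    for (std::size_t t = 0; t < n_threads; ++t)
        temp_paths.push_back(out_path + ".part" + std::to_string(t));

    // Step 2: each thread writes its own part to its own temporary file,
    // so no mutex and no shared stream are needed.
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < n_threads; ++t) {
        threads.emplace_back([&data, &temp_paths, part, t] {
            const std::size_t begin = std::min(t * part, data.size());
            const std::size_t end = std::min(begin + part, data.size());
            std::ofstream tmp(temp_paths[t], std::ios::binary | std::ios::trunc);
            tmp.write(data.data() + begin, static_cast<std::streamsize>(end - begin));
        });
    }
    for (auto& th : threads) th.join();

    // Step 3: concatenate the temporary files in spawn order, then delete them.
    std::ofstream out(out_path, std::ios::binary | std::ios::trunc);
    for (const auto& path : temp_paths) {
        std::ifstream in(path, std::ios::binary);
        // operator<<(streambuf*) sets failbit when nothing is copied, so skip empty parts.
        if (in.peek() != std::ifstream::traits_type::eof()) out << in.rdbuf();
        in.close();  // close before removing (required on Windows)
        std::remove(path.c_str());
    }
}
```

Note the trade-off: step 3 reads and writes every byte a second time, so this removes the shared lock at the cost of extra I/O. Whether that is a net win over writing to one shared file is something to measure on the actual hardware.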
