This question got rather a freezing reception at SO, so I decided to delete it there and try here instead. If you think it does not fit here either, please at least leave a comment suggesting how to find the example I’m after…
Can you give an example, where using C99 VLAs offers a real advantage over something like current standard heap-using C++ RAII mechanisms?
The example I am after should:
- Achieve an easily measurable (10% maybe) performance advantage over using heap.
- Not have a good workaround, which would not need the whole array at all.
- Actually benefit from using dynamic size, instead of fixed maximum size.
- Be unlikely to cause stack overflow in normal use scenario.
- Be strong enough to tempt a developer needing the performance to include a C99 source file in a C++ project.
Adding some clarification on the context: I mean a VLA as defined by C99 and not included in standard C++: int array[n], where n is a variable. And I am after an example of a use case where it trumps the alternatives offered by other standards (C90, C++11):
int array[MAXSIZE]; // C stack array with compile time constant size
int *array = calloc(n, sizeof(int)); // C heap array with manual free
int *array = new int[n]; // C++ heap array with manual delete
std::unique_ptr<int[]> array(new int[n]); // C++ heap array with RAII
std::vector<int> array(n); // STL container with preallocated size
Some ideas:
- Functions taking varargs, which naturally limits the item count to something reasonable, yet has no useful API-level upper limit.
- Recursive functions, where wasted stack is undesirable
- Many small allocations and releases, where heap overhead would be bad.
- Handling multi-dimensional arrays (like arbitrarily sized matrices), where performance is critical, and small functions are expected to get inlined a lot (see the sketch after this list).
- From comment: concurrent algorithm, where heap allocation has synchronization overhead.
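To illustrate the multi-dimensional matrix idea, here is a minimal sketch assuming the GNU VLA extension in C++ (the same extension the benchmark answer below relies on); the function, its size bound, and the use of a scratch buffer to allow in-place transposition are my own illustrative choices, not an established example:
// Transpose a rows x cols matrix stored row-major in `in` into `out` (cols x rows).
// The VLA scratch is sized exactly to the input and lets `out` alias `in`;
// no heap allocation is involved.
static void transpose(const double *in, double *out, int rows, int cols)
{
    if (rows <= 0 || cols <= 0 || rows > 128 || cols > 128)  // keep stack usage bounded
        return;

    double tmp[rows][cols];            // stack scratch, GNU VLA extension in C++

    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            tmp[r][c] = in[r * cols + c];

    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            out[c * rows + r] = tmp[r][c];
}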
Wikipedia has an example which does not fulfill my criteria, because the practical difference compared to using the heap seems irrelevant, at least without context. It is also non-ideal because, without more context, it seems the item count could very well cause a stack overflow.
Note: I’m specifically after an example code, or suggestion of an algorithm which would benefit from this, for me to implement the example myself.
I just hacked up a little program that generates a set of random numbers, restarting at the same seed each time to ensure that it’s “fair” and “comparable”. As it goes along, it figures out the min and max of these values. And when it has generated the set of numbers, it counts how many are above the average of min and max.
For VERY small arrays, it shows a clear benefit with VLAs over std::vector<>.
It is not a real problem, but we can easily imagine something where we’d be reading the values from a small file instead of using random numbers, and doing some other, more meaningful counting/min/max calculations with the same sort of code.
For VERY small values of the “number of random numbers” (x) in the relevant functions, the VLA solution wins by a huge margin. As the size gets larger, the “win” gets smaller, and given sufficient size, the vector solution appears to be MORE efficient – I didn’t study that variant too much, as when we start having thousands of elements in a VLA, it’s not really what they were meant to do…
And I’m sure someone will tell me that there’s some way of writing all this code with a bunch of templates and get it to do this without running more than the RDTSC and cout bits at runtime… But I don’t think that’s really the point.
When running this particular variant, I get about a 10% difference between func1 (VLA) and func2 (std::vector).
count = 9884
func1 time in clocks per iteration 7048685
count = 9884
func2 time in clocks per iteration 7661067
count = 9884
func3 time in clocks per iteration 8971878
This is compiled with: g++ -O3 -Wall -Wextra -std=gnu++0x -o vla vla.cpp
Here’s the code:
#include <iostream>
#include <vector>
#include <cstdint>
#include <cstdlib>
using namespace std;
const int SIZE = 1000000;
uint64_t g_val[SIZE];
static __inline__ unsigned long long rdtsc(void)
{
unsigned hi, lo;
__asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
}
int func1(int x)
{
int v[x];
int vmax = 0;
int vmin = x;
for(int i = 0; i < x; i++)
{
v[i] = rand() % x;
if (v[i] > vmax)
vmax = v[i];
if (v[i] < vmin)
vmin = v[i];
}
int avg = (vmax + vmin) / 2;
int count = 0;
for(int i = 0; i < x; i++)
{
if (v[i] > avg)
{
count++;
}
}
return count;
}
int func2(int x)
{
vector<int> v;
v.resize(x);
int vmax = 0;
int vmin = x;
for(int i = 0; i < x; i++)
{
v[i] = rand() % x;
if (v[i] > vmax)
vmax = v[i];
if (v[i] < vmin)
vmin = v[i];
}
int avg = (vmax + vmin) / 2;
int count = 0;
for(int i = 0; i < x; i++)
{
if (v[i] > avg)
{
count++;
}
}
return count;
}
int func3(int x)
{
vector<int> v;
int vmax = 0;
int vmin = x;
for(int i = 0; i < x; i++)
{
v.push_back(rand() % x);
if (v[i] > vmax)
vmax = v[i];
if (v[i] < vmin)
vmin = v[i];
}
int avg = (vmax + vmin) / 2;
int count = 0;
for(int i = 0; i < x; i++)
{
if (v[i] > avg)
{
count++;
}
}
return count;
}
void runbench(int (*f)(int), const char *name)
{
srand(41711211);
uint64_t t = rdtsc();
int count = 0;
for(int i = 20; i < 200; i++)
{
count += f(i);
}
t = rdtsc() - t;
cout << "count = " << count << endl;
cout << name << " time in clocks per iteration " << dec << t << endl;
}
struct function
{
int (*func)(int);
const char *name;
};
#define FUNC(f) { f, #f }
function funcs[] =
{
FUNC(func1),
FUNC(func2),
FUNC(func3),
};
int main()
{
for(size_t i = 0; i < sizeof(funcs)/sizeof(funcs[0]); i++)
{
runbench(funcs[i].func, funcs[i].name);
}
}
The reason to use a VLA is primarily performance. It is a mistake to disregard the wiki example as having only an “irrelevant” difference. I can easily see cases where exactly that code could make a huge difference, for instance if that function were called in a tight loop, where read_val was an IO function that returned very quickly, on some sort of system where speed was critical.
In fact, in most places where VLAs are used in this manner, they don’t replace heap calls but instead replace something like:
float vals[256]; /* I hope we never get more! */
The thing about any local declaration is that it is extremely quick. The line float vals[n] generally only requires a couple of processor instructions (maybe just one): it simply adjusts the stack pointer by the size of the array.
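As a hedged sketch of that pattern (the callback shape, the 4096 bound, and the function name are illustrative assumptions, not the wiki example’s actual interface):
// Count how many of the n values produced by read_val are above their mean.
// Two passes over the data, so the values must be stored; the VLA replaces a
// fixed "float vals[256]"-style buffer and costs only a stack-pointer adjustment.
static int count_above_mean(int n, float (*read_val)(int index))
{
    if (n <= 0 || n > 4096)      // keep the stack allocation bounded
        return 0;

    float vals[n];               // GNU VLA extension in C++, standard in C99
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
    {
        vals[i] = read_val(i);
        sum += vals[i];
    }

    float mean = sum / n;
    int count = 0;
    for (int i = 0; i < n; i++)
        if (vals[i] > mean)
            count++;
    return count;
}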
On the other hand, a heap allocation requires walking a data structure to find a free area. The time is probably an order of magnitude longer even in the luckiest case. (I.e. just the act of placing n on the stack and calling malloc is probably 5-10 instructions.) It is probably much worse if there’s any reasonable amount of data in the heap. It would not surprise me at all to see a case where malloc was 100x to 1000x slower in a real program.
Of course, then you also have some performance impact with the matching free, probably similar in magnitude to the malloc call.
In addition, there’s the issue of memory fragmentation. Lots of little allocations tend to fragment the heap. Fragmented heaps both waste memory and increase the time required to allocate memory.
Regarding VLAs versus a Vector
Did you consider that a Vector can take advantage of VLAs itself? Without VLAs, the Vector has to specify certain “scales” of arrays, e.g. 10, 100, 10000, for storage, so you end up allocating a 10000-item array to hold 101 items. With VLAs, if you resize to 200, the algorithm might assume that you’ll only need 200 and can allocate a 200-item array. Or it can allocate a buffer of, say, n*1.5.
Anyway, I’d argue that if you know how many items you’ll need at runtime, a VLA is more performant (as Mats’ benchmark demonstrated). What he demonstrated was a simple two-pass iteration. Think of Monte Carlo simulations where random samples are taken repeatedly, or image manipulation (like Photoshop filters), where computations are done on each element multiple times and quite possibly each computation on each element involves looking at neighbors.
That extra pointer jump from the vector to its internal array adds up.
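As a hedged sketch of the image-filter case mentioned above (the 1-D box blur, the width bound, and the clamp-at-edges behaviour are my own illustrative choices):
// Smooth one image row in place with a 3-tap box blur. The original values are
// still needed while the smoothed ones are written, so a scratch copy is
// required; a VLA sized to the row width avoids a heap allocation per row.
// (GNU VLA extension in C++, standard in C99.)
static void blur_row(float *row, int width)
{
    if (width <= 0 || width > 8192)   // keep the stack allocation bounded
        return;

    float orig[width];                // stack scratch copy of the row
    for (int i = 0; i < width; i++)
        orig[i] = row[i];

    for (int i = 0; i < width; i++)
    {
        float left  = orig[i > 0 ? i - 1 : i];          // clamp at the left edge
        float right = orig[i < width - 1 ? i + 1 : i];  // clamp at the right edge
        row[i] = (left + orig[i] + right) / 3.0f;
    }
}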
Answering the main question
But when you talk about using a dynamically allocated structure like a LinkedList, there is no comparison. An array provides direct access using pointer arithmetic to its elements. Using a linked list you have to walk the nodes to get to a specific element. So the VLA wins hands down in this scenario.
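A minimal illustration of that access-pattern difference (the element index is arbitrary, and std::list stands in for a generic linked list assumed to hold at least five elements):
#include <list>
#include <iterator>

// Direct access: one pointer-arithmetic computation, independent of the index.
int fifth_of_array(const int *a)
{
    return a[4];
}

// Linked-list access: four nodes must be walked before the element is reached.
int fifth_of_list(const std::list<int> &l)
{
    std::list<int>::const_iterator it = l.begin();
    std::advance(it, 4);
    return *it;
}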
According to this answer, it is architecture-dependent, but in some cases memory access on the stack will be faster because the stack is likely to already be in the cache. With a large number of elements this advantage may be negated (potentially the cause of the diminishing returns Mats saw in his benchmarks). However, it’s worth noting that cache sizes are growing significantly, so the number of elements for which this holds will potentially grow accordingly.