Garbage collection & memory leaks on hash tables

I was reading R. Read’s How to be a programmer, and I came accross something I didn’t understand:

…even with garbage collection, you can fill up all memory with
garbage. A classic mistake is to use a hash table as a cache and
forget to remove the references in the hash table. Since the reference
remains, the referent is noncollectable but useless. This is called a
memory leak. You should look for and fix memory leaks early. If you
have long running systems memory may never be exhausted in testing but
will be exhausted by the user.

So let’s say I have a dictionary structure in python, indexed on md5 hashes (is this the kind of hashtable he’s referring to?). Eg:

x = {}
x['c4ca4238a0b923820dcc509a6f75849b'] = 1
x['c81e728d9d4c2f636f067f89cc14862c'] = 2

Can someone now walk me through his example? What do I have to do now concretely to cause a memory leak?

5

It’s pretty basic. Walk through the code:

x = {}

Memory reserved for overhead

x['c4ca4238a0b923820dcc509a6f75849b'] = 1

Memory for one key/value pair allocated

x['c81e728d9d4c2f636f067f89cc14862c'] = 2

Memory for two key/value pairs allocated

Now imagine we do this 10,000 times. We will have allocated space for 10,000 key/value pairs. Each hash added increases memory usage. If these values will indeed never be used again, they are “useless”, but since you’ve told python to save them, they will not be collected. You’d need to remove the reference like this:

del x['c4ca4238a0b923820dcc509a6f75849b']

You have to do this because in general, the garbage collector can’t know that you are never going to look this value up.

The case talked about here is using a dictionary as a cache, and presumably you don’t really know if you will need a particular value. Also, presumably, there’s no real limit to the number of values. So with a long-running app, you could conceivably exhaust memory. You’d need to have some scheme to delete elements from the cache when they are no longer needed, presumably by testing recency of use.

GC can’t help you here any more than it can help you if you attempt to allocate a 100 gigabyte array. You’ve told it explicitly through referencing not to discard any of the data.

2

To me the semantics don’t matter. A program is leaky if it starts using up a boatload of memory and running slower and slower the longer you run it, like a video game which requires you to restart every 30 minutes because the frame rates keep dropping the longer you play it while it goes from taking megabytes to gigabytes of memory. Most games that exhibit these leaky symptoms use garbage collection and for a good reason: garbage collection tends to exchange immediately reproducible crashes for leaks that fly under the radar.

In that sense above, it can actually be easier to introduce leaks into a program built using a language with GC, since all you have to do to root a resource and prevent it from being freed is store a reference to it.

As a simple example, let’s say you have a video game which has a physics system to move particles around and a renderer to render a list of particles it references as well. When a particle dies, it fades off the screen and stops being visible, at which point the physics system removes it from the list of particles to process.

Voila, now you have a leak because you didn’t remove the particle reference from the renderer. However, it won’t be obvious in game because the dead particles have an opacity of zero. Nevertheless, the game would be creating a bigger and bigger list of particles that are never freed until the game is shut down, and spending more and more time in increasingly larger loops in the renderer as the particle list grows larger and larger. This might fly under the radar of the developers indefinitely to the point where they actually suggest to users to restart the game from time to time if it gets slow while bumping up the system requirements to beefy machines even for a simple 2D game.

Meanwhile in C, this would have simply lead to an immediately detectable crash since the physics system would have manually destroyed the particle when it died. The renderer would then try to access a particle which was destroyed and most likely come crashing down during the first play test which is arguably more preferable in this case than having game-halting leaks which go unsolved indefinitely.

A very firm way to avoid this problem is to rely on concepts like weak and phantom references and decide who actually manages the particles. If it’s the physics system, then the renderer should keep weak references to particles so that it doesn’t prolong their lifespan and can detect when they are destroyed and hopefully even run into a game-halting error if the renderer tries to access a particle that no longer exists.

In general GC doesn’t protect you against having to think about resource management and who owns a resource and having to manually free resources (assigning them as none/nil/null) to avoid logical leaks. Its primary usefulness in my opinion is in the context of areas like multithreading where you want to ensure that an object is not destroyed until a thread is finished processing the object.

The ideal solution to me if a language could ever provide it at the native language level (C++ is the closest I can think with shared_ptr, but it’s a library concept and can’t detect cycles since it uses basic ref counting) is one that could let you opt into garbage collection per object. For example, perhaps the object is normally destroyed when it either goes out of scope or is assigned a null, and everything that references it outside of its immediate scope effectively acts as a pointer to it, not preventing the object from being destroyed. However, something that wants to share ownership of that object, like a thread, could call a ref method to increase its reference count and deref to decrement it to prevent it from being destroyed until the thread is finished using it, or it could use some kind of shared reference concept that avoids the need for an explicit deref in the thread’s scope, with a “pay as you use it” cost.

Here’s an archived link to the article mentioned in the question.

I think the author is saying that in garbage-collected languages, memory leaks are introduced when there are unintended lingering references to objects, and that hash tables / dictionaries / maps / associative arrays (whatever the language calls them) are good hiding places for these references.

In many languages, it can be the case that a dictionary contains as values the last remaining references to objects which would otherwise be garbage-collected. Or, if the language allows it, lingering references could also be found among the keys of the structure.

For example, in JavaScript, you’d be able to use a Map that takes objects as keys, perhaps to store extra information about these objects:

let person_1 = { name: 'Dan' };
let person_2 = { name: 'Alex' };

let ages = new Map();

ages.set(person_1, 20);
ages.set(person_2, 30);

If all other references to person_1 were to disappear, ages would end up holding the last reference to the object and prevent it from being garbage-collected for as long as the Map is in use.

To support the use-case without causing memory leaks, the alternative structure is WeakMap, where keys are weakly referenced, meaning that whenever the WeakMap holds the last reference to an object, the object can be garbage-collected and the corresponding key-value pair from the WeakMap will be removed.

I think it’s a mere coincidence that caches (often implemented as hash tables) are susceptible to another form of memory hoarding — accumulating an ever-increasing amount of information, that may or may not be needed. This has a similar effect with (and may even be called) a memory leak, but not by preventing the GC from doing its job, which I think is what the author means to raise awareness about.

Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa Dịch vụ tổ chức sự kiện 5 sao Thông tin về chúng tôi Dịch vụ sinh nhật bé trai Dịch vụ sinh nhật bé gái Sự kiện trọn gói Các tiết mục giải trí Dịch vụ bổ trợ Tiệc cưới sang trọng Dịch vụ khai trương Tư vấn tổ chức sự kiện Hình ảnh sự kiện Cập nhật tin tức Liên hệ ngay Thuê chú hề chuyên nghiệp Tiệc tất niên cho công ty Trang trí tiệc cuối năm Tiệc tất niên độc đáo Sinh nhật bé Hải Đăng Sinh nhật đáng yêu bé Khánh Vân Sinh nhật sang trọng Bích Ngân Tiệc sinh nhật bé Thanh Trang Dịch vụ ông già Noel Xiếc thú vui nhộn Biểu diễn xiếc quay đĩa Dịch vụ tổ chức tiệc uy tín Khám phá dịch vụ của chúng tôi Tiệc sinh nhật cho bé trai Trang trí tiệc cho bé gái Gói sự kiện chuyên nghiệp Chương trình giải trí hấp dẫn Dịch vụ hỗ trợ sự kiện Trang trí tiệc cưới đẹp Khởi đầu thành công với khai trương Chuyên gia tư vấn sự kiện Xem ảnh các sự kiện đẹp Tin mới về sự kiện Kết nối với đội ngũ chuyên gia Chú hề vui nhộn cho tiệc sinh nhật Ý tưởng tiệc cuối năm Tất niên độc đáo Trang trí tiệc hiện đại Tổ chức sinh nhật cho Hải Đăng Sinh nhật độc quyền Khánh Vân Phong cách tiệc Bích Ngân Trang trí tiệc bé Thanh Trang Thuê dịch vụ ông già Noel chuyên nghiệp Xem xiếc khỉ đặc sắc Xiếc quay đĩa thú vị
Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa

Garbage collection & memory leaks on hash tables

I was reading R. Read’s How to be a programmer, and I came accross something I didn’t understand:

…even with garbage collection, you can fill up all memory with
garbage. A classic mistake is to use a hash table as a cache and
forget to remove the references in the hash table. Since the reference
remains, the referent is noncollectable but useless. This is called a
memory leak. You should look for and fix memory leaks early. If you
have long running systems memory may never be exhausted in testing but
will be exhausted by the user.

So let’s say I have a dictionary structure in python, indexed on md5 hashes (is this the kind of hashtable he’s referring to?). Eg:

x = {}
x['c4ca4238a0b923820dcc509a6f75849b'] = 1
x['c81e728d9d4c2f636f067f89cc14862c'] = 2

Can someone now walk me through his example? What do I have to do now concretely to cause a memory leak?

5

It’s pretty basic. Walk through the code:

x = {}

Memory reserved for overhead

x['c4ca4238a0b923820dcc509a6f75849b'] = 1

Memory for one key/value pair allocated

x['c81e728d9d4c2f636f067f89cc14862c'] = 2

Memory for two key/value pairs allocated

Now imagine we do this 10,000 times. We will have allocated space for 10,000 key/value pairs. Each hash added increases memory usage. If these values will indeed never be used again, they are “useless”, but since you’ve told python to save them, they will not be collected. You’d need to remove the reference like this:

del x['c4ca4238a0b923820dcc509a6f75849b']

You have to do this because in general, the garbage collector can’t know that you are never going to look this value up.

The case talked about here is using a dictionary as a cache, and presumably you don’t really know if you will need a particular value. Also, presumably, there’s no real limit to the number of values. So with a long-running app, you could conceivably exhaust memory. You’d need to have some scheme to delete elements from the cache when they are no longer needed, presumably by testing recency of use.

GC can’t help you here any more than it can help you if you attempt to allocate a 100 gigabyte array. You’ve told it explicitly through referencing not to discard any of the data.

2

To me the semantics don’t matter. A program is leaky if it starts using up a boatload of memory and running slower and slower the longer you run it, like a video game which requires you to restart every 30 minutes because the frame rates keep dropping the longer you play it while it goes from taking megabytes to gigabytes of memory. Most games that exhibit these leaky symptoms use garbage collection and for a good reason: garbage collection tends to exchange immediately reproducible crashes for leaks that fly under the radar.

In that sense above, it can actually be easier to introduce leaks into a program built using a language with GC, since all you have to do to root a resource and prevent it from being freed is store a reference to it.

As a simple example, let’s say you have a video game which has a physics system to move particles around and a renderer to render a list of particles it references as well. When a particle dies, it fades off the screen and stops being visible, at which point the physics system removes it from the list of particles to process.

Voila, now you have a leak because you didn’t remove the particle reference from the renderer. However, it won’t be obvious in game because the dead particles have an opacity of zero. Nevertheless, the game would be creating a bigger and bigger list of particles that are never freed until the game is shut down, and spending more and more time in increasingly larger loops in the renderer as the particle list grows larger and larger. This might fly under the radar of the developers indefinitely to the point where they actually suggest to users to restart the game from time to time if it gets slow while bumping up the system requirements to beefy machines even for a simple 2D game.

Meanwhile in C, this would have simply lead to an immediately detectable crash since the physics system would have manually destroyed the particle when it died. The renderer would then try to access a particle which was destroyed and most likely come crashing down during the first play test which is arguably more preferable in this case than having game-halting leaks which go unsolved indefinitely.

A very firm way to avoid this problem is to rely on concepts like weak and phantom references and decide who actually manages the particles. If it’s the physics system, then the renderer should keep weak references to particles so that it doesn’t prolong their lifespan and can detect when they are destroyed and hopefully even run into a game-halting error if the renderer tries to access a particle that no longer exists.

In general GC doesn’t protect you against having to think about resource management and who owns a resource and having to manually free resources (assigning them as none/nil/null) to avoid logical leaks. Its primary usefulness in my opinion is in the context of areas like multithreading where you want to ensure that an object is not destroyed until a thread is finished processing the object.

The ideal solution to me if a language could ever provide it at the native language level (C++ is the closest I can think with shared_ptr, but it’s a library concept and can’t detect cycles since it uses basic ref counting) is one that could let you opt into garbage collection per object. For example, perhaps the object is normally destroyed when it either goes out of scope or is assigned a null, and everything that references it outside of its immediate scope effectively acts as a pointer to it, not preventing the object from being destroyed. However, something that wants to share ownership of that object, like a thread, could call a ref method to increase its reference count and deref to decrement it to prevent it from being destroyed until the thread is finished using it, or it could use some kind of shared reference concept that avoids the need for an explicit deref in the thread’s scope, with a “pay as you use it” cost.

Here’s an archived link to the article mentioned in the question.

I think the author is saying that in garbage-collected languages, memory leaks are introduced when there are unintended lingering references to objects, and that hash tables / dictionaries / maps / associative arrays (whatever the language calls them) are good hiding places for these references.

In many languages, it can be the case that a dictionary contains as values the last remaining references to objects which would otherwise be garbage-collected. Or, if the language allows it, lingering references could also be found among the keys of the structure.

For example, in JavaScript, you’d be able to use a Map that takes objects as keys, perhaps to store extra information about these objects:

let person_1 = { name: 'Dan' };
let person_2 = { name: 'Alex' };

let ages = new Map();

ages.set(person_1, 20);
ages.set(person_2, 30);

If all other references to person_1 were to disappear, ages would end up holding the last reference to the object and prevent it from being garbage-collected for as long as the Map is in use.

To support the use-case without causing memory leaks, the alternative structure is WeakMap, where keys are weakly referenced, meaning that whenever the WeakMap holds the last reference to an object, the object can be garbage-collected and the corresponding key-value pair from the WeakMap will be removed.

I think it’s a mere coincidence that caches (often implemented as hash tables) are susceptible to another form of memory hoarding — accumulating an ever-increasing amount of information, that may or may not be needed. This has a similar effect with (and may even be called) a memory leak, but not by preventing the GC from doing its job, which I think is what the author means to raise awareness about.

Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa Dịch vụ tổ chức sự kiện 5 sao Thông tin về chúng tôi Dịch vụ sinh nhật bé trai Dịch vụ sinh nhật bé gái Sự kiện trọn gói Các tiết mục giải trí Dịch vụ bổ trợ Tiệc cưới sang trọng Dịch vụ khai trương Tư vấn tổ chức sự kiện Hình ảnh sự kiện Cập nhật tin tức Liên hệ ngay Thuê chú hề chuyên nghiệp Tiệc tất niên cho công ty Trang trí tiệc cuối năm Tiệc tất niên độc đáo Sinh nhật bé Hải Đăng Sinh nhật đáng yêu bé Khánh Vân Sinh nhật sang trọng Bích Ngân Tiệc sinh nhật bé Thanh Trang Dịch vụ ông già Noel Xiếc thú vui nhộn Biểu diễn xiếc quay đĩa Dịch vụ tổ chức tiệc uy tín Khám phá dịch vụ của chúng tôi Tiệc sinh nhật cho bé trai Trang trí tiệc cho bé gái Gói sự kiện chuyên nghiệp Chương trình giải trí hấp dẫn Dịch vụ hỗ trợ sự kiện Trang trí tiệc cưới đẹp Khởi đầu thành công với khai trương Chuyên gia tư vấn sự kiện Xem ảnh các sự kiện đẹp Tin mới về sự kiện Kết nối với đội ngũ chuyên gia Chú hề vui nhộn cho tiệc sinh nhật Ý tưởng tiệc cuối năm Tất niên độc đáo Trang trí tiệc hiện đại Tổ chức sinh nhật cho Hải Đăng Sinh nhật độc quyền Khánh Vân Phong cách tiệc Bích Ngân Trang trí tiệc bé Thanh Trang Thuê dịch vụ ông già Noel chuyên nghiệp Xem xiếc khỉ đặc sắc Xiếc quay đĩa thú vị
Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa
Thiết kế website Thiết kế website Thiết kế website Cách kháng tài khoản quảng cáo Mua bán Fanpage Facebook Dịch vụ SEO Tổ chức sinh nhật