Python iterators can be very memory efficient. Should I always prefer to use generators instead of just lists? In what situations should I prefer a plain array?
For example instead of this:
emails = [user.email for user in users]
should I prefer this?:
emails = (user.email for user in users)
Note: I do mean ‘generators’ not ‘iterators’.
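The main drawback of generators is that they can only be traversed once, front to back. There’s no going back to a previous value, no indexing, and no len(). You also can’t share one between several consumers, because each value is handed out only once. There are many cases where that can easily be accounted for, or even where it is preferable, but there are also many cases where it isn’t. Sorting, for example, needs every value in memory before it can produce its first result.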
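A minimal sketch of the one-pass behaviour described above. The numbers are made up purely for illustration; the point is only that a second traversal finds nothing, and that sorting has to read everything first.

```python
squares = (n * n for n in [3, 1, 2])

first_pass = list(squares)   # consumes the generator
second_pass = list(squares)  # the generator is exhausted, so this is empty

# sorted() must read every value before it can return anything,
# so it builds a full list no matter what kind of iterable it is given.
ordered = sorted(n * n for n in [3, 1, 2])

print(first_pass, second_pass, ordered)
```

If you need to go over the values more than once, store them in a list first.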
That’s why a lot of times you’ll see generators used for the initial processing stages of a large amount of data, then once it’s filtered into a subset and mapped into a nice format, it gets put into a long-lasting concrete data structure for further use. That way you don’t pay the memory allocation, cache miss, and garbage collection costs of large intermediate arrays you’ll immediately throw away.
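Here is a rough sketch of that pattern: lazy stages up front, one concrete list at the end. The parse_lines helper, the "name,email" line format, and the sample data are all assumptions invented for this example.

```python
def parse_lines(lines):
    """Lazily turn 'name,email' lines into (name, email) tuples."""
    for line in lines:
        name, _, email = line.strip().partition(",")
        yield name, email


# Hypothetical input; in practice this could be an open file object.
raw_lines = ["ann,Ann@example.com\n", "bob,\n", "cy,cy@example.com\n"]

records = parse_lines(raw_lines)              # lazy: nothing parsed yet
with_email = (r for r in records if r[1])     # lazy: drops empty emails

# Only the final, filtered result is stored in a concrete list.
emails = [email.lower() for _, email in with_email]

print(emails)
```

No intermediate list of all parsed records is ever built; each line flows through both stages before the next one is read.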
You’ve got iterators confused with generators.
Your first example is a list comprehension, while the second is a generator expression. The key difference is that the generator expression produces each member of the collection lazily (as needed) rather than eagerly (all at once, whether needed or not). You can also define your own generators by writing a function that uses yield rather than return.
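To make the eager/lazy difference concrete, here is a small sketch using yield. The User namedtuple and the iter_emails name are stand-ins invented for this example, since the question doesn’t show what users actually contains.

```python
from collections import namedtuple

# Hypothetical stand-in for whatever objects `users` really holds.
User = namedtuple("User", ["email"])
users = [User("a@example.com"), User("b@example.com")]


def iter_emails(users):
    """Generator function: produces one email each time a value is requested."""
    for user in users:
        yield user.email


eager = [user.email for user in users]  # built in full, right now
lazy = iter_emails(users)               # nothing has run yet

first = next(lazy)       # runs the body up to the first yield
remaining = list(lazy)   # pulls whatever is left

print(eager, first, remaining)
```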
As far as usage goes, you want a list when you need the collection (list, dictionary, or whatever) for something in and of itself: indexing it, taking its length, or going over it more than once. You’d use a generator when the collection is incidental to the final product. For example, you might use a generator to pick out the numbers in some other list that satisfy some criteria. You don’t care about the original list, only the members which fit the criteria. Thus, you use a generator to get only those numbers.
A more concrete example would be finding the angles of triangles whose sides are a given length. We don’t want all the poor, deformed failed triangles, only the strong, healthy successful triangles. Thus we’d use a generator.
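One way the triangle example might look, as a sketch only. The triangle_angles name, the use of the law of cosines, and the sample side lengths are my assumptions, and the code assumes all sides are positive numbers.

```python
import math


def triangle_angles(side_triples):
    """Yield the angles (in degrees) of each triple that forms a real triangle."""
    for a, b, c in side_triples:
        # Triangle inequality: skip triples that cannot close into a triangle.
        if a + b <= c or a + c <= b or b + c <= a:
            continue
        # Law of cosines gives the angles opposite sides a and b.
        alpha = math.degrees(math.acos((b * b + c * c - a * a) / (2 * b * c)))
        beta = math.degrees(math.acos((a * a + c * c - b * b) / (2 * a * c)))
        yield alpha, beta, 180.0 - alpha - beta


candidates = [(3, 4, 5), (1, 1, 10), (2, 2, 2)]  # made-up sample data
results = list(triangle_angles(candidates))

print(results)
```

The failed triple (1, 1, 10) never produces a value, so nothing downstream has to know it existed.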
This Stack Overflow post goes into more detail.