Python – Generator

In Python, generators are a way to create an iterator. If you are unfamiliar with iterators, for the purposes of this article an iterator is simply an object you can loop over in a for loop. Generators are a quick and concise way of creating simple iterators, and in this article I will walk you through a practical example of creating one.


Let's say you have a log file that is over 30 GB in size and you need to open it and parse it. You can't read the whole file into memory, as most computers have nowhere near 30 GB of RAM, so you need a way to read the file in smaller chunks and process each chunk as you go. This is a perfect candidate for a generator: while there are other ways to open a file and read it a chunk at a time in a for loop, if we write this as a generator we only need to implement the functionality once and can import it for use in various parts of our program.


Yield Statement

The yield statement is the heart of a generator; in essence, a generator is simply a function with a loop and a yield statement in place of a return statement. It is important to understand the difference: while return stops the execution of the function entirely and hands back a value, yield pauses the execution of the function, hands back a value, and resumes from that same point the next time a value is requested.
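As a minimal illustration of that pause-and-resume behavior (the function name count_up_to is my own, not from the article):

```python
def count_up_to(limit):
    """Yield the integers 1 through limit, one at a time."""
    n = 1
    while n <= limit:
        yield n  # pause here, hand back n, and resume on the next request
        n += 1

# Each call to next() resumes the function right after the yield.
counter = count_up_to(3)
print(next(counter))   # 1
print(next(counter))   # 2
print(list(counter))   # [3] - the remaining values
```

Notice that the function body does not run at all until the first next() call; that lazy, on-demand execution is what makes generators useful for large data.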


Reading a File Using a Function

First, I will write a standard function that reads a file and returns its contents; the code is below.
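A sketch of such a function, reconstructed from the walkthrough that follows (the name read_file and the 1024-byte chunk size are assumptions drawn from that description):

```python
def read_file(path):
    """Read an entire file into memory, 1024 bytes at a time."""
    data = ""  # will hold the full contents of the file
    with open(path, "r") as f:
        while True:
            chunk = f.read(1024)  # read up to the next 1024 characters
            if not chunk:
                # no data returned: we have reached the end of the file
                break
            data += chunk
    return data
```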

Walking through this function, we first create an empty string in which we will later store the file's contents.

We then open the file in read mode and drop into a while True loop. On each iteration we read 1024 bytes from the file and store them in a temporary variable. If we get no data, we are done reading the file, so we break out of the loop and return the entire contents; otherwise we append the chunk to the 'data' variable and continue reading.


Reading a File with a Generator

After reading that breakdown of the code above, you may be thinking to yourself, "This will not work for the scenario outlined above, because we can't read the entire file like this," and you would be perfectly correct: it will not work, due to the memory limitation mentioned earlier. To solve that problem, we can easily convert this function into a generator. Let's take a look at the code below.
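A sketch of the converted function (the name read_file_chunks is my own, and the 1024-byte chunk size matches the earlier example):

```python
def read_file_chunks(path):
    """Yield a file's contents in 1024-byte chunks instead of all at once."""
    with open(path, "r") as f:
        while True:
            chunk = f.read(1024)
            if not chunk:
                break  # end of file: the generator is exhausted
            # Hand back this chunk and pause until the next one is requested.
            yield chunk
```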

Looking at this code, we can see that we removed the old 'data' variable that stored the entire file.

The majority of the function is the same, but now, after we read a chunk of the file and confirm that we actually have data, we use the yield statement to return that chunk and pause execution of the function.

Let's write some additional code that uses the generator.

This assumes we have a large file named file.txt.
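A sketch of that driver code, with the generator repeated here so the example is self-contained (the example also creates a small file.txt so it can actually run; in the article's scenario the file would already exist and be much larger):

```python
def read_file_chunks(path):
    """Yield a file's contents in 1024-byte chunks."""
    with open(path, "r") as f:
        while True:
            chunk = f.read(1024)
            if not chunk:
                break
            yield chunk

# Create a small stand-in for the large file (assumption for the demo).
with open("file.txt", "w") as f:
    f.write("hello world\n" * 200)

# Step through the file one chunk at a time; only one
# 1024-byte chunk is ever held in memory.
total = 0
for chunk in read_file_chunks("file.txt"):
    total += len(chunk)

print(f"Read {total} characters in 1024-byte chunks")
```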

When we run that code, nothing will appear different from the user's perspective, but if you pull up your system's memory stats (top on Linux, Task Manager on Windows) you will see that we consume only a small amount of memory as we step through the large file, versus attempting to read the entire file into a variable as in the first example.


In Conclusion

This is a practical example of how using a generator in Python can achieve things that would otherwise take extra code to implement. If this helped, please share this post, and if you would like to read more posts about Python, check out my Python Posts Collection.
