Python generators are great!


Python is an amazing language that can be forgiving for beginners and still offer more and more advanced features as we dig deeper. Sometimes you hear or read about some of its powerful tools and feel that you have nothing to use them on. But one day that time comes, and you realise that they can make things easier, more readable or more performant. Generators are one of those nice little things.

I have used generators before. Every Python programmer uses them, even without knowing it, because generators and other lazy iterators are behind some of the most useful tools in the language and its Standard Library: os.walk() is a generator, and range() and the file objects returned by open() behave in a similarly lazy way, producing values only as they are requested. Generators are nice because they let you access and process data in small pieces, as needed, which saves memory and processing time. This approach pays off when you don’t need the whole data set to be loaded and processed in one single batch.
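To make the "small pieces, as needed" idea concrete, here is a minimal sketch. The squares() function is a made-up example, not something from the Standard Library; it only shows that a generator computes a value when asked for one and pauses in between.

```python
def squares(limit):
    """Yield the squares of 0, 1, ..., limit - 1, one at a time."""
    for n in range(limit):
        # Execution pauses at yield until the caller asks for the next value.
        yield n * n


gen = squares(1_000_000)  # nothing has been computed yet
first = next(gen)         # computes only the first square
second = next(gen)        # resumes right after the previous yield

# Stop as soon as we have what we need; the remaining squares are never computed.
first_big = next(sq for sq in squares(1_000_000) if sq > 50)

print(first, second, first_big)
```

Building the same million squares as a list would allocate all of them up front, even if only the first few were ever used.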

A few days ago, I finally wrote my first generator. I knew about them, had read a couple of books that mentioned them and seen videos that praised their usefulness, and yet I had never written one. Looking back, I certainly could and should have used generators in some of my previous code. But everything has its time. This week I was destined to make my first generator, and it felt very good!

So… remember that Count Files project that I mentioned some time ago? I have been working on it with Nataliia, and we have implemented a new file search feature that lists all files with a specific extension in their names (for instance, searching for all *.py files in a project directory, or on your whole computer). It shows a list of all the file paths it finds and counts them for you:

$ countfiles -fe py

Recursively searching all files with extension .py, ignoring hidden files and directories, in /Users/npk/python-projects/Count-files

/Users/npk/python-projects/Count-files/setup.py (1.5 KiB)
/Users/npk/python-projects/Count-files/countfiles/__init__.py (0.0 B)
/Users/npk/python-projects/Count-files/countfiles/__main__.py (7.9 KiB)
/Users/npk/python-projects/Count-files/countfiles/settings.py (891.0 B)
/Users/npk/python-projects/Count-files/countfiles/utils/__init__.py (0.0 B)
/Users/npk/python-projects/Count-files/countfiles/utils/decorators.py (785.0 B)
/Users/npk/python-projects/Count-files/countfiles/utils/file_handlers.py (6.9 KiB)
/Users/npk/python-projects/Count-files/countfiles/utils/file_preview.py (2.7 KiB)
/Users/npk/python-projects/Count-files/countfiles/utils/word_counter.py (4.3 KiB)
/Users/npk/python-projects/Count-files/tests/test_argument_parser.py (3.2 KiB)
/Users/npk/python-projects/Count-files/tests/test_some_functions.py (6.2 KiB)
/Users/npk/python-projects/Count-files/tests/test_word_counter.py (3.2 KiB)
/Users/npk/python-projects/Count-files/tests/data_for_tests/py_file_for_tests.py (50.0 B)
/Users/npk/python-projects/Count-files/tests/data_for_tests/django_staticfiles_for_test/py_file.py (50.0 B)
/Users/npk/python-projects/Count-files/tests/hidden_py/hidden_test.py (0.0 B)
/Users/npk/python-projects/Count-files/tests/hidden_py/test_file.py (0.0 B)
/Users/npk/python-projects/Count-files/tests/hidden_py/hidden/hidden_test_sub.py (0.0 B)
/Users/npk/python-projects/Count-files/tests/hidden_py/hidden/test_file_sub.py (0.0 B)
/Users/npk/python-projects/Count-files/tests/test_hidden_windows/folder_hidden_for_win/hidden_for_win.py (0.0 B)
/Users/npk/python-projects/Count-files/tests/test_hidden_windows/folder_hidden_for_win/not_hidden.py (0.0 B)

   Found 20 file(s).
   Total combined size: 37.7 KiB.
   Average file size: 1.9 KiB (max: 7.9 KiB, min: 0.0 B).

With a small number of files, it was pretty fast, but when trying to get results for large directories or the whole filesystem, the user would get no feedback for several seconds (or minutes, or even hours). And the memory needed to store that list of paths would keep growing during that time, which could eventually result in unwanted performance or stability issues.

Having completed that first version, and knowing by then the kind of results we wanted, it just made sense to use a generator there. It was easy: I replaced the return statements with yield statements and made a few other small adjustments, and it was in fact pretty straightforward. Now that feature uses less than one third of the RAM and takes a few seconds less to complete its operation. But the best part is that it provides immediate and constant feedback to the user, by printing each matching file path to the screen as soon as it is found, instead of waiting for the whole list to be built in memory.
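The return-to-yield change can be sketched roughly like this. This is not the actual Count Files code: search_files_list() and search_files() are hypothetical names, and the sketch skips details such as ignoring hidden files or computing sizes. It only illustrates the difference between building a list and yielding paths one by one.

```python
import os


def search_files_list(root, extension):
    """Eager version: collect every matching path, then return the list."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith("." + extension):
                matches.append(os.path.join(dirpath, name))
    return matches


def search_files(root, extension):
    """Lazy version: yield each matching path as soon as it is found."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith("." + extension):
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    # Each path is printed while the search is still running,
    # and only a running count is kept in memory.
    count = 0
    for path in search_files(".", "py"):
        print(path)
        count += 1
    print(f"Found {count} file(s).")
```

With the lazy version, the caller decides what to keep: here only a counter survives the loop, so memory use stays flat no matter how many files match.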

It’s one of those simple changes in the code that end up making a big difference for the user.