Count Lines of Code in a Directory using Python

This article shows how to count the lines in every file under a root directory, including the files in all of its subfolders. Here is the directory structure we will use to test our code.

Figure 1: The structure of the directory we will use in our examples (Source: Author).

To count the lines in each file, we will use two methods:

  1. Using <obj>.readlines() to read and count the lines of an open file,
  2. Using the pygount module.

Method A: Using the readlines() method on an open file

In this method, we need to do two things: first, iterate through all the folders under the root directory to find the files of interest, and second, open each file and count its lines. Let’s cover these steps separately and then put them together.

Step 1: Iteratively search all the folders for files of interest.

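In this step, the os.walk() function comes in handy. Given a directory, os.walk() visits that directory and every directory below it, and yields one tuple per directory visited with the following elements:

  • root – the path of the directory currently being visited (the first one is the directory passed to os.walk()),
  • subdirs – the names of the directories directly inside root,
  • files – the names of the files directly inside root.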

Let’s see an example of how we can traverse the directory shown in Figure 1 at the beginning of the article.
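The original listing is not reproduced in this text, so the following is a minimal sketch of such a traversal. The helper name find_files and its extension parameter are assumptions made for illustration; os.walk() and os.path.join() come from the standard library.

```python
import os


def find_files(root, extension=".py"):
    """Return the paths of all files under root whose names end with extension."""
    matches = []
    # os.walk() yields one (dirpath, dirnames, filenames) tuple per directory visited.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extension):
                # Join the directory path and the file name to get a usable path.
                matches.append(os.path.join(dirpath, name))
    return matches


if __name__ == "__main__":
    # "." is the current directory; any other path works as well.
    for path in find_files("."):
        print(path)
```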

Output:

./count_pygount.py
./count_lines2.py
./count_lines.py
./test_folder/send_email.py
./test_folder/api/setup.py
./test_folder/api/run_api.py

In the example above, os.walk(<path>) reaches every file in every directory under <path>. We also filtered the output by file extension, so only Python (.py) files are printed.

Note that <path> does not have to be the current path. You can provide any path you want. Once we have the files we are interested in, we can move on to the next step: reading one of the files and counting the lines in it.

Step 2: Open a file and count the lines

As an example, let’s read the file ./test_folder/api/run_api.py from the listing above.
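A minimal sketch of this step, wrapped in a hypothetical helper count_lines() so that it can be reused. The explicit UTF-8 encoding is an assumption about the files being read.

```python
import os


def count_lines(path):
    """Return the total number of lines in the file at path, blank lines included."""
    # Open the file in reading mode and read every line into a list.
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()
    return len(lines)


if __name__ == "__main__":
    target = "./test_folder/api/run_api.py"  # the file used in the article's example
    if os.path.exists(target):
        print(count_lines(target))
```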

Output:

201

The code snippet above opens the file in reading mode, reads all the lines into a list using the readlines() method, and then finds the number of lines using the len() function. Note that f.readlines() reads all the lines, including the empty ones. If you want to skip the blank lines, replace the line that calls f.readlines() with one that filters them out:
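The replacement line is missing from this text. A likely form, offered as a sketch rather than the author's exact code, is a list comprehension that keeps every line except those equal to "\n". The helper name count_nonblank_lines is hypothetical.

```python
def count_nonblank_lines(path):
    """Return the number of lines in the file at path, skipping empty lines."""
    with open(path, "r", encoding="utf-8") as f:
        # Keep every line except those made up of a newline character only.
        # Lines holding only spaces or tabs are still counted;
        # filtering on `line.strip()` instead would drop those as well.
        lines = [line for line in f.readlines() if line != "\n"]
    return len(lines)
```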

This line iterates through the output of f.readlines() and drops the lines that consist of a newline character (“\n”) only, that is, the empty lines.

Putting it all Together

We can now put together the two concepts discussed in steps 1 and 2. Once again, remember that the idea is to count the lines in all the files of interest in all the folders inside the root directory. The following code can accomplish that:
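The listing below is a reconstruction under stated assumptions, not the author's count_lines2.py, so its own line count need not match the figures in the output. The function names are hypothetical, and errors="replace" is an added safeguard against files that are not valid UTF-8.

```python
import os


def count_nonblank_lines(path):
    """Return the number of non-empty lines in the file at path."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return len([line for line in f.readlines() if line != "\n"])


def count_lines_in_tree(root, extension=".py"):
    """Map each matching file under root to its non-empty line count."""
    counts = {}
    # Step 1: walk every directory under root and pick the files of interest.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extension):
                path = os.path.join(dirpath, name)
                # Step 2: open the file and count its non-empty lines.
                counts[path] = count_nonblank_lines(path)
    return counts


if __name__ == "__main__":
    counts = count_lines_in_tree(".")
    for path, n in counts.items():
        print(path, "------>", n)
    # Grand total over all matching files.
    print(sum(counts.values()))
```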

Output:

./count_pygount.py ------> 15
./count_lines2.py ------> 27
./count_lines.py ------> 46
./test_folder/send_email.py ------> 29
./test_folder/api/setup.py ------> 46
./test_folder/api/run_api.py ------> 178
341

In total, the 6 Python files in the root directory and its subfolders contain 341 lines, excluding the empty lines. The code above is saved as ./count_lines2.py, and the output reports 27 non-empty lines for this file, which is correct.

Note: You can use the code above to count the number of lines for other files as well (based on extension); it doesn’t have to be only Python files.

Method B: Using the pygount module

pygount is a command-line tool that scans folders for source code files and counts the number of lines in them. If you don’t have pygount installed on your system, you can install it with pip or conda by running “pip install pygount” or “conda install pygount”, respectively.

Note: The commands after the dollar sign ($) must be executed in the terminal (command line), not directly in a Python script. We will discuss how to run the commands from a Python script later in the article.

To get a list of line counts for all the files in a given folder (for example, test_folder in our case), run:
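The command itself is not reproduced in this text. With pygount installed, passing the folder as the only argument (pygount test_folder) is the basic invocation. As a sketch of how the same call could be made from a Python script, assuming the pygount executable is on the PATH, the standard subprocess module can be used. The helper names below are hypothetical.

```python
import shutil
import subprocess


def pygount_command(target, *options):
    """Build the argument list for a pygount call on target."""
    return ["pygount", *options, target]


def run_pygount(target, *options):
    """Run pygount and return what it prints, or None if pygount is not installed."""
    if shutil.which("pygount") is None:
        return None
    result = subprocess.run(
        pygount_command(target, *options),
        capture_output=True,  # collect the output instead of printing it directly
        text=True,            # decode the output to str
    )
    return result.stdout


if __name__ == "__main__":
    output = run_pygount("test_folder")
    print(output if output is not None else "pygount is not installed")
```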

Output:

182  Markdown     test_folder  test_folder/api/README.md
141  Python       test_folder  test_folder/api/run_api.py
43   Python       test_folder  test_folder/api/setup.py
185  Gosu         test_folder  test_folder/learn.gs
0    __unknown__  test_folder  test_folder/logfile.log
9    Text only    test_folder  test_folder/readme.txt
3    Text only    test_folder  test_folder/requirements.txt
30   Bash         test_folder  test_folder/run_wrf_ecmwf.sh
29   Python       test_folder  test_folder/send_email.py

The pygount module counts only the code lines, skipping the empty lines and the documentation lines (such as comments). That is why this method yields slightly different results from the first one. The output is detailed but hard to read as it stands, so we can pipe it through a pretty printer. For this, we can use Python’s json module as follows:
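A minimal sketch of the pretty-printing step, using json.loads() and json.dumps() from the standard library. The sample string is one record copied from the output that follows, written on a single line to stand in for unformatted JSON; a real run would supply the JSON text produced by pygount instead. The helper name pretty is hypothetical.

```python
import json

# One record from the article's output, flattened onto a single line.
raw = (
    '{"emptyCount": 24, "documentationCount": 36, "group": "test_folder", '
    '"isCountable": true, "language": "Python", '
    '"path": "test_folder/api/run_api.py", "state": "analyzed", '
    '"stateInfo": null, "sourceCount": 142}'
)


def pretty(json_text, indent=4):
    """Parse JSON text and return it re-serialized with one key per line."""
    return json.dumps(json.loads(json_text), indent=indent)


print(pretty(raw))
```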

Output (truncated):

…
{
  	"emptyCount": 24,
  	"documentationCount": 36,
  	"group": "test_folder",
  	"isCountable": true,
  	"language": "Python",
  	"path": "test_folder/api/run_api.py",
  	"state": "analyzed",
  	"stateInfo": null,
  	"sourceCount": 142
},
{
  	"emptyCount": 8,
  	"documentationCount": 0,
  	"group": "test_folder",
  	"isCountable": true,
  	"language": "Python",
  	"path": "test_folder/api/setup.py",
  	"state": "analyzed",
  	"stateInfo": null,
  	"sourceCount": 43
},
…

This output shows that the two methods agree in most cases. For example, the first method reports 178 non-empty lines for “test_folder/api/run_api.py”, and pygount reports 142 source lines + 36 documentation lines = 178.
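The comparison can be checked with a few lines of Python. The numbers below are copied from the outputs in this article (the non-empty line counts from Method A and the two JSON records above); the function name compare is hypothetical.

```python
# Non-empty line counts reported by the readlines() approach (Method A).
readlines_counts = {
    "test_folder/api/run_api.py": 178,
    "test_folder/api/setup.py": 46,
}

# sourceCount and documentationCount taken from the pygount JSON records.
pygount_records = [
    {"path": "test_folder/api/run_api.py", "sourceCount": 142, "documentationCount": 36},
    {"path": "test_folder/api/setup.py", "sourceCount": 43, "documentationCount": 0},
]


def compare(records, reference):
    """Return (path, pygount total, reference count, match?) for each record."""
    rows = []
    for record in records:
        total = record["sourceCount"] + record["documentationCount"]
        expected = reference[record["path"]]
        rows.append((record["path"], total, expected, total == expected))
    return rows


for row in compare(pygount_records, readlines_counts):
    print(row)
```

For run_api.py the two totals match at 178. For setup.py, pygount's 43 is lower than the 46 counted by the first method, which is an instance of the cases where the two methods differ.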

To limit the analysis to specific file types, use the suffix option (--suffix) as follows. The suffixes are separated by commas, with no white space between them:
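When pygount is called from a script, the suffix value can be built programmatically so that no white space slips in. This is a sketch only: the helper suffix_option is hypothetical, and only the --suffix option name belongs to pygount. For instance, suffix_option(["py", "txt"]) gives "--suffix=py,txt", which would be passed to pygount together with the folder to scan.

```python
def suffix_option(extensions):
    """Build a --suffix option from extensions such as ["py", ".txt"]."""
    # Strip surrounding white space and any leading dot, then join with bare commas.
    cleaned = [ext.strip().lstrip(".") for ext in extensions]
    return "--suffix=" + ",".join(cleaned)
```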

Output:

40   Python     .  ./count_lines.py
16   Python     .  ./count_lines2.py
12   Python     .  ./count_pygount.py
1    Text only  .  ./ref.txt
141  Python     .  ./test_folder/api/run_api.py
43   Python     .  ./test_folder/api/setup.py
9    Text only  .  ./test_folder/readme.txt
3    Text only  .  ./test_folder/requirements.txt
29   Python     .  ./test_folder/send_email.py

If you need a summarized result, use the format option with the value “summary” (--format=summary).

Output:

Reference: You can read more about pygount in its documentation.

Conclusion

This article covered two methods for counting the lines in the files of a directory. The two methods give similar results, but note the difference: the readlines() method counts every line in a file (or every non-empty line, if the blank lines are filtered out), whereas pygount counts the source code lines and skips the empty lines and the documentation lines.