This article discusses how to count lines in all the files in all the folders inside the root directory of our interest. Here is a look at the directory structure we will use to test our code.
To count the number of lines of any file, we will use two methods:
- Calling <obj>.readlines() on an open file and counting the lines it returns,
- Using the pygount module.
Method A: Using the readlines() function on an open file
In this method, we need to do two things – first, iterate through all the folders inside the directories to look for the files of interest, and second, open the files and count the lines. Let’s cover these steps separately and then, after that, put it all together.
Step 1: Iteratively search all the folders for files of interest.
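In this step, the os.walk() function comes in handy. Given a directory, os.walk() visits that directory and every directory below it, and yields one tuple per directory visited, with the following elements:
- root – the path of the directory currently being visited (the first one is the directory passed to os.walk()),
- subdirs – the names of the directories directly inside root,
- files – the names of the files directly inside root (files in subdirectories show up when those subdirectories are visited in turn).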
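To make the three elements concrete, here is a minimal sketch that builds a small throwaway tree in a temporary directory and records what os.walk() yields for each directory it visits. The helper name walk_listing and the sample names (pkg, a.py, b.py) are made up for illustration and are not part of the test directory used in this article.

```python
import os
import tempfile


def walk_listing(top):
    """Return {relative directory: (sorted subdirs, sorted files)} for each directory under top."""
    listing = {}
    for root, subdirs, files in os.walk(top):
        # root is the directory currently being visited, not always `top`
        rel = os.path.relpath(root, top)
        listing[rel] = (sorted(subdirs), sorted(files))
    return listing


# Build a hypothetical tree: <tmp>/a.py and <tmp>/pkg/b.py
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "pkg"))
    for rel_path in ("a.py", os.path.join("pkg", "b.py")):
        with open(os.path.join(tmp, rel_path), "w", encoding="utf-8") as f:
            f.write("print('hello')\n")
    demo_listing = walk_listing(tmp)

for directory, (subdirs, files) in sorted(demo_listing.items()):
    print(directory, subdirs, files)
```

In this sketch, the tuple for the top directory lists only a.py under files, while b.py appears in the tuple yielded for pkg.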
Let’s see an example of how we can traverse the directory shown in Figure 1 at the beginning of the article.
```python
import os

# loop through all subfolders and files under the current working directory
for root, subdirs, files in os.walk("."):
    # loop through the files in the directory being visited
    for filename in files:
        # if the file does not end with .py, skip it and
        # move on to check the next file
        if not filename.endswith(".py"):
            continue
        # path to the file, relative to the starting directory
        file = os.path.join(root, filename)
        print(file)
```
Output:
```
./count_pygount.py
./count_lines2.py
./count_lines.py
./test_folder/send_email.py
./test_folder/api/setup.py
./test_folder/api/run_api.py
```
In the example above, os.walk(<path>) visits every directory under <path>, so the loop sees all the files in all the folders. We also filter the output by file extension, so only Python (.py) files are printed.
Note that <path> does not have to be the current directory; you can provide any path you want. Once we have the files we are interested in, we can move on to the next step: reading a file and counting the lines in it.
Step 2: Open a file and count the lines
For example, let’s read one of the files found above, ./test_folder/api/run_api.py.

```python
with open(file="./test_folder/api/run_api.py", mode="r") as f:
    all_lines = f.readlines()
    print(len(all_lines))
```
Output:
201
The code snippet above opens the file in reading mode, reads all the lines to a list using the readlines() function, and then finds the number of all the lines using the len() function. Note that f.readlines() reads all the lines, including the empty lines. If you want to skip all blank lines, replace the second line with
```python
all_lines = [i for i in f.readlines() if i.strip()]
```
This line iterates through the output of f.readlines() and keeps only the lines for which i.strip() is non-empty. That drops lines containing just a newline (“\n”) character as well as lines containing only whitespace.
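For large files, building the full list with readlines() is not strictly needed just to get a count. As an alternative sketch (not the approach used in the rest of this article), the helper below iterates over the file object line by line and counts as it goes. The function name count_lines and the sample file are assumptions made for this illustration.

```python
import os
import tempfile


def count_lines(path, skip_blank=False, encoding="utf-8"):
    """Count the lines of a text file without loading them all into a list."""
    with open(path, "r", encoding=encoding) as f:
        if skip_blank:
            # line.strip() is empty for blank and whitespace-only lines
            return sum(1 for line in f if line.strip())
        return sum(1 for _ in f)


# Demo on a throwaway file with three lines, one of them blank
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.py")
    with open(sample, "w", encoding="utf-8") as f:
        f.write("import os\n\nprint(os.getcwd())\n")
    total = count_lines(sample)
    non_blank = count_lines(sample, skip_blank=True)

print(total, non_blank)
```

The result should match len(f.readlines()) for the same file; the difference is only in memory use.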
Putting it all Together
We can now put together the two concepts discussed in steps 1 and 2. Once again, remember that the idea is to count the lines in all the files of interest in all the folders inside the root directory. The following code can accomplish that:

```python
import os

def countlines(directory="./", lines=0, ext=".py", skip_blank=False):
    # lines is the running total; it starts at 0 by default
    # loop through all subfolders and files in the directory
    for root, dirs, files in os.walk(directory):
        # loop through the files
        for filename in files:
            # if the file does not end with ext, skip it and
            # move on to check the next file
            if not filename.endswith(ext):
                continue
            # relative path to the file
            file = os.path.join(root, filename)
            # open the file in read mode (r)
            with open(file, "r", encoding="utf-8") as f:
                if skip_blank:
                    # skip blank lines; i.strip() is truthy for non-blank lines
                    new_lines = len([i for i in f.readlines() if i.strip()])
                else:
                    # count all the lines, including blank ones
                    new_lines = len(f.readlines())
            # add the lines found in the current file to the total (lines)
            lines = lines + new_lines
            print(file, "------>", new_lines)
    return lines

# call the function
print(countlines(directory="./", ext=".py", skip_blank=True))
```
Output:
```
./count_pygount.py ------> 15
./count_lines2.py ------> 27
./count_lines.py ------> 46
./test_folder/send_email.py ------> 29
./test_folder/api/setup.py ------> 46
./test_folder/api/run_api.py ------> 178
341
```
There are 341 lines of code in all 6 Python files in the root directory and subfolders, excluding the empty lines. The code above is contained in the ./count_lines2.py file. The code says that this file has 27 non-empty lines (and that is true).
Note: You can use the code above to count the number of lines for other files as well (based on extension); it doesn’t have to be only Python files.
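One way to handle several extensions in a single pass relies on the fact that str.endswith() also accepts a tuple of suffixes. The sketch below is a variant of the function above, not a replacement for it; the name count_lines_by_ext, the errors="replace" choice, and the sample file names are assumptions made for illustration.

```python
import os
import tempfile


def count_lines_by_ext(directory, exts=(".py",), skip_blank=False):
    """Return {file path: line count} for files whose names end with any suffix in exts."""
    counts = {}
    for root, _dirs, files in os.walk(directory):
        for filename in files:
            # str.endswith() accepts a tuple and matches any of its members
            if not filename.endswith(tuple(exts)):
                continue
            path = os.path.join(root, filename)
            # errors="replace" is an assumption: it avoids a crash on bytes
            # that are not valid UTF-8 instead of raising UnicodeDecodeError
            with open(path, "r", encoding="utf-8", errors="replace") as f:
                if skip_blank:
                    counts[path] = sum(1 for line in f if line.strip())
                else:
                    counts[path] = sum(1 for _ in f)
    return counts


# Demo on hypothetical files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    samples = {"app.py": "x = 1\n\ny = 2\n", "notes.txt": "one\ntwo\n", "data.csv": "a,b\n"}
    for name, text in samples.items():
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            f.write(text)
    result = count_lines_by_ext(tmp, exts=(".py", ".txt"), skip_blank=True)
    by_name = {os.path.basename(p): n for p, n in result.items()}

print(by_name)
```

Returning a dictionary instead of printing inside the loop makes it easier to sort or sum the counts afterwards, for example with sum(result.values()).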
Method B: Using the pygount module
pygount is a command line tool that scans folders for source code files and counts the number of lines in them. If you don’t have pygount installed on your system, you can install it using pip or conda by running “pip install pygount” or “conda install pygount”, respectively.
Note: The commands after the dollar ($) must be executed on the terminal (command line) and not directly in a Python script. We will discuss how to run the commands directly on a Python script later in the article.
To get a list of line counts for all the files in a given folder (for example, test_folder in our case), run:
```
$ pygount test_folder/
```
Output:
```
182  Markdown     test_folder  test_folder/api/README.md
141  Python       test_folder  test_folder/api/run_api.py
43   Python       test_folder  test_folder/api/setup.py
185  Gosu         test_folder  test_folder/learn.gs
0    __unknown__  test_folder  test_folder/logfile.log
9    Text only    test_folder  test_folder/readme.txt
3    Text only    test_folder  test_folder/requirements.txt
30   Bash         test_folder  test_folder/run_wrf_ecmwf.sh
29   Python       test_folder  test_folder/send_email.py
```
pygount counts only the code lines, skipping empty lines and documentation lines (such as comments). That is why its counts differ slightly from those of Method A. The default output is detailed but hard to scan, so we can ask for JSON output and pipe it through a pretty printer. For this case, we can use the Python json module as follows:

```
$ pygount test_folder/ --format json | python -m json.tool
```
Output (truncated):
```
…
{
    "emptyCount": 24,
    "documentationCount": 36,
    "group": "test_folder",
    "isCountable": true,
    "language": "Python",
    "path": "test_folder/api/run_api.py",
    "state": "analyzed",
    "stateInfo": null,
    "sourceCount": 142
},
{
    "emptyCount": 8,
    "documentationCount": 0,
    "group": "test_folder",
    "isCountable": true,
    "language": "Python",
    "path": "test_folder/api/setup.py",
    "state": "analyzed",
    "stateInfo": null,
    "sourceCount": 43
},
…
```
This output shows that both methods yield the same result in most cases. For example, Method A reported 178 non-empty lines for the “test_folder/api/run_api.py” file, and pygount reports 142 source lines + 36 documentation lines = 178.
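The comparison can be sketched as a small calculation. The two records below are copied from the truncated JSON output shown above (only the count fields and the path are kept); the helper name non_empty_lines is an assumption made for this illustration.

```python
# Records copied from the truncated pygount JSON output shown above
records = [
    {"path": "test_folder/api/run_api.py", "emptyCount": 24,
     "documentationCount": 36, "sourceCount": 142},
    {"path": "test_folder/api/setup.py", "emptyCount": 8,
     "documentationCount": 0, "sourceCount": 43},
]


def non_empty_lines(record):
    """Non-empty lines as pygount sees them: source lines plus documentation lines."""
    return record["sourceCount"] + record["documentationCount"]


non_empty = {r["path"]: non_empty_lines(r) for r in records}
for path, count in non_empty.items():
    print(path, count)
```

For run_api.py the sum matches the 178 reported by Method A. For setup.py the sum is 43, while Method A reported 46, which appears to be one of the cases where the two methods classify lines differently.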
To limit the analysis to files with specific extensions, use the --suffix option with a comma-separated list (there should be no white space between the suffixes):

```
$ pygount --suffix=py,txt .
```
Output:
```
40   Python     .  ./count_lines.py
16   Python     .  ./count_lines2.py
12   Python     .  ./count_pygount.py
1    Text only  .  ./ref.txt
141  Python     .  ./test_folder/api/run_api.py
43   Python     .  ./test_folder/api/setup.py
9    Text only  .  ./test_folder/readme.txt
3    Text only  .  ./test_folder/requirements.txt
29   Python     .  ./test_folder/send_email.py
```

If you need a summarized result, use the --format option with the value “summary”:

```
$ pygount --suffix=py,txt --format=summary .
```
Output:
Reference: You can read more about pygount in its documentation.
Conclusion
This article covered two methods for counting lines in the files of a directory tree. The two methods serve the same purpose, but note the difference: the readlines() method counts every line in a file (optionally skipping blank ones), whereas pygount counts only source code lines and reports empty and documentation lines separately.