Glob Pattern Exclude Directory in Python

The Python glob module comes in handy when searching the filesystem for files or directories that match a given pattern. If you need to find files with a specific extension, prefix, or other fixed string, use glob instead of writing longer code to scan the filesystem manually.

We will start by covering the basics of glob. To see how directories are excluded, you can go directly to the section “Excluding Directories in the Search”.

The general syntax of glob is given by

glob.glob(<path>, *, recursive=False)

Here, <path> is the path pattern we wish to match, and the recursive parameter controls how “**” is interpreted. If recursive is True, the pattern “**” matches any files and zero or more directories, subdirectories, and symbolic links to directories. If it is False (the default), “**” behaves like a single “*” (sounds confusing? We will work through an example shortly).
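To make the effect of the recursive parameter concrete, here is a minimal sketch that runs the same pattern with and without it. The temporary directory and the file names a.txt and b.txt are made up for illustration and are not part of the article's example tree.

```python
import os
import tempfile
from glob import glob, escape

# Build a small throwaway tree: root/a.txt and root/sub/deep/b.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub", "deep"))
for rel in ("a.txt", os.path.join("sub", "deep", "b.txt")):
    open(os.path.join(root, rel), "w").close()

# escape() protects any special characters in the temporary path itself
pattern = os.path.join(escape(root), "**", "*.txt")

# Without recursive=True, "**" behaves like "*": exactly one directory level
flat = glob(pattern)
# With recursive=True, "**" spans zero or more directory levels
deep = glob(pattern, recursive=True)

print(len(flat), len(deep))
```

The non-recursive call finds nothing here because no .txt file sits exactly one folder below the root, while the recursive call finds both files.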

Note: Recursive globs using “**” are supported only in Python 3.5 and later. Python 3.10 added more parameters, including “root_dir”, which sets the root directory for the search.

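We will use the following directory structure in our examples.

root_folder/
    address1/
        address2_map.jpg
        alice.json
        smith.txt
    control/
        control2/
            dummy2.json
            dummy2.txt
        dummy.jpg
        dummy.json
        dummy.txt
    test_folder1/
        cutlery.txt
        groceries.json
    test_folder2/
        code_1.69.2-1658162013_amd64.deb
        pc_specs.html
    glob_script3.py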

Here are the wildcards that can be used to match patterns in glob. It uses fnmatch.fnmatch() under the hood for matching; you can read more at https://docs.python.org/3/library/fnmatch.html#fnmatch.fnmatch

Pattern   Meaning
*         Matches everything
?         Matches any single character
[seq]     Matches any character in seq
[!seq]    Matches any character not in seq
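The four wildcards can be tried out directly with fnmatch, without touching the filesystem. This is a small sketch; the list of names is made up, and fnmatchcase is used so that matching is case-sensitive on every platform.

```python
from fnmatch import fnmatchcase

names = ["dummy.txt", "dummy2.txt", "alice.json", "control"]

# "*" matches any run of characters (including none)
star = [n for n in names if fnmatchcase(n, "*.txt")]
# "?" matches exactly one character
single = [n for n in names if fnmatchcase(n, "dummy?.txt")]
# "[ac]" matches one character that is either "a" or "c"
in_seq = [n for n in names if fnmatchcase(n, "[ac]*")]
# "[!ad]" matches one character that is neither "a" nor "d"
not_seq = [n for n in names if fnmatchcase(n, "[!ad]*")]

print(star)     # ['dummy.txt', 'dummy2.txt']
print(single)   # ['dummy2.txt']
print(in_seq)   # ['alice.json', 'control']
print(not_seq)  # ['control']
```

One difference to keep in mind: unlike fnmatch, glob does not match names that start with a dot unless the pattern spells out the dot.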

Getting started with glob – Examples

Example 1: Scan the directory for files and folders

from glob import glob
files = glob("./*")
print(files)

Output:

['./test_folder2', './test_folder1', './address1', './control', './glob_script3.py']

As shown in the output, glob picked the files and folders in the current working directory (.) – of course, we could have given a full path name.

Example 2: Scan directories, subdirectories, and files

This is where we need to use recursive globs to scan through all the directories, subdirectories, and files.

from glob import glob
files = glob("./**/*", recursive=True)
print(files)

Output:

['./test_folder2', './test_folder1', './address1', './control', './glob_script3.py', './test_folder2/pc_specs.html', './test_folder2/code_1.69.2-1658162013_amd64.deb', './test_folder1/cutlery.txt', './test_folder1/groceries.json', './address1/address2_map.jpg', './address1/alice.json', './address1/smith.txt', './control/control2', './control/dummy.jpg', './control/dummy.txt', './control/dummy.json', './control/control2/dummy2.txt', './control/control2/dummy2.json']

Example 3: Scan directories for files with a specific extension

The following code snippet will capture JSON files only.

from glob import glob
# match names that end with the literal extension .json
files = glob('./**/*.json', recursive=True)
print(files)

Output:

['./test_folder1/groceries.json', './address1/alice.json', './control/dummy.json', './control/control2/dummy2.json']
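When matching an extension, it helps to remember that square brackets denote a set of single characters, not a literal string. The sketch below compares the two pattern styles with fnmatch; the names in the list are hypothetical.

```python
from fnmatch import fnmatchcase

names = ["alice.json", "person", "notes.txt"]

# Literal extension: the name must end with the five characters ".json"
literal = [n for n in names if fnmatchcase(n, "*.json")]
# Bracket expression: the name must end with ONE of ".", "j", "s", "o", "n"
bracket = [n for n in names if fnmatchcase(n, "*[.json]")]

print(literal)  # ['alice.json']
print(bracket)  # ['alice.json', 'person']
```

The bracket form also picks up “person” because it ends in “n”, so the literal form is the safer choice for extensions.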

Example 4: Scan directories for files with specific extensions excluded

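Glob has no pattern that negates a whole extension. A bracket expression such as “[!.json]” uses the “!” character to exclude characters, not extensions: it matches any single character other than “.”, “j”, “s”, “o”, or “n”, so '*[!.json]' would also drop names such as “person”. To exclude files with the json extension reliably, we filter the glob results in the following code.

from glob import glob
all_paths = glob('./**/*', recursive=True)
# keep every path that does not end with the .json extension
files = [f for f in all_paths if not f.endswith('.json')]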
print(files)

Output:

['./test_folder2', './test_folder1', './address1', './control', './glob_script3.py', './test_folder2/pc_specs.html', './test_folder2/code_1.69.2-1658162013_amd64.deb', './test_folder1/cutlery.txt', './address1/address2_map.jpg', './address1/smith.txt', './control/control2', './control/dummy.jpg', './control/dummy.txt', './control/control2/dummy2.txt']
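For comparison, here is a small sketch that contrasts a negated bracket pattern with filtering on the real extension. The temporary folder and the file named person are assumptions added for illustration.

```python
import os
import tempfile
from glob import glob, escape

# Throwaway folder with one JSON file, one text file, and one extensionless file
root = tempfile.mkdtemp()
for name in ("groceries.json", "cutlery.txt", "person"):
    open(os.path.join(root, name), "w").close()

base = escape(root)
everything = glob(os.path.join(base, "**", "*"), recursive=True)

# Filter on the actual extension reported by os.path.splitext
non_json = sorted(
    os.path.basename(p) for p in everything
    if os.path.splitext(p)[1] != ".json"
)

# Negated bracket: keeps names whose LAST character is not ".", "j", "s", "o", or "n"
bracket = sorted(
    os.path.basename(p)
    for p in glob(os.path.join(base, "**", "*[!.json]"), recursive=True)
)

print(non_json)  # ['cutlery.txt', 'person']
print(bracket)   # ['cutlery.txt']
```

The bracket pattern silently loses “person”, which is why filtering the result list is the more dependable way to exclude an extension.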

Excluding Directories in the Search

In this section, let’s see how we can exclude one or more directories from glob’s output.

Case 1: Excluding Directories Starting with a Given Character

In this case, we will still use “!” inside a bracket expression. In the following example, [!t]* matches any top-level name that does not start with the letter “t”.

from glob import glob
files = glob('./[!t]*/**', recursive=True)
print(files)

Output:

['./address1/', './address1/address2_map.jpg', './address1/alice.json', './address1/smith.txt', './control/', './control/control2', './control/control2/dummy2.txt', './control/control2/dummy2.json', './control/dummy.jpg', './control/dummy.txt', './control/dummy.json']

The output now excludes directories starting with the letter “t”, that is, test_folder1 and test_folder2.

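Another way is to scan the entries in all the folders and subtract the ones we want to exclude (in this example, we use full paths instead of relative ones). Note that the first glob call omits recursive=True, so “**” behaves like “*” there and only entries one level inside each subfolder are returned.

from glob import glob
# set difference: everything found minus everything under the test_folder* folders
files = list(set(glob("/home/kiprono/Desktop/root_folder/**/*")) - set(glob("/home/kiprono/Desktop/root_folder/test_folder*/**/*", recursive=True)))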
print(files)

Output:

['/home/kiprono/Desktop/root_folder/address1/address2_map.jpg', '/home/kiprono/Desktop/root_folder/address1/alice.json', '/home/kiprono/Desktop/root_folder/control/dummy.txt', '/home/kiprono/Desktop/root_folder/address1/smith.txt', '/home/kiprono/Desktop/root_folder/control/dummy.jpg', '/home/kiprono/Desktop/root_folder/control/control2', '/home/kiprono/Desktop/root_folder/control/dummy.json']

The output shows that all the files in the test_folder* folders (test_folder1 and test_folder2) have been excluded from the results.
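The set-difference idea can also be tried on a throwaway tree. The following is a sketch under stated assumptions: the folder names mirror the article's, but the temporary root is invented, and both searches are made recursive and sorted so the result is deterministic.

```python
import os
import tempfile
from glob import glob, escape

# Throwaway tree: root/test_folder1/cutlery.txt and root/address1/smith.txt
root = tempfile.mkdtemp()
for folder, name in (("test_folder1", "cutlery.txt"), ("address1", "smith.txt")):
    os.makedirs(os.path.join(root, folder))
    open(os.path.join(root, folder, name), "w").close()

base = escape(root)
everything = set(glob(os.path.join(base, "**", "*"), recursive=True))

# Excluded set: the test_folder* directories themselves plus everything below them
excluded = set(glob(os.path.join(base, "test_folder*")))
excluded |= set(glob(os.path.join(base, "test_folder*", "**", "*"), recursive=True))

# Sets are unordered, so sort for a stable listing
kept = sorted(os.path.relpath(p, root) for p in everything - excluded)
print(kept)
```

Matching the excluded folders with two patterns keeps the folder entries themselves out of the result as well as their contents.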

Case 2: Exclude Multiple Paths with a Given Substring

Here, we want to exclude paths that contain a given substring. In the following case, we exclude paths with “test” in them. In doing so, we eliminate two folders, “test_folder1” and “test_folder2”.

from glob import glob
# finds the entries one level inside each subfolder (without recursive=True, "**" acts like "*")
all_files = glob("/home/kiprono/Desktop/root_folder/**/*")
# keeps only the results that do not contain the substring "test"
filtered_list = [i for i in all_files if "test" not in i]
print(filtered_list)

Output:

['/home/kiprono/Desktop/root_folder/address1/address2_map.jpg', '/home/kiprono/Desktop/root_folder/address1/alice.json', '/home/kiprono/Desktop/root_folder/address1/smith.txt', '/home/kiprono/Desktop/root_folder/control/control2', '/home/kiprono/Desktop/root_folder/control/dummy.jpg', '/home/kiprono/Desktop/root_folder/control/dummy.txt', '/home/kiprono/Desktop/root_folder/control/dummy.json']

Note that this eliminates every path containing the substring, whether it appears in a folder, subfolder, or file name.
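Because a plain substring test looks at the whole path, it can remove more than intended. The sketch below shows one possible refinement that checks directory names only; the paths, including latest_map.jpg, are hypothetical and exist only to show the difference.

```python
from pathlib import PurePosixPath

# Hypothetical paths; "latest_map.jpg" contains "test" inside the word "latest"
paths = [
    "/root_folder/test_folder1/cutlery.txt",
    "/root_folder/address1/latest_map.jpg",
    "/root_folder/control/dummy.txt",
]

# Substring filter: drops any path with "test" anywhere in it
by_substring = [p for p in paths if "test" not in p]

# Component filter: drops a path only if one of its parent directory names
# starts with "test"
by_component = [
    p for p in paths
    if not any(part.startswith("test") for part in PurePosixPath(p).parent.parts)
]

print(len(by_substring), len(by_component))  # 1 2
```

The component-based filter keeps latest_map.jpg, since only the directory names are inspected.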

Case 3: Exclude Multiple Directories

In this case, we want to exclude more than one folder. We will use the re module to match regular expressions.

The following example shows how we can filter out file paths containing “test_folder1” or “control”. That effectively excludes every folder whose path contains any of those substrings.

from glob import glob
import re

def exclude_folders(root_folder, exclude_list):
    # finds the entries one level inside each subfolder of root_folder
    all_files = glob(f"{root_folder}/**/*")
    # alternatives are joined with | in re, e.g. a|b matches a or b;
    # re.escape keeps characters such as "." in folder names literal
    multiple_paths = "|".join(re.escape(name) for name in exclude_list)
    # keep only the paths that match none of the excluded names
    filtered_list = [path for path in all_files if not re.search(multiple_paths, path)]
    return filtered_list

root_dir = "/home/kiprono/Desktop/root_folder"
# of course, you can provide full paths
folders_to_exclude = ["test_folder1", "control"]
# calling the exclude_folders() function
filtered = exclude_folders(root_folder=root_dir, exclude_list=folders_to_exclude)
print(filtered)

Output:

['/home/kiprono/Desktop/root_folder/test_folder2/pc_specs.html', '/home/kiprono/Desktop/root_folder/test_folder2/code_1.69.2-1658162013_amd64.deb', '/home/kiprono/Desktop/root_folder/address1/address2_map.jpg', '/home/kiprono/Desktop/root_folder/address1/alice.json', '/home/kiprono/Desktop/root_folder/address1/smith.txt']
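As an alternative to filtering glob results afterwards, the standard library's os.walk can skip excluded folders while it traverses. This is a different technique from the glob-based approach in this article, shown only as a sketch; the helper name walk_excluding and the temporary tree are assumptions made for illustration.

```python
import os
import tempfile

# Throwaway tree with three folders, one file in each
root = tempfile.mkdtemp()
for folder, name in (
    ("test_folder1", "cutlery.txt"),
    ("control", "dummy.txt"),
    ("address1", "smith.txt"),
):
    os.makedirs(os.path.join(root, folder))
    open(os.path.join(root, folder, name), "w").close()

def walk_excluding(top, excluded_names):
    """Yield file paths under top, skipping directories named in excluded_names."""
    for dirpath, dirnames, filenames in os.walk(top):
        # Editing dirnames in place prunes the walk: os.walk will not
        # descend into the directories removed here
        dirnames[:] = [d for d in dirnames if d not in excluded_names]
        for name in filenames:
            yield os.path.join(dirpath, name)

found = sorted(walk_excluding(root, {"test_folder1", "control"}))
print([os.path.relpath(p, root) for p in found])
```

Pruning during the walk avoids listing the contents of excluded folders at all, which can matter when those folders are large.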

Conclusion

Python glob is an excellent module for scanning filesystems. This article covered the basics of using glob and how to exclude some folders from the search.

We noted that pattern matching in glob is limited; therefore, we needed to go through an extra step to filter the results to exclude folders we wanted to eliminate.