Read zip file in Python

ZIP is an archive file format that supports lossless data compression. A ZIP file can contain one or more files, each usually stored in compressed form. In this article, we will discuss how to read the contents of a ZIP file, without necessarily extracting it, using Python's zipfile module.

We will use the libs22.zip file containing the following contents in our examples.

Reading a file inside a ZIP file without extracting it

The following example reads the contents of example.sh, which sits in the libs2 folder of libs22.zip.
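Here is a minimal sketch of such a read. It assumes the member path is libs2/example.sh, as in the listing later in this article. To keep it runnable anywhere, it builds a small in-memory stand-in for libs22.zip, and `read_member_text` is a hypothetical helper name rather than part of zipfile.

```python
import io
import zipfile

def read_member_text(archive, member, encoding="utf-8"):
    """Return one member's text without extracting it to disk.

    `archive` may be a path such as "libs22.zip" or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:        # the ZipFile is closed on exit
        with zf.open(member) as f:              # ZipFile.open(), not built-in open()
            return f.read().decode(encoding)    # members are read as bytes

# Hypothetical stand-in for libs22.zip, built in memory so the sketch runs
# anywhere; with the real archive, call read_member_text("libs22.zip", ...).
demo_archive = io.BytesIO()
with zipfile.ZipFile(demo_archive, "w") as zf:
    zf.writestr("libs2/example.sh", "echo hello_world\n")

text = read_member_text(demo_archive, "libs2/example.sh")
print(text, end="")  # echo hello_world
```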

Output (contents of example.sh):

echo hello_world

In the example above, we first open the archive with the zipfile.ZipFile class and then open the member with the ZipFile.open() method (not the built-in open() function), which returns a binary file-like object.

Notice that ZipFile objects are context managers, so they can be used in a with statement, which closes the archive automatically when the block ends.

In the same way, we can load an image stored inside the ZIP file, here with the Pillow library (PIL).
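A possible sketch, assuming Pillow is installed (`pip install Pillow`). The helper name `load_image` and the blank 640x480 stand-in image are hypothetical; with the real archive you would pass "libs22.zip" and the image's member path. The member is read into a BytesIO buffer because Pillow decodes pixels lazily and should not depend on the archive staying open.

```python
import io
import zipfile

try:
    from PIL import Image  # third-party Pillow package: pip install Pillow
except ImportError:
    Image = None           # lets the file import even without Pillow

def load_image(archive, member):
    """Open an image member of a ZIP archive with Pillow, without extracting it."""
    with zipfile.ZipFile(archive) as zf:
        data = zf.read(member)  # whole member as bytes
    # Hand Pillow an independent in-memory buffer, since it reads pixel
    # data lazily after the archive has already been closed.
    return Image.open(io.BytesIO(data))

if Image is not None:
    # Hypothetical stand-in archive holding a blank 640x480 RGBA PNG.
    png = io.BytesIO()
    Image.new("RGBA", (640, 480)).save(png, format="PNG")
    demo_archive = io.BytesIO()
    with zipfile.ZipFile(demo_archive, "w") as zf:
        zf.writestr("libs2/img_truth_contors.png", png.getvalue())

    img = load_image(demo_archive, "libs2/img_truth_contors.png")
    print(img)  # prints the PngImageFile repr (mode, size, address)
```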

Output:

<PIL.PngImagePlugin.PngImageFile image mode=RGBA size=640x480 at 0x7FB288FCC940>

Iterating through all files within a ZIP file

The ZipFile.namelist() method of the zipfile module comes in handy here. It returns a list of the archive's member names, including directory entries such as 'libs2/'.
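A short sketch of namelist() on a small in-memory stand-in archive (the member names are hypothetical, loosely modeled on libs22.zip). Directory entries only appear if the archiving tool stored them, which is why one is written explicitly here.

```python
import io
import zipfile

# Hypothetical stand-in for libs22.zip, built in memory so the sketch runs
# anywhere; pass "libs22.zip" instead to list the real archive.
demo_archive = io.BytesIO()
with zipfile.ZipFile(demo_archive, "w") as zf:
    zf.writestr("libs2/", "")                              # explicit directory entry
    zf.writestr("libs2/example.sh", "echo hello_world\n")
    zf.writestr("libs2/folder1/methodology.txt", "notes\n")

with zipfile.ZipFile(demo_archive) as zf:
    names = zf.namelist()                                  # member names, in archive order

# Directory entries end with "/", so they are easy to filter out.
files = [n for n in names if not n.endswith("/")]

print(names)  # ['libs2/', 'libs2/example.sh', 'libs2/folder1/methodology.txt']
```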

Output (truncated):

['libs2/', 'libs2/codebyte.py', 'libs2/Assignment22.pdf', 'libs2/example.sh',..., 'libs2/folder1/', 'libs2/folder1/Assignment22.pdf', 'libs2/folder1/codebyte.py', 'libs2/folder1/example.sh',..., 'libs2/folder1/methodology.txt', 'libs2/folder1/SUSTAINABILITY.docx']

With the list returned by ZipFile.namelist(), we can iterate through the members and read each one's content.

The following example loops through all members of the archive and uses an if statement to keep only image files (names ending in .png or .jpg).
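One way to write that loop, as a sketch: `find_images` is a hypothetical helper that also reads each matching member to show that its bytes are accessible without extraction. The stand-in archive and its fake image bytes are made up for illustration, and lower-casing the name (so .PNG or .JPG also match) is a small extension of the filter described above.

```python
import io
import zipfile

IMAGE_SUFFIXES = (".png", ".jpg")  # str.endswith accepts a tuple of suffixes

def find_images(archive):
    """Return {member name: size in bytes} for every .png/.jpg member."""
    found = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            # Lower-casing also catches ".PNG"/".JPG" (an assumption beyond the text).
            if name.lower().endswith(IMAGE_SUFFIXES):
                data = zf.read(name)       # read the member's bytes in memory
                found[name] = len(data)
    return found

# Hypothetical stand-in for libs22.zip with fake (non-decodable) image bytes.
demo_archive = io.BytesIO()
with zipfile.ZipFile(demo_archive, "w") as zf:
    zf.writestr("libs2/example.sh", "echo hello_world\n")
    zf.writestr("libs2/img_truth_contors.png", b"\x89PNG fake bytes")
    zf.writestr("libs2/folder1/IMG20200912142011.jpg", b"\xff\xd8 fake bytes")

images = find_images(demo_archive)
for name in images:
    print(name)
```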

Output:

libs2/IMG20200912142011.jpg
libs2/img_truth_contors.png
libs2/folder1/IMG20200912142011.jpg
libs2/folder1/img_truth_contors.png

Bonus: Extracting Archived Files – .zip, .tar, .tar.gz, .tar.xz, etc

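If you want to extract a ZIP file, you can use the ZipFile.extractall(path=None, members=None, pwd=None) method of the zipfile module.

Here, path specifies the directory to extract to (the current working directory by default), members optionally limits extraction to a subset of the names returned by namelist(), and pwd is the password, given as bytes, used for encrypted files.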

For example, calling extractall() with path set to libs22_extracted extracts the contents of libs22.zip into that folder.
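A sketch of both a full and a partial extraction, assuming a hypothetical `extract_zip` wrapper and a small stand-in libs22.zip written to a scratch directory created with tempfile.mkdtemp(), so nothing outside that directory is touched.

```python
import os
import tempfile
import zipfile

def extract_zip(archive, target_dir, members=None, password=None):
    """Extract all (or only the listed) members of a ZIP archive into target_dir."""
    with zipfile.ZipFile(archive) as zf:
        # members: names from namelist(); pwd must be bytes, e.g. b"secret".
        zf.extractall(path=target_dir, members=members, pwd=password)
    return target_dir

# Hypothetical stand-in archive in a scratch directory.
work = tempfile.mkdtemp()
archive_path = os.path.join(work, "libs22.zip")
with zipfile.ZipFile(archive_path, "w") as zf:
    zf.writestr("libs2/example.sh", "echo hello_world\n")
    zf.writestr("libs2/folder1/methodology.txt", "notes\n")

# Extract everything; missing folders (libs22_extracted/libs2/...) are created.
full = extract_zip(archive_path, os.path.join(work, "libs22_extracted"))
# Extract just one member into a second folder.
partial = extract_zip(archive_path, os.path.join(work, "only_script"),
                      members=["libs2/example.sh"])

print(os.path.isfile(os.path.join(full, "libs2", "folder1", "methodology.txt")))  # True
print(os.path.exists(os.path.join(partial, "libs2", "folder1")))                  # False
```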

The zipfile module works only with ZIP archives; it cannot open other formats such as TAR. That is where the shutil module comes in.

shutil provides an unpack_archive() function that, unless you pass the format explicitly, detects the archive format from the filename's extension.
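To see which formats that detection can recognize, shutil.get_unpack_formats() lists each registered format with its extensions. The `unpack_format_for` helper below is a hypothetical, simplified imitation of the extension matching, not a public shutil function; it shows that a .rar file matches nothing by default. The exact list depends on the Python build (the TAR variants need the zlib, bz2, and lzma modules).

```python
import shutil

# Each entry is (format name, list of extensions, description).
formats = shutil.get_unpack_formats()
for name, extensions, description in formats:
    print(f"{name:8} {', '.join(extensions):20} {description}")

def unpack_format_for(filename):
    """Return the registered format whose extension matches filename, else None."""
    for name, extensions, _ in formats:
        if any(filename.endswith(ext) for ext in extensions):
            return name
    return None

print(unpack_format_for("libs2.tar.xz"))  # xztar
print(unpack_format_for("archive.rar"))   # None
```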

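The general syntax is shutil.unpack_archive(archived_file, target_dir).

Here, archived_file is extracted to target_dir. If target_dir is not supplied, the archive is unpacked into the current working directory. Only registered formats (by default ZIP and the TAR variants) are supported; for an unrecognized extension such as .rar, unpack_archive() raises shutil.ReadError.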

The following example extracts the libs2.tar.xz TAR file into the libs2_extracted directory.
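A self-contained sketch of that extraction. Because libs2.tar.xz is not available here, it first builds a hypothetical stand-in with the tarfile module in a scratch directory, then unpacks it with shutil.unpack_archive(), letting the .tar.xz extension select the xztar format.

```python
import os
import shutil
import tarfile
import tempfile

work = tempfile.mkdtemp()  # scratch directory so the sketch touches nothing else

# Build a tiny stand-in libs2.tar.xz (hypothetical contents).
src = os.path.join(work, "libs2")
os.makedirs(src)
with open(os.path.join(src, "example.sh"), "w") as f:
    f.write("echo hello_world\n")
archive_path = os.path.join(work, "libs2.tar.xz")
with tarfile.open(archive_path, "w:xz") as tar:   # xz-compressed TAR
    tar.add(src, arcname="libs2")

# The ".tar.xz" extension tells unpack_archive to use the "xztar" format.
target_dir = os.path.join(work, "libs2_extracted")
shutil.unpack_archive(archive_path, target_dir)

print(os.listdir(os.path.join(target_dir, "libs2")))  # ['example.sh']
```

On Python 3.12 and later, unpack_archive() also accepts a filter argument (for example filter="data") that restricts what TAR extraction may write.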

Conclusion

The zipfile module is an excellent tool for working with ZIP files in Python. It lets you read archive members without extracting them. When you do want to unpack an archive, use the zipfile module for ZIP files and shutil for other supported formats such as TAR.