Reading a ZIP file in Python

ZIP is an archive file format that supports lossless compression. A ZIP file contains one or more files or directories, which may be compressed. In this article, we will discuss how to read the contents of a ZIP file (without necessarily extracting them) using the zipfile module in Python.

The examples use an archive named libs22.zip; its members appear in the namelist() output later in this article.

Reading a file inside a ZIP file without extracting it

The following example reads the contents of the example.sh file inside the libs2 folder in libs22.zip.

from zipfile import ZipFile
# Read a shell script stored in the ZIP archive without extracting it.
with ZipFile("libs22.zip") as infile:
    with infile.open("libs2/example.sh") as myfile:
        print(myfile.read().decode())

Output (contents of example.sh):

echo hello_world

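In the example above, we first open the archive with the ZipFile class and then open the member with the ZipFile.open() method, which is not the same as the built-in open() function. ZipFile.open() returns a binary file-like object, so read() yields bytes, which decode() converts to a string (UTF-8 by default).

Notice that ZipFile can be used as a context manager and therefore supports the with statement, which closes the archive automatically when the block ends.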
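
For larger text members, reading everything with read() may not be ideal. The following is a minimal sketch, not a definitive implementation, that wraps the binary stream in io.TextIOWrapper so the member is decoded line by line. The helper name read_text_member and the stand-in archive demo.zip are assumptions made for illustration; they are not part of libs22.zip.

```python
import io
import tempfile
from pathlib import Path
from zipfile import ZipFile


def read_text_member(zip_path, member, encoding="utf-8"):
    """Return the lines of a text member without extracting it to disk.

    Hypothetical helper for illustration only.
    """
    with ZipFile(zip_path) as archive:
        # ZipFile.open() returns a binary file-like object.
        with archive.open(member) as raw:
            # TextIOWrapper decodes the bytes incrementally as lines are read.
            with io.TextIOWrapper(raw, encoding=encoding) as text:
                return [line.rstrip("\n") for line in text]


# Build a small stand-in archive so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with ZipFile(demo_zip, "w") as archive:
        archive.writestr("libs2/example.sh", "echo hello_world\n")
    lines = read_text_member(demo_zip, "libs2/example.sh")

print(lines)
```

The same helper should work on libs22.zip by passing "libs22.zip" and "libs2/example.sh" instead of the stand-in archive.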

In the same way, we can also load an image within the ZIP.

# Reading an image file from a ZIP archive
from zipfile import ZipFile
# You may need to install Pillow to use PIL:
# pip install pillow
from PIL import Image
with ZipFile("libs22.zip") as infile:
    # Pillow reads pixel data lazily, so use the image while the member is still open.
    with infile.open("libs2/img_truth_contors.png") as myfile:
        img = Image.open(myfile)
        print(img)

Output:

<PIL.PngImagePlugin.PngImageFile image mode=RGBA size=640x480 at 0x7FB288FCC940>

Iterating through all files within a ZIP file

The ZipFile.namelist() method of the zipfile module comes in handy in this case. It returns a list of the names of all archive members, including directory entries.

from zipfile import ZipFile
# Open the ZIP file in read-only mode ("r" is the default).
with ZipFile("libs22.zip", "r") as archive:
    # namelist() returns the names of all members in the archive.
    print(archive.namelist())

Output (truncated):

['libs2/', 'libs2/codebyte.py', 'libs2/Assignment22.pdf', 'libs2/example.sh',..., 'libs2/folder1/', 'libs2/folder1/Assignment22.pdf', 'libs2/folder1/codebyte.py', 'libs2/folder1/example.sh',..., 'libs2/folder1/methodology.txt', 'libs2/folder1/SUSTAINABILITY.docx']
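
Note that the list above contains directory entries (names ending in /) alongside regular files. As a rough sketch, ZipFile.infolist() returns ZipInfo objects whose is_dir() method and file_size attribute can be used to skip directories and report uncompressed sizes. The helper name list_files_only and the stand-in archive below are assumptions for illustration.

```python
import tempfile
from pathlib import Path
from zipfile import ZipFile


def list_files_only(zip_path):
    """Return (name, uncompressed size) pairs for members that are not directories.

    Hypothetical helper for illustration only.
    """
    with ZipFile(zip_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()
        ]


# Build a small stand-in archive so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with ZipFile(demo_zip, "w") as archive:
        archive.writestr("libs2/", "")  # explicit directory entry
        archive.writestr("libs2/example.sh", "echo hello_world\n")
        archive.writestr("libs2/folder1/methodology.txt", "notes")
    files = list_files_only(demo_zip)

print(files)
```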

With the list of items returned by ZipFile.namelist() we can iterate through each element, reading its content.

The following example loops through all members of the archive and keeps only image files (names ending in .png or .jpg) using an if statement.

from zipfile import ZipFile
# Open the ZIP file in read-only mode.
with ZipFile("libs22.zip", "r") as archive:
    # archive.namelist() returns the names of all members in the archive.
    for name in archive.namelist():
        # Keep only image files, i.e. names ending in .jpg or .png.
        if name.endswith((".jpg", ".png")):
            # Print the name, or do something else with the member.
            print(name)

Output:

libs2/IMG20200912142011.jpg
libs2/img_truth_contors.png
libs2/folder1/IMG20200912142011.jpg
libs2/folder1/img_truth_contors.png
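
Instead of only printing the names, the matching members can also be read in the same loop. The sketch below, under stated assumptions, uses ZipFile.read() to collect the raw bytes of every member whose name ends with one of the given suffixes, comparing case-insensitively. The helper name read_members_by_suffix and the member names in the stand-in archive are hypothetical.

```python
import tempfile
from pathlib import Path
from zipfile import ZipFile


def read_members_by_suffix(zip_path, suffixes):
    """Map member name -> raw bytes for names ending in any of `suffixes`.

    `suffixes` must be a tuple of lowercase strings. Hypothetical helper.
    """
    contents = {}
    with ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Lowercase the name so that ".JPG" also matches ".jpg".
            if name.lower().endswith(suffixes):
                contents[name] = archive.read(name)
    return contents


# Build a small stand-in archive; the bytes are placeholders, not real images.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with ZipFile(demo_zip, "w") as archive:
        archive.writestr("libs2/photo.JPG", b"fake-jpg-bytes")
        archive.writestr("libs2/plot.png", b"fake-png-bytes")
        archive.writestr("libs2/example.sh", "echo hello_world\n")
    images = read_members_by_suffix(demo_zip, (".jpg", ".png"))

print(sorted(images))
```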

Bonus: Extracting archived files (.zip, .tar, .tar.gz, .tar.xz, etc.)

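If you want to extract a zipped file, you can use the ZipFile.extractall(path=None, members=None, pwd=None) method in the zipfile module.

Here, path specifies the directory to extract to (the current working directory if omitted), members optionally restricts extraction to a subset of the names returned by namelist(), and pwd is the password for encrypted files, given as a bytes object.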

from zipfile import ZipFile
with ZipFile("libs22.zip") as archive:
    archive.extractall(path="libs22_extracted")

That will extract the contents of libs22.zip into the libs22_extracted folder.
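
When only one member is needed, extracting the whole archive is unnecessary. As a minimal sketch, ZipFile.extract() writes a single member under the target directory, creating intermediate folders as needed, and returns the path of the extracted file. The stand-in archive and the out directory name are assumptions for illustration.

```python
import tempfile
from pathlib import Path
from zipfile import ZipFile

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)

    # Build a small stand-in archive so the sketch runs anywhere.
    demo_zip = tmp / "demo.zip"
    with ZipFile(demo_zip, "w") as archive:
        archive.writestr("libs2/example.sh", "echo hello_world\n")
        archive.writestr("libs2/codebyte.py", "print('hi')\n")

    # Extract only one member; the rest of the archive stays untouched.
    target = tmp / "out"
    with ZipFile(demo_zip) as archive:
        extracted_path = Path(archive.extract("libs2/example.sh", path=target))

    # Collect what actually landed on disk, relative to the target directory.
    extracted_names = sorted(
        p.relative_to(target).as_posix() for p in target.rglob("*") if p.is_file()
    )
    extracted_text = extracted_path.read_text()

print(extracted_names)
```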

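The zipfile module works only with ZIP files; it cannot read other archive formats. That is where the shutil module comes in.

shutil provides the unpack_archive() function, which detects the archive format automatically from the filename’s extension. Out of the box it handles ZIP and tar-based archives (such as .tar, .tar.gz, .tar.bz2, and .tar.xz, when the corresponding compression modules are available); other formats, such as .rar, require registering a custom unpacker with shutil.register_unpack_format().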

Here is the general syntax.

import shutil
shutil.unpack_archive(archived_file, target_dir)

In the above syntax, the archived_file is extracted to target_dir. If target_dir is not supplied, the compressed file is unpacked to the current working directory.

The following example extracts the libs2.tar.xz archive into the libs2_extracted directory.

import shutil
shutil.unpack_archive("libs2.tar.xz", "libs2_extracted")
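
To check which formats the local Python installation can unpack, shutil.get_unpack_formats() lists the registered ones. The sketch below, under stated assumptions, also does a round trip: it builds a small .tar.xz archive with shutil.make_archive() and unpacks it again. The directory and file names are stand-ins chosen to mirror the example above, and the xztar format assumes the lzma module is available.

```python
import shutil
import tempfile
from pathlib import Path

# Names of the unpack formats registered in this Python installation.
supported = [name for name, extensions, description in shutil.get_unpack_formats()]
print(supported)

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)

    # Create a small stand-in directory to archive.
    source = tmp / "libs2"
    source.mkdir()
    (source / "example.sh").write_text("echo hello_world\n")

    # make_archive() appends the extension and returns the archive's path.
    archive_path = shutil.make_archive(
        str(tmp / "libs2"), "xztar", root_dir=str(tmp), base_dir="libs2"
    )

    # unpack_archive() picks the format from the .tar.xz extension.
    target = tmp / "libs2_extracted"
    shutil.unpack_archive(archive_path, str(target))
    unpacked_text = (target / "libs2" / "example.sh").read_text()

print(unpacked_text)
```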

Conclusion

The zipfile module is an excellent tool for working with ZIP files in Python. It lets you list and read archive members without extracting them to disk. When you do want to unpack an archive, use zipfile (or shutil) for ZIP files and shutil.unpack_archive() for the tar-based formats it supports.