ZIP is an archive file format for the compression of files. A ZIP file contains one or more compressed files. In this article, we will discuss how to read the contents of a ZIP file (without necessarily extracting its content) using the zipfile module in Python.
Our examples use an archive named libs22.zip, whose files sit under a top-level libs2 folder.
The following example reads the contents of the example.sh file inside the libs2 folder in libs22.zip.
    from zipfile import ZipFile

    # Reading the sh file in the ZIP
    with ZipFile("libs22.zip") as infile:
        with infile.open("libs2/example.sh") as myfile:
            print(myfile.read().decode())
Output (contents of example.sh):
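In the example above, we first open the archive with the ZipFile class and then open the member with the ZipFile.open() method, which is separate from the built-in open() function. The member is opened in binary mode, so read() returns bytes, and decode() converts them to a string.
Notice that ZipFile can be used as a context manager and, therefore, supports the with statement, which closes the archive automatically when the block ends.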
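As a minimal sketch of the same idea, the snippet below reads a text member in two ways: all at once with ZipFile.read(), and line by line by wrapping the binary member in io.TextIOWrapper. To stay self-contained it builds a throwaway archive in memory; the member contents are made up for illustration and are not the real contents of libs22.zip.

```python
import io
from zipfile import ZipFile

# Build a small throwaway archive in memory so the sketch runs on its own.
# The script body below is a hypothetical stand-in, not the real example.sh.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("libs2/example.sh", "#!/bin/bash\necho hello\n")

# Reopen the same bytes for reading; the with statement closes the archive.
with ZipFile(buffer) as archive:
    # Option 1: read() returns the whole member as bytes.
    text = archive.read("libs2/example.sh").decode("utf-8")

    # Option 2: wrap the binary member in a text layer and iterate by line.
    with archive.open("libs2/example.sh") as member:
        reader = io.TextIOWrapper(member, encoding="utf-8")
        lines = [line.rstrip("\n") for line in reader]

print(text)
print(lines)
```

Reading line by line can be preferable for large members, since it avoids holding the whole decoded file in memory at once.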
In the same way, we can also load an image within the ZIP.
    # Reading an image file from a ZIP archive
    from zipfile import ZipFile

    # You may need to install pillow to use PIL.
    # pip install pillow
    from PIL import Image

    with ZipFile("libs22.zip") as infile:
        myfile = infile.open("libs2/img_truth_contors.png")
        img = Image.open(myfile)
        print(img)
<PIL.PngImagePlugin.PngImageFile image mode=RGBA size=640x480 at 0x7FB288FCC940>
To list what an archive contains without extracting it, the ZipFile.namelist() method in the zipfile module comes in handy. It returns a list of archive members by name, including directory entries.
    from zipfile import ZipFile

    # Open the zip file in read-only mode.
    with ZipFile("libs22.zip", "r") as archive:
        # This returns the names of all members (files and folders) in the archive.
        print(archive.namelist())
['libs2/', 'libs2/codebyte.py', 'libs2/Assignment22.pdf', 'libs2/example.sh',..., 'libs2/folder1/', 'libs2/folder1/Assignment22.pdf', 'libs2/folder1/codebyte.py', 'libs2/folder1/example.sh',..., 'libs2/folder1/methodology.txt', 'libs2/folder1/SUSTAINABILITY.docx']
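Since the listing mixes folders such as 'libs2/' with regular files, it can help to separate the two. The sketch below uses ZipFile.infolist(), which returns ZipInfo objects carrying metadata such as is_dir() and file_size. It runs against a small in-memory archive whose member names mimic the listing above; the file contents are invented for illustration.

```python
import io
from zipfile import ZipFile

# Hypothetical stand-in for libs22.zip, built in memory.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("libs2/", "")  # a name ending in "/" is a directory entry
    archive.writestr("libs2/codebyte.py", "print('hi')\n")
    archive.writestr("libs2/folder1/methodology.txt", "notes")

with ZipFile(buffer) as archive:
    # Keep regular files only, skipping directory entries.
    file_infos = [info for info in archive.infolist() if not info.is_dir()]
    files_only = [info.filename for info in file_infos]
    # file_size is the uncompressed size in bytes.
    sizes = {info.filename: info.file_size for info in file_infos}

print(files_only)
print(sizes)
```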
With the list of items returned by ZipFile.namelist() we can iterate through each element, reading its content.
The following example loops through all files inside the zipped file, filtering image files (files ending in .png or .jpg) using the if-statement.
    from zipfile import ZipFile

    # Open the zip file in read-only mode.
    with ZipFile("libs22.zip", "r") as archive:
        # archive.namelist() returns the names of all members in the archive.
        for file in archive.namelist():
            # Keep only image files - names ending in .jpg or .png.
            if file.endswith((".jpg", ".png")):
                # or do something else.
                print(file)
libs2/IMG20200912142011.jpg
libs2/img_truth_contors.png
libs2/folder1/IMG20200912142011.jpg
libs2/folder1/img_truth_contors.png
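Going one step further than printing the names, the filtered members can be read straight into memory. The following is a minimal sketch under a few assumptions: read_matching_members is a hypothetical helper name, the match is made case-insensitive (so .JPG also counts), and the "image" bytes in the demo archive are placeholders rather than real image data.

```python
import io
from zipfile import ZipFile


def read_matching_members(zip_source, suffixes):
    """Return {member name: raw bytes} for members ending in one of suffixes."""
    suffixes = tuple(suffix.lower() for suffix in suffixes)
    contents = {}
    with ZipFile(zip_source) as archive:
        for name in archive.namelist():
            # str.endswith accepts a tuple, so one call checks every suffix.
            if name.lower().endswith(suffixes):
                contents[name] = archive.read(name)
    return contents


# Hypothetical demo archive built in memory; the payloads are not real images.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("libs2/photo.JPG", b"fake jpg bytes")
    archive.writestr("libs2/img.png", b"fake png bytes")
    archive.writestr("libs2/notes.txt", b"not an image")

images = read_matching_members(buffer, (".jpg", ".png"))
print(sorted(images))
```

The helper accepts anything ZipFile accepts, so a path such as "libs22.zip" could be passed in place of the in-memory buffer.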
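If you want to extract a ZIP archive, you can use the ZipFile.extractall(path=None, members=None, pwd=None) method in the zipfile module.
The path argument specifies the directory to extract to (the current working directory by default), members optionally restricts extraction to a subset of the names returned by namelist(), and pwd is the password, given as bytes, used for encrypted files.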
    from zipfile import ZipFile

    # The with statement closes the archive once extraction finishes.
    with ZipFile("libs22.zip") as archive:
        archive.extractall(path="libs22_extracted")
That will extract the contents of libs22.zip into the libs22_extracted folder.
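When only part of an archive is needed, extractall() also takes a members argument listing the names to extract. The sketch below extracts just the .py files from a small in-memory archive into a temporary directory; the archive contents and the choice of suffix are assumptions made for illustration.

```python
import io
import tempfile
from pathlib import Path
from zipfile import ZipFile

# Hypothetical stand-in for libs22.zip, built in memory.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("libs2/example.sh", "echo hello\n")
    archive.writestr("libs2/codebyte.py", "print('hi')\n")
    archive.writestr("libs2/methodology.txt", "notes")

with tempfile.TemporaryDirectory() as tmp:
    target_dir = Path(tmp)
    with ZipFile(buffer) as archive:
        # Select the subset of member names to extract.
        wanted = [name for name in archive.namelist() if name.endswith(".py")]
        archive.extractall(path=target_dir, members=wanted)

    # Collect what actually landed on disk, relative to the target directory.
    extracted = sorted(
        path.relative_to(target_dir).as_posix()
        for path in target_dir.rglob("*")
        if path.is_file()
    )

print(extracted)
```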
The zipfile module works only with ZIP files; it cannot read other archive formats such as TAR. That is where the shutil module comes in.
shutil provides the unpack_archive() function, which detects the archive format automatically from the filename’s extension.
Here is the general syntax.
    import shutil

    shutil.unpack_archive(archived_file, target_dir)
In the above syntax, the archived_file is extracted to target_dir. If target_dir is not supplied, the compressed file is unpacked to the current working directory.
The following example extracts the libs2.tar.xz archive (an xz-compressed TAR file) into the libs2_extracted directory.
    import shutil

    shutil.unpack_archive("libs2.tar.xz", "libs2_extracted")
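To try unpack_archive() without an existing archive at hand, a round trip can be sketched with shutil.make_archive(), and the formats available on a given installation can be listed with shutil.get_unpack_formats(). The directory and file names below are hypothetical, and the gztar format is chosen only for illustration; any registered format should behave the same way.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)

    # Create a small folder to archive (hypothetical contents).
    source_dir = tmp / "libs2"
    source_dir.mkdir()
    (source_dir / "methodology.txt").write_text("notes")

    # make_archive appends the extension for the chosen format and
    # returns the full path of the archive it created.
    archive_path = shutil.make_archive(
        str(tmp / "sample"), "gztar", root_dir=str(tmp), base_dir="libs2"
    )

    # unpack_archive picks the format from the .tar.gz extension.
    target_dir = tmp / "libs2_extracted"
    shutil.unpack_archive(archive_path, str(target_dir))
    unpacked_text = (target_dir / "libs2" / "methodology.txt").read_text()

# Names of the formats unpack_archive can handle on this installation.
supported = [name for name, extensions, description in shutil.get_unpack_formats()]
print(supported)
```

The exact list printed depends on which compression modules the Python installation was built with.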
The zipfile module is an excellent tool for working with ZIP files in Python. It lets you list and read archive members without extracting them. When you want to unpack an archive, use the zipfile module for ZIP files and shutil.unpack_archive() for the other formats it supports, such as TAR archives.