Unzip .gz Files Using Python

There are many compression schemes available for different platforms. This article focuses on extracting .gz, .tar.gz, and .tgz files using Python (we will explain these extensions shortly). We will also cover how to read files from an archive without extracting them to disk.

Before we do that, however, let’s briefly define gz compression and other related terms.

The .gz, .tar.gz, and .tgz Formats

.gz is the extension used for files compressed with gzip (GNU zip), a compression format widely used on UNIX-like systems.

On the other hand, tape archive (tar) is an archival format used on UNIX-like systems. A tar file bundles files together but does not compress them by itself, so it is usually combined with a compression format such as gzip, xz, or bzip2.

When a tar archive is compressed with gzip, the result is commonly called a “tarball”. Tarball files usually come with .tar.gz or .tgz file extensions.

In simple terms, a tar file is an archive that bundles multiple files into one, whereas a .gz file is a single compressed file.
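To make the layering concrete, here is a minimal sketch using only the standard library. It bundles two small in-memory files (the names a.txt and b.txt are made up for illustration) into an uncompressed tar archive with tarfile, then compresses those tar bytes with gzip. Reversing the steps, gunzip first and then read the tar, recovers the original member list.

```python
import gzip
import io
import tarfile

# Layer 1 (tar): bundle two small in-memory files into one uncompressed archive.
tar_buffer = io.BytesIO()
with tarfile.open(fileobj=tar_buffer, mode="w") as tar:
    for name, data in [("a.txt", b"alpha\n"), ("b.txt", b"beta\n")]:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
tar_bytes = tar_buffer.getvalue()

# Layer 2 (gzip): compress the whole tar stream; these bytes are what a .tar.gz holds.
tgz_bytes = gzip.compress(tar_bytes)

# Reverse the layers: decompress with gzip, then read the result as a tar archive.
restored = gzip.decompress(tgz_bytes)
with tarfile.open(fileobj=io.BytesIO(restored), mode="r") as tar:
    names = tar.getnames()

print(names)  # ['a.txt', 'b.txt']
```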

Note: All the code examples in this post have been tested on Windows and Linux (Debian), so they should also work on other platforms, including macOS.

Extracting .gz Files

This section covers extracting a single GZIP file and extracting all GZIP files in a folder.

Example 1: Unzipping a single GZIP file

In this case, unzipping happens in two steps: first, open the GZIP file with the gzip module; second, copy its decompressed contents into a new file with shutil.

The following example shows how to extract a gzipped README markdown file.
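A minimal sketch of those two steps, assuming the compressed file is named README.md.gz (a placeholder; the demo creates it in a temporary folder so the snippet runs on its own). The helper name gunzip_file is hypothetical.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_file(gz_path, out_path):
    """Decompress the single .gz file at gz_path into out_path."""
    # Step 1: open the compressed file for reading in binary mode.
    with gzip.open(gz_path, "rb") as f_in:
        # Step 2: stream the decompressed bytes into a regular file.
        with open(out_path, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)


# Demo: create a small gzipped README in a temporary folder, then extract it.
work_dir = tempfile.mkdtemp()
gz_path = os.path.join(work_dir, "README.md.gz")
with gzip.open(gz_path, "wb") as f:
    f.write(b"# Sample README\n")

out_path = os.path.join(work_dir, "README.md")
gunzip_file(gz_path, out_path)

with open(out_path, "rb") as f:
    extracted = f.read()
print(extracted)  # b'# Sample README\n'
```

shutil.copyfileobj copies in chunks, so the decompressed file never has to fit in memory all at once.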


Example 2: Extracting multiple GZIP files in a folder

This example shows how to extract all .gz files in a given directory. The idea is to loop through the files in the folder and extract each GZIP file as in Example 1.
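A sketch of that loop, assuming the .gz files sit directly in the target folder (subfolders are not searched) and that each output should be written next to its source with the .gz suffix dropped. gunzip_folder is a hypothetical helper name, and the demo builds a throwaway folder so the snippet is self-contained.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_folder(folder):
    """Extract every .gz file directly inside `folder` (no recursion).

    Each output is written next to its source with the final ".gz" removed,
    e.g. notes.txt.gz -> notes.txt. Returns the list of output paths.
    """
    outputs = []
    for gz_path in sorted(Path(folder).glob("*.gz")):
        out_path = gz_path.with_suffix("")  # strips only the last suffix (".gz")
        # Same two steps as Example 1: open with gzip, copy out with shutil.
        with gzip.open(gz_path, "rb") as f_in, open(out_path, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
        outputs.append(out_path)
    return outputs


# Demo: a throwaway folder with two gzipped files and one file that is not gzipped.
demo_dir = Path(tempfile.mkdtemp())
for name, data in [("a.txt", b"alpha\n"), ("b.txt", b"beta\n")]:
    with gzip.open(demo_dir / (name + ".gz"), "wb") as f:
        f.write(data)
(demo_dir / "skip.me").write_bytes(b"not compressed")

results = [p.name for p in gunzip_folder(demo_dir)]
print(results)  # ['a.txt', 'b.txt']
```

Note that the *.gz pattern also matches .tar.gz files; those would be decompressed to plain .tar archives, which tarfile can then open.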

Extracting .tar.gz or .tgz Files

As mentioned earlier, a tarball (a .tar.gz or .tgz file) is an archive that bundles multiple files into one. Extracting it recreates the original files and folders on disk.

The tarfile module from Python's standard library handles this case. The following syntax shows how to use it to open and extract tarball archives.
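A minimal sketch of that syntax, assuming an archive named sample.tgz (created on the fly for the demo). The destination directory is the path argument of extractall(), and extract_tarball is a hypothetical wrapper. On Python versions that support extraction filters, filter="data" is passed to reject unsafe members such as absolute paths or paths containing ".."; older versions fall back to a plain extractall(), which should only be used on trusted archives.

```python
import io
import os
import tarfile
import tempfile


def extract_tarball(archive_path, dest_dir="."):
    """Extract a .tar.gz / .tgz archive into dest_dir (default: current directory)."""
    # "r:gz" opens the archive for reading with gzip decompression.
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: refuse absolute paths,
            # ".." components, device files, and similar unsafe members.
            tar.extractall(path=dest_dir, filter="data")
        else:
            # Older Pythons: no filter support, so only extract trusted archives.
            tar.extractall(path=dest_dir)


# Demo: build a small .tgz containing docs/hello.txt, then extract it.
work_dir = tempfile.mkdtemp()
archive_path = os.path.join(work_dir, "sample.tgz")
payload = b"hello from the tarball\n"
with tarfile.open(archive_path, "w:gz") as tar:
    info = tarfile.TarInfo(name="docs/hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

dest_dir = os.path.join(work_dir, "extracted")
extract_tarball(archive_path, dest_dir)

with open(os.path.join(dest_dir, "docs", "hello.txt"), "rb") as f:
    content = f.read()
print(content)  # b'hello from the tarball\n'
```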

If the <destination directory> is not provided in the code example above, the archive will be extracted into the current working directory.

Reading Compressed/Archived Files without Extracting Them to Disk

The examples above write the extracted files to disk. What if you don’t want to do that, and only want to read the files inside the archive without extracting them?

The modules we have discussed, gzip and tarfile, can also serve this purpose. Here is example code that reads a gzipped README.md file using the gzip.open() function.
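A minimal sketch, assuming the compressed file is named README.md.gz and holds UTF-8 text (the demo writes a small placeholder file first). Opening it in "rt" mode decompresses and decodes in memory, so no uncompressed copy is written to disk. read_gz_text is a hypothetical helper name.

```python
import gzip
import os
import tempfile


def read_gz_text(gz_path, encoding="utf-8"):
    """Return the decompressed text of a .gz file without writing it to disk."""
    # "rt" = read + text mode: gzip decompresses and decodes in memory.
    with gzip.open(gz_path, "rt", encoding=encoding) as f:
        return f.read()


# Demo: write a small gzipped README first (placeholder content).
work_dir = tempfile.mkdtemp()
gz_path = os.path.join(work_dir, "README.md.gz")
with gzip.open(gz_path, "wt", encoding="utf-8") as f:
    f.write("# Sample README\nLine two\n")

text = read_gz_text(gz_path)
print(text.splitlines()[0])  # # Sample README
```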

If you want to read the contents of a tar file without untarring it, you can use the <tar>.extractfile(<member>) method from the tarfile module, as shown below.
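A sketch that reads every regular file in a .tar.gz archive into a dictionary in memory, assuming the archive is small enough to hold in RAM. The archive name sample.tar.gz and the helper read_tar_members are placeholders; the demo archive includes a docs directory entry so the directory-skipping branch is exercised.

```python
import io
import os
import tarfile
import tempfile


def read_tar_members(archive_path):
    """Map each regular file's name in a .tar.gz archive to its bytes, in memory."""
    contents = {}
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():  # one TarInfo per entry in the archive
            if not member.isfile():
                continue  # skip directories, links, devices
            with tar.extractfile(member) as f:  # file-like object; nothing hits disk
                contents[member.name] = f.read()
    return contents


# Demo: build a tiny archive with a directory entry and two files.
work_dir = tempfile.mkdtemp()
archive_path = os.path.join(work_dir, "sample.tar.gz")
with tarfile.open(archive_path, "w:gz") as tar:
    docs_dir = tarfile.TarInfo(name="docs")
    docs_dir.type = tarfile.DIRTYPE
    docs_dir.mode = 0o755
    tar.addfile(docs_dir)
    for name, data in [("docs/README.md", b"# Sample README\n"), ("docs/notes.txt", b"notes\n")]:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

contents = read_tar_members(archive_path)
for name, data in contents.items():
    print(name, len(data), "bytes")
```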

Key functions in the code above:

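  • <tar>.getmembers() returns a list of TarInfo objects, one for every member (directories, subdirectories, and files) in <tar>.
  • <tar>.extractfile(member) returns a file-like object for reading the member's contents without writing it to disk; for members that are neither regular files nor links (such as directories), it returns None.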

Conclusion

This guide discussed extracting GZIP-compressed .gz files and tarballs (files with .tar.gz or .tgz extensions). We showed that gzipped files can be extracted with the gzip and shutil modules, and that tarballs can be extracted with the tarfile module.

We also showed how to read the contents of compressed or archived files without extracting them to disk.