Read a CSV File From a URL in Python

Python can read CSV data directly from the web. This post shows three ways to load a CSV file from a URL: using the pandas, urllib, and requests packages.

Method A: Loading CSV from URL using Pandas

The read_csv() function can read CSV files directly from an online source. In the following example, the function loads CSV data from GitHub and stores it in a DataFrame df.
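A minimal sketch of this step follows. The URL constant is an assumption: it is the raw-content counterpart of the datasciencedojo Titanic file discussed in this post. The helper name load_csv_from_url is only illustrative.

```python
import pandas as pd

# Assumed raw-content URL for the Titanic dataset discussed in this post.
TITANIC_URL = (
    "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
)


def load_csv_from_url(url):
    """Return the CSV at `url` as a DataFrame (illustrative helper)."""
    # read_csv() accepts a URL wherever it accepts a file path.
    return pd.read_csv(url)


# Example call (needs network access):
# df = load_csv_from_url(TITANIC_URL)
# print(df.shape)
```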

Output (description):

A DataFrame of 891 rows and 12 columns.

When reading data from GitHub, make sure you read from the raw URL. For example, if you try to load the data from the page URL https://github.com/datasciencedojo/datasets/blob/master/titanic.csv, the load will fail, because that address serves an HTML page rather than the CSV itself. When you land on such a URL, open the raw content using the “Raw” button at the top of the file view. The address of that page, which begins with https://raw.githubusercontent.com/, is the correct URL to use.
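The same conversion can be done in code. The sketch below assumes the usual github.com layout of owner/repo/blob/branch/path. The function github_blob_to_raw is a hypothetical helper, not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url):
    """Convert a github.com "blob" page URL to its raw-content URL (hypothetical helper)."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: <owner>/<repo>/blob/<branch>/<path to file>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner, repo, _, *rest = segments
    raw_path = "/" + "/".join([owner, repo, *rest])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))


print(github_blob_to_raw(
    "https://github.com/datasciencedojo/datasets/blob/master/titanic.csv"
))
# → https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv
```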

CSV stands for Comma-Separated Values, but some CSV files use a different delimiter, such as a semicolon (“;”) or a tab (“\t”). To load such a file, pass the delimiter through the “sep” argument of the read_csv() function. Without it, read_csv() assumes commas and can fail to parse the file, for example:

Output:

ParserError: Error tokenizing data.

With the correct “sep” value, the same file loads as expected.

Output (description):

A data table of 78 rows and 16 columns
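The effect of “sep” can be seen without a download. The sketch below uses a small made-up semicolon-delimited sample, not the dataset from this post. With this sample the default delimiter does not raise an error but silently produces a single column. A ParserError can arise when rows contain differing numbers of commas.

```python
import io

import pandas as pd

# Made-up semicolon-delimited sample standing in for a downloaded file.
SAMPLE = "name;score\nAda;90\nLinus;85\n"

# With the default sep=",", each line is read as a single column.
wrong = pd.read_csv(io.StringIO(SAMPLE))

# Passing sep=";" splits the fields as intended.
right = pd.read_csv(io.StringIO(SAMPLE), sep=";")

print(wrong.shape)  # → (2, 1)
print(right.shape)  # → (2, 2)
```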

Method B: Reading CSV from URL using urllib

Python’s urllib package fetches data from URLs over several protocols. To connect to a URL and read its contents, use the urllib.request.urlopen() function (in Python 3, urlopen() lives in the urllib.request module).

Once the response is received, we can decode its bytes to text and pass the result to the csv.reader() function. The reader allows us to iterate through the CSV row by row.
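A minimal sketch of these two steps follows. The helper name read_csv_rows, the UTF-8 default, and the URL in the commented call are assumptions for illustration. The placeholder URL is not the dataset used in this post.

```python
import csv
import io
import urllib.request


def read_csv_rows(url, encoding="utf-8"):
    """Download the CSV at `url` and return its rows as lists of strings."""
    with urllib.request.urlopen(url) as response:
        # The response body is bytes, so decode it before parsing.
        text = response.read().decode(encoding)
    # newline="" lets the csv module handle line endings itself.
    return list(csv.reader(io.StringIO(text, newline="")))


# Example call with a placeholder URL (needs network access):
# rows = read_csv_rows("https://example.com/data.csv")
# print(len(rows))
```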

Output (description):

139 rows of data

Note that the first row is the header in most cases.
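When the first row is a header, it can be separated from the data rows before further processing. The sketch below uses made-up sample text in place of downloaded content and pairs each row with the column names.

```python
import csv
import io

# Made-up sample text standing in for downloaded CSV content.
SAMPLE = "name,score\nAda,90\nLinus,85\n"

reader = csv.reader(io.StringIO(SAMPLE, newline=""))
header = next(reader)  # the first row holds the column names
records = [dict(zip(header, row)) for row in reader]

print(header)      # → ['name', 'score']
print(records[0])  # → {'name': 'Ada', 'score': '90'}
```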

Method C: Loading CSV from an Online Source using requests and csv

Like urllib, the requests module can fetch CSV data from a URL. It is a third-party HTTP library with a simple API and convenient error handling.

The get() function in this module retrieves the response for a URL, and the content of that response can then be iterated line by line using the iter_lines() function.

The decoded lines are then parsed using the csv.reader() function, which allows us to iterate through the rows. Here is an example.
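The sketch below follows those steps under a few assumptions: the helper name fetch_csv_rows, the UTF-8 default, the 10-second timeout, and the URL in the commented call are illustrative choices, and the placeholder URL is not the dataset used in this post.

```python
import csv

import requests


def fetch_csv_rows(url, encoding="utf-8", timeout=10):
    """Fetch the CSV at `url` with requests and return its rows as lists of strings."""
    response = requests.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    # iter_lines() yields bytes by default, so decode each line for csv.reader().
    lines = (line.decode(encoding) for line in response.iter_lines())
    return list(csv.reader(lines))


# Example call with a placeholder URL (needs network access):
# rows = fetch_csv_rows("https://example.com/data.csv")
# print(len(rows))
```

Because iter_lines() splits on line breaks, this approach suits files whose quoted fields do not contain embedded newlines.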

Output (description):

41715 rows of data, including the header row

Conclusion

This article covered three ways of loading CSV data from an online source. If you are loading data from a GitHub repository, use the URL for the raw content. For other sources, make sure to get the correct link as well. A simple way to test the link is to click it. If the CSV starts downloading, hover over the link, right-click, and choose “Copy Link” to get the correct URL.