Python provides capabilities to read CSV data directly from the web. This post shows how to load a CSV file from a URL in three ways: using the pandas, urllib, and requests packages in Python.
Method A: Loading CSV from URL using Pandas
The read_csv() function can read CSV files directly from an online source. In the following example, the function loads CSV data from GitHub and stores it in a DataFrame df.
import pandas as pd

url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
df = pd.read_csv(url)
Output (description):
A DataFrame of 891 rows and 12 columns.
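To confirm that the data loaded as expected, a quick sanity check is to print the DataFrame's shape and preview the first few rows. This is a minimal sketch using the df loaded above and standard pandas attributes and methods.

print(df.shape)   # expected: (891, 12)
print(df.head())  # preview the first five rows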
When reading data from GitHub, make sure you use the raw URL. For example, if you try loading the data directly from https://github.com/datasciencedojo/datasets/blob/master/titanic.csv, the download will fail. When you land on such a URL, click the “Raw” button at the top of the file preview to open the raw content; the resulting address is the correct URL, like the one used in the example above (see the sketch below for a programmatic shortcut).
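If you only have the regular GitHub page URL, you can also rewrite it into the raw form programmatically. The helper below is a hypothetical sketch (not part of pandas or GitHub's API); it simply swaps the github.com host for raw.githubusercontent.com and drops the /blob/ path segment.

def to_raw_github_url(blob_url):
    # hypothetical helper: convert a GitHub "blob" page URL to its raw-content URL
    return blob_url.replace("github.com", "raw.githubusercontent.com").replace("/blob/", "/")

print(to_raw_github_url("https://github.com/datasciencedojo/datasets/blob/master/titanic.csv"))
# https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv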
CSV stands for Comma-Separated Values, but the values of a CSV file are not always comma-delimited; other characters, such as a semicolon (“;”) or a tab (“\t”), are sometimes used instead. If you attempt to load a CSV that is not comma-separated, pass the “sep” argument to the read_csv() function. For example,
import pandas as pd

# CSV that is not comma-delimited
df1 = pd.read_csv("https://perso.telecom-paristech.fr/eagan/class/igr204/data/cereal.csv")
Output:
ParserError: Error tokenizing data.
import pandas as pd

# data is semicolon-separated
df1 = pd.read_csv("https://perso.telecom-paristech.fr/eagan/class/igr204/data/cereal.csv", sep=";")
Output (description):
A DataFrame of 78 rows and 16 columns.
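If you do not know the delimiter in advance, pandas can also infer it: passing sep=None together with engine="python" makes read_csv() use Python's csv.Sniffer to detect the separator. This is slower than specifying the delimiter explicitly, so treat the snippet below as a convenience sketch for exploratory work.

import pandas as pd

# let pandas sniff the delimiter (requires the slower Python parsing engine)
df1 = pd.read_csv(
    "https://perso.telecom-paristech.fr/eagan/class/igr204/data/cereal.csv",
    sep=None,
    engine="python",
)
print(df1.shape)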
Method B: Reading CSV from URL using urllib
Python’s urllib package is used to interact with and fetch URLs over various protocols. To connect to a URL and read its contents, use the urllib.request.urlopen() function.
Once the response is received, we can utilize the csv.reader() function to parse the received content. The reader allows us to iterate through the CSV row by row.
# Load packages
from urllib.request import urlopen
import csv
import codecs

# the URL
url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"

# fetch the source using urlopen()
response = urlopen(url)

# parse the fetched data using csv.reader()
# codecs.iterdecode() decodes the byte response into strings
csvfile = csv.reader(codecs.iterdecode(response, "utf-8"))

# loop through the rows
# enumerate() lets us index the iterable
for index, row in enumerate(csvfile):
    print(index, row)  # do something with row - note: the first row is the header
Output (description):
892 rows printed (891 data rows plus the header row).
Note that the first row is the header in most cases.
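If you want to process only the data rows, a common pattern is to pull the header off the reader with next() before looping. Here is a small, self-contained sketch of that approach using the same Titanic URL.

from urllib.request import urlopen
import csv
import codecs

url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
reader = csv.reader(codecs.iterdecode(urlopen(url), "utf-8"))

# pull the header off the iterator, then loop over the data rows only
header = next(reader)
print(header)  # column names

for row in reader:
    print(row)  # each row is a list of strings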
Method C: Use requests and csv to load CSV from an Online Source
Like urllib, the requests module can fetch CSV data from a URL. It is a straightforward HTTP library with enhanced error handling.
The get() function in this module retrieves the response from a URL, and the response content can then be iterated line by line using the iter_lines() method. The fetched lines are then parsed using the csv.reader() function, which allows us to iterate through the rows. Here is an example.
import requests
import csv
import codecs

# the URL - it is long, so we break it across lines using \ for better viewing
url = "https://www.stats.govt.nz/assets/Uploads/Annual-enterprise-survey/\
Annual-enterprise-survey-2021-financial-year-provisional/Download-data/\
annual-enterprise-survey-2021-financial-year-provisional-csv.csv"

# fetch the page source using requests.get()
res = requests.get(url)

# create an iterator over all lines of the response
lines_iterator = res.iter_lines()

# create a CSV reader object, decoding the content using the codecs module
data = csv.reader(codecs.iterdecode(lines_iterator, encoding="utf-8"), delimiter=",")

# loop through the rows in "data"
for index, row in enumerate(data):
    print(index, row)  # iterate through rows - note: the first row is the header
Output (description):
41715 rows of data, including the header row
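Since requests is noted above for its error handling, it is worth mentioning that a response object exposes a status_code attribute and a raise_for_status() method, which raises an HTTPError for 4xx/5xx responses. A minimal sketch of guarding the download before parsing:

import requests

url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"

res = requests.get(url)
res.raise_for_status()  # raises requests.HTTPError if the server returned 4xx/5xx

print(res.status_code)  # 200 on success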
Conclusion
This article covered three ways of loading CSV data from an online source. If you are loading data from a GitHub repository, use the URL for the raw content. For other sources, make sure you have the correct link as well. A simple way to test a link is to click it: if the CSV starts downloading, right-click the link and choose “Copy Link” to get the URL to use in your code.