Skip Header Row When Processing CSV in Python

Two common ways to load CSV files in Python are the pandas library and the built-in csv module. In this article, we will cover how to skip the header row of a CSV file when reading it with either of them. We will look at three methods:

  1. The next() function when loading CSV data using csv.reader(),
  2. csv.DictReader(),
  3. The skiprows parameter of the pandas.read_csv() function.

Example

The examples in this article use the “marks.csv” file, which contains the following content:

mark1,mark2,mark3,gender
12,15,15,M
15,15,16,M
13,13,12,F
10,9,8,
6,9,8,F
12,12,11,F
15,16,15,F

The file, “marks.csv,” is comma-delimited, and the first row contains the headers.

Method 1: Using the next() Function with csv.reader()

The next(iterator[, default]) function retrieves the next item from an iterator. If the iterator is exhausted, the default value is returned when one is given; otherwise, StopIteration is raised. Let’s see how to use this function with csv.reader() to skip the header row.
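The following is a minimal sketch of this approach. It recreates the sample file in a temporary directory so that it runs on its own; the temporary location and the names csv_path and rows are assumptions made for illustration.

```python
import csv
import tempfile
from pathlib import Path

# Recreate marks.csv in a temporary directory so this sketch runs on its own.
csv_path = Path(tempfile.mkdtemp()) / "marks.csv"
csv_path.write_text(
    "mark1,mark2,mark3,gender\n12,15,15,M\n15,15,16,M\n13,13,12,F\n"
    "10,9,8,\n6,9,8,F\n12,12,11,F\n15,16,15,F\n"
)

rows = []
with open(csv_path, newline="") as f:
    reader = csv.reader(f)
    # Consume the header row and discard it. Passing None as the default
    # avoids a StopIteration error if the file happens to be empty.
    next(reader, None)
    for row in reader:
        rows.append(row)
        print(row)
```

Every value comes back as a string, including the empty gender field in the fourth data row.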

Output (truncated):

['12', '15', '15', 'M']
['15', '15', '16', 'M']
['13', '13', '12', 'F']
['10', '9', '8', '']
['6', '9', '8', 'F']
['12', '12', '11', 'F']
['15', '16', '15', 'F']

Calling next() on the csv reader object immediately after creating it returns the header row. Since we do not need the header, we simply discard the returned value (note that we do not even assign it to a variable).

Method 2: Using csv.DictReader() Instead of csv.reader()

csv.DictReader(file, fieldnames=None, *args, **kwargs) works like a regular csv.reader() but maps each row to a dictionary whose keys are given by the optional fieldnames parameter. If fieldnames is omitted, the values in the first row of the CSV file are used as the keys, which is exactly what we want here. Here is an example:
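A minimal sketch of reading the sample file with csv.DictReader() might look like this. As before, the file is recreated in a temporary directory so that the snippet is self-contained; csv_path and records are illustrative names.

```python
import csv
import tempfile
from pathlib import Path

# Recreate marks.csv in a temporary directory so this sketch runs on its own.
csv_path = Path(tempfile.mkdtemp()) / "marks.csv"
csv_path.write_text(
    "mark1,mark2,mark3,gender\n12,15,15,M\n15,15,16,M\n13,13,12,F\n"
    "10,9,8,\n6,9,8,F\n12,12,11,F\n15,16,15,F\n"
)

records = []
with open(csv_path, newline="") as f:
    # No fieldnames given, so the first row supplies the dictionary keys
    # and is not returned as a data row.
    reader = csv.DictReader(f)
    for row in reader:
        records.append(row)
        print(row["mark1"], row["mark2"], row["mark3"], row["gender"])
```

The header row never appears among the data rows; it only shows up as the keys of each dictionary.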

Output (truncated):

12 15 15 M
15 15 16 M
13 13 12 F
10 9 8 
6 9 8 F
12 12 11 F
15 16 15 F

Method 3: Using skiprows Parameter in pandas.read_csv()

When reading a CSV file with pandas, you can skip rows using the skiprows argument. Pass an integer to skip that many lines from the start of the file, or a list of 0-based line indices to skip specific lines.
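To illustrate the two forms of skiprows, here is a small sketch that reads from an in-memory string instead of a file. The data, including the leading "junk line", is made up purely for this illustration.

```python
import io

import pandas as pd

# Hypothetical data: one unwanted line, then a header, then three data rows.
data = "junk line\nmark1,mark2\n12,15\n15,15\n13,13\n"

# An integer skips that many lines from the start of the input.
by_count = pd.read_csv(io.StringIO(data), skiprows=1)

# A list names the 0-based line indices to skip; [0] is equivalent to 1 here.
by_index = pd.read_csv(io.StringIO(data), skiprows=[0])

# A list can also skip lines that are not at the start (line 3 is "15,15").
selected = pd.read_csv(io.StringIO(data), skiprows=[0, 3])

print(by_count.equals(by_index))
print(selected)
```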

This method is particularly useful for cases where the header row is not the first line of the CSV file. The contents of “marks2.csv” shown below depict such a scenario.

0,1,2,3
mark1,mark2,mark3,gender
12,15,15,M
15,15,16,M
13,13,12,F
10,9,8,
6,9,8,F
12,12,11,F
15,16,15,F

In such a case, the pandas.read_csv() function would by default use the values in the first row as the column names, which is not what we want. We want the second line to be the header row, so we pass skiprows=1 to skip the first line as follows.
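A minimal sketch of this call, assuming the “marks2.csv” content shown above (recreated in a temporary directory so the snippet runs on its own):

```python
import tempfile
from pathlib import Path

import pandas as pd

# Recreate marks2.csv in a temporary directory so this sketch runs on its own.
csv_path = Path(tempfile.mkdtemp()) / "marks2.csv"
csv_path.write_text(
    "0,1,2,3\nmark1,mark2,mark3,gender\n12,15,15,M\n15,15,16,M\n13,13,12,F\n"
    "10,9,8,\n6,9,8,F\n12,12,11,F\n15,16,15,F\n"
)

# Skip the first line ("0,1,2,3"); the second line becomes the header.
df = pd.read_csv(csv_path, skiprows=1)
print(df.head())
```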

Output (formatted and truncated for better viewing):

If you want to skip the header row of the “marks.csv” file, you can still use the skiprows argument. However, pandas will then treat the next line, which is the first data row, as the column names. Let’s see an example.
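The sketch below shows this behaviour on the “marks.csv” content, recreated in a temporary directory. It also shows header=None, a read_csv option that this article does not otherwise cover, as one possible way to keep every remaining line as data.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Recreate marks.csv in a temporary directory so this sketch runs on its own.
csv_path = Path(tempfile.mkdtemp()) / "marks.csv"
csv_path.write_text(
    "mark1,mark2,mark3,gender\n12,15,15,M\n15,15,16,M\n13,13,12,F\n"
    "10,9,8,\n6,9,8,F\n12,12,11,F\n15,16,15,F\n"
)

# The real header is skipped, so the first data row is taken as the header.
df = pd.read_csv(csv_path, skiprows=1)
print(list(df.columns))  # ['12', '15', '15.1', 'M']
print(len(df))           # 6

# Not covered in the article: header=None tells pandas there is no header
# line, so all 7 data rows are kept and the columns are numbered 0 to 3.
no_header = pd.read_csv(csv_path, skiprows=1, header=None)
print(len(no_header))    # 7
```

Note that pandas renames the duplicated label 15 to 15.1 and that one data row is lost to the header.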

Output (truncated and formatted for a better view):

As noted earlier, pandas uses the first remaining row as the column names after skipping rows. If this is not what you want, you can read the file normally and convert the DataFrame into another data structure. In the following snippets, we convert the original pandas DataFrame into a list of lists and a list of dictionaries, respectively.
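One way to do both conversions is sketched below, using DataFrame.values.tolist() and DataFrame.to_dict("records"). The names as_lists and as_dicts are illustrative, and the file is again recreated in a temporary directory.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Recreate marks.csv in a temporary directory so this sketch runs on its own.
csv_path = Path(tempfile.mkdtemp()) / "marks.csv"
csv_path.write_text(
    "mark1,mark2,mark3,gender\n12,15,15,M\n15,15,16,M\n13,13,12,F\n"
    "10,9,8,\n6,9,8,F\n12,12,11,F\n15,16,15,F\n"
)

# Read the file normally, so the first line is used as the header.
df = pd.read_csv(csv_path)

# List of lists: only the data rows, without the header values.
as_lists = df.values.tolist()
print(as_lists)

# List of dictionaries: header values are kept only as keys.
as_dicts = df.to_dict("records")
print(as_dicts)
```

Unlike the csv module, pandas parses the marks as integers, and the missing gender value in the fourth data row appears as nan rather than an empty string.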

Output (truncated):

[[12, 15, 15, 'M'], [15, 15, 16, 'M'], ..., [15, 16, 15, 'F']]
[{'mark1': 12, 'mark2': 15, 'mark3': 15, 'gender': 'M'}, {'mark1': 15, 'mark2': 15, 'mark3': 16, 'gender': 'M'}, ..., {'mark1': 15, 'mark2': 16, 'mark3': 15, 'gender': 'F'}]

In the first case, we drop the header row by converting the values of our DataFrame, df, into a list of lists (this is similar to what we did in Method 1). Converting the DataFrame into a list of dictionaries matches what we did in Method 2: the header values are only used as keys for accessing the values.

Conclusion

We have discussed how to skip the header row when processing CSV files with either the csv module or pandas. We saw that the next() function effectively skips the header row when using csv.reader().

Alternatively, csv.DictReader() can be used to read each row of a CSV file as a dictionary, with the header values as keys. Lastly, we discussed how to use the skiprows parameter of pandas.read_csv() to skip some rows.

We also saw that pandas still uses the first remaining row as the column names after skipping rows. For that reason, we discussed how to convert a pandas DataFrame into a list of lists or a list of dictionaries to leave out the header row.