Two common ways to load CSV files in Python are pandas and the built-in csv module. In this article, we will cover how to skip the header row of a CSV file when reading it with either library. We will look at three methods:
- The next() function when loading CSV data using csv.reader(),
- csv.DictReader(),
- The skiprows parameter of the pandas.read_csv() function.
Example
In our examples in this article, we will use the “marks.csv” file, which contains the following content:
mark1,mark2,mark3,gender
12,15,15,M
15,15,16,M
13,13,12,F
10,9,8,
6,9,8,F
12,12,11,F
15,16,15,F
The file, “marks.csv,” is comma-delimited, and the first row contains the headers.
Method 1: Using the next() Function with csv.reader()
The next(iterator[, default]) function retrieves the next item from an iterator. If the iterator is exhausted, the default value is returned instead of raising StopIteration. Let’s see how to use this function with csv.reader() to skip the header row.
import csv

# Open marks.csv in read mode ("r")
with open("marks.csv", "r") as infile:
    # Create a csv reader iterator object
    csv_reader = csv.reader(infile, delimiter=",")
    # next() returns the next value from the iterator, which is
    # the header row. By ignoring the returned value,
    # we effectively skip the header row
    next(csv_reader, None)
    # Print all rows except the first row of the csv
    for row in csv_reader:
        print(row)
Output (truncated):
['12', '15', '15', 'M']
['15', '15', '16', 'M']
['13', '13', '12', 'F']
['10', '9', '8', '']
['6', '9', '8', 'F']
['12', '12', '11', 'F']
['15', '16', '15', 'F']
When we call next() on the csv reader object immediately after creating it, we get the header values. Since we do not want to use the header row, we do nothing with the value returned by next() (note that we did not even assign it to a variable).
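The role of the default argument is easiest to see on an empty input. The following is a minimal sketch, not part of the original example: split_header is a hypothetical helper, and in-memory text stands in for a real file such as marks.csv.

```python
import csv
import io


def split_header(lines):
    """Return (header, rows) from an iterable of CSV lines.

    Hypothetical helper for illustration only.
    """
    reader = csv.reader(lines)
    # next() consumes the first row; None is returned if there are no rows,
    # so an empty input does not raise StopIteration.
    header = next(reader, None)
    return header, list(reader)


# In-memory text used as a stand-in for an open file object.
header, rows = split_header(io.StringIO("mark1,mark2\n12,15\n15,16\n"))
empty_header, empty_rows = split_header(io.StringIO(""))

print(header, rows)
print(empty_header, empty_rows)
```

Keeping the value returned by next() instead of discarding it is handy when the column names are needed later, for example to label printed output.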
Method 2: Using csv.DictReader() Instead of csv.reader()
csv.DictReader(file, fieldnames=None, *args, **kwargs) operates like a regular csv.reader() but maps the information in each row to a dictionary whose keys are given by the optional fieldnames parameter. If fieldnames is not given, the values in the first row of the CSV file are used as the keys, which is what we want here. Here is an example:
import csv

# Open the csv file in read mode ("r")
with open("marks.csv", "r") as infile:
    # The following csv reader object will parse each csv row as a dictionary
    # with the header row values as the keys.
    csv_reader = csv.DictReader(infile, delimiter=",")
    # We can now iterate over the rows without ever handling the
    # header row ourselves - its values are only used as the dict keys
    for row in csv_reader:
        print(row["mark1"], row["mark2"], row["mark3"], row["gender"])
Output (truncated):
12 15 15 M
15 15 16 M
13 13 12 F
10 9 8
6 9 8 F
12 12 11 F
15 16 15 F
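One caveat follows from how fieldnames works: when fieldnames is passed explicitly, DictReader no longer consumes the first row, so the header line comes back as an ordinary data row and has to be skipped by hand. The sketch below illustrates this on in-memory text; the replacement key names m1, m2, m3, and sex are made up for the example.

```python
import csv
import io

# In-memory stand-in for a file with a header row (a shortened marks.csv).
CSV_TEXT = "mark1,mark2,mark3,gender\n12,15,15,M\n10,9,8,\n"

# Default behaviour: the first row supplies the dictionary keys.
default_reader = csv.DictReader(io.StringIO(CSV_TEXT))
default_keys = default_reader.fieldnames

# Explicit fieldnames (hypothetical names): the header line is now
# treated as data, so we skip it ourselves with next().
renamed_reader = csv.DictReader(
    io.StringIO(CSV_TEXT), fieldnames=["m1", "m2", "m3", "sex"]
)
next(renamed_reader, None)
renamed_rows = list(renamed_reader)

print(default_keys)
print(renamed_rows)
```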
Method 3: Using skiprows Parameter in pandas.read_csv()
When reading a CSV file in pandas, you can choose to skip some rows using the skiprows argument. You can pass an integer for the number of lines to skip at the start of the file, a list of the (0-based) line indices to skip, or a function that takes a line index and returns True for lines that should be skipped.
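The different forms that skiprows accepts can be compared side by side. This is a small sketch on made-up in-memory data (CSV_TEXT is not one of the article's files); it assumes pandas is installed.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV: one unwanted line, then the header, then data.
CSV_TEXT = "junk line\nmark1,mark2\n12,15\n15,16\n13,12\n"

# Integer: skip the first line, so the second line becomes the header.
df_int = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=1)

# List: skip the lines at indices 0 and 2 (the junk line and the first data row).
df_list = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=[0, 2])

# Callable: skip every line index for which the function returns True.
df_callable = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=lambda i: i == 0)

print(list(df_int.columns), len(df_int))
print(list(df_list.columns), len(df_list))
print(list(df_callable.columns), len(df_callable))
```

In all three calls the line indices refer to lines of the file, counted from 0 and including the header line.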
This method is particularly useful for cases where the header row is not the first line of the CSV file. The contents of “marks2.csv” shown below depict such a scenario.
0,1,2,3
mark1,mark2,mark3,gender
12,15,15,M
15,15,16,M
13,13,12,F
10,9,8,
6,9,8,F
12,12,11,F
15,16,15,F
In such a case, the pandas.read_csv() function will use the values in the first row as the column names. That is not what we want: the second line should be the header row. We can pass skiprows=1 to skip the first row as follows.
import pandas as pd

df = pd.read_csv("marks2.csv", skiprows=1)
print(df)
Output (formatted and truncated for better viewing): the resulting DataFrame uses mark1, mark2, mark3, and gender as its column names.
If you want to skip the header row in the “marks.csv” file, you can still use the skiprows argument. However, pandas will then use the next line as the column names. Let’s see an example.
import pandas as pd

# skiprows=1 skips the first row. You can also use skiprows=[0] to skip
# the row at index 0 (the first row of the file).
# Use skiprows=lambda x: x in range(0, 5) to skip the first 5 rows
# and skiprows=lambda x: x in range(0, 5, 2) to skip rows 0, 2 and 4
df = pd.read_csv("marks.csv", skiprows=1)  # or skiprows=[0]
print(df)
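Output (truncated and formatted for a better view): the first data row (12, 15, 15, M) is now used for the column names, and the remaining rows form the data.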
As mentioned earlier, pandas will use the next row as the column names after skipping the first row. If this is not what you want, you can convert the DataFrame into another data structure. In the following snippet, we read the file normally and convert the resulting DataFrame into a list of lists and a list of dictionaries, respectively.
import pandas as pd

df = pd.read_csv("marks.csv")
print(df.values.tolist())
print(df.to_dict(orient="records"))
Output (truncated):
[[12, 15, 15, 'M'], [15, 15, 16, 'M'], ..., [15, 16, 15, 'F']]
[{'mark1': 12, 'mark2': 15, 'mark3': 15, 'gender': 'M'}, {'mark1': 15, 'mark2': 15, 'mark3': 16, 'gender': 'M'}, …, {'mark1': 15, 'mark2': 16, 'mark3': 15, 'gender': 'F'}]
In the first case, we drop the header row by converting the values of our DataFrame, df, into a list of lists (this is similar to what we did in Method 1). Converting the DataFrame into a list of dictionaries matches what we did in Method 2: the header values are only used as the keys for accessing the values.
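Another option, not used in the snippets above, is to combine skiprows=1 with the header=None parameter of pandas.read_csv(). The header line is then skipped and pandas does not promote the next line to column names; it labels the columns with integers instead. A minimal sketch on in-memory text standing in for “marks.csv”, assuming pandas is installed:

```python
import io

import pandas as pd

# In-memory stand-in for the first few lines of marks.csv.
CSV_TEXT = "mark1,mark2,mark3,gender\n12,15,15,M\n15,15,16,M\n13,13,12,F\n"

# skiprows=1 drops the header line; header=None tells pandas that the
# remaining lines are all data, so columns get integer labels instead.
df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=1, header=None)

print(df.columns.tolist())
print(df.values.tolist())
```

This keeps every data row in the DataFrame, which is convenient when the original column names are not needed for later processing.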
Conclusion
We have discussed skipping the header row when processing CSV files with either the csv module or pandas. We saw that the next() function can effectively skip the header row when using csv.reader().
Alternatively, csv.DictReader() can be used to read CSV files as a dictionary and use the header values as keys. Lastly, we discussed how to use the skiprows parameter in pandas.read_csv() to skip some rows.
We also saw that pandas will still use the first row of the remaining data as the column names after skipping some rows. For that reason, we discussed how to convert a pandas DataFrame into a list of lists or a list of dictionaries to leave the header row out.