Ignoring Comments in a CSV File in Python

Comments in a CSV file are lines, or trailing parts of lines, introduced by a specific character. For example, the following CSV data, saved as employees-roles.csv, uses the “#” symbol to introduce comments.

# This data should be joined with employee names
# extraction started
employee_id,Role,Department
1,CEO,Management
2,CFO,Management
3,Managing director,Management# Officer
# Data point couldn't be fetched
5,Data analyst,Data
6,Software developer,Technology
# extraction completed
# this is the end

This article discusses two methods for removing commented content, such as lines 1, 2, 6 (partly commented), 7, 10, and 11 of the CSV data above.

Method 1: Using the Pandas Package

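The pandas.read_csv() function has a “comment” parameter that specifies the single character that introduces a comment. Everything from that character to the end of a line is skipped, and lines that begin with it are ignored entirely.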

You may need to install or upgrade pandas using pip with the following command:

pip install --upgrade pandas

Note: the code below was tested on pandas v2.0.2.

The following code can be used to remove comments from the CSV file given above.
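A minimal sketch of this approach, assuming “#” is the comment character. The sample data is inlined through io.StringIO (an assumption made so the snippet runs on its own); with the real file, the path employees-roles.csv would be passed to read_csv() instead.

```python
import io

import pandas as pd

# Sample data from the top of the article, inlined so the snippet is self-contained.
# With the file on disk: pd.read_csv("employees-roles.csv", comment="#")
CSV_TEXT = """\
# This data should be joined with employee names
# extraction started
employee_id,Role,Department
1,CEO,Management
2,CFO,Management
3,Managing director,Management# Officer
# Data point couldn't be fetched
5,Data analyst,Data
6,Software developer,Technology
# extraction completed
# this is the end
"""

# comment="#" drops everything from "#" to the end of each line;
# lines that start with "#" are skipped altogether.
df = pd.read_csv(io.StringIO(CSV_TEXT), comment="#")
print(df)
```

Because the commented lines before the header are skipped, the first uncommented line (employee_id,Role,Department) is used as the header row.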

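Output:

   employee_id                Role  Department
0            1                 CEO  Management
1            2                 CFO  Management
2            3   Managing director  Management
3            5        Data analyst        Data
4            6  Software developer  Technology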

The output above shows that all comments were ignored, including the trailing comment on the row at index 2 of the output. If you want to ignore only lines that are wholly commented, then Method 2 should serve the purpose.

Method 2: Using the csv Package

If you are using the csv package from the Python standard library to manipulate your CSV data, then this method is for you.

Let’s start with the case where we want to ignore only lines that are fully commented out.
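One way to do this, sketched here under the assumption that a commented line is any line whose first non-blank character is “#”, is to filter the lines before handing them to csv.reader(). The sample data is inlined through io.StringIO so the snippet runs on its own; with the real file, open("employees-roles.csv", newline="") would take its place.

```python
import csv
import io

# Sample data from the top of the article, inlined so the snippet is self-contained.
CSV_TEXT = """\
# This data should be joined with employee names
# extraction started
employee_id,Role,Department
1,CEO,Management
2,CFO,Management
3,Managing director,Management# Officer
# Data point couldn't be fetched
5,Data analyst,Data
6,Software developer,Technology
# extraction completed
# this is the end
"""

with io.StringIO(CSV_TEXT) as f:
    # Pass csv.reader only the lines that do not start with "#".
    data_lines = (line for line in f if not line.lstrip().startswith("#"))
    rows = list(csv.reader(data_lines))

for row in rows:
    print(row)
```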

Output:

['employee_id', 'Role', 'Department']
['1', 'CEO', 'Management']
['2', 'CFO', 'Management']
['3', 'Managing director', 'Management# Officer']
['5', 'Data analyst', 'Data']
['6', 'Software developer', 'Technology']

As shown in the output, the code above removed only the lines that were fully commented out; it did not strip the trailing comment from partly commented lines, such as line 4 of the output.

If you want to remove all commented content, the following code should suffice.
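A minimal sketch, assuming “#” never appears inside a legitimate (for example, quoted) field: each line is cut at the first “#”, and lines left empty by the cut are dropped before parsing. As before, the sample data is inlined through io.StringIO as a stand-in for the opened file.

```python
import csv
import io

# Sample data from the top of the article, inlined so the snippet is self-contained.
CSV_TEXT = """\
# This data should be joined with employee names
# extraction started
employee_id,Role,Department
1,CEO,Management
2,CFO,Management
3,Managing director,Management# Officer
# Data point couldn't be fetched
5,Data analyst,Data
6,Software developer,Technology
# extraction completed
# this is the end
"""

with io.StringIO(CSV_TEXT) as f:
    # Keep only the text before the first "#" on each line.
    stripped = (line.split("#", 1)[0].rstrip() for line in f)
    # Fully commented lines are now empty strings, so filter them out.
    rows = list(csv.reader(line for line in stripped if line))

for row in rows:
    print(row)
```

Note that this simple split would also truncate a quoted field that contains “#”, so it suits data where the comment character is reserved for comments.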

Output:

['employee_id', 'Role', 'Department']
['1', 'CEO', 'Management']
['2', 'CFO', 'Management']
['3', 'Managing director', 'Management']
['5', 'Data analyst', 'Data']
['6', 'Software developer', 'Technology']

You can also rewrite the code above, as shown below. This approach is particularly useful when you intend to read many CSV files and use one function to remove comments for each.
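A possible function-based version is sketched here. The names strip_comments and read_csv_rows are hypothetical helpers introduced for illustration, and the same assumption applies: “#” does not occur inside real field values.

```python
import csv
import io


def strip_comments(lines, marker="#"):
    """Yield each line with everything from `marker` onward removed,
    skipping lines that end up empty (hypothetical helper)."""
    for line in lines:
        content = line.split(marker, 1)[0].rstrip()
        if content:
            yield content


def read_csv_rows(source, marker="#"):
    """Parse an iterable of lines (such as an open file) into a list of rows,
    ignoring commented content (hypothetical helper)."""
    return list(csv.reader(strip_comments(source, marker)))


# Sample data from the top of the article, inlined so the snippet is self-contained.
CSV_TEXT = """\
# This data should be joined with employee names
# extraction started
employee_id,Role,Department
1,CEO,Management
2,CFO,Management
3,Managing director,Management# Officer
# Data point couldn't be fetched
5,Data analyst,Data
6,Software developer,Technology
# extraction completed
# this is the end
"""

rows = read_csv_rows(io.StringIO(CSV_TEXT))
for row in rows:
    print(row)
```

With files on disk, the same function could be called once per file, for example inside a with open("employees-roles.csv", newline="") as f: block, passing f as the source.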

Output:

['employee_id', 'Role', 'Department']
['1', 'CEO', 'Management']
['2', 'CFO', 'Management']
['3', 'Managing director', 'Management']
['5', 'Data analyst', 'Data']
['6', 'Software developer', 'Technology']

Conclusion

This article discussed using the pandas and csv packages to ignore comments when loading CSV files in Python. Method 1 (pandas) ignores comments anywhere in the file. Method 2 (csv) covers two cases: ignoring only the lines that are wholly commented out, and also stripping the commented part of lines that are partly commented.