Update CSV file in Python

The pandas and csv libraries are the most common modules for handling CSV data. This article will discuss how to use these libraries to update a CSV file.

We will see how to edit a CSV file by inserting a row, updating a row (or rows) based on a column value(s), or deleting a row (or rows) based on the values of a given column. We will also see how to insert a column into a CSV file.

The examples shown in this article will use a CSV file named “employees.csv” with the following contents:

ID,Position,Salary,Joining Date,Grade
2001,Junior web developer,5600,8/27/2020,AB
2006,Accountant,7200,7/22/2019,BA
3014,Senior Data Scientist,7800,5/1/2013,AA
1946,HR Manager,6200,9/9/2021,DB
1947,HR Manager 3,6250,1/3/2016,DC

The file has six columns and five rows and is saved in the same directory as the script containing our code.

Note: Each code snippet shown will start with “employees.csv” containing the content above.

Inserting a Row into a CSV File

In csv library, we can easily write a row into a CSV file using csv.writer() object. In this case, we need to know the columns on the original data and their order.

Output (contents of employees.csv after update):

ID,Position,Salary,Joining Date,Grade
2001,Junior web developer,5600,8/27/2020,AB
2006,Accountant,7200,7/22/2019,BA
3014,Senior Data Scientist,7800,5/1/2013,AA
1946,HR Manager,6200,9/9/2021,DB
1947,HR Manager 3,6250,1/3/2016,DC
2008,Procurement officer,6100,2/8/2011,CA

The above code snippet opens the CSV file in appending mode (“a”). That means the row we are writing is added as the last row.

Alternatively, you can use pandas to add a row to a CSV file. There are two ways to do that.

  • Opening CSV file in append mode, then update it, or,
  • Using pandas.DataFrame.loc[ ] function.

The first method works like open() and csv.writer() in the previous example.

We need to open the CSV file in appending mode and update it with a new DataFrame containing the row(s) we wish to add. Here is an example.

Output:

ID,Position,Salary,Joining Date,Grade
2001,Junior web developer,5600,8/27/2020,AB
2006,Accountant,7200,7/22/2019,BA
3014,Senior Data Scientist,7800,5/1/2013,AA
1946,HR Manager,6200,9/9/2021,DB
1947,HR Manager 3,6250,1/3/2016,DC
1901,Procurement officer 2,6100,12/8/2010,CA
1902,HR,5700,12/8/2010,DA

In the second method, it is essential to get the concept of indexing correctly—the function pandas.DataFrame.loc uses the index to identify the position of the new row. If the index label does not exist, it is created; otherwise, the existing row values are replaced.

Output:

ID,Position,Salary,Joining Date,Grade
2001,Junior web developer,5600,8/27/2020,AB
2006,Accountant,7200,7/22/2019,BA
3014,Senior Data Scientist,7800,5/1/2013,AA
1946,HR Manager,6200,9/9/2021,DB
1947,HR Manager 3,6250,1/3/2016,DC
1963,Operations Manager,17000,10/3/2007,AS

Editing a Row Based on Conditions

This section discusses how to update a given row based on column values. Suppose we want to update the row where ID=1947 using csv module. We need to read the file row by row, update the row that matches the condition (ID=1947), and write the changes to the same file.

Reading and writing back the update to the same opened file is not recommended. Instead, it is a best practice to write the update into a temp file and use the contents of this file to update the original file after the update is complete. Here is a good example.

Output (stdout and the updated CSV file):

Updating row with ID= 1947
ID,Position,Salary,Joining Date,
2001,Junior web developer,5600,8/27/2020,
2006,Accountant,7200,7/22/2019,
3014,Senior Data Scientist,7800,5/1/2013,
1946,HR Manager,6200,9/9/2021,
1947,Manging Director,22000,9/24/2009,

And if you want to update a cell using pandas, then pandas.DataFrame.at should suffice. Here is an example.

Output:

ID,Position,Salary,Joining Date,Grade
2001,Junior web developer,5600,8/27/2020,AB
2006,Senior Accountant,7200,7/22/2019,BA
3014,Senior Data Scientist,7800,5/1/2013,AA
1946,HR Manager,6200,9/9/2021,DB
1947,HR Manager 3,6250,1/3/2016,DC

Deleting a Row in a CSV File Using pandas

We can delete a row in pandas by simply filtering a DataFrame based on some conditions. We need to load the CSV file into a pandas DataFrame, apply the filters, and then save the resulting DataFrame back to a CSV file. In the following example, we want to keep the rows with ID=1946 or 2001 only and discard the rest.

Output:
ID,Position,Salary,Joining Date,Grade
2001,Junior web developer,5600,8/27/2020,AB
1946,HR Manager,6200,9/9/2021,DB

Replace Cell Values in a CSV

The pandas.DataFrame.replace() function is used to replace cell values. The example below shows how to replace two values

Output:

ID,Position,Salary,Joining Date,Grade
2001,Web Developer,5600,8/27/2020,AB
2006,Accountant,7200,7/22/2019,BA
3014,Senior Data Scientist,7800,5/1/2013,AA
1946,Human Resource Manager,6200,9/9/2021,DB
1947,HR Manager 3,6250,1/3/2016,DC

As mentioned above, the replace() function replaces cell values and not substrings of the cell values. That is why the cell value “HR Manager” is replaced, but the substring “HR Manager 3” in the last row is not.

Insert Column Based on Condition(s)

In this case, we want to create a new column with values derived based on another column’s values. In particular, we want to create a column called “Above7000” which has the value “Yes” if Salary>7000 and “No” otherwise. We use NumPy to leverage the vectorization of the computation property.

Output:

ID,Position,Salary,Joining Date,Grade,Above_7000
2001,Junior web developer,5600,8/27/2020,AB,No
2006,Accountant,7200,7/22/2019,BA,Yes
3014,Senior Data Scientist,7800,5/1/2013,AA,Yes
1946,HR Manager,6200,9/9/2021,DB,No
1947,HR Manager 3,6250,1/3/2016,DC,No

The numpy.select(conditions, choicelist, default=default_value) returns an array drawn from elements in choicelist, depending on conditions. If no condition is met, the function returns the default_value given, otherwise 0.

Conclusion

This article discussed different ways of updating a CSV file using csv and pandas modules.

In most cases, we had to load the original CSV file, make the updates, and write the updated data into the file. When dealing with a CSV with a few rows, most methods discussed here are fast enough, but as the CSV file grows, some methods become slow.

In such a case, storing data in a database and updating it from there is recommended.