XLSX is a file extension for Microsoft Excel spreadsheets, while CSV is a Comma-Separated Value file.
This article discusses using Python to convert XLSX into CSV using two methods.
- Method 1: Using the pandas package and,
- Method 2: Using openpyxl and csv modules.
We will use the employees.xlsx Excel with two worksheets – names and roles. See the Figure below.
The objective is to learn how to use the two methods stated above to convert any or all of the sheets in the XLSX file into CSV.
Method 1: Using pandas Package
This method involves reading the XLSX file into pandas DataFrame using pandas.read_excel() function and then write the DataFrame into a CSV file using DataFrame.to_csv().
For this method, you may need to install pandas and openpyxl packages using pip as follows:
pip install openpyxl
pip install pandas
Let’s see an example.
1 2 3 4 5 6 7 8 9 10 |
# You may need to install pandas and its dependency - openpyxl and pandas # using pip or conda # pip install openpyxl # pip install pandas import pandas as pd # Change the path of the XLSX file accordingly. df_xlsx = pd.read_excel("employees.xlsx") # Convert the active sheet on the XLSX into employees.csv df_xlsx.to_csv("employees.csv", index=False) |
The code snippet above converts the first sheet only. You can also specify the XLSX worksheet you want to load and convert.
1 2 3 4 5 6 |
# Convert specific sheet on XLSX into CSV import pandas as pd # For pandas < 0.21.0, use sheetname argument, not sheet_name. df_xlsx = pd.read_excel("employees.xlsx", sheet_name="roles") df_xlsx.to_csv("employees_roles.csv") |
Lastly, you can implement a for-loop to convert each sheet into a CSV. We can do that as follows.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
import pandas as pd # Convert all sheets through a for-loop. # Create xlsx_file handler filepath = "employees.xlsx" xlsx_file = pd.ExcelFile(filepath) # List of all sheet names sheets = xlsx_file.sheet_names # Loop through each sheet, load it, and convert it to CSV. for sheet in sheets: # read the worksheet on the XLSX df = pd.read_excel(filepath, sheet_name=sheet) # Convert the sheet into CSV naming it like the worksheet. df.to_csv(f"{sheet}.csv", index=False) |
Method 2: Using openpyxl and csv packages
This method involves opening the XLSX file and writing its content into a CSV row by row. If the openpyxl package is not installed, you can do that using pip by running the following command line.
pip install openpyxl
The following code snippet shows how to convert the first worksheet (or any other sheet) on the XLSX file into CSV.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
import openpyxl import csv # Load in the workbook workbook = openpyxl.load_workbook(filename="employees.xlsx") # Grap the active worksheet - the first sheet, by default. worksheet = workbook.active # use wb[<sheet>] to get specific sheet # eg wb["roles"] # Create the CSV file and write the rows of the XLSX file into it. with open("results.csv", "w", newline="") as infile: c = csv.writer(infile) # Loop through each row of the XLSX file and write the result into CSV for row in worksheet.rows: c.writerow([cell.value for cell in row]) |
Like in Method 1, we can convert all the sheets on the XLSX file into CSV through a for-loop, as shown below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
# Convert all sheets from openpyxl import load_workbook import csv # Load in the workbook workbook = load_workbook(filename="employees.xlsx") # get all sheets on the XLSX file. sheets = workbook.sheetnames # Loop through each sheet and convert it to CSV for sheet in sheets: # Create the CSV file and write the rows to it with open(f"{sheet}.csv", "w", newline="") as infile: c = csv.writer(infile) # Loop all rows of a given sheet. for row in workbook[sheet]: c.writerow([cell.value for cell in row]) |
Conclusion
This article discussed two methods of converting XLSX to CSV in Python: using pandas and openpyxl.
You can choose one of the methods based on the task at hand or the data size.
If you are dealing with many data manipulation tasks, you can go for pandas because it is a great tool for that purpose. Otherwise, if you need to read and write Excel files and maintain Excel format, you should use openpyxl.
Note also that the method using pandas is slightly faster than openpyxl when converting a large XLSX into CSV.