Reading and Writing CSV files in Python
Comma-Separated Values (CSV) files are a common format for storing tabular data, such as data exported from spreadsheets or databases. Many applications and programming languages, including Python, can read and write CSV. In this guide, we cover how to read and write CSV files with Python's built-in csv module and with the pandas library, along with the formatting options you are most likely to need.
What is a CSV File?
A CSV file stores tabular data (numbers and text) as plain text. Each line in the file is a data record, and each record consists of one or more fields separated by commas. For example:
```
Name,Age,City
John,20,New York
Sarah,24,Chicago
```
The first row usually contains the column names or headers. The following rows contain the actual data values. CSV files are a popular format for exporting and importing data between applications.
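A field can itself contain a comma if it is wrapped in quotes, so splitting each line on commas is not a reliable way to parse CSV. The sketch below uses a made-up in-memory sample (via io.StringIO) to contrast a naive str.split() with csv.reader.

```python
import csv
import io

# Hypothetical sample: the City value contains a comma, so it is quoted.
sample = 'Name,Age,City\nJohn,20,"Portland, OR"\n'

# Naive approach: splitting on commas breaks the quoted field apart.
naive_rows = [line.split(",") for line in sample.splitlines()]
print(naive_rows[1])  # ['John', '20', '"Portland', ' OR"']

# csv.reader understands quoting and keeps the field intact.
csv_rows = list(csv.reader(io.StringIO(sample)))
print(csv_rows[1])  # ['John', '20', 'Portland, OR']
```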
Reading CSV Files in Python
Python's standard library includes the csv module for reading and writing CSV files.
To read a CSV file, we can call the csv.reader() function, which returns a reader object that iterates over the rows of the file and yields each row as a list of strings.
Here is an example code snippet to read a CSV file:
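```python
import csv

# newline='' is recommended by the csv docs so quoted newlines parse correctly.
with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)  # each row is a list of strings
```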
This opens the CSV file, creates a csv.reader object, and prints each row as a list of strings.
Here f is the file object returned by open('file.csv'); the with statement closes it automatically when the block ends.
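Because csv.reader returns every field as a string and treats the header like any other row, you usually need to skip the header and convert numeric columns yourself. A minimal sketch, assuming a file laid out like the Name,Age,City sample above (simulated in memory with io.StringIO):

```python
import csv
import io

# In-memory stand-in for file.csv; with a real file, use
# open("file.csv", newline="") instead.
data = io.StringIO("Name,Age,City\nJohn,20,New York\nSarah,24,Chicago\n")

reader = csv.reader(data)
header = next(reader)  # consume the first row: the column names

people = []
for name, age, city in reader:
    # Every field arrives as a str, so convert numeric columns explicitly.
    people.append((name, int(age), city))

print(header)  # ['Name', 'Age', 'City']
print(people)  # [('John', 20, 'New York'), ('Sarah', 24, 'Chicago')]
```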
We can also read CSV files using pandas, a third-party library (installed separately, for example with pip install pandas) that provides a higher-level interface:
```python
import pandas as pd

df = pd.read_csv("file.csv")
```
This reads the CSV into a pandas DataFrame, which provides powerful data analysis capabilities.
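Unlike csv.reader, read_csv uses the first row as column names and infers column types. The sketch below assumes pandas is installed and reads a small made-up sample from memory; read_csv accepts file-like objects such as io.StringIO as well as paths.

```python
import io

import pandas as pd

csv_text = "Name,Age,Location\nJohn,25,New York\nAlice,30,Los Angeles\nBob,22,Chicago\n"

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (3, 3): three data rows, three columns
print(list(df.columns))  # ['Name', 'Age', 'Location']
# Age was parsed as integers, so arithmetic works without conversion.
print(df["Age"].sum())   # 77
```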
For example, if 'file.csv' contains:
```
Name,Age,Location
John,25,New York
Alice,30,Los Angeles
Bob,22,Chicago
```
then the csv.reader loop shown earlier prints each row as a list of strings:
```
['Name', 'Age', 'Location']
['John', '25', 'New York']
['Alice', '30', 'Los Angeles']
['Bob', '22', 'Chicago']
```
Writing to CSV Files
To write data to a CSV file in Python, we use the csv.writer() function, which returns a writer object.
For example:
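```python
import csv

# newline='' stops an extra '\r' (blank lines) from appearing on Windows.
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Name', 'Age'])  # header row
    writer.writerow(['John', 20])     # non-strings are written via str()
    writer.writerow(['Sarah', 24])
```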
This opens output.csv for writing, creates a csv.writer, and calls its writerow() method once for each row of data.
The 'w' mode passed to open() opens the file for writing, creating it if it does not exist and overwriting any existing contents.
Here's what the 'output.csv' file would contain:
```
Name,Age
John,20
Sarah,24
```
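To write many rows at once, the writer also has a writerows() method, and it quotes any field that contains the delimiter. A sketch under these assumptions: the output goes to a temporary directory with an arbitrary file name (people.csv), and the file is read back to check the result.

```python
import csv
import os
import tempfile

rows = [
    ["Name", "Age", "City"],
    ["John", 20, "New York"],
    ["Sarah", 24, "Portland, OR"],  # contains the delimiter
]

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "people.csv")

# newline="" lets the csv module control line endings itself.
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)  # one call writes every row

with open(path, newline="") as f:
    text = f.read()
print(text)  # the Portland field is written as "Portland, OR" (quoted)

with open(path, newline="") as f:
    read_back = list(csv.reader(f))
print(read_back[2])  # ['Sarah', '24', 'Portland, OR']
```

Note that the integers come back as strings when read, since CSV stores everything as text.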
Similarly, pandas can write a DataFrame to a CSV file with its to_csv() method:
```python
df.to_csv("output.csv", index=False)
```
This saves the DataFrame df to output.csv without writing the row index as an extra column.
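To see what index=False changes, the sketch below (assuming pandas is installed; the DataFrame is made up) calls to_csv() without a path, in which case it returns the CSV text as a string instead of writing a file.

```python
import io

import pandas as pd

df = pd.DataFrame({"Name": ["John", "Sarah"], "Age": [20, 24]})

# With no path argument, to_csv() returns the CSV text as a string.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])     # ,Name,Age  (extra unnamed index column)
print(without_index.splitlines()[0])  # Name,Age

# Reading the indexed version back yields an extra "Unnamed: 0" column.
round_trip = pd.read_csv(io.StringIO(with_index))
print(list(round_trip.columns))  # ['Unnamed: 0', 'Name', 'Age']
```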
CSV File Attributes
When reading and writing CSV files, there are some important attributes to consider:
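- Delimiter - The character that separates fields in a row. It defaults to a comma but can be changed to another character, such as a semicolon or a tab.
- Quotechar - The character used to wrap fields that contain special characters such as the delimiter, the quote character itself, or a newline. It defaults to a double quote (").
- Escapechar - A character used to escape special characters, for example the delimiter when quoting is disabled (quoting=csv.QUOTE_NONE). It defaults to None, meaning no escape character.
- Header - Whether the first row holds column names. csv.reader has no header option and returns that row like any other; csv.DictReader uses it as field names, and pandas.read_csv treats it as the header by default (controlled by its header parameter).
The delimiter, quote character, and escape character can be passed as the keyword arguments delimiter, quotechar, and escapechar when creating a reader or writer with Python's csv module.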
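A short sketch of these options, assuming semicolon-delimited data with a quoted field (the sample is made up), and using csv.DictReader to handle the header row:

```python
import csv
import io

# Hypothetical semicolon-delimited sample; the quoted field holds a semicolon.
data = io.StringIO('Name;Age;Note\nJohn;20;"likes ; and ,"\nSarah;24;none\n')

# delimiter and quotechar are passed as keyword arguments.
# DictReader uses the first row as field names.
reader = csv.DictReader(data, delimiter=";", quotechar='"')
records = list(reader)

print(reader.fieldnames)   # ['Name', 'Age', 'Note']
print(records[0]["Note"])  # likes ; and ,

# Writing with a custom delimiter: fields containing it get quoted.
out = io.StringIO()
writer = csv.writer(out, delimiter=";")
writer.writerow(["Bob", "a;b"])
print(repr(out.getvalue()))  # 'Bob;"a;b"\r\n'
```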
When to Use CSV Files
CSV is a simple file format that works well for storing spreadsheet data or exported results from a database or analytics application.
The CSV format makes it easy to share data or import it into other applications. It is lightweight and widely supported, but it has no built-in notion of data types, nested structures, or relationships between tables.
For complex or nested data, a format such as JSON or a database is often a better fit. But for simple tabular data exchange, CSV is a ubiquitous and handy format.