Working with CSV in Python
CSV stands for Comma Separated Values. CSV files stores tabular data in plain text meaning you can create, view, and manipulate CSV files in any plain text editor. It is a portable format for creating, storing, and transforming data in a data-table like manner. Almost all the spreadsheet programs can create, edit, and view CSV files. There are many database systems that support exporting and importing data in CSV formats. A lot of consumer, business, educational and scientific applications support CSV.
CSV Format
A format describes a method of storing data in files or in any other medium. CSV files are plain text files but their format is what makes it different. Think of a CSV file as a table that is made up of rows and columns of data, and within the data are cells. In CSV the data is organized in the following ways.
- Every row is separated by a newline.
- Every piece of data in a cell is separated by a comma (that's why it is called Comma Separated)
- Every row can expand to any number of columns. Unlike some other data storing file formats you cannot specify a fixed amount of columns.
But there are some things to worry about:
- What if a data contains commas?
CSV format is not a strictly defined format. So, different implementation of CSV may choose different approaches. - Data in a cell can be put in quotation marks to store commas inside of the cell.
- But, what if the quoted data contains quotes too?
- We can use some escape sequence to escape that quotation marks.
- But, what if the data contains newlines?
Things get complicated very quickly. What do we do in that case? Unlike many other standardized proprietary or open file formats, the format of CSV is not strictly defined. Different implementations choose different ways of handling things and we call these different implementation dialects.
Data Types
Unlike many other tabular data storing formats, in CSV you cannot specify the data type of cells. Any type of data that can be encoded in plain text can be stored by CSV. So, if you want to declare some specific type of data in CSV you have to specify that yourself in your program, develop a custom program or script to handle it, or use some specialized program for handling it.
Encoding
CSV is a plain text format and thus encoding comes into play. We do not always store only ASCII characters. We may need to store data from any character set or language. But CSV does not specify any specific type of encoding. Its up to the application or up to you, how you store data and in which encoding. It is very common to use utf-8 as the encoding. It is also the recommended way of storing data in CSV and other commonly used plain text data such as HTML, XML, JSON, etc.
Handling CSV With Python
There are a lot of issues in handling such a simple plain text format. Without handling it by hand it is better to handle it in some programmatic ways. Python is great at handling CSV files. The module provided for working with CSV can also handle different dialects efficiently. In the standard library of Python you can find a module called csv for handling CSV files and data.
Getting Started With Code
To get started you need to have Python installed on your system. You can use any plain text editor, code editor or IDE to code with Python. I am going to use Python 3.6 and the IDLE provided with Python installation. If you are on Unix systems you may need to install it separately if you want to use it. I am going to create a file called py_csv.py, you are free to name it whatever you like.
import csv
Preparing a File for Reading or Writing as CSV
The function and classes inside the csv module do not directly open a csv file. Instead it requires an open file or file like object. This may not be a shortcut but it comes with a lot of flexibility as we can use any file like object to work with csv data. So, let's open a file for use with the csv reader.
Create a text file in your current working directory named person_data with .csv as extension and put some data inside it as follows:
Name,Age,Profession
Jhon,26,Marketer
Scott,29,Marketer
Issa,27,Marketer
Now write the code like below:
import csv
f1 = open("person_data.csv", "r", encoding="utf-8", newline="")
# Our code goes here
f1.close()
Hit run to see if everything is alright.
Reading CSV With CSV Reader
In the csv module there is a function called reader(), that returns an object for reading the data from the file object. It accepts the file object as the first parameter. The dialect keyword parameter is set to 'excel' by default. You can pass other keyword arguments as required.
import csv
f1 = open("person_data.csv", "r", encoding="utf-8", newline="")
reader = csv.reader(f1)
f1.close()
You can iterate over the reader object to get the rows from the csv file. Let's iterate over it and print out all the data we have.
import csv
f1 = open("person_data.csv", "r", encoding="utf-8", newline="")
reader = csv.reader(f1)
for row in reader:
print(row)
f1.close()
Output:
['Name', 'Age', 'Profession']
['Jhon', '26', 'Marketer']
['Scott', '29', 'Marketer']
['Issa', '27', 'Marketer']
See, the rows of data are returned as arrays. You can pull out your required data from the array now and process it accordingly.
Writing to CSV Files
Like reading from csv files we can write to csv files in a similar fashion. You need to have a file opened in writing mode or you can use any other file like object in place. I am going to open a file called person_data2.csv in the current working directory by opening it in write mode.
import csv
f1 = open("person_data2.csv", "w", encoding="utf-8", newline="")
f1.close()
To write data to the csv file you will have to write rows as an array of strings. Let's write few rows.
import csv
f1 = open("person_data2.csv", "w", encoding="utf-8", newline="")
writer = csv.writer(f1)
writer.writerow(['Sabuj', '28', 'Programmer'])
writer.writerow(['Greene', '65', 'Author'])
f1.close()
Hit run. Look at the current working directory. You will see a file named person_data2.csv, open it with a plain text editor. You will see data like the following:
Sabuj,28,Programmer
Greene,65,Author
Conclusion
The csv modules provide some other functions and classes. Look at the official Python reference to learn more about them. CSV is a great way of saving, viewing, and sending data. In your career you will encounter it a lot and luckily it is not hard to learn. So, keep practicing and soon you’ll have it mastered. You can pratice by storing all the tabular data you need in your everyday life with the help of Python and CSV.
Need to brush up on your Python? Check out our review of the best online Python courses.
And be sure to stop by the homepage to search and compare SDKs, Dev Tools, and Libraries.
Recent Stories
Top DiscoverSDK Experts
Compare Products
Select up to three two products to compare by clicking on the compare icon () of each product.
{{compareToolModel.Error}}
{{CommentsModel.TotalCount}} Comments
Your Comment