Reading CSV data

Last week I came across an article on Real Python about writing a simple CSV file reader using test-driven development (TDD) and then working with the data it reads. For a long time I’ve understood the value of TDD, but I wasn’t sure how to apply it meaningfully to code that reads data from external files. The article helped tremendously, and it arrived just in time for the very first step of my Quiffer project: reading financial transactions from a CSV file.

The sample data I’m using for this project comes from a PayPal transaction file download. Here are my first tests; the code I’ve written so far passes all of them:

import unittest
from quiffer import read_data

class QuifferTest(unittest.TestCase):

    def setUp(self):
        self.data = 'sample_data/paypal_sample.csv'

    def test_read_data_headers(self):
        self.assertEqual(
            read_data(self.data)[0],
            ['Date', 
             ' Time', 
             ' Time Zone', 
             ' Name', 
             ' Type', 
             ' Status', 
             ' Currency', 
             ' Gross', 
             ' Fee', 
             ' Net', 
             ' From Email Address', 
             ' To Email Address', 
             ' Counterparty Status', 
             ' Address Status', 
             ' Item Title', 
             ' Item ID', 
             ' Shipping and Handling Amount', 
             ' Insurance Amount', 
             ' Sales Tax', 
             ' Option 1 Name', 
             ' Option 1 Value', 
             ' Option 2 Name', 
             ' Option 2 Value', 
             ' Auction Site', 
             ' Buyer ID', 
             ' Item URL', 
             ' Closing Date', 
             ' Reference Txn ID', 
             ' Invoice Number', 
             ' Custom Number', 
             ' Receipt ID', 
             ' Balance']
            )

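    def test_read_data_name(self):
        # Row 1 is the first transaction; column 3 is ' Name'
        self.assertEqual(read_data(self.data)[1][3], 'Almech Devices, LLC')

    def test_read_data_name_chinese(self):
        # Non-ASCII names must survive decoding intact
        self.assertEqual(read_data(self.data)[3][3], '广州满翼易有限公司')

    def test_read_data_gross(self):
        # Column 7 is ' Gross'; values are still strings at this stage
        self.assertEqual(read_data(self.data)[1][7], '-2.75')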

if __name__ == '__main__':
    unittest.main()
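The leading spaces in the expected header names (' Time', ' Name', ' Gross') come from the file itself, and the data tests have to know that the name lives in column 3 and the gross amount in column 7. If looking columns up by name ever becomes more convenient, one option is to strip the header names and pair each row with them. This is a minimal sketch, not part of Quiffer: read_records is a hypothetical helper, and the three-column sample is made up, borrowing only the name and amount from the tests.

```python
import csv
import io


def read_records(csv_file):
    """Hypothetical helper: return one dict per data row, keyed by header name.

    Assumes the first row is the header and that its names may carry stray
    leading or trailing spaces, which strip() removes.
    """
    reader = csv.reader(csv_file)
    header = [name.strip() for name in next(reader)]
    # zip() pairs each header name with the value in the same column
    return [dict(zip(header, row)) for row in reader]


# Made-up three-column sample in the same spirit as the PayPal export
sample = io.StringIO(
    '"Date"," Name"," Gross"\n'
    '"1/2/2018","Almech Devices, LLC","-2.75"\n'
)
records = read_records(sample)
print(records[0]['Name'])   # Almech Devices, LLC
print(records[0]['Gross'])  # -2.75
```

The standard library’s csv.DictReader does the pairing for you, but it uses the header names exactly as written, so its keys would keep the leading spaces (' Name'). Also note that zip() silently drops extra values if a row is longer than the header.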

As you can see, I’m following right along with the Real Python article’s approach. Here’s the Quiffer code so far:

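import csv

def read_data(data):
    with open(data, encoding='utf-8', newline='') as csv_file:
        data = list(csv.reader(csv_file))
    return data  # a list of rows, each row a list of strings

# Only runs when quiffer.py is executed directly, not when the tests import it
if __name__ == '__main__':
    data = 'sample_data/paypal_sample.csv'
    paypal_data = read_data(data)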
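The encoding argument in read_data deserves a closer look. Without it, open() falls back to a default encoding that depends on the platform’s locale settings, so a UTF-8 export containing a name like '广州满翼易有限公司' can fail to decode. The sketch below reproduces that failure on any machine by naming 'ascii' explicitly as a stand-in for a non-UTF-8 default; the read_text helper and the temporary file are illustrative only.

```python
import os
import tempfile


def read_text(path, encoding):
    # Hypothetical helper: read a whole file with an explicit encoding
    with open(path, encoding=encoding, newline='') as f:
        return f.read()


# Write the Chinese merchant name from the tests to a temporary file as UTF-8 bytes
fd, path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'wb') as f:
    f.write('广州满翼易有限公司\n'.encode('utf-8'))

# 'ascii' stands in for a non-UTF-8 platform default; it cannot decode these bytes
ascii_error = None
try:
    read_text(path, 'ascii')
except UnicodeDecodeError as err:
    ascii_error = err
    print('ascii failed:', err.reason)

# Naming the file's real encoding decodes it correctly
decoded = read_text(path, 'utf-8')
print(decoded.strip())
os.remove(path)
```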

I added encoding='utf-8' to the open() call because without it, reading the file raised a UnicodeDecodeError. In the original article, line 5 read:

data = [row for row in csv.reader(f.read().splitlines())]

...but thanks to what I learned from Trey Hunner’s CSV Python chat, I simplified it to:

data = list(csv.reader(csv_file))
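Besides being shorter, handing the file object straight to csv.reader is also more robust. The Python csv documentation recommends opening CSV files with newline='' because a quoted field can legitimately contain a line break, and splitting the text into lines first (as f.read().splitlines() does) means that line break does not survive parsing. Here is a small comparison on a made-up two-row sample, using io.StringIO in place of a real file:

```python
import csv
import io

# Made-up sample: the second field of the data row contains a line break
text = 'Name,Note\n"Acme","line one\nline two"\n'

# Article's original approach: split into lines first, then parse
from_lines = [row for row in csv.reader(text.splitlines())]

# Simplified approach: let csv.reader iterate over the file-like object itself
from_file = list(csv.reader(io.StringIO(text, newline='')))

print(from_file[1])   # ['Acme', 'line one\nline two']
print(from_lines[1])  # the embedded line break is gone
```

PayPal’s export may never put a line break inside a field, so for this project the difference could be purely theoretical; it is still a good reason to prefer the simpler form.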

I’m sure I can add more tests to be more thorough, so that’s my next step. If anyone has suggestions, hit me up on Twitter.