Making Dates Behave

[UPDATE: 2016-07-05: Kudos to me for re-inventing the wheel..again!  Someone pointed out to me that there is already a module that does this, and it does it much better than me. So, ignore all the code below, and go get dateutil instead! ]

My financial management SQLite database, for which I hope to eventually design a Python/Qt GUI front end, relies on data downloaded from various financial institutions. My current focus is developing a Python application to automatically import and correctly format this data so that it can be appended to my database.

A huge annoyance in this process is that different institutions export dates in different formats. For example, I deal regularly with PayPal, Ebay and Amazon. PayPal’s downloaded transactions have their dates like “7/29/2015.” Ebay’s are “Jul-29-15” (why on earth is ANYONE still using two digit years???) Amazon’s are “Dec 9, 2015.” Appending dates to my database, however, requires them to be in the increasingly standard, reasonable, logical format, “2015-12-17”.

Python to the rescue! Here is my solution: a function called “date_fixer.” Give it a date in any of the above three formats, and it spits out the date as “YYYY-MM-DD”. Detailed documentation and explanation are in the comments. I’m sure it’s not coded as efficiently as it could be, but at this point, I only care that it works! Hope others find it useful as well. Cheers!

def date_fixer(datestring):
    Converts a date, 'datestring', to YYYY-MM-DD (ISO 8601) format.
    'datestring' must be in month, day, year order. The month, day and year
    can be separated by a space, "/" or "-".  The month can be either a number 
    from 1 to 12, the full month name or the firt 3 letters of the month name.   
    Day must be in the range 1 - 31. Year can be either two or four digits. 
    Examples: "12/17/2015", "December 17, 2015", "12-17-15", "12-17-2015",
    "1-2-12", "Dec 17 15", "Dec 17, 2015".  

    from datetime import date
    # dict for looking up month number by abbreviation
    months = {"JAN":1,"FEB":2,"MAR":3,"APR":4,"MAY":5,"JUN":6,
        # determine which character separates m, d, y
        if datestring.find(" ") != -1:
            delim = " "

        elif datestring.find("/") != -1:
            delim = "/"
        elif datestring.find("-") != -1:
            delim = "-"
            return "Error: '{}' is not a valid date string.".format(datestring)

        # Get the position of the first occurance of the delimiter    
        delim1 = datestring.find(delim)
        delim1 = datestring.find(delim)

        # Get the position of the 2nd occurance of the delimiter
        delim2 = datestring[delim1+1:].find(delim)
        delim2 = delim1 + delim2 + 1

        # the month is the segment of datestring that is from 
        # 0 to the 1st delimiter occurance.
        month = datestring[:delim1]

        # month is all or part of the month name. 
        if not month.isnumeric():
            # get the 1st three letters of 'month' & look up in 'months' dict.
            month_abr = month[0:3]
                # look up month abbreviation in months dict
                month = months[month_abr.upper()]
            except KeyError:
                return "Error: '{}' is not a valid month.".format(month)
        month = int(month)
        if month > 12 or month < 1:
            return "Error: '{}' is not a valid month.".format(str(month))
        # the day is the segment between the two occurances of the delimiter.
        day = datestring[delim1+1:delim2]

        # strip out the comma if there's one after the day.
        if day[len(day)-1] == ",":
            day = day[:len(day)-1]

        day = int(day)
        if day > 31:
            return "Error: '{}' is not a valid day.".format(str(day))
        year = ""
        # check for consective numerals in the segment after the 2nd delimiter.
        for char in range(delim2+1,len(datestring)):
            # adds each consective numeral to 'year.'
            # stops when a non-numeric character is encountred. 
            if datestring[char].isnumeric:  
                year = year + datestring[char]
        if int(year) < 100:  # year is 2 digits
            if int(year) <50:
                year = int(year) + 2000  # assume year is in 21st century
                year = int(year) + 1900 # assume year is in 20th century
        year = int(year)
        return date(year, month, day).isoformat()
    except ValueError:
        return ("Error: Unable to process datestring '{}'.".format(datestring))

From knowledge import improvement

2014-04-02 UPDATE: The only reason ‘from tools.DbSession..’ (see below) worked was because ‘tools’ was a subdirectory of the directory I was in (‘projects’). Had NOTHING do to with the PATH variable. I FINALLY found the solution!

In my post about importing Python modules, I wrote that I preferred setting the PATH variable in my code instead of changing my computer’s PATH variable. With so much of my code being new, that did not seem like much of a burden. It was, however, pointed out to me that it could become a major hassle down the road, if I have tons of scripts set to a certain path using sys.path.append(). They would all need to be updated if used on a computer with a different directory structure, or even if I just moved my code around on my current computer. I had previously tried working with the PATH setting without success, so for the time being it was easier to specify it in code.

Now, I am happy to report, I have discovered the better way.

I have a class called ‘DbSession’ in the module ‘’ I keep it, and other modules that I want to share among all my projects, in a directory called ‘tools.’ Project-specific code is kept in project folders which reside in the same parent directory as does ‘tools.’

Here’s how I figured it out how to make ‘DbSession’ available to all:

First,  I added the ‘tools’ directory to my system PATH variable and confirmed it was correct. Here is is displayed in a command prompt window, including my ‘tools’ directory:

echo %PATH%
C:\Documents and Settings\Rob\My Documents\Dropbox\projects\tools;

I run Python from the ‘tools’ directory and confirm the import works, without errors (that is, it does nothing but put me back to the Python prompt), from there:

>>>from DbSession import DbSession

I run Python from a different directory (‘projects’) and retry the import. It chokes:

>>>from DbSession import DbSession
Traceback (most recent call last):
File “”, line 1, in
ImportError: No module named ‘DbSession’

Huh? I thought I had the ‘tools’ directory in my PATH variable, so shouldn’t Python be able to see what’s in it?

Behold, the kicker:

>>>from tools.DbSession import DbSession

And there we have it! Apparently, when a directory for modules is specified in PATH, the name of that directory must also be specified in the import statement when attempting to access those modules.

Now, at last, I’ve found the right path to the right use of PATH.

Importance of importing, without imploding

2014-04-02 UPDATE: I found a a much better way to solve this problem.

I have spent the better part of a day, and several rounds of digging at Stackoverflow, to figure out the solution to this issue. It was so tricky (for my middle-aged brain at least) that I want to share it with the world in as simple and straightforward a way as possible in the hope I save some fellow newbies some aggravation.

The more I code, and the more problems I solve with code, the more I don’t want to have to solve the same problem more than once. I’ve tasted the elegant efficiency of re-using code. Once I have a piece of code “perfected” I want to put it in a directory that I can access from any of my projects. How can I share code among several projects in different directories? Here’s my example:

I have two project directories and a directory called “tools” that contains a module “”. All three directories are subdirectories of ‘Projects.’ “’ contains a function also called “coolstuff.” I want to be able to import and use the “coolstuff” function in either of my project directories.


Contents of

def coolstuff():
    This is a cool function you should use often.
    print ("Cool function activated!")

I want to be able to call “coolstuff()” from either “” or “” Obviously I need an import statement. I started off with just ‘import coolstuff’, but that didn’t work. I’ll spare you the gory details of what I went through to find this solution, but suffice it to say, it involved just about every version of “from … import ….”, “import….”, etc., so I’ll just skip straight to the answer:

Contents of ‘’:

import sys
from coolstuff import coolstuff

The “sys” module allows run-time addition to the “PATH” environmental variable using the ‘sys.path.append’ statement. This allows the ‘import’ statement to find ‘tools’ and whatever is in it.

Since the function I want is ALSO named ‘coolstuff’, I have to reference it to the right of the import statement. When I finally got the path issue solved so that the statement ‘import coolstuff’, by itself, went error free, I got a ‘module object is not callable’ error when I tried to actually use the function. Turns out, of course, that the FILE ‘coolstuff’ is the ‘module object’, and the FUNCTION ‘coolstuff’ inside the FILE ‘coolstuff’ has to be called in order to work, hence: ‘from coolstuff import coolstuff.’

For the time being, I prefer to name my modules with the same name as the functions they contain, so that one glance in the ‘tools’ directory is all I need to find the function I want. I prefer to set the “PATH” variable at runtime so I don’t have to monkey with my PC’s environmental settings manually ever time I move code to a different folder.

Hope this clears things up for anyone who was similarly vexed.

DataTools: A Hopefully Useful Toolbox

This set of tools has evolved from CsvChamp and CsvChamp3. Its purpose is to read data from a typical *.csv file, with the first row as column headers, into a Python dictionary. Once in dictionary form, the data can then be formatted for output to a standard text file for viewing and reporting, or uploading to applications (such as payroll processing sites) that require text files with precisely spaced columns of data. The file is loaded into the dictionary using the CsvDict tool. The user can then view the column names (DisplayAllFields), select the desired fields for the output file (SelectFields), and finally, create a nice, neat text file (OutputText). The tools could be modified to display columns of data on a screen or in a GUI element instead of a text file.

This will likely be the final version of my *.csv tinkering. I’m itching to get back to Tkinter and databases which is my ultimate goal. Here is a preview of coming attractions in that regard.

“DataTools” was written in Python 3. Since it’s so rudimentary I didn’t bother version-controlling it or putting it up on Github. So, here it is…enjoy!

# Data Tools 1.0
# by Rob Fowler
# Last Update: 2014-03-05
# Python 3.3.2

# A set of tools for getting and manipulating data from a *.csv file.

import csv
import tkinter
from tkinter import filedialog
import subprocess
import os

# Define exceptions
class DataToolsError(Exception): pass
class WidthError(DataToolsError): pass
class AlignmentSpecError(DataToolsError): pass
class ColumnError(DataToolsError): pass
class FileError(DataToolsError): pass

def GetFile(op):
    Prompts the user to select a file for input or output.
    op = "o" (open a text file) or "s" (save a file).
    Returns a file path for opening or saving.
    root = tkinter.Tk()

    if op.lower() == "o":
        file_path = filedialog.askopenfilename()
        return file_path

    elif op.lower() == "s":
        file_path = filedialog.asksaveasfilename()
        return file_path

        raise(FileError,"Valid arguments are 'o' (open) or 's' (save).")

def CsvDict(file_path):
    Reads each row of the *.csv file into its own
    dictionary. First row of the *.csv file must be field names. Function adds each row as a sub dictionary, using a unique integer row ID as the key for each row.
    Row ID 0 is added as a specification row, providing 
    instructions on how each column is to be output:
        'order': The order in which the column is output.
        'align': How the column is to be aligned within the space
                 allotted to it. 1=left, 2=center, 3=right.
        'display': 1=display column, 0=skip column
    *.csv file:

    delimiter = ','
    quote_character = '"'

    # Read each row into a dictionary.
    file_data = csv.DictReader(open(file_path, "rt"),delimiter=',',

    field_names = file_data.fieldnames

    # Create the new dictionary that will serve as the data source.
    data_dict = {}

    # Create data spec in 1st row and add column order and display & alignment
    # defaults.
    data_dict[0] = {}
    for field_order in range(0,len(field_names)):
        data_dict[0][field_names[field_order]]["order"] = field_order
        data_dict[0][field_names[field_order]]["align"] = 1
        data_dict[0][field_names[field_order]]["display"] = 1

    # Add a unique integer ID as a key for each row, and add the rows as
    # sub dictionaries of data_dict.
    count = 1
    for row in file_data:
        data_dict[count] = row
        count += 1

    # Get number of columns in 1st row of data
    cols = len(data_dict[1])

    for key in data_dict:
        if len(data_dict[key]) != cols:
            raise ColumnError("Data rows don't have same number of columns.")

    for field_name in data_dict[0]:
        width = FieldMaxLen(data_dict, field_name) + 1
        data_dict[0][field_name]["width"] = width

    return data_dict

def ColAlign(text, width, align):
    Returns text aligned within column according to alignment specified.
        text: the text to be aligned
        width: total with of column including blank spaces
        align: how the text is to be aligned within the given 
               column width. 1=left, 2=center, 3=right. 
    # Width must be at least 1 more tha len(text) so there will be at least
    # one space between columns.
    if width <= len(text):
        raise WidthError("Invalid width. Must be > length of text.")

    # Padding = total number of blank spaces
    padding = width - len(text)

    if align == 1: # left
        aligned = text + (padding * " ")

    elif align == 3: # right
        aligned = (padding * " ") + text

    elif align == 2: # centered
        # Divide padding by 2 and round down to nearest integer
        spaces_before = int(padding/2)

        # Put the rest of the spaces after the text
        spaces_after = padding-spaces_before

        aligned = (spaces_before * " ") + text + (spaces_after * " ")

        raise AlignmentSpecError("invalid alignment specification.")

    return aligned

def DisplayAllFields(data_dict):
    Lists fields and their order, width and alignment specifications.
    for display_order in range(0,len(data_dict[0])):
        for field_name in data_dict[0]:
            if data_dict[0][field_name]["order"] == display_order:
                print(str(data_dict[0][field_name]["order"]) + ": " +

def DisplaySelected(data_dict):
    Lists fields with display set to 1 and their order, width and align specifications.
    for display_order in range(0,len(data_dict[0])):
        for field_name in data_dict[0]:
            if data_dict[0][field_name]["order"] == display_order\
                    and data_dict[0][field_name]["display"] == 1:
                print(str(data_dict[0][field_name]["order"]) + ": " +

def UnselectAll(data_dict):
    Sets display = 0 on all fields.
    for field_name in data_dict[0]:
        data_dict[0][field_name]["display"] = 0

def SelectAll(data_dict):
    Sets display = 1 on all fields.
    for field_name in data_dict[0]:
        data_dict[0][field_name]["display"] = 1

def SelectFields(data_dict):
    Prompts user to select fields to display, by number.  Use
    DisplayAllFields to obtain field numbers.
    fields_input = input("Enter list of field numbers to display: ")
    field_list = fields_input.split(',')
    fields_int = []
    for num in field_list:


    # Find the field number and set its display flag to 1
    for field_name in data_dict[0]:
        for field_num in fields_int:
            if data_dict[0][field_name]["order"] == field_num:
                data_dict[0][field_name]["display"] = 1

def FieldMaxLen(data_dict, field_name):
    Returns the length of the longest field in a specified column of data
    from a dictionary created by CsvDict.
    max_len = len(field_name)

    # Iterate over each record in dict
    for id in data_dict:

        # Get the length of the current field in the column
        field_len = len(data_dict[id][field_name])

        # If it's greater than the current max length
        if field_len > max_len:

            max_len = field_len

    return max_len

def ResetWidthSpec(data_dict):
    Default column width specification. align=left, width= max text width+1.
    for field_name in data_dict[0]:
        width = FieldMaxLen(data_dict, field_name) + 1
        data_dict[0][field_name]["width"] = width

def OutputText(data_dict, file_path):
    Creates a text file from dict using specs

    with open(file_path, "w") as OutputFile:

        # Write the field names first
        heading = ""

        # len(data_dict[0] = total number of columns
        for display_order in range(0,len(data_dict[0])):
            for field_name in data_dict[0]:
                if data_dict[0][field_name]["order"] == display_order:
                    if data_dict[0][field_name]["display"] == 1:
                        heading += (ColAlign(field_name,
        heading += "\n"

        # Now write the actual data.
        # Iterate over each record in data_dict.  Since record 0 is
        # the specs, record 1 is the first data record.
        for record in range(1,len(data_dict)):
            row = ""
            # Count from zero to total number of columns. To make sure the
            # columns are displayed in the correct order, need to compare
            # each column order spec with the count and display it only if
            # it matches
            for display_order in range(0,len(data_dict[0])):

                # Check the specified display order of each field name
                for field_name in data_dict[0]:

                    # If the current field name's order matches the count
                    if data_dict[0][field_name]["order"] == display_order:

                        # If the field's display flag is set
                        if data_dict[0][field_name]["display"] == 1:
                            row += (ColAlign(data_dict[record][field_name],
            row += "\n"
    # Open the file.
    #'xdg-open', file_path)) #Linux
    os.startfile(file_path) # Windows

A Tried and Tested True Believer

I was first introduced to the concept of testing by Killer Web Development. I sort of understood it, but at first glance it sounded ridiculous. How can I possibly write an app to test an app that doesn’t exist yet? I thought computer programmers were logical people! And doesn’t this double my work, having to write two apps instead of one? I honestly didn’t fully grasp it. Who has time to write tests? All I have to do is run my app and look at it myself and see if it does what I want..should be simple, right? I want to crank out my code and see results NOW! So, what changed my mind? My work on CsvChamp, that’s what. I found myself tediously opening CSV files, running them through my module and looking at the output. I got weird results I didn’t understand, and my brain seemed to freeze up trying to figure them out. I spent what seemed like hours staring at my screen, trying to untangle the convoluted flight path my data took as it crashed and burned. Somewhere I read something about testing that spoke to me right where I was: wouldn’t it be great if I could have all this work done by….Python? That got my attention. Back to Google…

Thanks to this chapter of Dive Into Python, I now have my very first snippet of “tested first” code. It’s my Python 3 revision of my column alignment function in CsvChamp. Following along with the example, I started off with a blank code file. As I tried to figure out how to write my test, lo and behold — I found myself thinking much more clearly about what I wanted the function to do and what would constitute valid input. I also liked the idea of Python doing my work for me! Unlike before, when I’d run the function again and again on live data and view the results myself to see if they looked OK, I only had to run it through a unit test, with data I already knew was right, and have the test results display a happy green “ok” symbol in my console. After a few go-arounds, here’s the result:

import csv

# Define exceptions
class CsvChampError(Exception): pass
class WidthError(CsvChampError): pass
class AlignmentSpecError(CsvChampError): pass

def StrAlign(text, width, align):
    """Returns text aligned within column according to alignment specified"""

    # Width must be at least 1 more tha len(text) so there will be at least
    # one space between columns.
    if width <= len(text):
        raise WidthError("Invalid width. Must be > length of text.")

    # Padding = total number of blank spaces
    padding = width - len(text)

    if align == "L":
        aligned = text + (padding * " ")

    elif align == "R":
        aligned = (padding * " ") + text

    elif align == "C":
        # Divide padding by 2 and round down to nearest integer
        spaces_before = int(padding/2)

        # Put the rest of the spaces after the text
        spaces_after = padding-spaces_before

        aligned = (spaces_before * " ") + text + (spaces_after * " ")

        raise AlignmentSpecError("invalid alignment spec (must be L, R or C).")

    return aligned

For comparison’s sake, here’s the old version of the same function:

def FormatCsvCol(width, align, cont):
    Returns a column of data with content aligned within an allotted width'
    column = ""
    # The total number of empty spaces in the column.
    spaces = width - len(cont)

    if align == "L":
        # Place all the spaces after the content
        column = cont + (spaces * " ")

    elif align == "R":
        # Place all the spaces before the content.
        column = (spaces * " ") + cont

    elif align == "C":
        # Divide the number of spaces by 2 and round down to the nearest
        # integer.
        # Put the content between this number of spaces, and the rest.
        sp_before = int(spaces / 2) * " "
        sp_after = (spaces - len(sp_before)) * " "
        column = sp_before + cont + sp_after

    return column

Obviously, the big difference is the addition validity checking and informative error messages when invalid input is received. This was a direct result of testing using Dive Into Python’s example, forcing me to look at the big picture. This function could end up anywhere: It could be used to display neat columns of data in a console, in a GUI widget or in a text file. The deeper it gets buried in a complicated app, the more crucial it is to not just know invalid inputs somehow crept into it, but WHY those inputs were invalid. Now I know the function works, without having had to keep feeding it actual CSV data and repeatedly viewing text files myself. Here’s the unit test I came up with:

"""Unit test for CsvChamp3"""

from CsvChamp3 import StrAlign
import unittest

Function StrAlign to align a string within a column of given width, according
to an alignment specification of left, center or right.
class StrValues(unittest.TestCase):
    StrValues = (("example",10,"L","example   "),
                 ("example",10,"C"," example  "),
                 ("example",10,"R","   example"),
                 ("platypus",12,"L","platypus    "),
                 ("boomerang",15,"R","      boomerang"),
                 ("bubba",13,"C","    bubba    "))

    def testValidValues(self):
        """StrAlign should give known results with known values"""
        for text, width, align, text_out in self.StrValues:
            result = StrAlign(text,width,align)
            self.assertEqual(text_out, result)

if __name__ == "__main__":

Of course, I still need to progress to testing invalid input, and functional testing. But this is certainly an encouraging start.

Leaps and Bounds

Some realizations and decisions I have come to in the last week or so…

It appears most Python programmers work in Linux. Many Python learning resources seem to assume a Linux environment. One of my reasons for adopting Python was the fact that it’s open source, unlike MS Access wherein is all my previous coding experience. On the other hand, I need Windows 7 running natively on my laptop so I can take full advantage of all the functions of my two USB scanners. To save my life, I couldn’t get these to work on Linux, either running natively on my laptop, or with Windows 7 in a Virtual Box VM. I have therefore switched my Python coding to Linux Mint, running in a Virtual Box virtual machine under Windows 7. I have all my code projects in Dropbox, which works great in Linux and Windows 7, and a selected few also on GitHub.

I have come to the conclusion that, going forward, my personal coding projects will be in Python 3. If I run across a learning opportunity that requires Python 2.x (for example, something requiring wxPython), then I will revert back. On that topic, this article by Jeff Knupp alerted me to the necessity of virtual environments.

Oldřich Vetešník became my first collaborator on GitHub and brought me back to the necessity of unit testing and had some helpful suggestions on basic app design, i.e., making the app itself a class. This poor soul heard loudly from Snarkoverflow (oops I mean Stackoverflow) about the undesirability of writing an app first and THEN testing it. Ouch. That was me with my CsvChamp module. Luckily Mr. Vetešník was much kinder. But I do need to get with the program regarding testing, so in keeping with that goal and my need to learn Python 3, I’ve decided to re-do CsvChamp from scratch, in Python 3, starting out with unit testing. That is my current project, and updates will follow!

Next challenge: How do I pull together the need to work in both Python 3 and Python 2, version control with GitHub, coding in Linux. Virtual environments, testing and debugging? I was using Notepad++ and IDLE alternately, and the GitHub Windows app, plus a whole lot of directory-switching in both the Windows and Linux command lines. All of these had their quirks and annoyances. IDLE is decent for debugging, but its code editor doesn’t display line numbers. How annoying is an IDE that tells me what line in my code has an error, but can’t show me the line number? IDLE also has the annoying habit of giving me blue screens of death on Windows XP (yes, there is a situation in which I have to use XP. Don’t ask). Notepad++ is a good editor but doesn’t do debugging. Regarding GitHub, the Windows GitHub app is slow, bloated and feature-sparse. I read an article that suggested many Python developers use Vim, but from the little I know about Vim it appears there would be a steep learning curve with its multitude of keyboard-based commands and all the add-ons I’d need to accomplish what I want.

Enter PyCharm. I did some research on IDE’s and decided to give this one a spin, and so far, I love it. It connects seamlessly to GitHub and incorporates virtualenv in a way that makes it easy to switch between Python 2 and 3 and maintain separate sets of libraries for each, and even lists what’s in each library set in case I forget what I installed and where. I was able to configure its text editor in the same nice colors I enjoyed in Notepad++. It displays line numbers, debugs, and even admonishes me when my coding style strays from PEP 8, among other things. It has both Windows and Linux versions. Perhaps by doing so much for me, it’s short-changing some more learning experiences, but that’s a trade off I’m willing to make to be able to focus more on getting some actual coding done, and done right.