DataTools: A Hopefully Useful Toolbox

This set of tools has evolved from CsvChamp and CsvChamp3. Its purpose is to read data from a typical *.csv file, with the first row as column headers, into a Python dictionary. Once in dictionary form, the data can then be formatted for output to a standard text file for viewing and reporting, or uploading to applications (such as payroll processing sites) that require text files with precisely spaced columns of data. The file is loaded into the dictionary using the CsvDict tool. The user can then view the column names (DisplayAllFields), select the desired fields for the output file (SelectFields), and finally, create a nice, neat text file (OutputText). The tools could be modified to display columns of data on a screen or in a GUI element instead of a text file.

This will likely be the final version of my *.csv tinkering. I’m itching to get back to Tkinter and databases which is my ultimate goal. Here is a preview of coming attractions in that regard.

“DataTools” was written in Python 3. Since it’s so rudimentary I didn’t bother version-controlling it or putting it up on Github. So, here it is…enjoy!

# Data Tools 1.0
# by Rob Fowler
# Last Update: 2014-03-05
# Python 3.3.2

# A set of tools for getting and manipulating data from a *.csv file.


import csv
import tkinter
from tkinter import filedialog
import subprocess
import os


# Define exceptions
class DataToolsError(Exception): pass
class WidthError(DataToolsError): pass
class AlignmentSpecError(DataToolsError): pass
class ColumnError(DataToolsError): pass
class FileError(DataToolsError): pass


def GetFile(op):
    """
    Prompts the user to select a file for input or output.
    op = "o" (open a text file) or "s" (save a file).
    Returns a file path for opening or saving.
    """
    root = tkinter.Tk()
    root.withdraw()

    if op.lower() == "o":
        file_path = filedialog.askopenfilename()
        return file_path

    elif op.lower() == "s":
        file_path = filedialog.asksaveasfilename()
        return file_path

    else:
        raise(FileError,"Valid arguments are 'o' (open) or 's' (save).")


def CsvDict(file_path):
    """
    Reads each row of the *.csv file into its own
    dictionary. First row of the *.csv file must be field names. Function adds each row as a sub dictionary, using a unique integer row ID as the key for each row.
    
    Row ID 0 is added as a specification row, providing 
    instructions on how each column is to be output:
        'order': The order in which the column is output.
        'align': How the column is to be aligned within the space
                 allotted to it. 1=left, 2=center, 3=right.
        'display': 1=display column, 0=skip column
        
    Example:
    *.csv file:
    heading1,heading2,heading3,heading4,heading5
    data1,data2,data3,data4,data5
    data6,data7,data8,data9,data10

    Result:
    {0:{'heading1':{'order':0,'align':1,'display':1},
        'heading2:{'order':1,'align':1,'display':1},
        'heading3:{'order':2,'align':1,'display':1},
        'heading4:{'order':3,'align':1,'display':1},
        'heading5:{'order':4,'align':1,'display':1}},
     1:{"heading1":"data1",
        "heading2":"data2",
        "heading3":"data3",
        "heading4":"data4",
        "heading5":"data5"},
     2:{"heading1":"data6",
        "heading2":"data7",
        "heading3":"data8",
        "heading4":"data9",
        heading5":"data10"}}
    """
    delimiter = ','
    quote_character = '"'

    # Read each row into a dictionary.
    file_data = csv.DictReader(open(file_path, "rt"),delimiter=',',
                               quotechar='"')

    field_names = file_data.fieldnames

    # Create the new dictionary that will serve as the data source.
    data_dict = {}

    # Create data spec in 1st row and add column order and display & alignment
    # defaults.
    data_dict[0] = {}
    for field_order in range(0,len(field_names)):
        data_dict[0][field_names[field_order]]={}
        data_dict[0][field_names[field_order]]["order"] = field_order
        data_dict[0][field_names[field_order]]["align"] = 1
        data_dict[0][field_names[field_order]]["display"] = 1

    # Add a unique integer ID as a key for each row, and add the rows as
    # sub dictionaries of data_dict.
    count = 1
    for row in file_data:
        data_dict[count] = row
        count += 1

    # Get number of columns in 1st row of data
    cols = len(data_dict[1])

    for key in data_dict:
        if len(data_dict[key]) != cols:
            raise ColumnError("Data rows don't have same number of columns.")

    for field_name in data_dict[0]:
        width = FieldMaxLen(data_dict, field_name) + 1
        data_dict[0][field_name]["width"] = width

    return data_dict


def ColAlign(text, width, align):
    """
    Returns text aligned within column according to alignment specified.
        text: the text to be aligned
        width: total with of column including blank spaces
        align: how the text is to be aligned within the given 
               column width. 1=left, 2=center, 3=right. 
    """
    # Width must be at least 1 more tha len(text) so there will be at least
    # one space between columns.
    if width <= len(text):
        raise WidthError("Invalid width. Must be > length of text.")

    # Padding = total number of blank spaces
    padding = width - len(text)

    if align == 1: # left
        aligned = text + (padding * " ")

    elif align == 3: # right
        aligned = (padding * " ") + text

    elif align == 2: # centered
    
        # Divide padding by 2 and round down to nearest integer
        spaces_before = int(padding/2)

        # Put the rest of the spaces after the text
        spaces_after = padding-spaces_before

        aligned = (spaces_before * " ") + text + (spaces_after * " ")
    else:

        raise AlignmentSpecError("invalid alignment specification.")

    return aligned


def DisplayAllFields(data_dict):
    """
    Lists fields and their order, width and alignment specifications.
    """
    for display_order in range(0,len(data_dict[0])):
        for field_name in data_dict[0]:
            if data_dict[0][field_name]["order"] == display_order:
                print(str(data_dict[0][field_name]["order"]) + ": " +
                      field_name)


def DisplaySelected(data_dict):
    """
    Lists fields with display set to 1 and their order, width and align specifications.
    """
    for display_order in range(0,len(data_dict[0])):
        for field_name in data_dict[0]:
            if data_dict[0][field_name]["order"] == display_order\
                    and data_dict[0][field_name]["display"] == 1:
                print(str(data_dict[0][field_name]["order"]) + ": " +
                      field_name)


def UnselectAll(data_dict):
    """
    Sets display = 0 on all fields.
    """
    for field_name in data_dict[0]:
        data_dict[0][field_name]["display"] = 0


def SelectAll(data_dict):
    """
    Sets display = 1 on all fields.
    """
    for field_name in data_dict[0]:
        data_dict[0][field_name]["display"] = 1


def SelectFields(data_dict):
    """
    Prompts user to select fields to display, by number.  Use
    DisplayAllFields to obtain field numbers.
    """
    fields_input = input("Enter list of field numbers to display: ")
    field_list = fields_input.split(',')
    fields_int = []
    for num in field_list:
        fields_int.append(int(num))

    UnselectAll(data_dict)

    # Find the field number and set its display flag to 1
    for field_name in data_dict[0]:
        for field_num in fields_int:
            if data_dict[0][field_name]["order"] == field_num:
                data_dict[0][field_name]["display"] = 1


def FieldMaxLen(data_dict, field_name):
    """
    Returns the length of the longest field in a specified column of data
    from a dictionary created by CsvDict.
    """
    max_len = len(field_name)

    # Iterate over each record in dict
    for id in data_dict:

        # Get the length of the current field in the column
        field_len = len(data_dict[id][field_name])

        # If it's greater than the current max length
        if field_len > max_len:

            max_len = field_len

    return max_len


def ResetWidthSpec(data_dict):
    """
    Default column width specification. align=left, width= max text width+1.
    """
    for field_name in data_dict[0]:
        width = FieldMaxLen(data_dict, field_name) + 1
        data_dict[0][field_name]["width"] = width


def OutputText(data_dict, file_path):
    """
    Creates a text file from dict using specs
    """

    with open(file_path, "w") as OutputFile:

        # Write the field names first
        heading = ""

        # len(data_dict[0] = total number of columns
        for display_order in range(0,len(data_dict[0])):
            for field_name in data_dict[0]:
                if data_dict[0][field_name]["order"] == display_order:
                    if data_dict[0][field_name]["display"] == 1:
                        heading += (ColAlign(field_name,
                                    data_dict[0][field_name]['width'],
                                    data_dict[0][field_name]['align']))
        heading += "\n"
        OutputFile.writelines(heading)

        # Now write the actual data.
        # Iterate over each record in data_dict.  Since record 0 is
        # the specs, record 1 is the first data record.
        for record in range(1,len(data_dict)):
            row = ""
            # Count from zero to total number of columns. To make sure the
            # columns are displayed in the correct order, need to compare
            # each column order spec with the count and display it only if
            # it matches
            for display_order in range(0,len(data_dict[0])):

                # Check the specified display order of each field name
                for field_name in data_dict[0]:

                    # If the current field name's order matches the count
                    if data_dict[0][field_name]["order"] == display_order:

                        # If the field's display flag is set
                        if data_dict[0][field_name]["display"] == 1:
                            row += (ColAlign(data_dict[record][field_name],
                                        data_dict[0][field_name]['width'],
                                        data_dict[0][field_name]['align']
                                        ))
            row += "\n"
            OutputFile.writelines(row)
            
    # Open the file.
    # subprocess.call(('xdg-open', file_path)) #Linux
    os.startfile(file_path) # Windows