From Evernote to ZimWiki via Python

For many years I’ve struggled with finding the best way to collect and store knowledge…from articles I read online to what I learn from PyBites code challenges.

I was a fan of Evernote for several years. There is no beating it for convenience: a random thought, an article on the web, a photo of a brochure can be quickly saved anytime and anywhere with a tap or two. It has an extremely powerful search system, including searching for text within documents and PDF files. Its structure of notebooks, sub-notebooks and tags is extremely flexible, allowing a user to come up with almost any organizational system they please. It utilizes the cloud, but unlike Google Keep it is not cloud-dependent; the desktop app syncs all content to a local database that is fully accessible and usable without internet access.

Despite all these benefits, in the past couple of years I have become discontented with Evernote, for the following reasons:

  • Its rising annual subscription cost, which doubled from $35 in 2017 to $70 this year. This is in addition to the $99 per year I am already paying for Dropbox to do much of the same thing: sync my content online.
  • I have found that Evernote’s convenience has almost become a liability. It’s so easy to impulsively save articles that I quickly forget what I have, because it all gets lost among my 3000+ notes. If I have completely forgotten an article, including its title, content and subject matter, and it’s buried among thousands of notes and tens of notebooks, it might as well not be there.
  • Searching Evernote using the Windows desktop app is slow. Typing a search term such as intitle: (for searches in note titles) causes the app to freeze for several seconds before I even finish typing the word.
  • All content, including attachments such as PDFs, is squashed together in a proprietary format called exb that only the Evernote app can access. It is possible to export notes out of Evernote into a browser-friendly set of HTML pages, but only as an “index” page with one long list of randomly sorted links to all my notes; exporting wipes out my carefully crafted notebook and tag hierarchy.
  • The Evernote Web Clipper is unreliable, especially on mobile devices.  Sometimes it can’t save a page at all, other times it gets the text and images jumbled so they are unreadable on my phone.  I have found I prefer “cleaning up” web articles using the Mercury Reader plugin for Chrome and printing to PDF.  Yes, this requires more steps, but it also acts as a natural filter of trivial content, because I wouldn’t go through those steps unless it’s an article I feel is worth the effort.  I have also found it acceptable to just copy and paste the URL into Dropbox, if I’m using my phone, then print the article to PDF when I get home to my laptop.

In a previous blog post I demonstrated another attempt to store what I learn. First, I would take notes in Evernote on what I learned from projects such as Bites of Py. Next, I would add what I learned to a “knowledge base” in ZimWiki. This worked well at first, but I soon realized it was cumbersome to copy and paste between two different applications.

Next, I experimented with writing notes in reStructuredText and exporting to HTML using Sphinx, whose main advantage is easy, free hosting of content at ReadTheDocs. I soon discovered the downside of using Sphinx this way: I have to export frequently in order to keep that nice web version of my notes up to date.

Now, at last, I have concluded that ZimWiki is an acceptable replacement for Evernote. Here’s why…

  • Zim stores notes as plain text files in a logical folder structure, and they do not require Zim to read and edit. Any text editor will do, including Dropbox’s built-in text editor.
  • I can store my Zim notebook in Dropbox and read, edit and even search my notes using the Dropbox Android app. They are not attractive when viewed outside Zim, but they are 100% usable on my mobile phone despite the fact that there’s no Android app for Zim (maybe I’ll be able to write one someday!).
  • As I shared in my post, I used Zim for my Python “knowledge base” because it provides the browsable, “bird’s-eye” view of my notes and topics that I’ve been looking for, kind of like a table of contents.
  • Zim is open source and cross platform. It works on Windows, Linux and Mac.
  • Like Evernote, Zim uses tags, but unlike Evernote, tags can be used to tag specific content within a note, not just the note as a whole.
  • Zim keeps things simple. It’s basically a text file creator, organizer and viewer. It does not try to be a photo editor, a PDF reader, a workplace collaboration app or a presentation tool. For non-text documents such as PDFs, it makes it easy to create linked notes. Clicking on the link within a note opens the document in my operating system’s default application. Zim lets other applications do their jobs while it focuses on doing its own. This simplicity makes Zim faster and more responsive than Evernote.

Therefore, as of now, I am transitioning my notetaking and knowledge organizing from Evernote to ZimWiki.

Oh, and one more thing….ZimWiki is written in Python!

Here is an application that is not only incredibly useful to me on a daily basis, but is also an open source project I hope to one day contribute to. In other words, the tool I use to store knowledge can itself help me improve my knowledge of Python! (Is that an example of recursion?)

My number one goal now, naturally, is to write Python scripts to get all my Evernote content (almost 3000 notes!) into ZimWiki. I have already written code that grabs saved PDF documents from an “inbox” and moves them into Zim’s folder structure, automatically creating a note for each document that links to the document and can serve as a place to write notes on the document’s content. For example, here’s the note I created for Jason Wattier’s inspiring article about PyCon, as viewed in Zim Wiki:

Clicking on the link brings up the article in a PDF reader:

Here’s an example of how Zim organizes PDFs. There is the note ABOUT the article (Crack_the_Code_of_PyCon-Wattier.txt) which sits in the “Python” folder. This note contains my comments on the article, and a link to the article, which is stored in a folder of its own. Note the folder title is the same as the filename of the note (without the *.txt extension):

Here’s the actual article PDF in its folder:

This arrangement of the note, the folder and the target PDF is the way Zim organizes PDFs and links to them. The code I wrote took the file “Crack the Code of PyCon-Wattier”, moved it into its own folder, and created the note that links to it. ZimWiki’s indexing system updated, making the new note visible and the link clickable. I added my comments on the article to the note afterward.
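Here’s a simplified sketch of that script. The INBOX and NOTEBOOK paths are placeholders for my actual setup, and the real code handles a few more edge cases:

import shutil
from datetime import datetime
from pathlib import Path

INBOX = Path('~/Dropbox/PDF_Inbox').expanduser()        # placeholder path
NOTEBOOK = Path('~/Dropbox/Notes/Python').expanduser()  # placeholder path

ZIM_HEADER = ('Content-Type: text/x-zim-wiki\n'
              'Wiki-Format: zim 0.4\n'
              'Creation-Date: {date}\n\n')

def file_pdf(pdf):
    """Move a PDF into its own folder and create a Zim note linking to it."""
    name = pdf.stem.replace(' ', '_')  # Zim stores page files with underscores
    folder = NOTEBOOK / name
    folder.mkdir(parents=True, exist_ok=True)
    shutil.move(str(pdf), str(folder / pdf.name))

    # The note sits beside the folder and carries the same name.
    note = NOTEBOOK / (name + '.txt')
    with open(note, 'w', encoding='utf-8') as f:
        # Real Zim headers also include a timezone offset in the date.
        f.write(ZIM_HEADER.format(date=datetime.now().isoformat(timespec='seconds')))
        f.write('====== {} ======\n\n'.format(pdf.stem))
        # In Zim, links starting with './' resolve to the page's own folder.
        f.write('[[./{}]]\n'.format(pdf.name))

for pdf in INBOX.glob('*.pdf'):
    file_pdf(pdf)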

My current project is to write a Python app to convert Evernote export files, which look like this in a text editor:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE en-export SYSTEM "http://xml.evernote.com/pub/evernote-export2.dtd">
<en-export export-date="20181007T111836Z" application="Evernote/Windows" version="6.x">
<note><title>2018-08-02-2043 Bite 006 PyBites Die Hard</title><content><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">
<en-note>
<div><span style="font-weight: bold;">Requirements</span></div>
<div><span style="font-size: 15.4px; letter-spacing: normal;
orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px;
.
(and so on)
.
</content><created>20180803T004348Z</created><updated>20181007T111318Z</updated><note-attributes><author>Rob Fowler</author>
<source>desktop.win</source><source-url>https://codechalleng.es/bites/6/</source-url><source-application>evernote.win32</source-application></note-attributes></note></en-export>

to Zim Wiki notes, which look like this in a text editor:

Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2018-10-07T07:13:45-04:00

====== 2018-08-02 Bite 06 PyBites Die Hard ======
2018-08-02

===== Requirements =====
Given a listing of files of our community branch determine who PR'd (= submitted pull request) the most (excluding PyBites) and what challenge is
the most popular (PR'd) as per snapshot of today (8th of Dec 2017). See preparation done in the code template below. Replace ''pass'' with your code
to make the test pass. Good luck and have fun!

===== My Code =====
"""Checks community branch dir structure to see who submitted most
and what challenge is more popular by number of PRs"""
from collections import Counter, namedtuple
import csv
import os
import urllib.request
.
.
(and so on)
.
.
===== What I Learned =====
'''
* Each line is split on the comma: line.split(','). Does not need the csv module.
* endswith() method

'''

Now, which one would YOU rather read and edit in your text editor?

Of course, it looks best when viewed in ZimWiki. Here’s a screenshot…also note Zim’s similarity to Evernote in its layout…
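To give an idea of the direction, here’s a minimal sketch of the conversion. The regex tag-stripping is crude, the file naming is improvised, and a real converter will need to handle ENML markup, attachments and timestamps properly:

import re
import xml.etree.ElementTree as ET

def enex_to_zim(enex_file):
    """Yield (title, zim_page_text) for each note in an Evernote export."""
    root = ET.parse(enex_file).getroot()
    for note in root.iter('note'):
        title = note.findtext('title', default='Untitled')
        content = note.findtext('content', default='')
        # The content element holds ENML (XHTML-like markup) in a CDATA
        # section; for this sketch, just strip the tags.
        text = re.sub(r'<[^>]+>', '', content)
        header = ('Content-Type: text/x-zim-wiki\n'
                  'Wiki-Format: zim 0.4\n\n')
        body = '====== {} ======\n\n{}\n'.format(title, text.strip())
        yield title, header + body

for title, page in enex_to_zim('my_notes.enex'):  # hypothetical export file
    safe_name = title.replace(' ', '_').replace(':', '')
    with open(safe_name + '.txt', 'w', encoding='utf-8') as f:
        f.write(page)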

The Plan

Here’s where I hope to go, moving forward:

  • Write and share my code for helping Evernote users move to ZimWiki.
  • Study Zim Wiki’s source code in depth to understand how it works, taking careful notes (in ZimWiki!) on what I learn.
  • Write some plugins for ZimWiki, perhaps a built-in “Import from Evernote” feature.
  • Contribute to ZimWiki….which would be my first participation in an open source project.

csv.Sniffer gets the sniffles

Since I regularly work with both comma delimited and tab delimited files, I’d like for my Quiffer project to be able to automatically detect if a csv file is tab delimited, and if it is, specify tab (‘\t’) as the delimiter when reading it using Python’s csv.reader class.

Here’s a sample of tab delimited data — an Amazon transaction file:

Transaction Summary for December 1, 2016 to December 8, 2016
Transactions: 3

Date Order ID SKU Transaction type Payment Type Payment Detail Amount Quantity Product Title
Dec 3, 2017 114-5190104 DI-2QQ3-4X5H Order Payment Amazon fees Shipping commission $-0.50 Disney's 2012 Lady and the Tramp Sketchbook Ornament
Dec 3, 2017 114-5190104 DI-2QQ3-4X5H Order Payment Product charges $29.86 1 Disney's 2012 Lady and the Tramp Sketchbook Ornament
Dec 3, 2017 114-5190104 DI-2QQ3-4X5H Order Payment Other Shipping $3.99 Disney's 2012 Lady and the Tramp Sketchbook Ornament

I need Quiffer to be able to parse each row into a list, like so:


['Date', 'Order ID', 'SKU', 'Transaction type', 'Payment Type', 'Payment Detail', 'Amount', 'Quantity', 'Product Title']
['Dec 3, 2017', '114-5190104', 'DI-2QQ3-4X5H', 'Order Payment', 'Amazon fees', 'Shipping commission', '$-0.50', '', "Disney's 2012 Lady and the Tramp Sketchbook Ornament"]
['Dec 3, 2017', '114-5190104', 'DI-2QQ3-4X5H', 'Order Payment', 'Product charges', '', '$29.86', '1', "Disney's 2012 Lady and the Tramp Sketchbook Ornament"]
['Dec 3, 2017', '114-5190104', 'DI-2QQ3-4X5H', 'Order Payment', 'Other', 'Shipping', '$3.99', '', "Disney's 2012 Lady and the Tramp Sketchbook Ornament"]

Of course, parsing rows into lists like this is csv.reader’s default behavior when a comma-delimited file is fed to it without a delimiter specified, as in the following code:


import csv

src_file = 'sample_data/comma_sample.csv'

with open(src_file, newline='', encoding='utf-8') as csv_file:
    data = list(csv.reader(csv_file))

When a tab-delimited file is opened with this code, however, things get messy:


[['Transaction Summary for December 1', ' 2016 to December 8', ' 2016'],
['Transactions: 3'],
[],
['Date\tOrder ID\tSKU\tTransaction type\tPayment Type\tPayment Detail\tAmount\tQuantity\tProduct Title'],
['Dec 3', " 2017\t114-5190104\tDI-2QQ3-4X5H\tOrder Payment\tAmazon fees\tShipping commission\t$-0.50\t\tDisney's 2012 Lady and the Tramp Sketchbook Ornament"],
['Dec 3', " 2017\t114-5190104\tDI-2QQ3-4X5H\tOrder Payment\tProduct charges\t\t$29.86\t1\tDisney's 2012 Lady and the Tramp Sketchbook Ornament"],
['Dec 3', " 2017\t114-5190104\tDI-2QQ3-4X5H\tOrder Payment\tOther\tShipping\t$3.99\t\tDisney's 2012 Lady and the Tramp Sketchbook Ornament"]]

Since csv.reader is expecting a comma delimited file, it splits each row in two at the comma after the month in the Date column.  The rest of the columns are mashed together in a single string, with ‘\t’ in between.

I’d rather not code two different versions of Quiffer for different types of files, with the only difference being the ‘delimiter’ specification in csv.reader.  Surely, I thought, there must be a way for Python to examine a file and determine how its fields are delimited, and parse it accordingly.

A while back I discovered a class in the Python csv module called csv.Sniffer. My understanding was that Sniffer could figure out whether a file was tab delimited or comma delimited, and provide this information to csv.reader as the “dialect” parameter. I wrote the following to test it with the Amazon sample file:

import csv

def sniff_test(file):
    with open(file, newline='') as csvfile:
        dialect = csv.Sniffer().sniff(csvfile.read(1024))
        csvfile.seek(0)
        reader = csv.reader(csvfile, dialect)
        return [row for row in reader]

This was the output:

['Transaction Summary for December 1, 2016 to December 8, 2016']
['Transactions: 3']
[]
['Date\tOrder ID\tSKU\tTransaction type\tPayment Type\tPayment Detail\tAmount\tQuantity\tProduct Title']
["Dec 3, 2017\t114-5190104\tDI-2QQ3-4X5H\tOrder Payment\tAmazon fees\tShipping commission\t$-0.50\t\tDisney's 2012 Lady and the Tramp Sketchbook Ornament"]
["Dec 3, 2017\t114-5190104\tDI-2QQ3-4X5H\tOrder Payment\tProduct charges\t\t$29.86\t1\tDisney's 2012 Lady and the Tramp Sketchbook Ornament"]
["Dec 3, 2017\t114-5190104\tDI-2QQ3-4X5H\tOrder Payment\tOther\tShipping\t$3.99\t\tDisney's 2012 Lady and the Tramp Sketchbook Ornament"]

I tried again, passing sniff() a set of candidate delimiters via its delimiters argument, as suggested in “How to use csv.Sniffer for 2 different CSV-types?”, and the code threw a “Could not determine delimiter” error. I did some more experimenting, and to make a long story short, I found that what was giving Sniffer a stuffy nose was the inconsistent layout of the Amazon file. Note that the first three lines are like a report header and contain no delimiters at all. When I made a copy of the file with those three lines removed, Sniffer worked.

Nevertheless, I don’t want to have to physically alter my files before parsing them with Quiffer. Solution: have Quiffer read the file a line at a time looking for tabs, and if tabs are present, open it as a tab-delimited file. Here’s the latest, tested version:

import csv

def read_data(src_file):
    """
    Import rows of data from a csv file into a list of rows, where each row
    is a list containing column values.
    """
    with open(src_file, newline='', encoding='utf-8') as csv_file:

        # Check the file for tabs.
        line = csv_file.readline()
        while line != "":
            if '\t' in line:
                # A tab was found, so read as csv with the delimiter specified.
                csv_file.seek(0)
                return list(csv.reader(csv_file, delimiter='\t'))
            line = csv_file.readline()

        # No tabs present, so read as a normal csv.
        csv_file.seek(0)
        return list(csv.reader(csv_file))
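
With that in place, the same call handles both kinds of files. For example (the tab-delimited file name here is hypothetical):

paypal = read_data('sample_data/comma_sample.csv')       # comma delimited
amazon = read_data('sample_data/amazon_tab_sample.csv')  # tab delimited

print(amazon[3])
# ['Date', 'Order ID', 'SKU', 'Transaction type', ...]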

Reading CSV data

Last week I ran across this article on Real Python on how to code a simple CSV file reader, using test driven development, and work with data from the files.  For a long time I’ve understood the value of TDD but wasn’t quite sure how to apply it meaningfully to writing code involving data from external files.  This article helped tremendously — and just in time for me to write code for the very first step of my Quiffer project:  reading financial transactions from a CSV file.

The sample data I’m using for this project is from a PayPal transaction file download.  Here are my first tests, for which I have written code that passes:

import unittest
from quiffer import read_data

class QuifferTest(unittest.TestCase):

    def setUp(self):
        self.data = 'sample_data/paypal_sample.csv'

    def test_read_data_headers(self):
        self.assertEqual(
            read_data(self.data)[0],
            ['Date', 
             ' Time', 
             ' Time Zone', 
             ' Name', 
             ' Type', 
             ' Status', 
             ' Currency', 
             ' Gross', 
             ' Fee', 
             ' Net', 
             ' From Email Address', 
             ' To Email Address', 
             ' Counterparty Status', 
             ' Address Status', 
             ' Item Title', 
             ' Item ID', 
             ' Shipping and Handling Amount', 
             ' Insurance Amount', 
             ' Sales Tax', 
             ' Option 1 Name', 
             ' Option 1 Value', 
             ' Option 2 Name', 
             ' Option 2 Value', 
             ' Auction Site', 
             ' Buyer ID', 
             ' Item URL', 
             ' Closing Date', 
             ' Reference Txn ID', 
             ' Invoice Number', 
             ' Custom Number', 
             ' Receipt ID', 
             ' Balance']
            )

    def test_read_data_name(self):
        self.assertEqual(read_data(self.data)[1][3], 'Almech Devices, LLC')

    def test_read_data_name_chinese(self):
        self.assertEqual(read_data(self.data)[3][3], '广州满翼易有限公司')

    def test_read_data_gross(self):
        self.assertEqual(read_data(self.data)[1][7], '-2.75')

if __name__ == '__main__':
    unittest.main()

As you can see, I’m following right along with the RealPython article’s methods. Here’s the Quiffer code so far:


import csv

def read_data(data):
    with open(data, encoding='utf-8') as csv_file:
        data = list(csv.reader(csv_file))
       
    return data

data = 'sample_data/paypal_sample.csv'

paypal_data = read_data(data)

I added encoding='utf-8' to the open call, because otherwise it threw a UnicodeDecodeError. The original article had line 5 as:

data = [row for row in csv.reader(f.read().splitlines())]

…but thanks to what I learned from Trey Hunner’s CSV Python chat, I simplified this to:

data = list(csv.reader(csv_file))

I’m sure there are more tests I can add to be more thorough, so that’s the next step.  If anyone has any suggestions, hit me up on Twitter.
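
For example, one easy addition would be a row count check, something like this (the expected count is illustrative and depends on the sample file):

    def test_read_data_row_count(self):
        # One header row plus the number of transactions in the sample file
        # (the count here is a placeholder).
        self.assertEqual(len(read_data(self.data)), 4)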

‘Quiffer’: Getting CSV Data into GnuCash via QIF Import

I use the excellent open source accounting software GnuCash to manage my personal finances.  I discovered it earlier this year and found it to be just as good as Quicken and without the evil.   I’m old-fashioned when it comes to managing finances.  I don’t need to have access to my financial records on all my devices; I am content to have a single app running on my laptop (backed up using Dropbox, of course) and to enter my receipts into GnuCash when I get home each day.  I prefer apps I can easily get my data into and out of.  Finally, I don’t believe handing my bank account login information to a third party is a good idea, preferring to download financial transactions from my accounts, in CSV format, manually.

GnuCash has the option to import transactions from QIF format, which supports “split” transactions that suit GnuCash’s double-entry bookkeeping. For example, this is what a QIF file would look like for my monthly electric payment:

!Account
NAssets:Checking Account
^
!Type:Bank
D2017-11-28
T-50.00
PFlorida Power and Light
SExpenses:Electric
$50.00
^

This transaction has two entries: the deduction of $50.00 from the bank account to the power company, and the increase of $50.00 in the GnuCash account “Expenses:Electric.”  The same transaction as downloaded in a CSV file from my bank would be something like this:

11/28/17,Florida Power and Light,-50.00

Obviously, this is a radically different file format than QIF.  Also note that the bank only has one “side” of the transaction, the deduction from the bank account.  Getting transactions like these into GnuCash, then, requires the additional step of adding the correct GnuCash expense account to each transaction — not just converting from CSV to QIF.
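
As a first approximation, the conversion for a simple bank transaction looks something like the sketch below. The payee-to-account dictionary is a stand-in for whatever lookup scheme Quiffer ends up using:

from datetime import datetime

# Stand-in mapping from payee to GnuCash expense account.
ACCOUNTS = {'Florida Power and Light': 'Expenses:Electric'}

def csv_row_to_qif(row, bank_account='Assets:Checking Account'):
    """Turn one 'date,payee,amount' bank CSV row into a QIF record."""
    date_str, payee, amount = row
    date = datetime.strptime(date_str, '%m/%d/%y').strftime('%Y-%m-%d')
    expense = ACCOUNTS.get(payee, 'Expenses:Miscellaneous')
    return '\n'.join([
        '!Account',
        'N' + bank_account,
        '^',
        '!Type:Bank',
        'D' + date,
        'T' + amount,
        'P' + payee,
        'S' + expense,
        '${:.2f}'.format(-float(amount)),  # the split offsets the bank amount
        '^',
    ])

print(csv_row_to_qif(['11/28/17', 'Florida Power and Light', '-50.00']))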

Fortunately, I’ve discovered Python is very good at converting between very different text file formats.

In my transition to GnuCash, the biggest challenge was converting four years of financial transactions from Microsoft Access 2010 to QIF. I started by exporting from Access to CSV. The resulting CSV file had over 20,000 lines. I was able to throw together some messy Python code to convert those 20,000 lines of CSV to QIF, and then get all the data into GnuCash. There were so many lines that I worked with them in batches so I could check the account balances as I went.

I still use a slightly cleaner version of this code on a monthly basis to import transactions from PayPal, my Amazon seller account and eBay into QIF format, for import into GnuCash.  This makes it easy to keep detailed sales and expense records.  Of all the Python code I’ve written, this QIF creator has been by far the most useful to me.  Now I hope to make it useful to others.

The more I learn about Python, the more I see how much my code can be improved.   My existing code for CSV to QIF conversion is neither user-friendly nor GitHub worthy, so I plan to clean up, refactor and redesign it.  From this point forward, I’ll call this project “Quiffer.”

To illustrate the challenges faced by this project, here’s a sample PayPal CSV file, which is the most complicated file type I deal with regularly:

Date, Time, Time Zone, Name, Type, Status, Currency, Gross, Fee, Net, From Email Address, To Email Address, Counterparty Status, Address Status, Item Title, Item ID, Shipping and Handling Amount, Insurance Amount, Sales Tax, Option 1 Name, Option 1 Value, Option 2 Name, Option 2 Value, Auction Site, Buyer ID, Item URL, Closing Date, Reference Txn ID, Invoice Number, Custom Number, Receipt ID, Balance
11/28/2017,21:26:38,EST,"Almech Devices, LLC",Express Checkout Payment Sent,Completed,USD,-2.75,0,-2.75,bxx4@gmail.com,bxxxk@gmail.com,Verified,Confirmed,Tempered Glass Protective Screen Protector Film for Samsung Galaxy S5 Active,131511092340,0,0,0,,,,,Ebay,rxx04,http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=131511092340,,,,,,58.28
11/28/2017,21:26:36,EST,rosie castle,Express Checkout Payment Sent,Completed,USD,-7.98,0,-7.98,bxx4@gmail.com,ixx976@gmail.com,Verified,Confirmed,"The Good Dinosaur (DVD, 2016) NEW MOVIE! NOW SHIPPING ! ",222735118701,0,0,0,,,,,Ebay,rxx4,http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=222735118701,,,,,,61.03
11/28/2017,21:26:34,EST,广州满翼易有限公司,Express Checkout Payment Sent,Completed,USD,-5.5,0,-5.5,bxx4@gmail.com,vxxxl@48express.com,Non-U.S. - Verified,Confirmed,Cotton Linen Harry Potter Pillow Case Sofa Waist Throw Cushion Cover Home Decor [S1],282606610348,0,0,0,,,,,Ebay,rxx04,http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=282606610348&var=581788364837,,,,,,69.01

Note there are a few difficulties here (items 2 and 3 are sketched in code after the list):

  1. There are numerous columns I don’t need.
  2. The date format will need to be changed to YYYY-MM-DD for the QIF file.
  3. There are multiple columns (Name, Type, Status, Item Title) containing data that I want to appear in GnuCash’s “Description” field.  These columns will therefore need to be combined into a single column.
  4. Transactions that are payments for products I’ve sold include an amount in a separate column, “Fee”, which is the fee charged by PayPal to receive funds. Such transactions will actually need FOUR entries in GnuCash: two each for the payment and the fee.
  5. My code needs to include a means of determining, automatically, the correct GnuCash accounts where the income and expense will be posted.
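
Items 2 and 3 are straightforward to prototype. Here’s a rough sketch against the sample above; the choice of columns to merge and the separator are my guesses at what Quiffer will settle on:

import csv
from datetime import datetime

def reformat_date(us_date):
    """Convert PayPal's MM/DD/YYYY dates to QIF-friendly YYYY-MM-DD."""
    return datetime.strptime(us_date, '%m/%d/%Y').strftime('%Y-%m-%d')

def build_description(row, header):
    """Merge the Name, Type, Status and Item Title columns into one string."""
    # PayPal's header names carry a leading space, e.g. ' Name'.
    wanted = [' Name', ' Type', ' Status', ' Item Title']
    parts = [row[header.index(col)].strip() for col in wanted]
    return ' - '.join(part for part in parts if part)

with open('sample_data/paypal_sample.csv', encoding='utf-8') as f:
    rows = list(csv.reader(f))

header = rows[0]
for row in rows[1:]:
    print(reformat_date(row[0]), '|', build_description(row, header))
# 2017-11-28 | Almech Devices, LLC - Express Checkout Payment Sent - ...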

The first step for Quiffer will be to import the data from the CSV file into a Python data structure.  That will be the topic of my next post.  Stay tuned.