Getting my Personal Finance data back with hCalendar and hCard
The Quicken Interchange Format (QIF) is notoriously inadequate for clean import/export. The instructions for migrating Quicken data across platforms say:
- From the old platform, dump it out as QIF
- On the new platform, read in the QIF data
- After importing the file, verify that account balances in your new Quicken for Mac 2004 data file are the same as those in Quicken for Windows. If they don't match, look for duplicate or missing transactions.
I have not migrated my data from Windows98 to OS X because of this mess. I use win4lin on my debian linux box as life-support for Quicken 2001.
Meanwhile, Quicken supports printing any report to a tab-separated file, and I found that an exhaustive transaction report represents transfers unambiguously. Since October 2000, when my testing showed that I could re-create various balances and reports from these tab-separated reports, I have been maintaining a CVS history of my exported Quicken data, splitting it every few years:
$ wc *qtrx.txt
4785 38141 276520 1990-1996qtrx.txt
6193 61973 432107 1997-1999qtrx.txt
4307 46419 335592 2000qtrx.txt
5063 54562 396610 2002qtrx.txt
5748 59941 437710 2004qtrx.txt
26096 261036 1878539 total
I started a little module on dev.w3.org... I call it Quacken currently, but I think I'm going to have to rename it for trademark reasons. I started with normalizeQData.py to load the data into postgress for use with saCASH, but then saCASH went Java/Struts and all way before debian supported Java well enough for me to follow along. Without a way to run them in parallel and sync back and forth, it was a losing proposition anyway.
Then I managed to export the data to the web by first converting it to RDF/XML:
qtrx93.rdf: $(TXTFILES)
$(PYTHON) $(QUACKEN)/grokTrx.py $(TXTFILES) >$@
... and then using searchTrx.xsl (inside a trivial CGI script) that puts up a search form, looks for the relevant transactions, and returns them as XHTML. I have done a few other reports with XSLT; nothing remarkable, but enough that I'm pretty confident I could reproduce all the reports I use from Quicken. But the auto-fill feature is critical, and I didn't see a way to do that.
Then came google suggest and ajax. I'd really like to do an ajax version of Quicken.
I switched the data from CVS to mercurial a few months ago, carrying the history over. I seem to have 189 commits/changesets, of which 154 are on the qtrx files (others are on the makefile and related scripts). So that's about one commit every two weeks.
Mercurial makes it easy to keep the whole 10 year data set, with all the history, in sync on several different computers. So I had it all with me on the flight home from the W3C Tech Plenary in France, where we did a microformats panel. Say... transactions are events, right? And payee info is kinda like hCard...
So factored out the parts of grokTrx.py that do the TSV file handling (trxtsv.py) and wrote an hCalendar output module (trxht.py).
I also added some SPARQL-ish filtering, so you can do:
python trxht.py --account 'MIT 2000' --class 200009xml-ny 2000qtrx.txt
And get a little microformat expense report:
9/20/00 SEPTEMBERS STEAKHOUSE ELMSFORD NY MIT 2000 19:19 c [Citi Visa HI]/200009xml-ny 29.33 9/22/00 RAMADA INNS ELMSFORD GR ELMSFORD NY MIT 2000 3 nights c [Citi Visa HI]/200009xml-ny 603.96 9/24/00 AVIS RENT-A-CAR 1 WHITE PLAINS NY MIT 2000 c [Citi Visa HI]/200009xml-ny 334.45 1/16/01 MIT MIT 2000 MIT check # 20157686 dated 12/28/00 c [Intrust Checking]/200009xml-ny -967.74
Mercurial totally revolutionizes coding on a plane. There's no way I would have been as productive if I couldn't commit and diff and such right there on the plane. I'm back to using CVS for the project now, in order to share it over the net, since I don't have mercurial hosting figured out just yet. But here's the log of what I did on the plane:
changeset: 19:d1981dd8e140
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 20:48:44 2006 -0600
summary: playing around with places
changeset: 18:9d2f0073853b
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 18:21:35 2006 -0600
summary: fixed filter arg reporting
changeset: 17:3993a333747b
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 18:10:10 2006 -0600
summary: more dict work; filters working
changeset: 16:59234a4caeae
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 17:30:28 2006 -0600
summary: moved trx structure to dict
changeset: 15:425aab9bcc52
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 20:57:17 2006 +0100
summary: vcards for payess with phone numbers, states
changeset: 14:cbd30e67647a
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 19:12:38 2006 +0100
summary: filter by trx acct
changeset: 13:9a2b49bc3303
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 18:45:06 2006 +0100
summary: explain the filter in the report
changeset: 12:2ea13bafc379
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 18:36:09 2006 +0100
summary: class filtering option
changeset: 11:a8f550c8759b
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 18:24:45 2006 +0100
summary: filtering in eachFile; ClassFilter
changeset: 10:acac37293fdd
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 17:53:18 2006 +0100
summary: moved trx/splits fixing into eachTrx in the course of documenting trxtsv.py
changeset: 9:5226429e9ef6
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 17:28:01 2006 +0100
summary: clarify eachTrx with another test
changeset: 8:afd14f2aa895
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 17:19:36 2006 +0100
summary: replaced fp style grokTransactions with iter style eachTrx
changeset: 7:eb020cda1e67
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 16:16:43 2006 +0100
summary: move isoDate down with field routines
changeset: 6:123f66ac79ed
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 16:14:45 2006 +0100
summary: tweak docs; noodle on CVS/hg scm stuff
changeset: 5:4f7ca3041f9a
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 16:04:07 2006 +0100
summary: split trxtsv and trxht out of grokTrx
changeset: 4:95366c104b42
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 14:48:04 2006 +0100
summary: idea dump
changeset: 3:62057f582298
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 09:55:48 2006 +0100
summary: handle S in num field
changeset: 2:0c23921d0dd3
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 09:38:54 2006 +0100
summary: keep tables bounded; even/odd days
changeset: 1:031b9758304c
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 09:19:05 2006 +0100
summary: table formatting. time to land
changeset: 0:2d515c48130b
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 07:55:58 2006 +0100
summary: working on plane
I used doctest unit testing quite a bit, and rst for documentation:
Usage
Run a transaction report over all of your data in some date range and print it to a tab-separated file, say, 2004qtrx.txt. Then invoke a la:
$ python trxht.py 2004qtrx.txt >,x.html
$ xmlwf ,x.html
$ firefox ,x.htmlYou can give multiple files, as long as the ending balance of one matches the starting balance of the next:
$ python trxht.py 2002qtrx.txt 2004qtrx.txt >,x.htmlSupport for SPARQL-style filtering is in progress. Try:
$ python trxht.py --class myclass myqtrx.txt >myclass-transactions.htmlto simulate:
describe ?TRX where { ?TRX qt:split [ qs:class "9912mit-misc"] }.Future Work
- add hCards for payees (in progress)
- pick out phone numbers, city/state names
- support a form of payee smushing on label
- make URIs for accounts, categories, classses, payees
- support round-trip with QIF; sync back up with RDF export work in grokTrx.py
- move the quacken project to mercurial
- proxy via dig.csail.mit.edu or w3.org? both?
- run hg serve on homer? swada? login.csail?
- publish hg log stuff in a _scm/ subpath; serve the current version at the top