Migrating Breadcrumbs
Breadcrumbs was the blog of DIG, the Decentralized Information Group at MIT CSAIL.
In a 2015 #microformats chat, I discovered that it was down:
DanC> grr... the blog is down. http://dig.csail.mit.edu/breadcrumbs/node/228
"Unable to connect to database server"
DanC verifies that he has an export of his work there...
DanC> interesting... my backup is evidently python pickles of XMLRPC responses from the API of that CMS (drupal?)
>>> x['dateCreated']
<DateTime '20080306T17:00:05' at 7f20e8aef5f0>
>>> x['dateCreated'].__class__
<class xmlrpclib.DateTime at 0x7f20e444eef0>
The files are numbered:
def _numbered_files(pattern='[0-9]*',
                    breadcrumbs='/home/connolly/sites/breadcrumbs'):
    from pathlib import Path
    return Path(breadcrumbs).glob(pattern)

breadcrumbs_bak = list(_numbered_files())
sorted(int(f.parts[-1]) for f in breadcrumbs_bak)[:10]
[4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
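The glob pattern '[0-9]*' matches any name that starts with a digit, so a stray file such as '4.bak' would make int() fail. Here is a minimal sketch of a stricter listing, assuming Python 3; the helper name numbered_ids and the throwaway directory are illustrative and not part of the original backup.

```python
import tempfile
from pathlib import Path


def numbered_ids(directory):
    """Return the sorted integer ids of files whose names are all digits."""
    return sorted(int(p.name) for p in Path(directory).glob('[0-9]*')
                  if p.name.isdigit())


# Build a throwaway directory that mimics the backup layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ['4', '10', '228', '4.bak', 'README']:
        (Path(tmp) / name).write_bytes(b'')
    ids = numbered_ids(tmp)

print(ids)  # [4, 10, 228]
```

Sorting the integers rather than the names keeps 10 after 4, which a plain string sort would not.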
Each is a pickled XMLRPC response:
import pickle
breadcrumbs_xmlrpc = dict((int(f.parts[-1]), pickle.load(f.open('rb'))) for f in breadcrumbs_bak)
x = breadcrumbs_xmlrpc[228]
x['title'], x['dateCreated'], x['dateCreated'].__class__
('hAudio for microformats mixtapes, in progress',
<DateTime '20080306T17:00:05' at 7fa8242a5320>,
<class xmlrpclib.DateTime at 0x7fa82427cf58>)
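The dateCreated value is an XML-RPC DateTime wrapper rather than a standard datetime. One way to convert it, sketched here under the assumption of Python 3 (where xmlrpclib is named xmlrpc.client), is to build the datetime from the first six fields of its time tuple. The helper name to_datetime is made up for this illustration.

```python
from datetime import datetime
from xmlrpc.client import DateTime  # Python 3 name for xmlrpclib.DateTime


def to_datetime(stamp):
    """Convert an XML-RPC DateTime to a naive datetime, field by field."""
    return datetime(*stamp.timetuple()[:6])


created = to_datetime(DateTime('20080306T17:00:05'))
print(created.isoformat())  # 2008-03-06T17:00:05
```

This avoids a round trip through a POSIX timestamp, so the result does not depend on the local time zone.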
MadMode blog pages
from __future__ import print_function  # must come before any other import
from collections import OrderedDict
from sys import stderr
class BlogWriter(object):
    def __init__(self, pages):
        self._pages = pages

    def addPage(self, body, title, date, tags, published, slug):
        datestr = date.isoformat()
        # Pass pairs, not keyword arguments: Python 2 keyword arguments
        # do not preserve order, even when building an OrderedDict.
        headings = OrderedDict([
            ('title', repr(title)),
            ('date', datestr[:10]),
            ('tags', "[%s]" % (', '.join("'%s'" % tag for tag in tags))),
            ('published', published)])
        header = '\n'.join(["%s: %s" % (k, v) for k, v in headings.iteritems()])
        yyyy = datestr[:4]
        page = (self._pages / yyyy / slug).with_suffix('.md')
        print("addPage: ", page, tags, file=stderr)
        with page.open('wb') as out:
            out.write(header)
            out.write('\n\n')
            out.write(body.encode('utf-8'))

def _madmode():
    from pathlib import Path
    return BlogWriter(Path('/home/connolly/sites') / 'madmode-blog' / 'pages')

mmwr = _madmode()
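To see what the top of each generated page looks like, here is a small sketch of the header-building step on its own, assuming Python 3. The function name page_header is hypothetical; it mirrors the "key: value" lines that addPage writes but touches no files.

```python
from datetime import datetime


def page_header(title, date, tags, published):
    """Build the 'key: value' header block, one field per line."""
    fields = [
        ('title', repr(title)),
        ('date', date.isoformat()[:10]),
        ('tags', '[%s]' % ', '.join("'%s'" % tag for tag in tags)),
        ('published', published),
    ]
    return '\n'.join('%s: %s' % (k, v) for k, v in fields)


header = page_header('hAudio for microformats mixtapes, in progress',
                     datetime(2008, 3, 6, 17, 0, 5), ['breadcrumbs'], True)
print(header)
```

A list of pairs keeps the field order explicit on any Python version.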
from time import mktime
from datetime import datetime
import re

def drupal2md(body):
    body = body.split('</title>', 1)[1]  # remove redundant title
    body = body.replace('\r', '')  # unix newlines
    return body

def findTags(body):
    tags = []
    for txt in body.split('<'):
        if txt.startswith('a '):
            txt = txt[len('a '):]
            attrs = {}
            while '=' in txt and not txt.startswith('>'):
                name, txt = txt.split('=', 1)
                name = name.strip()
                txt = txt.strip()
                _, value, txt = txt.split('"', 2)
                attrs[name] = value
                txt = txt.strip()
            href = attrs.get('href', '')
            if 'tag' in attrs.get('rel', '') or 'del.icio.us' in href:
                if href.endswith('/'):
                    href = href[:-1]
                tags.append(href.split('/')[-1])
    return tags

for postid, item in sorted(breadcrumbs_xmlrpc.items()):
    print(postid, item['title'], file=stderr)
    dt = datetime.fromtimestamp(mktime(item['dateCreated'].timetuple()))
    tags = ['breadcrumbs'] + findTags(item['content'])
    mmwr.addPage(drupal2md(item['content']), title=item['title'], date=dt,
                 tags=tags,
                 published=True, slug='breadcrumbs_%04d' % postid)
4 On OpenID and comment policies
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0004.md ['breadcrumbs']
5 little burst of PAW demo hacking
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0005.md ['breadcrumbs']
6 DIG blog wish list
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0006.md ['breadcrumbs', 'connolly']
7 Fire at Southampton... hope everything's alright soon
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0007.md ['breadcrumbs']
8 Sourceforge is the place... to sell soap?
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0008.md ['breadcrumbs']
9 Reflecting blog structure into the Semantic Web with SIOC?
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0009.md ['breadcrumbs']
10 I'd rather be...
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0010.md ['breadcrumbs']
11 PHP angst
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0011.md ['breadcrumbs']
12 Shopping for a client-side blogging editor
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0012.md ['breadcrumbs', 'authoring']
13 presented Issues in Semantic Web Logic to 6.898
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0013.md ['breadcrumbs']
14 xchat RFE: "mail a log of this chat to mbox@domain" macro
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0014.md ['breadcrumbs']
15 U.S. papertrail: the federal register
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0015.md ['breadcrumbs']
16 XHTML for computer science research papers and bibliographies
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0016.md ['breadcrumbs']
17 ISWC buzz
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0017.md ['breadcrumbs']
18 Why isn't bill payee set-up integrated with address book or yellow pages lookup?
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0018.md ['breadcrumbs']
23 RDF Calendar, GRDDL, Microformats, and all that at XML2005 in Atlanta
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0023.md ['breadcrumbs', 'quality']
24 SKOS, SIOC, and drupal taxonomy
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0024.md ['breadcrumbs']
25 sorry about overriding your font size
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0025.md ['breadcrumbs']
26 Ray Ozzie's take on diff/sync
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0026.md ['breadcrumbs']
27 a fly-by of XACML
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0027.md ['breadcrumbs']
28 MathML as a rule interchange format
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0028.md ['breadcrumbs']
29 GRDDL transform wanted: National Information Exchange Model (NIEM)
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0029.md ['breadcrumbs']
30 Go-Karting rush tainted by lack of OpenID for bug reporting about hypertext editing
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0030.md ['breadcrumbs']
45 Toward richtext syndicated feed
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0045.md ['breadcrumbs']
46 Toward better documentation of some schemas for the W3C digital library
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0046.md ['breadcrumbs']
47 Brought my hockey skates with me this time
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0047.md ['breadcrumbs']
52 Connecting DIG Student Projects to the MIT UROP listing
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0052.md ['breadcrumbs']
55 Drupal, OpenID, and the Mac OS X Keychain
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0055.md ['breadcrumbs']
56 Wikicompany?
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0056.md ['breadcrumbs']
57 upgrade to CivicSpace?
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0057.md ['breadcrumbs']
61 frbr:embodiment is enough without frbr:embodimentOf, no?
addPage: /home/connolly/sites/madmode-blog/pages/2005/breadcrumbs_0061.md ['breadcrumbs']
63 On Google, Jabber, and Jingle and good and evil in IM and IP networks
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0063.md ['breadcrumbs']
66 Arpeggio in D, a little three chord ditty
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0066.md ['breadcrumbs']
69 Fun with Policy Aware Web at UMD, AFS/SVN at CSAIL
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0069.md ['breadcrumbs']
70 Using truth maintenance techniques in RDF stores?
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0070.md ['breadcrumbs']
77 MadScientistMode
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0077.md ['breadcrumbs']
78 RSS is dead; long live RSS
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0078.md ['breadcrumbs']
82 python, javascript, and PHP, oh my!
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0082.md ['breadcrumbs', 'installation', 'javascript', 'python', 'quality', 'testing', 'programming']
84 tabulator use cases: when can we meet? and PathCross
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0084.md ['breadcrumbs']
85 bnf2turtle -- write a turtle version of an EBNF grammar
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0085.md ['breadcrumbs']
86 formally closing the feedback loop
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0086.md ['breadcrumbs']
87 Using RDF and OWL to model language evolution
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0087.md ['breadcrumbs']
88 Toward integration of cwm's proof structures with InferenceWeb's PML
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0088.md ['breadcrumbs']
89 Investigating logical reflection, constructive proof, and explicit provability
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0089.md ['breadcrumbs']
90 Fun with Embedded RDF and DOAP for the GRDDL profile
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0090.md ['breadcrumbs']
91 Toward Semantic Web data from Wikipedia
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0091.md ['breadcrumbs', u'connolly']
92 Reflections on the W3C Technical Plenary week
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0092.md ['breadcrumbs', 'NCE']
93 Getting (dis)organized for SxSWi in Austin
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0093.md ['breadcrumbs', 'Austin']
94 Dates in drupal vs planetrdf
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0094.md ['breadcrumbs']
96 Getting my Personal Finance data back with hCalendar and hCard
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0096.md ['breadcrumbs']
97 A look at emerging Web security architectures from a Semantic Web perspective
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0097.md ['breadcrumbs']
98 a quick take on Kiko, a nifty looking calendar service
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0098.md ['breadcrumbs']
99 using JSON and templates to produce microformat data
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0099.md ['breadcrumbs']
100 geocoding and hCards for airports from wikipedia
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0100.md ['breadcrumbs', 'geo']
101 time, context, quoting, and reification
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0101.md ['breadcrumbs']
102 no more life in a textarea: MozEx and emacs to the rescue!
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0102.md ['breadcrumbs']
107 hacking soccer schedules into hCalendar and into my sidekick
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0107.md ['breadcrumbs']
123 A step forward with python and sshagent, and a walk around gnome security tools
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0123.md ['breadcrumbs', 'web', 'policy', 'security', 'python', 'programming']
124 Consensus and community review in open source and open standards
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0124.md ['breadcrumbs']
127 busy day in #microformats
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0127.md ['breadcrumbs']
129 Access control and version control: an over-constrained problem?
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0129.md ['breadcrumbs']
130 citing W3C specs from WWW conference papers
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0130.md ['breadcrumbs']
131 On GData, SPARQL update, and RDF Diff/Sync
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0131.md ['breadcrumbs', 'diff', 'sync', 'sparql', 'calendar', 'web+architecture']
133 RDF, Microformats, and Javascript hacking in person at the 'tute
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0133.md ['breadcrumbs', 'mobile', 'javascript', 'microformats', 'travel', 'calendar', 'BOS', 'bos']
135 webizing TaskJuggler
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0135.md ['breadcrumbs', 'calendar']
139 WWW2006 in Edinburgh: Identity, Reference, and Meaning
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0139.md ['breadcrumbs', 'www2006', 'EDI', 'travel', 'web+architecture', 'URI']
140 Exporting databases in the Semantic Web with SPARQL, D2R, dbview, ARC, and such
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0140.md ['breadcrumbs', 'www2006', 'EDI', 'travel', 'sparql']
141 Equality and inconsistency in the rules layer
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0141.md ['breadcrumbs']
142 fun with flock
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0142.md ['breadcrumbs', u'flock', u'writing', u'editing', u'drupal']
146 converting vcard .vcf syntax to hcard and catching up on CALSIFY
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0146.md ['breadcrumbs']
148 a walk thru the tabulator calendar view
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0148.md ['breadcrumbs', 'calendar', 'SeedApplications']
151 Choosing flight itineraries using tabulator and data from Wikipedia
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0151.md ['breadcrumbs']
154 OpenID, verisign, and my life: mediawiki, bugzilla, mailman, roundup, ...
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0154.md ['breadcrumbs']
155 tabulator maps in Argentina
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0155.md ['breadcrumbs']
156 how much do I want to know about drupal?
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0156.md ['breadcrumbs']
157 on Wikimania 2006, from a few hundred miles away
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0157.md ['breadcrumbs']
158 Stitching the Semantic Web together with OWL at AAAI-06
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0158.md ['breadcrumbs', 'RdfAndSql', 'AAAI', 'public-sparql-dev', 'citation']
159 On the Future of Research Libraries at U.T. Austin
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0159.md ['breadcrumbs', 'Austin', 'URI', 'web+architecture']
160 ACL 2 seminar at U.T. Austin: Toward proof exchange in the Semantic Web
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0160.md ['breadcrumbs', 'Austin', 'semantic', 'web', 'logic', 'research']
161 Talking with U.T. Austin students about the Microformats, Drug Discovery, the Tabulator, and the Semantic Web
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0161.md ['breadcrumbs', 'Austin', 'semantic', 'web']
162 Wishing for XOXO microformat support in OmniOutliner
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0162.md ['breadcrumbs']
163 Trip reporting with flock
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0163.md ['breadcrumbs']
164 Adding Shoenfield, Brachman books to my bookshelf?
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0164.md ['breadcrumbs']
165 Now is a good time to try the tabulator
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0165.md ['breadcrumbs']
171 Celebrating OWL interoperability and spec quality
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0171.md ['breadcrumbs']
172 A new Basketball season brings a new episode in the personal information disaster
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0172.md ['breadcrumbs']
178 Modelling HTTP cache configuration in the Semantic Web
addPage: /home/connolly/sites/madmode-blog/pages/2006/breadcrumbs_0178.md ['breadcrumbs']
179 She's a witch and I have the proof (in N3)
addPage: /home/connolly/sites/madmode-blog/pages/2007/breadcrumbs_0179.md ['breadcrumbs']
180 A design for web content labels built from GRDDL and rules
addPage: /home/connolly/sites/madmode-blog/pages/2007/breadcrumbs_0180.md ['breadcrumbs']
187 The Mercurial SCM: great for lots of stuff, but not the holy grail
addPage: /home/connolly/sites/madmode-blog/pages/2007/breadcrumbs_0187.md ['breadcrumbs', 'python+scm']
192 Collaboration and crime at a distance at HASTAC, WWW2007
addPage: /home/connolly/sites/madmode-blog/pages/2007/breadcrumbs_0192.md ['breadcrumbs', 'openid', 'hastac', 'Duke', 'RDU', 'digital+media']
193 IKL by Hayes et al. provides a semantics for N3?
addPage: /home/connolly/sites/madmode-blog/pages/2007/breadcrumbs_0193.md ['breadcrumbs']
194 Linked Data at WWW2007: GRDDL, SPARQL, and Wikipedia, oh my!
addPage: /home/connolly/sites/madmode-blog/pages/2007/breadcrumbs_0194.md ['breadcrumbs', u'banff', u'grddl', u'www2007', u'travel']
198 Units of measure and property chaining
addPage: /home/connolly/sites/madmode-blog/pages/2007/breadcrumbs_0198.md ['breadcrumbs']
201 Soccer schedules, flight itineraries, timezones, and python web frameworks
addPage: /home/connolly/sites/madmode-blog/pages/2007/breadcrumbs_0201.md ['breadcrumbs']
206 FOAF and OpenID: two great tastes that taste great together
addPage: /home/connolly/sites/madmode-blog/pages/2007/breadcrumbs_0206.md ['breadcrumbs']
207 brainstorming, issue tracking, and problem reporting... with tabulator?
addPage: /home/connolly/sites/madmode-blog/pages/2007/breadcrumbs_0207.md ['breadcrumbs']
214 Free Culture: Why buy the Amazon Kindle when you can give and get an OLPC XO-1 for the same price?
addPage: /home/connolly/sites/madmode-blog/pages/2007/breadcrumbs_0214.md ['breadcrumbs']
221 I can only imagine...
addPage: /home/connolly/sites/madmode-blog/pages/2007/breadcrumbs_0221.md ['breadcrumbs']
228 hAudio for microformats mixtapes, in progress
addPage: /home/connolly/sites/madmode-blog/pages/2008/breadcrumbs_0228.md ['breadcrumbs']
229 sidekick calendar subscription for SXSW
addPage: /home/connolly/sites/madmode-blog/pages/2008/breadcrumbs_0229.md ['breadcrumbs']
240 The details of data in documents; GRDDL, profiles, and HTML5
addPage: /home/connolly/sites/madmode-blog/pages/2008/breadcrumbs_0240.md ['breadcrumbs']
246 OpenID "Hello World" on apache still deep magic
addPage: /home/connolly/sites/madmode-blog/pages/2009/breadcrumbs_0246.md ['breadcrumbs']
250 DIG losing the battle with spammers again
addPage: /home/connolly/sites/madmode-blog/pages/2009/breadcrumbs_0250.md ['breadcrumbs']
251 migrating from danger/sidekick to android/G1
addPage: /home/connolly/sites/madmode-blog/pages/2009/breadcrumbs_0251.md ['breadcrumbs']
252 Existentials in ACL2 and Milawa make sense; how about level breakers?
addPage: /home/connolly/sites/madmode-blog/pages/2010/breadcrumbs_0252.md ['breadcrumbs']
253 Map and Territory in RDF APIs
addPage: /home/connolly/sites/madmode-blog/pages/2010/breadcrumbs_0253.md ['breadcrumbs']
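The findTags function above splits the markup by hand, which assumes double-quoted attributes. As an alternative sketch (not what the migration actually ran), the same rule can be expressed with the standard library's HTMLParser, assuming Python 3: collect the last path segment of any link that has "tag" in its rel attribute or points at del.icio.us. The names TagFinder and find_tags, and the sample markup, are made up for this illustration.

```python
from html.parser import HTMLParser


class TagFinder(HTMLParser):
    """Collect tag names from rel="tag" links and del.icio.us links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        attrs = dict(attrs)
        href = attrs.get('href') or ''
        rel = attrs.get('rel') or ''
        if 'tag' in rel or 'del.icio.us' in href:
            if href.endswith('/'):
                href = href[:-1]  # drop one trailing slash, as findTags does
            self.tags.append(href.split('/')[-1])


def find_tags(body):
    finder = TagFinder()
    finder.feed(body)
    finder.close()
    return finder.tags


sample = ('<p>See <a rel="tag" href="http://del.icio.us/connolly/python/">'
          'python</a> and <a href="http://example.org/">elsewhere</a>.</p>')
print(find_tags(sample))  # ['python']
```

The parser also copes with single-quoted or unquoted attribute values, which the hand-rolled split does not.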
PyData Tools
import pandas as pd
dict(pandas=pd.__version__)
{'pandas': u'0.17.1'}
items = pd.DataFrame.from_records(breadcrumbs_xmlrpc.values())
items.postid = items.postid.astype(int)
items = items.set_index('postid')
print(items.dtypes)
items[['title', 'dateCreated']].sort_values('dateCreated').head()
content object
dateCreated object
description object
link object
mt_allow_comments int64
mt_convert_breaks object
permaLink object
title object
userid object
dtype: object
postid | title | dateCreated |
---|---|---|
4 | On OpenID and comment policies | 20051024T23:28:49 |
5 | little burst of PAW demo hacking | 20051026T20:12:18 |
6 | DIG blog wish list | 20051026T20:14:27 |
7 | Fire at Southampton... hope everything's alrig... | 20051031T11:59:08 |
9 | Reflecting blog structure into the Semantic We... | 20051031T13:18:51 |
items.loc[[228], ['title', 'dateCreated']]
postid | title | dateCreated |
---|---|---|
228 | hAudio for microformats mixtapes, in progress | 20080306T17:00:05 |
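The dtypes listing shows dateCreated as a generic object column. A possible next step, sketched here with two hand-copied records and assuming a recent pandas on Python 3, is to parse the timestamps into a real datetime column so that pandas date tools apply. The records list stands in for the pickled responses and uses plain strings where the real data holds XML-RPC DateTime objects.

```python
import pandas as pd

# Stand-in records; the real values come from the pickled XML-RPC responses.
records = [
    {'postid': '228', 'title': 'hAudio for microformats mixtapes, in progress',
     'dateCreated': '20080306T17:00:05'},
    {'postid': '4', 'title': 'On OpenID and comment policies',
     'dateCreated': '20051024T23:28:49'},
]

items = pd.DataFrame.from_records(records)
items['postid'] = items['postid'].astype(int)
items = items.set_index('postid')

# Parse the compact ISO 8601 timestamps into a datetime column.
items['dateCreated'] = pd.to_datetime(items['dateCreated'],
                                      format='%Y%m%dT%H:%M:%S')

# Count posts per year using the datetime accessor.
by_year = items.groupby(items['dateCreated'].dt.year).size()
print(by_year)
```

With the real data, the string form of each timestamp would first have to be pulled out of the DateTime wrapper (for example from its value attribute) before parsing.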