Syncing a 5 Year iPhoto Library with flickr
Yay! jawj's iphoto-flickr sync'd a 30GB iPhoto library to flickr. Not only did it upload the images, but it made a map from iPhoto metadata to flickr metadata that lets me continue on with the flickr API, syncing dates and such, during and after the upload.
OS X Yosemite runs iPhoto, reluctantly¶
Having replaced it with a newer model, I gave airbook, our late 2008 MacBook Air MB543LL/A, a complete lobotomy and installed OS X Yosemite. When I tried to re-introduce it to our photo archive, Apple told me iPhoto is no longer; Photos is the new thing, complete with iCloud hip-ness.
So I'm faced with another "to Mac or not to Mac?" moment.
This time I figure no, I lead a multi-platform life and I want something more web-native.
I spent a bunch of time trying to downgrade to Mavericks. Just when I had given up trying to do it myself and ordered Mavericks on a USB flash drive via eBay, I learned that if you start iPhoto from a command prompt, it runs on Yosemite after all.
The only complication was a dangling reference to iLifeSlideshow.framework in /System/Library/PrivateFrameworks. (Thank you Carbon Copy Cloner for a complete backup!)
Home directory in encrypted sparsebundle¶
This photo library is on an external USB drive, in an encrypted sparsebundle. The sparsebundle support discussion said all I have to do is double-click it, but it's hidden (filename starts with a dot). Command-line to the rescue, again: open /Users/.maryc.
Flickr is the web photo service for the closet librarian¶
As an Android mobile user, Google's photo offerings were tempting. Then I discovered browsing photos of person X only works within one album. And to put a photo in multiple albums, you have to copy it, i.e. maintain the tags and such twice.
My friends-and-family photo sharing community mostly uses facebook these days, but for curating an archive, flickr is a much better match. Imagine my horror when I downloaded some of my photos from facebook and discovered they were only available at reduced resolution. Perhaps they've addressed that since, but I still haven't seen any support for date-taken as separate from date-uploaded on facebook. There's little, if any, support for quietly curating without notifications firing every which way.
My photostream on flickr goes back to Dec 2004 when it was big in the open web community. I could never bring myself to go premium, but in May 2013 when they announced the terabyte storage offer, I dusted it off. Re-establishing my long lost yahoo credentials was no small feat, but I managed.
Flickr Backup from Mac App Store was a Bust¶
A quick search of the Mac App store turned up promising results:
- Backup to Flickr for iPhoto
By Sonia Bohelay
It was just a few bucks, so I went ahead. But oops: Your iPhoto library is either too old (iPhoto version < 9.0) or no photo found. Indeed, my library is from 8.1.2. I might have been able to upgrade the library, but with Apple pushing Photos over iPhoto, I didn't want to bet on it.
iPhoto -> Flickr in 350 lines of code¶
I was thinking about rolling my own with the flickr API when I discovered a kindred spirit had already been down this path and come up with iPhoto -> Flickr.
It worked in one go, so the incremental upload support wasn't necessary for the initial bulk upload, but to further sync the metadata, the resulting uploaded-photo-ids-map.txt is critical. In fact, I had to wrestle with iPhoto a bit to get ids that are useful without iPhoto running.
Just Add API Key and Authorize with OAuth¶
It works pretty much like it says on the tin. (The colorize dependency issue was easy enough to figure out.)
airbook:src connolly$ git clone https://github.com/jawj/flickrbackup.git # e339169212
airbook:flickrbackup connolly$ sudo gem install flickraw-cached colorize
Successfully installed flickraw-0.9.8
Successfully installed flickraw-cached-20120701
Successfully installed colorize-0.7.7
airbook:flickrbackup connolly$ ruby flickrbackup.rb
Flickr API key: 0481...
Flickr API shared secret: 897...
Authorise access to your Flickr account: press [Return] when ready
Authorisation code: 162-...
2015-07-04 13:47:28 -0500 Authenticated as: DanC
2015-07-04 13:47:44 -0500 8057 photos and 78 standard albums in iPhoto library
2015-07-04 13:47:44 -0500 8057 photos not yet uploaded to Flickr
Platform Independent Data¶
The kernel for this notebook is on my linux desktop, but iPhoto is running on airbook.
Since spaces in filenames are a royal pain over ssh, I made a convenient symlink.
!ssh airbook.local ls -l Pictures/flickrbackup
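An alternative to the symlink, sketched here for Python 3 and assuming the remote login shell is POSIX-like: quote the path before handing it to ssh, since the remote shell parses the command line a second time. The helper name `remote_ls` and the example path are hypothetical.

```python
import shlex

def remote_ls(host, path):
    """Build an argv list for `ssh host ls -l <path>` that survives spaces.

    ssh joins its arguments and lets the remote shell re-parse them, so the
    path has to be quoted for that second round of parsing.
    """
    remote_command = 'ls -l ' + shlex.quote(path)
    return ['ssh', host, remote_command]

# Hypothetical path containing a space:
argv = remote_ls('airbook.local', 'Pictures/iPhoto Library')
print(argv)
```

The resulting list can be passed to `subprocess.check_output` without going through a local shell.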
Upload Map DataFrame¶
It carefully logs the correspondence to support incremental update:
2015-07-04 13:47:44 -0500 (1/8057) Uploading '...2002/Sep 25, 2002/....jpg' ... 4294967334 -> 19226418710
Let's make sure we have redundant copies of the map. And let's use ordinary CSV rather than the funky -> format.
from IPython.display import display, Image
import pandas as pd
import numpy as np
dict(pandas=pd.__version__, numpy=np.__version__)
upload_map = !ssh airbook.local cat Pictures/flickrbackup/uploaded-photo-ids-map.txt
# One record per line of the map file: "<iPhoto id> -> <flickr id>"
upload_map = pd.DataFrame([dict(apple=int(a), flickr=int(f))
                           for (a, f) in [line.split(' -> ') for line in upload_map]])
upload_map.head()
upload_map.to_csv('uploaded-photo-ids-map.csv')
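The same conversion can be sketched without pandas, for anyone who wants the CSV copy from a plain script. This is a Python 3 sketch that assumes each line of the map file holds one `apple -> flickr` pair, as the split above implies; the helper names `parse_map_lines` and `map_to_csv` are hypothetical.

```python
import csv
import io

def parse_map_lines(lines):
    """Yield (apple_id, flickr_id) int pairs from 'apple -> flickr' lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        apple, sep, flickr = line.partition(' -> ')
        if not sep:
            raise ValueError('unexpected map line: %r' % line)
        yield int(apple), int(flickr)

def map_to_csv(pairs):
    """Render the pairs as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['apple', 'flickr'])
    writer.writerows(pairs)
    return out.getvalue()

# The ids come from the log line quoted earlier; the blank line is a test case.
sample = ['4294967334 -> 19226418710', '']
pairs = list(parse_map_lines(sample))
csv_text = map_to_csv(pairs)
print(csv_text)
```

Raising on a malformed line, rather than skipping it, keeps a truncated map file from silently losing ids.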
Flickr metadata access¶
Let's take a look at the results on flickr. I experimented with python flickr apis; the main one seems to be Python Flickr. flickdata.py (in palmagent) is a least-authority packaging of that API.
TODO: use a separate Photo object for setDates.
import flickdata
reload(flickdata)
flickdata.__version__
To make a flickdata.Account, we use the privileged iPython notebook environment to get network access and the API key (and OAuth credentials... where do they get squirrelled away?) and pass it to flickdata:
import json
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
logging.info('We try to log I/O.')
def myFlickrAcct(user_id='14874637@N00'):
    import pathlib
    import flickrapi
    api_secret = pathlib.Path('flickr_api_secret').open().read().strip()
    return flickdata.Read.make(flickrapi, api_secret, user_id)
myAcct = myFlickrAcct()
Photostream confused about recent photos¶
A bunch of old photos and videos are showing up as recent in my photostream as the upload progresses.
Flickr seems to set datetaken = date uploaded when there's no EXIF date, so let's look at these supposedly recent photos.
records = [
    r
    for page in myAcct.getPhotos(
        min_taken_date='2015-07',
        max_taken_date='2015-08',
        sort='date-taken-asc')
    for r in page]
photo = pd.DataFrame(records)
photo['id'] = photo.id.astype(int) # odd... even in JSON format, ids come back as strings
photo = photo.set_index('id')
len(photo)
Photo URLs: Thumbnail¶
Flickr's URLs API uses "secrets" so they serve as nice tasty capability URLs for the photos. So we can see this thumbnail from this iPython notebook even though we're not logged in to flickr here.
Image(url=photo.iloc[0].url_t)
photo.iloc[0]
Ah. Good. When the display defaults date taken to upload date, the underlying data tells us so:
import datetime
photo['upload_date'] = [np.datetime64(datetime.datetime.fromtimestamp(int(ts))) for ts in photo.dateupload]
photo[photo.datetakenunknown == '1'][['datetaken', 'upload_date', 'title', 'width_o', 'height_o']].head()
These were uploaded very soon after being taken; I suppose I turned on auto-upload on my phone:
photo[photo.datetakenunknown == '0'][['datetaken', 'upload_date', 'title', 'width_o', 'height_o']].head()
I verified the most recent upload dates against iphoto-flickr logs to be sure there were no timezone issues:
photo[['datetaken', 'upload_date', 'title', 'width_o', 'height_o']].sort('upload_date', ascending=False).head()
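The timezone question comes down to how a Unix timestamp is rendered. A small Python 3 sketch of the difference: `datetime.fromtimestamp` without a timezone uses the clock settings of whatever machine runs the kernel, while passing an explicit timezone gives a result that does not depend on the host. The helper name `upload_times` is hypothetical, and the sample timestamp is an illustrative value chosen to line up with the first log line above, not an actual `dateupload` from my account.

```python
import datetime

def upload_times(ts):
    """Return (utc, local_naive) datetimes for a Unix timestamp (str or int)."""
    seconds = int(ts)
    utc = datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)
    local = datetime.datetime.fromtimestamp(seconds)  # depends on host timezone
    return utc, local

# Illustrative timestamp, given as a string the way the API returns it:
utc, local = upload_times('1436035664')

# Render it in the -0500 offset that the upload log uses.
log_zone = datetime.timezone(datetime.timedelta(hours=-5))
in_log_zone = utc.astimezone(log_zone)
print(utc.isoformat(), in_log_zone.isoformat())
```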
iPhoto, give me my data back!¶
flickrbackup found the library I'm interested in even though it's not in the default path. Ah... it's using Applescript.
iPhoto uses fairly nice .xml and .db files with a nice, sturdy uuid for each photo. But the id flickrbackup.rb got via applescript is nowhere to be found in there!
!ssh airbook.local grep {photo.index[0]} Pictures/flickrbackup/uploaded-photo-ids-map.txt
!ssh airbook.local grep 4294967334 Pictures/iphoto-maryc/AlbumData.xml
!ssh airbook.local sqlite3 Pictures/iphoto-maryc/iPhotoMain.db .dump | grep 4294967334
The iPhoto script dictionary doesn't show a uid property. Darn. We'll have to use file paths or something.
iPhoto takes orders in JavaScript¶
export_keys = '''
#!/usr/bin/env osascript -l JavaScript
function save(info, where) {
    console.log('saving to...', where)
    var str = $.NSString.alloc.initWithUTF8String(JSON.stringify(info));
    str.writeToFileAtomicallyEncodingError(where, true, $.NSUTF8StringEncoding, null);
}
function getKeys(iPhoto) {
    var photos = iPhoto.photoLibraryAlbum().photos;
    return {
        id: photos.id(),
        date: photos.date(),
        width: photos.width(),
        height: photos.height(),
        originalPath: photos.originalPath(),
        imagePath: photos.imagePath()
    };
}
function run(argv) {
    out = argv[0];
    iPhoto = Application('iPhoto');
    save(getKeys(iPhoto), out)
}
'''.strip()
Let's save it, scp it over, run it, and scp the results back:
def save_script(name, text):
    from pathlib import Path
    # Write `text` to the file called `name`.
    with Path(name).open('wb') as out:
        out.write(text)
save_script('photo_keys.js', export_keys)
!scp photo_keys.js airbook.local:Pictures/
!ssh airbook.local osascript -l JavaScript Pictures/photo_keys.js Pictures/keys.json
!scp airbook.local:Pictures/keys.json .
Now we can ground these ids in key information such as file paths, dates, and image sizes that we can join with other sources:
pk = pd.DataFrame(json.load(open('keys.json'))).set_index('id')
pk.head()
Applescript reports the full image paths, but we'll need library-relative paths for our work below.
libloc = '/Volumes/maryc/Pictures/iPhoto Library/' # TODO: get from applescript?
pk['relativePath'] = [p[len(libloc):] for p in pk.imagePath]
pk[(pk.date >= '2002-09') & (pk.date < '2002-10')][['date', 'height', 'width', 'relativePath']]
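Slicing by `len(libloc)` quietly produces garbage for any path that does not start with the library location. A slightly more defensive variant, as a sketch; the helper name `relative_to_library` and the example file name are hypothetical.

```python
def relative_to_library(image_path, library_root):
    """Strip library_root from image_path, refusing paths outside the library."""
    root = library_root if library_root.endswith('/') else library_root + '/'
    if not image_path.startswith(root):
        raise ValueError('%r is not under %r' % (image_path, root))
    return image_path[len(root):]

libloc = '/Volumes/maryc/Pictures/iPhoto Library/'
# Hypothetical file name, for illustration only:
rel = relative_to_library(libloc + 'Originals/2002/example.jpg', libloc)
print(rel)
```

In the notebook this could replace the list comprehension body, so that a stray path outside the library fails loudly before the join below.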
iPhoto data without iPhoto¶
iPhoto keeps nice sqlite3 databases.
def my_photo_db(path='maryc-airbook-iphoto-meta/iPhotoMain.db'):
    import sqlite3
    return sqlite3.connect(path)
db1 = my_photo_db()
q = '''
select count(distinct uid) from SqPhotoInfo
'''
pd.read_sql(q, db1)
q = '''
select count(*) qty, year from (
select substr(datetime(photoDate + julianday('2000-01-01 00:00:00')), 1, 4) year
from SqPhotoInfo
) t
group by year
having count(*) > 10
'''
pd.read_sql(q, db1)
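The date arithmetic in these queries treats `photoDate` as a count of days from 2000-01-01. Taking that at face value (it is what the SQL in this section assumes, not something I have from Apple documentation), the same conversion can be sketched in Python and cross-checked against SQLite's `julianday`. The names `IPHOTO_EPOCH`, `photo_date`, and `photo_date_sql` are hypothetical.

```python
import datetime
import sqlite3

# Epoch assumed by the SQL in this section.
IPHOTO_EPOCH = datetime.datetime(2000, 1, 1)

def photo_date(days):
    """Convert a photoDate value (days since the assumed epoch) to a datetime."""
    return IPHOTO_EPOCH + datetime.timedelta(days=days)

def photo_date_sql(days):
    """Run the same expression the queries use, in an in-memory database."""
    db = sqlite3.connect(':memory:')
    try:
        (text,) = db.execute(
            "select datetime(? + julianday('2000-01-01 00:00:00'))",
            (days,)).fetchone()
    finally:
        db.close()
    return text

in_python = photo_date(1.5)      # a day and a half after the epoch
in_sqlite = photo_date_sql(1.5)
print(in_python, in_sqlite)
```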
Cameras¶
q = '''
select qty,
datetime(min_date + julianday('2000-01-01 00:00:00')) min_date,
datetime(max_date + julianday('2000-01-01 00:00:00')) max_date,
cameraModel from (
select count(*) qty, min(photoDate) min_date, max(photoDate) max_date, cameraModel
from
sqphotoinfo
where photoDate > julianday('1993-01-01') - julianday('2000-01-01 00:00:00')
group by cameraModel
)
where qty >= 10
order by 1 desc
'''
pd.read_sql(q, db1)
Photos and Images¶
The model is nice and clean: it separates photos from images, relating any number of possibly-edited images to each photo-taking event and issuing a uuid to the photo-taking event.
q = '''
select photo.primaryKey, photo.uid, datetime(photo.photoDate + julianday('2000-01-01 00:00:00')) as photoDate,
photo.cameraModel, photo.archiveFilename,
fi.imageWidth, fi.imageHeight, fi.fileSize, fi.imageType, fi.version,
fl.relativePath, fl.aliasPath
-- TODO: decode fl.format
from SqPhotoInfo photo
join SqFileImage fi on fi.photoKey = photo.primaryKey
join SqFileInfo fl on fi.sqFileInfo = fl.primaryKey
where fileSize > 0
order by photo.photoDate desc
'''
pdb = pd.read_sql(q, db1)
pdb.head()
Joining sqlite data with flickr via applescript key info¶
Ah... excellent... even though there are more image files than photos, we get an exact 1-1 match when we join with our photo keys (implicitly on relativePath).
len(pdb), len(pk), len(pdb.merge(pk))
pkdb = pk.reset_index().merge(pdb).set_index('id')
pkdb.head()
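Comparing lengths is a quick check; the property underneath it is that every photo key matches exactly one database row on `relativePath`. A plain-Python sketch of that check, using hypothetical stand-in rows rather than the real `pk` and `pdb` frames (the helper name `join_problems` is also hypothetical):

```python
from collections import Counter

def join_problems(left, right, key):
    """Return {key value: match count} for left rows that do not match
    exactly one right row on `key`."""
    right_counts = Counter(row[key] for row in right)
    return {row[key]: right_counts[row[key]]
            for row in left
            if right_counts[row[key]] != 1}

# Hypothetical rows standing in for pk and pdb:
pk_rows = [{'id': 1, 'relativePath': 'a.jpg'},
           {'id': 2, 'relativePath': 'b.jpg'}]
pdb_rows = [{'uid': 'u1', 'relativePath': 'a.jpg'},
            {'uid': 'u2', 'relativePath': 'b.jpg'},
            {'uid': 'u2', 'relativePath': 'b_edited.jpg'}]  # extra image file

problems = join_problems(pk_rows, pdb_rows, 'relativePath')
print(problems)  # empty when the join is 1-1 from the photo-key side
```

An empty result means extra image files on the database side are harmless: they simply have no photo key to pair with.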
Merging with the upload_map gives us a clear correspondence between iPhoto applescript ids and flickr ids.
upkdb = upload_map.merge(pkdb, left_on='apple', right_index=True)
len(upkdb)
upkdb.head()
Fixing Dates¶
Let's grab flickr photos with unknown date taken (with upload date, title, and original size).
Then merge with the date information from the sqlite3 db.
tofix = photo[photo.datetakenunknown == '1'][['datetaken', 'upload_date', 'title', 'width_o', 'height_o']]
fixed = tofix.merge(upkdb, left_index=True, right_on='flickr')[
['date', 'photoDate', 'upload_date', 'title', 'archiveFilename',
'width_o', 'imageWidth', 'height_o', 'imageHeight',
'flickr', 'uid']].set_index('flickr')
print len(tofix), len(fixed)
fixed.head()
For this, we need write access.
def myFlickrEdit(user_id='14874637@N00'):
    import pathlib
    import flickrapi
    api_secret = pathlib.Path('flickr_api_secret').open().read().strip()
    return flickdata.Write.make(flickrapi, api_secret, user_id)
edit = myFlickrEdit()
Let's work with one photo at first, verifying with the flickr web UI as we go.
Image(url=photo.loc[19426161436].url_t)
fixed.loc[19426161436].photoDate
edit.setDates(19426161436, date_taken=fixed.loc[19426161436].photoDate)
Now we can iterate over all the fixes.
Incremental updates came in handy here. At first, I forgot to rate-limit my requests and flickr noticed after a few hundred. I went back and fetched metadata for recent photos in my photostream again and finished off the rest.
import time
def do_fixes():
    for pid, photo in fixed.iterrows():
        edit.setDates(pid, date_taken=photo.photoDate)
        time.sleep(0.5)
do_fixes()
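In hindsight, a loop that remembers what it has already fixed and backs off on failure would have saved the refetch. A minimal sketch of that idea follows; it is not what I ran. The names `apply_fixes` and `fake_set_date` are hypothetical, and treating failures as `IOError` is an assumption made for illustration (a real client library may raise its own exception type, which would go in `retry_on`).

```python
import time

def apply_fixes(fixes, set_date, done=None, delay=0.5, max_tries=3,
                retry_on=(IOError,), sleep=time.sleep):
    """Call set_date(photo_id, date) for each pending fix, pausing between calls.

    `done` is a set of photo ids already fixed; it is updated in place so a
    rerun after an interruption skips them.
    """
    done = set() if done is None else done
    for photo_id, date in fixes:
        if photo_id in done:
            continue
        for attempt in range(max_tries):
            try:
                set_date(photo_id, date)
                break
            except retry_on:
                if attempt == max_tries - 1:
                    raise
                sleep(delay * 2 ** (attempt + 1))  # back off before retrying
        done.add(photo_id)
        sleep(delay)  # steady pause between successful requests
    return done

# Stand-ins for the real service: the second photo fails once, then succeeds.
calls, sleeps = [], []

def fake_set_date(photo_id, date):
    calls.append(photo_id)
    if photo_id == 2 and calls.count(2) == 1:
        raise IOError('simulated rate limit')

result = apply_fixes([(1, 'a'), (2, 'b'), (3, 'c')], fake_set_date,
                     done={1}, sleep=sleeps.append)
print(calls, sleeps, sorted(result))
```

Passing `sleep` as a parameter keeps the demonstration instant; in real use the default `time.sleep` applies.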
Future Work¶
- make an album of all the photos uploaded in this process?
- tag flickr photos with uids
- don't lose "untagged" state, though! capture untagged-ness in an album or something.
- sync events... using photosets?
- sync faces
Additional notes bookmarked under: mac photos, mac sysadmin