(x)html, linux, perl, photography

Picasa, JAlbum, and null bytes

I have recently been trying to transition from Mac to Linux, with much success for the most part, but a few hiccups as well, as is to be expected. One of the important uses of the computer for me is photo editing and sharing, especially since we got our Canon Rebel XT last year, which takes absolutely beautiful pictures. I had developed a fairly nice workflow on my Mac for photo editing and sharing, consisting of:

  1. Import pictures from camera to iPhoto
  2. Delete bad pictures
  3. Create new albums
  4. Edit some photos for lighting, cropping etc.
  5. Add titles and comments to photos
  6. Use caption buddy to write iPhoto comments to IPTC tags
  7. Use an AppleScript to rename all the images in an album with a meaningful name followed by an automatically increasing number
  8. Export the images from iPhoto
  9. Use rsync to upload the pictures to my webserver
  10. The webserver has a cron script which looks for new pictures, and then runs JAlbum to create new web albums for my pictures

It sounds like a lot of steps, but actually it was going quite quickly for me, and without many hitches. The biggest issue was with the IPTC tags. Using caption buddy is kind of a hack to get around that. It would be nice if iPhoto just wrote tags to the files by default. Oh well.

To get the same results with Linux, there are a number of changes. Obviously there is no iPhoto for Linux. The next best thing (actually better in some regards) is Picasa. Picasa is a Windows program put out by Google (they acquired it several years ago). As of about a year ago, there is now a Linux version as well. The Linux version requires WINE, which allows one to run Windows programs on other platforms (mostly Linux, but there is also a Mac version). Especially considering that Picasa is not a native Linux application, it runs remarkably fast and is very stable. It has some nice editing features, including an “I’m Feeling Lucky” button, which seems to do much better than iPhoto’s “enhance” button. It also includes a unique scrollbar, which changes scrolling speed depending on how far you move it. Also, unlike iPhoto, it writes captions as IPTC captions embedded in the file, which is handy, especially since that is how JAlbum knows about them. JAlbum will extract the IPTC tags, along with EXIF tags, which include information about the settings of the camera for each picture. It will then make some nicely compressed and small images suitable for web-viewing. JAlbum is also highly customizable, which I like very much.

All that being said, both Picasa and JAlbum seem to have a few bugs.

Bug #1 — Null bytes

When uploading some pictures I had processed with Picasa recently, I noticed a strange character at the end of each caption. When I viewed the album with Firefox on a Mac, this showed up as a question mark, making it look as if I were unsure about all of my captions. After some reading on the JAlbum forum, I saw a post claiming that this was a null byte character. That was helpful. I tried looking at the image files in my favorite text editor, vim, and saw a bunch of gobbledygook, along with some captions that I could recognize. There did seem to be some extra characters after the caption, but I couldn’t figure out which one was the null byte character. After some more searching about vim and control characters, I found out that the null byte character shows up as ^@ in vim, and that if I want to type one, I have to type Ctrl-V Ctrl-J. That meant I could remove the null byte character in vim! Well, I tried this with the image files, and that corrupted them. Bummer. Then I started looking at the HTML that JAlbum was generating, and indeed, the null byte character was still there. I tried doing a search and replace with vim, and that worked wonderfully. Unfortunately though, using vim to hand-edit a bunch of files was not acceptable to me, so I looked for a perl solution. After yet more searching, I discovered I could do a search and replace with perl, and that perl represents the null byte character as \0. Finally I had a solution. I simply added the following line to the shell script that runs JAlbum (after it is done processing with JAlbum):

for file in *.php; do perl -pe 's/\0//g' "$file" > "$file.tmp" && mv "$file.tmp" "$file"; done
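A variation on the same idea, as a minimal sketch rather than what my script actually runs: it uses tr instead of perl and only rewrites files that really contain a null byte, so untouched files keep their modification times. The helper names count_nuls and strip_nuls are made up for this example.

```shell
#!/usr/bin/env bash
# count_nuls and strip_nuls are hypothetical helper names, not part of JAlbum.

# Print how many NUL bytes a file contains.
count_nuls() {
    # tr -cd '\000' deletes everything except NUL bytes; wc -c counts what is left
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Remove NUL bytes in place, but only from files that actually contain some.
strip_nuls() {
    local file
    for file in "$@"; do
        [ "$(count_nuls "$file")" -gt 0 ] || continue
        # write the cleaned copy first, then replace the original only on success
        tr -d '\000' < "$file" > "$file.tmp" && mv "$file.tmp" "$file" || return 1
    done
}
```

It would be called as strip_nuls *.php in the album directory after JAlbum finishes.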

Bug #2 incorrectly ordered metadata

This seemed to be working fine, until I noticed that JAlbum was not able to process several of my pictures. Instead of getting a picture with a caption, I only got a caption, and JAlbum would return an error that it failed to process several pictures.

JFIF APP0 must be first marker after SOI

This had actually been happening for a while with 2 of the 1000 or so images I have. Since it was only 2, I did not worry about it. But now this was happening with many of the new images I had just uploaded. It quickly became apparent that images that I had tweaked (color, cropping, etc.) with Picasa were the ones that were not getting processed correctly. After searching a bunch more, I have come to the conclusion that it has something to do with the ordering of metadata in images, and how Java processes that metadata. It seems that Java expects the metadata to be in a very particular order, one that differs from what Picasa outputs. That being said, it seems that many other programs can read in the files that Picasa produces. So it is not really clear to me which program is at fault. I just want a solution.
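The error message itself hints at a quick check. A JPEG file starts with the two-byte SOI marker (FF D8), and the reader apparently wants the JFIF APP0 marker (FF E0) right after it. Here is a minimal sketch for peeking at those two bytes, assuming od is available. The name marker_after_soi is made up for this example, and whether the Picasa-edited files really put a different marker first is what the check would show, not something I have established.

```shell
#!/usr/bin/env bash
# marker_after_soi is a hypothetical helper name.
# Print the two bytes that follow the SOI marker (FF D8) as lowercase hex.
# "ffe0" is APP0 (used by JFIF); "ffe1" is APP1 (commonly used for Exif data).
marker_after_soi() {
    # -j2 skips the two SOI bytes, -N2 reads the next two, -tx1 prints hex bytes
    od -An -tx1 -j2 -N2 "$1" | tr -d ' \n'
}
```

Running it over the exported files, for example with for f in *.jpg; do echo "$f $(marker_after_soi "$f")"; done, would show which ones start with something other than ffe0.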

Solution #1 — Reprocess with JAlbum

After quite a bit of searching around the interweb, I found a post on the JAlbum forum that mentioned this bug. The solution: “Turn off the EXIF info”. Sure enough, this did the trick. That meant losing valuable information, though, which was not acceptable to me. However, I noticed that if I processed the images with the EXIF info off, then turned it on and processed them again, both my images and my captions would show up, in spite of the error message the second time around. So I decided to process all of the photos twice. This was not ideal, but it seemed like a solution.

Solution #2 — Reprocess with ImageMagick

I then started thinking some more, and I recalled a LaTeX problem I had back when I was just learning. I was learning how to import images, and I was having a problem getting the right bounding box on the .eps file I was trying to import. A more experienced TeXnician told me to use eps2eps on the image, and said that it often corrected bounding boxes when the program that produced the image in the first place had screwed them up. I found it very odd that there was a program to convert eps to eps, but that is what that program does. Sure enough, it worked. So I started wondering whether I could use a similar technique here. I had also read on the JAlbum forum that someone tried simply opening the image with Photoshop and resaving it. That sounded like a good idea, but I needed a solution which could be automated. So I tried using convert from the ImageMagick suite, and that did the trick. convert produces a new file though, which I did not want. So instead, I tried mogrify, which changes the original file. That worked! To reiterate the conundrum, JAlbum does not like the ordering of metadata in files generated with Picasa, but ImageMagick reads them just fine, and outputs a format that JAlbum likes. Strange.
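Since mogrify overwrites the file it is given, it may be worth keeping a copy of each export before resaving. This is a minimal sketch, not part of my actual workflow: resave_with_backup and the RESAVE_CMD override are names invented for this example, and the resave command simply defaults to mogrify with no options.

```shell
#!/usr/bin/env bash
# resave_with_backup and RESAVE_CMD are hypothetical names for this sketch.
# Usage: resave_with_backup BACKUP_DIR FILE...
# Copies each file into BACKUP_DIR, then resaves the original in place.
resave_with_backup() {
    local backup_dir="$1" file
    shift
    mkdir -p "$backup_dir" || return 1
    for file in "$@"; do
        cp -p "$file" "$backup_dir/" || return 1      # keep the untouched export
        "${RESAVE_CMD:-mogrify}" "$file" || return 1  # rewrite the original in place
    done
}
```

For example, resave_with_backup ../before-mogrify *.jpg run from Desktop/modified would leave the Picasa output untouched in a sibling directory.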

UPDATE: I just discovered that ImageMagick versions 6.3.2 through 6.3.3 have a bug with IPTC captions, which simply deletes them altogether. Make sure your ImageMagick version is either newer or older than this range.
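A minimal sketch of how a script could guard against that range before running mogrify. The function name is_affected_version is made up for this example, and it reads "6.3.2-6.3.3" as covering every 6.3.2 and 6.3.3 patch release, which is an assumption on top of what I know for certain.

```shell
#!/usr/bin/env bash
# is_affected_version is a hypothetical helper name.
# Succeeds (exit status 0) if the version string, e.g. "6.3.2" or "6.3.3-4",
# is a 6.3.2 or 6.3.3 release; fails for anything else.
is_affected_version() {
    case "$1" in
        6.3.2|6.3.2-*|6.3.3|6.3.3-*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Assuming the first line of convert -version looks like "Version: ImageMagick 6.3.2 ...", the installed version could be checked with is_affected_version "$(convert -version | awk 'NR==1 {print $3}')" && echo "this ImageMagick may delete IPTC captions".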

Okay then. Now to present my new photo editing and sharing workflow:

  1. Import pictures from camera to Desktop/originals (Picasa automatically adds them to its database)
  2. Delete bad pictures
  3. Create new albums
  4. Edit some photos for lighting, cropping etc.
  5. Add captions to photos (Picasa only has one IPTC tag available for editing)
  6. Export the images from Picasa to Desktop/modified
  7. for file in *.jpg; do mogrify "$file"; done
  8. Run shell script to rename pictures to meaningful names with auto-incrementing numbers (see the script below)
  9. Use rsync to upload the pictures to my webserver
  10. The webserver has a cron script which looks for new pictures, and then runs JAlbum to create new web albums for my pictures

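# this script renames pictures based on user input, and automatically numbers
# them, including 0 padding
# Assumes dir, ext, and name (the new base name) were set from user input
# earlier; the three-digit padding width is also an assumption.
iter=${iter:-1}
for file in "${dir}"/*."${ext}"; do
    if [[ $iter -lt 10 ]]; then pad="00"
    elif [[ $iter -lt 100 ]]; then pad="0"
    else pad=""; fi
    newpic="${dir}/${name}${pad}${iter}.${ext}"
    mv -f "$file" "$newpic"
    let "iter = $iter + 1"
done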
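The zero padding that the if/elif test in the script above handles can also be left to printf, whose %03d format pads a number to three digits. This is a minimal sketch. padded_name is a made-up helper name, and the three-digit width is an assumption.

```shell
#!/usr/bin/env bash
# padded_name is a hypothetical helper name.
# Usage: padded_name BASE NUMBER EXT
# Prints BASE followed by NUMBER zero-padded to three digits, then .EXT
padded_name() {
    # pass NUMBER as a plain integer (7, not 007) so it is not read as octal
    printf '%s%03d.%s\n' "$1" "$2" "$3"
}
```

Inside the loop, the new name would then be built with something like newpic="${dir}/$(padded_name "$name" "$iter" "$ext")".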