Picasa, JAlbum, and null bytes

I have recently been trying to transition from Mac to Linux, with much success for the most part, but a few hiccups as well, as is to be expected. One of the important uses of the computer for me is photo editing and sharing, especially since we got our Canon Rebel XT last year, which takes absolutely beautiful pictures. I had developed a fairly nice workflow on my Mac for photo editing and sharing, consisting of:

  1. Import pictures from camera to iPhoto
  2. Delete bad pictures
  3. Create new albums
  4. Edit some photos for lighting, cropping etc.
  5. Add titles and comments to photos
  6. Use caption buddy to write iPhoto comments to IPTC tags
  7. Use an applescript to rename all the images in an album with a meaningful name followed by an automatically increasing number
  8. Export the images from iPhoto
  9. Use rsync to upload the pictures to my webserver
  10. The webserver has a cron script which looks for new pictures, and then runs JAlbum to create new web albums for my pictures

It sounds like a lot of steps, but actually it was going quite quickly for me, and without many hitches. The biggest issue was with the IPTC tags. Using caption buddy is kind of a hack to get around that. It would be nice if iPhoto just wrote tags to the files by default. Oh well.

To get the same results with Linux, there are a number of changes. Obviously there is no iPhoto for Linux. The next best thing (actually better in some regards) is Picasa. Picasa is a Windows program put out by Google (they acquired it several years ago). As of about a year ago, there is a Linux version as well. The Linux version requires WINE, which allows one to run Windows programs on other platforms (mostly Linux, but there is also a Mac version). Especially considering that Picasa is not a native Linux application, it runs remarkably fast and is very stable. It has some nice editing features, including an “I’m Feeling Lucky” button, which seems to do much better than iPhoto’s “enhance” button. It also includes a unique scrollbar, which changes scrolling speed depending on how far you move it. Also, unlike iPhoto, it writes captions as IPTC captions embedded in the file, which is handy, since that is how JAlbum knows about them. JAlbum will extract the IPTC tags, along with the EXIF tags, which record the camera settings for each picture. It will then make some nicely compressed, small images suitable for web viewing. JAlbum is also highly customizable, which I like very much.

All that being said, both Picasa and JAlbum seem to have a few bugs.

Bug #1 — Null bytes

When uploading some pictures I had processed with Picasa recently, I noticed a strange character at the end of each caption. Viewing the album with Firefox on a Mac, this showed up as a question mark, making it look as if I were unsure about all of my captions. After some reading on the JAlbum forum, I saw a post claiming that this was a null byte character. That was helpful. I tried looking at the image files in my favorite text editor, vim, and saw a bunch of gobbledygook, along with some captions that I could recognize. There did seem to be some extra characters after the caption, but I couldn’t figure out which one was the null byte character. After some more searching about vim and control characters, I found out that the null byte character shows up as ^@ in vim, and that if I want to type one, I have to type Ctrl-V Ctrl-J. That meant I could remove the null byte character in vim! Well, I tried this with the image files, and that corrupted them. Bummer. Then I started looking at the html that JAlbum was generating, and indeed, the null byte character was still there. I tried doing a search and replace with vim, and that worked wonderfully. Unfortunately though, using vim to hand-edit a bunch of files was not acceptable to me, so I looked for a perl solution. After yet more searching, I discovered I could do a search and replace with perl, and that perl represents the null byte character as \0. Finally I had a solution. I simply added the following line to the shell script that runs JAlbum (after it is done processing with JAlbum):

for file in *.php; do perl -pe 's/\0//g' "$file" > "$file.tmp" && mv "$file.tmp" "$file"; done
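
A slightly more cautious variant of that line, as a sketch only: it uses tr instead of perl and rewrites a file only if it actually contains a null byte. The function name strip_nulls is made up for illustration. Like the one-liner, it is meant for the generated text files (php, html), never for the images themselves.

```shell
#!/bin/bash
# strip_nulls (hypothetical helper): delete NUL bytes from the given text files.
# Only meant for generated text such as .php or .html, not for binary images.
strip_nulls() {
  local file tmp
  for file in "$@"; do
    [ -f "$file" ] || continue
    tmp="${file}.tmp"
    # tr -d '\000' copies the input minus every NUL byte
    tr -d '\000' < "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    if cmp -s "$file" "$tmp"; then
      rm -f "$tmp"        # nothing to strip, leave the file untouched
    else
      mv "$tmp" "$file"   # replace the original with the cleaned copy
    fi
  done
}

# Example: strip_nulls *.php
```

Comparing with cmp before moving means files without null bytes keep their original timestamps, which may matter if anything downstream looks at modification times.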

Bug #2 — Incorrectly ordered metadata

This seemed to be working fine, until I noticed that JAlbum was not able to process several of my pictures. Instead of getting a picture with a caption, I only got a caption, and JAlbum would return an error that it failed to process several pictures.

JFIF APP0 must be first marker after SOI

This had actually been happening for a while with 2 of the 1000 or so images I have. Since it was only 2, I did not worry about it. But now this was happening with many of the new images I had just uploaded. It quickly became apparent that images that I had tweaked (color, cropping, etc.) with Picasa were the ones that were not getting processed correctly. After searching a bunch more, I have come to the conclusion that it has something to do with the ordering of metadata in images, and how Java processes that metadata. It seems that Java expects the metadata to be in a very particular order, and that Picasa writes it in a different one. That being said, it seems that many other programs can read the files that Picasa produces. So it is not really clear to me which program is at fault. I just want a solution.
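
To see what the error message is complaining about, one can peek at the first few bytes of a picture. The following is only a rough sketch with made-up function names. A JPEG file starts with the SOI marker (ff d8), and the two bytes after it are the marker of the first segment: ff e0 is APP0, which is where JFIF data lives, and ff e1 is APP1, which is where EXIF data lives. The check looks at the marker only, and I am assuming this is roughly what the Java reader trips over, so a file flagged here is not necessarily one that JAlbum will reject.

```shell
#!/bin/bash
# first_marker (hypothetical helper): print the first four bytes of a file
# as hex digits, e.g. ffd8ffe0 for a JPEG whose first segment is APP0.
first_marker() {
  head -c 4 "$1" | od -An -tx1 | tr -d ' \n'
}

# jfif_first (hypothetical helper): succeed only if an APP0 marker (ff e0)
# directly follows the SOI marker (ff d8).
jfif_first() {
  [ "$(first_marker "$1")" = "ffd8ffe0" ]
}

# Example: list exported pictures that do not start with an APP0 segment
# for f in *.jpg; do jfif_first "$f" || echo "$f"; done
```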

Solution #1 — Reprocess with JAlbum

After quite a bit of searching around the interweb, I found a post on the JAlbum forum that mentioned this bug. The solution: “Turn off the EXIF info”. Sure enough, this did the trick, but it meant losing valuable information, which was not acceptable to me. I noticed, though, that if I processed the images with the EXIF info off and then turned it back on, both my images and my captions would show up, in spite of the error message the second time around. So I decided to process all of the photos twice. This was not ideal, but it seemed like a solution.

Solution #2 — Reprocess with ImageMagick

I then started thinking some more, and I recalled a LaTeX problem I had back when I was just learning. I was learning how to import images, and I was having a problem getting the right bounding box on the .eps file I was trying to import. A more experienced TeXnician told me to use eps2eps on the image, and that it often corrected bounding boxes when the program that produced the image in the first place had screwed them up. I found it very odd that there was a program to convert eps to eps, but that is what that program does. Sure enough, it worked. So I started wondering whether I could use a similar technique here. I had also read on the JAlbum forum that someone tried simply opening the image with Photoshop and resaving it. That sounded like a good idea, but I needed a solution which could be automated. So I tried using convert from the ImageMagick suite, and that did the trick. Convert produces a new file though, which I did not want. So instead, I tried mogrify, which changes the original file. That worked! To reiterate the conundrum, JAlbum does not like the ordering of metadata in files generated with Picasa, but ImageMagick reads them just fine, and outputs a format that JAlbum likes. Strange.

UPDATE: I just discovered that ImageMagick version 6.3.2-6.3.3 has a bug with IPTC captions, which simply deletes them altogether. Make sure your ImageMagick version is either newer or older than this range.
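
A quick way to guard against that, sketched under a couple of assumptions: the function names are made up, I am treating every patch level of 6.3.2 and 6.3.3 as affected, and I am assuming the first line of `convert -version` looks like "Version: ImageMagick 6.3.2 ...", so that the version number is the third field.

```shell
#!/bin/bash
# iptc_bug_version (hypothetical helper): succeed if the given ImageMagick
# version string falls in the 6.3.2 to 6.3.3 range mentioned above.
iptc_bug_version() {
  case "$1" in
    6.3.2|6.3.2-*|6.3.3|6.3.3-*) return 0 ;;
    *) return 1 ;;
  esac
}

# installed_im_version (hypothetical helper): assumes the first line of
# `convert -version` has the version number as its third field.
installed_im_version() {
  convert -version | awk 'NR == 1 { print $3 }'
}

# Example:
# if iptc_bug_version "$(installed_im_version)"; then
#   echo "This ImageMagick may drop IPTC captions" >&2
# fi
```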

Okay then. Now to present my new photo editing and sharing workflow:

  1. Import pictures from camera to Desktop/originals (Picasa automatically adds them to its database)
  2. Delete bad pictures
  3. Create new albums
  4. Edit some photos for lighting, cropping etc.
  5. Add captions to photos (Picasa only has one IPTC tag available for editing)
  6. Export the images from Picasa to Desktop/modified
  7. for file in *.jpg; do mogrify "$file"; done
  8. Run shell script to rename pictures to meaningful names with auto-incrementing numbers (see below script)
  9. Use rsync to upload the pictures to my webserver
  10. The webserver has a cron script which looks for new pictures, and then runs JAlbum to create new web albums for my pictures

#!/bin/bash
# renamePics
# Renames the pictures in a directory based on user input, and automatically
# numbers them, including 0 padding (001, 002, ...).
# Usage: renamePics DIR BASE
dir=$1
base=$2
ext='jpg'
iter=1
# Glob directly instead of parsing ls, so file names with spaces survive
for file in "${dir}"/*."${ext}"; do
  # Skip the unexpanded pattern when the directory has no matching files
  [[ -e $file ]] || continue
  # printf does the 0 padding to three digits
  newpic=$(printf '%s/%s%03d.%s' "$dir" "$base" "$iter" "$ext")
  mv -f "$file" "$newpic"
  iter=$((iter + 1))
done

7 Responses to Picasa, JAlbum, and null bytes

  1. Pramila says:

    Just to add to your fantastic post on JAlbum and Picasa,
    Here’s a piece of info, I ‘ll share with you.

    You can make a JAlbum web-accessible directly through your PC through a new web server software (Purplenova).
    You no longer have to host the album with an ISP for your friends to view them.
    Here’s how you can do this:
    Simply drag and drop your JAlbum directory (Gigabytes?… no problem!) into Purplenova and pass a single URL to your friends. Your JAlbum remains with you, and pictures are streamed securely on demand when your friends access them through any plain Browser.
    I have also kept a detailed instruction (a user guide) for the same, at the Purplenova Discussion Forum, here’s the link for it.
    http://purplenova.justdiscussion.com/Hosting-of-JAlbum-Album-f3/User-Guide-for-Hosting-the-JAlbum-through-Purplenova-30-t3.htm
    Please have a look at the Product and let me know your thoughts over it. I welcome your feedback,
    Thanks,
    Pramila

  2. argh says:

    Your solution for the erroneous “null byte” is horrible. I’m amazed that this didn’t totally corrupt the file. First, there’s every reason to expect that legal null bytes show up at other spots in the file. Nuking every null byte from the file is almost certainly going to mess things up.

    Furthermore, most binary files have offset fields for quickly reading chunks. Deleting bytes out of the file will screw up these offsets. It probably will resync (because of the tag-based format) but there’s a good chance it will see incorrect tags. For example:

    A B C 5 x x x x x D E F 6 x x x x x x

    Let’s say that the last “x” in the ABC block is a null byte, and you nuke it. Now when the reader goes in, it will still see an “ABC” block of length 5, but because you nuked the null, it’s going to read the “D” as the 5th byte. So now instead of a null, you have a D. Continuing the badness, it now goes to try to read a tag. It gets “EF6” instead of “DEF”. Maybe that was the camera info tag. Who knows. Now it’s corrupt.

    Anyway. Don’t do it that way. The only way to fix a bad null in the tag is to use a format-aware tool that can properly edit the file format.

  3. robfelty says:

    Argh,

    Not sure why you didn’t leave a real e-mail address, but if you had read the post carefully, you would have noticed that I did not end up removing null bytes from the jpegs, because it did corrupt the file. Instead, I ended up reprocessing the images with ImageMagick. I did try removing null byte characters from TEXT files (php, html), which did work fine, but was ultimately unnecessary after reprocessing with ImageMagick.

  4. Pingback: jalbum 6 5 4

  5. VinCanFixIt says:

    You are easily one of the best tech authors I have come across. And after a dozen or so languages spanning 3 decades I have come across a few. Most notably the entries you have posted here are engineered in direct opposition to any and all Microsoft documentation. High praise comes to you in a cathartic response to being told something can be done and not being told how. (Just ask Vista how to pause search indexing for example. One of many) So my hat’s off to you and your skills. I have come to expect exceptional documentation from the Linux world. Do us all a favor and don’t change a hair if you care. Well crafted indeed and as far as pressing F1 for help? Vista is under the impression that the answer is correct…I’m just lacking the proper question. (By virtue of a complete lack of virtue.)
    Weal’p, I’ve got lots of cool new tricks to try. The little web server sounds like it’ll add years to my life. The subscription thing is a little scary and out of the box. Anything new like it since your 2007 posting? We appreciate the being in the loop. Thanks.

    I wish you the best of everything in the world.

    Vine’Beau
    (Vin)

  6. Thanks for the informative info. Very neat blog layout. Easy on the eyes. Thanks and happy holidays.

  7. Mee Ganz says:

    I have been checking out some of your posts and I must say, nice stuff. I will surely bookmark your site.