Tech Talk

Photo Management

Friday, October 5, 2001

So, you have a digital camera or scanner, and lots of photos to publish to the web. I mean lots of photos, like thousands. It would be nice if it were simply a matter of copying them all to a web site, and voila! there they are. In practice, publishing photos is a matter of compromise, starting with two key goals:

Goals for photo publishing

  1. Photos should download as fast as possible
  2. Photos should be searchable

Optimizing for fast downloading

The smaller the file is, the faster it'll download. There are two ways to make a file smaller: shrinking its dimensions, or increasing its compression. However, as the file gets smaller the quality goes down, too. So there has to be a point of balance somewhere.

What I have decided to do is create several sizes of each photo, and allow the viewer to download the size he/she thinks is worth the wait. I usually have four sizes: the original, and three with the longer dimension scaled back to 600, 400, and 200 pixels. If the original's longer dimension falls between two of those sizes, I only generate the sizes smaller than the original. The 200 size photo is the default, allowing the fastest download in an image still large enough to see the content. In anything much smaller than this, it is simply too hard to see anything.
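
As a rough illustration of the sizing rule above, here is a minimal Python sketch. The function name scaled_versions and the rounding behavior are my assumptions for the example, not the script actually used on this site; it only works out the pixel dimensions and does not resize any image.

```python
TARGET_SIZES = (600, 400, 200)  # longer-edge sizes named in the post


def scaled_versions(width, height, targets=TARGET_SIZES):
    """Return {target: (width, height)} for every target smaller than the original.

    Hypothetical helper: the longer dimension is scaled to each target and the
    shorter one is scaled by the same factor, so the aspect ratio is kept.
    """
    longest = max(width, height)
    versions = {}
    for target in targets:
        if target >= longest:
            continue  # never upscale: the original already covers this size
        scale = target / longest
        versions[target] = (
            max(1, round(width * scale)),
            max(1, round(height * scale)),
        )
    return versions


print(scaled_versions(1600, 1200))  # all three smaller sizes
print(scaled_versions(500, 375))    # original sits between 600 and 400
```

For a 1600 x 1200 original this yields 600 x 450, 400 x 300, and 200 x 150; for a 500 x 375 original only the 400 and 200 versions are produced.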

I use JPEG compression on all my photos, mostly because it's built in to all browsers, and works well. I compress all my photos by 10% at each size differential (I also auto-equalize and sharpen at each level). The file sizes average 24k for 200, 86k for 400, and 177k for 600: very fast. The originals average around a MB each, which is a pretty big download, but provides a source for excellent-quality printouts.
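
To put those averages in perspective, here is a back-of-the-envelope Python calculation. It assumes that "k" means kilobytes of 1024 bytes, that the original is exactly 1 MB, and that the viewer has a 56 kbit/s dial-up modem running at full speed with no protocol overhead; all three are assumptions for the example, so real times would be somewhat longer.

```python
# Average file sizes from the post, in kilobytes (the original is taken as 1 MB).
AVERAGE_KB = {"200": 24, "400": 86, "600": 177, "original": 1024}

# Assumption: an ideal 56 kbit/s modem link, ignoring overhead and line noise.
MODEM_BITS_PER_SECOND = 56_000


def download_seconds(kilobytes, bits_per_second=MODEM_BITS_PER_SECOND):
    """Ideal transfer time in seconds for a file of the given size."""
    return kilobytes * 1024 * 8 / bits_per_second


for label, kb in AVERAGE_KB.items():
    print(f"{label:>8}: {download_seconds(kb):6.1f} s")
```

Under those assumptions the 200 size arrives in about 3.5 seconds, the 400 in about 12.6, the 600 in about 25.9, and the original in roughly two and a half minutes.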

I store all the photos as individual files on my server. I don't put them in a database, because they don't need to be dynamically generated - all the work is done when they're posted. I use Corel PhotoPaint, which lets me batch process as many files at a time as I want with a script (which by the way Photoshop hasn't been able to do until version 6.5 - Corel's been doing this since v8!). The other big advantage to storing them simply as files is that most browsers will cache them on the local computer's hard drive, and not download them every time. This makes the page appear much more quickly on subsequent visits.

Searching Photos

If you've ever used a Clipart gallery tool, you know what I'm talking about: type in a few keywords, and up come all the images that match. This is a lot harder to implement than it appears. Photos don't have any words - they're just a collection of colored dots. So to search for objects, people, or places in the images, somebody has to associate all the relevant information with each photo.

The following discussion mostly applies to technical types: webmasters, programmers. If you're not that type, you'll want to just head straight for, create your free account and start sharing there. (That's where I got my basic ideas for my system, by the way.)

  1. File name. The most basic of all photo management systems, but the most widely compatible. Make the file name contain the information you want, then use the computer's search capability to find the ones with the text you're looking for. You can categorize the photos by using folders. However, this limits you to one searchable item: the file name. The other big drawback is that you can't modify the file's date information, so you can't count on the creation date to tell you when the photo was taken. This actually works fine for a few files, but really isn't searchable in any meaningful way when the numbers start going up.
  2. Catalogs. The best way to find a picture is to see it. Good clipart collections come with books with page after page of images you can scan over to find the ones you want. The online version of this is web pages full of photos with captions. The text in the captions can be searched using your web site's search engine, which would return the page with the photo(s). Keeping the number of photos on a given page fairly small makes it easier to figure out which picture on the page matched the search. Of course, the browser's Find function can be used, too. This solution works pretty well, but needs a properly configured web site with a search engine. Also, all the information has the same relevance; you can't search specifically for a title match, for example.
  3. OLE Document Properties. This is a flexible and sticky solution, but requires NTFS. Windows 2000 and XP support NTFS 5.0 (Windows ME does not support NTFS), which allows you to assign Title, Subject, Category, Keywords, Author, and Comments to any file, including your images. This has the huge advantage of sticking the text right into the file, which means it can't get disconnected, and it is also fully supported by Windows Explorer, so you don't have to have a fancy web site to get the information you want. The disadvantages are that you have to have this specific file system, and your web server would have to use it too in order to search those properties.
  4. Database. This is what I do. I run a SQL Server database to store all the information I want to associate with the pictures. The text and the image are linked by path and filename; it's very fast and very flexible. But it means that when the photo is moved, the link has to be changed, too. If the photo is copied to another machine, the information is lost. That can be a good thing if privacy of information is a concern, but not good if you're concerned about keeping that information together!

    I have three tables. One stores the title, subject, photographer, date, location, equipment, and path to the file - one record per photo, not per file. The second stores details about each file version that has been created (all those different sizes): size rank, height, width, resolution, type of file, size index, file size, and style. These two tables let me enter the Title, etc. once, and have as many versions of the photo as I like without duplicating that info. A very big advantage, let me tell you! The third table stores ratings for each photo: quality, subject, artistic rankings, and comments.

    Using this database of information, I have written a program that will stuff all the image files with the OLE Document data, so I can then move them out of the context of the web site and still have the relevant information. I also use it to generate catalog pages, which I can then burn onto a CD to give to friends. Combined with programming know-how, the database solution is best because it can generate all the others!
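
For the technical types, the three-table layout described under the database option might look something like the Python sketch below. To keep it self-contained it uses SQLite's in-memory database as a stand-in for SQL Server, and every table name, column name, key, and sample row is my guess for illustration, based only on the fields listed above; it is not the actual schema.

```python
import sqlite3

# Hypothetical schema: one row per photo, one row per generated file version,
# and one row per rating. All names here are assumptions for the example.
SCHEMA = """
CREATE TABLE photo (
    photo_id     INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    subject      TEXT,
    photographer TEXT,
    date_taken   TEXT,
    location     TEXT,
    equipment    TEXT,
    path         TEXT NOT NULL
);
CREATE TABLE photo_file (
    file_id    INTEGER PRIMARY KEY,
    photo_id   INTEGER NOT NULL REFERENCES photo(photo_id),
    size_rank  INTEGER,
    width      INTEGER,
    height     INTEGER,
    resolution INTEGER,
    file_type  TEXT,
    size_index INTEGER,
    file_size  INTEGER,
    style      TEXT
);
CREATE TABLE photo_rating (
    rating_id INTEGER PRIMARY KEY,
    photo_id  INTEGER NOT NULL REFERENCES photo(photo_id),
    quality   INTEGER,
    subject   INTEGER,
    artistic  INTEGER,
    comments  TEXT
);
"""


def search(conn, keyword):
    """Return (title, path, width, height) for photos whose title or subject matches."""
    pattern = f"%{keyword}%"
    return conn.execute(
        """
        SELECT p.title, p.path, f.width, f.height
        FROM photo AS p
        JOIN photo_file AS f ON f.photo_id = p.photo_id
        WHERE p.title LIKE ? OR p.subject LIKE ?
        ORDER BY p.title, f.width
        """,
        (pattern, pattern),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample data: the title is entered once, with two file versions.
conn.execute(
    "INSERT INTO photo (photo_id, title, subject, path) VALUES (?, ?, ?, ?)",
    (1, "Lake at dawn", "landscape", "photos/lake"),
)
conn.executemany(
    "INSERT INTO photo_file (photo_id, size_rank, width, height, file_type)"
    " VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 200, 150, "jpg"), (1, 2, 400, 300, "jpg")],
)

print(search(conn, "lake"))
```

The search returns one row per file version, so the single title entered in the photo table shows up for both the 200 and 400 sizes without being stored twice.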



LaRocque Family