Project 1: Colorizing the Prokudin-Gorskii photo collection

-Boyu Zhu

Project Introduction

Sergey Mikhaylovich Prokudin-Gorskii was a Russian chemist and photographer. He recorded three exposures of every scene onto a glass plate using a red, a green, and a blue filter.

The goal of this assignment is to take the digitized Prokudin-Gorskii glass plate images and, using image processing techniques, automatically produce a color image with as few visual artifacts as possible. To do this, the three color channel images are extracted, placed on top of each other, and aligned so that they form a single RGB color image.
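The extraction and stacking step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the plate is loaded as a 2-D NumPy array with the three exposures stacked vertically in blue, green, red order from top to bottom, and the function names split_plate and stack_rgb are hypothetical.

```python
import numpy as np

def split_plate(plate):
    """Split a vertically stacked plate into (b, g, r) channels.

    Assumption: the exposures are stacked top to bottom as blue, green, red.
    Any leftover rows (height not divisible by 3) are dropped.
    """
    height = plate.shape[0] // 3
    b = plate[:height]
    g = plate[height:2 * height]
    r = plate[2 * height:3 * height]
    return b, g, r

def stack_rgb(r, g, b):
    """Stack three equally sized 2-D channels into an H x W x 3 RGB image."""
    return np.dstack([r, g, b])
```

Without alignment, the stacked result shows colored fringes, which is why the channels are shifted before stacking.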

JPG Image Processing Method

For low-resolution JPG images, I used a direct shift method. The blue channel was taken as the base; the green and red channels were each shifted over every displacement within a fixed search window and compared against the blue channel. The sum of squared differences (SSD) between the two images served as the matching score, and the displacement with the lowest SSD was kept. This method works relatively well for small images.
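A minimal sketch of this search is shown below. It is an illustration under stated assumptions rather than the exact code used here: the window radius of 15 pixels and the names ssd and align_exhaustive are hypothetical, and np.roll is used for shifting, which wraps pixels around the image border.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equally sized images."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff * diff))

def align_exhaustive(channel, base, radius=15):
    """Return the (row, col) shift of `channel` that minimizes SSD against `base`.

    Every displacement in [-radius, radius] is tried in both directions.
    The radius is an assumed value, not one taken from the report.
    """
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll shifts circularly: pixels leaving one edge re-enter
            # at the opposite edge.
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd(shifted, base)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

Calling align_exhaustive(g, b) and align_exhaustive(r, b) would give one displacement per channel, in the same style as the g and r pairs reported in the output sections.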

Applied to the full image, however, this method does not work well, mainly because the borders of the scanned plates distort the alignment score. Therefore, I cropped 10% from each side of the image and performed the alignment on the central portion only.
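The cropping step could look like the sketch below. The function name crop_border is hypothetical; the 10% fraction is the value stated above. The crop would only be used to compute the matching score, and the displacement found this way would then be applied to the uncropped channel.

```python
import numpy as np

def crop_border(image, fraction=0.10):
    """Remove `fraction` of the height and width from each side of the image."""
    h, w = image.shape[:2]
    dh, dw = int(h * fraction), int(w * fraction)
    return image[dh:h - dh, dw:w - dw]
```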

JPG Images Output

cathedral

g (5, 2), r (12, 3)

monastery

g (-3, 2), r (3, 2)

tobolsk

g (3, 2), r (6, 3)

TIFF Image Processing Method

However, the method used for the low-resolution images is not suitable for the large TIFF images, as an exhaustive search over such a large number of pixels is too time-consuming.

To align high-resolution images, I used an image pyramid for level-by-level alignment. The image is recursively downscaled with the imresize function, by a factor of 2 at each level, until it reaches a manageable size, and alignment is first performed at the lowest resolution using the same method as for the low-resolution images. Once the optimal displacement is found, the process moves up to the next higher-resolution level. There, the displacement is scaled to the corresponding pixel location, and the search resumes within a sliding window around it.
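The coarse-to-fine procedure can be sketched as below. Several details are assumptions made for illustration and are not taken from the report: 2x2 block averaging stands in for imresize, the recursion stops when the shorter side is at most 64 pixels, the coarsest level searches a radius of 15, and each finer level refines within a radius of 2. All function names are hypothetical.

```python
import numpy as np

def downscale2(image):
    """Halve the resolution by averaging 2x2 blocks (a stand-in for imresize)."""
    h, w = image.shape[0] // 2 * 2, image.shape[1] // 2 * 2
    img = image[:h, :w].astype(np.float64)
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def search(channel, base, center, radius):
    """Exhaustive SSD search over shifts within `radius` of `center`."""
    best_shift, best_score = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            # Circular shift; pixels wrap around the image border.
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            diff = shifted.astype(np.float64) - base.astype(np.float64)
            score = float(np.sum(diff * diff))
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def align_pyramid(channel, base, min_size=64, coarse_radius=15, refine_radius=2):
    """Return the (row, col) shift of `channel` onto `base`, coarse to fine."""
    if min(channel.shape) <= min_size:
        # Coarsest level: plain exhaustive search around zero displacement.
        return search(channel, base, (0, 0), coarse_radius)
    coarse = align_pyramid(downscale2(channel), downscale2(base),
                           min_size, coarse_radius, refine_radius)
    # One pixel at the coarser level corresponds to two pixels at this level.
    center = (2 * coarse[0], 2 * coarse[1])
    return search(channel, base, center, refine_radius)
```

With these settings, the largest displacement that can be recovered roughly doubles with every pyramid level, while each level above the coarsest evaluates only 25 candidate shifts.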

TIFF Images Output

church

g (25, 3), r (58, -5)

emir

g (49, 24), r (88, 43)

harvesters

g (60, 16), r (124, 13)

icon

g (41, 17), r (89, 23)

lady

g (55, 8), r (111, 12)

melons

g (82, 9), r (177, 11)

onion_church

g (51, 26), r (108, 36)

sculpture

g (33, -11), r (140, -27)

self_portrait

g (79, 29), r (175, 34)

train

g (42, 6), r (87, 32)

Bells and Whistles

three_generations

g (55, 13), r (112, 10)

After inspecting each aligned image, I found that all of them were well aligned except the emir photo, which still showed some displacement between the three channels. This kind of misalignment is particularly noticeable in portraits, where facial blurring is readily apparent to the naked eye. Therefore, I began investigating the cause of this issue.