Scale Invariant Feature Transform (SIFT): Performance and Application

Vedrana Andersen, Lars Pellarin and Renée Anderson

June 4, 2006

Introduction

In 2004, David G. Lowe published his paper “Distinctive Image Features from Scale-Invariant Keypoints” (Lowe, 2004, [2]), outlining a method he developed for finding distinctive, scale- and rotation-invariant features in images that can be used to perform matching between different views of an object or scene.

His method, the Scale-Invariant Feature Transform (SIFT), combines scale-space theory and feature detection, and is geared toward a broad variety of applications in the field of computer vision, such as object recognition and stereo correspondence. As a part of the course Advanced Image Analysis at the Technical University of Denmark (DTU), we conducted a mini-project where we 1) studied Lowe's work, 2) tested SIFT, 3) implemented a portion of SIFT ourselves, and 4) applied SIFT (combined with the RANSAC algorithm) to automatic image stitching and automatic calculation of the fundamental matrix.

SIFT algorithm

A hallmark function of SIFT is its ability to extract features that are invariant to scale and rotation; additionally, these features are robust with respect to noise, occlusion, some forms of affine distortion, shift in 3D perspective, and illumination changes (Lowe, 2004, [2]). The approach generates a large number of features, densely covering the image over all scales and locations. The components of the SIFT framework for keypoint detection are as follows:

1. Scale-space extrema detection. Using a cascade filtering approach, a set of octaves is generated, each octave containing the difference-of-Gaussian images covering the range of scales. Local maxima and minima are then detected over all scales and image locations. This forms a set of candidate keypoints.

2. Keypoint localization. Each candidate keypoint is fit to a detailed model to determine location and scale. Points with low contrast and poorly localized edge points are rejected.

3. Orientation assignment. Based on the local image gradient, each keypoint is assigned a direction. If there are several strong directions, additional keypoints are created.

4. Keypoint descriptor. This is accomplished by sampling image gradient magnitudes and orientations around each keypoint and putting those in an array of orientation histograms covering the region around the keypoint. Gradients are taken at the scale of the keypoint (providing scale invariance), and all orientations are relative to the keypoint direction (providing rotation invariance). The entries of all histograms are then put in a descriptor vector, which is also normalized to reduce the effects of illumination changes.
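The normalization in step 4 can be illustrated with a small sketch. It follows the scheme described in Lowe (2004): normalize to unit length, clip large entries at 0.2, and renormalize. The function name and the toy four-element vector are our own illustrative choices (a real SIFT descriptor has 128 entries), and this is not the code used in the project.

```python
import math

def normalize_descriptor(values, clip=0.2):
    """Unit-normalize, clip large entries, then renormalize.

    The clip value 0.2 is the one suggested in Lowe (2004); the function
    name and interface are illustrative assumptions.
    """
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        # An all-zero vector is returned unchanged to avoid division by zero.
        return [x / norm for x in v] if norm > 0 else list(v)

    v = unit(values)                   # cancels a uniform change in contrast
    v = [min(x, clip) for x in v]      # limits the influence of large gradients
    return unit(v)

# Doubling every gradient magnitude (a contrast change) gives the same result.
dark = normalize_descriptor([1.0, 2.0, 3.0, 4.0])
bright = normalize_descriptor([2.0, 4.0, 6.0, 8.0])
```

Because a uniform contrast change multiplies all gradient magnitudes by the same constant, the first normalization removes it; clipping reduces the weight of a few very large gradients, which are more sensitive to non-linear illumination effects.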

The results of carrying out these steps are depicted in the top row of Figure 1. For image matching, descriptor vectors of all keypoints are stored in a database, and matches between keypoints are found based on Euclidean distance. The suggested method of matching to a large database is the nearest neighbor algorithm combined with comparing the distance to the second-nearest neighbor (Lowe, 2004). (See Fig. 1, bottom row.)

Figure 1: First row: SIFT keypoints for two different images of the same scene. Keypoints are displayed as vectors indicating location, scale, and orientation.

Bottom row: Keypoint matches for the two images. Out of 459 keypoints in the left image and 566 keypoints in the right image, 177 matches were made. Knowing the setting of the scene, we were able to use RANSAC and find the outliers (wrong matches). Only 5 outliers were found, representing 2.8% of all matches.

In the same paper Lowe describes the application of SIFT to the recognition of small or highly occluded objects. Many false matches may arise from the background; therefore, it is recommended to identify objects by clustering in pose space using the Hough transform.

Figure 2: Scale testing. From top left, scales at 1:1, 1:1.6, 1:2.2, 1:2.8, 1:3.4, 1:4. Bottom row, bar plot depicting the comparison of results.

Performance: Testing SIFT

We tested SIFT's performance in a series of controlled tests. A testing image was matched against a transformed version of itself; knowing the transformation, we were able to sort out possible false matches. A match was labeled false if the matched keypoints were at a distance of more than 2 pixels. We tested scale and rotation invariance, robustness to projective transformations, and robustness given the presence of noise.
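The labeling rule used in these tests can be sketched as follows. We assume here that the known transformation is available as a 3 x 3 matrix H acting on homogeneous pixel coordinates, and that the 2-pixel distance is measured between the transformed keypoint and its matched keypoint; the function names are illustrative, not taken from the project code.

```python
import math

def apply_homography(H, point):
    """Map a pixel (x, y) through the 3x3 matrix H (homogeneous coordinates)."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

def count_false_matches(matches, H, max_dist=2.0):
    """Count matches whose target lies more than max_dist pixels from the
    position predicted by the known transformation H."""
    false_count = 0
    for src, dst in matches:
        px, py = apply_homography(H, src)
        if math.hypot(px - dst[0], py - dst[1]) > max_dist:
            false_count += 1
    return false_count

# Example: a 1:2 scaling, so every coordinate should be halved.
H_scale = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 1.0]]
example_matches = [
    ((100.0, 80.0), (50.0, 40.0)),   # exact
    ((10.0, 10.0), (5.5, 5.0)),      # half a pixel off: still correct
    ((200.0, 100.0), (90.0, 70.0)),  # far from (100, 50): false
]
n_false = count_false_matches(example_matches, H_scale)
```

The same arithmetic gives the outlier rate quoted for Figure 1: 5 outliers out of 177 matches is 100 * 5 / 177, or about 2.8%.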

The results of each test are presented in bar plots, with the red tips of bars indicating the number of false matches. The size of the original (left) image is always 320 x 240 pixels.

3.1 Scaling

We tested SIFT at a variety of scales, from 1:1 to 1:4. Figure 2 shows the results of keypoint matching at these scales, depicting the number of correct and false keypoint matches. In the lowest row of the figure, we present the comparison of the number of keypoints matched for each scale we tested.

The highest number of matches, not surprisingly, was for the 1:1 scale, but in general the number of matches does not change dramatically at other scales. The proportion of false matches never exceeds 2%. We can conclude that SIFT is indeed scale-invariant.

3.2 Rotation

We also tested SIFT's invariance to rotation. Figure 3 presents the results of keypoint matching when the target image was rotated 15, 30, 45, 60, 75, and 90 degrees, and gives the true and false keypoint matches. The bar graph illustrates the comparison of the number of keypoints matched for each rotation we tested.

There were never more than 1% false matches. In the worst case (60 degrees), 4 out of 508 matches were false. Otherwise, SIFT clearly favors rotation by 90 degrees, a rotation in which interpolation plays no role. We can therefore conclude that SIFT is also rotation-invariant.

3.3 Projectivity

SIFT's robustness was tested under various projective transformations; the testing image was shrunk and one of its sides scaled by a factor p, from 1 to 0.3, as illustrated in Figure 4.

The number of matching keypoints falls steadily with decreasing p, partly because the area of the warped image falls with p. Results included few outliers: in the worst case 11%, on average 3%. We concluded that, even though the number of matches drops steadily, we can still rely on SIFT to match keypoints correctly; this confirms SIFT's robustness to projective transformations. We conducted another experiment (Fig. 5), in which we wished to determine whether the area discrepancy was what was affecting the results. In this second experiment, we sized the target image so that its area was the same as the original.

We noted that although the number of matched keypoints still dropped dramatically, the drop was not as drastic as in the first experiment. We can therefore conclude that the projectivity will influence the number of keypoints matched, but not the correctness of the matches.

Figure 3: Rotation testing, angles = 15, 30, ..., 90 deg. Percentage of outliers never over 1%. Worst case: 4 false matches per 508 true matches. Original image: 320 x 240 pixels. Distance threshold: 2 pixels.

Figure 4: Projectivity testing, p = 1, 0.9, 0.8, ... There was never a high percentage of outliers: in the worst case 11%, on average 3%. The number of matching features falls, but the area of the warped image falls with p. Original image: 320 x 240 pixels; the area of the smallest warped image is approx. 1/5 of the original size.

Figure 5: Projectivity testing, continued. Similar experiment to the previous, but the target image is scaled so that its area is the same as that of the original. We provide only two examples here, with p = 0.7 and p = 0.5, but results for all tests are provided in the bar graph.

3.4 Noise

We also tested SIFT's robustness given the presence of noise. An image is matched against itself, with 5% Gaussian noise added with each iteration, for 30 total iterations. In Figure 6, we show the keypoint matches for selected iterations. After the first additions of noise, the number of keypoints matched drops significantly, but matches are still mostly correct. After 15 iterations, the drop in the number of matches slows, but false matches represent up to one quarter of all matches. We can thereby conclude that SIFT is reasonably robust given the presence of noise.
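The noise procedure can be sketched as below. The text does not spell out what "5%" refers to; the sketch assumes zero-mean Gaussian noise with a standard deviation of 5% of the intensity range, added cumulatively, with intensities stored as floats in [0, 1] and clamped after each step. All names are illustrative.

```python
import random

def add_gaussian_noise(image, sigma, rng):
    """Return a copy of image (rows of floats in [0, 1]) with zero-mean
    Gaussian noise of standard deviation sigma added, clamped to [0, 1]."""
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]

def noisy_sequence(image, iterations=30, sigma=0.05, seed=0):
    """Build the increasingly noisy images, one per iteration.

    ASSUMPTION: noise accumulates, i.e. each iteration adds noise to the
    previous noisy image rather than to the clean original.
    """
    rng = random.Random(seed)   # seeded so the experiment is repeatable
    frames = []
    current = image
    for _ in range(iterations):
        current = add_gaussian_noise(current, sigma, rng)
        frames.append(current)
    return frames

# Tiny 2x2 "image"; in the experiment each frame would be matched
# against the clean original with SIFT.
frames = noisy_sequence([[0.0, 0.5], [1.0, 0.25]])
```

Each frame in the returned list would then be matched against the original image, and the matches classified as true or false with the 2-pixel rule.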

Application: Using SIFT

David Lowe, the developer of SIFT, describes its application to object recognition, which has to deal with a great many outliers; he proposes using the Hough transform for clustering. We applied SIFT to determining stereo correspondence. We used SIFT to identify initial corresponding points between two views of the same scene. Knowing the geometric setting of the problem, and because we did not expect the presence of many outliers, we were able to use RANSAC (Kovesi, 2000) to determine the set of inliers and to estimate the transformation between the images.
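The initial correspondences come from descriptor matching with the nearest and second-nearest neighbor comparison described earlier. A minimal sketch is given below; it uses a brute-force search and the 0.8 distance ratio suggested in Lowe (2004). The function names and the two-dimensional toy descriptors are illustrative assumptions, not the project's code.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_descriptors(query, database, max_ratio=0.8):
    """Return (query_index, database_index) pairs that pass the ratio test.

    A match is kept only if the nearest neighbor is clearly closer than the
    second-nearest one; ambiguous matches are discarded.
    """
    matches = []
    for qi, q in enumerate(query):
        dists = sorted((euclidean(q, d), di) for di, d in enumerate(database))
        if len(dists) < 2:
            continue  # the ratio test needs two candidates
        (best, best_idx), (second, _) = dists[0], dists[1]
        if best < max_ratio * second:
            matches.append((qi, best_idx))
    return matches

query = [[0.0, 0.0], [5.0, 5.0]]
database = [[0.1, 0.0], [10.0, 10.0], [5.2, 5.0], [4.8, 5.0]]
pairs = match_descriptors(query, database)
```

In this toy example the first query descriptor has one clearly nearest neighbor and is matched, while the second has two equally close candidates and is rejected as ambiguous. The surviving pairs are what would be passed on to RANSAC.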

This framework allowed us to implement automatic stitching of panoramic images and automatic estimation of the fundamental matrix for a stereo view.

4.1 Automatic Image Stitching

We begin with a sequence of panoramic images that we would like to stitch together into one seamless composite. The images were taken from the same location, so the relationship between the images is determined by a homography (a projective transformation). At least four corresponding points are needed to determine the homography between the images.
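To see why four correspondences suffice: a homography has eight degrees of freedom, and each point pair contributes two linear equations. The sketch below solves the resulting 8 x 8 system directly, under the assumption that the bottom-right entry of the matrix can be fixed to 1. It is a plain-Python illustration with hypothetical function names, not the estimator used in the project.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def homography_from_4_points(src, dst):
    """Estimate H (3x3, with H[2][2] = 1) mapping each src point to its dst point."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and similarly for v
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

# Unit square mapped to a square of side 2 shifted by (2, 3).
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(2, 3), (4, 3), (4, 5), (2, 5)]
H = homography_from_4_points(src, dst)
```

With more than four (noisy) matches, one would typically draw random subsets of four inside RANSAC and refit on the inliers with a least-squares method.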

SIFT produces a set of initial corresponding points, which are fed to RANSAC for fitting the homography. One of the images is then warped according to the homography (we used bilinear interpolation when warping), and the images are stitched together (we used the “hat” weighting function, ranging from 0 at the borders to 1 in the center, when stitching to obtain a smoother result). Applying SIFT in this way results in fully automated image stitching. The only user input that may be required is to set the distance threshold for RANSAC.
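One possible form of the "hat" weighting is sketched below. The text only states that the weight runs from 0 at the borders to 1 in the center; the linear profile, the separable product over x and y, and the function names are our assumptions.

```python
def hat_weight(index, size):
    """Linear 'hat' along one axis: 0 at both borders, 1 at the center."""
    if size < 2:
        return 1.0
    center = (size - 1) / 2.0
    return 1.0 - abs(index - center) / center

def pixel_weight(x, y, width, height):
    """ASSUMPTION: the 2D weight is the product of the two 1D hats."""
    return hat_weight(x, width) * hat_weight(y, height)

def blend(value_a, weight_a, value_b, weight_b):
    """Weighted average of two overlapping pixel values."""
    total = weight_a + weight_b
    if total == 0:
        # Both pixels lie on their image borders: fall back to a plain mean.
        return (value_a + value_b) / 2.0
    return (value_a * weight_a + value_b * weight_b) / total

# A pixel near the border of image A but nearer the center of image B
# is dominated by image B's value.
mixed = blend(100.0, 0.25, 200.0, 0.75)
```

Because each image contributes least near its own border, the seam between the images fades gradually instead of appearing as a hard edge.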

We set that threshold to a rather low value (t = 0.01, corresponding to a couple of pixels), which possibly eliminates a few of the inliers, but at least we could be rather certain not to have any outliers sneaking in. The number of matches returned by SIFT varies depending on the size of the overlapping area, but the percentage of inliers is always large enough for RANSAC to estimate a homography that results in a satisfying stitching. In Figures 7 and 8 we display both an example of many matches and one of

Figure 6: Noise testing.