Basics of Computer Vision: 1. Interpolation and Resizing

Raghul Asokan
9 min read · Nov 28, 2021


Hello people :-)

I am glad to start a new series, this time on the Basics of Computer Vision, which I hope to work on in parallel with my other series, Neural Networks Intuitions.

The series “Basics of Computer Vision” will focus on low-level computer vision tasks such as resizing, convolutions, keypoint detection and edge detection, to name a few. Today, in the first article of Basics of Computer Vision, we will look into image resizing: where it is used and how it works from an algorithmic viewpoint.

Throughout this article, I will be coding these basic resizing algorithms and showing examples, to keep things interactive and make them more intuitive for anyone who reads this article :)

Let’s start!

Image Resampling/Rescaling/Resizing:

Image Resizing is the process of increasing or decreasing the size of an image.

What does it mean?

In terms of a 2-dimensional image, more specifically, it means scaling the width and height of the image up or down by computing the pixel values for the new (resized) image.

Example:

Consider the following high-resolution image of size 1200x1200 with 3 channels and a file size of, say, 1 MB.

An image of shape 1200x1200x3

Let’s say you want to downscale this image for some reason: either you want your photos to occupy less space on disk (no longer worrying about quality), or you want to train a neural network to detect horses where the network’s input size is 300x300.

So how can one achieve this?

Using OpenCV (or PIL) or any other image processing library in Python, we can easily resize the image.

In OpenCV, this is done using the cv2.resize function:

import cv2

image = cv2.imread('/tmp/horse.jpg')
resized = cv2.resize(image, (300, 300))  # (300, 300) is the (width, height) of the new image

The following is the resized image:

An image of shape 300x300x3
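As a side note, cv2.resize also lets you choose the interpolation method through its interpolation argument (bilinear interpolation is the default), which is exactly what the rest of this article digs into. A quick sketch:

nearest = cv2.resize(image, (300, 300), interpolation=cv2.INTER_NEAREST)
bilinear = cv2.resize(image, (300, 300), interpolation=cv2.INTER_LINEAR)  # the default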

Why does one need to resize an image in the first place?

Resizing is one of the basic tasks in image processing. It is useful in image editing software, when someone has to fit their profile picture on a social media platform, or for the other reasons mentioned above.

How does this work? Or, to be more precise, how are the pixel values for the new image calculated?

There are multiple algorithms used to resize images, but the core idea is the same: interpolate the pixel values for the new image, given the original set of pixels.

Cool! So what is interpolation then?

Let’s look at the next section to understand it!

Interpolation:

Interpolation (in mathematics) means estimating new values of a function at intermediate points, given a set of known values of that function.

For example, see the illustration at https://en.wikipedia.org/wiki/Interpolation

There are different types of interpolation, a few being:

  1. Nearest neighbour interpolation
  2. Linear interpolation
  3. Polynomial interpolation

Where is Interpolation used?

Interpolation is generally used to estimate new data points from known data points: statisticians use it to better understand underlying data, it can be used to approximate complex functions for efficient experimentation, and it can even be used to scale images!
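As a quick illustration (a small sketch of my own, using NumPy's 1-D linear interpolation), suppose we know a function's values at x = 0, 1, 2 and want to estimate its value at x = 1.5:

import numpy as np

x_known = np.array([0.0, 1.0, 2.0])
y_known = np.array([10.0, 20.0, 40.0])
# estimate the value at an in-between point from the known points
y_new = np.interp(1.5, x_known, y_known)
print(y_new)  # 30.0, halfway between 20 and 40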

*Note that I won’t be getting into the mathematical details of the above interpolation methods, but will instead look at them from the perspective of how they can be used to resize images.

Image Resizing — Basic Idea:

Before looking at the image resizing algorithms, we need to understand why the concept of interpolation is relevant here.

A 2-d image is basically represented as a 2-dimensional matrix, with each cell in the matrix containing a pixel value. So scaling up this matrix means creating a bigger matrix than the original one and filling in the missing pixel values in this bigger matrix.

Let us look at an image to make the idea more concrete:

  1. Consider an input matrix of size 3x3, with each cell containing a pixel value in the range 0–255. Row and column indices run from 0 to n-1, where n is the row/column length.
  2. Now let's create a new, empty 6x6 output (upscaled) matrix.
  3. In order to start filling in the pixel values of the new matrix, we first have to represent the output coordinate space in terms of the input coordinate space, i.e. for every (row, col) in the output matrix, what is the corresponding (row, col) in the input matrix? This mapping is just multiplication by the scaling factor, which is 1/2 in our case.
  4. To make it clearer, the row scaling factor is 1/2 and the column scaling factor is 1/2 (rows and columns have separate scaling factors, but since our example uses a square matrix, both are the same here).
  5. Row 0, col 0 in the output maps to row 0, col 0 in the input, whereas row 1, col 1 in the output maps to row 0.5, col 0.5 in the input, and so on.

Now let’s look at the mapped coordinate space:

The above image shows what the transformed (row, col) coordinates look like for the output matrix.

One can see that there are known cells such as (0, 0) or (2, 2) (integer coordinates) and unknown cells such as (1.5, 2) or (0, 2.5) (floating-point coordinates).

It is easy to fill in the pixel values for the known cells by simply picking the corresponding pixel values of those (row, col) cells from the original matrix, but how do we find the pixel values for the unknown cells?

This is the same as estimating new data points from a given set of data points, where the given data points are the pixel intensities of our input 3x3 matrix and the unknown pixel intensities (or new data points) are computed from them. Hence, interpolation is used to find those unknown pixel values.
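As a quick sanity check on this coordinate mapping, here is a tiny sketch of my own for the 3x3 to 6x6 example above:

in_size, out_size = 3, 6
scale = in_size / out_size  # 0.5: maps output coordinates back into input space
for out_idx in range(out_size):
    print(out_idx, '->', out_idx * scale)
# prints: 0 -> 0.0, 1 -> 0.5, 2 -> 1.0, 3 -> 1.5, 4 -> 2.0, 5 -> 2.5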

Nearest Neighbour Interpolation:

Nearest neighbour interpolation is quite a straightforward way of computing the pixel values for a new image.

Let me take the previous image to explain things clearly,

Here in the above image, a simple way to find the values for unknown cells such as (0.5, 0.5) or (2, 2.5) is to simply round their coordinates to integers (here by truncating the fractional part), for example (2, 2.5) to (2, 2) or (1.5, 2.5) to (1, 2). This is called nearest neighbour interpolation.

The resultant coordinate space looks something like this:

Now that all cells map to known coordinates, the pixel values can simply be taken from the original matrix.

The algorithm is as follows:

  1. Consider an input image I of dimensions (hi, wi, c), where c=3. Let's assume the image I has to be resized to twice its size, i.e. an output image R of dimensions (hr, wr, c).
  2. For every (x, y) coordinate in the output image, find its corresponding (x, y) coordinate in the input image.
  3. Each remapped output coordinate can either correspond to an actual input pixel coordinate or to a coordinate in between pixels (i.e. a floating-point coordinate).
  4. For floating-point coordinates, we simply round them off and pick the pixel at the corresponding integer coordinate, hence the name nearest neighbour interpolation.

Now let us try and translate each of these steps into a Python code snippet!

Nearest Neighbour Resizing Algorithm
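Here is one way such a snippet could look: a minimal, naive (loop-based) sketch, assuming the image is a NumPy array of shape (height, width, channels). The function name and details are my own, not a reference implementation.

import numpy as np

def resize_nearest(image, new_h, new_w):
    h, w, c = image.shape
    out = np.zeros((new_h, new_w, c), dtype=image.dtype)
    # scaling factors that map output coordinates back to input coordinates
    row_scale = h / new_h
    col_scale = w / new_w
    for r in range(new_h):
        for col in range(new_w):
            # remap into input space, then snap to the nearest known pixel
            # (rounding here; truncating with int() is another common choice)
            src_r = min(int(round(r * row_scale)), h - 1)
            src_c = min(int(round(col * col_scale)), w - 1)
            out[r, col] = image[src_r, src_c]
    return out

# Usage: upscaled = resize_nearest(image, 600, 600)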

Bilinear Interpolation:

Bilinear interpolation makes use of linear interpolation to compute the pixel values for the new image.

The method works as follows:

  1. For any unknown (row, col) cell in the upsampled matrix, pick the 4 nearest pixels. These nearest pixels can be obtained by taking int(row), int(col), int(row) + 1 and int(col) + 1. Let's call them row1, col1, row2 and col2.
  2. Perform linear interpolation at (row1, col) using (row1, col1) and (row1, col2), and similarly at (row2, col) using (row2, col1) and (row2, col2). Both are along the x-direction.
  3. Do one final linear interpolation at (row, col) using (row1, col) and (row2, col).
  4. The above steps are repeated for every unknown (row, col) cell in the new matrix.
  5. Note that the same pixel value can be obtained by doing two linear interpolations along the y-direction first and then one along the x-direction.

Let us take an example and see how this method works in practice:

  1. For any unknown (row, col) cell in the upsampled matrix, pick the 4 nearest pixels. For example, for cell (0.5, 0.5), the 4 nearest pixels are (0, 0), (0, 1), (1, 0), (1, 1).
finding the pixel value at (0.5, 0.5)

2. Finding the pixel value at (0.5, 0.5) means first finding the pixel values at (0, 0.5) and (1, 0.5), and then using them to find the value at (0.5, 0.5).

3. Do linear interpolation twice along the x-direction: once at (0, 0.5) using <(0, 0), (0, 1)> and once at (1, 0.5) using <(1, 0), (1, 1)>.

4. Then do one more interpolation along the y-direction, at (0.5, 0.5), using (0, 0.5) and (1, 0.5).

But how is this linear interpolant computed?

Let’s perform the first interpolation along the x-direction to compute the pixel value at (0, 0.5) using (0, 0) and (0, 1).

I(0, 0) = 233, I(0, 1) = 200, where I denotes the pixel intensity (refer to the image above for clarity).

I(0, 0.5) = 233 * 0.5 + 200 * 0.5 = 116.5 + 100 = 216.5

Bilinear Interpolation Equation

Let (x1, y) and (x2, y) be two known pixel coordinates with intensities I1 and I2, where x2 > x1.

The pixel intensity at an unknown point (x, y) can be computed as:

I(x, y) = I1 * (x2 - x) / (x2 - x1) + I2 * (x - x1) / (x2 - x1), where x1 ≤ x ≤ x2

Pictorially, this can be depicted as:

I1=233, I2=200 and I_new=223.1

The new pixel intensity is nothing but a weighted sum of the intensities of the two nearest pixels, where each weight is determined by the distance from the corresponding pixel (the closer the pixel, the larger its weight).
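In code, this weighted sum is a one-liner. Below is a small sketch of my own using the numbers from the example above; the figure's 223.1 corresponds to a point that is 0.3 of the way from the first pixel to the second.

def lerp(i1, i2, frac):
    # frac is the distance from the first pixel, in [0, 1]
    return i1 * (1 - frac) + i2 * frac

print(lerp(233, 200, 0.5))  # 216.5, the value computed at (0, 0.5) above
print(lerp(233, 200, 0.3))  # 223.1, matching the figure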

The algorithm is as follows:

  1. Consider an input image I of dimensions (hi, wi, c), where c=3. Let's assume the image I has to be resized to twice its size, i.e. an output image R of dimensions (hr, wr, c).
  2. For every (x, y) coordinate in the output image, find its corresponding (x, y) coordinate in the input image.
  3. Each remapped output coordinate can either correspond to an actual input pixel coordinate or to a coordinate in between pixels (i.e. a floating-point coordinate).
  4. For floating-point coordinates, get the nearest four pixels by taking int(row), int(col), int(row) + 1 and int(col) + 1.
  5. For every unknown cell, perform linear interpolation twice along the x-direction and once more along the y-direction to compute the pixel intensity. Note that there can also be cases where only one coordinate is a float while the other is an integer, e.g. (0, 0.5). In such cases it is enough to do linear interpolation once, along either the x or the y direction, to compute the unknown cell.
  6. Note: explicit normalization of the distances is not required, since adjacent pixels are always at a distance of 1 pixel.

Check the following Python code snippet!
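For reference, here is a minimal, naive sketch of the bilinear version (again assuming a NumPy array of shape (height, width, channels); the function name and details are my own):

import numpy as np

def resize_bilinear(image, new_h, new_w):
    h, w, c = image.shape
    out = np.zeros((new_h, new_w, c), dtype=np.float32)
    row_scale = h / new_h
    col_scale = w / new_w
    for r in range(new_h):
        for col in range(new_w):
            # remap the output coordinate back into input space
            src_r, src_c = r * row_scale, col * col_scale
            r1, c1 = int(src_r), int(src_c)
            r2, c2 = min(r1 + 1, h - 1), min(c1 + 1, w - 1)
            dr, dc = src_r - r1, src_c - c1
            # two linear interpolations along x (columns) ...
            top = image[r1, c1] * (1 - dc) + image[r1, c2] * dc
            bottom = image[r2, c1] * (1 - dc) + image[r2, c2] * dc
            # ... and one along y (rows)
            out[r, col] = top * (1 - dr) + bottom * dr
    return out.astype(image.dtype)

# Usage: upscaled = resize_bilinear(image, 600, 600)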

Implementation Results:

Below are the results of a small image (less than 200x200 px) being upsampled to 500x500 and above.

Original Image, Nearest Neighbour, Bilinear Interpolation

As one can notice, nearest neighbour, despite being the fastest, produces jagged edges and looks more pixelated, whereas bilinear interpolation produces a much better, smoother result.

  • Please note that both resizing implementations were written only for the purpose of this blog post and are not efficient. I will soon be adding a GitHub link here with both naive and vectorized implementations.
  • I have also not talked about bicubic interpolation in this article; I will be covering it in a future post :)

That’s all for this first article of Basics of Computer Vision. I hope I was able to present the basic idea behind image resizing and how it can be implemented with the help of interpolation.

Will be back soon with another interesting problem in image processing!

Stay safe. Cheers :)

