Introduction

Convolution is a mathematical operation that computes the integral of the product of two functions (signals), with one of the signals flipped. For instance, below we convolve two signals, f(t) and g(t).

So the first thing to do is to flip the g signal horizontally (rotate it 180 degrees), and then slide the flipped g over f, multiplying and accumulating the overlapping values at each position.

The order in which the signals are convolved does not matter for the final result, i.e. convolution is commutative: conv(a,b) == conv(b,a)
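We can verify this commutativity quickly with NumPy (the signals here are made-up examples):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([0, 1, 0.5])

# Convolution is commutative: conv(a, b) == conv(b, a)
print(np.convolve(a, b))  # [0.  1.  2.5 4.  1.5]
print(np.allclose(np.convolve(a, b), np.convolve(b, a)))  # True
```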

In this case, consider the blue signal f(τ) to be our input signal and g(t) our kernel; the term kernel is used when convolutions are applied to filter signals.

Output signal size 1D

In the case of 1D convolution, the output size is calculated in this way:

outputSize = (inputSize - kernelSize) + 1
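For example, NumPy's 'valid' mode keeps only the positions where the kernel fully overlaps the input, so its output length matches the formula:

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4])  # inputSize = 5
w = np.array([1, -1, 2])       # kernelSize = 3

# 'valid' keeps only positions where the kernel is fully inside the input
y = np.convolve(x, w, mode='valid')
print(len(y))  # (5 - 3) + 1 = 3
```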

Application of convolutions

Convolution is used in signal processing for the following use cases:

Filtering signals (1D audio, 2D image processing)

Checking how much one signal is related to another

Finding patterns in signals

A simple example in MATLAB and Python (NumPy)

Below are two signals, x = (0,1,2,3,4) and w = (1,-1,2).

Doing it by hand

To better understand the concept of convolution, let's work through the above example by hand. Basically, we are going to convolve the two signals (x, w). The first thing is to flip w horizontally (i.e. rotate it 180 degrees).

After that, we need to slide the flipped w over the input x.

Note that in steps 3, 4, and 5 the flipped window is completely inside the input signal. These results are called "valid" convolutions. For the cases in which the flipped window is not completely inside the input window (x), we can either treat the missing samples as zero, or compute only what overlaps; for example, in step 1 we multiply 1 by zero, and the rest is simply ignored.
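We can check the hand computation with NumPy; np.convolve flips w internally, so it matches the procedure above:

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4])
w = np.array([1, -1, 2])

# 'full': every position with any overlap (missing samples treated as zero)
print(np.convolve(x, w, mode='full'))   # [0 1 1 3 5 2 8]

# 'valid': only positions where w lies completely inside x (steps 3-5)
print(np.convolve(x, w, mode='valid'))  # [1 3 5]
```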

Input padding

To keep the size of the convolution result equal to that of the input, and to avoid an effect called circular convolution, we pad the signal with zeros.

Where you put the zeros depends on what you want to do: in the 1D case you concatenate them at each end, while in the 2D case you normally place them all around the original signal.
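In NumPy, the equivalent of MATLAB's padarray is np.pad, which by default pads with zeros all around the array (the 3x3 input here is just an illustrative example):

```python
import numpy as np

x = np.arange(1, 10).reshape(3, 3)

# Pad with one row/column of zeros on every side, like padarray(x, [1 1])
xp = np.pad(x, 1)
print(xp.shape)  # (5, 5)
print(xp)
```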

On MATLAB you can use the command padarray to pad the input:

>> x

x(:,:,1) =

1 1 0 2 0

2 2 2 2 1

0 0 0 2 1

2 2 2 2 1

2 0 2 2 1

x(:,:,2) =

2 1 0 0 0

0 2 0 1 0

1 0 1 2 0

1 2 0 2 1

1 2 1 2 2

x(:,:,3) =

2 1 1 2 2

1 1 1 0 0

2 0 1 0 2

0 2 0 2 1

0 0 2 1 0

>> padarray(x,[1 1])

ans(:,:,1) =

0 0 0 0 0 0 0

0 1 1 0 2 0 0

0 2 2 2 2 1 0

0 0 0 0 2 1 0

0 2 2 2 2 1 0

0 2 0 2 2 1 0

0 0 0 0 0 0 0

ans(:,:,2) =

0 0 0 0 0 0 0

0 2 1 0 0 0 0

0 0 2 0 1 0 0

0 1 0 1 2 0 0

0 1 2 0 2 1 0

0 1 2 1 2 2 0

0 0 0 0 0 0 0

ans(:,:,3) =

0 0 0 0 0 0 0

0 2 1 1 2 2 0

0 1 1 1 0 0 0

0 2 0 1 0 2 0

0 0 2 0 2 1 0

0 0 0 2 1 0 0

0 0 0 0 0 0 0

Output size for 2D

It is important to know what the size of our output will be after performing a convolution on an input. Fortunately there is a handy formula that tells us exactly that.

If we consider the convolution of an input, of spatial dimensions [H, W] padded with P, with a square kernel of size F and using stride S, then the output size of the convolution is defined as:

outputSizeW = (W - F + 2P)/S + 1
outputSizeH = (H - F + 2P)/S + 1

F is the size of the kernel; we normally use square kernels, so F is both the width and the height of the kernel.
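The formula can be wrapped in a small helper for quick sanity checks (conv_output_size is a hypothetical name, not from any library):

```python
def conv_output_size(in_size, F, P, S):
    # (inputSize - kernelSize + 2*padding) / stride + 1
    return (in_size - F + 2 * P) // S + 1

# 5x5 input, 3x3 kernel, pad 1, stride 2 -> 3x3 output
print(conv_output_size(5, 3, 1, 2))  # 3

# 5x5 input, 3x3 kernel, no pad, stride 1 -> 3x3 output ('valid' case)
print(conv_output_size(5, 3, 0, 1))  # 3
```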

Convolution operation implementation

The example below will involve a 5x5x3 input (HxWx3), with a conv layer with the following parameters: stride S=2, pad P=1, F=3 (3x3 kernel), and K=2 (two filters).

Our input has 3 channels, so we need a 3x3x3 kernel. We have 2 filters (K=2), so we will have 2 output activations at the end. Furthermore, we can calculate the size of each of these two outputs to be: (5 - 3 + 2)/2 + 1 = 3.

The code below (a vanilla version) can't be used in real life because it is slow, but it's good for a basic understanding. Usually deep learning libraries implement convolution as a single matrix multiplication, using the im2col/col2im method.
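To give an idea of the im2col trick, here is a minimal single-channel NumPy sketch (im2col is a hypothetical helper written for this example). Like the vanilla code below, it computes the sliding dot product without flipping the kernel: every patch becomes a column, and the whole convolution collapses into one matrix multiplication.

```python
import numpy as np

def im2col(x, F, S):
    # Collect every FxF patch (with stride S) as one column of a matrix.
    H, W = x.shape
    out_h = (H - F) // S + 1
    out_w = (W - F) // S + 1
    cols = np.empty((F * F, out_h * out_w))
    idx = 0
    for r in range(0, H - F + 1, S):
        for c in range(0, W - F + 1, S):
            cols[:, idx] = x[r:r + F, c:c + F].ravel()
            idx += 1
    return cols, out_h, out_w

x = np.arange(25, dtype=float).reshape(5, 5)
k = np.ones((3, 3))

# The convolution becomes a single row-vector-times-matrix product
cols, out_h, out_w = im2col(x, F=3, S=1)
out = (k.ravel() @ cols).reshape(out_h, out_w)
print(out.shape)  # (3, 3)
```

With K filters, each filter's flattened kernel becomes one row of a weight matrix, so all the outputs come from one big matrix multiplication.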

%% Convolution n dimensions
% The following code is just an extension of conv2d_vanila for n dimensions.
% Parameters:
%   Input: H x W x depth
%   K: kernel F x F x depth
%   S: stride (how many pixels the window will slide on the input)
% This implementation behaves like the 'valid' parameter of the normal convolution
function outConv = convn_vanilla(input, kernel, S)
% Get the input size in terms of rows and cols. The weights should have the
% same depth as the input volume (image)
[rowsIn, colsIn, ~] = size(input);
% Get the volume depth (number of channels)
depthInput = size(input, 3);
% Get the kernel size, considering a square kernel always
F = size(kernel, 1);

%% Initialize outputs
sizeRowsOut = floor((rowsIn - F) / S) + 1;
sizeColsOut = floor((colsIn - F) / S) + 1;
outConvAcc = zeros(sizeRowsOut, sizeColsOut, depthInput);

%% Do the convolution
% Convolve each channel on the input with its respective kernel channel,
% and at the end sum all the channel results.
for depth = 1:depthInput
    % Select input and kernel current channel
    inputCurrDepth = input(:, :, depth);
    kernelCurrDepth = kernel(:, :, depth);
    % Iterate over every row and col (using the stride)
    for r = 1:S:(rowsIn - 1)
        for c = 1:S:(colsIn - 1)
            % Avoid sampling out of the image.
            if (((c + F) - 1) <= colsIn) && (((r + F) - 1) <= rowsIn)
                % Select window on input volume (patch)
                sampleWindow = inputCurrDepth(r:(r + F) - 1, c:(c + F) - 1);
                % Do the dot product
                dotProd = sum(sampleWindow(:) .* kernelCurrDepth(:));
                % Store result
                outConvAcc(ceil(r / S), ceil(c / S), depth) = dotProd;
            end
        end
    end
end

% Sum the elements over the channel dimension (sum all the channels)
outConv = sum(outConvAcc, 3);
end
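A NumPy port of the same vanilla routine (convn_vanilla_np is a hypothetical name for this sketch) lets us sanity-check the sizes from the example above: a 5x5x3 input padded with P=1, convolved with a 3x3x3 kernel at stride S=2, gives one 3x3 activation per filter.

```python
import numpy as np

def convn_vanilla_np(x, kernel, S):
    # 'valid' sliding-window dot product over an H x W x depth volume,
    # summing the per-channel results (mirrors the MATLAB convn_vanilla).
    H, W, depth = x.shape
    F = kernel.shape[0]
    out_h = (H - F) // S + 1
    out_w = (W - F) // S + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            patch = x[r * S:r * S + F, c * S:c * S + F, :]
            out[r, c] = np.sum(patch * kernel)
    return out

# 5x5x3 input, pad P=1 (spatial dims only), 3x3x3 kernel, stride S=2
x = np.random.rand(5, 5, 3)
xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
k = np.random.rand(3, 3, 3)
print(convn_vanilla_np(xp, k, S=2).shape)  # (3, 3)
```

With K=2 filters, this function would simply be called once per filter, stacking the two 3x3 results.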