In this article we will explain what Computer Vision is from the point of view of image processing. First we will explain how a computer perceives an image, and then the basic processing applied to an image in order to recognise its content.
We will not go too far into the field of Artificial Intelligence, because that requires a foundation outside the scope of this article. We do have to mention it, however, because its advances and techniques are what make Computer Vision possible.
First of all, how does Human Vision work? 👁
Human vision revolves around light and does not depend on training with repeated examples or patterns. In other words, we do not need to learn to see; it is biologically embedded in us. Human vision consists of several steps. First, light bounces off the object we are looking at and enters the eye through the cornea. Then, the cornea directs the light to the pupil and the iris, which work together to control the amount of light entering the eye. The light then passes through the lens and reaches the retina, which has special sensors called cones and rods: cones are responsible for colour vision, and rods for vision in low light.
Therefore, vision is, in the first place, an information processing task: in order to understand what there is in an image, our brain must be able to represent this information as colour, shape, movement, detail and beauty.
What is Computer Vision? 💻
It is a field of artificial intelligence that aims to mathematically model the processes of visual perception in living beings, and to generate models, algorithms and programs that allow the simulation of these visual abilities using the capacity of computers.
This means that computers can make inferences about images without human assistance. This seems simple because humans can effortlessly see the world around them; however, teaching a computer to see like a human is difficult because we still do not really understand how human vision works.
So how does Computer Vision work? 🤔
To understand how a computer perceives the world, let’s start by defining what a digital image is and the basic processing applied to it.
The image 📷
A digital image is a matrix. Yes, the same ones we saw in maths at school.
A digital image is composed of a finite number of elements, each of which has a position on a (Cartesian) plane with x,y coordinates, and a value associated with the colour of the image at that point.
These elements are called elemental points of the image or pixels, the latter being the term commonly used to denote the minimum unit of measurement of a digital image.
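To make the idea of "an image is a matrix" concrete, here is a minimal sketch in plain Python. The tiny image, the helper name `get_pixel` and the choice of putting the origin (0, 0) in the top-left corner are assumptions made for illustration; they are a common convention, not something fixed by the definition above.

```python
# A tiny "image" stored as a matrix: a list of rows, where each number
# is the value of one pixel. The values here are made up for illustration.
image = [
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
]

def get_pixel(img, x, y):
    """Return the pixel at column x and row y.

    Assumption: the origin (0, 0) is the top-left corner, so y selects
    the row and x selects the column inside that row.
    """
    return img[y][x]

height = len(image)      # number of rows
width = len(image[0])    # number of columns

print("width:", width, "height:", height)
print("pixel at x=2, y=0:", get_pixel(image, 2, 0))
```

Note that the matrix is indexed row first (`img[y][x]`), which is the opposite order to the usual x,y way of writing coordinates.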
The colour 🔴
The colour of a pixel can be represented as a binary value, in the case of black and white images; as a number giving the intensity on the grey scale; or as a combination of colour components, which we will explain later.
Figure 1 is a binary representation of a black and white image. Each pixel is assigned 1 (black) or 0 (white).
Figure 2 represents an image with 256 levels of intensity. In it, each pixel is represented by an integer that is interpreted as the level of light intensity on the grey scale. Enlarging any area of the image reveals these values, which are shown in the same figure in the form of a matrix, with each element of the matrix corresponding to a coordinate in the plane.
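The two representations can be related with a short sketch: a greyscale matrix with 256 possible levels is turned into a binary one by comparing each pixel with a threshold. Several things here are assumptions for illustration: the sample values, the function name `to_binary`, the threshold of 128, and the convention that 0 is the darkest grey level and 255 the brightest. The output follows the article's convention of 1 for black and 0 for white.

```python
# Greyscale image: each pixel is an integer from 0 to 255.
# Assumption: 0 is the darkest level and 255 the brightest.
gray = [
    [0, 64, 128],
    [192, 255, 30],
]

def to_binary(img, threshold=128):
    """Convert a greyscale matrix into a binary one.

    Pixels darker than the threshold become 1 (black) and the rest
    become 0 (white). The threshold value is an arbitrary choice.
    """
    return [[1 if value < threshold else 0 for value in row] for row in img]

binary = to_binary(gray)
for row in binary:
    print(row)
```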
In a colour image, each pixel (or dot) in the image is represented by three values, which encode its colour as a combination of the amount of red, green and blue, known as RGB.
The resulting colour of the pixel will therefore be defined by the “amount” of intensity that each component has. Thus, the white colour will be composed of the maximum colour intensity for the three components.
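As a rough sketch of the RGB idea, each pixel below is a tuple of three numbers. The example assumes 8 bits per component (values from 0 to 255), which is common but not stated above, and the helper `to_gray` uses a plain average of the three components, which is just one simple way of collapsing a colour into a grey level.

```python
# Each pixel is (red, green, blue). Assumption: 8 bits per component,
# so every value goes from 0 (none) to 255 (maximum intensity).
WHITE = (255, 255, 255)   # maximum intensity in all three components
BLACK = (0, 0, 0)         # no intensity in any component
RED = (255, 0, 0)         # only the red component is present

# A 2x2 colour image is then a matrix of such tuples.
colour_image = [
    [WHITE, RED],
    [BLACK, WHITE],
]

def to_gray(pixel):
    """Collapse an RGB pixel into a single grey level.

    This uses a plain average of the three components, chosen here
    for simplicity; other weightings are possible.
    """
    r, g, b = pixel
    return (r + g + b) // 3

gray_image = [[to_gray(p) for p in row] for row in colour_image]
for row in gray_image:
    print(row)
```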
The resolution 💎
Another parameter in a digital image is its resolution. Resolution is the number of pixels contained in an image. It is also used to classify almost all devices related to digital images. The resolution of an image is represented by two numerical values, where the first is the number of columns of pixels (width) and the second is the number of rows of pixels (height).
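The following sketch reads the resolution from an image stored as a matrix. The function name `resolution` and the blank 16×8 image are made up for the example. Since the matrix is stored as a list of rows, the height is the number of rows and the width is the length of each row.

```python
def resolution(img):
    """Return (width, height) of an image stored as a list of rows."""
    height = len(img)
    width = len(img[0]) if height else 0
    return width, height

# A blank image with 16 columns and 8 rows (all pixels set to 0).
blank = [[0] * 16 for _ in range(8)]

width, height = resolution(blank)
total_pixels = width * height

print(f"{width}x{height} = {total_pixels} pixels")
```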
Basic digital processing ⚙️
In the following images we can see what a black and white photograph would look like if we represent it as a matrix of squares, or pixels (at 4×4, 8×8 and 16×16 resolution), and assign each pixel one of two colours, black or white, depending on which colour predominates in that square.
If we look at the following image, we can see that the greater the number of pixels (what we know as resolution), the more similar the image is to the real photograph.
We can check this in more detail if we select the region of the face and observe its binary representation.
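The process described in this section can be sketched in a few lines: divide a binary image into an n×n grid and give each square the colour that predominates inside it. The function name `pixelate` and the sample image are hypothetical, and the sketch assumes that the image size is divisible by n and that a tie counts as white. As before, 1 is black and 0 is white.

```python
def pixelate(binary_img, n):
    """Reduce a binary image to an n x n grid of black/white squares.

    Each square becomes 1 (black) if black pixels are the majority
    inside it, and 0 (white) otherwise. Assumptions: the image width
    and height are divisible by n, and a tie counts as white.
    """
    height = len(binary_img)
    width = len(binary_img[0])
    block_h = height // n
    block_w = width // n
    result = []
    for row in range(n):
        out_row = []
        for col in range(n):
            # Count the black pixels (value 1) inside this square.
            black = sum(
                binary_img[y][x]
                for y in range(row * block_h, (row + 1) * block_h)
                for x in range(col * block_w, (col + 1) * block_w)
            )
            out_row.append(1 if black * 2 > block_h * block_w else 0)
        result.append(out_row)
    return result

# A made-up 4x4 "photograph" in black (1) and white (0).
photo = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]

coarse = pixelate(photo, 2)   # 2x2 version: most detail is lost
fine = pixelate(photo, 4)     # 4x4 version: one square per original pixel

for row in coarse:
    print(row)
```

With n equal to the original size, every square contains a single pixel, so the result matches the original image. This is the same effect as in the figures: more pixels means a closer match to the photograph.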
In the next Artificial Vision post we will talk about automated image processing, where AI becomes very important. We will also talk about neural networks and go over the applications that Computer Vision currently has.
Don’t miss the next post! 😀