
Computer Vision VS Human Vision: This is how computers see

26/11/20 5 min. read

In this article we will explain what Computer Vision is from the point of view of image processing. First we will explain how a computer perceives an image, and then how an image is processed at a basic level so that its content can be recognised.

We will not go too far into the field of Artificial Intelligence, because that requires a foundation beyond the scope of this article. We have to mention it, though, because its advances and techniques are what make Computer Vision possible.

First of all, how does Human Vision work? 👁

Human vision revolves around light and does not involve repetition or patterns. In other words, we do not need to learn to see; it is biologically embedded in us. Human vision consists of several steps. First, light reflects off the objects in front of us and enters the eye through the cornea. The light then passes through the pupil, the opening in the iris, which controls how much light enters the eye. Next, the lens focuses the light onto the retina. The retina contains special sensors called rods and cones: rods handle vision in low light, and cones are responsible for colour vision.

Vision is therefore, first and foremost, an information processing task: to understand what is in an image, our brain must be able to represent this information as colour, shape, movement, detail and beauty.

Vision of the human eye

What is Computer Vision? 💻

It is a field of artificial intelligence that aims to mathematically model the processes of visual perception in living beings, and to generate models, algorithms and programs that allow the simulation of these visual abilities using the capacity of computers.

This means that computers can make inferences about images without human assistance. This seems simple because humans can effortlessly see the world around them; however, teaching a computer to see like a human is difficult because we still do not really understand how human vision works.

So how does Computer Vision work? 🤔

To understand how a computer perceives the world, let’s start by defining what a digital image is and the basic processing a computer performs on it.

The image 📷

A digital image is a matrix. Yes, the same kind we saw in maths at school.

Coordinates of a digital image

A digital image is composed of a finite number of elements, each of which has a position on a (Cartesian) plane with x,y coordinates, and a value associated with the colour of the image at that point.

These elements are called picture elements or pixels, the latter being the term commonly used for the smallest unit of a digital image.
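As a rough illustration of the idea above, the sketch below stores a tiny image as a list of rows in Python. The names `image` and `pixel_at`, the pixel values, and the convention that x counts columns and y counts rows from the top-left corner are all assumptions made for this example, not something the article prescribes.

```python
# A tiny image with 3 rows and 4 columns, stored as a matrix (a list of rows).
# Each number is the value of one pixel (hypothetical values for illustration).
image = [
    [0, 50, 100, 150],
    [10, 60, 110, 160],
    [20, 70, 120, 170],
]

def pixel_at(img, x, y):
    """Return the value at column x, row y (assumed origin: top-left corner)."""
    return img[y][x]

height = len(image)     # number of rows
width = len(image[0])   # number of columns

print(width, height)          # 4 3
print(pixel_at(image, 2, 1))  # 110
```

Note that the row index comes first when reading from the matrix, even though coordinates are usually written as (x, y).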

The colour 🔴

The colour of a pixel can be represented as a binary value, in the case of black and white; as a number that gives the intensity on the grey scale; or as a combination of colour values, which we will explain later.

Figure 1 is a binary representation of a black and white image. Each pixel is assigned 1 (black) or 0 (white).

Binary representation of an image
Figure 1
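To make the binary representation concrete, here is a minimal sketch that follows the convention stated above (1 for black, 0 for white). The matrix contents and the helper name `render` are hypothetical and chosen only for illustration.

```python
# Binary image using the article's convention: 1 = black, 0 = white.
binary_image = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]

def render(img):
    """Draw the image as text: '#' for black (1), '.' for white (0)."""
    return "\n".join(
        "".join("#" if pixel == 1 else "." for pixel in row) for row in img
    )

print(render(binary_image))
```

Printing the result shows a small ring shape, which is all the information a one-bit-per-pixel image can carry: shape, but no shading.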

Figure 2 represents an image with 256 levels of intensity. Each pixel holds an integer that is interpreted as the level of light intensity on the grey scale. Enlarging any area of the image reveals these values, which the same figure shows as a matrix, with each element of the matrix corresponding to a coordinate in the plane.

Greyscale representation of an image
Figure 2. Gray scale from 0 to 250
Another example of grayscale assignment
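A greyscale image can be sketched in the same way, with one integer per pixel. The example below assumes 8-bit storage, so the 256 levels run from 0 to 255, and it assumes that 0 means black and 255 means white. Both are common conventions rather than something this article specifies, and the function names are hypothetical.

```python
# 8-bit greyscale: 256 levels, stored as integers from 0 to 255.
# Assumed convention for this sketch: 0 = black, 255 = white.
grey_image = [
    [0, 64, 128],
    [64, 128, 192],
    [128, 192, 255],
]

def is_valid_greyscale(img, levels=256):
    """Check that every pixel is an integer in the range 0..levels-1."""
    return all(
        isinstance(pixel, int) and 0 <= pixel < levels
        for row in img
        for pixel in row
    )

def intensity_range(img):
    """Return the darkest and brightest values found in the image."""
    pixels = [pixel for row in img for pixel in row]
    return min(pixels), max(pixels)

print(is_valid_greyscale(grey_image))  # True
print(intensity_range(grey_image))     # (0, 255)
```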

In a colour image, each pixel (or dot) in the image is represented by three values, which encode its colour as a combination of the amount of red, green and blue, known as RGB.

Representation of a colour image

The resulting colour of the pixel is therefore defined by the intensity of each component. White, for example, is composed of the maximum intensity in all three components.
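The RGB idea can be sketched with one triple per pixel. This example assumes 8 bits per channel (values from 0 to 255). The `to_grey` helper uses a plain average of the three components, which is just one simple way to collapse a colour to a single intensity and is not a method taken from the article.

```python
# Each colour pixel is an (R, G, B) triple; 8 bits per channel assumed (0..255).
WHITE = (255, 255, 255)  # maximum intensity in all three components
BLACK = (0, 0, 0)        # no intensity in any component
RED = (255, 0, 0)
BLUE = (0, 0, 255)

def to_grey(pixel):
    """Collapse an RGB pixel to one intensity by simple averaging (an assumption)."""
    r, g, b = pixel
    return (r + g + b) // 3

colour_image = [
    [RED, WHITE],
    [BLACK, BLUE],
]

grey_image = [[to_grey(pixel) for pixel in row] for row in colour_image]
print(grey_image)  # [[85, 255], [0, 85]]
```

A colour image therefore stores three numbers per pixel, while its greyscale version stores only one.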

RGB model
Grayscale VS RGB
Greyscale and colour image representation

The resolution 💎

Another parameter of a digital image is its resolution: the number of pixels the image contains. Resolution is also used to classify almost all devices related to digital images. It is expressed as two numerical values, where the first is the number of columns of pixels (width) and the second is the number of rows of pixels (height).
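For an image stored as a list of rows, the resolution can be read directly from the matrix. The sketch below is only an illustration, and the function names `resolution` and `total_pixels` are hypothetical.

```python
def resolution(img):
    """Return (width, height): columns of pixels first, then rows."""
    height = len(img)
    width = len(img[0]) if height else 0
    return width, height

def total_pixels(img):
    """Number of pixels contained in the image."""
    width, height = resolution(img)
    return width * height

# A blank image that is 16 columns wide and 9 rows tall.
image = [[0] * 16 for _ in range(9)]

print(resolution(image))    # (16, 9)
print(total_pixels(image))  # 144
```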

Resolution of an image

Basic digital processing ⚙️

In the following images we can see what a black and white photograph would look like if we represented it as a grid of squares, or pixels (at 4×4, 8×8 and 16×16 resolution), and assigned each pixel one of two colours, black or white, depending on which colour predominates in that square.

If we look at the following image, we can see that the greater the number of pixels (what we know as resolution), the more closely the image resembles the real photograph.

Number of pixels VS Resolution
4×4 8×8 16×16
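The "predominant colour" rule described above can be sketched as a majority vote inside each square. This is a minimal illustration under stated assumptions: the input is a square binary image (1 for black, 0 for white), its side is a multiple of the target resolution, and ties go to white. The function name `downsample` and the sample `photo` are hypothetical.

```python
def downsample(img, blocks):
    """Reduce a square binary image to blocks x blocks pixels.

    Each output pixel is 1 (black) if black strictly predominates in its
    square of the input, otherwise 0 (white). Ties go to white (an assumption).
    """
    size = len(img)
    step = size // blocks  # side of each square; assumes size % blocks == 0
    result = []
    for by in range(blocks):
        row = []
        for bx in range(blocks):
            square = [
                img[y][x]
                for y in range(by * step, (by + 1) * step)
                for x in range(bx * step, (bx + 1) * step)
            ]
            row.append(1 if sum(square) * 2 > len(square) else 0)
        result.append(row)
    return result

# Hypothetical 8x8 "photograph": a black rounded shape on a white background.
photo = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

for row in downsample(photo, 4):
    print(row)
```

At 4×4 the rounded corners are lost and only a black 2×2 square remains in the centre, while at 8×8 the output matches the input exactly. Fewer pixels means less detail.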

We can check this in more detail if we select the region of the face and observe its binary representation.

Image enlarged 16x16
Example of predominant colour in the pixel of the photograph and its representation

In the next Computer Vision post we will talk about automated image processing, where AI becomes very important. We will also talk about neural networks and review the applications that Computer Vision currently has.

Don’t miss the next post! 😀

Leo Gamboa Uribe

Santander Global Tech

Professional with more than 20 years of experience leading technological projects in the areas of Telecommunications, Aerospace, IoT and Finance. Founder of Molino Valley, Technology Incubator in Las Rozas de Madrid.

Nature lover, sportsman (Triathlon), passionate about technology and the promotion of STEM areas (Science, Technology, Engineering, and Math).

