
Computer Vision and Image Recognition: automated processing

14/12/20 7 min. read

Now that the basic processing of a digital image has been explained, you may be asking yourself how automated image processing is possible. How do computers recognize which pixels make up certain images?

Computer Vision essentially revolves around the recognition of specific patterns or characteristics in images, so, as we shall see, each pixel or group of pixels must be analysed.

To do this we need algorithms (1) that analyse the conditions and patterns that make it possible to identify the presence of a given object in an image. The process by which an algorithm learns these patterns is known as training.

(1) Algorithm: A finite, ordered set of well-defined, unambiguous instructions or rules that typically makes it possible to solve a problem, perform a computation, process data or carry out other tasks or activities.

Automatic identification of patterns in an image

Artificial Intelligence 🎇

Training is achieved through repetition: the computer should receive as many identified or tagged images as possible. For example, if you wanted to teach a computer to identify sheep, you would show it numerous images in which the sheep have been tagged. From these examples, the computer learns which specific pixels carry the patterns or characteristics of a sheep and then associates that pixel structure with sheep.

This case is known as Supervised Learning, within the area of Machine Learning.
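The idea of learning from tagged images can be sketched with a deliberately tiny example. The sketch below is not a neural network: it swaps in a much simpler technique, a nearest-centroid classifier, and the four-pixel "images", their grey levels and the tags are all made up for illustration.

```python
# Toy supervised learning: a nearest-centroid classifier on tagged "images".
# Each image is a flat list of grey levels (0-255). All data is hypothetical.

def train(tagged_images):
    """Average the pixel values of all images that share a tag."""
    sums, counts = {}, {}
    for pixels, tag in tagged_images:
        if tag not in sums:
            sums[tag] = [0.0] * len(pixels)
            counts[tag] = 0
        sums[tag] = [s + p for s, p in zip(sums[tag], pixels)]
        counts[tag] += 1
    return {tag: [s / counts[tag] for s in sums[tag]] for tag in sums}


def predict(centroids, pixels):
    """Return the tag whose average image is closest to the input."""
    def distance(centroid):
        return sum((c - p) ** 2 for c, p in zip(centroid, pixels))
    return min(centroids, key=lambda tag: distance(centroids[tag]))


# Tagged examples: bright, woolly patches versus darker grass patches.
training_set = [
    ([230, 240, 220, 235], "sheep"),
    ([210, 225, 240, 215], "sheep"),
    ([40, 90, 35, 80], "grass"),
    ([55, 100, 45, 95], "grass"),
]
model = train(training_set)
print(predict(model, [225, 230, 228, 219]))  # prints "sheep"
```

The more tagged examples are supplied, the better the averages represent each tag, which is the same reason real systems need so many tagged images.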

Sheep pixels

Now, there can be millions of these patterns or characteristics, and this is where Artificial Intelligence helps us: in this example, through neural networks, which are one of the most popular types of Machine Learning.

Neural Networks

And… What are neural networks?🧠

No! I will not explain what a neural network is. I will only mention that, in the case of Computer Vision, it is an algorithm that imitates the way the visual cortex of the human brain processes what the eye sees, identifying different characteristics in the inputs (we have already seen this in image processing), and that ultimately makes it possible to identify objects and finally “see”.

It can detect points, lines and curves, and it specialises in recognising complex shapes such as a face or the silhouette of an animal.

Remember that the neural network must learn by itself to recognise a variety of objects within images, and for this we will need a large quantity of images.

Identification of patterns and neural networks 🧬

We will now give a basic explanation of one of the automatic processing techniques, to understand the complexity and the steps involved. There are many more techniques, but they all seek the same thing: to identify patterns.

Until a few years ago, the option was to manually label the images to show the machine where the ears or tail of a cat were. This approach was known as “hand-made features” and was once widely used, but it was very laborious and purely “comparative”; it did not learn the features of the images.

In the following image we see the figure of a cat, then its conversion to greyscale, which allows us to better identify the main lines in groups of pixels, and then the selection of parts of the cat (ears, mouth, nose, etc.). With this information our neural network should be able to identify a cat in the image.

Processing of an image
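The conversion to grey mentioned above can be sketched in a few lines. The article does not say which formula its figure used, so the weights below are an assumption: they are the widely used ITU-R BT.601 luma coefficients, and the function name and sample pixels are hypothetical.

```python
# Greyscale conversion sketch using BT.601 luma weights (an assumption;
# the article does not specify a formula).

def to_grey(rgb_image):
    """Convert rows of (r, g, b) tuples into rows of grey levels 0-255."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


# A made-up 1 x 3 image: white, black and pure red.
print(to_grey([[(255, 255, 255), (0, 0, 0), (255, 0, 0)]]))  # [[255, 0, 76]]
```

With a single grey level per pixel, differences between neighbouring pixels are easier to measure, which is what makes the main lines stand out.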

But what happens if a cat has its ears down or moves away from the camera? The neural network will not be able to identify it.

That is why we have to try to get the machine to learn the characteristics on its own, starting from basic lines. One method is to divide the whole image into blocks of 8 × 8 pixels and assign each one a dominant line type: horizontal [-], vertical [|] or diagonal [/].

Learning of the neural network
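The block-by-block step can be sketched as follows. This is one possible way to pick a dominant line type, not necessarily the one behind the figure: it compares how much the grey level changes vertically and horizontally inside each block. The factor of 2 used as a threshold, the blank symbol for flat blocks and all function names are assumptions.

```python
# Sketch: split a greyscale image into 8 x 8 blocks and label each block
# with a dominant line type. Thresholds and names are hypothetical.

def dominant_line(block):
    """Classify a square block of grey levels as '-', '|', '/' or ' '."""
    n = len(block)
    # A horizontal line shows up as changes between vertically adjacent pixels.
    dy = sum(abs(block[r + 1][c] - block[r][c])
             for r in range(n - 1) for c in range(n))
    # A vertical line shows up as changes between horizontally adjacent pixels.
    dx = sum(abs(block[r][c + 1] - block[r][c])
             for r in range(n) for c in range(n - 1))
    if dx + dy == 0:
        return " "   # flat block, no line at all
    if dy > 2 * dx:
        return "-"
    if dx > 2 * dy:
        return "|"
    return "/"       # comparable change in both directions (either diagonal)


def stick_map(image, size=8):
    """Return a table with one stick symbol per size x size block."""
    rows, cols = len(image), len(image[0])
    return [
        [dominant_line([row[c:c + size] for row in image[r:r + size]])
         for c in range(0, cols - size + 1, size)]
        for r in range(0, rows - size + 1, size)
    ]


# Made-up 8 x 16 image: the left block holds a horizontal edge,
# the right block a vertical edge.
top = [0] * 8 + [0] * 4 + [255] * 4
bottom = [255] * 8 + [0] * 4 + [255] * 4
image = [top[:] for _ in range(4)] + [bottom[:] for _ in range(4)]
print(stick_map(image))  # [['-', '|']]
```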

The output of our process will be several tables formed by “sticks”, which are, in fact, the simplest characteristics that represent the edges of the objects in the image. They are still images, but built from sticks. So, once again, we can take an 8 × 8 block and see whether the sticks match. And again and again.

Sticks processing

This operation is called “convolution”, and it gives its name to the algorithm: the convolutional neural network. Convolution can be represented as a layer of a neural network, since a neuron can be set up to compute any function of its inputs.
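To make the operation concrete, here is a minimal sketch of a 2D convolution in plain Python. It slides a small kernel over the image and sums the element-wise products at each position. As in most neural network libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation. The kernels and the sample image are made up for illustration.

```python
# Minimal 2D convolution sketch (no padding, stride 1, kernel not flipped).

def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 4 x 5 image with a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 1, 1] for _ in range(4)]

vertical_edges = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
horizontal_edges = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]

print(convolve2d(image, vertical_edges))    # [[0, 3, 3], [0, 3, 3]]
print(convolve2d(image, horizontal_edges))  # [[0, 0, 0], [0, 0, 0]]
```

The vertical-edge kernel responds strongly where the edge is and the horizontal-edge kernel stays at zero, which is the "stick" matching described above. In a convolutional network the kernel values are not chosen by hand but learned during training.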

When we feed our neural network with many pictures of cats, it automatically assigns larger weights (importance) to the combinations of sticks it sees most often. It doesn’t matter if it’s a straight line from a cat’s back or a geometrically complicated object like a cat’s face.

To obtain our output, we would add a simple neuron that looks for the most common combinations and, based on this, we could differentiate cats from dogs.
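A single output neuron of this kind can be sketched as a weighted sum followed by a threshold. The example below assumes each image has already been reduced to two numbers, the counts of two stick combinations, and trains the weights with the classic perceptron rule. The feature values, labels, learning rate and number of epochs are all hypothetical.

```python
# Sketch of one output neuron trained with the perceptron rule.
# Features are made-up counts of two stick combinations per image.

def neuron(weights, bias, features):
    """Weighted sum followed by a step activation: 1 = cat, 0 = dog."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0


def train_neuron(samples, epochs=20, rate=0.1):
    """Nudge the weights whenever the neuron misclassifies a sample."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            error = label - neuron(weights, bias, features)
            weights = [w + rate * error * x for w, x in zip(weights, features)]
            bias += rate * error
    return weights, bias


# (count of combination A, count of combination B), label
samples = [
    ([5, 1], 1), ([4, 0], 1), ([6, 2], 1),   # cats: combination A dominates
    ([1, 5], 0), ([0, 4], 0), ([2, 6], 0),   # dogs: combination B dominates
]
weights, bias = train_neuron(samples)
print(neuron(weights, bias, [5, 2]))  # prints 1 (cat)
print(neuron(weights, bias, [1, 4]))  # prints 0 (dog)
```

After training, the combination seen most often in cats ends up with a positive weight and the one seen most often in dogs with a negative weight.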

In the following figure we can see an example of this type of processing of “sticks”, for faces, cars, elephants… and chairs.

Examples of convolution

What is interesting about this idea of “sticks” is that we have a neural network that seeks out the most distinctive characteristics of objects by itself. We don’t need to pick them out manually. We can feed it any number of images of any object, simply by searching among the billions of images available, and our network will create feature maps from sticks and learn to differentiate any object by itself.

We can see something similar, though with other techniques, in the area of facial recognition, one of the applications of Computer Vision. In this case, instead of sticks we look for dots that serve as facial reference points, but in the end the learning process is similar to that of the cat.

Facial Recognition

After reading this, we can see that neural networks and artificial intelligence are in our daily lives, with multiple applications. Let’s look at some 👇

Computer Vision Applications🧿

Computer Vision is an area with more than 40 years of research, so we already have several applications of the techniques developed. These applications include:

  • Optical Character Recognition (OCR): The automatic identification, from an image, of symbols or characters belonging to a certain alphabet, which are then stored as data.
  • Robotic inspection: The fast inspection of parts to ensure the quality of manufacturing components using stereo vision with specialised lighting.
  • 3D model construction: The automated construction of 3D models from photographs.
  • Medical imaging: The technology used to take X-rays, together with the techniques used to detect malignant tumours and other abnormalities in them.
  • Automotive safety: Helping to detect obstacles through an assisted driving system using different cameras.
  • Motion Capture: Using retro-reflective markers seen from multiple cameras or other techniques to capture actors’ movements for use in computer animation.
  • Surveillance: Intruder monitoring, road traffic analysis, and pool monitoring for drowning victims.
  • Fingerprint recognition and biometrics: For automatic access identification and also used for forensic applications.
  • Face detection: Used to improve the focus of cameras and to make searches for people in images more relevant.

There are many more applications of Computer Vision techniques like these, many of which are already commonplace for us 😁

Now we can understand this image a little more 😎


Leo Gamboa Uribe

Santander Global Tech

Professional with more than 20 years of experience leading technological projects in the areas of Telecommunications, Aerospace, IoT and Finance. Founder of Molino Valley, Technology Incubator in Las Rozas de Madrid.

Nature lover, sportsman (triathlon), passionate about technology and about popularising the STEM fields (Science, Technology, Engineering, and Math).

