Demystifying Computer Vision

7 min readFeb 18, 2020

“One of the first areas artificial intelligence will take many jobs is transportation. Bus drivers, truckers, taxicab drivers… need to start thinking about new careers.”
- Dave Waters

Here Dave Waters pointed out the concept of autonomous vehicles. Cutting edge technologies like Artificial Intelligence and Machine Learning have made machines to independently carry out most of the tasks done by humans and even outperform them in some others. Concept of autonomous driving had been “unimaginable” until recently, but not any more. According to latest reports from many perception prototypes, it has now become a “definitely possible” thing. Computer vision is the major feature that made this dream a reality. What is computer vision? Let’s demystify it.

What is Computer Vision?

Computer vision is a subclass of artificial intelligence and machine learning enabling a smarter image/video analysis. On a layman’s terms, it is the technique that help machines to “see and interpret” our world. The technique is analogous to human visual system with which humans examine their surroundings and make decisions from it. With computer vision now computers are also able to identify and process an image or video they capture.

Where do computer vision is applied? Anywhere were machines need to see and interpret the data. Simply take the example of search engines and social platforms like Facebook. Every day users from various sources upload an infinite number of images and videos to it. For a long time unlike the text data, which could be easily interpreted, images and videos were not interpretable. Most of the time they were classified according to the descriptions and titles assigned to them. With computer vision machines are now able to unlock the meaning of images and videos and index them properly. As visual data is taking the world by storm, development of computer vision has also become a necessity. Other common examples of computer vision model include higher-level applications like autonomous vehicles, robotics, etc.

Traditional Computer Vision Vs Deep Learning

Computer vision approaches are broadly classified into two, namely, Traditional Computer Vision and Deep Learning approach. For a better understanding let us consider computer vision as a 3 block system consisting an Input, Feature engineering and Output blocks.

It is the step of feature engineering that distinguishes conventional methods from the latest techniques. Feature engineering includes feature learning and classification. In traditional computer vision approach both these were done separately as shown in the figure. Here manual learning and engineering has to be done on every image consisting of diverse objects and classes. This made the works of feature engineering very cumbersome in nature. Deep Learning approach made the process lot more easier. Learning and classification were included in a single entity known as Deep Neural Networks.

Traditional computer vision methods use techniques like SIFT, SURF, BRIEF, etc. Using these techniques several simple tasks like edge detection, curve detection, etc. can be carried out more efficiently. For complex tasks and large data sets conventional methods become primitive. This is where the Deep Learning approach comes in to action.

Deep Learning In Computer Vision

Deep learning in computer vision has helped machines to improve its accuracy(recent report says around 90+%) in visual interpretation tasks. Convolutional Neural Network is the technique that enables deep learning in computer vision. Network consists of numerous nodes characterizing various features as weights in it. The network first learns the features of the image and then classifies the image accordingly. Here CNN does both the steps simulatneously. These network constantly update values with new values obtained while interacting with new input. For a detailed understanding of how neural networks works take a look at our Neural Networks blog.

Computer Vision Tasks

Basic data interpretation methods of computer vision include Image classification, object localization, object detection, and Segmentation’s. Here is a brief look at these basic tasks of computer vision.

Image Classification

Images are classified based on the objects in it irrespective of its location in the image. The task is a single object type. If image contains only a single object, the image is tagged with that label. And if the image contains multiple objects, the image is tagged with the label that corresponds to the most visible (or “main”) single object.

Object Localization

This is the task of defining the location of a particular object in an image using bounding boxes. Images are classified based on the object in it. Again this is a single object type and if image contains multiple objects then the most visible(or “main”) single object is localized.

Object Detection

It is the combination of classification and localization. But unlike other two this is a multiple object type. Here instances of every object in an image is labeled, usually using bounding boxes. Autonomous vehicle is a fine example of object detection.

Segmentation

Here the model segment an image at its pixel. This helps the machines to drive meaningful insights. The two types of segmentation are standard segmentation and instance segmentation.

In semantic or standard segmentation, all similar objects are given the same label and id(i.e represented using the same colour). While in instance segmentation, every object of a similar class are given different id’s so as to represent different instances(ie represented using different colour). An example of both are shown below.

Computer Vision In Autonomous Vehicle

Autonomous vehicle is one of the greatest applications of computer vision. For enabling a safe drive, the technique helps these prototype in detection and recognition of traffic signs, traffic lights, pedestrians, adjacent vehicles, lanes etc.

Object Detection In AV

Object detection is basically the combination of object classification and object localization. As autonomous vehicles need to deal with multiple objects on roads simultaneously object detection will be the most effective method. Object detection in autonomous vehicles are usually done using bounding boxes. Perfectly enclosed and labeled bounding boxes teaches perception models what is what and what decision to take when. From bounding boxes, model extract box parameters like labels, box height, box width, etc for gaining needful insights from it.

A bounding box isn’t enough for a CV model in Autonomous vehicles to work. Conventionally, CNN used sliding window technique to learn the art of object detection. Images are first divided into small squares or sections. Algorithm then search for the required object in these segments. This helps the machine to narrow down its search on the image. But as tasks kept on evolving more complexities like real time video detection,etc. need for advanced techniques like YOLO (You Only Look Once) also increased. As computer vision is a vast topic, discussion of such techniques might take you off-road now and are also out of our current discussion scope.

Segmentation In AV

Segmentation is another approach of computer vision. Here it helps the perception models to segment its surrounding at pixel level and derive a better insight from what it sees. In object detection parameters like box label, box parameters etc, are fed into models for image interpretation. In case of segmentation a pixel segmented image is generated for model functioning. Even for us rather than some box parameters like Xmin and Ymin a well pixel segmented image would make more sense and so for machines too.

Semantically Segmented Street View For AV Training

Both are effective in its own way and are choice of its usage vary according to the nature of its application.

What We Do?

So far we have discussed about what computer vision is and even took an application of it for better understanding. Image Annotation is an important step in enabling these computer vision tasks. It is the act of labeling images generating training data for such models. Read here to learn more about what is Image Annotation. We process quality training data utilizing various annotation techniques with our highly experienced team of annotators. Got to annotate? Feel free to let us know about your annotation needs.

As we said CV tasks ranges from a QR code scanning to even a disease detection. Thanks to google. Recently, Google’s CV has figured out a math to detect diabetic retinopathy just from the retinal images. And it has proven to be better than well qualified and experienced ophthalmologists. Wondering about how far computer vision can develop? Let me know your thoughts through your comments here.

Originally published at https://www.infolks.info on February 18, 2020.