What Is Image Annotation?

Infolks Group

Image annotation is the labeling of objects in an image so that the image can be used as training data for programs built on algorithms such as convolutional neural networks (CNNs). This enables programs to see and understand things the way humans do, with the hope that in the future they will assist and work with us in solving problems and exploring the unexplored.

Annotation

Annotation has been familiar to us since our schooling days. While reading something, we used to mark doubtful areas for later reference. Even though the way people annotate differs from person to person, its purpose remains the same. In computer programming, annotation refers to documentation and the addition of comments to code to make it more meaningful.

The more we annotate a script, a document, or a piece of code, the more deeply we understand how things are connected.

Will Annotations Improve Intelligence?

Humans can learn, understand, reason, form concepts, apply logic, and make decisions. It is our intelligence that gives us these cognitive abilities. Through artificial intelligence, we try to emulate this kind of intelligent behavior in machines. We want machines to work for us, assist us, and even work alongside us. This could help solve many complex human-development problems.

For machines to behave like humans, they need to be trained on collections of labeled examples called training data sets. Annotation is the way this data is labeled. Text annotation, audio annotation, video annotation, and image annotation are some of the types of annotation. Using more training data can increase the accuracy of the algorithm. Consider the example of a car geek: such an expert can spot a car and tell which model it is in no time, while someone less interested in cars will do so only at a slower pace. The thing that separates a car geek from an ordinary person is knowledge about cars. A machine trained with more training data is like such a geek in the car-spotting game, while a program trained with less data will not reach that level of performance.

Teaching machines to ‘see’ like we do

Cameras take photos by converting light into a two-dimensional array of numbers called pixels. But taking pictures is not the same as seeing. Seeing really means understanding. We can recall stories of people, places, and things the moment we lay our gaze on them.
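To make that "array of numbers" concrete, here is a minimal Python sketch. It assumes the Pillow and NumPy libraries and a hypothetical file name; a color photo actually loads as a height × width × channels grid of values.

```python
from PIL import Image
import numpy as np

# Load a photo and look at it as raw numbers (file name is hypothetical).
img = Image.open("street_scene.jpg")
pixels = np.asarray(img)

print(pixels.shape)   # e.g. (1080, 1920, 3): height x width x RGB channels
print(pixels[0, 0])   # the top-left pixel, e.g. [142 138 131]
```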

For a machine to see, name objects, identify people, infer the geometry of things, or understand situations the way we do, computer vision is required. A human child learns to do this through real-life experiences and examples; by giving the same kind of training, we can make it happen in machines too.
But a single object can look different when the perspective changes, so multiple examples must be given to the system for it to identify a single object. This is where image annotation comes into play.

Image annotation

Image annotation is the marking of various objects in an image and labeling them. The annotations reduce the search area for objects in an image; the coordinates of the label drawn around each object help with this task. The annotated images are fed into systems using an image classification algorithm called a convolutional neural network (CNN). The algorithm consists of many neuron-like nodes that take input and send output to other nodes, organized in hierarchical layers loosely similar to the human brain.
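As an illustration of what "the coordinates of the label around the objects" can look like in practice, here is a hypothetical annotation record in Python. The field names are illustrative only; real datasets and tools (COCO, Pascal VOC, and others) use their own schemas.

```python
# A hypothetical annotation record for one image (field names are illustrative).
annotation = {
    "image": "street_scene.jpg",
    "width": 1920,
    "height": 1080,
    "objects": [
        {
            "label": "car",
            # bounding box as [x_min, y_min, x_max, y_max] in pixels;
            # the box narrows down where the network should look for the object
            "bbox": [412, 560, 780, 840],
        },
        {
            "label": "pedestrian",
            "bbox": [1024, 500, 1110, 820],
        },
    ],
}
```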

A CNN breaks images down into pixels, or sometimes into smaller groups of pixels processed by a filter. The network performs a series of calculations on them and compares them against the specific patterns it is looking for. In the first layers, the CNN detects low-level patterns like rough edges and curves. As the network performs more convolutions, it begins to identify specific object features. A much more detailed look at neural networks can be found here.
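To give a feel for those layers, here is a minimal CNN sketch in PyTorch (an assumed framework choice; the layer sizes are arbitrary, not from the article). The early convolution picks up low-level patterns, and the deeper one combines them into object-level features.

```python
import torch
import torch.nn as nn

# A minimal CNN sketch (assumed PyTorch; sizes are illustrative only).
class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            # first layer: responds to low-level patterns such as edges and curves
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            # deeper layer: combines those patterns into more specific object features
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)

    def forward(self, x):
        x = self.features(x)        # convolutions over the pixel grid
        x = torch.flatten(x, 1)     # flatten the feature maps for the classifier
        return self.classifier(x)   # one score per object class

# Example: a batch containing one 224x224 RGB image.
dummy = torch.randn(1, 3, 224, 224)
print(TinyCNN()(dummy).shape)       # torch.Size([1, 10])
```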
In computer vision, the CNN is the brain and image annotation is the technique that nourishes that brain. Annotation makes images readable for computer vision through various methods. Different techniques used for image annotation include the following (a small sketch of the data each shape produces follows the list):

  • Bounding box
  • Polygonal
  • Key point
  • Cuboid
  • Semantic segmentation
  • Polyline
For a more detailed study of annotation techniques, look here.
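Each of these techniques ultimately produces geometric data attached to a label. The sketch below (hypothetical Python structures, not a standard format) shows what a few of them might look like once stored:

```python
# Hypothetical examples of the geometry each annotation type produces.
# Field names and coordinate conventions are illustrative only.

bounding_box = {"label": "car", "bbox": [412, 560, 780, 840]}  # x_min, y_min, x_max, y_max

polygon = {  # traces the exact outline of an irregularly shaped object
    "label": "pedestrian",
    "points": [(1024, 500), (1060, 492), (1110, 640), (1098, 820), (1030, 815)],
}

key_points = {  # landmarks, e.g. for pose or face annotation
    "label": "person",
    "keypoints": {"left_eye": (1040, 520), "right_eye": (1065, 518), "nose": (1052, 540)},
}

polyline = {  # open curve, e.g. a lane marking on a road
    "label": "lane_line",
    "points": [(0, 900), (480, 760), (960, 700), (1440, 690)],
}
```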

Why should humans do it?

Annotating images according to the machine's task is critical. Machines learn to see from the examples provided to them and processed through image annotation. Annotation quality can never be compromised, and maintaining it is a hard task in automatic image annotation. Humans, on the other hand, find it easy to annotate images according to the context provided. Artificial intelligence keeps developing, yet carrying out even the simplest reasoning is still an arduous task for it; it has a long way to go before it becomes human-like.

AI companies depend on crowdsourcing companies like INFOLKS to get large volumes of data annotated within demanding deadlines. Human-annotated images assure AI companies that the data is infused with human reasoning, and this helps in humanizing the machines.

Originally published at https://www.infolks.info on September 28, 2019.
