December 02, 2018 05:25

Summary : What's the Point: Semantic Segmentation with Point Supervision


Authors : Amy Bearman, Olga Russakovsky, Vittorio Ferrari, and Li Fei-Fei

Published : 23 July 2016

Motivation to Read

I have recently been working on object detection. In object detection, how the detector is supervised contributes significantly to the efficiency of the learning process. For instance, it is preferable to provide bounding box information in addition to class labels. Moreover, images segmented at the pixel level carry more information than bounding boxes and may help increase detection accuracy. Training with such annotated data is called strong or weak supervision, depending on how detailed the annotation is. In general, the richer the annotation, the higher the accuracy. Conversely, richer annotation is harder and more time-consuming for human annotators to produce.

This paper quantitatively measures this trade-off between accuracy and annotation time. Moreover, the authors argue that their point annotation strikes this balance quite well. It is interesting that the way we supervise a model, and not only how we tune it, significantly affects how well the problem at hand gets solved.


  • They ask annotators to point to an object if one exists
  • They incorporate this point annotation into their loss function
  • They find that this approach works considerably better than image-level annotation. Under a fixed annotation time budget, it also outperforms pixel-level annotation
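To make the second bullet more concrete, here is a minimal sketch of how clicked points could enter a loss: cross-entropy is evaluated only at the annotated pixels, and every other pixel is ignored. This is an illustration under my own assumptions, not the paper's exact formulation (which, as I understand it, also combines image-level labels and other terms). The function names and the `(row, col, class_index)` point format are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def point_supervised_loss(logits, points):
    """Average cross-entropy over annotated points only.

    logits: H x W x C nested lists of raw class scores (assumed layout).
    points: list of (row, col, class_index) clicks (assumed format).
    Pixels without a click contribute nothing to the loss.
    """
    if not points:
        return 0.0
    total = 0.0
    for row, col, cls in points:
        probs = softmax(logits[row][col])
        total -= math.log(probs[cls])
    return total / len(points)


# A 2 x 2 image with 2 classes and a single click on pixel (0, 0).
example_logits = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [5.0, -5.0]],
]
example_loss = point_supervised_loss(example_logits, [(0, 0, 1)])
```

Since only the clicked pixel is scored, the logits at pixel (1, 1) have no effect on `example_loss`. That is what makes the supervision weak: most of the image gives no direct training signal.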

Main Part

In this paper, segmentation results are evaluated with mean intersection over union (mIOU). The segmentation of an image is obtained with a fully convolutional network. The authors try several kinds of annotation, as shown in the image below. As the third panel of the lower row shows, point-level annotation appears cheap to obtain. The fourth panel is produced by a separate classifier, mentioned here. As the result section shows, this objectness image can be exploited as automatically generated annotation during training, and it contributes positively to the results. The paper gives the detailed design of the loss function for each annotation method.
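As a reminder of what the mIOU metric computes, here is a small sketch for a single image. For each class, it divides the number of pixels where prediction and ground truth agree on that class by the number of pixels where either one assigns it, then averages over classes. Benchmark implementations usually accumulate the counts over a whole dataset before dividing. Skipping classes that appear in neither map is my own simplification, and `mean_iou` is a hypothetical helper, not code from the paper.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union for two label maps of equal shape.

    pred, target: H x W nested lists of integer class indices.
    Classes absent from both maps are skipped (an assumption of this sketch).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel counts once toward both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel belongs to the union of both classes involved.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


predicted = [[0, 0], [1, 1]]
ground_truth = [[0, 1], [1, 1]]
score = mean_iou(predicted, ground_truth, num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so `score` is their average, 7/12.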

Main Result

Related Resources

Source and Codes


Fully Convolutional Network (FCN) : composed of convolutional layers only, with no fully connected layers or multilayer perceptron. An FCN's output has the same spatial dimensions as the input image, and it is typically used for segmentation tasks.
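The sketch below shows how an all-convolutional stack can return to the input resolution. It traces the spatial size through a few layers using the standard size formulas for convolution and transposed convolution. The layer stack is a made-up toy example, not the architecture used in the paper, and the helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution (or pooling) along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride=1, padding=0):
    """Output size of a transposed convolution along one spatial axis."""
    return (size - 1) * stride - 2 * padding + kernel


def trace_sizes(size, layers):
    """Return the spatial size after each (kind, kernel, stride, padding) layer."""
    sizes = [size]
    for kind, kernel, stride, padding in layers:
        if kind == "conv":
            size = conv_out(size, kernel, stride, padding)
        elif kind == "deconv":
            size = deconv_out(size, kernel, stride, padding)
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        sizes.append(size)
    return sizes


# Toy stack: two downsampling stages, then one 4x upsampling layer.
toy_layers = [
    ("conv", 3, 1, 1),    # 3x3 conv, keeps the size
    ("conv", 2, 2, 0),    # stride-2 stage, halves the size
    ("conv", 3, 1, 1),
    ("conv", 2, 2, 0),
    ("deconv", 4, 4, 0),  # transposed conv, upsamples by 4
]
sizes = trace_sizes(224, toy_layers)
```

For a 224-pixel input the sizes go 224, 224, 112, 112, 56 and back to 224, so the final score map lines up with the input pixel by pixel.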