Project Introduction

Visual Seek for Interactive Image Retrieval (VISIIR) is a project that aims to explore new methods for semantic image annotation. This topic has been studied extensively for more than a decade, owing to its many applications in areas as diverse as Information Retrieval, Computer Vision, Image Processing, and Artificial Intelligence. Semantic annotation refers to the task of predicting a semantic concept from the visual content of an image. Bridging the semantic gap between visual data and concepts is the main goal pursued by researchers in the field. In supervised learning, a large amount of labeled data is mandatory to build effective semantic annotation tools. In interactive content-based image retrieval (CBIR) systems, the user query is formulated with an example, i.e. an image. Relevance feedback is commonly used to interactively refine a query concept by asking the user whether selected images are relevant or not. To be effective, a major challenge in interactive CBIR is to minimize the number of feedback iterations needed to capture the user's semantic intent.
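The relevance-feedback loop described above can be sketched with a classic Rocchio-style query update in descriptor space. The descriptor dimension, weights, and helper names below are illustrative assumptions for the sketch, not the project's actual method.

```python
import numpy as np

def refine_query(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: move the query toward relevant examples, away from irrelevant ones.
    The weights alpha/beta/gamma are conventional illustrative values."""
    q = alpha * query
    if len(relevant):
        q = q + beta * np.mean(relevant, axis=0)
    if len(irrelevant):
        q = q - gamma * np.mean(irrelevant, axis=0)
    return q

def top_k(query, database, k=5):
    """Rank database descriptors by cosine similarity to the query."""
    sims = database @ query / (
        np.linalg.norm(database, axis=1) * np.linalg.norm(query) + 1e-9
    )
    return np.argsort(-sims)[:k]

# Toy data: 100 image descriptors, query close to image 0.
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 64))
q = db[0] + 0.1 * rng.normal(size=64)

ranking = top_k(q, db)                                  # one retrieval round
q2 = refine_query(q, db[ranking[:2]], db[ranking[-2:]]) # one feedback round
```

In a real interactive system, the relevant/irrelevant splits would come from user judgments (or, in VISIIR, from gaze statistics) rather than from the ranking itself, and the loop would repeat until the user is satisfied.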

The VISIIR project proposes new interactive methods for building powerful semantic annotation systems. The originality of the proposal is threefold:

  • Eye-tracker-driven system. A distinctive feature of the project is the use of the latest eye-tracking technology to validate and improve the vision and learning models developed by the academic partners.
  • New paradigm for visual representation and learning. We introduce a novel learning scheme combining supervised and interactive methods.
  • Web filtering for food annotation. The new methods developed in the project will be validated in a specific application dedicated to retrieving, filtering, and classifying images of recipes.

In terms of methodology, the first challenge for semantic annotation lies in the representation of visual content. To go one step beyond current state-of-the-art methods, we want to develop new bio-inspired representations. One key idea is to provide a hybrid representation combining visual saliency models and unsupervised deep networks.

In the second part of VISIIR, we design new interactive learning schemes. We exploit the additional source of information provided by the eye-tracker to boost the learning quality (i.e. the active learning convergence) at two different levels. First, eye-tracker features are used in conjunction with the user's annotations to jointly optimize the classification function and the visual representations learned off-line in task 1. Beyond this gaze analysis purpose, we propose to use the eye-tracker to control the learning process and to develop new Human-Computer Interactions (HCI). Typically, eye-tracking statistics will act as user feedback.

Finally, one strong axis of VISIIR is the rigorous evaluation of the proposed semantic annotation methods in a specific web filtering application dedicated to food retrieval. A complete database will be built during the project, with the goal of finding images of recipes. This fine-grained classification task will serve as a use case to validate the visual representations and interactive learning methods of tasks 1 and 2. A methodological aspect addressed in this task is the scalability of interactive search when applied to the huge number of images harvested from the web. We want to tackle this scalability challenge by combining efficient hashing structures for indexing and search with exploration techniques.
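To illustrate the kind of hashing structure mentioned above, the sketch below implements a minimal random-projection locality-sensitive hash (LSH) index. The class name, bit count, and descriptor size are hypothetical choices for the example, not the project's actual indexing scheme.

```python
import numpy as np

class LSHIndex:
    """Minimal random-projection LSH: descriptors hashing to the same
    bit code land in the same bucket, so a query only scans one bucket
    instead of the whole collection."""

    def __init__(self, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(n_bits, dim))  # random hyperplanes
        self.buckets = {}

    def _hash(self, x):
        # Sign of each projection yields one bit of the hash code.
        return tuple((self.planes @ x > 0).astype(int))

    def add(self, idx, x):
        self.buckets.setdefault(self._hash(x), []).append(idx)

    def query(self, x):
        # Candidates share the query's bucket; exact distances can
        # then be computed on this small candidate set only.
        return self.buckets.get(self._hash(x), [])

# Toy collection of 1000 image descriptors.
rng = np.random.default_rng(1)
descriptors = rng.normal(size=(1000, 64))
index = LSHIndex(dim=64)
for i, d in enumerate(descriptors):
    index.add(i, d)
candidates = index.query(descriptors[42])
```

In practice, several independent hash tables are combined to reduce misses, and the candidate set is re-ranked with exact distances; the exploration techniques evoked above would then operate on these candidate sets.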

To carry out VISIIR, the required skills will be provided by the consortium partners: UPMC will bring expertise in image classification and statistical learning, I3S in CBIR and scalability, L3I in visual saliency and attentional models, and Tobii in eye-tracking technology.