Master's Seminar: Document Recognition and Retrieval


Denis Lalanne & Rolf Ingold


Documents play an important role in everyday communication. With the ever-increasing use of the Web, a growing number of documents are published and accessed online. Unfortunately, document structures are rarely taken into account, which considerably weakens users' browsing and searching experience.

A document has many levels of abstraction, conveyed by its various structures: thematic, physical, logical, relational or even temporal. Most search engines and information retrieval systems do not take this multi-layered structure into account; at best, documents are indexed according to their thematic structure, and otherwise they are simply represented as a bag of words. The form of a document, i.e. its layout and logical structures, is underestimated, yet it carries important clues about how the document is organized, which could substantially improve indexing and retrieval.

We believe that extracting these various document structures will improve (a) document indexing and retrieval and (b) linking with other media. In particular, we will see in this seminar how documents can be integrated into multimedia and multimodal applications, and how document-based interfaces can improve searching and retrieval in multimedia databases.


  1. Document visualization
  2. Thematic/Topic segmentation of documents
  3. Document information/content extraction and indexing
  4. Document modeling and formatting
  5. Document physical and logical structure extraction
  6. Document classification

When and what

  • Various internal speakers will present their work or views:
    • (In alphabetical order) Ardhendu Behera, Karim Hadjar, Dalila Mekhaldi, Maurizio Rigamonti, etc.


  • Presentations will be in French, German or English.


