Following the first CLEF-HIPE-2020 evaluation lab on historical newspapers in three languages, HIPE-2022 is based on diverse datasets and aims at confronting systems with the challenges of dealing with more languages, learning domain-specific entities, and adapting to diverse annotation tag sets.

The objectives of HIPE-2022 are to:

  • assess and advance the development of robust, adaptable and transferable named entity processing systems across languages, time periods, document types, and annotation tag sets;
  • deal with challenging historical material, thereby supporting information extraction and text understanding of cultural heritage data.

Compared to the first edition, HIPE-2022 introduces several novelties, with:

  • the addition of a new type of document alongside historical newspapers, namely classical commentaries;
  • the consideration of a broader language spectrum, with 5 languages for historical newspapers and 3 for classical commentaries;
  • the need to handle heterogeneous annotation tag sets and guidelines.

The main challenges of the HIPE-2022 edition are therefore:

  • multilingual corpora from different countries: English, German (AU, DE, CH), French (CH, FR), Finnish, Swedish
  • different document types (historical newspapers and classical commentaries)
  • noisy OCR
  • partial coverage of knowledge bases (KBs) with respect to historical entities
  • different annotation tag sets

Registration and information

CLEF lab registration opens on 15 November 2021 and closes on 22 April 2022 (registration link).

For any questions, please write to our HIPE-2022 mailing list.