HIPE 2026 evaluation results
Official Results
We evaluated submissions across three main profiles: Accuracy using macro Recall, Generalization on an unseen test set, and Accuracy-Efficiency, which highlights systems that combine strong performance with a lighter model footprint – an important aspect for sustainable and in-house processing of large historical text collections.
Accuracy Ranking – Top 3 Teams
| rank | team | affiliation |
|---|---|---|
| 1 | Spinfo | Universität zu Köln |
| 2 | MaxFo-Ajie | Foshan University |
| 3 | whereami | Alexandria University |
Generalization Ranking – Top 3 Teams
| rank | team | affiliation |
|---|---|---|
| 1 | MaxFo-Ajie | Foshan University |
| 2 | Spinfo | Universität zu Köln |
| 3 | BIU_NLP | Bar-Ilan University |
Accuracy-Efficiency Ranking – Top 3 Teams
| rank | team | affiliation |
|---|---|---|
| 1 | MILRIT | University of Toulouse & La Rochelle University |
| 2 | FI-CODE | University of the Bundeswehr Munich |
| 3 | DS@GT_HIPE | Georgia Institute of Technology |
Full generated reports: official evaluation report and additional binary evaluation report.
Many congratulations to the top-ranked teams!
We also warmly thank all participating teams for their contributions. In total, 17 teams participated, submitting 45 runs. Beyond the main rankings, several teams achieved strong results in language-specific evaluations or offered useful accuracy-efficiency trade-offs.
Main Results
The official ranking uses the ternary `at` labels TRUE, PROBABLE, and FALSE. The tables below replicate the main aggregated profile rankings from the generated official evaluation report. Dataset-specific tables, diagnostics links, score definitions, and additional efficiency tables are linked below rather than repeated on this page.
- Download complete official results archive, including all per-run scores, diagnostics, diagnostic metrics, and validation files.
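As an illustration of the metric named in the overview, macro Recall over the three labels can be sketched as below. This is a minimal sketch, not the official scorer: the function name `macro_recall`, the toy label lists, and the choice to skip labels that never occur in the reference are assumptions. The evaluation scripts in the repository remain authoritative.

```python
from collections import Counter

# Ternary label set used by the official ranking.
LABELS = ("TRUE", "PROBABLE", "FALSE")


def macro_recall(gold, pred, labels=LABELS):
    """Unweighted mean of per-label recall.

    Labels with no reference instances are skipped (an assumption of this sketch).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    support = Counter(gold)  # reference instances per label
    hits = Counter(g for g, p in zip(gold, pred) if g == p)  # correct predictions per label
    recalls = [hits[label] / support[label] for label in labels if support[label]]
    if not recalls:
        raise ValueError("no reference labels from the label set")
    return sum(recalls) / len(recalls)


# Toy data, invented for illustration only.
gold = ["TRUE", "TRUE", "PROBABLE", "FALSE", "FALSE", "FALSE"]
pred = ["TRUE", "FALSE", "PROBABLE", "FALSE", "FALSE", "TRUE"]
score = macro_recall(gold, pred)  # per-label recall: TRUE 1/2, PROBABLE 1/1, FALSE 2/3
```

Because every label contributes equally regardless of frequency, a system that always predicts a single label reaches only one third under this definition.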
Accuracy Profile Ranking
| rank | team | affiliation | run | mean impresso profile score | languages |
|---|---|---|---|---|---|
| 1 | Spinfo | Universität zu Köln | run1 | 0.7479 | de, en, fr |
| 2 | Spinfo | Universität zu Köln | run3 | 0.7289 | de, en, fr |
| 3 | MaxFo-Ajie | Foshan University | run1 | 0.7001 | de, en, fr |
| 4 | Spinfo | Universität zu Köln | run2 | 0.689 | de, en, fr |
| 5 | whereami | Alexandria University | run1 | 0.688 | de, en, fr |
| 6 | whereami | Alexandria University | run2 | 0.6833 | de, en, fr |
| 7 | MaxFo-Ajie | Foshan University | run2 | 0.669 | de, en, fr |
| 8 | Awakened | National University of Science and Technology Politehnica Bucharest | run3 | 0.6671 | de, en, fr |
| 9 | Awakened | National University of Science and Technology Politehnica Bucharest | run1 | 0.6584 | de, en, fr |
| 10 | MaxFo-Ajie | Foshan University | run3 | 0.6544 | de, en, fr |
| 11 | INSA Lyon | INSA Lyon - University of Lyon | run1 | 0.639 | de, en, fr |
| 12 | gipplab | University of Göttingen | run2 | 0.6271 | de, en, fr |
| 13 | Hansel&Gretel | IIT Roorkee | run3 | 0.6221 | de, en, fr |
| 14 | gipplab | University of Göttingen | run1 | 0.6141 | de, en, fr |
| 15 | MILRIT | University of Toulouse & La Rochelle University | run3 | 0.5951 | de, en, fr |
| 16 | UMUTEAM | Universidad de Murcia | run2 | 0.5856 | de, en, fr |
| 17 | Ministral-3-3B-Instruct GGUF baseline 0.2.2 random seed 42 | HIPE-2026 organizers | run1 | 0.5818 | de, en, fr |
| 18 | VerbaNexAI II | Universidad Tecnológica de Bolívar | run3 | 0.5795 | de, en, fr |
| 19 | Hansel&Gretel | IIT Roorkee | run2 | 0.5788 | de, en, fr |
| 20 | BIU_NLP | Bar-Ilan University | run2 | 0.5781 | de, en, fr |
| 21 | MILRIT | University of Toulouse & La Rochelle University | run1 | 0.5623 | de, en, fr |
| 22 | Awakened | National University of Science and Technology Politehnica Bucharest | run2 | 0.5494 | de, en, fr |
| 23 | Hansel&Gretel | IIT Roorkee | run1 | 0.5458 | de, en, fr |
| 24 | BIU_NLP | Bar-Ilan University | run3 | 0.539 | de, en, fr |
| 25 | VerbaNexAI II | Universidad Tecnológica de Bolívar | run2 | 0.5187 | de, en, fr |
| 26 | DS@GT_HIPE | Georgia Institute of Technology | run1 | 0.5142 | de, en, fr |
| 27 | gipplab | University of Göttingen | run3 | 0.5069 | de, en, fr |
| 28 | VerbaNexAI II | Universidad Tecnológica de Bolívar | run1 | 0.5004 | de, en, fr |
| 29 | VerbaNexAI I | Universidad Tecnológica de Bolívar | run2 | 0.4842 | de, en, fr |
| 30 | DS@GT_HIPE | Georgia Institute of Technology | run2 | 0.4836 | de, en, fr |
| 31 | DS@GT_HIPE | Georgia Institute of Technology | run3 | 0.4771 | de, en, fr |
| 32 | FI-CODE | University of the Bundeswehr Munich | run2 | 0.4734 | de, en, fr |
| 33 | INSA Lyon | INSA Lyon - University of Lyon | run3 | 0.4731 | de, en, fr |
| 34 | INSA Lyon | INSA Lyon - University of Lyon | run2 | 0.4708 | de, en, fr |
| 35 | FI-CODE | University of the Bundeswehr Munich | run3 | 0.4645 | de, en, fr |
| 36 | VerbaNexAI I | Universidad Tecnológica de Bolívar | run1 | 0.4628 | de, en, fr |
| 37 | ROSTI | Université Lumière Lyon | run3 | 0.4564 | de, en, fr |
| 38 | ROSTI | Université Lumière Lyon | run2 | 0.4507 | de, en, fr |
| 39 | UMUTEAM | Universidad de Murcia | run3 | 0.4495 | de, en, fr |
| 40 | ROSTI | Université Lumière Lyon | run1 | 0.446 | de, en, fr |
| 41 | BIU_NLP | Bar-Ilan University | run1 | 0.4429 | de, en, fr |
| 42 | UMUTEAM | Universidad de Murcia | run1 | 0.4408 | de, en, fr |
| 43 | FI-CODE | University of the Bundeswehr Munich | run1 | 0.427 | de, en, fr |
| 44 | MILRIT | University of Toulouse & La Rochelle University | run2 | 0.4264 | de, en, fr |
| 45 | FourBytes | Sri Sivasubramaniya Nadar College of Engineering | run1 | 0.4061 | de, en, fr |
| 46 | Random Decision Baseline | HIPE-2026 organizers | run1 | 0.4049 | de, en, fr |
Only runs with submissions for all impresso languages are included in the overall Accuracy Profile ranking. Teams with partial results are shown in the dataset-specific tables in the full report.
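The inclusion rule above can be sketched as follows. The sketch assumes that the mean impresso profile score is an unweighted arithmetic mean of the per-language scores; the function name, the input layout, and the team names and scores are hypothetical and are not taken from the official report.

```python
from statistics import mean

IMPRESSO_LANGUAGES = ("de", "en", "fr")


def overall_accuracy_ranking(per_language_scores, languages=IMPRESSO_LANGUAGES):
    """Rank runs by mean per-language score, keeping only runs that cover all languages.

    per_language_scores maps (team, run) to a {language: score} dict.
    """
    rows = []
    for (team, run), scores in per_language_scores.items():
        if not all(lang in scores for lang in languages):
            continue  # partial submissions are left out of the overall ranking
        rows.append((team, run, round(mean([scores[lang] for lang in languages]), 4)))
    rows.sort(key=lambda row: row[2], reverse=True)
    return [(rank, *row) for rank, row in enumerate(rows, start=1)]


# Hypothetical scores, invented for illustration only.
example = {
    ("team_a", "run1"): {"de": 0.70, "en": 0.60, "fr": 0.80},
    ("team_b", "run1"): {"de": 0.90, "en": 0.90},  # no French run, so excluded
    ("team_c", "run1"): {"de": 0.50, "en": 0.50, "fr": 0.50},
}
ranking = overall_accuracy_ranking(example)
```

In this toy input, `team_b` has the highest per-language scores but is absent from the overall ranking because it did not submit for French.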
Generalization Profile Ranking
| rank | team | affiliation | run | surprise profile score |
|---|---|---|---|---|
| 1 | MaxFo-Ajie | Foshan University | run3 | 0.8163 |
| 2 | MaxFo-Ajie | Foshan University | run1 | 0.7945 |
| 3 | MaxFo-Ajie | Foshan University | run2 | 0.7712 |
| 4 | Spinfo | Universität zu Köln | run3 | 0.6984 |
| 5 | Spinfo | Universität zu Köln | run1 | 0.691 |
| 6 | BIU_NLP | Bar-Ilan University | run1 | 0.6837 |
| 7 | Spinfo | Universität zu Köln | run2 | 0.6674 |
| 8 | whereami | Alexandria University | run2 | 0.6665 |
| 9 | gipplab | University of Göttingen | run2 | 0.6647 |
| 10 | Awakened | National University of Science and Technology Politehnica Bucharest | run3 | 0.6613 |
| 11 | Hansel&Gretel | IIT Roorkee | run2 | 0.6349 |
| 12 | Awakened | National University of Science and Technology Politehnica Bucharest | run1 | 0.6338 |
| 13 | whereami | Alexandria University | run1 | 0.6325 |
| 14 | Hansel&Gretel | IIT Roorkee | run3 | 0.6187 |
| 15 | Hansel&Gretel | IIT Roorkee | run1 | 0.6107 |
| 16 | gipplab | University of Göttingen | run1 | 0.6085 |
| 17 | BIU_NLP | Bar-Ilan University | run3 | 0.6 |
| 18 | VerbaNexAI II | Universidad Tecnológica de Bolívar | run3 | 0.5724 |
| 19 | UMUTEAM | Universidad de Murcia | run2 | 0.5723 |
| 20 | Awakened | National University of Science and Technology Politehnica Bucharest | run2 | 0.5509 |
| 21 | gipplab | University of Göttingen | run3 | 0.5382 |
| 22 | BIU_NLP | Bar-Ilan University | run2 | 0.5265 |
| 23 | MILRIT | University of Toulouse & La Rochelle University | run3 | 0.5152 |
| 24 | VerbaNexAI II | Universidad Tecnológica de Bolívar | run2 | 0.5076 |
| 25 | Ministral-3-3B-Instruct GGUF baseline 0.2.2 random seed 42 | HIPE-2026 organizers | run1 | 0.5062 |
| 26 | INSA Lyon | INSA Lyon - University of Lyon | run1 | 0.4705 |
| 27 | MILRIT | University of Toulouse & La Rochelle University | run1 | 0.4679 |
| 28 | VerbaNexAI II | Universidad Tecnológica de Bolívar | run1 | 0.4419 |
| 29 | INSA Lyon | INSA Lyon - University of Lyon | run2 | 0.4231 |
| 30 | INSA Lyon | INSA Lyon - University of Lyon | run3 | 0.3986 |
| 31 | DS@GT_HIPE | Georgia Institute of Technology | run3 | 0.3919 |
| 32 | ROSTI | Université Lumière Lyon | run3 | 0.384 |
| 33 | ROSTI | Université Lumière Lyon | run1 | 0.3773 |
| 34 | FI-CODE | University of the Bundeswehr Munich | run2 | 0.3755 |
| 35 | MILRIT | University of Toulouse & La Rochelle University | run2 | 0.3742 |
| 36 | VerbaNexAI I | Universidad Tecnológica de Bolívar | run2 | 0.3726 |
| 37 | DS@GT_HIPE | Georgia Institute of Technology | run2 | 0.3721 |
| 38 | ROSTI | Université Lumière Lyon | run2 | 0.366 |
| 39 | Random Decision Baseline | HIPE-2026 organizers | run1 | 0.3628 |
| 40 | DS@GT_HIPE | Georgia Institute of Technology | run1 | 0.3626 |
| 41 | UMUTEAM | Universidad de Murcia | run3 | 0.362 |
| 42 | FI-CODE | University of the Bundeswehr Munich | run1 | 0.358 |
| 43 | FI-CODE | University of the Bundeswehr Munich | run3 | 0.3546 |
| 44 | FourBytes | Sri Sivasubramaniya Nadar College of Engineering | run1 | 0.3445 |
| 45 | VerbaNexAI I | Universidad Tecnológica de Bolívar | run1 | 0.3346 |
| 46 | UMUTEAM | Universidad de Murcia | run1 | 0.3333 |
Accuracy-Efficiency Profile Ranking
| rank | team | run | mean efficiency profile rank | accuracy score | parameter count | model size |
|---|---|---|---|---|---|---|
| 1 | MILRIT | run3 | 9.6667 | 0.5951 | 277,730,309 | 1111 MB |
| 2 | FI-CODE | run2 | 10.3333 | 0.4734 | 0 | 0 MB |
| 3 | DS@GT_HIPE | run1 | 10.6667 | 0.5142 | 2,087,375 | 87 MB |
| 4 | DS@GT_HIPE | run2 | 12 | 0.4836 | 2,087,375 | 87 MB |
| 5 | DS@GT_HIPE | run3 | 12.3333 | 0.4771 | 2,087,375 | 87 MB |
| 6 | MILRIT | run1 | 13.6667 | 0.5623 | 466,577,920 | 1780 MB |
| 6 | ROSTI | run1 | 13.6667 | 0.446 | 12,279 | 0.8 MB |
| 6 | ROSTI | run2 | 13.6667 | 0.4507 | 12,365 | 0.81 MB |
| 6 | ROSTI | run3 | 13.6667 | 0.4564 | 12,399 | 0.81 MB |
| 7 | Random Decision Baseline | run1 | 15 | 0.4049 | 0 | 0 MB |
| 8 | Awakened | run2 | 15.3333 | 0.5494 | 560,965,127 | 2140 MB |
| 9 | Ministral-3-3B-Instruct GGUF baseline 0.2.2 random seed 42 | run1 | 15.6667 | 0.5818 | 3,000,000,000 | 2147.023 MB |
| 9 | VerbaNexAI I | run2 | 15.6667 | 0.4842 | 355,000,000 | 1424 MB |
| 10 | INSA Lyon | run2 | 16 | 0.4708 | 278,043,651 | 1061 MB |
| 11 | whereami | run1 | 16.6667 | 0.688 | 5,123,178,979 | 9600 MB |
| 12 | whereami | run2 | 17 | 0.6833 | 5,123,178,979 | 9600 MB |
| 12 | FI-CODE | run1 | 17 | 0.427 | 208,935,168 | 816 MB |
| 12 | VerbaNexAI II | run3 | 17 | 0.5795 | 4,000,000,000 | 2840 MB |
| 13 | UMUTEAM | run1 | 17.3333 | 0.4408 | 270,000,000 | 1030 MB |
| 14 | gipplab | run2 | 17.6667 | 0.6271 | 4,465,470,464 | 9012 MB |
| 15 | VerbaNexAI I | run1 | 18 | 0.4628 | 355,000,000 | 1424 MB |
| 16 | UMUTEAM | run2 | 18.3333 | 0.5856 | 4,000,000,000 | 7600 MB |
| 17 | VerbaNexAI II | run1 | 18.6667 | 0.5004 | 1,500,000,000 | 2340 MB |
| 18 | gipplab | run3 | 19.6667 | 0.5069 | 1,949,101,888 | 3845 MB |
| 18 | FourBytes | run1 | 19.6667 | 0.4061 | 278,054,405 | 1060 MB |
| 19 | Hansel&Gretel | run1 | 20 | 0.5458 | 3,000,000,000 | 6248 MB |
| 20 | Spinfo | run1 | 20.3333 | 0.7479 | 116,830,000,000 | 65238 MB |
| 21 | gipplab | run1 | 20.6667 | 0.6141 | 9,300,029,952 | 18398 MB |
| 21 | Spinfo | run3 | 20.6667 | 0.7289 | 116,830,000,000 | 65238 MB |
| 21 | INSA Lyon | run3 | 20.6667 | 0.4731 | 838,778,678 | 3217 MB |
| 22 | Spinfo | run2 | 21 | 0.689 | 116,830,000,000 | 65238 MB |
| 23 | VerbaNexAI II | run2 | 21.6667 | 0.5187 | 5,900,000,000 | 5980 MB |
| 23 | Hansel&Gretel | run2 | 21.6667 | 0.5788 | 7,000,000,000 | 15300 MB |
| 24 | MILRIT | run2 | 22 | 0.4264 | 466,585,989 | 1866 MB |
| 25 | INSA Lyon | run1 | 22.6667 | 0.639 | 101,927,226,758 | 195716 MB |
| 26 | FI-CODE | run3 | 23 | 0.4645 | 2,274,069,824 | 4442 MB |
| 27 | Awakened | run3 | 23.6667 | 0.6671 | 999,999,999,999 | 999999 MB |
| 28 | Awakened | run1 | 24 | 0.6584 | 999,999,999,999 | 999999 MB |
| 29 | Hansel&Gretel | run3 | 24.3333 | 0.6221 | 120,000,000,000 | 240000 MB |
| 30 | BIU_NLP | run2 | 24.6667 | 0.5781 | 27,000,000,000 | 54000 MB |
| 30 | BIU_NLP | run3 | 24.6667 | 0.539 | 24,000,000,000 | 48000 MB |
| 31 | UMUTEAM | run3 | 26 | 0.4495 | 4,000,000,000 | 7600 MB |
| 32 | BIU_NLP | run1 | 31 | 0.4429 | 26,000,000,000 | 52000 MB |
Dataset-Specific Evaluations
The official report contains the detailed dataset-specific rankings and per-run diagnostics links:
- Accuracy Profile by `impresso` language: German, English, French
- Generalization Profile on `surprise`: French
- Accuracy-Efficiency Profile by `impresso` language: German, English, French
Main table TSV files: Accuracy, Generalization, Accuracy-Efficiency.
Additional Balanced Accuracy-Efficiency Analysis
The balanced Accuracy-Efficiency analysis gives equal total weight to the Accuracy Profile rank and the combined resource ranks. It is provided as an additional analysis alongside the official Accuracy-Efficiency Profile.
- Rendered official evaluation report
- Download complete official results archive, including all per-run scores, diagnostics, diagnostic metrics, and validation files.
- Balanced Accuracy-Efficiency results: Balanced Efficiency Profile Ranking Overall; TSV table.
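One way to read "equal total weight" is sketched below: the accuracy rank receives weight 0.5 and the resource ranks share the remaining 0.5. The sketch assumes two resource ranks (parameter count and model size, the two resource columns shown on this page), dense ranking of ties, and hypothetical function names. The four input rows are copied from the tables above, but the ranks are computed within this four-run subset only, so the resulting scores are illustrative and do not reproduce the official balanced ranking.

```python
def dense_ranks(values, higher_is_better):
    """Rank values from 1 upward; equal values share a rank (dense ranking)."""
    ordered = sorted(set(values), reverse=higher_is_better)
    position = {value: index + 1 for index, value in enumerate(ordered)}
    return [position[value] for value in values]


def balanced_efficiency_scores(runs):
    """Combine ranks so that accuracy and resources each carry half the weight.

    Lower combined score is better. The 0.5 / 0.25 / 0.25 split is an assumption.
    """
    accuracy_rank = dense_ranks([r["accuracy"] for r in runs], higher_is_better=True)
    params_rank = dense_ranks([r["params"] for r in runs], higher_is_better=False)
    size_rank = dense_ranks([r["size_mb"] for r in runs], higher_is_better=False)
    return {
        r["name"]: 0.5 * a + 0.25 * p + 0.25 * s
        for r, a, p, s in zip(runs, accuracy_rank, params_rank, size_rank)
    }


# Accuracy score, parameter count, and model size taken from the tables on this page.
runs = [
    {"name": "Spinfo run1", "accuracy": 0.7479, "params": 116_830_000_000, "size_mb": 65238},
    {"name": "MILRIT run3", "accuracy": 0.5951, "params": 277_730_309, "size_mb": 1111},
    {"name": "DS@GT_HIPE run1", "accuracy": 0.5142, "params": 2_087_375, "size_mb": 87},
    {"name": "FourBytes run1", "accuracy": 0.4061, "params": 278_054_405, "size_mb": 1060},
]
scores = balanced_efficiency_scores(runs)
```

Under a plain mean of the three ranks, accuracy would carry one third of the weight; the split above raises it to one half, which is the difference the balanced analysis is described as making.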
Additional Binary Analysis
The binary analysis maps PROBABLE to TRUE for the `at` relation in both reference and system labels. It is provided as an additional analysis alongside the official ternary evaluation.
- Rendered binary evaluation report
- Download complete binary results archive, including all per-run scores, diagnostics, diagnostic metrics, and validation files.
- Binary dataset-specific evaluations: Accuracy Profile German, English, French; Generalization Profile French; Accuracy-Efficiency Profile German, English, French.
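The label mapping used by the binary analysis can be sketched as follows. The helper name `to_binary` and the toy label lists are assumptions made for illustration; the same mapping is applied to reference and system labels before scoring.

```python
TERNARY_LABELS = {"TRUE", "PROBABLE", "FALSE"}


def to_binary(label):
    """Collapse the ternary label set to binary by treating PROBABLE as TRUE."""
    if label not in TERNARY_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return "TRUE" if label == "PROBABLE" else label


# Toy labels, invented for illustration only.
reference = ["TRUE", "PROBABLE", "FALSE", "PROBABLE"]
system = ["PROBABLE", "TRUE", "FALSE", "FALSE"]

binary_reference = [to_binary(label) for label in reference]
binary_system = [to_binary(label) for label in system]
```

In the toy lists, the first two items count as errors under the ternary labels but become matches after the mapping, so binary scores can differ from the official ternary ones for the same run.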
Reproducibility
All competition data, submissions, and evaluation scripts are available in the HIPE-2026 evaluation repository. The task description and data documentation remain available on the Tasks & Data page.