HIPE 2026 evaluation results

Official Results

We evaluated submissions under three main profiles: Accuracy (macro recall), Generalization (performance on an unseen test set), and Accuracy-Efficiency, which highlights systems that combine strong performance with a lighter model footprint. The last point matters for sustainable, in-house processing of large historical text collections.

Accuracy Ranking – Top 3 Teams

rank	team	affiliation
1	Spinfo	Universität zu Köln
2	MaxFo-Ajie	Foshan University
3	whereami	Alexandria University

Generalization Ranking – Top 3 Teams

rank	team	affiliation
1	MaxFo-Ajie	Foshan University
2	Spinfo	Universität zu Köln
3	BIU_NLP	Bar-Ilan University

Accuracy-Efficiency Ranking – Top 3 Teams

rank	team	affiliation
1	MILRIT	University of Toulouse & La Rochelle University
2	FI-CODE	University of the Bundeswehr Munich
3	DS@GT_HIPE	Georgia Institute of Technology

Full generated reports: official evaluation report and additional binary evaluation report.

Many congratulations to the top-ranked teams!

We also warmly thank all participating teams for their contributions. In total, 17 teams participated, submitting 45 runs. Beyond the main rankings, several teams achieved strong results in language-specific evaluations or offered useful accuracy-efficiency trade-offs.

Main Results

The official ranking uses the ternary labels TRUE, PROBABLE, and FALSE for the at relation. The tables below replicate the main aggregated profile rankings from the generated official evaluation report. Dataset-specific tables, diagnostics links, score definitions, and additional efficiency tables are linked below rather than repeated on this page.
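
As an illustration of the scoring idea, the sketch below computes macro recall over the three labels. It assumes the textbook definition (the unweighted mean of per-label recall) and uses made-up label lists; the function name and example data are hypothetical, and the authoritative score definitions are the ones in the official evaluation report.

```python
from collections import Counter

# The three labels named on this page.
LABELS = ("TRUE", "PROBABLE", "FALSE")


def macro_recall(reference, predicted, labels=LABELS):
    """Unweighted mean of per-label recall.

    Labels that never occur in the reference are skipped, which is an
    assumption of this sketch rather than a documented rule.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    support = Counter(reference)
    hits = Counter(ref for ref, pred in zip(reference, predicted) if ref == pred)
    recalls = [hits[label] / support[label] for label in labels if support[label] > 0]
    if not recalls:
        raise ValueError("reference contains none of the expected labels")
    return sum(recalls) / len(recalls)


# Hypothetical toy data, not taken from any submission.
reference = ["TRUE", "TRUE", "PROBABLE", "FALSE", "FALSE", "FALSE"]
predicted = ["TRUE", "FALSE", "PROBABLE", "FALSE", "FALSE", "TRUE"]

score = macro_recall(reference, predicted)
print(round(score, 4))
```

Because every label contributes equally regardless of how often it occurs, a system that always predicts a single label would score 1/3 under this definition.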

Download complete official results archive, including all per-run scores, diagnostics, diagnostic metrics, and validation files.

Accuracy Profile Ranking

rank	team	affiliation	run	mean impresso profile score	languages
1	Spinfo	Universität zu Köln	run1	0.7479	de, en, fr
2	Spinfo	Universität zu Köln	run3	0.7289	de, en, fr
3	MaxFo-Ajie	Foshan University	run1	0.7001	de, en, fr
4	Spinfo	Universität zu Köln	run2	0.689	de, en, fr
5	whereami	Alexandria University	run1	0.688	de, en, fr
6	whereami	Alexandria University	run2	0.6833	de, en, fr
7	MaxFo-Ajie	Foshan University	run2	0.669	de, en, fr
8	Awakened	National University of Science and Technology Politehnica Bucharest	run3	0.6671	de, en, fr
9	Awakened	National University of Science and Technology Politehnica Bucharest	run1	0.6584	de, en, fr
10	MaxFo-Ajie	Foshan University	run3	0.6544	de, en, fr
11	INSA Lyon	INSA Lyon - University of Lyon	run1	0.639	de, en, fr
12	gipplab	University of Göttingen	run2	0.6271	de, en, fr
13	Hansel&Gretel	IIT Roorkee	run3	0.6221	de, en, fr
14	gipplab	University of Göttingen	run1	0.6141	de, en, fr
15	MILRIT	University of Toulouse & La Rochelle University	run3	0.5951	de, en, fr
16	UMUTEAM	Universidad de Murcia	run2	0.5856	de, en, fr
17	Ministral-3-3B-Instruct GGUF baseline 0.2.2 random seed 42	HIPE-2026 organizers	run1	0.5818	de, en, fr
18	VerbaNexAI II	Universidad Tecnológica de Bolívar	run3	0.5795	de, en, fr
19	Hansel&Gretel	IIT Roorkee	run2	0.5788	de, en, fr
20	BIU_NLP	Bar-Ilan University	run2	0.5781	de, en, fr
21	MILRIT	University of Toulouse & La Rochelle University	run1	0.5623	de, en, fr
22	Awakened	National University of Science and Technology Politehnica Bucharest	run2	0.5494	de, en, fr
23	Hansel&Gretel	IIT Roorkee	run1	0.5458	de, en, fr
24	BIU_NLP	Bar-Ilan University	run3	0.539	de, en, fr
25	VerbaNexAI II	Universidad Tecnológica de Bolívar	run2	0.5187	de, en, fr
26	DS@GT_HIPE	Georgia Institute of Technology	run1	0.5142	de, en, fr
27	gipplab	University of Göttingen	run3	0.5069	de, en, fr
28	VerbaNexAI II	Universidad Tecnológica de Bolívar	run1	0.5004	de, en, fr
29	VerbaNexAI I	Universidad Tecnológica de Bolívar	run2	0.4842	de, en, fr
30	DS@GT_HIPE	Georgia Institute of Technology	run2	0.4836	de, en, fr
31	DS@GT_HIPE	Georgia Institute of Technology	run3	0.4771	de, en, fr
32	FI-CODE	University of the Bundeswehr Munich	run2	0.4734	de, en, fr
33	INSA Lyon	INSA Lyon - University of Lyon	run3	0.4731	de, en, fr
34	INSA Lyon	INSA Lyon - University of Lyon	run2	0.4708	de, en, fr
35	FI-CODE	University of the Bundeswehr Munich	run3	0.4645	de, en, fr
36	VerbaNexAI I	Universidad Tecnológica de Bolívar	run1	0.4628	de, en, fr
37	ROSTI	Université Lumière Lyon	run3	0.4564	de, en, fr
38	ROSTI	Université Lumière Lyon	run2	0.4507	de, en, fr
39	UMUTEAM	Universidad de Murcia	run3	0.4495	de, en, fr
40	ROSTI	Université Lumière Lyon	run1	0.446	de, en, fr
41	BIU_NLP	Bar-Ilan University	run1	0.4429	de, en, fr
42	UMUTEAM	Universidad de Murcia	run1	0.4408	de, en, fr
43	FI-CODE	University of the Bundeswehr Munich	run1	0.427	de, en, fr
44	MILRIT	University of Toulouse & La Rochelle University	run2	0.4264	de, en, fr
45	FourBytes	Sri Sivasubramaniya Nadar College of Engineering	run1	0.4061	de, en, fr
46	Random Decision Baseline	HIPE-2026 organizers	run1	0.4049	de, en, fr

Only runs with submissions for all impresso languages are included in the overall Accuracy Profile ranking. Teams with partial results are shown in the dataset-specific tables in the full report.

Generalization Profile Ranking

rank	team	affiliation	run	surprise profile score
1	MaxFo-Ajie	Foshan University	run3	0.8163
2	MaxFo-Ajie	Foshan University	run1	0.7945
3	MaxFo-Ajie	Foshan University	run2	0.7712
4	Spinfo	Universität zu Köln	run3	0.6984
5	Spinfo	Universität zu Köln	run1	0.691
6	BIU_NLP	Bar-Ilan University	run1	0.6837
7	Spinfo	Universität zu Köln	run2	0.6674
8	whereami	Alexandria University	run2	0.6665
9	gipplab	University of Göttingen	run2	0.6647
10	Awakened	National University of Science and Technology Politehnica Bucharest	run3	0.6613
11	Hansel&Gretel	IIT Roorkee	run2	0.6349
12	Awakened	National University of Science and Technology Politehnica Bucharest	run1	0.6338
13	whereami	Alexandria University	run1	0.6325
14	Hansel&Gretel	IIT Roorkee	run3	0.6187
15	Hansel&Gretel	IIT Roorkee	run1	0.6107
16	gipplab	University of Göttingen	run1	0.6085
17	BIU_NLP	Bar-Ilan University	run3	0.6
18	VerbaNexAI II	Universidad Tecnológica de Bolívar	run3	0.5724
19	UMUTEAM	Universidad de Murcia	run2	0.5723
20	Awakened	National University of Science and Technology Politehnica Bucharest	run2	0.5509
21	gipplab	University of Göttingen	run3	0.5382
22	BIU_NLP	Bar-Ilan University	run2	0.5265
23	MILRIT	University of Toulouse & La Rochelle University	run3	0.5152
24	VerbaNexAI II	Universidad Tecnológica de Bolívar	run2	0.5076
25	Ministral-3-3B-Instruct GGUF baseline 0.2.2 random seed 42	HIPE-2026 organizers	run1	0.5062
26	INSA Lyon	INSA Lyon - University of Lyon	run1	0.4705
27	MILRIT	University of Toulouse & La Rochelle University	run1	0.4679
28	VerbaNexAI II	Universidad Tecnológica de Bolívar	run1	0.4419
29	INSA Lyon	INSA Lyon - University of Lyon	run2	0.4231
30	INSA Lyon	INSA Lyon - University of Lyon	run3	0.3986
31	DS@GT_HIPE	Georgia Institute of Technology	run3	0.3919
32	ROSTI	Université Lumière Lyon	run3	0.384
33	ROSTI	Université Lumière Lyon	run1	0.3773
34	FI-CODE	University of the Bundeswehr Munich	run2	0.3755
35	MILRIT	University of Toulouse & La Rochelle University	run2	0.3742
36	VerbaNexAI I	Universidad Tecnológica de Bolívar	run2	0.3726
37	DS@GT_HIPE	Georgia Institute of Technology	run2	0.3721
38	ROSTI	Université Lumière Lyon	run2	0.366
39	Random Decision Baseline	HIPE-2026 organizers	run1	0.3628
40	DS@GT_HIPE	Georgia Institute of Technology	run1	0.3626
41	UMUTEAM	Universidad de Murcia	run3	0.362
42	FI-CODE	University of the Bundeswehr Munich	run1	0.358
43	FI-CODE	University of the Bundeswehr Munich	run3	0.3546
44	FourBytes	Sri Sivasubramaniya Nadar College of Engineering	run1	0.3445
45	VerbaNexAI I	Universidad Tecnológica de Bolívar	run1	0.3346
46	UMUTEAM	Universidad de Murcia	run1	0.3333

Accuracy-Efficiency Profile Ranking

rank	team	run	mean efficiency profile rank	accuracy score	parameter count	model size
1	MILRIT	run3	9.6667	0.5951	277,730,309	1111 MB
2	FI-CODE	run2	10.3333	0.4734	0	0 MB
3	DS@GT_HIPE	run1	10.6667	0.5142	2,087,375	87 MB
4	DS@GT_HIPE	run2	12	0.4836	2,087,375	87 MB
5	DS@GT_HIPE	run3	12.3333	0.4771	2,087,375	87 MB
6	MILRIT	run1	13.6667	0.5623	466,577,920	1780 MB
6	ROSTI	run1	13.6667	0.446	12,279	0.8 MB
6	ROSTI	run2	13.6667	0.4507	12,365	0.81 MB
6	ROSTI	run3	13.6667	0.4564	12,399	0.81 MB
7	Random Decision Baseline	run1	15	0.4049	0	0 MB
8	Awakened	run2	15.3333	0.5494	560,965,127	2140 MB
9	Ministral-3-3B-Instruct GGUF baseline 0.2.2 random seed 42	run1	15.6667	0.5818	3,000,000,000	2147.023 MB
9	VerbaNexAI I	run2	15.6667	0.4842	355,000,000	1424 MB
10	INSA Lyon	run2	16	0.4708	278,043,651	1061 MB
11	whereami	run1	16.6667	0.688	5,123,178,979	9600 MB
12	whereami	run2	17	0.6833	5,123,178,979	9600 MB
12	FI-CODE	run1	17	0.427	208,935,168	816 MB
12	VerbaNexAI II	run3	17	0.5795	4,000,000,000	2840 MB
13	UMUTEAM	run1	17.3333	0.4408	270,000,000	1030 MB
14	gipplab	run2	17.6667	0.6271	4,465,470,464	9012 MB
15	VerbaNexAI I	run1	18	0.4628	355,000,000	1424 MB
16	UMUTEAM	run2	18.3333	0.5856	4,000,000,000	7600 MB
17	VerbaNexAI II	run1	18.6667	0.5004	1,500,000,000	2340 MB
18	gipplab	run3	19.6667	0.5069	1,949,101,888	3845 MB
18	FourBytes	run1	19.6667	0.4061	278,054,405	1060 MB
19	Hansel&Gretel	run1	20	0.5458	3,000,000,000	6248 MB
20	Spinfo	run1	20.3333	0.7479	116,830,000,000	65238 MB
21	gipplab	run1	20.6667	0.6141	9,300,029,952	18398 MB
21	Spinfo	run3	20.6667	0.7289	116,830,000,000	65238 MB
21	INSA Lyon	run3	20.6667	0.4731	838,778,678	3217 MB
22	Spinfo	run2	21	0.689	116,830,000,000	65238 MB
23	VerbaNexAI II	run2	21.6667	0.5187	5,900,000,000	5980 MB
23	Hansel&Gretel	run2	21.6667	0.5788	7,000,000,000	15300 MB
24	MILRIT	run2	22	0.4264	466,585,989	1866 MB
25	INSA Lyon	run1	22.6667	0.639	101,927,226,758	195716 MB
26	FI-CODE	run3	23	0.4645	2,274,069,824	4442 MB
27	Awakened	run3	23.6667	0.6671	999,999,999,999	999999 MB
28	Awakened	run1	24	0.6584	999,999,999,999	999999 MB
29	Hansel&Gretel	run3	24.3333	0.6221	120,000,000,000	240000 MB
30	BIU_NLP	run2	24.6667	0.5781	27,000,000,000	54000 MB
30	BIU_NLP	run3	24.6667	0.539	24,000,000,000	48000 MB
31	UMUTEAM	run3	26	0.4495	4,000,000,000	7600 MB
32	BIU_NLP	run1	31	0.4429	26,000,000,000	52000 MB
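
In the table above, runs with the same mean efficiency profile rank share a rank and the next distinct value receives the next integer (for example 6, 6, 6, 6, then 7). A minimal sketch of that tie handling, often called dense ranking, is shown below. The function name is hypothetical and this is not the organizers' code; the input values are the first ten mean ranks from the table.

```python
def dense_ranks(values):
    """Assign dense ranks to values where lower is better.

    Equal values share a rank, and ranks increase by one for each
    new distinct value.
    """
    ranked = []
    rank = 0
    previous = None
    for value in sorted(values):
        if value != previous:
            rank += 1
            previous = value
        ranked.append((value, rank))
    return ranked


# First ten mean efficiency profile ranks from the table above.
mean_ranks = [9.6667, 10.3333, 10.6667, 12, 12.3333,
              13.6667, 13.6667, 13.6667, 13.6667, 15]

ranked = dense_ranks(mean_ranks)
for value, rank in ranked:
    print(rank, value)
```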

Dataset-Specific Evaluations

The official report contains the detailed dataset-specific rankings and per-run diagnostics links:

Accuracy Profile by impresso language: German, English, French
Generalization Profile on surprise: French
Accuracy-Efficiency Profile by impresso language: German, English, French

Main table TSV files: Accuracy, Generalization, Accuracy-Efficiency.

Additional Balanced Accuracy-Efficiency Analysis

The balanced Accuracy-Efficiency analysis gives equal total weight to the Accuracy Profile rank and the combined resource ranks. It is provided as an additional analysis alongside the official Accuracy-Efficiency Profile.
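
One possible reading of "equal total weight" is sketched below. It is an assumption, not the organizers' script: it supposes that the combined resource ranks are averaged first and that the result is then averaged with the Accuracy Profile rank, so each side contributes one half. A plain mean over all ranks is included only for contrast. All function names and input ranks are hypothetical.

```python
def balanced_rank(accuracy_rank, resource_ranks):
    """Half the weight on accuracy, half shared among the resource ranks."""
    if not resource_ranks:
        raise ValueError("at least one resource rank is required")
    resource_mean = sum(resource_ranks) / len(resource_ranks)
    return 0.5 * accuracy_rank + 0.5 * resource_mean


def plain_mean_rank(accuracy_rank, resource_ranks):
    """Unweighted mean over all ranks, shown only for contrast."""
    ranks = [accuracy_rank, *resource_ranks]
    return sum(ranks) / len(ranks)


# Hypothetical ranks: 2nd on accuracy, 10th and 20th on two resource measures.
accuracy_rank = 2
resource_ranks = [10, 20]

balanced = balanced_rank(accuracy_rank, resource_ranks)
plain = plain_mean_rank(accuracy_rank, resource_ranks)
print(balanced, round(plain, 4))
```

With two resource ranks, the plain mean gives accuracy one third of the weight, while the balanced variant raises it to one half, so accurate but heavy systems move up under the balanced reading.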

Rendered official evaluation report
Download complete official results archive, including all per-run scores, diagnostics, diagnostic metrics, and validation files.
Balanced Accuracy-Efficiency results: Balanced Efficiency Profile Ranking Overall; TSV table.

Additional Binary Analysis

The binary analysis maps PROBABLE to TRUE for the at relation in both reference and system labels. It is provided as an additional analysis alongside the official ternary evaluation.
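
The mapping described above can be sketched as follows. The helper name and the example labels are hypothetical; the point is only that the same collapse is applied to reference and system labels before scoring.

```python
def to_binary(label):
    """Collapse the ternary label set by mapping PROBABLE to TRUE."""
    return "TRUE" if label == "PROBABLE" else label


# Hypothetical ternary labels for the at relation.
reference = ["TRUE", "PROBABLE", "FALSE", "PROBABLE"]
system = ["PROBABLE", "TRUE", "FALSE", "FALSE"]

# Apply the same mapping to both sides.
binary_reference = [to_binary(label) for label in reference]
binary_system = [to_binary(label) for label in system]

print(binary_reference)
print(binary_system)
```

In this toy example the first two items disagree under the ternary labels but agree after the mapping, which is why binary scores can differ from the official ternary ones.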

Rendered binary evaluation report
Download complete binary results archive, including all per-run scores, diagnostics, diagnostic metrics, and validation files.
Binary dataset-specific evaluations: Accuracy Profile German, English, French; Generalization Profile French; Accuracy-Efficiency Profile German, English, French.

Reproducibility

All competition data, submissions, and evaluation scripts are available in the HIPE-2026 evaluation repository. The task description and data documentation remain available on the Tasks & Data page.
