Word alignment plays a crucial role in several NLP tasks, such as lexicon injection and cross-lingual label projection. The evaluation of word alignment systems relies heavily on manually curated datasets, which are not always available, especially for mid- and low-resource languages. To address this limitation, we propose XL-WA, a novel, entirely manually curated evaluation benchmark for word alignment covering 14 language pairs. We illustrate the creation process of our benchmark and compare statistical and neural approaches to word alignment in both language-specific and zero-shot settings, thus investigating the ability of state-of-the-art models to generalize to unseen language pairs. We release our new benchmark at: https://github.com/SapienzaNLP/XL-WA.
Publication details
2023, Proceedings of the Ninth Italian Conference on Computational Linguistics
XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs (04b Conference paper in volume)
Martelli Federico, Bejgu Andrei Stefan, Campagnano Cesare, Čibej Jaka, Costa Rute, Gantar Apolonija, Kallas Jelena, Koeva Svetla, Koppel Kristina, Krek Simon, Langemets Margit, Lipp Veronika, Nimb Sanni, Olsen Sussi, Sandford Pedersen Bolette, Quochi Valeria, Salgado Ana, Simon László, Tiberius Carole, Ureña-Ruiz Rafael-J, Navigli Roberto
Research group: Natural Language Processing
keywords