📖 What is IberBench?

IberBench is a hub of datasets for the languages of the Iberian Peninsula and Ibero-America, intended as a benchmark for evaluating causal language models. The initiative aims to enrich the Natural Language Processing (NLP) community in these regions. The benchmark supports evaluating NLP models on multiple Spanish variants and on other languages such as Catalan, Galician, Basque, and Portuguese, fostering assessments and developments that reflect the linguistic diversity of these regions.

We hope to drive multilingual research that considers the cultural and linguistic richness and complexity of the Spanish-speaking world, encouraging the creation of models that are truly representative of these realities.

📂 What are the data sources?

IberBench contains datasets from prominent workshops in the field, such as IberLEF@SEPLN and PAN@CLEF, as well as established benchmarks such as those from HiTZ and BSC. The aim is to provide standardized and consistent evaluation in this context, enhancing the value of the data and models derived from this effort.

We strictly adhere to all established guidelines and regulations concerning the use and publication of this data. Specifically:

The collected datasets are published in private 🤗HuggingFace repositories, with appropriate credit given to the authors in the model card.
Under no circumstances do we claim ownership of the datasets.
The test splits of the datasets are kept private to avoid leakage from IberBench's side.

In any publication or presentation resulting from work with this data, we recognize the importance of citing and crediting the organizing teams that crafted the datasets used in IberBench.

🙋 How can I join IberBench?

IberBench has a committee of specialists in NLP, language ethics, and gender discrimination, drawn from both academia and industry, which will oversee the development of the project and ensure its quality and relevance.

To be part of this committee, you can ask to join the IberBench organization on 🤗HuggingFace. Your request will be validated by experts who already belong to the organization.

🤝 How can I contribute to IberBench?

First, the initial committee will gather all the datasets from prominent workshops. After that, you can contribute new datasets to the IberBench organization. The process is as follows:

Open a new discussion in the IberBench discussions space, linking to an existing dataset in the 🤗HuggingFace hub and explaining why the inclusion is relevant.
Discuss the dataset with the committee, which will approve or reject it.
If approved, your dataset will be included in the IberBench datasets and used to evaluate LLMs on the IberBench leaderboard.

IberBench will never claim ownership over the dataset; the original author will receive all credit.

💬 Social networks

You can reach us at:

X: https://x.com/IberBench
🤗 Discussions: https://huggingface.co/spaces/iberbench/README/discussions
GitHub: https://github.com/IberBench

🫶 Acknowledgements

IberBench has been funded by the Valencian Institute for Business Competitiveness (IVACE).