INF

Scientific services and data management

INF supports the CRC projects through all research stages, including the planning of empirical work, its statistical analysis, and the management, curation, and archiving of data under CC0 or CC BY licenses to ensure reproducibility and long-term sustainability. To achieve this, INF (1) optimizes the research workflow by formulating a structured data management plan; (2) provides statistical training through workshops and individual consultations; (3) assists in maintaining data repositories for short- and long-term storage; and (4) develops novel computational tools and annotation schemes, including virtual-reality-based stimulus design and multimodal tracking methods.

Services provided by INF

Consulting

Data management services
Assistance with experimental design, statistical methods, and open science practices
Development of novel computational tools and annotation schemes
Evaluation and benchmarking of NLP and multimodal AI systems
Corpus design, data annotation, and annotation guidelines
Development of reproducible NLP and AI pipelines
Integration of AI components into existing research or production infrastructures
Model selection, prompt engineering, and RAG-based workflows
Data protection, transparency, and responsible AI use
Scientific validation of computational methods and outputs
Training and workshops on NLP, LLMs, and multimodal AI

Infrastructure

GitLab instance
Open WebUI instance
JupyterHub
Nextcloud instance

More about our project

You can find more information on our research activities on the following pages:
Text Technology Lab
Romance Lab

How to contact us

Please send all requests to our central email address so that we can keep communication consistent and organized. We look forward to hearing from you.

Publications

2026

Hammerla, Leon; Mehler, Alexander

Gutenberg+: A More Temporally Faithful Corpus for Diachronic NLP (Proceedings Article)

In: Proceedings Workshop on Structured Linguistic Data and Evaluation (SLiDE 2026), co-located with the Language Resources and Evaluation Conference (LREC 2026), Palma de Mallorca (Spain), 2026, (accepted).


Abusaleh, Ali; Hammerla, Leon; Mehler, Alexander

Learning to Detect Cross-Modal Negation: An Analysis of Latent Representations and an Attention-Based Solution (Proceedings Article)

In: 2026 8th International Conference on Natural Language Processing (ICNLP), Xi'an, China, 2026, (accepted).


Borkowski, Cedric; Abrami, Giuseppe; Terefe, Dawit; Baumartz, Daniel; Mehler, Alexander

DUUIgateway: A Web Service for Platform-independent, Ubiquitous Big Data NLP (Journal Article)

In: SoftwareX, vol. 34, pp. 102549, 2026, ISSN: 2352-7110.


Lücking, Andy; Hammerla, Leon; Mehler, Alexander

Not every quantifier can be negated (Proceedings Article, forthcoming)

In: Proceedings of Sinn und Bedeutung, Special Session "Philosophical and Linguistic Approaches to Negation (PhilLingNeg)", Frankfurt am Main, forthcoming (accepted).


Hammerla, Leon; Mehler, Alexander

Negation in Reasoning Traces: Interpretable Signals of Correctness and Provenance (Proceedings Article)

In: Proceedings of the 6th Workshop on Natural Logic Meets Machine Learning (NALOMA), Prague (Czech Republic), 2026, (accepted).


2025

Hammerla, Leon; Lücking, Andy; Reinert, Carolin; Mehler, Alexander

D-Neg: Syntax-Aware Graph Reasoning for Negation Detection (Proceedings Article)

In: Inui, Kentaro; Sakti, Sakriani; Wang, Haofen; Wong, Derek F.; Bhattacharyya, Pushpak; Banerjee, Biplab; Ekbal, Asif; Chakraborty, Tanmoy; Singh, Dhirendra Pratap (Ed.): Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pp. 1432–1454, The Asian Federation of Natural Language Processing and The Association for Computational Linguistics, Mumbai, India, 2025, ISBN: 979-8-89176-303-6.


Hammerla, Leon; Mehler, Alexander; Abrami, Giuseppe

Standardizing Heterogeneous Corpora with DUUR: A Dual Data- and Process-Oriented Approach to Enhancing NLP Pipeline Integration (Proceedings Article)

In: Inui, Kentaro; Sakti, Sakriani; Wang, Haofen; Wong, Derek F.; Bhattacharyya, Pushpak; Banerjee, Biplab; Ekbal, Asif; Chakraborty, Tanmoy; Singh, Dhirendra Pratap (Ed.): Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pp. 1410–1425, The Asian Federation of Natural Language Processing and The Association for Computational Linguistics, Mumbai, India, 2025, ISBN: 979-8-89176-303-6.


@inproceedings{Hammerla:et:al:2025a,
  title = {Standardizing Heterogeneous Corpora with DUUR: A Dual Data- and Process-Oriented Approach to Enhancing NLP Pipeline Integration},
  author = {Leon Hammerla and Alexander Mehler and Giuseppe Abrami},
  editor = {Kentaro Inui and Sakriani Sakti and Haofen Wang and Derek F. Wong and Pushpak Bhattacharyya and Biplab Banerjee and Asif Ekbal and Tanmoy Chakraborty and Dhirendra Pratap Singh},
  url = {https://aclanthology.org/2025.findings-ijcnlp.87/},
  isbn = {979-8-89176-303-6},
  year = {2025},
  date = {2025-12-01},
  booktitle = {Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics},
  pages = {1410--1425},
  publisher = {The Asian Federation of Natural Language Processing and The Association for Computational Linguistics},
  address = {Mumbai, India},
  abstract = {Despite their success, LLMs are too computationally expensive
    to replace task- or domain-specific NLP systems. However, the
    variety of corpus formats makes reusing these systems difficult.
    This underscores the importance of maintaining an interoperable
    NLP landscape. We address this challenge by pursuing two objectives:
    standardizing corpus formats and enabling massively parallel corpus
    processing. We present a unified conversion framework embedded
    in a massively parallel, microservice-based, programming language-independent
    NLP architecture designed for modularity and extensibility. It
    allows for the integration of external NLP conversion tools and
    supports the addition of new components that meet basic compatibility
    requirements. To evaluate our dual data- and process-oriented
    approach to standardization, we (1) benchmark its efficiency in
    terms of processing speed and memory usage, (2) demonstrate the
    benefits of standardized corpus formats for NLP downstream tasks,
    and (3) illustrate the advantages of incorporating custom formats
    into a corpus format ecosystem.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}

Bundan, Daniel; Abrami, Giuseppe; Mehler, Alexander

Multimodal Docker Unified UIMA Interface: New Horizons for Distributed Microservice-Oriented Processing of Corpora using UIMA (Proceedings Article)

In: Wartena, Christian; Heid, Ulrich (Ed.): Proceedings of the 21st Conference on Natural Language Processing (KONVENS 2025): Long and Short Papers, pp. 257–268, HsH Applied Academics, Hildesheim, Germany, 2025.


Lücking, Andy; Ginzburg, Jonathan

Exceptions From Rules and Noteworthy Exceptions (Journal Article)

In: Linguistics and Philosophy, vol. 48, pp. 371–409, 2025.


Abrami, Giuseppe; Genios, Markos; Fitzermann, Filip; Baumartz, Daniel; Mehler, Alexander

Docker Unified UIMA Interface: New perspectives for NLP on big data (Journal Article)

In: SoftwareX, vol. 29, pp. 102033, 2025, ISSN: 2352-7110.


Lücking, Andy

Referential Transparency Theory (Book Section)

In: Schierholz, Stefan J.; Giacomini, Laura (Ed.): Wörterbücher zur Sprach- und Kommunikationswissenschaft (WSK) Online, De Gruyter, Berlin and Boston, 2025.
