Scientific services and data management
INF supports the CRC projects through all stages of research, from the planning of empirical work and its statistical analysis to the management, curation, and archiving of data under CC0 or CC BY licenses, ensuring reproducibility and long-term sustainability. To this end, INF (1) streamlines the research workflow with a structured data management plan; (2) provides statistical training through workshops and individual consultations; (3) helps maintain data repositories for short- and long-term storage; and (4) develops novel computational tools and annotation schemes, including virtual reality-based stimulus design and multimodal tracking methods.
Services provided by INF
Consulting
- Data management services
- Assistance with experiment design, statistical methods and open science practices
- Development of novel computational tools and annotation schemes
- Evaluation and benchmarking of NLP and multimodal AI systems
- Corpus design, data annotation, and annotation guidelines
- Development of reproducible NLP and AI pipelines
- Integration of AI components into existing research or production infrastructures
- Model selection, prompt engineering, and RAG-based workflows
- Data protection, transparency, and responsible AI use
- Scientific validation of computational methods and outputs
- Training and workshops on NLP, LLMs, and multimodal AI
Infrastructure
- GitLab instance
- Open WebUI instance
- JupyterHub
- Nextcloud instance
More about our project
You can find more information on our research activities on the following pages:
Text Technology Lab
Romance Lab
How to contact us
Please use our central email address to maintain regular and organized communication with us. We look forward to receiving your requests.
Publications
2026
Hammerla, Leon; Mehler, Alexander
Gutenberg+: A More Temporally Faithful Corpus for Diachronic NLP Proceedings Article
In: Proceedings Workshop on Structured Linguistic Data and Evaluation (SLiDE 2026), co-located with the Language Resources and Evaluation Conference (LREC 2026), Palma de Mallorca (Spain), 2026, (accepted).
@inproceedings{Hammerla:Mehler:2026:a,
title = {Gutenberg+: A More Temporally Faithful Corpus for Diachronic NLP},
author = {Leon Hammerla and Alexander Mehler},
year = {2026},
date = {2026-01-01},
booktitle = {Proceedings Workshop on Structured Linguistic Data and Evaluation
(SLiDE 2026), co-located with the Language Resources and Evaluation
Conference (LREC 2026)},
address = {Palma de Mallorca (Spain)},
note = {accepted},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Abusaleh, Ali; Hammerla, Leon; Mehler, Alexander
Learning to Detect Cross-Modal Negation: An Analysis of Latent Representations and an Attention-Based Solution Proceedings Article
In: 2026 8th International Conference on Natural Language Processing (ICNLP), Xi'an, China, 2026, (accepted).
@inproceedings{Abusaleh:et:al:2026,
title = {Learning to Detect Cross-Modal Negation: An Analysis of Latent
Representations and an Attention-Based Solution},
author = {Ali Abusaleh and Leon Hammerla and Alexander Mehler},
year = {2026},
date = {2026-01-01},
booktitle = {2026 8th International Conference on Natural Language Processing (ICNLP)},
address = {Xi'an, China},
abstract = {Detecting high-level semantic concepts like negation across modalities
remains a challenge for current multimodal systems. We analyze
this as a fundamental representation learning problem, providing
the first evidence that negation does not form a linearly or non-linearly
separable class in the latent spaces of standard vision-language
models (VLMs). We demonstrate that pretrained embeddings primarily
encode modality-specific features, lacking a generalizable negation
signal. To overcome this, we propose a novel cross-modal attention
architecture that explicitly models inter-modal dependencies,
achieving performance gains of up to +7.03% F1 over unimodal baselines.
Our analysis reveals a key asymmetry: while textual negation often
appears independently, visual negation is semantically dependent
on linguistic context, a finding validated through our statistical
analysis of 3,222 political video-text pairs automatically annotated
via Qwen2.5-VL. By combining this analysis with self-supervised
video representations (JEPA2), we advance the modeling of temporal
negation. This work provides new methods and insights for learning
robust, semantically-aligned representations in multimodal systems.},
note = {accepted},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
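The separability question raised in this abstract can be made concrete with a perceptron probe: if the classic perceptron completes an epoch without a single mistake, the labelled vectors are linearly separable. The sketch below is a hypothetical illustration, not the authors' analysis; the toy vectors stand in for VLM embeddings, the names `perceptron_separates` and `max_epochs` are invented, and it covers only the linear half of the claim, not non-linear separability.

```python
"""Perceptron probe for linear separability of a binary label
(e.g. negated vs. affirmative) in a set of embedding vectors.

Hypothetical sketch: the toy vectors below are invented stand-ins
for real vision-language model embeddings.
"""
from typing import List, Sequence, Tuple


def perceptron_separates(
    xs: Sequence[Sequence[float]],
    ys: Sequence[int],
    max_epochs: int = 1000,
) -> Tuple[bool, List[float], float]:
    """Run the classic perceptron on labels in {-1, +1}.

    Returns (converged, weights, bias). An epoch with zero mistakes proves
    the data is linearly separable; failing to converge within max_epochs
    is only evidence against separability, not a proof.
    """
    dim = len(xs[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or exactly on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return True, w, b
    return False, w, b


if __name__ == "__main__":
    # Toy "embeddings": +1 = negated, -1 = affirmative (invented data).
    separable_x = [[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]]
    separable_y = [1, 1, -1, -1]
    # XOR layout: the textbook set that no hyperplane can separate.
    xor_x = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
    xor_y = [-1, -1, 1, 1]
    print(perceptron_separates(separable_x, separable_y)[0])  # True
    print(perceptron_separates(xor_x, xor_y)[0])              # False
```

Because non-convergence is not conclusive, an exact test would instead check feasibility of a linear program; non-linear separability would require probing with a non-linear classifier.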
Borkowski, Cedric; Abrami, Giuseppe; Terefe, Dawit; Baumartz, Daniel; Mehler, Alexander
DUUIgateway: A Web Service for Platform-independent, Ubiquitous Big Data NLP Journal Article
In: SoftwareX, vol. 34, pp. 102549, 2026, ISSN: 2352-7110.
@article{Borkowski:et:al:2026,
title = {DUUIgateway: A Web Service for Platform-independent, Ubiquitous Big Data NLP},
author = {Cedric Borkowski and Giuseppe Abrami and Dawit Terefe and Daniel Baumartz and Alexander Mehler},
url = {https://www.sciencedirect.com/science/article/pii/S2352711026000439},
doi = {10.1016/j.softx.2026.102549},
issn = {2352-7110},
year = {2026},
date = {2026-01-01},
journal = {SoftwareX},
volume = {34},
pages = {102549},
abstract = {Distributed processing of unstructured text data is a challenge
in the rapidly changing and evolving natural language processing
(NLP) landscape. This landscape is characterized by heterogeneous
systems, models, and formats, and especially by the increasing
influence of AI systems. While many of these systems handle text
data, there are also unified systems that process multiple input
and output formats, while allowing for distributed corpus processing.
However, there are hardly any user-friendly interfaces that allow
existing NLP frameworks to be used flexibly and extended in a
user-controlled manner. Due to this gap and the increasing importance
of NLP for various scientific disciplines, there has been a demand
for a web and API based flexible software solution for deploying,
managing and monitoring NLP systems. Such a solution is provided
by Docker Unified UIMA-gateway. We introduce DUUIgateway and evaluate
its API and user-driven approach to encapsulation. We also describe
how these features improve the usability and accessibility of
the NLP framework DUUI. We illustrate DUUIgateway in the field
of process modeling in higher education and show how it closes
the latter gap in NLP by making a variety of systems for processing
text and multimodal data accessible to non-experts.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Lücking, Andy; Hammerla, Leon; Mehler, Alexander
Not every quantifier can be negated Proceedings Article Forthcoming
In: Proceedings of Sinn und Bedeutung, Special Session "Philosophical and Linguistic Approaches to Negation (PhilLingNeg)", Frankfurt am Main, Forthcoming, (accepted).
@inproceedings{Luecking:Hammerla:Mehler:2026,
title = {Not every quantifier can be negated},
author = {Andy Lücking and Leon Hammerla and Alexander Mehler},
year = {2026},
date = {2026-01-01},
booktitle = {Proceedings of \textit{Sinn und Bedeutung}, Special Session ``Philosophical
and Linguistic Approaches to Negation (PhilLingNeg)''},
address = {Frankfurt am Main},
series = {SuB'30},
note = {accepted},
keywords = {},
pubstate = {forthcoming},
tppubtype = {inproceedings}
}
Hammerla, Leon; Mehler, Alexander
Negation in Reasoning Traces: Interpretable Signals of Correctness and Provenance Proceedings Article
In: Proceedings of the 6th Workshop on Natural Logic Meets Machine Learning (NALOMA), Prague (Czech Republic), 2026, (accepted).
@inproceedings{Hammerla:Mehler:2026:b,
title = {Negation in Reasoning Traces: Interpretable Signals of Correctness
and Provenance},
author = {Leon Hammerla and Alexander Mehler},
year = {2026},
date = {2026-01-01},
booktitle = {Proceedings of the 6th Workshop on Natural Logic Meets Machine Learning (NALOMA)},
address = {Prague (Czech Republic)},
note = {accepted},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2025
Hammerla, Leon; Lücking, Andy; Reinert, Carolin; Mehler, Alexander
D-Neg: Syntax-Aware Graph Reasoning for Negation Detection Proceedings Article
In: Inui, Kentaro; Sakti, Sakriani; Wang, Haofen; Wong, Derek F.; Bhattacharyya, Pushpak; Banerjee, Biplab; Ekbal, Asif; Chakraborty, Tanmoy; Singh, Dhirendra Pratap (Ed.): Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pp. 1432–1454, The Asian Federation of Natural Language Processing and The Association for Computational Linguistics, Mumbai, India, 2025, ISBN: 979-8-89176-303-6.
@inproceedings{Hammerla:et:al:2025b,
title = {D-Neg: Syntax-Aware Graph Reasoning for Negation Detection},
author = {Leon Hammerla and Andy Lücking and Carolin Reinert and Alexander Mehler},
editor = {Kentaro Inui and Sakriani Sakti and Haofen Wang and Derek F. Wong and Pushpak Bhattacharyya and Biplab Banerjee and Asif Ekbal and Tanmoy Chakraborty and Dhirendra Pratap Singh},
url = {https://aclanthology.org/2025.findings-ijcnlp.89/},
isbn = {979-8-89176-303-6},
year = {2025},
date = {2025-12-01},
booktitle = {Proceedings of the 14th International Joint Conference on Natural
Language Processing and the 4th Conference of the Asia-Pacific
Chapter of the Association for Computational Linguistics},
pages = {1432--1454},
publisher = {The Asian Federation of Natural Language Processing and The Association for Computational Linguistics},
address = {Mumbai, India},
abstract = {Despite the communicative importance of negation, its detection
remains challenging. Previous approaches perform poorly in out-of-domain
scenarios, and progress outside of English has been slow due to
a lack of resources and robust models. To address this gap, we
present D-Neg: a syntax-aware graph reasoning model based on a
transformer that incorporates syntactic embeddings by attention-gating.
D-Neg uses graph attention to represent syntactic structures,
emulating the effectiveness of rule-based dependency approaches
for negation detection. We train D-Neg using 7 English resources
and their translations into 10 languages, all aligned at the annotation
level. We conduct an evaluation of all these datasets in in-domain
and out-of-domain settings. Our work represents a significant
advance in negation detection, enabling more effective cross-lingual
research.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
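The abstract states that D-Neg incorporates syntactic embeddings "by attention-gating". A common form of gated fusion, used here purely as an assumption since the paper's exact formulation is not given in this listing, computes a per-dimension gate from both inputs and interpolates between the contextual and the syntactic vector. The function `gated_fusion`, its weight layout, and the example values are hypothetical.

```python
"""Gated fusion of a contextual token embedding with a syntactic embedding.

Hypothetical sketch: the gate form g_i = sigmoid(w_i . [x; s] + b_i) and the
weights below are illustrative assumptions, not D-Neg's published design.
"""
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(
    x: Sequence[float],                 # contextual embedding (e.g. transformer output)
    s: Sequence[float],                 # syntactic embedding (e.g. from a graph encoder)
    gate_w: Sequence[Sequence[float]],  # one row of len(x) + len(s) weights per dimension
    gate_b: Sequence[float],            # one bias per dimension
) -> List[float]:
    """Per dimension i: h_i = g_i * x_i + (1 - g_i) * s_i."""
    concat = list(x) + list(s)  # the concatenation [x; s] feeds every gate
    fused = []
    for i, (xi, si) in enumerate(zip(x, s)):
        g = sigmoid(sum(w * v for w, v in zip(gate_w[i], concat)) + gate_b[i])
        fused.append(g * xi + (1.0 - g) * si)
    return fused


if __name__ == "__main__":
    # With all-zero gate parameters every gate is 0.5, so the result is the mean.
    print(gated_fusion([1.0, 0.0], [0.0, 1.0], [[0.0] * 4, [0.0] * 4], [0.0, 0.0]))
    # prints [0.5, 0.5]
```

In a trained model the gate weights would be learned, letting each dimension decide how much syntactic information to admit; a large positive bias pushes the output toward the contextual embedding, a large negative one toward the syntactic embedding.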
Hammerla, Leon; Mehler, Alexander; Abrami, Giuseppe
Standardizing Heterogeneous Corpora with DUUR: A Dual Data- and Process-Oriented Approach to Enhancing NLP Pipeline Integration Proceedings Article
In: Inui, Kentaro; Sakti, Sakriani; Wang, Haofen; Wong, Derek F.; Bhattacharyya, Pushpak; Banerjee, Biplab; Ekbal, Asif; Chakraborty, Tanmoy; Singh, Dhirendra Pratap (Ed.): Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pp. 1410–1425, The Asian Federation of Natural Language Processing and The Association for Computational Linguistics, Mumbai, India, 2025, ISBN: 979-8-89176-303-6.
@inproceedings{Hammerla:et:al:2025a,
title = {Standardizing Heterogeneous Corpora with DUUR: A Dual Data-
and Process-Oriented Approach to Enhancing NLP Pipeline Integration},
author = {Leon Hammerla and Alexander Mehler and Giuseppe Abrami},
editor = {Kentaro Inui and Sakriani Sakti and Haofen Wang and Derek F. Wong and Pushpak Bhattacharyya and Biplab Banerjee and Asif Ekbal and Tanmoy Chakraborty and Dhirendra Pratap Singh},
url = {https://aclanthology.org/2025.findings-ijcnlp.87/},
isbn = {979-8-89176-303-6},
year = {2025},
date = {2025-12-01},
booktitle = {Proceedings of the 14th International Joint Conference on Natural
Language Processing and the 4th Conference of the Asia-Pacific
Chapter of the Association for Computational Linguistics},
pages = {1410--1425},
publisher = {The Asian Federation of Natural Language Processing and The Association for Computational Linguistics},
address = {Mumbai, India},
abstract = {Despite their success, LLMs are too computationally expensive
to replace task- or domain-specific NLP systems. However, the
variety of corpus formats makes reusing these systems difficult.
This underscores the importance of maintaining an interoperable
NLP landscape. We address this challenge by pursuing two objectives:
standardizing corpus formats and enabling massively parallel corpus
processing. We present a unified conversion framework embedded
in a massively parallel, microservice-based, programming language-independent
NLP architecture designed for modularity and extensibility. It
allows for the integration of external NLP conversion tools and
supports the addition of new components that meet basic compatibility
requirements. To evaluate our dual data- and process-oriented
approach to standardization, we (1) benchmark its efficiency in
terms of processing speed and memory usage, (2) demonstrate the
benefits of standardized corpus formats for NLP downstream tasks,
and (3) illustrate the advantages of incorporating custom formats
into a corpus format ecosystem.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
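The core idea of format standardization described in this abstract, mapping heterogeneous corpus formats onto one shared representation, can be illustrated with a tiny converter from a token-per-line format to standoff annotations: one document text plus character-offset annotations, the kind of representation UIMA builds on. This is a hypothetical sketch; the two-column input layout, the `Annotation` class, and `conll_to_standoff` are illustrative assumptions, not DUUR's converters or the UIMA API.

```python
"""Convert token-per-line input (CoNLL-like) into standoff annotations.

Hypothetical sketch: the 'TOKEN<TAB>TAG' layout and the Annotation type are
illustrative assumptions, not DUUR's converters or the UIMA CAS API.
"""
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Annotation:
    begin: int   # inclusive character offset into the document text
    end: int     # exclusive character offset
    layer: str   # annotation layer, e.g. "Token" or "Cue"
    value: str   # token string or tag label


def conll_to_standoff(lines: List[str]) -> Tuple[str, List[Annotation]]:
    """Each non-empty line is exactly 'TOKEN<TAB>TAG'; a blank line ends a
    sentence. Tokens are joined by spaces, sentences by newlines."""
    text = ""
    annotations: List[Annotation] = []
    new_sentence = True
    for line in lines:
        if not line.strip():
            new_sentence = True  # next token opens a new sentence
            continue
        token, tag = line.rstrip("\n").split("\t")
        if text:
            text += "\n" if new_sentence else " "
        begin = len(text)
        text += token
        end = len(text)
        annotations.append(Annotation(begin, end, "Token", token))
        if tag != "O":  # "O" marks tokens outside any tagged span
            annotations.append(Annotation(begin, end, "Cue", tag))
        new_sentence = False
    return text, annotations


if __name__ == "__main__":
    rows = ["This\tO", "is\tO", "not\tNEG", "true\tO", "", "No\tNEG", "way\tO"]
    text, anns = conll_to_standoff(rows)
    print(repr(text))  # 'This is not true\nNo way'
    print([text[a.begin:a.end] for a in anns if a.layer == "Cue"])  # ['not', 'No']
```

Once every input format is converted to text plus offsets, downstream tools only need to understand that single representation, which is the interoperability argument the abstract makes.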
Bundan, Daniel; Abrami, Giuseppe; Mehler, Alexander
Multimodal Docker Unified UIMA Interface: New Horizons for Distributed Microservice-Oriented Processing of Corpora using UIMA Proceedings Article
In: Wartena, Christian; Heid, Ulrich (Ed.): Proceedings of the 21st Conference on Natural Language Processing (KONVENS 2025): Long and Short Papers, pp. 257–268, HsH Applied Academics, Hildesheim, Germany, 2025.
@inproceedings{Bundan:Abrami:Mehler:2025,
title = {Multimodal Docker Unified UIMA Interface: New Horizons for Distributed
Microservice-Oriented Processing of Corpora using UIMA},
author = {Daniel Bundan and Giuseppe Abrami and Alexander Mehler},
editor = {Christian Wartena and Ulrich Heid},
url = {https://aclanthology.org/2025.konvens-1.22/},
year = {2025},
date = {2025-01-01},
booktitle = {Proceedings of the 21st Conference on Natural Language Processing
(KONVENS 2025): Long and Short Papers},
pages = {257--268},
publisher = {HsH Applied Academics},
address = {Hildesheim, Germany},
series = {KONVENS '25},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Lücking, Andy; Ginzburg, Jonathan
Exceptions From Rules and Noteworthy Exceptions Journal Article
In: Linguistics and Philosophy, vol. 48, pp. 371–409, 2025.
@article{Luecking:Ginzburg:2025-exceptions,
title = {Exceptions From Rules and Noteworthy Exceptions},
author = {Andy Lücking and Jonathan Ginzburg},
url = {https://doi.org/10.1007/s10988-024-09429-1},
doi = {10.1007/s10988-024-09429-1},
year = {2025},
date = {2025-01-01},
journal = {Linguistics and Philosophy},
volume = {48},
pages = {371--409},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Abrami, Giuseppe; Genios, Markos; Fitzermann, Filip; Baumartz, Daniel; Mehler, Alexander
Docker Unified UIMA Interface: New perspectives for NLP on big data Journal Article
In: SoftwareX, vol. 29, pp. 102033, 2025, ISSN: 2352-7110.
@article{Abrami:et:al:2025:a,
title = {Docker Unified UIMA Interface: New perspectives for NLP on big data},
author = {Giuseppe Abrami and Markos Genios and Filip Fitzermann and Daniel Baumartz and Alexander Mehler},
url = {https://www.sciencedirect.com/science/article/pii/S2352711024004047},
doi = {10.1016/j.softx.2024.102033},
issn = {2352-7110},
year = {2025},
date = {2025-01-01},
journal = {SoftwareX},
volume = {29},
pages = {102033},
abstract = {Processing large amounts of natural language text using machine
learning-based models is becoming important in many disciplines.
This demand is being met by a variety of approaches, resulting
in the heterogeneous deployment of separate, partly incompatible,
not natively scalable applications. To overcome the technological
bottleneck involved, we have developed Docker Unified UIMA Interface,
a system for the standardized, parallel, platform-independent,
distributed and microservices-based solution for processing large
and extensive text corpora with any NLP method. We present DUUI
as a framework that enables automated orchestration of GPU-based
NLP processes beyond the existing Docker Swarm cluster variant,
and in addition to the adaptation to new runtime environments
such as Kubernetes. Therefore, a new driver for DUUI is introduced,
which enables the lightweight orchestration of DUUI processes
within a Kubernetes environment in a scalable setup. In this way,
the paper opens up novel text-technological perspectives for existing
practices in disciplines that deal with the scientific analysis
of large amounts of data based on NLP.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
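The processing pattern described in this abstract, a fixed chain of NLP components applied to each document while many documents are processed in parallel, can be sketched in a few lines. This is a hypothetical illustration of the general pattern only: in DUUI the components run as containerized services orchestrated by drivers (the abstract mentions Docker Swarm and Kubernetes), which this sketch does not model; `tokenize`, `count_negation_cues`, `run_pipeline`, and the cue lexicon are invented stand-ins.

```python
"""Run a chain of annotator components over many documents in parallel.

Hypothetical sketch of the pattern (components in sequence per document,
documents in parallel); the components are toy stand-ins, not DUUI's API.
"""
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]  # hypothetical: {"text": str, plus annotation layers}
Component = Callable[[Document], Document]


def tokenize(doc: Document) -> Document:
    doc["tokens"] = doc["text"].split()  # naive whitespace tokenizer
    return doc


def count_negation_cues(doc: Document) -> Document:
    cues = {"not", "no", "never", "nobody", "nothing"}  # toy lexicon (assumption)
    doc["neg_cues"] = sum(1 for t in doc["tokens"] if t.lower() in cues)
    return doc


def run_pipeline(docs: List[Document], components: List[Component],
                 workers: int = 4) -> List[Document]:
    def process(doc: Document) -> Document:
        for component in components:  # each component sees the previous output
            doc = component(doc)
        return doc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the order of the input documents.
        return list(pool.map(process, docs))


if __name__ == "__main__":
    corpus = [{"text": "This is not true"}, {"text": "No one never said nothing"}]
    out = run_pipeline(corpus, [tokenize, count_negation_cues])
    print([d["neg_cues"] for d in out])  # [1, 3]
```

Keeping components as plain document-in, document-out functions is what makes them composable; a distributed system replaces the thread pool with scalable service replicas, but the per-document chaining stays the same.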
Lücking, Andy
Referential Transparency Theory Book Section
In: Schierholz, Stefan J.; Giacomini, Laura (Ed.): Wörterbücher zur Sprach- und Kommunikationswissenschaft (WSK) Online, De Gruyter, Berlin and Boston, 2025.
@incollection{Luecking:2025,
title = {Referential Transparency Theory},
author = {Andy Lücking},
editor = {Stefan J. Schierholz and Laura Giacomini},
url = {https://www.degruyterbrill.com/database/WSK/entry/wsk__38780752/html},
doi = {10.1515/wsk},
year = {2025},
date = {2025-01-01},
urldate = {2025-01-01},
booktitle = {Wörterbücher zur Sprach- und Kommunikationswissenschaft (WSK) Online},
publisher = {De Gruyter},
address = {Berlin and Boston},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}
Bahmanian, Nasimeh; Bruera, Mercedes Martinez; Lücking, Andy; Hammerla, Leon; Abrami, Giuseppe; Sailer, Manfred; Mehler, Alexander; Lago, Sol
Data management protocol for CRC 1629 Technical Report
CRC 1629 NegLaB - INF no. 1, 2025.
@techreport{Bahmanian:et:al:2025,
title = {Data management protocol for CRC 1629},
author = {Nasimeh Bahmanian and Mercedes Martinez Bruera and Andy Lücking and Leon Hammerla and Giuseppe Abrami and Manfred Sailer and Alexander Mehler and Sol Lago},
url = {https://next.hessenbox.de/index.php/s/zQYBAfeXTJSDaib},
year = {2025},
date = {2025-01-01},
number = {1},
institution = {CRC 1629 NegLaB - INF},
keywords = {},
pubstate = {published},
tppubtype = {techreport}
}
Project Leaders

Prof. Sol Lago
Dep. of Romance Languages and Literatures, GU Frankfurt

Prof. Alexander Mehler
Faculty of Computer Science and Mathematics, GU Frankfurt

Prof. Manfred Sailer
Dep. of English and American Studies, GU Frankfurt
Scientific Staff
Research Areas
Computational modelling, virtual reality, natural language processing, data management, statistics, open language science




