Publications

A slowly growing list of publications that reference Hex. If your work uses Hex, please cite it in your bibliography using the following reference (in LaTeX/BibTeX format):


@misc{UcrelHex,
    title        = {{UCREL - Hex}: A shared, hybrid multiprocessor system},
    author       = {Vidler, John and Rayson, Paul},
    abstract     = {Hex, hosted at Lancaster University, UK, as part of the School
                    of Computing and Communications and the UCREL group, is a
                    collection of GPU-equipped hosts on which single-processor,
                    multi-processor, or GPU jobs can be executed.},
    howpublished = {\url{https://github.com/UCREL/hex}},
    note         = {Accessed: 2024}
}

Towards Generalized Offensive Language Identification

The prevalence of offensive content on the internet, encompassing hate speech and cyberbullying, is a pervasive issue worldwide. Consequently, it has garnered significant attention from the machine learning (ML) and natural language processing (NLP) communities, and numerous systems have been developed to automatically identify potentially harmful content and mitigate its impact. These systems typically follow one of two approaches: (1) using publicly available models and application endpoints, including prompting large language models (LLMs), or (2) annotating datasets and training ML models on them. However, little is understood about how well either approach generalizes, and the applicability of these systems is often questioned in off-domain and practical environments. This paper empirically evaluates the generalizability of offensive language detection models and datasets across a novel generalized benchmark. We answer three research questions on generalizability. Our findings will be useful in creating robust real-world offensive language detection systems.

Dmonte, Alphaeus and Arya, Tejas and Ranasinghe, Tharindu and Zampieri, Marcos, 2024

LiSAScore: Exploring Linear Sum Assignment on BertScore

Metrics play a crucial role in evaluating the performance of machine learning models. In Natural Language Processing (NLP) tasks such as text summarization and machine translation, Natural Language Generation (NLG) metrics such as BLEU and ROUGE have been widely used. However, these metrics are based on n-gram matching and do not capture the semantic similarity between the generated and reference texts. To address this, BertScore has emerged as a popular evaluation metric that uses a pre-trained Large Language Model (LLM) to measure the semantic similarity between two sentences. Unlike n-gram-based metrics, BertScore uses contextual and semantic embeddings of words, allowing flexible semantic evaluation. We outline a number of hypothetical scenarios in which the dependence of BertScore on token embedding cosine similarity may be exploited. The distribution of BertScores over a set of reference-prediction pairs means that results often scale differently with training than traditional metrics do, so interpreting them requires more expertise.

Mander, Stephen and Phillips, Jesse, 2024

SENTimental - A Simple Multilingual Sentiment Annotation Tool

Here we present SENTimental, a simple and fast web-based, mobile-friendly tool for capturing sentiment annotations from participants and citizen scientist volunteers to create training and testing data for low-resource languages. In contrast to existing tools, we focus on assigning broad values to segments of text, rather than specific tags to tokens or spans, to build datasets for training and testing LLMs. The SENTimental interface minimises barriers to entry, with the goal of maximising the time a user spends in a flow state, in which they can quickly and accurately rate each text fragment without being distracted by the complexity of the interface. Designed from the outset to handle multilingual representations, SENTimental allows parallel corpus data to be presented to the user and switched between instantly for immediate comparison. This allows users of any loaded language to contribute to the data gathered, building up comparable rankings in a simple structured dataset for later processing.

Vidler, John and Rayson, Paul and Knight, Dawn, 2025

Creating a Hybrid Rule and Neural Network Based Semantic Tagger Using Silver Standard Data: The PyMUSAS Framework for Multilingual Semantic Annotation

Word Sense Disambiguation (WSD) has been widely evaluated using the semantic frameworks of WordNet, BabelNet, and the Oxford Dictionary of English. However, for the UCREL Semantic Analysis System (USAS) framework, no extensive open evaluation has been performed beyond lexical coverage or single-language evaluation. In this work, we perform the largest semantic tagging evaluation of the rule-based system that uses the lexical resources in the USAS framework, covering five different languages using four existing datasets and one novel Chinese dataset. To overcome the lack of manually tagged training data, we create a new silver-labelled English dataset, on which we train and evaluate various monolingual and multilingual neural models in both monolingual and cross-lingual evaluation setups, with comparisons to their rule-based counterparts. We also show how a rule-based system can be enhanced with a neural network model. The resulting neural network models, including the data they were trained on, the Chinese evaluation dataset, and all of the code will be released as open resources.

Moore, Andrew and Rayson, Paul and Archer, Dawn and Czerniak, Tim and Knight, Dawn and Lal, Daisy Monika and Donnchadha, Gearóid Ó and Meachair, Mícheál J. Ó and Piao, Scott and Dhonnchadha, Elaine Uí and Vuorinen, Johanna and Yabo, Yan and Yang, Xiaobin, 2026