Projects running on Hex

The following projects are currently running (or have previously run) on Hex. They range from work by research staff on AI, ML and other topics to PhD students exploring new methods and technologies for NLP (and beyond!).

See also our Publications page for related works.

 

Detecting Fake News in the Era of LLMs

This project investigates the ability of SOTA models to differentiate between real news, human-generated fake news, and LLM-generated fake news.

  • Irfan Sabri
  • Dr. Hansi Hettiarachchi

Wmatrix Annotation Parallelisation (qpym2)

Large-scale parallelisation of corpus annotation pipelines.

  • Prof. Paul Rayson

Spatial Narratives

Extracting and analysing spatial narratives from textual data

  • Dr. Ignatius Ezeani
  • Prof. Paul Rayson

Sinhala Encoder-only Language Models

Recent developments in language models (LMs) have brought significant advances to natural language processing (NLP): they have produced state-of-the-art results in many NLP tasks, outperforming previous machine learning models such as LSTMs.

However, their effectiveness is largely dependent on having access to language resources for model pre-training.

This project aims to build LMs for Sinhala, a low-resource language.

  • Dr. Tharindu Ranasinghe

Transformer-Assisted LLM Source Code Summarisation

Neural Source Code Summarisation (NSCS) aims to generate natural language summaries of source code to improve developer and maintainer understanding of code.

Many solutions to this problem use small transformer models, designed to be run locally on a workstation. Transformer-generated summaries often score well across many NLG metrics but fail to consistently produce clear and understandable natural language.

Conversely, the ability of Large Language Models (LLMs) to generate clear and understandable natural language presents an exciting solution to this problem, especially as the increased availability of LLMs and the growing capability of workstation hardware in recent years mean that some LLMs can be run from developers' workstations.

However, LLM summaries of code often differ greatly from developer-written summaries and frequently miss key words and phrases, resulting in low scores across NLG metrics.

We show how combining these two methods by using transformer-generated summaries in prompt engineering may enable LLMs to create better source code summaries.
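As a rough illustration of the idea, a transformer-generated draft summary can be placed in the prompt alongside the code it describes. The sketch below is only an assumption about how such a prompt might be assembled: the function name `build_summary_prompt` and the prompt wording are hypothetical and are not taken from the project.

```python
def build_summary_prompt(code: str, draft_summary: str) -> str:
    """Combine source code and a transformer-generated draft into one LLM prompt.

    Hypothetical helper: the wording below is illustrative, not the
    project's actual prompt.
    """
    return (
        "You are summarising source code for developers.\n"
        "A smaller model produced this draft summary; it may contain useful "
        "key words but read awkwardly:\n"
        f"Draft: {draft_summary.strip()}\n\n"
        "Code:\n"
        f"{code.strip()}\n\n"
        "Write one clear sentence describing what the code does, "
        "reusing key terms from the draft where they are accurate."
    )


if __name__ == "__main__":
    example_code = "def add(a, b):\n    return a + b"
    example_draft = "returns sum of a b"
    print(build_summary_prompt(example_code, example_draft))
```

The resulting string would then be passed to whichever LLM is being evaluated; the draft supplies the key words that NLG metrics reward, while the LLM is asked to supply the fluency.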

  • Jesse Phillips
  • Prof. Tracy Hall
  • Dr. Mo El-Haj

LM Applications

This project focuses on developing language model (LM)-based applications for natural language processing (NLP) tasks, concentrating mainly on the models' learning, multilingual and explainability capabilities.

  • Dr. Hansi Hettiarachchi

Various NLP projects

NLP projects, including the processing of large diachronic datasets to study language change over time.

  • Dr. Alistair Baron

Applications of LLMs for UK court procedures

Legal NLP has been studied for decades, and recent advances in LLMs have pushed the possibilities for practical applications of legal NLP to greater heights.

We study how these LLMs can help lawyers, judges and the general public in multiple legal tasks such as legal judgment prediction, prior case retrieval and citation network analysis.

  • Damith Dola Mullage
  • Prof. Ruslan Mitkov

NLP Odyssey

This NLP application helps with memory augmentation for individuals dealing with young-onset dementia.

  • Yash Bhatia
  • Dr. Mo El-Haj

Investigating How Users Adapt Their Language Across Different Online Social Groups

The proposed project aims to investigate how online users adapt their language across different online communities, focusing on the extent of language mirroring within these environments.

Building on research into language mirroring in face-to-face interactions, this project will use Natural Language Processing (NLP) techniques to analyse language patterns in data collected from Reddit.

  • George Bland
  • Dr. Alistair Baron

Measuring the Effect of Availability Attacks on Dynamic Obstacle Location Data for Reinforcement Learning Agents in 2D Grid World Environments

In a 2D grid world environment, reinforcement learning (RL) agents are trained to navigate toward a goal while avoiding obstacles that move randomly within the environment. However, if an attacker introduces delays in the obstacle location data provided to the agent, it could disrupt the agent's ability to make timely and accurate decisions.

This research investigates how such availability attacks affect the agent’s performance and success rate in reaching its goal. The study provides an understanding of the impact of availability attacks in this scenario, contributing to the development of more robust RL systems.
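One simple way to picture such an attack is a buffer that hands the agent obstacle positions from several time steps ago. The sketch below assumes a fixed delay measured in environment steps; the class name `DelayedObstacleFeed` is hypothetical, and the project's actual attack model and environment are not described here.

```python
from collections import deque
from typing import Deque, List, Tuple

Position = Tuple[int, int]


class DelayedObstacleFeed:
    """Returns obstacle positions that are `delay` steps out of date."""

    def __init__(self, delay: int, initial: List[Position]) -> None:
        if delay < 0:
            raise ValueError("delay must be non-negative")
        # Pre-fill the buffer so the agent keeps seeing the initial
        # positions until the first delayed update arrives.
        self._buffer: Deque[List[Position]] = deque(
            [list(initial) for _ in range(delay + 1)], maxlen=delay + 1
        )

    def step(self, true_positions: List[Position]) -> List[Position]:
        """Record the true positions and return what the agent observes."""
        self._buffer.append(list(true_positions))
        # The oldest entry in the buffer is exactly `delay` steps old.
        return self._buffer[0]
```

With `delay=0` the agent sees the current positions; increasing `delay` makes its view progressively staler, which allows success rate to be compared across delay lengths.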

  • Ryan Hyland
  • Prof. Daniel Prince

Translating Akkadian Cuneiform into English

Fine-tuning Large Language Models to translate Cuneiform (old clay tablet text) into English.

A model is trained on a large corpus in order to translate Akkadian, a long-extinct language, into English for use in Assyriology.

  • Daniel Jones
  • Dr. Mo El-Haj

How do people with a diagnosis of bipolar talk about hypersexuality on Reddit: An exploratory analysis using Natural Language Processing methods.

We use the Jupyter notebooks to run BERTopic models for exploratory analysis of the dataset and to test different transformer embeddings.

  • Daisy Harvey
  • Prof. Paul Rayson
  • Prof. Steve Jones
  • Dr. Jasper Palmier-Claus
  • Fi Lobban

Urban Genome - A study of medieval urban development

Using NLP and graph computation techniques to analyse the relationships between specific actors affecting sets of physical locations in sparse/minimal medieval collections.

Hex's high core count and large working memory enable high-speed graph queries.

  • Dr. John Vidler
  • Prof. Keith Lilley
  • Prof. Ian Gregory
  • Prof. Paul Rayson

Understanding imprecise space and time in narratives through qualitative representations, reasoning, and visualisation

Using Lancaster University's SCC-UCREL Hex, we processed 1000 Holocaust survivors' testimonies in about 13 hours to extract spatial entities such as toponyms (towns/cities, countries, continents) and geographic feature nouns (ghettos, hills, and camps), along with events, times, sentiments and emotions.

  • Dr. Ignatius Ezeani
  • Prof. Ian Gregory
  • Prof. Paul Rayson

IAA-Oracle-ULTEC Project

The IAA-Oracle-ULTEC Project leverages advanced text processing tools to conduct in-depth linguistic analysis.

The project efficiently manages and analyses large datasets using the TextProcessor class, which enables the precise extraction of linguistic features and the identification of geographically named entities through NLP models.

Hosted on Hex, this setup handles the extensive computational demands of text analysis tasks, especially those involving cleaned texts after OCR processing, and significantly reduces the time and effort needed to analyse such large datasets.

  • Dr. Nouran Khallaf

Using Emotions to Help Detect Fake Reviews

A couple of experiments utilising large language models (LLMs) explore how incorporating emotion information can help to improve the performance metrics of detecting fake reviews.

  • Mansour Almansour

Metaphoric names identification in Chinese flower and plant names.

Token-level classification task to identify metaphoric names in Chinese.

We evaluate discriminative models such as BERT and XLM-R, and generative models such as GPT, Falcon and Llama 2.

  • Dr. Fei Zhu
  • Damith Premasiri
  • Dr. Tharindu Ranasinghe
  • Prof. Ruslan Mitkov

LiSAScore - an incremental improvement of BERTScore

A demonstration that linear sum assignment removes double-counting in semantic-based metrics.

Benchmarking translation and summarisation tasks against 6 common models including BERT, BART, RoBERTa, T5 and more.
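The double-counting that linear sum assignment removes can be illustrated on a tiny similarity matrix. The sketch below is not the LiSAScore implementation: it assumes one row per reference token and one column per candidate token, scores only one direction, and uses a brute-force search over permutations as a stand-in for a real solver such as SciPy's `scipy.optimize.linear_sum_assignment`. The function names and the example values are hypothetical.

```python
from itertools import permutations
from typing import List

Matrix = List[List[float]]


def greedy_score(sim: Matrix) -> float:
    """BERTScore-style matching: each row takes its best column.

    Several rows may pick the same column, so one candidate token
    can be counted more than once.
    """
    return sum(max(row) for row in sim) / len(sim)


def assignment_score(sim: Matrix) -> float:
    """One-to-one matching: each column is used by at most one row.

    Brute force over column permutations, so this assumes a tiny matrix
    with no more rows than columns.
    """
    n_rows, n_cols = len(sim), len(sim[0])
    best_total = max(
        sum(sim[row][col] for row, col in enumerate(cols))
        for cols in permutations(range(n_cols), n_rows)
    )
    return best_total / n_rows


# Illustrative similarities: both reference tokens prefer candidate token 0.
sim = [
    [0.75, 0.25],
    [0.5, 0.125],
]

if __name__ == "__main__":
    print(greedy_score(sim))      # 0.625
    print(assignment_score(sim))  # 0.4375
```

In the greedy case both rows are matched to the first column, so that candidate token contributes twice; the one-to-one assignment forces the second row onto a different column and the score drops accordingly.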

  • Stephen Mander
  • Jesse Phillips

The "Managed Metal" research service

A low-level access service which allows researchers to execute code directly on the hosts that make up Hex.

  • Dr. John Vidler

GPU accelerated Jupyter Notebooks

A high-level research service based around (mostly) Python runtimes that aims to bring GPU compute support to researchers without discrete GPUs in their own workstations.

Part of longer-term experimentation with modern cluster technologies for research use.

  • Dr. John Vidler