Hardware Comparison
By: Henry Oldroyd
This article aims to provide detail about the current and future hardware for the Hex. It will also include links to their data sheets and in future a hardware comparison with other HPCs systems.
| Hardware | Quantity | Currently Operational | Data Sheet | Description | Task Suitability |
|---|---|---|---|---|---|
| 12th Generation Intel Core i9 | 1 | Is Currently Operational | [to be found] | Tasks that don't require a high level of parallelization or involve matrix operations | |
| NVIDIA RTX A2000 12GiB GPU | 1 | Is Currently Operational | https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/rtx-a2000/nvidia-rtx-a2000-datasheet-1987439-r5.pdf | This GPU runs on CUDA version 12.3. It has over 3300 CUDA cores and over 100 tensor cores. It uses the Ampere architecture (instruction bit width and register size is 32 bits) https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/rtx-a2000/nvidia-rtx-a2000-datasheet-1987439-r5.pdf | Suitable for tasks that utilize matrix operations. 12 GB of RAM allows for language models like LAMA to be be fully loaded into the GPU's memory making it suited for inference, training and fine tuning tasks. |
| NVIDIA RTX A5000 GPU | 3 | To be installed | https://nvdam.widen.net/s/wrqrqt75vh/nvidia-rtx-a5000-datasheet | These more powerful GPUs boast over 8000 CUDA cores and over 250 Tensor cores. Each has 24GB of onboard memory and are equivalent to RTX 3090s for most jobs. It also runs on the Ampere architecture (instruction bit width and register size is 32 bits) | Increased cores and memory will allow for more intensive machine learning tasks and larger language models. Suitable to inference, training and fine tuning. |
| HEC V100 | 30+ | Is Currently Operational | https://lancaster-hec.readthedocs.io/en/latest/index.html | This is the Universities own HPC offering, designed to take the brunt of computationally intense jobs | HEC is designed for repeated, queuable high latency jobs. Once you have working code, this is the ideal place to queue a series of tests. However, remember that jobs may run concurrently so a framework for remote logging and storing of results is advised. SLURM is used for managing the HPC resource. Please contact an RSE for more details. |
| Bede V100 SXM | 120+ | Is Currently Operational | https://n8cir.org.uk/bede/ | This is the next tier of supercomputer above HEC. It boasts nodes optimized for machine learning workloads and greater file storage. | Optimum use case is for multi-node experiments, or long duration. However, wait times are typically far longer than on HEC, so this should not be used for debugging and code should already be known to work on HEC. Please contact an RSE for more information and support. |