Cluster Hardware

By: John Vidler

A (nearly) complete list of the equipment that currently (August 2024) makes up Hex.

Management VMs

ucrel-hex

Network 10 Gbit/s Fiber

ucrel-hex-nas

SSD 1TiB VNVMe ZRAID

Network 10 Gbit/s Fiber

The 'Core 11'

hex-host-001 - hex-host-011

CPU 20 core 12th Gen Intel Core i7-12700

GPU 1x RTX A2000 12GB

RAM 32GB RAM

SSD 512 GiB NVMe

Network 1 Gbit/s Ethernet

Services:

Jupyter Notebooks

Container Services

Managed Metal

HP Sponsored Hardware

hex-host-012

CPU 80 Core Intel Xeon 6240 @ 2.59GHz

GPU 2x RTX A5000 24GB

RAM 384GB RAM

SSD 2x 1TiB NVMe + 2x 4TiB HDD

Network 10 Gbit/s Ethernet

Services:

Container Services

Managed Metal

The 'Extended 3'

hex-host-013 - hex-host-015

CPU 64 Core Intel Xeon Silver 4216 CPU @ 2.10GHz

GPU 3x RTX A5000 24GB

RAM 128GB RAM

SSD 512 GiB NVMe + 1 TiB NVMe

Network 1 Gbit/s Ethernet

Services:

Container Services

Managed Metal

Further Information

The Management VMs

There are two 'management' virtual machines, ucrel-hex and ucrel-hex-nas, both of which are provisioned from the normal ISS vSphere research pool.

Both are fairly unremarkable in specification, but they perform core 'backhaul' duties: scheduling the workers, presenting the various web UIs, monitoring ingress and egress, and storing and presenting logs and metrics. Most important of all, they present a common interface for long-term backing storage that connects back to the ISS Luna service.

The Original 11 Workers

The 'core' of Hex is made up of 11 identical worker nodes, each with the following specification:

  • 12th Generation Intel Core i7-12700, with 20 logical processors (12 physical cores)
  • NVIDIA RTX A2000 12GiB GPU
  • 32 GiB RAM
  • 512 GiB NVMe solid-state disk
  • 1 Gbit Ethernet

Each node also has a 2.5 Gbit network port, but the switch we have available only supports 1 Gbit uplinks (for now!).
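Because the ports are faster than the switch, a node's link should negotiate down to 1 Gbit. As a minimal sketch, assuming the workers run Linux (which this page does not state), the negotiated speed can be read from the kernel's sysfs `speed` file, which reports Mbit/s. The function name `read_link_speed_mbit` and the default interface name `eth0` are illustrative and not part of Hex's tooling.

```python
from pathlib import Path
from typing import Optional


def read_link_speed_mbit(iface: str, sysfs_root: str = "/sys/class/net") -> Optional[int]:
    """Return the negotiated link speed in Mbit/s, or None if it is unknown.

    Linux exposes the speed in <sysfs_root>/<iface>/speed. The read can fail
    (interface missing or link down) or report -1 when the speed is unknown.
    """
    speed_file = Path(sysfs_root) / iface / "speed"
    try:
        value = int(speed_file.read_text().strip())
    except (OSError, ValueError):
        return None
    return value if value > 0 else None


if __name__ == "__main__":
    import sys

    # "eth0" is a placeholder; real interface names vary per machine.
    iface = sys.argv[1] if len(sys.argv) > 1 else "eth0"
    speed = read_link_speed_mbit(iface)
    label = "unknown" if speed is None else f"{speed} Mbit/s"
    print(f"{iface}: {label}")
```

On a node connected to a 1 Gbit switch port this would be expected to report 1000 Mbit/s, even though the port itself supports 2.5 Gbit.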

The "Extended 3" Workers

Added in August of 2024, these three additional boxes each include multiple RTX A5000 GPUs and are intended to better support multi-GPU workloads with larger datasets.

  • 64 Core Intel Xeon Silver 4216 CPU @ 2.10GHz
  • 3x NVIDIA RTX A5000 24GiB GPUs
  • 128 GiB RAM
  • 512 GiB NVMe solid-state disk and 1 TiB NVMe solid-state disk
  • 1 Gbit Ethernet
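The per-node figures on this page can be summed to give a rough picture of the whole worker pool. The sketch below is illustrative only. It copies the numbers as listed, uses the 3x A5000 figure from the detailed list for the "Extended 3" nodes, and treats the listed "core" counts as logical processors. The `NodeGroup` and `cluster_totals` names are hypothetical, and the management VMs are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    name: str
    nodes: int          # number of machines in the group
    logical_cpus: int   # per node, as listed on this page
    gpus: int           # per node
    gpu_mem_gb: int     # per GPU, as listed
    ram_gb: int         # per node, as listed


# Figures copied from the listing; not measured on the machines.
GROUPS = [
    NodeGroup("Core 11", nodes=11, logical_cpus=20, gpus=1, gpu_mem_gb=12, ram_gb=32),
    NodeGroup("hex-host-012", nodes=1, logical_cpus=80, gpus=2, gpu_mem_gb=24, ram_gb=384),
    NodeGroup("Extended 3", nodes=3, logical_cpus=64, gpus=3, gpu_mem_gb=24, ram_gb=128),
]


def cluster_totals(groups):
    """Sum the per-node figures across every group of workers."""
    totals = {"nodes": 0, "logical_cpus": 0, "gpus": 0, "gpu_mem_gb": 0, "ram_gb": 0}
    for g in groups:
        totals["nodes"] += g.nodes
        totals["logical_cpus"] += g.nodes * g.logical_cpus
        totals["gpus"] += g.nodes * g.gpus
        totals["gpu_mem_gb"] += g.nodes * g.gpus * g.gpu_mem_gb
        totals["ram_gb"] += g.nodes * g.ram_gb
    return totals


if __name__ == "__main__":
    for key, value in cluster_totals(GROUPS).items():
        print(f"{key}: {value}")
```

Keeping the groups as data makes the totals easy to update as hardware is added or retired.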

Note: Icons on this page from Icons8