Compute Environments for Genomics

[Image: The HARDAC machine]

High-Performance Computing for Genomics

GCB Computational Solutions operates HARDAC (High-throughput Applied Research Data Analysis Cluster), Duke's flagship HPC cluster purpose-built for computational genomics. HARDAC was originally deployed within Duke's former Institute for Genome Sciences and Policy (IGSP).

HARDAC currently consists of 60 compute nodes in the form of:

44 HP ProLiant XL170r Gen9 servers with 28 physical CPU cores and 256GB of RAM (1232 cores, 11264GB RAM)

7 Dell PowerEdge R620 servers with 16 physical CPU cores and 128GB of RAM (112 cores, 896GB RAM)

8 Dell PowerEdge R620 servers with 16 physical CPU cores and 256GB of RAM (128 cores, 2048GB RAM)

1 HP ProLiant DL560 Gen9 server with 40 physical CPU cores and 1TB of RAM (40 cores, 1000GB RAM)


Together, these nodes provide a total of 1512 physical CPU cores and roughly 15TB of RAM.
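As a sanity check, the aggregate figures above can be reproduced directly from the per-node specifications. The short Python sketch below is purely illustrative; the node-type labels are our own shorthand for the hardware listed above, not actual HARDAC hostnames or partitions.

    # Tabulate the published node inventory and verify the aggregate totals.
    # Node-type labels are illustrative shorthand, not real hostnames.
    node_types = [
        # (description,                    count, cores_per_node, ram_gb_per_node)
        ("HP ProLiant XL170r Gen9",           44,             28,             256),
        ("Dell PowerEdge R620 (128GB)",        7,             16,             128),
        ("Dell PowerEdge R620 (256GB)",        8,             16,             256),
        ("HP ProLiant DL560 Gen9",             1,             40,            1000),
    ]

    total_nodes = sum(count for _, count, _, _ in node_types)
    total_cores = sum(count * cores for _, count, cores, _ in node_types)
    total_ram_gb = sum(count * ram for _, count, _, ram in node_types)

    print(f"{total_nodes} nodes, {total_cores} cores, {total_ram_gb} GB RAM")
    # -> 60 nodes, 1512 cores, 15208 GB RAM (roughly 15TB)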

Two features of HARDAC designed specifically for computing with high-volume genomics data are its high-performance network interconnect (InfiniBand) and an attached high-performance parallel file system providing roughly 1.2 petabytes of mass storage. All nodes are interconnected with 56Gbps FDR InfiniBand, and the cluster's data transfer node is linked to the Duke Health Technology Services (DHTS) network through a bonded pair of 10Gb Ethernet connections. The attached mass storage runs IBM's General Parallel File System (GPFS), which is served by two redundant GPFS NSD server nodes and is designed to sustain an average read rate of roughly 5GB per second.
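To put those figures in perspective, the back-of-envelope arithmetic below estimates how long a large sequencing file would take to move over each path. The 100GB file size is an arbitrary example, the rates are the nominal figures quoted above, and real-world throughput will be lower due to protocol overhead and shared load.

    # Back-of-envelope transfer-time estimates for a hypothetical 100GB file.
    # Rates are nominal figures from the text; actual throughput varies.
    file_gb = 100  # example file size in gigabytes (assumption)

    paths_gb_per_s = {
        "FDR InfiniBand (56Gbps)":          56 / 8,  # nominal line rate
        "Bonded 10Gb Ethernet (2x10Gbps)":  20 / 8,  # nominal line rate
        "GPFS sustained read (~5GB/s)":     5.0,     # design target
    }

    for name, rate in paths_gb_per_s.items():
        print(f"{name}: ~{file_gb / rate:.0f} s for a {file_gb}GB file")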

HARDAC uses SLURM as its job scheduler, the same scheduler used by many of the world's top HPC clusters. We strive to support the constantly evolving scientific software landscape that genomics researchers need to process their data. We make most software installations available through Lmod, an implementation of the Environment Modules system, with a module collection that is continuously built out by Harvard's Research Computing team. This allows us to isolate otherwise conflicting software environments from one another and to keep older versions available so analyses can be reproduced later.
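As a minimal sketch of how these pieces fit together, the example below writes a SLURM batch script that loads a tool through Lmod and submits it with sbatch. The module name (samtools), resource requests, and file names are placeholders; actual module names and limits depend on how HARDAC is configured.

    # Minimal sketch: submit a SLURM job that uses an Lmod-provided tool.
    # Module name, resource requests, and file names are placeholders.
    import subprocess
    import textwrap

    job_script = textwrap.dedent("""\
        #!/bin/bash
        #SBATCH --job-name=bam-stats
        #SBATCH --cpus-per-task=4
        #SBATCH --mem=8G
        #SBATCH --time=01:00:00
        #SBATCH --output=bam-stats-%j.out

        # Load the tool through Lmod so the right version is on PATH.
        module load samtools

        samtools flagstat input.bam
        """)

    with open("bam_stats.sh", "w") as fh:
        fh.write(job_script)

    # Hand the script to the SLURM scheduler.
    subprocess.run(["sbatch", "bam_stats.sh"], check=True)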