Intel® NVMe* SSDs Help Accelerate and Lower Cost of Genomic Sequencing Data Analysis

Leading German research center adopts Intel®-based high-performance computing solution

“We considered cost, performance, and most importantly the operation of the pipeline, and ultimately decided that very fast, low-latency Intel® NVMe* SSD storage for each node was the best solution to ... meet our HPC needs.”
– Georgios Nikolis, IT manager, DKFZ

Introduction

When the Human Genome Project began in 1990, sequencing and analyzing the genome of one person took about 13 years and cost several hundred million dollars. Using technologies including ultra-powerful and reliable Intel® NVMe* solid state drives (SSDs), the German Cancer Research Center (DKFZ) can today control and analyze a complete genome sequence in under 10 hours, at a lower cost than ever before.

In partnership with university hospitals across Germany, DKFZ sequences the genomes of cancer cells and healthy cells of cancer patients to increase understanding of the disease, improve treatment, and foster development of new “personalized medicine” treatments that precisely target altered components of cancer cells.

With its powerful Illumina* sequencing systems, HP* Apollo 6000 servers, 14-core Intel® Xeon® CPUs, and 1.6 TB Intel® NVMe-based SSDs (Intel® SSD DC P3700 Series), DKFZ is sequencing more genomes every month at a more affordable price, creating unprecedented opportunities to advance cancer prevention and treatment worldwide.

High-Performance Computing Challenges

Genomic sequencing is expected to play a critical role in the future of personalized medicine, in which each individual’s genomic profile will one day be used to prevent disease and to tailor treatment precisely when disease is found. The promise of personalized medicine has been discussed for decades, but, among other things, the high-performance computing (HPC) challenge of genomic sequencing has posed a significant problem.
At DKFZ, for instance, every genomic sequencing run produces about 560 GB of raw data (8 lanes with about 70 GB of FASTQ files each), which amounts to some 1.3 PB per year. All that data must be preprocessed, converted, and stored, and the entire sequencing process requires highly detailed quality control measures, which are themselves compute- and IO-intensive. The limited performance of external NAS storage adds latency to HPC workloads and creates bottlenecks. Low latency is a critical concern throughout these IO-intensive operations, because data is exchanged between the steps of the pipeline through IO on millions of small files.

Intel® SSD Data Center Family for PCIe* Devices

The Intel® SSD Data Center Family for PCIe devices brings extreme data throughput directly to Intel® Xeon® processors, with data transfer speeds up to six times faster than 6 Gbps SAS/SATA SSDs.¹ The Intel® SSD Data Center Family for PCIe was developed for the most intense workloads, including HPC applications.

Before implementing its current solution, DKFZ relied on previous-generation Illumina* HiSeq 2000 sequencing systems and mechanical SATA disks. Analysis of the sequence data from a single genome required up to three nodes running computations for three days, until DKFZ upgraded its sequencing system and computing infrastructure.

New Infrastructure Supports Growing HPC Workloads

In 2015, DKFZ upgraded to 10 Illumina HiSeq X Ten Sequencing Systems, the most powerful sequencing platform available. With the help of hardware supplier GoVirtual*, DKFZ also invested in 50 HP* Apollo 6000 servers with 512 GB of memory and two 14-core Intel® Xeon® CPUs, along with 50 NVMe-based 1.6 TB Intel® SSD DC P3700 Series drives. The extremely low latency and high performance of the NVMe technology boosts
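The data volumes quoted in the High-Performance Computing Challenges section can be sanity-checked with simple arithmetic. The short Python sketch here is illustrative only. It assumes decimal (SI) units, and its runs-per-year figure further assumes that the entire 1.3 PB comes from runs of about 560 GB each. The case study gives no run count, so that number is an implied estimate and not a reported one. The function names are hypothetical.

```python
# Back-of-envelope check of the DKFZ data volumes quoted in the case study.
# Assumption: decimal (SI) units, i.e. 1 GB = 10**9 bytes, 1 PB = 10**15 bytes.
GB = 10**9
TB = 10**12
PB = 10**15

LANES_PER_RUN = 8
FASTQ_PER_LANE = 70 * GB    # "about 70 GB" of FASTQ files per lane
YEARLY_VOLUME = 1.3 * PB    # "some 1.3 PB per year"
NODES = 50                  # HP Apollo 6000 servers
SSD_CAPACITY = 1.6 * TB     # Intel SSD DC P3700 Series, 50 drives in total


def run_size() -> float:
    """Raw data per sequencing run: lanes x FASTQ volume per lane."""
    return LANES_PER_RUN * FASTQ_PER_LANE


def implied_runs_per_year() -> float:
    """Hypothetical estimate: assumes all yearly volume comes from full runs."""
    return YEARLY_VOLUME / run_size()


def cluster_nvme_capacity() -> float:
    """Total local NVMe capacity across the 50 drives."""
    return NODES * SSD_CAPACITY


if __name__ == "__main__":
    print(f"per run: {run_size() / GB:.0f} GB")
    print(f"implied runs per year: {implied_runs_per_year():.0f}")
    print(f"total local NVMe: {cluster_nvme_capacity() / TB:.0f} TB")
```

Running it prints a per-run size of 560 GB (consistent with 8 × 70 GB), roughly 2,321 implied runs per year, and 80 TB of local NVMe capacity across the 50 drives.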