Osaka University: OCTOPUS Supercomputer

Heterogeneous architecture on new cluster delivers greater computing capacity at lower cost.

Executive Summary
Osaka University (Osaka U) is a leading research university in Japan. Its Cybermedia Center (CMC) hosts the university’s supercomputing resources. Historically, supercomputers at Osaka U were built to support both research and general education needs. To continue to attract leading researchers, CMC built a world-class, heterogeneous cluster targeted at scientific computing for a variety of workloads programmed for different architectures. The OCTOPUS cluster now attracts new users running a wide variety of workloads, from simulation to AI and machine learning.

Challenge
Innovation in research often begins with brilliant minds supported by latest-generation High-Performance Computing (HPC) resources. Osaka U’s CMC supports a wide range of scientific fields that rely on supercomputing for breakthroughs, including high-energy physics, molecular dynamics, and the materials, life, dental, and social sciences, among others. Recently, a researcher used CMC systems to understand vortex breakdowns in supersonic flows. His findings are expected to contribute to the development of a supersonic combustion ramjet engine for air and space planes. Other activities are described in the university’s research profile.

“There is a growing demand for supercomputing in every field of science,” stated Susumu Date, Associate Professor at Osaka U’s CMC, “because researchers today rely heavily on scientific computing prior to the experimental stage and afterwards to analyze and correlate the results of observations.”

CMC’s earlier computing systems were designed to support both HPC and non-research needs. Partitioning them between general users and parallel computing users created conflicts that made them an unreliable resource for scientific computing. To continue supporting important research areas, and guided by feedback from its users, Osaka U needed to expand its parallel computing capabilities beyond the existing systems in its data center.

“Our users’ biggest challenge, in most cases, is to achieve inter-node and intra-node parallelism,” added Professor Date. “Many are working with MPI and OpenMP coding to achieve greater parallelism. We needed to deliver more resources that supported their work.”

CMC’s research and user feedback led it to build a new petascale heterogeneous supercomputer that supports a variety of scientific computing domains (simulation, visualization, AI/machine learning, and high-performance data analytics, or HPDA) on a single system.

Built on Intel® Xeon® Scalable processors, Intel® Xeon Phi™ processors, and GPUs, OCTOPUS supports a wide range of scientific research.

Solution
The Osaka University Cybermedia Center’s Over-Petascale Universal Supercomputer (OCTOPUS) supports researchers using a wide variety of coding and application environments, from open-source and commercial codes written for x86 Intel® Architecture (IA) to codes for CUDA-based GPUs, targeting traditional simulation, AI frameworks, genomics, and other fields of research.

“We had to explore the architecture of a new HPC system in terms of both hardware and software,” explained Professor Date, “so more people could take advantage of supercomputing resources. In particular, we had to look at an integrated architecture approach for HPC and HPDA, using x86 and other architectures.”

One of the key challenges in designing the system was to increase compute capacity within the data center’s power and cooling budget. By combining the performance and power efficiency of latest-generation CPUs and GPUs with Asetek’s RackCDU Direct-to-Chip liquid cooling on all compute nodes (including GPUs), CMC could maintain reliable, stable performance across the cluster without increasing its operational and power budgets.

The new system delivers 1.463 petaFLOPS¹ of peak performance using multiple types of processor architectures and a Lustre* filesystem, interconnected with InfiniBand* Architecture at 100 Gbps. OCTOPUS was built by NEC using Intel® Xeon® Scalable processors, Intel® Xeon Phi™ 7210 processors based on the Many Integrated Core (MIC) architecture, NVIDIA Tesla* P100 GPUs (CUDA architecture), and a DataDirect Networks (DDN) EXAScaler* storage system. It went into production in December 2017.
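As a quick sanity check on the headline figure, the per-partition peak numbers given in the system configuration note at the end of this document can be added up. The sketch below is illustrative only: the dictionary and function names are hypothetical, and all four figures are taken to be TFLOPS.

```python
# Per-partition peak figures in TFLOPS, copied from the system
# configuration note at the end of this document. All four values
# are assumed to be expressed in TFLOPS.
PARTITION_PEAK_TFLOPS = {
    "general-purpose CPU nodes": 471.24,
    "GPU nodes": 858.28,
    "Xeon Phi nodes": 117.14,
    "large-scale shared-memory nodes": 16.38,
}


def total_peak_pflops(partitions):
    """Sum the per-partition peaks (TFLOPS) and convert to petaFLOPS."""
    return sum(partitions.values()) / 1000.0


def peak_shares(partitions):
    """Return each partition's fraction of the total peak."""
    total = sum(partitions.values())
    return {name: value / total for name, value in partitions.items()}


if __name__ == "__main__":
    total = total_peak_pflops(PARTITION_PEAK_TFLOPS)
    print(f"Total peak: {total:.3f} petaFLOPS")
    for name, share in peak_shares(PARTITION_PEAK_TFLOPS).items():
        print(f"{name}: {share:.1%}")
```

The four figures add up to the quoted 1.463 petaFLOPS, and the 37 GPU nodes account for more than half of that peak.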

Osaka University OCTOPUS Supercomputer at a Glance:

  • Heterogeneous supercomputer to meet widely diverse research needs in simulation, visualization, AI/machine learning, and high-performance data analytics (HPDA)
  • Intel® Xeon® Gold 6126 processors (236 nodes), Intel® Xeon® Platinum 8153 processors (2 nodes), Intel Xeon Phi 7210 processors (44 nodes)
  • Intel Xeon Gold 6126 processors with four NVIDIA Tesla P100 GPUs per node, connected with NVIDIA NVLink* (37 nodes)
  • 5X larger compute capacity than the previous system at lower cost¹

Osaka University Cybermedia Center.

Results
The new supercomputer boosts Osaka U’s scientific computing capacity fivefold, giving researchers a new level of resources to work with.

“The new system is leading to an increase of users, which is a good impact,” concluded Professor Date.

Because OCTOPUS is heterogeneous, users can choose the resources that fit their particular codes and research: Intel CPUs (IA or MIC) or CUDA GPUs. In user surveys completed by CMC, users have reported higher performance than on the previous system.

“Today, OCTOPUS is running machine learning and other AI-related jobs, which we have not seen before,” said Professor Date. “Plus, we are seeing other new types of work from users. We designed the new system for these new workloads.”

Solution Summary
Osaka U’s CMC needed to enhance its computing capabilities to retain and attract researchers from around the world. Based on research and user feedback, it specified a one-plus petaFLOPS supercomputer with a heterogeneous architecture. Built on Intel® Xeon® Scalable processors, Intel® Xeon Phi™ processors, and the latest GPUs, the new OCTOPUS cluster delivers 1.463 petaFLOPS of peak performance, supporting a wide variety of workloads across many scientific fields and drawing new users to the university.

Solution Ingredients

  • NEC LX* 406Rh-2 servers with Intel® Xeon® Scalable processors
  • NEC LX* 102Rh-1G servers with Intel® Xeon® Scalable processors and NVIDIA Tesla P100 GPUs
  • NEC Express5800/HR110c-M* servers with Intel® Xeon Phi™ processors
  • NEC LX* 116Rg servers with Intel® Xeon® Scalable processors
  • DDN EXAScaler (3.1 PB) Lustre storage cluster

Notices and Disclaimers

Intel® technologies’ features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Performance varies depending on system configuration. No computer system can be completely secure. Check with your system manufacturer or retailer, or learn more at https://www.intel.com.

Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. Consult other information and performance tests to help you better evaluate your purchases, including the performance of that product when combined with other products. For more complete information, visit https://www.intel.com/benchmarks.

Performance results are based on testing as of the date set forth in the configurations and may not reflect all publicly available security updates. See the configuration disclosure for details. No product or component can be completely secure.

The cost reduction scenarios described are intended as examples of how a given Intel®-based product, in the specified circumstances and configurations, may affect future costs and provide savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

Intel does not control or audit third-party benchmark data or the websites referenced in this document. Visit the referenced website and confirm that the referenced data are accurate.

In some test cases, results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and are provided for informational purposes. Any differences in your system hardware, software, or configuration may affect your actual performance.

Product and Performance Information

¹ OCTOPUS system configuration per www.hpc.cmc.osaka-u.ac.jp/en/octopus/:

  • General-purpose CPU nodes: 236 nodes (471.24 TFLOPS); CPU: 2 × Intel Xeon Gold 6126 (Skylake, 2.6 GHz, 12 cores); memory: 192 GB
  • GPU nodes: 37 nodes (858.28 TFLOPS); CPU: 2 × Intel Xeon Gold 6126 (Skylake, 2.6 GHz, 12 cores); GPU: 4 × NVIDIA Tesla P100 (NVLink); memory: 192 GB
  • Xeon Phi nodes: 44 nodes (117.14 TFLOPS); CPU: 1 × Intel Xeon Phi 7210 (Knights Landing, 1.3 GHz, 64 cores); memory: 192 GB
  • Large-scale shared-memory nodes: 2 nodes (16.38 TFLOPS); CPU: 8 × Intel Xeon Platinum 8153 (Skylake, 2.0 GHz, 16 cores); memory: 6 TB

For the previous HCC system configuration, see http://www.hpc.cmc.osaka-u.ac.jp/en/hcc-sys/. The 1.463 petaFLOPS figure is peak performance, not application performance.
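To illustrate what "peak performance" means here, the per-partition figures can be approximately reproduced from the node counts, clock speeds, and core counts listed in the configuration note. This is a sketch under assumptions that are not stated in this document: 32 double-precision floating-point operations per cycle per core (AVX-512 with two fused multiply-add units) for every CPU type, and a 5.3 TFLOPS double-precision peak for each Tesla P100. The function and variable names are hypothetical.

```python
# ASSUMPTIONS (not stated in the source document):
#   - each CPU core retires 32 double-precision FLOPs per cycle
#     (AVX-512, two fused multiply-add units per core)
#   - each Tesla P100 (NVLink) peaks at 5.3 TFLOPS double precision
FLOPS_PER_CYCLE = 32
P100_PEAK_TFLOPS = 5.3


def cpu_node_tflops(sockets, cores_per_socket, ghz):
    """Peak of one node's CPUs: sockets x cores x GHz x FLOPs/cycle."""
    return sockets * cores_per_socket * ghz * FLOPS_PER_CYCLE / 1000.0


# (node count, peak TFLOPS per node), using the specs from the note
PARTITIONS = {
    "general-purpose CPU": (236, cpu_node_tflops(2, 12, 2.6)),
    "GPU": (37, cpu_node_tflops(2, 12, 2.6) + 4 * P100_PEAK_TFLOPS),
    "Xeon Phi": (44, cpu_node_tflops(1, 64, 1.3)),
    "large-scale shared-memory": (2, cpu_node_tflops(8, 16, 2.0)),
}


def partition_peak_tflops(name):
    """Peak of a whole partition: node count x per-node peak."""
    nodes, per_node = PARTITIONS[name]
    return nodes * per_node


def system_peak_pflops():
    """Peak of the whole system in petaFLOPS."""
    return sum(partition_peak_tflops(n) for n in PARTITIONS) / 1000.0


if __name__ == "__main__":
    for name in PARTITIONS:
        print(f"{name}: {partition_peak_tflops(name):.2f} TFLOPS")
    print(f"System: {system_peak_pflops():.3f} petaFLOPS")
```

Under these assumptions, each computed partition peak lands within 0.01 TFLOPS of the listed value. Peak figures of this kind multiply hardware limits together, which is why they are an upper bound rather than a measure of what applications achieve.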