InfiniBand, Omni-Path, and High-Speed Ethernet: Advanced Features, Challenges in Designing HEC Systems and Usage
Event Type
AI/Machine Learning/Deep Learning
Big Data Analytics
Containerized HPC
HPC Accelerators
Time: Sunday, June 24th, 2pm-6pm
Description: As InfiniBand (IB), Omni-Path, and High-Speed Ethernet (HSE) technologies
mature, they are being used to design and deploy various High-End Computing
(HEC) systems: HPC clusters with GPGPUs and Xeon Phis supporting MPI, storage
and parallel file systems, cloud computing systems with SR-IOV virtualization,
grid computing systems, and deep learning systems. These systems bring new
challenges in terms of performance, scalability, portability, reliability, and
network congestion. Many scientists, engineers, researchers, managers, and
system administrators are becoming interested in learning about these
challenges, the approaches being used to solve them, and the associated impact
on performance and scalability. This tutorial will start with an overview of
these systems. It will emphasize the advanced hardware and software features of
IB, Omni-Path, HSE, and RoCE and their capabilities to address these challenges.
Next, we will focus on OpenFabrics RDMA and Libfabric programming, and on the
network management infrastructure and tools needed to use these systems
effectively. A common set of challenges faced in designing these systems will
then be presented. Finally, we will present case studies of domain-specific
challenges in designing these systems (including the associated software
stacks), their solutions, and sample performance numbers.
Content Level: 10% beginner, 50% intermediate, and 40% advanced.
Target Audience: This tutorial is targeted at various categories of people (scientists, engineers, system administrators, developers, and researchers) working in the areas of high-performance communication and I/O, storage, networking, middleware, virtualization, and applications related to high-end computing, cloud computing, deep learning, and grid systems.
Prerequisites: Attendees are expected to know the basic features and operation of IB, Omni-Path, or HSE (or any other high-speed networking) technologies. Attendees not familiar with any of these are encouraged to take the complementary basic tutorial, "InfiniBand, Omni-Path, and High-speed Ethernet for Beginners".
Tutorial Authors
Professor and University Distinguished Scholar