Introduction. Innovative scalable HPC. Dr. Holger Fröning October 2010

Transcript:

Introduction: Innovative scalable HPC
Dr. Holger Fröning, October 2010
info@extoll.de

History
Design of complex HW/SW systems, Computer Architecture Group, Prof. Dr. U. Brüning, University of Heidelberg
Computer architecture
Interconnection networks
EXTOLL project started in 2005
FPGA prototype (Xilinx Virtex4 based) since 2008
Start-up company as a spin-off in 2010
[Images: ATOLL cluster; ATOLL die (bonded)]

Introduction I
Interconnection networks are the key component of parallel systems
Prof. Patterson stated: "Latency lags bandwidth"
Need to significantly lower communication latency and improve communication in parallel systems:
  Finer-grained parallelism
  PGAS systems
  Improved scaling
EXTOLL project at the Computer Architecture Group (CAG)
Vision: more performance, lower cost for HPC!

Introduction II
Lowest latency
Maximum message rate per second
Optimized for multi-core
Optimized CPU-device interface
Direct topology
Efficiency
Innovative architecture
Optimized scalability
Latency
Performance
More performance for the HPC customer

Introduction III
No external switches
Complete own IP
Lower cost
Lower energy consumption
[Chart: relative price per G/port incl. NIC, cabling and switches for a 324 node cluster; bars: EXTOLL, conventional HPC network; axis: 0% to 120%; annotation: lower cost]

Architecture - Ideas
Lean network interface
  Ultra low latency message exchange
  Extremely high hardware message rate
  Small memory footprint
Switchless design: 3D torus direct network
Reliable network
High scalability
Extremely efficient, pipelined hardware architecture

Architecture - Overview

NIC Features
VELO: very fast two-sided messaging
RMA: optimized access to remote memory
Local and remote completion notifications for all RMA operations
Fully virtualized and secure for user space and/or different virtual machines
Hardware address translation unit (ATU) including TLB

Network Features I
64k nodes, hundreds of endpoints per node
Efficient network protocol: low overhead even for very small packets
Support for arbitrary direct topologies
Choice for implementation: 6 links
Natural topology: 3-D torus
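As an illustration of why six links map naturally onto a 3-D torus, the following minimal Python sketch computes the six neighbours of a node and the minimal hop count between two nodes. The function names and the 32 x 32 x 64 arrangement used for 64k nodes are assumptions made for this example; the slides do not specify torus dimensions or any addressing scheme.

```python
def torus_neighbors(coord, dims):
    """Return the six neighbours of a node in a 3-D torus.

    coord and dims are (x, y, z) tuples. Each dimension wraps around,
    so every node has one neighbour in the +/- direction of each axis.
    (If a dimension has size 2, both directions reach the same node.)
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % dims[axis]
            neighbors.append(tuple(n))
    return neighbors


def torus_hops(a, b, dims):
    """Minimal number of link traversals between nodes a and b."""
    total = 0
    for x, y, size in zip(a, b, dims):
        delta = abs(x - y)
        # Take the shorter way around the ring in this dimension.
        total += min(delta, size - delta)
    return total


if __name__ == "__main__":
    dims = (32, 32, 64)  # hypothetical layout: 32 * 32 * 64 = 65536 nodes
    print(torus_neighbors((0, 0, 0), dims))
    print(torus_hops((0, 0, 0), (16, 16, 32), dims))  # worst case for this layout
```

In this hypothetical layout the worst-case distance is half of each ring added up, which is why hop latency matters for large installations.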

Network Features II
[Diagram: EXTOLL network switch connecting network ports and link ports]
Adaptive and deterministic routing
Packets routed deterministically are delivered in-order
Three virtual channels (VCs)
Four independent traffic classes (TCs)
Support for remote configuration and monitoring of EXTOLL nodes without host interaction
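The slides do not say which deterministic routing algorithm EXTOLL uses. As a hedged illustration, the sketch below shows dimension-order routing, a common deterministic scheme for torus networks: a packet is fully routed along x, then y, then z. Because every packet between the same pair of nodes takes the same path, in-order delivery follows naturally. The function name and coordinates are hypothetical.

```python
def dimension_order_route(src, dst, dims):
    """Return the list of nodes visited from src to dst in a 3-D torus.

    Illustrative dimension-order routing (an assumption, not confirmed
    as the EXTOLL algorithm): correct one axis at a time, taking the
    shorter direction around each ring.
    """
    path = [tuple(src)]
    cur = list(src)
    for axis in range(3):
        size = dims[axis]
        forward = (dst[axis] - cur[axis]) % size  # hops when moving in +1 direction
        step = 1 if forward <= size - forward else -1
        while cur[axis] != dst[axis]:
            cur[axis] = (cur[axis] + step) % size
            path.append(tuple(cur))
    return path


if __name__ == "__main__":
    for node in dimension_order_route((0, 0, 0), (3, 1, 0), (4, 4, 4)):
        print(node)
```

An adaptive router would instead pick among several minimal paths depending on congestion, which gives up the ordering guarantee in exchange for better load balance.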

Reliability & Security Features
Reliable message transport
Link level retransmission protocol for reliable data transmission
All network protocol elements are secured by strong CRCs
All internal memory protected by ECC for high reliability
Isolation of communication groups in the network
Process isolation on the node
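To make the idea of CRC-protected, link-level retransmission concrete, here is a minimal software sketch. It is only an analogy: EXTOLL implements this in hardware, and the slides name neither the CRC polynomial nor the protocol details. The example uses CRC-32 from Python's standard `zlib` module as a stand-in, and `make_frame`, `check_frame`, `send_reliably` and the `channel` callable are all hypothetical names.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 (stand-in for the real link CRC) to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_frame(frame: bytes):
    """Return the payload if the CRC matches, otherwise None."""
    payload, received_crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    return payload if zlib.crc32(payload) == received_crc else None


def send_reliably(payload: bytes, channel, max_attempts: int = 8):
    """Resend the frame over one link until it arrives intact.

    channel is a callable that takes the transmitted frame and returns
    the (possibly corrupted) received frame. Returns the delivered
    payload and the number of attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        delivered = check_frame(channel(make_frame(payload)))
        if delivered is not None:
            return delivered, attempt
    raise RuntimeError("link did not deliver an intact frame")


if __name__ == "__main__":
    state = {"sent": 0}

    def flaky_link(frame: bytes) -> bytes:
        """Corrupt one bit of the first transmission only."""
        state["sent"] += 1
        if state["sent"] == 1:
            return bytes([frame[0] ^ 0x01]) + frame[1:]
        return frame

    print(send_reliably(b"hello", flaky_link))
```

Handling errors per link, rather than end to end, means a corrupted packet is repaired where the fault occurred and never propagates further into the network.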

Additional Features
Scalable barriers implemented completely in hardware
Global interrupts with low skew
Hardware support for multicast operations
Non-coherent shared memory features (for PGAS)

Architecture - Software
Optimized for HPC users
OS bypass
Linux kernel drivers manage resources
Low-level API libraries
MPI support: Open MPI
PGAS support: GASNet (prototype)
Management software

Prototype Performance Results
EXTOLL FPGA prototype
  Xilinx Virtex 4 based
  HT400, 16-bit
  150 MHz core frequency
  6 optical links
  6 Gbit/s link bandwidth
  ~1 μs end-to-end latency
  FPGA 90% filled
Test machines
  2 nodes: quad-socket quad-core Opteron 8354 Barcelona 2.2 GHz, 16 GB, Supermicro 1041MT2B/H8QME-2+, Open MPI 1.3.3
  Mellanox ConnectX IB SDR and SDR IB switch, Open MPI 1.3.2
  16 nodes: dual-socket Opteron 8380 Shanghai 2.5 GHz

EXTOLL Virtex4: Pingpong BW
[Chart: Intel MPI Benchmark, Pingpong bandwidth. X axis: message size in bytes (8 to 4096). Left Y axis: bandwidth in Mbytes/sec (0 to 300). Right Y axis: headroom in percent (0% to 400%). Series: IB, EXTOLL, Headroom. Headroom labels present in the extracted text: 355%, 375%, 284%, 178%, 98%, 39%, 45%, 34%, 26%, 21%; their assignment to message sizes is not recoverable.]

EXTOLL Virtex4: RandomAccess
[Chart: HPCC MPIRandomAccess with 32 processes on 2 nodes. X axis: table size (256 MWords, 2048 MWords). Y axis: GUPS (0 to 0.06). Series: IB, EXTOLL.]
GUPS = giga updates per second

EXTOLL Virtex4: WRF V3
[Chart: X axis: cores (32, 64, 96, 128). Y axis: GFlop/s (0 to 90). Series: Opteron cluster with IB DDR, Opteron cluster with EXTOLL R1.]
IB cluster: Opteron Shanghai 2358 2.6 GHz
EXTOLL cluster: Opteron Shanghai 2380 2.5 GHz

Hardware Roadmap
Timeline: 2010, 2011, 2012, 2013
Tourmalet: ASIC 65nm, HT2600 (10.4 GB/s), PCIe Gen3, 6 links: 10 Gbps x12 (>10 GB/s); development, then production in large quantities
Ventoux: Virtex6 FPGA, HT600 (2.4 GB/s), 6 links: 6 Gbps x4 (2.4 GB/s); production in small volume
R1: Virtex4 FPGA, HT400 (1.6 GB/s), 6 links: 6.24 Gbps; prototype evaluation
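The bandwidth figures on the roadmap can be reproduced with simple arithmetic. The sketch below assumes a 16-bit, double-data-rate HyperTransport link for all three generations (the 16-bit width is stated only for the Virtex4 prototype) and 8b/10b line coding on the serial links (not stated in the slides, but consistent with the 2.4 GB/s figure for Ventoux). Function names are hypothetical.

```python
def ht_bandwidth_gbytes(clock_mhz, width_bits=16):
    """Peak per-direction HyperTransport bandwidth in GB/s.

    HyperTransport transfers data on both clock edges, so a link
    clocked at clock_mhz performs 2 * clock_mhz million transfers/s.
    """
    transfers_per_second = clock_mhz * 1e6 * 2
    return transfers_per_second * width_bits / 8 / 1e9


def serial_link_gbytes(lanes, gbps_per_lane, coding_efficiency=0.8):
    """Payload bandwidth of one serial link in GB/s.

    coding_efficiency=0.8 models 8b/10b coding (an assumption):
    10 bits on the wire carry 8 bits of data.
    """
    raw_gbps = lanes * gbps_per_lane
    return raw_gbps * coding_efficiency / 8


if __name__ == "__main__":
    for name, clock in [("HT400", 400), ("HT600", 600), ("HT2600", 2600)]:
        print(name, round(ht_bandwidth_gbytes(clock), 2), "GB/s")
    print("Ventoux link  ", round(serial_link_gbytes(4, 6), 2), "GB/s")
    print("Tourmalet link", round(serial_link_gbytes(12, 10), 2), "GB/s")
```

Under these assumptions the host interface and a single network link are roughly balanced in each generation, which matches the pairing of figures on the slide.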

Ventoux add-in card

Performance

Performance metric                                   EXTOLL R1 (Virtex4)   EXTOLL R2 (Virtex6)   EXTOLL R2 (ASIC)
Basis                                                Measurement           Estimation            Estimation
Core frequency                                       156 MHz               ~300 MHz              >= 800 MHz
Internal datapath width                              32 bit                64 bit                128 bit
Raw link bandwidth                                   6.24 Gb/s             24 Gb/s               120 Gb/s
Half-roundtrip latency (incl. software, single hop)  ~1.2 μs               < 1.0 μs              400-600 ns
Message rates (MPI)                                  ~10 million           ~40 million           > 100 million
Per-hop latency                                      300 ns                < 150 ns              < 50 ns
Memory (de-)registration (one page)                  ~2 μs                 ~2 μs                 ~2 μs
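The single-hop and per-hop latencies can be combined into a rough estimate for multi-hop paths. The sketch below assumes a simple linear model in which the single-hop figure already includes the first hop and every additional hop adds the per-hop latency. This model is an assumption for illustration and ignores congestion and message size. The function name is hypothetical, and the Virtex6 and ASIC inputs are the upper bounds of the estimates quoted on the slide.

```python
def estimated_latency_ns(hops, single_hop_ns, per_hop_ns):
    """Estimate half-roundtrip latency over a path of `hops` links.

    Linear model (assumption): single-hop latency plus one per-hop
    latency for each additional hop.
    """
    if hops < 1:
        raise ValueError("hops must be at least 1")
    return single_hop_ns + (hops - 1) * per_hop_ns


if __name__ == "__main__":
    generations = {
        "R1 (Virtex4, measured)": (1200, 300),
        "R2 (Virtex6, upper bound of estimate)": (1000, 150),
        "R2 (ASIC, upper bound of estimate)": (600, 50),
    }
    for name, (single, per_hop) in generations.items():
        for hops in (1, 4, 16):
            print(name, hops, "hops:", estimated_latency_ns(hops, single, per_hop), "ns")
```

Under this model a 16-hop path is dominated by hop latency on the Virtex4 prototype (4500 ns of 5700 ns) but not on the ASIC estimate (750 ns of 1350 ns).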

Thank You!
[Images: prototype cluster, Mannheim; prototype cluster, Valencia (detail); prototype cluster, Valencia]
Computer Architecture Group, University of Heidelberg