Tutorials – ICDCS 2018

Tutorial Overview

Tutorial Day: Monday, July 2, 2018

Tutorial 1 (T1): Communication and Agreement in Byzantine Asynchronous Systems
Tutorial 2 (T2): Building Efficient Cloud Middleware for HPC, Big Data, and Deep Learning Applications
Tutorial 3 (T3): Distributed Service Prototyping with Cloud Functions
Tutorial 4 (T4): High Performance Network Services: The Role of NFV and Kernel Bypass Networking

Tutorial Details

Tutorial 1 (T1):

Communication and Agreement in Byzantine Asynchronous Systems

Michel Raynal (IRISA, Université de Rennes, France, and Polytechnic University, Hong Kong)

Location: EI 1 (Track 1)
Time: 8:00 am – 12:00 pm

Abstract

Communication and agreement abstractions are fundamental in any distributed system. (If the computing entities do not need to communicate or agree in one way or another, the system is not a distributed system!) This tutorial is devoted to the design of such abstractions built on top of asynchronous distributed systems prone to Byzantine process failures. Such failures are among the most severe a process can exhibit: a Byzantine process is one that behaves arbitrarily. The tutorial is made up of three parts, each devoted to a given abstraction and the algorithms that implement it. The first two are related to communication, while the last one is on distributed agreement.

The first part focuses on the classical “reliable broadcast” abstraction, which allows processes to reliably disseminate information in a message-passing system prone to process failures. It first defines the problem, shows the requirements needed to solve it, and presents algorithms that solve it.
The second part considers the case where the upper layer communication abstraction consists of “atomic read/write registers”. It shows how such registers can be implemented despite the presence of Byzantine processes.
Finally, the third part of the tutorial considers the most basic “agreement abstraction”, namely consensus. It presents recent Byzantine consensus algorithms based on new all-to-all broadcast abstractions.

The content of this tutorial can be found in the book Fault-Tolerant Message-Passing Distributed Systems: An Algorithmic Approach (Springer, to appear in July 2018).

Michel Raynal is an Emeritus Professor of Informatics at IRISA, University of Rennes (France), and a Distinguished Chair Professor on Distributed Algorithms at Hong Kong Polytechnic University (PolyU).

His main research interests are the basic principles of distributed computing systems. Recognized as a world-leading researcher in distributed computing, he is the author of numerous papers on this topic (more than 170 articles in int’l scientific journals, and more than 330 papers in int’l conferences). (From a “purely numeric” point of view, his h-index is 58 and his i10-index is 260.) He is also well known for his books on distributed computing. Michel Raynal is a senior member of the prestigious “Institut Universitaire de France” and a member of Academia Europaea. He was the recipient of the 2015 Int’l Award “Innovation in Distributed Computing” (also known as the SIROCCO Prize).

Tutorial 2 (T2):

Building Efficient Cloud Middleware for HPC, Big Data, and Deep Learning Applications

Dhabaleswar K. Panda (The Ohio State University)
Xiaoyi Lu (The Ohio State University)

Location: EI 8 (Track 4)
Time: 8:00 am – 12:00 pm
URL: http://www.cse.ohio-state.edu/~panda/icdcs18_cloud_tut.html

Abstract

To alleviate the cost burden, sharing cluster resources with end users through virtualization is becoming necessary for modern cloud platforms. The recently introduced Single Root I/O Virtualization (SR-IOV) technique provides native I/O virtualization capabilities and is changing the landscape of I/O virtualization. In this tutorial, we first provide an overview of popular virtualization system software in cloud environments, such as hypervisors, containers, OpenStack, Slurm, etc., and high-performance communication mechanisms on clouds, such as InfiniBand, RDMA, SR-IOV, IVShmem, etc. We further discuss the opportunities and technical challenges of designing high-performance MPI runtimes for cloud environments. We also discuss how to integrate these designs into popular cloud management systems like OpenStack and Slurm. Next, we will demonstrate how high-performance solutions can be designed to run Big Data and Deep Learning workloads (like Hadoop, Spark, TensorFlow) in cloud environments. Finally, we will provide demos of running these designs on the Chameleon Cloud Testbed.

Dr. Dhabaleswar K. (DK) Panda is a Professor of Computer Science at the Ohio State University. His research interests include parallel computer architecture, high performance computing, communication protocols, file systems, network-based computing, and Quality of Service. He has published over 400 papers in major journals and international conferences related to these research areas. Dr. Panda and his research group members have been doing extensive research on modern networking technologies including InfiniBand, HSE and RDMA over Converged Enhanced Ethernet (RoCE). His research group is currently collaborating with National Laboratories and leading InfiniBand and 10GigE/iWARP companies on designing various subsystems of next generation high-end systems. The MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) open-source software libraries, developed by his research group, are currently being used by more than 2,900 organizations worldwide (in 86 countries). This software has enabled several InfiniBand clusters (including the 1st one) to get into the latest TOP500 ranking during the last decade. More than 466,000 downloads of these libraries have taken place from the project’s site. These software packages are also available with the Open Fabrics stack for network vendors (InfiniBand and iWARP), server vendors and Linux distributors. Dr. Panda’s research is supported by funding from the US National Science Foundation, the US Department of Energy, and several industry partners including Intel, Cisco, SUN, Mellanox, QLogic, NVIDIA and NetApp. He is an IEEE Fellow and a member of ACM. More details about Dr. Panda are available at http://www.cse.ohio-state.edu/~panda.

Dr. Xiaoyi Lu is a Research Scientist in the Department of Computer Science and Engineering at the Ohio State University, USA. His current research interests include high performance interconnects and protocols, Big Data Processing, Parallel Computing Models (MPI/PGAS), Virtualization and Cloud Computing. He has published over 90 papers in international journals and conferences related to these research areas. He has been actively involved in various professional activities in academic journals and conferences. Recently, Dr. Lu has been leading the research and development of RDMA-based accelerations for Apache Hadoop, Spark, HBase, and Memcached, and OSU HiBD micro-benchmarks, which are publicly available from http://hibd.cse.ohio-state.edu. These libraries are currently being used by more than 285 organizations from 34 countries. More than 26,150 downloads of these libraries have taken place from the project site. He is a core member of the MVAPICH2 project and is leading the research and development of MVAPICH2-Virt (high-performance and scalable MPI for hypervisor and container based HPC cloud). He is a member of IEEE and ACM. More details about Dr. Lu are available at http://www.cse.ohio-state.edu/~luxi.

Tutorial 3 (T3):

Distributed Service Prototyping with Cloud Functions

Josef Spillner (Zurich University of Applied Sciences)

Location: EI 9 (Track 5)
Time: 9:00 am – 5:30 pm

Abstract

This hands-on tutorial lets participants build an application based on cloud functions by combining serverless computing and service prototyping approaches. All necessary background information, including recent research results, is conveyed during the tutorial, which focuses on two parts: development and operation of functions.

Serverless computing is a growing industry trend with a corresponding rise in interest among scholars. Cloud providers have introduced popular services around this computing paradigm since 2014, including AWS Lambda, IBM and Google Cloud Functions, Azure Functions and several offerings based on Kubernetes extensions. On the industrial and academic research side, with 6 publications in 2016 and already 28 in 2017 according to the Serverless Literature Dataset, further growth can be expected in 2018. Increasingly, academic prototypes such as OpenLambda or EdgeScale are being proposed, especially in relation to cloud, edge and fog computing among other distributed computing specialisations. Due to the strict separation between elastically scalable stateless and stateful services in this computing paradigm, the resulting applications, which consist of composite cloud functions, are inherently distributed with favourable characteristics such as elastic scalability and disposability. Therefore, due to the expected profound impact on future engineering of distributed applications, this tutorial will provide the background knowledge and hands-on skills to researchers in this domain.

Service prototyping is a technique to rapidly deliver a fully running online service using appropriate tools and technologies. In this tutorial, the hands-on parts will result in a prototypical serverless application which remains accessible as a distributed service after the tutorial. The distribution will encompass a cross-provider composition involving at least two different cloud providers and Function-as-a-Service (FaaS) runtime technologies.

Josef Spillner is a senior lecturer and head of the Service Prototyping Lab at Zurich University of Applied Sciences in Switzerland. His research interests include cloud-native applications, service tooling and cloud accounting & billing. With his team, he works on challenging topics such as microservices, function-as-a-service and time series analysis. Before founding the lab, he conducted research at TUD, SAP, NTUU, UFCG and UniBZ and founded the Open Source Service Platform Research Initiative to promote re-usable software for scientific work. His work approach continues to promote international exchange, most recently at PTI, and modernisation of research and publication approaches. He published a doctoral dissertation about metaquality of services and a habilitation treatise about stealth computing in multi-cloud environments.

Tutorial 4 (T4):

High Performance Network Services: The Role of NFV and Kernel Bypass Networking

K. K. Ramakrishnan (University of California, Riverside)
Timothy Wood (George Washington University)

Location: EI 10 (Track 6)
Time: 9:00 am – 5:30 pm

Abstract

Network Function Virtualization (NFV) is an important area for networking at this time, as it brings much needed flexibility to accommodate innovation in an increasingly inflexible, large-scale, global network infrastructure. Recently, several high performance I/O libraries such as netmap, DPDK, and PF_RING have emerged to allow developers and researchers to build efficient NF prototypes. These libraries typically enable packet processing rates of 10 Gbps or higher by avoiding the kernel’s networking stack. While this has been a boon for accelerating individual applications, these libraries do not assist with the composition of NFs nor their management. Thus, while the low-level tools to build network functions are becoming available, there is also a need for platforms that provide the higher level abstractions needed to compose them into service chains, facilitate protocol processing, and manage their resources. This tutorial seeks to elucidate the challenges and opportunities in software-based networks by providing hands-on experience with key NFV technologies.

OpenNetVM is a management framework for high performance networked middleboxes and end-point applications. The framework provides an abstraction layer to simplify deployment of network functions in containers. It can be used not only for simple layer 2/3 middleboxes, but can also be integrated with a user-space TCP stack to provide end-host services. OpenNetVM is publicly available as open source on GitHub (https://github.com/sdnfv/openNetVM) and is being used by several academic and industrial groups.

This tutorial will provide valuable hands-on experience to distributed systems researchers interested in learning about high performance network middleboxes, and requires no prior NFV knowledge or experience. The tutorial will give attendees an overview of NFV technology, the motivation for NFV from the perspective of service providers, the DPDK I/O library, the mTCP user-space stack, and the OpenNetVM NF management platform. Attendees will get hands-on experience running a mixture of high performance middleboxes and end-host applications on an integrated platform managed by OpenNetVM.

Timothy Wood is an associate professor in the Department of Computer Science at George Washington University. Before joining GW, he received a doctoral degree in computer science from the University of Massachusetts Amherst and a bachelor’s degree in electrical and computer engineering from Rutgers University. His research studies how new virtualization technologies can provide application-agnostic tools that improve performance, efficiency, and reliability in cloud computing data centers and software-based networks. His PhD thesis received the UMass CS Outstanding Dissertation Award, his students have voted him CS Professor of the Year, and he has won three best paper awards, a Google Faculty Research Award, and an NSF CAREER Award.

K. K. Ramakrishnan is Professor of Computer Science and Engineering at the University of California, Riverside. Previously, he was a Distinguished Member of Technical Staff at AT&T Labs-Research. He joined AT&T Bell Labs in 1994 and was with AT&T Labs-Research since its inception in 1996. Prior to 1994, he was a Technical Director and Consulting Engineer in Networking at Digital Equipment Corporation. Between 2000 and 2002, he was at TeraOptic Networks, Inc., as Founder and Vice President.

Dr. Ramakrishnan is an ACM Fellow, an IEEE Fellow and an AT&T Fellow, recognized for his fundamental contributions to communication networks, including his work on congestion control, traffic management and VPN services. His work on the “DECbit” congestion avoidance protocol received the ACM SIGCOMM Test of Time Paper Award in 2006. He has published nearly 250 papers and has 167 patents issued in his name. K. K. has been on the editorial board of several journals and has served as the TPC Chair and General Chair for several networking conferences. He has also been a member of the National Research Council Panel on Information Technology for NIST. K. K. received his MTech from the Indian Institute of Science (1978), and his MS (1981) and Ph.D. (1983) in Computer Science from the University of Maryland, College Park, USA.