Summary
Practical patterns for scaling machine learning from your laptop to a distributed cluster.
In Distributed Machine Learning Patterns you will learn how to:
Apply distributed systems patterns to build scalable and reliable machine learning projects
Construct machine learning pipelines with data ingestion, distributed training, model serving, and more
Automate machine learning tasks with Kubernetes, TensorFlow, Kubeflow, and Argo Workflows
Make trade-offs between different patterns and approaches
Manage and monitor machine learning workloads at scale
Scaling up models from standalone devices to large distributed clusters is one of the biggest challenges faced by modern machine learning practitioners. Distributed Machine Learning Patterns teaches you how to scale machine learning models from your laptop to large distributed clusters.
In Distributed Machine Learning Patterns, you'll learn how to apply established distributed systems patterns to machine learning projects, and explore new ML-specific patterns as well. Firmly rooted in the real world, this book demonstrates how to apply patterns using examples based on TensorFlow, Kubernetes, Kubeflow, and Argo Workflows. Real-world scenarios, hands-on projects, and clear, practical DevOps techniques let you easily launch, manage, and monitor cloud-native distributed machine learning pipelines.
About the technology
Scaling up models from standalone devices to large distributed clusters is one of the biggest challenges faced by modern machine learning practitioners. Distributed machine learning systems allow developers to handle extremely large datasets across multiple clusters, take advantage of automation tools, and benefit from hardware acceleration. In this book, Kubeflow co-chair Yuan Tang shares patterns, techniques, and experience gained from years spent building and managing cutting-edge distributed machine learning infrastructure.
About the book
Distributed Machine Learning Patterns is filled with practical patterns for running machine learning systems on distributed Kubernetes clusters in the cloud. Each pattern is designed to help solve common challenges faced when building distributed machine learning systems, including supporting distributed model training, handling unexpected failures, and coping with dynamic model serving traffic. Real-world scenarios provide clear examples of how to apply each pattern, alongside the potential trade-offs of each approach.
Once you've mastered these cutting-edge techniques, you'll put them all into practice and finish up by building a comprehensive distributed machine learning system.
Table of contents
Part 1: Basic concepts and background
1 Introduction to distributed machine learning systems
Part 2: Patterns of distributed machine learning systems
2 Data ingestion patterns
3 Distributed training patterns
4 Model serving patterns
5 Workflow patterns
6 Operation patterns
Part 3: Building a distributed machine learning pipeline
7 Overview of project architecture
8 Overview of relevant technologies
9 A complete implementation
Yuan Tang is currently a founding engineer at Akuity. Previously he was a senior software engineer at Alibaba Group, building AI infrastructure and AutoML platforms on Kubernetes. Yuan is a co-chair of Kubeflow and a maintainer of Argo, TensorFlow, XGBoost, and Apache MXNet. He is the co-author of TensorFlow in Practice and author of the TensorFlow implementation of Dive into Deep Learning.