ChatGPT and other massive models represent an amazing step forward in AI, yet they do not solve real-world business problems. In this session, Jordan Plawner, Global Director of Artificial Intelligence Product Management and Strategy at Intel, surveys how the AI ecosystem has worked non-stop over the last year to take these all-purpose, multi-task models and optimize them so they can be used by organizations to address domain-specific problems. He explains these new AI-for-the-real-world techniques and methods, such as fine-tuning, and how they can be applied to deliver results that are highly performant with state-of-the-art accuracy while also being economical to build and deploy everywhere to enhance products and services.
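To make the fine-tuning idea concrete, here is a minimal sketch (not material from the session; the model name, labels, and example texts are placeholders) of adapting a general pretrained model to a domain-specific classification task in PyTorch:

```python
# Minimal fine-tuning sketch (illustrative only; model name and data are placeholders).
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Hypothetical domain-specific examples: (text, label) pairs.
train_pairs = [("invoice overdue by 30 days", 1), ("thanks for the quick delivery", 0)]

def collate(batch):
    texts, labels = zip(*batch)
    enc = tokenizer(list(texts), padding=True, truncation=True, return_tensors="pt")
    enc["labels"] = torch.tensor(labels)
    return enc

loader = DataLoader(train_pairs, batch_size=2, collate_fn=collate, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for epoch in range(3):
    for batch in loader:
        optimizer.zero_grad()
        loss = model(**batch).loss  # cross-entropy over the new domain labels
        loss.backward()
        optimizer.step()
```

The pretrained weights provide the general language capability; only a small, task-specific head and a short training run on domain data are added, which is what keeps this approach economical relative to training from scratch.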
Videos
TorchTitan is a proof-of-concept for large-scale LLM training using native PyTorch. The repo showcases PyTorch's latest distributed training features in a clean, minimal codebase.
In this talk, Tianyu will share TorchTitan’s design and optimizations for the Llama 3.1 family of LLMs, spanning 8 billion to 405 billion parameters, and showcase its performance, composability, and scalability.
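For context, the sketch below shows the kind of native PyTorch distributed feature such a codebase builds on; it is a generic FullyShardedDataParallel example, not TorchTitan code, and the toy model and launch command are assumptions:

```python
# Generic PyTorch FSDP sketch (illustrative; not TorchTitan code).
# Launch with: torchrun --nproc_per_node=<num_gpus> fsdp_example.py
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    # Toy transformer-like stack standing in for a real LLM.
    model = torch.nn.Sequential(
        torch.nn.Embedding(32000, 512),
        torch.nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
        torch.nn.Linear(512, 32000),
    ).cuda()

    # Shard parameters, gradients, and optimizer state across ranks.
    model = FSDP(model)
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

    tokens = torch.randint(0, 32000, (4, 128), device="cuda")
    logits = model(tokens)
    loss = torch.nn.functional.cross_entropy(
        logits.view(-1, 32000), tokens.view(-1)
    )
    loss.backward()
    optimizer.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```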
As large-scale machine learning becomes increasingly GPU-centric, modern high-performance hardware such as NVMe storage and RDMA networks (InfiniBand or specialized NICs) is becoming more widespread. To fully leverage these resources, it’s crucial to build a balanced architecture that avoids GPU underutilization. In this talk, we will explore various strategies to address this challenge by effectively utilizing these advanced hardware components. Specifically, we will present experimental results from building a Kubernetes-native distributed caching layer that uses NVMe storage and high-speed RDMA networks to optimize data access for PyTorch training.
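As a rough sketch of where such a caching layer meets the training code (the mount path and file layout below are assumptions, not details from the talk), the PyTorch data pipeline can simply read samples from the cache-backed volume while the rest of the training loop stays unchanged:

```python
# Illustrative sketch: a Dataset reading from a locally mounted cache volume.
# The mount point /mnt/cache and the .pt file layout are hypothetical.
from pathlib import Path
import torch
from torch.utils.data import Dataset, DataLoader

class CachedTensorDataset(Dataset):
    """Reads pre-serialized samples from a cache-backed mount (e.g. NVMe-backed)."""

    def __init__(self, root="/mnt/cache/dataset"):
        self.files = sorted(Path(root).glob("*.pt"))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        sample = torch.load(self.files[idx])  # served from the distributed cache tier
        return sample["input"], sample["label"]

# Multiple workers with pinned memory help keep GPUs fed from the fast cache tier.
loader = DataLoader(
    CachedTensorDataset(),
    batch_size=64,
    num_workers=8,
    pin_memory=True,
)
```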