Alibaba Cloud Accelerates AI Applications

Analytics Zoo and bfloat16 improve the performance of AI applications on seventh-generation Alibaba Cloud ECS instances.

At a Glance:

  • Seventh-generation Alibaba Cloud high-frequency ECS instances use the third generation of X-Dragon Architecture and 3rd Generation Intel® Xeon® Scalable processors.

  • 3rd Generation Intel Xeon Scalable processors deliver industry-leading, workload-optimized platforms through enhanced Intel® Deep Learning Boost (Intel® DL Boost), a built-in artificial intelligence (AI) acceleration feature. Enhanced Intel DL Boost provides the industry's first x86 support for bfloat16, which improves AI inference and training performance.


Executive Overview

This paper describes how to use Analytics Zoo and Brain Floating Point 16-bit (bfloat16) to improve the performance of artificial intelligence (AI) applications running on seventh-generation Alibaba Cloud Elastic Compute Service (ECS) instances.

Seventh-generation Alibaba Cloud ECS instances are powered by 3rd Generation Intel® Xeon® Scalable processors, and they provide bfloat16 support.
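
In practice, enabling bfloat16 means running a model's compute-heavy operations in 16-bit precision while keeping the float32 dynamic range. The following is a minimal sketch, not the white paper's Analytics Zoo code: it assumes TensorFlow 2.4 or later built with oneDNN, and the model below is purely illustrative.

    # Minimal sketch: run compute-heavy ops in bfloat16 while keeping
    # model variables in float32. Assumes TensorFlow >= 2.4 built with
    # oneDNN; the model below is illustrative, not the white paper's workload.
    import tensorflow as tf

    tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        # Keep the output layer in float32 for numerical stability.
        tf.keras.layers.Dense(10, activation="softmax", dtype="float32"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

On 3rd Generation Intel Xeon Scalable processors, oneDNN dispatches these bfloat16 operations to the enhanced Intel DL Boost instructions; Analytics Zoo can then scale the same Keras-style model out across a cluster.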

3rd Generation Intel Xeon Scalable processors can process complex AI workloads. By using enhanced Intel DL Boost, 3rd Generation Intel Xeon Scalable processors can deliver up to 1.93 times the AI training performance for image classification,1 up to 1.87 times the AI inference performance for image classification,2 up to 1.7 times the AI training performance for natural language processing (NLP),3 and up to 1.9 times the AI inference performance for NLP, compared with previous-generation processors.4 Many AI training workloads from sectors such as healthcare, finance, and retail can benefit from the bfloat16 support provided by these processors.
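
Before enabling bfloat16, it is worth confirming that the instance's vCPUs actually expose the AVX-512 BF16 instructions used by enhanced Intel DL Boost. A minimal sketch, assuming a Linux guest such as a seventh-generation ECS instance; the helper function name is illustrative:

    # Check /proc/cpuinfo for the avx512_bf16 flag exposed by processors
    # with enhanced Intel DL Boost (Linux-only assumption).
    def cpu_supports_bf16(cpuinfo_path="/proc/cpuinfo"):
        with open(cpuinfo_path) as f:
            return any("avx512_bf16" in line
                       for line in f
                       if line.startswith("flags"))

    if __name__ == "__main__":
        print("AVX-512 BF16 available:", cpu_supports_bf16())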

Read the white paper: Accelerating AI Applications on Alibaba Cloud with Analytics Zoo and Bfloat16.

Product and Performance Information

1. Up to 1.93x higher AI training performance with a 3rd Generation Intel Xeon Scalable processor supporting Intel DL Boost with BF16 vs. a prior-generation processor on ResNet-50 throughput for image classification. New configuration: 1 node, 4 x 3rd Generation Intel Xeon Platinum 8380H processor (pre-production, 28 cores, 250 W) with 384 GB total memory (24 x 16 GB, 3,200 MT/s), 800 GB Intel SSD, ResNet-50 v1.5, ucode 0x700001b, Intel Hyper-Threading Technology (Intel HT Technology) on, Intel Turbo Boost Technology on, running Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic. Throughput: https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642769358b388d8f615ded9c213f10c99a, Model Zoo: https://github.com/IntelAI/models -b v1.6.1, ImageNet dataset, oneDNN 1.4, BF16, BS=512, tested by Intel on 5/18/2020. Baseline: 1 node, 4 x Intel Xeon Platinum 8280 processors with 768 GB total memory (24 x 32 GB, 2,933 MT/s), 800 GB Intel SSD, ucode 0x4002f00, Intel HT Technology on, Intel Turbo Boost Technology on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, ResNet-50 v1.5. Throughput: https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Model Zoo: https://github.com/IntelAI/models -b v1.6.1, ImageNet dataset, oneDNN 1.4, FP32, BS=512, tested by Intel on 5/18/2020.
2. Up to 1.87x higher AI inference performance with a 3rd Generation Intel Xeon Scalable processor supporting Intel DL Boost with BF16 vs. prior-generation processors using FP32, on ResNet-50 throughput for image classification. New configuration: 1 node, 4 x 3rd Generation Intel Xeon Platinum 8380H processor (pre-production, 28 cores, 250 W) with 384 GB total memory (24 x 16 GB, 3,200 MT/s), 800 GB Intel SSD, ucode 0x700001b, Intel HT Technology on, Intel Turbo Boost Technology on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, ResNet-50 v1.5. Throughput: https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388e8r615ded0c213f10c99a, Model Zoo: https://github.com/IntelAI/models -b v1.6.1, ImageNet dataset, oneDNN 1.4, BF16, BS=56, 5 instances, 28 cores/instance, tested by Intel on 5/18/2020. Baseline: 1 node, 4 x Intel Xeon Platinum 8280 processors with 768 GB total memory (24 x 32 GB, 2,933 MT/s), 800 GB Intel SSD, ucode 0x4002f00, Intel HT Technology on, Intel Turbo Boost Technology on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, ResNet-50 v1.5. Throughput: https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Model Zoo: https://github.com/IntelAI/models -b v1.6.1, ImageNet dataset, oneDNN 1.5, FP32, BS=56, 4 instances, 28 cores/instance, tested by Intel on 5/18/2020.
3. Up to 1.7x higher AI training performance with a 3rd Generation Intel Xeon Scalable processor supporting Intel DL Boost with BF16 vs. a prior-generation processor on BERT throughput for natural language processing. New configuration: 1 node, 4 x 3rd Generation Intel Xeon Platinum 8380H processor (pre-production, 28 cores, 250 W) with 384 GB total memory (24 x 16 GB, 3,200 MT/s), 800 GB Intel SSD, ucode 0x700001b, Intel HT Technology on, Intel Turbo Boost Technology on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, BERT-Large (QA). Throughput: https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388e8r615ded0c213f10c99a, Model Zoo: https://github.com/IntelAI/models -b v1.6.1, SQuAD 1.1 dataset, oneDNN 1.4, BF16, BS=12, tested by Intel on 5/18/2020. Baseline: 1 node, 4 x Intel Xeon Platinum 8280 processors with 768 GB total memory (24 x 32 GB, 2,933 MT/s), 800 GB Intel SSD, ucode 0x4002f00, Intel HT Technology on, Intel Turbo Boost Technology on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, BERT-Large (QA). Throughput: https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Model Zoo: https://github.com/IntelAI/models -b v1.6.1, SQuAD 1.1 dataset, oneDNN 1.5, FP32, BS=12, tested by Intel on 5/18/2020.
4. Up to 1.9x higher AI inference performance with a 3rd Generation Intel Xeon Scalable processor supporting Intel DL Boost with BF16 vs. a prior-generation processor with FP32, on BERT throughput for natural language processing. New configuration: 1 node, 4 x 3rd Generation Intel Xeon Platinum 8380H processor (pre-production, 28 cores, 250 W) with 384 GB total memory (24 x 16 GB, 3,200 MT/s), 800 GB Intel SSD, ucode 0x700001b, Intel HT Technology on, Intel Turbo Boost Technology on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, BERT-Large (QA). Throughput: https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388e8r615ded0c213f10c99a, Model Zoo: https://github.com/IntelAI/models -b v1.6.1, SQuAD 1.1 dataset, oneDNN 1.4, BF16, BS=32, 4 instances, 28 cores/instance, tested by Intel on 5/18/2020. Baseline: 1 node, 4 x Intel Xeon Platinum 8280 processors with 768 GB total memory (24 x 32 GB, 2,933 MT/s), 800 GB Intel SSD, ucode 0x4002f00, Intel HT Technology on, Intel Turbo Boost Technology on, with Ubuntu 20.04 LTS, Linux 5.4.0-26,28,29-generic, BERT-Large (QA). Throughput: https://github.com/Intel-tensorflow/tensorflow -b bf16/base, commit#828738642760358b388d8f615ded0c213f10c99a, Model Zoo: https://github.com/IntelAI/models -b v1.6.1, SQuAD 1.1 dataset, oneDNN 1.5, FP32, BS=32, 4 instances, 28 cores/instance, tested by Intel on 5/18/2020.