Case Study
Intel® Memory Failure Prediction
Meituan eCommerce Platform for Services

Intel® Memory Failure Prediction Improves Reliability at Meituan

Intel® Memory Failure Prediction uses machine learning to send alerts about potential memory failures before the hardware fails, thus reducing the impact of downtime.

Business
Meituan, founded in March 2010, is a company based in China that offers an online delivery and social commerce platform providing discounts for movie tickets, groceries, food, restaurants, entertainment, and health/fitness products and services.

Figure 1. Meituan Beijing Headquarters

Challenges
• Real-time visibility into server memory health
• Predicting catastrophic server memory failures before they happen

Solution
• Intel® Memory Failure Prediction

Executive Summary
Meituan-Dianping (Meituan), a leading Chinese e-commerce platform for services, set up Intel® Memory Failure Prediction (Intel® MFP) in a test deployment across several thousand servers based on Intel® Xeon® Scalable Processors. The goal was to help improve the performance and reliability of its server memory, which is essential to fast data analytics computing.

Meituan deployed Intel® MFP in its data center and integrated it into its existing management solutions to take advantage of its memory analysis and predictive capabilities. The aim is to help Meituan analyze and model server memory-failure data in order to predict potential failures, prevent downtime, and optimize its current Dual Inline Memory Module (DIMM) upgrade.

The Intel® MFP deployment improved memory reliability through predictions based on the analysis of micro-level memory failure logs. Intel® MFP allowed data center staff to migrate workloads before catastrophic memory failures could happen, use page offlining policies to isolate unreliable memory cells or pages, or replace failing DIMMs before they reached a terminal stage. Responding appropriately before a server failure occurs reduces downtime.
“We would thank Intel for Memory Failure Prediction collaboration with Meituan,” said Rui Guo, leader of Infrastructure/Server technology at Meituan. “The testing results indicate, with Intel® MFP’s prediction capabilities, it could significantly reduce server hardware failures by up to 40 percent.”

Background
Meituan-Dianping, a leading Chinese e-commerce platform for services, offers Meituan, Dianping, Takeaway, Taxi, Mobike, and other well-known apps to its customers. Services include catering, takeaway, taxi, bike, and hotels. There are more than 200 categories, such as tourism, film, and entertainment, and the business covers 2,800 cities.

To remain successful and competitive, Meituan has to be able to rely on the health of its data center infrastructure and to predict failures so that it can act proactively. Memory failures are among the top three hardware failures that occur in data centers today. Using machine learning to analyze real-time memory health data makes it possible to predict such failures ahead of time, which ultimately translates to a better experience for Meituan's customers.

This is why Meituan deployed Intel® MFP in a test environment containing several thousand servers with Intel® Xeon® Scalable Processors. Meituan integrated Intel® MFP into its existing data center monitoring solution and gained greater insight into server memory health.

Intel® MFP is an ideal solution for organizations that rely heavily on server hardware reliability, availability, and serviceability, such as online services platforms and cloud service providers. The solution helps to significantly reduce memory failure events by analyzing data and then predicting catastrophic events before they happen.
Intel® MFP Provides Real-time Memory Health Visibility
Intel® MFP uses machine learning to analyze server memory errors down to the DIMM, bank, column, row, and cell levels to generate