Neural processing unit
A neural processing unit (NPU), also known as AI accelerator or deep learning processor, is a class of specialized hardware accelerator[1] or computer system[2][3] designed to accelerate artificial intelligence (AI) and machine learning applications, including artificial neural networks and computer vision.
Use
Their purpose is either to efficiently execute already trained AI models (inference) or to train AI models. Their applications include algorithms for robotics, the Internet of things, and other data-intensive or sensor-driven tasks.[4] They are often manycore designs and focus on low-precision arithmetic, novel dataflow architectures, or in-memory computing capability. As of 2024, a typical datacenter-grade AI integrated circuit, the H100 GPU, contains tens of billions of MOSFETs.[5]
Consumer devices
AI accelerators are used in mobile devices such as Apple iPhones and Huawei and Google Pixel smartphones,[7] and in AMD's AI engines[6] in Versal devices and NPUs. They appear in many Apple silicon, Qualcomm, Samsung, and Google Tensor smartphone processors.[8]
More recently (circa 2022), NPUs have been added to computer processors from Intel[9] and AMD,[10] and to Apple silicon.[11] All models of Intel Meteor Lake processors have a built-in versatile processor unit (VPU) for accelerating inference for computer vision and deep learning.[12]
On consumer devices, the NPU is intended to be small and power-efficient while remaining reasonably fast when running small models. To this end, NPUs are designed to support low-bitwidth operations using data types such as INT4, INT8, FP8, and FP16. A common performance metric is trillions of operations per second (TOPS), though this metric alone does not specify which kind of operations are being counted.[13]
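To make the relationship between low-bitwidth data types and the TOPS figure concrete, the following Python sketch shows symmetric INT8 quantization and the arithmetic typically behind a peak-TOPS claim. The MAC-unit count and clock rate are assumed values for illustration, not any real chip's specification.

```python
# Illustrative sketch, not vendor code: symmetric INT8 quantization of the
# kind NPUs rely on, plus the arithmetic behind a headline TOPS figure.
import numpy as np

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float32 values onto the symmetric INT8 range [-127, 127]."""
    scale = float(np.abs(x).max()) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(weights)
error = np.abs(weights - q.astype(np.float32) * scale).max()
print(f"max quantization error: {error:.4f}")

# A headline TOPS number is usually: MAC units x 2 ops per MAC x clock rate.
mac_units = 4096   # hypothetical number of parallel multiply-accumulate units
clock_hz = 1.5e9   # hypothetical 1.5 GHz clock
print(f"theoretical peak: {mac_units * 2 * clock_hz / 1e12:.1f} TOPS")
```

Note that such a peak figure counts every multiply and add at the lowest supported precision, which is why TOPS numbers quoted at INT4 or INT8 are not directly comparable to FP16 throughput.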
Datacenters
Accelerators are used in cloud computing servers, including tensor processing units (TPUs) in Google Cloud Platform[14] and Trainium and Inferentia chips in Amazon Web Services.[15] Many vendor-specific terms exist for devices in this category, and it remains an emerging technology without a dominant design.
Graphics processing units designed by companies such as Nvidia and AMD often include AI-specific hardware, and are commonly used as AI accelerators, both for training and inference.[16]
Programming
Mobile NPU vendors typically provide their own application programming interface (API), such as the Snapdragon Neural Processing Engine. An operating system or a higher-level library may provide a more generic interface, such as TensorFlow Lite with LiteRT Next (these examples are for Android; iOS has no equivalent public interface).
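As a minimal illustration of such a generic interface, the Python sketch below drives the TensorFlow Lite Interpreter API. The model file name is a placeholder, and whether the graph actually executes on an NPU depends on the vendor delegate available on the device.

```python
# Minimal sketch of inference through TensorFlow Lite's generic interface.
# "model.tflite" is a placeholder; on-device, a vendor-supplied hardware
# delegate determines whether the NPU actually executes the graph.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy input matching the model's expected shape and dtype.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
result = interpreter.get_tensor(output_details[0]["index"])
print(result.shape)
```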
Consumer CPU-integrated NPUs are accessible through vendor-specific APIs: AMD (Ryzen AI), Intel (OpenVINO), and Apple silicon (MLX) each have their own API, which higher-level libraries can build upon.
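As one concrete case, the following sketch uses OpenVINO's Python API to compile a model for Intel's NPU device, falling back to the CPU if no NPU is reported by the runtime. The model file name is a placeholder.

```python
# Minimal OpenVINO sketch targeting an Intel NPU. "model.xml" stands in for
# an OpenVINO IR model; available device names depend on the runtime.
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")

# Prefer the NPU if the runtime reports one, otherwise fall back to the CPU.
device = "NPU" if "NPU" in core.available_devices else "CPU"
compiled = core.compile_model(model, device)

infer = compiled.create_infer_request()
dummy = np.zeros(tuple(compiled.input(0).shape), dtype=np.float32)
result = infer.infer({0: dummy})
print(device, list(result.values())[0].shape)
```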
GPUs generally use existing GPGPU pipelines such as CUDA and OpenCL adapted for lower precisions. Custom-built systems such as the Google TPU use private interfaces.
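A minimal sketch of reduced-precision execution through one such general-purpose pipeline, using PyTorch on CUDA (illustrative, not vendor-specific): on recent Nvidia GPUs, an FP16 matrix multiply of this kind is dispatched to the AI-specific hardware noted above.

```python
# Sketch of reduced-precision execution on a GPU through a general-purpose
# GPGPU pipeline (PyTorch on CUDA), falling back to the CPU if no GPU exists.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.bfloat16

a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)

# autocast runs eligible ops such as matmul in reduced precision.
with torch.autocast(device_type=device, dtype=dtype):
    c = a @ b

print(c.dtype)  # torch.float16 on CUDA, torch.bfloat16 on CPU
```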
References
- ^ "Intel unveils Movidius Compute Stick USB AI Accelerator". July 21, 2017. Archived from the original on August 11, 2017. Retrieved August 11, 2017.
- ^ "Inspurs unveils GX4 AI Accelerator". June 21, 2017.
- ^ Wiggers, Kyle (November 6, 2019). "Neural Magic raises $15 million to boost AI inferencing speed on off-the-shelf processors". Archived from the original on March 6, 2020. Retrieved March 14, 2020.
- ^ "Google Designing AI Processors". May 18, 2016. Google using its own AI accelerators.
- ^ Moss, Sebastian (March 23, 2022). "Nvidia reveals new Hopper H100 GPU, with 80 billion transistors". Data Center Dynamics. Retrieved January 30, 2024.
- ^ Brown, Nick (February 12, 2023). "Exploring the Versal AI Engines for Accelerating Stencil-based Atmospheric Advection Simulation". Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays. FPGA '23. New York, NY, USA: Association for Computing Machinery: 91–97. doi:10.1145/3543622.3573047. ISBN 978-1-4503-9417-8.
- ^ "HUAWEI Reveals the Future of Mobile AI at IFA".
- ^ "Snapdragon 8 Gen 3 Mobile Platform Product Brief" (PDF). Qualcomm. https://docs.qualcomm.com/bundle/publicresource/87-71408-1_REV_B_Snapdragon_8_gen_3_Mobile_Platform_Product_Brief.pdf
- ^ "Intel's Lunar Lake Processors Arriving Q3 2024". Intel. May 20, 2024.
- ^ "AMD XDNA Architecture".
- ^ "Deploying Transformers on the Apple Neural Engine". Apple Machine Learning Research. Retrieved August 24, 2023.
- ^ "Intel to Bring a 'VPU' Processor Unit to 14th Gen Meteor Lake Chips". PCMAG. August 2022.
- ^ "A guide to AI TOPS and NPU performance metrics".
- ^ Jouppi, Norman P.; et al. (June 24, 2017). "In-Datacenter Performance Analysis of a Tensor Processing Unit". ACM SIGARCH Computer Architecture News. 45 (2): 1–12. arXiv:1704.04760. doi:10.1145/3140659.3080246.
- ^ "How silicon innovation became the 'secret sauce' behind AWS's success". Amazon Science. July 27, 2022. Retrieved July 19, 2024.
- ^ Patel, Dylan; Nishball, Daniel; Xie, Myron (November 9, 2023). "Nvidia's New China AI Chips Circumvent US Restrictions". SemiAnalysis. Retrieved February 7, 2024.
External links
- Nvidia Puts The Accelerator To The Metal With Pascal, The Next Platform
- Eyeriss Project, MIT