Compiling SYCL* for Different GPUs

Published: 05/11/2022

Overview

This document demonstrates how a SYCL* application can be compiled and executed on different graphics processing units (GPUs) from Intel, AMD, NVIDIA*, etc.

A Simple SYCL Example

The following is a sample SYCL program that we will be using to compile on different GPUs, captured in the file vector_add.cpp:

#include <iostream>
#include <CL/sycl.hpp>

using namespace sycl;
class vector_addition;

int main(int, char**) {

   float4 vec_a = { 2.0, 3.0, 7.0, 4.0 };
   float4 vec_b = { 4.0, 6.0, 1.0, 3.0 };
   float4 vec_c = { 0.0, 0.0, 0.0, 0.0 };

   default_selector device_selector;

   queue queue(device_selector);

   std::cout << "Running on "
             << queue.get_device().get_info<info::device::name>()
             << "\n";
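   {
      // Buffers wrap the host vectors. When they go out of scope at the end
      // of this block, the runtime waits for the kernel and copies the
      // result back into vec_c.
      buffer<float4, 1> vec_a_sycl(&vec_a, range<1>(1));
      buffer<float4, 1> vec_b_sycl(&vec_b, range<1>(1));
      buffer<float4, 1> vec_c_sycl(&vec_c, range<1>(1));

      queue.submit([&] (cl::sycl::handler& cgh) {

         // Accessors declare how the kernel uses each buffer.
         auto vec_a_acc = vec_a_sycl.get_access<access::mode::read>(cgh);
         auto vec_b_acc = vec_b_sycl.get_access<access::mode::read>(cgh);
         auto vec_c_acc = vec_c_sycl.get_access<access::mode::discard_write>(cgh);

         // A single work-item adds the two float4 vectors element-wise.
         cgh.single_task<class vector_addition>([=] () {
            vec_c_acc[0] = vec_a_acc[0] + vec_b_acc[0];
         });
      });
   }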

   std::cout << "  Vec_A { " << vec_a.x() << ", " << vec_a.y() << ", " << vec_a.z() << ", " << vec_a.w() << " }\n"
        << "+ Vec_B { " << vec_b.x() << ", " << vec_b.y() << ", " << vec_b.z() << ", " << vec_b.w() << " }\n"
        << "----------------------\n"
        << "= Vec_C { " << vec_c.x() << ", " << vec_c.y() << ", " << vec_c.z() << ", " << vec_c.w() << " }"
        << std::endl;

   return 0;
}    

The following is the expected output. The device name on the first line depends on the device selected at run time:

Running on Intel® Xeon® Gold 6128 CPU @ 3.40GHz
  Vec_A { 2, 3, 7, 4 }
+ Vec_B { 4, 6, 1, 3 }
----------------------
= Vec_C { 6, 9, 8, 7 }

Intel® oneAPI DPC++/C++ Compiler

The Intel® oneAPI DPC++/C++ Compiler compiles C++ and SYCL source files with code for both CPU and a wide range of compute accelerators such as GPU and FPGA. The Intel oneAPI DPC++/C++ Compiler provides optimizations that help your applications to run faster on Intel® 64 and IA-32 (Windows* and Linux* only) architectures, with support for the latest C, C++, and DPC++ language standards (including C++17).

The Intel oneAPI DPC++/C++ Compiler is available as part of the Intel® oneAPI Base Toolkit, Intel oneAPI HPC Toolkit, Intel oneAPI IoT Toolkit, or as a standalone compiler. Refer to the Intel oneAPI DPC++/C++ Compiler main page for more information about features, specifications, and downloads, and Intel oneAPI DPC++/C++ Compiler System Requirement for dependencies to be downloaded.

Intel® oneAPI DPC++/C++ Compiler for Intel Hardware

Dependencies Required

  • Intel® oneAPI DPC++/C++ Compiler

Build and Set Up the Compiler

The Intel oneAPI DPC++/C++ Compiler is part of the Intel oneAPI Base Toolkit package, which includes compilers, libraries, and profiling tools. The Intel oneAPI Base Toolkit can be installed using the apt package name: intel-basekit

You can also install just the compiler using the apt package name: intel-oneapi-compiler-dpcpp-cpp

More information on DPC++/C++ compiler installation can be found here: Intel® oneAPI Toolkits Installation Guide for Linux* OS.

You can find the list of package names here: Intel oneAPI packages.

Installation Steps:

  1. Set up the repository:
# Download the key to the system keyring
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null

# Add a signed entry to the apt sources and configure the APT client to use the Intel repository
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
  2. Update the package list and repository index:
sudo apt-get update
  3. Install the compiler with the following command:
sudo apt install intel-oneapi-compiler-dpcpp-cpp
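
After installation, the compiler driver is typically not on PATH until the oneAPI environment script has been sourced. The following is a minimal sketch for checking this. It assumes the default installation prefix /opt/intel/oneapi (adjust it if the toolkit was installed elsewhere), and require_tool is a hypothetical helper name, not part of the toolkit:

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Assumed default install location; skipped if the file does not exist.
if [ -f /opt/intel/oneapi/setvars.sh ]; then
    . /opt/intel/oneapi/setvars.sh >/dev/null 2>&1
fi

# Check for the compiler driver used in the next section.
require_tool dpcpp || echo "dpcpp is not on PATH; source setvars.sh from your install prefix"
```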

Compiling SYCL Example

  1. To build vector_add using DPC++/C++ for an Intel® graphics processing unit (GPU), use:
dpcpp vector_add.cpp -o vector_add-sycl
  2. To run vector_add, use:
./vector_add-sycl

DPC++/C++ Compiler Options

The following table shows the list of options supported by Intel oneAPI DPC++/C++ compiler:

Option                     Description
FC                         Displays the full path of source files passed to the compiler in diagnostics.
fexceptions                Enables exception handling table generation.
Fp                         Lets you specify an alternate path or file name for precompiled header files.
fsycl                      Enables a program to be compiled as a SYCL* program rather than as a plain C++11 program.
fsycl-early-optimizations  Enables LLVM-related optimizations before SPIR-V* generation.
fsycl-help                 Causes help information to be emitted from the device compiler backend. This content is specific to DPC++.
fsycl-targets              Tells the compiler to generate code for specified device targets.
g                          Tells the compiler to generate a level of debugging information in the object file.
O                          Specifies the code optimization for applications.
w                          Disables all warning messages.
Wall                       Enables warning and error diagnostics.
dpcpp --help               Displays help for DPC++ specific options.
dpcpp --version            Displays the DPC++/C++ compiler version.

The Intel oneAPI DPC++/C++ Compiler supports many more compiler options that can be used in applications. Refer to the alphabetical list of compiler options, which includes a short description of each.

DPC++ LLVM (Clang-LLVM)

Data Parallel C++ (DPC++) is an LLVM-based compiler project that implements compiler and runtime support for the SYCL language.

This project is an open source collaboration on the DPC++ compiler implementation in LLVM across a variety of architectures, prototyping compiler and runtime library solutions, designing future extensions, and conducting experiments.

The LLVM Core libraries provide a modern source and target-independent optimizer, along with code generation support for many popular CPUs. These libraries are built around a well-specified code representation known as the LLVM intermediate representation.

Clang is an LLVM native C/C++/Objective-C compiler, which aims to deliver amazingly fast compiles, extremely useful error and warning messages, and to provide a platform for building great source-level tools.

The compiler can be downloaded from LLVM/Clang Intel.

DPC++ LLVM for NVIDIA*

Dependencies Required

  • Git
  • CMake* version 3.14 or later
  • Python*
  • Ninja
  • C++ compiler: GCC version 7.1.0 or later (including libstdc++)
  • The CUDA* toolkit by NVIDIA

Build and Set Up the Compiler

Build a Clang-LLVM compiler on Linux with NVIDIA CUDA support by following the instructions for the DPC++ toolchain with NVIDIA CUDA support.

To use DPC++ LLVM for CUDA, compile the DPC++ LLVM project for targeting NVIDIA devices. Follow the commands below to compile and install DPC++ for CUDA.

Passing the --cuda flag to configure.py requires an installation of CUDA 10.2 on the system; refer to the NVIDIA CUDA Installation Guide for Linux.

git clone https://github.com/intel/llvm.git -b sycl
cd llvm
python ./buildbot/configure.py --cuda -t release --cmake-gen "Unix Makefiles"
cd build
make install -j `nproc`

Set Up and Build the SYCL Example

  1. Set the paths for the compiler with NVIDIA support:
export DPCPP_HOME=<COMPILER_PATH>/llvm/build
export PATH=$DPCPP_HOME/bin:$PATH
export LD_LIBRARY_PATH=$DPCPP_HOME/lib:$LD_LIBRARY_PATH
  2. To build vector_add using Clang-LLVM for an NVIDIA GPU, use:
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda vector_add.cpp -o vector_add-cuda

Note: -fsycl-targets=nvptx64-nvidia-cuda is the flag that selects the CUDA target.

  3. To run vector_add, use:
./vector_add-cuda

DPC++ LLVM for AMD

Dependencies Required

  • Git
  • CMake version 3.14 or later
  • Python 
  • Ninja
  • C++ compiler: GCC version 7.1.0 or later (including libstdc++)
  • ROCm toolkit by AMD

Build and Set Up the Compiler

Build a Clang-LLVM compiler on Linux with AMD HIP support by following the instructions for the DPC++ toolchain with AMD ROCm support.

To enable support for HIP devices, follow the instructions for the Linux DPC++ toolchain, but add the --hip flag to configure.py.

Enabling the --hip flag requires an installation of ROCm on the system; refer to the AMD ROCm Installation Guide for Linux.

git clone https://github.com/intel/llvm.git -b sycl
cd llvm
python ./buildbot/configure.py --hip -t release --cmake-gen "Unix Makefiles"
cd build
make install -j `nproc`

Set Up and Build the SYCL Example

  1. Set the paths for the DPC++ compiler with HIP support:
export DPCPP_HIP_COMPILER_ROOTDIR=<COMPILER_PATH>/llvm/build
export PATH=$DPCPP_HIP_COMPILER_ROOTDIR/bin:$PATH
export LD_LIBRARY_PATH=$DPCPP_HIP_COMPILER_ROOTDIR/lib:$LD_LIBRARY_PATH
  2. To build vector_add using Clang-LLVM for an AMD GPU, use:
clang++ -fsycl -fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend --offload-arch=gfx906 vector_add.cpp -o vector_add-amd

Note: When building for AMD HIP, use the AMD target triple and specify the target architecture with -Xsycl-target-backend --offload-arch=<arch>.

  3. To run vector_add, use:
./vector_add-amd
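
The value passed to --offload-arch has to match the installed GPU. One way to look it up is sketched below. It assumes that the rocminfo utility shipped with ROCm is installed and that it prints architecture names of the form gfxNNN; the sample text in the demo line only illustrates that assumed format. first_gfx_arch is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read rocminfo-style text on stdin and print the
# first architecture name of the form gfxNNN (for example gfx906).
first_gfx_arch() {
    grep -o 'gfx[0-9a-f][0-9a-f]*' | head -n 1
}

# Demo on illustrative sample text (assumed format, not real tool output).
printf '  Name:                    gfx906\n' | first_gfx_arch

# On a machine with ROCm installed, query the real device instead.
if command -v rocminfo >/dev/null 2>&1; then
    arch=$(rocminfo | first_gfx_arch)
    echo "use: -Xsycl-target-backend --offload-arch=$arch"
fi
```

The demo line prints gfx906. If several different AMD GPUs are installed, only the first match is reported, so the full rocminfo output should be checked by hand in that case.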

DPC++ LLVM Compiler Options

The following table shows the list of SYCL-specific options supported by the compiler.

Option                                      Description
-fsycl                                      General enabling option for SYCL* compilation and linking mode.
-fsycl-targets=<T1>[,...,<Tn>]              Enables ahead-of-time (AOT) compilation for the specified device targets.
-f[no-]sycl-unnamed-lambda                  Enables/disables support for unnamed SYCL lambda kernels.
-f[no-]sycl-early-optimizations             Enables/disables the intermediate representation optimization pipeline before translation to SPIR-V*.
-f[no-]sycl-dead-args-optimization          Enables/disables the LLVM IR dead argument elimination pass, which removes unused kernel function arguments before translation to SPIR-V.
-fsycl-device-code-split=<mode>             Specifies SYCL device code module assembly.
-f[no-]sycl-device-lib=<lib1>[,<lib2>,...]  Enables/disables linking of the device libraries.
-fsycl-device-only                          Compiles only the device part of the code and ignores the host part.
-fsycl-help[=backend]                       Emits help information from the device compiler backend.
-fsycl-host-compiler=<arg>                  Informs the compiler driver that the host compilation step of the overall compilation flow will be performed by the compiler <arg>.
-fsycl-host-compiler-options="opts"         Passes the quoted options to the host compiler. The options used here must be compatible with the compiler specified via -fsycl-host-compiler=<arg>.

For more SYCL-specific compiler options, along with descriptions and examples, refer to the Users Manual.
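
The NVIDIA and AMD command lines used in this article differ only in the -fsycl-targets value and, for AMD, the backend architecture option. The sketch below makes that difference explicit by printing the command for a given vendor. sycl_compile_cmd is a hypothetical helper written for illustration, and the gfx906 default is simply the architecture used in the AMD example:

```shell
#!/bin/sh
# Hypothetical helper: print the clang++ command line for a GPU vendor.
# Usage: sycl_compile_cmd <nvidia|amd> <source> <output> [amd-arch]
sycl_compile_cmd() {
    vendor=$1
    src=$2
    out=$3
    arch=${4:-gfx906}   # default taken from the AMD example in this article
    case "$vendor" in
        nvidia)
            echo "clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda $src -o $out"
            ;;
        amd)
            echo "clang++ -fsycl -fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend --offload-arch=$arch $src -o $out"
            ;;
        *)
            echo "unknown vendor: $vendor" >&2
            return 1
            ;;
    esac
}

# Print both command lines; nothing is compiled here.
sycl_compile_cmd nvidia vector_add.cpp vector_add-cuda
sycl_compile_cmd amd vector_add.cpp vector_add-amd gfx906
```

The helper only prints the commands, so it can be run on a machine without either toolchain to review what would be executed.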

hipSYCL

hipSYCL is a modern SYCL implementation targeting CPUs and GPUs, including AMD and NVIDIA GPUs, with a focus on utilizing existing toolchains such as CUDA or HIP. It supports compiling source files into a single binary that can run on all these backends when building against appropriate Clang distributions.

The hipSYCL compiler can compile not only SYCL code but also CUDA/HIP code, even when they are mixed in the same source file. This makes all CUDA/HIP features, such as the latest device intrinsics, available from SYCL code. Additionally, vendor-optimized template libraries such as rocPRIM or CUB can also be used with hipSYCL. Consequently, hipSYCL allows for highly optimized code paths in SYCL code for specific devices.

hipSYCL includes two components: a SYCL runtime built on top of the CUDA/HIP runtime, and a compiler plugin that compiles SYCL using the CUDA frontend of Clang.

hipSYCL currently targets the following devices:

  • Any CPU via OpenMP*
  • NVIDIA GPUs via CUDA
    • Using Clang's CUDA toolchain
    • As a library for NVIDIA's nvc++ compiler
  • AMD GPUs via HIP/ROCm
  • Intel GPUs via oneAPI Level Zero and SPIR-V*

The compiler can be downloaded from hipSYCL.

hipSYCL for NVIDIA

Dependencies Required

In order to successfully build and install hipSYCL, the following dependencies must be installed for all backends:

  • Python 3 (for the syclcc and syclcc-clang compiler wrappers)
  • CMake
  • The Boost C++ libraries (in particular boost.fiber, boost.context and for the unit tests boost.test)
    • It may be helpful to set the BOOST_ROOT CMake variable to the path to the root directory of Boost that you want to use if CMake does not find it automatically
  • LLVM and Clang
  • CUDA (follow the CUDA installation instructions)
  • NVIDIA CUDA GPUs

Build and Set Up the Compiler

Once the software requirements mentioned above are met, clone the repository:

git clone https://github.com/illuhad/hipSYCL

Then, create a build directory and compile hipSYCL. As described below, some backends and compilation flows must be configured with specific CMake arguments, which should be passed during the CMake step.

sudo apt install llvm-12
sudo apt install clang-12
sudo apt install libclang-12-dev

cd <build directory>

cmake -DCMAKE_INSTALL_PREFIX=<installation prefix> -DLLVM_DIR=<llvm path> -DCLANG_INCLUDE_PATH=<clang path> -DWITH_CUDA_BACKEND=ON <more optional options> <hipSYCL source directory>

make install

Build SYCL Example

  1. To build vector_add using hipSYCL for an NVIDIA GPU, use:
bin/syclcc -O2 --hipsycl-targets="cuda:sm_70" -o vector_add_hip_cuda ./vector_add.cpp
  2. To run vector_add, use:
./vector_add_hip_cuda
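
The sm_70 target in the command above corresponds to a GPU with compute capability 7.0; other GPUs need a different value. The sketch below derives the target name from the compute capability. It assumes an NVIDIA driver recent enough for nvidia-smi to support the compute_cap query field, and cc_to_sm is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: convert a compute capability such as "7.0" into
# the sm_XY form used in --hipsycl-targets="cuda:sm_XY".
cc_to_sm() {
    echo "sm_$(echo "$1" | tr -d '.')"
}

# Demo with the value matching the example above.
cc_to_sm 7.0

# On a machine with an NVIDIA driver, query the first GPU instead.
# Assumption: this driver version supports the compute_cap query field.
if command -v nvidia-smi >/dev/null 2>&1; then
    cc=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)
    if [ -n "$cc" ]; then
        echo "--hipsycl-targets=\"cuda:$(cc_to_sm "$cc")\""
    fi
fi
```

The demo line prints sm_70.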

hipSYCL for AMD

Dependencies Required

In order to successfully build and install hipSYCL, the following dependencies must be installed for all backends:

  • Python 3 (for the syclcc and syclcc-clang compiler wrappers)
  • CMake
  • The Boost C++ libraries (in particular boost.fiber, boost.context and for the unit tests boost.test)
    • It may be helpful to set the BOOST_ROOT CMake variable to the path to the root directory of Boost that you want to use if CMake does not find it automatically
  • LLVM and Clang
  • ROCm (follow the ROCm installation instructions)
  • AMD GPUs that are supported by ROCm

Build and Set Up the Compiler

Once the software requirements mentioned above are met, clone the repository:

git clone https://github.com/illuhad/hipSYCL

Then, create a build directory and compile hipSYCL. As described below, some backends and compilation flows must be configured with specific CMake arguments, which should be passed during the CMake step.

sudo apt install llvm-12
sudo apt install clang-12
sudo apt install libclang-12-dev

cd <build directory>

cmake -DCMAKE_INSTALL_PREFIX=<installation prefix> -DLLVM_DIR=<llvm path> -DCLANG_INCLUDE_PATH=<clang path> -DWITH_ROCM_BACKEND=ON <more optional options> <hipSYCL source directory>

make install

Build SYCL Example

  1. To build vector_add using hipSYCL for an AMD GPU, use:
bin/syclcc -O2 --hipsycl-targets="hip:gfx908" -o vector_add_hip_amd ./vector_add.cpp
  2. To run vector_add, use:
./vector_add_hip_amd

hipSYCL Compiler Options

The following table shows the list of options supported by hipSYCL compiler:

Option                                Description
--hipsycl-platform=<value>            The platform that hipSYCL should target. Valid values: cuda, rocm, cpu.
--hipsycl-clang=<value>               The path to the Clang executable that should be used for compilation.
--hipsycl-cuda-path=<value>           The path to the CUDA* toolkit installation directory.
--hipsycl-rocm-path=<value>           The path to the ROCm installation directory.
--hipsycl-gpu-arch=<value>            The GPU architecture that should be targeted when compiling for GPUs.
--hipsycl-clang-include-path=<value>  The path to Clang's internal include headers.
--hipsycl-targets=<value>             Specifies the backends and targets to compile for. Available backends: omp, cuda, cuda-nvcxx, spirv, hip.
--hipsycl-version                     Prints the hipSYCL version and configuration.
--help                                Prints the help message.

For more compiler options refer to Manually compiling with syclcc.
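
Because hipSYCL can build a single binary for several backends, --hipsycl-targets may list more than one entry. The sketch below assumes that entries are separated by semicolons and that the Clang installation in use supports every listed backend. join_targets and the output file name vector_add_multi are hypothetical names chosen for illustration:

```shell
#!/bin/sh
# Hypothetical helper: join backend[:arch] entries with ';' so that the
# result can be passed to --hipsycl-targets.
join_targets() {
    joined=""
    for t in "$@"; do
        if [ -z "$joined" ]; then
            joined=$t
        else
            joined="$joined;$t"
        fi
    done
    echo "$joined"
}

# Print (rather than run) a multi-backend build command.
targets=$(join_targets omp cuda:sm_70 hip:gfx908)
echo "bin/syclcc -O2 --hipsycl-targets=\"$targets\" -o vector_add_multi ./vector_add.cpp"
```

The command is only printed, so the sketch can be tried without hipSYCL installed. Whether a given combination of backends builds depends on the Clang distribution hipSYCL was configured against.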
