ROCM SDK Builder provides an easy and convenient machine learning and GPU computing development environment using single or multiple regular consumer-level GPUs on PCs and laptops.
It builds and integrates AMD's ROCm machine learning stack and other common ML tools and models into an easy-to-use environment. It especially targets the consumer-level discrete and integrated GPUs available on desktops and laptops.
The latest ROCM SDK Builder 6.1.2 release is based on the source code of AMD's ROCM 6.1.2 base libraries with additional patches to tune them and add support for additional GPUs. In addition, an increasing number of additional libraries and applications have been integrated on top of it.
Testing of different GPUs and Linux distributions is mainly done by users and developers and tracked in tickets.
This project has been so far tested with the following AMD Radeon GPUs:
- AMD RX 7900 XTX (gfx1100)
- AMD RX 7800 XT (gfx1101)
- AMD RX 7700S/Framework Laptop 16 (gfx1102)
- AMD Radeon 780M Laptop iGPU (gfx1103)
- AMD RX 6800 / 6800 XT (gfx1030)
- AMD RX 6700 / 6700 XT (gfx1031)
- AMD RX 6600 / 6600 XT (gfx1032)
- AMD RX 5700 / 5700 XT (gfx1010)
- AMD Radeon Pro V520 (gfx1011)
- AMD RX 5500 (gfx1012)
- AMD Radeon 680M Laptop iGPU (gfx1035)
- AMD Raphael iGPU (gfx1036) (desktops)
- AMD Strix Point iGPU (gfx1150) / (Experimental support, testing and feedback needed)
- AMD Strix Halo iGPU (gfx1151) / (Framework mini PC)
- Radeon VII (gfx906) / (Experimental support)
- AMD Instinct MI50 (gfx906) / (Tested recently)
- MI100 CDNA (gfx908) / (Experimental support)
- MI210/250 CDNA (gfx90a) / (Experimental support)
- MI300A/MI300X CDNA (gfx940, gfx941, gfx942) / (Experimental support)
- AMD XDNA/XDNA2 NPU (experimental support; also requires the xdna driver to be patched into the kernel)
Older GPUs with 8 GB of memory or less may not be able to run the memory-intensive benchmarks and applications, but there are many applications where they will still work well compared to CPU-based operations.
Some GPUs have been benchmarked after building ROCM SDK Builder with the benchmark available at https://github.com/lamikr/pytorch-gpu-benchmark/
Tested and officially supported Linux distributions (rocm sdk builder 6.1.2):
- Fedora 40
- Ubuntu 24.04
- Ubuntu 22.04 (python 3.10 is built instead of 3.11)
- Mageia 9
- Arch Linux
- Manjaro Linux
- Void Linux
- Mint Linux 21
- LMDE 6
Thanks to the many users and developers who have contributed to ROCM SDK Builder, the list of supported Linux distros has increased significantly since the rocm sdk builder 6.1.0 release. Manjaro and Arch Linux are rolling releases, so their status needs to be verified more often.
If you do not want to build the rocm sdk builder yourself, you can download and install a docker image. There are 3 docker images, and you need to select the correct one depending on which GPU you have:
https://hub.docker.com/r/lamikr/rocm_sdk_builder/tags
Install and test the Docker image for RDNA3 GPUs, for example, with the commands:
# sudo su
# docker pull lamikr/rocm_sdk_builder:612_01_rdna3
# docker run -it --device=/dev/kfd --device=/dev/dri --group-add keep-groups docker.io/lamikr/rocm_sdk_builder:612_01_rdna3 bash
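Before launching the container, you can verify that the device nodes passed via --device actually exist on the host. This is a minimal sketch using plain file tests; the check_gpu_devices helper name is ours, not part of ROCM SDK Builder:

```shell
# Sketch: confirm the host exposes the GPU device nodes that
# `docker run --device=/dev/kfd --device=/dev/dri` expects.
check_gpu_devices() {
    if [ -e /dev/kfd ] && [ -d /dev/dri ]; then
        echo "GPU device nodes present"
    else
        echo "GPU device nodes missing: check that the amdgpu driver is loaded"
    fi
}
check_gpu_devices
```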
Detailed instructions for docker image download and usage are available in the file:
docs/notes/containers/howto_install_and_run_docker_image.txt
babs.sh is the command line interface used for most rocm sdk builder tasks. It provides an interface to control the download, patch, configure, build, install and update steps for either a single application or a list of applications.
The following set of commands will download the rocm sdk 6.1.2 project sources and then build and install them to the directory /opt/rocm_sdk_612:
# git clone https://github.com/lamikr/rocm_sdk_builder.git
# cd rocm_sdk_builder
# git checkout releases/rocm_sdk_builder_612
# ./install_deps.sh
# ./babs.sh -c
# ./babs.sh -i
# ./babs.sh -b
These commands are described in more detail below.
# git clone https://github.com/lamikr/rocm_sdk_builder.git
# cd rocm_sdk_builder
# git checkout releases/rocm_sdk_builder_612
# ./install_deps.sh
At the end of the execution, install_deps.sh will check whether you have configured git and access to the AMD GPU device driver properly. If either of these has a problem, install_deps.sh will print instructions at the end on how to fix it.
The git user.name and email address configuration is needed because ROCM SDK builder uses the git-am command for applying patches on top of the projects' source code, and git am requires that they are configured. If not already set up, this can be done for example in the following way:
# git config --global user.name "John Doe"
# git config --global user.email johndoe@example.com
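A quick way to confirm the identity is set before starting a build is a small check like the one below. This is only a sketch wrapping plain `git config` lookups; the check_git_identity name is ours:

```shell
# Sketch: verify the git identity that `git am` needs is configured.
check_git_identity() {
    if git config --global user.name >/dev/null 2>&1 &&
       git config --global user.email >/dev/null 2>&1; then
        echo "git identity configured"
    else
        echo "git identity missing: run the git config commands above"
    fi
}
check_git_identity
```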
Access to the GPU is not needed during build time, but it is needed when running applications later in the ROCM SDK Builder environment. Some users have tested this by building the environment on virtual Linux machines which do not have access to their GPUs and then installing the system to more isolated production environments where the devices do not have direct internet access.
Many of the libraries need to be built separately for each GPU, so for regular builds you should select only your own GPUs to save a significant amount of build time.
Selections are stored in the build_cfg.user file. If this file does not exist, the selection dialog is also shown automatically before many other babs.sh commands.
# ./babs.sh -c
Note that the babs.sh -i command is not strictly required. That command downloads the source code for all projects in advance instead of downloading them one by one as the build progresses. This is a new feature starting from rocm sdk builder 6.1.2.
The build is installed automatically to the /opt/rocm_sdk_612 directory, but it is possible to change this by specifying the INSTALL_DIR_PREFIX_SDK_ROOT environment variable in the envsetup_user.sh file. Check envsetup_user_template.sh for further information.
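For example, envsetup_user.sh could contain a single assignment like the one below. This is only a sketch and the /opt/rocm_sdk_custom path is just an example; check envsetup_user_template.sh for the authoritative variable names and syntax.

```shell
# envsetup_user.sh (sketch): override the SDK install prefix.
# The exact expected syntax is defined by envsetup_user_template.sh;
# /opt/rocm_sdk_custom below is a hypothetical example path.
INSTALL_DIR_PREFIX_SDK_ROOT=/opt/rocm_sdk_custom
export INSTALL_DIR_PREFIX_SDK_ROOT
```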
# ./babs.sh -i
# ./babs.sh -b
... get some good coffee beans... grind ... brew ... enjoy ... It will usually take 5-10 hours to build everything from scratch, depending on your machine.
Since version 6.1.2, babs.sh provides a new update command which updates the source code and checks which projects have been updated. For updated projects the build dir is cleaned so that they are easy to rebuild.
# ./babs.sh -up
# ./babs.sh -b
The command accepts the git branch name to check out as an optional parameter. Note that the update command does not check binfo files which do not belong to any blist file in the binfo/extra directory.
The following examples show how to run various types of applications in the ROCM SDK Builder environment.
There are many more examples under the docs/examples folder on different subjects, but these will help you get started.
By default pytorch audio supports audio playback on Mac computers, but rocm sdk builder has patched it to do the playback also on Linux. At the moment the implementation uses ffmpeg, but we are also looking at other alternatives like SDL for the future.
# cd /opt/rocm_sdk_612/docs/examples/pytorch/audio
# ./pytorch_audio_play_effects.sh
rocminfo, amd-smi, rocm-smi and nvtop are useful tools that can be used to query and monitor for example memory usage, clock frequencies and temperatures.
Note that the command 'source /opt/rocm_sdk_612/bin/env_rocm.sh' needs to be run once in each terminal to set up the environment variables and paths correctly.
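If you script your sessions, a small guard can make the sourcing idempotent. This is a sketch; the source_once name and the ROCM_SDK_ENV_LOADED guard variable are ours, not part of ROCM SDK Builder:

```shell
# Sketch: source an environment script only once per shell session.
source_once() {
    if [ -z "$ROCM_SDK_ENV_LOADED" ]; then
        . "$1"
        ROCM_SDK_ENV_LOADED=1
    fi
}
# Usage: source_once /opt/rocm_sdk_612/bin/env_rocm.sh
```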
# rocminfo
This will print out details about your detected CPUs and GPUs that can be used for running applications in ROCM SDK environment.
# amd-smi metric
This shows how to build from source and then run a hello-world type example application from the document directory. The application loads a HIP kernel to your GPU and sends data from userspace to the GPU, where the kernel changes it and sends it back to userspace. Userspace then prints the received data.
# source /opt/rocm_sdk_612/bin/env_rocm.sh
# cd /opt/rocm_sdk_612/docs/examples/hipcc/hello_world
# ./build.sh
You should expect to see the following output if the application can communicate with your GPU.
./hello_world
System minor: 3
System major: 10
Agent name: AMD Radeon Graphics
Kernel input: GdkknVnqkc
Executing GPU kernel task to increases each input character by one...
Kernel output: HelloWorld
Output string matched with the expected text: HelloWorld
Test ok!
A very simple benchmark showing how to run the same math operation on both the CPU and GPU is available as a pytorch program which can be run in a jupyter notebook. On the CPU the expected time is usually around 20-30 seconds. It can be executed with these commands:
# source /opt/rocm_sdk_612/bin/env_rocm.sh
# cd /opt/rocm_sdk_612/docs/examples/pytorch
# ./pytorch_simple_cpu_vs_gpu_benchmark.sh
# source /opt/rocm_sdk_612/bin/env_rocm.sh
# cd /opt/rocm_sdk_612/docs/examples/pytorch
# ./pytorch_gpu_hello_world_jupyter.sh
ROCM SDK Builder defines an ai_tools.blist file which can be used to build 3 commonly known AI tools for running LLM models for chat-type communication and image generation. The applications included are:
- llama.cpp
- VLLM
- stable-diffusion-webui
They can all utilize AMD GPUs and can be built with the following commands.
# cd rocm_sdk_builder
# ./babs.sh -b binfo/extra/ai_tools.blist
# source /opt/rocm_sdk_612/bin/env_rocm.sh
# llama-server_rocm.sh
- llama.cpp will create a server at the address http://127.0.0.1:8080
- Launch your browser to access the server and test the chat-ui: http://127.0.0.1:8080
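Before pointing the browser at it, you can wait until the server answers. A sketch below; the wait_for_server helper is ours, and the /health endpoint is an assumption based on recent llama.cpp server builds, so adjust the URL if yours differs:

```shell
# Sketch: poll a URL until the llama.cpp server responds.
wait_for_server() {
    url="$1"
    for attempt in 1 2 3 4 5; do
        if curl -fsS "$url" >/dev/null 2>&1; then
            echo "server up: $url"
            return 0
        fi
        sleep 1
    done
    echo "server not reachable: $url"
    return 1
}
# Usage: wait_for_server http://127.0.0.1:8080/health
```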
# curl -L https://huggingface.co/Comfy-Org/stable-diffusion-v1-5-archive/resolve/main/v1-5-pruned-emaonly-fp16.safetensors --output-dir /opt/rocm_sdk_models/ -o sd-v1-5-pruned-emaonly-fp16.safetensors
# source /opt/rocm_sdk_612/bin/env_rocm.sh
# sd -M txt2img -p "a cat" -m /opt/rocm_sdk_models/sd-v1-5-pruned-emaonly-fp16.safetensors -o ~/Pictures/test_cat.png
- stable-diffusion.cpp will run, create the image of a cat and save it in the Pictures folder
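If `sd` fails to load the model, first check that the download above actually produced a non-empty file. A sketch (check_model is our helper name, not part of ROCM SDK Builder):

```shell
# Sketch: verify a downloaded model file is present and non-empty
# before passing it to `sd -m`.
check_model() {
    if [ -s "$1" ]; then
        echo "model ok: $1"
    else
        echo "model missing or empty: $1"
    fi
}
# Usage: check_model /opt/rocm_sdk_models/sd-v1-5-pruned-emaonly-fp16.safetensors
```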
# cd /opt/rocm_sdk_612/apps/stable-diffusion-webui
# ./webui.sh
- Stable Diffusion WebUI will create a server at the address http://127.0.0.1:7860/
- Launch your browser to access the server and test the image generation from a text description: http://127.0.0.1:7860/
- On first launch, the stable diffusion webui initialization and image generation take some time, because the model file is downloaded to the /opt/rocm_sdk_models directory before the web server for the UI is initialized.
- Try first setting "Sampling steps" in the UI to less than 20, for example to 5. I have seen image generation errors on many GPUs when using 20 steps.
# cd /opt/rocm_sdk_612/docs/examples/llm/vllm
# ./test_questions_and_answers.sh
Note: The first run may take a while, because the model is first downloaded to the /opt/rocm_sdk_models/vllm directory before the question/answer example runs.
If you encounter the error "ModuleNotFoundError: No module named 'vllm'", please rebuild the vllm module individually as described below.
If the issue persists and you see the message "No files found to uninstall", try forcing the reinstallation using the following command:
# cd rocm_sdk_builder
# pip install ./packages/whl/vllm+<version>.whl --force-reinstall --no-deps
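Since the exact wheel version depends on your build, a small glob helper can locate it without typing the version by hand. This is a sketch; find_vllm_wheel is our name, and the wheel directory is the one used above:

```shell
# Sketch: locate the built vllm wheel without typing the exact version.
find_vllm_wheel() {
    for whl in "$1"/vllm*.whl; do
        [ -e "$whl" ] && echo "$whl"
    done
    return 0
}
# Usage: pip install "$(find_vllm_wheel ./packages/whl)" --force-reinstall --no-deps
```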
This is updated from the example at https://github.com/Xilinx/llvm-aie/wiki/E2E-Linux-Example
Build the application for both the xdna1 and xdna2 NPUs if you have either of them.
# ./babs.sh -b binfo/extra/amd_aie.blist
# source /opt/rocm_sdk_612/bin/env_rocm.sh
# cd /opt/rocm_sdk_612/docs/examples/aie/store_pii
# ./build.sh
Then launch it for the xdna1 or xdna2 NPU, depending on which one you have.
# cd /opt/rocm_sdk_612/docs/examples/aie/store_pii
# ./load_fivepi_xdna1.sh (or ./load_fivepi_xdna2.sh)
An external benchmark originally made for NVIDIA cards is available here. Upstream does not seem to be active anymore, so the benchmark is now available here:
https://github.com/lamikr/pytorch-gpu-benchmark/
AMD GPU specific results have been run in either the ROCM SDK Builder 6.1.1 or 6.1.2 environment. As the same commands for downloading the benchmark now run more tests than a couple of years ago, with the newer tests requiring much more memory, the benchmark has been tweaked to check the available GPU memory and use that information to determine whether to run all tests or only a subset.
For many GPUs the benchmark results have improved significantly since the 6.1.1 release thanks to tuning improvements made to ROCBLAS, Composable Kernel and FFTW.
ROCM SDK Builder has integrated initial XDNA userspace software stack support for testing and application development purposes.
- peano llvm-aie compiler for the xdna/xdna2 NPUs
- xilinx xrt runtime required by xdna
- xrt-xdna plugin to manage the application load to xdna via the xrt runtime
- test application to verify the functionality
In addition, you will need to build a kernel with the xdna driver and xdna firmware, as it is not yet available in any Linux distribution kernel. (It is expected to be in the upstream kernel in 6.14.)
Copy the xdna firmware files to the /lib/firmware/amdnpu folder if they are not there yet. A slightly newer firmware version may be available at https://github.com/amd/xdna-driver
git clone https://kernel.googlesource.com/pub/scm/linux/kernel/git/firmware/linux-firmware.git
sudo mkdir /lib/firmware/amdnpu/1502_00/
sudo cp linux-firmware/amdnpu/1502_00/npu.sbin.1.5.2.380 /lib/firmware/amdnpu/1502_00/npu.sbin
...
Do the same also for the other subfolders like 17f0_10, 17f0_11 and 17f0_20.
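The per-subfolder copy steps above can be sketched as one loop. An assumption here is that each amdnpu subfolder ships a single npu.sbin.* blob to be installed as npu.sbin; the install_npu_firmware name is ours, and writing to /lib/firmware requires root:

```shell
# Sketch: install each amdnpu firmware blob as npu.sbin under the target dir.
install_npu_firmware() {
    src="$1"   # e.g. linux-firmware/amdnpu
    dst="$2"   # e.g. /lib/firmware/amdnpu
    for dir in "$src"/*/; do
        sub=$(basename "$dir")
        mkdir -p "$dst/$sub"
        for fw in "$dir"npu.sbin.*; do
            [ -e "$fw" ] && cp "$fw" "$dst/$sub/npu.sbin"
        done
    done
    return 0
}
# Usage (as root): install_npu_firmware linux-firmware/amdnpu /lib/firmware/amdnpu
```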
Build the kernel from the xdna kernel branch available at https://github.com/lamikr/linux
git clone https://github.com/lamikr/linux
cd linux
git checkout release/613rc7_xdna_driver
./kernel_build.sh
Reboot and check that the amdxdna driver gets loaded:
dmesg | grep amdxdna
If the xdna driver worked and was able to load the firmware, you should see a message like: "Initialized amdxdna_accel_driver"
If the xdna driver failed to find the firmware, you should see this type of message in the kernel dmesg: "npu.sbin failed with error -2". In case of an error, you may need to force the firmware files to be copied to the initrd image. At least on Fedora 40 that can be done with the command:
dracut -f -i /lib/firmware /lib/firmware
Build the userspace application:
# ./babs.sh -b binfo/extra/amd-aie.blist
And test with the command:
# source /opt/rocm_sdk_<version>/bin/env_rocm.sh
# example_noop_test /opt/rocm_sdk_612/apps/aie/tools/bins/1502_00/validate.xclbin
which should print the following output:
Host test code start...
Host test code is creating device object...
Host test code is loading xclbin object...
Host test code is creating kernel object...
Host test code kernel name: DPU_PDI_0
Host code is registering xclbin to the device...
Host code is creating hw_context...
Host test code is creating kernel object...
Host test code allocate buffer objects...
Host test code sync buffer objects to device...
Host test code iterations (~10 seconds): 70000
Host test microseconds: 4640738
Host test average latency: 66 us/iter
TEST PASSED!
Get help from the commands
# ./babs.sh -h
Update the ROCM SDK Builder to the latest version in the current branch and then check whether project-specific binfo or patch directories have been updated. Run the source code checkout, patch-apply and clean commands for all changed projects so that they can be rebuilt. Check the end of the command output for instructions on how to rebuild the updated projects.
# ./babs.sh -up
Update the ROCM SDK Builder to the latest version in the git master branch and then check whether project-specific binfo or patch directories have been updated. Run the source code checkout, patch-apply and clean commands for all changed projects so that they can be rebuilt. Check the end of the command output for instructions on how to rebuild the updated projects.
# ./babs.sh -up master
Check out and apply patches to all core projects:
# ./babs.sh -ca
Check out and apply patches to binfo/extra/ai_tools.blist:
# ./babs.sh -ca binfo/extra/ai_tools.blist