Wednesday, February 11, 2026

ROCm 6.4.4 & pytorch 2.9.1 gfx803 / gfx900 build note

On reflection I decided to keep the previous post as a notebook and write a separate post for pytorch 2.9.1.

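These notes record a combination I have verified to work: ubuntu 24.04.3, ROCm 6.4.4, pytorch 2.9.1, targeting gfx803 and gfx900. Everything below can be compiled inside a virtual machine; no physical AMD GPU is needed for the build. I use ROCm 6.4.4 rather than 7.1.1 or 7.2.0 because it is the last ROCm release that supports gfx803 (RX460/470/480/560/570/580) and also the last one that supports WSL, so this post sticks to ROCm 6.4.4 (the final 6.x release) and pytorch 2.9.1. As in the previous post, whichever pytorch version you build, check which ROCm version it is paired with.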

ps. ROCm 7.1.1 can support APUs from the ryzen 3000 series onward (code name gfx902), but I have not found a way to get code running on them.

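# First, add kernel boot parameters in /etc/default/grub: append "intel_iommu=on iommu=pt" (separated by spaces, not commas) to GRUB_CMDLINE_LINUX_DEFAULT, then run sudo update-grub so the change is picked up at the next boot. intel_iommu is for intel platforms; on AMD ryzen systems amd_iommu is on by default, and on pre-ryzen systems it depends on whether the motherboard can enable IOMMU.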
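A minimal sketch of that edit, assuming GRUB_CMDLINE_LINUX_DEFAULT sits on a single double-quoted line as it does on a stock ubuntu install. The function name add_grub_params is my own helper, not part of GRUB, and it takes the file path as an argument so it can be tried on a copy first.

```shell
# Hypothetical helper: append kernel parameters to GRUB_CMDLINE_LINUX_DEFAULT.
# Assumes the variable is on one line and its value is wrapped in double quotes.
add_grub_params() {
    local file="$1"; shift
    local current param
    # Extract the current value between the double quotes.
    current=$(sed -n 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"$/\1/p' "$file")
    for param in "$@"; do
        case " $current " in
            *" $param "*) ;;                                # already present, skip
            *) current="${current:+$current }$param" ;;     # append, space separated
        esac
    done
    # Write the merged value back onto the same line.
    sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=.*|GRUB_CMDLINE_LINUX_DEFAULT=\"$current\"|" "$file"
}
```

One way to use it is to copy /etc/default/grub to /tmp, run add_grub_params /tmp/grub intel_iommu=on iommu=pt, compare the two files, and only then copy the result back with sudo and run sudo update-grub.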

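# Add the environment variables; they take effect at the next login. ps. Putting them under /etc/environment.d/ had no effect, for reasons I have not figured out.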

# Add GFX803 related variables
echo "ROC_ENABLE_PRE_VEGA=1" | sudo tee -a /etc/environment
echo "HSA_OVERRIDE_GFX_VERSION=8.0.3" | sudo tee -a /etc/environment
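Because tee -a appends a new line every time, re-running the two commands above leaves duplicate entries in /etc/environment. The following is a small sketch of an idempotent alternative; set_env_var is a name I made up for illustration, and the file path is a parameter so it can be tested on a scratch file.

```shell
# Hypothetical helper: set KEY=VALUE in an environment file without duplicating the key.
set_env_var() {
    local file="$1" key="$2" value="$3"
    if grep -q "^${key}=" "$file" 2>/dev/null; then
        # Key already present: rewrite the existing line in place.
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        # Key missing: append a new line.
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

To apply it to /etc/environment the function has to run in a root shell, since sudo cannot call a shell function directly.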

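Install miniconda 3. Run conda init as your login user ($LOGNAME, the same variable the chown commands below use), and remember the environment name rocm-build; it is used a lot from here on.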

Miniconda 3

# Download and Install Miniconda 3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -P /tmp/
sudo bash /tmp/Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda
sudo sed -i 's|^PATH="|PATH="/opt/conda/bin:|' /etc/environment

# Logout to apply environment
exit

# Logged in
conda init
source ~/.bashrc
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
conda create -n rocm-build -y python=3.12
conda activate rocm-build

Install Prerequisites

sudo apt install -y build-essential ccache git libjpeg-dev \
libjpeg-turbo8-dev libpng-dev libmsgpack-dev libssl-dev \
python3-virtualenv libboost-dev libboost1.83-dev libmsgpack-cxx-dev \
ninja-build

ROCm 6.4.4

# Setup AMD repository
wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | gpg --dearmor \
| sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null

echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] \
https://repo.radeon.com/rocm/apt/6.4.4 noble main" | sudo tee \
--append /etc/apt/sources.list.d/rocm.list

echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
| sudo tee /etc/apt/preferences.d/rocm-pin-600

sudo apt update

# Install ROCm 6.4.4
sudo apt install rocm rocm-developer-tools rocm-ml-sdk \
rocm-ml-libraries rocm-hip-sdk rocm-hip-libraries

sudo sed -i 's|^PATH="|PATH="/opt/rocm/bin:|' /etc/environment

echo "ROCM_PATH=/opt/rocm" | sudo tee -a /etc/environment

# Logout to apply environment
exit

To install a different rocm version at this point, edit the version number and the ubuntu release codename (noble above) in /etc/apt/sources.list.d/rocm.list:

#sudo nano /etc/apt/sources.list.d/rocm.list 
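If you would rather not edit the file by hand, the same change can be sketched with sed. set_rocm_repo is a hypothetical helper of mine, and it assumes the file contains a deb line in the same form as the one written above (.../rocm/apt/VERSION CODENAME main).

```shell
# Hypothetical helper: point an existing rocm.list at another ROCm version / ubuntu codename.
# Assumes a line of the form ".../rocm/apt/<version> <codename> main".
set_rocm_repo() {
    local file="$1" version="$2" codename="$3"
    sed -i -E "s|(repo\.radeon\.com/rocm/apt/)[^ ]+ +[a-z]+ main|\1${version} ${codename} main|" "$file"
}
```

For example, run as root, set_rocm_repo /etc/apt/sources.list.d/rocm.list 6.3.3 noble would switch to the repository used in the older post; follow it with sudo apt update.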

ps. At the time of writing I have not seen support for ubuntu 26.04 yet.

rocBLAS 

# Re-enable conda environment
conda activate rocm-build

# Download rocBLAS
sudo mkdir /opt/rocBLAS

sudo chown $LOGNAME: /opt/rocBLAS

git clone --recursive https://github.com/ROCm/rocBLAS.git -b \
rocm-6.4.4 /opt/rocBLAS

cd /opt/rocBLAS

# Install required packages
pip install "cmake<4.0" joblib pyyaml virtualenv \
typing-extensions

# Build rocBLAS
time ./install.sh -a "gfx803;gfx900;gfx90a;gfx942;gfx1100" \
-b rocm-6.4.4

# Copy compiled library
SRC=$(sudo find \
/opt/rocBLAS/build/release/rocblas-install/lib/ \
-type f -name "librocblas.so.*")

TGT=$(sudo find /opt/rocm/ -type f -name "librocblas.so*")

LIST=$(find /opt/rocm/ -type l -name "librocblas.so*")

sudo rsync -vrh \
/opt/rocBLAS/build/release/rocblas-install/lib/rocblas/library/ \
/opt/rocm/lib/rocblas/library/

sudo cp -f ${SRC} ${TGT}
for d in ${LIST}; do sudo ln -sf ${TGT} ${d}; done
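The copy above only works if SRC and TGT each resolve to exactly one file; if find returns nothing, or more than one path, cp and ln -sf end up with the wrong arguments. Here is a minimal sketch of a guard for that, using a helper name (find_single_lib) that is my own and not part of ROCm.

```shell
# Hypothetical helper: print the single regular file under DIR matching PATTERN,
# or fail if there are zero or several matches.
find_single_lib() {
    local dir="$1" pattern="$2" matches count
    matches=$(find "$dir" -type f -name "$pattern")
    # Count the non-empty lines in the find output.
    count=$(printf '%s\n' "$matches" | grep -c . || true)
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one match for $pattern under $dir, found $count" >&2
        return 1
    fi
    printf '%s\n' "$matches"
}
```

It could stand in for the plain find calls, for example SRC=$(find_single_lib /opt/rocBLAS/build/release/rocblas-install/lib/ "librocblas.so.*") || exit 1, and likewise for TGT with /opt/rocm/ (keep the trailing slash so find follows the /opt/rocm symlink).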

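An important new parameter for rocBLAS here is -b, which selects the Tensile branch. It is best to pin it to the same release as the rocBLAS branch; otherwise the build fetches the default, newest version, which often either fails to compile or does not exist at all. When I wrote this post the build pointed Tensile at 4.45 while github only had a 4.43 release. That only makes me more convinced that whoever ships these things never checked whether they install cleanly.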

rocSOLVER

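ps. rocSOLVER only allows specifying the Tensile version starting with ROCm 7.0. For 7.0 and later, I recommend passing -b to install.sh with the Tensile version, the same as for rocBLAS.

(update: specific to ROCm 7.0 and later) rocSOLVER builds in an odd way: it is best to log in through an ssh console before running install.sh, otherwise it does nothing.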

# Download rocSOLVER
sudo mkdir /opt/rocSOLVER
sudo chown $LOGNAME: /opt/rocSOLVER
git clone --recursive https://github.com/ROCm/rocSOLVER.git -b \
rocm-6.4.4 /opt/rocSOLVER
cd /opt/rocSOLVER

# Build rocSOLVER
time ./install.sh -a "gfx803;gfx900;gfx90a;gfx942;gfx1100"


# Copy compiled library
SRC=$(sudo find \
/opt/rocSOLVER/build/release/rocsolver-install/lib/ \
-type f -name "librocsolver.so.*")

TGT=$(sudo find /opt/rocm/ -type f -name "librocsolver.so*")

LIST=$(find /opt/rocm/ -type l -name "librocsolver.so*")

sudo cp -f ${SRC} ${TGT}
for d in ${LIST}; do sudo ln -sf ${TGT} ${d}; done

# Reboot 

Reboot only if you built on the target machine.

PyTorch 2.9.1

# Re-enable conda environment
conda activate rocm-build

# Download PyTorch
sudo mkdir /opt/pytorch
sudo chown $LOGNAME: /opt/pytorch
git clone --recursive https://github.com/pytorch/pytorch.git -b v2.9.1 /opt/pytorch
cd /opt/pytorch

# Install required packages
pip install mkl-static mkl-include -r requirements.txt

# Build PyTorch
export PYTORCH_ROCM_ARCH="gfx803;gfx900"
export PYTORCH_BUILD_VERSION=2.9.1 PYTORCH_BUILD_NUMBER=1
python tools/amd_build/build_amd.py
time python setup.py bdist_wheel

# Install PyTorch
pip install /opt/pytorch/dist/torch-2.9.1-cp312-cp312-linux_x86_64.whl
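The wheel file name embeds the Python tag (cp312 for the Python 3.12 environment created earlier), so the hard-coded path above changes if you build with another Python version. Below is a small sketch that looks the wheel up by version instead; find_torch_wheel is a helper name I made up.

```shell
# Hypothetical helper: print the first torch wheel for VERSION found in DIST_DIR.
find_torch_wheel() {
    local dist_dir="$1" version="$2" wheel
    for wheel in "$dist_dir"/torch-"$version"-*.whl; do
        # If the glob matched nothing it is left unexpanded, so test for existence.
        if [ -f "$wheel" ]; then
            printf '%s\n' "$wheel"
            return 0
        fi
    done
    echo "no torch-$version wheel found in $dist_dir" >&2
    return 1
}
```

Usage would be along the lines of pip install "$(find_torch_wheel /opt/pytorch/dist 2.9.1)".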

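ps1. torchVision has to be built as 0.24.1; it is tied to a specific pytorch version as well.
https://github.com/pytorch/vision

ps2. The matching torchAudio version is 2.9.1.

After that, follow the notes at https://github.com/NULL0xFF/rocm-gfx803?tab=readme-ov-file .
I will post further combinations here once I get them working.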

update: damn, after three weeks of trying all sorts of combinations it finally runs....

MNIST PyTorch example

  1. Clone the PyTorch examples repository.

    git clone https://github.com/pytorch/examples.git
    
  2. Go to the MNIST example folder.

    cd examples/mnist

The above is quoted from https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/3rd-party/pytorch-install.html

update2: starting with rocm 7.0, support for gfx803 is blocked.


Friday, February 6, 2026

ROCm 6.3.3 & pytorch 2.4.1 gfx803 / gfx900 build note

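This post is just a set of notes, mostly drawn from the build guide at (NULL0xFF / rocm-gfx803). That guide targets ubuntu 22.04 with ROCm 6.1.5, for which the best-matching pytorch version is 2.4.1. pytorch and ROCm versions cannot be paired freely (see here), and each pytorch version builds differently, so it is usually more reliable to pick one specific version and work from there.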

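These notes record a combination I have verified to work: ubuntu 24.04.3, ROCm 6.3.3, pytorch 2.4.1, targeting gfx803 and gfx900. Everything below can be compiled inside a virtual machine; no physical AMD GPU is needed for the build.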

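# Add the environment variables; they take effect at the next login. ps. Putting them under /etc/environment.d/ had no effect, for reasons I have not figured out.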

# Add GFX803 related variables
echo "ROC_ENABLE_PRE_VEGA=1" | sudo tee -a /etc/environment
echo "HSA_OVERRIDE_GFX_VERSION=8.0.3" | sudo tee -a /etc/environment

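Install miniconda 3. Run conda init as your login user ($LOGNAME, the same variable the chown commands below use), and remember the environment name rocm-build; it is used a lot from here on.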

# Download and Install Miniconda 3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -P /tmp/
sudo bash /tmp/Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda
sudo sed -i 's|^PATH="|PATH="/opt/conda/bin:|' /etc/environment

# Logout to apply environment
exit

# Logged in
conda init
source ~/.bashrc
conda create -n rocm-build -y python=3.12
conda activate rocm-build

Install Prerequisites

sudo apt install -y build-essential ccache git libjpeg-dev libjpeg-turbo8-dev libpng-dev libmsgpack-dev libssl-dev python3-virtualenv libboost-dev libboost1.83-dev libmsgpack-cxx-dev 

ps. The last ROCm 6.3 release is 6.3.4, but the rocblas branches for ROCm 6.3 only go up to 6.3.3, so ROCm 6.3.3 is used below.

Install ROCm (basically the same up to 7.2.0)

ROCm 6.3.3

# Setup AMD repository
wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.3.3 noble main" | sudo tee --append /etc/apt/sources.list.d/rocm.list
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
sudo apt update

# Install ROCm 6.3.3
sudo apt install rocm rocm-developer-tools rocm-ml-sdk rocm-ml-libraries rocm-hip-sdk rocm-hip-libraries
sudo sed -i 's|^PATH="|PATH="/opt/rocm/bin:|' /etc/environment
echo "ROCM_PATH=/opt/rocm" | sudo tee -a /etc/environment

# Logout to apply environment
exit

To install a different rocm version at this point, edit the version number and the ubuntu release codename (noble above) in /etc/apt/sources.list.d/rocm.list:

#sudo nano /etc/apt/sources.list.d/rocm.list 

ps. At the time of writing I have not seen support for ubuntu 26.04 yet.

rocBLAS 

# Re-enable conda environment
conda activate rocm-build

# Download rocBLAS
sudo mkdir /opt/rocBLAS
sudo chown $LOGNAME: /opt/rocBLAS
git clone --recursive https://github.com/ROCm/rocBLAS.git -b rocm-6.3.3 /opt/rocBLAS
cd /opt/rocBLAS

# Install required packages
pip install "cmake<4.0" joblib pyyaml

# Build rocBLAS
time ./install.sh -a "gfx803;gfx900;gfx90a;gfx942;gfx1100"

# Copy compiled library
sudo rsync -vrh /opt/rocBLAS/build/release/rocblas-install/lib/rocblas/library/ /opt/rocm/lib/rocblas/library/
sudo cp -f $(find /opt/rocBLAS/build/release/rocblas-install/lib/ -type f -name "librocblas.so.*") $(find /opt/rocm/ -type f -name "librocblas.so*")

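This rocBLAS build compiles the kernel files (hsaco and so on) for gfx803 and gfx900, which were removed starting with ROCm 6.0, and registers them in librocblas.so (from ROCm 5.0 on, the gfx803 and gfx900 files were present but not linked in). Pick only the architectures you actually need: if every GPU is included, the later pytorch link step fails because the .so file is too large. gfx90a/942/1100 are there because pytorch requires them, not because I wanted them...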


rocSOLVER builds in an odd way: it is best to log in through an ssh console before running install.sh, otherwise it does nothing.

rocSOLVER

# Download rocSOLVER
sudo mkdir /opt/rocSOLVER
sudo chown $LOGNAME: /opt/rocSOLVER
git clone --recursive https://github.com/ROCm/rocSOLVER.git -b rocm-6.3.3 /opt/rocSOLVER
cd /opt/rocSOLVER

# Build rocSOLVER
time ./install.sh -a "gfx803;gfx900;gfx90a;gfx942;gfx1100"

# Copy compiled library
sudo cp -f $(find ./build/release/rocsolver-install/lib/ -type f -name "librocsolver.so.*") $(find /opt/rocm/ -type f -name "librocsolver.so*")

# Reboot 

Reboot only if you built on the target machine.

Install PyTorch

PyTorch 2.4.1 (you can change it to 2.7.1 yourself)

# Re-enable conda environment
conda activate rocm-build

# Download PyTorch
sudo mkdir /opt/pytorch
sudo chown $LOGNAME: /opt/pytorch
git clone --recursive https://github.com/pytorch/pytorch.git -b v2.4.1 /opt/pytorch
cd /opt/pytorch

# Install required packages
pip install mkl-static mkl-include ninja -r requirements.txt

# Build PyTorch
export PYTORCH_ROCM_ARCH="gfx803;gfx900"
export PYTORCH_BUILD_VERSION=2.4.1 PYTORCH_BUILD_NUMBER=1
python tools/amd_build/build_amd.py
time python setup.py bdist_wheel

# Install PyTorch
pip install /opt/pytorch/dist/torch-2.4.1-cp312-cp312-linux_x86_64.whl

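ps1. I tried building 2.5.0 and 2.5.1 and both failed; I have not tried 2.6; 2.7.1 builds cleanly; 2.8.0 has a SHA256 problem with a downloaded file and I could not find where to change it. The steps above also work if you switch them to 2.7.1.

ps2. After building 2.4.x or 2.5.x, remember to clear ~/.triton/cache and ~/.triton/dump.

ps3. With pytorch 2.4.1 use torchvision 0.19; with pytorch 2.7.1 use 0.22. See https://github.com/pytorch/vision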
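To keep the pairings in one place, here is a small lookup sketch. It only contains the combinations mentioned in these notes (the 2.9.1 entry comes from the newer post above); torchvision_for is a name I made up, and any other version is reported as unknown rather than guessed.

```shell
# Pairings taken from the notes in these posts; anything else is reported as unknown.
torchvision_for() {
    case "$1" in
        2.4.1) echo "0.19" ;;
        2.7.1) echo "0.22" ;;
        2.9.1) echo "0.24.1" ;;
        *) echo "unknown torch version: $1" >&2; return 1 ;;
    esac
}
```

For anything not listed, the table in the pytorch/vision README is the place to check.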

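After that, follow the notes at https://github.com/NULL0xFF/rocm-gfx803?tab=readme-ov-file .

I will post further combinations here once I get them working.

update: pytorch 2.10.0 now builds. Further torchvision & pytorch testing will use 2.10.0 first; I will only go back to 2.7 or 2.4 if 2.10.0 runs into trouble.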