GitHub - tanjatang/Jetson-LMDeploy

Setup

LMDeploy does not officially target Jetson, so you need to install a Jetson-specific PyTorch wheel along with matching versions of CUDA, CMake, and LMDeploy.

Install Miniforge

wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-aarch64.sh
bash Miniforge3-Linux-aarch64.sh
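
If the same steps are reused on more than one machine, the installer name can be derived from the running system instead of being typed by hand. The sketch below assumes the Miniforge release assets keep the `Miniforge3-<OS>-<arch>.sh` naming; `miniforge_url` is a hypothetical helper name, not part of Miniforge.

```shell
# Hypothetical helper: build the Miniforge installer URL for a given OS and
# architecture. Defaults to the current machine (on a Jetson: Linux / aarch64).
miniforge_url() {
    os="${1:-$(uname)}"
    arch="${2:-$(uname -m)}"
    echo "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-${os}-${arch}.sh"
}
```

Usage: `wget "$(miniforge_url)"` downloads the installer for the current machine.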

Setup conda

conda create -n lmdeploy python=3.8
conda activate lmdeploy

Install PyTorch for Jetson Orin Nano

Requirements

CUDA 11.8

PyTorch 2.1.0

JetPack >=5.1
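
Before going further it may help to confirm the JetPack requirement on the device. This is a minimal sketch, assuming JetPack was installed through the `nvidia-jetpack` meta-package (if it was not, the version is reported as unknown); `version_ge` is a hypothetical helper built on GNU `sort -V`.

```shell
# Returns success (0) when version $1 is greater than or equal to version $2.
# Relies on GNU `sort -V` (version sort), which ships with Ubuntu.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: the nvidia-jetpack meta-package is installed. If it is missing,
# dpkg-query prints nothing and the version is reported as unknown.
jetpack_version="$(dpkg-query --show --showformat='${Version}' nvidia-jetpack 2>/dev/null || true)"
if [ -z "$jetpack_version" ]; then
    echo "JetPack version unknown (nvidia-jetpack package not found)"
elif version_ge "$jetpack_version" "5.1"; then
    echo "JetPack $jetpack_version is new enough"
else
    echo "JetPack $jetpack_version is older than 5.1"
fi
```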

Update to CUDA 11.8

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/arm64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda-tegra-repo-ubuntu2004-11-8-local_11.8.0-1_arm64.deb
sudo dpkg -i cuda-tegra-repo-ubuntu2004-11-8-local_11.8.0-1_arm64.deb
sudo cp /var/cuda-tegra-repo-ubuntu2004-11-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
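
The apt packages do not put the toolkit on `PATH`, so later build steps may still pick up an older CUDA. A minimal sketch for pointing the current shell at the new toolkit, assuming it was installed to the default prefix `/usr/local/cuda-11.8` (`use_cuda` is a hypothetical helper name):

```shell
# Assumption: the toolkit lives under /usr/local/cuda-11.8 (the default prefix).
# Pass a different prefix as the first argument if yours differs.
use_cuda() {
    cuda_home="${1:-/usr/local/cuda-11.8}"
    export CUDA_HOME="$cuda_home"
    export PATH="$cuda_home/bin:$PATH"
    export LD_LIBRARY_PATH="$cuda_home/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

use_cuda
# Print the compiler version if nvcc is now reachable.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc not found under $CUDA_HOME/bin"
fi
```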

Source: CUDA Toolkit 11.8 Downloads

PyTorch for Jetson

Installation

JetPack 5.1.2, Python 3.8 wheel: https://developer.download.nvidia.cn/compute/redist/jp/v512/pytorch/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl

sudo apt-get install python3-pip libopenblas-base libopenmpi-dev libomp-dev
wget https://developer.download.nvidia.cn/compute/redist/jp/v512/pytorch/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl
pip install torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl
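
After installing the wheel, it is worth checking that PyTorch imports and can see the GPU. A minimal sketch, assuming the `lmdeploy` conda environment is active; `check_torch` is a hypothetical helper name and `PYTHON` is an optional override for the interpreter.

```shell
# Print the installed torch version and whether CUDA is usable.
# PYTHON can be overridden; it defaults to python3 from the active environment.
check_torch() {
    "${PYTHON:-python3}" -c 'import torch; print("torch", torch.__version__); print("cuda available:", torch.cuda.is_available())'
}
```

Run `check_torch` in the activated environment. A non-zero exit status means the import failed, and `cuda available: False` suggests the wheel does not match the installed JetPack or CUDA version.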

Install RapidJSON

git clone https://github.com/Tencent/rapidjson.git

cd rapidjson

mkdir build && cd build
cmake .. \
    -DRAPIDJSON_BUILD_DOC=OFF \
    -DRAPIDJSON_BUILD_EXAMPLES=OFF \
    -DRAPIDJSON_BUILD_TESTS=OFF
make -j4

sudo make install

Install CMake

cd ~
wget https://github.com/Kitware/CMake/releases/download/v3.29.0-rc1/cmake-3.29.0-rc1-linux-aarch64.tar.gz
tar xf cmake-3.29.0-rc1-linux-aarch64.tar.gz && rm cmake-3.29.0-rc1-linux-aarch64.tar.gz

# rename the folder
mv cmake-3.29.0-rc1-linux-aarch64 cmake-3.29.0
cd cmake-3.29.0

# verify version
./bin/cmake --version

# set the path variable
export PATH="$HOME/cmake-3.29.0/bin:$PATH"

# Verify again
cmake --version
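
The `export` above only affects the current shell. To keep the setting across sessions, the same line can be added to a startup file. A minimal sketch, assuming bash is the login shell and `~/.bashrc` is read at startup; `append_once` is a hypothetical helper that avoids duplicate entries when the step is repeated.

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
    line="$1"
    file="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

Usage: `append_once 'export PATH="$HOME/cmake-3.29.0/bin:$PATH"' ~/.bashrc`, then open a new shell and run `cmake --version` again.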

Install LMDeploy-0.2.5

cd ~
git clone https://github.com/InternLM/lmdeploy.git
cd lmdeploy
git checkout c5f4014

Under ~/lmdeploy, create generate_jetson.sh:

#!/bin/sh
builder="-G Ninja"

if [ "$1" = "make" ]; then
    builder=""
fi

cmake ${builder} .. \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DCMAKE_EXPORT_COMPILE_COMMANDS=1 \
    -DCMAKE_INSTALL_PREFIX=./install \
    -DBUILD_PY_FFI=ON \
    -DBUILD_MULTI_GPU=OFF \
    -DCMAKE_CUDA_FLAGS="-lineinfo" \
    -DUSE_NVTX=ON

Then run the following commands:

chmod +x generate_jetson.sh

sudo apt-get install ninja-build

mkdir build && cd build

../generate_jetson.sh

ninja install

Comment out the following dependencies in requirements/runtime.txt:

# torch<=2.1.2,>=2.0.0
# triton>=2.1.0,<=2.2.0
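
The same edit can be scripted rather than done by hand. This is a sketch assuming GNU `sed` (for `-i` and `-E`); `comment_out_pins` is a hypothetical helper name.

```shell
# Comment out the torch and triton pins in a requirements file (edits in place).
# The pattern requires a non-name character after the package name, so
# packages such as "torchvision" are left alone.
comment_out_pins() {
    sed -i -E 's/^(torch|triton)([^A-Za-z0-9_.-].*)?$/# &/' "$1"
}
```

Usage, from ~/lmdeploy: `comment_out_pins requirements/runtime.txt`. Lines that are already commented out do not match and stay unchanged.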

Install lmdeploy

cd ~/lmdeploy

pip install -e .[serve]

Model Quantization

AWQ Quantization

export HF_MODEL=./path/to/hf-model
export WORK_DIR=./path/to/hf-model-4bit

lmdeploy lite auto_awq \
   $HF_MODEL \
  --calib-dataset 'ptb' \
  --calib-samples 128 \
  --calib-seqlen 2048 \
  --w-bits 4 \
  --w-group-size 128 \
  --work-dir $WORK_DIR

TurboMind Model

export WORK_DIR=./path/to/hf-model-4bit
export TM_DIR=./path/to/hf-model-turbomind

# Replace model-type with the model name, for example llama2
lmdeploy convert model-type \
    $WORK_DIR \
    --model-format awq \
    --group-size 128 \
    --dst-path $TM_DIR
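
Once the conversion finishes, the converted model can be tried from the terminal. The sketch below assumes the `lmdeploy chat turbomind <dir>` subcommand that LMDeploy 0.2.x provides; `run_chat` is a hypothetical wrapper that only adds a directory check before handing over to it.

```shell
# Hypothetical wrapper: start an interactive chat on a converted TurboMind model.
run_chat() {
    tm_dir="${1:?usage: run_chat <turbomind-dir>}"
    if [ ! -d "$tm_dir" ]; then
        echo "TurboMind directory not found: $tm_dir" >&2
        return 1
    fi
    # Assumption: this subcommand exists in the installed LMDeploy version.
    lmdeploy chat turbomind "$tm_dir"
}
```

Usage: `run_chat "$TM_DIR"`.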

Evaluation

webui-generation

base-line

Device	llama2-chat-7b	mistral-instruction-7b
Jetson Orin Nano	(Memory: 3.89G) 1.03 tokens/s	(Memory: 4.16G) 0.96 tokens/s

Capture of text output

Question	llama2-chat-7b	mistral-instruction-7b
Hi, how are you?
whats the square root of 900?
can I get a recipie for french onion soup?

TurboMind

base-line

Device	llama2-chat-7b	mistral-instruction-7b
Jetson Orin Nano	(Memory: 5.1G) 13.36 tokens/s	(Memory: 4.9G) 12.61 tokens/s
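
To put the two tables side by side, the throughput ratio can be computed directly from the reported figures. A small sketch using `awk` for the floating-point division; `speedup` is a hypothetical helper name.

```shell
# Ratio of two throughput figures, rounded to two decimals.
speedup() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.2f\n", fast / slow }'
}

# Figures copied from the two Jetson Orin Nano tables (tokens/s).
echo "llama2-chat-7b:         $(speedup 13.36 1.03)x"
echo "mistral-instruction-7b: $(speedup 12.61 0.96)x"
```

On these numbers TurboMind is roughly 13 times faster than the web UI baseline for both models, at the cost of somewhat higher reported memory use.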

Capture of text output

Question	llama2-chat-7b	mistral-instruction-7b
Hi, how are you?
whats the square root of 900?
can I get a recipie for french onion soup?
