Eval bug: [hexagon-npu]Unable to open NPU device, err: 0x80000406, uri file:///libhexagon_npu_skel_v79.so?npu_device_skel_handle_invoke&_modver=1.0&_dom=cdsp

Name and Version

build: 5363 (c2b6fec) with Android (11349228, +pgo, +bolt, +lto, -mlgo, based on r487747e) clang version 17.0.2 (https://android.googlesource.com/toolchain/llvm-project d9f89f4d16663d5012e5c09495f3b30ece3d2362) for x86_64-unknown-linux-gnu (debug)

Operating systems

Linux

GGML backends

CPU

Hardware

Snapdragon 8 Elite
16 GB memory

Models

qwen2-1_5b-instruct-fp16.gguf

Problem description & steps to reproduce

Problem Description

I encountered an issue while running the following command:

./llama-cli -m ../../qwen2-1_5b-instruct-fp16.gguf

The error message is: "Unable to open NPU device, err: 0x80000406, uri file:///libhexagon_npu_skel_v79.so?npu_device_skel_handle_invoke&_modver=1.0&_dom=cdsp"
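The URI in the error message can be split to see which skel library and which DSP domain the loader is asking for. This is a minimal sketch that reads the pieces from the string itself; treating the part before `?` as the library file name and `_dom=` as the target domain is an assumption based on how the URI looks, not on documentation quoted in this report.

```shell
#!/bin/sh
# URI copied verbatim from the error message above.
uri='file:///libhexagon_npu_skel_v79.so?npu_device_skel_handle_invoke&_modver=1.0&_dom=cdsp'

# Drop the scheme, then split at the first '?'.
rest=${uri#file:///}
skel_lib=${rest%%\?*}   # everything before the query string
query=${rest#*\?}       # everything after the first '?'

# Take the value of the _dom= parameter (assumed to name the DSP domain).
domain=${query##*_dom=}
domain=${domain%%&*}

echo "library: $skel_lib"
echo "domain:  $domain"
```

The library name extracted here is the file that has to be found at run time, so it should match the file name copied to the device exactly.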

Troubleshooting Steps Taken

I've placed the libhexagon_npu_skel_v79.so file in the following directories:

/data/local/tmp
/data/local/tmp/20250427npuLlamacpp
/etc/vendor/
/etc/vendor/lib

Additionally, I've set the following environment variables:

export ADSP_LIBRARY_PATH=/data/local/tmp
export LD_LIBRARY_PATH=/data/local/tmp

However, the command still fails with the same error when run again.
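To rule out a simple path problem, the check below can be run in the same shell session on the device, right before `./llama-cli`. It is a sketch, not a diagnosis: `find_skel` is a hypothetical helper, and it accepts both `;` and `:` as separators because the report does not say which one the device expects in `ADSP_LIBRARY_PATH`. It only confirms that the file is readable from the current process; it does not prove the DSP side can load it.

```shell
#!/bin/sh
# find_skel NAME PATHLIST
# Print the first directory in PATHLIST (separated by ';' or ':')
# that contains a readable file called NAME. Returns 1 if none does.
find_skel() {
    name=$1
    old_ifs=$IFS
    IFS=';:'
    set -f                      # no globbing while splitting the list
    for dir in $2; do
        if [ -n "$dir" ] && [ -r "$dir/$name" ]; then
            IFS=$old_ifs
            set +f
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    set +f
    return 1
}

# Check the variable exactly as the current shell sees it.
find_skel libhexagon_npu_skel_v79.so "${ADSP_LIBRARY_PATH:-}" \
    || echo "libhexagon_npu_skel_v79.so not found via ADSP_LIBRARY_PATH" >&2
```

If the helper prints `/data/local/tmp`, the variable and the file are at least consistent with each other, and the cause of error 0x80000406 has to be looked for elsewhere.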

Compilation Script

Here is my compilation script:

#!/bin/bash

mkdir -p ./build_qnn
rm -rf ./build_qnn/*
cd ./build_qnn
set -e
export ANDROID_NDK_ROOT=/home/android-ndk-r26c
export QNN_SDK_PATH=/home/2.29.0.241129
export HEXAGON_SDK_PATH="/home/6.1.0.1"
export QNN_DEFAULT_LIB_SEARCH_PATH=/data/local/tmp/20250427npuLlamacpp
export OUTPUT_DIR=/home/20250427npuLlamacpp
echo $ANDROID_NDK_ROOT
echo $QNN_SDK_PATH
echo $HEXAGON_SDK_PATH
echo $QNN_DEFAULT_LIB_SEARCH_PATH 

source $QNN_SDK_PATH/bin/envsetup.sh
source $HEXAGON_SDK_PATH/setup_sdk_env.source
cmake -H.. -B. \
  -DANDROID_NDK="$ANDROID_NDK_ROOT" \
  -DCMAKE_TOOLCHAIN_FILE="$ANDROID_NDK_ROOT/build/cmake/android.toolchain.cmake" \
  -DGGML_QNN_DEFAULT_LIB_SEARCH_PATH="$QNN_DEFAULT_LIB_SEARCH_PATH" \
  -DGGML_QNN_SDK_PATH="$QNN_SDK_PATH" \
  -DGGML_QNN_ENABLE_HEXAGON_BACKEND=on \
  -DGGML_QNN=ON \
  -DANDROID_ABI="arm64-v8a" \
  -DBUILD_SHARED_LIBS=OFF \
  -DGGML_OPENMP=OFF \
  -DLLAMA_CURL=OFF \
  -DANDROID_PLATFORM=28 \
  -DCMAKE_BUILD_TYPE=Debug \
  -DCMAKE_HAVE_LIBC_PTHREAD=ON \
  -DGGML_QNN_ENABLE_PERFORMANCE_TRACKING=on \
  -DGGML_QNN_ENABLE_CPU_BACKEND=on \
  -DGGML_CCACHE=OFF

cmake --build . --config Debug -- -j$(nproc)


# Copy the output files to the output directory
chmod -R u+rw $OUTPUT_DIR
rsync -av ./bin/llama-cli $OUTPUT_DIR
rsync -av ./bin/test-backend-ops $OUTPUT_DIR
rsync -av ./bin/*.so $OUTPUT_DIR

function build_hexagon_libs() {
    local dsp_arch=$1
    local build_sim=$2

    local postfix=''
    if [ "$build_sim" = "0" ]; then
        build_type='hexagon'
    else
        build_type='hexagonsim'
        postfix='_sim'
    fi

    echo "Building ${build_type} libs for $dsp_arch"

    rm -rf ./hexagon_*
    build_cmake ${build_type} DSP_ARCH=$dsp_arch BUILD=$HEXAGON_BUILD_TYPE VERBOSE=1 TREE=1 -- -j$(nproc)
    rsync -av ./hexagon_${HEXAGON_BUILD_TYPE}_toolv88_${dsp_arch}/libhexagon_npu_skel_${dsp_arch}.so $OUTPUT_DIR/libhexagon_npu_skel_${dsp_arch}${postfix}.so
}


echo "Building hexagon package"
cd ../ggml/src/ggml-qnn/npu

HEXAGON_BUILD_TYPE="Debug"
# build_hexagon_libs v73 0
# build_hexagon_libs v73 1

# build_hexagon_libs v75 0
# build_hexagon_libs v75 1

build_hexagon_libs v79 0

First Bad Commit

[hexagon-npu]Unable to open NPU device, err: 0x80000406, uri file:///libhexagon_npu_skel_v79.so?npu_device_skel_handle_invoke&_modver=1.0&_dom=cdsp

Relevant log output

/llama-cli -m ../../qwen2-1_5b-instruct-fp16.gguf <
backend registry init
[hexagon-npu]NPU device created
skip qnn device 2
skip qnn device 1
skip qnn device 0
register_backend: registered backend qualcomm (1 devices)
register_device: registered device hexagon-npu (Hexagon NPU)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (CPU)
build: 5363 (c2b6fec6) with Android (11349228, +pgo, +bolt, +lto, -mlgo, based on r487747e) clang version 17.0.2 (https://android.googlesource.com/toolchain/llvm-project d9f89f4d16663d5012e5c09495f3b30ece3d2362) for x86_64-unknown-linux-gnu (debug)
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_loader: loaded meta data with 22 key-value pairs and 338 tensors from ../../qwen2-1_5b-instruct-fp16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen2
llama_model_loader: - kv 1: general.name str = qwen2-1_5b-instruct
llama_model_loader: - kv 2: qwen2.block_count u32 = 28
llama_model_loader: - kv 3: qwen2.context_length u32 = 32768
llama_model_loader: - kv 4: qwen2.embedding_length u32 = 1536
llama_model_loader: - kv 5: qwen2.feed_forward_length u32 = 8960
llama_model_loader: - kv 6: qwen2.attention.head_count u32 = 12
llama_model_loader: - kv 7: qwen2.attention.head_count_kv u32 = 2
llama_model_loader: - kv 8: qwen2.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 9: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 10: general.file_type u32 = 1
llama_model_loader: - kv 11: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 12: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 15: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32 = 151645
llama_model_loader: - kv 17: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv 19: tokenizer.chat_template str = {% for message in messages %}{% if lo...
llama_model_loader: - kv 20: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 21: general.quantization_version u32 = 2
llama_model_loader: - type f32: 141 tensors
llama_model_loader: - type f16: 197 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = F16
print_info: file size = 2.88 GiB (16.00 BPW)
load: special tokens cache size = 293
load: token to piece cache size = 0.9338 MB
print_info: arch = qwen2
print_info: vocab_only = 0
print_info: n_ctx_train = 32768
print_info: n_embd = 1536
print_info: n_layer = 28
print_info: n_head = 12
print_info: n_head_kv = 2
print_info: n_rot = 128
print_info: n_swa = 0
print_info: n_swa_pattern = 1
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 6
print_info: n_embd_k_gqa = 256
print_info: n_embd_v_gqa = 256
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 8960
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 32768
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 1.5B
print_info: model params = 1.54 B
print_info: general.name = qwen2-1_5b-instruct
print_info: vocab type = BPE
print_info: n_vocab = 151936
print_info: n_merges = 151387
print_info: BOS token = 151643 '<|endoftext|>'
print_info: EOS token = 151645 '<|im_end|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151643 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: CPU_Mapped model buffer size = 2944.68 MiB
......................................................................................
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 4096
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch = 2048
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = 0
llama_context: freq_base = 1000000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (4096) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
[hexagon-npu]Unable to open NPU device, err: 0x80000406, uri file:///libhexagon_npu_skel_v79.so?npu_device_skel_handle_invoke&_modver=1.0&_dom=cdsp
[hexagon-npu]Failed to init device
llama_init_from_model: failed to initialize the context: failed to initialize hexagon-npu backend
common_init_from_params: failed to create context with model '../../qwen2-1_5b-instruct-fp16.gguf'
main: error: unable to load model
FORTIFY: pthread_mutex_lock called on a destroyed mutex (0x589d5c5f20)
Aborted
