function call with llama cpp python · Issue #13 · MadeAgents/Hammer · GitHub

function call with llama cpp python #13


Open
HuntZhaozq opened this issue Apr 8, 2025 · 7 comments

Comments

@HuntZhaozq

Does this model support function calling using llama-cpp-python?

@linqq9
Contributor
linqq9 commented Apr 8, 2025

I haven't tried this, so I'm not sure whether llama-cpp-python supports calling the tokenizer's chat_template directly, the way the Hugging Face Transformers library does. If it doesn't, the option might be to render and concatenate the prompt yourself before passing it to llama-cpp-python.
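
If someone wants to try that manual route, a rough sketch would look like the code below. The repo id, .gguf path, and tool schema are placeholders, and it assumes a recent transformers version where apply_chat_template accepts a tools argument; if the model's template doesn't take tools, the schemas would need to be embedded in the prompt by hand.

from llama_cpp import Llama
from transformers import AutoTokenizer

# Render the prompt with the original tokenizer's chat_template
# (the repo id and tool schema below are placeholders).
tokenizer = AutoTokenizer.from_pretrained("MadeAgents/Hammer2.1-3b")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

prompt = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
)

# Pass the pre-rendered string to llama-cpp-python as a plain completion
# (the .gguf path is a placeholder).
llm = Llama(model_path="hammer2.1-3b-Q4.gguf", n_ctx=4096)
out = llm(prompt, max_tokens=256, stop=[tokenizer.eos_token])
print(out["choices"][0]["text"])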

@HuntZhaozq
Author

llama-cpp-python does support function calling; see https://llama-cpp-python.readthedocs.io/en/latest/server/#function-calling. I tried hammer2.1-3b-Q4.gguf this way, but the results were not good. Can you have a try?
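
For reference, the server route from that link looks roughly like the sketch below (the model path, port, alias, and tool schema are placeholders, not my exact setup):

# Start the OpenAI-compatible server first, e.g.:
#   python -m llama_cpp.server --model hammer2.1-3b-Q4.gguf
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-no-key-required")

response = client.chat.completions.create(
    model="hammer2.1-3b-Q4",  # placeholder model alias
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    tool_choice="auto",
)
print(response.choices[0].message)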

@nickvdw
nickvdw commented Apr 8, 2025

I had success by specifying the chatml-function-calling chat format in the model instantiation. Specifically,

  • If I don't specify a chat format, the model will not run inference, as it leverages the default chat format, which leads to a TypeError: can only concatenate str (not "list") to str.
  • If I specify the chatml chat format, the model will run inference but the generated response will be empty (e.g., \n[]\n).
  • If I select the chatml-function-calling chat format, the model will run inference and the generated response will be a function call from my list of specified tools.

My code ends up looking something like the snippet below:

from llama_cpp import Llama

# Instantiate the model.
llm = Llama.from_pretrained(
    repo_id="mradermacher/Hammer2.1-3b-i1-GGUF",
    filename="**i1-Q4_K_M**",
    chat_format="chatml-function-calling",
)

# Define the user prompt.
user_prompt = ...

# Specify the messages and list of tools.
messages = [
    {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": SYSTEM_PROMPT,
            },
        ],
    },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": user_prompt,
            },
        ],
    },
]

tools = [
    {
        "type": "function",
        "function": {
            "name": "dummy_tool",
            "description": "A dummy tool for this MVP.",
            "parameters": {
                "type": "object",
                "properties": {
                    "input_text": {"type": "string", "description": "Text as input for the function."}
                },
                "required": ["input_text"],
            },
        },
    },
    ...
]

# Run inference.
response = llm.create_chat_completion(
    messages=messages,
    tools=tools,
)
>>> ```\nAction: {"name": "dummy_tool", "arguments": {"input_text": "Example of input text."}}\n```

For the system prompt, I'm leveraging smolagent's ToolCallingAgent system prompt (reference).

I haven't tried this on a larger scale yet, so I'm not fully sure whether it'll work robustly, but I hope this helps!

@HuntZhaozq
Author
HuntZhaozq commented Apr 8, 2025

@nickvdw Thank you! What about asking it its name? For general queries like that (not about the tools), I found it responds quite badly. It seems to always call a tool.

@linqq9
Contributor
linqq9 commented Apr 8, 2025

@HuntZhaozq Hi, maybe you can refer to this document, llama-cpp-python chat-completion, to set tokenizer.chat_template.
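
A rough sketch of that approach (the repo id and .gguf path are placeholders; Jinja2ChatFormatter is the helper llama-cpp-python provides for custom Jinja chat templates):

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Jinja2ChatFormatter
from transformers import AutoTokenizer

# Take the chat_template string from the original tokenizer
# (the repo id is a placeholder for illustration).
hf_tokenizer = AutoTokenizer.from_pretrained("MadeAgents/Hammer2.1-3b")

chat_handler = Jinja2ChatFormatter(
    template=hf_tokenizer.chat_template,
    eos_token=hf_tokenizer.eos_token,
    bos_token=hf_tokenizer.bos_token or "",  # Qwen-style tokenizers have no BOS
).to_chat_handler()

llm = Llama(
    model_path="hammer2.1-3b-Q4.gguf",  # placeholder path
    chat_handler=chat_handler,
    n_ctx=4096,
)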

@nickvdw
nickvdw commented Apr 8, 2025

> Thank you! What about asking it its name? For general queries like that (not about the tools), I found it responds quite badly. It seems to always call a tool.

Without specifying any tools or a system prompt for mradermacher/Hammer2.1-3b-i1-GGUF (i1-Q4_K_M), the model generates the following text:

// Input prompt:
What is your name?

// With the `chatml` chat format:
>>> Hello! How can I assist you today?

// With the `chatml-function-calling` chat format:
>>> I am an AI assistant.

// Input prompt:
Describe the meaning of life.

// With the `chatml` chat format:
>>> Hello! How can I assist you today?

// With the `chatml-function-calling` chat format:
>>> The meaning of life is a philosophical question that has been debated for centuries. It is a fundamental question that seeks to understand the purpose and significance of existence. Different people have different beliefs and perspectives on the meaning of life, but some common themes include the search for happiness, fulfillment, and a connection to something greater than oneself. Some argue that the meaning of life is to seek knowledge and understanding, while others believe it is to contribute to society and make a positive impact. Ultimately, the meaning of life is a personal and subjective question that can vary greatly from person to person.

@HuntZhaozq
Author

@nickvdw But when you specify the tools in the request and set tool_choice="auto", the output is abnormal for these general queries. When I try this, it outputs null content and calls no tools.
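
A minimal sketch of what I mean, reusing the model and tools from your snippet above:

# Reuses `llm` and `tools` from the earlier snippet; the query is a
# general one that should not need a tool.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is your name?"}],
    tools=tools,
    tool_choice="auto",
)
# The returned message has null content and no tool_calls here.
print(response["choices"][0]["message"])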
