Question about Hammer-2.1-7b chat template · Issue #7 · MadeAgents/Hammer · GitHub

Question about Hammer-2.1-7b chat template #7

jc-ryan opened this issue Dec 13, 2024 · 11 comments

@jc-ryan commented Dec 13, 2024

Congratulations on your new progress with Hammer-2.1! I have a question: I noticed that when using vllm serve, the tool call parser being used is hermes, but your training data uses a custom tool output format. How does the hermes XML template implement the parsing?

Additionally, I noticed that you modified the chat template of Qwen-2.5-coder-7B. I'm curious why you didn't directly use Qwen-2.5's own chat template for training (which already supports tool calls and appropriate Hermes output format)? Are there any additional considerations behind this decision?
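
For context, the setup in question is roughly the following; a minimal sketch, assuming the server is started with something like `vllm serve <model> --enable-auto-tool-choice --tool-call-parser hermes` (the endpoint URL, model id, and example tool are illustrative, not taken from this issue):

```python
# Minimal sketch (illustrative, not from this issue): querying a vLLM
# OpenAI-compatible server that was launched with the hermes tool-call parser.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed local endpoint

# Hypothetical tool schema used only to exercise the parser.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="MadeAgents/Hammer2.1-7b",  # assumed model id served by vLLM
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the parser recognizes the model's output, structured calls land in
# tool_calls; otherwise the raw text stays in content (see the replies below).
print(response.choices[0].message.tool_calls)
print(response.choices[0].message.content)
```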

@jc-ryan (Author) commented Dec 13, 2024

Additionally, I'd like to ask: I noticed that the Model Card mentions "better multi-turn" capabilities, but the current training data, custom chat template, and the LlamaFactory training format used ({ "instruction": content, "input": "", "output": label}) don't seem to support multi-turn function calling? (Indeed, using Qwen2.5's own chat template would have made it easier to train for multi-turn function calling)

@jc-ryan (Author) commented Dec 13, 2024

> Congratulations on your new progress with Hammer-2.1! I have a question: I noticed that when using vllm serve, the tool call parser being used is hermes, but your training data uses a custom tool output format. How does the hermes XML template implement the parsing?
>
> Additionally, I noticed that you modified the chat template of Qwen-2.5-coder-7B. I'm curious why you didn't directly use Qwen-2.5's own chat template for training (which already supports tool calls and appropriate Hermes output format)? Are there any additional considerations behind this decision?

Indeed, it doesn't work.
[screenshot attached]

@linqq9 (Contributor) commented Dec 13, 2024

> Congratulations on your new progress with Hammer-2.1! I have a question: I noticed that when using vllm serve, the tool call parser being used is hermes, but your training data uses a custom tool output format. How does the hermes XML template implement the parsing?
>
> Additionally, I noticed that you modified the chat template of Qwen-2.5-coder-7B. I'm curious why you didn't directly use Qwen-2.5's own chat template for training (which already supports tool calls and appropriate Hermes output format)? Are there any additional considerations behind this decision?

Hi, we found that to have vLLM parse Hammer's tool output format we might need to submit a Pull Request (PR) to vLLM to add our own parser. We tried it out and found that the hermes parser outputs the tool-call format we defined directly, without any extra processing, so we chose it.

Regarding why we didn't use Qwen-2.5's own chat template for fine-tuning: when we were training Hammer 1.0, we found that switching to Qwen's own tool-call prompt didn't bring any benefits, so we didn't continue with that approach.

@linqq9 (Contributor) commented Dec 13, 2024

> Additionally, I'd like to ask: I noticed that the Model Card mentions "better multi-turn" capabilities, but the current training data, custom chat template, and the LlamaFactory training format used ({ "instruction": content, "input": "", "output": label}) don't seem to support multi-turn function calling? (Indeed, using Qwen2.5's own chat template would have made it easier to train for multi-turn function calling)

For Hammer 2.1, we mixed some multi-turn data into the training, and we will also open-source this data in the future.

@linqq9 (Contributor) commented Dec 13, 2024

> Congratulations on your new progress with Hammer-2.1! I have a question: I noticed that when using vllm serve, the tool call parser being used is hermes, but your training data uses a custom tool output format. How does the hermes XML template implement the parsing?
> Additionally, I noticed that you modified the chat template of Qwen-2.5-coder-7B. I'm curious why you didn't directly use Qwen-2.5's own chat template for training (which already supports tool calls and appropriate Hermes output format)? Are there any additional considerations behind this decision?
>
> Indeed, it doesn't work. [screenshot attached]

Sorry, I misunderstood. Currently the built-in parser cannot be used directly, but you can complete the parsing with json.loads().
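
A minimal sketch of that approach, assuming the model emits its tool calls as a JSON list of {"name", "arguments"} objects (the raw output string below is illustrative):

```python
import json

# Raw completion text returned by the model (illustrative example output,
# assuming tool calls are emitted as a JSON list of {"name", "arguments"} objects).
raw_output = '[{"name": "get_weather", "arguments": {"city": "Paris"}}]'

try:
    tool_calls = json.loads(raw_output.strip())
except json.JSONDecodeError:
    tool_calls = []  # fall back to treating the reply as plain text

for call in tool_calls:
    print(call["name"], call["arguments"])
```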

@linqq9 (Contributor) commented Dec 13, 2024

And the current custom chat template supports multi-turn function calling.

@jc-ryan (Author) commented Dec 13, 2024

> And the current custom chat template supports multi-turn function calling.

Oh I see, so the data_processing file should also be updated accordingly to use tokenizer.apply_chat_template(), right?

Also, was the multi-turn function calling fine-tuned using llama_factory as well? If so, it presumably no longer uses the Alpaca data format that Hammer 2.0 did, right?

@linqq9 (Contributor) commented Dec 13, 2024

Currently, we only use the data-processing files to augment single-turn data. Going forward, we will also extend them to multi-turn scenarios.

Our multi-turn function calling was also fine-tuned with Llama_factory. Initially we planned to train on the ShareGPT format, but during the process we found that Llama_factory appears to require even-numbered turns to be the assistant role. Since our data may contain more diverse roles, we ultimately trained with the Alpaca data format, with the only modification being to the template.
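
To make the layout concrete, a multi-turn record in that Alpaca format could look roughly like the sketch below (only the field names come from this discussion; the role markers, serialized turns, and tool-call label are invented for illustration):

```python
# Hypothetical multi-turn record in the Alpaca layout described above.
# The field names ("instruction", "input", "output") come from this thread;
# the role markers, serialized turns, and tool-call label are illustrative only.
# The real serialization is defined by the data_processing/template code.
record = {
    "instruction": (
        "USER: Book a table for two tonight.\n"
        'ASSISTANT: [{"name": "find_restaurants", "arguments": {"party_size": 2}}]\n'
        'TOOL: ["Bistro A", "Bistro B"]\n'
        "USER: Bistro A, please."
    ),
    "input": "",
    "output": '[{"name": "book_table", "arguments": {"restaurant": "Bistro A", "party_size": 2}}]',
}
```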

@jc-ryan (Author) commented Dec 13, 2024

> Currently, we only use the data-processing files to augment single-turn data. Going forward, we will also extend them to multi-turn scenarios.
>
> Our multi-turn function calling was also fine-tuned with Llama_factory. Initially we planned to train on the ShareGPT format, but during the process we found that Llama_factory appears to require even-numbered turns to be the assistant role. Since our data may contain more diverse roles, we ultimately trained with the Alpaca data format, with the only modification being to the template.

Sorry, what I meant to say was that the template in data_processing.py is not exactly the same as the one in chat_template - for example, the template in data_processing.py doesn't include roles. Shouldn't we construct the fine-tuning data based on chat_template + tokenizer.apply_chat_template()?

@linqq9 (Contributor) commented Dec 13, 2024

You're right. Constructing fine-tuning data based on chat_template + tokenizer.apply_chat_template() is indeed a great approach.
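
As a minimal sketch of that approach (assuming a recent transformers release whose apply_chat_template accepts a tools argument; the model id, message, and tool schema below are illustrative):

```python
# Minimal sketch: building prompt text via the model's own chat template.
# The model id, message, and tool schema are illustrative assumptions.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("MadeAgents/Hammer2.1-7b")  # assumed repo id

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

prompt = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    add_generation_prompt=True,  # drop this and append the label text when building SFT targets
    tokenize=False,
)
print(prompt)
```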

@jc-ryan (Author) commented Dec 16, 2024

Ok, thanks, looking forward to the updated data_processing logic and vllm tool parser support~
(p.s. If using Qwen's chat template won't cause significant performance impact, you can save the effort of submitting a separate pull request for a hammer tool parser to the vllm official repo)
