-
Notifications
You must be signed in to change notification settings - Fork 11
feat: Add MBRD and ROUGE score technique for majority voting #115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a051182
to
4a09ea3
Compare
157c7e4
to
0e88479
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this, but this implementation needs work.
The existing code for simple majority voting and the new code for minimum Bayesian risk decoding represent two of many methods for majority voting. Both of these methods have sub-variants depending on what types of normalizers, similarity scores, etc. the users plugs into them. There are other forms of majority voting that we do not currently have code for. For example, one might want to cluster the model outputs, then return the best representative of every cluster.
Majority voting techniques are supposed to be model-agnostic. They work with many different general-purpose LLMs, as well as with LLMs and LoRA adapters fine-tuned for specific applications and even with composite workflows that call multiple models. We shouldn't tie our implementations to a specific model or to inference operations that directly call a model as opposed to running a workflow.
The ramifications of the above to this code are as follows:
- We should not remove the existing code for simple majority voting. There are cases where that method is a better fit than MBRD. For example, one might want to return a set of top-k answers with diversity among the answers chosen.
- The MBRD code should be in its own composite I/O processor, so that it can be used with multiple different models. This composite I/O processor should allow the user to specify important parameters such as the choice of similarity metric.
- There should not be a Boolean "majority_voting" flag in the base Granite 3.2 model's I/O processor. There are many types of majority voting. Users should enable majority voting by configuring the appropriate composite I/O processor and attaching that I/O processor to the I/O processor of an LLM, LoRA adapter, or composite inference workflow.
8250bb8
to
5617e58
Compare
Thanks @frreiss for the feedback in #115 (review) and the discussion we had on best direction with supporting majority voting. I have taken your feedback on board and updated accordingly. It is ready for review again. |
9348e82
to
0d0c909
Compare
No idea whats up with DCO. Tried rebasing and force pushing but not fixing it. |
one of the reverts is not signed off. Override should be okay here. |
I set DCO to pass, but I'm not seeing the tests run. Maybe it'd be best to do some squashing(?) |
All tests passed previously. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 little things inline that I think need fixing
In general, voting seems like internal experiments or examples more than a granite-io feature right now -- but I do see this as progress in the right direction.
src/granite_io/io/granite_3_2/input_processors/granite_3_2_input_processor.py
Outdated
Show resolved
Hide resolved
src/granite_io/io/granite_3_2/output_processors/granite_3_2_output_processor.py
Outdated
Show resolved
Hide resolved
src/granite_io/io/granite_3_2/output_processors/granite_3_2_output_processor.py
Show resolved
Hide resolved
Add Minimum Bayesian Risk Decoding (MBRD) which is a technique used for majority voting. Closes: #72 Co-authored-by: Ramón Fernandez Astudillo <ramon@astudillo.com> Co-authored-by: MD ARAFAT SULTAN <arafat.sultan@colorado.edu> Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Enables majority voting when chat completions called for Granite 3.2. Majority voting is standalone from other capabilities like thinking, rage etc. This is because need to add specifics to prompt to be able to get single final responses to be able to do similarity checking and identify the majority answer. Example python script 10000 also added. Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Some additional changes: - Fix prompt to show better results - Return only one result and not all completions - Update example to give better clarity on baclends supporting multiple completions Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
The MajorityVotingProcessor processor is replaced with adding majority voting into existing processors. It uses a flag instead like reasoning whioch encapsulates the implementation from the user. Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
This reverts commit 8d19541.
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Review comments: - #115 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Review comments: - #115 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Review comments: - #115 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
So sorry... updating with main was an accident. |
I can revisit the recording problems as long as CI is passing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Some minor comments inline.
src/granite_io/io/granite_3_2/output_processors/granite_3_2_output_processor.py
Show resolved
Hide resolved
""" | ||
|
||
similarity_scores: list[float] = [] | ||
for _, x in enumerate(answers): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be for x in answers
, not sure why the linter didn't catch this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean by this. The enumnerate()
function will return counter and object (string in this case). If you don't account for the counter you end up with tuples like this: (0, 'str1') (1, 'str2') ....
tests/io/cassettes/test_voting/test_mbrd_majority_voting[backend_openai].yaml
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fred added comments that you might want to address, but otherwise you have approvals to merge.
PR #115 was merged and these are addressing some small nits that didn't make it into the merge. Review comments: #115 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Thanks @frreiss and @markstur for the reviews and feedback. @frreiss I have addressed your feedback from #115 (review) in #139. |
Updates for feedback in PR #115
Minimum Bayesian Risk Decoding (MBRD) is a technique that when used with ROUGE score similarity shows good accuracy when performing majority voting of model multi completion outputs per prompt.
This PR includes the following:
MBRDMajorityVotingProcessor
composite processorCloses: #72