8000 feat: Add MBRD and ROUGE score technique for majority voting by hickeyma · Pull Request #115 · ibm-granite/granite-io · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

feat: Add MBRD and ROUGE score technique for majority voting #115

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Apr 10, 2025

Conversation

hickeyma
Copy link
Collaborator
@hickeyma hickeyma commented Mar 31, 2025

Minimum Bayesian Risk Decoding (MBRD) is a technique that when used with ROUGE score similarity shows good accuracy when performing majority voting of model multi completion outputs per prompt.

This PR includes the following:

  • Adds MBRD and ROUGE score algorithm
  • Add new Python script to demonstrate how to use majority voting
  • Add MBRDMajorityVotingProcessor composite processor

Closes: #72

@hickeyma hickeyma force-pushed the feat/add-mbrd branch 2 times, most recently from a051182 to 4a09ea3 Compare March 31, 2025 16:49
@hickeyma hickeyma changed the title feat: Improve majority voting using MBRD WIP: feat: Improve majority voting using MBRD Mar 31, 2025
@hickeyma hickeyma marked this pull request as draft March 31, 2025 17:34
@hickeyma hickeyma force-pushed the feat/add-mbrd branch 2 times, most recently from 157c7e4 to 0e88479 Compare April 2, 2025 10:46
@hickeyma hickeyma changed the title WIP: feat: Improve majority voting using MBRD feat: Improve majority voting using MBRD and ROGUE score Apr 2, 2025
@hickeyma hickeyma marked this pull request as ready for review April 2, 2025 13:08
@hickeyma hickeyma requested review from markstur and frreiss April 2, 2025 13:21
@markstur markstur changed the title feat: Improve majority voting using MBRD and ROGUE score feat: Improve majority voting using MBRD and ROUGE score Apr 2, 2025
Copy link
Collaborator
@frreiss frreiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this, but this implementation needs work.

The existing code for simple majority voting and the new code for minimum Bayesian risk decoding represent two of many methods for majority voting. Both of these methods have sub-variants depending on what types of normalizers, similarity scores, etc. the users plugs into them. There are other forms of majority voting that we do not currently have code for. For example, one might want to cluster the model outputs, then return the best representative of every cluster.

Majority voting techniques are supposed to be model-agnostic. They work with many different general-purpose LLMs, as well as with LLMs and LoRA adapters fine-tuned for specific applications and even with composite workflows that call multiple models. We shouldn't tie our implementations to a specific model or to inference operations that directly call a model as opposed to running a workflow.

The ramifications of the above to this code are as follows:

  • We should not remove the existing code for simple majority voting. There are cases where that method is a better fit than MBRD. For example, one might want to return a set of top-k answers with diversity among the answers chosen.
  • The MBRD code should be in its own composite I/O processor, so that it can be used with multiple different models. This composite I/O processor should allow the user to specify important parameters such as the choice of similarity metric.
  • There should not be a Boolean "majority_voting" flag in the base Granite 3.2 model's I/O processor. There are many types of majority voting. Users should enable majority voting by configuring the appropriate composite I/O processor and attaching that I/O processor to the I/O processor of an LLM, LoRA adapter, or composite inference workflow.

@hickeyma hickeyma force-pushed the feat/add-mbrd branch 2 times, most recently from 8250bb8 to 5617e58 Compare April 6, 2025 09:47
@hickeyma
Copy link
Collaborator Author
hickeyma commented Apr 6, 2025

Thanks @frreiss for the feedback in #115 (review) and the discussion we had on best direction with supporting majority voting. I have taken your feedback on board and updated accordingly.

It is ready for review again.

@hickeyma hickeyma requested a review from frreiss April 6, 2025 09:49
@hickeyma hickeyma changed the title feat: Improve majority voting using MBRD and ROUGE score feat: Add MBRD and ROUGE score technique for majority voting Apr 6, 2025
@hickeyma hickeyma force-pushed the feat/add-mbrd branch 2 times, most recently from 9348e82 to 0d0c909 Compare April 8, 2025 18:56
@hickeyma
Copy link
Collaborator Author
hickeyma commented Apr 8, 2025

No idea whats up with DCO. Tried rebasing and force pushing but not fixing it.

@markstur
Copy link
Collaborator
markstur commented Apr 8, 2025

No idea whats up with DCO. Tried rebasing and force pushing but not fixing it.

one of the reverts is not signed off. Override should be okay here.

@markstur
Copy link
Collaborator 10000
markstur commented Apr 8, 2025

No idea whats up with DCO. Tried rebasing and force pushing but not fixing it.

one of the reverts is not signed off. Override should be okay here.

I set DCO to pass, but I'm not seeing the tests run. Maybe it'd be best to do some squashing(?)

@hickeyma
Copy link
Collaborator Author
hickeyma commented Apr 8, 2025

I set DCO to pass, but I'm not seeing the tests run. Maybe it'd be best to do some squashing(?)

All tests passed previously.

Copy link
Collaborator
@markstur markstur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 little things inline that I think need fixing

In general, voting seems like internal experiments or examples more than a granite-io feature right now -- but I do see this as progress in the right direction.

hickeyma and others added 6 commits April 9, 2025 12:00
Add Minimum Bayesian Risk Decoding (MBRD) which is a technique used for
majority voting.

Closes: #72

Co-authored-by: Ramón Fernandez Astudillo <ramon@astudillo.com>
Co-authored-by: MD ARAFAT SULTAN <arafat.sultan@colorado.edu>
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Enables majority voting when chat completions called for
Granite 3.2. Majority voting is standalone from other
capabilities like thinking, rage etc. This is because need
to add specifics to prompt to be able to get single final
responses to be able to do similarity checking and identify
the majority answer.

Example python script
10000
 also added.

Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Some additional changes:
- Fix prompt to show better results
- Return only one result and not all completions
- Update example to give better clarity on baclends supporting multiple completions

Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
The MajorityVotingProcessor processor is replaced
with adding majority voting into existing processors. It uses a flag
instead like reasoning whioch encapsulates the implementation
from the user.

Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
hickeyma added a commit that referenced this pull request Apr 9, 2025
Review comments:
- #115 (review)

Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
@hickeyma hickeyma requested a review from markstur April 9, 2025 11:22
hickeyma added a commit that referenced this pull request Apr 9, 2025
Review comments:
- #115 (review)

Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Review comments:
- #115 (review)

Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
@markstur
Copy link
Collaborator
markstur commented Apr 9, 2025

So sorry... updating with main was an accident.

@markstur markstur dismissed their stale review April 9, 2025 21:17

I can revisit the recording problems as long as CI is passing

Copy link
Collaborator
@frreiss frreiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Some minor comments inline.

"""

similarity_scores: list[float] = []
for _, x in enumerate(answers):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be for x in answers, not sure why the linter didn't catch this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean by this. The enumnerate() function will return counter and object (string in this case). If you don't account for the counter you end up with tuples like this: (0, 'str1') (1, 'str2') ....

Copy link
Collaborator
@markstur markstur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fred added comments that you might want to address, but otherwise you have approvals to merge.

@markstur markstur merged commit c361880 into main Apr 10, 2025
12 checks passed
@hickeyma hickeyma deleted the feat/add-mbrd branch April 10, 2025 07:51
hickeyma added a commit that referenced this pull request Apr 10, 2025
PR #115 was merged and these are addressing some small nits that didn't
make it into the merge. Review comments:

#115 (review)

Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
@hickeyma
Copy link
Collaborator Author

Thanks @frreiss and @markstur for the reviews and feedback.

@frreiss I have addressed your feedback from #115 (review) in #139.

markstur added a commit that referenced this pull request Apr 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add general-purpose normalizer for majority voting
3 participants
0