feat: Add MBRD and ROUGE score technique for majority voting #115

hickeyma · 2025-03-31T16:39:29Z

Minimum Bayesian Risk Decoding (MBRD) is a technique that, when used with ROUGE score similarity, shows good accuracy when performing majority voting over the multiple completions a model generates per prompt.

This PR includes the following:

Adds the MBRD and ROUGE score algorithm
Adds a new Python script to demonstrate how to use majority voting
Adds the MBRDMajorityVotingProcessor composite processor
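
For readers unfamiliar with the technique, the idea can be sketched as follows: score every completion by its average similarity to the other completions and return the one with the highest score. This is a minimal illustration and not the code in this PR. The function names `rouge1_f1` and `mbr_select` are hypothetical, and the similarity is a simplified unigram-overlap F1 that only approximates ROUGE-1 (no stemming, whitespace tokenization).

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1: a simplified stand-in for ROUGE-1 (assumption)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(completions: list[str]) -> str:
    """Return the completion with the highest mean similarity to the others."""
    if not completions:
        raise ValueError("need at least one completion")
    if len(completions) == 1:
        return completions[0]
    scores = []
    for i, cand in enumerate(completions):
        others = [c for j, c in enumerate(completions) if j != i]
        scores.append(sum(rouge1_f1(cand, o) for o in others) / len(others))
    # On ties, the earliest completion wins because index() returns the first match.
    return completions[scores.index(max(scores))]


chosen = mbr_select(["the answer is 42", "the answer is 42", "it is 7"])
```

Here the two identical completions agree with each other fully, so one of them is selected over the outlier.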

Closes: #72

frreiss

Thanks for doing this, but this implementation needs work.

The existing code for simple majority voting and the new code for minimum Bayesian risk decoding represent two of many methods for majority voting. Both of these methods have sub-variants depending on what types of normalizers, similarity scores, etc. the user plugs into them. There are other forms of majority voting that we do not currently have code for. For example, one might want to cluster the model outputs, then return the best representative of every cluster.
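
To make the "sub-variants" point concrete, here is a minimal sketch of simple majority voting with a pluggable normalizer. It is an illustration only and does not reproduce the existing granite-io implementation; `default_normalizer` and `simple_majority_vote` are hypothetical names.

```python
from collections import Counter
from typing import Callable


def default_normalizer(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different answers match."""
    return " ".join(text.lower().split())


def simple_majority_vote(
    completions: list[str],
    normalizer: Callable[[str], str] = default_normalizer,
) -> str:
    """Return the first completion whose normalized form is the most frequent."""
    if not completions:
        raise ValueError("need at least one completion")
    normalized = [normalizer(c) for c in completions]
    winner, _ = Counter(normalized).most_common(1)[0]
    # Hand back the original text, not the normalized key.
    return completions[normalized.index(winner)]


answer = simple_majority_vote(["Paris", "paris ", "London"])
```

Swapping in a different normalizer (for example, one that extracts only the final numeric answer) changes which completions count as "the same" without touching the voting logic.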

Majority voting techniques are supposed to be model-agnostic. They work with many different general-purpose LLMs, as well as with LLMs and LoRA adapters fine-tuned for specific applications and even with composite workflows that call multiple models. We shouldn't tie our implementations to a specific model or to inference operations that directly call a model as opposed to running a workflow.

The ramifications of the above to this code are as follows:

We should not remove the existing code for simple majority voting. There are cases where that method is a better fit than MBRD. For example, one might want to return a set of top-k answers with diversity among the answers chosen.
The MBRD code should be in its own composite I/O processor, so that it can be used with multiple different models. This composite I/O processor should allow the user to specify important parameters such as the choice of similarity metric.
There should not be a Boolean "majority_voting" flag in the base Granite 3.2 model's I/O processor. There are many types of majority voting. Users should enable majority voting by configuring the appropriate composite I/O processor and attaching that I/O processor to the I/O processor of an LLM, LoRA adapter, or composite inference workflow.
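
The design described in the points above can be sketched roughly as follows. This is a hypothetical illustration of the composition pattern, not the granite-io API: `VotingWrapper`, `fake_workflow`, and `most_common` are made-up names, and the wrapped "workflow" is a stub that returns canned completions. The point is that the voting step only needs a callable that produces several completions and a callable that picks one, so it stays independent of any specific model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class VotingWrapper:
    """Hypothetical composite: wraps any multi-completion workflow with a selector."""

    generate: Callable[[str, int], Sequence[str]]  # model, LoRA adapter, or workflow
    select: Callable[[Sequence[str]], str]  # voting strategy chosen by the user
    num_samples: int = 5

    def __call__(self, prompt: str) -> str:
        candidates = list(self.generate(prompt, self.num_samples))
        if not candidates:
            raise ValueError("the wrapped workflow returned no completions")
        return self.select(candidates)


def fake_workflow(prompt: str, n: int) -> list[str]:
    """Stub standing in for a real backend; returns canned completions."""
    canned = ["4", "4", "5", "4", "3"]
    return canned[:n]


def most_common(candidates: Sequence[str]) -> str:
    """Simplest possible voting strategy: the most frequent exact string."""
    return Counter(candidates).most_common(1)[0][0]


voter = VotingWrapper(generate=fake_workflow, select=most_common)
result = voter("What is 2 + 2?")
```

A different strategy, such as an MBRD-style selector with a user-chosen similarity metric, could be passed as `select` without changing the wrapper or the wrapped workflow.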

hickeyma · 2025-04-06T09:49:33Z

Thanks @frreiss for the feedback in #115 (review) and the discussion we had on the best direction for supporting majority voting. I have taken your feedback on board and updated the PR accordingly.

It is ready for review again.

hickeyma · 2025-04-08T18:57:22Z

No idea what's up with DCO. Tried rebasing and force pushing, but that's not fixing it.

markstur · 2025-04-08T19:15:37Z

> No idea what's up with DCO. Tried rebasing and force pushing, but that's not fixing it.

One of the reverts is not signed off. Override should be okay here.

markstur · 2025-04-08T19:17:29Z

> > No idea what's up with DCO. Tried rebasing and force pushing, but that's not fixing it.
>
> One of the reverts is not signed off. Override should be okay here.

I set DCO to pass, but I'm not seeing the tests run. Maybe it'd be best to do some squashing(?)

hickeyma · 2025-04-08T19:21:19Z

> I set DCO to pass, but I'm not seeing the tests run. Maybe it'd be best to do some squashing(?)

All tests passed previously.

markstur

2 little things inline that I think need fixing

In general, voting seems like internal experiments or examples more than a granite-io feature right now -- but I do see this as progress in the right direction.

examples/model_chat_with_majority_voting.py

src/granite_io/io/granite_3_2/input_processors/granite_3_2_input_processor.py

src/granite_io/io/granite_3_2/output_processors/granite_3_2_output_processor.py

tests/io/test_voting.py

Add Minimum Bayesian Risk Decoding (MBRD) which is a technique used for majority voting.

Closes: #72

Co-authored-by: Ramón Fernandez Astudillo <ramon@astudillo.com>
Co-authored-by: MD ARAFAT SULTAN <arafat.sultan@colorado.edu>
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Enables majority voting when chat completions are called for Granite 3.2. Majority voting is standalone from other capabilities like thinking, RAG, etc. This is because specifics need to be added to the prompt to get a single final response from each completion, so that similarity checking can identify the majority answer. An example Python script is also added.

Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Some additional changes:
- Fix prompt to show better results
- Return only one result and not all completions
- Update example to give better clarity on backends supporting multiple completions

Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

The MajorityVotingProcessor processor is replaced by adding majority voting to the existing processors. It uses a flag instead, like reasoning does, which hides the implementation from the user.

Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

This reverts commit 8d19541.

Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Review comments: - #115 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

markstur · 2025-04-09T18:01:18Z

So sorry... updating with main was an accident.

I can revisit the recording problems as long as CI is passing

frreiss

Looks good. Some minor comments inline.

src/granite_io/io/granite_3_2/output_processors/granite_3_2_output_processor.py

src/granite_io/io/voting/__init__.py

frreiss · 2025-04-09T21:33:04Z

src/granite_io/io/voting/mbrd_voting.py

+    """
+
+    similarity_scores: list[float] = []
+    for _, x in enumerate(answers):


Should be for x in answers, not sure why the linter didn't catch this.

Not sure what you mean by this. The enumerate() function will return a counter and an object (a string in this case). If you don't account for the counter you end up with tuples like this: (0, 'str1') (1, 'str2') ....
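
A small illustration of the two loop forms discussed in this thread, using a made-up `answers` list rather than the PR's data. Discarding the index from `enumerate()` yields the same strings as iterating over the list directly; the tuples only appear when the `enumerate()` result is not unpacked.

```python
answers = ["str1", "str2", "str3"]  # sample data for illustration only

# Form used in the diff: enumerate, then throw the counter away.
with_enumerate = [x for _, x in enumerate(answers)]

# Form suggested in the review: iterate over the list directly.
direct = [x for x in answers]

# Iterating over enumerate() without unpacking is what produces tuples.
pairs = list(enumerate(answers))
```

Both `with_enumerate` and `direct` hold the original strings, while `pairs` holds `(index, string)` tuples.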

src/granite_io/io/voting/mbrd_voting.py

tests/io/cassettes/test_voting/test_mbrd_majority_voting[backend_openai].yaml

markstur

Fred added comments that you might want to address, but otherwise you have approvals to merge.

PR #115 was merged; these changes address some small nits that didn't make it into the merge. Review comments: #115 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma · 2025-04-10T10:26:23Z

Thanks @frreiss and @markstur for the reviews and feedback.

@frreiss I have addressed your feedback from #115 (review) in #139.

Updates for feedback in PR #115

hickeyma force-pushed the feat/add-mbrd branch 2 times, most recently from a051182 to 4a09ea3 March 31, 2025 16:49

hickeyma changed the title ~~feat: Improve majority voting using MBRD~~ WIP: feat: Improve majority voting using MBRD Mar 31, 2025

hickeyma marked this pull request as draft March 31, 2025 17:34

hickeyma force-pushed the feat/add-mbrd branch 2 times, most recently from 157c7e4 to 0e88479 April 2, 2025 10:46

hickeyma changed the title ~~WIP: feat: Improve majority voting using MBRD~~ feat: Improve majority voting using MBRD and ROGUE score Apr 2, 2025

hickeyma marked this pull request as ready for review April 2, 2025 13:08

hickeyma requested review from markstur and frreiss April 2, 2025 13:21

markstur changed the title ~~feat: Improve majority voting using MBRD and ROGUE score~~ feat: Improve majority voting using MBRD and ROUGE score Apr 2, 2025

frreiss requested changes Apr 3, 2025

hickeyma force-pushed the feat/add-mbrd branch 2 times, most recently from 8250bb8 to 5617e58 April 6, 2025 09:47

hickeyma requested a review from frreiss April 6, 2025 09:49

hickeyma changed the title ~~feat: Improve majority voting using MBRD and ROUGE score~~ feat: Add MBRD and ROUGE score technique for majority voting Apr 6, 2025

hickeyma force-pushed the feat/add-mbrd branch 2 times, most recently from 9348e82 to 0d0c909 April 8, 2025 18:56

markstur previously requested changes Apr 8, 2025

hickeyma and others added 6 commits April 9, 2025 12:00

Add unit tests

c14570b

Some additional changes:
- Fix prompt to show better results
- Return only one result and not all completions
- Update example to give better clarity on backends supporting multiple completions

Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Revert "Remove MajorityVotingProcessor processor"

cfa7d2b

This reverts commit 8d19541.

Move MBRD voting into IO processor

58e49a6

Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma force-pushed the feat/add-mbrd branch from 0d0c909 to 58e49a6 April 9, 2025 11:01

hickeyma added a commit that referenced this pull request Apr 9, 2025

Update after review

e6ca83a

Review comments: - #115 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma requested a review from markstur April 9, 2025 11:22

hickeyma added a commit that referenced this pull request Apr 9, 2025

Update after review

fa8a9b6

Review comments: - #115 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma force-pushed the feat/add-mbrd branch from e6ca83a to fa8a9b6 April 9, 2025 11:31

Update after review

bd310ef

Review comments: - #115 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma force-pushed the feat/add-mbrd branch from fa8a9b6 to bd310ef April 9, 2025 11:32

Merge branch 'main' into feat/add-mbrd

49847e7

frreiss approved these changes Apr 9, 2025

markstur approved these changes Apr 10, 2025

markstur merged commit c361880 into main Apr 10, 2025
12 checks passed

hickeyma deleted the feat/add-mbrd branch April 10, 2025 07:51

hickeyma added a commit that referenced this pull request Apr 10, 2025

Updates for feedback in PR #115

4222f09

PR #115 was merged; these changes address some small nits that didn't make it into the merge. Review comments: #115 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma mentioned this pull request Apr 10, 2025

Updates for feedback in PR #115 #139

Merged

markstur added a commit that referenced this pull request Apr 10, 2025

Merge pull request #139 from ibm-granite/follow-up-pr-115

ec66abf

Updates for feedback in PR #115
