Is your feature request related to a problem? Please describe.
In math_environment.py (see https://github.com/NVIDIA/NeMo-RL/blob/cec9a60ff798554279ca494a0ef40ce5f283e0d8/nemo_rl/environments/math_environment.py#L87-L92), verify assigns a reward of 0 whenever an exception is raised. That may be fine for this math case, but in general, should we skip such examples and use None or np.nan instead (assuming len(results) == len(pred_responses) has to hold)?
def verify(
    self, pred_responses: List[str], ground_truths: List[str]
) -> List[float]:
    """Verify the correctness of the predicted responses against the ground truth.

    Args:
        pred_responses: List[str]. The predicted responses from the LLM.
        ground_truths: List[str]. The ground truth responses.

    Returns:
        List[float]. The rewards for each predicted response.
    """
    results = []
    for response, ground_truth in zip(pred_responses, ground_truths):
        try:
            ground_truth_parsable = "\\boxed{" + ground_truth + "}"
            with _mute_output():
                try:
                    ret_score, _ = self.verify_func(
                        [ground_truth_parsable], [response]
                    )
                except Exception:
                    ret_score = 0.0  # <-- Should we skip if detection fails?
            results.append(float(ret_score))
        except Exception:
            results.append(0.0)  # <-- Same here
    return results
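For comparison, here is a rough sketch of the None/np.nan idea raised above: keep one entry per response so that len(results) == len(pred_responses) still holds, but mark verification failures distinctly instead of scoring them 0.0. The function name and the nan convention are illustrative assumptions, not existing NeMo-RL code:

    from typing import Callable, List
    import numpy as np

    def verify_with_nan(
        verify_func: Callable, pred_responses: List[str], ground_truths: List[str]
    ) -> List[float]:
        # Hypothetical variant: same length as pred_responses, but np.nan
        # marks "could not verify" instead of conflating it with a 0.0 reward.
        results = []
        for response, ground_truth in zip(pred_responses, ground_truths):
            try:
                ground_truth_parsable = "\\boxed{" + ground_truth + "}"
                ret_score, _ = verify_func([ground_truth_parsable], [response])
                results.append(float(ret_score))
            except Exception:
                results.append(np.nan)  # verifier raised; caller decides how to handle it
        return results

Downstream code could then filter or reweight the nan entries explicitly rather than treating an unverifiable response as a wrong one.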
Describe the solution you'd like
Per our internal discussion, adding a feature that marks invalid outputs so those samples can be masked out during training might be a good solution.
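A minimal sketch of what that could look like, assuming the environment returns a per-sample validity flag next to the reward and the training loop uses it to mask the loss; the is_valid output and the masking step are assumptions, not an existing NeMo-RL API:

    from typing import Callable, List, Tuple

    def verify_with_validity(
        verify_func: Callable, pred_responses: List[str], ground_truths: List[str]
    ) -> Tuple[List[float], List[bool]]:
        # Hypothetical: rewards and is_valid have equal length, so the trainer
        # can mask out samples where verification itself failed.
        rewards, is_valid = [], []
        for response, ground_truth in zip(pred_responses, ground_truths):
            try:
                ground_truth_parsable = "\\boxed{" + ground_truth + "}"
                ret_score, _ = verify_func([ground_truth_parsable], [response])
                rewards.append(float(ret_score))
                is_valid.append(True)
            except Exception:
                rewards.append(0.0)     # placeholder; ignored when masked
                is_valid.append(False)  # mark the sample invalid for training
        return rewards, is_valid

The trainer could then zero out the loss (or advantage) wherever is_valid is False, so a verifier failure never counts as a genuinely incorrect answer.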
Describe alternatives you've considered
N/A
Additional context
N/A