I'm working on a PR to add several Turkish evaluation sets to lighteval, such as MMLU, ARC, and GSM8K.
My northstar repository: Malhajar/lm-evaluation-harness_turkish
def turkish_gsm8k_eval_prompt(line: dict, task_name: Optional[str] = "", instruction: Optional[str] = "") -> Doc:
    question = line["question"]
    answer = line["answer"]

    # Extract numerical answer
    numerical_answer = None
    if "####" in answer:
        match = re.search(r"####\s*(\d+)", answer)
        if match:
            numerical_answer = int(match.group(1))

    # ... query building code ...

    return Doc(
        task_name=task_name,
        query=query,
        choices=[],  # Empty list since not multiple choice
        gold_index=-1,  # Using -1 as sentinel
        instruction=instruction,
    )
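For comparison, here is a minimal sketch of one alternative to the empty `choices` / `gold_index=-1` sentinel: put the extracted gold answer into `choices` and point `gold_index` at it. This assumes that generative metrics read the reference from `choices[gold_index]`, which should be confirmed against the original gsm8k task in lighteval. The `Doc` dataclass below is a hypothetical stand-in so the snippet runs on its own, the `Soru:` / `Cevap:` template is only a placeholder for the elided query-building code, and the widened regex (sign, thousands separators, decimals) is likewise an assumption about what the Turkish dataset may contain.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    """Hypothetical stand-in for lighteval's Doc, reduced to the fields used here."""
    task_name: str
    query: str
    choices: list
    gold_index: int
    instruction: str = ""


# Final answer after "####": optional sign, digits with optional
# thousands separators, optional decimal part.
FINAL_ANSWER_RE = re.compile(r"####\s*(-?[\d,]+(?:\.\d+)?)")


def extract_final_answer(answer: str) -> Optional[str]:
    """Return the number after '####' as a string without commas, or None."""
    match = FINAL_ANSWER_RE.search(answer)
    if match is None:
        return None
    return match.group(1).replace(",", "")


def turkish_gsm8k_prompt_sketch(line: dict, task_name: str = "", instruction: str = "") -> Doc:
    gold = extract_final_answer(line["answer"])
    if gold is None:
        # Fail loudly instead of emitting a Doc with no usable reference.
        raise ValueError("no '#### <number>' marker found in the answer field")

    # Placeholder template (assumption), standing in for the elided query code.
    query = f"{instruction}Soru: {line['question']}\nCevap:"

    return Doc(
        task_name=task_name,
        query=query,
        choices=[f" {gold}"],  # single gold reference for generative metrics
        gold_index=0,          # index of the gold reference in `choices`
        instruction=instruction,
    )
```

Whether the reference should be the bare number, as here, or the full worked solution depends on how the chosen metric normalizes predictions and golds, so that is worth checking before settling on either form.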
Should we:
Metrics.quasi_exact_match_gsm8k
Metrics.maj_at_8_gsm8k
Could someone from the team clarify:
/cc @malhajar17
Hi! You can look at how it is done in the original gsm8k task. Also, for multilingual tasks, we have a lot of examples here.
Turkish Community Evals.