Revamped Eval Function · Issue #38 · ScalingIntelligence/KernelBench · GitHub
Revamped Eval Function #38
Open
@simonguozirui

Description

While investigating Sakana's kernel in #25, we created a stronger eval function to guard against the kind of exploit that some people observed.
I haven't merged it yet (it sits on a branch) because we wanted to make sure our paper results would not change from such an update (for the ICML rebuttal).

During the ICML rebuttal, I also checked whether any of our existing kernels contain a similar kind of exploit. Luckily, none of our kernels are smart enough to do that yet.

Now that ICML is over, I plan to merge the more robust eval function in.

In particular, the simple fix is:

compute reference, Model
clear cache
compute reference, ModelNew
check if they are equivalent

AND

compute reference, ModelNew
clear cache
compute reference, Model
check if they are equivalent

Check both directions to be extra sure!
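
The two orderings above can be sketched roughly as follows. This is a minimal illustration, not the eval function on the branch: `check_both_orders`, `outputs_match`, the `clear_cache` hook, and the list-of-floats outputs are all hypothetical stand-ins. In a real PyTorch harness the hook might be something like `torch.cuda.empty_cache()` and the comparison something like `torch.allclose`, but that is an assumption about what "clear cache" and "equivalent" mean here.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a runner takes inputs and returns a flat list of outputs.
Runner = Callable[[Sequence[float]], List[float]]


def outputs_match(a: Sequence[float], b: Sequence[float], tol: float = 1e-4) -> bool:
    """Element-wise comparison within an absolute tolerance (assumed metric)."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


def check_both_orders(
    run_model: Runner,               # stands in for the reference Model
    run_model_new: Runner,           # stands in for the candidate ModelNew
    clear_cache: Callable[[], None], # hypothetical hook that drops cached state
    inputs: Sequence[float],
    tol: float = 1e-4,
) -> bool:
    # Direction 1: Model first, clear cache, then ModelNew.
    ref = list(run_model(inputs))      # copy so later runs cannot alias it
    clear_cache()
    new = list(run_model_new(inputs))
    first_ok = outputs_match(ref, new, tol)

    # Direction 2: ModelNew first, clear cache, then Model.
    new = list(run_model_new(inputs))
    clear_cache()
    ref = list(run_model(inputs))
    second_ok = outputs_match(new, ref, tol)

    # Accept only if the outputs agree in both orderings.
    return first_ok and second_ok
```

The idea behind running both orderings is that a candidate which only looks correct because it picks up leftover state from the other model's run should fail in at least one direction once that state is cleared in between.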
