Sanity benchmark

To verify that dataset loading, model invocation, and metric calculation are implemented correctly, we ran the models through our pipeline and checked that the resulting scores reproduce previously reported numbers.

For MedGemma, we were able to reproduce, for most datasets, the results claimed in its technical report and on its Hugging Face README page.
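
A sanity check of this kind can be sketched as a comparison between reproduced and reported scores within a tolerance. The snippet below is a minimal illustration, not the code used for this benchmark: the function name `compare_to_reported`, the dataset names, the tolerance, and all scores are hypothetical placeholders.

```python
def compare_to_reported(reproduced, reported, tolerance=1.0):
    """Compare reproduced scores with reported ones.

    Returns {dataset: (reproduced, reported, diff, within_tolerance)}
    for every dataset that appears in both dictionaries.
    """
    report = {}
    for name, reference in reported.items():
        if name not in reproduced:
            continue  # nothing to compare for this dataset
        diff = reproduced[name] - reference
        report[name] = (reproduced[name], reference, diff, abs(diff) <= tolerance)
    return report


# Placeholder scores for illustration only; these are NOT real results.
reported = {"dataset_a": 70.0, "dataset_b": 55.0}
reproduced = {"dataset_a": 69.5, "dataset_b": 52.0}

result = compare_to_reported(reproduced, reported, tolerance=1.0)
for name, (got, ref, diff, ok) in result.items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name}: reproduced={got:.1f} reported={ref:.1f} diff={diff:+.1f} {status}")
```

The tolerance is a judgment call: small differences can come from prompt formatting, decoding settings, or answer parsing, so the threshold is assumed to be chosen per metric.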