Sanity benchmark
To verify that dataset loading, model invocation, and metric calculation are implemented correctly, we ran the model end to end and reproduced previously reported numbers.
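A sanity check of this kind can be sketched as a comparison between reported and reproduced scores, one per dataset. The snippet below is only an illustration, not the code used here: the names `ReproductionCheck` and `compare_scores`, the default tolerance of 1.0 point, and the dataset keys in the usage example are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ReproductionCheck:
    """Result of comparing one reproduced score with its reported value."""

    dataset: str
    reported: float
    reproduced: float
    tolerance: float

    @property
    def delta(self) -> float:
        # Positive when the reproduced score is higher than the reported one.
        return self.reproduced - self.reported

    @property
    def passed(self) -> bool:
        # A dataset counts as reproduced when the gap is within the tolerance.
        return abs(self.delta) <= self.tolerance


def compare_scores(reported, reproduced, tolerance=1.0):
    """Compare scores per dataset (hypothetical helper).

    `reported` and `reproduced` map dataset name -> score on the same scale.
    Datasets without a reproduced score are skipped rather than failed.
    """
    checks = []
    for dataset, reference in reported.items():
        if dataset not in reproduced:
            continue
        checks.append(
            ReproductionCheck(dataset, reference, reproduced[dataset], tolerance)
        )
    return checks


if __name__ == "__main__":
    # Placeholder values for illustration only; these are not real results.
    reported = {"dataset_a": 50.0, "dataset_b": 70.0}
    reproduced = {"dataset_a": 50.5, "dataset_b": 65.0}
    for check in compare_scores(reported, reproduced):
        status = "ok" if check.passed else "MISMATCH"
        print(f"{check.dataset}: delta={check.delta:+.2f} [{status}]")
```

The tolerance is a judgment call: small gaps can come from decoding settings, prompt formatting, or answer parsing, so a mismatch is a prompt to inspect those stages, not proof of a bug.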
MedGemma-4B Reproduction
For MedGemma, we were able to reproduce the results reported in its technical report and Hugging Face README page for most datasets.