
KARMA-OpenMedEvalKit

Knowledge Assessment and Reasoning for Medical Applications: an evaluation framework for medical AI models.

KARMA is designed for researchers, developers, and healthcare organizations who need reliable evaluation of medical AI systems.

Extensible

Bring your own model, dataset, or even metric. KARMA integrates with Hugging Face and also supports local evaluation.

Add your own →

Fast & Efficient

Process thousands of medical examples efficiently with intelligent caching and batch processing.

See caching →

Model Agnostic

Works with any model (Qwen, MedGemma, Bedrock-SDK, OpenAI-SDK, or your own custom architecture) through a unified interface.

See available models →

Get started with KARMA in minutes:

# Install KARMA
pip install karma-medeval
# Run your first evaluation
karma eval --model "Qwen/Qwen3-0.6B" --datasets openlifescienceai/pubmedqa --max-samples 3
Example output:
$ karma eval --model "Qwen/Qwen3-0.6B" --datasets openlifescienceai/pubmedqa --max-samples 3
{
  "openlifescienceai/pubmedqa": {
    "metrics": {
      "exact_match": {
        "score": 0.3333333333333333,
        "evaluation_time": 0.9702351093292236,
        "num_samples": 3
      }
    },
    "task_type": "mcqa",
    "status": "completed",
    "dataset_args": {},
    "evaluation_time": 7.378399848937988
  },
  "_summary": {
    "model": "Qwen/Qwen3-0.6B",
    "model_path": "Qwen/Qwen3-0.6B",
    "total_datasets": 1,
    "successful_datasets": 1,
    "total_evaluation_time": 7.380354166030884,
    "timestamp": "2025-07-22 18:43:07"
  }
}
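The report above is plain JSON, so it can be post-processed with the standard library. The following is a minimal sketch, assuming the report text has already been captured into a string; it does not call any KARMA API. The helper `summarize` is a hypothetical name, and treating `score * num_samples` as the number of exact matches is an assumption about how `exact_match` is computed.

```python
import json

# Report text copied from the sample run above, trimmed to the fields used here.
report_text = """
{
  "openlifescienceai/pubmedqa": {
    "metrics": {
      "exact_match": {"score": 0.3333333333333333, "num_samples": 3}
    },
    "task_type": "mcqa",
    "status": "completed"
  },
  "_summary": {"model": "Qwen/Qwen3-0.6B", "total_datasets": 1, "successful_datasets": 1}
}
"""


def summarize(report: dict) -> list:
    """Return (dataset, metric, score, estimated hits) rows (hypothetical helper).

    Keys starting with "_" (such as "_summary") are skipped because they
    describe the run rather than a dataset.
    """
    rows = []
    for dataset, result in report.items():
        if dataset.startswith("_"):
            continue
        for metric, values in result["metrics"].items():
            # Assumption: score is the fraction of samples counted as correct.
            hits = round(values["score"] * values["num_samples"])
            rows.append((dataset, metric, values["score"], hits))
    return rows


rows = summarize(json.loads(report_text))
for dataset, metric, score, hits in rows:
    print(f"{dataset} {metric}: {score:.3f} (~{hits} of the samples)")
```

With only three samples, a single answer moves the score by a third, so `--max-samples 3` is best treated as a smoke test rather than a measurement.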
  • Registry-Based Architecture: Auto-discovery of models, datasets, and metrics
  • Smart Caching: DuckDB and DynamoDB backends for faster re-evaluations
  • Extensible Design: Easy integration of custom models, datasets, and metrics
  • Rich CLI: Progress bars, formatted output, and built-in help
  • Standards-Based: Built on PyTorch and Hugging Face Transformers
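
A registry of the kind named in the first bullet is often built with a decorator that records each class under a string name, so that components can be looked up from a CLI argument. The sketch below shows that general pattern only. `METRICS`, `register_metric`, and `ExactMatch` are hypothetical names chosen for illustration and are not KARMA's actual API.

```python
from typing import Callable, Dict, List, Type

# Hypothetical registry: maps a metric name to the class that implements it.
METRICS: Dict[str, Type] = {}


def register_metric(name: str) -> Callable[[Type], Type]:
    """Class decorator that stores the decorated class in METRICS under `name`."""
    def decorator(cls: Type) -> Type:
        if name in METRICS:
            raise ValueError(f"metric {name!r} is already registered")
        METRICS[name] = cls
        return cls
    return decorator


@register_metric("exact_match")
class ExactMatch:
    """Illustrative metric: fraction of predictions equal to their reference."""

    def compute(self, predictions: List[str], references: List[str]) -> float:
        if not references:
            return 0.0
        # Compare case-insensitively after trimming surrounding whitespace.
        hits = sum(
            p.strip().lower() == r.strip().lower()
            for p, r in zip(predictions, references)
        )
        return hits / len(references)


# Look the metric up by name, as a CLI flag value might.
metric = METRICS["exact_match"]()
score = metric.compute(["yes", "no", "maybe"], ["yes", "yes", "no"])
print(f"exact_match = {score:.3f}")
```

Registering at class-definition time means a new component becomes discoverable as soon as its module is imported, which is the usual motivation for this design.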

Installation

Install with uv, pip, or a development setup.

Install KARMA →

Basic Usage

Learn the CLI commands and start evaluating your first model.

Learn CLI →

Add Your Own

Extend KARMA with custom models, datasets, and evaluation metrics.

Customize →

Supported Resources

Complete list of available models, datasets, and metrics.

View Resources →

Ready to evaluate your medical AI models? Get started with installation →