KARMA-OpenMedEvalKit

Knowledge Assessment and Reasoning for Medical Applications: an evaluation framework for medical AI models.

Why KARMA?

KARMA is designed for researchers, developers, and healthcare organizations who need reliable evaluation of medical AI systems.

Extensible

Bring your own model, dataset, or metric. KARMA integrates with Hugging Face and also supports local evaluation.

Add your own →

Fast & Efficient

Process thousands of medical examples efficiently with intelligent caching and batch processing.

See caching →

Multi-Modal Ready

Supports text, image, and audio evaluation across multiple datasets.

See available datasets →

Model Agnostic

Works with any model: Qwen, MedGemma, Bedrock-SDK, OpenAI-SDK, or your own custom architecture, all through a unified interface.

See available models →

Quick Start

Get started with KARMA in minutes:

# Install KARMA
pip install karma-medeval

# Run your first evaluation
karma eval --model "Qwen/Qwen3-0.6B" --datasets openlifescienceai/pubmedqa --max-samples 3

Example Output

$ karma eval --model "Qwen/Qwen3-0.6B" --datasets openlifescienceai/pubmedqa --max-samples 3

{
  "openlifescienceai/pubmedqa": {
    "metrics": {
      "exact_match": {
        "score": 0.3333333333333333,
        "evaluation_time": 0.9702351093292236,
        "num_samples": 3
      }
    },
    "task_type": "mcqa",
    "status": "completed",
    "dataset_args": {},
    "evaluation_time": 7.378399848937988
  },
  "_summary": {
    "model": "Qwen/Qwen3-0.6B",
    "model_path": "Qwen/Qwen3-0.6B",
    "total_datasets": 1,
    "successful_datasets": 1,
    "total_evaluation_time": 7.380354166030884,
    "timestamp": "2025-07-22 18:43:07"
  }
}
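The result is plain JSON, so it is easy to post-process. The snippet below is a minimal sketch of how the per-dataset scores could be pulled out of it. It assumes the JSON has already been captured as a string (how KARMA writes results to disk is not covered here), and the helper name `summarize` is hypothetical. Treating every top-level key that starts with an underscore as metadata is also an assumption, based only on the `_summary` key shown above.

```python
import json

# Results copied from the example output above, trimmed to the fields used here.
raw = """
{
  "openlifescienceai/pubmedqa": {
    "metrics": {
      "exact_match": {
        "score": 0.3333333333333333,
        "evaluation_time": 0.9702351093292236,
        "num_samples": 3
      }
    },
    "task_type": "mcqa",
    "status": "completed"
  },
  "_summary": {
    "model": "Qwen/Qwen3-0.6B",
    "total_datasets": 1,
    "successful_datasets": 1
  }
}
"""


def summarize(results: dict) -> list[tuple[str, str, float, int]]:
    """Return (dataset, metric, score, num_samples) rows (hypothetical helper)."""
    rows = []
    for dataset, entry in results.items():
        # Assumption: underscore-prefixed keys such as "_summary" are metadata.
        if dataset.startswith("_"):
            continue
        for metric, values in entry.get("metrics", {}).items():
            rows.append((dataset, metric, values["score"], values["num_samples"]))
    return rows


results = json.loads(raw)
rows = summarize(results)
for dataset, metric, score, n in rows:
    print(f"{dataset}  {metric}: {score:.3f} over {n} samples")
```

With `--max-samples 3`, an `exact_match` score of 0.333 corresponds to one correct answer out of three, so a run this small is a smoke test, not a meaningful benchmark.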

Key Features

Registry-Based Architecture: Auto-discovery of models, datasets, and metrics
Smart Caching: DuckDB and DynamoDB backends for faster re-evaluations
Extensible Design: Easy integration of custom models, datasets, and metrics
Rich CLI: Progress bars, formatted output, and help text
Standards-Based: Built on PyTorch and Hugging Face Transformers
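
To illustrate what a registry-based design generally looks like, here is a minimal, generic sketch of the pattern in Python. It is not KARMA's actual API: `METRICS`, `register_metric`, and this `exact_match` implementation (including its case and whitespace normalization) are all hypothetical names and choices made for illustration. In a pattern like this, "auto-discovery" usually means importing the modules that contain the decorated functions so that they register themselves.

```python
from typing import Callable

# Hypothetical registry: maps a metric name to the function that computes it.
METRICS: dict[str, Callable[[list[str], list[str]], float]] = {}


def register_metric(name: str):
    """Decorator that stores the decorated function in METRICS under `name`."""
    def decorator(fn):
        if name in METRICS:
            raise ValueError(f"metric {name!r} is already registered")
        METRICS[name] = fn
        return fn
    return decorator


@register_metric("exact_match")
def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions equal to their reference.

    Assumption: comparison ignores case and surrounding whitespace.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return hits / len(references)


# Look the metric up by name, as a CLI flag or config entry might.
score = METRICS["exact_match"](["yes", "no", "maybe"], ["yes", "yes", "no"])
print(f"exact_match = {score:.3f}")
```

The appeal of this design is that adding a metric, dataset, or model touches only the new module; the code that looks entries up by name stays unchanged.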