Built-in Models
KARMA includes several pre-configured models optimized for medical AI evaluation across different modalities.
Available Models Overview
# List all available models
karma list models
# Expected output:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ Model Name                                  ┃ Status      ┃ Modality           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ Qwen/Qwen3-0.6B                             │ ✓ Available │ Text               │
│ Qwen/Qwen3-1.7B                             │ ✓ Available │ Text               │
│ google/medgemma-4b-it                       │ ✓ Available │ Text + Vision      │
│ gpt-4o                                      │ ✓ Available │ Text               │
│ gpt-4o-mini                                 │ ✓ Available │ Text               │
│ gpt-3.5-turbo                               │ ✓ Available │ Text               │
│ us.anthropic.claude-3-5-sonnet-20241022-v2:0│ ✓ Available │ Text               │
│ us.anthropic.claude-sonnet-4-20250514-v1:0  │ ✓ Available │ Text               │
│ ai4bharat/indic-conformer-600m-multilingual │ ✓ Available │ Audio              │
│ aws-transcribe                              │ ✓ Available │ Audio              │
│ gpt-4o-transcribe                           │ ✓ Available │ Audio              │
│ gemini-2.0-flash                            │ ✓ Available │ Audio              │
│ gemini-2.5-flash                            │ ✓ Available │ Audio              │
│ eleven_labs                                 │ ✓ Available │ Audio              │
└─────────────────────────────────────────────┴─────────────┴────────────────────┘
Text Generation Models
Qwen Models
Alibaba’s Qwen models with specialized thinking capabilities for medical reasoning:
# Get detailed model information
karma info model "Qwen/Qwen3-0.6B"
# Basic usage
karma eval --model "Qwen/Qwen3-0.6B" \
  --datasets openlifescienceai/pubmedqa
# Advanced configuration with thinking mode
karma eval --model "Qwen/Qwen3-0.6B" \
  --datasets openlifescienceai/pubmedqa \
  --model-args '{"enable_thinking": true, "temperature": 0.3}'
Available Models:
- Qwen/Qwen3-0.6B: Compact 0.6B parameter model
- Qwen/Qwen3-1.7B: Larger 1.7B parameter model
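The --model-args flag accepts a JSON object whose keys are passed to the model as keyword arguments, overriding its defaults. The helper below is an illustrative sketch of that parse-and-merge step, not KARMA's actual implementation; the default values mirror those shown in the Qwen constructor signature reported by karma info.

```python
import json

def parse_model_args(raw: str, defaults: dict) -> dict:
    """Parse a --model-args JSON string and overlay it on default settings.

    Illustrative helper only; KARMA's real argument handling may differ.
    """
    overrides = json.loads(raw)
    if not isinstance(overrides, dict):
        raise ValueError("--model-args must be a JSON object")
    # Later keys win: CLI overrides replace matching defaults.
    return {**defaults, **overrides}

# Defaults mirroring the documented Qwen signature (temperature=0.7, top_p=0.9)
defaults = {"temperature": 0.7, "top_p": 0.9, "enable_thinking": False}
args = parse_model_args('{"enable_thinking": true, "temperature": 0.3}', defaults)
print(args)  # temperature overridden, thinking enabled, top_p kept at its default
```

Note that JSON's true/false map to Python's True/False, so boolean flags like enable_thinking pass through cleanly.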
MedGemma models
Google’s medical-specialized Gemma models with vision capabilities:
# MedGemma for specialized medical tasks
karma eval --model "google/medgemma-4b-it" \
  --datasets openlifescienceai/medmcqa \
  --model-args '{"temperature": 0.1, "max_tokens": 512}'
# MedGemma with image analysis
karma eval --model "google/medgemma-4b-it" \
  --datasets medical_image_dataset \
  --model-args '{"temperature": 0.01, "max_tokens": 1024}'
OpenAI models
OpenAI’s GPT models for comprehensive text generation. When invoking OpenAI models, KARMA uses multiprocessing to issue multiple API calls concurrently.
# GPT-4o for complex medical reasoning
karma eval --model "gpt-4o" \
  --datasets openlifescienceai/pubmedqa \
  --model-args '{"temperature": 0.7, "max_tokens": 1024}'

# GPT-4o Mini for efficient processing
karma eval --model "gpt-4o-mini" \
  --datasets medical_qa_dataset \
  --model-args '{"temperature": 0.3, "max_tokens": 512}'

# GPT-3.5 Turbo for cost-effective inference
karma eval --model "gpt-3.5-turbo" \
  --datasets simple_medical_tasks \
  --model-args '{"temperature": 0.5, "max_tokens": 1024}'
Available Models:
- gpt-4o: Latest GPT-4 Omni model with advanced reasoning
- gpt-4o-mini: Compact version of GPT-4o for efficient processing
- gpt-3.5-turbo: Cost-effective model for simpler tasks
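The concurrent fan-out described above can be pictured as a worker pool mapping one request per sample. This is an illustrative sketch only: call_model is a stub standing in for a real API request, and KARMA's internals may differ.

```python
from multiprocessing.dummy import Pool  # thread-backed pool; suits I/O-bound API calls

def call_model(prompt: str) -> str:
    """Stub for a single chat-completion request (a real call would hit the API)."""
    return f"answer to: {prompt}"

def evaluate_batch(prompts, workers=8):
    """Fan out one call per prompt across a pool of workers."""
    with Pool(workers) as pool:
        # map() preserves input order even though calls complete out of order
        return pool.map(call_model, prompts)

results = evaluate_batch(["q1", "q2", "q3"])
print(results)  # → ['answer to: q1', 'answer to: q2', 'answer to: q3']
```

Because each request mostly waits on the network, a pool of a few workers can cut wall-clock time roughly in proportion to the pool size, up to the provider's rate limits.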
Anthropic models via AWS Bedrock
Anthropic’s Claude models accessed through AWS Bedrock. When invoking Bedrock models, KARMA uses multiprocessing to issue multiple API calls concurrently.
# Claude 3.5 Sonnet for advanced medical reasoning
karma eval --model "us.anthropic.claude-3-5-sonnet-20241022-v2:0" \
  --datasets complex_medical_cases \
  --model-args '{"temperature": 0.7, "max_tokens": 1024}'

# Claude Sonnet 4 for cutting-edge performance
karma eval --model "us.anthropic.claude-sonnet-4-20250514-v1:0" \
  --datasets advanced_medical_reasoning \
  --model-args '{"temperature": 0.3, "max_tokens": 2048}'
Available Models:
- us.anthropic.claude-3-5-sonnet-20241022-v2:0: Claude 3.5 Sonnet v2
- us.anthropic.claude-sonnet-4-20250514-v1:0: Latest Claude Sonnet 4
Audio Recognition Models
IndicConformer ASR
AI4Bharat’s Conformer model for Indian languages:
# Indian language speech recognition
karma eval \
  --model "ai4bharat/indic-conformer-600m-multilingual" \
  --datasets "ai4bharat/indicvoices_r" \
  --batch-size 1 \
  --dataset-args "ai4bharat/indicvoices_r:language=Hindi" \
  --processor-args "ai4bharat/indicvoices_r.general_text_processor:language=Hindi"
Key Features:
- 22 Indian Languages: Covers all 22 languages scheduled in the Indian Constitution
- Medical Audio: Optimized for healthcare speech recognition
- Conformer Architecture: State-of-the-art speech recognition architecture
- Regional Dialects: Handles diverse Indian language variations
- Open Source: MIT licensed with open weights
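From the example above, the --dataset-args and --processor-args values follow a colon-separated "name:key=value" shape. The parser below is a sketch under that assumption; the comma-separated multi-argument form it also accepts is a guess, since only a single key=value pair appears in the example.

```python
def parse_dataset_args(spec: str):
    """Split a 'name:key=value[,key=value...]' spec into (name, kwargs).

    Illustrative sketch of the spec format shown above, not KARMA's parser.
    """
    name, _, rest = spec.partition(":")
    kwargs = {}
    if rest:
        for pair in rest.split(","):
            key, _, value = pair.partition("=")
            kwargs[key] = value
    return name, kwargs

print(parse_dataset_args("ai4bharat/indicvoices_r:language=Hindi"))
# → ('ai4bharat/indicvoices_r', {'language': 'Hindi'})
```

The same shape handles the processor spec, where the name portion carries a dotted suffix ("ai4bharat/indicvoices_r.general_text_processor").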
Cloud ASR Services
Enterprise-grade speech recognition for production deployments:
AWS Transcribe
# AWS Transcribe with automatic language detection
karma eval --model aws-transcribe \
  --datasets medical_audio_dataset \
  --model-args '{"region_name": "us-east-1", "s3_bucket": "your-bucket"}'
Google Gemini ASR
# Gemini 2.0 Flash for audio transcription
karma eval --model gemini-2.0-flash \
  --datasets medical_audio_dataset \
  --model-args '{"thinking_budget": 1000}'

# Gemini 2.5 Flash for enhanced performance
karma eval --model gemini-2.5-flash \
  --datasets medical_audio_dataset \
  --model-args '{"thinking_budget": 2000}'
Available Models:
- gemini-2.0-flash: Fast transcription with thinking capabilities
- gemini-2.5-flash: Enhanced model with improved accuracy
OpenAI Whisper ASR
# OpenAI Whisper for high-accuracy transcription
karma eval --model gpt-4o-transcribe \
  --datasets medical_audio_dataset \
  --model-args '{"language": "en"}'
ElevenLabs ASR
# ElevenLabs for specialized audio processing
karma eval --model eleven_labs \
  --datasets medical_audio_dataset \
  --model-args '{"diarize": false, "tag_audio_events": false}'
Getting Model Information
# Get detailed information about any model
$ karma info model "Qwen/Qwen3-0.6B"
Model Information: Qwen/Qwen3-0.6B
──────────────────────────────────────────────────
  Model: Qwen/Qwen3-0.6B

  Name    Qwen/Qwen3-0.6B
  Class   QwenThinkingLLM
  Module  karma.models.qwen
Description:
╭──────────────────────────────────────────────────────────────╮
│ Qwen language model with specialized thinking capabilities.  │
╰──────────────────────────────────────────────────────────────╯
Constructor Signature:
  QwenThinkingLLM(self, model_name_or_path: str, device: str = 'mps',
                  max_tokens: int = 32768, temperature: float = 0.7,
                  top_p: float = 0.9, top_k: Optional[int] = None,
                  enable_thinking: bool = False, **kwargs)
Usage Examples:
Basic evaluation:
  karma eval --model "Qwen/Qwen3-0.6B" --datasets openlifescienceai/pubmedqa

With multiple datasets:
  karma eval --model "Qwen/Qwen3-0.6B" \
    --datasets openlifescienceai/pubmedqa,openlifescienceai/mmlu_professional_medicine

With custom arguments:
  karma eval --model "Qwen/Qwen3-0.6B" \
    --datasets openlifescienceai/pubmedqa \
    --model-args '{"temperature": 0.8, "top_p": 0.85}' \
    --max-samples 100 --batch-size 4

Interactive mode:
  karma eval --model "Qwen/Qwen3-0.6B" --interactive
✓ Model information retrieved successfully