Core Components of KARMA
This document describes the four core components of KARMA’s evaluation system and how they interact:
- Models
- Datasets
- Metrics
- Processors
Data Flow Sequence
sequenceDiagram
participant CLI
participant Orchestrator
participant Registry
participant Model
participant Dataset
participant Processor
participant Metrics
participant Cache
CLI->>Orchestrator: karma eval model --datasets ds1
Orchestrator->>Registry: discover_all_registries()
Registry-->>Orchestrator: components metadata
Orchestrator->>Model: initialize with config
Orchestrator->>Dataset: initialize with args
Orchestrator->>Processor: initialize processors
loop For each dataset
Orchestrator->>Dataset: create dataset instance
Dataset->>Processor: apply postprocessors
loop For each batch
Dataset->>Model: provide samples
Model->>Cache: check cache
alt Cache miss
Model->>Model: run inference
Model->>Cache: store results
end
Model-->>Dataset: return predictions
Dataset->>Dataset: extract_prediction()
Dataset->>Processor: postprocess predictions
Processor-->>Dataset: processed text
Dataset->>Metrics: evaluate(predictions, references)
Metrics-->>Dataset: scores
end
Dataset-->>Orchestrator: evaluation results
end
Orchestrator-->>CLI: aggregated results
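The inner loop of the sequence above (check the cache, run inference on a miss, postprocess, then score) can be sketched in a few lines of Python. This is only an illustrative sketch, not KARMA's implementation: `evaluate_dataset`, `exact_match`, the dictionary-backed cache, and the `"input"`/`"expected"` sample keys are all hypothetical names chosen for the example, and the metric is computed once per dataset rather than once per batch to keep the code short.

```python
from typing import Callable, Dict, List


def exact_match(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions equal to their reference (hypothetical metric)."""
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references) if references else 0.0


def evaluate_dataset(
    samples: List[dict],
    run_inference: Callable[[str], str],
    postprocess: Callable[[str], str],
    metric: Callable[[List[str], List[str]], float],
    cache: Dict[str, str],
    batch_size: int = 2,
) -> float:
    """Hypothetical per-dataset loop mirroring the diagram; not KARMA's code."""
    predictions: List[str] = []
    references: List[str] = []
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size]
        for sample in batch:
            key = sample["input"]
            if key not in cache:
                # Cache miss: run inference and store the raw output.
                cache[key] = run_inference(key)
            # Postprocess the (possibly cached) raw prediction.
            predictions.append(postprocess(cache[key]))
            references.append(sample["expected"])
    # Score all postprocessed predictions against the references.
    return metric(predictions, references)
```

Passing a plain `dict` as the cache keeps the sketch self-contained; a persistent backend would replace it with lookups against a store, but the miss-then-store branch stays the same.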
Component Interaction Diagram
graph TD
%% CLI Layer
CLI[CLI Command<br/>karma eval model --datasets ds1,ds2]
%% Orchestrator Layer
ORCH[Orchestrator<br/>MultiDatasetOrchestrator]
%% Registry System
MR[Model Registry]
DR[Dataset Registry]
MetR[Metrics Registry]
PR[Processor Registry]
%% Core Components
MODEL[Model<br/>BaseModel]
DATASET[Dataset<br/>BaseMultimodalDataset]
METRICS[Metrics<br/>BaseMetric]
PROC[Processors<br/>BaseProcessor]
%% Benchmark
BENCH[Benchmark<br/>Evaluation Engine]
%% Cache System
CACHE[Cache Manager<br/>DuckDB/DynamoDB]
%% Data Flow
CLI --> |parse args| ORCH
ORCH --> |discover| MR
ORCH --> |discover| DR
ORCH --> |discover| MetR
ORCH --> |discover| PR
MR --> |create| MODEL
DR --> |create| DATASET
MetR --> |create| METRICS
PR --> |create| PROC
ORCH --> |orchestrate| BENCH
BENCH --> |inference| MODEL
BENCH --> |iterate| DATASET
BENCH --> |compute| METRICS
BENCH --> |cache lookup/store| CACHE
DATASET --> |postprocess| PROC
DATASET --> |extract predictions| MODEL
MODEL --> |predictions| DATASET
DATASET --> |processed data| METRICS
PROC --> |normalized text| METRICS
%% Configuration Flow
CLI -.-> |--model-args| MODEL
CLI -.-> |--dataset-args| DATASET
CLI -.-> |--metric-args| METRICS
CLI -.-> |--processor-args| PROC
%% Styling
classDef cli fill:#e1f5fe
classDef orchestrator fill:#f3e5f5
classDef registry fill:#fff3e0
classDef component fill:#e8f5e8
classDef benchmark fill:#fff8e1
classDef cache fill:#fce4ec
class CLI cli
class ORCH orchestrator
class MR,DR,MetR,PR registry
class MODEL,DATASET,METRICS,PROC component
class BENCH benchmark
class CACHE cache
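This architecture separates concerns cleanly: the registries discover and create components, the orchestrator coordinates them, and the benchmark drives inference, caching, and metric computation. Each component is configured independently from the CLI, and the design aims for robust error handling throughout the evaluation process.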
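The "discover, then create" role that the registries play in the diagram can be illustrated with a small name-to-class registry. The sketch below is a generic version of that pattern under stated assumptions: `Registry`, `register`, `create`, `metric_registry`, and the `ExactMatch` class are hypothetical names for illustration, and KARMA's actual registry API may look different.

```python
from typing import Any, Callable, Dict, List, Type


class Registry:
    """Hypothetical name-to-class registry; KARMA's real registries may differ."""

    def __init__(self) -> None:
        self._classes: Dict[str, Type] = {}

    def register(self, name: str) -> Callable[[Type], Type]:
        """Return a class decorator that records the class under `name`."""
        def decorator(cls: Type) -> Type:
            if name in self._classes:
                raise ValueError(f"{name!r} is already registered")
            self._classes[name] = cls
            return cls
        return decorator

    def create(self, name: str, **kwargs: Any) -> Any:
        """Instantiate the registered class, forwarding keyword arguments."""
        if name not in self._classes:
            raise KeyError(f"unknown component {name!r}")
        return self._classes[name](**kwargs)


metric_registry = Registry()


@metric_registry.register("exact_match")
class ExactMatch:
    """Illustrative metric with one configurable argument."""

    def __init__(self, case_sensitive: bool = True) -> None:
        self.case_sensitive = case_sensitive

    def evaluate(self, predictions: List[str], references: List[str]) -> float:
        if not self.case_sensitive:
            predictions = [p.lower() for p in predictions]
            references = [r.lower() for r in references]
        hits = sum(p == r for p, r in zip(predictions, references))
        return hits / len(references) if references else 0.0
```

With this shape, an orchestrator only needs a component's registered name and its keyword arguments, for example `metric_registry.create("exact_match", case_sensitive=False)`, which is one plausible way for per-component CLI arguments to reach the component constructors.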