Add dataset
You can create custom datasets by inheriting from `BaseMultimodalDataset` and implementing the `format_item` method to return a properly formatted `DataLoaderIterable`:
```python
from karma.eval_datasets.base_dataset import BaseMultimodalDataset
from karma.registries.dataset_registry import register_dataset
from karma.data_models.dataloader_iterable import DataLoaderIterable
```
Here we use the `register_dataset` decorator to register the dataset and make it discoverable to the CLI. The decorator also declares which metrics to use and which arguments can be configured.
```python
@register_dataset(
    "my_medical_dataset",
    metrics=["exact_match", "accuracy"],
    task_type="mcqa",
    required_args=["split"],
    optional_args=["subset"],
    default_args={"split": "test"},
)
class MyMedicalDataset(BaseMultimodalDataset):
    """Custom medical dataset."""

    def __init__(self, split: str = "test", **kwargs):
        self.split = split
        super().__init__(**kwargs)

    def load_data(self):
        # Load your dataset
        return your_dataset_loader(split=self.split)

    def format_item(self, item):
        """Format each item into DataLoaderIterable format."""
        # Example for a text-based dataset
        return DataLoaderIterable(
            input=f"Question: {item['question']}\nChoices: {item['choices']}",
            expected_output=item['answer'],
            other_args={"question_id": item['id']}
        )
```
In the class, we implement the `format_item` method to specify what each formatted item looks like through the `DataLoaderIterable`. See [DataLoaderIterable](user-guide/datasets/data-loader-iterable) for more information.
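To see what a formatted item looks like, you can call `format_item` directly on a hand-written raw item. This is a minimal sketch: the raw keys mirror the ones the method reads above, and instantiating the class may require additional base-class arguments in practice:

```python
dataset = MyMedicalDataset(split="test")

formatted = dataset.format_item({
    "id": "q-001",
    "question": "Which organ produces insulin?",
    "choices": ["Liver", "Pancreas", "Kidney", "Spleen"],
    "answer": "Pancreas",
})
print(formatted.input)            # the question text plus the choices
print(formatted.expected_output)  # "Pancreas"
```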
Multi-Modal Dataset Example
For datasets that combine multiple modalities:
```python
def format_item(self, item):
    """Format a multi-modal item."""
    return DataLoaderIterable(
        input=f"Question: {item['question']}",
        images=[item['image_bytes']],   # list of image bytes
        audio=item.get('audio_bytes'),  # optional audio
        expected_output=item['answer'],
        other_args={
            "question_type": item['type'],
            "difficulty": item['difficulty']
        }
    )
```
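If your raw data stores image file paths rather than raw bytes, read the files into bytes before building the item. A minimal sketch, assuming a hypothetical `image_path` key in your data:

```python
from pathlib import Path

def format_item(self, item):
    """Format a multi-modal item whose image is stored on disk."""
    image_bytes = Path(item["image_path"]).read_bytes()  # "image_path" is a hypothetical key
    return DataLoaderIterable(
        input=f"Question: {item['question']}",
        images=[image_bytes],
        expected_output=item['answer'],
    )
```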
Conversation Dataset Example
For datasets with multi-turn conversations:
```python
from karma.data_models.dataloader_iterable import Conversation, ConversationTurn

def format_item(self, item):
    """Format a conversation item."""
    conversation_turns = []
    for turn in item['conversation']:
        conversation_turns.append(
            ConversationTurn(
                content=turn['content'],
                role=turn['role']  # 'user' or 'assistant'
            )
        )

    return DataLoaderIterable(
        conversation=Conversation(conversation_turns=conversation_turns),
        system_prompt=item.get('system_prompt', ''),
        expected_output=item['expected_response']
    )
```
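For clarity, here is the shape of a raw item this `format_item` expects; the values are purely illustrative:

```python
item = {
    "system_prompt": "You are a careful medical assistant.",
    "conversation": [
        {"role": "user", "content": "What class of drug is metformin?"},
        {"role": "assistant", "content": "Metformin is a biguanide."},
        {"role": "user", "content": "What is its first-line indication?"},
    ],
    "expected_response": "Type 2 diabetes mellitus.",
}
```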
The `DataLoaderIterable` format ensures that all datasets work seamlessly with any model type, whether it’s text-only, multi-modal, or conversation-based. Models receive the appropriate data fields and can process them according to their capabilities.
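As a quick reference, the `DataLoaderIterable` fields exercised in this guide are sketched below. This is a non-exhaustive summary based only on the examples on this page; see the linked reference for the authoritative model:

```python
DataLoaderIterable(
    input="...",            # prompt text for text-based tasks
    images=[b"..."],        # optional list of image bytes
    audio=b"...",           # optional audio bytes
    conversation=None,      # optional Conversation for multi-turn tasks
    system_prompt="...",    # optional system prompt
    expected_output="...",  # ground-truth answer consumed by the metrics
    other_args={},          # free-form metadata carried with the item
)
```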
Using Local Datasets with KARMA
This guide walks you through plugging a local dataset into KARMA’s evaluation pipeline. Let’s say we are integrating an MCQA dataset.
1. Organize Your Dataset

   Ensure your dataset is structured correctly. Each row should ideally include:

   - A question
   - A list of options (optional)
   - The correct answer
   - Optionally: metadata like category, generic name, or citation
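   For example, a single row might look like this; the column names here are assumptions chosen to match the code in the later steps:

   ```python
   row = {
       "index": 0,
       "question": "Which drug class does metformin belong to?",
       "options": ["Biguanide", "Sulfonylurea", "DPP-4 inhibitor", "SGLT2 inhibitor"],
       "ground_truth": "Biguanide",
       "generic_name": "metformin",
       "category": "pharmacology",
       "citation": "...",
   }
   ```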
2. Set Up a Custom Dataset Class

   KARMA supports registering your own datasets using a decorator:

   ```python
   @register_dataset(
       dataset_name="mcqa-local",
       split="test",
       metrics=["exact_match"],
       task_type="mcqa",
   )
   class LocalDataset(BaseMultimodalDataset):
       ...
   ```

   This decorator registers your dataset with KARMA for evaluations.
3. Load Your Dataset

   In your dataset class, load your dataset file. You can use any format supported by pandas, such as CSV or Parquet. The snippets in this and the following steps assume `import os`, `import pandas as pd`, and `from typing import Any, Dict, Generator, Tuple` at the top of your module.

   ```python
   def __init__(self, ...):
       self.data_path = <path_to_your_dataset>
       if not os.path.exists(self.data_path):
           raise FileNotFoundError(f"Dataset file not found: {self.data_path}")
       self.df = pd.read_parquet(self.data_path)
       ...
   ```
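   If your file is a CSV instead, the same pattern applies with pandas’ CSV reader:

   ```python
   self.df = pd.read_csv(self.data_path)
   ```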
4. Implement the `format_item` Method

   Each row in your dataset will be converted into an input-output pair for the model:

   ```python
   def format_item(self, sample: Dict[str, Any]) -> DataLoaderIterable:
       input_text = self._format_question(sample["data"])
       correct_answer = sample["data"]["ground_truth"]
       prompt = self.confinement_instructions.replace("<QUESTION>", input_text)
       dataloader_item = DataLoaderIterable(
           input=prompt,
           expected_output=correct_answer,
       )
       dataloader_item.conversation = None
       return dataloader_item
   ```
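   The `_format_question` helper above is your own code. A minimal sketch, assuming the `data` dict produced in step 7 with its `question` and `options` keys:

   ```python
   def _format_question(self, data: Dict[str, Any]) -> str:
       """Render the question and its lettered options as one prompt string."""
       lines = [data["question"]]
       for letter, option in zip("ABCD", data["options"]):
           lines.append(f"({letter}) {option}")
       return "\n".join(lines)
   ```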
5. Iterate Over the Dataset

   Implement `__iter__()` to yield formatted examples:

   ```python
   def __iter__(self) -> Generator[Dict[str, Any], None, None]:
       if self.dataset is None:
           self.dataset = list(self.load_eval_dataset())
       for idx, sample in enumerate(self.dataset):
           if self.max_samples is not None and idx >= self.max_samples:
               break
           item = self.format_item(sample)
           yield item
   ```
6. Handle Model Output

   Extract the model’s predictions:

   ```python
   def extract_prediction(self, response: str) -> Tuple[str, bool]:
       answer, success = "", False
       if "Final Answer:" in response:
           answer = response.split("Final Answer:")[1].strip()
           if answer.startswith("(") and answer.endswith(")"):
               answer = answer[1:-1]
           success = True
       return answer, success
   ```
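   For instance, if the model was instructed to end its response with `Final Answer: (X)`:

   ```python
   response = "Metformin is a biguanide.\nFinal Answer: (A)"
   answer, success = dataset.extract_prediction(response)
   # answer == "A", success is True
   ```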
7. Yield Examples for Evaluation

   Read from your DataFrame and return structured examples:

   ```python
   def load_eval_dataset(self, ...):
       for _, row in self.df.iterrows():
           prediction = None
           parsed_output = row.get("model_output_parsed", None)
           if isinstance(parsed_output, dict):
               prediction = parsed_output.get("prediction", None)
           yield {
               "id": row["index"],
               "data": {
                   "question": row["question"],
                   "options": row["options"],
                   "ground_truth": row["ground_truth"],
               },
               "prediction": prediction,
               "metadata": {
                   "generic_name": row.get("generic_name", None),
                   "category": row.get("category", None),
                   "citation": row.get("citation", None),
               },
           }
   ```
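Before launching a full evaluation, you can sanity-check the pipeline by iterating over the dataset locally. A minimal sketch, assuming the constructor from step 3 needs no extra arguments:

```python
dataset = LocalDataset()
for item in dataset:
    print(item.input)
    print("Expected:", item.expected_output)
    break  # inspect only the first formatted example
```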