DataLoaderIterable
All datasets in KARMA format their data using the DataLoaderIterable
class, which provides a unified interface for different modalities and data types.
The format_item
method in each dataset transforms raw data into this standardized format.
DataLoaderIterable Structure
Section titled “DataLoaderIterable Structure”from karma.data_models.dataloader_iterable import DataLoaderIterable
# The complete structuredata_item = DataLoaderIterable( input=None, # Text input for the model images=None, # Image data (PIL Image or bytes) audio=None, # Audio data (bytes) conversation=None, # Multi-turn conversation structure system_prompt=None, # System instructions for the model expected_output=None, # Ground truth answer rubric_to_evaluate=None, # Rubric criteria for evaluation other_args=None # Additional metadata)
Text Dataset Example: PubMedMCQA
Section titled “Text Dataset Example: PubMedMCQA”Text-based datasets use the input
and expected_output
fields:
def format_item(self, sample: Dict[str, Any], **kwargs): input_text = self._format_question(sample["data"])
# Parse correct answer from Correct Option field correct_option = sample["data"]["Correct Option"] context = "\n".join(sample["data"]["Context"]) prompt = self.confinement_instructions.replace("<CONTEXT>", context).replace( "<QUESTION>", input_text )
processed_sample = DataLoaderIterable( input=prompt, # Formatted question with context expected_output=correct_option, # Correct answer (e.g., "A") )
return processed_sample
Key Features:
input
: Contains the formatted question with context and instructionsexpected_output
: Contains the correct answer for evaluation- No other modalities are used for pure text tasks
Audio Dataset Example: IndicVoices
Section titled “Audio Dataset Example: IndicVoices”Audio datasets use the audio
field for input data:
def format_item(self, sample: Dict[str, Any]) -> DataLoaderIterable: audio_info = sample.get("audio_filepath", {}) audio_data = audio_info.get("bytes")
return DataLoaderIterable( audio=audio_data, # Audio bytes for ASR expected_output=sample.get("text", ""), # Ground truth transcription other_args={"language": sample.get("lang", "unknown")}, # Language metadata )
Key Features:
audio
: Contains the raw audio data as bytesexpected_output
: Contains the ground truth transcriptionother_args
: Stores additional metadata like language information- No
input
field needed as audio is the primary input
Image Dataset Example: SLAKE VQA
Section titled “Image Dataset Example: SLAKE VQA”Vision-language datasets combine text and images:
def format_item(self, sample: Dict[str, Any]) -> DataLoaderIterable: question = sample.get("question", "") answer = sample.get("answer", "").lower() image = sample["image"]["bytes"]
# Create VQA prompt prompt = self.confinement_instructions.replace("<QUESTION>", question)
processed_sample = DataLoaderIterable( input=prompt, # Text question with instructions expected_output=answer, # Ground truth answer images=[image], # Image data as bytes (in a list) )
return processed_sample
Key Features:
input
: Contains the formatted question textimages
: Contains image data as bytes (wrapped in a list for batch processing)expected_output
: Contains the ground truth answer- Multi-modal models can process both text and image inputs
Rubric Dataset Example: Health-Bench
Section titled “Rubric Dataset Example: Health-Bench”Rubric-based datasets use conversations and structured evaluation criteria:
def format_item(self, sample: Dict[str, Any]) -> DataLoaderIterable: # Extract conversation turns conversation = [] for conversation_turn in sample["prompt"]: conversation.append( ConversationTurn( content=conversation_turn["content"], role=conversation_turn["role"], ) ) conversation = Conversation(conversation_turns=conversation)
# Extract rubric criteria criterions = [] for rubric_item in sample["rubrics"]: criterions.append( RubricCriteria( criterion=rubric_item["criterion"], points=rubric_item["points"], tags=rubric_item.get("tags", []), ) )
processed_sample = DataLoaderIterable( conversation=conversation, # Multi-turn conversation rubric_to_evaluate=criterions, # Structured evaluation criteria system_prompt=self.system_prompt, # System instructions )
return processed_sample
Key Features:
conversation
: Contains structured multi-turn conversationsrubric_to_evaluate
: Contains structured evaluation criteriasystem_prompt
: Contains system-level instructions- No
expected_output
as evaluation is done via rubric scoring