# Metric

The `danling.metric` module provides a flexible and powerful system for computing, tracking, and aggregating metrics during model training and evaluation.
This module is designed to work seamlessly with PyTorch and supports both single-task and multi-task scenarios.
## Overview

Metrics are essential for measuring model performance during training and evaluation.
The `danling.metric` module offers a comprehensive solution for:
- Computing various metrics (accuracy, AUROC, Pearson, etc.)
- Aggregating metrics across batches and devices
- Supporting complex scenarios like multi-task learning
- Integrating with distributed training environments
## Key Components

The module consists of three classes and several helper functions:

- `metrics`: Keeps track of all predictions and labels to compute multiple metrics that require the entire dataset.
- `average_meter`: Core component for averaging values over time.
- `metric_meter`: Computes and averages metrics on a per-batch basis.
- `factory`: Convenient functions to create common metrics for different task types.
- `functional`: Implementations of common metric functions.
## Quick Start

### Binary Classification
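A minimal sketch of the intended usage, assuming the shared `Metrics` API exposes an `update(input, target)` method and an `avg` property for the aggregated results; check the API reference for the exact names.

```python
import torch
from danling.metric import binary_metrics

# Build a Metrics instance pre-configured for binary classification
metrics = binary_metrics()

# One batch of scores (probabilities or logits) and ground-truth labels
pred = torch.tensor([0.9, 0.1, 0.8, 0.3])
target = torch.tensor([1, 0, 1, 0])

metrics.update(pred, target)  # accumulate predictions and labels

print(metrics.avg)  # metrics (e.g. accuracy, AUROC) over everything seen so far
```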
### Multiclass Classification
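A similar sketch for a 10-class problem, again assuming `update()`/`avg` from the shared API:

```python
import torch
from danling.metric import multiclass_metrics

# Metrics pre-configured for a 10-class classification task
metrics = multiclass_metrics(num_classes=10)

# One batch of class scores (batch_size x num_classes) and integer labels
pred = torch.randn(8, 10)
target = torch.randint(0, 10, (8,))

metrics.update(pred, target)

print(metrics.avg)  # aggregated metrics over all accumulated batches
```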
### Regression
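And a sketch for a multi-output regression task, under the same API assumptions:

```python
import torch
from danling.metric import regression_metrics

# Metrics pre-configured for a regression task with two outputs
metrics = regression_metrics(num_outputs=2)

pred = torch.randn(8, 2)
target = torch.randn(8, 2)

metrics.update(pred, target)

print(metrics.avg)  # e.g. Pearson correlation and error metrics over all data
```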
## Choosing the Right Metric Class

DanLing provides `Metrics` and `MetricMeters` for different use cases.
Understanding the differences will help you choose the right one for your specific needs.
**Use `Metrics` when:**
- You need metrics that require the entire dataset (like AUROC, Spearman correlation)
- You want to maintain the full history of predictions and labels
- Memory is not a constraint for your dataset size
- You need metrics that cannot be meaningfully averaged batch-by-batch
- Exact metric values are the top priority
Best for:
- Evaluation phases where you need high-quality metrics
- ROC curves and PR curves that require all predictions
- Correlation measures (Pearson, Spearman)
- Final model assessment
**Use `MetricMeters` when:**
- You need to track metrics that can be averaged across batches (like accuracy, loss)
- Memory efficiency is important (doesn’t store all predictions)
- Speed matters (syncing predictions across the entire process group takes time)
- You want simple averaging of metrics across iterations
- Approximation is good enough
Best for:
- Training phases where speed and memory efficiency are critical
- Simple metrics like accuracy, precision, recall
- Loss tracking during training
- Large datasets where storing all predictions would be impractical
**Note:** `Metrics` and `MetricMeters` are mostly identical.
They share the same API, so they are interchangeable.
You can easily convert a `Metrics` instance to a `MetricMeters` by calling `meters = MetricMeters(metrics)`, and vice versa.
### Key Differences

| Feature | `Metrics` | `MetricMeters` |
|---|---|---|
| Storage | Stores all predictions and labels | Only stores running statistics |
| Memory Usage | Higher (scales with dataset size) | Lower (constant) |
| Computation | Computes metrics on full dataset | Averages per-batch metrics |
| Multiple Metrics | Stores multiple metrics with same data | Multiple metrics with same preprocessing |
| Use Case | For metrics requiring all data | For multiple batch-averageable metrics |
| Distributed Support | Yes | Yes |
## Factory Functions

The module provides convenient factory functions for common task types:

- `binary_metrics()`: For binary classification tasks
- `multiclass_metrics(num_classes)`: For multiclass classification tasks
- `multilabel_metrics(num_labels)`: For multi-label classification tasks
- `regression_metrics(num_outputs)`: For regression tasks, including multi-output regression

Each factory creates a `Metrics` instance pre-configured with appropriate metric functions and preprocessing.
**Using Factory Functions**
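For example, the following sketch builds a multi-label `Metrics` and converts it into memory-efficient `MetricMeters`; the `update()` call is assumed from the shared API.

```python
import torch
from danling.metric import MetricMeters, multilabel_metrics

# Full-history Metrics for a 5-label classification task
metrics = multilabel_metrics(num_labels=5)

# The same configuration as lightweight per-batch meters
meters = MetricMeters(metrics)

pred = torch.rand(8, 5)
target = torch.randint(0, 2, (8, 5))

metrics.update(pred, target)  # stores all predictions and labels
meters.update(pred, target)   # only updates running statistics
```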
## Advanced Usage

### Multi-Task Learning
For multi-task scenarios, use the multi-task variants:
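The sketch below illustrates the idea; the `MultiTaskMetrics` name and the dictionary-style `update()` call are assumptions about the multi-task API, so check the API reference for the exact interface.

```python
import torch
from danling.metric import binary_metrics, regression_metrics
from danling.metric import MultiTaskMetrics  # assumed name of the multi-task variant

# One Metrics instance per task, keyed by task name
metrics = MultiTaskMetrics(
    classification=binary_metrics(),
    regression=regression_metrics(),
)

# Update each task with its own predictions and labels
metrics.update({
    "classification": (torch.rand(8), torch.randint(0, 2, (8,))),
    "regression": (torch.randn(8), torch.randn(8)),
})

print(metrics.avg)  # nested results keyed by task name
```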
### Custom Preprocessing
Customize how inputs are preprocessed before metric calculation:
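A sketch under the assumption that the factories accept a `preprocess=` callable applied to `(input, target)` before any metric function runs; the keyword name and the padding value used here are assumptions.

```python
import torch
from danling.metric import binary_metrics


def preprocess(input, target):
    # Hypothetical preprocessing: flatten and drop padded positions (label == -100)
    input, target = input.flatten(), target.flatten()
    mask = target != -100
    return input[mask], target[mask]


# Assumed keyword argument; the real signature may differ
metrics = binary_metrics(preprocess=preprocess)

pred = torch.tensor([0.9, 0.2, 0.7, 0.4])
target = torch.tensor([1, 0, -100, 0])
metrics.update(pred, target)
```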
### Custom Metrics
Create custom metric functions to use with the metrics system:
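A sketch assuming `Metrics` accepts keyword arguments that map metric names to callables of `(input, target)`; the constructor signature is an assumption.

```python
import torch
from danling.metric import Metrics


def rmse(input: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Root-mean-square error over all accumulated predictions
    return torch.sqrt(torch.mean((input - target) ** 2))


def max_error(input: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Largest absolute deviation
    return (input - target).abs().max()


# Assumed constructor: metric name -> callable
metrics = Metrics(rmse=rmse, max_error=max_error)

metrics.update(torch.randn(8), torch.randn(8))
print(metrics.avg)
```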
Note that `Metrics` and `MetricMeters` will apply the unified preprocess once, before evaluating the individual metric functions, if one is defined.