Metrics¶
danling.metrics provides metric containers and metric descriptors for large-scale training.
The design is exact-by-default while keeping a lighter streaming path available for hot training loops.
Design Summary¶
- Exact by default: factory functions return `GlobalMetrics` unless `mode="stream"` is set.
- Shared state: metric descriptors declare required artifacts (`preds`/`targets`, `confmat`) so containers build them once.
- Symmetric API: `GlobalMetrics` and `StreamMetrics` share the same constructor signature.
- Extensible: users can provide custom `MetricFunc` implementations (or plain callables for `StreamMetrics`).
Core Components¶
- `GlobalMetrics`
    - Stores exact artifacts for global computation.
    - Computes values from shared [`MetricState`][danling.metrics.MetricState].
    - `bat` synchronizes the current-step exact state: reduced current-step confusion matrices when sufficient, gathered current-step `preds`/`targets` otherwise.
    - Performs distributed synchronization lazily in `average()`.
    - In distributed exact mode, descriptors that require `preds`/`targets` gather full artifacts on `average()`, so this path is best reserved for eval/reporting rather than hot training-loop logging.
- `StreamMetrics`
    - Computes streaming scores online and tracks running averages.
    - Uses the same metric descriptors and preprocess contract as `GlobalMetrics`.
    - Metrics are evaluated once per update; batch-vs-sample semantics are determined by the metric itself.
    - Suitable for high-throughput training loops.
- `MetricMeter`
    - Single-metric streaming meter used internally by `StreamMetrics`.
- [`METRICS` registry][danling.metrics.METRICS]
    - Task factory registry with explicit `mode`.
- `MultiTaskMetrics`
    - Flat task container for multi-head / multi-dataset evaluation.
    - Aggregates matching metric paths with a plain mean across tasks.
Quick Start¶
Exact Metrics (Default)¶
Streaming Metrics¶
`StreamMetrics` semantics:

- `val` is the local value for the most recent update.
- `bat` is the synchronized current-step metric.
- `avg` is a sample-count-weighted running average.
- Metrics are evaluated once per update.
- Stream metrics preserve tensor outputs and average them elementwise across batches.
- Plain callables receive preprocessed `input`/`target` tensors; `MetricFunc` descriptors receive `MetricState`.
- Stream metrics with the same names as exact global metrics may still be running approximations rather than exact dataset-level values.
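The difference between `val` and `avg` can be illustrated with a small, library-independent sketch. `ToyMeter` below is a hypothetical stand-in written for this page, not DanLing's `MetricMeter`; it only assumes that each update supplies a scalar score and the number of samples it covers.

```python
class ToyMeter:
    """Hypothetical sample-count-weighted running average (not DanLing's MetricMeter)."""

    def __init__(self):
        self.val = 0.0    # score of the most recent update
        self.total = 0.0  # running sum of score * sample count
        self.count = 0    # number of samples seen so far

    def update(self, value, n):
        self.val = value
        self.total += value * n
        self.count += n

    @property
    def avg(self):
        # Weight every update by the number of samples it covered.
        return self.total / self.count if self.count else float("nan")


meter = ToyMeter()
meter.update(0.5, n=8)  # batch of 8 samples scoring 0.5
meter.update(1.0, n=2)  # batch of 2 samples scoring 1.0
print(meter.val)  # 1.0
print(meter.avg)  # 0.6
```

An unweighted mean of the two batch scores would give 0.75; weighting by sample count keeps small batches from dominating the running value.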
Global vs Stream¶
| Aspect | `GlobalMetrics` | `StreamMetrics` |
|---|---|---|
| Default factory mode | `mode="global"` | `mode="stream"` |
| State | Stores full required artifacts | Stores running meter stats |
| Sync pattern | `bat()` syncs current-step exact state; `average()` syncs accumulated exact state | `bat()` syncs current-step metric; `average()` syncs running stats |
| Typical use | Exact eval, AUROC/AUPRC/correlation | Fast training logs |
| Memory | Higher | Lower |
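Why the two paths can disagree is easiest to see on a metric that is not a plain per-sample mean. The sketch below uses pure Python and binary F1; the helper names (`counts`, `f1_from_counts`) are illustrative assumptions and none of it is DanLing code. The "global" path accumulates confusion counts and computes once, while the "stream" path scores each batch and keeps a sample-weighted running average.

```python
def counts(preds, targets):
    """Return (tp, fp, fn) for binary labels in {0, 1}."""
    tp = sum(p == 1 and t == 1 for p, t in zip(preds, targets))
    fp = sum(p == 1 and t == 0 for p, t in zip(preds, targets))
    fn = sum(p == 0 and t == 1 for p, t in zip(preds, targets))
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


batches = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # tp=1, fp=1, fn=0
    ([1, 0, 0, 0], [1, 1, 1, 0]),  # tp=1, fp=0, fn=2
]

total = [0, 0, 0]               # global path: accumulated counts
weighted_sum, n_seen = 0.0, 0   # stream path: running statistics
for preds, targets in batches:
    c = counts(preds, targets)
    total = [a + b for a, b in zip(total, c)]
    weighted_sum += f1_from_counts(*c) * len(preds)
    n_seen += len(preds)

global_f1 = f1_from_counts(*total)  # 4/7, computed from all samples at once
stream_f1 = weighted_sum / n_seen   # 7/12, average of per-batch F1 scores
```

The two values differ because F1 is a ratio of counts rather than a mean over samples, which is the kind of gap the exact path is meant to avoid.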
Shared Constructor Contract¶
`GlobalMetrics` and `StreamMetrics` intentionally share the same constructor signature, which follows these rules:

- Positional `*metric_funcs` can be metric descriptors (or iterables of descriptors). `StreamMetrics` also accepts plain callables.
- Keyword `**metrics` are named metrics and override positional metrics with the same name.
- `preprocess` is applied once per `update`.
- `device` controls where internal artifacts/stat reductions live.
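The override rule for named metrics can be sketched as follows. `resolve_metrics` is a hypothetical helper written for illustration; it assumes positional metrics are plain functions whose name comes from `__name__`, which may differ from how DanLing descriptors are actually named.

```python
def resolve_metrics(*metric_funcs, **metrics):
    """Hypothetical name resolution: positional first, keyword entries win."""
    resolved = {}
    for item in metric_funcs:
        # Accept either a single callable or an iterable of callables.
        funcs = [item] if callable(item) else list(item)
        for func in funcs:
            resolved[func.__name__] = func
    resolved.update(metrics)  # keyword metrics override on name clashes
    return resolved


def acc(preds, targets):
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


def error_rate(preds, targets):
    return 1.0 - acc(preds, targets)


def strict_acc(preds, targets):
    # 1.0 only if every prediction matches.
    return float(list(preds) == list(targets))


table = resolve_metrics(acc, [error_rate], acc=strict_acc)
print(list(table))                 # ['acc', 'error_rate']
print(table["acc"] is strict_acc)  # True
```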
Factory Functions¶
All factories accept:

- `mode="global" | "stream"` (`"global"` default)
- `*metric_funcs`: if provided, defaults are replaced
- `**metrics`: named extra metrics (or overrides)
- task-specific arguments (`num_classes`, `num_labels`, `num_outputs`, `ignore_index`, etc.)
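A toy factory makes the "replace versus extend" distinction concrete. Everything here is hypothetical: `toy_factory`, `TOY_DEFAULTS`, and the returned container label are stand-ins chosen for this sketch, and real factories also take the task-specific arguments listed above.

```python
def acc(preds, targets):
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


def error_rate(preds, targets):
    return 1.0 - acc(preds, targets)


TOY_DEFAULTS = {"acc": acc}  # stand-in for a factory's default metric set


def toy_factory(*metric_funcs, mode="global", **metrics):
    if mode not in ("global", "stream"):
        raise ValueError(f"unknown mode: {mode!r}")
    if metric_funcs:
        # Positional metrics replace the defaults entirely.
        selected = {func.__name__: func for func in metric_funcs}
    else:
        selected = dict(TOY_DEFAULTS)
    selected.update(metrics)  # named metrics extend or override
    kind = "GlobalMetrics" if mode == "global" else "StreamMetrics"
    return kind, selected


default_kind, default_set = toy_factory()
stream_kind, replaced_set = toy_factory(error_rate, mode="stream")
_, extended_set = toy_factory(err=error_rate)
print(default_kind, list(default_set))   # GlobalMetrics ['acc']
print(stream_kind, list(replaced_set))   # StreamMetrics ['error_rate']
print(list(extended_set))                # ['acc', 'err']
```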
danling.metrics.functional.classification is kept as a thin convenience layer for one-shot DanLing metric calls.
These wrappers apply DanLing preprocessing first (for example: nested-tensor alignment, shape normalization,
ignore_index filtering, and probability normalization where applicable), then forward to the corresponding
TorchMetrics functional implementation. Extra keyword arguments are still forwarded, but they operate on the
preprocessed tensors and therefore must be compatible with the resulting shapes.
Container-facing code should prefer MetricFunc descriptors, which let GlobalMetrics and StreamMetrics build shared state once.
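The preprocess-then-forward behavior of the functional wrappers can be approximated in plain Python. This is a sketch on lists rather than tensors; `filter_ignored` and `toy_accuracy` are hypothetical names, and the `ignore_index=-100` default is an assumption made for the example, not a documented DanLing default.

```python
def filter_ignored(preds, targets, ignore_index=-100):
    """Drop positions whose target equals ignore_index."""
    kept = [(p, t) for p, t in zip(preds, targets) if t != ignore_index]
    return [p for p, _ in kept], [t for _, t in kept]


def accuracy(preds, targets):
    # Stand-in for the underlying functional metric.
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


def toy_accuracy(preds, targets, ignore_index=-100):
    # Preprocess first, then forward the filtered inputs to the metric.
    return accuracy(*filter_ignored(preds, targets, ignore_index))


preds = [2, 0, 1, 1, 0]
targets = [2, -100, 1, 0, -100]
score = toy_accuracy(preds, targets)  # 2 of the 3 kept positions match
```

Note that the metric sees three elements rather than five, which is why any extra arguments must be compatible with the preprocessed shapes and not the raw ones.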
Default Metric Sets¶
Factories keep defaults minimal:

- Binary / Multiclass / Multilabel: `auroc`, `auprc`, `acc`, `f1`, `mcc`
- Regression: `pearson`, `spearman`, `r2`, `mse`, `rmse`

Additional built-ins (opt-in):

- Classification: `precision`, `recall`, `fbeta`, `specificity`, `balanced_accuracy`, `jaccard`, `iou`, `hamming_loss`
- Regression: `mae`
multiclass_accuracy also supports top-k via k.
For multiclass classification, balanced_accuracy is the class-balanced recall and only supports the standard definition: average="macro" with k=1.
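For reference, the standard definition (macro-averaged recall over top-1 predictions) can be written out in a few lines of plain Python. The helpers below are illustrative and not DanLing code; skipping classes that never appear in the targets is an assumption of this sketch.

```python
def confusion_matrix(preds, targets, num_classes):
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(preds, targets):
        cm[t][p] += 1  # rows: true class, columns: predicted class
    return cm


def balanced_accuracy(cm):
    # Per-class recall = diagonal entry / row sum, then an unweighted mean.
    recalls = [row[i] / sum(row) for i, row in enumerate(cm) if sum(row) > 0]
    return sum(recalls) / len(recalls)


targets = [0, 0, 0, 0, 1, 1, 2, 2]
preds = [0, 0, 0, 0, 1, 0, 2, 1]

cm = confusion_matrix(preds, targets, num_classes=3)
plain = sum(p == t for p, t in zip(preds, targets)) / len(targets)  # 0.75
balanced = balanced_accuracy(cm)  # mean of recalls 1.0, 0.5, 0.5
```

Plain accuracy is pulled up by the majority class, while the balanced score gives each class the same weight.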
Custom Metric Descriptor (MetricFunc)¶
For consistent behavior across both containers, implement MetricFunc and read from MetricState.
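The idea of descriptors reading from a shared state can be illustrated with a toy analog. `ToyState` and `ToyMetric` are hypothetical classes invented for this sketch; they do not reproduce the `MetricFunc` or `MetricState` interfaces, and only show why building an artifact once and sharing it across metrics is useful.

```python
class ToyState:
    """Hypothetical shared state: builds the confusion counts lazily, at most once."""

    def __init__(self, preds, targets):
        self.preds, self.targets = preds, targets
        self._confmat = None
        self.confmat_builds = 0  # counts how often the artifact was built

    @property
    def confmat(self):
        if self._confmat is None:
            self.confmat_builds += 1
            pairs = list(zip(self.preds, self.targets))
            tp = sum(p == 1 and t == 1 for p, t in pairs)
            fp = sum(p == 1 and t == 0 for p, t in pairs)
            fn = sum(p == 0 and t == 1 for p, t in pairs)
            tn = sum(p == 0 and t == 0 for p, t in pairs)
            self._confmat = (tp, fp, fn, tn)
        return self._confmat


class ToyMetric:
    """Hypothetical descriptor: a name, declared artifacts, and a compute function."""

    def __init__(self, name, requires, fn):
        self.name, self.requires, self.fn = name, requires, fn

    def __call__(self, state):
        return self.fn(state)


def _precision(state):
    tp, fp, _, _ = state.confmat
    return tp / (tp + fp) if tp + fp else 0.0


def _recall(state):
    tp, _, fn, _ = state.confmat
    return tp / (tp + fn) if tp + fn else 0.0


# `requires` is purely declarative in this sketch.
precision = ToyMetric("precision", ("confmat",), _precision)
recall = ToyMetric("recall", ("confmat",), _recall)

state = ToyState(preds=[1, 1, 0, 1, 0], targets=[1, 0, 0, 1, 1])
scores = {metric.name: metric(state) for metric in (precision, recall)}
```

Both metrics read the same cached counts, so `state.confmat_builds` stays at 1 no matter how many descriptors are evaluated.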
Multi-Task Usage¶
Pass aggregate="macro" if you want equal task weighting, aggregate="micro" if you want sample-count weighting,
or aggregate="weighted" together with aggregate_weights={"task": weight, ...} if you want explicit task
weights. Aggregate outputs match metrics by exact relative metric path, so tasks with different metric namespaces
stay separate rather than being merged by leaf name alone.
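The three weighting schemes can be compared on a single metric path with a short sketch. The `aggregate` function, the task names, and the sample counts below are hypothetical and only mirror the described semantics; they are not the `MultiTaskMetrics` implementation.

```python
def aggregate(task_scores, task_counts, mode="macro", weights=None):
    """Hypothetical aggregation of one metric path across tasks."""
    if mode == "macro":
        w = {task: 1.0 for task in task_scores}
    elif mode == "micro":
        w = {task: float(task_counts[task]) for task in task_scores}
    elif mode == "weighted":
        if weights is None:
            raise ValueError("weighted aggregation needs explicit weights")
        w = {task: float(weights[task]) for task in task_scores}
    else:
        raise ValueError(f"unknown aggregate mode: {mode!r}")
    total = sum(w.values())
    return sum(task_scores[task] * w[task] for task in task_scores) / total


scores = {"qa": 0.5, "ner": 1.0}  # same metric path, two tasks
counts = {"qa": 300, "ner": 100}  # samples seen per task

macro = aggregate(scores, counts, "macro")  # 0.75
micro = aggregate(scores, counts, "micro")  # 0.625
weighted = aggregate(scores, counts, "weighted", weights={"qa": 1, "ner": 3})  # 0.875
```

Only scores that share the same relative metric path would be combined this way; a task exposing a differently named metric would simply not appear in `task_scores` for that path.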