Functions¶
NestedTensor operation support is organized by dispatch layer rather than
collected in a single utils.py module. The public documentation follows that structure:
Torch Functions¶
danling.tensors.torch_functions registers handlers for torch.* functions through
__torch_function__, including torch.cat, torch.stack, reductions, indexing,
and matrix operations.
danling.tensors.torch_functions¶
torch.* function overrides for NestedTensor via __torch_function__.
This module is the Level 2 dispatch layer. When a torch.* call
(e.g. torch.cat, torch.mean, torch.einsum) involves a
NestedTensor, __torch_function__ checks
NestedTensorFuncRegistry for a
registered handler.
Handlers here use several strategies depending on the op’s needs:
- Packed fast-path: ops that work directly on the concatenated _values tensor via NestedTensor._from_packed without knowing element boundaries.
- Per-element dispatch: ops that must be applied to each element individually via _map_storage_serial, e.g. when dimension indices need translation or output shapes differ per element.
If no handler is registered here, the call falls through to aten
decomposition and then to __torch_dispatch__ (see aten_functions).
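The lookup order described above can be modeled without PyTorch. The following is a minimal sketch, not DanLing's implementation: HANDLERS, dispatch, and fallback are hypothetical names standing in for NestedTensorFuncRegistry, the check inside __torch_function__, and the lower-level fall-through. Ragged batches are modeled as plain lists of lists.

```python
from typing import Callable

# Hypothetical stand-in for NestedTensorFuncRegistry: function -> handler.
HANDLERS: dict[Callable, Callable] = {}


def total(xs):
    """Stand-in for a torch.* function that has a registered handler."""
    return sum(xs)


def longest(xs):
    """Stand-in for a torch.* function with no registered handler."""
    return max(xs)


def nested_total(nested):
    # Registered handler: reduce each ragged element separately.
    return [sum(element) for element in nested]


HANDLERS[total] = nested_total


def fallback(func, nested):
    # Models the fall-through path (assumed here to apply func per element).
    return [func(element) for element in nested]


def dispatch(func, nested):
    # Models the registry check: use the handler if one exists, else fall through.
    handler = HANDLERS.get(func)
    if handler is not None:
        return handler(nested), "handler"
    return fallback(func, nested), "fallthrough"


batch = [[1, 2, 3], [4, 5]]
print(dispatch(total, batch))    # ([6, 9], 'handler')
print(dispatch(longest, batch))  # ([3, 5], 'fallthrough')
```

The second element of each result only labels which path ran; it is part of the illustration, not of any real API.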
NN Functions¶
danling.tensors.nn_functions registers torch.nn.functional.* handlers such
as attention, embedding, normalization, pooling, convolution, and loss
functions.
danling.tensors.nn_functions¶
torch.nn.functional.* overrides for NestedTensor via __torch_function__.
This module is the Level 3 dispatch layer, registering handlers for
F.linear, F.conv*, F.max_pool*, F.embedding,
F.layer_norm, F.scaled_dot_product_attention, and other
torch.nn.functional ops.
The design rule is:
- use one canonical NestedTensor implementation path per op whenever possible
- reserve compile_safe for packed-first training hot paths only
- keep convenience APIs honest by marking densifying or repacking handlers eager-only under torch.compile
- leave non-hot spatial ops eager-only rather than carrying speculative packed fast paths
That means most spatial operators here use per-element dispatch, while a small Tier A set of transformer-hot packed handlers stays compile-safe.
danling.tensors.nn_functions.create_flex_block_mask¶
create_flex_block_mask(
mask_mod: Callable,
query: NestedTensor,
key: NestedTensor | None = None,
*,
num_heads: int | None = None,
block_size: int | tuple[int, int] = 128,
compile_mask: bool = False
)
Create a FlexAttention block mask directly from DanLing ragged attention storage.
Source code in danling/tensors/nn_functions.py
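To make the idea of a block mask over ragged sequences concrete, here is a pure-Python sketch. It is an illustration under stated assumptions, not how create_flex_block_mask is implemented: the helper block_mask and its lengths argument are hypothetical, and only the mask_mod calling convention (b, h, q_idx, kv_idx) follows FlexAttention. For each sequence it records which (query block, key block) pairs contain at least one visible position, so padding never appears.

```python
def causal(b, h, q_idx, kv_idx):
    # mask_mod in the FlexAttention convention: True means the pair may attend.
    return q_idx >= kv_idx


def block_mask(mask_mod, lengths, block_size):
    """Hypothetical helper: per sequence, list the (q_block, kv_block) pairs
    that contain at least one position allowed by mask_mod."""
    result = []
    for b, length in enumerate(lengths):
        kept = set()
        for q in range(length):
            for k in range(length):
                if mask_mod(b, 0, q, k):  # head index fixed to 0 in this sketch
                    kept.add((q // block_size, k // block_size))
        result.append(sorted(kept))
    return result


masks = block_mask(causal, lengths=[3, 5], block_size=2)
for length, kept in zip([3, 5], masks):
    print(length, kept)
```

With a causal mask_mod, only blocks on or below the diagonal survive, and the shorter sequence simply has fewer blocks.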
Aten Functions¶
danling.tensors.aten_functions registers packed-storage
__torch_dispatch__ handlers and fallback behavior for aten ops.
danling.tensors.aten_functions¶
__torch_dispatch__ handlers for NestedTensor aten ops (Level 1 dispatch).
This module implements the dispatch table that maps aten ops to optimized handlers operating on the packed representation (_values, _offsets, _physical_shape).
Architecture
- Elementwise ops operate directly on _values (no unpack/repack overhead)
- Structural ops (clone, detach, to_copy) operate on all inner tensors
- Unregistered ops fall back to per-element application via _storage
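The difference between the packed fast path and the per-element fallback can be sketched with a toy packed layout. This is a simplified model in plain Python, not the NestedTensor class: Packed, elementwise, and per_element are hypothetical names, and the values and offsets fields only play the role of _values and _offsets.

```python
from dataclasses import dataclass


@dataclass
class Packed:
    """Toy packed ragged batch: a flat value list plus element boundaries."""

    values: list   # all elements concatenated (plays the role of _values)
    offsets: list  # element i spans values[offsets[i]:offsets[i + 1]]

    @classmethod
    def from_elements(cls, elements):
        values, offsets = [], [0]
        for element in elements:
            values.extend(element)
            offsets.append(len(values))
        return cls(values, offsets)

    def elements(self):
        return [self.values[s:e] for s, e in zip(self.offsets, self.offsets[1:])]


def elementwise(packed, fn):
    # Fast path: boundaries are unchanged, so fn runs once over the flat values.
    return Packed([fn(v) for v in packed.values], list(packed.offsets))


def per_element(packed, fn):
    # Fallback: unpack, apply fn to each element, then repack.
    return Packed.from_elements([fn(e) for e in packed.elements()])


batch = Packed.from_elements([[1.0, 2.0, 3.0], [4.0, 5.0]])
doubled = elementwise(batch, lambda v: v * 2)
reversed_each = per_element(batch, lambda e: e[::-1])
print(doubled.values, doubled.offsets)
print(reversed_each.elements())
```

Doubling never needs to know where one element ends and the next begins, whereas reversing each element does, which is why it takes the unpack and repack route here.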
Dispatch Registries¶
danling.tensors.ops provides registry types, dispatch tables, and diagnostic
helpers used to extend or test NestedTensor operation support.
danling.tensors.ops¶
Internal helpers shared across NestedTensor function registrations.
TorchFuncRegistry¶
Bases: dict
Plain dict mapping functions/ops to their NestedTensor handlers.
Uses dict directly for O(1) lookup with minimal overhead (~30 ns)
instead of chanfig.Registry (~700-2300 ns).
Used for both __torch_function__ (torch/nn ops) and
__torch_dispatch__ (aten ops) dispatch tables.
Source code in danling/tensors/ops.py
register¶
register(
func: Callable,
handler: Callable,
*,
compile_safe: bool = False,
compile_guard: (
Callable[[tuple, dict[str, object]], bool] | None
) = None
) -> Callable
Register handler for func and record whether the path is compile-safe by default.
Source code in danling/tensors/ops.py
implement¶
implement(
func: Callable,
*,
compile_safe: bool = False,
compile_guard: (
Callable[[tuple, dict[str, object]], bool] | None
) = None
) -> Callable
Decorator to register a handler for func.
Source code in danling/tensors/ops.py
is_compile_safe¶
is_compile_safe(
func: Callable,
args: tuple | None = None,
kwargs: dict[str, object] | None = None,
) -> bool
Return whether func is allowed to run while torch.compile is tracing.
Source code in danling/tensors/ops.py
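The way register, implement, and is_compile_safe fit together can be sketched with a small dict subclass. This is a hedged illustration, not the source of TorchFuncRegistry: SketchRegistry and its private attributes are hypothetical, and the rule that a guard is consulted only for handlers marked compile-safe is an assumption made for this example.

```python
class SketchRegistry(dict):
    """Hypothetical dict-based registry; not DanLing's TorchFuncRegistry."""

    def __init__(self):
        super().__init__()
        self._compile_safe = {}   # func -> bool
        self._compile_guard = {}  # func -> callable(args, kwargs) -> bool, or None

    def register(self, func, handler, *, compile_safe=False, compile_guard=None):
        self[func] = handler
        self._compile_safe[func] = compile_safe
        self._compile_guard[func] = compile_guard
        return handler

    def implement(self, func, *, compile_safe=False, compile_guard=None):
        # Decorator form of register().
        def decorator(handler):
            return self.register(
                func, handler, compile_safe=compile_safe, compile_guard=compile_guard
            )

        return decorator

    def is_compile_safe(self, func, args=None, kwargs=None):
        if not self._compile_safe.get(func, False):
            return False
        guard = self._compile_guard.get(func)
        if guard is None:
            return True
        # Assumed semantics: the guard can veto a compile-safe handler per call.
        return bool(guard(args or (), kwargs or {}))


def scale(x, factor=1):
    """Stand-in for a torch function."""
    return x * factor


registry = SketchRegistry()


@registry.implement(
    scale,
    compile_safe=True,
    compile_guard=lambda args, kwargs: kwargs.get("factor", 1) != 0,
)
def nested_scale(nested, factor=1):
    return [[v * factor for v in element] for element in nested]


print(registry[scale]([[1, 2], [3]], factor=2))
print(registry.is_compile_safe(scale, (), {"factor": 2}))
print(registry.is_compile_safe(scale, (), {"factor": 0}))
print(registry.is_compile_safe(len))  # never registered
```

Because the registry is a plain dict, the hot-path lookup is an ordinary key access; the compile policy lives in side tables that are only read when a compile-safety question is asked.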
set_compile_safe¶
Update compile policy for an already-registered handler.
Source code in danling/tensors/ops.py
set_compile_guard¶
set_compile_guard(
func: Callable,
guard: (
Callable[[tuple, dict[str, object]], bool] | None
),
) -> None
Set or clear the runtime compile guard for an already-registered handler.
Source code in danling/tensors/ops.py
get_compile_guard¶
Return the runtime compile guard for func, if any.
nested_execution_guard¶
nested_execution_guard(
*,
forbid_iteration: bool = False,
forbid_storage_map: bool = False,
forbid_eager_fallback: bool = False,
forbid_padded_materialization: bool = False,
forbid_dense_repack: bool = False
)
Temporarily forbid selected slow paths while exercising NestedTensor hot paths.
This is intended for transformer-critical regression checks, where falling back to Python loops or padded materialization is considered a bug.
Source code in danling/tensors/ops.py
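A guard of this kind can be sketched as a context manager that records which slow paths are forbidden and makes those paths raise when entered. The code below is a minimal model under assumptions, not the DanLing implementation: sketch_execution_guard, check_slow_path, and the flag store are hypothetical, and only two of the five flags are modeled.

```python
import contextlib
import contextvars

# Hypothetical flag store; the real guard's internals are not shown on this page.
_FORBIDDEN = contextvars.ContextVar("forbidden_paths", default=frozenset())


@contextlib.contextmanager
def sketch_execution_guard(*, forbid_iteration=False, forbid_storage_map=False):
    requested = (("iteration", forbid_iteration), ("storage_map", forbid_storage_map))
    flags = {name for name, enabled in requested if enabled}
    token = _FORBIDDEN.set(_FORBIDDEN.get() | flags)
    try:
        yield
    finally:
        # Restore the previous set of forbidden paths, even on error.
        _FORBIDDEN.reset(token)


def check_slow_path(name):
    # Called at the top of a slow path; raises when that path is forbidden.
    if name in _FORBIDDEN.get():
        raise RuntimeError(f"slow path '{name}' hit while forbidden")


def sum_by_iteration(nested):
    check_slow_path("iteration")
    return [sum(element) for element in nested]


print(sum_by_iteration([[1, 2], [3]]))  # allowed outside the guard
with sketch_execution_guard(forbid_iteration=True):
    try:
        sum_by_iteration([[1, 2], [3]])
    except RuntimeError as err:
        print("blocked:", err)
```

In a regression test, the body of the with block would run the hot path under test, so any accidental fallback to a Python loop surfaces as an exception instead of a silent slowdown.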
The files under danling.tensors.functions are specialized implementations
used by nn_functions for convolution, pooling, and channel operators. They are
kept out of the docs navigation because users normally call the corresponding
PyTorch or torch.nn.functional API directly.