The `danling.tensor` module provides utilities for handling variable-length tensors in batched operations.
Its core feature is the `NestedTensor` class, which represents sequences of different lengths efficiently, without excessive padding.
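For example, two sequences of different lengths can be stored in a single `NestedTensor` (a minimal sketch; it assumes the constructor accepts the individual tensors directly):

```python
import torch
from danling.tensor import NestedTensor

# Two sequences of different lengths in one batch
nested = NestedTensor(torch.tensor([1, 2, 3]), torch.tensor([4, 5]))

print(nested.tensor)  # padded view, e.g. [[1, 2, 3], [4, 5, 0]]
print(nested.mask)    # boolean mask marking the valid (non-padding) positions
```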
In many deep learning tasks, especially those involving sequences (text, time series, etc.), each example in a batch may have a different length. Traditional approaches include:

- Padding: adding placeholder values so that all examples share the same length, which wastes memory and computation on the padded positions (see the sketch below)
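For illustration, this is roughly what the padding approach looks like in plain PyTorch (a sketch using `torch.nn.utils.rnn.pad_sequence`, which is not part of `danling`):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

sequences = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6, 7, 8, 9])]

# Pad every sequence to the length of the longest one
padded = pad_sequence(sequences, batch_first=True, padding_value=0)
print(padded.shape)  # torch.Size([3, 4]); 3 of the 12 slots hold padding
```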
Given a `NestedTensor` such as `nested` above, its contents can be accessed and converted in several ways:

```python
# Get as a list of lists
data = nested.tolist()

# Get as a tuple of (padded_tensor, mask)
tensor, mask = nested[:]

# Access individual items
first_item = nested[0]  # Returns the first tensor
```
`NestedTensor` also integrates with PyTorch data loading. Return a `PNTensor` from your `Dataset`, and batches are collated into `NestedTensor` objects:

```python
from torch.utils.data import Dataset, DataLoader

from danling.tensor import PNTensor


class VariableLengthDataset(Dataset):
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Return a PNTensor, which will be automatically
        # collated into a NestedTensor
        return PNTensor(self.data[idx])


# Example usage
dataset = VariableLengthDataset([[1, 2, 3], [4, 5], [6, 7, 8, 9]])
dataloader = DataLoader(dataset, batch_size=3)

# The batches will be NestedTensor objects
for batch in dataloader:
    print(type(batch))  # <class 'danling.tensor.nested_tensor.NestedTensor'>
    print(batch.tensor)  # Padded tensor
    print(batch.mask)    # Mask
```
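Judging from the example above, the role of `PNTensor` is purely to mark items for collation: it behaves like a plain `torch.Tensor` inside the dataset, and batching it signals that sequences of unequal lengths should be gathered into a `NestedTensor` rather than stacked, so no custom `collate_fn` is needed.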
Support for additional torch functions can be added by registering an implementation with `NestedTensorFuncRegistry`:

```python
import torch

from danling.tensor.nested_tensor import NestedTensorFuncRegistry


@NestedTensorFuncRegistry.implement(torch.softmax)
def softmax(tensor, dim=-1):
    # Implement softmax for NestedTensor by applying it to the
    # underlying padded tensor and rewrapping the result
    return tensor.nested_like(torch.softmax(tensor.tensor, dim=dim))
```
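Once registered, calling the torch function on a `NestedTensor` should route to this implementation (a sketch; it assumes the registry participates in torch function dispatch):

```python
import torch
from danling.tensor import NestedTensor

nested = NestedTensor(torch.tensor([1.0, 2.0, 3.0]), torch.tensor([4.0, 5.0]))

# Dispatches to the softmax implementation registered above
result = torch.softmax(nested, dim=-1)
print(result.tensor)  # padded softmax scores
print(result.mask)    # mask carried over by nested_like
```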