"Fossies" - the Fresh Open Source Software Archive

Member "pytorch-1.8.2/docs/source/quantization-support.rst" (23 Jul 2021, 16785 Bytes) of package /linux/misc/pytorch-1.8.2.tar.gz:

As a special service "Fossies" has tried to format the requested source page into HTML format (assuming markdown format). Alternatively you can here view or download the uninterpreted source code file. A member file download can also be achieved by clicking within a package contents listing on the according byte size field. See also the last Fossies "Diffs" side-by-side code changes report for "quantization-support.rst": 1.11.0_vs_1.12.0.

Quantization Operation coverage

Quantized Tensors support a limited subset of data manipulation methods of the regular full-precision tensor. For NN operators included in PyTorch, we restrict support to:

  1. 8 bit weights (data_type = qint8)
  2. 8 bit activations (data_type = quint8)

Note that operator implementations currently only support per channel quantization for weights of the conv and linear operators. Furthermore the minimum and the maximum of the input data is mapped linearly to the minimum and the maximum of the quantized data type such that zero is represented with no quantization error.

Additional data types and quantization schemes can be implemented through the custom operator mechanism.

Many operations for quantized tensors are available under the same API as full float version in torch or torch.nn. Quantized version of NN modules that perform re-quantization are available in torch.nn.quantized. Those operations explicitly take output quantization parameters (scale and zero_point) in the operation signature.

In addition, we also support fused versions corresponding to common fusion patterns that impact quantization at: torch.nn.intrinsic.quantized.

For quantization aware training, we support modules prepared for quantization aware training at torch.nn.qat and torch.nn.intrinsic.qat

The following operation list is sufficient to cover typical CNN and RNN models

Quantized torch.Tensor operations

Operations that are available from the torch namespace or as methods on Tensor for quantized tensors:


Basic activations are supported.


Fused modules are provided for common patterns in CNNs. Combining several operations together (like convolution and relu) allows for better quantization accuracy


Layers for the quantization-aware training



Quantized version of standard NN layers.


Layers used in dynamically quantized models (i.e. quantized only on weights)


Functional versions of quantized NN layers (many of them accept explicit quantization output parameters)

Quantized dtypes and quantization schemes