"Fossies" - the Fresh Open Source Software Archive

Member "pytorch-1.8.2/docs/source/autograd.rst" (23 Jul 2021, 6975 Bytes) of package /linux/misc/pytorch-1.8.2.tar.gz:


As a special service "Fossies" has tried to format the requested source page into HTML format (assuming markdown format). Alternatively you can here view or download the uninterpreted source code file. A member file download can also be achieved by clicking within a package contents listing on the according byte size field.

Automatic differentiation package - torch.autograd

torch.autograd

backward

grad

Functional higher level API

Warning

This API is in beta. Even though the function signatures are very unlikely to change, major improvements to performance are planned before we consider this stable.

This section contains the higher-level API for autograd that builds on the basic API above and allows you to compute Jacobians, Hessians, etc.

This API works with user-provided functions that take only Tensors as input and return only Tensors. If your function takes other arguments that are not Tensors, or Tensors that don't have requires_grad set, you can use a lambda to capture them. For example, for a function f that takes three inputs (a Tensor for which we want the Jacobian, another Tensor that should be considered constant, and a boolean flag) as f(input, constant, flag=flag), you can use it as functional.jacobian(lambda x: f(x, constant, flag=flag), input).
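
For instance, a minimal sketch of that lambda pattern (the function f, its constant argument, and the flag below are made up for illustration):

import torch
from torch.autograd import functional

# Hypothetical f: a differentiable Tensor input, a constant Tensor, and a flag.
def f(x, constant, flag=True):
    y = x * constant
    return y.exp() if flag else y

inputs = torch.rand(3)
constant = torch.rand(3)

# The lambda captures the non-differentiated arguments, so jacobian only
# sees the Tensor we differentiate with respect to.
jac = functional.jacobian(lambda x: f(x, constant, flag=True), inputs)
print(jac.shape)  # torch.Size([3, 3])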

torch.autograd.functional.jacobian

torch.autograd.functional.hessian

torch.autograd.functional.vjp

torch.autograd.functional.jvp

torch.autograd.functional.vhp

torch.autograd.functional.hvp

Locally disabling gradient computation

no_grad

enable_grad

set_grad_enabled
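
A small illustrative sketch of how these three context managers interact (the tensors and values are arbitrary):

import torch

x = torch.ones(2, 2, requires_grad=True)

# Inside no_grad, results are detached from the graph.
with torch.no_grad():
    y = x * 2
print(y.requires_grad)  # False

# enable_grad re-enables tracking even within an enclosing no_grad block.
with torch.no_grad():
    with torch.enable_grad():
        z = x * 2
print(z.requires_grad)  # True

# set_grad_enabled takes a boolean, so the mode can be chosen at runtime.
with torch.set_grad_enabled(False):
    w = x * 2
print(w.requires_grad)  # False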

Default gradient layouts

When a non-sparse param receives a non-sparse gradient during torch.autograd.backward or torch.Tensor.backward, param.grad is accumulated as follows.

If param.grad is initially None:

  1. If param's memory is non-overlapping and dense, .grad is created with strides matching param (thus matching param's layout).
  2. Otherwise, .grad is created with rowmajor-contiguous strides.

If param already has a non-sparse .grad attribute:

  3. If create_graph=False, backward() accumulates into .grad in-place, which preserves its strides.
  4. If create_graph=True, backward() replaces .grad with a new tensor .grad + new grad, which attempts (but does not guarantee) matching the preexisting .grad's strides.

The default behavior (letting .grads be None before the first backward(), such that their layout is created according to 1 or 2, and retained over time according to 3 or 4) is recommended for best performance. Calls to model.zero_grad() or optimizer.zero_grad() will not affect .grad layouts.

In fact, resetting all .grads to None before each accumulation phase, e.g.:

for iterations...
    ...
    for param in model.parameters():
        param.grad = None
    loss.backward()

such that they're recreated according to 1 or 2 every time, is a valid alternative to model.zero_grad() or optimizer.zero_grad() that may improve performance for some networks.

Manual gradient layouts

If you need manual control over .grad's strides, assign param.grad = a zeroed tensor with desired strides before the first backward(), and never reset it to None. 3 guarantees your layout is preserved as long as create_graph=False. 4 indicates your layout is likely preserved even if create_graph=True.
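
For example, a hedged sketch of pre-assigning a strided .grad (the shapes and strides below are arbitrary):

import torch

param = torch.zeros(4, 6, requires_grad=True)

# Assign a zeroed, transposed-layout gradient before the first backward().
param.grad = torch.zeros(6, 4).t()   # shape (4, 6), strides (1, 4)

loss = (param * 2.0).sum()
loss.backward()

# With create_graph=False, accumulation is in-place and the layout is kept.
print(param.grad.stride())  # (1, 4)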

In-place operations on Tensors

Supporting in-place operations in autograd is a hard matter, and we discourage their use in most cases. Autograd's aggressive buffer freeing and reuse makes it very efficient and there are very few occasions when in-place operations actually lower memory usage by any significant amount. Unless you're operating under heavy memory pressure, you might never need to use them.

In-place correctness checks

All Tensors keep track of in-place operations applied to them, and if the implementation detects that a tensor was saved for backward in one of the functions but was modified in-place afterwards, an error will be raised once the backward pass is started. This ensures that if you're using in-place functions and not seeing any errors, you can be sure that the computed gradients are correct.
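
A brief sketch of this check in action (exp is just a convenient example of an op that saves its result for the backward pass):

import torch

x = torch.randn(3, requires_grad=True)
y = x.exp()      # exp saves its output for use in the backward pass
y.add_(1)        # in-place modification of the saved tensor
try:
    y.sum().backward()
except RuntimeError as err:
    # "one of the variables needed for gradient computation has been
    # modified by an inplace operation"
    print(err)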

Variable (deprecated)

Warning

The Variable API has been deprecated: Variables are no longer necessary to use autograd with tensors. Autograd automatically supports Tensors with requires_grad set to True. Below please find a quick guide on what has changed:

  - Variable(tensor) and Variable(tensor, requires_grad) still work as expected, but they return Tensors instead of Variables.
  - var.data is the same thing as tensor.data.
  - Methods such as var.backward(), var.detach(), var.register_hook() now work on tensors with the same method names.

In addition, one can now create tensors with requires_grad=True using factory methods such as torch.randn, torch.zeros, torch.ones, and others like the following:

autograd_tensor = torch.randn((2, 3, 4), requires_grad=True)

Tensor autograd functions

torch.Tensor

grad

requires_grad

is_leaf

backward

detach

detach_

register_hook

retain_grad

Function

torch.autograd.Function

Context method mixins

When creating a new Function, the following methods are available to ctx.

torch.autograd.function._ContextMethodMixin
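
A minimal sketch of a custom Function using these ctx methods (the Square op below is purely illustrative):

import torch
from torch.autograd import Function

class Square(Function):
    @staticmethod
    def forward(ctx, x):
        # save_for_backward stores tensors needed to compute gradients.
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * 2 * x

x = torch.randn(3, requires_grad=True)
y = Square.apply(x)
y.sum().backward()
print(torch.allclose(x.grad, 2 * x))  # True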

Numerical gradient checking

gradcheck

gradgradcheck
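
A short usage sketch (sigmoid and the tolerances are arbitrary choices; double-precision inputs are recommended for the finite-difference comparison):

import torch
from torch.autograd import gradcheck, gradgradcheck

inp = torch.randn(4, dtype=torch.double, requires_grad=True)

# Compare analytical gradients against numerical finite differences.
print(gradcheck(torch.sigmoid, (inp,), eps=1e-6, atol=1e-4))      # True
# Same idea, but for second-order gradients.
print(gradgradcheck(torch.sigmoid, (inp,), eps=1e-6, atol=1e-4))  # True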

Profiler

Autograd includes a profiler that lets you inspect the cost of different operators inside your model, both on the CPU and GPU. There are two modes implemented at the moment: CPU-only, using torch.autograd.profiler.profile, and nvprof-based (registers both CPU and GPU activity), using torch.autograd.profiler.emit_nvtx.
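
A short CPU-only sketch (the workload inside the profiled block is arbitrary):

import torch
from torch.autograd import profiler

x = torch.randn((1, 1), requires_grad=True)

# Wrap any ordinary code in the profiling context manager.
with profiler.profile() as prof:
    for _ in range(100):
        y = x ** 2
        y.backward()

# Print aggregated per-operator statistics, sorted by CPU time.
print(prof.key_averages().table(sort_by="self_cpu_time_total"))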

torch.autograd.profiler.profile

torch.autograd.profiler.emit_nvtx

torch.autograd.profiler.load_nvprof

Anomaly detection

detect_anomaly

set_detect_anomaly
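
An illustrative sketch of what anomaly mode surfaces (taking the square root of a negative number is just one way to produce a NaN gradient):

import torch
from torch import autograd

x = torch.tensor([-1.0, 2.0], requires_grad=True)

# With anomaly detection on, a backward pass producing NaN gradients raises
# and also prints the traceback of the forward op that created them.
with autograd.detect_anomaly():
    y = x.sqrt()            # sqrt(-1) -> nan
    try:
        y.sum().backward()
    except RuntimeError as err:
        print(err)          # names the offending backward function

# set_detect_anomaly(True/False) toggles the same check globally.
torch.autograd.set_detect_anomaly(False)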