```python
import torch
import torch.nn as nn
import torch.nn.functional as F
print('PyTorch version:', torch.__version__)
```

PyTorch version: 2.12.0+xpu
jshn9515
2026-03-19
2026-04-04
In Section 2.1, we answered one question: how are gradients computed? During forward propagation, the framework keeps a ledger; during backpropagation, it checks that ledger. Autograd builds the computation graph and then propagates gradients backward along the graph.
But when we actually write code, we quickly run into another, more practical question: should this ledger be kept at all?
During training, of course it should, because we need backpropagation. But during validation, inference, feature extraction, or when we just want to run a model once to inspect the output, keeping that ledger is wasteful. It stores intermediate results, builds a computation graph, consumes memory, and may accidentally drag a block of code that was meant to compute only values into the backward pass.
So in this section, we shift perspective. Instead of discussing how to differentiate, we discuss which computations Autograd records and which ones it ignores. PyTorch provides several very direct switches for this: torch.no_grad(), torch.enable_grad(), and the more inference-oriented torch.inference_mode(). They do not change the numerical results you compute, but they do change whether the computation has a graph, whether it can participate in backpropagation, and how much memory and overhead it incurs.
This also reflects one of PyTorch’s design principles: how a computation is performed is the operator’s job, while whether that computation is recorded is Autograd’s job. We begin with the most common one, no_grad(), and use it to understand these gradient-recording modes.
torch.no_grad(): Pause the Bookkeeping

By default, as long as a tensor has requires_grad=True and we perform operations on it, PyTorch automatically builds a computation graph. In other words, as long as you are computing inside a differentiable region, Autograd silently records the ledger for you. But sometimes we do not need that ledger at all.
For example, when validating model performance, we usually do not need gradients, because we are not going to run backpropagation. Or, during inference, we only care about the model’s output, not about how it was computed. In such cases, letting Autograd keep records anyway wastes memory on intermediate results saved for a backward pass that never happens, and spends time building a computation graph that is never used.
That is why PyTorch provides the torch.no_grad() context manager, which can also be used as a function decorator. It lets us explicitly tell Autograd: inside this code block, we do not need you to keep records.
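As a decorator, no_grad() covers an entire function. The sketch below assumes a hypothetical predict helper and a small nn.Linear stand-in model; it also shows that the grad mode is restored once the decorated function returns.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def predict(model, x):
    # Everything inside this (hypothetical) helper runs with recording turned off
    return model(x)


torch.manual_seed(0)
model = nn.Linear(4, 1)  # assumed stand-in model with trainable parameters
out = predict(model, torch.randn(2, 4))
print('`out.requires_grad` from decorated `predict()`:', out.requires_grad)
print('grad mode after the call:', torch.is_grad_enabled())
```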
Let us first look at the most direct comparison. In the default mode:
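The original code for this comparison is not shown. A minimal sketch, assuming a small nn.Linear model and random inputs as stand-ins, could look like this:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed stand-in model: one linear layer with trainable parameters
model = nn.Linear(4, 1)
x = torch.randn(8, 4)  # plain input, requires_grad=False

# Default mode: the parameters require grad, so the output is tracked
y_pred = model(x)
print('`y_pred.requires_grad` before `no_grad()`:', y_pred.requires_grad)
```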
`y_pred.requires_grad` before `no_grad()`: True
The output will be True, because model parameters have requires_grad=True by default, so the result automatically enters the computation graph.
Now let us put the same forward pass inside no_grad():
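As a sketch, assuming a small nn.Linear stand-in model (the original model is not shown), the no_grad() version looks like this:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(4, 1)  # assumed stand-in model
x = torch.randn(8, 4)

with torch.no_grad():
    # Same forward pass, but Autograd records nothing
    y_pred = model(x)
    print('`y_pred.requires_grad` inside `no_grad()`:', y_pred.requires_grad)
```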
`y_pred.requires_grad` inside `no_grad()`: False
This time the output is False.
Notice that inside no_grad(), the forward computation still runs normally, but the resulting tensors are no longer tracked by Autograd. Computations that later combine such a tensor only with other tensors that do not require gradients are not tracked either, even outside no_grad(). If we call backward() on such an untracked tensor, PyTorch raises an error, because that tensor is simply not in any computation graph and therefore cannot participate in backpropagation.
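A minimal sketch of this failure, assuming an nn.Linear stand-in model, random data, and a mean-squared-error loss (none of which are specified in the original):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

model = nn.Linear(4, 1)   # assumed stand-in model
x = torch.randn(8, 4)
y = torch.randn(8, 1)     # target; does not request gradients

with torch.no_grad():
    y_pred = model(x)     # untracked output

# loss is computed outside no_grad(), but none of its inputs are tracked
loss = F.mse_loss(y_pred, y)

error_message = None
try:
    loss.backward()
except RuntimeError as err:
    error_message = str(err)
    print('RuntimeError:', error_message)
```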
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Here, loss is not itself computed inside no_grad(), but it is computed from y_pred, and y_pred is already untracked. In addition, the other input y does not request gradients either. So loss is also excluded from the whole computation graph. If we call backward() on this loss, PyTorch will raise an error.
Some people think that no_grad() changes certain tensors so that their requires_grad attribute becomes False, but that is not what happens. no_grad() only tells Autograd not to record the computations inside this code block; it does not modify any tensor’s own attributes. A tensor created outside no_grad() with requires_grad=True still reports requires_grad=True inside no_grad(); it is simply used as an ordinary tensor there, and operations on it are not recorded.
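A small sketch of this, using a plain tensor x and a simple multiplication in place of the original model (an assumption; the original code is not shown):

```python
import torch

x = torch.randn(3, requires_grad=True)  # created outside no_grad()

with torch.no_grad():
    # The tensor's own flag is untouched...
    print('`x.requires_grad` inside `no_grad()`:', x.requires_grad)
    # ...but operations on it are not recorded
    y_pred = x * 2
    print('`y_pred.requires_grad` inside `no_grad()`:', y_pred.requires_grad)
```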
`x.requires_grad` inside `no_grad()`: True
`y_pred.requires_grad` inside `no_grad()`: False
So no_grad() does not prevent a tensor from being differentiable in principle. It only prevents newly created computations in that context from being recorded. In other words, a tensor’s requires_grad attribute is a kind of capability declaration saying “I am eligible to be tracked,” while no_grad() is a behavior control saying “in this context, do not track any computation.” These two are independent.
In addition, if we create a new tensor inside no_grad() and later want it to re-enter the automatic differentiation system, we can still do that by calling requires_grad_(). For example:
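A minimal sketch, assuming a plain random tensor x (the original code is not shown). The final line checks that x really has become a usable leaf tensor:

```python
import torch

with torch.no_grad():
    x = torch.randn(3)  # new tensor, created while recording is off
    print('`x.requires_grad` inside `no_grad()`:', x.requires_grad)

# Opt back in to gradient tracking
x.requires_grad_()
print('`x.requires_grad` after `requires_grad_()`:', x.requires_grad)

# x is now an ordinary leaf tensor that can participate in backward()
(x ** 2).sum().backward()
```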
`x.requires_grad` inside `no_grad()`: False
`x.requires_grad` after `requires_grad_()`: True
In other words, no_grad() turns recording off temporarily rather than permanently stripping a tensor of its ability to participate in differentiation. To keep that option open, PyTorch still performs some internal bookkeeping inside no_grad(), such as maintaining version counters (which let Autograd detect in-place modifications of tensors that a graph depends on) and view tracking, and this bookkeeping still costs some computation and memory. This will form an important contrast with inference_mode(), which we discuss next. In inference_mode(), PyTorch not only stops recording but also disables some of this Autograd-related bookkeeping altogether, so tensors created there can never have gradient tracking enabled later with requires_grad_().
From a lower-level perspective, computation in PyTorch is numerical behavior, while recording is behavior of the automatic differentiation system, and no_grad() affects only the latter. That is why we often use it in model validation, inference deployment, and parameter updates.
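The parameter-update case is worth a concrete look. The sketch below performs one hand-written gradient-descent step on a toy parameter w; the loss and learning rate are illustrative assumptions.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)  # toy parameter (leaf tensor)
lr = 0.1                                          # illustrative learning rate

loss = (w ** 2).sum()   # d(loss)/dw = 2 * w = [2.0, 4.0]
loss.backward()

# Outside no_grad(), an in-place update of a leaf that requires grad raises a
# RuntimeError. Inside no_grad(), the update is a plain numerical operation
# and is not recorded in any graph.
with torch.no_grad():
    w -= lr * w.grad
    w.grad.zero_()  # reset the accumulated gradient for the next step

print(w)
```

After the step, w is still a leaf with requires_grad=True, so the next forward pass can track it as usual.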
The next natural question is this: if gradients can be turned off, can they be turned back on locally? What if, during inference, one small step suddenly needs gradients? This leads us to the next section: torch.enable_grad().
torch.enable_grad(): Start the Bookkeeping Again

In the previous section, we saw that no_grad() lets Autograd pause recording. A natural next question is: if we are already inside no_grad(), can we re-enable gradients for only a small part of the computation?
The answer is yes. That is exactly what enable_grad() is for.
Of course, we can also use enable_grad() in an outer scope and then use no_grad() in an inner scope to turn gradients off again; these modes can be nested freely. In the default mode, however, gradients are already enabled, so wrapping code in enable_grad() has no effect, and people usually omit it.
As usual, let us begin with a simple example:
```python
x = torch.randn(10, 6, requires_grad=True)

with torch.no_grad():
    y = x * 3  # Does not record computation graph
    print('`y.requires_grad` in `no_grad()`:', y.requires_grad)

    with torch.enable_grad():
        z = x * 4  # Enables gradient tracking
        print('`z.requires_grad` in `enable_grad()`:', z.requires_grad)

# Only z will have gradients tracked
z.backward(gradient=torch.ones_like(z))
```

`y.requires_grad` in `no_grad()`: False
`z.requires_grad` in `enable_grad()`: True
What happens here is very important: the outer no_grad() turns automatic differentiation recording off, while the inner enable_grad() restores recording locally. And after we exit the inner enable_grad(), the outer no_grad() is still in effect, so subsequent computations return to the untracked state. This shows that gradient modes are managed in a stack-like way. Entering a context pushes a mode; exiting that context restores the previous one.
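The stack-like behavior can be observed directly with torch.is_grad_enabled(), which reports the current grad mode. This sketch records the mode at each nesting level:

```python
import torch

states = []  # (label, grad mode) pairs, recorded in execution order

states.append(('default', torch.is_grad_enabled()))
with torch.no_grad():
    states.append(('no_grad', torch.is_grad_enabled()))
    with torch.enable_grad():
        states.append(('nested enable_grad', torch.is_grad_enabled()))
        with torch.no_grad():
            states.append(('innermost no_grad', torch.is_grad_enabled()))
    # Leaving enable_grad() restores the enclosing no_grad() state
    states.append(('back in no_grad', torch.is_grad_enabled()))
states.append(('back to default', torch.is_grad_enabled()))

for label, enabled in states:
    print(f'{label:>20}: {enabled}')
```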
Why does this matter?
In many cases, our code paths are shared. For example, most of a forward pass during inference may not need gradients, but some intermediate step might need a sensitivity analysis. Or perhaps some debugging code wants to compute a gradient temporarily. Without enable_grad(), we would have to split the entire code path apart, or keep switching the state at a broader scope. With enable_grad(), however, we can turn recording on locally exactly where needed, without affecting the overall inference flow.
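As an illustration of the sensitivity-analysis case, the sketch below runs an inference pass under no_grad() and then switches recording on locally to compute the gradient of the output with respect to the input. The nn.Linear model is an assumed stand-in; torch.autograd.grad is used so that the parameters’ .grad fields are left untouched.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1).eval()  # assumed stand-in model
x = torch.randn(2, 4)

with torch.no_grad():
    y = model(x)  # bulk of inference: nothing recorded

    # Local sensitivity analysis: d(output)/d(input) for this batch only
    with torch.enable_grad():
        x_probe = x.detach().requires_grad_()
        out = model(x_probe).sum()
        (saliency,) = torch.autograd.grad(out, x_probe)

print('saliency shape:', tuple(saliency.shape))
```

For a single linear layer, each row of saliency equals the layer’s weight row, which makes the result easy to sanity-check.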
There is also a more general interface, torch.set_grad_enabled(), which accepts a Boolean argument and directly sets the current gradient mode. In fact, no_grad() and enable_grad() are just special cases of this more general interface.
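A minimal sketch of the usual pattern, assuming a hypothetical forward_pass helper and an nn.Linear stand-in model; the Boolean flag is called is_training:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)  # assumed stand-in model


def forward_pass(model, x, is_training):
    # One code path (hypothetical helper): record the graph only when training
    with torch.set_grad_enabled(is_training):
        return model(x)


x = torch.randn(8, 4)
train_out = forward_pass(model, x, is_training=True)
eval_out = forward_pass(model, x, is_training=False)
print('training output tracked:', train_out.requires_grad)
print('evaluation output tracked:', eval_out.requires_grad)
```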
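A typical use is to write torch.set_grad_enabled(is_training) in a with statement, where is_training is a Boolean flag. When is_training=True, it is equivalent to enable_grad(); when is_training=False, it is equivalent to no_grad(). This makes the code logic more uniform and makes conditional control easier to write.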
At this point, we have already introduced two common gradient-control contexts: no_grad() and enable_grad(). They are used to turn gradient recording off and on, respectively, and they can be nested to form a flexible stack-based management system. Next, we introduce a context aimed more specifically at inference optimization: torch.inference_mode(), which goes even further than no_grad() in performance and memory efficiency.
torch.inference_mode(): Do Not Keep the Ledger at All from Now On

In the previous two sections, we already built a fairly flexible mechanism:

- no_grad() can turn gradient recording off;
- enable_grad() can restore gradient recording locally;
- set_grad_enabled() is a more general interface that directly sets the current gradient mode.

On the surface, this already seems sufficient. So why does PyTorch still provide inference_mode()?
The answer lies in a deeper question: if we not only know that the current computation does not need gradients, but also know that it can never participate in backpropagation in the future, then can the framework be more aggressive? Can it eliminate all overhead related to gradients altogether?
That is the design motivation behind inference_mode() [1].
Inside no_grad(), PyTorch still maintains version counters, view tracking, and some internal checks used to ensure gradient correctness. These mechanisms are necessary during training: they prevent an in-place operation from silently overwriting values that the backward pass needs, and they prevent memory shared between views from leading to incorrect gradients. But in pure inference, they become extra overhead. Since the result of this code block will never participate in gradient computation, the framework can stop maintaining gradient-related version checks and view tracking, and can perform more aggressive memory optimization. That is why inference_mode() is usually faster and more memory-efficient than no_grad().
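The difference in bookkeeping can be glimpsed with torch.is_inference() and the tensor’s version counter. Note that _version is an internal, underscore-prefixed attribute; it is used here only for illustration and is read only on the ordinary tensor.

```python
import torch

with torch.no_grad():
    a = torch.zeros(3)             # ordinary tensor
    version_before = a._version
    a.add_(1)                      # in-place op: version counter is still bumped
    version_after = a._version

with torch.inference_mode():
    b = torch.zeros(3)             # inference tensor: no version counter kept
    b.add_(1)

print('a is an inference tensor:', torch.is_inference(a))
print('a._version change after one in-place op:', version_after - version_before)
print('b is an inference tensor:', torch.is_inference(b))
```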
However, it is irreversible.
We already know that tensors created inside no_grad() can later have gradient tracking re-enabled:
`x.requires_grad` after `requires_grad_`: True
But if a tensor is created inside inference_mode(), and we try to set requires_grad=True on it, PyTorch raises an error immediately:
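A minimal sketch, assuming a plain random tensor x created inside inference_mode():

```python
import torch

with torch.inference_mode():
    x = torch.randn(3)  # x is an inference tensor

error_message = None
try:
    x.requires_grad_()  # outside inference_mode(): not allowed
except RuntimeError as err:
    error_message = str(err)
    print('RuntimeError:', error_message)
```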
RuntimeError: Setting requires_grad=True on inference tensor outside InferenceMode is not allowed.
This is because inference_mode() does not merely turn recording off temporarily. Instead, it creates a special kind of tensor called an inference tensor. This kind of tensor is marked as “it will never enter the automatic differentiation system.” Even if you later turn gradient mode back on, such tensors will still not be included in the computation graph. So no_grad() is a temporary shutdown, while inference_mode() is a permanent shutdown. If we are sure that a block of code will only ever be used for inference, then inference_mode() is the right tool.
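If a value produced under inference_mode() is needed in a recorded computation after all, one option is to copy it with clone() outside inference_mode(): functional operations on inference tensors outside that mode produce ordinary tensors. A sketch, with feats as an assumed example tensor:

```python
import torch

with torch.inference_mode():
    feats = torch.randn(3)       # inference tensor

feats_copy = feats.clone()       # functional op outside inference_mode(): normal tensor
feats_copy.requires_grad_()      # allowed on the copy
(feats_copy * 2).sum().backward()

print('copy is an inference tensor:', torch.is_inference(feats_copy))
print('gradient on the copy:', feats_copy.grad)
```

The original feats stays an inference tensor; only the copy enters the graph.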
At this point, we have effectively seen three different gradient semantics: the default mode, no_grad() mode, and inference_mode() mode. They represent three levels of semantic commitment, and they correspond to different performance trade-offs under different usage scenarios.
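In the default mode, Autograd must assume that any current computation may later participate in backpropagation. Therefore it builds the computation graph, saves the intermediate tensors that backward() will need, and maintains version counters and view tracking so that in-place operations cannot silently corrupt those saved values.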
This is the default mode. It is flexible, but it is also the most expensive. It is typically used for forward propagation during model training.
When we enter no_grad(), we are making a temporary statement: this block of computation does not participate in backpropagation right now.
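Accordingly, in this mode Autograd skips building the computation graph and does not save intermediate tensors for backward. It still maintains version counters and view tracking, however, because the tensors created here are ordinary tensors that may later take part in a recorded computation.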
This is a temporary shutdown. The flexibility is still there, but the performance has already improved noticeably. It is mostly used for validation or model evaluation.
By contrast, inference_mode() is a stronger commitment: this block of computation will never participate in gradient computation.
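Based on that premise, Autograd can be more aggressive: besides skipping graph construction and saved intermediates, it also stops maintaining version counters and view tracking for the inference tensors created in this mode.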
This is an irreversible shutdown. It is the most aggressively optimized mode, but also the most restrictive. It is suitable for pure inference, model evaluation, and data processing.
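To tie the three modes together, the sketch below performs the same multiplication on an assumed tensor x in each mode and reports, for each result, whether it is tracked, whether it has a grad_fn, and whether it is an inference tensor:

```python
import torch

x = torch.randn(3, requires_grad=True)


def describe(y):
    # (tracked by Autograd, has a grad_fn, is an inference tensor)
    return y.requires_grad, y.grad_fn is not None, torch.is_inference(y)


y_default = x * 2
with torch.no_grad():
    y_no_grad = x * 2
with torch.inference_mode():
    y_inference = x * 2

for name, y in [('default', y_default), ('no_grad', y_no_grad),
                ('inference_mode', y_inference)]:
    print(f'{name:>15}: requires_grad, has grad_fn, inference = {describe(y)}')
```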
[1] inference_mode() was introduced in PyTorch 1.9 specifically for performance optimization during inference. For implementation details, see RFC-0011-InferenceMode.