A Look at TensorFlow 2.9
Google has released TensorFlow 2.9, which adds a few major features and improvements along with a large number of bug fixes and security updates.
Highlights include performance improvements with oneDNN, and the release of DTensor, a new API for model distribution that can be used to seamlessly move from data parallelism to model parallelism.
The release also brings improvements to the core library, including Eigen and tf.function unification, deterministic behavior, and new support for Windows' WSL2. Finally, it introduces new experimental APIs for tf.function retracing and Keras Optimizers.
You can view the full list of changes on the TensorFlow GitHub page (and download and install the latest version): TensorFlow 2.9.0
Let's take a closer look at some of these features.
Improved CPU performance: oneDNN by default
Working with Intel, the TensorFlow team has integrated the oneDNN performance library to achieve top performance on Intel CPUs. TensorFlow has included experimental support for oneDNN since version 2.5, which could provide up to a 4x performance improvement. In TensorFlow 2.9, oneDNN optimizations are turned on by default on Linux x86 packages and for CPUs with neural-network-focused hardware features such as AVX512_VNNI, AVX512_BF16, AMX, and others, which are found on Intel Cascade Lake and newer CPUs.
Users running TensorFlow with oneDNN optimizations enabled might observe slightly different numerical results than with the optimizations off. This is because floating-point round-off approaches and operation orders differ, which can introduce small discrepancies. If this causes issues for you, turn the optimizations off by setting TF_ENABLE_ONEDNN_OPTS=0 before running your TensorFlow program (set TF_ENABLE_ONEDNN_OPTS=1 to re-enable them). To verify that the optimizations are on, look for a message beginning with "oneDNN custom operations are on" in your program log.
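The variable can be exported in the shell before launching the program; alternatively, it can be set from Python, as long as that happens before TensorFlow is imported. A minimal sketch:

import os

# The flag is read when TensorFlow starts up, so set it before the import.
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"  # use "1" to re-enable

import tensorflow as tf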
Model parallelism with DTensor
DTensor is a new TensorFlow API for distributed model processing that allows models to seamlessly move from data parallelism to single program multiple data (SPMD) based model parallelism, including spatial partitioning. This means you have tools to easily train models where the model weights or inputs are so large they don't fit on a single device. (If you are familiar with Mesh TensorFlow in TF1, DTensor serves a similar purpose.)
DTensor is designed with the following principles at its core:
- A device-agnostic API: This allows the same model code to be used on CPU, GPU, or TPU, including models partitioned across device types.
- Multi-client execution: Removes the coordinator and lets each task drive its locally attached devices, allowing models to scale with no impact on startup time.
- A global perspective vs. per-replica: Traditionally with TensorFlow, distributed model code is written around replicas, but with DTensor, model code is written from the global perspective, and per-replica code is generated and run by the DTensor runtime. Among other things, this removes any uncertainty about whether batch normalization is happening at the global level or the per-replica level.
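For a sense of the API, here is a minimal sketch (not from the release notes; the mesh size and axis names are arbitrary) that creates a small CPU mesh and a tensor sharded across it:

import tensorflow as tf
from tensorflow.experimental import dtensor

# Expose two virtual CPU devices so the example runs on any machine.
tf.config.set_logical_device_configuration(
    tf.config.list_physical_devices("CPU")[0],
    [tf.config.LogicalDeviceConfiguration()] * 2)

# A 1-D mesh with a "batch" axis spanning both devices.
mesh = dtensor.create_mesh([("batch", 2)], device_type="CPU")

# Shard axis 0 across the "batch" mesh axis; keep axis 1 replicated.
layout = dtensor.Layout(["batch", dtensor.UNSHARDED], mesh)

# Create a tensor distributed according to that layout.
d = dtensor.call_with_layout(tf.zeros, layout, shape=(4, 8))
print(dtensor.fetch_layout(d))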
TraceType for tf.function
This release also revamps the way tf.function retraces, making retracing simpler, more predictable, and configurable.
All arguments of tf.function are assigned a tf.types.experimental.TraceType. Custom user classes can declare a TraceType using the Tracing Protocol (tf.types.experimental.SupportsTracingProtocol).
The TraceType system makes it easy to understand retracing rules. For example, subtyping rules indicate which types of arguments can be used with particular function traces; subtyping also explains how different specific shapes are joined into a generic shape that is their supertype, reducing the number of traces for a function.
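As a quick illustration of tracing behavior, here is a minimal sketch (the exact retrace count depends on shape-relaxation settings):

import tensorflow as tf

@tf.function
def double(x):
    return x * 2

double(tf.constant([1.0]))       # traced for a float32 tensor of shape (1,)
double(tf.constant([1.0, 2.0]))  # new shape, so this may retrace
double(tf.constant([3.0]))       # matches an existing trace's type and reuses it

# Inspect how many traces were actually created.
print(double.experimental_get_tracing_count())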
Support for WSL2
The Windows Subsystem for Linux lets developers run a Linux environment directly on Windows, without the overhead of a traditional virtual machine or a dual-boot setup. TensorFlow now supports WSL2 out of the box, including GPU acceleration.
Deterministic behavior
The API tf.config.experimental.enable_op_determinism makes TensorFlow ops deterministic.
Determinism means that if you run an op multiple times with the same inputs, the op returns the exact same outputs every time. This is useful for debugging models, and if you train your model from scratch several times with determinism enabled, your model weights will be the same every time. Normally, many ops are non-deterministic because they use threads internally, which can add floating-point numbers in a non-deterministic order.
TensorFlow 2.8 introduced an API to make ops deterministic, and TensorFlow 2.9 improved determinism performance in tf.data in some cases. If you want your TensorFlow models to run deterministically, just add the following to the start of your program:
tf.keras.utils.set_random_seed(1)
tf.config.experimental.enable_op_determinism()
The first line sets the random seed for Python, NumPy, and TensorFlow, which is necessary for determinism. The second line makes each TensorFlow op deterministic. Note that determinism generally comes at the cost of performance, so your model may run slower when op determinism is enabled.
Optimized Training with Keras
In TensorFlow 2.9, a new experimental version of the Keras Optimizer API is now available: tf.keras.optimizers.experimental. The API provides a more unified and expanded catalog of built-in optimizers which can be more easily customized and extended.
In a future release, tf.keras.optimizers.experimental.Optimizer (and its subclasses) will replace tf.keras.optimizers.Optimizer (and its subclasses), which means that workflows using the legacy Keras optimizer will automatically switch to the new optimizer. The current (legacy) tf.keras.optimizers.* API will still be accessible via tf.keras.optimizers.legacy.*, such as tf.keras.optimizers.legacy.Adam.
Here are some highlights of the new optimizer class:
- Incrementally faster training for some models.
- Easier to write customized optimizers.
- Built-in support for moving average of model weights ("Polyak averaging").
Most users will not need to take any action.
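As an illustration, here is a minimal sketch of the new namespace; the use_ema and ema_momentum constructor flags enable the built-in moving average of model weights, and the hyperparameter values shown are arbitrary:

import tensorflow as tf

# New experimental optimizer; use_ema turns on the built-in moving
# average of model weights ("Polyak averaging").
opt = tf.keras.optimizers.experimental.SGD(
    learning_rate=0.01,
    use_ema=True,
    ema_momentum=0.99)

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer=opt, loss="mse")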
Security Fixes
A large number of security fixes are also included. TensorFlow 2.9:
- Fixes a code injection in saved_model_cli (CVE-2022-29216)
- Fixes a missing validation which causes TensorSummaryV2 to crash (CVE-2022-29193)
- Fixes a missing validation which crashes QuantizeAndDequantizeV4Grad (CVE-2022-29192)
- Fixes a missing validation which causes denial of service via DeleteSessionTensor (CVE-2022-29194)
- Fixes a missing validation which causes denial of service via GetSessionTensor (CVE-2022-29191)
- Fixes a missing validation which causes denial of service via StagePeek (CVE-2022-29195)
- Fixes a missing validation which causes denial of service via UnsortedSegmentJoin (CVE-2022-29197)
- Fixes a missing validation which causes denial of service via LoadAndRemapMatrix (CVE-2022-29199)
- Fixes a missing validation which causes denial of service via SparseTensorToCSRSparseMatrix (CVE-2022-29198)
- Fixes a missing validation which causes denial of service via LSTMBlockCell (CVE-2022-29200)
- Fixes a missing validation which causes denial of service via Conv3DBackpropFilterV2 (CVE-2022-29196)
- Fixes a CHECK failure in depthwise ops via overflows (CVE-2021-41197)
- Fixes issues arising from undefined behavior stemming from users supplying invalid resource handles (CVE-2022-29207)
- Fixes a segfault due to missing support for quantized types (CVE-2022-29205)
- Fixes a missing validation which results in undefined behavior in SparseTensorDenseAdd (CVE-2022-29206)
- Fixes a missing validation which results in undefined behavior in QuantizedConv2D (CVE-2022-29201)
- Fixes an integer overflow in SpaceToBatchND (CVE-2022-29203)
- Fixes a segfault and OOB write due to incomplete validation in EditDistance (CVE-2022-29208)
- Fixes a missing validation which causes denial of service via Conv3DBackpropFilterV2 (CVE-2022-29204)
- Fixes a denial of service in tf.ragged.constant due to lack of validation (CVE-2022-29202)
- Fixes a segfault when tf.histogram_fixed_width is called with NaN values (CVE-2022-29211)
- Fixes a core dump when loading TFLite models with quantization (CVE-2022-29212)
- Fixes crashes stemming from incomplete validation in signal ops (CVE-2022-29213)
- Fixes a type confusion leading to CHECK-failure based denial of service (CVE-2022-29209)
- Fixes a heap buffer overflow due to incorrect hash function (CVE-2022-29210)
- Updates curl to 7.83.1 to handle CVE-2022-22576, CVE-2022-27774, CVE-2022-27775, CVE-2022-27776, CVE-2022-27778, CVE-2022-27779, CVE-2022-27780, CVE-2022-27781, CVE-2022-27782, and CVE-2022-30115
- Updates zlib to 1.2.12 after 1.2.11 was pulled due to a security issue
Feel free to contact us if you have any questions or take a look at our Deep Learning Solutions if you're interested in a workstation or server to run TensorFlow on.