| .. _freethreading-python-howto: |
| |
| ********************************* |
| Python support for free threading |
| ********************************* |
| |
| Starting with the 3.13 release, CPython has support for a build of |
| Python called :term:`free threading` where the :term:`global interpreter lock` |
| (GIL) is disabled. Free-threaded execution allows for full utilization of the |
| available processing power by running threads in parallel on available CPU cores. |
| While not all software will benefit from this automatically, programs |
| designed with threading in mind will run faster on multi-core hardware. |
| |
| Some third-party packages, in particular ones |
| with an :term:`extension module`, may not be ready for use in a |
| free-threaded build, and will re-enable the :term:`GIL`. |
| |
| This document describes the implications of free threading |
| for Python code. See :ref:`freethreading-extensions-howto` for information on |
| how to write C extensions that support the free-threaded build. |
| |
| .. seealso:: |
| |
| :pep:`703` – Making the Global Interpreter Lock Optional in CPython for an |
| overall description of free-threaded Python. |
| |
| |
| Installation |
| ============ |
| |
| Starting with Python 3.13, the official macOS and Windows installers |
| optionally support installing free-threaded Python binaries. The installers |
| are available at https://www.python.org/downloads/. |
| |
| For information on other platforms, see the `Installing a Free-Threaded Python |
| <https://py-free-threading.github.io/installing-cpython/>`_, a |
| community-maintained installation guide for installing free-threaded Python. |
| |
| When building CPython from source, the :option:`--disable-gil` configure option |
| should be used to build a free-threaded Python interpreter. |
| |
| |
| Identifying free-threaded Python |
| ================================ |
| |
| To check if the current interpreter supports free-threading, :option:`python -VV <-V>` |
| and :data:`sys.version` contain "free-threading build". |
| The new :func:`sys._is_gil_enabled` function can be used to check whether |
| the GIL is actually disabled in the running process. |
| |
| The ``sysconfig.get_config_var("Py_GIL_DISABLED")`` configuration variable can |
| be used to determine whether the build supports free threading. If the variable |
| is set to ``1``, then the build supports free threading. This is the recommended |
| mechanism for decisions related to the build configuration. |
| |
| |
| The global interpreter lock in free-threaded Python |
| =================================================== |
| |
| Free-threaded builds of CPython support optionally running with the GIL enabled |
| at runtime using the environment variable :envvar:`PYTHON_GIL` or |
| the command-line option :option:`-X gil`. |
| |
| The GIL may also automatically be enabled when importing a C-API extension |
| module that is not explicitly marked as supporting free threading. A warning |
| will be printed in this case. |
| |
| In addition to individual package documentation, the following websites track |
| the status of popular packages support for free threading: |
| |
| * https://py-free-threading.github.io/tracking/ |
| * https://hugovk.github.io/free-threaded-wheels/ |
| |
| |
| Thread safety |
| ============= |
| |
| The free-threaded build of CPython aims to provide similar thread-safety |
| behavior at the Python level to the default GIL-enabled build. Built-in |
| types like :class:`dict`, :class:`list`, and :class:`set` use internal locks |
| to protect against concurrent modifications in ways that behave similarly to |
| the GIL. However, Python has not historically guaranteed specific behavior for |
| concurrent modifications to these built-in types, so this should be treated |
| as a description of the current implementation, not a guarantee of current or |
| future behavior. |
| |
| .. note:: |
| |
| It's recommended to use the :class:`threading.Lock` or other synchronization |
| primitives instead of relying on the internal locks of built-in types, when |
| possible. |
| |
| |
| Known limitations |
| ================= |
| |
| This section describes known limitations of the free-threaded CPython build. |
| |
| Immortalization |
| --------------- |
| |
| In the free-threaded build, some objects are :term:`immortal`. |
| Immortal objects are not deallocated and have reference counts that are |
| never modified. This is done to avoid reference count contention that would |
| prevent efficient multi-threaded scaling. |
| |
| As of the 3.14 release, immortalization is limited to: |
| |
| * Code constants: numeric literals, string literals, and tuple literals |
| composed of other constants. |
| * Strings interned by :func:`sys.intern`. |
| |
| |
| Frame objects |
| ------------- |
| |
| It is not safe to access :attr:`frame.f_locals` from a :ref:`frame <frame-objects>` |
| object if that frame is currently executing in another thread, and doing so may |
| crash the interpreter. |
| |
| |
| Iterators |
| --------- |
| |
| It is generally not thread-safe to access the same iterator object from |
| multiple threads concurrently, and threads may see duplicate or missing |
| elements. |
| |
| |
| Single-threaded performance |
| --------------------------- |
| |
| The free-threaded build has additional overhead when executing Python code |
| compared to the default GIL-enabled build. The amount of overhead depends |
| on the workload and hardware. On the pyperformance benchmark suite, the |
| average overhead ranges from about 1% on macOS aarch64 to 8% on x86-64 Linux |
| systems. |
| |
| |
| Behavioral changes |
| ================== |
| |
| This section describes CPython behavioural changes with the free-threaded |
| build. |
| |
| |
| Context variables |
| ----------------- |
| |
| In the free-threaded build, the flag :data:`~sys.flags.thread_inherit_context` |
| is set to true by default which causes threads created with |
| :class:`threading.Thread` to start with a copy of the |
| :class:`~contextvars.Context()` of the caller of |
| :meth:`~threading.Thread.start`. In the default GIL-enabled build, the flag |
| defaults to false so threads start with an |
| empty :class:`~contextvars.Context()`. |
| |
| |
| Warning filters |
| --------------- |
| |
| In the free-threaded build, the flag :data:`~sys.flags.context_aware_warnings` |
| is set to true by default. In the default GIL-enabled build, the flag defaults |
| to false. If the flag is true then the :class:`warnings.catch_warnings` |
| context manager uses a context variable for warning filters. If the flag is |
| false then :class:`~warnings.catch_warnings` modifies the global filters list, |
| which is not thread-safe. See the :mod:`warnings` module for more details. |
| |
| |
| Increased memory usage |
| ---------------------- |
| |
| The free-threaded build will typically use more memory compared to the default |
| build. There are multiple reasons for this, mostly due to design decisions. |
| |
| |
| All interned strings are immortal |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| For modern Python versions (since version 2.3), interning a string (e.g. with |
| :func:`sys.intern`) does not cause it to become immortal. Instead, if the last |
| reference to that string disappears, it will be removed from the interned |
| string table. This is not the case for the free-threaded build and any interned |
| string will become immortal, surviving until interpreter shutdown. |
| |
| |
| Non-GC objects have a larger object header |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| The free-threaded build uses a different :c:type:`PyObject` structure. Instead |
| of having the GC related information allocated before the :c:type:`PyObject` |
| structure, like in the default build, the GC related info is part of the normal |
| object header. For example, on the AMD64 platform, ``None`` uses 32 bytes on |
| the free-threaded build vs 16 bytes for the default build. GC objects (such as |
| dicts and lists) are the same size for both builds since the free-threaded |
| build does not use additional space for the GC info. |
| |
| |
| QSBR can delay freeing of memory |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| In order to safely implement lock-free data structures, a safe memory |
| reclamation (SMR) scheme is used, known as quiescent state-based reclamation |
| (QSBR). This means that the memory backing data structures allowing lock-free |
| access will use QSBR, which defers the free operation, rather than immediately |
| freeing the memory. Two examples of these data structures are the list object |
| and the dictionary keys object. See ``InternalDocs/qsbr.md`` in the CPython |
| source tree for more details on how QSBR is implemented. Running |
| :func:`gc.collect` should cause all memory being held by QSBR to be actually |
| freed. Note that even when QSBR frees the memory, the underlying memory |
| allocator may not immediately return that memory to the OS and so the resident |
| set size (RSS) of the process might not decrease. |
| |
| |
| mimalloc allocator vs pymalloc |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| The default build will normally use the "pymalloc" memory allocator for small |
| allocations (512 bytes or smaller). The free-threaded build does not use |
| pymalloc and allocates all Python objects using the "mimalloc" allocator. The |
| pymalloc allocator has the following properties that help keep memory usage |
| low: small per-allocated-block overhead, effective memory fragmentation |
| prevention, and quick return of free memory to the operating system. The |
| mimalloc allocator does quite well in these respects as well but can have some |
| more overhead. |
| |
| In the free-threaded build, mimalloc manages memory in a number of separate |
| heaps (currently four). For example, all GC supporting objects are allocated |
| from their own heap. Using separate heaps means that free memory in one heap |
| cannot be used for an allocation that uses another heap. Also, some heaps are |
| configured to use QSBR (quiescent-state based reclamation) when freeing the |
| memory that backs up the heap (known as "pages" in mimalloc terminology). The |
| use of QSBR creates a delay between all memory blocks for a page being freed |
| and the memory page being released, either for new allocations or back to the |
| OS. |
| |
| The mimalloc allocator also defers returning freed memory back to the OS. You |
| can reduce that delay by setting the environment variable |
| :envvar:`!MIMALLOC_PURGE_DELAY` to ``0``. Note that this will likely reduce |
| the performance of the allocator. |
| |
| |
| Free-threaded reference counting can cause objects to live longer |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| In the default build, when an object's reference count reaches zero, it is |
| normally deallocated. The free-threaded build uses "biased reference |
| counting", with a fast-path for objects "owned" by the current thread and a |
| slow path for other objects. See :pep:`703` for additional details. Any time |
| an object's reference count ends up in a "queued" state, deallocation can be |
| deferred. The queued state is cleared from the "eval breaker" section of the |
| bytecode evaluator. |
| |
| The free-threaded build also allows a different mode of reference counting, |
| known as "deferred reference counting". This mode is enabled by setting a flag |
| on a per-object basis. Deferred reference counting is enabled for the |
| following types: |
| |
| * module objects |
| * module top-level functions |
| * class methods defined in the class scope |
| * descriptor objects |
| * thread-local objects, created by :class:`threading.local` |
| |
| When deferred reference counting is enabled, references from Python function |
| stacks are not added to the reference count. This scheme reduces the overhead |
| of reference counting, especially for objects used from multiple threads. |
| Because the stack references are not counted, objects with deferred reference |
| counting are not immediately freed when their internal reference count goes to |
| zero. Instead, they are examined by the next GC run and, if no stack |
| references to them are found, they are freed. This means these objects are |
| freed by the GC and not when their reference count goes to zero, as is typical. |
| |
| |
| Per-thread reference counting can delay freeing objects |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| To avoid contention on the reference count fields of frequently shared |
| objects, the free-threaded build also uses "per-thread reference counting" |
| for a few selected object types. Rather than updating a single shared |
| reference count, each thread maintains its own local reference count array, |
| indexed by a unique id assigned to the object. The true reference count is |
| only computed by summing the per-thread counts when the object's local |
| count drops to zero. Per-thread reference counting is currently used for: |
| |
| * heap type objects (classes created in Python) |
| * code objects |
| * the ``__dict__`` of module objects |
| |
| Because the per-thread counts must be merged back to the object before it |
| can be deallocated, objects using per-thread reference counting are |
| typically freed later than they would be in the default build. In |
| particular, such an object is usually not freed until the thread that |
| referenced it reaches a safe point (for example, in the "eval breaker" |
| section of the bytecode evaluator) or exits. Running :func:`gc.collect` |
| will merge the per-thread counts and allow these objects to be freed. |