This page describes the internal implementation of Python's dictionary (dict) and set container types (set, frozenset), including their hash table structures, memory layouts, and optimization strategies. For information about the base object system and type hierarchy, see PyObject and Type System. For memory management and reference counting details, see Reference Counting and Garbage Collection.
CPython's dictionary and set implementations are built on compact hash tables. Since Python 3.6, dictionaries maintain insertion order while providing O(1) average-case lookup performance. Both dictionaries and sets use open addressing with linear probing for collision resolution.
Key Implementation Files:
Sources: Objects/dictobject.c9-106 Include/cpython/dictobject.h11-33 Include/internal/pycore_dict.h75-85
The dictionary implementation separates hash table management (PyDictKeysObject) from the top-level dictionary object (PyDictObject). This enables key sharing between multiple dictionary instances.
Dictionary Object (PyDictObject):
| Field | Type | Purpose |
|---|---|---|
ma_used | Py_ssize_t | Number of active entries in the dictionary |
ma_keys | PyDictKeysObject* | Pointer to the keys object (hash table) |
ma_values | PyDictValues* | Pointer to values array (NULL for combined tables) |
_ma_watcher_tag | uint64_t | Dict watchers, mutation counter, and unique ID |
Sources: Include/cpython/dictobject.h11-33
Keys Object (PyDictKeysObject):
| Field | Type | Purpose |
|---|---|---|
dk_refcnt | Py_ssize_t | Reference count for key sharing |
dk_log2_size | uint8_t | Log2 of hash table size |
dk_log2_index_bytes | uint8_t | Log2 of index size in bytes |
dk_kind | uint8_t | Kind: GENERAL, UNICODE, or SPLIT |
dk_version | uint32_t | Version for cache invalidation |
dk_usable | Py_ssize_t | Number of usable slots remaining |
dk_nentries | Py_ssize_t | Number of entries currently used |
dk_indices[] | variable | Hash table (int8/16/32/64 based on size) |
Sources: Include/internal/pycore_dict.h175-219
Sources: Objects/dictobject.c54-89 Include/internal/pycore_dict.h168-172
Combined Tables:
dk_entries array within PyDictKeysObjectdk_refcnt) is always 1Split Tables:
PyDictKeysObject (can have dk_refcnt > 1)PyDictValues array pointed to by ma_valuesDICT_KEYS_UNICODE or DICT_KEYS_SPLIT)SHARED_KEYS_MAX_SIZE)Sources: Objects/dictobject.c54-65 Include/internal/pycore_dict.h222-223
Sources: Objects/dictobject.c9-50 Objects/dictobject.c786-838
The compact design (introduced in Python 3.6) separates the hash table indices from the actual key-value entries:
dk_indices[]: Sparse hash table that maps hash values to entry indices
2^dk_log2_sizeDKIX_EMPTY (-1), DKIX_DUMMY (-2), or valid indices >= 0dk_entries[]: Dense array of actual entries
USABLE_FRACTION(dk_size) where USABLE_FRACTION(n) = (n * 2) / 3PyDictKeyEntry (general) or PyDictUnicodeEntry (unicode keys)Sources: Objects/dictobject.c33-50 Objects/dictobject.c507-558
Sources: Objects/dictobject.c320-412 Objects/dictobject.c220-248
Probing Algorithm:
i = hash(key) & mask where mask = size - 1i = ((5*i) + 1 + perturb) & mask where perturb >>= PERTURB_SHIFTPERTURB_SHIFT = 5This hybrid approach combines:
Sources: Objects/dictobject.c320-412
Sources: Objects/dictobject.c1847-2040 Objects/dictobject.c2042-2154
Insertion (setitem_lock_held):
dk_entries[] and update dk_indices[]USABLE_FRACTION)Deletion (delitem_lock_held):
DKIX_DUMMY (can't use DKIX_EMPTY or probing breaks)ma_used but not dk_nentries (entries array stays dense)Resizing:
used * 3 (or used * 4 for large dicts > 50000)PyDictKeysObject, rehashes all entriesSources: Objects/dictobject.c414-619 Objects/dictobject.c1427-1574
Sources: Include/internal/pycore_dict.h82-85 Objects/dictobject.c432-436
When all dictionary keys are unicode strings, CPython uses PyDictUnicodeEntry instead of PyDictKeyEntry:
me_hash field: Hash is retrieved from the unicode object via unicode_get_hash()unicode_eq() for string comparisonThis optimization is particularly beneficial for instance dictionaries (__dict__), which almost always have string keys (attribute names).
Sources: Objects/dictobject.c432-436 Include/internal/pycore_dict.h82-85
Sources: Objects/dictobject.c56-65 Include/internal/pycore_dict.h231-237
Split Table Mechanism:
ht_cached_keys pointing to a shared keys objectdk_refcnt > 1)ma_values array containing the attribute valuesBenefits:
Limitations:
SHARED_KEYS_MAX_SIZE)Sources: Objects/dictobject.c102-106 Include/internal/pycore_dict.h222-223
Sources: Include/internal/pycore_dict.h263-274 Objects/dictobject.c1087-1161
Version Tag (dk_version):
Watcher Tag (_ma_watcher_tag):
Dictionary Watchers:
PyDict_EVENT_ADDED, PyDict_EVENT_MODIFIED, PyDict_EVENT_DELETEDSources: Include/internal/pycore_dict.h269-282 Objects/dictobject.c1087-1161
Sources: Objects/setobject.c1-32 Objects/setobject.c66-70
Set Structure:
| Field | Type | Purpose |
|---|---|---|
fill | Py_ssize_t | Total used slots (active + dummy) |
used | Py_ssize_t | Number of active elements |
mask | Py_ssize_t | Hash table size minus 1 |
table | setentry* | Pointer to hash table |
hash | Py_hash_t | Cached hash (frozenset only, -1 if not computed) |
weakreflist | PyObject* | List of weak references |
Set Entry:
hash: Hash value of the keykey: Pointer to the Python objectUnlike dictionaries, sets don't need value storage, only keys.
Sources: Objects/setobject.c66-70 Include/setobject.h
Sources: Objects/setobject.c220-248 Objects/setobject.c252-335
Probing Strategy:
i = (i*5 + 1 + perturb) & maskSet Operations:
hash = -1)Free-Threaded Considerations:
SET_MARK_SHARED) for concurrent accessSources: Objects/setobject.c75-136 Objects/setobject.c413-465
Sources: Objects/setobject.c171-193 Objects/setobject.c417-420
Key Differences:
| Feature | set | frozenset |
|---|---|---|
| Mutability | Mutable | Immutable |
| Hashable | No (__hash__ is None) | Yes (hash cached) |
| Dict key | Cannot be used | Can be used |
| Thread safety | Requires locking | Lock-free reads |
| Operations | Add, remove, update, etc. | Query only |
Frozenset Hash Computation:
set->hashSources: Objects/setobject.c728-773
Sources: Objects/dictobject.c158-286 Objects/dictobject.c1161-1225
Locking Strategy:
Combined Tables:
Py_BEGIN_CRITICAL_SECTION(dict)Split Tables:
dk_mutex, shared read-only accessQSBR (Quiescent State-Based Reclamation):
_PyMem_FreeDelayed()Atomic Operations:
STORE_INDEX, LOAD_INDEX: Atomic index array accessFT_ATOMIC_STORE_PTR_RELEASE, FT_ATOMIC_LOAD_PTR_ACQUIRE: Entry pointersSources: Objects/dictobject.c158-286 Objects/dictobject.c177-178
Sources: Objects/setobject.c75-95 Objects/setobject.c436-464
Set Thread Safety:
Mutable Sets:
Frozensets:
set_compare_frozenset() without locksLock-Free Reads:
Sources: Objects/setobject.c75-136 Objects/setobject.c436-464
| Operation | Average Case | Worst Case | Notes |
|---|---|---|---|
dict[key] | O(1) | O(n) | With good hash function |
dict[key] = val | O(1) | O(n) | May trigger resize |
del dict[key] | O(1) | O(n) | No reordering |
key in dict | O(1) | O(n) | Hash lookup |
dict.get(key) | O(1) | O(n) | Same as __getitem__ |
set.add(x) | O(1) | O(n) | May trigger resize |
set.remove(x) | O(1) | O(n) | Must find element |
x in set | O(1) | O(n) | Hash lookup |
Sources: Objects/dictobject.c320-412 Objects/setobject.c220-248
Dictionary:
PyDictObject + PyDictKeysObject headerPyDictKeyEntry)PyDictUnicodeEntry)Set:
PySetObjectsetentry: hash + key pointer)Memory Sharing:
Sources: Objects/dictobject.c786-838 Objects/setobject.c483-501
Sources: Include/internal/pycore_dict.h90-104 Objects/typeobject.c40-56
Version-Based Specialization:
dk_version in inline cachesSplit Key Optimization:
LOAD_ATTR_SPLIT_KEYS: Direct index into ma_values arraySTORE_ATTR_SPLIT_KEYS: Direct assignment without hash lookupSources: Include/internal/pycore_dict.h90-104
CPython's dictionary and set implementations leverage sophisticated hash table techniques:
These optimizations make dictionaries and sets extremely efficient for typical Python workloads while maintaining O(1) average-case performance.
Key Implementation Files:
Refresh this wiki