This page documents Bun's implementation of Node.js-compatible Buffer objects and the underlying string handling infrastructure. It covers the type system for representing binary data and strings, encoding/decoding operations, and conversions between JavaScript and native representations.
Sources: src/bun.js/node/types.zig:1-812, src/string.zig:1-150, src/bun.js/webcore/encoding.zig:1-200
The BlobOrStringOrBuffer union lets APIs accept a Blob, a string, or a buffer without repeating explicit type checks at every call site.
| Type Variant | Purpose | Thread Safety |
|---|---|---|
| blob | References a Blob with potential backing store | Single-threaded (store ref-counted) |
| string_or_buffer | Delegates to StringOrBuffer | Depends on variant |
Key Methods:
- fromJS() / fromJSAsync() - Create from JavaScript value with optional async copying
- slice() - Get byte slice without ownership transfer
- deinit() / deinitAndUnprotect() - Cleanup resources

Sources: src/bun.js/node/types.zig:1-133
StringOrBuffer represents either a string (in various encodings) or a raw buffer.
Conversion Methods:
| Method | Purpose | Returns |
|---|---|---|
| fromJS() | Convert from JS value (sync) | ?StringOrBuffer |
| fromJSMaybeAsync() | Convert with thread-safety option | ?StringOrBuffer |
| fromJSWithEncoding() | Convert with specific encoding | ?StringOrBuffer |
| toJS() | Convert back to JavaScript value | JSValue |
| toThreadSafe() | Make safe for cross-thread usage | void |

Sources: src/bun.js/node/types.zig:135-339
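As a rough TypeScript analogy (not Bun's Zig code), a union type plus one normalizing helper gives every call site a single byte view regardless of the input form. The names `StringOrBufferInput` and `toBytes` are hypothetical.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical analogue of StringOrBuffer: callers may pass either form.
type StringOrBufferInput = string | Uint8Array;

// Normalize once at the API boundary so call sites never branch on the type.
function toBytes(
  input: StringOrBufferInput,
  encoding: BufferEncoding = "utf8",
): Uint8Array {
  if (typeof input === "string") {
    // Strings must be encoded, so this path allocates and copies.
    return Buffer.from(input, encoding);
  }
  // Byte input is returned as-is: a borrowed view, no ownership transfer.
  return input;
}
```

The string branch always copies, while the buffer branch does not, which loosely mirrors the "byte slice without ownership transfer" behaviour described for slice().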
PathLike is similar to StringOrBuffer but specialized for filesystem paths, with validation.

Variants:
- string: bun.PathString
- buffer: Buffer
- slice_with_underlying_string
- threadsafe_string
- encoded_slice

Path-Specific Methods:
- sliceZ() - Get null-terminated path
- sliceW() - Get UTF-16 path (Windows)
- osPath() - Get platform-appropriate path representation
- fromBunString() - Create from bun.String with validation

Sources: src/bun.js/node/types.zig:526-760
Bun supports the following encodings, matching Node.js compatibility:
Encoding Mapping:
| Encoding | Node.js Aliases | Bit Width | Use Case |
|---|---|---|---|
| utf8 | "utf-8", "utf8" | Variable (1-4 bytes) | Default text encoding |
| utf16le | "ucs2", "ucs-2", "utf16le" | 16-bit code units | UTF-16 little-endian text |
| latin1 | "binary", "latin1" | 8-bit | ISO-8859-1 |
| ascii | "ascii" | 7-bit | ASCII subset |
| base64 | "base64" | N/A | Binary data encoding |
| base64url | "base64url" | N/A | URL-safe base64 |
| hex | "hex" | N/A | Hexadecimal encoding |

Sources: src/bun.js/node/types.zig:344-493
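The practical effect of these encodings is easiest to see through the standard Node.js Buffer API. The snippet below is a usage sketch and says nothing about how Bun dispatches each encoding internally.

```typescript
import { Buffer } from "node:buffer";

const text = "h\u00e9llo"; // five characters; U+00E9 is outside ASCII

const utf8 = Buffer.from(text, "utf8");     // U+00E9 takes 2 bytes
const utf16 = Buffer.from(text, "utf16le"); // 2 bytes per code unit
const latin1 = Buffer.from(text, "latin1"); // 1 byte per character

console.log(utf8.length, utf16.length, latin1.length); // 6 10 5

// The binary-to-text encodings describe the same bytes in different alphabets.
const bytes = Buffer.from("hello", "ascii");
console.log(bytes.toString("hex"));       // 68656c6c6f
console.log(bytes.toString("base64"));    // aGVsbG8=
console.log(bytes.toString("base64url")); // aGVsbG8
```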
The encoding layer provides functions for converting between encodings:
Core Functions:
| Function | Purpose | Source Type |
|---|---|---|
| Bun__encoding__writeLatin1() | Write Latin-1 to buffer | [*]const u8 |
| Bun__encoding__writeUTF16() | Write UTF-16 to buffer | [*]const u16 |
| Bun__encoding__byteLengthLatin1AsUTF8() | Calculate UTF-8 size | [*]const u8 |
| Bun__encoding__byteLengthUTF16AsUTF8() | Calculate UTF-8 size | [*]const u16 |
| Bun__encoding__toString() | Convert to JS string | [*]const u8 |

Sources: src/bun.js/webcore/encoding.zig:1-100
The native Buffer implementation resides in C++ and provides the foundation for Node.js Buffer compatibility.
Buffer Creation Functions:
allocBuffer(globalObject, byteLength)
→ Creates zero-filled JSUint8Array with Buffer subclass structure
allocBufferUnsafe(globalObject, byteLength)
→ Creates uninitialized JSUint8Array (faster, but may expose stale memory contents)
createBuffer(globalObject, data, length)
→ Creates Buffer from existing memory with copy
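At the JavaScript level the same three behaviours are visible through the standard Node.js API. That these particular calls are the ones routed to the native functions above is an assumption; the snippet only demonstrates the observable semantics.

```typescript
import { Buffer } from "node:buffer";

// Zero-filled allocation: every byte is guaranteed to be 0.
const zeroed = Buffer.alloc(16);

// Uninitialized allocation: contents are undefined until written,
// so overwrite the buffer before exposing it to anyone else.
const scratch = Buffer.allocUnsafe(16);
scratch.fill(0);

// Creating a Buffer from existing bytes copies them.
const source = new Uint8Array([1, 2, 3]);
const copy = Buffer.from(source);
source[0] = 9;
console.log(copy[0]); // 1
```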
Key Buffer Methods Implemented in C++:
| Method | Location | Purpose |
|---|---|---|
| write() | jsBufferPrototypeFunction_write | Write string to buffer with encoding |
| toString() | jsBufferPrototypeFunction_toString | Convert buffer to string with encoding |
| slice() | jsBufferPrototypeFunction_slice | Create view into buffer |
| indexOf() / lastIndexOf() | jsBufferPrototypeFunction_indexOf | Search for value in buffer |
| fill() | jsBufferPrototypeFunction_fill | Fill buffer with value |
| copy() | jsBufferPrototypeFunction_copy | Copy bytes between buffers |

Sources: src/bun.js/bindings/JSBuffer.cpp:1-500
Additional Buffer methods are implemented in TypeScript for compatibility with Node.js edge cases.
Methods in JSBufferPrototype.ts:
| Method Category | Functions |
|---|---|
| Numeric Reads | readInt8, readUInt8, readInt16LE, readInt32BE, etc. |
| Numeric Writes | writeInt8, writeUInt8, writeFloatLE, writeDoubleBE, etc. |
| BigInt Support | readBigInt64LE, writeBigUInt64BE, etc. |
| String Operations | toString (override for edge cases) |
DataView Optimization:
The TypeScript layer caches a DataView instance for numeric operations:
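The caching idea can be sketched as follows. This is an illustration only: the `WeakMap` cache and the names `viewCache`, `cachedView`, and `readInt32LE` are assumptions, not the identifiers used in JSBufferPrototype.ts.

```typescript
// Hypothetical cache: one DataView per byte array, created lazily.
const viewCache = new WeakMap<Uint8Array, DataView>();

function cachedView(buf: Uint8Array): DataView {
  let view = viewCache.get(buf);
  if (view === undefined) {
    // Respect byteOffset/byteLength so views over a shared ArrayBuffer stay correct.
    view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
    viewCache.set(buf, view);
  }
  return view;
}

// Numeric reads reuse the cached view instead of allocating a new one per call.
function readInt32LE(buf: Uint8Array, offset = 0): number {
  return cachedView(buf).getInt32(offset, true); // true = little-endian
}
```

Reusing one DataView avoids an allocation on every numeric read or write, which matters when these methods are called in tight loops.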
Sources: src/js/builtins/JSBufferPrototype.ts:1-100
Static methods on the Buffer constructor:
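For orientation, these are a few of the standard Node.js static methods that a compatible Buffer constructor exposes. Which of them live in JSBufferConstructor.ts rather than in C++ is not stated here, so treat the selection as illustrative.

```typescript
import { Buffer } from "node:buffer";

// Byte length of a string in a given encoding (UTF-8 by default).
const byteLen = Buffer.byteLength("h\u00e9llo");
console.log(byteLen); // 6

// Concatenate several buffers into one new buffer.
const joined = Buffer.concat([Buffer.from("ab"), Buffer.from("cd")]);
console.log(joined.toString()); // abcd

// Type check: a plain Uint8Array is not a Buffer.
console.log(Buffer.isBuffer(joined));            // true
console.log(Buffer.isBuffer(new Uint8Array(1))); // false

// Byte-wise ordering, usable as a sort comparator.
const order = Buffer.compare(Buffer.from("a"), Buffer.from("b"));
console.log(order); // -1
```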
Sources: src/js/builtins/JSBufferConstructor.ts:1-50
The bun.String type is a tagged union representing strings with different ownership and encoding characteristics.
String Tag Meanings:
| Tag | Memory Owner | Encodings | Thread-Safe | GC Integration |
|---|---|---|---|---|
| Dead | N/A | N/A | N/A | Error state |
| WTFStringImpl | JavaScriptCore | latin1, utf16le | No | Yes (ref-counted by JSC) |
| ZigString | Zig/mimalloc | utf8, utf16le | If marked | No (manual management) |
| StaticZigString | Static memory | utf8, utf16le | Yes | No (never freed) |
| Empty | N/A | N/A | Yes | N/A |

Key Operations:

Sources: src/string.zig:42-150
WTFStringImpl is JavaScriptCore's internal string representation:
WTF String Operations Exposed to Zig:
| Function | Purpose |
|---|---|
| Bun__WTFStringImpl__ref() | Increment reference count |
| Bun__WTFStringImpl__deref() | Decrement reference count |
| Bun__WTFStringImpl__hasPrefix() | Check string prefix |

Sources: src/bun.js/bindings/BunString.cpp:48-60, src/string.zig:1-50
The conversion from JavaScript values to native types follows a type hierarchy:
Conversion Functions:
| From | To | Function | Async Support |
|---|---|---|---|
| JSValue | StringOrBuffer | StringOrBuffer.fromJS() | Yes |
| JSValue | BlobOrStringOrBuffer | BlobOrStringOrBuffer.fromJS() | Yes |
| JSValue | PathLike | PathLike.fromJS() | Yes |
| JSValue | Encoding | Encoding.fromJS() | No |
| JSString | UTF-8 bytes | byteLength() | No |
Thread Safety Considerations:
When is_async = true is passed to conversion functions:
- Strings are stored in the threadsafe_string variant
- .protect() is called to prevent GC

Sources: src/bun.js/node/types.zig:230-338
Converting native types back to JavaScript:
Conversion Logic for StringOrBuffer.toJS():
Sources: src/bun.js/node/types.zig:175-196
Converting between different text encodings:
Encoding Size Calculation:
| Source | Target | Size Formula |
|---|---|---|
| Latin-1 | UTF-8 | Bun__encoding__byteLengthLatin1AsUTF8() |
| UTF-16 | UTF-8 | strings.elementLengthUTF16IntoUTF8() |
| Any | Base64 | base64.encodeLen(input) |
| Any | Base64URL | base64.urlSafeEncodeLen(input) |
| Any | Hex | input.len * 2 |
| Any | UTF-16 | input.len * 2 (approximation) |
Sources: src/bun.js/webcore/encoding.zig:4-100
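The size formulas can be written out explicitly. The helpers below are hypothetical re-statements in TypeScript, not Bun's functions; the base64url formula assumes unpadded output, which is what Node's "base64url" encoding produces.

```typescript
// Hex: two characters per input byte.
const hexLen = (n: number): number => n * 2;

// Base64 with padding: every 3 input bytes (rounded up) become 4 characters.
const base64Len = (n: number): number => Math.ceil(n / 3) * 4;

// Base64URL without padding: only the characters actually needed.
const base64UrlLen = (n: number): number => Math.ceil((n * 4) / 3);

// UTF-8 byte length of a JS (UTF-16) string, counted per code point.
function utf8Len(s: string): number {
  let total = 0;
  for (const ch of s) {
    // for...of iterates code points, so surrogate pairs arrive as one unit.
    const cp = ch.codePointAt(0)!;
    total += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return total;
}
```

Computing the size up front lets the caller allocate the destination once instead of growing it during the conversion.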
Writing data to buffers with encoding:
Write Function Implementations:
Each encoding has a specialized write function in the prototype:
| Method | Encoding | Implementation Location |
|---|---|---|
| utf8Write() | UTF-8 | C++ → Bun__encoding__writeLatin1/UTF16() |
| latin1Write() | Latin-1 | C++ → Bun__encoding__writeLatin1() |
| asciiWrite() | ASCII (masked) | C++ → Bun__encoding__writeLatin1() |
| base64Write() | Base64 | C++ → Bun__encoding__writeLatin1() |
| base64urlWrite() | Base64URL | C++ → Bun__encoding__writeLatin1() |
| hexWrite() | Hexadecimal | C++ → Bun__encoding__writeLatin1() |
| ucs2Write() | UTF-16LE | C++ → Bun__encoding__writeUTF16() |

Sources: src/bun.js/bindings/JSBuffer.cpp:377-414, test/js/node/buffer.test.js:114-140
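From JavaScript, these paths are normally reached through the generic write() method with an encoding argument. A short usage sketch with the standard Node.js signature:

```typescript
import { Buffer } from "node:buffer";

const buf = Buffer.alloc(8);

// write(string, offset, encoding) returns the number of bytes written.
const a = buf.write("hi", 0, "utf16le");    // two 2-byte code units
const b = buf.write("ff00", 4, "hex");      // two bytes: 0xff 0x00
const c = buf.write("\u00e9", 6, "latin1"); // one byte: 0xe9

console.log(a, b, c);             // 4 2 1
console.log(buf.toString("hex")); // 68006900ff00e900
```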
Converting buffer contents to string with encoding:
Implementation Flow:
1. Parse the start and end parameters (default to 0 and buffer.length)
2. Resolve the encoding (default: utf8)
3. Take the slice buffer[start:end]
4. Call Bun__encoding__toString() with slice and encoding

Sources: src/bun.js/bindings/JSBuffer.cpp:421-414, src/bun.js/webcore/encoding.zig:76-90
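The flow above can be restated as a small TypeScript function. This is a sketch, not the C++ implementation: `bufferToString` is a hypothetical name, and clamping out-of-range bounds to the buffer length is an assumption about the bounds handling.

```typescript
import { Buffer } from "node:buffer";

function bufferToString(
  buf: Buffer,
  encoding: BufferEncoding = "utf8", // default encoding
  start = 0,                         // default range start
  end = buf.length,                  // default range end
): string {
  // Clamp the requested range to the buffer bounds (assumed behaviour).
  const from = Math.max(0, Math.min(start, buf.length));
  const to = Math.max(from, Math.min(end, buf.length));
  // Take a view of the range; subarray does not copy.
  const slice = buf.subarray(from, to);
  // Decode the bytes with the chosen encoding.
  return slice.toString(encoding);
}

const greeting = Buffer.from("hello world");
console.log(bufferToString(greeting, "utf8", 6));   // world
console.log(bufferToString(greeting, "hex", 0, 2)); // 6865
```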
Finding bytes or strings within buffers:
SIMD-Optimized Search:
Bun uses SIMD (Single Instruction Multiple Data) for fast buffer searching:
Sources: src/bun.js/bindings/JSBuffer.cpp:86-88
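For reference, the result a SIMD search must reproduce is the same as that of a naive byte-by-byte scan. The function `naiveIndexOf` below is a hypothetical baseline written for illustration; it is not part of Bun.

```typescript
import { Buffer } from "node:buffer";

// Naive O(n*m) search; a SIMD routine returns the same index, only faster.
function naiveIndexOf(haystack: Uint8Array, needle: Uint8Array, from = 0): number {
  if (needle.length === 0) return Math.min(from, haystack.length);
  outer: for (let i = from; i + needle.length <= haystack.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer; // mismatch: next start
    }
    return i; // every byte of the needle matched at position i
  }
  return -1;
}

const hay = Buffer.from("abcabcabd");
console.log(naiveIndexOf(hay, Buffer.from("abd"))); // 6
console.log(hay.indexOf("abd"));                    // 6
console.log(hay.lastIndexOf("abc"));                // 3
console.log(hay.indexOf(0x63));                     // 2 (the byte for 'c')
```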
Filling buffers with repeated values:
Fill Behavior:
Sources: src/bun.js/bindings/JSBuffer.cpp:114
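The observable fill semantics, as defined by the Node.js API, can be demonstrated directly. The snippet shows behaviour only and makes no claim about the native code path.

```typescript
import { Buffer } from "node:buffer";

const buf = Buffer.alloc(5);

// A multi-byte pattern repeats and is truncated at the end of the buffer.
buf.fill("ab");
console.log(buf.toString()); // ababa

// A number fills every byte with that value.
buf.fill(0x41);
console.log(buf.toString()); // AAAAA

// offset and end restrict the filled range (end is exclusive).
buf.fill("z", 1, 3);
console.log(buf.toString()); // AzzAA

// With an encoding, the string is decoded first, then repeated.
buf.fill("4142", 0, buf.length, "hex");
console.log(buf.toString()); // ABABA
```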
Windows paths require special handling due to different conventions:
Path Conversion Functions:
| Function | Purpose | Platform |
|---|---|---|
| PathLike.osPath() | Get OS-native path | All |
| PathLike.sliceZ() | Get null-terminated UTF-8 | POSIX |
| PathLike.sliceW() | Get null-terminated UTF-16 | Windows |
| strings.toWPath() | Convert UTF-8 → UTF-16 | Windows |
Windows Long Path Support:
Paths longer than 260 characters need the \\?\ prefix:
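A minimal sketch of that prefixing rule, assuming the input is already an absolute path using backslashes. The helper `toLongPath` is hypothetical; Bun's real logic lives in Zig and handles more cases.

```typescript
const MAX_PATH = 260; // classic Windows limit, including the terminating NUL

function toLongPath(path: string): string {
  // Short paths and already-prefixed paths are passed through untouched.
  if (path.length < MAX_PATH || path.startsWith("\\\\?\\")) return path;
  if (path.startsWith("\\\\")) {
    // UNC form: \\server\share becomes \\?\UNC\server\share
    return "\\\\?\\UNC\\" + path.slice(2);
  }
  // Drive-letter form: C:\dir becomes \\?\C:\dir
  return "\\\\?\\" + path;
}
```

The prefix tells Windows to skip path normalization, which is why the sketch requires an absolute, backslash-separated input.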
Sources: src/bun.js/node/types.zig:583-627
Different allocation strategies optimize for different use cases:
Allocation Functions:
| Function | Zero-Filled | Pool | Use Case |
|---|---|---|---|
| allocBuffer() | Always | Yes | Safe default |
| allocBufferUnsafe() | Optional | Yes | Performance-critical |
| allocUnsafeSlow() | Optional | No | Large buffers |
The global flag Bun__Node__ZeroFillBuffers controls whether allocUnsafe zero-fills:
This can be set via the --zero-fill-buffers command line flag for security-sensitive applications.
Sources: src/bun.js/bindings/JSBuffer.cpp:83-254
Buffer operations validate inputs to prevent errors and match Node.js behavior:
Validation Functions:
| Function | Checks | Error Type |
|---|---|---|
| validateOffset() | Type, integer, range | ERR_INVALID_ARG_TYPE, ERR_OUT_OF_RANGE |
| parseEncoding() | Known encoding name | ERR_UNKNOWN_ENCODING |
| Valid.pathBuffer() | Non-empty, max length | ENAMETOOLONG |
| Valid.pathNullBytes() | No null bytes | ERR_INVALID_ARG_VALUE |

Sources: src/bun.js/bindings/JSBuffer.cpp:293-316, src/bun.js/node/types.zig:762-811
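The check order for offsets (type, then integer, then range) can be sketched in TypeScript. This is not the C++ validateOffset(): the signature, the helper `codedError`, and the message strings are assumptions made for illustration.

```typescript
// Hypothetical helper that attaches a Node-style `code` property to an error.
function codedError<T extends Error>(err: T, code: string): T & { code: string } {
  return Object.assign(err, { code });
}

function validateOffset(value: unknown, name: string, min: number, max: number): number {
  // 1. Type check: anything that is not a number is a type error.
  if (typeof value !== "number") {
    throw codedError(
      new TypeError(`The "${name}" argument must be of type number.`),
      "ERR_INVALID_ARG_TYPE",
    );
  }
  // 2 and 3. Integer and range checks both report an out-of-range error.
  if (!Number.isInteger(value) || value < min || value > max) {
    throw codedError(
      new RangeError(`The value of "${name}" is out of range.`),
      "ERR_OUT_OF_RANGE",
    );
  }
  return value;
}
```

Callers can then branch on `err.code` rather than parsing messages, which is the pattern Node.js-compatible code relies on.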
Bun uses Node.js-compatible error codes:
| Code | Meaning | Example |
|---|---|---|
| ERR_INVALID_ARG_TYPE | Wrong argument type | Passing string where number expected |
| ERR_OUT_OF_RANGE | Value outside valid range | Offset > buffer length |
| ERR_UNKNOWN_ENCODING | Unsupported encoding | Unknown string passed as encoding |
| ERR_INVALID_ARG_VALUE | Invalid value for argument | Path contains null bytes |

Sources: src/bun.js/bindings/JSBuffer.cpp:23-45
Bun uses SIMD (Single Instruction Multiple Data) instructions for fast string operations:
Optimized Operations:
| Operation | Function | Benefit |
|---|---|---|
| Substring search | highway_memmem() | ~4-8x faster than naive search |
| Single char search | highway_index_of_char() | ~8-16x faster than loop |
| UTF-8 validation | isAllASCII() | SIMD checks multiple bytes |
| Case conversion | SIMD ASCII lowering | Parallel byte operations |
Sources: src/bun.js/bindings/JSBuffer.cpp:86-88
When possible, Bun avoids copying string data:
Zero-Copy Scenarios:
- Buffer views (Buffer.prototype.slice)
- StaticZigString
- WTF::ExternalStringImpl

Copy Required:
- Cross-thread use (toThreadSafe())

Sources: src/string.zig:42-150, src/bun.js/bindings/BunString.cpp:1-100
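The difference between a zero-copy view and a copy is observable from JavaScript. The sketch uses subarray(), which shares memory with its parent just as Buffer.prototype.slice does in Node.js.

```typescript
import { Buffer } from "node:buffer";

const parent = Buffer.from("hello");

// A view shares memory: writing through it changes the parent.
const view = parent.subarray(1, 3);
view[0] = 0x45; // 'E'
console.log(parent.toString()); // hEllo

// A copy owns its bytes: writing to it leaves the parent alone.
const copy = Buffer.from(parent);
copy[0] = 0x4a; // 'J'
console.log(parent.toString()); // hEllo
console.log(copy.toString());   // JEllo
```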
The buffer allocation system may use memory pooling for small allocations:
Small Buffer (< 8KB):
→ Use pooled allocator
→ Fast allocation/deallocation
→ Reduced fragmentation
Large Buffer (≥ 8KB):
→ Direct mimalloc allocation
→ No pooling overhead