This page provides quick-start instructions for building NCNN and running basic neural network inference. It covers the essential workflow from obtaining the library to executing your first model. For detailed platform-specific build configurations and toolchain options, see Platform Support and Build System. For in-depth information about the runtime architecture, see Core Runtime Architecture.
NCNN inference follows a simple three-phase pattern: load a model, create an inference session, and execute forward passes. The following diagram illustrates the fundamental data flow:
Sources: src/net.h27-164 src/net.cpp tests/test_squeezenet.cpp14-125
Clone the repository with submodules:
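The standard clone sequence (the upstream repository URL; submodules pull in glslang and other vendored dependencies):

```shell
git clone https://github.com/Tencent/ncnn.git
cd ncnn
git submodule update --init
```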
Required dependencies:
On Ubuntu/Debian, install dependencies:
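A sketch of the usual package set on Ubuntu/Debian (libvulkan-dev and libopencv-dev are only needed for Vulkan builds and some examples, respectively):

```shell
sudo apt install build-essential git cmake \
    libprotobuf-dev protobuf-compiler \
    libomp-dev libvulkan-dev libopencv-dev
```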
On Red Hat/CentOS:
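An equivalent sketch using yum (exact package names vary between releases; on newer Fedora/RHEL the Vulkan headers ship as vulkan-loader-devel):

```shell
sudo yum install gcc-c++ git cmake \
    protobuf-devel vulkan-devel opencv-devel
```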
Sources: docs/how-to-build/how-to-build.md1-8 docs/how-to-build/how-to-build.md42-62 README.md72
Common CMake options:
| Option | Default | Description |
|---|---|---|
| NCNN_VULKAN | OFF | Enable Vulkan GPU support |
| NCNN_OPENMP | ON | Enable OpenMP multi-threading |
| NCNN_BUILD_EXAMPLES | ON | Build example programs (squeezenet, yolov5, etc) |
| NCNN_BUILD_TOOLS | ON | Build conversion tools (caffe2ncnn, onnx2ncnn) |
| NCNN_BUILD_BENCHMARK | ON | Build benchncnn performance tool |
| NCNN_SHARED_LIB | OFF | Build shared library instead of static |
| NCNN_PIXEL | ON | Enable image preprocessing functions |
| NCNN_INT8 | ON | Enable INT8 quantization support |
| NCNN_SIMPLEOCV | OFF | Use built-in OpenCV replacement |
For GPU builds, add -DNCNN_VULKAN=ON. For Python bindings, add -DNCNN_PYTHON=ON.
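A typical configure-and-build sequence on Linux; the build directory name and the optional install step follow the upstream build guide:

```shell
cd ncnn
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DNCNN_BUILD_EXAMPLES=ON ..
make -j$(nproc)
sudo make install   # optional: installs headers and libncnn
```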
Sources: docs/how-to-build/how-to-build.md40-89 CMakeLists.txt61-125
Install Vulkan SDK first:
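On Ubuntu, the distro Vulkan headers and loader can stand in for the LunarG SDK (package names are the ones current Ubuntu releases use; the full SDK from LunarG works as well):

```shell
sudo apt install libvulkan-dev vulkan-tools glslang-tools
```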
Build with Vulkan enabled:
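The same CMake flow as the CPU build, with the Vulkan option switched on:

```shell
cd ncnn
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DNCNN_VULKAN=ON ..
make -j$(nproc)
```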
The Vulkan driver is required at runtime. For AMD/Intel GPUs, Mesa provides Vulkan drivers. For NVIDIA GPUs, install the proprietary driver. See GPU and Vulkan System for device selection and configuration.
Sources: docs/how-to-build/how-to-build.md51-65 docs/how-to-build/how-to-build.md40-89
Download and install Visual Studio Community 2017 or later. Start the x64 Native Tools Command Prompt:
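A typical NMake-based build from that prompt; the generator and install prefix follow the upstream build guide, so adjust paths to taste:

```shell
mkdir build
cd build
cmake -G "NMake Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=%cd%/install ..
nmake
nmake install
```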
For protobuf support (required for conversion tools), first build protobuf then link:
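A sketch of that two-step sequence. Here `<protobuf-install>` is a placeholder for wherever protobuf was installed; the `Protobuf_*` variable names come from CMake's FindProtobuf module:

```shell
:: build and install protobuf first
cd protobuf
mkdir build-vs && cd build-vs
cmake -G "NMake Makefiles" -DCMAKE_BUILD_TYPE=Release -Dprotobuf_BUILD_TESTS=OFF ..\cmake
nmake
nmake install

:: then configure ncnn against the installed protobuf
cmake -G "NMake Makefiles" -DCMAKE_BUILD_TYPE=Release ^
  -DProtobuf_INCLUDE_DIR=<protobuf-install>/include ^
  -DProtobuf_LIBRARIES=<protobuf-install>/lib/libprotobuf.lib ^
  -DProtobuf_PROTOC_EXECUTABLE=<protobuf-install>/bin/protoc.exe ..
```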
Download MinGW-w64 from winlibs or w64devkit and add its bin directory to PATH:
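With the toolchain on PATH, the build mirrors the NMake flow but uses the MinGW generator:

```shell
mkdir build
cd build
cmake -G "MinGW Makefiles" -DCMAKE_BUILD_TYPE=Release ..
mingw32-make -j4
```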
Sources: docs/how-to-build/how-to-build.md180-210 docs/how-to-build/how-to-build.md253-267
NCNN uses two files to represent a neural network:
- .param: Text file containing layer types, parameters, and topology (human-readable)
- .param.bin: Binary equivalent of .param (smaller, faster to load)
- .bin: Binary file containing all network weights (32-bit aligned memory)

Models from other frameworks must be converted to NCNN format using conversion tools. See Model Conversion and Optimization for details on converting PyTorch, ONNX, Caffe, or TensorFlow models.
Sources: src/net.h60-109 docs/how-to-use-and-FAQ/ncnn-load-model.md1-9
The following diagram shows the relationship between core runtime classes:
Sources: src/net.h27-164 src/net.h167-245 src/mat.h src/option.h
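A minimal end-to-end inference program, sketched here around the stock SqueezeNet example; the model file names and the data/prob blob names are the ones examples/squeezenet.cpp uses, so substitute your own:

```cpp
#include "net.h"
#include <cstdio>

int main()
{
    // phase 1: load model structure and weights
    ncnn::Net net;
    if (net.load_param("squeezenet_v1.1.param") != 0)
        return -1;
    if (net.load_model("squeezenet_v1.1.bin") != 0)
        return -1;

    // placeholder input; real code fills this from image pixels
    ncnn::Mat in(227, 227, 3);
    in.fill(0.f);

    // phase 2: create an inference session
    ncnn::Extractor ex = net.create_extractor();
    ex.input("data", in);

    // phase 3: run the forward pass and read the output blob
    ncnn::Mat out;
    ex.extract("prob", out);
    printf("output elements: %d\n", out.w);
    return 0;
}
```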
Compile and link:
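One way to compile against an installed NCNN (paths assume the default make install prefix; add -lvulkan for Vulkan-enabled builds):

```shell
g++ -std=c++11 demo.cpp -o demo \
    -I/usr/local/include/ncnn \
    -L/usr/local/lib -lncnn -fopenmp
```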
Key API methods:
| Method | Description | Header Location |
|---|---|---|
| Net::load_param(const char*) | Load network structure from .param file | src/net.h72-76 |
| Net::load_model(const char*) | Load weight data from .bin file | src/net.h92-96 |
| Net::create_extractor() | Create inference session (thread-safe) | src/net.h131-133 |
| Extractor::input(const char*, const Mat&) | Set input blob by name | src/net.h203-205 |
| Extractor::extract(const char*, Mat&) | Get output blob by name | src/net.h209-211 |
Sources: examples/squeezenet.cpp46-108 src/net.h27-164 src/net.h167-245
NCNN supports zero-copy loading from memory for embedded deployments:
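A sketch of in-memory loading: load_param_mem takes the textual .param content, and the weight buffer is referenced rather than copied, so it must outlive the Net:

```cpp
#include "net.h"

// param_text: null-terminated .param content
// weights: raw .bin bytes (32-bit aligned), kept alive for the Net's lifetime
bool load_from_memory(ncnn::Net& net, const char* param_text, const unsigned char* weights)
{
    if (net.load_param_mem(param_text) != 0)
        return false;
    // load_model returns the number of bytes consumed from the buffer
    return net.load_model(weights) > 0;
}
```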
The ncnn2mem tool converts models to C++ header files for static embedding:
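Its invocation takes the two model files and the two headers to generate (the SqueezeNet file names here are illustrative):

```shell
./tools/ncnn2mem squeezenet_v1.1.param squeezenet_v1.1.bin squeezenet.id.h squeezenet.mem.h
```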
This generates:
- squeezenet.id.h - Enum of blob names (e.g., squeezenet_v1_1_param_id::BLOB_data)
- squeezenet.mem.h - Byte arrays squeezenet_v1_1_param_bin[] and squeezenet_v1_1_bin[]

Include in your program:
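A sketch of consuming the generated headers; the array and enum names come from the generated files, and BLOB_prob is assumed to be the output blob id for this model:

```cpp
#include "net.h"
#include "squeezenet.id.h"   // blob/layer id enums
#include "squeezenet.mem.h"  // embedded param and weight byte arrays

int main()
{
    ncnn::Net net;
    net.load_param(squeezenet_v1_1_param_bin); // binary param from memory
    net.load_model(squeezenet_v1_1_bin);       // weights from memory

    ncnn::Mat in(227, 227, 3);
    in.fill(0.f);

    ncnn::Extractor ex = net.create_extractor();
    ex.input(squeezenet_v1_1_param_id::BLOB_data, in);   // input by id
    ncnn::Mat out;
    ex.extract(squeezenet_v1_1_param_id::BLOB_prob, out); // output by id
    return 0;
}
```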
See Model Optimization Tools for additional model manipulation utilities.
Sources: src/net.h78-86 src/net.h98-108 tools/ncnn2mem.cpp1-150
The ncnn::Option class controls runtime behavior:
Common configurations:
| Scenario | Settings |
|---|---|
| CPU inference | num_threads=4, use_vulkan_compute=false |
| GPU inference | use_vulkan_compute=true, lightmode=false |
| Mobile (power-efficient) | num_threads=2, lightmode=true, use_fp16_storage=true |
| INT8 quantization | use_int8_storage=true, use_int8_arithmetic=true |
| Memory-constrained | lightmode=true, custom allocators with pooling |
Option configuration in Net:
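For example (these fields exist on ncnn::Option; set them before load_param/load_model so they take effect during layer creation):

```cpp
ncnn::Net net;
net.opt.num_threads = 4;            // CPU worker threads
net.opt.use_vulkan_compute = false; // stay on the CPU path
net.opt.lightmode = true;           // recycle intermediate blobs
net.load_param("model.param");      // illustrative file names
net.load_model("model.bin");
```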
Option override in Extractor:
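Per-session overrides are applied on the Extractor itself; a small sketch using set_light_mode, the commonly used override:

```cpp
ncnn::Extractor ex = net.create_extractor();
ex.set_light_mode(true); // release intermediate blobs eagerly for this session only
```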
Sources: src/option.h src/net.h37 src/net.h184-198
NCNN provides a C89-compatible API for integration with non-C++ languages:
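A sketch of the same load/extract flow through the C API; the model file names and data/prob blob names are assumptions carried over from the SqueezeNet example:

```c
#include <stdio.h>
#include "c_api.h"

int main(void)
{
    ncnn_net_t net = ncnn_net_create();
    ncnn_net_load_param(net, "squeezenet_v1.1.param");
    ncnn_net_load_model(net, "squeezenet_v1.1.bin");

    /* 227x227x3 input with the default allocator */
    ncnn_mat_t in = ncnn_mat_create_3d(227, 227, 3, NULL);

    ncnn_extractor_t ex = ncnn_extractor_create(net);
    ncnn_extractor_input(ex, "data", in);

    ncnn_mat_t out;
    ncnn_extractor_extract(ex, "prob", &out);
    printf("output dims: %d\n", ncnn_mat_get_dims(out));

    /* destruction order: mats and extractor before the net */
    ncnn_mat_destroy(out);
    ncnn_mat_destroy(in);
    ncnn_extractor_destroy(ex);
    ncnn_net_destroy(net);
    return 0;
}
```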
Key C API types:
- ncnn_net_t - Opaque pointer to ncnn::Net (src/c_api.h166)
- ncnn_extractor_t - Opaque pointer to ncnn::Extractor (src/c_api.h199)
- ncnn_mat_t - Opaque pointer to ncnn::Mat (src/c_api.h80)
- ncnn_option_t - Opaque pointer to ncnn::Option (src/c_api.h36)

All C API functions follow the naming pattern ncnn_<class>_<method>. See C API and Python Bindings for complete API reference.
Sources: src/c_api.h1-600 src/c_api.cpp1-1500 examples/squeezenet_c_api.cpp1-100 tests/test_c_api.cpp1-200
NCNN provides Python bindings via pybind11 for seamless integration with Python ML pipelines.
Installation:
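The bindings are published on PyPI; building from a source checkout with NCNN_PYTHON enabled also works:

```shell
pip install ncnn
```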
Basic usage example:
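A sketch mirroring the C++ flow; file names and the data/prob blob names are again the SqueezeNet example's conventions:

```python
import ncnn

net = ncnn.Net()
net.opt.use_vulkan_compute = False
net.load_param("squeezenet_v1.1.param")
net.load_model("squeezenet_v1.1.bin")

ex = net.create_extractor()
in_mat = ncnn.Mat(227, 227, 3)  # placeholder input
in_mat.fill(0.0)
ex.input("data", in_mat)

ret, out = ex.extract("prob")   # returns (status, output Mat)
print(out.w)
```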
Image preprocessing utilities:
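For example, resizing and normalizing an OpenCV image in one step with from_pixels_resize (mean/norm values are model-specific; note ncnn's spelling of substract_mean_normalize):

```python
import cv2
import ncnn

img = cv2.imread("cat.jpg")  # BGR, HWC, uint8; illustrative file name
mat = ncnn.Mat.from_pixels_resize(
    img, ncnn.Mat.PixelType.PIXEL_BGR,
    img.shape[1], img.shape[0],  # source width, height
    227, 227                     # target width, height
)
mat.substract_mean_normalize([104.0, 117.0, 123.0], [1.0, 1.0, 1.0])
```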
The Python API mirrors the C++ interface, keeping the same snake_case member names (e.g., use_vulkan_compute). For complete API documentation, see C API and Python Bindings.
Sources: python/src/ncnn_export.cpp1-200 python/README.md1-50
After building NCNN, verify the installation by running included example applications. The examples demonstrate various network types and preprocessing techniques.
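For instance, the SqueezeNet classifier can be run from the build tree; the paths and sample image follow the upstream build layout, so adjust to your checkout:

```shell
cd build/examples
./squeezenet ../../images/256-ncnn.png
```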
Expected output:
```
[0 AMD RADV FIJI (LLVM 10.0.1)] queueC=1[4] queueG=0[1] queueT=0[1]
[0 AMD RADV FIJI (LLVM 10.0.1)] bugsbn1=0 buglbia=0 bugcopc=0 bugihfa=0
[0 AMD RADV FIJI (LLVM 10.0.1)] fp16p=1 fp16s=1 fp16a=0 int8s=1 int8a=1
532 = 0.163452 (n02123045 tabby, tabby cat)
920 = 0.093140 (n03584254 iPod)
716 = 0.061584 (n02971356 carton)
```
The output shows GPU device capabilities and top-3 ImageNet class predictions with confidence scores.
Other example applications demonstrate different use cases:
| Example | Purpose | Input | Output |
|---|---|---|---|
| squeezenet | Image classification | Image file | Top-K class predictions |
| yolov5 | Object detection | Image file | Bounding boxes with labels |
| arcface | Face recognition | Image file | Face embeddings (512-dim vector) |
| whisper | Speech recognition | Audio file | Transcribed text |
| ppocrv5 | OCR text recognition | Image file | Detected text regions and content |
Sources: examples/squeezenet.cpp1-150 examples/arcface.cpp1-50 examples/whisper.cpp1-100 examples/ppocrv5.cpp1-150 docs/how-to-build/how-to-build.md144-156
Use benchncnn to measure inference performance across different models:
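For example, a run with the argument values described below (the path assumes benchncnn was built in build/benchmark):

```shell
cd build/benchmark
./benchncnn 10 4 0 0
```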
Arguments:
- 10 - Number of loops (more loops = more accurate timing)
- 4 - Number of CPU threads
- 0 - Powersave mode (0=disabled, 1=little cores only, 2=big cores only)
- 0 - GPU device index (0=first GPU, -1=CPU only)

Sample output:
```
num_threads = 4
powersave = 0
gpu_device = 0
cooling_down = 1
          squeezenet  min =    4.68  max =    4.99  avg =    4.85
     squeezenet_int8  min =   38.52  max =   66.90  avg =   48.52
           mobilenet  min =    7.12  max =    7.45  avg =    7.23
      mobilenet_int8  min =   51.68  max =   84.12  avg =   62.15
```
Times are in milliseconds. See Benchmarking System for performance analysis and optimization techniques.
Sources: benchmark/benchncnn.cpp1-500 docs/how-to-build/how-to-build.md158-176
To learn more about specific topics:
Common workflows:
Sources: README.md560-576 docs/how-to-build/how-to-build.md1-1000