This document covers CPU inference optimization strategies in PaddleOCR, including configuration of MKL-DNN acceleration, threading parameters, and build-time optimization options. These optimizations apply to both Python and C++ inference deployments when running on CPU hardware.
For GPU optimization and TensorRT acceleration, see NVIDIA GPU and TensorRT. For general inference configuration, see Python Inference System.
The CPU optimization system in PaddleOCR consists of three main components: runtime configuration parameters, Intel MKL-DNN (oneDNN) acceleration library integration, and threading/parallelization controls. These optimizations are configured through the predictor creation process.
Sources: tools/infer/utility.py177-435
The CPU optimization parameters are defined in the argument parser and used during predictor creation:
| Parameter | Type | Default | Description |
|---|---|---|---|
| enable_mkldnn | bool | None | Enable Intel MKL-DNN acceleration. When None, automatically determined based on model type |
| cpu_threads | int | 10 | Number of threads for CPU math library operations |
| ir_optim | bool | True | Enable inference IR optimizations |
| precision | str | "fp32" | Precision mode: "fp32" or "fp16" (with MKL-DNN, "fp16" selects BFloat16) |
Key Implementation Points:
The enable_mkldnn parameter in tools/infer/utility.py133 controls whether MKL-DNN acceleration is enabled. When set to None (default), the system makes automatic decisions based on the model architecture.
The cpu_threads parameter in tools/infer/utility.py134 specifies the number of threads used by the CPU math library. This directly affects parallel computation during inference.
Sources: tools/infer/utility.py38-169
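As an illustration, the CPU-related flags can be sketched with a minimal stand-alone parser. This is a simplified sketch of only the four parameters above, not the full parser in tools/infer/utility.py:

```python
import argparse


def str2bool(v):
    # PaddleOCR-style boolean flag parsing ("True"/"false"/"1" strings)
    return str(v).lower() in ("true", "yes", "t", "y", "1")


def init_cpu_args():
    # Trimmed-down sketch: the real parser defines many more options.
    parser = argparse.ArgumentParser()
    parser.add_argument("--enable_mkldnn", type=str2bool, default=None)
    parser.add_argument("--cpu_threads", type=int, default=10)
    parser.add_argument("--ir_optim", type=str2bool, default=True)
    parser.add_argument("--precision", type=str, default="fp32")
    return parser.parse_args([])  # defaults only, for illustration


args = init_cpu_args()
```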
MKL-DNN (now called oneDNN) is Intel's open-source performance library for deep learning applications. It provides highly optimized implementations of common DNN operations specifically tuned for Intel architectures.
Implementation Details:
The MKL-DNN configuration occurs in tools/infer/utility.py386-402. The key steps are:
- If args.enable_mkldnn is True, config.enable_mkldnn() is called at tools/infer/utility.py391
- config.set_mkldnn_cache_capacity(10) at tools/infer/utility.py390 prevents memory leaks by caching kernels for at most 10 different input shapes
- If BF16 precision is requested, config.enable_mkldnn_bfloat16() is called at tools/infer/utility.py393
- If args.enable_mkldnn is False, config.disable_mkldnn() at tools/infer/utility.py396 explicitly disables MKL-DNN

Sources: tools/infer/utility.py386-402
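The configuration flow can be sketched as follows. A stub stands in for paddle.inference.Config so the snippet is self-contained; the method names mirror the real API, but this is an illustrative sketch, not the actual implementation:

```python
class StubConfig:
    """Records method calls; stands in for paddle.inference.Config."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Any config method just records its name and arguments.
        return lambda *args: self.calls.append((name,) + args)


def configure_mkldnn(config, enable_mkldnn, precision="fp32"):
    # Simplified sketch of the flow in tools/infer/utility.py.
    if enable_mkldnn:
        config.set_mkldnn_cache_capacity(10)  # bound the kernel/shape cache
        config.enable_mkldnn()
        if precision == "fp16":
            # With MKL-DNN, "fp16" selects BFloat16 kernels
            config.enable_mkldnn_bfloat16()
    elif enable_mkldnn is False:
        config.disable_mkldnn()  # explicit opt-out


config = StubConfig()
configure_mkldnn(config, enable_mkldnn=True, precision="fp16")
```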
The cache capacity setting is critical for preventing memory leaks:
This caches optimized kernels for up to 10 different input shapes. Without this limit, the cache could grow indefinitely as new shapes are encountered, leading to memory exhaustion in long-running inference scenarios.
Sources: tools/infer/utility.py389-391
The number of CPU threads directly impacts parallel computation performance:
Configuration in Code:
The thread count is configured in tools/infer/utility.py398-402:
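A minimal sketch of that call, with a stub standing in for paddle.inference.Config so the snippet is self-contained:

```python
class StubConfig:
    # Minimal stand-in for paddle.inference.Config
    def set_cpu_math_library_num_threads(self, n):
        self.cpu_threads = n


config = StubConfig()
cpu_threads = 10  # the documented default
config.set_cpu_math_library_num_threads(cpu_threads)
```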
Thread Count Guidelines:
Sources: tools/infer/utility.py398-402
For C++ inference, CPU optimizations are configured at compile time through CMake options:
CMakeLists.txt Configuration:
The primary CPU optimization options are defined in deploy/cpp_infer/CMakeLists.txt9:
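A hedged sketch of what such option declarations look like; the descriptions here are paraphrased, and the authoritative definitions live in deploy/cpp_infer/CMakeLists.txt:

```cmake
# Sketch only: option names from this page, descriptions paraphrased
option(WITH_MKL "Link against Intel MKL (OFF links OpenBLAS instead)" ON)
option(WITH_GPU "Build with GPU support" OFF)
```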
Compiler Flags for CPU:
Linux/Unix compiler flags are set in deploy/cpp_infer/CMakeLists.txt89-92, and Windows flags in deploy/cpp_infer/CMakeLists.txt72-78.
Sources: deploy/cpp_infer/CMakeLists.txt1-244
The choice between Intel MKL and OpenBLAS affects performance characteristics:
| Aspect | Intel MKL | OpenBLAS |
|---|---|---|
| Performance on Intel CPUs | Highly optimized | Good general performance |
| Performance on AMD CPUs | May underperform | Better cross-vendor support |
| License | Intel Simplified Software License | BSD License |
| Binary Size | Larger | Smaller |
| Tuning | Auto-tuned for Intel architectures | Generic optimizations |
The MKL library is linked in deploy/cpp_infer/CMakeLists.txt136-158 when WITH_MKL=ON.
Sources: deploy/cpp_infer/CMakeLists.txt136-158
The provided build script, deploy/cpp_infer/tools/build.sh1-23, demonstrates a CPU-only configuration.
Sources: deploy/cpp_infer/tools/build.sh1-23
Key Optimization Points:
- config.disable_gpu() ensures the CPU execution path is used

Sources: tools/infer/utility.py386-435
Example 1: Text Detection with MKL-DNN
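A hedged invocation sketch; the model and image directories are placeholders, and the flags are those documented above:

```bash
python3 tools/infer/predict_det.py \
    --det_model_dir="./inference/det_db/" \
    --image_dir="./doc/imgs/" \
    --use_gpu=False \
    --enable_mkldnn=True \
    --cpu_threads=8
```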
Example 2: Text Recognition with Custom Thread Count
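A hedged invocation sketch with a custom thread count; paths are placeholders:

```bash
python3 tools/infer/predict_rec.py \
    --rec_model_dir="./inference/rec_crnn/" \
    --image_dir="./doc/imgs_words/" \
    --use_gpu=False \
    --enable_mkldnn=True \
    --cpu_threads=4
```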
Sources: tools/infer/predict_det.py36-150 tools/infer/predict_rec.py39-203
Example: CPU-Only Build with MKL
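A hedged sketch of such a build, mirroring the flow of deploy/cpp_infer/tools/build.sh; the Paddle inference library path is a placeholder:

```bash
mkdir -p build && cd build
cmake .. \
    -DPADDLE_LIB=/path/to/paddle_inference \
    -DWITH_MKL=ON \
    -DWITH_GPU=OFF
make -j"$(nproc)"
```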
Sources: deploy/cpp_infer/tools/build.sh1-23
Decision Matrix:
| Scenario | Recommended cpu_threads | Rationale |
|---|---|---|
| Single image inference on 4-core CPU | 4 | Match physical core count |
| Batch inference on 8-core CPU | 8 | Maximize parallelism |
| Multi-process deployment on 16-core CPU | 4-8 per process | Leave cores for process scheduling |
| Real-time inference on embedded CPU | 2-4 | Balance latency vs power |
Thread Scaling Behavior:
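Beyond the physical core count, extra threads add scheduling overhead rather than throughput. A small helper illustrating that clamping rule (an illustrative heuristic, not PaddleOCR code):

```python
import os


def effective_threads(requested):
    # Threads beyond the available core count add scheduling overhead
    # rather than throughput, so clamp the request to the core count.
    available = os.cpu_count() or 1
    return min(requested, available)
```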
Sources: tools/infer/utility.py398-402
When to Enable MKL-DNN:
✓ Recommended:

- Intel CPUs, where the oneDNN kernels are specifically tuned
- Long-running services with a stable, bounded set of input shapes

✗ Not Recommended:

- AMD CPUs, where MKL-DNN may underperform (see the MKL vs OpenBLAS comparison above)
- Workloads with highly dynamic input shapes, which churn the kernel cache
Precision Selection:
| Precision | MKL-DNN Config | Performance | Accuracy | Use Case |
|---|---|---|---|---|
| FP32 | enable_mkldnn() | Baseline | Full precision | Default |
| BF16 | enable_mkldnn_bfloat16() | ~2x faster | Minimal loss | Intel CPUs with BF16 support |
Sources: tools/infer/utility.py387-396
The memory optimizer enabled in tools/infer/utility.py410 (config.enable_memory_optim()) reuses tensor memory across operators, reducing peak memory consumption during inference.
Best Practice: Always enable unless debugging memory-related issues.
Sources: tools/infer/utility.py410
Certain IR optimization passes are incompatible with specific models or CPU execution modes. PaddleOCR explicitly removes problematic passes:
Pass Deletion Rationale:
- conv_transpose_eltwiseadd_bn_fuse_pass: may cause numerical issues on CPU
- matmul_transpose_reshape_fuse_pass: incompatible with dynamic shapes
- fc_fuse_pass: causes errors in table recognition models

Sources: tools/infer/utility.py412-421
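The deletions reduce to config.delete_pass calls. A self-contained sketch, with a stub in place of paddle.inference.Config; note that the real code applies some deletions only for specific models:

```python
class StubConfig:
    # Minimal stand-in for paddle.inference.Config
    def __init__(self):
        self.deleted = []

    def delete_pass(self, name):
        # Remove an IR optimization pass from the predictor's pipeline
        self.deleted.append(name)


PROBLEM_PASSES = [
    "conv_transpose_eltwiseadd_bn_fuse_pass",
    "matmul_transpose_reshape_fuse_pass",
    "fc_fuse_pass",
]

config = StubConfig()
for p in PROBLEM_PASSES:
    config.delete_pass(p)
```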
Modern PaddlePaddle versions support an improved IR (Intermediate Representation) and executor:
Benefits:
Compatibility: Feature-gated to ensure compatibility with older PaddlePaddle versions.
Sources: tools/infer/utility.py404-407
Issue 1: Low CPU utilization
Symptom: CPU usage stays below 50% during inference
Solution: Increase cpu_threads parameter to match CPU core count:
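For example (image path is a placeholder; nproc reports the logical core count):

```bash
python3 tools/infer/predict_system.py \
    --image_dir="./doc/imgs/" \
    --use_gpu=False \
    --cpu_threads="$(nproc)"
```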
Issue 2: Memory leak during continuous inference
Symptom: Memory usage grows over time with varying input shapes
Cause: MKL-DNN cache grows unbounded
Solution: The MKL-DNN cache capacity is automatically set to 10 in tools/infer/utility.py390. If issues persist, consider disabling MKL-DNN for dynamic-shape workloads.
Issue 3: Slower performance on AMD CPUs
Symptom: Inference is slower than expected on AMD processors
Solution: Disable MKL-DNN and use OpenBLAS:
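For example (flags as documented above; the image path is a placeholder):

```bash
python3 tools/infer/predict_system.py \
    --image_dir="./doc/imgs/" \
    --enable_mkldnn=False
```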
For C++ builds:
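A hedged sketch using the CMake options described earlier: switch the MKL option off and rebuild, which links OpenBLAS instead.

```bash
cmake .. -DWITH_MKL=OFF -DWITH_GPU=OFF
make -j"$(nproc)"
```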
Issue 4: First inference much slower than subsequent runs
Symptom: Initial prediction takes 2-10x longer than later predictions
Cause: MKL-DNN kernel compilation and optimization on first run
Solution: Use warmup iterations in tools/infer/predict_system.py202-205:
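A hedged warmup sketch: the real code in predict_system.py runs dummy images through the initialized OCR system; here predict_fn stands in for that call so the snippet is self-contained:

```python
import numpy as np


def warmup(predict_fn, iterations=2, shape=(640, 640, 3)):
    # Push a few dummy images through the pipeline so MKL-DNN compiles
    # and caches its kernels before the first real request arrives.
    dummy = np.random.uniform(0, 255, shape).astype(np.uint8)
    for _ in range(iterations):
        predict_fn(dummy)


shapes_seen = []
warmup(lambda img: shapes_seen.append(img.shape))
```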
Sources: tools/infer/utility.py389-391 tools/infer/predict_system.py202-205
CPU optimization in PaddleOCR provides substantial performance improvements through three primary mechanisms:
Intel MKL-DNN/oneDNN: Highly optimized kernels for Intel CPUs, controlled by enable_mkldnn flag with cache capacity management to prevent memory leaks
Threading Configuration: Adjustable CPU thread count via cpu_threads parameter (default 10), allowing optimization for different hardware and workload characteristics
Build-time Optimizations: CMake options (WITH_MKL, WITH_GPU=OFF) and compiler flags (-O3, OpenMP) for C++ deployment
Key Recommendations:
- Set cpu_threads to the physical core count for optimal throughput

Sources: tools/infer/utility.py177-435 deploy/cpp_infer/CMakeLists.txt1-244