Optimize Age Verification for Enhanced Performance [omron-subnet-competition-1] #975

JSnobody · 2025-04-27T16:12:09Z

Optimization of AI Model Proving Time and Memory Usage in Age Verification Competition 🚀

Background

This PR addresses the optimization of age verification as part of the EZKL competition. The goal is to enhance the efficiency and accuracy of the age verification process through model and circuit optimizations.

Memory Constraints: The original model provided for the competition, age.onnx, requires more than 128GB of memory during circuit generation and proof computation. The objective is to make it run effectively on a machine with 128GB of RAM. The competition recommends using the Mac2-M1 Ultra (AWS) with 128GB RAM.
Prove Time Limitations: Proof generation for the original model takes excessively long and needs significant optimization. The competition recommends using the Apple M1 Ultra (20 cores) and Metal for optimal performance.
Precision Maintenance: To reduce memory usage and accelerate proof generation, the circuit must be adjusted while maintaining a high level of accuracy.

Environment

Hardware Configuration

This PR includes optimizations to the model and circuit that significantly reduce memory requirements. As a result, hardware with only a few GB of RAM is sufficient. There are no specific CPU requirements, and GPUs can also be used. 🖥️

Software Configuration

The actual software environment for running this PR is:

Python Version: 3.12

Python Packages:

pip install opencv-python numpy onnxruntime torch torchvision torchaudio requests tqdm

Branch: Based on v19.0.7

Notice ⚠️

Because the Omron subnet competition on Bittensor uses v19.0.7, this PR is developed on v19.0.7 and should be tested and validated against that version. Other versions have not been validated yet. ❗

Changes Made

AI Model Optimization:
- Conducted optimizations on the original model age.onnx, including pruning and restructuring to create a minimal network model that improves performance.
Circuit Generation:
- We achieved a balance between model inference accuracy, memory usage during proof, and proof time by adjusting the following parameters for optimal performance:
  1. num_inner_cols: Influences the number of advice polynomials during layout.
  2. input_scale and param_scale: Used for quantizing weights, biases, and input images from floating-point to finite field representations. The quantization is defined by the equation: in_f = round(in_float32 * (1 << input_scale)) mod q, where q is the order of the finite field. Smaller scales may reduce accuracy, while larger scales increase computational load during inference.
  3. scale_rebase_multiplier: After quantization, multiplication operations (e.g., convolutions) may increase output scale, potentially leading to proof failures. For instance, leaky_relu requires outputs to fit within a base 16384 representation of length 2; exceeding 16384^2 results in failure. To mitigate this, we introduce division operations based on scale_rebase_multiplier. If op_out_scale > (input_scale * scale_rebase_multiplier), a division op is added. A larger multiplier can cause proof failures, while a smaller one adds more division ops, increasing trace length and memory consumption.
Through extensive experimentation and analysis, we identified optimal combinations of num_inner_cols, input_scale, param_scale, and scale_rebase_multiplier to achieve the best overall performance.
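
As a rough illustration of the quantization formula and the rebase rule described above, here is a minimal Python sketch. It is not the EZKL implementation: the function names are hypothetical, and the modulus Q is assumed to be the BN254 scalar field order, which the PR text only calls "the order of the finite field".

```python
# Assumption: q is the BN254 scalar field order; the PR text does not name the field.
Q = 21888242871839275222246405745257275088548364400416034343698204186575808495617


def quantize(x: float, scale: int, q: int = Q) -> int:
    """Map a float to a field element: round(x * 2^scale) mod q."""
    return round(x * (1 << scale)) % q


def dequantize(v: int, scale: int, q: int = Q) -> float:
    """Inverse mapping, treating the upper half of the field as negative values."""
    signed = v - q if v > q // 2 else v
    return signed / (1 << scale)


def needs_rebase(op_out_scale: int, input_scale: int, scale_rebase_multiplier: int) -> bool:
    """Rule quoted in the PR: add a division op when the output scale exceeds
    input_scale * scale_rebase_multiplier."""
    return op_out_scale > input_scale * scale_rebase_multiplier


if __name__ == "__main__":
    print(quantize(0.5, 7))                   # 0.5 * 2^7
    print(dequantize(quantize(-0.5, 7), 7))   # round trip of a negative value
    print(needs_rebase(14, 7, 1), needs_rebase(14, 7, 2))
```

The scale values used here (7 and 14) are illustrative only. Multiplying two values quantized at scale 7 yields a product at scale 14, which is the situation in which the rule decides whether a division op is inserted.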
Performance Enhancements:
- Improved the Halo2 multi-open proof mechanism to enhance overall performance. We found that the current Halo2 multi-open proof implementation incurs unnecessarily large memory allocations and redundant computations. By reworking the computation within the proof to avoid these inefficiencies, we reduced proof time by approximately 10%.
  - Related PR: Optimize CPU Algorithm for Halo2 Multi-Open Proof

Modified Files

File	Status	Description
./example/notebooks/age_verification_optimize.ipynb	Added	Download optimized circuits and provide a complete Age Verification Benchmark test.

Benchmark Process

Download the Original Model 📥
Download the Test Image Dataset and resize the images 📷✨
Download Several Optimized Circuits 🔌
Randomly Select 10 Resized Images for each circuit:
- Test Accuracy ✅
- Measure Proof Time ⏱️
- Evaluate Proof Size 📏
- Calculate the average for the 10 images 📊
- You can also change the number of test images to suit your requirements.
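
The sampling and averaging steps above can be sketched as follows. This is a simplified outline, not the notebook's code: `run_one` is a hypothetical callable standing in for whatever performs inference, proving, and verification on a single image, and the metric key names are assumptions.

```python
import random
from statistics import mean
from typing import Callable, Dict, Optional, Sequence


def benchmark_circuit(
    images: Sequence[str],
    run_one: Callable[[str], Dict[str, float]],
    sample_size: int = 10,
    seed: Optional[int] = None,
) -> Dict[str, float]:
    """Randomly pick up to `sample_size` images and average the per-image metrics."""
    if not images:
        raise ValueError("no images to benchmark")
    rng = random.Random(seed)
    chosen = rng.sample(list(images), k=min(sample_size, len(images)))
    # run_one is hypothetical: it should return accuracy, proof time and proof size
    results = [run_one(image) for image in chosen]
    return {
        "accuracy": mean(r["accuracy"] for r in results),
        "proof_time_s": mean(r["proof_time_s"] for r in results),
        "proof_size_bytes": mean(r["proof_size_bytes"] for r in results),
        "samples": len(results),
    }
```

Passing a fixed `seed` makes the random selection reproducible, which helps when comparing circuits on the same images.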

Note: We referenced and cited the testing methods and code from Omron's benchmark. Memory usage can be monitored using system commands. 🖥️
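
As one possible way to monitor memory without separate system commands, the sketch below reads the peak resident set size of a child process through Python's standard `resource` module (Unix only). The helper name is hypothetical, and the command to measure is left to the caller. The PR does not state which tool was used for its own measurements.

```python
import resource
import subprocess
import sys
from typing import List


def peak_child_rss_mib(cmd: List[str]) -> float:
    """Run `cmd` to completion and return the peak RSS of child processes in MiB.

    RUSAGE_CHILDREN reports the high-water mark over all children this process
    has waited for so far, so call it from a fresh process for a clean reading.
    """
    subprocess.run(cmd, check=True)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


if __name__ == "__main__":
    # Placeholder command; substitute the proving command you want to measure.
    print(f"{peak_child_rss_mib([sys.executable, '-c', 'pass']):.1f} MiB")
```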

Execution Instructions

Compile ezkl:
- If you wish to utilize GPU acceleration for enhanced performance, execute the following command:
```
cargo build --release --bin ezkl --features ezkl,icicle
```
- This will compile the ezkl binary with GPU support, optimizing the execution of your computations.
Run the Optimization:
- To run the optimization process, use the Notebook age_verification_optimize.ipynb. This program will automatically download the generated circuits, test datasets, and benchmarks, streamlining the workflow for efficient age verification model evaluation. 🛠️

Results

Actual Benchmark Results:

Hardware Configuration:

Component	Specification
CPU	AMD Ryzen 9 9950X (16 cores)
RAM	32 GB
GPU	1 x RTX 4090

Results:

Circuit	Average Raw Accuracy	Average Proof Size	Average Response Time	Memory Usage	Verification Result
Circuit 1	0.95 ~ 0.96	3.072 KB	~1.6s	~2GB	✅
Circuit 2	~0.96	3.072 KB	~1.5s	~2GB	✅

…some logs

alexander-camuto · 2025-05-19T00:48:49Z

ty have finally made it to you after reviewing other submissions -- ty for your patience

JSnobody added 3 commits April 27, 2025 15:44

[omron-subnet-competition-1] Submit age verification optimization

f1ab4bc

[omron-subnet-competition-1] Modify log info

e2150a0

[omron-subnet-competition-1] Modify multi images test flow and close …

51ae32a

…some logs
