Optimize Age Verification for Enhanced Performance [omron-subnet-competition-1] by JSnobody · Pull Request #975 · zkonduit/ezkl

Optimize Age Verification for Enhanced Performance [omron-subnet-competition-1] #975


Status: Open. JSnobody wants to merge 3 commits into base: main.


JSnobody commented Apr 27, 2025

Optimization of AI Model Proving Time and Memory Usage in Age Verification Competition 🚀

Background

This PR addresses the optimization of age verification as part of the EZKL competition. The goal is to enhance the efficiency and accuracy of the age verification process through model and circuit optimizations.

  1. Memory Constraints: The original competition model, age.onnx, requires more than 128 GB of memory during circuit generation and proof computation. The objective is for it to run effectively on a machine with 128 GB of RAM; the competition recommends the Mac2-M1 Ultra (AWS) with 128 GB RAM.

  2. Prove Time Limitations: The proof generation time for the original model is excessively long, necessitating significant optimization. The competition recommends using the Apple M1 Ultra (20-cores) and Metal for optimal performance.

  3. Precision Maintenance: To reduce memory usage and accelerate proof generation, adjustments to the circuit are required while ensuring a high level of accuracy is maintained.

Environment

Hardware Configuration

This PR optimizes both the model and the circuit, significantly reducing memory requirements. As a result, a machine with only a few GB of RAM is sufficient. There are no specific CPU requirements, and a GPU can optionally be used. 🖥️

Software Configuration

The actual software environment for running this PR is:

  • Python Version: 3.12
  • Python Packages:
    pip install opencv-python numpy onnxruntime torch torchvision torchaudio requests tqdm
  • Branch: Based on v19.0.7
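
As a quick sanity check before running the notebook, the following minimal sketch reports which of the packages listed above are not importable. The helper name `missing_modules` is hypothetical and not part of this PR; the module names are the usual import names for the pip packages (for example, opencv-python is imported as cv2).

```python
import importlib.util
import sys

# Import names corresponding to the pip packages listed above.
# Note: opencv-python is imported as "cv2".
REQUIRED_MODULES = [
    "cv2", "numpy", "onnxruntime", "torch",
    "torchvision", "torchaudio", "requests", "tqdm",
]


def missing_modules(modules):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The PR was run on Python 3.12; other versions are untested here.
    print("Python version:", ".".join(map(str, sys.version_info[:3])))
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```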

Notice ⚠️

The Omron subnet competition on Bittensor uses v19.0.7, so this PR is developed against v19.0.7 and should be tested and validated on that version. Other versions have not been validated yet. ❗

Changes Made

  • AI Model Optimization:

    • Optimized the original model age.onnx, including pruning and restructuring, to create a minimal network model that improves performance.
  • Circuit Generation:

    • We achieved a balance between model inference accuracy, memory usage during proof, and proof time by adjusting the following parameters for optimal performance:
      1. num_inner_cols: Influences the number of advice polynomials during layout.
      2. input_scale and param_scale: Used for quantizing weights, biases, and input images from floating-point to finite field representations. The quantization is defined by the equation: in_f = round(in_float32 * (1 << input_scale)) mod q, where q is the order of the finite field. Smaller scales may reduce accuracy, while larger scales increase computational load during inference.
      3. scale_rebase_multiplier: After quantization, multiplication operations (e.g., convolutions) may increase output scale, potentially leading to proof failures. For instance, leaky_relu requires outputs to fit within a base 16384 representation of length 2; exceeding 16384^2 results in failure. To mitigate this, we introduce division operations based on scale_rebase_multiplier. If op_out_scale > (input_scale * scale_rebase_multiplier), a division op is added. A larger multiplier can cause proof failures, while a smaller one adds more division ops, increasing trace length and memory consumption.

    Through extensive experimentation and analysis, we identified optimal combinations of num_inner_cols, input_scale, param_scale, and scale_rebase_multiplier to achieve the best overall performance.
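
The quantization equation and the rebase condition described above can be sketched as follows. This is an illustration under stated assumptions, not ezkl's implementation: the function names are hypothetical, the field order is assumed to be that of the BN254 scalar field, and the range check simply restates the 16384^2 bound quoted above.

```python
# Assumption: q is the order of the BN254 scalar field.
BN254_FR = 21888242871839275222246405745257275088548364400416034343698204186575808495617

LOOKUP_BASE = 16384  # base quoted above for the leaky_relu range check
LOOKUP_LEN = 2       # representation length quoted above


def quantize(x: float, scale: int, q: int = BN254_FR) -> int:
    """in_f = round(x * (1 << scale)) mod q. Negative values wrap to q - |v|."""
    # Python's round() rounds halves to even; other implementations may differ.
    return round(x * (1 << scale)) % q


def dequantize(v: int, scale: int, q: int = BN254_FR) -> float:
    """Map a field element back to a float, reading values above q // 2 as negative."""
    signed = v - q if v > q // 2 else v
    return signed / (1 << scale)


def needs_rebase(op_out_scale: int, input_scale: int, scale_rebase_multiplier: int) -> bool:
    """Condition from the text: a division op is added when this returns True."""
    return op_out_scale > input_scale * scale_rebase_multiplier


def fits_range_check(signed_value: int) -> bool:
    """True if |value| stays below 16384^2, the bound quoted for leaky_relu."""
    return abs(signed_value) < LOOKUP_BASE ** LOOKUP_LEN
```

For example, multiplying an input at scale 7 by a weight at scale 7 yields an output at scale 14, because fixed-point scales add under multiplication. With scale_rebase_multiplier = 1 the condition 14 > 7 holds and a division is inserted; with scale_rebase_multiplier = 2 the condition 14 > 14 fails and the output keeps its larger scale.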

  • Performance Enhancements:

    • Improved the Halo2 multi-open proof mechanism to enhance overall performance. We found that the current Halo2 multi-open proof implementation performs unnecessarily large memory allocations and redundant computations. By restructuring the computation within the proof to avoid these inefficiencies, we reduced proof time by approximately 10%.

Modified Files

File | Status | Description
./example/notebooks/age_verification_optimize.ipynb | Added | Download optimized circuits and provide a complete Age Verification Benchmark test.

Benchmark Process

  1. Download the Original Model 📥

  2. Download the Test Image Dataset and resize the images 📷✨

  3. Download Several Optimized Circuits 🔌

  4. Randomly Select 10 Resized Images for each circuit:

    • Test Accuracy
    • Measure Proof Time ⏱️
    • Evaluate Proof Size 📏
    • Calculate the average for the 10 images 📊
    • You can also change the number of test images to suit your requirements.
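
The per-circuit sampling and averaging in step 4 can be sketched as below. This is not the notebook's code: `benchmark_circuit` and the `prove_and_score` callback are hypothetical names, and the callback is assumed to return an accuracy score together with the raw proof bytes for one image.

```python
import random
import statistics
import time


def benchmark_circuit(images, prove_and_score, n_samples=10, seed=None):
    """Randomly pick `n_samples` images and average accuracy, proof time, and proof size.

    `prove_and_score(image)` is a hypothetical callback that runs inference and
    proving for one image and returns (accuracy, proof_bytes).
    """
    rng = random.Random(seed)
    images = list(images)
    sample = rng.sample(images, min(n_samples, len(images)))

    accuracies, seconds, sizes = [], [], []
    for image in sample:
        start = time.perf_counter()
        accuracy, proof = prove_and_score(image)
        seconds.append(time.perf_counter() - start)  # wall-clock time per proof
        accuracies.append(accuracy)
        sizes.append(len(proof))                     # proof size in bytes

    return {
        "n_images": len(sample),
        "avg_accuracy": statistics.mean(accuracies),
        "avg_proof_time_s": statistics.mean(seconds),
        "avg_proof_size_bytes": statistics.mean(sizes),
    }
```

Passing a fixed `seed` makes the random selection repeatable, which helps when comparing two circuits on the same images.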

Note: The testing methods and code follow Omron's benchmark, which we reference and cite. Memory usage is not recorded by the benchmark itself; it can be monitored with system commands while the benchmark runs. 🖥️
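
As an alternative to watching system commands by hand, peak memory of a proving command can be read from the operating system after the process exits. The sketch below is an assumption-laden illustration, not part of this PR: the helper name is hypothetical, it works on Unix-like systems only, and the command to measure is whatever the caller passes in.

```python
import resource
import subprocess
import sys


def run_and_measure_peak_rss(cmd):
    """Run `cmd` as a child process and return a peak resident set size in bytes.

    Caveat: RUSAGE_CHILDREN reports the largest peak among all child processes
    this interpreter has waited for so far, so call this from a fresh process
    when measuring a single command.
    """
    subprocess.run(cmd, check=True)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    unit = 1 if sys.platform == "darwin" else 1024
    return peak * unit


if __name__ == "__main__":
    # Placeholder command; substitute the actual proving invocation.
    peak_bytes = run_and_measure_peak_rss([sys.executable, "-c", "pass"])
    print(f"peak RSS: {peak_bytes / 1e9:.3f} GB")
```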

Execution Instructions

  1. Compile ezkl:

    • If you wish to utilize GPU acceleration for enhanced performance, execute the following command:
      cargo build --release --bin ezkl --features ezkl,icicle
    • This will compile the ezkl binary with GPU support, optimizing the execution of your computations.
  2. Run the Optimization:

    • To run the optimization process, use the Notebook age_verification_optimize.ipynb. This program will automatically download the generated circuits, test datasets, and benchmarks, streamlining the workflow for efficient age verification model evaluation. 🛠️

Results

Actual Benchmark Results:

  • Hardware Configuration:

    Component | Specification
    CPU | AMD Ryzen 9 9950X (16 cores)
    RAM | 32 GB
    GPU | 1 x RTX 4090
  • Results:

    Circuit | Average Raw Accuracy | Average Proof Size | Average Response Time | Memory Usage | Verification Result
    Circuit 1 | 0.95 ~ 0.96 | 3.072 KB | ~1.6s | ~2GB |
    Circuit 2 | ~0.96 | 3.072 KB | ~1.5s | ~2GB |

alexander-camuto (Collaborator) commented:

ty have finally made it to you after reviewing other submissions -- ty for your patience
