Optimize Age Verification for Enhanced Performance [omron-subnet-competition-1] #975
Optimization of AI Model Proving Time and Memory Usage in Age Verification Competition 🚀
Background
This PR addresses the optimization of age verification as part of the EZKL competition. The goal is to enhance the efficiency and accuracy of the age verification process through model and circuit optimizations.
Memory Constraints: The original model provided for the competition, age.onnx, requires an enormous amount of memory during circuit generation and proof computation, exceeding 128GB. The objective is to ensure it can run effectively on machines with at least 128GB of RAM. The competition recommends using the Mac2-M1 Ultra (AWS) with 128GB RAM.
Prove Time Limitations: The proof generation time for the original model is excessively long, necessitating significant optimization. The competition recommends using the Apple M1 Ultra (20-cores) and Metal for optimal performance.
Precision Maintenance: To reduce memory usage and accelerate proof generation, adjustments to the circuit are required while ensuring a high level of accuracy is maintained.
Environment
Hardware Configuration
This PR includes optimizations to the model and circuitry, significantly reducing memory requirements. As a result, hardware configurations with only a few GB of RAM are sufficient. There are no specific requirements for the CPU. GPUs can also be used. 🖥️
Software Configuration
The actual software environment for running this PR is:
Notice⚠️
The Omron subnet competition on Bittensor uses v19.0.7, so this PR is developed against v19.0.7 and must be tested and validated on that version. Other versions have not been validated yet. ❗
Changes Made
AI Model Optimization:
age.onnx, including pruning and restructuring to create a minimal network model that improves performance.
Circuit Generation:
in_f = round(in_float32 * (1 << input_scale)) mod q, where q is the order of the finite field. Smaller scales may reduce accuracy, while larger scales increase computational load during inference.
16384^2 results in failure. To mitigate this, we introduce division operations based on scale_rebase_multiplier. If op_out_scale > (input_scale * scale_rebase_multiplier), a division op is added. A larger multiplier can cause proof failures, while a smaller one adds more division ops, increasing trace length and memory consumption.
Through extensive experimentation and analysis, we identified optimal combinations of num_inner_cols, input_scale, param_scale, and scale_rebase_multiplier to achieve the best overall performance.
Performance Enhancements:
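As an illustration of the fixed-point encoding and the rebase rule described under Circuit Generation, here is a minimal Python sketch. It is not taken from the PR or from ezkl's source. The modulus Q is assumed to be the BN254 scalar field order (the PR only says "the order of the finite field"), and dequantize and needs_division are hypothetical helper names introduced for this example.

```python
# Assumption: q is the BN254 scalar field order. The PR text only refers to
# "the order of the finite field", so treat this constant as illustrative.
Q = 21888242871839275222246405745257275088548364400416034343698204186575808495617


def quantize(in_float32: float, input_scale: int, q: int = Q) -> int:
    """in_f = round(in_float32 * (1 << input_scale)) mod q, as stated in the PR."""
    return round(in_float32 * (1 << input_scale)) % q


def dequantize(in_f: int, input_scale: int, q: int = Q) -> float:
    """Hypothetical inverse: elements above q // 2 are read as negative values."""
    signed = in_f - q if in_f > q // 2 else in_f
    return signed / (1 << input_scale)


def needs_division(op_out_scale: int, input_scale: int,
                   scale_rebase_multiplier: int) -> bool:
    """Rebase rule from the PR: add a division op when the output scale
    exceeds input_scale * scale_rebase_multiplier."""
    return op_out_scale > input_scale * scale_rebase_multiplier


if __name__ == "__main__":
    # A smaller scale loses precision; a larger one keeps more of the value.
    for scale in (2, 7):
        encoded = quantize(0.3, scale)
        print(scale, encoded, dequantize(encoded, scale))
    # A smaller multiplier triggers the division op sooner.
    print(needs_division(14, 7, 1), needs_division(14, 7, 2))
```

The round trip shows the accuracy side of the trade-off: at scale 2 the value 0.3 decodes to 0.25, while at scale 7 it decodes to 0.296875.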
Modified Files
Benchmark Process
Download the Original Model 📥
Download the Test Image Dataset and resize the images 📷✨
Download Several Optimized Circuits 🔌
Randomly Select 10 Resized Images for each circuit:
Note: We referenced and cited the testing methods and code from Omron's benchmark. Memory usage can be monitored using system commands. 🖥️
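The sampling and measurement steps above could be scripted roughly as follows. This is a sketch under stated assumptions, not the code from Omron's benchmark: sample_per_circuit and run_and_measure are hypothetical names, the circuit and image identifiers are placeholders, and the memory figure comes from Python's standard resource module (Unix only) rather than from any specific system command.

```python
import random
import resource
import subprocess
import sys
import time


def sample_per_circuit(circuits, images, k=10, seed=None):
    """Pick k distinct images at random for each circuit (hypothetical helper)."""
    if len(images) < k:
        raise ValueError(f"need at least {k} images, got {len(images)}")
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    return {circuit: rng.sample(images, k) for circuit in circuits}


def run_and_measure(cmd):
    """Run a command and return (elapsed seconds, peak child RSS in bytes).

    ru_maxrss is a high-water mark over all child processes waited for so far,
    so it only reflects this command if it is the largest child run to date.
    """
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    peak_bytes = peak if sys.platform == "darwin" else peak * 1024
    return elapsed, peak_bytes


if __name__ == "__main__":
    # Placeholder names; substitute the real circuit and image paths.
    images = [f"image_{i:03d}.jpg" for i in range(100)]
    picks = sample_per_circuit(["circuit_a", "circuit_b"], images, k=10, seed=0)
    for circuit, chosen in picks.items():
        print(circuit, len(chosen))
    # Stand-in command; a real run would invoke the prover here.
    print(run_and_measure([sys.executable, "-c", "pass"]))
```

Sampling independently per circuit matches the "for each circuit" wording; to compare circuits on identical inputs, draw one sample and reuse it instead.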
Execution Instructions
Compile ezkl:
ezkl binary with GPU support, optimizing the execution of your computations.
Run the Optimization:
age_verification_optimize.ipynb. This program will automatically download the generated circuits, test datasets, and benchmarks, streamlining the workflow for efficient age verification model evaluation. 🛠️
Results
Actual Benchmark Results:
Hardware Configuration:
Results: