Curie is the first AI-agent framework designed for automated and rigorous scientific experimentation. Curie helps answer your curiosity through end-to-end experimentation automation, ensuring that every step—from hypothesis formulation to result interpretation—is conducted with precision, reliability, and reproducibility.
## Key Features
- 🚀 Automated Experimentation – End-to-end workflow management: hypothesis formulation, experiment setup, experiment execution, result analysis, and finding reflection.
- 📊 Rigor Enhancement – Built-in verification modules enforce methodical procedures, reliability, and interpretability.
- 🔬 Broad Applicability – Supports ML research, system analysis, and scientific discovery.
- 📖 Experimentation Benchmark – Provides 46 questions from 4 Computer Science domains, based on influential papers and open-source projects (`benchmark/experimentation_bench`).
## Installation

- Install Docker: https://docs.docker.com/engine/install/ubuntu/.
- Grant permission to Docker:
  ```bash
  sudo chmod 666 /var/run/docker.sock
  ```
  Run `docker ps` to check that the Docker daemon is accessible.
- Clone the repository:
  ```bash
  git clone https://github.com/Just-Curieous/Curie.git
  cd Curie
  ```
- Put your LLM API credentials in `curie/setup/env.sh`. Example:
  ```bash
  export MODEL="gpt-4o"
  export OPENAI_API_KEY="sk-xxx"
  ```
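As a sanity check, you can confirm the credentials actually load before launching Curie. The sketch below is self-contained, so it writes placeholder values to a temporary file instead of touching `curie/setup/env.sh`; that Curie sources this file at startup is an assumption based on the setup path above.

```shell
# Sketch: the env.sh format shown above, written to a temp file here so the
# snippet is self-contained -- put your real values in curie/setup/env.sh
tmp_env=$(mktemp)
cat > "$tmp_env" <<'EOF'
export MODEL="gpt-4o"
export OPENAI_API_KEY="sk-xxx"
EOF

# Source the file and confirm the variables are exported
. "$tmp_env"
echo "model=$MODEL"   # → model=gpt-4o
rm -f "$tmp_env"
```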
- Build the container image (this will take a few minutes). Note: you may need to set up a virtual environment before running `pip install`.
  ```bash
  pip install -e .
  docker images -q exp-agent-image | xargs -r docker rmi -f  # remove any existing conflicting image
  cd curie && docker build --no-cache --progress=plain -t exp-agent-image -f ExpDockerfile_default .. && cd -
  ```
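A quick post-build check can catch a failed build before you start an experiment. This is only a sketch; the image name comes from the build command above.

```shell
# Sketch: verify that exp-agent-image exists after the build step
if docker images -q exp-agent-image 2>/dev/null | grep -q .; then
  status="ready"
else
  status="missing"   # rerun the docker build step above
fi
echo "exp-agent-image: $status"
```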
## Quick Start

Use the following command to input your research question or problem statement:
```bash
python3 -m curie.main -q "<Your research question>"
```

Example 1:
```bash
python3 -m curie.main \
    -q "How does the choice of sorting algorithm impact runtime performance across different \
input distributions (random, nearly sorted, reverse sorted)?" --report
```
- Estimated runtime: ~5 minutes
- Sample log file: Available here
- Experiment report: Available here.
- Logs and Reproducibility:
  - Real-time logs are streamed to the console.
  - Experiment logs and the experiment report (`--report`) are stored in `logs/research_<ID>/`.
  - The full experimentation process (code, scripts, and real results) is saved in `workspace/research_<ID>/`.
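If you script around Curie, the newest run's artifacts can be located from that layout. A minimal sketch, assuming run directories sort by modification time and that the same `<ID>` links `logs/` to `workspace/`:

```shell
# Sketch: find the most recent experiment's log and workspace directories
# (directory layout from this README; the <ID> naming is assumed)
latest=$(ls -td logs/research_* 2>/dev/null | head -n 1)
if [ -n "$latest" ]; then
  id=${latest#logs/research_}          # strip the prefix to recover <ID>
  echo "logs:      $latest"
  echo "workspace: workspace/research_$id/"
else
  echo "no runs found under logs/"
fi
```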
Example 2: How does the choice of activation function (e.g., ReLU, sigmoid, tanh) impact the model training convergence rate?
```bash
python3 -m curie.main -f benchmark/junior_ml_engineer_bench/q1_activation_func.txt --report
```
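As the example above suggests, `-f` reads the research question from a plain-text file, which is handy for longer problem statements you want to keep under version control. A sketch with a hypothetical file name and question:

```shell
# Sketch: put a longer research question in a file and pass it with -f
# (my_question.txt and its contents are hypothetical; the benchmark file
# above is real)
cat > my_question.txt <<'EOF'
How does batch size interact with learning rate when training a small CNN,
measured by epochs needed to reach 90% training accuracy?
EOF

# python3 -m curie.main -f my_question.txt --report
head -n 1 my_question.txt
```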
If you have a dataset but are unsure how to start training/deploying your ML models to achieve your goals, simply provide your dataset and question to Curie:
```bash
python3 -m curie.main -q 'Example: How to improve my prediction accuracy on my dataset' \
    --task_config curie/configs/mle.json \
    --dataset_dir <path to your dataset> \
    --report
```
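`--dataset_dir` takes a path to a directory of data files. The layout below is purely hypothetical (a single CSV), shown only to make the command above concrete; point the flag at your real data.

```shell
# Sketch: a minimal dataset directory to pass via --dataset_dir
# (hypothetical layout -- Curie is given whatever is under this path)
mkdir -p my_dataset
printf 'feature,label\n1.0,0\n2.0,1\n' > my_dataset/train.csv

# python3 -m curie.main -q 'How to improve my prediction accuracy on my dataset' \
#     --task_config curie/configs/mle.json \
#     --dataset_dir my_dataset --report
wc -l < my_dataset/train.csv
```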
- You can include your own starter code by adding the argument `--workspace_name <path_to_your_workspace>`.
- Check out an example from MLE-Bench.
Check out more computational questions, as well as Machine Learning questions and Machine Learning Systems questions.
- How to let Curie work on your own starter files?
- How to reproduce the results in "Large Language Monkeys".
Curie is designed for scientific discovery across multiple domains:
- 🔬 Machine Learning & AI Research – Hyperparameter tuning and algorithm behavior.
- 💻 System Performance Analysis – Benchmarking systems, optimizing configurations, investigating system trade-offs.
- 🧪 Algorithmic & Scientific Discovery – Validating hypotheses, automating computational simulations.
For any issues or feature requests, please open an issue on our GitHub Issues page.
Curie is released under the Apache 2.0 License. See LICENSE for more details.