eZAutoML
eZAutoML is a framework designed to make Automated Machine Learning (AutoML) accessible to everyone. It provides an easy-to-use interface, based on the scikit-learn API, for building modelling pipelines with minimal effort.
The framework is built around a few core concepts:
- Optimizers: Black-box optimization methods for tuning hyperparameters.
- Easy Tabular Pipelines: A simple domain-specific language for describing preprocessing and model-training pipelines.
- Scheduling (work in progress): Horizontal scaling from a single computer to a data center using Airflow executors.
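The Optimizers concept treats hyperparameter tuning as black-box optimization: the optimizer only sees the score each configuration produces, not how it was computed. As a rough illustration, here is a minimal random search in plain Python. The Python example further down imports a `RandomSearchOptimizer` from eZAutoML, but this sketch is not its implementation; the `Space` format, `random_search`, and `toy_loss` are hypothetical stand-ins. In a real pipeline the objective would train a model and return its validation score.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical space format: hyperparameter name -> function that samples a value.
Space = Dict[str, Callable[[random.Random], float]]


def random_search(
    objective: Callable[[Dict[str, float]], float],
    space: Space,
    n_trials: int,
    seed: int = 0,
    minimize: bool = True,
) -> Tuple[Dict[str, float], float, List[Tuple[Dict[str, float], float]]]:
    """Black-box random search: sample configs, score them, keep the best.

    Only the objective's return value is used (no gradients or internals),
    which is what makes this a black-box method.
    """
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    rng = random.Random(seed)  # seeded for reproducible sampling
    history: List[Tuple[Dict[str, float], float]] = []
    best_config: Optional[Dict[str, float]] = None
    best_score: Optional[float] = None
    for _ in range(n_trials):
        config = {name: sample(rng) for name, sample in space.items()}
        score = objective(config)
        history.append((config, score))
        improved = best_score is None or (
            score < best_score if minimize else score > best_score
        )
        if improved:
            best_config, best_score = config, score
    return best_config, best_score, history


def toy_loss(config: Dict[str, float]) -> float:
    """Stand-in for a validation loss; its minimum is at x=3, y=-1."""
    return (config["x"] - 3) ** 2 + (config["y"] + 1) ** 2


space: Space = {
    "x": lambda rng: rng.uniform(-10, 10),
    "y": lambda rng: rng.uniform(-10, 10),
}

best, best_loss, history = random_search(toy_loss, space, n_trials=200, seed=42)
print(f"best config: {best}, loss: {best_loss:.4f}")
```

Random search is the simplest black-box strategy; smarter optimizers differ mainly in how they choose the next configuration from the history, while the loop of "propose, evaluate, record" stays the same.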
The latest version of eZAutoML can be installed from PyPI or from source.
pip install ezautoml
ezautoml --help
To install from source, clone this repository and install it with pip:
pip install -e .
Besides the Python interface, eZAutoML provides a lightweight CLI that builds a tabular AutoML pipeline with a single command, for example:
ezautoml --dataset data/smoking.csv --target smoking --task classification --trials 10 --verbose
Options:
- dataset: Path to the dataset file (CSV, Parquet, ...)
- target: The target column name for prediction
- task: Task type: classification/c or regression/r
- search: Black-box optimization algorithm to use
- output: Directory to save the output models/results
- trials: Maximum number of trials the optimization algorithm runs
- verbose: Increase logging verbosity
- version: Show the current version
For more detailed help, use:
ezautoml --help
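To show how the documented options fit together, here is a hypothetical `argparse` parser that mirrors them. It is not eZAutoML's actual CLI code: `build_parser`, `parse_task`, and `TASK_ALIASES` are illustrative names, `--verbose` is modelled as a simple on/off flag, `--version` is omitted, and no defaults are set because the options list does not document any.

```python
import argparse

# Aliases for --task taken from the options list: classification/c, regression/r.
TASK_ALIASES = {
    "classification": "classification",
    "c": "classification",
    "regression": "regression",
    "r": "regression",
}


def parse_task(value: str) -> str:
    """Normalize a task alias to its full name, rejecting anything else."""
    try:
        return TASK_ALIASES[value.lower()]
    except KeyError:
        raise argparse.ArgumentTypeError(
            f"invalid task {value!r}: expected classification/c or regression/r"
        ) from None


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not eZAutoML's code)."""
    parser = argparse.ArgumentParser(prog="ezautoml")
    parser.add_argument("--dataset", required=True, help="path to the dataset file (CSV, Parquet, ...)")
    parser.add_argument("--target", required=True, help="target column name for prediction")
    parser.add_argument("--task", required=True, type=parse_task, help="classification/c or regression/r")
    # No defaults: the options list does not document them.
    parser.add_argument("--search", help="black-box optimization algorithm to use")
    parser.add_argument("--output", help="directory to save output models/results")
    parser.add_argument("--trials", type=int, help="maximum number of optimization trials")
    parser.add_argument("--verbose", action="store_true", help="increase logging verbosity")
    return parser


# Parse the same arguments as the CLI example, using the short task alias "c".
args = build_parser().parse_args(
    ["--dataset", "data/smoking.csv", "--target", "smoking",
     "--task", "c", "--trials", "10", "--verbose"]
)
print(args.task, args.trials, args.verbose)  # classification 10 True
```

Normalizing `c`/`r` in a `type=` callback means the rest of the program only ever sees the full task names, and an unknown task fails at parse time with a usage message.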
Several features are still a work in progress and will be enabled in future releases, including scheduling, meta-learning, and pipelines.
You can also use eZAutoML from Python scripts, although this interface is still under development and will support custom pipelines in the future. For example:
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from ezautoml.model import eZAutoML
from ezautoml.space.search_space import SearchSpace
from ezautoml.evaluation.metric import MetricSet, Metric
from ezautoml.evaluation.task import TaskType
from ezautoml.optimization.optimizers.random_search import RandomSearchOptimizer
# Load dataset (classification example)
data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Define metrics for classification
metrics = MetricSet(
{"accuracy": Metric(name="accuracy", fn=accuracy_score, minimize=False)},
primary_metric_name="accuracy"
)
# Load classification search space
search_space = SearchSpace.from_builtin("classification_space")
# Initialize eZAutoML for classification
ezautoml = eZAutoML(
search_space=search_space,
task=TaskType.CLASSIFICATION,
metrics=metrics,
max_trials=25,
max_time=600,
seed=42
)
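# Run the search on the training split
ezautoml.fit(X_train, y_train)
# Evaluate on the held-out test split
test_accuracy = ezautoml.test(X_test, y_test)
print(f"Test accuracy: {test_accuracy}")
# Print a summary of the best trials (k=10)
ezautoml.summary(k=10)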
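In the example above, `minimize=False` marks accuracy as a metric where higher is better, and `primary_metric_name` presumably selects the metric used to rank trials. The plain-Python sketch below shows how a primary metric and its direction could produce a top-k ranking like the one `summary(k=10)` reports. `TrialRecord` and `top_k` are hypothetical helpers, not part of eZAutoML, and the scores are made-up placeholders rather than real results.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TrialRecord:
    """Hypothetical record of one finished trial."""
    model: str
    scores: Dict[str, float]


def top_k(trials: List[TrialRecord], primary: str, minimize: bool, k: int) -> List[TrialRecord]:
    """Rank trials by the primary metric, best first, and keep the first k."""
    # Maximized metrics (e.g. accuracy) sort descending; minimized ones ascending.
    return sorted(trials, key=lambda t: t.scores[primary], reverse=not minimize)[:k]


# Placeholder scores for illustration only.
trials = [
    TrialRecord("LogisticRegression", {"accuracy": 0.95}),
    TrialRecord("RandomForest", {"accuracy": 0.97}),
    TrialRecord("KNeighbors", {"accuracy": 0.93}),
]
for rank, trial in enumerate(top_k(trials, "accuracy", minimize=False, k=2), start=1):
    print(rank, trial.model, trial.scores["accuracy"])
# 1 RandomForest 0.97
# 2 LogisticRegression 0.95
```

Keeping the direction on the metric itself, rather than hard-coding "higher is better", lets the same ranking code handle losses such as mean squared error.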
We welcome contributions to eZAutoML! If you'd like to contribute, please fork the repository and submit a pull request with your changes. For detailed information on how to contribute, please refer to our contributing guide.
eZAutoML is licensed under the BSD 3-Clause License. See the LICENSE file for more information.