Databricks AutoML

Building a machine learning model traditionally requires significant expertise. A data scientist must understand which algorithm fits the problem, how to prepare the data correctly, which settings to tune, and how to evaluate the results. This process can take weeks even for experienced professionals. Organizations without dedicated data scientists often cannot benefit from machine learning at all.

Databricks AutoML changes this. It automates the most time-consuming and technical parts of building a machine learning model. AutoML examines your data, tries multiple algorithms and settings, evaluates the results, and presents the best model, all within a time budget you set (often minutes rather than weeks). It also generates the Python code for the experiments it runs, so data scientists can inspect, learn from, and customize the results rather than treating AutoML as a black box.

What AutoML Actually Does

AutoML is not magic. It is an automated version of the work a data scientist performs manually. Understanding what it automates helps you understand both its power and its limits.

When you provide a dataset and tell AutoML your goal — predict customer churn, forecast next month's revenue, classify support tickets by category — it runs through a systematic process:

  1. Data profiling — It examines your dataset to understand its structure, identifies the data types of each column, detects missing values, spots class imbalances in classification problems, and flags potential issues.
  2. Data preparation — It applies standard transformations: filling missing values, encoding categorical variables (converting text categories to numbers models can process), and scaling numerical features.
  3. Algorithm selection — It tries multiple algorithms appropriate for the problem type. For classification, it might test Logistic Regression, Decision Tree, Random Forest, XGBoost, and LightGBM. For regression, it tests similar options plus linear regression variants.
  4. Hyperparameter tuning — For each algorithm, it tries different settings to optimize performance.
  5. Evaluation and comparison — It evaluates every trial on held-out data (a train/validation/test split for classification and regression, time-ordered splits for forecasting) and ranks the results by the chosen metric.
  6. Result presentation — It displays a ranked leaderboard of all trials and generates downloadable notebooks with the code for each experiment.
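
The search-and-rank part of this process (steps 3 to 6) can be illustrated with a toy sketch. This is not AutoML's implementation: the data is synthetic, and the "candidates" are simple threshold rules whose cutoff stands in for a hyperparameter. It only shows the shape of the loop, which is to score each candidate on held-out rows and sort the results into a leaderboard.

```python
import random

def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1), computed from raw counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Synthetic held-out rows: (days_since_last_purchase, churned).
# Churn becomes likely once the customer has been inactive for about 60 days.
rng = random.Random(42)
held_out = []
for _ in range(400):
    days = rng.uniform(0, 100)
    churned = 1 if days + rng.gauss(0, 5) > 60 else 0
    held_out.append((days, churned))

# Hypothetical candidates: the same rule with different cutoffs.
candidates = [{"name": f"threshold_rule(cutoff={c})", "cutoff": c}
              for c in (20, 40, 60, 80)]

y_true = [label for _, label in held_out]
leaderboard = []
for cand in candidates:
    y_pred = [1 if days > cand["cutoff"] else 0 for days, _ in held_out]
    leaderboard.append((f1_score(y_true, y_pred), cand["name"]))

# Rank trials from best to worst, as a leaderboard would.
leaderboard.sort(reverse=True)
for score, name in leaderboard:
    print(f"{name}: F1={score:.3f}")

best_score, best_name = leaderboard[0]
```

A real run replaces the threshold rules with trained models and the hand-rolled loop with a hyperparameter search, but the ranking logic is the same idea.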

Three Problem Types AutoML Handles

Classification

Classification problems predict which category something belongs to. Examples include predicting whether a loan application will default (yes or no), which product category a customer will purchase next (electronics, clothing, or home goods), or whether an email is spam or legitimate.

AutoML evaluates classification models using metrics like accuracy, F1 score, ROC-AUC, and log loss. The right metric depends on the problem — for fraud detection where missing a fraud case is costly, recall matters more than overall accuracy.
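
A small worked example makes the point about metric choice concrete. The labels and predictions below are invented for illustration: 5 fraud cases among 100 transactions, scored by two hypothetical models.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def recall(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

# 5 fraud cases followed by 95 legitimate transactions.
y_true = [1] * 5 + [0] * 95

# Model A never flags anything.
always_legit = [0] * 100

# Model B catches 4 of the 5 fraud cases but raises 10 false alarms.
cautious = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85

for name, preds in [("always_legit", always_legit), ("cautious", cautious)]:
    print(name, "accuracy:", accuracy(y_true, preds), "recall:", recall(y_true, preds))
```

Model A wins on accuracy (0.95 against 0.89) while catching no fraud at all, which is why recall or F1 is usually the better primary metric for imbalanced problems like this one.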

Regression

Regression problems predict a number. Examples include predicting a house's sale price, forecasting next quarter's revenue, or estimating how long a customer's support ticket will take to resolve.

AutoML evaluates regression models using metrics like RMSE (root mean squared error), MAE (mean absolute error), and R-squared. These measure how close the model's predicted numbers are to the actual numbers.
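
The three metrics can be computed by hand on a few values. The house prices below (in thousands) are made up for illustration.

```python
import math

actual = [200, 250, 300, 350]      # true sale prices
predicted = [210, 240, 320, 330]   # model predictions

errors = [p - a for a, p in zip(actual, predicted)]

# MAE: average size of the error, ignoring direction.
mae = sum(abs(e) for e in errors) / len(errors)

# RMSE: square the errors first, so large misses are penalized more heavily.
rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))

# R-squared: share of the variance in the actual values that the model explains.
mean_actual = sum(actual) / len(actual)
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
r_squared = 1 - ss_res / ss_tot

print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R^2={r_squared:.2f}")
```

RMSE comes out slightly higher than MAE here because the two larger misses (20 each) carry more weight once squared.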

Forecasting

Forecasting problems predict future values in a time series — data recorded at regular intervals over time. Examples include predicting daily website traffic for the next month, forecasting product demand for inventory planning, or estimating energy consumption for the next week.

AutoML's forecasting mode handles the special requirements of time series data. It preserves the chronological order of data (using past data to train, future data to evaluate), handles seasonality (patterns that repeat weekly, monthly, or annually), and tests specialized time series algorithms such as Prophet and ARIMA rather than the general-purpose algorithms used for classification and regression.

Using AutoML: A Step-by-Step Walkthrough

An AutoML experiment can be started in two ways: through the UI, which requires no code, or through a single Python API call.

Through the UI

Navigate to the Experiments section in the Databricks workspace and click "Start AutoML Experiment." A configuration form appears asking for:

  • Cluster — Which compute cluster to run experiments on
  • Problem type — Classification, Regression, or Forecasting
  • Dataset — The table or DataFrame containing your training data
  • Prediction target — Which column AutoML should learn to predict
  • Evaluation metric — How to measure model quality
  • Training time — How many minutes AutoML should spend searching (longer time means more trials tested)

After clicking Start, AutoML begins running experiments. A live dashboard shows each trial as it completes, with its algorithm name, settings, and evaluation metric score.

Through the Python API

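from databricks import automl

# training_df is assumed to be an existing Spark or pandas DataFrame
# that includes the "churned" label column.
summary = automl.classify(
    dataset=training_df,
    target_col="churned",
    primary_metric="f1",    # metric used to rank trials
    timeout_minutes=30,     # stop searching after 30 minutes
)

# Details of the highest-ranked trial
print(summary.best_trial)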

The API is useful for integrating AutoML into automated pipelines where a UI interaction is not practical. The summary object contains the best trial details, the MLflow experiment link, and references to all generated notebooks.
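
As a rough sketch of how a pipeline might read that summary object, the helper below pulls a few fields into a plain dictionary. The function name `summarize_automl_run` is hypothetical, and the attribute names (`experiment`, `trials`, `best_trial`, `model_description`, `metrics`, `notebook_path`) are assumed from the Databricks AutoML Python API reference, so check them against your runtime version.

```python
def summarize_automl_run(summary):
    """Collect the headline facts from an AutoML summary object.

    `summary` is assumed to be the object returned by automl.classify,
    automl.regress, or automl.forecast.
    """
    best = summary.best_trial
    return {
        # MLflow experiment that holds every trial as a run
        "experiment_id": summary.experiment.experiment_id,
        # Number of trials AutoML completed within the time budget
        "trial_count": len(summary.trials),
        # Short description of the winning model and its metrics
        "best_model": best.model_description,
        "best_metrics": best.metrics,
        # Workspace path of the generated notebook for the best trial
        "best_notebook": best.notebook_path,
    }
```

In a Databricks notebook this would be called as `summarize_automl_run(summary)` right after the `automl.classify` call.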

AutoML's Generated Notebooks: Learning From the Machine

One feature that separates Databricks AutoML from many competitors is transparency. For every trial it runs, AutoML generates a Python notebook containing the complete code that produced that result. These notebooks are readable, well-commented Python code — not a black box.

A generated notebook for an XGBoost classification trial includes:

  • Data loading and exploration code
  • Preprocessing steps with explanations
  • Feature encoding logic
  • The exact model training code with the specific hyperparameters used
  • Evaluation code with metric calculations
  • Visualization code for confusion matrices and feature importance charts

Data scientists use these notebooks as starting points. Instead of writing a model training notebook from scratch, they take AutoML's best notebook, understand what it did, and modify it — add domain knowledge, include additional features, try a custom preprocessing step. AutoML accelerates the expert, not just the beginner.

Data Exploration: The AutoML Data Analysis Notebook

Before running any trials, AutoML generates a separate data exploration notebook. This notebook profiles the training dataset and reveals important characteristics that affect model building decisions.

The data exploration notebook shows:

  • The distribution of each numerical column — minimum, maximum, mean, and percentile values
  • The frequency of each value in categorical columns
  • Missing value counts per column and what percentage of data is missing
  • Class balance for classification targets — if 95% of rows are "not churned" and only 5% are "churned," the model needs special handling to learn the minority class
  • Correlation between features — highly correlated features may be redundant
  • Potential issues like columns with a single unique value (which provide no predictive information)
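
A few of these checks are simple enough to sketch in plain Python. The records and the `profile` helper below are invented for illustration and are far less thorough than the generated notebook; they cover missing-value percentages, single-value columns, and class balance.

```python
from collections import Counter

# Hypothetical training records; None marks a missing value.
records = [
    {"age": 34,   "plan": "basic", "region": "US", "churned": 0},
    {"age": None, "plan": "pro",   "region": "US", "churned": 0},
    {"age": 51,   "plan": "basic", "region": "US", "churned": 1},
    {"age": 29,   "plan": None,    "region": "US", "churned": 0},
]

def profile(records, target):
    """Return per-column statistics and the class balance of the target."""
    n = len(records)
    report = {}
    for col in records[0].keys():
        present = [r[col] for r in records if r[col] is not None]
        unique = len(set(present))
        report[col] = {
            "missing_pct": 100 * (n - len(present)) / n,
            "unique_values": unique,
            # A column with one distinct value carries no predictive signal.
            "constant": unique <= 1,
        }
    counts = Counter(r[target] for r in records)
    class_balance = {label: count / n for label, count in counts.items()}
    return report, class_balance

report, class_balance = profile(records, target="churned")
for col, stats in report.items():
    print(col, stats)
print("class balance:", class_balance)
```

Here `region` is flagged as constant, `age` and `plan` are each 25% missing, and the target is split 75/25.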

This profiling step is valuable even for organizations that do not end up using AutoML's trained model. The data exploration notebook teaches data scientists about their dataset quickly and systematically.

How AutoML Handles Missing Data

Real-world datasets almost always contain missing values. A customer record might be missing an email address. A sensor reading might be absent for a timestamp where the sensor was offline. Machine learning algorithms cannot process missing values directly — they need complete inputs.

AutoML applies imputation — a technique for filling in missing values based on the surrounding data. For numerical columns, it fills missing values with the median value of that column (the middle value when all values are sorted). For categorical columns, it fills missing values with the most frequent category or treats missing as its own separate category.

AutoML documents exactly which imputation strategy it applied to each column in the generated notebook. Data scientists can review these decisions and override them if domain knowledge suggests a better approach.
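
The imputation strategies described above can be sketched with the standard library. The `impute` helper and its strategy names are hypothetical; the generated notebooks use their own preprocessing code.

```python
from statistics import median, mode

def impute(values, strategy):
    """Fill None entries using the named strategy."""
    present = [v for v in values if v is not None]
    if strategy == "median":
        fill = median(present)          # numerical columns
    elif strategy == "most_frequent":
        fill = mode(present)            # categorical columns
    elif strategy == "missing_category":
        fill = "missing"                # treat missing as its own category
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

ages = [34, None, 51, 29, 40]
plans = ["basic", None, "pro", "basic"]

print(impute(ages, "median"))              # None replaced by the median of 29, 34, 40, 51
print(impute(plans, "most_frequent"))      # None replaced by "basic"
print(impute(plans, "missing_category"))   # None replaced by "missing"
```

The median is preferred over the mean for numerical columns because a few extreme values do not drag it away from the typical case.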

Feature Importance: Understanding What Drives Predictions

After training, AutoML provides feature importance scores for the best models. Feature importance answers the question: which input variables matter most to the model's predictions?

A churn prediction model might reveal that "days since last purchase" and "number of support tickets in the last 60 days" are the two most important features, accounting for roughly 40% and 25% of the total feature importance respectively, while age and location account for less than 5% each.

This information serves both technical and business purposes. Technically, it guides feature selection decisions — low-importance features might be dropped to simplify the model. For business stakeholders, it reveals which customer behaviors signal churn risk most strongly, informing intervention strategies entirely separate from the model.

AutoML displays feature importance as a bar chart in the best trial notebook and in the experiment UI, making it accessible without reading code.
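
To give a feel for what an importance score measures, here is a minimal sketch of permutation importance, a different and simpler technique than whatever produces the charts in the AutoML notebooks. It shuffles one feature at a time and records how much accuracy drops. The data and the rule-based "model" are synthetic.

```python
import random

def accuracy(model, X, y):
    return sum(1 for row, label in zip(X, y) if model(row) == label) / len(y)

def permutation_importance(model, X, y, n_features, seed=0):
    """Accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(n_features):
        column = [row[j] for row in X]
        rng.shuffle(column)
        # Rebuild the rows with only column j replaced by its shuffled values.
        X_perm = [row[:j] + [column[i]] + row[j + 1:] for i, row in enumerate(X)]
        importances.append(baseline - accuracy(model, X_perm, y))
    return importances

# Synthetic rows: [days_since_last_purchase, age]. Only the first drives churn.
data_rng = random.Random(1)
X = [[data_rng.uniform(0, 90), data_rng.uniform(18, 80)] for _ in range(300)]
y = [1 if row[0] > 45 else 0 for row in X]

def model(row):
    """Stand-in for a trained model: it looks only at days since last purchase."""
    return 1 if row[0] > 45 else 0

importances = permutation_importance(model, X, y, n_features=2)
for name, score in zip(["days_since_last_purchase", "age"], importances):
    print(f"{name}: accuracy drop = {score:.3f}")
```

Shuffling `age` changes nothing because the model never reads it, while shuffling `days_since_last_purchase` removes most of the model's accuracy.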

AutoML and MLflow Integration

Every AutoML experiment runs as an MLflow experiment. All trials appear in the MLflow Tracking UI as individual runs. When the data scientist decides to promote the best model, it can be registered in the MLflow Model Registry directly from its run.

This integration means AutoML experiments benefit from everything MLflow provides — complete trial history, metric comparisons across all runs, artifact storage, and model versioning. An organization using AutoML for multiple projects can track every AutoML experiment in the same place as manually built models, creating a single view of all machine learning activity.
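
Promotion can also be scripted. The sketch below assumes that the best trial exposes its MLflow run ID as `mlflow_run_id` and that the model was logged under the artifact path `model`; both are assumptions to verify in your workspace. The function name and the `register_fn` parameter are hypothetical, added so the logic can be exercised without a tracking server.

```python
def register_best_model(summary, model_name, register_fn=None):
    """Register the best AutoML trial's model under `model_name`.

    `register_fn` defaults to mlflow.register_model; pass a stub to test
    the function without MLflow installed.
    """
    if register_fn is None:
        import mlflow  # imported lazily; only needed for a real registration
        register_fn = mlflow.register_model

    # Assumption: the trial's model artifact lives at "model" inside its run.
    model_uri = f"runs:/{summary.best_trial.mlflow_run_id}/model"
    return register_fn(model_uri, model_name)
```

On Databricks, a call such as `register_best_model(summary, "churn_classifier")` would create a new version of that registered model, where `churn_classifier` is a placeholder name.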

Time Series Forecasting with AutoML: A Deeper Look

Forecasting requires special consideration because time series data violates a key assumption of standard machine learning: observations are not independent. What happened last Tuesday strongly influences what happens next Tuesday. Standard evaluation procedures that shuffle rows randomly into training and test sets break this temporal structure: the model ends up being tested on periods it has effectively already seen, and the resulting scores are unreliable.

AutoML's forecasting mode handles this correctly by:

  • Using a time-based train/test split — training on earlier data and evaluating on later data, never the reverse
  • Detecting seasonal patterns automatically — weekly, monthly, and annual cycles
  • Testing time-series-specific algorithms like Prophet (developed by Meta for business forecasting) and ARIMA
  • Generating forecasts with confidence intervals — not just predicted values but a range within which the actual value is likely to fall
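
The first two ideas, a time-based split and a repeating seasonal pattern, can be shown with a deliberately simple baseline. The seasonal naive forecast below (repeat the last observed season) is a stand-in chosen for brevity, not one of the algorithms AutoML runs, and the traffic numbers are invented.

```python
def time_based_split(series, test_size):
    """Split a time-ordered series: earlier values train, later values test."""
    return series[:-test_size], series[-test_size:]

def seasonal_naive_forecast(train, horizon, season_length):
    """Forecast each future step with the value from one season earlier."""
    last_season = train[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]

# Four weeks of daily visits with a clear weekly pattern (Mon..Sun).
weekly_pattern = [100, 110, 120, 115, 130, 90, 80]
series = weekly_pattern * 4

# Hold out the final week; never shuffle before splitting.
train, test = time_based_split(series, test_size=7)
forecast = seasonal_naive_forecast(train, horizon=7, season_length=7)

mae = sum(abs(a - f) for a, f in zip(test, forecast)) / len(test)
print("forecast:", forecast)
print("MAE on held-out week:", mae)
```

Because this toy series repeats exactly, the baseline is perfect on the held-out week. Real series add trend and noise, which is where dedicated forecasting models earn their keep.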

Specifying Forecast Horizon and Frequency

When setting up a forecasting experiment, you identify the column that holds the timestamps and provide two settings that classification and regression do not use:

  • Forecast horizon — How many time steps into the future to predict. A demand forecast might look 30 days ahead. An energy forecast might look 24 hours ahead.
  • Frequency — The time interval between observations. Daily, hourly, weekly, or monthly.
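
In the Python API, these settings would typically be passed to `automl.forecast`. The sketch below assumes the `time_col`, `horizon`, and `frequency` parameters described in the Databricks AutoML API reference; the column names `order_date` and `units_sold` and both function names are hypothetical.

```python
def build_forecast_args(training_df):
    """Keyword arguments for a 30-day daily demand forecast."""
    return dict(
        dataset=training_df,
        target_col="units_sold",   # hypothetical column to forecast
        time_col="order_date",     # hypothetical timestamp column
        horizon=30,                # predict 30 time steps ahead
        frequency="D",             # one observation per day
        primary_metric="smape",
        timeout_minutes=30,
    )

def run_forecast(training_df):
    """Start the forecasting experiment (only works on a Databricks ML runtime)."""
    from databricks import automl
    return automl.forecast(**build_forecast_args(training_df))
```

Horizon is counted in units of the frequency, so an hourly series with `horizon=24` looks one day ahead.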

Comparing AutoML to Manual Model Building

AutoML does not replace data scientists. It changes what data scientists spend their time on.

What AutoML Does Better

  • Tries more algorithm combinations than any human would have patience to test manually
  • Applies consistent preprocessing without human error
  • Produces results in minutes instead of days for initial exploration
  • Generates reproducible, readable code for every trial
  • Serves as an excellent baseline — the AutoML result sets the performance bar that custom models aim to beat

What Manual Building Does Better

  • Incorporates domain-specific knowledge that AutoML has no access to
  • Handles unusual data structures — graph data, image data, text requiring custom processing — that AutoML does not support
  • Optimizes for business-specific constraints that standard metrics do not capture
  • Builds models for problems requiring custom architectures, such as deep learning

The most effective approach uses AutoML first. The AutoML result becomes the baseline. Data scientists review the generated notebooks, understand what worked, and then apply their expertise to push performance higher.

AutoML with the Feature Store

Databricks AutoML integrates with the Feature Store. Instead of providing a raw dataset, a data scientist can specify a Feature Store table as the data source. AutoML retrieves the pre-computed features from the Feature Store, ensuring consistency with other models in the organization.

This integration creates a smooth end-to-end workflow:

  1. The data engineering team builds and maintains feature tables in the Feature Store
  2. A business analyst or junior data scientist runs AutoML against those feature tables
  3. AutoML produces a baseline model with generated notebooks
  4. Senior data scientists review the notebooks and customize them for improved performance
  5. The final model gets registered in the MLflow Model Registry and deployed to production
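
Step 2 of this workflow might look like the following in the Python API. This assumes the `feature_store_lookups` parameter and its `table_name` / `lookup_key` fields as described in the Databricks documentation, where lookups are joined onto a training DataFrame that carries the keys and the label. The table name, key column, and function names are hypothetical.

```python
def build_classify_args(training_df):
    """Arguments for a churn experiment that pulls features from a feature table."""
    feature_store_lookups = [
        {
            "table_name": "ml.customer_features",  # hypothetical feature table
            "lookup_key": ["customer_id"],         # hypothetical join key
        }
    ]
    return dict(
        dataset=training_df,   # must contain customer_id and the label column
        target_col="churned",
        feature_store_lookups=feature_store_lookups,
        primary_metric="f1",
        timeout_minutes=30,
    )

def run_with_feature_store(training_df):
    """Start the experiment (only works on a Databricks ML runtime)."""
    from databricks import automl
    return automl.classify(**build_classify_args(training_df))
```

Keeping the lookup definition in one place means every model built on `ml.customer_features` joins the same columns in the same way.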

Real-World Scenario: Predicting Hospital Readmission

A regional hospital network wants to predict which patients are at high risk of being readmitted within 30 days of discharge. Early identification allows care coordinators to schedule follow-up calls and home health visits, reducing readmissions and improving patient outcomes.

The data analytics team has a dataset of 50,000 past discharge records with 80 columns — demographics, diagnoses, procedures, lab results, medication counts, prior admission history, and whether each patient was readmitted within 30 days (the target column).

The team runs AutoML with a 45-minute training budget using F1 score as the primary metric (chosen because both missing high-risk patients and incorrectly flagging low-risk patients have costs). AutoML runs 28 trials in 45 minutes, testing Logistic Regression, Decision Tree, Random Forest, XGBoost, LightGBM, and several variants of each.

The top result, a LightGBM model, achieves an F1 score of 0.71, well above the 0.48 scored by the team's previous rule-based approach. The feature importance chart reveals that prior admission count, discharge diagnosis code, and medication count at discharge are the three strongest predictors. This insight immediately informs care coordinators about what to look for, independent of the model.

The team downloads the best trial notebook and spends two days customizing it — adding domain-specific feature interactions (combining diagnosis codes with age), adjusting the classification threshold (preferring higher recall over precision to catch more at-risk patients), and adding SHAP values (a method for explaining individual predictions). The final customized model achieves F1 of 0.76 and gets deployed as a daily scoring job that flags high-risk patients each morning.
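
The threshold adjustment mentioned above is easy to illustrate. The labels and predicted probabilities below are invented, not taken from the hospital scenario; the sketch only shows how lowering the cutoff trades precision for recall.

```python
def precision_recall(labels, scores, threshold):
    """Precision and recall when scores at or above `threshold` are flagged."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(labels, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(labels, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(labels, preds) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# 1 = readmitted within 30 days; scores are hypothetical model probabilities.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.45, 0.35, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]

for threshold in (0.5, 0.3):
    precision, recall = precision_recall(labels, scores, threshold)
    print(f"threshold={threshold}: precision={precision:.2f} recall={recall:.2f}")
```

At 0.5 the model finds half of the readmitted patients; at 0.3 it finds all of them, at the cost of more false alarms for care coordinators to review.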

Without AutoML, establishing the baseline and identifying which algorithms were worth pursuing would have taken two to three weeks. AutoML compressed that exploration to 45 minutes, letting the team spend their time on meaningful customization.

AutoML Limitations to Be Aware Of

Using AutoML effectively requires understanding what it does not handle:

  • Unstructured data — AutoML works with tabular data (rows and columns). It does not handle raw images, audio, video, or unprocessed text. Text must be pre-converted to numerical features before AutoML can use it.
  • Very large datasets — AutoML samples large datasets for speed. If your dataset has 100 million rows, AutoML may train on a sample. The generated notebook code uses the full dataset, but the AutoML exploration phase uses sampling.
  • Custom loss functions — AutoML optimizes for standard metrics. Problems requiring custom objective functions need manual model building.
  • Deep learning — AutoML does not currently build neural network architectures. It tests traditional machine learning algorithms, not deep learning models.
  • Feature engineering — AutoML applies standard preprocessing but does not create new features from existing ones (except basic transformations). Domain-specific feature engineering still requires expert input.
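
On the sampling point, one common way to shrink a classification dataset without distorting it is stratified sampling, which keeps the class ratio intact. The sketch below is a generic illustration under that assumption, not a description of AutoML's internal sampling code; the rows and the helper name are invented.

```python
import random
from collections import defaultdict

def stratified_sample(rows, label_key, fraction, seed=0):
    """Sample the same fraction from every class so the ratio is preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    sample = []
    for label, group in by_class.items():
        # Keep at least one row per class so rare classes never vanish.
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

# 1,000 rows with a 5% positive class.
rows = [{"id": i, "churned": 1 if i < 50 else 0} for i in range(1000)]
sample = stratified_sample(rows, label_key="churned", fraction=0.1)

positives = sum(r["churned"] for r in sample)
print(f"sample size: {len(sample)}, positives: {positives}")
```

A purely random 10% sample could by chance contain very few churned customers; sampling per class avoids that.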

Key Points Summary

  • AutoML automates algorithm selection, hyperparameter tuning, data preprocessing, and evaluation — compressing weeks of work into minutes.
  • It supports three problem types: Classification (predicting categories), Regression (predicting numbers), and Forecasting (predicting future values in a time series).
  • AutoML generates complete Python notebooks for every trial, making its work transparent and customizable.
  • A data exploration notebook profiles the dataset before any training begins, identifying missing values, class imbalances, and distribution characteristics.
  • Feature importance charts reveal which input variables matter most to the model's predictions.
  • Every AutoML experiment logs to MLflow automatically, enabling comparison with manually built models.
  • Integration with the Feature Store lets AutoML use pre-computed, consistent feature definitions.
  • AutoML is best used as a baseline — the starting point that domain experts then customize and improve.
  • AutoML does not handle unstructured data, deep learning, or custom loss functions.
