Databricks Workspace

The Databricks Workspace is the place where all your work lives. It is the first thing you see after logging in, and understanding it well saves you time every day. This topic walks through every major section of the workspace, explains what each part does, and shows you how to set up your personal environment so you can start working immediately.

Logging Into Databricks

Your organization's Databricks workspace has a unique URL, typically in the format https://[your-org].azuredatabricks.net (for Azure), https://[your-org].cloud.databricks.com (for AWS), or https://[your-org].gcp.databricks.com (for GCP). You log in using your email and password, or through your company's single sign-on (SSO) system such as Microsoft Entra ID (formerly Azure AD), Okta, or Google Identity.
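
If you work with several workspaces, a small helper can tell you which cloud a given URL belongs to. The sketch below only uses the three hostname patterns listed above; the function name detect_cloud is a hypothetical helper for illustration, not part of any Databricks library.

```python
from urllib.parse import urlparse

# Hostname suffixes taken from the URL formats described in this lesson.
CLOUD_SUFFIXES = {
    ".azuredatabricks.net": "Azure",
    ".cloud.databricks.com": "AWS",
    ".gcp.databricks.com": "GCP",
}


def detect_cloud(workspace_url: str) -> str:
    """Return the cloud implied by a workspace URL, or 'unknown'."""
    host = (urlparse(workspace_url).hostname or "").lower()
    for suffix, cloud in CLOUD_SUFFIXES.items():
        if host.endswith(suffix):
            return cloud
    return "unknown"


print(detect_cloud("https://your-org.azuredatabricks.net"))
```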

Once logged in, you land on the Databricks Home page. The left sidebar is your primary navigation tool.

The Left Sidebar – Your Navigation Menu

DATABRICKS LEFT SIDEBAR
─────────────────────────────
🏠  Home          → Your personal folder and recent items
🔍  Search        → Find notebooks, tables, clusters quickly
📓  Workspace     → All notebooks and folders
📂  Catalog       → Browse databases, tables, and volumes
⚡  Compute       → Manage clusters and SQL warehouses
⚙️  Workflows     → Create and monitor scheduled jobs
💬  SQL Editor    → Write and run SQL queries
📊  Dashboards    → View and build visual reports
🤖  Machine Learning → Experiments, models, feature store
🔔  Alerts        → Set up notifications for metrics

Each section of the sidebar opens a different part of the workspace. You can think of the sidebar as a remote control for the entire Databricks platform.

The Home Section

The Home section is your personal landing area. It shows recently visited notebooks, recently modified files, and quick links to your most-used resources. It also displays your personal folder — a private space where you store notebooks and files that only you can see by default.

The Home section also shows Recent Items, which lists the last notebooks you opened, clusters you used, and jobs you ran. This makes it easy to pick up where you left off after a break.

The Workspace Section – Organizing Your Notebooks

The Workspace section is a file system for all your notebooks, libraries, and folders. It works like Google Drive or a shared network folder, but instead of documents, it holds Databricks notebooks and code files.

WORKSPACE FILE STRUCTURE EXAMPLE
──────────────────────────────────
📁 Workspace
   ├── 📁 Users
   │    ├── 📁 priya@company.com   ← Personal folder
   │    │    ├── 📓 sales_analysis
   │    │    └── 📓 customer_segmentation
   │    └── 📁 rahul@company.com
   │         └── 📓 inventory_pipeline
   └── 📁 Shared                  ← Team-shared folder
        ├── 📁 Data Engineering
        │    └── 📓 etl_master_notebook
        └── 📁 Analytics
             └── 📓 monthly_report

Files inside a user's personal folder are private by default. Files inside the Shared folder are visible to all workspace members. Administrators can create additional shared folders with custom access permissions.
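
The folder layout above maps directly to workspace paths such as /Users/priya@company.com/sales_analysis and /Shared/Analytics/monthly_report. The following is a minimal sketch of building and classifying such paths; the helper names are hypothetical and the code does not contact a workspace.

```python
from posixpath import join


def personal_path(user_email: str, notebook: str) -> str:
    """Path of a notebook inside a user's personal folder."""
    return join("/Users", user_email, notebook)


def shared_path(team_folder: str, notebook: str) -> str:
    """Path of a notebook inside a team folder under /Shared."""
    return join("/Shared", team_folder, notebook)


def is_personal(path: str) -> bool:
    """True when the path lives under /Users (private by default)."""
    return path.startswith("/Users/")


print(personal_path("priya@company.com", "sales_analysis"))
print(shared_path("Analytics", "monthly_report"))
```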

Creating a Notebook

To create a new notebook, right-click any folder in the Workspace browser and select Create > Notebook. A dialog box asks you for the notebook name, the default programming language (Python, SQL, Scala, or R), and the cluster to attach it to. You can always change these settings later.

Importing and Exporting Notebooks

Databricks notebooks can be exported as .ipynb (Jupyter format), .py (Python scripts), .sql (SQL files), or .dbc (Databricks archive format). To import a notebook from your local computer, right-click a folder and choose Import. This is useful when you want to bring a Jupyter notebook from your laptop into Databricks.
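
Exports can also be scripted. The sketch below builds (but does not send) a request URL for the Workspace API export endpoint, choosing the format value from the target file extension. The helper name export_url and the example host are assumptions for illustration; check your workspace's REST API reference before relying on the exact parameters.

```python
from pathlib import PurePosixPath
from urllib.parse import urlencode

# Maps a local file extension to the export format requested from the API.
# .py and .sql are both "source" exports of a notebook.
EXPORT_FORMATS = {
    ".ipynb": "JUPYTER",
    ".dbc": "DBC",
    ".py": "SOURCE",
    ".sql": "SOURCE",
}


def export_url(host: str, notebook_path: str, target_filename: str) -> str:
    """Build the export URL for a notebook, based on the target extension."""
    extension = PurePosixPath(target_filename).suffix.lower()
    if extension not in EXPORT_FORMATS:
        raise ValueError(f"Unsupported export extension: {extension!r}")
    query = urlencode({"path": notebook_path, "format": EXPORT_FORMATS[extension]})
    return f"{host.rstrip('/')}/api/2.0/workspace/export?{query}"


print(
    export_url(
        "https://your-workspace.azuredatabricks.net",  # placeholder host
        "/Users/priya@company.com/sales_analysis",
        "sales_analysis.ipynb",
    )
)
```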

The Catalog Section – Browsing Your Data

The Catalog section (Catalog Explorer, formerly called Data Explorer) is where you browse all your databases, tables, views, and volumes. Think of it as a library catalog: it shows you every dataset that exists in your workspace and how it is organized.

CATALOG BROWSER STRUCTURE
──────────────────────────────
📦 Catalog: main
   └── 📁 Schema: retail_data
        ├── 🗂  Table: customers
        │    ├── customer_id (INT)
        │    ├── name (STRING)
        │    ├── city (STRING)
        │    └── signup_date (DATE)
        ├── 🗂  Table: transactions
        │    ├── txn_id (INT)
        │    ├── customer_id (INT)
        │    ├── amount (DOUBLE)
        │    └── txn_date (TIMESTAMP)
        └── 👁  View: high_value_customers

Clicking on a table shows you its column names, data types, sample rows, and statistics like minimum value, maximum value, and the number of null entries per column. This preview feature is extremely useful for understanding a new dataset before writing any code.
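
The catalog, schema, and table levels shown above combine into a three-part name such as main.retail_data.customers. As a rough sketch, the helpers below build that name and a SQL statement that computes the same kind of column statistics (minimum, maximum, null count) the preview displays. The function names are hypothetical; the code only produces SQL text and does not run it.

```python
def quote(identifier: str) -> str:
    """Wrap an identifier in backticks, escaping any embedded backtick."""
    return "`" + identifier.replace("`", "``") + "`"


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build a catalog.schema.table name with each part quoted."""
    return ".".join(quote(part) for part in (catalog, schema, table))


def column_stats_sql(catalog: str, schema: str, table: str, column: str) -> str:
    """SQL that returns min, max, and the number of NULLs for one column."""
    col = quote(column)
    return (
        f"SELECT MIN({col}) AS min_value, "
        f"MAX({col}) AS max_value, "
        f"COUNT(*) - COUNT({col}) AS null_count "
        f"FROM {qualified_name(catalog, schema, table)}"
    )


print(qualified_name("main", "retail_data", "customers"))
print(column_stats_sql("main", "retail_data", "transactions", "amount"))
```

COUNT(*) counts every row while COUNT(column) skips NULLs, so their difference is the number of null entries.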

The Compute Section – Managing Your Processing Power

The Compute section shows all clusters and SQL Warehouses in your workspace. This is where you create new clusters, check the status of running clusters, and adjust cluster configurations.

Types of Compute in Databricks

COMPUTE TYPES IN DATABRICKS
─────────────────────────────────────────────────────
All-Purpose Cluster   → Used for interactive notebooks
                        Stays running until you stop it
                        Best for development and exploration

Job Cluster          → Created automatically when a job runs
                        Shuts down as soon as the job finishes
                        Best for production pipelines (cheaper)

SQL Warehouse        → Optimized for SQL queries only
                        Serverless option available
                        Best for analysts using the SQL Editor

For day-to-day exploration, use an All-Purpose Cluster. For scheduled production jobs, use Job Clusters — they spin up fresh for each run and shut down automatically, saving money.

Creating Your First Cluster

Go to Compute, click Create Cluster, and fill in these key fields:

Cluster Name: A descriptive name like "dev-exploration-cluster"
Cluster Mode: Single Node (for small jobs) or Standard (for distributed jobs)
Databricks Runtime Version: Choose the latest LTS (Long Term Support) version
Node Type: The type of virtual machine. Larger machines cost more but process data faster.
Autoscaling: Check this box to let the cluster grow and shrink automatically based on workload
Terminate after inactivity: Set this to 30–60 minutes to avoid paying for idle clusters
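
The same fields can be written down as a JSON specification, which is roughly what the Clusters API expects when a cluster is created programmatically. This is a sketch under assumptions: the runtime version and node type shown are example values only and must be replaced with ones your workspace actually offers, and cluster_spec is a hypothetical helper.

```python
import json


def cluster_spec(name, spark_version, node_type_id,
                 min_workers=1, max_workers=4, idle_minutes=30):
    """Build a cluster definition with autoscaling and auto-termination."""
    if not 0 < min_workers <= max_workers:
        raise ValueError("Expected 0 < min_workers <= max_workers")
    return {
        "cluster_name": name,
        "spark_version": spark_version,        # Databricks Runtime version
        "node_type_id": node_type_id,          # virtual machine type
        "autoscale": {
            "min_workers": min_workers,
            "max_workers": max_workers,
        },
        "autotermination_minutes": idle_minutes,  # stop after inactivity
    }


spec = cluster_spec(
    "dev-exploration-cluster",
    spark_version="13.3.x-scala2.12",  # example value, assumption
    node_type_id="Standard_DS3_v2",    # example Azure VM type, assumption
)
print(json.dumps(spec, indent=2))
```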

The SQL Editor – Writing Queries Like a Spreadsheet

The SQL Editor is a dedicated interface for analysts who prefer writing SQL queries rather than Python code. It connects to a SQL Warehouse (not a regular cluster) and provides a clean environment for writing queries, viewing results, and saving frequently used queries.

SQL EDITOR LAYOUT
──────────────────────────────
┌─────────────────────────────────────┐
│  SQL EDITOR                         │
│─────────────────────────────────────│
│  [Warehouse: Analytics-WH ▼]        │  ← Select SQL Warehouse
│─────────────────────────────────────│
│  SELECT city,                       │
│         COUNT(*) AS total_customers │
│  FROM main.retail_data.customers    │
│  GROUP BY city                      │
│  ORDER BY total_customers DESC      │
│                                     │
│  [▶ Run Query]  [Save Query]        │
│─────────────────────────────────────│
│  RESULTS                            │
│  city     | total_customers         │
│  Mumbai   | 4523                    │
│  Delhi    | 3892                    │
│  Pune     | 2110                    │
└─────────────────────────────────────┘

The SQL Editor also keeps a query history, so you can see every query you ran in the past and re-run any of them with one click.
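
To try the query from the layout above without a SQL Warehouse, you can run it locally. In this sketch Python's built-in sqlite3 module stands in for the warehouse, the table is simply named customers rather than main.retail_data.customers, and the rows are made-up sample data for illustration only.

```python
import sqlite3

# Made-up sample rows; not data from a real workspace.
SAMPLE_CUSTOMERS = [
    (1, "Mumbai"), (2, "Mumbai"), (3, "Mumbai"),
    (4, "Delhi"), (5, "Delhi"),
    (6, "Pune"),
]

QUERY = """
SELECT city,
       COUNT(*) AS total_customers
FROM customers
GROUP BY city
ORDER BY total_customers DESC
"""


def customers_per_city(rows):
    """Load rows into an in-memory table and count customers per city."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE customers (customer_id INTEGER, city TEXT)")
        conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for city, total in customers_per_city(SAMPLE_CUSTOMERS):
    print(city, total)
```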

Dashboards – Turning Data Into Charts

The Dashboards section lets you build visual reports from your SQL query results. After running a query in the SQL Editor, you can turn the result into a bar chart, line chart, pie chart, or data table, then pin it to a dashboard.

Dashboards in Databricks are shareable. You can schedule them to refresh automatically (for example, every morning at 7 AM) and subscribe your manager or client to receive an emailed snapshot after each refresh, so they always see the latest data without needing to open Databricks themselves.
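
A schedule such as "every morning at 7 AM" is usually expressed as a cron expression. Databricks job schedules use Quartz cron syntax; this sketch assumes the same syntax applies wherever your workspace shows a cron field for dashboards. The helper name and the time zone value are assumptions for illustration.

```python
def daily_quartz_cron(hour: int, minute: int = 0) -> str:
    """Quartz cron for 'every day at hour:minute'.

    Quartz fields are: seconds minutes hours day-of-month month day-of-week.
    """
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    return f"0 {minute} {hour} * * ?"


schedule = {
    "quartz_cron_expression": daily_quartz_cron(7),
    "timezone_id": "Asia/Kolkata",  # example time zone, assumption
}
print(schedule)
```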

Personal Settings and User Preferences

Click your username in the top-right corner to open User Settings. Here you can manage several important preferences.

Access Tokens

Access tokens let you connect to Databricks from external tools like Python scripts, BI tools (Tableau, Power BI), or command-line interfaces. To generate a token, go to User Settings > Access Tokens > Generate New Token. Keep the token secret — it acts like a password for programmatic access.
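
In a script, the token is normally sent as a bearer token in the Authorization header and read from an environment variable rather than hard-coded. The sketch below prepares such a request without sending it; the helper name authed_request and the placeholder values are assumptions.

```python
import os
from urllib.request import Request


def authed_request(host: str, token: str,
                   endpoint: str = "/api/2.0/clusters/list") -> Request:
    """Prepare (but do not send) an authenticated REST API request."""
    return Request(
        f"{host.rstrip('/')}{endpoint}",
        headers={"Authorization": f"Bearer {token}"},
    )


# Read credentials from the environment; fall back to obvious placeholders.
host = os.environ.get("DATABRICKS_HOST", "https://your-workspace.azuredatabricks.net")
token = os.environ.get("DATABRICKS_TOKEN", "<your-token>")

request = authed_request(host, token)
print(request.full_url)
# To actually call the API: urllib.request.urlopen(request)
```

Keeping the token in an environment variable means it never ends up in a notebook or a Git commit.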

Git Integration

Databricks supports integration with GitHub, GitLab, and Azure DevOps. Once connected, you can commit notebook changes directly to a Git repository, create branches, and collaborate using pull requests — the same way software developers work. This keeps your data code versioned and auditable.

GIT INTEGRATION WORKFLOW
──────────────────────────────────────
Local Git Branch                Databricks Notebook
      │                                │
      │  git clone → pull into         │
      │  Databricks Repo               │
      ▼                                ▼
Work on code locally         Work on notebook in browser
      │                                │
      └──────── git push ──────────────┘
                     │
                     ▼
            Pull Request on GitHub
                     │
                     ▼
            Code Review + Merge
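
The "pull into Databricks Repo" step in the workflow above can also be automated. As a hedged sketch, this builds the kind of JSON body the Repos API accepts when linking a Git repository; the repository URL is made up, the provider string and path layout should be checked against your workspace's API reference, and repo_payload is a hypothetical helper.

```python
import json


def repo_payload(git_url: str, provider: str, user_email: str) -> dict:
    """Describe a Git repository to link under the user's Repos folder."""
    repo_name = git_url.rstrip("/").rsplit("/", 1)[-1]
    if repo_name.endswith(".git"):
        repo_name = repo_name[: -len(".git")]
    return {
        "url": git_url,
        "provider": provider,
        "path": f"/Repos/{user_email}/{repo_name}",
    }


payload = repo_payload(
    "https://github.com/example-org/sales-pipelines.git",  # made-up repository
    "gitHub",
    "priya@company.com",
)
print(json.dumps(payload, indent=2))
```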

Notification Preferences

You can configure email notifications for job failures, job completions, and alert thresholds. Setting up failure notifications ensures you know immediately when a critical pipeline breaks, rather than discovering it hours later when someone notices that a report is empty.
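
When jobs are defined through the Jobs API rather than the UI, failure and completion emails are part of the job settings. The sketch below adds an email_notifications block to an existing settings dictionary; the job name and addresses are made up, and with_email_notifications is a hypothetical helper.

```python
def with_email_notifications(job_settings, on_failure, on_success=()):
    """Return a copy of the job settings with email notifications added."""
    notifications = {"on_failure": list(on_failure)}
    if on_success:
        notifications["on_success"] = list(on_success)
    updated = dict(job_settings)  # leave the caller's dictionary untouched
    updated["email_notifications"] = notifications
    return updated


settings = with_email_notifications(
    {"name": "nightly_sales_etl"},             # made-up job name
    on_failure=["data-team@company.com"],      # made-up address
)
print(settings)
```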

Admin Settings – For Workspace Administrators

If you have administrator access, you see an additional Admin Console option in your user menu. The Admin Console lets you manage users and groups, set workspace-level permissions, configure Single Sign-On (SSO), view audit logs, and manage service principals (automated accounts used by applications).

Administrators also control which cloud storage buckets the workspace can access and set up security configurations like IP access lists (allowing only office IP addresses to log in).
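
To illustrate what an IP access list does, the following sketch checks whether a client address falls inside a set of allowed ranges. It only mirrors the idea using Python's standard ipaddress module; it is not how Databricks implements the feature, and the addresses come from ranges reserved for documentation.

```python
from ipaddress import ip_address, ip_network


def is_allowed(client_ip: str, allowed_cidrs) -> bool:
    """True when client_ip belongs to at least one allowed CIDR range."""
    address = ip_address(client_ip)
    return any(address in ip_network(cidr) for cidr in allowed_cidrs)


OFFICE_RANGES = ["203.0.113.0/24"]  # example office network, assumption

print(is_allowed("203.0.113.25", OFFICE_RANGES))
print(is_allowed("198.51.100.7", OFFICE_RANGES))
```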

The Databricks CLI – Working From the Command Line

The Databricks CLI (Command Line Interface) lets you interact with your workspace from a terminal. It is useful for automating tasks like deploying notebooks, managing clusters, or uploading files.

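Install it using pip (this installs the legacy Python-based CLI; newer CLI releases are distributed as a standalone binary rather than through pip):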

pip install databricks-cli

Configure it with your workspace URL and access token:

databricks configure --token
Host: https://your-workspace.azuredatabricks.net
Token: [paste your token here]
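
The configure command stores the host and token in a profile file (for the legacy CLI, .databrickscfg in your home directory) using an INI-style layout. As a sketch, the same values can be read back in Python with configparser; the sample text below is a stand-in for the real file, and read_profile is a hypothetical helper.

```python
import configparser

# Stand-in for the contents of ~/.databrickscfg after running configure.
SAMPLE_CONFIG = """\
[DEFAULT]
host = https://your-workspace.azuredatabricks.net
token = <your-token>
"""


def read_profile(config_text: str, profile: str = "DEFAULT") -> dict:
    """Return the host and token stored under the given profile."""
    # interpolation=None keeps '%' characters in tokens from being expanded.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = parser[profile]
    return {"host": section["host"], "token": section["token"]}


print(read_profile(SAMPLE_CONFIG)["host"])
```

To read the real file, pass pathlib.Path.home().joinpath(".databrickscfg").read_text() instead of SAMPLE_CONFIG.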

After configuration, you can run commands like:

databricks fs ls dbfs:/           # List files in Databricks File System
databricks clusters list          # Show all clusters
databricks jobs list              # Show all jobs
databricks workspace ls /Users    # List notebooks in workspace

Keyboard Shortcuts in Notebooks

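Knowing keyboard shortcuts speeds up your notebook workflow significantly. For the shortcuts that start with Esc, press Esc first to leave edit mode, then press the letter key (D twice to delete).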

DATABRICKS NOTEBOOK SHORTCUTS
──────────────────────────────────
Ctrl + Enter   → Run current cell
Shift + Enter  → Run cell and move to next
Ctrl + Shift + P → Command palette
Esc + A        → Insert cell above
Esc + B        → Insert cell below
Esc + D + D    → Delete current cell
Ctrl + Z       → Undo
Ctrl + /       → Comment / Uncomment code

Key Points

The Databricks Workspace is the browser-based interface where all your data work happens.
The left sidebar provides navigation to notebooks, data catalog, compute, SQL editor, dashboards, and machine learning tools.
All-Purpose Clusters suit interactive work; Job Clusters suit automated production pipelines; SQL Warehouses suit SQL-only analytics.
The Catalog browser lets you explore all tables, schemas, and data assets without writing any code.
Git integration keeps your notebook code versioned and enables collaboration through pull requests.
Access Tokens allow external tools and scripts to connect to your Databricks workspace programmatically.
The Databricks CLI automates workspace management tasks from the command line.
