Snowflake Best Practices for Production Deployments
Moving a Snowflake project from development to production is about more than copying SQL scripts to a live environment. Production systems need to be reliable, secure, cost-efficient, maintainable, and observable. This page brings together proven best practices across architecture, security, performance, cost management, and operations — the kind of guidance that separates a well-run production system from one that creates constant problems.
1. Architecture and Environment Separation
Always Use Multiple Environments
Every production Snowflake deployment needs at least three environments: development, staging (or UAT), and production. Never test new transformations or data loads directly in production.
Diagram: The Airline Safety Process
Before a new aircraft design carries passengers, it goes through wind tunnel testing, then test flights with no passengers, then limited route operations, and finally full commercial service. Each stage catches problems before they affect real customers. Your Snowflake environments follow the same logic: DEV catches mistakes, UAT validates with real data, PROD serves real users.
-- Naming convention for multi-environment setup
-- Each environment is a separate database or separate Snowflake account
-- Option 1: Separate databases within one account (suitable for small teams)
CREATE DATABASE shopstream_dev;
CREATE DATABASE shopstream_uat;
CREATE DATABASE shopstream_prod;
-- Option 2: Separate Snowflake accounts (recommended for large teams and strict isolation)
-- DEV account: developers.snowflakecomputing.com
-- UAT account: staging.snowflakecomputing.com
-- PROD account: prod.snowflakecomputing.com
-- Use consistent naming across all environments
-- Schema names, table names, and procedure names should match exactly
-- Only connection parameters change between environmentsUse Infrastructure as Code
Define Snowflake objects (warehouses, databases, roles, tables) in version-controlled code rather than running ad-hoc SQL commands. Tools like Terraform, Pulumi, and Schemachange manage Snowflake infrastructure as code.
-- Example: Store every DDL change in a versioned migration file
-- migrations/V001__create_orders_schema.sql
CREATE SCHEMA IF NOT EXISTS shopstream_prod.orders;
CREATE TABLE IF NOT EXISTS shopstream_prod.orders.raw_orders (
order_id VARCHAR(50),
customer_id VARCHAR(50),
order_date DATE,
amount NUMBER(12, 2),
_loaded_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP
);
-- migrations/V002__add_region_column.sql
ALTER TABLE shopstream_prod.orders.raw_orders
ADD COLUMN IF NOT EXISTS region VARCHAR(50);
-- Run migrations with Schemachange or Flybase
-- schemachange --snowflake-account your_account --change-history-table migrations.history2. Role and Permission Design
Follow the Least Privilege Principle
Every user and every service account gets only the minimum permissions needed for their job. No user runs production jobs with ACCOUNTADMIN. No BI tool has write access to raw tables.
-- Role hierarchy for a production Snowflake account
-- ACCOUNTADMIN: Only for account-level administration. Not used for daily work.
-- SYSADMIN: Creates databases, warehouses, and assigns ownership.
-- SECURITYADMIN: Creates roles and users.
-- Custom roles sit below these.
-- Create purpose-specific roles
CREATE ROLE etl_prod_role COMMENT = 'Data pipeline service accounts';
CREATE ROLE analyst_role COMMENT = 'Business analysts - read-only on presentation layer';
CREATE ROLE data_engineer_role COMMENT = 'Data engineering team - full access to all layers';
CREATE ROLE bi_tool_role COMMENT = 'Tableau, Power BI service accounts - read-only';
CREATE ROLE dba_role COMMENT = 'Database administrators - can create/alter objects';
-- Grant roles to parent roles (role hierarchy)
GRANT ROLE etl_prod_role TO ROLE data_engineer_role;
GRANT ROLE analyst_role TO ROLE data_engineer_role;
-- Grant specific privileges to ETL role
GRANT USAGE ON DATABASE shopstream_prod TO ROLE etl_prod_role;
GRANT USAGE ON ALL SCHEMAS IN DATABASE shopstream_prod TO ROLE etl_prod_role;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA shopstream_prod.raw TO ROLE etl_prod_role;
GRANT SELECT ON ALL TABLES IN SCHEMA shopstream_prod.presentation TO ROLE etl_prod_role;
-- Analysts get read-only access to presentation layer only
GRANT USAGE ON DATABASE shopstream_prod TO ROLE analyst_role;
GRANT USAGE ON SCHEMA shopstream_prod.presentation TO ROLE analyst_role;
GRANT SELECT ON ALL TABLES IN SCHEMA shopstream_prod.presentation TO ROLE analyst_role;
GRANT SELECT ON ALL VIEWS IN SCHEMA shopstream_prod.presentation TO ROLE analyst_role;
-- Auto-grant for future objects
GRANT SELECT ON FUTURE TABLES IN SCHEMA shopstream_prod.presentation TO ROLE analyst_role;
GRANT SELECT ON FUTURE VIEWS IN SCHEMA shopstream_prod.presentation TO ROLE analyst_role;Use Service Accounts for Automated Jobs
Never run automated pipelines using a human user's credentials. Create dedicated service accounts for each pipeline or application.
-- Create service accounts for pipelines
CREATE USER etl_pipeline_svc
DEFAULT_ROLE = etl_prod_role
DEFAULT_WAREHOUSE = etl_wh
COMMENT = 'Service account for data pipeline jobs'
MUST_CHANGE_PASSWORD = FALSE;
CREATE USER tableau_svc
DEFAULT_ROLE = bi_tool_role
DEFAULT_WAREHOUSE = reporting_wh
COMMENT = 'Service account for Tableau Server'
MUST_CHANGE_PASSWORD = FALSE;
-- Use key-pair authentication for service accounts (not password)
ALTER USER etl_pipeline_svc SET RSA_PUBLIC_KEY = 'your_public_key_here';3. Warehouse Configuration Best Practices
Size Warehouses for Workload Type
Different workloads need different warehouse sizes. A one-size-fits-all warehouse either wastes money (too large for simple queries) or becomes a bottleneck (too small for heavy transformations).
| Workload Type | Recommended Size | Auto-Suspend Setting |
|---|---|---|
| Ad-hoc analyst queries | XS to Small | 120 seconds |
| BI tool queries (Tableau, Power BI) | Small to Medium | 300 seconds |
| Hourly incremental ETL | Small to Medium | 60 seconds |
| Daily full batch transformation | Large to XL | 60 seconds after batch completes |
| Data science / ML feature engineering | XL to 2XL | 120 seconds |
-- Configure multi-cluster warehouses for variable concurrent load
CREATE WAREHOUSE reporting_wh
WAREHOUSE_SIZE = 'MEDIUM'
MIN_CLUSTER_COUNT = 1
MAX_CLUSTER_COUNT = 3 -- Scales up to 3 clusters during peak demand
SCALING_POLICY = 'ECONOMY' -- ECONOMY scales conservatively; STANDARD scales aggressively
AUTO_SUSPEND = 300
AUTO_RESUME = TRUE;4. Query Optimization Best Practices
Filter Early, Filter on Clustered Columns
Snowflake stores data in micro-partitions and uses metadata to skip entire partitions that do not match your filter. Put filters on the columns with the most selective values and on clustered columns to maximize partition pruning.
-- Good: Filters on clustered column (order_date) and selective column (region)
SELECT product_id, SUM(revenue)
FROM fact_orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31' -- Clustered column first
AND region = 'North America'
GROUP BY product_id;
-- Bad: Scanning the entire table with a non-selective filter
SELECT product_id, SUM(revenue)
FROM fact_orders
WHERE YEAR(order_date) = 2024 -- Function on column prevents partition pruning!
GROUP BY product_id;
-- Check how many partitions your query scans
-- Look at the query profile in Snowsight for "Partitions scanned" vs "Partitions total"Use COPY INTO Instead of INSERT for Bulk Loads
-- Good: COPY INTO is optimized for bulk loading
COPY INTO raw_orders
FROM @orders_stage/2024/06/
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
ON_ERROR = 'CONTINUE';
-- Bad: INSERT INTO SELECT for large datasets is slower and more expensive
INSERT INTO raw_orders
SELECT * FROM external_table_reading_from_stage;Avoid SELECT * in Production Queries
-- Bad: Reads all columns, wastes I/O on columns you don't need
SELECT * FROM fact_order_items WHERE order_date = '2024-06-01';
-- Good: Read only the columns the query needs
SELECT order_id, customer_id, net_revenue, order_date
FROM fact_order_items
WHERE order_date = '2024-06-01';5. Data Quality Best Practices
Validate Data at Every Layer
-- After every major load, run validation checks
-- Create a reusable validation stored procedure
CREATE OR REPLACE PROCEDURE validate_load(
table_name STRING,
expected_min_rows INT,
key_column STRING
)
RETURNS STRING
LANGUAGE SQL
AS
$$
DECLARE
actual_rows INT;
null_keys INT;
dupes INT;
BEGIN
EXECUTE IMMEDIATE 'SELECT COUNT(*) FROM ' || table_name INTO actual_rows;
IF actual_rows < expected_min_rows THEN
RETURN 'FAIL: Expected at least ' || expected_min_rows || ' rows, got ' || actual_rows;
END IF;
EXECUTE IMMEDIATE
'SELECT COUNT(*) FROM ' || table_name || ' WHERE ' || key_column || ' IS NULL'
INTO null_keys;
IF null_keys > 0 THEN
RETURN 'FAIL: Found ' || null_keys || ' null values in ' || key_column;
END IF;
EXECUTE IMMEDIATE
'SELECT COUNT(*) FROM (SELECT ' || key_column || ' FROM ' || table_name ||
' GROUP BY ' || key_column || ' HAVING COUNT(*) > 1)'
INTO dupes;
IF dupes > 0 THEN
RETURN 'WARN: Found ' || dupes || ' duplicate values in ' || key_column;
END IF;
RETURN 'PASS: ' || actual_rows || ' rows, no nulls, no duplicates in ' || key_column;
END;
$$;
CALL validate_load('shopstream_presentation.dim.dim_customer', 1000, 'customer_id');6. Security Best Practices
Enable Multi-Factor Authentication
Require MFA for all human users, especially those with elevated privileges like SYSADMIN and SECURITYADMIN.
-- Enforce MFA for a user
ALTER USER john.smith SET MINS_TO_BYPASS_MFA = 0; -- Never bypass MFA
-- Enable network policy to restrict access to known IP ranges
CREATE NETWORK POLICY office_and_vpn_only
ALLOWED_IP_LIST = ('203.0.113.0/24', '198.51.100.50')
COMMENT = 'Only allow connections from office network and VPN';
ALTER ACCOUNT SET NETWORK_POLICY = office_and_vpn_only;Use Dynamic Data Masking for Sensitive Columns
-- Create a masking policy for email addresses
CREATE OR REPLACE MASKING POLICY mask_email AS (val STRING) RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() IN ('DATA_ENGINEER_ROLE', 'DBA_ROLE') THEN val
ELSE REGEXP_REPLACE(val, '.+@', '*****@')
END;
-- Apply the masking policy to the email column
ALTER TABLE dim_customer MODIFY COLUMN email
SET MASKING POLICY mask_email;
-- A DATA_ENGINEER sees: john.smith@example.com
-- An ANALYST sees: *****@example.com7. Cost Management Best Practices
Set Resource Monitors on Every Warehouse
-- Every production warehouse should have a resource monitor
CREATE OR REPLACE RESOURCE MONITOR etl_monthly_limit
CREDIT_QUOTA = 200
FREQUENCY = MONTHLY
START_TIMESTAMP = IMMEDIATELY
TRIGGERS
ON 70 PERCENT DO NOTIFY
ON 90 PERCENT DO NOTIFY
ON 100 PERCENT DO SUSPEND;
ALTER WAREHOUSE etl_wh SET RESOURCE_MONITOR = etl_monthly_limit;Use Zero-Copy Cloning for Development Environments
Instead of copying large datasets from production to development (which costs storage and time), use Snowflake's zero-copy clone feature. A clone shares the same underlying data storage as the original until changes are made.
-- Clone the production database for a development environment
-- This is instant and uses zero additional storage until data changes
CREATE DATABASE shopstream_dev CLONE shopstream_prod;
-- Clone a specific table for testing a new transformation
CREATE TABLE orders_backup CLONE fact_order_items;
-- Developers can now modify shopstream_dev without affecting prod
-- Only changed data in the clone incurs additional storage cost8. Pipeline Reliability Best Practices
Use Idempotent Loads
An idempotent operation produces the same result whether you run it once or ten times. Design your pipelines so re-running them on failure does not create duplicate data.
-- Bad: INSERT always adds rows, creating duplicates on re-run
INSERT INTO fact_orders SELECT * FROM stg_orders WHERE order_date = '2024-06-01';
-- Good: MERGE handles inserts and updates without duplicates
MERGE INTO fact_orders AS target
USING (
SELECT * FROM stg_orders WHERE order_date = '2024-06-01'
) AS source
ON target.order_id = source.order_id
WHEN MATCHED THEN
UPDATE SET
target.net_revenue = source.net_revenue,
target.order_status = source.order_status,
target.dw_updated_at = CURRENT_TIMESTAMP
WHEN NOT MATCHED THEN
INSERT (order_id, customer_id, net_revenue, order_date, order_status, dw_loaded_at)
VALUES (source.order_id, source.customer_id, source.net_revenue,
source.order_date, source.order_status, CURRENT_TIMESTAMP);Log Every Pipeline Run
-- Create a pipeline execution log table
CREATE TABLE shopstream_admin.pipeline_log (
log_id INTEGER AUTOINCREMENT PRIMARY KEY,
pipeline_name VARCHAR(200),
run_start TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP,
run_end TIMESTAMP_NTZ,
status VARCHAR(20),
rows_processed INTEGER,
error_message VARCHAR(2000)
);
-- Use in your stored procedures
CREATE OR REPLACE PROCEDURE run_daily_etl()
RETURNS STRING
LANGUAGE SQL
AS
$$
DECLARE
log_id_val INT;
row_count INT;
BEGIN
INSERT INTO shopstream_admin.pipeline_log (pipeline_name, status)
VALUES ('run_daily_etl', 'RUNNING');
SELECT MAX(log_id) INTO log_id_val FROM shopstream_admin.pipeline_log;
-- Run the actual pipeline
INSERT INTO fact_order_items SELECT * FROM stg_order_items WHERE ...;
SELECT COUNT(*) INTO row_count FROM stg_order_items WHERE ...;
UPDATE shopstream_admin.pipeline_log
SET run_end = CURRENT_TIMESTAMP, status = 'SUCCESS', rows_processed = row_count
WHERE log_id = log_id_val;
RETURN 'Completed: ' || row_count || ' rows';
EXCEPTION
WHEN OTHER THEN
UPDATE shopstream_admin.pipeline_log
SET run_end = CURRENT_TIMESTAMP, status = 'FAILED', error_message = SQLERRM
WHERE log_id = log_id_val;
RETURN 'Failed: ' || SQLERRM;
END;
$$;9. Documentation and Naming Conventions
Use Consistent Naming Conventions
- Databases:
projectname_environment(e.g.,shopstream_prod) - Schemas: single-word descriptive names (e.g.,
raw,staging,dim,fact,mart) - Tables: prefix indicates type —
raw_,stg_,dim_,fact_,v_for views - Warehouses: end in
_wh(e.g.,etl_wh,reporting_wh) - Roles: end in
_role(e.g.,analyst_role,etl_prod_role) - Tasks: end in
_taskand describe what they do (e.g.,load_raw_orders_task)
Add Comments to Every Object
-- Add descriptions to tables and columns
COMMENT ON TABLE dim_customer IS 'Customer dimension table. One row per unique customer. Updated daily.';
COMMENT ON COLUMN dim_customer.days_as_customer IS 'Number of days since the customer placed their first order.';
COMMENT ON COLUMN dim_customer.acquisition_channel IS 'Marketing channel that brought this customer. Values: organic, paid_search, social, referral, direct.';10. Observability and Alerting
Use Snowflake Alerts for Critical Failures
-- Create an alert that fires when pipeline fails
CREATE OR REPLACE ALERT etl_failure_alert
WAREHOUSE = monitoring_wh
SCHEDULE = '5 MINUTES'
IF (EXISTS (
SELECT 1
FROM shopstream_admin.pipeline_log
WHERE status = 'FAILED'
AND run_end >= DATEADD(minute, -10, CURRENT_TIMESTAMP)
))
THEN
CALL SYSTEM$SEND_EMAIL(
'ops_alerts',
'ops-team@shopstream.com',
'ALERT: Snowflake Pipeline Failure Detected',
'A pipeline failure was detected in the last 10 minutes. Check the pipeline_log table.'
);
ALTER ALERT etl_failure_alert RESUME;Key Points Summary
- Always maintain separate DEV, UAT, and PROD environments. Never test directly in production.
- Use Infrastructure as Code (Terraform, Schemachange) to manage Snowflake objects with version control.
- Follow the least privilege principle: every role gets only the permissions it needs. Use dedicated service accounts for automated jobs.
- Configure purpose-specific warehouses with appropriate sizes and auto-suspend settings for each workload type.
- Design pipelines to be idempotent: use MERGE instead of INSERT to prevent duplicates on re-runs.
- Apply Dynamic Data Masking to protect sensitive columns from unauthorized roles.
- Set Resource Monitors on every warehouse to prevent unexpected cost overruns.
- Use Zero-Copy Cloning to create development copies of production data instantly and cheaply.
- Log every pipeline run with start time, end time, row counts, and error messages. Alert automatically on failures.
- Use consistent naming conventions and add COMMENT descriptions to every object so the team always understands what each piece does.
