Snowflake Querying Data with SELECT and Filters
Querying data is the core skill every Snowflake user needs. Whether you are an analyst building reports, a data engineer writing transformation pipelines, or a data scientist exploring datasets, you express every data request as a SQL SELECT statement. Snowflake supports the full ANSI SQL standard plus many powerful extensions. This topic covers SELECT from basics through joins, aggregations, window functions, and Snowflake-specific SQL features.
The SELECT Statement: Anatomy of a Query
Every SQL query follows the same structure. Snowflake evaluates the clauses in a specific internal order — knowing this order helps you understand why certain errors occur and how to fix them.
SQL CLAUSE ORDER (written) SNOWFLAKE EVALUATION ORDER
-------------------------- --------------------------
SELECT columns 1. FROM (which table?)
FROM table 2. JOIN (combine tables)
JOIN other_table 3. WHERE (filter rows)
WHERE row_conditions 4. GROUP BY (group rows)
GROUP BY grouping_columns 5. HAVING (filter groups)
HAVING group_conditions 6. SELECT (pick columns)
ORDER BY sort_columns 7. DISTINCT (remove duplicates)
LIMIT n 8. ORDER BY (sort results)
9. LIMIT (trim output)
You write SELECT first, but Snowflake processes FROM first. That is why you cannot reference a column alias defined in SELECT inside a WHERE clause — WHERE runs before SELECT evaluates the alias.
Basic SELECT Examples
-- Select all columns (use sparingly on large tables — reads all columns)
SELECT * FROM RETAIL_DB.ANALYTICS.CUSTOMERS;
-- Select specific columns (always prefer this over SELECT *)
SELECT
customer_id,
first_name,
last_name,
email,
signup_date
FROM CUSTOMERS;
-- Rename columns with aliases
SELECT
customer_id AS id,
first_name || ' ' || last_name AS full_name,
UPPER(email) AS email_uppercase,
DATEDIFF('day', signup_date, CURRENT_DATE()) AS days_since_signup
FROM CUSTOMERS;
-- Select distinct values (removes duplicate rows)
SELECT DISTINCT country, city
FROM CUSTOMERS
ORDER BY country, city;
-- Limit results (useful during exploration)
SELECT * FROM ORDERS LIMIT 100;
Filtering Rows with WHERE
The WHERE clause filters rows before any aggregation happens. Only rows that satisfy the condition pass through to the rest of the query.
-- Simple equality filter
SELECT * FROM ORDERS WHERE status = 'SHIPPED';
-- Numeric range filter
SELECT * FROM ORDERS WHERE amount BETWEEN 100 AND 500;
-- Date filter
SELECT * FROM ORDERS WHERE order_date >= '2024-01-01';
-- Multiple conditions with AND / OR
SELECT * FROM ORDERS
WHERE status = 'SHIPPED'
AND order_date BETWEEN '2024-01-01' AND '2024-06-30'
AND amount > 200;
-- IN list filter (cleaner than many OR conditions)
SELECT * FROM ORDERS
WHERE status IN ('SHIPPED', 'DELIVERED', 'PROCESSING');
-- NOT IN (exclude values)
SELECT * FROM ORDERS
WHERE country NOT IN ('US', 'CA');
-- Pattern matching with LIKE
SELECT * FROM CUSTOMERS
WHERE email LIKE '%@gmail.com'; -- ends with @gmail.com
SELECT * FROM PRODUCTS
WHERE product_code LIKE 'EL-%'; -- starts with EL-
-- NULL handling (always use IS NULL, never = NULL)
SELECT * FROM CUSTOMERS WHERE phone_number IS NULL;
SELECT * FROM CUSTOMERS WHERE phone_number IS NOT NULL;
Joining Tables
Real data lives across multiple tables. JOINs combine rows from two or more tables based on a matching column. Understanding join types is essential for writing correct queries.
JOIN TYPE DIAGRAM ================== TABLE A (Customers) TABLE B (Orders) ------------------- ---------------- C001 Alice C001 Order#1 C002 Bob C001 Order#2 C003 Carol C004 Order#3 (no matching customer) (no C004 customer) INNER JOIN: Only rows matching in BOTH tables Result: Alice+Order#1, Alice+Order#2 Bob and Carol excluded (no orders), Order#3 excluded (no customer) LEFT JOIN: All rows from A, matching rows from B (NULL if no match) Result: Alice+Order#1, Alice+Order#2, Bob+NULL, Carol+NULL RIGHT JOIN: All rows from B, matching rows from A (NULL if no match) Result: Alice+Order#1, Alice+Order#2, NULL+Order#3 FULL OUTER JOIN: All rows from both tables Result: Alice+Order#1, Alice+Order#2, Bob+NULL, Carol+NULL, NULL+Order#3
-- INNER JOIN: customers who have placed at least one order
SELECT
c.customer_id,
c.first_name || ' ' || c.last_name AS customer_name,
o.order_id,
o.order_date,
o.amount
FROM CUSTOMERS c
INNER JOIN ORDERS o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2024-01-01'
ORDER BY o.order_date DESC;
-- LEFT JOIN: all customers, with their order count (0 if no orders)
SELECT
c.customer_id,
c.first_name,
COUNT(o.order_id) AS total_orders,
SUM(o.amount) AS lifetime_value
FROM CUSTOMERS c
LEFT JOIN ORDERS o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name
ORDER BY lifetime_value DESC NULLS LAST;
-- Multi-table join: customers + orders + products
SELECT
c.first_name || ' ' || c.last_name AS customer_name,
o.order_date,
p.product_name,
oi.quantity,
oi.unit_price,
(oi.quantity * oi.unit_price) AS line_total
FROM CUSTOMERS c
JOIN ORDERS o ON c.customer_id = o.customer_id
JOIN ORDER_ITEMS oi ON o.order_id = oi.order_id
JOIN PRODUCTS p ON oi.product_id = p.product_id
WHERE o.order_date >= '2024-01-01'
ORDER BY o.order_date, c.last_name;
Aggregating Data with GROUP BY
Aggregation collapses many rows into summary rows. GROUP BY defines which column values form a group. Every column in SELECT must either appear in GROUP BY or be wrapped in an aggregate function.
-- Total sales by month
SELECT
DATE_TRUNC('MONTH', order_date) AS sale_month,
COUNT(*) AS order_count,
COUNT(DISTINCT customer_id) AS unique_customers,
SUM(amount) AS total_revenue,
AVG(amount) AS avg_order_value,
MIN(amount) AS smallest_order,
MAX(amount) AS largest_order
FROM ORDERS
WHERE status = 'COMPLETED'
GROUP BY DATE_TRUNC('MONTH', order_date)
ORDER BY sale_month;
-- Sales by country and category with ROLLUP (adds subtotals)
SELECT
country,
product_category,
SUM(amount) AS revenue
FROM ORDERS o
JOIN ORDER_ITEMS oi ON o.order_id = oi.order_id
JOIN PRODUCTS p ON oi.product_id = p.product_id
GROUP BY ROLLUP(country, product_category)
ORDER BY country NULLS LAST, product_category NULLS LAST;
-- ROLLUP adds rows for: (country totals) and (grand total)
-- Filter aggregated results with HAVING
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(amount) AS total_spend
FROM ORDERS
GROUP BY customer_id
HAVING SUM(amount) > 5000 -- Only customers who spent more than $5000
AND COUNT(*) >= 3 -- And placed at least 3 orders
ORDER BY total_spend DESC;
Subqueries and CTEs
Complex queries often need intermediate results. Snowflake supports both inline subqueries and Common Table Expressions (CTEs). CTEs are strongly preferred because they are easier to read, test independently, and reuse within the same query.
-- Subquery approach (harder to read)
SELECT customer_id, total_spend
FROM (
SELECT customer_id, SUM(amount) AS total_spend
FROM ORDERS
GROUP BY customer_id
) ranked_customers
WHERE total_spend > 10000;
-- CTE approach (much clearer)
WITH customer_spend AS (
SELECT
customer_id,
SUM(amount) AS total_spend,
COUNT(*) AS order_count,
MAX(order_date) AS last_order_date
FROM ORDERS
GROUP BY customer_id
),
high_value AS (
SELECT *
FROM customer_spend
WHERE total_spend > 10000
)
SELECT
c.first_name || ' ' || c.last_name AS customer_name,
hv.total_spend,
hv.order_count,
hv.last_order_date
FROM high_value hv
JOIN CUSTOMERS c ON hv.customer_id = c.customer_id
ORDER BY hv.total_spend DESC;
Window Functions: Running Calculations Across Rows
Window functions compute values across a set of related rows without collapsing them into a single summary row. They are one of the most powerful tools in SQL for analytics.
WINDOW FUNCTION CONCEPT ======================== Without window function (GROUP BY): Customer | Total_Spend Alice | 5,200 (collapsed — individual orders gone) Bob | 3,100 With window function: Order_ID | Customer | Amount | Cumulative_Spend | Rank 1001 | Alice | 2,500 | 2,500 | 1 1002 | Alice | 1,800 | 4,300 | 2 1003 | Alice | 900 | 5,200 | 3 (individual rows kept — window function adds a calculated column)
-- Rank customers by spending within each country
SELECT
customer_id,
country,
total_spend,
RANK() OVER (PARTITION BY country ORDER BY total_spend DESC) AS country_rank,
DENSE_RANK() OVER (PARTITION BY country ORDER BY total_spend DESC) AS dense_rank,
ROW_NUMBER() OVER (PARTITION BY country ORDER BY total_spend DESC) AS row_num
FROM (
SELECT o.customer_id, c.country, SUM(o.amount) AS total_spend
FROM ORDERS o JOIN CUSTOMERS c ON o.customer_id = c.customer_id
GROUP BY o.customer_id, c.country
);
-- Running total of revenue by date
SELECT
order_date,
daily_revenue,
SUM(daily_revenue) OVER (ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS cumulative_revenue
FROM (
SELECT order_date, SUM(amount) AS daily_revenue
FROM ORDERS
GROUP BY order_date
)
ORDER BY order_date;
-- 7-day moving average of daily sales
SELECT
order_date,
daily_revenue,
AVG(daily_revenue) OVER (ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
AS moving_avg_7day
FROM (
SELECT order_date, SUM(amount) AS daily_revenue
FROM ORDERS GROUP BY order_date
)
ORDER BY order_date;
-- LAG and LEAD: compare current row to previous/next row
SELECT
order_date,
daily_revenue,
LAG(daily_revenue, 1) OVER (ORDER BY order_date) AS prev_day_revenue,
LEAD(daily_revenue, 1) OVER (ORDER BY order_date) AS next_day_revenue,
daily_revenue - LAG(daily_revenue, 1) OVER (ORDER BY order_date) AS day_over_day_change
FROM (
SELECT order_date, SUM(amount) AS daily_revenue
FROM ORDERS GROUP BY order_date
)
ORDER BY order_date;
Useful Snowflake-Specific Functions
-- Date functions
SELECT
CURRENT_DATE(), -- today's date
CURRENT_TIMESTAMP(), -- current datetime with timezone
DATEADD('day', 30, CURRENT_DATE()), -- 30 days from now
DATEDIFF('day', '2024-01-01', CURRENT_DATE()), -- days since Jan 1
DATE_TRUNC('month', CURRENT_DATE()), -- first day of current month
LAST_DAY(CURRENT_DATE()), -- last day of current month
DAYOFWEEK(CURRENT_DATE()), -- 0=Sunday through 6=Saturday
TO_CHAR(CURRENT_DATE(), 'YYYY-MM-DD'); -- format date as string
-- String functions
SELECT
UPPER('hello world'), -- HELLO WORLD
LOWER('HELLO WORLD'), -- hello world
TRIM(' hello '), -- hello
LTRIM(' hello'), -- hello
LENGTH('snowflake'), -- 9
SUBSTRING('snowflake', 1, 4), -- snow
REPLACE('snowflake', 'snow', 'ice'), -- iceflake
SPLIT_PART('first.last@email.com', '@', 1), -- first.last
REGEXP_REPLACE(phone, '[^0-9]', ''), -- remove non-digits from phone
-- Conditional functions
IFF(amount > 1000, 'High', 'Low'), -- simple if-else
NVL(phone_number, 'No Phone'), -- replace NULL with default
COALESCE(phone, mobile, 'No Contact'), -- first non-NULL value
NULLIF(status, 'UNKNOWN'); -- return NULL if value equals arg
-- CASE expression (flexible conditional)
SELECT
order_id,
amount,
CASE
WHEN amount < 100 THEN 'Small'
WHEN amount < 500 THEN 'Medium'
WHEN amount < 2000 THEN 'Large'
ELSE 'Enterprise'
END AS order_tier
FROM ORDERS;
Key Points
- Snowflake evaluates SQL clauses in the order FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY — this determines what is visible at each stage
- Always specify columns explicitly rather than using SELECT * on large tables — this reduces data scanned and improves performance
- Use CTEs (WITH clauses) instead of nested subqueries for complex queries — they are easier to read, debug, and reuse
- Window functions (RANK, SUM OVER, LAG, LEAD) compute row-level analytics without collapsing your dataset like GROUP BY does
- HAVING filters groups after aggregation — use it instead of WHERE when filtering on aggregate values like SUM() or COUNT()
- Snowflake's date, string, and conditional functions cover the vast majority of data transformation needs without any external tooling
