Snowflake Querying Data with SELECT and Filters

Querying data is the core skill every Snowflake user needs. Whether you are an analyst building reports, a data engineer writing transformation pipelines, or a data scientist exploring datasets, you express every data request as a SQL SELECT statement. Snowflake supports the full ANSI SQL standard plus many powerful extensions. This topic covers SELECT from basics through joins, aggregations, window functions, and Snowflake-specific SQL features.

The SELECT Statement: Anatomy of a Query

Every SQL query follows the same structure. Snowflake evaluates the clauses in a specific internal order — knowing this order helps you understand why certain errors occur and how to fix them.

SQL CLAUSE ORDER (written)    SNOWFLAKE EVALUATION ORDER
--------------------------    --------------------------
SELECT    columns             1. FROM        (which table?)
FROM      table               2. JOIN        (combine tables)
JOIN      other_table         3. WHERE       (filter rows)
WHERE     row_conditions      4. GROUP BY    (group rows)
GROUP BY  grouping_columns    5. HAVING      (filter groups)
HAVING    group_conditions    6. SELECT      (pick columns)
ORDER BY  sort_columns        7. DISTINCT    (remove duplicates)
LIMIT     n                   8. ORDER BY    (sort results)
                              9. LIMIT       (trim output)

You write SELECT first, but Snowflake processes FROM first. That is why you cannot reference a column alias defined in SELECT inside a WHERE clause — WHERE runs before SELECT evaluates the alias.

Basic SELECT Examples

-- Select all columns (use sparingly on large tables — reads all columns)
SELECT * FROM RETAIL_DB.ANALYTICS.CUSTOMERS;

-- Select specific columns (always prefer this over SELECT *)
SELECT
    customer_id,
    first_name,
    last_name,
    email,
    signup_date
FROM CUSTOMERS;

-- Rename columns with aliases
SELECT
    customer_id                              AS id,
    first_name || ' ' || last_name          AS full_name,
    UPPER(email)                             AS email_uppercase,
    DATEDIFF('day', signup_date, CURRENT_DATE()) AS days_since_signup
FROM CUSTOMERS;

-- Select distinct values (removes duplicate rows)
SELECT DISTINCT country, city
FROM CUSTOMERS
ORDER BY country, city;

-- Limit results (useful during exploration)
SELECT * FROM ORDERS LIMIT 100;

Filtering Rows with WHERE

The WHERE clause filters rows before any aggregation happens. Only rows that satisfy the condition pass through to the rest of the query.

-- Simple equality filter
SELECT * FROM ORDERS WHERE status = 'SHIPPED';

-- Numeric range filter
SELECT * FROM ORDERS WHERE amount BETWEEN 100 AND 500;

-- Date filter
SELECT * FROM ORDERS WHERE order_date >= '2024-01-01';

-- Multiple conditions with AND / OR
SELECT * FROM ORDERS
WHERE status = 'SHIPPED'
  AND order_date BETWEEN '2024-01-01' AND '2024-06-30'
  AND amount > 200;

-- IN list filter (cleaner than many OR conditions)
SELECT * FROM ORDERS
WHERE status IN ('SHIPPED', 'DELIVERED', 'PROCESSING');

-- NOT IN (exclude values)
SELECT * FROM ORDERS
WHERE country NOT IN ('US', 'CA');

-- Pattern matching with LIKE
SELECT * FROM CUSTOMERS
WHERE email LIKE '%@gmail.com';    -- ends with @gmail.com

SELECT * FROM PRODUCTS
WHERE product_code LIKE 'EL-%';   -- starts with EL-

-- NULL handling (always use IS NULL, never = NULL)
SELECT * FROM CUSTOMERS WHERE phone_number IS NULL;
SELECT * FROM CUSTOMERS WHERE phone_number IS NOT NULL;

Joining Tables

Real data lives across multiple tables. JOINs combine rows from two or more tables based on a matching column. Understanding join types is essential for writing correct queries.

JOIN TYPE DIAGRAM
==================

TABLE A (Customers)     TABLE B (Orders)
-------------------     ----------------
C001 Alice              C001  Order#1
C002 Bob                C001  Order#2
C003 Carol              C004  Order#3   (no matching customer)
(no C004 customer)

INNER JOIN: Only rows matching in BOTH tables
  Result: Alice+Order#1, Alice+Order#2
  Bob and Carol excluded (no orders), Order#3 excluded (no customer)

LEFT JOIN: All rows from A, matching rows from B (NULL if no match)
  Result: Alice+Order#1, Alice+Order#2, Bob+NULL, Carol+NULL

RIGHT JOIN: All rows from B, matching rows from A (NULL if no match)
  Result: Alice+Order#1, Alice+Order#2, NULL+Order#3

FULL OUTER JOIN: All rows from both tables
  Result: Alice+Order#1, Alice+Order#2, Bob+NULL, Carol+NULL, NULL+Order#3
-- INNER JOIN: customers who have placed at least one order
SELECT
    c.customer_id,
    c.first_name || ' ' || c.last_name  AS customer_name,
    o.order_id,
    o.order_date,
    o.amount
FROM CUSTOMERS c
INNER JOIN ORDERS o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2024-01-01'
ORDER BY o.order_date DESC;

-- LEFT JOIN: all customers, with their order count (0 if no orders)
SELECT
    c.customer_id,
    c.first_name,
    COUNT(o.order_id)   AS total_orders,
    SUM(o.amount)       AS lifetime_value
FROM CUSTOMERS c
LEFT JOIN ORDERS o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name
ORDER BY lifetime_value DESC NULLS LAST;

-- Multi-table join: customers + orders + products
SELECT
    c.first_name || ' ' || c.last_name  AS customer_name,
    o.order_date,
    p.product_name,
    oi.quantity,
    oi.unit_price,
    (oi.quantity * oi.unit_price)       AS line_total
FROM CUSTOMERS c
JOIN ORDERS o        ON c.customer_id = o.customer_id
JOIN ORDER_ITEMS oi  ON o.order_id    = oi.order_id
JOIN PRODUCTS p      ON oi.product_id = p.product_id
WHERE o.order_date >= '2024-01-01'
ORDER BY o.order_date, c.last_name;

Aggregating Data with GROUP BY

Aggregation collapses many rows into summary rows. GROUP BY defines which column values form a group. Every column in SELECT must either appear in GROUP BY or be wrapped in an aggregate function.

-- Total sales by month
SELECT
    DATE_TRUNC('MONTH', order_date)     AS sale_month,
    COUNT(*)                            AS order_count,
    COUNT(DISTINCT customer_id)         AS unique_customers,
    SUM(amount)                         AS total_revenue,
    AVG(amount)                         AS avg_order_value,
    MIN(amount)                         AS smallest_order,
    MAX(amount)                         AS largest_order
FROM ORDERS
WHERE status = 'COMPLETED'
GROUP BY DATE_TRUNC('MONTH', order_date)
ORDER BY sale_month;

-- Sales by country and category with ROLLUP (adds subtotals)
SELECT
    country,
    product_category,
    SUM(amount)  AS revenue
FROM ORDERS o
JOIN ORDER_ITEMS oi ON o.order_id = oi.order_id
JOIN PRODUCTS p ON oi.product_id = p.product_id
GROUP BY ROLLUP(country, product_category)
ORDER BY country NULLS LAST, product_category NULLS LAST;
-- ROLLUP adds rows for: (country totals) and (grand total)

-- Filter aggregated results with HAVING
SELECT
    customer_id,
    COUNT(*)    AS order_count,
    SUM(amount) AS total_spend
FROM ORDERS
GROUP BY customer_id
HAVING SUM(amount) > 5000        -- Only customers who spent more than $5000
   AND COUNT(*) >= 3             -- And placed at least 3 orders
ORDER BY total_spend DESC;

Subqueries and CTEs

Complex queries often need intermediate results. Snowflake supports both inline subqueries and Common Table Expressions (CTEs). CTEs are strongly preferred because they are easier to read, test independently, and reuse within the same query.

-- Subquery approach (harder to read)
SELECT customer_id, total_spend
FROM (
    SELECT customer_id, SUM(amount) AS total_spend
    FROM ORDERS
    GROUP BY customer_id
) ranked_customers
WHERE total_spend > 10000;

-- CTE approach (much clearer)
WITH customer_spend AS (
    SELECT
        customer_id,
        SUM(amount)             AS total_spend,
        COUNT(*)                AS order_count,
        MAX(order_date)         AS last_order_date
    FROM ORDERS
    GROUP BY customer_id
),
high_value AS (
    SELECT *
    FROM customer_spend
    WHERE total_spend > 10000
)
SELECT
    c.first_name || ' ' || c.last_name  AS customer_name,
    hv.total_spend,
    hv.order_count,
    hv.last_order_date
FROM high_value hv
JOIN CUSTOMERS c ON hv.customer_id = c.customer_id
ORDER BY hv.total_spend DESC;

Window Functions: Running Calculations Across Rows

Window functions compute values across a set of related rows without collapsing them into a single summary row. They are one of the most powerful tools in SQL for analytics.

WINDOW FUNCTION CONCEPT
========================

Without window function (GROUP BY):
  Customer | Total_Spend
  Alice    | 5,200          (collapsed — individual orders gone)
  Bob      | 3,100

With window function:
  Order_ID | Customer | Amount | Cumulative_Spend | Rank
  1001     | Alice    | 2,500  | 2,500            | 1
  1002     | Alice    | 1,800  | 4,300            | 2
  1003     | Alice    | 900    | 5,200            | 3
  (individual rows kept — window function adds a calculated column)
-- Rank customers by spending within each country
SELECT
    customer_id,
    country,
    total_spend,
    RANK()       OVER (PARTITION BY country ORDER BY total_spend DESC) AS country_rank,
    DENSE_RANK() OVER (PARTITION BY country ORDER BY total_spend DESC) AS dense_rank,
    ROW_NUMBER() OVER (PARTITION BY country ORDER BY total_spend DESC) AS row_num
FROM (
    SELECT o.customer_id, c.country, SUM(o.amount) AS total_spend
    FROM ORDERS o JOIN CUSTOMERS c ON o.customer_id = c.customer_id
    GROUP BY o.customer_id, c.country
);

-- Running total of revenue by date
SELECT
    order_date,
    daily_revenue,
    SUM(daily_revenue) OVER (ORDER BY order_date
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
                            AS cumulative_revenue
FROM (
    SELECT order_date, SUM(amount) AS daily_revenue
    FROM ORDERS
    GROUP BY order_date
)
ORDER BY order_date;

-- 7-day moving average of daily sales
SELECT
    order_date,
    daily_revenue,
    AVG(daily_revenue) OVER (ORDER BY order_date
                             ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
                            AS moving_avg_7day
FROM (
    SELECT order_date, SUM(amount) AS daily_revenue
    FROM ORDERS GROUP BY order_date
)
ORDER BY order_date;

-- LAG and LEAD: compare current row to previous/next row
SELECT
    order_date,
    daily_revenue,
    LAG(daily_revenue,  1) OVER (ORDER BY order_date) AS prev_day_revenue,
    LEAD(daily_revenue, 1) OVER (ORDER BY order_date) AS next_day_revenue,
    daily_revenue - LAG(daily_revenue, 1) OVER (ORDER BY order_date) AS day_over_day_change
FROM (
    SELECT order_date, SUM(amount) AS daily_revenue
    FROM ORDERS GROUP BY order_date
)
ORDER BY order_date;

Useful Snowflake-Specific Functions

-- Date functions
SELECT
    CURRENT_DATE(),                          -- today's date
    CURRENT_TIMESTAMP(),                     -- current datetime with timezone
    DATEADD('day', 30, CURRENT_DATE()),      -- 30 days from now
    DATEDIFF('day', '2024-01-01', CURRENT_DATE()),  -- days since Jan 1
    DATE_TRUNC('month', CURRENT_DATE()),     -- first day of current month
    LAST_DAY(CURRENT_DATE()),               -- last day of current month
    DAYOFWEEK(CURRENT_DATE()),              -- 0=Sunday through 6=Saturday
    TO_CHAR(CURRENT_DATE(), 'YYYY-MM-DD');  -- format date as string

-- String functions
SELECT
    UPPER('hello world'),                   -- HELLO WORLD
    LOWER('HELLO WORLD'),                   -- hello world
    TRIM('  hello  '),                      -- hello
    LTRIM('   hello'),                      -- hello
    LENGTH('snowflake'),                    -- 9
    SUBSTRING('snowflake', 1, 4),          -- snow
    REPLACE('snowflake', 'snow', 'ice'),   -- iceflake
    SPLIT_PART('first.last@email.com', '@', 1),  -- first.last
    REGEXP_REPLACE(phone, '[^0-9]', ''),   -- remove non-digits from phone

-- Conditional functions
    IFF(amount > 1000, 'High', 'Low'),     -- simple if-else
    NVL(phone_number, 'No Phone'),         -- replace NULL with default
    COALESCE(phone, mobile, 'No Contact'), -- first non-NULL value
    NULLIF(status, 'UNKNOWN');             -- return NULL if value equals arg

-- CASE expression (flexible conditional)
SELECT
    order_id,
    amount,
    CASE
        WHEN amount < 100  THEN 'Small'
        WHEN amount < 500  THEN 'Medium'
        WHEN amount < 2000 THEN 'Large'
        ELSE                    'Enterprise'
    END AS order_tier
FROM ORDERS;

Key Points

  • Snowflake evaluates SQL clauses in the order FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY — this determines what is visible at each stage
  • Always specify columns explicitly rather than using SELECT * on large tables — this reduces data scanned and improves performance
  • Use CTEs (WITH clauses) instead of nested subqueries for complex queries — they are easier to read, debug, and reuse
  • Window functions (RANK, SUM OVER, LAG, LEAD) compute row-level analytics without collapsing your dataset like GROUP BY does
  • HAVING filters groups after aggregation — use it instead of WHERE when filtering on aggregate values like SUM() or COUNT()
  • Snowflake's date, string, and conditional functions cover the vast majority of data transformation needs without any external tooling

Leave a Comment

Your email address will not be published. Required fields are marked *