Loading Data Into Snowflake Using COPY INTO

Loading data into Snowflake is the first step in any real project. Before you can query, transform, or share data, you must get it into a Snowflake table. The primary mechanism for bulk data loading is the COPY INTO command. It reads files from a staging area — either inside Snowflake or in cloud storage — and loads them into a target table efficiently and in parallel.

This topic covers the full data loading workflow: creating a stage, uploading files, verifying staged files, and running COPY INTO with practical examples for CSV, JSON, and Parquet formats.

The Data Loading Workflow: A Post Office Analogy

Loading data into Snowflake works like a post office system. Your raw data files are parcels. A stage is the sorting depot where parcels arrive before being delivered. COPY INTO is the delivery truck that takes parcels from the depot and delivers them to the correct table (the recipient's address).

DATA LOADING PIPELINE
======================

[Source Data]         [Stage]              [Snowflake Table]
     |                   |                        |
CSV files  ----PUT-----> Internal Stage           |
S3 bucket  -----------> External Stage --COPY INTO-> ORDERS table
Azure Blob -----------> External Stage --COPY INTO-> CUSTOMERS table
GCS bucket -----------> External Stage --COPY INTO-> PRODUCTS table

Stage Types: Internal vs External

A stage is a pointer to a location where data files live before loading. Snowflake supports two kinds of stages.

STAGE TYPE        WHERE FILES LIVE           BEST FOR
----------        ----------------           --------
Internal Stage    Inside Snowflake's own     Small files, quick uploads
                  managed cloud storage      via PUT command

External Stage    Your own S3 / Azure Blob   Large files, automated
                  / GCS bucket               pipelines, existing cloud data

Named Stage       Either type, with a name   Reusable reference to a location
Table Stage       Auto-created per table     Single-table quick loads
User Stage        Auto-created per user      Personal workspace uploads

Creating a Named Internal Stage

A named internal stage is a dedicated storage area inside Snowflake. You create it once and reference it by name in all your COPY INTO commands.

USE DATABASE RETAIL_DB;
USE SCHEMA RAW_DATA;

-- Create a simple internal stage
CREATE STAGE my_internal_stage
  COMMENT = 'Internal stage for retail data CSV files';

-- Create an internal stage with compression and file format settings
CREATE STAGE orders_stage
  FILE_FORMAT = (
    TYPE            = 'CSV'
    FIELD_DELIMITER = ','
    SKIP_HEADER     = 1
    NULL_IF         = ('NULL', 'null', '')
    EMPTY_FIELD_AS_NULL = TRUE
  )
  COMMENT = 'Stage for loading order CSV files';

Uploading Files to an Internal Stage with PUT

The PUT command uploads a file from your local machine to an internal stage. PUT runs from SnowSQL (the command-line client) or from the Snowsight file upload UI — not from a worksheet SQL editor directly.

-- From SnowSQL command line:
PUT file:///home/user/data/orders_2024.csv @orders_stage
  AUTO_COMPRESS = TRUE
  PARALLEL = 4;

-- PUT from Windows:
PUT file://C:\data\orders_2024.csv @orders_stage AUTO_COMPRESS=TRUE;

-- Verify the file arrived in the stage
LIST @orders_stage;

-- Output:
-- name                              size       md5                               last_modified
-- orders_stage/orders_2024.csv.gz   1245678   a1b2c3d4e5f6...                  2024-06-15 10:30:00.000

Creating a Named External Stage (Amazon S3)

An external stage points to a cloud bucket you already own. Snowflake reads files from there during COPY INTO — the files never move into Snowflake's own storage.

-- Create an external stage pointing to an S3 bucket
-- Using storage integration (recommended, no hardcoded keys)
CREATE STORAGE INTEGRATION s3_retail_integration
  TYPE                      = EXTERNAL_STAGE
  STORAGE_PROVIDER          = 'S3'
  ENABLED                   = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-retail-bucket/data/');

-- Create the external stage using the integration
CREATE STAGE s3_orders_stage
  URL             = 's3://my-retail-bucket/data/orders/'
  STORAGE_INTEGRATION = s3_retail_integration
  FILE_FORMAT     = (TYPE = 'CSV' SKIP_HEADER = 1)
  COMMENT         = 'External stage pointing to S3 orders folder';

-- List files visible in the external stage
LIST @s3_orders_stage;

File Format Objects: Reusable Format Definitions

Instead of repeating file format options in every COPY INTO command, define a named FILE FORMAT object once and reference it by name.

-- Create a reusable CSV file format
CREATE FILE FORMAT csv_standard
  TYPE              = 'CSV'
  FIELD_DELIMITER   = ','
  RECORD_DELIMITER  = '\n'
  SKIP_HEADER       = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  NULL_IF           = ('NULL', 'null', 'N/A', '')
  EMPTY_FIELD_AS_NULL = TRUE
  DATE_FORMAT       = 'YYYY-MM-DD'
  TIMESTAMP_FORMAT  = 'YYYY-MM-DD HH24:MI:SS';

-- Create a JSON file format
CREATE FILE FORMAT json_standard
  TYPE              = 'JSON'
  STRIP_OUTER_ARRAY = TRUE
  NULL_IF           = ('NULL', 'null');

-- Create a Parquet file format
CREATE FILE FORMAT parquet_standard
  TYPE              = 'PARQUET'
  SNAPPY_COMPRESSION = TRUE;

Loading CSV Data with COPY INTO

The COPY INTO command reads staged files and inserts rows into a target table. It processes multiple files in parallel across the warehouse's nodes for fast throughput.

-- Target table for orders data
CREATE TABLE ORDERS_RAW (
  order_id       INT,
  customer_id    INT,
  order_date     DATE,
  product_code   VARCHAR(50),
  quantity       INT,
  unit_price     DECIMAL(10,2),
  status         VARCHAR(30),
  country        VARCHAR(50)
);

-- Load all CSV files from the stage into the table
COPY INTO ORDERS_RAW
FROM @orders_stage
FILE_FORMAT = (FORMAT_NAME = 'csv_standard')
ON_ERROR    = 'CONTINUE';     -- Skip bad rows, continue loading good ones

-- Load only specific files by name pattern
COPY INTO ORDERS_RAW
FROM @s3_orders_stage
PATTERN     = '.*orders_2024.*\.csv'
FILE_FORMAT = (FORMAT_NAME = 'csv_standard')
ON_ERROR    = 'ABORT_STATEMENT';   -- Stop entirely if any error occurs

ON_ERROR Options Explained

ON_ERROR VALUE       BEHAVIOUR
--------------       ---------
ABORT_STATEMENT      Stop loading entire batch on first error (default)
CONTINUE             Skip bad rows, load everything else, report errors
SKIP_FILE            Skip entire files with errors, load clean files only
SKIP_FILE_n          Skip files with more than n errors
SKIP_FILE_n%         Skip files where more than n% of rows have errors

Checking COPY INTO Results

COPY INTO returns a result set showing what happened to each file it processed. Always inspect this output after loading.

COPY INTO ORDERS_RAW FROM @orders_stage FILE_FORMAT=(FORMAT_NAME='csv_standard');

-- Result output:
-- file                          status    rows_loaded  errors_seen  first_error
-- orders_stage/orders_jan.csv   LOADED    45200        0            NULL
-- orders_stage/orders_feb.csv   LOADED    48750        0            NULL
-- orders_stage/orders_mar.csv   PARTIALLY_LOADED  48100  3  Value 'N/A' not recognised for column UNIT_PRICE

-- Check load history for past loads
SELECT *
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
  TABLE_NAME   => 'ORDERS_RAW',
  START_TIME   => DATEADD('hour', -24, CURRENT_TIMESTAMP())
));

Loading JSON Data

Snowflake loads semi-structured JSON data directly into a VARIANT column. You do not need to define individual columns upfront — the entire JSON object lands in one column, and you query it with dot notation later.

-- Sample JSON file content (one object per line after STRIP_OUTER_ARRAY):
-- {"event_id": 1001, "user": "alice", "action": "purchase", "amount": 49.99, "tags": ["sale","member"]}
-- {"event_id": 1002, "user": "bob",   "action": "view",     "page": "/products/shoes"}

-- Target table with a VARIANT column
CREATE TABLE EVENTS_RAW (
  raw_event   VARIANT,
  loaded_at   TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);

-- Load JSON from stage
COPY INTO EVENTS_RAW (raw_event)
FROM (
  SELECT $1
  FROM @s3_orders_stage/events/
)
FILE_FORMAT = (FORMAT_NAME = 'json_standard');

-- Query individual JSON fields using colon notation
SELECT
  raw_event:event_id::INT          AS event_id,
  raw_event:user::VARCHAR          AS username,
  raw_event:action::VARCHAR        AS action_type,
  raw_event:amount::DECIMAL(10,2)  AS amount,
  raw_event:tags[0]::VARCHAR       AS first_tag
FROM EVENTS_RAW
LIMIT 10;

Loading Parquet Data

Parquet files are columnar binary files commonly produced by Spark, dbt, and data lakes. Snowflake reads Parquet files natively, matching column names automatically to the target table.

CREATE TABLE SALES_PARQUET (
  sale_id       INT,
  product_name  VARCHAR(200),
  revenue       DECIMAL(12,2),
  sale_date     DATE
);

COPY INTO SALES_PARQUET
FROM @s3_orders_stage/parquet/
FILE_FORMAT = (FORMAT_NAME = 'parquet_standard')
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
-- Automatically maps Parquet column names to table column names regardless of case

Transforming Data During Load

COPY INTO supports inline SQL transformations so you can clean, filter, or reshape data as it loads — without a separate transformation step.

-- CSV file has 8 columns: $1 through $8
-- Load only some columns, cast types, and add a derived column
COPY INTO ORDERS_RAW (order_id, customer_id, order_date, amount, loaded_at)
FROM (
  SELECT
    $1::INT                            AS order_id,
    $2::INT                            AS customer_id,
    TO_DATE($3, 'MM/DD/YYYY')          AS order_date,
    ($4::DECIMAL(10,2) * $5::INT)      AS amount,    -- unit_price * quantity
    CURRENT_TIMESTAMP()                AS loaded_at
  FROM @orders_stage
)
FILE_FORMAT = (FORMAT_NAME = 'csv_standard');

Preventing Duplicate Loads

By default, Snowflake tracks which files it has already loaded from a stage into a specific table. If you run the same COPY INTO command twice, Snowflake skips files it already processed within the last 64 days. This prevents duplicate rows from re-loading the same file accidentally.

-- Force reload of already-loaded files (bypasses duplicate detection)
COPY INTO ORDERS_RAW
FROM @orders_stage
FILE_FORMAT = (FORMAT_NAME = 'csv_standard')
FORCE = TRUE;     -- Use with caution — creates duplicate rows

-- Check which files were already loaded from a stage into a table
SELECT *
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
  TABLE_NAME => 'ORDERS_RAW',
  START_TIME => DATEADD('day', -7, CURRENT_TIMESTAMP())
))
ORDER BY last_load_time DESC;

Key Points

  • Stages are temporary holding areas for data files — internal stages live inside Snowflake, external stages point to S3, Azure, or GCS
  • The PUT command uploads files from your local machine to an internal stage via SnowSQL
  • Create named FILE FORMAT objects to define CSV, JSON, or Parquet settings once and reuse them across multiple COPY INTO commands
  • COPY INTO loads data in parallel across warehouse nodes — bigger warehouses load more files simultaneously and faster
  • ON_ERROR controls whether a bad row stops the entire load or gets skipped quietly
  • Snowflake tracks loaded files for 64 days and skips already-loaded files by default, preventing accidental duplicates
  • Inline SELECT inside COPY INTO lets you transform, cast, and filter data during loading without a separate ETL step

Leave a Comment

Your email address will not be published. Required fields are marked *