Loading Data Into Snowflake Using COPY INTO
Loading data into Snowflake is the first step in any real project. Before you can query, transform, or share data, you must get it into a Snowflake table. The primary mechanism for bulk data loading is the COPY INTO command. It reads files from a staging area — either inside Snowflake or in cloud storage — and loads them into a target table efficiently and in parallel.
This topic covers the full data loading workflow: creating a stage, uploading files, verifying staged files, and running COPY INTO with practical examples for CSV, JSON, and Parquet formats.
The Data Loading Workflow: A Post Office Analogy
Loading data into Snowflake works like a post office system. Your raw data files are parcels. A stage is the sorting depot where parcels arrive before being delivered. COPY INTO is the delivery truck that takes parcels from the depot and delivers them to the correct table (the recipient's address).
DATA LOADING PIPELINE
======================
[Source Data] [Stage] [Snowflake Table]
| | |
CSV files ----PUT-----> Internal Stage |
S3 bucket -----------> External Stage --COPY INTO-> ORDERS table
Azure Blob -----------> External Stage --COPY INTO-> CUSTOMERS table
GCS bucket -----------> External Stage --COPY INTO-> PRODUCTS table
Stage Types: Internal vs External
A stage is a pointer to a location where data files live before loading. Snowflake supports two kinds of stages.
STAGE TYPE WHERE FILES LIVE BEST FOR
---------- ---------------- --------
Internal Stage Inside Snowflake's own Small files, quick uploads
managed cloud storage via PUT command
External Stage Your own S3 / Azure Blob Large files, automated
/ GCS bucket pipelines, existing cloud data
Named Stage Either type, with a name Reusable reference to a location
Table Stage Auto-created per table Single-table quick loads
User Stage Auto-created per user Personal workspace uploads
Creating a Named Internal Stage
A named internal stage is a dedicated storage area inside Snowflake. You create it once and reference it by name in all your COPY INTO commands.
USE DATABASE RETAIL_DB;
USE SCHEMA RAW_DATA;
-- Create a simple internal stage
CREATE STAGE my_internal_stage
COMMENT = 'Internal stage for retail data CSV files';
-- Create an internal stage with compression and file format settings
CREATE STAGE orders_stage
FILE_FORMAT = (
TYPE = 'CSV'
FIELD_DELIMITER = ','
SKIP_HEADER = 1
NULL_IF = ('NULL', 'null', '')
EMPTY_FIELD_AS_NULL = TRUE
)
COMMENT = 'Stage for loading order CSV files';
Uploading Files to an Internal Stage with PUT
The PUT command uploads a file from your local machine to an internal stage. PUT runs from SnowSQL (the command-line client) or from the Snowsight file upload UI — not from a worksheet SQL editor directly.
-- From SnowSQL command line: PUT file:///home/user/data/orders_2024.csv @orders_stage AUTO_COMPRESS = TRUE PARALLEL = 4; -- PUT from Windows: PUT file://C:\data\orders_2024.csv @orders_stage AUTO_COMPRESS=TRUE; -- Verify the file arrived in the stage LIST @orders_stage; -- Output: -- name size md5 last_modified -- orders_stage/orders_2024.csv.gz 1245678 a1b2c3d4e5f6... 2024-06-15 10:30:00.000
Creating a Named External Stage (Amazon S3)
An external stage points to a cloud bucket you already own. Snowflake reads files from there during COPY INTO — the files never move into Snowflake's own storage.
-- Create an external stage pointing to an S3 bucket
-- Using storage integration (recommended, no hardcoded keys)
CREATE STORAGE INTEGRATION s3_retail_integration
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = 'S3'
ENABLED = TRUE
STORAGE_ALLOWED_LOCATIONS = ('s3://my-retail-bucket/data/');
-- Create the external stage using the integration
CREATE STAGE s3_orders_stage
URL = 's3://my-retail-bucket/data/orders/'
STORAGE_INTEGRATION = s3_retail_integration
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
COMMENT = 'External stage pointing to S3 orders folder';
-- List files visible in the external stage
LIST @s3_orders_stage;
File Format Objects: Reusable Format Definitions
Instead of repeating file format options in every COPY INTO command, define a named FILE FORMAT object once and reference it by name.
-- Create a reusable CSV file format
CREATE FILE FORMAT csv_standard
TYPE = 'CSV'
FIELD_DELIMITER = ','
RECORD_DELIMITER = '\n'
SKIP_HEADER = 1
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
NULL_IF = ('NULL', 'null', 'N/A', '')
EMPTY_FIELD_AS_NULL = TRUE
DATE_FORMAT = 'YYYY-MM-DD'
TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS';
-- Create a JSON file format
CREATE FILE FORMAT json_standard
TYPE = 'JSON'
STRIP_OUTER_ARRAY = TRUE
NULL_IF = ('NULL', 'null');
-- Create a Parquet file format
CREATE FILE FORMAT parquet_standard
TYPE = 'PARQUET'
SNAPPY_COMPRESSION = TRUE;
Loading CSV Data with COPY INTO
The COPY INTO command reads staged files and inserts rows into a target table. It processes multiple files in parallel across the warehouse's nodes for fast throughput.
-- Target table for orders data CREATE TABLE ORDERS_RAW ( order_id INT, customer_id INT, order_date DATE, product_code VARCHAR(50), quantity INT, unit_price DECIMAL(10,2), status VARCHAR(30), country VARCHAR(50) ); -- Load all CSV files from the stage into the table COPY INTO ORDERS_RAW FROM @orders_stage FILE_FORMAT = (FORMAT_NAME = 'csv_standard') ON_ERROR = 'CONTINUE'; -- Skip bad rows, continue loading good ones -- Load only specific files by name pattern COPY INTO ORDERS_RAW FROM @s3_orders_stage PATTERN = '.*orders_2024.*\.csv' FILE_FORMAT = (FORMAT_NAME = 'csv_standard') ON_ERROR = 'ABORT_STATEMENT'; -- Stop entirely if any error occurs
ON_ERROR Options Explained
ON_ERROR VALUE BEHAVIOUR -------------- --------- ABORT_STATEMENT Stop loading entire batch on first error (default) CONTINUE Skip bad rows, load everything else, report errors SKIP_FILE Skip entire files with errors, load clean files only SKIP_FILE_n Skip files with more than n errors SKIP_FILE_n% Skip files where more than n% of rows have errors
Checking COPY INTO Results
COPY INTO returns a result set showing what happened to each file it processed. Always inspect this output after loading.
COPY INTO ORDERS_RAW FROM @orders_stage FILE_FORMAT=(FORMAT_NAME='csv_standard');
-- Result output:
-- file status rows_loaded errors_seen first_error
-- orders_stage/orders_jan.csv LOADED 45200 0 NULL
-- orders_stage/orders_feb.csv LOADED 48750 0 NULL
-- orders_stage/orders_mar.csv PARTIALLY_LOADED 48100 3 Value 'N/A' not recognised for column UNIT_PRICE
-- Check load history for past loads
SELECT *
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
TABLE_NAME => 'ORDERS_RAW',
START_TIME => DATEADD('hour', -24, CURRENT_TIMESTAMP())
));
Loading JSON Data
Snowflake loads semi-structured JSON data directly into a VARIANT column. You do not need to define individual columns upfront — the entire JSON object lands in one column, and you query it with dot notation later.
-- Sample JSON file content (one object per line after STRIP_OUTER_ARRAY):
-- {"event_id": 1001, "user": "alice", "action": "purchase", "amount": 49.99, "tags": ["sale","member"]}
-- {"event_id": 1002, "user": "bob", "action": "view", "page": "/products/shoes"}
-- Target table with a VARIANT column
CREATE TABLE EVENTS_RAW (
raw_event VARIANT,
loaded_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);
-- Load JSON from stage
COPY INTO EVENTS_RAW (raw_event)
FROM (
SELECT $1
FROM @s3_orders_stage/events/
)
FILE_FORMAT = (FORMAT_NAME = 'json_standard');
-- Query individual JSON fields using colon notation
SELECT
raw_event:event_id::INT AS event_id,
raw_event:user::VARCHAR AS username,
raw_event:action::VARCHAR AS action_type,
raw_event:amount::DECIMAL(10,2) AS amount,
raw_event:tags[0]::VARCHAR AS first_tag
FROM EVENTS_RAW
LIMIT 10;
Loading Parquet Data
Parquet files are columnar binary files commonly produced by Spark, dbt, and data lakes. Snowflake reads Parquet files natively, matching column names automatically to the target table.
CREATE TABLE SALES_PARQUET ( sale_id INT, product_name VARCHAR(200), revenue DECIMAL(12,2), sale_date DATE ); COPY INTO SALES_PARQUET FROM @s3_orders_stage/parquet/ FILE_FORMAT = (FORMAT_NAME = 'parquet_standard') MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE; -- Automatically maps Parquet column names to table column names regardless of case
Transforming Data During Load
COPY INTO supports inline SQL transformations so you can clean, filter, or reshape data as it loads — without a separate transformation step.
-- CSV file has 8 columns: $1 through $8
-- Load only some columns, cast types, and add a derived column
COPY INTO ORDERS_RAW (order_id, customer_id, order_date, amount, loaded_at)
FROM (
SELECT
$1::INT AS order_id,
$2::INT AS customer_id,
TO_DATE($3, 'MM/DD/YYYY') AS order_date,
($4::DECIMAL(10,2) * $5::INT) AS amount, -- unit_price * quantity
CURRENT_TIMESTAMP() AS loaded_at
FROM @orders_stage
)
FILE_FORMAT = (FORMAT_NAME = 'csv_standard');
Preventing Duplicate Loads
By default, Snowflake tracks which files it has already loaded from a stage into a specific table. If you run the same COPY INTO command twice, Snowflake skips files it already processed within the last 64 days. This prevents duplicate rows from re-loading the same file accidentally.
-- Force reload of already-loaded files (bypasses duplicate detection)
COPY INTO ORDERS_RAW
FROM @orders_stage
FILE_FORMAT = (FORMAT_NAME = 'csv_standard')
FORCE = TRUE; -- Use with caution — creates duplicate rows
-- Check which files were already loaded from a stage into a table
SELECT *
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
TABLE_NAME => 'ORDERS_RAW',
START_TIME => DATEADD('day', -7, CURRENT_TIMESTAMP())
))
ORDER BY last_load_time DESC;
Key Points
- Stages are temporary holding areas for data files — internal stages live inside Snowflake, external stages point to S3, Azure, or GCS
- The PUT command uploads files from your local machine to an internal stage via SnowSQL
- Create named FILE FORMAT objects to define CSV, JSON, or Parquet settings once and reuse them across multiple COPY INTO commands
- COPY INTO loads data in parallel across warehouse nodes — bigger warehouses load more files simultaneously and faster
- ON_ERROR controls whether a bad row stops the entire load or gets skipped quietly
- Snowflake tracks loaded files for 64 days and skips already-loaded files by default, preventing accidental duplicates
- Inline SELECT inside COPY INTO lets you transform, cast, and filter data during loading without a separate ETL step
