# Data Loading in Snowflake – A Beginner's Guide
Snowflake makes it easy to load data from a variety of sources, whether it's local files, cloud storage, or streaming pipelines. Here's a breakdown of how data loading works in Snowflake, from concepts to practical steps.
## 🧩 Key Concepts

| Term | Description |
| --- | --- |
| Stage | Temporary or permanent storage location for files before loading (internal to Snowflake, or external cloud storage) |
| File Format | Definition of how data files are structured (CSV, JSON, Parquet, etc.) |
| COPY Command | SQL command used to load data from a stage into a Snowflake table |
| Warehouse | Compute engine required to perform data loading |
## ✅ Steps to Load Data into Snowflake

### 1. Prepare Your Data

- Supported formats: CSV, JSON, Avro, Parquet, ORC, and XML
- Compressed files (e.g., `.gz` or `.bz2`) are supported
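For illustration, a small CSV matching the `employees` table created later in this guide might look like this (the rows are made-up sample data; the header line is what `SKIP_HEADER = 1` skips):

```csv
id,name,department
1,Alice,Engineering
2,Bob,Marketing
3,Carol,Finance
```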
### 2. Choose a Staging Option

| Type | Description |
| --- | --- |
| Internal Stage | Storage inside Snowflake (user, table, or named stage) |
| External Stage | Cloud storage outside Snowflake (AWS S3, Azure Blob Storage, GCS) |

Internal stage examples:

- `@my_stage`: named internal stage
- `@%my_table`: table stage (automatically tied to the table)
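Once files have been uploaded, you can verify what a stage contains with `LIST`, shown here against the two example stages above:

```sql
-- List files in the named internal stage
LIST @my_stage;

-- List files in the table stage for my_table
LIST @%my_table;
```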
### 3. Create Required Objects

a. Create a target table:

```sql
CREATE TABLE employees (
  id INT,
  name STRING,
  department STRING
);
```
b. Create a file format (optional but recommended):

```sql
CREATE FILE FORMAT my_csv_format
  TYPE = 'CSV'
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  SKIP_HEADER = 1;
```
c. Create a named stage (optional):

```sql
CREATE STAGE my_stage
  FILE_FORMAT = my_csv_format;
```
### ⬆️ 4. Upload Data to the Stage

a. Internal stage (using the SnowSQL CLI):

```bash
# Connect with SnowSQL, then run PUT from the SnowSQL session
snowsql -a <account> -u <user>
```

```sql
PUT file:///path/to/data.csv @my_stage;
```
b. External stage example (AWS S3):

```sql
CREATE STAGE my_s3_stage
  URL = 's3://mybucket/data/'
  CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...');
```
### 5. Load Data Using COPY INTO

```sql
COPY INTO employees
  FROM @my_stage/data.csv
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');
```
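COPY INTO also accepts options that are useful for bulk loads; as a sketch (the option values here are illustrative, not required):

```sql
COPY INTO employees
  FROM @my_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
  PATTERN = '.*employees.*[.]csv'  -- load only files matching this regex
  ON_ERROR = 'CONTINUE'            -- skip bad rows instead of aborting the load
  PURGE = TRUE;                    -- delete files from the stage after a successful load
```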
## Monitoring and Error Handling

Use VALIDATION_MODE to check files for errors without actually loading them:

```sql
COPY INTO employees
  FROM @my_stage/data.csv
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
  VALIDATION_MODE = 'RETURN_ALL_ERRORS';
```

Query the LOAD_HISTORY view to track past loads:

```sql
SELECT *
FROM INFORMATION_SCHEMA.LOAD_HISTORY
WHERE TABLE_NAME = 'EMPLOYEES';
```
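For per-file results on recent loads, the COPY_HISTORY table function is another option (shown here for the last 24 hours):

```sql
SELECT file_name, status, row_count, first_error_message
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
  TABLE_NAME => 'EMPLOYEES',
  START_TIME => DATEADD(hour, -24, CURRENT_TIMESTAMP())
));
```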
## Incremental & Continuous Data Loads

- Use streams and tasks for incremental, automated loading
- Integrate with tools like:
  - Snowpipe (continuous, automatic loading of new files from cloud storage)
  - ETL tools: Fivetran, Talend, Matillion
  - Custom pipelines built with Python or Spark
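As a sketch of the Snowpipe and streams/tasks approaches (the pipe, stream, task, warehouse, schedule, and the `employees_clean` target table are all assumed names for illustration):

```sql
-- Snowpipe: automatically load new files arriving in the external stage
CREATE PIPE employees_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO employees
  FROM @my_s3_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

-- Streams + tasks: process newly loaded rows on a schedule
CREATE STREAM employees_stream ON TABLE employees;

CREATE TASK process_new_employees
  WAREHOUSE = my_wh
  SCHEDULE = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('EMPLOYEES_STREAM')
AS
  INSERT INTO employees_clean
  SELECT id, name, department FROM employees_stream;

-- Tasks are created suspended; start the task with:
ALTER TASK process_new_employees RESUME;
```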
## Common Use Cases

| Use Case | Tools Involved |
| --- | --- |
| One-time batch load | PUT + COPY INTO |
| Near-real-time streaming | Snowpipe or Kafka connectors |
| Ingestion from S3/Azure/GCS | External Stage + COPY INTO |
| Scheduled ELT | Snowflake Tasks + Streams |
## Summary

| Step | What You Do |
| --- | --- |
| 1. Prepare | Structure and clean your data |
| 2. Stage | Upload files to an internal or external stage |
| 3. Copy | Use COPY INTO to load data into tables |
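Putting the pieces together, a minimal end-to-end load using the objects defined earlier in this guide looks like:

```sql
CREATE TABLE employees (id INT, name STRING, department STRING);

CREATE FILE FORMAT my_csv_format
  TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1;

CREATE STAGE my_stage FILE_FORMAT = my_csv_format;

-- Run from a SnowSQL session on the machine holding the file:
PUT file:///path/to/data.csv @my_stage;

COPY INTO employees
  FROM @my_stage/data.csv
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');
```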