Data Loading in Snowflake

📥 Data Loading in Snowflake – A Beginner's Guide

Snowflake makes it easy to load data from a variety of sources, whether it's local files, cloud storage, or streaming pipelines. Here's a breakdown of how data loading works in Snowflake, from concepts to practical steps.

🧩 Key Concepts

Stage: A temporary or permanent storage location for files before loading; stages are either internal (inside Snowflake) or external (cloud storage)
File Format: A definition of how the data is structured (CSV, JSON, Parquet, etc.)
COPY Command: The SQL command used to load data from a stage into a Snowflake table
Warehouse: The compute engine required to perform data loading
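
Since every load runs on a virtual warehouse, it helps to have a small one dedicated to loading. A minimal sketch (load_wh is an illustrative name, not part of any standard setup):

CREATE WAREHOUSE IF NOT EXISTS load_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60;   -- suspend after 60 seconds idle to save credits

USE WAREHOUSE load_wh;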

✅ Steps to Load Data into Snowflake
📝 1. Prepare Your Data
Format: CSV, JSON, Avro, Parquet, ORC, or XML

Compressed files such as gzip (.gz) or bzip2 (.bz2) are supported, and the compression is detected automatically by default

🚚 2. Choose a Staging Option

Internal Stage: Storage inside Snowflake (a user stage @~, a table stage @%table, or a named stage)
External Stage: Cloud storage outside Snowflake (AWS S3, Azure Blob Storage, or Google Cloud Storage)

Internal Stage Examples:
@my_stage: a named internal stage

@%my_table: a table stage, automatically tied to the table

@~: your user stage, automatically tied to your user
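
At any point you can inspect a stage's contents with the LIST command (handy after the upload in step 4):

LIST @my_stage;      -- files in the named stage
LIST @%my_table;     -- files in the table stage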

🛠️ 3. Create Required Objects
a. Create a Target Table
CREATE TABLE employees (
  id INT,
  name STRING,
  department STRING
);
b. Create a File Format (Optional but recommended)
CREATE FILE FORMAT my_csv_format
  TYPE = 'CSV'
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  SKIP_HEADER = 1;
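
The same pattern works for other formats; for example, a JSON file format might look like this (a sketch; my_json_format is an illustrative name):

CREATE FILE FORMAT my_json_format
  TYPE = 'JSON'
  STRIP_OUTER_ARRAY = TRUE;   -- treat a top-level JSON array as one row per element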
c. Create a Named Stage (optional)
CREATE STAGE my_stage
  FILE_FORMAT = my_csv_format;
⬆️ 4. Upload Data to Stage
a. Internal Stage (using the SnowSQL CLI or the Snowsight web UI)

Connect with SnowSQL (or pass a script file with -f load.sql):

snowsql -a <account> -u <user>

Then run PUT from the SnowSQL session; PUT gzip-compresses files automatically unless you set AUTO_COMPRESS = FALSE:

PUT file:///path/to/data.csv @my_stage;
b. External Stage Example (AWS S3):
CREATE STAGE my_s3_stage
  URL = 's3://mybucket/data/'
  CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...');
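
For anything beyond a quick test, embedding keys in the stage definition is best avoided; Snowflake supports storage integrations that delegate access to a cloud IAM role instead. A minimal sketch (my_s3_int and the role ARN are illustrative placeholders):

CREATE STORAGE INTEGRATION my_s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my_snowflake_role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://mybucket/data/');

CREATE STAGE my_s3_stage
  URL = 's3://mybucket/data/'
  STORAGE_INTEGRATION = my_s3_int;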
📦 5. Load Data Using COPY INTO
COPY INTO employees
FROM @my_stage/data.csv
FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');
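
In practice you often load many files at once and must decide how to handle bad rows. A sketch using standard COPY options (the pattern below matches gzipped CSVs; ON_ERROR = 'CONTINUE' skips bad rows instead of aborting the load):

COPY INTO employees
FROM @my_stage
PATTERN = '.*\.csv\.gz'
FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
ON_ERROR = 'CONTINUE';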
πŸ” Monitoring and Error Handling
Use VALIDATION_MODE to test data before loading:

COPY INTO employees
FROM @my_stage/data.csv
FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
VALIDATION_MODE = 'RETURN_ALL_ERRORS';
Query the INFORMATION_SCHEMA.LOAD_HISTORY view to track completed loads:

SELECT * FROM INFORMATION_SCHEMA.LOAD_HISTORY WHERE TABLE_NAME = 'EMPLOYEES';
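
For per-file results over a time window, the COPY_HISTORY table function is an alternative (the 24-hour window below is an arbitrary choice):

SELECT *
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
  TABLE_NAME => 'EMPLOYEES',
  START_TIME => DATEADD(hour, -24, CURRENT_TIMESTAMP())
));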
🔄 Incremental & Continuous Data Loads
Use streams and tasks for incremental/automated loading (see the sketch after this list)

Integrate with tools like:

Snowpipe (real-time or auto-loading from cloud storage)

ETL tools: Fivetran, Talend, Matillion

Custom pipelines with Python or Spark
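
As a sketch of the first two approaches: a Snowpipe that auto-ingests new files from the S3 stage above, and a stream-plus-task pair that applies new rows on a schedule. All object names here (my_pipe, employees_raw, employees_stream, load_employees, my_wh) are illustrative:

CREATE PIPE my_pipe AUTO_INGEST = TRUE AS
  COPY INTO employees
  FROM @my_s3_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

-- Capture changes on a landing table, then fold them in every 5 minutes
CREATE STREAM employees_stream ON TABLE employees_raw;

CREATE TASK load_employees
  WAREHOUSE = my_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('EMPLOYEES_STREAM')
AS
  INSERT INTO employees
  SELECT id, name, department FROM employees_stream;

ALTER TASK load_employees RESUME;   -- tasks are created suspended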

📊 Common Use Cases

One-time batch load: PUT + COPY INTO
Real-time streaming: Snowpipe or Kafka connectors
Data ingestion from S3/Azure/GCS: External Stage + COPY INTO
Scheduled ELT: Snowflake Tasks + Streams

🚀 Summary

1. Prepare: Structure and clean your data
2. Stage: Upload files to an internal or external stage
3. Copy: Run COPY INTO to load the data into tables
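
Put together, a minimal end-to-end load from a SnowSQL session might look like this (assuming the table, stage, and file format created in the steps above, plus the hypothetical load_wh warehouse):

USE WAREHOUSE load_wh;

PUT file:///path/to/data.csv @my_stage;

COPY INTO employees
FROM @my_stage
FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
PURGE = TRUE;   -- optionally delete staged files after a successful load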

