In this Snowflake article, you will learn how to load a CSV/Parquet/Avro data file from an Amazon S3 bucket external stage into a Snowflake table.
Related: Unload Snowflake table to Amazon S3 bucket.
I assume you already have a CSV/Parquet/Avro file in the Amazon S3 bucket that you want to load into a Snowflake table. This article doesn’t cover how to upload a file to an S3 bucket.
COPY INTO – Loads the data file from Amazon S3 to a Snowflake table
Using the COPY INTO <table name> SQL command, we can load data files from the internal table stage, the internal user stage, internal named stages, an external named stage, or an external location.
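To give a quick sense of the syntax, the sketch below shows how the FROM clause of COPY INTO refers to each of these locations (the stage name and bucket path are placeholders, not objects created in this article):

-- Table stage of EMP (internal)
COPY INTO EMP from @%EMP;
-- Current user's stage (internal)
COPY INTO EMP from @~/load/;
-- Named stage, internal or external (placeholder name)
COPY INTO EMP from @my_named_stage/load/;
-- External location referenced directly by URL
COPY INTO EMP from 's3://mybucket/load/';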
In this article, you will learn how to load from an Amazon S3 bucket external location into a Snowflake table. Snowflake also supports loading from Azure, GCP, and other clouds, which we will cover in other articles.
Once you upload the file to the Amazon S3 bucket, use the COPY INTO <tablename> command, either from SnowSQL or the Snowflake web console, to load the compressed CSV file into the Snowflake database table.
COPY INTO EMP from 's3://mybucket/load/emp.csv'
credentials = (aws_key_id='xxxx' aws_secret_key='xxxxx' aws_token='xxxxxx');
+------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| emp.csv | LOADED | 5 | 5 | 1 | 0 | NULL | NULL | NULL | NULL |
+------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
1 Row(s) produced. Time Elapsed: 2.096s
By default, it also supports loading from a GZIP-compressed file. Now, let’s run a SELECT and see if the data loaded successfully.
select * from EMP;
+---------+----------+----------+------------+--------+
| FNAME | LNAME | SALARY | DEPARTMENT | GENDER |
|---------+----------+----------+------------+--------|
| James | Smith | 10000 | 10 | M |
| Michael | Jones | 20000 | 10 | M |
| Robert | Williams | 3000.4 | 10 | M |
| Maria | Jones | 23476.5 | 20 | F |
| Jen | Mary | 123445 | 20 | F |
+---------+----------+----------+------------+--------+
5 Row(s) produced. Time Elapsed: 0.646s
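Passing the AWS keys inline works for a quick test, but you can also wrap the bucket and credentials in a named external stage and point COPY INTO at the stage instead. A minimal sketch, assuming a placeholder stage name my_s3_stage and the same placeholder bucket and keys:

-- One-time setup: a named external stage holding the S3 location and credentials
CREATE OR REPLACE STAGE my_s3_stage
  URL = 's3://mybucket/load/'
  CREDENTIALS = (AWS_KEY_ID='xxxx' AWS_SECRET_KEY='xxxxx')
  FILE_FORMAT = (TYPE = CSV);

-- Load without repeating the credentials in every COPY statement
COPY INTO EMP from @my_s3_stage/emp.csv;

This keeps the keys out of your load scripts; in production you would typically go a step further and use a storage integration instead of stored credentials.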
Options with COPY INTO
COMPRESSION – Load a file with the specified compression
Use COMPRESSION to specify the compression of the file you want to load. By default it is set to AUTO, which detects common compression formats such as GZIP automatically; to override the detection, or for formats that cannot be auto-detected, set it to one of AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE.
COPY INTO EMP from 's3://mybucket/load/emp.csv.bz2'
file_format = (type=CSV COMPRESSION = BZ2)
credentials = (aws_key_id='xxxx' aws_secret_key='xxxxx' aws_token='xxxxxx');
+--------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|--------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| emp.csv.bz2 | LOADED | 5 | 5 | 1 | 0 | NULL | NULL | NULL | NULL |
+--------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
1 Row(s) produced. Time Elapsed: 1.902s
RECORD_DELIMITER – Changes the record delimiter
If your file uses a record separator other than ‘\n’, specify it with RECORD_DELIMITER. For example, the query below uses Windows-style ‘\r\n’ line endings.
COPY INTO EMP from 's3://mybucket/load/emp.csv.gz'
file_format = (type=CSV RECORD_DELIMITER='\r\n')
credentials = (aws_key_id='xxxx' aws_secret_key='xxxxx' aws_token='xxxxxx');
1 Row(s) produced. Time Elapsed: 1.300s
FIELD_DELIMITER – Specify the field delimiter of the loading file
Use FIELD_DELIMITER to change the default field delimiter. By default it is the ‘,’ character, but you can change it to any character; for example, let’s see how to use a pipe delimiter.
COPY INTO EMP from 's3://mybucket/load/emp.csv.gz'
file_format = (type=CSV FIELD_DELIMITER='|')
credentials = (aws_key_id='xxxx' aws_secret_key='xxxxx' aws_token='xxxxxx');
1 Row(s) produced. Time Elapsed: 1.300s
SKIP_HEADER – Ignore column names in the header
Use the SKIP_HEADER option to skip header lines in the file you are loading. By default it is set to 0 (no lines are skipped); set it to 1 if the first line of your file contains the column names rather than data.
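A minimal sketch, assuming the emp.csv.gz file from the earlier examples starts with a single header line containing the column names:

COPY INTO EMP from 's3://mybucket/load/emp.csv.gz'
file_format = (type=CSV SKIP_HEADER=1)
credentials = (aws_key_id='xxxx' aws_secret_key='xxxxx' aws_token='xxxxxx');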
DATE_FORMAT – Specify the input date format
By default, date columns are expected in ‘YYYY-MM-DD’ format. If your loading file uses a different format, specify it with the DATE_FORMAT option.
COPY INTO EMP from 's3://mybucket/load/emp.csv.gz'
file_format = (type=CSV DATE_FORMAT='MM-DD-YYYY')
credentials = (aws_key_id='xxxx' aws_secret_key='xxxxx' aws_token='xxxxxx');
1 Row(s) produced. Time Elapsed: 1.300s
TIME_FORMAT – Specify the input time format
By default, time columns are expected in ‘HH24:MI:SS’ format. If your loading file uses a different format, specify it with the TIME_FORMAT option.
COPY INTO EMP from 's3://mybucket/load/emp.csv.gz'
file_format = (type=CSV TIME_FORMAT='HH24:MI:SS')
credentials = (aws_key_id='xxxx' aws_secret_key='xxxxx' aws_token='xxxxxx');
1 Row(s) produced. Time Elapsed: 1.300s
TIMESTAMP_FORMAT – Specify the input timestamp format
By default, timestamp columns are expected in ‘YYYY-MM-DD HH24:MI:SS.FF3 TZHTZM’ format. If your loading file uses a different format, specify it with the TIMESTAMP_FORMAT option.
COPY INTO EMP from 's3://mybucket/load/emp.csv.gz'
file_format = (type=CSV TIMESTAMP_FORMAT='MM-DD-YYYY HH24:MI:SS.FF3 TZHTZM')
credentials = (aws_key_id='xxxx' aws_secret_key='xxxxx' aws_token='xxxxxx');
1 Row(s) produced. Time Elapsed: 1.300s
Load Parquet file from Amazon S3
First, upload the Parquet file to the location you want to load from: to Amazon S3 using the AWS utilities, or, as in the example below, to the EMP table’s internal stage. Then use the COPY INTO <tablename> command to load the Parquet file into the Snowflake database table.
Before loading, create the table EMP with a single column of type VARIANT; the raw Parquet data is loaded into this one column.
create or replace table EMP(PARQUET_RAW VARIANT)
Run the below command to load the file into the table.
COPY INTO EMP from (select $1 from @%EMP/data1_0_0_0.snappy.parquet)
file_format = (type=PARQUET COMPRESSION=SNAPPY);
+----------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|----------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| emp.snappy.parquet | LOADED | 5 | 5 | 1 | 0 | NULL | NULL | NULL | NULL |
+----------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
1 Row(s) produced. Time Elapsed: 1.472s
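If the Parquet file sits in the S3 bucket rather than in the table stage, you can point COPY INTO at the external location directly, just like the CSV examples above. A minimal sketch, assuming a placeholder path s3://mybucket/load/emp.snappy.parquet and the same placeholder credentials:

COPY INTO EMP from 's3://mybucket/load/emp.snappy.parquet'
file_format = (type=PARQUET COMPRESSION=SNAPPY)
credentials = (aws_key_id='xxxx' aws_secret_key='xxxxx' aws_token='xxxxxx');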
Every record from the emp.snappy.parquet file is loaded as JSON into the PARQUET_RAW column of the EMP table:
SELECT * FROM EMP;
+------------------------------------+
| PARQUET_RAW |
|------------------------------------|
| { |
| "_COL_0": "James", |
| "_COL_1": "Smith", |
| "_COL_2": 1.000000000000000e+04, |
| "_COL_3": 10, |
| "_COL_4": "M" |
| } |
| { |
| "_COL_0": "Michael", |
| "_COL_1": "Jones", |
| "_COL_2": 2.000000000000000e+04, |
| "_COL_3": 10, |
| "_COL_4": "M" |
| } |
| { |
| "_COL_0": "Robert", |
| "_COL_1": "Williams", |
| "_COL_2": 3.000400000000000e+03, |
| "_COL_3": 10, |
| "_COL_4": "M" |
| } |
| { |
| "_COL_0": "Maria", |
| "_COL_1": "Jones", |
| "_COL_2": 2.347650000000000e+04, |
| "_COL_3": 20, |
| "_COL_4": "F" |
| } |
| { |
| "_COL_0": "Jen", |
| "_COL_1": "Mary", |
| "_COL_2": 1.234450000000000e+05, |
| "_COL_3": 20, |
| "_COL_4": "F" |
| } |
+------------------------------------+
5 Row(s) produced. Time Elapsed: 0.667s
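To get the data back into relational columns, you can cast the individual keys out of the VARIANT column in a SELECT. A minimal sketch based on the _COL_n keys shown in the output above:

SELECT PARQUET_RAW:"_COL_0"::STRING       AS FNAME,
       PARQUET_RAW:"_COL_1"::STRING       AS LNAME,
       PARQUET_RAW:"_COL_2"::NUMBER(10,2) AS SALARY,
       PARQUET_RAW:"_COL_3"::NUMBER       AS DEPARTMENT,
       PARQUET_RAW:"_COL_4"::STRING       AS GENDER
FROM   EMP;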
Similarly, you can also load JSON/Avro/CSV files from Amazon S3 into a Snowflake table.
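For instance, here is a minimal sketch for a JSON file (the s3://mybucket/load/emp.json path is a placeholder); as with Parquet, the raw JSON lands in a single VARIANT column, and Avro works the same way with type=AVRO:

COPY INTO EMP from 's3://mybucket/load/emp.json'
file_format = (type=JSON)
credentials = (aws_key_id='xxxx' aws_secret_key='xxxxx' aws_token='xxxxxx');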
Conclusion
To load a CSV/Avro/Parquet file from an Amazon S3 bucket into a Snowflake table, you need to use the COPY INTO <tablename> SQL command. You can execute this SQL either from SnowSQL or from the Snowflake web console. You can also change the compression and specify the date and time formats, among many other options, with COPY INTO.
Related Articles
- SnowSQL – Unload Snowflake table to Parquet file
- Download Snowflake table to local WINDOWS | Linux | MAC
- Snowflake – CREATE TABLE LIKE
- Snowflake – CREATE TABLE as SELECT
- Snowflake SQL – SELECT INTO or similar
- How to Load Parquet file into Snowflake table
- Load CSV file into Snowflake Database table
- How to Load JSON file into Snowflake table
References
- https://docs.snowflake.net/manuals/sql-reference/sql/put.html
- https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-table.html
Happy Learning !!