Load CSV file into Snowflake Database table

  • Post author: NNK
  • Post category: Snowflake
  • Post last modified: October 10, 2020

In this Snowflake article, you will learn how to upload a CSV data file from the local file system (Windows, Linux, or macOS) to a Snowflake internal stage using the PUT SQL command, and then load the CSV file from the internal stage into a Snowflake database table using the COPY INTO SQL command.

Related: Unload Snowflake table to CSV file

Loading a CSV data file into a Snowflake database table is a two-step process.

  • First, use the PUT command to upload the data file to a Snowflake internal stage.
  • Second, use the COPY INTO command to load the file from the internal stage into the Snowflake table.

PUT – Upload the file to Snowflake internal stage

Using the PUT command, you can upload the CSV file to a Snowflake internal stage. Snowflake supports the following internal stages:

  • Named stage
  • Table stage
  • User stage

Note that PUT cannot upload files to an external stage; use your cloud provider’s own tools to place files there.

@~ – Is used to upload to the Snowflake user stage of the current user

@%<tablename> – Is used to upload to the Snowflake table stage of the specified table

@<name> – Is used to upload to a named stage
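For completeness, here is a sketch of uploading to a named stage; the stage name my_csv_stage is illustrative, and the stage must be created before the upload.


CREATE STAGE my_csv_stage;
PUT file:///apps/sparkbyexamples/emp.csv @my_csv_stage;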

The below example uploads the emp.csv file to the table stage of the EMP table.


PUT file:///apps/sparkbyexamples/emp.csv @%EMP;
emp.csv_c.gz(0.00MB): [##########] 100.00% Done (0.224s, 0.00MB/s).
+---------+------------+-------------+-------------+--------------------+--------------------+----------+---------+
| source  | target     | source_size | target_size | source_compression | target_compression | status   | message |
|---------+------------+-------------+-------------+--------------------+--------------------+----------+---------|
| emp.csv | emp.csv.gz |         129 |         123 | NONE               | GZIP               | UPLOADED |         |
+---------+------------+-------------+-------------+--------------------+--------------------+----------+---------+
1 Row(s) produced. Time Elapsed: 1.483s

By default, the PUT command compresses the file using GZIP before uploading. You can disable this with the AUTO_COMPRESS option, and if your file is already compressed, you can declare its compression with the SOURCE_COMPRESSION option.
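For example, if the local file is already GZIP-compressed, you can tell PUT not to re-compress it and declare its existing compression; the file path here is illustrative.


PUT file:///apps/sparkbyexamples/emp.csv.gz @%EMP
  AUTO_COMPRESS = FALSE
  SOURCE_COMPRESSION = GZIP;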

Let’s see an example with the user internal stage.


put file:///apps/sparkbyexamples/emp.csv @~;

Another example with the user internal stage along with a path.


put file:///apps/sparkbyexamples/emp.csv @~/tmp;

LIST – To list the files from internal stages

To verify if the file uploaded successfully, use the LIST command with the stage as a parameter.


LIST @%EMP;
+------------+------+----------------------------------+------------------------------+
| name       | size | md5                              | last_modified                |
|------------+------+----------------------------------+------------------------------|
| emp.csv.gz |  128 | 5f973d95b017ec3eb416389e43fc778c | Thu, 5 Mar 2020 03:40:42 GMT |
+------------+------+----------------------------------+------------------------------+
1 Row(s) produced. Time Elapsed: 0.522s

COPY INTO – Load the CSV file to Snowflake table

The COPY INTO SQL command is used to load the file from the internal stage into the table. Before looking into COPY INTO, let’s first create a Snowflake table.


CREATE TABLE EMP 
      (FNAME VARCHAR, LNAME VARCHAR,
       SALARY VARCHAR, DEPARTMENT varchar,
       GENDER varchar);
+---------------------------------+
| status                          |
|---------------------------------|
| Table EMP successfully created. |
+---------------------------------+

Now use the COPY INTO <tablename> command to load the compressed CSV file into the Snowflake database table.


COPY INTO EMP from '@%EMP/emp.csv.gz';
+------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file       | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| emp.csv.gz | LOADED |           5 |           5 |           1 |           0 | NULL        |             NULL |                  NULL | NULL                    |
+------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
1 Row(s) produced. Time Elapsed: 2.096s

Finally, let’s run a SELECT and see if the data loaded successfully.


SELECT * FROM EMP;
+---------+----------+----------+------------+--------+
| FNAME   | LNAME    |   SALARY | DEPARTMENT | GENDER |
|---------+----------+----------+------------+--------|
| James   | Smith    |  10000   |         10 | M      |
| Michael | Jones    |  20000   |         10 | M      |
| Robert  | Williams |   3000.4 |         10 | M      |
| Maria   | Jones    |  23476.5 |         20 | F      |
| Jen     | Mary     | 123445   |         20 | F      |
+---------+----------+----------+------------+--------+
5 Row(s) produced. Time Elapsed: 0.646s

Options with COPY INTO

COMPRESSION – Load file with specified compression

Use COMPRESSION to specify the compression of the file you want to load. By default, Snowflake auto-detects the compression (AUTO), but you can set it explicitly to one of the following: AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE.


COPY INTO EMP from '@%EMP/data_0_0_0.csv.bz2'
  file_format = (type=CSV COMPRESSION = BZ2);
+--------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file               | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|--------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| data_0_0_0.csv.bz2 | LOADED |           5 |           5 |           1 |           0 | NULL        |             NULL |                  NULL | NULL                    |
+--------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
1 Row(s) produced. Time Elapsed: 1.902s

RECORD_DELIMITER – Changes delimiter for record

If your file uses a record separator other than ‘\n’, use RECORD_DELIMITER. For example, the below query uses ‘\n\r’.


COPY INTO EMP from '@%EMP/emp.csv.gz'  
  file_format = (type=CSV RECORD_DELIMITER='\n\r');
  
1 Row(s) produced. Time Elapsed: 1.300s

FIELD_DELIMITER – Specify field delimiter of loading file

Use FIELD_DELIMITER to change the default field delimiter. By default, it uses the ‘,’ character, but you can change it to any character. For example, let’s see how to use a pipe delimiter.


COPY INTO EMP from '@%EMP/emp.csv.gz'
  file_format = (type=CSV FIELD_DELIMITER='|');
  
1 Row(s) produced. Time Elapsed: 1.300s

SKIP_HEADER – Skip header rows in the file

Use the SKIP_HEADER option to skip header rows at the start of the data file. By default it is set to 0 (no rows skipped); set it to 1 to ignore a single header row containing column names.
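For example, to ignore a single header row when loading the emp.csv.gz file from the table stage:


COPY INTO EMP from '@%EMP/emp.csv.gz'
  file_format = (type=CSV SKIP_HEADER=1);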

DATE_FORMAT – Specify input date format

By default, date columns are expected in ‘YYYY-MM-DD’ format. If your file uses a different format, specify the input date format with the DATE_FORMAT option.


COPY INTO EMP from '@%EMP/emp.csv.gz'
  file_format = (type=CSV DATE_FORMAT='MM-DD-YYYY');
  
1 Row(s) produced. Time Elapsed: 1.300s

TIME_FORMAT – Specify Input time format

Time columns are expected in ‘HH24:MI:SS’ format. If your file uses a different format, use the TIME_FORMAT option to specify the input format.


COPY INTO EMP from '@%EMP/emp.csv.gz'
  file_format = (type=CSV TIME_FORMAT='HH24:MI:SS') 
  
1 Row(s) produced. Time Elapsed: 1.300s

TIMESTAMP_FORMAT – Specify input timestamp format

Timestamp columns are expected in ‘YYYY-MM-DD HH24:MI:SS.FF3 TZHTZM’ format. If your file uses a different format, use the TIMESTAMP_FORMAT option to specify the input format.


COPY INTO EMP from '@%EMP/emp.csv.gz'
  file_format = (type=CSV TIMESTAMP_FORMAT='MM-DD-YYYY HH24:MI:SS.FF3 TZHTZM') 
  
1 Row(s) produced. Time Elapsed: 1.300s

Conclusion

To load a CSV file into a Snowflake table, you first upload the data file to a Snowflake internal stage with the PUT command, and then load it from the internal stage into the table with COPY INTO. You can also control the compression, delimiters, and date/time formats using the COPY INTO file format options.


Happy Learning !!

NNK
