Load CSV file into Snowflake Database table

  • Post author: NNK
  • Post category: Snowflake
  • Post last modified: October 10, 2020

In this Snowflake article, you will learn how to upload a CSV data file from the local file system (Windows, Linux, or macOS) to a Snowflake internal stage using the PUT SQL command, and then load the CSV file from the internal stage into a Snowflake database table using the COPY INTO SQL command.

Related: Unload Snowflake table to CSV file

Loading a CSV data file into a Snowflake database table is a two-step process.

  • First, use the PUT command to upload the data file to a Snowflake internal stage.
  • Second, use the COPY INTO command to load the file from the internal stage into the Snowflake table.

PUT – Upload the file to Snowflake internal stage

Using the PUT command, you can upload the CSV file to a Snowflake internal stage. Snowflake supports the following internal stages:

  • Named stage
  • Table stage
  • User stage

Note that PUT cannot upload files to an external stage; use your cloud provider’s own tools to place files there.

@~ – Is used to upload to the Snowflake user stage of the current user

@%<tablename> – Is used to upload to the Snowflake table stage of the specified table

@<name> – Is used to upload to a named stage
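For completeness, here is a sketch of uploading to a named stage; the stage name my_csv_stage is illustrative, and the stage must be created before the upload.


CREATE STAGE my_csv_stage;
PUT file:///apps/sparkbyexamples/emp.csv @my_csv_stage;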

The below example uploads the emp.csv file to the table stage of the EMP table.


PUT file:///apps/sparkbyexamples/emp.csv @%EMP;
emp.csv_c.gz(0.00MB): [##########] 100.00% Done (0.224s, 0.00MB/s).
+---------+------------+-------------+-------------+--------------------+--------------------+----------+---------+
| source  | target     | source_size | target_size | source_compression | target_compression | status   | message |
|---------+------------+-------------+-------------+--------------------+--------------------+----------+---------|
| emp.csv | emp.csv.gz |         129 |         123 | NONE               | GZIP               | UPLOADED |         |
+---------+------------+-------------+-------------+--------------------+--------------------+----------+---------+
1 Row(s) produced. Time Elapsed: 1.483s

By default, the PUT command compresses the file using GZIP before uploading. You can disable this with the AUTO_COMPRESS option, and if your file is already compressed, you can declare its compression with the SOURCE_COMPRESSION option.
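For example, if the local file is already GZIP-compressed, you can tell PUT not to re-compress it and declare its existing compression; the file path here is illustrative.


PUT file:///apps/sparkbyexamples/emp.csv.gz @%EMP
  AUTO_COMPRESS = FALSE
  SOURCE_COMPRESSION = GZIP;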

Let’s see an example with the user internal stage.


put file:///apps/sparkbyexamples/emp.csv @~;

Another example with the user internal stage along with a path.


put file:///apps/sparkbyexamples/emp.csv @~/tmp;

LIST – To list the files from internal stages

To verify if the file uploaded successfully, use the LIST command with the stage as a parameter.


LIST @%EMP;
+------------+------+----------------------------------+------------------------------+
| name       | size | md5                              | last_modified                |
|------------+------+----------------------------------+------------------------------|
| emp.csv.gz |  128 | 5f973d95b017ec3eb416389e43fc778c | Thu, 5 Mar 2020 03:40:42 GMT |
+------------+------+----------------------------------+------------------------------+
1 Row(s) produced. Time Elapsed: 0.522s

COPY INTO – Load the CSV file to Snowflake table

The COPY INTO SQL command is used to load the file from the internal stage into the table. Before looking into COPY INTO, let’s first create a Snowflake table.


CREATE TABLE EMP 
      (FNAME VARCHAR, LNAME VARCHAR,
       SALARY VARCHAR, DEPARTMENT varchar,
       GENDER varchar);
+---------------------------------+
| status                          |
|---------------------------------|
| Table EMP successfully created. |
+---------------------------------+

Now use the COPY INTO <tablename> command to load the compressed CSV file into the Snowflake database table.


COPY INTO EMP from '@%EMP/emp.csv.gz';
+------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file       | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| emp.csv.gz | LOADED |           5 |           5 |           1 |           0 | NULL        |             NULL |                  NULL | NULL                    |
+------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
1 Row(s) produced. Time Elapsed: 2.096s

Finally, let’s run a SELECT and see if the data loaded successfully.


SELECT * FROM EMP;
+---------+----------+----------+------------+--------+
| FNAME   | LNAME    |   SALARY | DEPARTMENT | GENDER |
|---------+----------+----------+------------+--------|
| James   | Smith    |  10000   |         10 | M      |
| Michael | Jones    |  20000   |         10 | M      |
| Robert  | Williams |   3000.4 |         10 | M      |
| Maria   | Jones    |  23476.5 |         20 | F      |
| Jen     | Mary     | 123445   |         20 | F      |
+---------+----------+----------+------------+--------+
5 Row(s) produced. Time Elapsed: 0.646s

Options with COPY INTO

COMPRESSION – Load file with specified compression

Use COMPRESSION to specify the compression of the file you want to load. By default, Snowflake auto-detects the compression (AUTO), but you can set it explicitly to one of the following: AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE.


COPY INTO EMP from '@%EMP/data_0_0_0.csv.bz2'
  file_format = (type=CSV COMPRESSION = BZ2);
+--------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file               | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|--------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| data_0_0_0.csv.bz2 | LOADED |           5 |           5 |           1 |           0 | NULL        |             NULL |                  NULL | NULL                    |
+--------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
1 Row(s) produced. Time Elapsed: 1.902s

RECORD_DELIMITER – Changes delimiter for record

If your file uses a record separator other than ‘\n’, use RECORD_DELIMITER. For example, the below query uses ‘\n\r’.


COPY INTO EMP from '@%EMP/emp.csv.gz'  
  file_format = (type=CSV RECORD_DELIMITER='\n\r');
  
1 Row(s) produced. Time Elapsed: 1.300s

FIELD_DELIMITER – Specify field delimiter of loading file

Use FIELD_DELIMITER to change the default field delimiter. By default, it uses the ‘,’ character, but you can change it to any character. For example, let’s see how to use a pipe delimiter.


COPY INTO EMP from '@%EMP/emp.csv.gz'
  file_format = (type=CSV FIELD_DELIMITER='|');
  
1 Row(s) produced. Time Elapsed: 1.300s

SKIP_HEADER – Skip header rows in the file

Use the SKIP_HEADER option to skip header rows at the start of the data file. By default it is set to 0 (no rows skipped); set it to 1 to ignore a single header row containing column names.
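For example, to ignore a single header row when loading the emp.csv.gz file from the table stage:


COPY INTO EMP from '@%EMP/emp.csv.gz'
  file_format = (type=CSV SKIP_HEADER=1);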

DATE_FORMAT – Specify input date format

By default, date columns are expected in ‘YYYY-MM-DD’ format. If your file uses a different format, specify the input date format with the DATE_FORMAT option.


COPY INTO EMP from '@%EMP/emp.csv.gz'
  file_format = (type=CSV DATE_FORMAT='MM-DD-YYYY');
  
1 Row(s) produced. Time Elapsed: 1.300s

TIME_FORMAT – Specify Input time format

Time columns are expected in ‘HH24:MI:SS’ format. If your file uses a different format, use the TIME_FORMAT option to specify the input format.


COPY INTO EMP from '@%EMP/emp.csv.gz'
  file_format = (type=CSV TIME_FORMAT='HH24:MI:SS') 
  
1 Row(s) produced. Time Elapsed: 1.300s

TIMESTAMP_FORMAT – Specify input timestamp format

Timestamp columns are expected in ‘YYYY-MM-DD HH24:MI:SS.FF3 TZHTZM’ format. If your file uses a different format, use the TIMESTAMP_FORMAT option to specify the input format.


COPY INTO EMP from '@%EMP/emp.csv.gz'
  file_format = (type=CSV TIMESTAMP_FORMAT='MM-DD-YYYY HH24:MI:SS.FF3 TZHTZM') 
  
1 Row(s) produced. Time Elapsed: 1.300s

Conclusion

To load a CSV file into a Snowflake table, you first upload the data file to a Snowflake internal stage with the PUT command, and then load it from the internal stage into the table with COPY INTO. You can also control the compression, delimiters, and date/time formats using the COPY INTO file format options.


Happy Learning !!

NNK
