RDS PostgreSQL S3 import of CSV and gzip files

When you run RDS PostgreSQL on AWS, you can import data from S3 into a table; the official documentation is here. The documentation only shows very basic examples with files directly in the root folder of the bucket. It also assumes the use of psql, which is great for scripting but rubbish for human work (i.e. copy-paste).

So here is my extension to the documentation:

Assumptions

  • You need to be connected to your RDS database.
  • My examples work with pgAdmin.
  • You ran the following command to install the extension (see the verification query after this list):
CREATE EXTENSION aws_s3 CASCADE;
  • Your RDS instance can access the S3 bucket via a role (I did not test other methods), as documented here.
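
Installing aws_s3 with CASCADE also pulls in its aws_commons dependency. Here is a quick sanity-check query (my addition, not from the official docs) to confirm both extensions are in place:

SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('aws_s3', 'aws_commons');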

How to import

First of all, I’m ignoring the creation of the s3_uri. I find s3_uri too complex when you are trying things for the first time. I also ignore it because the documented examples use \gset, which only works in psql, and I prefer pgAdmin for DBA activities.
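
That said, if you do want the s3_uri flavour, you can build it inline with aws_commons.create_s3_uri and skip \gset entirely, so it works from pgAdmin too. A minimal sketch, reusing the placeholder table, bucket, path and region from the example further down:

SELECT aws_s3.table_import_from_s3(
  'public.tabletest',
  '',
  '(FORMAT csv, HEADER true)',
  aws_commons.create_s3_uri(
    'my_bucket_name_only',       -- bucket name only
    'ciao/hello/filename.csv',   -- path inside the bucket
    'eu-west-1'                  -- bucket region
  )
);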

There are two things missing from the documentation that, once understood, make the process easy; both are illustrated right below the list.

  1. The path (directories) in S3 belongs with the file name, not with the bucket name
  2. The path (as in point 1) must NOT start with /
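
To make those two rules concrete, here is the same made-up file referenced the wrong ways and the right way:

-- WRONG: '/ciao/hello/filename.csv.gz'                      (leading slash)
-- WRONG: 'my_bucket_name_only/ciao/hello/filename.csv.gz'   (bucket repeated in the path)
-- RIGHT: 'ciao/hello/filename.csv.gz'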

The following example works. -- starts a line comment in SQL, so everything on the same line after it is ignored.

SELECT aws_s3.table_import_from_s3(
  'public.tabletest',  -- the table where you want the data to be imported
  '', -- column list. Empty means to import everything
  '(FORMAT csv, HEADER true, DELIMITER '','', QUOTE ''"'', ESCAPE ''\'')', -- this is what I use to import standard CSV
  'my_bucket_name_only', -- the bucket name and ONLY the bucket name, without anything else
  'ciao/hello/filename.csv.gz', -- the path from the bucket to the file. Does not have to be gz
  'eu-west-1' -- the region where the bucket is
);
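
Note that the target table has to exist before you run the call; aws_s3 will not create it for you. Here is a hypothetical definition matching a three-column CSV (adjust the columns to your actual file), plus a quick row-count check to run after the import:

CREATE TABLE public.tabletest (
  id    integer,
  name  text,
  value numeric
);

-- after the import, sanity-check what landed
SELECT count(*) FROM public.tabletest;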

About importing gzip files

As per the documentation, your gzipped files in S3 need to have the following Amazon S3 metadata (an upload example follows the list):

  • Key: Content-Encoding
  • Value: gzip
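
If you upload with the AWS CLI, you can set that metadata at upload time. A sketch with the placeholder bucket and path from earlier; the --content-encoding flag sets the Content-Encoding header on the object:

aws s3 cp filename.csv.gz s3://my_bucket_name_only/ciao/hello/filename.csv.gz --content-encoding gzip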