RDS PostgreSQL S3 import of CSV and gzip files

When you run PostgreSQL on Amazon RDS, you can import data from S3 directly into a table; the official documentation is here. However, the documentation only shows very basic examples with files sitting in the root folder of the bucket. It also assumes the use of psql, which is great for scripting but rubbish for human work (i.e. copy and paste).

So here is my extension to the documentation:

Assumption

CREATE EXTENSION aws_s3 CASCADE;
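If you want to check that the extension (and the aws_commons extension it pulls in via CASCADE) is actually installed, a quick sanity check is:

```sql
-- list the installed aws_* extensions; you should see aws_s3 and aws_commons
SELECT extname, extversion
FROM pg_extension
WHERE extname LIKE 'aws%';
```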

How to import

First of all, I’m skipping the creation of the s3_uri object. I find s3_uri too complex when you are trying things out for the first time. I also skip it because the documented examples use \gset, which only works in psql, and I prefer to use pgAdmin for DBA activities.
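For reference, this is what the documented s3_uri route looks like without \gset: aws_commons.create_s3_uri simply bundles bucket, path and region into one composite value that you can pass inline (the bucket and path below are placeholders):

```sql
-- build the URI object inline instead of storing it with \gset
SELECT aws_s3.table_import_from_s3(
  'public.tabletest',
  '',
  '(FORMAT csv, HEADER true)',
  aws_commons.create_s3_uri(
    'my_bucket_name_only',        -- bucket name only
    'ciao/hello/filename.csv.gz', -- path to the file, again without a leading /
    'eu-west-1'                   -- region
  )
);
```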

There are two things missing from the documentation that, once understood, make the process easy.

  1. The path (directories) in S3 belongs with the file name, not with the bucket name
  2. The path (as per point 1) must NOT start with /

The following example works. `--` starts a line comment in SQL, so everything after it on the same line is ignored.

SELECT aws_s3.table_import_from_s3 (
  'public.tabletest',  -- the table where you want the data to be imported
  '', -- column list. Empty means to import everything
  '(FORMAT csv, HEADER true, DELIMITER '','', QUOTE ''"'', ESCAPE ''\'')', -- this is what I use to import standard CSV
  'my_bucket_name_only', -- the bucket name and ONLY the bucket name, without anything else
  'ciao/hello/filename.csv.gz', -- the path from the bucket to the file. Does not have to be gz
  'eu-west-1' -- the region where the bucket is
);
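If you only want to populate some of the columns, the second argument takes a comma-separated column list, which is passed through to COPY underneath — the file must then contain exactly those columns, in that order. A sketch, with placeholder table and column names:

```sql
SELECT aws_s3.table_import_from_s3 (
  'public.tabletest',
  'id, name',  -- only these columns are populated; the CSV must contain exactly these
  '(FORMAT csv, HEADER true)',
  'my_bucket_name_only',
  'ciao/hello/filename.csv',
  'eu-west-1'
);
```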

About importing gzip files

As per the documentation, your gzipped files in S3 need to have the following Amazon S3 metadata:

  Key: Content-Encoding
  Value: gzip
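If you are uploading the file yourself, the AWS CLI can set that metadata at upload time via the --content-encoding option of aws s3 cp (bucket and path below are placeholders):

```
# --content-encoding sets the Content-Encoding: gzip metadata on the object
aws s3 cp filename.csv.gz s3://my_bucket_name_only/ciao/hello/filename.csv.gz \
  --content-encoding gzip
```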
