Requirement: load millions of rows into a table from S3 using Python while avoiding memory issues

I see that psycopg2 provides two methods for this, copy_from and copy_expert.

Which of these is more efficient and avoids memory issues?

Also, I see that Redshift (which is Postgres-based) supports a COPY command to load data from an S3 file, but I am not sure whether community PostgreSQL supports such a feature.

First, community Postgres does not support COPY directly from S3. Second, copy_from vs copy_expert is not really the issue; the bottleneck will be the network lag from S3 and streaming the rows. – Adrian Klaver, Nov 12, 2020 at 22:24

What's the main difference between copy_from and copy_expert? My understanding is that both provide the same functionality of loading data from a file into a table. – Kar, Nov 13, 2020 at 14:47

The difference is that copy_from exposes only a subset of the COPY options, whereas copy_expert lets you submit your own COPY statement with your choice of options. For more detail, see the full COPY command documentation. – Adrian Klaver, Nov 13, 2020 at 15:15
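To make that difference concrete, here is a minimal sketch (the people table and the connection string are hypothetical, not from the question): copy_from only accepts sep, null, size, and columns, while copy_expert takes a full COPY statement, so options such as FORMAT csv and HEADER are only reachable through the latter.

import io
import psycopg2

conn = psycopg2.connect("dbname=test")  # hypothetical connection string

data = io.StringIO("1,alice\n2,bob\n")
with conn.cursor() as cursor:
    # copy_from: only sep/null/size/columns are configurable.
    cursor.copy_from(data, "people", sep=",")

data = io.StringIO("id,name\n1,alice\n2,bob\n")
with conn.cursor() as cursor:
    # copy_expert: any COPY option, e.g. CSV quoting rules and a header row.
    cursor.copy_expert("COPY people FROM STDIN (FORMAT csv, HEADER true)", data)
conn.commit()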

Here is my implementation, changed from copy_from to copy_expert. An extensive analysis of loading data into PostgreSQL with Python can be found here: https://hakibenita.com/fast-load-data-python-postgresql.

COPY_FROM

import io
import pandas as pd
import psycopg2

def insert_with_copy_from(conn, df: pd.DataFrame, table_name: str):
    # Write the DataFrame to an in-memory CSV buffer, then COPY it in.
    buffer = io.StringIO()
    df.to_csv(buffer, index=False, header=False)
    buffer.seek(0)
    try:
        with conn.cursor() as cursor:
            cursor.copy_from(file=buffer, table=table_name, sep=",", null="")
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        conn.rollback()

COPY_EXPERT

def insert_with_copy_expert(conn, df: pd.DataFrame, table_name: str):
    # Same buffering approach, but copy_expert lets us write the COPY
    # statement ourselves (table_name may be schema-qualified here).
    buffer = io.StringIO()
    df.to_csv(buffer, index=False, header=False)
    buffer.seek(0)
    try:
        with conn.cursor() as cursor:
            cursor.copy_expert(f"COPY {table_name} FROM STDIN (FORMAT 'csv', HEADER false)", buffer)
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        conn.rollback()
        

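Since community Postgres cannot COPY directly from S3, and building the whole CSV in a StringIO buffer keeps the entire dataset in memory, another option is to stream the S3 object straight into copy_expert. This is a minimal sketch, not part of the original answer: the bucket, key, and table names are placeholders, and it assumes a recent boto3/botocore whose StreamingBody behaves as a readable file object.

import boto3
import psycopg2

def copy_s3_csv_to_table(conn, bucket: str, key: str, table_name: str):
    # Stream the object body instead of downloading it into memory first.
    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]  # botocore StreamingBody
    with conn.cursor() as cursor:
        # copy_expert reads the stream in fixed-size chunks (8192 bytes by
        # default), so memory use stays flat regardless of row count.
        cursor.copy_expert(
            f"COPY {table_name} FROM STDIN (FORMAT 'csv', HEADER true)", body
        )
    conn.commit()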