Requirement: Load millions of rows from a file in S3 into a PostgreSQL table using Python, without running into memory issues.
I see that psycopg2 offers two methods, copy_from and copy_expert. Which of these is more efficient and avoids memory problems?
I also see that Redshift (which is based on Postgres) supports a COPY command to load data from an S3 file, but I am not sure whether plain PostgreSQL supports such a feature.
My implementation changed from copy_from to copy_expert. An extensive analysis of loading data into PostgreSQL with Python can be found here: https://hakibenita.com/fast-load-data-python-postgresql.
COPY_FROM
import io
import pandas as pd
import psycopg2

# `conn` is assumed to be an open psycopg2 connection (commit afterwards if autocommit is off).
def insert_with_string_io(df: pd.DataFrame, table_name: str):
    # Write the DataFrame to an in-memory CSV buffer, then stream it to COPY.
    buffer = io.StringIO()
    df.to_csv(buffer, index=False, header=False)
    buffer.seek(0)
    try:
        with conn.cursor() as cursor:
            cursor.copy_from(file=buffer, table=table_name, sep=",", null="")
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
COPY_EXPERT
def insert_with_string_io(df: pd.DataFrame):
    # Same idea, but copy_expert takes the full COPY statement, so the target
    # table can be schema-qualified; <database>.<schema>.<table> is a placeholder.
    buffer = io.StringIO()
    df.to_csv(buffer, index=False, header=False)
    buffer.seek(0)
    try:
        with conn.cursor() as cursor:
            cursor.copy_expert(
                "COPY <database>.<schema>.<table> FROM STDIN (FORMAT 'csv', HEADER false)",
                buffer,
            )
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)