Requirement: load millions of rows into a table from S3 using Python while avoiding memory issues

I see that psycopg2 provides two methods for this, copy_from and copy_expert.

Which of these is more efficient and avoids memory issues?

Also, I see that Redshift (which is Postgres-based) supports a COPY command to load data from an S3 file, but I am not sure whether community PostgreSQL supports such a feature.

First, community Postgres does not support COPY directly from S3. Second, copy_from vs copy_expert is not really the issue; the bottleneck will be the network lag from S3 and streaming the rows. – Adrian Klaver, Nov 12, 2020 at 22:24

What's the main difference between copy_from and copy_expert? My understanding is that both provide the same functionality of loading data from a file into a table. – Kar, Nov 13, 2020 at 14:47

The difference is that copy_from exposes only a subset of the COPY options, whereas copy_expert lets you submit your own COPY statement with your choice of options. For more detail, see the full COPY command documentation. – Adrian Klaver, Nov 13, 2020 at 15:15
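To make that difference concrete, here is a minimal sketch (the people table and the connection string are hypothetical, not from the question): copy_from only accepts sep, null, size, and columns, while copy_expert takes a full COPY statement, so options such as FORMAT csv and HEADER are only reachable through the latter.

import io
import psycopg2

conn = psycopg2.connect("dbname=test")  # hypothetical connection string

data = io.StringIO("1,alice\n2,bob\n")
with conn.cursor() as cursor:
    # copy_from: only sep/null/size/columns are configurable.
    cursor.copy_from(data, "people", sep=",")

data = io.StringIO("id,name\n1,alice\n2,bob\n")
with conn.cursor() as cursor:
    # copy_expert: any COPY option, e.g. CSV quoting rules and a header row.
    cursor.copy_expert("COPY people FROM STDIN (FORMAT csv, HEADER true)", data)
conn.commit()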

Here is my implementation, changed from copy_from to copy_expert. An extensive analysis of loading data into PostgreSQL with Python can be found here: https://hakibenita.com/fast-load-data-python-postgresql.

COPY_FROM

import io
import pandas as pd
import psycopg2

def insert_with_copy_from(conn, df: pd.DataFrame, table_name: str):
    # Write the DataFrame to an in-memory CSV buffer, then COPY it in.
    buffer = io.StringIO()
    df.to_csv(buffer, index=False, header=False)
    buffer.seek(0)
    try:
        with conn.cursor() as cursor:
            cursor.copy_from(file=buffer, table=table_name, sep=",", null="")
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        conn.rollback()

COPY_EXPERT

def insert_with_copy_expert(conn, df: pd.DataFrame, table_name: str):
    # Same buffering approach, but copy_expert lets us write the COPY
    # statement ourselves (table_name may be schema-qualified here).
    buffer = io.StringIO()
    df.to_csv(buffer, index=False, header=False)
    buffer.seek(0)
    try:
        with conn.cursor() as cursor:
            cursor.copy_expert(f"COPY {table_name} FROM STDIN (FORMAT 'csv', HEADER false)", buffer)
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        conn.rollback()
        

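Since community Postgres cannot COPY directly from S3, and building the whole CSV in a StringIO buffer keeps the entire dataset in memory, another option is to stream the S3 object straight into copy_expert. This is a minimal sketch, not part of the original answer: the bucket, key, and table names are placeholders, and it assumes a recent boto3/botocore whose StreamingBody behaves as a readable file object.

import boto3
import psycopg2

def copy_s3_csv_to_table(conn, bucket: str, key: str, table_name: str):
    # Stream the object body instead of downloading it into memory first.
    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]  # botocore StreamingBody
    with conn.cursor() as cursor:
        # copy_expert reads the stream in fixed-size chunks (8192 bytes by
        # default), so memory use stays flat regardless of row count.
        cursor.copy_expert(
            f"COPY {table_name} FROM STDIN (FORMAT 'csv', HEADER true)", body
        )
    conn.commit()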