Collectives™ on Stack Overflow
Find centralized, trusted content and collaborate around the technologies you use most.
Learn more about Collectives
Teams
Q&A for work
Connect and share knowledge within a single location that is structured and easy to search.
Learn more about Teams
I need to insert multiple rows with one query (number of rows is not constant), so I need to execute query like this one:
INSERT INTO t (a, b) VALUES (1, 2), (3, 4), (5, 6);
The only way I know is
args = [(1,2), (3,4), (5,6)]
args_str = ','.join(cursor.mogrify("%s", (x, )) for x in args)
cursor.execute("INSERT INTO t (a, b) VALUES "+args_str)
but I want some simpler way.
I built a program that inserts multiple lines to a server that was located in another city.
I found out that using this method was about 10 times faster than executemany
. In my case tup
is a tuple containing about 2000 rows. It took about 10 seconds when using this method:
args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)
and 2 minutes when using this method:
cur.executemany("INSERT INTO table VALUES(%s,%s,%s,%s,%s,%s,%s,%s,%s)", tup)
–
–
–
–
–
New execute_values
method in Psycopg 2.7:
data = [(1,'x'), (2,'y')]
insert_query = 'insert into t (a, b) values %s'
psycopg2.extras.execute_values (
cursor, insert_query, data, template=None, page_size=100
The pythonic way of doing it in Psycopg 2.6:
data = [(1,'x'), (2,'y')]
records_list_template = ','.join(['%s'] * len(data))
insert_query = 'insert into t (a, b) values {}'.format(records_list_template)
cursor.execute(insert_query, data)
Explanation: If the data to be inserted is given as a list of tuples like in
data = [(1,'x'), (2,'y')]
then it is already in the exact required format as
the values
syntax of the insert
clause expects a list of records as in
insert into t (a, b) values (1, 'x'),(2, 'y')
Psycopg
adapts a Python tuple
to a Postgresql record
.
The only necessary work is to provide a records list template to be filled by psycopg
# We use the data list to be sure of the template length
records_list_template = ','.join(['%s'] * len(data))
and place it in the insert
query
insert_query = 'insert into t (a, b) values {}'.format(records_list_template)
Printing the insert_query
outputs
insert into t (a, b) values %s,%s
Now to the usual Psycopg
arguments substitution
cursor.execute(insert_query, data)
Or just testing what will be sent to the server
print (cursor.mogrify(insert_query, data).decode('utf8'))
Output:
insert into t (a, b) values (1, 'x'),(2, 'y')
–
–
–
–
Update with psycopg2 2.7:
The classic executemany()
is about 60 times slower than @ant32 's implementation (called "folded") as explained in this thread: https://www.postgresql.org/message-id/20170130215151.GA7081%40deb76.aryehleib.com
This implementation was added to psycopg2 in version 2.7 and is called execute_values()
:
from psycopg2.extras import execute_values
execute_values(cur,
"INSERT INTO test (id, v1, v2) VALUES %s",
[(1, 2, 3), (4, 5, 6), (7, 8, 9)])
Previous Answer:
To insert multiple rows, using the multirow VALUES
syntax with execute()
is about 10x faster than using psycopg2 executemany()
. Indeed, executemany()
just runs many individual INSERT
statements.
@ant32 's code works perfectly in Python 2. But in Python 3, cursor.mogrify()
returns bytes, cursor.execute()
takes either bytes or strings, and ','.join()
expects str
instance.
So in Python 3 you may need to modify @ant32 's code, by adding .decode('utf-8')
:
args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x).decode('utf-8') for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)
Or by using bytes (with b''
or b""
) only:
args_bytes = b','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute(b"INSERT INTO table VALUES " + args_bytes)
–
–
cursor.copy_from is the fastest solution I've found for bulk inserts by far. Here's a gist I made containing a class named IteratorFile which allows an iterator yielding strings to be read like a file. We can convert each input record to a string using a generator expression. So the solution would be
args = [(1,2), (3,4), (5,6)]
f = IteratorFile(("{}\t{}".format(x[0], x[1]) for x in args))
cursor.copy_from(f, 'table_name', columns=('a', 'b'))
For this trivial size of args it won't make much of a speed difference, but I see big speedups when dealing with thousands+ of rows. It will also be more memory efficient than building a giant query string. An iterator would only ever hold one input record in memory at a time, where at some point you'll run out of memory in your Python process or in Postgres by building the query string.
–
–
–
A last item I would like to show you is how to insert multiple rows using a dictionary. If you had the following:
namedict = ({"first_name":"Joshua", "last_name":"Drake"},
{"first_name":"Steven", "last_name":"Foo"},
{"first_name":"David", "last_name":"Bar"})
You could easily insert all three rows within the dictionary by using:
cur = conn.cursor()
cur.executemany("""INSERT INTO bar(first_name,last_name) VALUES (%(first_name)s, %(last_name)s)""", namedict)
It doesn't save much code, but it definitively looks better.
–
–
Security vulnerabilities
As of 2022-11-16, the answers by @Clodoaldo Neto (for Psycopg 2.6), @Joseph Sheedy, @J.J, @Bart Jonk, @kevo Njoki, @TKoutny and @Nihal Sharma contain SQL injection vulnerabilities and should not be used.
The fastest proposal so far (copy_from
) should not be used either because it is difficult to escape the data correctly. This is easily apparent when trying to insert characters like '
, "
, \n
, \
, \t
or \n
.
The author of psycopg2 also recommends against copy_from
:
copy_from() and copy_to() are really just ancient and incomplete methods
The fastest method
The fastest method is cursor.copy_expert
, which can insert data straight from CSV files.
with open("mydata.csv") as f:
cursor.copy_expert("COPY mytable (my_id, a, b) FROM STDIN WITH csv", f)
copy_expert
is also the fastest method when generating the CSV file on-the-fly. For reference, see the following CSVFile
class, which takes care to limit memory usage.
import io, csv
class CSVFile(io.TextIOBase):
# Create a CSV file from rows. Can only be read once.
def __init__(self, rows, size=8192):
self.row_iter = iter(rows)
self.buf = io.StringIO()
self.available = 0
self.size = size
def read(self, n):
# Buffer new CSV rows until enough data is available
buf = self.buf
writer = csv.writer(buf)
while self.available < n:
row_length = writer.writerow(next(self.row_iter))
self.available += row_length
self.size = max(self.size, row_length)
except StopIteration:
break
# Read requested amount of data from buffer
write_pos = buf.tell()
read_pos = write_pos - self.available
buf.seek(read_pos)
data = buf.read(n)
self.available -= len(data)
# Shrink buffer if it grew very large
if read_pos > 2 * self.size:
remaining = buf.read()
buf.seek(0)
buf.write(remaining)
buf.truncate()
else:
buf.seek(write_pos)
return data
This class can then be used like:
rows = [(1, "a", "b"), (2, "c", "d")]
cursor.copy_expert("COPY mytable (my_id, a, b) FROM STDIN WITH csv", CSVFile(rows))
If all your data fits into memory, you can also generate the entire CSV data directly without the CSVFile
class, but if you do not know how much data you are going to insert in the future, you probably should not do that.
f = io.StringIO()
writer = csv.writer(f)
for row in rows:
writer.writerow(row)
f.seek(0)
cursor.copy_expert("COPY mytable (my_id, a, b) FROM STDIN WITH csv", f)
Benchmark results
914 milliseconds - many calls to cursor.execute
846 milliseconds - cursor.executemany
362 milliseconds - psycopg2.extras.execute_batch
346 milliseconds - execute_batch
with page_size=1000
265 milliseconds - execute_batch
with prepared statement
161 milliseconds - psycopg2.extras.execute_values
127 milliseconds - cursor.execute
with string-concatenated values
39 milliseconds - copy_expert
generating the entire CSV file at once
32 milliseconds - copy_expert
with CSVFile
All of these techniques are called 'Extended Inserts" in Postgres terminology, and as of the 24th of November 2016, it's still a ton faster than psychopg2's executemany() and all the other methods listed in this thread (which i tried before coming to this answer).
Here's some code which doesnt use cur.mogrify and is nice and simply to get your head around:
valueSQL = [ '%s', '%s', '%s', ... ] # as many as you have columns.
sqlrows = []
rowsPerInsert = 3 # more means faster, but with diminishing returns..
for row in getSomeData:
# row == [1, 'a', 'yolo', ... ]
sqlrows += row
if ( len(sqlrows)/len(valueSQL) ) % rowsPerInsert == 0:
# sqlrows == [ 1, 'a', 'yolo', 2, 'b', 'swag', 3, 'c', 'selfie' ]
insertSQL = 'INSERT INTO "twitter" VALUES ' + ','.join(['(' + ','.join(valueSQL) + ')']*rowsPerInsert)
cur.execute(insertSQL, sqlrows)
con.commit()
sqlrows = []
insertSQL = 'INSERT INTO "twitter" VALUES ' + ','.join(['(' + ','.join(valueSQL) + ')']*len(sqlrows))
cur.execute(insertSQL, sqlrows)
con.commit()
But it should be noted that if you can use copy_from(), you should use copy_from ;)
–
–
I've been using ant32's answer above for several years. However I've found that is thorws an error in python 3 because mogrify
returns a byte string.
Converting explicitly to bytse strings is a simple solution for making code python 3 compatible.
args_str = b','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute(b"INSERT INTO table VALUES " + args_str)
executemany accept array of tuples
https://www.postgresqltutorial.com/postgresql-python/insert/
""" array of tuples """
vendor_list = [(value1,)]
""" insert multiple vendors into the vendors table """
sql = "INSERT INTO vendors(vendor_name) VALUES(%s)"
conn = None
# read database configuration
params = config()
# connect to the PostgreSQL database
conn = psycopg2.connect(**params)
# create a new cursor
cur = conn.cursor()
# execute the INSERT statement
cur.executemany(sql,vendor_list)
# commit the changes to the database
conn.commit()
# close communication with the database
cur.close()
except (Exception, psycopg2.DatabaseError) as error:
print(error)
finally:
if conn is not None:
conn.close()
The cursor.copyfrom solution as provided by @jopseph.sheedy (https://stackoverflow.com/users/958118/joseph-sheedy) above (https://stackoverflow.com/a/30721460/11100064) is indeed lightning fast.
However, the example he gives are not generically usable for a record with any number of fields and it took me while to figure out how to use it correctly.
The IteratorFile needs to be instantiated with tab-separated fields like this (r
is a list of dicts where each dict is a record):
f = IteratorFile("{0}\t{1}\t{2}\t{3}\t{4}".format(r["id"],
r["type"],
r["item"],
r["month"],
r["revenue"]) for r in records)
To generalise for an arbitrary number of fields we will first create a line string with the correct amount of tabs and field placeholders : "{}\t{}\t{}....\t{}"
and then use .format()
to fill in the field values : *list(r.values())) for r in records
:
line = "\t".join(["{}"] * len(records[0]))
f = IteratorFile(line.format(*list(r.values())) for r in records)
complete function in gist here.
Another nice and efficient approach - is to pass rows for insertion as 1 argument,
which is array of json objects.
E.g. you passing argument:
[ {id: 18, score: 1}, { id: 19, score: 5} ]
It is array, which may contain any amount of objects inside.
Then your SQL looks like:
INSERT INTO links (parent_id, child_id, score)
SELECT 123, (r->>'id')::int, (r->>'score')::int
FROM unnest($1::json[]) as r
Notice: Your postgress must be new enough, to support json
If you're using SQLAlchemy, you don't need to mess with hand-crafting the string because SQLAlchemy supports generating a multi-row VALUES
clause for a single INSERT
statement:
rows = []
for i, name in enumerate(rawdata):
row = {
'id': i,
'name': name,
'valid': True,
rows.append(row)
if len(rows) > 0: # INSERT fails if no rows
insert_query = SQLAlchemyModelName.__table__.insert().values(rows)
session.execute(insert_query)
–
–
–
–
From @ant32
def myInsertManyTuples(connection, table, tuple_of_tuples):
cursor = connection.cursor()
insert_len = len(tuple_of_tuples[0])
insert_template = "("
for i in range(insert_len):
insert_template += "%s,"
insert_template = insert_template[:-1] + ")"
args_str = ",".join(
cursor.mogrify(insert_template, x).decode("utf-8")
for x in tuple_of_tuples
cursor.execute("INSERT INTO " + table + " VALUES " + args_str)
connection.commit()
except psycopg2.Error as e:
print(f"psycopg2.Error in myInsertMany = {e}")
connection.rollback()
psycopg2 2.9.3
data = "(1, 2), (3, 4), (5, 6)"
query = "INSERT INTO t (a, b) VALUES {0}".format(data)
cursor.execute(query)
data = [(1, 2), (3, 4), (5, 6)]
data = ",".join(map(str, data))
query = "INSERT INTO t (a, b) VALUES {0}".format(data)
cursor.execute(query)
If you want to insert multiple rows within one insert statemens (assuming you are not using ORM) the easiest way so far for me would be to use list of dictionaries. Here is an example:
t = [{'id':1, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 6},
{'id':2, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 7},
{'id':3, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 8}]
conn.execute("insert into campaign_dates
(id, start_date, end_date, campaignid)
values (%(id)s, %(start_date)s, %(end_date)s, %(campaignid)s);",
As you can see only one query will be executed:
INFO sqlalchemy.engine.base.Engine insert into campaign_dates (id, start_date, end_date, campaignid) values (%(id)s, %(start_date)s, %(end_date)s, %(campaignid)s);
INFO sqlalchemy.engine.base.Engine [{'campaignid': 6, 'id': 1, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}, {'campaignid': 7, 'id': 2, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}, {'campaignid': 8, 'id': 3, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}]
INFO sqlalchemy.engine.base.Engine COMMIT
–
The Solution am using can insert like 8000 records in 1 millisecond
curtime = datetime.datetime.now()
postData = dict()
postData["title"] = "This is Title Text"
postData["body"] = "This a Body Text it Can be Long Text"
postData['created_at'] = curtime.isoformat()
postData['updated_at'] = curtime.isoformat()
data = []
for x in range(8000):
data.append(((postData)))
vals = []
for d in postData:
vals.append(tuple(d.values())) #Here we extract the Values from the Dict
flds = ",".join(map(str, postData[0]))
tableFlds = ",".join(map(str, vals))
sqlStr = f"INSERT INTO posts ({flds}) VALUES {tableFlds}"
db.execute(sqlStr)
connection.commit()
rowsAffected = db.rowcount
print(f'{rowsAffected} Rows Affected')
–
Finally in SQLalchemy1.2 version, this new implementation is added to use psycopg2.extras.execute_batch() instead of executemany when you initialize your engine with use_batch_mode=True like:
engine = create_engine(
"postgresql+psycopg2://scott:tiger@host/dbname",
use_batch_mode=True)
http://docs.sqlalchemy.org/en/latest/changelog/migration_12.html#change-4109
Then someone would have to use SQLalchmey won't bother to try different combinations of sqla and psycopg2 and direct SQL together..
tup = [(gid, pid) for pid in items]
args_str = ",".join([str(s) for s in tup])
# insert into group values (1, 10), (1, 11), (1, 12), (1, 13)
yield from cur.execute("INSERT INTO group VALUES " + args_str)
–
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.