How To Speed Up Insertion From Pandas.DataFrame .to_sql
Solution 1:
The problem here is for each row an insert query is made, then before next row insert it waits for ACK.
Try running this snippet before import pandas as pd
from pandas.io.sql import SQLTable
def _execute_insert(self, conn, keys, data_iter):
print("Using monkey-patched _execute_insert")
data = [dict((k, v) for k, v in zip(keys, row)) for row in data_iter]
conn.execute(self.insert_statement().values(data))
SQLTable._execute_insert = _execute_insert
This is a patch by nhockham on to_sql insert which inserts line by line. Here's the github issue.
If you can forgo using pandas.to_sql I suggest you try sql-alchemy bulk insert or just write script to make a multirow query by yourself.
Edit: To clarify we are modifying _execute_insert method of Class SQLTable in pandas.io.sql So this has to be added in the scripts before import pandas module.
The last line is the change.
conn.execute(self.insert_statement(), data)
has been changed to :
conn.execute(self.insert_statement().values(data))
The first line will insert row by row while last line will insert all rows in one sql statement.
Update: For newer versions of pandas, we will need a slight modification of the above query.
from pandas.io.sql import SQLTable
def _execute_insert(self, conn, keys, data_iter):
print("Using monkey-patched _execute_insert")
data = [dict(zip(keys, row)) for row in data_iter]
conn.execute(self.table.insert().values(data))
SQLTable._execute_insert = _execute_insert
Post a Comment for "How To Speed Up Insertion From Pandas.DataFrame .to_sql"