
How To Speed Up Insertion From Pandas.DataFrame .to_sql

Hello, I am currently trying to write data from four pandas DataFrames to MySQL on my local machine. My machine takes 32 seconds to insert 20,000 records (5,000 for each table).

Solution 1:

The problem here is that an insert query is issued for each row, and the client waits for an acknowledgement from the server before inserting the next row.

Try running this snippet early in your script, before to_sql is called:

from pandas.io.sql import SQLTable

def _execute_insert(self, conn, keys, data_iter):
    """Monkey-patched replacement that inserts all rows in a single statement."""
    print("Using monkey-patched _execute_insert")
    # Build one dict per row, mapping column names to values.
    data = [dict((k, v) for k, v in zip(keys, row)) for row in data_iter]
    # Pass every row to one multi-values INSERT instead of one INSERT per row.
    conn.execute(self.insert_statement().values(data))

# Replace the method on the class so pandas.DataFrame.to_sql picks it up.
SQLTable._execute_insert = _execute_insert

This is a patch by nhockham for the to_sql insert, which otherwise inserts line by line; the change is discussed in the corresponding GitHub issue.

If you can forgo pandas.to_sql, I suggest you try SQLAlchemy's bulk insert, or simply write a script that builds a multi-row INSERT query yourself.
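As an illustration of the bulk-insert suggestion, here is a minimal sketch using SQLAlchemy Core; the connection string, table name, and DataFrame are hypothetical, so adapt them to your own schema.

import pandas as pd
import sqlalchemy as sa

# Hypothetical MySQL connection string and table name; replace with your own.
engine = sa.create_engine("mysql+pymysql://user:password@localhost/mydb")
metadata = sa.MetaData()
records = sa.Table("records", metadata, autoload_with=engine)

df = pd.DataFrame({"a": range(5000), "b": ["x"] * 5000})

# Executing an insert with a list of dicts lets SQLAlchemy hand all rows to
# the driver in one executemany call rather than one round trip per row.
with engine.begin() as conn:
    conn.execute(records.insert(), df.to_dict(orient="records"))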

Edit: To clarify, we are modifying the _execute_insert method of the SQLTable class in pandas.io.sql, so this patch has to be applied in your script before to_sql is called.

The last line of the function is the change.

conn.execute(self.insert_statement(), data) has been changed to:

conn.execute(self.insert_statement().values(data))

The first form inserts row by row, while the second inserts all rows in one SQL statement.
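To see the difference, you can compile both forms of the statement outside of pandas; this is a small sketch using an ad-hoc SQLAlchemy table whose name and columns are made up for illustration.

import sqlalchemy as sa

metadata = sa.MetaData()
demo = sa.Table("demo", metadata,
                sa.Column("a", sa.Integer),
                sa.Column("b", sa.String(10)))

rows = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]

# Row-by-row form: one parameterized INSERT, executed once per row.
print(demo.insert())
# prints roughly: INSERT INTO demo (a, b) VALUES (:a, :b)

# Multi-values form: a single INSERT that carries every row.
print(demo.insert().values(rows))
# prints roughly: INSERT INTO demo (a, b) VALUES (:a_m0, :b_m0), (:a_m1, :b_m1)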

Update: For newer versions of pandas, we need a slight modification of the patch above.

from pandas.io.sql import SQLTable

def _execute_insert(self, conn, keys, data_iter):
    """Monkey-patched replacement for newer pandas versions."""
    print("Using monkey-patched _execute_insert")
    data = [dict(zip(keys, row)) for row in data_iter]
    # Build the statement directly from the underlying SQLAlchemy table,
    # since the insert_statement() helper changed in newer pandas.
    conn.execute(self.table.insert().values(data))

SQLTable._execute_insert = _execute_insert
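As a rough end-to-end sketch, apply one of the monkey patches above and then call to_sql as usual; the connection string, table name, and DataFrame below are hypothetical, so replace them with your own.

import pandas as pd
import sqlalchemy as sa

# ... apply one of the monkey patches above first ...

# Hypothetical MySQL connection string; replace with your own credentials.
engine = sa.create_engine("mysql+pymysql://user:password@localhost/mydb")

df = pd.DataFrame({"a": range(5000), "b": ["x"] * 5000})

# to_sql now routes each chunk of rows through the patched
# multi-values insert instead of one INSERT per row.
df.to_sql("records", engine, if_exists="append", index=False)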
