Friday, May 14, 2010

ActiveRecord Bulk Insertion of data in rails

why we go for bulk insertion?

You take one scenario like there is a loop which insert a record each time it runs. For each and every time it will run a Insert SQL statement for creating the record. You no need to care if it is small number of data. You take millions of record then it will definitely get into performance degradation and slow up your application. So in this case you should have something for bulk insertion.

Solution for Bulk Insert:

When I am looking for bulk insertion of data with activerecord, I got an excellent and efficient way of doing bulk insertion.You may heard of Activerecord::Extensions which gives you the API for bulk insertion.

Step 1:

First install gem

sudo gem install ar-extensions

step 2:

Include this gem in your model

include ar-extensions

Step 3:

Usage : Model.import(columns,values,options)

:columns => array of column identifier
:values => array of values
:options => hash of options

If your model is employee.rb

columns = [emp_id, emp_name, emp_address, emp_dob]
values = [[1, abc, def, '2010-01-01'],[2, foss, texas, '2009-01-01'],[3, xyz, mno, '2008-01-01'],[4, sammy, NYC, '2007-01-01'],......... etc.] (you can add millions of record data)

Employee.import(columns, values)


:validate => true|false default : true. if it is false then it won't use validations

:on_duplicate_key_update => Array|Hash – this tells import to use the associated Array or Hash to update rows where a duplicate key already exists. This is specific MySQL

:ignore => true|false – this tells import to issue a “INSERT IGNORE” when running each SQL statement. This is useful when you are importing mass amounts of records and you don’t care if MySQL finds duplicate entries.

While using this import API you can reduce the data insertion time. I was more surprised with speed when using import API. I hope you also get surprised by using bulk insert.

References -