Bulk Import in Ruby On Rails older versions

Import usually happens in the Rails app as record-by-record which is good until the number of records to be inserted is less. I recently had to write a feature of importing records for one of our use-cases. MySQL DB was used with Rails. There was an existing implementation in our app of record-by-record import. Out of curiosity I searched for some gems that could help in bulk imports and came across activerecord-import. I would recommend you to go through its documentation before reading further here.

If you are familiar with Rails, you would know about its callback functionalities. Unluckily the gem doesn’t provide these features out of the box. Here are the limitations I faced while using the gem

  1. The only database operation involved in the gem is to execute the insert query.
  2. Insert happens only for the provided model i.e. Associations of the model are not imported.
  3. After insert, it won’t fetch and return the records.
  4. Since it does not fetch the records that are inserted, it does not support running the callbacks.

But I had to find a way to use it. So I did and here is the implementation.

  • Create a column called import_id in your model (entity table) that you are trying to import.
  • Read from your CSV file and build an Array of Active record objects. Do not forget to update a unique import_id for each new record. DO NOT CALL SAVE ACTION FOR RECORDS. Since the gem uses a single insert query to insert all records, we will be able to map the inserted records to the active record objects using this import_id. You can validate the active record objects when getting created and discard the invalid ones. In case if you are not familiar with how to validate active record objects, you can refer to the below code. It will trigger all validations defined in your model.
record.valid?
  • Run before callbacks on the parent entity. Yes, you can trigger the callbacks manually for an active record but with a small limitation. I will talk only about the before and after callbacks here. You can either trigger only before callbacks or before and after callbacks together. I am not sure why it was implemented in this way in rails. But, you can easily overcome this by using the below implementation in that model. Make sure to set the instance variable true after running before callbacks.
# Example of a before calls
before_update :set_extra_data,

Below is implementation on how to trigger callbacks

def run_required_callbacks(active_records, is_after = false)
return if active_records.blank?
active_records.each do |record|
# import_id will be blank in case we already have ID column populated (Update case).
if is_after # after callbacks
record
.import_id.blank? ? (record.run_callbacks(:update) { true }) : (record.run_callbacks(:create) { true })
record.run_callbacks(:save) { true }
else # before callbacks
record
.run_callbacks(:save) { false }
record.import_id.blank? ? (record.run_callbacks(:update) { false }) : (record.run_callbacks(:create) { false })
end
end
end

Using the above implementations you will be able to trigger callbacks.

  • We may require the previous_changes method of record to work in after_callbacks. So we will store all the changes in the records.
def get_all_changes(records)
create_case_changes = {}
update_case_changes = {}
records.each do |record|
record.new_record? ? (create_case_changes[record.import_id.to_s] = record.changes) : (update_case_changes[record.id.to_s] = record.changes)
end
[create_case_changes, update_case_changes]
end
  • Execute Bulk Import by calling method of the gem
def execute_bulk_insert(klass, records, retry_count = 0, exception = nil)
raise exception.message if retry_count > 2
begin
klass.import records, validate: false, on_duplicate_key_update: klass.columns.collect(&:name)
rescue => exception
Rails.logger.info exception.message
return execute_bulk_insert(klass, records, retry_count + 1, exception)
end
end
  • I would recommend you to execute bulk insert for records in batches for eg. 100 records in a batch. Make sure that the query is not very long to parse and does not create replica lags in DB.
  • Fetch imported records.
  • Inserting associations is a little bit tricky. If you have tried the above steps you would know gem will not insert associations automatically. You have to maintain the ER Model of your schema to understand which tables need to be created first for eg. belongs_to relations need to be created first before creating parent records.
    Create a list of association records per model and execute Bulk Insert for each association in the same way as above. As per the use case, I did not have to create an association of associations. Also, I didn’t have to run callbacks for my associations. In case if you have to run, please use all the above steps recursively. I would recommend whitelisting the associations to be imported. But if you have to find the associations dynamically for a model, you can make use of the rails reflection method.
model.reflections
  • The next step is to set previous_changes data from the stored changes to the fetched records. We will manually set it.
def set_previous_changes(imported_records, create_case_changes, update_case_changes)
imported_records.each do |record|
if record.import_id.blank?
record.instance_variable_set(:@previously_changed, update_case_changes[record.id.to_s])
record.instance_variable_set(:@_start_transaction_state, new_record: false, id: record.id)
else
record.instance_variable_set(:@previously_changed, create_case_changes[record.import_id])
record
.instance_variable_set(:@_start_transaction_state, new_record: true)
end
end
end
  • Run after callbacks using the method defined above.
run_required_callbacks(imported_records, true)
  • Make sure that you put the code from running before callbacks to after_callbacks in a transaction block to support rollback. Below is an example to run code in a transaction block.
ActiveRecord::Base.transaction do
records = import_records(*)
end
  • Run after_commit callbacks out of transaction block.
def run_after_commit_callback(records)
records.each do |record|
begin
record.run_callbacks(:commit) { true }
rescue => e
Rails.logger.error "Error while running after commit, Message => #{e.message}, Trace => #{e.backtrace.join('\n')}"
end
end
end

Here is the result for importing an entity that has 2 associations in it. We were able to reduce the number of inserts from 30K to just 300.

Sources:
Zach Dennis, ActiveRecord-Import wiki: https://github.com/zdennis/activerecord-import

Programming Nerd ashishm.dev