Eager loading is a way to find objects of a certain class and a number of named associations.
Here I share my thoughts on using it with Rails.
What are N + 1 queries?
It mainly occurs when you load the bunch of objects and then for each object you make one more query to find associated object.
N + 1 queries problem
Consider below schema:
Class User << ActiveRecord::Base has_many :addresses end Class Address << ActiveRecord::Base belongs_to :user end
Consider below example for lazy load:
User.all.each do |user|
puts “Name : #{user.name}”
user.addresses.each_with_index do |add, index|
puts “Address #{index + 1} : #{address.address_line_1}”
end
end
Suppose there is only 10 user and each has 3 addresses then in above example
1 query will be fired to find out users.
i.e. SELECT “users”.* FROM “users”
And 3 queries to find addresses for each user.
i.e. SELECT “addresses”.* FROM “addresses” WHERE “user_id” = {user_id}
So in above overall 31 queries will be fire 1 to fetch user records and others to fetch records from addresses i.e.
30 (N) which makes N + 1 queries.
Above can be optimized by using includes or joins.
Using includes
User.includes(:addresses)
Above 1, query is required to fetch the records from users table.
i.e. SELECT “users”.* FROM “users”
After loading the users, find will collect the user_id from each one and load all the referenced addresses with one query. Doing so will cut down the number of queries from 31 to 2.
i.e. SELECT “addresses”.* FROM “addresses” WHERE “addresses”.”user_id” IN (1, 2, 3 , 4, 5, 6, 7, 8, 9, 10)
Using Joins
User.joins(:addresses)
Above will generate below response
SELECT users.* FROM ‘users’ INNER JOIN ‘addresses’ ON users.id = addresses.user_id
Above will further reduce the query to 1.
Note: Above will remove the user records which do not have associated address records due to inner join.
So for above we need to modify the join to left join as
User.joins(‘LEFT JOIN addresses ON addresses.user_id = users.id’)
Above will also returns the multiple objects for the same user if there will be more than 1 records for address for the user, so following modification will be required.
User.joins(‘LEFT JOIN addresses ON addresses.user_id = users.id’).select(‘DISTINCT users.*’)
Also if you wants to use above records for update, then you needs to set read-only(false) like
User.joins(‘addresses‘).readonly(false)
We can also eagerload the polymorphic records.
Consider the following example.
User has_many :addresses, :as => :resource Client has_many :addresses, :as => :resource Address belongs_to :resource, :polymorphic => true
Eager load for users & clients can be simply done as
User.includes(:addresses) SELECT users.* FROM users SELECT addresses.* FROM addresses where addresses.resource_type = ‘USER’ and addresses.resource_id IN (1,2, 5…user_ids)
Same will be for clients
Here we can eager load for address also.
Address.includes(:resource) SELECT addresses.* FROM addresses SELECT users.* FROM users WHERE users. id IN (1,2,5 …resource_id of addresses where resource_type is ‘User’) SELECT clients.* FROM clients WHERE clients.id IN (1,2,5 …resource_id of addresses where resource_type is ‘Client’)
So 1 query to fetch records from Model class and 1 for each associated polymorphic association.
So here it will be only 1 + 2 = 3 queries
We can also eager loads from Model where we define the associations.
Consider below example
User
has_many :addresses, -> { includes(:phones)}
Address
has_many :phones
When here we make a call to the addresses of the user for example
user.addresses
Here, ActiveRecord will eager load all the phones associated with the addresses and will resolve the N + 1 queries problem.
Nested Eager Loading
Yes, We can also eager load a relation of an object you’re eager loading, such as
User has_many :addresses has_many :orders Address has_many :phones has_one :code Phone has_one :service Code has_one :code_type
So here we can eager load all the associated records from user with following
User.includes(:addresses => {{:phones => :service}, {:code => :code_type}}, :orders)
Also, it will be very useful when we are sending records with json response, we can send each required objects, consider below example with many associations and methods.
render :json => customer.as_json(
:only => [:id, :industry_id, :resident],
:include => {
:company => { :only => [:id,:company_number] },
:address => { :only =>
[:id, :first_name, :last_name, :phone, :address_line_1, :city, :province, :zip, :country_id]
},
:identification_information => { :only => [:id, :passport_number]},
:primary_contact_user => {
:only => [:id, :email],
:include => { :address => { :only => [:id, :first_name, :last_name, :phone]}},
:methods => [:is_primary_account?]
}
},
:methods => [:super_user_is_primary_contact?]
)
output would look like:
{
"customer":{
"id":10008,
"industry_id":10000,
"resident":false,
"super_user_is_primary_contact?":false,
"company":{
"company_number":"FS123 Changed",
"id":10000
},
"address":{
"address_line_1":"Street Address Changed",
"city":"Jackson Lane",
"country_id":10054,
"first_name":"Mylene",
"id":10029,
"last_name":"Lynda",
"phone":"9876543210",
"province":"Account State",
"zip":"12345"
},
"primary_contact_user":{
"email":"[email protected]",
"id":10259,
"is_primary_account?":false,
"address":{
"first_name":"Raju",
"id":10533,
"last_name":"Varma",
"phone":"98744113212"
}
}
},
"success":true,
"request_id":11036
}
Eager loading is a powerful tool, but it should be used carefully. You do not want to be loading too much information since it can slow your application.
If you ever feel, your rails app is slow one of the first things, you should look for are these N + 1 queries. There’s an awesome gem that helps you out with that – Bullet. It’ll warn you when you have a N+1 case (you’re loading information without eager loading), and it’ll also warn you when you are eager loading data that it is not going to be used.
Also, if you do not want to think where to use and not use eager load (wants to achieve automatic eager loading), then there is an awesome gem Goldiloader. Here you do not need to specify eager loads. ActiveRecord did not make you think about eager loading, and it just did the “right” thing by default with Goldiloader without you to specifying any eager loads.
There are 3 ways to do eager loading in Rails:
1. includes
2. preload
3. eager_load
#preload is using separate DB queries to get the data.
#eager_load is using one big query with LEFT JOIN for each eager loaded table.
#includes delegates the job to #preload or #eager_load depending on the presence or absence of the condition related to one of the preloaded tables.
#eager_load
User.eager_load(:addresses).where(‘addresses.country = ?’, ‘India’) User.includes(:addresses).where(‘addresses.country = ?’, ‘India’)
Above will generate only one query to eager load the data.
SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."email" AS t0_r2, "users"."created_at" AS t0_r3, "users"."updated_at" AS t0_r4, "addresses"."id" AS t1_r0, "addresses"."user_id" AS t1_r1, "addresses"."country" AS t1_r2, "addresses"."street" AS t1_r3, "addresses"."postal_code" AS t1_r4, "addresses"."city" AS t1_r5, "addresses"."created_at" AS t1_r6, "addresses"."updated_at" AS t1_r7 FROM "users" LEFT OUTER JOIN "addresses" ON "addresses"."user_id" = "users"."id" WHERE (addresses.country = 'India')
In the above example, Rails detected that the condition in where clause is using columns from preloaded (included) table names. So, #includes delegates the job to #eager_load.
#preload
If you try above using preload, you will get an exception because you have not joined users table with addresses.
User.preload(:addresses).where(‘addresses.country = ?’, ‘India’) SELECT "users".* FROM "users" WHERE (addresses.country = 'India') Fail with no such column: addresses.country
So above can be achieved by using preload as follows.
User.preload(:addresses).where(:country => ‘India’) SELECT "users".* FROM "users" SELECT "addresses".* FROM "addresses" WHERE "addresses"."country" = 'India' AND "addresses"."user_id" IN (1, 2, 3)
Above if we had use includes instead of preload it would have done same thing as above (used separate DB queries to fetch data).
Very detailed explaination. Thanks