[Ruby on Rails (RoR)] Joins vs Preload vs Includes vs Eager load in Rails ActiveRecord
Joins vs Preload vs Includes vs Eager load
Rails ActiveRecord provide few ways to load associated data and before moving forward.
let’s consider one scenario as below, there is a User table that has a one-to-many association with the Post table.
1 | # app/models/user.rb |
1 | # app/models/user.rb |
With the scenario been set, let’s begin with our discussion.
Joins
The Joins works perfectly fine for comparing and filtering data using associated data.
1 | users = User.joins(:posts).where("posts.published = ?", true) |
The query will load all users with posts having published attribute value as true and we can do so only because of the magic of joins.
Pros
- It uses
INNER JOIN
query to load associated data, as a result, associated data does not get impacted.
ex. If the first user has a total of 12 posts(include 4 posts which published is true).
1 | => users = User.joins(:posts).where(“posts.published = ?”, true) |
Cons
- It does not solve the (n+1) issue all alone, since it does not store associated data for future use. (Note: It can be solved by using joins+preload ex.
1 | users = User.joins(:posts).where(“posts.published = ?”, true).preload(posts)) |
-
By default, it loads data with two separate queries for loading user data and its associated post data.
-
It may give a duplicate record in a few cases. Solution over it is using select with it.
ex. Let’s consider that the total users are 4,
To see it let’s create some sample data.
1 | def self.setup |
With the above sample data if we execute User.joins(:posts)
then this is the result we get:
1 | #<User id: 9, name: "Neeraj"> |
We can avoid the duplication by using distinct .
1 | User.joins(:posts).distinct |
then that is the result we get:
1 | #<User id: 9, name: "Neeraj"> |
Extra : Joins with select example as below,
1 | user = User.joins(:posts).select(“users.id, posts.name as post_name, posts.published as published”).where(id: 1) |
It will give output of ActiveRecord array with post name, user id and post published for user with id as 1, so we can have all associated post name as user.map(&:post_name)
Preload
The Preload works perfectly fine for loading associated data.
1 | users = User.preload(:posts) |
The query will load and store all users with associated posts data.
Pros
It stores the associated data, as a result, it solves the (n+1) issue
Cons
- It by default loads data with two separate query for loading data and its associated posts data.
1 | User.preload(:posts).where("users.name='Neeraj'") |
- It can not be used for comparing and filtering for associated data, such as where can not be used for associated data filtering in preload.
1 | User.preload(:posts).where("posts.desc='ruby is awesome'") # It will throw an error. |
Includes
The Includes works perfectly fine for loading, comparing and filtering data with associated data.
By default Includes loads the association data in a separate query just like preload.
1 | users = User.includes(:posts) |
However it is smarter than preload. Whenever a WHERE clause is added with association, it switches from using two separate queries to single LEFT OUTER JOIN
query to fetch the data and applies the supplied condition.
So includes changes from two queries to a single query in some cases.
1 | users = User.includes(:posts).where("posts.published = ?", true) |
The query will load all users with posts having published attribute value as true.
Pros
-
It uses
OUTER LEFT JOIN
query to load associated data. -
It solves the (n+1) issue all alone, since it stores associated data for future use.
-
It is smart enough to load associated data with either single query for simple cases or with two separate query for other cases. In case, for some reason if query for includes needs to force into single query, then it can be performed with includes+references.
ex.
1 | User.includes(:posts).references(:posts) |
Cons
- It uses
OUTER LEFT JOIN
query to load associated data, as a result, associated data gets impacted.
ex. Let’s consider if the first user has a total of 12 posts,
1 | => users = User.includes(:posts).where(“posts.published = ?”, true) |
Eager load
The Eager load works perfectly fine for loading, comparing and filtering data using associated data.
1 | users = User.eager_load(:posts).where("posts.published = ?", true) |
The query will load all users with posts having published attribute value as true.
Pros
-
It uses
OUTER LEFT JOIN
query to load associated data. -
It solves the (n+1) issue all alone, since it stores associated data for future use.
-
By default, it loads data with a single query for users and its associated post data.
Cons
- It uses
OUTER LEFT JOIN
query to load associated data, as a result, associated data gets impacted.
ex. If the first user has a total of 12 posts(include 4 posts which published is true).
1 | => users = User.eager_load(:posts).where(“posts.published = ?”, true) |
Extra: Since eager_load force data to load with a single query to load user and its associated post data, it takes comparatively more time to load a query than to load two separate query in most of the cases. So it depends more on the use-case to use joins/preload/includes/eager_load while loading/comparing/filtering associated data.
Summary
I’d roughly summarize my approach to these methods like this:
-
If just filtering, not accessing relationships, use joins.
-
If accessing relationships, start with eager_load.
-
If eager_load is slow using single query, will use includes to try two separate queries and compare performance.
Hopefully this is enough to prevent some of the more basic performance deadends when using joins, includes, preload, and eager_load.