[Ruby on Rails (RoR)] Joins vs Preload vs Includes vs Eager load in Rails ActiveRecord

Joins vs Preload vs Includes vs Eager load

Rails ActiveRecord provide few ways to load associated data and before moving forward.

let’s consider one scenario as below, there is a User table that has a one-to-many association with the Post table.

1
2
3
4
5
# app/models/user.rb

class User < ApplicationRecord
has_many :posts
end
1
2
3
4
5
# app/models/user.rb

class Post < ApplicationRecord
belongs_to :user
end

With the scenario been set, let’s begin with our discussion.

Joins

The Joins works perfectly fine for comparing and filtering data using associated data.

1
2
3
4
users = User.joins(:posts).where("posts.published = ?", true)

# =>
SELECT "users".* FROM "users" INNER JOIN "posts" ON "posts"."user_id" = "users"."id" WHERE "posts"."published" = "ture"

The query will load all users with posts having published attribute value as true and we can do so only because of the magic of joins.

Pros

  • It uses INNER JOIN query to load associated data, as a result, associated data does not get impacted.

ex. If the first user has a total of 12 posts(include 4 posts which published is true).

1
2
3
4
5
6
7
8
9
=> users = User.joins(:posts).where(“posts.published = ?”, true)

# =>
SELECT "users".* FROM "users" INNER JOIN "posts" ON "posts"."user_id" = "users"."id" AND "posts"."published" = "true"

=> users.first.posts.length # it will ouptput 12, but not 4. Because it will load posts with another query statement.

# =>
SELECT "posts".* FROM "posts" WHERE "posts"."published" = "true"

Cons

  • It does not solve the (n+1) issue all alone, since it does not store associated data for future use. (Note: It can be solved by using joins+preload ex.
1
2
3
4
5
users = User.joins(:posts).where(“posts.published = ?”, true).preload(posts))

# =>
SELECT "users".* FROM "users" INNER JOIN "posts" ON "posts"."user_id" = "users"."id" AND "posts"."published" = "ture"
SELECT "posts".* FROM "posts" WHERE "posts"."published" = "true"
  • By default, it loads data with two separate queries for loading user data and its associated post data.

  • It may give a duplicate record in a few cases. Solution over it is using select with it.
    ex. Let’s consider that the total users are 4,

To see it let’s create some sample data.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
def self.setup
User.delete_all
Post.delete_all

u = User.create name: 'Neeraj'
u.posts.create! title: 'ruby', desc: 'ruby is awesome'
u.posts.create! title: 'rails', desc: 'rails is awesome'
u.posts.create! title: 'JavaScript', desc: 'JavaScript is awesome'

u = User.create name: 'Neil'
u.posts.create! title: 'JavaScript', desc: 'Javascript is awesome'

u = User.create name: 'Trisha'
end

With the above sample data if we execute User.joins(:posts) then this is the result we get:

1
2
3
4
#<User id: 9, name: "Neeraj">
#<User id: 9, name: "Neeraj">
#<User id: 9, name: "Neeraj">
#<User id: 10, name: "Neil">

We can avoid the duplication by using distinct .

1
2
User.joins(:posts).distinct
# Or User.joins(:posts).select('distinct users.*').to_a

then that is the result we get:

1
2
#<User id: 9, name: "Neeraj">
#<User id: 10, name: "Neil">

Extra : Joins with select example as below,

1
user = User.joins(:posts).select(“users.id, posts.name as post_name, posts.published as published”).where(id: 1)

It will give output of ActiveRecord array with post name, user id and post published for user with id as 1, so we can have all associated post name as user.map(&:post_name)


Preload

The Preload works perfectly fine for loading associated data.

1
2
3
4
5
users = User.preload(:posts)

# =>
SELECT "users".* FROM "users"
SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN (1)

The query will load and store all users with associated posts data.

Pros

It stores the associated data, as a result, it solves the (n+1) issue

Cons

  • It by default loads data with two separate query for loading data and its associated posts data.
1
2
3
4
5
User.preload(:posts).where("users.name='Neeraj'")

# =>
SELECT "users".* FROM "users" WHERE (users.name='Neeraj')
SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN (3)
  • It can not be used for comparing and filtering for associated data, such as where can not be used for associated data filtering in preload.
1
2
3
4
5
User.preload(:posts).where("posts.desc='ruby is awesome'") # It will throw an error.

# =>
SQLite3::SQLException: no such column: posts.desc:
SELECT "users".* FROM "users" WHERE (posts.desc='ruby is awesome')

Includes

The Includes works perfectly fine for loading, comparing and filtering data with associated data.

By default Includes loads the association data in a separate query just like preload.

1
2
3
4
5
users = User.includes(:posts)

# =>
SELECT "users".* FROM "users"
SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN (1, 3, 4, 5, 6)

However it is smarter than preload. Whenever a WHERE clause is added with association, it switches from using two separate queries to single LEFT OUTER JOIN query to fetch the data and applies the supplied condition.

So includes changes from two queries to a single query in some cases.

1
2
3
4
5
6
7
8
users = User.includes(:posts).where("posts.published = ?", true)

# =>
SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "posts"."id" AS t1_r0,
"posts"."title" AS t1_r1,
"posts"."user_id" AS t1_r2, "posts"."desc" AS t1_r3
FROM "users" LEFT OUTER JOIN "posts" ON "posts"."user_id" = "users"."id"
WHERE "posts"."published" = "true"

The query will load all users with posts having published attribute value as true.

Pros

  • It uses OUTER LEFT JOIN query to load associated data.

  • It solves the (n+1) issue all alone, since it stores associated data for future use.

  • It is smart enough to load associated data with either single query for simple cases or with two separate query for other cases. In case, for some reason if query for includes needs to force into single query, then it can be performed with includes+references.
    ex.

1
User.includes(:posts).references(:posts)

Cons

  • It uses OUTER LEFT JOIN query to load associated data, as a result, associated data gets impacted.
    ex. Let’s consider if the first user has a total of 12 posts,
1
2
3
=> users = User.includes(:posts).where(“posts.published = ?”, true)

=> users.first.posts.length # it may ouptput less than 12 as ex.4

Eager load

The Eager load works perfectly fine for loading, comparing and filtering data using associated data.

1
2
3
4
5
6
7
users = User.eager_load(:posts).where("posts.published = ?", true)

# =>
SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "posts"."id" AS t1_r0,
"posts"."title" AS t1_r1, "posts"."user_id" AS t1_r2, "posts"."desc" AS t1_r3
FROM "users" LEFT OUTER JOIN "posts" ON "posts"."user_id" = "users"."id" WHERE
"posts"."published" = "true"

The query will load all users with posts having published attribute value as true.

Pros

  • It uses OUTER LEFT JOIN query to load associated data.

  • It solves the (n+1) issue all alone, since it stores associated data for future use.

  • By default, it loads data with a single query for users and its associated post data.

Cons

  • It uses OUTER LEFT JOIN query to load associated data, as a result, associated data gets impacted.
    ex. If the first user has a total of 12 posts(include 4 posts which published is true).
1
2
3
=> users = User.eager_load(:posts).where(“posts.published = ?”, true)

=> users.first.posts.length # it may ouptput less than 12 as ex.4

Extra: Since eager_load force data to load with a single query to load user and its associated post data, it takes comparatively more time to load a query than to load two separate query in most of the cases. So it depends more on the use-case to use joins/preload/includes/eager_load while loading/comparing/filtering associated data.


Summary

I’d roughly summarize my approach to these methods like this:

  • If just filtering, not accessing relationships, use joins.

  • If accessing relationships, start with eager_load.

  • If eager_load is slow using single query, will use includes to try two separate queries and compare performance.

Hopefully this is enough to prevent some of the more basic performance deadends when using joins, includes, preload, and eager_load.

References

[1] Making sense of ActiveRecord joins, includes, preload, and eager_load | Scout APM Blog - https://scoutapm.com/blog/activerecord-includes-vs-joins-vs-preload-vs-eager_load-when-and-where

[2] Joins vs Preload vs Includes vs Eager load in Rails | by Tushar Adhao | Medium - https://medium.com/@tadhao/joins-vs-preload-vs-includes-vs-eager-load-in-rails-5f721c44b3a9

[3] Preload, Eagerload, Includes and Joins | BigBinary Blog - https://bigbinary.com/blog/preload-vs-eager-load-vs-joins-vs-includes

[4] Joins vs Includes vs Preload vs Eagerload - https://blazarblogs.wordpress.com/2019/08/09/joins-vs-includes-vs-preload-vs-eagerload/

[5] Active Record Query Interface — Ruby on Rails Guides - https://guides.rubyonrails.org/active_record_querying.html

[6] A guide for preloading associations in rails - https://bhserna.com/guide-for-preloading-associations-in-rails.html

[7] What is the difference between includes, preload, eager_load and joins in ActiveRecord? - https://bhserna.com/includes-preload-eager-load-joins-in-rails.html