[Talking-Rails] ActiveRecord::Calculations speed up Rails app by calculating aggregate values of columns in ActiveRecord models

ActiveRecord::Calculations speed up Rails app

As a general rule, calculations that can be done at the database level should not be done at the Rails application level unless there is a specific reason to do so. Doing them in the database avoids unnecessary time, network, and memory consumption.

ActiveRecord::Calculations provides methods for calculating aggregate values of columns in ActiveRecord models.

Instance Public methods

average(column_name)

Calculates the average value on a given column. Returns nil if there’s no row. See calculate for examples with options.

Person.average(:age) # => 35.8

calculate(operation, column_name)

This calculates aggregate values in the given column. Methods for count, sum, average, minimum, and maximum have been added as shortcuts.

Person.calculate(:count, :all) # The same as Person.count
Person.average(:age) # SELECT AVG(age) FROM people...

# Selects the minimum age for any family without any minors
Person.group(:last_name).having("min(age) > 17").minimum(:age)

Person.sum("2 * age")

There are two basic forms of output:

  • Single aggregate value: The single value is type cast to Integer for COUNT, Float for AVG, and the given column’s type for everything else.

  • Grouped values: This returns an ordered hash of the values and groups them. It takes either a column name, or the name of a belongs_to association.

values = Person.group('last_name').maximum(:age)
puts values["Drake"]
# => 43

drake = Family.find_by(last_name: 'Drake')
values = Person.group(:family).maximum(:age) # Person belongs_to :family
puts values[drake]
# => 43

values.each do |family, max_age|
  # ...
end
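
To make the grouped form concrete, here is a plain-Ruby sketch of what Person.group('last_name').maximum(:age) computes. The PEOPLE rows and the grouped_maximum helper are hypothetical and exist only for illustration; the real query runs GROUP BY inside the database and never builds these rows in Ruby.

```ruby
# Hypothetical in-memory rows standing in for the people table.
PEOPLE = [
  { last_name: "Drake", age: 43 },
  { last_name: "Drake", age: 12 },
  { last_name: "Lee",   age: 30 },
  { last_name: "Lee",   age: 65 }
].freeze

# Models SELECT MAX(age), last_name FROM people GROUP BY last_name:
# one hash entry per group, keyed by the grouping value.
def grouped_maximum(rows, group_key, column)
  rows.group_by { |row| row[group_key] }
      .transform_values { |group| group.map { |row| row[column] }.max }
end

GROUPED_MAX = grouped_maximum(PEOPLE, :last_name, :age)

puts GROUPED_MAX["Drake"] # prints 43

# Iterating yields [group value, aggregate] pairs, as in the example above.
GROUPED_MAX.each do |last_name, max_age|
  puts "#{last_name}: #{max_age}"
end
```

The point of the sketch is the return shape: a Hash with one key per group, which is why values["Drake"] works in the example above.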

count(column_name = nil)

Count the records.

Person.count
# => the total count of all people

Person.count(:age)
# => returns the total count of all people whose age is present in database

Person.count(:all)
# => performs a COUNT(*) (:all is an alias for '*')

Person.distinct.count(:age)
# => counts the number of different age values
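
The difference between these variants comes down to how SQL treats NULL. The following plain-Ruby sketch models the three cases on a few hypothetical rows, with nil standing in for NULL; it illustrates the semantics only and does not talk to a database.

```ruby
# Hypothetical rows; nil stands in for SQL NULL.
AGE_ROWS = [
  { name: "Ann",  age: 30 },
  { name: "Bob",  age: nil },
  { name: "Cara", age: 30 },
  { name: "Dev",  age: 41 }
].freeze

# COUNT(*), as in Person.count: every row counts, NULLs included.
COUNT_ALL = AGE_ROWS.size

# COUNT(age), as in Person.count(:age): only rows whose age is not NULL.
COUNT_AGE = AGE_ROWS.count { |row| !row[:age].nil? }

# COUNT(DISTINCT age), as in Person.distinct.count(:age): distinct non-NULL ages.
COUNT_DISTINCT_AGE = AGE_ROWS.map { |row| row[:age] }.compact.uniq.size

puts COUNT_ALL          # prints 4
puts COUNT_AGE          # prints 3
puts COUNT_DISTINCT_AGE # prints 2
```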

If count is used with Relation#group, it returns a Hash whose keys represent the aggregated column, and the values are the respective amounts:

Person.group(:city).count
# => { 'Rome' => 5, 'Paris' => 3 }

If count is used with Relation#group for multiple columns, it returns a Hash whose keys are an array containing the individual values of each column and the value of each key would be the count.

Article.group(:status, :category).count
# => {["draft", "business"]=>10, ["draft", "technology"]=>4,
#     ["published", "business"]=>0, ["published", "technology"]=>2}
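
Array keys can be awkward to work with directly. As a sketch, the snippet below copies the result shape shown above into a literal Hash (it is not fetched from a database) and shows one way to look up a combination and to regroup the counts by status. The nested_counts helper is hypothetical, not a Rails method.

```ruby
# Literal copy of the result shape shown above.
ARTICLE_COUNTS = {
  ["draft", "business"]       => 10,
  ["draft", "technology"]     => 4,
  ["published", "business"]   => 0,
  ["published", "technology"] => 2
}.freeze

# Look up one combination by its [status, category] key.
DRAFT_BUSINESS = ARTICLE_COUNTS.fetch(["draft", "business"])

# Regroup into a nested hash: status => { category => count }.
def nested_counts(counts)
  counts.each_with_object({}) do |((status, category), count), nested|
    (nested[status] ||= {})[category] = count
  end
end

NESTED_COUNTS = nested_counts(ARTICLE_COUNTS)

puts DRAFT_BUSINESS                           # prints 10
puts NESTED_COUNTS["published"]["technology"] # prints 2
```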

If count is used with Relation#select, it will count the selected columns:

Person.select(:age).count
# => counts the number of different age values

Note: not all valid Relation#select expressions are valid count expressions. The specifics differ between databases. In invalid cases, an error from the database is thrown.


ids()

Plucks all the IDs for the relation using the table’s primary key.

Person.ids # SELECT people.id FROM people
Person.joins(:companies).ids # SELECT people.id FROM people INNER JOIN companies ON companies.person_id = people.id

maximum(column_name)

Calculates the maximum value on a given column. The value is returned with the same data type of the column, or nil if there’s no row. See calculate for examples with options.

Person.maximum(:age) # => 93

minimum(column_name)

Calculates the minimum value on a given column. The value is returned with the same data type of the column, or nil if there’s no row. See calculate for examples with options.

Person.minimum(:age) # => 7

pick(*column_names)

Pick the value(s) from the named column(s) in the current relation. This is short-hand for relation.limit(1).pluck(*column_names).first, and is primarily useful when you have a relation that’s already narrowed down to a single row.

Just like pluck, pick will only load the actual value, not the entire record object, so it’s also more efficient. The value is, again like with pluck, typecast by the column type.

Person.where(id: 1).pick(:name)
# SELECT people.name FROM people WHERE id = 1 LIMIT 1
# => 'David'

Person.where(id: 1).pick(:name, :email_address)
# SELECT people.name, people.email_address FROM people WHERE id = 1 LIMIT 1
# => [ 'David', 'david@loudthinking.com' ]
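
The return shape of pick follows from its definition in terms of limit(1), pluck, and first. The sketch below models that chain on a hypothetical in-memory row; model_pluck and model_pick are illustrative helpers, not Rails methods, and the real pick issues a single SQL query instead.

```ruby
# Hypothetical row standing in for the people table.
PICK_ROWS = [
  { id: 1, name: "David", email_address: "david@loudthinking.com" }
].freeze

# Models pluck: a flat array for one column, an array of arrays for several.
def model_pluck(rows, *columns)
  if columns.size == 1
    rows.map { |row| row[columns.first] }
  else
    rows.map { |row| row.values_at(*columns) }
  end
end

# Models pick as limit(1).pluck(*columns).first.
def model_pick(rows, *columns)
  model_pluck(rows.first(1), *columns).first
end

p model_pick(PICK_ROWS, :name)                 # prints "David"
p model_pick(PICK_ROWS, :name, :email_address) # prints ["David", "david@loudthinking.com"]
p model_pick([], :name)                        # prints nil (no matching row)
```

One column gives a single value, several columns give one array, and an empty relation gives nil, so callers should be prepared for nil.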

pluck(*column_names)

Use pluck as a shortcut to select one or more attributes without loading a bunch of records just to grab the attributes you want.

Person.pluck(:name)

instead of

Person.all.map(&:name)

Pluck returns an Array of attribute values type-casted to match the plucked column names, if they can be deduced. Plucking an SQL fragment returns String values by default.

Person.pluck(:name)
# SELECT people.name FROM people
# => ['David', 'Jeremy', 'Jose']

Person.pluck(:id, :name)
# SELECT people.id, people.name FROM people
# => [[1, 'David'], [2, 'Jeremy'], [3, 'Jose']]

Person.distinct.pluck(:role)
# SELECT DISTINCT role FROM people
# => ['admin', 'member', 'guest']

Person.where(age: 21).limit(5).pluck(:id)
# SELECT people.id FROM people WHERE people.age = 21 LIMIT 5
# => [2, 3]

Person.pluck(Arel.sql('DATEDIFF(updated_at, created_at)'))
# SELECT DATEDIFF(updated_at, created_at) FROM people
# => ['0', '27761', '173']
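
Because a plucked SQL fragment comes back as strings, numeric results usually need an explicit conversion before use. A minimal sketch, reusing the sample output above as a literal array and assuming the DATEDIFF values are whole numbers of days:

```ruby
# Literal copy of the sample pluck output above (strings, not integers).
RAW_DIFFS = ['0', '27761', '173'].freeze

# Integer() raises ArgumentError on malformed input, unlike String#to_i,
# which silently returns 0 for non-numeric strings.
DAY_DIFFS = RAW_DIFFS.map { |value| Integer(value) }

p DAY_DIFFS     # prints [0, 27761, 173]
p DAY_DIFFS.max # prints 27761
```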

See also ids - https://api.rubyonrails.org/v6.1.4/classes/ActiveRecord/Calculations.html#method-i-ids.

sum(column_name = nil)

Calculates the sum of values on a given column. The value is returned with the same data type of the column, or 0 if there’s no row. See calculate for examples with options.

Person.sum(:age) # => 4562
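
Note the asymmetry in the empty case: sum returns 0, while average, minimum, and maximum return nil when there is no row. Plain Ruby arrays happen to behave the same way, which makes for a quick illustration. The average_or_nil helper is hypothetical and only mirrors the documented behaviour.

```ruby
# An empty array stands in for a relation that matches no rows.
NO_AGES = [].freeze

EMPTY_SUM = NO_AGES.sum # 0, like sum on an empty relation
EMPTY_MAX = NO_AGES.max # nil, like maximum on an empty relation
EMPTY_MIN = NO_AGES.min # nil, like minimum on an empty relation

# Average that returns nil when there are no values, as described for average.
def average_or_nil(values)
  return nil if values.empty?

  values.sum.to_f / values.size
end

# Callers must handle nil before doing arithmetic on max, min, or average.
OLDEST_OR_ZERO = EMPTY_MAX || 0

p EMPTY_SUM               # prints 0
p EMPTY_MAX               # prints nil
p average_or_nil(NO_AGES) # prints nil
p OLDEST_OR_ZERO          # prints 0
```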

Performance tips

Optimize the performance of a Ruby on Rails app by calculating aggregate values of columns.

Prefer pluck instead of map

If you are interested in only a few values per row, you should use pluck instead of map.

For example:

Order.where(number: 'R545612547').map(&:id)
# Order Load (5.0ms) SELECT `orders`.* FROM `orders` WHERE `orders`.`number` = 'R545612547' ORDER BY orders.created_at DESC
# => [1]

Here, map loads the entire order record into memory and then reads its id attribute.

Using pluck will be faster, because it doesn’t need to load an entire object into memory.

So this will be much faster:

Order.where(number: 'R545612547').pluck(:id)
# SQL (0.8ms) SELECT `orders`.`id` FROM `orders` WHERE `orders`.`number` = 'R545612547' ORDER BY orders.created_at DESC
# => [1]

For this particular case, pluck is about six times faster than map (0.8ms versus 5.0ms of query time).

Prefer ActiveRecord::Calculations#sum instead of Enumerable#sum

Rails applications often use Enumerable#sum to add up values. This is a common mistake, because ActiveRecord::Calculations can do the same work without loading a bunch of ActiveRecord objects into memory. If you want to perform mathematical operations on a set of records the Rails way, ActiveRecord::Calculations lets you do them in the database.

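require 'benchmark/ips' # provided by the benchmark-ips gem

Benchmark.ips do |x|
  # Aggregates in the database: a single SELECT SUM(...) query.
  x.report("SQL sum") do
    Loan.sum(:balance)
  end

  # Loads every Loan record into memory, then sums in Ruby.
  x.report("Ruby sum") do
    Loan.sum(&:balance)
    # Same as: Loan.all.map { |loan| loan.balance }.sum
  end

  x.compare!
end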

# Comparison:
# SQL sum: 7.89 i/s
# Ruby sum: 0.03 i/s - 209.85x slower

Prefer ActiveRecord::Calculations#maximum instead of Enumerable#max

As explained above, calculations perform better when you use ActiveRecord::Calculations methods whenever possible.

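require 'benchmark/ips' # provided by the benchmark-ips gem

Benchmark.ips do |x|
  # Aggregates in the database: a single SELECT MAX(...) query.
  x.report("SQL max") do
    Loan.maximum(:amount)
  end

  # Fetches every amount into a Ruby array, then finds the maximum.
  x.report("Ruby max") do
    Loan.pluck(:amount).max
  end

  x.compare!
end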

# Comparison:
# SQL max: 541.9 i/s
# Ruby max: 0.5 i/s - 1113.47x slower

Prefer ActiveRecord::Calculations#minimum instead of Enumerable#min

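require 'benchmark/ips' # provided by the benchmark-ips gem

Benchmark.ips do |x|
  # Aggregates in the database: a single SELECT MIN(...) query.
  x.report("SQL min") do
    Loan.minimum(:amount)
  end

  # Fetches every amount into a Ruby array, then finds the minimum.
  x.report("Ruby min") do
    Loan.pluck(:amount).min
  end

  x.compare!
end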

# Comparison:
# SQL min: 533.3 i/s
# Ruby min: 0.5 i/s - 1017.21x slower

References

[1] ActiveRecord::Calculations - https://api.rubyonrails.org/v6.1.4/classes/ActiveRecord/Calculations.html

[2] Calculations methods used on database records in Rails 6 | by Ajith Kumar | Analytics Vidhya | Medium - https://medium.com/analytics-vidhya/calculations-methods-used-on-database-records-in-rails-6-f147221dd5f6

[3] Tips for Writing Fast Rails: Part 2 - FastRuby.io | Rails Upgrade Service - https://www.fastruby.io/blog/rails/performance/writing-fast-rails-part-2.html

[4] Tips for Writing Fast Rails: Part 3 - FastRuby.io | Rails Upgrade Service - https://www.fastruby.io/blog/rails/performance/writing-fast-rails-part-3.html

[5] Active Record Query Interface — Ruby on Rails Guides - https://guides.rubyonrails.org/active_record_querying.html

[6] evanphx/benchmark-ips: Provides iteration per second benchmarking for Ruby - https://github.com/evanphx/benchmark-ips

[7] Grouping and Aggregating - Back-End Engineering Curriculum - Turing School of Software and Design - https://backend.turing.edu/module2/lessons/grouping_and_aggregating