[Awesome Ruby Gem] Use creek gem to parse large Excel (xlsx and xlsm) files with images

creek

Creek is a Ruby gem that provides a fast, simple and efficient method of parsing large Excel (xlsx and xlsm) files.

Features

  • Parse images with with_images and images_at.

  • Map cell names to column letter and row number (A1, B3, etc.) by default.
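
To make the A1/B3 naming concrete, here is a small sketch that splits such a cell name into its column letters and row number. The helper parse_cell_name is hypothetical and written only for illustration. It is not part of creek's API.

```ruby
# Hypothetical helper (not part of creek's API): split an A1-style cell
# name into its column letters and row number, and convert the letters
# to a 1-based column index (A=1, Z=26, AA=27, ...).
def parse_cell_name(name)
  match = /\A([A-Z]+)(\d+)\z/.match(name)
  raise ArgumentError, "not an A1-style cell name: #{name.inspect}" unless match

  letters = match[1]
  # Column letters work like a base-26 number whose digits A..Z are 1..26.
  column = letters.each_char.reduce(0) { |acc, ch| acc * 26 + (ch.ord - 'A'.ord + 1) }
  { letters: letters, column: column, row: match[2].to_i }
end

cell = parse_cell_name('B3')
puts "#{cell[:letters]}: column #{cell[:column]}, row #{cell[:row]}"  # prints "B: column 2, row 3"
puts parse_cell_name('AA10')[:column]                                 # prints 27
```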

Installation

You can install it as a gem:

$ gem install creek

or add it into a Gemfile (Bundler):

# Gemfile

# pythonicrubyist/creek: Ruby library for parsing large Excel files.
# https://github.com/pythonicrubyist/creek
gem 'creek', '2.5.3'

Usage

Basic Usage

Creek can simply parse an Excel file by looping through the rows enumerator:

require 'creek'
creek = Creek::Book.new 'spec/fixtures/sample.xlsx'
sheet = creek.sheets[0]

sheet.rows.each do |row|
  puts row # => {"A1"=>"Content 1", "B1"=>nil, "C1"=>nil, "D1"=>"Content 3"}
end

sheet.simple_rows.each do |row|
  puts row # => {"A"=>"Content 1", "B"=>nil, "C"=>nil, "D"=>"Content 3"}
end

sheet.rows_with_meta_data.each do |row|
  puts row # => {"collapsed"=>"false", "customFormat"=>"false", "customHeight"=>"true", "hidden"=>"false", "ht"=>"12.1", "outlineLevel"=>"0", "r"=>"1", "cells"=>{"A1"=>"Content 1", "B1"=>nil, "C1"=>nil, "D1"=>"Content 3"}}
end

sheet.simple_rows_with_meta_data.each do |row|
  puts row # => {"collapsed"=>"false", "customFormat"=>"false", "customHeight"=>"true", "hidden"=>"false", "ht"=>"12.1", "outlineLevel"=>"0", "r"=>"1", "cells"=>{"A"=>"Content 1", "B"=>nil, "C"=>nil, "D"=>"Content 3"}}
end

sheet.state # => 'visible'
sheet.name # => 'Sheet1'
sheet.rid # => 'rId2'
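
The rows and simple_rows outputs above differ only in whether each key keeps its row number. As an illustration (not how Creek builds these hashes internally), the hypothetical helper strip_row_numbers converts a rows-style hash into the simple_rows shape:

```ruby
# Illustration only: turn a rows-style hash ("A1" keys) into the
# simple_rows shape ("A" keys) by dropping the trailing row number.
def strip_row_numbers(row)
  row.transform_keys { |cell_name| cell_name.sub(/\d+\z/, '') }
end

row = { 'A1' => 'Content 1', 'B1' => nil, 'C1' => nil, 'D1' => 'Content 3' }
simple = strip_row_numbers(row)
puts simple['A']   # prints "Content 1"
p simple.keys      # => ["A", "B", "C", "D"]
```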

Filename considerations

By default, Creek will ensure that the file extension is either *.xlsx or *.xlsm, but this check can be circumvented as needed:

path = 'sample-as-zip.zip'
Creek::Book.new path, :check_file_extension => false
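
A minimal sketch of the kind of extension check described above, assuming it compares the extension case-insensitively against .xlsx and .xlsm. The helper excel_extension? is hypothetical and not copied from Creek's source:

```ruby
# Hypothetical helper (assumed behaviour, not Creek's source): accept only
# names whose extension is .xlsx or .xlsm, ignoring case.
ALLOWED_EXTENSIONS = %w[.xlsx .xlsm].freeze

def excel_extension?(path)
  ALLOWED_EXTENSIONS.include?(File.extname(path).downcase)
end

p excel_extension?('report.XLSX')        # => true
p excel_extension?('sample-as-zip.zip')  # => false
```

An .xlsx workbook is itself a ZIP archive, so a renamed copy such as sample-as-zip.zip is still a valid workbook; a check like this one only looks at the name, which is why turning it off is enough.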

By default, a file submitted through the Rails file_field_tag is uploaded to a temporary location, and the original filename is stored with the uploaded file object. (See the file upload section of the Rails Guides for more information.)

Creek can parse this file directly, without file upload gems such as Carrierwave or Paperclip. Because the temporary path may not keep the original .xlsx extension, the example below passes the path and disables the extension check:

# Import endpoint in Rails controller
def import
  file = params[:file]
  Creek::Book.new file.path, check_file_extension: false
end
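
The example above turns off the extension check entirely. If you still want to reject non-Excel uploads, one option is to validate the name the user uploaded yourself before handing the path to Creek. In this sketch, Upload is a hypothetical Struct standing in for the Rails uploaded file object (which also responds to original_filename and path), and excel_upload? is a hypothetical helper:

```ruby
require 'tempfile'

# Hypothetical stand-in for the Rails uploaded file object.
Upload = Struct.new(:original_filename, :path)

# Hypothetical helper: check the name the user uploaded, because the
# temporary path is not guaranteed to carry an .xlsx/.xlsm extension.
def excel_upload?(upload)
  %w[.xlsx .xlsm].include?(File.extname(upload.original_filename.to_s).downcase)
end

tmp = Tempfile.new('upload') # a temporary path with no .xlsx suffix
upload = Upload.new('budget.xlsx', tmp.path)

if excel_upload?(upload)
  # In the controller, this is where Creek would be called:
  # Creek::Book.new upload.path, check_file_extension: false
  puts "ok to parse #{upload.path}"
else
  puts 'rejected: not an Excel file'
end

tmp.close! # close and delete the temporary file
```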

Parsing images

Creek does not parse images by default. To parse them, call the with_images method before iterating over rows so that image information is preloaded. If you don’t call this method, Creek does not return images anywhere.

The value of a cell that contains images is an array of Pathname objects. If an image spans multiple cells, the same Pathname object is returned for each of those cells.

sheet.with_images.rows.each do |row|
  puts row # => {"A1"=>[#<Pathname:/var/folders/ck/l64nmm3d4k75pvxr03ndk1tm0000gn/T/creek__drawing20161101-53599-274q0vimage1.jpeg>], "B2"=>"Fluffy"}
end

Images for a specific cell can be obtained with images_at method:

p sheet.images_at('A1') # => [#<Pathname:/var/folders/ck/l64nmm3d4k75pvxr03ndk1tm0000gn/T/creek__drawing20161101-53599-274q0vimage1.jpeg>]

# cell without images
p sheet.images_at('C1') # => nil

If a row contains no text cells, Creek will most likely return nil for its image cells while iterating; use the images_at method to retrieve the images in those cells.

Remote files

To parse a file hosted elsewhere, pass its URL together with the remote: true option:
remote_url = 'http://dev-builds.libreoffice.org/tmp/test.xlsx'
Creek::Book.new remote_url, remote: true
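
If you need more control over the download (timeouts, authentication, redirects), an alternative to remote: true is to fetch the workbook yourself and give Creek a local path. The sketch below uses Ruby's standard Net::HTTP; download_workbook and its fetch parameter are hypothetical names, and fetch is injectable so the helper can be exercised without network access. Note that Net::HTTP.get does not follow redirects.

```ruby
require 'net/http'
require 'tempfile'
require 'uri'

# Hypothetical helper: write the remote workbook into a binary Tempfile
# whose name ends in .xlsx, so Creek's default extension check passes.
def download_workbook(url, fetch: ->(u) { Net::HTTP.get(URI(u)) })
  file = Tempfile.new(['workbook', '.xlsx'])
  file.binmode
  file.write(fetch.call(url))
  file.flush # make sure the bytes are on disk before another reader opens the path
  file
end

# Usage (needs network access and the creek gem):
#   file = download_workbook('http://dev-builds.libreoffice.org/tmp/test.xlsx')
#   book = Creek::Book.new file.path
#   ...
#   file.close!
```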

Mapping cells with header names

By default, Creek maps cells by column letter and row number (A1, B3, etc.). To get cell values by header name instead, pass the with_headers option when creating the book. Note that this option works only with the #simple_rows method, and that the headers are read from the first row of the sheet.

creek = Creek::Book.new file.path, with_headers: true
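
To make the header mapping concrete, here is a plain-Ruby sketch (not Creek's implementation) that takes rows shaped like simple_rows output, uses the first row as headers, and re-keys the remaining rows by header name. The helper rows_by_header is hypothetical.

```ruby
# Hypothetical helper (not Creek's implementation): treat the first hash
# as the header row and re-key every later row by those header names.
def rows_by_header(simple_rows)
  header_row, *data_rows = simple_rows
  return [] if header_row.nil?

  data_rows.map do |row|
    row.each_with_object({}) do |(column, value), record|
      header = header_row[column]
      record[header] = value unless header.nil? # skip columns with no header
    end
  end
end

rows = [
  { 'A' => 'Name',  'B' => 'Age' },
  { 'A' => 'Alice', 'B' => 30 },
  { 'A' => 'Bob',   'B' => 25 }
]
puts rows_by_header(rows).first['Name'] # prints "Alice"
```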
