[Awesome Ruby Gem - Abstract Syntax Tree (AST)] Use parser gem to parse Abstract Syntax Tree (AST) in Ruby

Posted on 2021-07-31, edited on 2025-06-10.

parser

Parser is a production-ready Ruby parser written in pure Ruby. It recognizes as much or more code than Ripper, Melbourne, JRubyParser or ruby_parser, and is vastly more convenient to use.

You can also use unparser (https://github.com/mbj/unparser) to produce equivalent source code from Parser’s ASTs.

Features

Precise source location reporting.
Documented AST format which is convenient to work with.
A simple interface and a powerful, tweakable one.
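Parses 1.8, 1.9, 2.0, 2.1, 2.2 and 2.3 syntax with backwards-compatible AST formats; the 3.0.x releases, such as the version pinned in the Gemfile below, also cover later syntax up to 3.0.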
Parses MacRuby and RubyMotion syntax extensions.
Rewriting support.
Parsing error recovery.
Improved clang-like diagnostic messages with location information.
Written in pure Ruby, runs on MRI >=2.0.0, JRuby and Rubinius (and historically, all versions of Ruby since 1.8).
Only one runtime dependency: the ast gem.
Insane Ruby lexer rewritten from scratch in Ragel.
100% test coverage for Bison grammars (except error recovery).
Readable, commented source code.

Installation

You can install it as a gem:

$ gem install parser

or add it into a Gemfile (Bundler):

# Gemfile

# whitequark/parser: A Ruby parser.
# https://github.com/whitequark/parser
gem 'parser', '3.0.2.0'

Then, run bundle install.

$ bundle install

Usage

Load Parser (the emit_* calls opt in to the newer AST formats; see the backwards compatibility section of the Parser README for an explanation):

require 'parser/current'
# opt-in to most recent AST format:
Parser::Builders::Default.emit_lambda              = true
Parser::Builders::Default.emit_procarg0            = true
Parser::Builders::Default.emit_encoding            = true
Parser::Builders::Default.emit_index               = true
Parser::Builders::Default.emit_arg_inside_procarg0 = true
Parser::Builders::Default.emit_forward_arg         = true
Parser::Builders::Default.emit_kwargs              = true
Parser::Builders::Default.emit_match_pattern       = true

Parse a chunk of code:

p Parser::CurrentRuby.parse("2 + 2")
# (send
#   (int 2) :+
#   (int 2))

Access the AST’s source map:

p Parser::CurrentRuby.parse("2 + 2").loc
# #<Parser::Source::Map::Send:0x007fe5a1ac2388
#   @dot=nil,
#   @begin=nil,
#   @end=nil,
#   @selector=#<Source::Range (string) 2...3>,
#   @expression=#<Source::Range (string) 0...5>>

p Parser::CurrentRuby.parse("2 + 2").loc.selector.source
# "+"

Traverse the AST: see the documentation for gem ast.

Parse a chunk of code and display all diagnostics:

parser = Parser::CurrentRuby.new
parser.diagnostics.consumer = lambda do |diag|
  puts diag.render
end

buffer = Parser::Source::Buffer.new('(string)', source: "foo *bar")

p parser.parse(buffer)
# (string):1:5: warning: `*' interpreted as argument prefix
# foo *bar
#     ^
# (send nil :foo
#   (splat
#     (send nil :bar)))

If you reuse the same parser object for multiple #parse runs, you need to #reset it.

You can also use the ruby-parse utility (it’s bundled with the gem) to play with Parser:

$ ruby-parse -L -e "2+2"
(send
  (int 2) :+
  (int 2))
2+2
 ~ selector
~~~ expression
(int 2)
2+2
~ expression
(int 2)
2+2

$ ruby-parse -E -e "2+2"
2+2
^ tINTEGER 2                                    expr_end     [0 <= cond] [0 <= cmdarg]
2+2
 ^ tPLUS "+"                                    expr_beg     [0 <= cond] [0 <= cmdarg]
2+2
  ^ tINTEGER 2                                  expr_end     [0 <= cond] [0 <= cmdarg]
2+2
  ^ false "$eof"                                expr_end     [0 <= cond] [0 <= cmdarg]
(send
  (int 2) :+
  (int 2))

Documentation

Documentation for Parser is available online at https://whitequark.github.io/parser/.

Node names

Several Parser nodes seem to be confusing enough to warrant a dedicated README section.

(block)

The (block) node passes a Ruby block, that is, a closure, to a method call represented by its first child, a (send), (super) or (zsuper) node. To demonstrate:

$ ruby-parse -e 'foo { |x| x + 2 }'
(block
  (send nil :foo)
  (args
    (arg :x))
  (send
    (lvar :x) :+
    (int 2)))

(begin) and (kwbegin)

TL;DR: Unless you perform rewriting, treat (begin) and (kwbegin) as the same node type.

Both (begin) and (kwbegin) nodes represent compound statements, that is, several expressions which are executed sequentially, with the value of the last one being the value of the entire compound statement. They may take several forms in the source code:

foo; bar: without delimiters
(foo; bar): parenthesized
begin foo; bar; end: grouped with begin keyword
def x; foo; bar; end: grouped inside a method definition

and so on.

$ ruby-parse -e '(foo; bar)'
(begin
  (send nil :foo)
  (send nil :bar))
$ ruby-parse -e 'def x; foo; bar end'
(def :x
  (args)
  (begin
    (send nil :foo)
    (send nil :bar)))

Note that, despite its name, the (kwbegin) node has only a tangential relation to the begin keyword. Normally, the Parser AST is semantic, that is, if two constructs look different but behave identically, they get parsed to the same node. However, there exists a peculiar construct called post-loop in Ruby:

begin
  body
end while condition

This specific syntactic construct, that is, a keyword begin…end block followed by a postfix while, behaves very differently from similar-looking constructs such as (body) while condition: the body runs once before the condition is first checked. The loop itself is wrapped into a while-post node, but Parser also supports rewriting, and in that context it is important not to accidentally convert one kind of loop into another.
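The behavioural difference can be checked in plain Ruby, without Parser. A small sketch with illustrative counters:

```ruby
# Post-loop: the body runs before the condition is checked for the first time.
post_runs = 0
begin
  post_runs += 1
end while false

# Parenthesized body with a postfix while: the condition is checked first.
plain_runs = 0
(plain_runs += 1) while false

p post_runs   # 1
p plain_runs  # 0
```

A rewrite that replaced begin…end with parentheses here would silently change how many times the body executes.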

$ ruby-parse -e 'begin foo end while cond'
(while-post
  (send nil :cond)
  (kwbegin
    (send nil :foo)))
$ ruby-parse -e 'foo while cond'
(while
  (send nil :cond)
  (send nil :foo))
$ ruby-parse -e '(foo) while cond'
(while
  (send nil :cond)
  (begin
    (send nil :foo)))

(Parser also needs the (kwbegin) node type internally, and it is highly problematic to map it back to (begin).)