
Just learned about Ripper in ruby

So I'm into the first part of this book. Absolutely fascinating and incredible book. I have owned my fair share of ruby books and this one sits at the top.

I am understanding concepts for questions I have had since the beginning that I never found answers for that satisfied me, or didn't leave me hanging (basically, where does the 'magic' come from, how are things parsed and compiled, what exactly is YARV and bytecode, etc.)

One part of the book talks about Ripper, a tool that comes with ruby and shows you how ruby parses your code at a base level.

Basically, ruby transforms your code three times before it even executes. These steps can be summarized as

tokenize -> parse -> compile

Without giving away too much of the book, I just wanted to paste a few code snippets for the first two steps, as I have never heard this tool mentioned anywhere else, and the results are very cool.

Tokenize: This step breaks your .rb source code down into small chunks, or tokens. Syntax mistakes are not caught at this step; it only splits your code into fragments such as identifiers, operators, and literals.

Parse: This is the step that can actually 'understand' the tokens 'intelligently', according to a very large set of grammar rules. These grammar rules are located in parse.y, a file which is over 10,000 lines long.

To use ripper, you just require it. Try it out in irb.

require 'ripper'

You can run Ripper.methods to see a few options, but here are the main ones going along with what I said above.

For the code snippet I will use:

puts "pineapple"*3 if 1 < 2

First, to see a simple list of tokens:

>> pp Ripper.tokenize('puts "pineapple"*3 if 1 < 2') yields:

["puts", " ", "\"", "pineapple", "\"", "*", "3", " ", "if", " ", "1", " ", "<", " ", "2"]

Not particularly fascinating, but you can see how ruby is finding keywords and recognizing ruby syntax.

Next, with lexical identification:

>> pp Ripper.lex('puts "pineapple"*3 if 1 < 2') yields:

[[[1, 0], :on_ident, "puts"],
 [[1, 4], :on_sp, " "],
 [[1, 5], :on_tstring_beg, "\""],
 [[1, 6], :on_tstring_content, "pineapple"],
 [[1, 15], :on_tstring_end, "\""],
 [[1, 16], :on_op, "*"],
 [[1, 17], :on_int, "3"],
 [[1, 18], :on_sp, " "],
 [[1, 19], :on_kw, "if"],
 [[1, 21], :on_sp, " "],
 [[1, 22], :on_int, "1"],
 [[1, 23], :on_sp, " "],
 [[1, 24], :on_op, "<"],
 [[1, 25], :on_sp, " "],
 [[1, 26], :on_int, "2"]]

In the brackets are the line and column, followed by the token type as a symbol, followed by the source text of the token. These symbols correspond to token names in the parse.y C code. For example, :on_ident corresponds to tIDENTIFIER.
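As a small sketch of working with that structure, the snippet below drops the whitespace tokens and prints the rest in a compact form. The variable name `significant` is just for illustration, and the only thing assumed about the lex output is that each entry begins with the position, the event symbol, and the text.

```ruby
require 'ripper'

code = 'puts "pineapple"*3 if 1 < 2'

# Each entry starts with [[line, column], event, text]. Some Ruby versions
# append extra elements, which the block parameters below simply ignore.
significant = Ripper.lex(code)
                    .reject { |_pos, event, _text| event == :on_sp }
                    .map { |(line, col), event, text| [line, col, event, text] }

# Print one token per line: position, event name, source text.
significant.each do |line, col, event, text|
  puts format('%d:%-3d %-20s %p', line, col, event, text)
end
```

Filtering on the event symbol like this is also an easy way to pull out only the keywords or only the identifiers.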

Lastly, we can see the textual representation as an Abstract Syntax Tree (AST):

>> pp Ripper.sexp('puts "pineapple"*3 if 1 < 2') yields:

[:program,
 [[:if_mod,
   [:binary, [:@int, "1", [1, 22]], :<, [:@int, "2", [1, 26]]],
   [:command,
    [:@ident, "puts", [1, 0]],
    [:args_add_block,
     [[:binary,
       [:string_literal,
        [:string_content, [:@tstring_content, "pineapple", [1, 6]]]],
       :*,
       [:@int, "3", [1, 17]]]],
     false]]]]]

You can see how ruby is parsing and tokenizing the code in this simplified AST.
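To check the earlier point that tokenizing does not catch syntax mistakes while parsing does, here is a quick sketch with a deliberately broken snippet. The exact tokens returned for invalid input can differ between Ruby versions, so treat the token list as illustrative.

```ruby
require 'ripper'

broken = 'puts "pineapple"*'   # deliberately incomplete: nothing after *

tokens = Ripper.tokenize(broken)   # the tokenizer still splits the text up
tree   = Ripper.sexp(broken)       # the parser rejects it

p tokens
p tree   # nil, because Ripper.sexp returns nil on a syntax error
```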

Very cool. To see this explained x100 better and more interesting than what I wrote above, I absolutely recommend you check out the book. Very pleased with what I'm learning in it.

While it's something that I probably would never directly use (writing my own compiler), I 100% stand behind the idea that the more you know about your craft on all levels, the better off you will be.

over 4 years ago, by pineapple

4 Replies


If you care to see a more comprehensive version of the above AST, you can run:
ruby --dump parsetree file.rb to see the full thing that will be converted to bytecode.

His book really goes into breaking down what all of this means, but I thought I should include it anyway for completeness.

This is the output from the same sample code above:

# @ NODE_SCOPE (line: 1)
# +- nd_tbl: (empty)
# +- nd_args:
# |   (null node)
# +- nd_body:
#     @ NODE_IF (line: 1)
#     +- nd_cond:
#     |   @ NODE_CALL (line: 1)
#     |   +- nd_mid: :<
#     |   +- nd_recv:
#     |   |   @ NODE_LIT (line: 1)
#     |   |   +- nd_lit: 1
#     |   +- nd_args:
#     |       @ NODE_ARRAY (line: 1)
#     |       +- nd_alen: 1
#     |       +- nd_head:
#     |       |   @ NODE_LIT (line: 1)
#     |       |   +- nd_lit: 2
#     |       +- nd_next:
#     |           (null node)
#     +- nd_body:
#     |   @ NODE_FCALL (line: 1)
#     |   +- nd_mid: :puts
#     |   +- nd_args:
#     |       @ NODE_ARRAY (line: 1)
#     |       +- nd_alen: 1
#     |       +- nd_head:
#     |       |   @ NODE_CALL (line: 1)
#     |       |   +- nd_mid: :*
#     |       |   +- nd_recv:
#     |       |   |   @ NODE_STR (line: 1)
#     |       |   |   +- nd_lit: "pineapple"
#     |       |   +- nd_args:
#     |       |       @ NODE_ARRAY (line: 1)
#     |       |       +- nd_alen: 1
#     |       |       +- nd_head:
#     |       |       |   @ NODE_LIT (line: 1)
#     |       |       |   +- nd_lit: 3
#     |       |       +- nd_next:
#     |       |           (null node)
#     |       +- nd_next:
#     |           (null node)
#     +- nd_else:
#         (null node)

From here, this node tree is compiled into bytecode for YARV, which is a stack-based virtual machine.
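If you want to peek at that last step too, MRI exposes the compiler through RubyVM::InstructionSequence. This is only a sketch: the class is specific to the standard C ruby, and the instruction names in the listing change between versions, so I'm not pasting any output here.

```ruby
code = 'puts "pineapple"*3 if 1 < 2'

# RubyVM::InstructionSequence is specific to MRI (the standard C ruby).
iseq = RubyVM::InstructionSequence.compile(code)

# disasm returns a human-readable listing of the compiled instructions.
listing = iseq.disasm
puts listing
```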

pineapple, over 4 years ago



wsg, over 4 years ago


Yep WSG I highly recommend it, but you definitely need to know ruby basics first (blocks, syntax, hashes, procs, lambdas).

Also, only a few chapters are free. If you skip ahead and even like it a bit, I really recommend stopping where you are and just getting it for $20; it's cheap for the quality of content you get. I'd also like to think he put it in a certain order, so skipping ahead to hashes might be more complex than it needs to be without reading the first parts.

I just finished chapter 1, and while it won't directly make me a better ruby programmer in terms of syntax or little tricks or whatever, that really isn't its purpose. It has taught me a lot about how things work under the hood, and it had always bothered me that I didn't know. My mind can be a sponge, and I want to know everything about a topic when I hear something new, even if I don't use it (C, compilers, parsers, virtual machines, bytecode, Ruby vs JRuby, what is shift/reduce).

Not to mention people in the past have tried to explain to me on a core level how a language (any language) is 'built', and it was always a bit over my head. Now I see that you actually write one huge grammar file in yacc or an equivalent syntax, and the yacc or bison parser generator reads that grammar file and generates a parser. That parser then turns your code into a node tree. Those node trees are really the heart of the 'decisions' that can be made, because the bytecode is generated from the structure and data of the nodes (and what reads that bytecode is a virtual machine).
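You can watch the parser matching grammar rules from ruby itself, because Ripper lets you subclass it and define handlers for parser events. A rough sketch, assuming the on_binary event (the one behind the :binary entries in Ripper.sexp output); the class name BinaryOpCollector is made up for this example.

```ruby
require 'ripper'

# Hypothetical helper class: records the operator of every binary
# expression the parser recognizes.
class BinaryOpCollector < Ripper
  def ops
    @ops ||= []
  end

  # Called by the parser each time it matches a "left op right" rule.
  def on_binary(_left, op, _right)
    ops << op
    nil
  end
end

collector = BinaryOpCollector.new('puts "pineapple"*3 if 1 < 2')
collector.parse
p collector.ops   # the operators from the two binary expressions
```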

I've often wondered how a language so complex it's mind-boggling can be created... building a simple Pineapple app is challenging already. His simple puts 2+2 example is a lot to soak in, let alone a 2000 line .rb file. So many seemingly millions of combinations for ways to parse the syntax of a language. I'm not saying it's easy or anything, but now I see how it's 'doable' whereas before it just made me sick to my stomach :)

He also talks about the stack. I'm sure you've seen the 'stack level too deep' error; well, now I see how values are actually pushed onto a stack in the order receiver, arguments, method call. The call then leaves its result on the stack, where it can conveniently serve as an argument for the next method call. It's all really quite clever.
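Here's a toy version of that idea in plain ruby. To be clear, this is not how YARV is implemented, just my own sketch of the receiver, arguments, call pattern; call_method is a made-up helper.

```ruby
# Toy model only: pops argc arguments and a receiver off the stack,
# calls the method, and pushes the result back on.
def call_method(stack, name, argc)
  args = stack.pop(argc)
  receiver = stack.pop
  stack.push(receiver.public_send(name, *args))
end

stack = []
stack.push('I like ')      # receiver of the outer +
stack.push('pineapple')    # receiver of the inner *
stack.push(3)              # argument of the inner *
call_method(stack, :*, 1)  # result replaces 'pineapple' and 3 on the stack
call_method(stack, :+, 1)  # that result is now the argument of +

p stack   # => ["I like pineapplepineapplepineapple"]
```

The result of the first call never has to be stored in a variable; it is already sitting on the stack in the right spot for the next call.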

I'm rambling with what I read, but it just answers so many questions I had that weren't really pertinent enough to make a stack overflow thread or even really ask anyone (way too many questions, way too little time). He's answered a huge portion of them in only one chapter.

I rarely get this excited about books, heh. Just very happy with what I'm learning in it.

pineapple, over 4 years ago


He released the rest of this book. Such good stuff. PS... syntax highlighting is causing 500 errors occasionally, just refresh the page if that happens.

pineapple, over 3 years ago
