Perl 6 Core Hacking: Where's Da Sauce, Boss?

2016-08-04 | 3071 words | Locating the source code for specific core methods and subs

Imagine you were playing with Perl 6 and you came across a buglet or you were having some fun with the Perl 6 bug queue—you'd like to debug a particular core subroutine or method, so where's the source for it at?

Asked such a question, you might be told it's in the Rakudo compiler's GitHub repository. Depending on how deep down the rabbit hole you wish to go, you may also stop by the repo for NQP, which is a subset of Perl 6 that's used in Rakudo, or the repo for MoarVM, which is the leading virtual machine Perl 6 runs on.

The answer is fine, but we can do better. We'd like to know exactly where da sauce is.

Stick to The Basics

The most obvious way is to just use the grep command in the source repository. The code is likely in the src/ directory, or src/core more specifically.

We'll use a regex that catches the sub, method, and multi keywords. For example, here's our search for a path sub or method:

$ grep -nER '^\s*(multi|sub|method|multi sub|multi method)\s+path' src/core

src/core/Cool.pm:229:    method path() { self.Stringy.IO }
src/core/CompUnit/Repository/Locally.pm:26:    method path-spec(CompUnit::Repository::Locally:D:) {
src/core/CompUnit/Repository/AbsolutePath.pm:46:    method path-spec() {
src/core/CompUnit/Repository/NQP.pm:32:    method path-spec() {
src/core/CompUnit/Repository/Perl5.pm:46:    method path-spec() {
src/core/CompUnit/PrecompilationStore/File.pm:93:    method path(CompUnit::PrecompilationId $compiler-id,
src/core/CompUnit/PrecompilationUnit.pm:17:    method path(--> IO::Path) { ... }
src/core/IO/Spec/Win32.pm:58:    method path {
src/core/IO/Spec/Unix.pm:61:    method path {
src/core/IO/Handle.pm:714:    method path(IO::Handle:D:)            { $!path.IO }

It's not too terrible, but it's a rather blunt tool. We have these problems:

  • There are false positives; we have several path-spec methods found
  • It doesn't tell us which of the results is for the actual method we have in our code. There's Cool, IO::Spec::Unix, and IO::Handle all with method path in them. If I call "foo".IO.path, which of those gets called?
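
The first of these problems is easy to soften. Here's a minimal sketch, assuming GNU or BSD grep: require that the name isn't followed by another identifier character (Perl 6 identifiers may contain hyphens and apostrophes), so path no longer matches path-spec. The find_routine helper is a name made up for illustration, and the name you give it is pasted into the pattern as-is, so it must not contain regex metacharacters:

```shell
# find_routine NAME DIR
# Hypothetical helper (not part of Rakudo's tooling): list definitions of a
# sub or method called exactly NAME somewhere under DIR.
find_routine() {
    # After NAME we demand either end of line or a character that cannot
    # continue a Perl 6 identifier (so "path" does not match "path-spec").
    grep -nER "^[[:space:]]*(multi|sub|method|multi sub|multi method)[[:space:]]+$1([^[:alnum:]_'-]|\$)" "$2"
}

# Run from the top of a Rakudo checkout:
# find_routine path src/core
```

This only trims the noise; it still can't say which of the remaining candidates our code actually calls.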

The last one is particularly irksome, but luckily Perl 6 can tell us where the source is from. Let's ask it!

But here's line number... So code me maybe

The Code class from which all subs and methods inherit provides .file and .line methods that tell which file that particular Code is defined in, including the line number:

say "The code is in {.file} on line {.line}" given &foo;

sub foo {
    say 'Hello world!';
}

# OUTPUT:
# The code is in test.p6 on line 3

That looks nice and simple, but it gets more awkward with methods:

class Kitty {
    method meow {
        say 'Meow world!';
    }
}

say "The code is in {.file} on line {.line}" given Kitty.^can('meow')[0];

# OUTPUT:
# The code is in test.p6 on line 2

We got extra cruft of the .^can metamodel call, which returns a list of Method objects. Above we use the first one to get the .file and .line number from, but is it really the method we were looking for? Take a look at this example:

class Cuddly {
    method meow ('meow', 'meow') {
        say 'Meow meow meow!';
    }
}

class Kitty is Cuddly {
    multi method meow ('world') {
        say 'Meow world!';
    }

    multi method meow ('meow') {
        say 'Meow meow';
    }
}

We have a method meow in one class and in another class we have two multi methods meow. How can we print the location of the last method, the one that takes a single 'meow' as an argument?

First, let's take a gander at all the items .^can returns:

say Kitty.^can('meow');
# OUTPUT:
# (meow meow)

Wait a minute, we have three methods in our code, so how come we only have two meows in the output? Let's print the .file and .line for both meows:

for 0, 1 {
    say "The code is in {.file} on line {.line}"
        given Kitty.^can('meow')[$_];
}
# OUTPUT:
# The code is in gen/moar/m-CORE.setting on line 587
# The code is in test.p6 on line 2

The second meow gives us a sane result; it's our method defined in class Cuddly. The first one, however, gives us some weird file.

What's happening here is the line is referencing the proto for the multis. Since in this case instead of providing our own proto we use the autogenerated one, the referenced file has nothing to do with our code. We can, of course, add a proto into the code, but then the line number would still reference the proto, not the last meow method. Is there anything that we can do?

You .cando It!

The Routine class, from which both Method and Sub classes inherit, provides the .cando method. Given a Capture, it returns a list of candidates that can handle it, with the narrowest candidate first in the list, and since the returned object is a Code, we can query its specific .file and .line:

class Cuddly {
    method meow ('meow', 'meow') {
        say 'Meow meow meow!';
    }
}

class Kitty is Cuddly {
    multi method meow ('world') {
        say 'Meow world!';
    }

    multi method meow ('meow') {
        say 'Meow meow';
    }
}

my $code = gather {
    for Kitty.^can('meow') -> $meth {
        .take for $meth.cando: \(Kitty, 'meow');
    }
}

say "The code is in {.file} on line {.line}" with $code[0];

# OUTPUT:
# The code is in test.p6 on line 12

Hooray! We got the correct location of the multi we wanted. We still have our two classes with three meow methods total. On lines 17–21 we loop over the two meow Methods the .^can metamodel call gives us. For each of them we call the .cando method with the Capture that matches the multi we want (note that we do need to provide the needed object as the first argument of the Capture). We then .take all found candidates to gather them into the $code variable.

The first value we get is the narrowest candidate and is good 'nuf for us, so we call the .file and .line on it, which gives us the location we were looking for. Sounds like we nailed this .file and .line business down rather well. Let's dive into the core, shall we?

Can't see the core files for the setting

If this is the first time you're seeing the printout of the .file/.line for some core stuff, you're in for a surprise. Actually, we've already seen the surprise, but you may have thought it to be a fluke:

say "{.file}:{.line}" given &say;
# OUTPUT:
# gen/moar/m-CORE.setting:29038

All of the nice, good looking files you see in src/core in the repo actually get compiled into one giant file called the "setting." My current setting is 40,952 lines long and the .line of core subs and methods refers to one of those thousands of lines.

Now sure, we could pop the setting open and watch our editor grind to a stuttering halt (I'm looking at you, Atom!). However, that doesn't help us find the right repo file to edit if we want to make changes to how it works. So what do we do?

A keen eye will look at the contents of the setting or at the file that generates it and notice that for each of the separate files in the repo, the setting has this type of comment before the contents of the file are inserted into the setting:

#line 1 src/core/core_prologue.pm
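
To see where each source file starts inside the setting, grep can print every such marker along with the setting line it sits on. This is just a small sketch: list_markers is a name invented here, and the path in the usage comment assumes you're at the top of the directory where Rakudo was built:

```shell
# list_markers SETTING
# Hypothetical helper: print "setting-line-number:#line 1 source-file" for
# every source-file marker found in the given setting file.
list_markers() {
    grep -n '^#line 1 ' "$1"
}

# list_markers gen/moar/m-CORE.setting | head -n 3
```

The number before the colon is the setting line that holds the marker, and the code of the named source file starts on the line right after it.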

This means if we're clever enough, we can write a sub that translates a line number in the setting to the separate file we can locate in the repo. Here's a plan of action: we pop open the setting file and read it line by line. When we encounter one of the above comments, we make a note of which file we're in as well as how many lines deep in the setting we're currently at.

The location of the setting file may differ, depending on how you installed Perl 6, but on my system (I use rakudobrew), it's in $*EXECUTABLE.parent.parent.parent.child('gen/moar/m-CORE.setting'), so the code for finding the actual file that defines our core sub or method is this:

sub real-location-for ($wanted) {
    state $setting = $*EXECUTABLE.parent.parent.parent.child: 'gen/moar/m-CORE.setting';
    my ($cur-line-num, $offset) = 0, 0;
    my $file;
    for $setting.IO.lines -> $line {
        return %( :$file, :line($cur-line-num - $offset), )
            if ++$cur-line-num == $wanted;

        if $line ~~ /^ '#line 1 ' $<file>=\S+/ {
            $file   = $<file>;
            $offset = $cur-line-num + 1;
        }
    };
    fail 'Were not able to find location in setting.';
}

say "{.<file>}:{.<line>}" given real-location-for &say.line;


# OUTPUT:
# src/core/io_operators.pm:17

The $wanted contains the setting line number given to us by the .line call and the $cur-line-num contains the number of the current line we're examining. We loop until the $cur-line-num reaches $wanted and return a Hash with the results. For each line that matches our special comment, we store the real name of the file the code is from into $file and store the $offset of the first line of the code in that file. Once done, we simply subtract the $offset from the setting $cur-line-num and we get the line number in the source file.

This is pretty awesome and useful, but it's still not what I had in mind when I said we wanted to know exactly where da sauce is. I don't want to clone the repo and go to the repo and open my editor. I want to just look at code.

If it's worth doing, it's worth overdoing

There's one place where we can stare at Rakudo's source code until it blushes and looks away: GitHub. Since our handy sub gives us a filename and a line number, we can construct a URL that points to a specific file and line in the source code, like this one, for example: https://github.com/rakudo/rakudo/blob/nom/src/core/Str.pm#L16

There's an obvious problem with such an approach: the URL points to the master branch (called nom, for "New Object Model," in Rakudo). Commits go into the repo daily, and unless we rebuild our Perl 6 several times a day, there's a good chance the location our GitHub URL points to is wrong.

Not only do we have to point to a specific file and line number, we have to point to the right commit too. On GitHub's end, it's easy: we just replace nom in the URL with the appropriate commit number—we just need Rakudo to tell us what that number is.

The two dynamic variables $*VM and $*PERL contain some juicy information. By introspecting them, we can locate some useful info and what looks like commit prefix parts in version numbers:

say $*VM.^methods;
# (BUILD platform-library-name Str gist config prefix precomp-ext
# precomp-target precomp-dir name auth version signature desc)

say $*VM.version;
# v2016.06

say $*PERL.^methods;
# (BUILD VMnames DISTROnames KERNELnames Str gist compiler name auth version
# signature desc)

say $*PERL.compiler.^methods;
# (BUILD build-date Str gist id release codename name auth version
# signature desc)

say $*PERL.compiler.version;
# v2016.06.10.g.7.cff.429

Rakudo is a compiler and so we're interested in the value of $*PERL.compiler.version. It contains the major release version, followed by the number of commits made since that release, followed by g, followed by the commit prefix of this particular build. The prefix is split up on number-letter boundaries, so we'll need to join up all the bits and split on g. But take a look at $*VM.version, which is the version of the virtual machine we're running the code on. There aren't any gs and commits in it and for a good reason: it's a tagged major release, and the name of the tag is the version. The same will occur for Rakudo on release builds, like the ones shipped with Rakudo Star. So we'll need to check for such edge cases and this is the code:

my $where = .Str ~~ /g/
    ?? .parts.join.split("g")[*-1]
    !! .Str
given $*PERL.compiler.version;

Given $*PERL.compiler.version, if it contains the letter g, we join up the version bits and split on g, and the last portion will be our commit prefix; if it doesn't contain the letter g, then we're dealing with a release tag, so we'll take it as-is. All said and done, our code for locating source becomes this:

my $where = .Str ~~ /g/
    ?? .parts.join.split("g")[*-1]
    !! .Str
given $*PERL.compiler.version;

say [~] 'https://github.com/rakudo/rakudo/blob/',
        $where, '/', .<file>, '#L', .<line>
given real-location-for &say.line;

# OUTPUT:
# https://github.com/rakudo/rakudo/blob/c843682/src/core/io_operators.pm#L17

Hey! Awesome! We got a link that points to the correct commit and file! Let celebrations begin! Wait. What? You followed the link and noticed the line number is not quite right? What gives? Did we mess up our algorithm?

Crank Up The Insanity

If you take a look again at the script that generates the setting file, you'll notice it strips things: comments and special backend-specific chunks of code.

There are two ways to fix this. The sane approach would be to commit a change that would make that script insert an empty line for each line it skips and then pretend that we didn't commit that just to make our personal project work. Then, there's the Zoffix Way to fix this: we got the GitHub link, so why don't we fetch that code and figure out what the right line number is? Hey! That second way sounds much more fun! Let's do just that!

The one link we've seen so far is this: https://github.com/rakudo/rakudo/blob/c843682/src/core/io_operators.pm#L17. It's not quite what we want, since it's got HTML and bells and whistles in it. We want raw code and GitHub does offer that at a slightly different URL: https://raw.githubusercontent.com/rakudo/rakudo/c843682/src/core/io_operators.pm. The plan of action then becomes:

  • Get the line number in the setting
  • Use our real-location-for sub to get the filename and sorta-right line number in a source file
  • Get the commit our compiler was built with
  • Generate a GitHub URL for raw code for that file on that commit and fetch that code
  • Use the same algorithm as in the setting generating script to convert the code we fetched into the version that lives in our setting, while keeping track of the number of lines we strip
  • When we reach the correct line number in the converted file, we adjust the original line number we had by the number of lines we stripped
  • Generate a regular GitHub URL to the commit, file, and corrected line number
  • ???
  • Profit!
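
For the curious, the line-correcting step of that plan can be sketched on its own. This is a rough awk sketch under loud assumptions, not the code of the setting-generating script: it assumes that only lines starting with # (after optional whitespace) get stripped, and that backend-specific chunks are fenced by "#?if NAME" or "#?if !NAME" lines and closed by "#?endif" lines. The original_line name and the moar default are likewise made up for illustration. Given the position of a line among the lines that survive stripping (counting from 1), it prints that line's number in the unstripped file:

```shell
# original_line KEPT_POSITION SOURCE_FILE [BACKEND]
# Hypothetical helper; the stripping rules below are assumptions.
original_line() {
    awk -v want="$1" -v backend="${3:-moar}" '
        # Assumed opening fence of a backend-specific chunk,
        # e.g. "#?if moar" or "#?if !jvm"
        /^#\?if[ \t]/ {
            name    = $2
            negated = (substr(name, 1, 1) == "!")
            if (negated) name = substr(name, 2)
            skipping = negated ? (name == backend) : (name != backend)
            next
        }
        # Assumed closing fence
        /^#\?endif/ { skipping = 0; next }
        # Inside a chunk meant for some other backend: stripped
        skipping    { next }
        # Comment-only line: stripped
        /^[ \t]*#/  { next }
        # A surviving line; NR is its line number in the original file
        ++kept == want { print NR; exit }
    ' "$2"
}

# original_line 17 io_operators.pm
```

If the position points past the last surviving line, nothing is printed.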

I could go over the code, but it's just a dumb, unfun algorithm, and most importantly, you don't need to know it. Because... there's a module that does just that!

What Sorcery Is This?

The module is called CoreHackers::Sourcery and when you use it, it'll augment the Code class and all core classes that inherit from it with a .sourcery method, as well as provide a sourcery subroutine.

So, to get the location of the code for say sub, just run:

use CoreHackers::Sourcery;
&say.sourcery.put;

# OUTPUT:
# src/core/io_operators.pm:20 https://github.com/rakudo/rakudo/blob/c843682/src/core/io_operators.pm#L20

That gives us the correct location of the proto. We can either pop open a file in a repo checkout or view the code at the provided GitHub URL.

Want to get the location of a specific multi? There's no need to mess with .cando! The arguments you give to the .sourcery method will be used to select the best matching multi, so to find the location of the say multi that will handle the say "foo" call, just run:

&say.sourcery("foo").put;

# OUTPUT:
# src/core/io_operators.pm:22 https://github.com/rakudo/rakudo/blob/c843682/src/core/io_operators.pm#L22

That covers the subs. For methods, you can go with the whole .^can meta dance, but we like simple things, and so we'll use the subroutine form of sourcery:

put sourcery Int, 'abs';         # method of a type object
put sourcery 42,  'split';       # method of an Int object
put sourcery 42,  'base', \(16); # best candidate for `base` method called with 16 as arg

This is pretty handy. And the whole hitting the GitHub thing? The module will cache the code fetched from GitHub, so things like this won't take forever:

put "Int.{.name} is at {.sourcery}" for Int.^methods;

However, if you do actually run that code, after some output you'll be greeted with this error:

# Method 'sourcery' not found for invocant of class 'Method+{Callable[Bool:D]}'
#   in block  at test.p6 line 1
#   in block <unit> at test.p6 line 1

The class it mentions is not a pure Method object, but has a mixin in it. While CoreHackers::Sourcery recomposes all core subclasses of the Code class after augmenting it, it doesn't do that for such mixins, so you'd have to recompose them yourself:

for Int.^methods {
    .WHAT.^compose;
    put "Int.{.name} is at {.sourcery}" ;
}

Or better still, just use the subroutine form of sourcery:

put "Int.{.name} is at {sourcery $_}" for Int.^methods;

Do It For Me

For most stuff, we wouldn't want to do a whole bunch of typing to use a module and call subs and then copy/paste URLs or filenames. You'll notice sourcery returns a list of two items: the filename and the URL. This means we can make some nice and short aliases to call it and automatically pop open either our editor or web browser:

$ alias sourcery='perl6 -MCoreHackers::Sourcery -MMONKEY-SEE-NO-EVAL \
    -e '\''run "atom", "/home/zoffix/rakudo/" \
        ~ EVAL "sourcery(@*ARGS[0])[0]" '\'''

$ alias sourcery-web='perl6 -MCoreHackers::Sourcery -MMONKEY-SEE-NO-EVAL \
    -e '\''run "firefox", EVAL "sourcery(@*ARGS[0])[1]" '\'''

# opens Atom editor at the spot to edit code for Int.base
$  sourcery 'Int, "base"'

# opens Firefox, showing code for Int.base
$  sourcery-web 'Int, "base"'

We EVAL the argument we give to these aliases, so be careful with them. For the sourcery alias, we run the Atom editor and give it the file to open. I prepended the location of my local Rakudo checkout, but you'd use yours. Most editors support opening file:line-number format to open files at a particular spot; if yours doesn't, modify the command.

For sourcery-web we use the URL returned by sourcery and open the Firefox browser at this location. And just like that, with a few keystrokes, we can jump in to view or edit the code for a particular core sub or method in Rakudo!

Conclusion

We've learned where Rakudo's source lives, how to find the commit the current compiler is built off, and how to locate the source code for a particular sub or method in a giant file called the setting. We then further hacked away the inconveniences by getting to the actual place in the source code we can edit, culminating with a shiny module and a couple of handy command line aliases.

Happy hacking!

UPDATE 2016.08.05

Inspired by this blog post, lizmat++ has changed the setting generation script to not skip any lines, so making adjustments to line numbers by fetching source from GitHub is no longer necessary, as the line numbers match up with the original source.