Day 15 – Santa had too much eggnog

We’re just over a week from Christmas and Santa is sending his elves the final present lists. Unfortunately, Santa had a bit too much eggnog and so the list that he sent to his elves was … not the greatest. Take a look at some of it:

Johnny
 - 4 bsaeball gluvs
 - 2 batts
 - 2 ballz
Mary
 - 3 fancee dols
 - 1 dressss
 - 1 bbaskebtall

Santa somehow managed to keep a nice format that we could mostly process with regexen, so the elves started hammering away at a nice grammar:

grammar Santa'sList {
  rule TOP        {       <kid's-list>+    }
  rule kid's-list {     <name>     <gift>+ }
  rule gift       { '-' <quantity> <item>  }
  token name      { <-[\n]>+  }
  token quantity  { <.digit>+ }
  token item      { [<.alpha>+]+ % \h+ }
}

While the elves could have tried to figure out what Santa meant in an action object, they decided it would be more interesting to create a token that they could reuse not just in the grammar, but in any random regex — these elves are crafty!

They wanted to make a new token that they’d call <fuzzy> that could somehow capture Santa’s drunken scribblings (can we call his typed list a scribbling?). But regex syntax doesn’t actually allow for any kind of fuzzy matching. Here Raku’s engine comes to the rescue. So first they created a code block inside of the token. Code blocks are normally defined with just { 🦋 } but because they needed to define the success of a match, they opted instead for the <?{ 🦋 }> conditional block, which will not only run the code, but will also fail the match if the block returns a false-y value.

  token fuzzy {
    ([<.alpha>+]+ % \h+)
    <?{
      # «ö» code here
    }>
  }

Before they started writing their code, they did two other things. First they named the capture to be a bit easier to maintain down the road. And secondly, they realized they needed to actually get the list of possible toys into the token somehow. So they added a signature to the token to pass it in.

  token fuzzy(**@toys) {
    $<santa's-text>=([<.alpha>+]+ % \h+)
    <?{
      # «ö» code here
    }>
  }

Now they could begin the code itself. They would take Santa’s text, and compare it to each of the possible toys, and decide which one was the closest match:

  token fuzzy(**@toys) {
    $<santa's-text>=([<.alpha>+]+ % \h+)
    <?{
      my $best = @toys
                   .map({ $^toy, qgram($toy,$<santa's-text>.Str)})
                   .sort( *.tail )
                   .tail;
      say "Santa meant to write {$best[0]}";
    }>
  }

The Q-gram function they used creates N-grams for each word, and compares them to see how many they have in common. With testing they found that the best value for N (the length of each substring) was about half the average length. The way that Raku works, writing the Q-gram function was super easy:

  #| Generate space-padded N-grams of length n for string t.
  sub ngrams (\t, \n) {
    my \s = (' ' x n - 1)  ~ t ~  (' ' x n - 1);
    do for ^(t.chars + n) { s.substr: $_, n }
  }

  #| Calculate Q-gram score using bag operations
  sub qgram (\a, \b) {
    my \q  = (a.chars + b.chars) div 4;
    my \aₙ = ngrams(a,q).BagHash;
    my \bₙ = ngrams(b,q).BagHash;

    (aₙ ∩ bₙ) / (aₙ ∪ bₙ)      # Jaccard similarity coefficient
  }

Raku let the elves calculate N-grams in just two clean lines of code, and then use those to calculate the Jaccard-index between the two strings in just four more easy to read lines of code.
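
As a quick sanity check, calling the scorer directly (with the two subs above in scope) might look like this; the exact numbers will vary with the inputs:

  say qgram('bsaeball gluvs', 'baseball gloves');  # a Rat between 0 and 1
  say qgram('fancee dols',    'fancy dolls');      # higher means more alike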

Putting this back into their grammar, they ended up with the following:

grammar Santa'sList {
  rule TOP        {       <kid's-list>+    }
  rule kid's-list {     <name>     <gift>+ }
  rule gift       { '-' <quantity> <item>  }
  token name      { <-[\n]>+  }
  token quantity  { <.digit>+ }
  token item      { <fuzzy(@gifts)> }
  token fuzzy     { … }
  sub ngrams      { … }
  sub qgram       { … }
}

That’s a pretty handy format, but an important problem remains. How do they get access to the best matched text? If they were to match and request, say, $<kid's-list>[0]<gift>[0]<item> they would only get Santa’s original illegible mess. They could do an action but that requires doing a parse with actions, which means the fuzzy token is tied to the vagaries of grammar parsing. Works fine here, but… less reusable.

But elves are good at packing and wrapping. They decide to make a package that wraps the fuzzy token so that both Santa’s original and the corrected version are easily accessible in a DWIM manner. This ‘package’ can’t be declared with package or module, though, because the wrapping process requires using the special sub EXPORT. Their basic process looks like the following:

sub EXPORT {
  # Make the fuzzy token in the elves' factory
  my token fuzzy (*@words) { … } 

  # Wrap it in wrapping paper (apply a role) so it's prettier (easier to use)
  &fuzzy.wrap( … );

  # Ship out (export) the wrapped version
  Map.new( '&fuzzy' => &fuzzy )
}

Any other special tools the elves need can be included in the EXPORT block, for example, the Q- and N-gram functions. So how will they actually do the wrapping? First, they design the paper, that is, a parameterized role that will override .Str to give the clean/corrected value (also exposed as a .clear method), and provide a .fuzz method to get back Santa’s original text:

  role Fuzzy[$clear,$fuzz] {
    method Str   { $clear }
    method clear { $clear }
    method fuzz  { $fuzz  }
  }

Now, the wrapped function could look something like the following:

  &fuzzy.wrap(
    sub (|) {
      my $match = callsame;

      # Failed match evals to false, and is just passed along
      # Successful match gets Fuzzy role mixed in.
      $match
        ?? $match but Fuzzy[$match.??, $match.??]
        !! $match
    }
  );

There’s a small problem. The results of the calculations they ran inside of the token aren’t available. One solution they thought of involved adding new parameters to the fuzzy token with the trait is raw so that the values could be passed back, but that felt like something the old C++ elves would do. No, Santa’s Raku elves had a better idea: dynamic variables. They made two of them, and refactored the original fuzzy token to assign to them:

  my token fuzzy(**@toys) {
    $<santa's-text>=([<.alpha>+]+ % \h+)
    <?{
      my $best = @toys
                  .map({ $^toy, qgram($toy,$<santa's-text>.Str)})
                  .sort( *.tail )
                  .tail;
      $*clear = $best[0];
      $*fuzz  = ~$<santa's-text>;
    }>
  }

  &fuzzy.wrap(
    sub (|) {
      my $*fuzz;
      my $*clear;

      my $match = callsame;   # sets $match to result of the original

      $match
        ?? $match but Fuzzy[$*clear, $*fuzz]
        !! $match
    }
  );

They did a test with some values and all went well, until an item wasn’t found:

"I like the Norht Pole" ~~ /I like the $<dir>=<fuzzy: <North South>> Pole/;
say $<dir>.clear;   # --> "North"
say $<dir>.fuzz;    # --> "Norht"

"I like the East Pole" ~~ /I like the $<dir>=<fuzzy: <North South>> Pole/;
say $<dir>.clear;   # --> "North"
say $<dir>.fuzz;    # --> "East"

What happened? The elves realized that their token was matching no matter what. This is because the <?{ 🦋 }> block will only fail if it returns a falsey value. The last statement, being an assignment of a string, will virtually always be truthy. To fix this, they added a simple conditional to the end of the block to fail if the Q-gram score wasn’t sufficiently high.

  my token fuzzy(**@toys) {
    $<santa's-text>=([<.alpha>+]+ % \h+)
    <?{
      my $best = @toys
                   .map({ $^toy, qgram($toy,$<santa's-text>.Str)})
                   .sort( *.tail )
                   .tail;

      $*clear = $best[0];
      $*fuzz  = ~$<santa's-text>;

      # Arbitrary but effective score cut off.
      $best[1] > 0.33
    }>
  }

With that, they were done, and able to process Santa’s horrible typing.

Of course, there were a lot of improvements that the elves could still make to make their fuzzy token more useful. After they had made use of it (and taken the eggnog away from Santa so they wouldn’t need it), they polished it up so that it could bring joy to everyone.


With that, I can also announce the release of Regex::FuzzyToken. To use it, do like the elves: in a grammar or any other code, say use Regex::FuzzyToken and the token fuzzy will be imported into your current scope. It has a few extra features, so take a look at its readme for information on some of its options.
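
Using it looks much like the elves’ test from earlier (a small sketch; see the readme for the extra options):

use Regex::FuzzyToken;

"I like the Norht Pole" ~~ /I like the $<dir>=<fuzzy: <North South>> Pole/;
say ~$<dir>;       # North (the corrected text, via .Str)
say $<dir>.fuzz;   # Norht (what Santa actually typed)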

While not everyone will use or have need of a fuzzy token, I hope that this shows off some interesting possibilities when creating tokens that might be better defined programmatically, as well as other cool Raku features like Bag operators, dynamic variables, and parameterized roles.

Edit March 2020: A small change was made in the EXPORT sub. It now returns a Map instead of a Hash. Maps do not containerize their values.
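
A quick way to see that difference (the .VAR.^name trick just reveals whether a value sits in a Scalar container):

my %hash = fuzzy => 42;
say %hash<fuzzy>.VAR.^name;   # Scalar: Hash values are containerized

my $map = Map.new: 'fuzzy' => 42;
say $map<fuzzy>.VAR.^name;    # Int: Map values are not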

Day 14 – Thinking Beyond Types: an Introduction to Rakudo’s MOP

It’s Christmas season! Christmas would not be Christmas without the caroling that’s part of the festivities, so let’s make it possible to sing some.

We could simply make a carol one giant string, but that’s not good enough. Being a song, carols often have a chorus that’s repeated in between verses. If we were to store it as one string, we’d be repeating ourselves. On top of that, people are not perfect; they may forget a verse of a carol, or even just a single line of a verse. We need a type to represent a carol. This could be a type of song, but since we only care about carols, it’s a bit early to abstract this out.

Now, to make this more interesting, let’s handle this without making instances of any kind. All behaviour for all Christmas carols will be handled by type objects. This will be enforced using the Uninstantiable REPR.

At first, we might have a Christmas::Carol role:

role Christmas::Carol is repr<Uninstantiable> {
    proto method name(::?CLASS:U: --> Str:D)        {*}
    proto method verse(::?CLASS:U: Int:D --> Seq:D) {*}
    proto method chorus(::?CLASS:U: --> Seq:D)      {*}
    proto method lyrics(::?CLASS:U: --> Seq:D)      {*}

    method sing(::?CLASS:U: --> ::?CLASS:U) {
        .say for @.lyrics;
        self
    }
}

This would then be done by a class representing a specific carol:

class Christmas::Carol::JingleBells does Christmas::Carol {
    multi method name(::?CLASS:U: --> 'Jingle Bells') { }

    multi method verse(::?CLASS:U: 1 --> Seq:D) {
        lines q:to/VERSE/
        Dashing through the snow
        In a one-horse open sleigh
        O'er the fields we go
        Laughing all the way
        Bells on bobtails ring
        Making spirits bright
        What fun it is to ride and sing
        A sleighing song tonight
        VERSE
    }
    multi method verse(::?CLASS:U: 2 --> Seq:D) {
        lines q:to/VERSE/
        A day or two ago
        I thought I'd take a ride
        And soon, Miss Fanny Bright
        Was seated by my side
        The horse was lean and lank
        Misfortune seemed his lot
        He got into a drifted bank
        And then we got upset
        VERSE
    }
    multi method verse(::?CLASS:U: 3 --> Seq:D) {
        lines q:to/VERSE/
        A day or two ago
        The story I must tell
        I went out on the snow
        And on my back I fell
        A gent was riding by
        In a one-horse open sleigh
        He laughed as there I sprawling lie
        But quickly drove away
        VERSE
    }
    multi method verse(::?CLASS:U: 4 --> Seq:D) {
        lines q:to/VERSE/
        Now the ground is white
        Go it while you're young
        Take the girls tonight
        And sing this sleighing song
        Just get a bobtailed bay
        Two forty as his speed
        Hitch him to an open sleigh
        And crack, you'll take the lead
        VERSE
    }

    multi method chorus(::?CLASS:U: --> Seq:D) {
        lines q:to/CHORUS/
        Jingle bells, jingle bells
        Jingle all the way
        Oh, what fun it is to ride
        In a one-horse open sleigh, hey
        Jingle bells, jingle bells
        Jingle all the way
        Oh, what fun it is to ride
        In a one-horse open sleigh
        CHORUS
    }

    multi method lyrics(::?CLASS:U: --> Seq:D) {
        gather for 1..4 {
            take $_ for @.verse($_);
            take "";
            take $_ for @.chorus;
            take "" if $_ != 4;
        }
    }
}

There’s a problem with this approach, though. What happens if you want to hold onto a collection of Christmas carols to carol around the neighbourhood with?

use Christmas::Carol::JingleBells;
use Christmas::Carol::JingleBellRock;
use Christmas::Carol::DeckTheHalls;
use Christmas::Carol::SilentNight;
# And so on...

That’s no good! You don’t need to know who wrote a Christmas carol in order to sing it. On top of that, no one thinks of Christmas carols in terms of symbols; they think of them in terms of their name. To represent them effectively, we need to make it so we can look up a Christmas carol using its name, while also making it possible to introduce new carols that can be looked up this way at the same time. How can we do this?

The way we’ll be using here requires a bit of explanation on how types in Raku work.

The Metaobject Protocol

Classes may contain three different types of method declarations. The two you most commonly see are public and private methods:

class Example {
    method !private-method(|) { ... }
    method public-method(|)   { ... }
}

There is a third type of method declaration you can make, which is exclusive to classes (this is a lie, but this is the case when writing Raku with Rakudo alone), by prefixing the method’s name with ^. Calls to these are typically made using the .^ dispatch operator, which you often see when you need to introspect an object (.^name, .^methods, etc.). However, these don’t behave at all like you would otherwise expect a method to. Let’s take a look at what the invocant and parameters are when we invoke a method of this type using the .^ dispatch operator:

class Example {
    method ^strange-method(\invocant: |params --> List:D) {
        (invocant, params)
    }
}

say Example.^strange-method;
# OUTPUT:
# (Perl6::Metamodel::ClassHOW+{<anon>}.new \(Example))

Whoa whoa whoa, WTF? Why is the class this type of method is declared in showing up as its first parameter instead of as its invocant? What even is that object that ended up as its invocant instead, and where is it coming from?

Before this can be explained, first we’ll need to understand a little bit about what the metaobject protocol (MOP) is. The MOP is a feature specific to Rakudo through which the behaviour of all objects that can exist in Raku is implemented. These are implemented based on kinds (types of types), such as classes, roles, and grammars. The behaviour for any type is driven by what is called a higher-order working (HOW), which is a metaobject. These are typically instances of a metaclass of some sort. For instance, HOWs for classes are created by the Metamodel::ClassHOW metaclass.

The HOW for any given object can be introspected. How is this done? How, you ask? How? HOW!? By calling HOW on it, of course!

role Foo { }
say Foo.HOW.^name; # OUTPUT: Metamodel::ParametricRoleGroupHOW

Methods of HOWs are called metamethods, and these are what are used to handle the various behaviours that types can support. Some examples of behaviours of kinds that metamethods handle include type names, attributes, methods, inheritance, parameterization, and typechecking. Since most of these are not features specific to any one kind, these are often mixed into metaclasses by metaroles. For instance, the Metamodel::Naming metarole is what handles naming for any type that can be named.

So that third type of method declaration from earlier? That doesn’t actually declare a method for a class; instead, it declares a metamethod that gets mixed into that class’ HOW, similarly to how metaroles are used. The .^ dispatch operator is just sugar for invoking a metamethod of an object using that object as its first argument, which metamethods accept in most cases. For instance, these two metamethod calls are equivalent:

say Int.^name;         # OUTPUT: Int
say Int.HOW.name(Int); # OUTPUT: Int

Metamethods are the tool we’ll be using to implement Christmas carols purely as types.

Spreading the Joy

To start, instead of having a Christmas::Carol role that gets done by Christmas carol classes, let’s make our carols roles mixed into a Christmas::Carol class instead. Through this class, we will stub the methods a Christmas carol should have, like it did as a role, but it will also hold on to a dictionary of Christmas carols, keyed by name.

We can store carols using an add_carol metamethod:

my constant %CAROLS = %();
method ^add_carol(Christmas::Carol:U, Str:D $name, Mu $carol is raw --> Mu) {
    %CAROLS{$name} := $carol;
}

Now we can mark roles as being carols like so:

role Christmas::Carol::JingleBells { ... }
Christmas::Carol.^add_carol: 'Jingle Bells', Christmas::Carol::JingleBells;

This isn’t a great API for people to use though. A trait could make it so this could be handled from a role’s declaration. Let’s make an is carol trait for this:

multi sub trait_mod:<is>(Mu \T, Str:D :carol($name)!) {
    Christmas::Carol.^add_carol: $name, T
}

Now we can define a role as being a carol like this instead:

role Christmas::Carol::JingleBells is carol('Jingle Bells') { ... }

To make it so we can fetch carols by name, we can simply make our Christmas::Carol class parametric. This can be done by giving it a parameterize metamethod which, given a name, will create a Christmas::Carol mixin using any carol we know about by that name:

method ^parameterize(Christmas::Carol:U $this is raw, Str:D $name --> Christmas::Carol:U) {
    self.mixin: $this, %CAROLS{$name}
}

Now we can retrieve our Christmas carols by parameterizing Christmas::Carol using a carol name. What will the name of the mixin type returned be, though?

say Christmas::Carol['Jingle Bells'].^name;
# OUTPUT: Christmas::Carol+{Christmas::Carol::JingleBells}

That’s a bit ugly. Let’s reset the mixin’s name during parameterization:

method ^parameterize(Christmas::Carol:U $this is raw, Str:D $name --> Christmas::Carol:U) {
    my Christmas::Carol:U $carol := self.mixin: $this, %CAROLS{$name};
    $carol.^set_name: 'Christmas::Carol[' ~ $name.perl ~ ']';
    $carol
}

This gives our Jingle Bells carol a name of Christmas::Carol["Jingle Bells"] instead. Much better.

Let’s add one last metamethod: carols. This will return a list of names for the carols known by Christmas::Carol:

method ^carols(Christmas::Carol:U --> List:D) {
    %CAROLS.keys.list
}

With that, our Christmas::Carol class is complete:

class Christmas::Carol is repr<Uninstantiable> {
    proto method name(::?CLASS:U: --> Str:D)        {*}
    proto method chorus(::?CLASS:U: --> Seq:D)      {*}
    proto method verse(::?CLASS:U: Int:D --> Seq:D) {*}
    proto method lyrics(::?CLASS:U: --> Seq:D)      {*}

    method sing(::?CLASS:U: --> ::?CLASS:U) {
        .say for @.lyrics;
        self
    }

    my constant %CAROLS = %();
    method ^add_carol(Christmas::Carol:U, Str:D $name, Mu $carol is raw --> Mu) {
        %CAROLS{$name} := $carol;
    }
    method ^carols(Christmas::Carol:U --> List:D) {
        %CAROLS.keys.list
    }
    method ^parameterize(Christmas::Carol:U $this is raw, Str:D $name --> Christmas::Carol:U) {
        my Christmas::Carol:U $carol := self.mixin: $this, %CAROLS{$name};
        $carol.^set_name: 'Christmas::Carol[' ~ $name.perl ~ ']';
        $carol
    }
}

multi sub trait_mod:<is>(Mu \T, Str:D :carol($name)!) {
    Christmas::Carol.^add_carol: $name, T
}

Now, this is great and all, but how is this an improvement on our original code? By defining carols this way, we no longer need to know the symbol for a carol in order to sing it, and we no longer need to know which module even declared the carol in the first place. So long as we know that the Christmas::Carol class exists, we know all of the carols that the modules we import happen to be aware of.

This means there can be a module defining a collection of carols:

use Christmas::Carol::JingleBells;
use Christmas::Carol::JingleBellRock;
use Christmas::Carol::DeckTheHalls;
use Christmas::Carol::SilentNight;
unit module Christmas::Carol::Collection;

From another module, we can make another collection using this, and define more carols:

use Christmas::Carol::Collection;
use Christmas::Carol::JingleBells::BatmanSmells;
unit module Christmas::Carol::Collection::Plus;

We can then import this and easily sing all of the original module’s carols, in addition to the ones this new module adds, by name:

use Christmas::Carol;
use Christmas::Carol::Collection::Plus;

Christmas::Carol[$_].sing for Christmas::Carol.^carols;

At this point, you may be wondering: “Couldn’t you just write code that does the same thing as this using instances?”. You’d be right! What this shows is that while there is a protocol for working with types when using Rakudo, the behaviour of any given type isn’t particularly unique; it’s driven mainly by metaobjects that you can have complete control over.

Just with metamethod declarations in classes alone, you can augment or override behaviours of any type that supports inheritance. This is far from the extent of what the MOP allows you to do with types! Alas, the more advanced features for working with the MOP would require more explanation, and would be best left for another time.

Day 13 – A Little R&R

Introduction

Raku is a really nice language. Versatile, expressive, fast, dwimmy. The only problem I sometimes have with it is that it can be a little slow. Fortunately that can easily be solved by the NativeCall interface, which makes it easy to call C code in your Raku program. Now, as nice as C is, it is a rather old language with some limitations. A newer language that can fill its niche is known as Rust. I’ll show some examples of having Raku talk to Rust.

FFI

Rust code can be called from other languages using the FFI standard. FFI stands for “Foreign Function Interface” and allows you to export a Rust library as a standard shared library (a .so file on Linux or .dll on Windows). This is done by adding the following section in your Cargo.toml:

[lib]
crate-type = ["cdylib"]

After adding this section you will find the libraries in your target/debug or target/release folder when you build the library with Cargo. Also be sure to add the libc dependency to get access to standard C types.

Primitives

We can use the same primitive types as in C: numbers (and chars) and arrays.

Numbers and chars

Rust:

#[no_mangle]
pub extern fn addition(a: u32, b: u32) -> u32 {
        a + b
}

Raku:

use NativeCall;
sub addition(uint32, uint32) returns uint32 is native('foo') { * }

Note the #[no_mangle]; this keeps the name of the function unchanged in the final library file. While Rust has standardized name mangling (contrary to C++, where the name mangling is platform-dependent), it is still nice to call a function with its original name.

Arrays and strings

Rust:

use std::ffi::{CStr, CString};
use std::os::raw::c_char;

#[no_mangle]
pub unsafe extern fn count_chars(s: *const c_char) -> u32 {
        CStr::from_ptr(s).to_str().unwrap().chars().count() as u32
}

#[no_mangle]
pub extern fn lunch() -> *mut c_char {
        let c_string = CString::new("🌮🍚").expect("CString::new failed");
        c_string.into_raw()
}

#[no_mangle]
pub unsafe extern fn free_lunch(ptr: *mut c_char) {
        let _ = CString::from_raw(ptr);
}

Raku:

sub count_chars(Str is encoded('utf8')) returns uint32 is native ('foo') { * }
sub lunch() returns CArray[uint8] is native('foo') { * }
sub free_lunch(CArray[uint8]) is native('foo') { * }

Rust has first class support for UTF-8, making it a great fit for Raku. Using CString also guarantees a terminating null byte at the end of the string, so you can get the significant bytes by looping until the AT-POS() value equals 0… if, that is, you choose to return an array rather than populate it.
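
As a sketch of that loop, here is one way to turn the CArray[uint8] returned by the lunch binding above back into a Raku string (carray-to-str is just a hypothetical helper name):

use NativeCall;

sub carray-to-str(CArray[uint8] $bytes) {
    my @significant;
    my $i = 0;
    # collect bytes until we hit the terminating null
    @significant.push: $bytes[$i++] while $bytes[$i];
    Buf.new(@significant).decode('utf-8');
}

my $lunch = lunch();
say carray-to-str($lunch);   # 🌮🍚
free_lunch($lunch);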

Structs

Rust:

use std::mem::swap;

#[repr(C)]
pub struct Point {
    x: f32,
    y: f32,
}

impl Point {
        fn print(&self) {
                println!("x: {}, y: {}", self.x, self.y);
        }
}

#[no_mangle]
pub unsafe extern "C" fn flip(p: *mut Point) {
    swap(&mut (*p).x, &mut (*p).y);
        (*p).print();
}

Raku:

class Point is repr('CStruct') {
    has num32 $.x;
    has num32 $.y;
}

sub flip(Pointer[Point]) is native('./librnr.so') { * }

sub flipper {
    my Point $p .= new(x => 3.Num, y => 4.Num);
    say "x: ", $p.x, ", y: ", $p.y;
    flip(nativecast(Pointer[Point], $p));
}

Rust separates objects into structs (which we are all familiar with), and traits, which are kind of like roles in Raku.

Concurrency

Rust:

use std::thread;

#[no_mangle]
pub extern "C" fn multithread(count: i32) {
    let threads: Vec<_> = (1..8)
        .map(|id| {
            thread::spawn(move || {
                println!("Starting thread {}", id);
                let mut x = 0;
                for y in 0..count {
                    x += y;
                    println!("Thread {}, {}/{}: {}", id, y, count, x);
                }
            })
        })
        .collect();

    for t in threads {
        t.join().expect("Could not join a thread!");
    }
}

Raku:

sub multithread(int32) is native('./librnr.so') { * }

sub multi-multithread {
    my @numbers = (3_000..50_000).pick(10);

    my @promises;

    for @numbers -> $n {
        push @promises, start {
            multithread($n);
            True;
        };
    }

    await Promise.allof(@promises);
}

Rust and Raku both have first-class concurrency support. This allows you to easily tweak your programs to get the highest possible performance.

Closing statement

These were some examples of interactions between Rust and Raku, two of the most promising languages when looking to the future. If you found this interesting, be sure to check out Andrew Shitov’s A Language a Day articles. Thanks for reading and happy holidays.

Day 12 – Making a simple bot in Raku

Making IRC bots is incredibly simple in Raku, thanks to IRC::Client. It allows you to create a very simple bot in about 20 lines of code. There’s a plugin system that allows easy re-use of code between multiple bots, and adding customized features can be as easy as dropping in an anonymous class.

So, let’s get to it!

Get your dependencies installed

Raku uses zef as the standard module installer, and if you’re reading this, I’m assuming you have it available to you. Install IRC::Client with zef, and you should be good to get started.

zef install IRC::Client

Setting up the bot

To set up the bot, we’ll need to have a nickname to use, a server to connect to and a list of channels to join. To make it easier to run this as a program from your shell, I’ll be using a MAIN sub as well.

use IRC::Client;

sub MAIN () {
  IRC::Client.new(
    nick => 'raku-advent',
    host => 'irc.darenet.org',
    channels => < #advent >,
  ).run;
}

Let’s save this in a file called bot.pl6, and run it.

perl6 bot.pl6

This will run, and if you’re in the channel you specified in channels, you should see the bot joining in a short moment. However, the program itself doesn’t seem to provide any output. It would be highly convenient, especially during development, to show what it’s doing. This is possible by enabling the debug mode. Add it to the new method call, making it look as follows.

IRC::Client.new(
  nick => 'raku-advent',
  host => 'irc.darenet.org',
  channels => < #advent >,
  debug => True,
).run;

If you restart the application now, you will see there’s a lot of output all of a sudden, showcasing the IRC commands the bot is receiving and sending in response. Now all we need to do is add some functionality.

Making the bot work

As described earlier, functionality of the bot is added in using plugins. These can be any class that implements the right method names. For now, we’ll stick to irc-to-me, which is a convenience method which is triggered whenever the bot is spoken to in a private message, or directly addressed in a channel.

The simplest example to get started with here is to simply have it respond with the message you sent to the bot. Let’s do this by adding an anonymous class as a plugin to the new method call.

IRC::Client.new(
  nick => 'raku-advent',
  host => 'irc.darenet.org',
  channels => < #advent >,
  debug => True,
  plugins => [
    class {
      multi method irc-to-me ($e) {
        $e.text
      }
    }
  ],
).run;

When you restart the bot and talk to it on IRC, you will see it responding to you with the same message you sent it.

 <@tyil> raku-advent: hi
 <raku-advent> tyil, hi
 <@tyil> raku-advent: how are you doing
 <raku-advent> tyil, how are you doing

Adding some real features

So, you’ve seen how easy it is to get started with a simple IRC bot in just over a dozen lines. Let’s add two features that you may want your bot to support.

For convenience sake, I will only cover the class implementing the features, not the entire IRC::Client.new block.

Uptime

First off, let’s make the bot able to show the time it’s been running for. For this, I’ll make it respond to people asking it for “uptime”. We can use the irc-to-me convenience method for this again. After all, we probably don’t want it to respond every time someone discusses uptime, only when the bot is asked directly about it.

In Raku, there’s a special variable called $*INIT-INSTANT, which contains an Instant of the moment the program started. We can use this to easily get the Duration that the program has been running for.

class {
  multi method irc-to-me ($ where *.text eq 'uptime') {
    my $response = "I've been alive for";
    my ($seconds, $minutes, $hours, $days, $weeks) =
      (now - $*INIT-INSTANT).polymod(60, 60, 24, 7);

    $response ~= " $weeks weeks" if $weeks;
    $response ~= " $days days" if $days;
    $response ~= " $hours hours" if $hours;
    $response ~= " $minutes minutes" if $minutes;
    $response ~= " $seconds seconds" if $seconds;

    $response ~ '.';
  }
}

Now, whenever you ask the bot for uptime, it will respond with a human friendly uptime notification.

 <@tyil> uptime
 <@tyil> raku-advent: uptime
 <raku-advent> tyil, I've been alive for 5 minutes 8 seconds.

User points

Most channels have a bot that keeps track of user points, or karma as it’s sometimes referred to. There’s a module already that does this for us, called IRC::Client::Plugin::UserPoints. We don’t have to do much apart from installing it and adding it to the list of plugins.

zef install IRC::Client::Plugin::UserPoints

Once this finishes, the module can be used in your code. You will need to import it with a use statement, which you can put directly under the use IRC::Client line.

use IRC::Client;
use IRC::Client::Plugin::UserPoints;

Now, in the list of plugins, add it as a new entry.

plugins => [
  IRC::Client::Plugin::UserPoints.new,
  class {
    ...
  },
],

This plugin makes the bot respond to !scores, !sum and whenever a nick is given points using a ++ suffix, for instance, tyil++.

 <@tyil> raku++
 <@tyil> raku++
 <@tyil> !scores
 <raku-advent> tyil, « raku » points: main: 2

Finding plugins

All plugins for IRC::Client that are shared on the community have the prefix IRC::Client::Plugin::, so you can search for that on modules.perl6.org to find plugins to use. Of course, you can easily add your own plugins to the ecosystem as well!

Winding down

As you can see, with some very simple code you can add some fun or important
tools to your IRC community using the Raku programming language. Try it out and
have some fun, and share your ideas with others!

Day 11 – Packaging with Libarchive

Distributing physical gifts involves wrapping them up into packages, but suppose you want to distribute digital gifts. How can you use Raku to help you wrap them up? Enter Libarchive!

Simple wrapping files into a package

Let’s wrap up just two files, myfile1 and myfile2 into a single package.zip file. (Libarchive just as easily creates tar files, cpio, rar, even iso9660 images for cds or dvds.)

use Libarchive::Simple;

given archive-write('package.zip') {
    .add: 'myfile1', 'myfile2';
    .close;
}

This very simple syntax looks a little weird for those unfamiliar… here is a more ‘traditional’ way of writing the same thing:

use Libarchive::Write;

my $handle = Libarchive::Write.new('package.zip');
$handle.add('myfile1', 'myfile2');
$handle.close;

What is the difference? Libarchive::Simple provides a few shorthand routines for accessing the various Libarchive functionalities. One of these is archive-write() which is identical to Libarchive::Write.new().

The second example takes the return from new() and stores it in the variable $handle. Then we call two methods on that variable to add the files, and close the file.

The given statement makes this even simpler by topicalizing that variable, that is, storing it in the topic variable $_. Since $_ can be used as the default object for method calls, we don’t need to explicitly refer to it when calling methods.

.add('myfile1') is equivalent to $_.add('myfile1')

But what happened to the parentheses? Another little shorthand when calling methods — rather than surrounding your arguments to a method with parentheses, you can just precede them with a colon:

.add: 'myfile1';

Nice! I love programming with Raku!

Package a bunch of files by smartmatching

A handy routine to help in your packaging is dir(). It will return a lazy list of IO::Path objects for a directory. By happy coincidence, Libarchive add can take IO::Path just as easily as a filename.

given archive-write('package.zip') {
    .add: 'mydir', dir('mydir');
    .close;
}

Note we’ve added the directory itself first, then used dir() to get a list of the files inside mydir, which also get added. If you don’t include the directory itself, it won’t be part of the package. That works fine most of the time, depending on your format and your unpackaging program, but it is good practice to include the directory to make sure it gets created the way you want it to.

dir has an extra feature — it can filter the directory by smartmatching the string with a :test argument. Let’s include only jpeg files, allowing them to end in either .jpg or .jpeg:

given archive-write('package.zip') {
    .add: 'mydir', dir('mydir', test => /:i '.' jpe?g $/);
    .close;
}

Ecosystem modules like File::Find or Concurrent::File::Find can easily generate even more complicated lists of files for including by recursively adding an entire hierarchy to the package.
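
For instance, with File::Find’s find routine, something along these lines should do the trick (an untested sketch; check the module’s README for the exact options):

use Libarchive::Simple;
use File::Find;

given archive-write('package.zip') {
    .add: 'mydir', find(dir => 'mydir');   # recursively add everything under mydir
    .close;
}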

Create your files on the fly while packaging

You aren’t limited to adding existing files. You can use the write() method to generate a file for the package on the fly. You can specify content as a Str, a Blob, or even an IO::Handle or IO::Path to get the content from.

given archive-write('package.zip') {
    .write: 'myfile', q:to/EOF/;
        Myfile
        ------
        A special file for a special friend!
        EOF
    .close;
}

Here we’re using a special Raku quoting construct called the heredoc.

The q:to/EOF/ says to use the lines following up until the EOF marker and make them into the content of a file named ‘myfile’ included in the package file. As a friendly benefit, the amount of indentation of the terminator is automatically removed from each of the quoted lines. How convenient!
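
And since write() also accepts a Blob or an IO::Handle/IO::Path as the content source, as mentioned above, variations like this sketch are possible too:

given archive-write('package.zip') {
    .write: 'greeting.txt', 'hello.txt'.IO;       # content read from an existing file
    .write: 'magic.bin', Blob.new(0xDE, 0xAD);    # raw bytes
    .close;
}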

Stream your package instead of making files

Making files with your packages is great and all, but I’ve got a web site I want to return my custom CD images from — why bother with a temporary file? Just stream the output on the fly!

For this example, we’re going to stream the package as an iso9660 file (the image used for CD writers) to STDOUT, but you can stream to other programs too.

given archive-write($*OUT, format => 'iso9660') {
    .add: 'myfile1', 'myfile2', 'mydir', dir('mydir');
    .close;
}

Usually the format can be inferred from the suffix on a specified filename, but since we are streaming there is no filename, so the format must be specified. $*OUT is a special filehandle that is automatically opened for you for writing to STDOUT.

Burn that image to a CD and mount it and you’ll see the specified files. So easy!

Libarchive has so many cool features it would take days to go over them all, but I hope this little intro has whet your appetite for packaging things up. Raku has fantastic syntax, features and expressivity that make it so easy to interface with libraries like this.

Have fun packaging your own things, be they physical or digital!

Day 10 – A Teaser

Santa has a special treat: a teaser if you will. A part of a chapter from the upcoming book “Migrating Perl to Raku”, to be published January 2020.


Optimization Considerations

If you are an experienced Perl programmer, you have (perhaps inadvertently) learned a few tricks to make execution of your Perl program faster. Some of these idioms work counter-productively in Raku. This chapter deals with some of them and provides the alternative idioms in Raku.

Blessed Hashes Vs. Objects

Objects in Perl generally consist of blessed hashes. As we’ve seen before, this implies that accessors need to be made for them. Which means an extra level of indirection and overhead. So many Perl programmers “know” that the object is basically a hash, and access keys in the blessed hash directly.

Consequently, many Perl programmers decide to forget about creating objects altogether and just use hashes. Which is functionally ok if you’re just using the hash as a store for keys and associated values. But in Raku it is better to actually use objects for that from a performance point of view. Take this example where a hash is created with two keys / values:

    # Raku

    for ^1000000 {    # do this a million times
        my %h = a => 42, b => 666;
    }

    say now - INIT now;  # 1.4727555

Now, if we use an object with two attributes, this is more than 4x as fast:

    # Raku

    class A {
        has $.a;
        has $.b;
    }

    for ^1000000 {
        my $obj = A.new(a => 42, b => 666);
    }

    say now - INIT now;  # 0.3511395

But, you might argue, accessing the keys in the hash will be faster than calling an accessor to fetch the values?

Nope. Using accessors in Raku is faster as well. Compare this code:

    # Raku

    my %h = a => 42, b => 666;

    for ^10000000 {    # do this ten million times
        my $a = %h<a>;
    }

    say now - INIT now;  # 0.4713363


To:

    # Raku

    class A {
        has $.a;
        has $.b;
    }

    my $obj = A.new(a => 42, b => 666);

    for ^10000000 {
        my $a = $obj.a;
    }

    say now - INIT now;  # 0.36870995

Note that using accessors is also faster; not by much, but still significantly so.

So why is the accessor method faster in Raku? Well, really because Raku is able to optimise access to the attributes of an object as a list, easily indexed by a number internally. Whereas for a hash lookup, the key string must be hashed before it can be looked up. And that takes a lot more work than just a lookup by index.

Of course, as with all benchmarks, this is just a snapshot in time. Optimisation work continues to be performed on Raku, which may change the outcome of these tests in the future. So, always test yourself if you want to be sure that some optimisation is a valid approach, but only if you’re really interested in squeezing performance out of your Raku code. Remember, premature optimisation is the root of all evil!


Santa hopes you liked it.

Day 9: a chain (or Russian doll) of containers

A lot of electronics in a couple of containers

If you’re in the business, you’ve probably by now heard about containers. They can be described as executables on steroids, or also, as their namesake, a great way of shipping applications anywhere, or having them stored and ready to use whenever you need them. These kinda-executables are called images, and you can find them in a number of places called registries, starting with Docker Hub, joined lately by the GitHub Container Registry and other places like Quay.io or the RedHat container catalog. These last ones are up and coming, and you have to add them to your default configuration. Most containers are registered in Docker Hub anyway.

Also, since they are kinda-executables, they are architecture and operating system specific. Docker hub marks their architecture and operating system, with Linux being the most common ones. You can, however, run Linux images everywhere, as long as the Docker daemon is running in a Linux virtual machine; that is also the default configuration in Macs.

Of course, there’s a slew of containers you can use with Raku, even if we don’t really have one we can call official. I’m going to go with my own, since, well, I’m more familiar with them. But there’re these nightly images by tyil, for instance, or the kinda-official Rakudo Star images, which have not been updated since, well, Rakudo Star itself was updated last March.

The Alps as seen from a plane

Let’s start with the basic image, the tiny Russian Doll with Nicky the tsar. Since it’s going to be inside, we need to make it real tiny. Here it is, jjmerelo/alpine-perl6:

FROM alpine:latest
LABEL version="2.2" maintainer="JJMerelo@GMail.com" perl6version="2019.11"

# Environment
ENV PATH="/root/.rakudobrew/versions/moar-2019.11/install/bin:/root/.rakudobrew/versions/moar-2019.11/install/share/perl6/site/bin:/root/.rakudobrew/bin:${PATH}" \
    PKGS="curl git perl" \
    PKGS_TMP="curl-dev linux-headers make gcc musl-dev wget" \
    ENV="/root/.profile" \
    VER="2019.11"

# Basic setup, programs and init
RUN mkdir /home/raku \
    && apk update && apk upgrade \
    && apk add --no-cache $PKGS $PKGS_TMP \
    && git clone https://github.com/tadzik/rakudobrew ~/.rakudobrew \
    && echo 'eval "$(~/.rakudobrew/bin/rakudobrew init Sh)"' >> ~/.profile \
    && eval "$(~/.rakudobrew/bin/rakudobrew init Sh)"\
    && rakudobrew build moar $VER \
    && rakudobrew global moar-$VER \
    && rakudobrew build-zef\
    && zef install Linenoise App::Prove6\
    && apk del $PKGS_TMP \
    && RAKUDO_VERSION=`sed "s/\n//" /root/.rakudobrew/CURRENT` \
    && rm -rf /root/.rakudobrew/${RAKUDO_VERSION}/src /root/zef \
       /root/.rakudobrew/git_reference

# Runtime
WORKDIR /home/raku
ENTRYPOINT ["raku"]

This image was created just last week, after the release of Raku 2019.11, the first one to actually be called Raku and the one that calls its executable raku too.

First thing you see is that FROM which declares the teeny container that’s inside this one. We’re using Alpine Linux, a distribution little known outside the containerer community, that uses a couple of tricks to avoid bloating the number of files, and thus the size, of the container. This image will add up to less than 300 MBs, while an equivalent image with Debian or Ubuntu will be twice as much. That means that downloading it will take half as much, which is what we’re looking for.

Because there’s this thing, too: real-life containers, when empty, can be Russian-dolled and put inside one another so that they don’t occupy so much space. Something similar happens with software containers. They are built by putting layers on top of each other, the innermost layer usually being an operating system. Let’s check out the rest.

The next LABELs are simply tags or metadata that can be extracted from the image by inspection. Not really that important.

But the ENV block kinda is, above all the first variable, which defines the PATH that is going to be used across the Russian doll buildup. The rest of the variables are mainly used while building the image. They will also help to make it somewhat generic, so that we can just change the value of a variable and get a new version; we put that into VER.

So far, no building has taken place, but in this humongous RUN statement is where we download rakudobrew, put it to work building the version contained in VER, set that version as the default one, install zef and a couple of modules we are going to need, and then delete what we will no longer be needing in the rest of the outer dolls to keep the whole thing small.

Finally, after setting up a working directory, we define an entry point, which is the real executable-within-the-executable. The container can be used in place of this command, so that anything that can be done with raku can be done with this executable. For instance, let’s run this program:

my @arr;
my ($a, $b) = (1,1);
for ^5 {
    ($a,$b) = ($b, $a+$b);
    @arr.push: ($a.item, $b.item);
    say @arr
};
say @arr;

We will give our containerized Raku an alias:

alias raku-do='docker run --rm -t -v `pwd`:/home/raku  jjmerelo/alpine-perl6'

We can run the program above with:

raku-do itemizer-with-container.p6

But you can take it a step further. Create this shell script and put it in the path:

#!/bin/bash

docker run --rm -t -v `pwd`:/home/raku  jjmerelo/alpine-perl6 $@

You can then use this in the shebang line: #!/usr/bin/env raku-do.sh. This will create a throwaway container, ephemerally spun up to run the script and then thrown away (that’s the --rm in the line). The current directory (pwd) will be aliased to /home/raku, remember, our working directory, which means that the raku inside the container will see it right there. You see? With this you can have raku run wherever docker is installed. Pretty much everywhere, nowadays.
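
As a quick sketch (assuming the wrapper above is saved as raku-do.sh somewhere in your PATH), such a script could be as simple as:

#!/usr/bin/env raku-do.sh
# hello-container.raku: executed by the raku inside the jjmerelo/alpine-perl6 container
say "Hello from the containerized Raku, compiler version { $*PERL.compiler.version }";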

But let’s build up on this. Containers are extensively used for testing, since instead of building and installing, you can put everything in a single container, and download and use it for testing straight away. That is actually what made a containerer out of me: the long 20 minutes it took to brew rakudo for a few seconds of testing for every module. After that base container, I created this one, jjmerelo/test-perl6. Here it is:

FROM jjmerelo/alpine-perl6:latest
LABEL version="4.0.2" maintainer="JJ Merelo <jjmerelo@GMail.com>"

# Set up dirs
RUN mkdir /test
VOLUME /test
WORKDIR /test


# Will run this
ENTRYPOINT perl6 -v && zef install --deps-only . && zef test .

This is actually simplicity itself: the only thing that changes is the entrypoint and the working dir. Instead of directly running the raku compiler, it prints the compiler version, installs the dependencies needed to run the tests, and then issues zef test . to run them.

That really speeds things up when testing. Put it in your .travis.yml file this way:

language: minimal

services:
  - docker

install: docker pull jjmerelo/test-perl6

script: docker run -t -v  $TRAVIS_BUILD_DIR:/test jjmerelo/test-perl6

And you’re good to go. Takes all of a minute and a half, as opposed to more than 20 minutes if you use the official Travis image, which is based on rakudobrew.

The Russian doll does not stop there: jjmerelo/perl6-test-openssl includes additional Alpine packages which are needed to install OpenSSL. And, based on that one, jjmerelo/perl6-doccer, which packs everything that’s needed to test the Raku documentation.

You should really try this yourself. If you’ve got even just a few additional modules to download when testing your module, just build up from the test-perl6 image and get your own! You’ll save time, and also save computing time, thus saving energy.

Actual Russian, or maybe Latvian, dolls, bought in Riga during PerlCon

 

Day 8 – Parsing Firefox’ user.js with Raku (Part 2)

Yesterday, we made a short Grammar that could parse a single line of the user.js that Firefox uses. Today, we’ll be adding a number of test cases to make sure everything we want to match will match properly. Additionally, the Grammar can be expanded to match multiple lines, so we can let the Grammar parse an entire user.js file in a single call.

Adding more tests

To get started with matching other argument types, we should extend the list of
test cases that are defined in MAIN. Let’s add a couple to match true,
false, null and integer values.

my @inputs = (
  'user_pref("browser.startup.homepage", "https://searx.tyil.nl");',
  'user_pref("extensions.screenshots.disabled", true);',
  'user_pref("browser.search.suggest.enabled", false);',
  'user_pref("i.have.no.nulls", null);',
  'user_pref("browser.startup.page", 3);',
);

I would suggest to update the for loop as well, to indicate which input it is currently trying to match. Things will fail to match, and it will be easier to see which output belongs to which input if we just print it out.

for @inputs {
  say "\nTesting $_\n";
  say UserJS.parse($_);
}

If you run the script now, you’ll see that only the first test case is actually
working, while the others all fail on the argument. Let’s fix each of these
tests, starting at the top.

Matching other types

To make it easy to match all sorts of types, let’s introduce a proto regex. This will help keep everything in small, manageable blocks. Let’s also rename the argument rule to constant, which will more aptly describe the things we’re going to match with them. Before adding new functionalities, let’s see what the rewritten structure would be.

rule argument-list {
  '('
  <( <constant>+ % ',' )>
  ')'
}

proto rule constant { * }

rule constant:sym<string> {
  '"'
  <( <-["]>+? )>
  '"'
}

As you can see, I’ve given the constant the sym adverb named string. This makes it easy to see for us that it’s about constant strings. Now we can also easily add additional constant types, such as booleans.

rule constant:sym<boolean> {
  | 'true'
  | 'false'
}

This will match both the bare words true and false. Adding just this and running the script once more will show you that the next two test cases are now working. Adding the null type is just as easy.

rule constant:sym<null> {
  'null'
}

Now all we need to pass the 5th test case is parsing numbers. In JavaScript, everything is a float, so let’s stick to that for our Grammar as well. Let’s accept one or more numbers, optionally followed by both a dot and another set of numbers. Of course, we should also allow a - or a + in front of them.

rule constant:sym<float> {
  <[+-]>? \d+ [ "." \d+ ]?
}

Working out some edge cases

It looks like we can match all the important types now. However, there are some edge cases that are allowed that aren’t going to work yet. A big one is of course a string containing a ". If we add a test case for this, we can see it failing when we run the script.

my @inputs = (
  ...
  'user_pref("double.quotes", "\"my value\"");',
);

To fix this, we need to go back to constant:sym<string>, and alter the rule to take escaped double quotes into account. Instead of looking for any character that is not a ", we can alter it to lazily match anything up to a closing " that does not directly follow a \, because that one would be escaped.

rule constant:sym<string> {
  '"'
  <( .*? <!after '\\'> )>
  '"'
}

Parsing multiple lines

Now that it seems we are able to handle all the different user_pref values that Firefox may throw at us, it’s time to update the script to parse a whole file. Let’s move the inputs we have right now to user.js, and update the MAIN subroutine to read that file.

sub MAIN () {
  say UserJS.parse('user.js'.IO.slurp);
}

Running the script now will print a Nil value on STDOUT, but if you still have Grammar::Tracer enabled, you’ll also notice that it has no complaints. It’s all green!

The problem here is that the TOP rule is currently instructed to only parse a single user_pref line, but our file contains multiple of such lines. The parse method of the UserJS Grammar expects to match the entire string it is told to parse, and that’s causing the Grammar to ultimately fail.

So, we’ll need to alter the TOP rule to allow matching of multiple lines. The easiest way is to wrap the current contents into a group, and add a quantifier to that.

rule TOP {
  [
    <function-name>
    <argument-list>
    ';'
  ]*
}

Now it matches all lines, and correctly extracts the values of the user_pref statements again.

Any comments?

There is another edge case to cover: comments. These are allowed in the user.js file, and when looking up such files online for preset configurations, they’re often making extensive use of them. In JavaScript, comments start with // and continue until the end of the line.

We’ll be using a token instead of a rule for this, since that doesn’t handle whitespace for us. The newline is a whitespace character, and is significant for a comment to denote its end. Additionally, the TOP rule needs some small alteration again to accept comment lines as well. To keep things readable, we should move the current contents of the matching group over to its own rule.

rule TOP {
  [
  | <user-pref>
  | <comment>
  ]*
}

token comment {
  '//'
  <( <-[\n]>* )>
  "\n"
}

rule user-pref {
  <function-name>
  <argument-list>
  ';'
}

Now you should be able to parse comments as well. It shouldn’t matter whether they are on their own line, or after a user_pref statement.

Make it into an object

What good is parsing data if you can’t easily play with it afterwards? So, let’s make use of Grammar Actions to transform the Match objects into a list of UserPref objects. First, let’s declare what the class should look like.

class UserPref {
  has $.key;
  has $.value;

  submethod Str () {
    my $value;

    given ($!value) {
      when Str  { $value = "\"$!value\"" }
      when Num  { $value = $!value }
      when Bool { $value = $!value ?? 'true' !! 'false' }
      when Any  { $value = 'null' }
    }

    sprintf('user_pref("%s", %s);', $!key, $value);
  }
}

A simple class containing a key and a value, and some logic to turn it back into a string usable in the user.js file. Next, creating an Action class to make these objects. An Action class is like any regular class. All you need to pay attention to is to name the methods the same as the rules used in the Grammar.

class UserJSActions {
  method TOP ($/) {
    make $<user-pref>.map({
      UserPref.new(
        key   => $_<argument-list><constant>[0].made,
        value => $_<argument-list><constant>[1].made,
      )
    })
  }

  method constant:sym<boolean> ($/) {
    make (~$/ eq 'true' ?? True !! False)
  }

  method constant:sym<float> ($/) {
    make $/.Num
  }

  method constant:sym<null> ($/) {
    make Any
  }

  method constant:sym<string> ($/) {
    make ~$/
  }
}

The value methods convert the values as seen in the user.js to Raku types. The TOP method maps over all the user_pref statements that have been parsed, and turns each of them into a UserPref object. Now all that is left is to add the UserJSActions class as the Action class for the parse call in MAIN, and use its made value.

sub MAIN () {
  my $match = UserJS.parse('user.js'.IO.slurp, :actions(UserJSActions));

  say $match.made;
}

Now we can also do things with it. For instance, we can sort all the user_pref statements alphabetically.

sub MAIN () {
  my $match = UserJS.parse('user.js'.IO.slurp, :actions(UserJSActions));
  my @prefs = $match.made;

  for @prefs.sort(*.key) {
    .Str.say
  }
}

Sorting alphabetically may be a bit boring, but you have all sorts of possibilities now, such as filtering out certain options or comments, or merging in multiple files from multiple sources.
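
Or, as a small sketch building on the same objects, keep only the prefs in the browser.* namespace:

for @prefs.grep(*.key.starts-with('browser.')).sort(*.key) {
  .Str.say
}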

I hope this has been an interesting journey into parsing a whole other programming language using Raku’s extremely powerful Grammars!

The complete code

parser.pl6

class UserPref {
  has $.key;
  has $.value;

  submethod Str () {
    my $value;

    given ($!value) {
      when Str  { $value = "\"$!value\"" }
      when Num  { $value = $!value }
      when Bool { $value = $!value ?? 'true' !! 'false' }
      when Any  { $value = 'null' }
    }

    sprintf('user_pref("%s", %s);', $!key, $value);
  }
}

class UserJSActions {
  method TOP ($/) {
    make $<user-pref>.map({
      UserPref.new(
        key   => $_<argument-list><constant>[0].made,
        value => $_<argument-list><constant>[1].made,
      )
    })
  }

  method constant:sym<boolean> ($/) {
    make (~$/ eq 'true' ?? True !! False)
  }

  method constant:sym<float> ($/) {
    make $/.Num
  }

  method constant:sym<null> ($/) {
    make Any
  }

  method constant:sym<string> ($/) {
    make ~$/
  }
}

grammar UserJS {
  rule TOP {
    [
    | <user-pref>
    | <comment>
    ]*
  }

  token comment {
    '//' <( <-[\n]>* )> "\n"
  }

  rule user-pref {
    <function-name>
    <argument-list>
    ';'
  }

  rule function-name {
    'user_pref'
  }

  rule argument-list {
    '('
    <( <constant>+ % ',' )>
    ')'
  }

  proto rule constant { * }

  rule constant:sym<string> {
    '"'
    <( .*? <!after '\\'> )>
    '"'
  }

  rule constant:sym<boolean> {
    | 'true'
    | 'false'
  }

  rule constant:sym<null> {
    'null'
  }

  rule constant:sym<float> {
    <[+-]>? \d+ [ "." \d+ ]?
  }
}

sub MAIN () {
  my $match = UserJS.parse('user.js'.IO.slurp, :actions(UserJSActions));
  my @prefs = $match.made;

  for @prefs.sort(*.key) {
    .Str.say
  }
}

user.js

// Comments are welcome!

user_pref("browser.startup.homepage", "https://searx.tyil.nl");
user_pref("extensions.screenshots.disabled", true); //uwu
user_pref("browser.search.suggest.enabled", false);
user_pref("i.have.no.nulls", null);
user_pref("browser.startup.page", +3);
user_pref("double.quotes", "\"my value\"");

Day 7 – Parsing Firefox’ user.js with Raku

One of the simplest ways to properly configure Firefox, and make the configuration syncable between devices without the need for third-party services, is through the user.js file in your Firefox profile. This is a simple JavaScript file that generally contains a list of user_pref function calls. Today, I’ll be showing you how to use the Raku programming language’s Grammars to parse the content of a user.js file. Tomorrow, I’ll be expanding on the basis created here, to allow people to programmatically interact with the user.js file.

The format

Let’s take a look at the format of the file first. As an example, let’s use the startup page configuration setting from my own user.js.

user_pref("browser.startup.homepage", "https://searx.tyil.nl");

Looking at it, we can deconstruct one line into the following elements:

  • Function name: in our case this will almost always be the string user_pref;
  • Opening bracket;
  • List of arguments, separated by ,;
  • Closing bracket;
  • A ; ending the statement.

We can also see that string arguments are enclosed in ". Integers, booleans and null values aren’t quoted in JavaScript, so that’s something we need to take into account as well. But let’s set those aside for now, and first get the example line parsed.

Setting up the testing grounds

I find one of the easiest ways to get started with writing a Grammar is to just write a small Raku script that I can execute to see if things are working, and then extend the Grammar step by step. The starting situation would look like this.

grammar UserJS {
  rule TOP { .* }
}

sub MAIN () {
  my @inputs = ('user_pref("browser.startup.homepage", "https://searx.tyil.nl");');

  for @inputs {
    say UserJS.parse($_);
  }
}

Running this script should yield a single Match object containing the full test string.

「user_pref("browser.startup.homepage", "https://searx.tyil.nl");」

The 「 and 」 markers indicate that we have a Match object, which in this case signifies that the Grammar parsed the input correctly. This is because of the placeholder .* that we’re starting out with. Our next steps will be to add rules in front of the .* until that particular bit doesn’t match anything anymore, and we have defined explicit rules for all parts of the user.js file.

Adding the first rule

The example starts with the static string user_pref, so let’s start by matching that with the Grammar. Since this is the name of the function, we’ll add a rule named function-name to the grammar, which just has to match a static string.

rule function-name {
  'user_pref'
}

Next, this rule needs to be incorporated into the TOP rule, so it will actually be used. Rules are whitespace insensitive, so you can reformat the TOP rule to put each element we’re looking for on its own line. This will make it more readable in the long run, as more things will be tacked on as we continue.

rule TOP {
    <function-name>
    .* 
}

Running the script now will yield a little more output than before.

「user_pref("browser.startup.homepage", "https://searx.tyil.nl");」
 function-name => 「user_pref」

The first line is still the same, which is the full match. It’s still matching everything, which is good. If it didn’t, the match would fail and it would return a Nil. This is why we keep the .* at the end.

There’s an extra line this time, though. This line shows the function-name rule having a match, and the match being user_pref. This is in line with our expectations, as we told it to match that literal, exact string.

Parsing the argument list

The next part to match is the argument list, which consists of an opening bracket, a closing bracket, and a number of arguments in between them. Let’s make another rule to parse this part. It may be a bit naive for now; we will improve on it later.

rule argument-list {
  '('
  .+
  ')'
}

Of course, the TOP rule will need to be expanded to include this as well.

rule TOP {
    <function-name>
    <argument-list> 
    .* 
}

Running the script will yield another line, indicating that the argument-list rule matches the entire argument list.

「user_pref("browser.startup.homepage", "https://searx.tyil.nl");」
 function-name => 「user_pref」
 argument-list => 「("browser.startup.homepage", "https://searx.tyil.nl")」

Now that we know this basic rule works, we can try to improve it to be more accurate. It would be more convenient if we could get a list of arguments out of it, and not include the brackets. Removing the brackets is the easier part, so let’s do that first. You can use the <( and )> markers to indicate where the result of the match should start and end respectively.

rule argument-list {
  '('
  <( .+ )>
  ')' 
}

You can see that the output of the script now doesn’t show the brackets on the argument-list match. Now, to make a list of the arguments, it would be easiest to create an additional rule to match a single argument, and match the , as a separator for the arguments. We can use the % operator for this.

rule argument-list {
  '('
  <( <argument>+ % ',' )>
  ')'
}

rule argument {
  .+
}

However, when you try to run this, all you’ll see is a Nil as output.

Debugging a grammar

Grammars are quite a hassle to debug without any tools, so I would not recommend trying that. Instead, let’s use a module that makes this much easier: Grammar::Tracer. This will show information on how the Grammar is matching all the stuff. If you use Rakudo Star, you already have this module installed. Otherwise, you may need to install it.

zef install Grammar::Tracer

Now you can use it in the script by adding use Grammar::Tracer at the top of the script, before the grammar declaration. Running the script now will yield some content before you see the Nil.

TOP
| function-name
| * MATCH "user_pref"
| argument-list
| | argument
| | * MATCH "\"browser.startup.homepage\", \"https://searx.tyil.nl\");"
| * FAIL
* FAIL

Looking at this, you can see that an argument is being matched, but it’s being too greedy. It matches all characters up until the end of the line, so the argument-list can’t match the closing bracket anymore. To fix this, we must update the argument rule to be less greedy. For now, we’re just matching strings that appear within double quotes, so let’s change the rule to more accurately match that.

rule argument {
  '"'
  <( <-["]>+? )>
  '"'
}

This rule matches a starting ", then any character that is *not* a ", then another ". There’s also <( and )> in use again to make the surrounding " not end up in the result. If you run the script again, you will see that the argument-list contains two argument matches.

「user_pref("browser.startup.homepage", "https://searx.tyil.nl");」
 function-name => 「user_pref」
 argument-list => 「"browser.startup.homepage", "https://searx.tyil.nl"」
  argument => 「browser.startup.homepage」
  argument => 「https://searx.tyil.nl」

I’m ignoring the output of Grammar::Tracer for now, since no problems are arising. I would generally suggest just leaving it in there until you’re completely satisfied with your Grammars, so you can immediately see what’s going wrong where during development.

The statement’s end

Now all that’s left to explicitly match in the TOP rule is the statement terminator, ;. This can replace the .*, since it’s the last character of the string.

rule TOP {    
    <function-name>
    <argument-list>
    ';' 
}

The final Grammar should look like this.

grammar UserJS {
  rule TOP {
    <function-name>
    <argument-list>
    ';'
  }

  rule function-name {
    'user_pref'
  }

  rule argument-list {
    '('
    <( <argument>+ % ',' )>
    ')'
  }

  rule argument {
    '"'
    <( <-["]>+? )>
    '"'
  }
}

Now, the problem here is that it’s still quite naïve. It won’t deal with double quotes inside strings, nor with Boolean values or integers. The current Grammar is also not capable of matching multiple lines. All of these problems can be solved, some more easily than others. Come back here tomorrow to learn how!
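
For a small taste of one of those fixes: the complete grammar that appears earlier in this document handles escaped double quotes with a negative lookbehind, so a " preceded by a \ does not end the string. Applied to the argument rule from this post, the idea looks roughly like this:

rule argument {
  '"'
  <( .*? <!after '\\'> )>  # stop only at a " that is not preceded by a backslash
  '"'
}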

Day 6 – Put some (GitHub) Actions in your Raku (repositories)

After being in beta for quite some time, GitHub actions were finally introduced to the general public in November 2019. They have very quickly become ubiquitous, above all in combination with the other release recently made by GitHub: the package (and container) registry.

We can put them to good use with our Raku modules. We’ll see how.

We could use some action

An action is a script that is triggered by an event in your repository. In principle, anything you or a program does when interacting with a repository could trigger an action. Of course, this includes git operations, basically pushing to the repository, but also all kinds of things happening in the repository, from changes in the wiki to a review being added to a pull request.

And what kind of things can you do? GitHub creates a container with some basic toolchains, as well as language interpreters and compilers of your choice. At the very basic level, what you have is a container where you can run a script triggered by an event.

GitHub actions reside in a YAML file placed within the .github/workflows directory in your repository. Let’s go for our first one:
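
A minimal workflow fitting that description could look like this (the file name .github/workflows/merry.yml, the job key and the step name are made up for illustration):

name: Merry Christmas
on: [push]
jobs:
  merry:                    # the job key; any name works here
    runs-on: ubuntu-latest
    steps:
      - name: Merry Xmas    # the step name
        run: echo "Merry Xmas!"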

This script is as simple as it gets. It contains a single job, with a single step. Let’s go little by little:

  • We give it a name, “Merry Christmas”. That name will show up in your list of actions
  • on is the list of events that will trigger this action. We will just list a single event.
  • jobs is a map that will include the jobs to be run.
  • Every job will have its own key in that map, which will be used to refer to it (and also to store variables, more on this later), and can run in its own environment, which you have to select. We’ll take ubuntu-latest, which is a Bionic box, but there are others to choose from (more on this later).
  • A job has a series of steps, each with a name and then a sequence of commands. run will run its commands in whatever environment is defined for that specific step; in this case, a simple shell command that prints Merry Xmas!

Since we’ve instructed it, via the on command, to run every time there’s a push to the repository, the Actions tab will show the result of running it, just like this. If nothing goes wrong, and how could it, since it’s simply a script, it will show green check marks and produce the result:

Merry Xmas from a GitHub Action

These steps form a kind of pipeline, and every step can produce an output or change the environment that is going to be used in the next step; that means that you can create pipe actions that just process input and produce something for an output, like this one

The first step in this action, code-named “Pre-Merry Xmas!”, declares a couple of environment variables via env. We will collate them into a single sentence. But here comes the gist of it: GitHub Actions use meta-sentences, preceded with ::, that are printed to output and interpreted as commands for the following steps. In this case, ::set-env sets an environment variable.

The next step showcases the use of Python, which is another default tool in this environment; as a matter of fact, it’s included in every environment out there, together with Node; you can use it in its default version or set the version as an action variable. This step also uses a similar mechanism to set, instead of an environment variable, an output that can be used by the next step.

Unlike Python, Ruby does not have a default version available in the path; however, it’s only a matter of finding the path to it and you can use it, like here. This step also uses the output of the previous step; GHAs have contexts, in this case a step context, which can be used to access the output of previous steps. steps.greet.outputs.printd accesses the context of the step whose id is greet (which we declared via the id key there), and since we declared the output to be called printd, outputs.printd will retrieve the output by that name. Contexts are not available from within the action environment, which is why we need to assign it first to an environment variable. Output will look like this, and it will use green check marks, as well as reveal the output in the raw log and if you click on the step name.
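
As a bare-bones illustration of those two mechanisms (this is not the exact workflow from the post; only the step id greet and the output name printd come from the description above, the rest is made up):

steps:
  - name: Pre-Merry Xmas!
    # ::set-env makes SENTENCE available to the following steps
    run: echo "::set-env name=SENTENCE::Merry Xmas!"
  - name: Greet
    id: greet
    # ::set-output publishes a value as steps.greet.outputs.printd
    run: |
      python -c "import os; print('::set-output name=printd::' + os.environ['SENTENCE'])"
  - name: Use the output
    env:
      PRINTD: ${{ steps.greet.outputs.printd }}  # contexts go into an environment variable first
    run: echo "$PRINTD"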

If you are a long-term Perl user like I am, you will miss that. Ruby, Python, Node, popular languages, fair enough. But Perl is in the base Ubuntu 16.04 install. Even if we can use that environment, it seems to have been eliminated from there. Where do we have to go to use Perl? To the Windows environments. Let’s use it to create a polite bot that greets you when you create or edit an issue:

Check out the on command first; it is set to be fired every time an issue is created, edited or assigned a milestone, an action that, for some reason, is called being milestoned.
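
On its own, that trigger would look roughly like this:

on:
  issues:
    types: [opened, edited, milestoned]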

This lawn has been milestoned

The main difference you see above is the presence of windows-latest as the environment this action will be run on. But next we see another nice thing about actions: they can simply be published on GitHub, and can be reused. This checkout action does what it says: it checks out the repo code, which is not available by default. We are not really going to run any check on the code, but we need the little Perl script we’ve created. More on this later.

The next step is the one that actually will operate when an issue is created, changed or, wait for it, milestoned. We declare two different environment variables: one will be used to comment on issues that don’t mention “Merry”, the other if they do. But the nice thing comes next: we can work with the issue body, which is available as a context variable: github.event.issue.body. The next variable is the magic key that opens the door to the GitHub API. No need to upload it or anything, it will be there ready for you, and GitHub will keep track of it and hide it wherever it appears. We will also need the issue number to comment on it, and we store it in the $ISSUE variable.

Let’s next run the action. We will use the fantastic Perl regexes to check for the presence of the word Merry in the body, using this mini-script:

print( ( ($ENV{BODY} =~ /Merry/) == 1)? $ENV{GREETING} : $ENV{HEY});

The next few PowerShell commands are, by far, the most difficult part of this article.

We run the script so that we capture, and store, the result in a variable. The next commands create PowerShell hashes, and $body is converted to JSON. By using Invoke-RestMethod we use the GitHub API to create a comment with the greeting in the issue that was milestoned, or whatever other event triggered the run.

Issue commented and milestoned

As the image above shows, there are a couple of comments: one from when the issue was created and the other, well, check the image.

However, last time we checked this was a Raku Advent Calendar, right? We want our Raku!

Using Raku in GitHub actions

Last time I checked, Raku was not among the very limited number of languages that are available in any of the environments. However, that does not mean we cannot use it. Actions can be upgraded with anything that can be installed, in the case of Windows using Chocolatey (or downloading it via curl or any other command). We’ll also use it to run a real test. Dummy, but real. All actions actually either succeed or fail; you can use that for carrying out tests. Check out this action:

Which is testing using this script:
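
A minimal test along those lines (the file name and the hard-coded fallback text are just for illustration) could be:

# t/greeting.t: check for a proper greeting with a Raku regex,
# much like the Perl one-liner earlier
use Test;

my $body = %*ENV<BODY> // 'Merry Xmas, everyone!';

ok $body ~~ / 'Merry' /, 'the text contains a proper greeting';

done-testing;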

The regex here uses the Raku syntax to perform more or less the same thing that the previous Perl script did, but let’s focus on the action above. It runs three PowerShell commands, one of them using Chocolatey to install Rakudo Star, and then set the command path and refresh it so that it can be used in the last command, the usual zef test . that actually runs the tests.

Rakudo Star has not been updated since March; a new update is coming very soon, but meanwhile, the combination Windows/GitHub Actions/Rakudo is not really the best way to go, since the bundled zef version is broken and can’t be updated from within a GitHub action.

This test takes quite a while; you have to download and install Raku every single time, plus it does not work if you need to install any additional module. Fortunately, there are many more ways to do it. Meet the Raku container.

Using dockerized actions

GitHub actions can be created in two different environments. One of them is called node12, and can run on any operating system; the other is docker, which is Linux-exclusive.

These containers will be built on the fly and then executed, with commands run directly inside the container. By default, the ENTRYPOINT of the container will be run, as usual. Previously, we used actions/checkout for checking out the repository; these official actions can be complemented with our own. In this case, we will use the Raku container action, which you can also check out in the Actions marketplace.

This action basically contains a Dockerfile, this one:

This Dockerfile does little more than establish the system PATH and an entry point that can be used for testing. It does not have anything that is Action-specific.

It uses the very basic Alpine Raku container, which is the basis for a whole series of Raku testing containers.

But again, let’s go back to where the action is, that is, er, the action.
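
Something in this vein would do (the checkout version and the exact reference to the Raku container action are assumed here):

name: AdvenTest
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v1              # version assumed
      - name: Run the tests in the Raku container
        uses: JJ/raku-container-action@master  # assumed name and ref of the container action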

Sweet and simple, right?

Yes, I couldn’t help but call the test for the Advent Calendar AdvenTest.

It checks out the repository using the official checkout action, and then runs the test, which is the default command in the Dockerfile that is created in that action. It would also install ecosystem dependencies, if there were any.

How long does this one take? Just short of 30 seconds, or one quarter of what the other one took.

Tell me more!

GitHub actions are a world of possibilities (and occasionally, also a world of pain). Containerized tools mean that you will be able to work on the repository and the world at large using your favorite language, that is, Raku, starting actions from any kind of event, interactive or periodic; for instance, you could schedule tests every week, or start deployments when tests have passed.

If you liked CI tools such as Travis or CircleCI, you will love GitHub actions. Put them to good use in your Raku repositories.