Day 7 – Parsing Firefox’ user.js with Raku

One of the simplest ways to properly configure Firefox, and to make the configuration syncable between devices without the need for third-party services, is through the user.js file in your Firefox profile. This is a simple JavaScript file that generally contains a list of user_pref function calls. Today, I’ll be showing you how to use the Raku programming language’s Grammars to parse the content of a user.js file. Tomorrow, I’ll be expanding on the basis created here, to allow people to programmatically interact with the user.js file.

The format

Let’s take a look at the format of the file first. As an example, let’s use the startup page configuration setting from my own user.js.

user_pref("browser.startup.homepage", "https://searx.tyil.nl");

Looking at it, we can deconstruct one line into the following elements:

  • Function name: in our case this will almost always be the string user_pref;
  • Opening bracket;
  • List of arguments, separated by ,;
  • Closing bracket;
  • A ; ending the statement.

We can also see that string arguments are enclosed in ". Integers, booleans and null values aren’t quoted in JavaScript, so that’s something we need to take into account as well. But let’s set those aside for now, and first get the example line parsed.

Setting up the testing grounds

I find one of the easiest ways to get started with writing a Grammar is to just write a small Raku script that I can execute to see if things are working, and then extend the Grammar step by step. The starting situation would look like this.

grammar UserJS {
  rule TOP { .* }
}

sub MAIN () {
  my @inputs = ('user_pref("browser.startup.homepage", "https://searx.tyil.nl");');

  for @inputs {
    say UserJS.parse($_);
  }
}

Running this script should yield a single Match object containing the full test string.

「user_pref("browser.startup.homepage", "https://searx.tyil.nl");」

The 「 and 」 markers indicate that we have a Match object, which in this case signifies that the Grammar parsed the input correctly. This is because of the placeholder .* that we’re starting out with. Our next steps will be to add rules in front of the .* until that particular bit doesn’t match anything anymore, and we have defined explicit rules for all parts of the user.js file.
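To make the difference between a successful and a failed parse concrete, here is a minimal sketch. The grammar name OnlyUserPref and the second input string are made up for illustration; the only behaviour relied upon is that parse returns a Match on success and Nil on failure.

```raku
grammar OnlyUserPref {
    # Hypothetical, deliberately strict rule: it accepts nothing but this literal.
    rule TOP { 'user_pref' }
}

for 'user_pref', 'something_else' -> $input {
    # `with` only enters its block when the parse result is defined (a Match).
    with OnlyUserPref.parse($input) -> $match {
        say "parsed: $match";
    }
    else {
        say "could not parse: $input";
    }
}
```

The first input ends up in the `with` branch, the second in the `else` branch.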

Adding the first rule

Since the example starts with the static string user_pref, let’s start on matching that with the Grammar. Since this is the name of the function, we’ll add a rule named function-name to the grammar, which just has to match a static string.

rule function-name {
  'user_pref'
}

Next, this rule needs to be incorporated into the TOP rule, so it will actually be used. Whitespace in a rule is not matched literally (it stands for whitespace that may appear in the input at that point), so you can lay out the TOP rule with each element we’re looking for on its own line. This will make it more readable in the long run, as more things will be tacked on as we continue.

rule TOP {
    <function-name>
    .* 
}

Running the script now will yield a little more output than before.

「user_pref("browser.startup.homepage", "https://searx.tyil.nl");」
 function-name => 「user_pref」

The first line is still the same, which is the full match. It’s still matching everything, which is good. If it didn’t, the match would fail and it would return a Nil. This is why we keep the .* at the end.

There’s an extra line this time, though. This line shows the function-name rule having a match, and the match being user_pref. This is in line with our expectations, as we told it to match that literal, exact string.
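The extra line in the output corresponds to a named capture on the Match object, which can also be read from code. A small sketch, using the grammar as it stands at this point (the variable name $match is just for illustration):

```raku
grammar UserJS {
    rule TOP           { <function-name> .* }
    rule function-name { 'user_pref' }
}

my $match = UserJS.parse('user_pref("browser.startup.homepage", "https://searx.tyil.nl");');

# Each rule that was called by name is available under that name on the Match.
say $match<function-name>.Str;    # user_pref
```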

Parsing the argument list

The next part to match is the argument list, which consists of an opening bracket, a closing bracket to match and a number of arguments in between them. Let’s make another rule to parse this part. It may be a bit naive for now, but we will improve on this later.

rule argument-list {
  '('
  <-[)]>+  # anything but ')'; a plain .+ would swallow the ')' too, as rules don't backtrack
  ')'
}

Of course, the TOP rule will need to be expanded to include this as well.

rule TOP {
    <function-name>
    <argument-list> 
    .* 
}

Running the script will yield another line, indicating that the argument-list rule matches the entire argument list.

「user_pref("browser.startup.homepage", "https://searx.tyil.nl");」
 function-name => 「user_pref」
 argument-list => 「("browser.startup.homepage", "https://searx.tyil.nl")」

Now that we know this basic rule works, we can try to improve it to be more accurate. It would be more convenient if we could get a list of arguments out of it, and not include the brackets. Removing the brackets is the easier part, so let’s do that first. You can use the <( and )> markers to indicate where the result of the match should start and end respectively.

rule argument-list {
  '('
  <( <-[)]>+ )>  # <-[)]> is any character except ')'
  ')'
}
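The effect of the capture markers can also be seen in isolation, outside of a grammar. This is only a sketch on a shortened input; the variable names are illustrative.

```raku
my $text = '("browser.startup.homepage")';

# Without markers, the brackets are part of the match.
my $with-brackets = ~($text ~~ / '(' <-[)]>+ ')' /);
say $with-brackets;       # ("browser.startup.homepage")

# With <( and )>, the brackets still have to be present in the input,
# but they are left out of the resulting match.
my $without-brackets = ~($text ~~ / '(' <( <-[)]>+ )> ')' /);
say $without-brackets;    # "browser.startup.homepage"
```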

You can see that the output of the script now doesn’t show the brackets on the argument-list match. Now, to make a list of the arguments, it would be easiest to create an additional rule to match a single argument, and match the , as a separator for the arguments. We can use the % operator for this.

rule argument-list {
  '('
  <( <argument>+ % ',' )>
  ')'
}

rule argument {
  .+
}
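As a side note on the % operator itself, here is how it behaves on a much simpler input. The Items grammar below is a made-up example and uses token instead of rule, so that whitespace plays no part in it.

```raku
grammar Items {
    token TOP  { <item>+ % ',' }   # one or more items, with a comma between them
    token item { \w+ }
}

my $match = Items.parse('one,two,three');

# A quantified capture yields a list of Match objects.
say $match<item>.elems;                   # 3
say $match<item>.map(*.Str).join('|');    # one|two|three

# The separator is only allowed between items, so a trailing comma is rejected.
say Items.parse('one,two,').defined;      # False
```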

However, when you run the script with these two rules, all you’ll see is a Nil as output.

Debugging a grammar

Grammars are quite a hassle to debug without any tools, so I would not recommend trying that. Instead, let’s use a module that makes this much easier: Grammar::Tracer. This will show information on how the Grammar is matching all the stuff. If you use Rakudo Star, you already have this module installed. Otherwise, you may need to install it.

zef install Grammar::Tracer

Now you can use it in the script by adding use Grammar::Tracer at the top of the script, before the grammar declaration. Running the script now will yield some content before you see the Nil.

TOP
| function-name
| * MATCH "user_pref"
| argument-list
| | argument
| | * MATCH "\"browser.startup.homepage\", \"https://searx.tyil.nl\");"
| * FAIL
* FAIL

Looking at this, you can see that an argument is being matched, but it’s being too greedy. It matches all characters up until the end of the line, and because rules don’t backtrack, the argument-list can’t match the closing bracket anymore. To fix this, we must update the argument rule to be less greedy. For now, we’re just matching strings that appear within double quotes, so let’s change the rule to more accurately match that.

rule argument {
  '"'
  <( <-["]>+? )>
  '"'
}

This rule matches a starting ", then one or more characters that are *not* a ", then another ". There’s also <( and )> in use again to make the surrounding " not end up in the result. If you run the script again, you will see that the argument-list contains two argument matches.

「user_pref("browser.startup.homepage", "https://searx.tyil.nl");」
 function-name => 「user_pref」
 argument-list => 「"browser.startup.homepage", "https://searx.tyil.nl"」
  argument => 「browser.startup.homepage」
  argument => 「https://searx.tyil.nl」
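The difference between the first attempt at the argument rule and the second one can be reproduced with two plain regexes. This sketch uses the :r (ratchet) adverb, which rules and tokens imply and which turns off backtracking; the variable names are illustrative.

```raku
my $quoted = '"browser.startup.homepage"';

# .+ also consumes the closing quote, and with :r it never gives it back,
# so the final '"' has nothing left to match.
my $greedy-ok = ($quoted ~~ / :r ^ '"' .+ '"' $ /).so;
say $greedy-ok;     # False

# Excluding the quote from the repeated part leaves the closing quote
# available, so no backtracking is needed.
my $negated-ok = ($quoted ~~ / :r ^ '"' <-["]>+ '"' $ /).so;
say $negated-ok;    # True
```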

I’m ignoring the output of Grammar::Tracer for now, since there are no problems arising. I would generally suggest just leaving it in there until you’re completely satisfied with your Grammars, so you can immediately see what’s going wrong where during development.

The statement’s end

Now all that’s left to explicitly match in the TOP rule is the statement terminator, ;. This can replace the .*, since it’s the last character of the string.

rule TOP {    
    <function-name>
    <argument-list>
    ';' 
}

The final Grammar should look like this.

grammar UserJS {
  rule TOP {
    <function-name>
    <argument-list>
    ';'
  }

  rule function-name {
    'user_pref'
  }

  rule argument-list {
    '('
    <( <argument>+ % ',' )>
    ')'
  }

  rule argument {
    '"'
    <( <-["]>+? )>
    '"'
  }
}
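To give an idea of what can be done with the resulting Match, here is a sketch that pulls the preference name and its value out of a parsed line. Note the assumptions: it is a variant of the grammar above, written with token and an explicit \s* after the comma instead of rule, purely so that the whitespace handling is spelled out. The names UserJSSketch, $key and $value are made up for this example.

```raku
grammar UserJSSketch {
    token TOP           { <function-name> <argument-list> ';' }
    token function-name { 'user_pref' }
    # The separator is a comma followed by optional whitespace.
    token argument-list { '(' <( <argument>+ % [ ',' \s* ] )> ')' }
    token argument      { '"' <( <-["]>+ )> '"' }
}

my $line  = 'user_pref("browser.startup.homepage", "https://searx.tyil.nl");';
my $match = UserJSSketch.parse($line);

# The two <argument> submatches, in order: the preference name, then its value.
my ($key, $value) = $match<argument-list><argument>.map(*.Str);

say "$key = $value";    # browser.startup.homepage = https://searx.tyil.nl
```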

Now, the problem here is that it’s still quite naïve. It won’t deal with double quotes inside strings, nor with Boolean values or integers. The current Grammar is also not capable of matching multiple lines. All of these problems can be solved, some easier than others. Come back here tomorrow to learn how!
