Silly Formatting

January 31, 2020
software rust parsers sillyfmt

In my time at Dropbox, I found myself fairly often trying to quickly read through “traces”, which were essentially developer-formatted text logged from the Dropbox desktop client. The Dropbox client was originally written back before structured logging was common in industry, and in any case setting up structured logs can add a fair bit of overhead to the day-to-day debugging lifecycle.

After spending an inordinate amount of time opening complicated traces in vim and using the rudimentary formatting options available there, plus a bunch of manual labor, I eventually decided to save myself a lot of time by looking for a formatter that could automate the work away.

Unfortunately, it looks like the vast majority of text formatters available on the internet are designed for pretty specific input data, i.e. valid code of a particular format. This wasn’t suitable for my purposes, since I didn’t want to extract the code-like bits from the surrounding text, and especially since there was an unfortunate tendency for people to format structures into strings and then escape them (which breaks pretty much every parser I know of).

So, I built a really hacky script called sillyfmt, which (as is implied by the name) did the absolute bare minimum necessary to output formatted text. The original version didn’t even have a parser… just a half-broken lexer and some text formatting code.

The results looked something like this (using some minified JSON fragments as input):

$ sillyfmt
[{"_id":"5e345fc4179ff645f74b0c61","index":0,"guid":"81e5ad0e-2071-4d44-8720-7f02468cdadf","isActive":false,"balance":"$3,701.06","picture":"http://placehold.it/32x32","age":30,"eyeColor":"green","name":"Earnestine Bender","gender":"female","company":"EXOVENT","email":"earnestinebender@exovent.com","phone":"+1 (882) 427-2769","address":"876 Homecrest Court, Hall, Washington, 6511","about":"Aute dolor aute nostrud reprehenderit non commodo aliquip enim. Esse ad proident dolor exercitation laborum est labore est non Lorem adipisicing. Nulla ullamco id mollit proident.\r\n","registered":"2014-10-30T02:45:54 +07:00","latitude":17.48696,"longitude":167.668504,"tags":["adipisicing","eiusmod","culpa","dolor","duis","dolore","magna"],"friends":[{"id":0,"name":"Chandler Robinson"},{"id":1,"name":"Herrera Hess"},{"id":2,"name":"Elva Glass"}],"greeting":"Hello, Earnestine Bender! You have 10 unread messages.","favoriteFruit":"banana"},{"_id":"5e345fc401c64bb893ffe75b","index":1,"guid":"ff35ffeb-2a96-4c71-9f04-a624c3163
[
  {
    "_id":"5e345fc4179ff645f74b0c61" ,
    "index":0 ,
    "guid":"81e5ad0e-2071-4d44-8720-7f02468cdadf" ,
    "isActive":false ,
    "balance":" $ 3 ,
    701.06" ,
    "picture":"http://placehold.it/32x32" ,
    "age":30 ,
    "eyeColor":"green" ,
    "name":"Earnestine Bender" ,
    "gender":"female" ,
    "company":"EXOVENT" ,
    "email":"earnestinebender @ exovent.com" ,
    "phone":" + 1 ( 882 ) 427-2769" ,
    "address":"876 Homecrest Court ,
    Hall ,
    Washington ,
    6511" ,
    "about":"Aute dolor aute nostrud reprehenderit non commodo aliquip enim. Esse ad proident dolor exercitation laborum est labore est non Lorem adipisicing. Nulla ullamco id mollit proident.\r\n" ,
    "registered":"2014-10-30T02:45:54 + 07:00" ,
    "latitude":17.48696 ,
    "longitude":167.668504 ,
    "tags": [
      "adipisicing" ,
      "eiusmod" ,
      "culpa" ,
      "dolor" ,
      "duis" ,
      "dolore" ,
      "magna"
    ] ,
    "friends": [
      {
        "id":0 ,
        "name":"Chandler Robinson"
      } ,
      {
        "id":1 ,
        "name":"Herrera Hess"
      } ,
      {
        "id":2 ,
        "name":"Elva Glass"
      }
    ] ,
    "greeting":"Hello ,
    Earnestine Bender ! You have 10 unread messages." ,
    "favoriteFruit":"banana"
  } ,
  {
    "_id":"5e345fc401c64bb893ffe75b" ,
    "index":1 ,
    "guid":"ff35ffeb-2a96-4c71-9f04-a624c3163

Early on, I’d decided that this should be a pretty Unix-y tool, so it takes input on STDIN and writes the formatted text to STDOUT. For interactive use, it formats on every newline (sometimes good, sometimes bad).
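
As a rough illustration (not the actual sillyfmt source), a parser-free formatter in this spirit might look like the sketch below. The names format_line, break_line and run are made up for the example, and I'm assuming only brackets, braces and commas are treated as actionable tokens. Since nothing tracks whether we're inside a string, commas in quoted text also cause line breaks, much like in the output above.

```rust
use std::io::{self, BufRead, Write};

/// Start a new output line at the given nesting depth.
/// Trailing spaces are trimmed first so no line ends in whitespace.
fn break_line(out: &mut String, depth: usize) {
    while out.ends_with(' ') {
        out.pop();
    }
    if !out.is_empty() && !out.ends_with('\n') {
        out.push('\n');
    }
    out.push_str(&"  ".repeat(depth));
}

/// Reformat one line of input using nothing but a character scan:
/// indent after `[` or `{`, dedent before `]` or `}`, break after `,`.
fn format_line(input: &str) -> String {
    let mut out = String::new();
    let mut depth: usize = 0;
    let mut at_line_start = true;
    for c in input.chars() {
        match c {
            '[' | '{' => {
                out.push(c);
                depth += 1;
                break_line(&mut out, depth);
                at_line_start = true;
            }
            ']' | '}' => {
                // saturating_sub tolerates unbalanced closers in messy input.
                depth = depth.saturating_sub(1);
                break_line(&mut out, depth);
                out.push(c);
                at_line_start = false;
            }
            ',' => {
                out.push(c);
                break_line(&mut out, depth);
                at_line_start = true;
            }
            // Drop whitespace that would otherwise follow the indentation.
            c if at_line_start && c.is_whitespace() => {}
            _ => {
                out.push(c);
                at_line_start = false;
            }
        }
    }
    out
}

/// Format every line read from `input` and write the result to `output`.
fn run<R: BufRead, W: Write>(input: R, mut output: W) -> io::Result<()> {
    for line in input.lines() {
        let line = line?;
        writeln!(output, "{}", format_line(&line).trim_end())?;
    }
    Ok(())
}
```

To get the STDIN-to-STDOUT behavior, a main function would just call run(io::stdin().lock(), io::stdout().lock()). Formatting per line is what makes it react on every newline in interactive use.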

This version was actually more than good enough for my purposes, so it stayed this way for over a year.

There were a couple of issues with it:

  1. It had a hardcoded list of actionable tokens, and it was hard to add more / customize behavior
  2. It had this really annoying behavior of adding spaces at the end of every line (I tended to use it as a tool within my vim session, which shows trailing spaces)

So, when I had some time between jobs, I decided to try something slightly fancier and rewrote it using a parser generator, LALRPOP.

You can see the full LALRPOP file here; it lays out the core cases I was trying to handle.

This didn’t actually work out like I hoped. While LALRPOP is a very nice parser generator, error recovery of the form “try your best to make sense of the input” isn’t really in the design spec, and a grammar which is loose enough to accept all of the inputs I wanted isn’t necessarily parseable.

I ended up doing some really hacky stuff to get it to parse. In particular, there’s a preprocessing pass which runs through the input and prepends or appends however many delimiters are necessary to make a semi-balanced string.

For example, if the input was

[[[

It would become

[[[]]]

before getting passed to the parser. I also never really got it to the point where I was fully confident that it didn’t drop some of the input on the way out, since LALRPOP doesn’t seem to report where the parsed input ended. This is probably related to the dropped_tokens field in LALRPOP’s ErrorRecovery.
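
For illustration, a balancing pass like the one described above could be sketched as follows. This is my own minimal version, not the code in sillyfmt: the function name balance is hypothetical, it assumes the delimiters are (), [] and {}, and it ignores quoting entirely. Closers that don't match the innermost open delimiter are left alone, which is why the result is only "semi-balanced".

```rust
/// Prepend openers and append closers so that every delimiter in `input`
/// has a partner. Mismatched pairs such as "[}" are not repaired.
fn balance(input: &str) -> String {
    // Closers we still owe for delimiters opened so far.
    let mut pending: Vec<char> = Vec::new();
    // Openers to prepend for closers that appeared with nothing open.
    let mut missing_openers: Vec<char> = Vec::new();

    for c in input.chars() {
        match c {
            '(' => pending.push(')'),
            '[' => pending.push(']'),
            '{' => pending.push('}'),
            ')' | ']' | '}' => {
                if pending.last() == Some(&c) {
                    pending.pop();
                } else if pending.is_empty() {
                    missing_openers.push(match c {
                        ')' => '(',
                        ']' => '[',
                        _ => '{',
                    });
                }
                // Otherwise: a mismatched closer, left as-is.
            }
            _ => {}
        }
    }

    // The last unmatched closer needs the outermost opener, so reverse.
    let mut out: String = missing_openers.iter().rev().collect();
    out.push_str(input);
    // Close the innermost open delimiter first.
    out.extend(pending.iter().rev());
    out
}
```

With this sketch, balance("[[[") produces "[[[]]]" as in the example above, and an input of "]}" becomes "{[]}".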

The LALRPOP-variant of sillyfmt has output that looks like this:

$ sillyfmt
Hit enter twice to format, or re-run with --newline
[{"_id":"5e345fc4179ff645f74b0c61","index":0,"guid":"81e5ad0e-2071-4d44-8720-7f02468cdadf","isActive":false,"balance":"$3,701.06","picture":"http://placehold.it/32x32","age":30,"eyeColor":"green","name":"Earnestine Bender","gender":"female","company":"EXOVENT","email":"earnestinebender@exovent.com","phone":"+1 (882) 427-2769","address":"876 Homecrest Court, Hall, Washington, 6511","about":"Aute dolor aute nostrud reprehenderit non commodo aliquip enim. Esse ad proident dolor exercitation laborum est labore est non Lorem adipisicing. Nulla ullamco id mollit proident.\r\n","registered":"2014-10-30T02:45:54 +07:00","latitude":17.48696,"longitude":167.668504,"tags":["adipisicing","eiusmod","culpa","dolor","duis","dolore","magna"],"friends":[{"id":0,"name":"Chandler Robinson"},{"id":1,"name":"Herrera Hess"},{"id":2,"name":"Elva Glass"}],"greeting":"Hello, Earnestine Bender! You have 10 unread messages.","favoriteFruit":"banana"},{"_id":"5e345fc401c64bb893ffe75b","index":1,"guid":"ff35ffeb-2a96-4c71-9f04-a624c3163

[
  {
    "_id": "5e345fc4179ff645f74b0c61",
    "index": 0,
    "guid": "81e5ad0e-2071-4d44-8720-7f02468cdadf",
    "isActive": false,
    "balance": "$3,
    701.06",
    "picture": "http: //placehold.it/32x32",
    "age": 30,
    "eyeColor": "green",
    "name": "Earnestine Bender",
    "gender": "female",
    "company": "EXOVENT",
    "email": "earnestinebender@exovent.com",
    "phone": "+1 (882)427-2769",
    "address": "876 Homecrest Court,
    Hall,
    Washington,
    6511",
    "about": "Aute dolor aute nostrud reprehenderit non commodo aliquip enim. Esse ad proident dolor exercitation laborum est labore est non Lorem adipisicing. Nulla ullamco id mollit proident.\r\n",
    "registered": "2014-10-30T02: 45: 54 +07: 00",
    "latitude": 17.48696,
    "longitude": 167.668504,
    "tags":
    [
      "adipisicing",
      "eiusmod",
      "culpa",
      "dolor",
      "duis",
      "dolore",
      "magna"
    ],
    "friends":
    [
      {
        "id": 0,
        "name": "Chandler Robinson"
      },
      { "id": 1, "name": "Herrera Hess" },
      { "id": 2, "name": "Elva Glass" }
    ],
    "greeting": "Hello,
    Earnestine Bender! You have 10 unread messages.",
    "favoriteFruit": "banana"
  },
  {
    "_id": "5e345fc401c64bb893ffe75b",
    "index": 1,
    "guid": "ff35ffeb-2a96-4c71-9f04-a624c3163
  }
]

Note that it doesn’t have as many spurious spaces, but it does add some “fake” tokens to the output that weren’t in the input (e.g. the closing } and ] at the end).

Since I was bored, I also put together a WASM variant, linked below.

WASM sillyfmt

Github
