Parser combinators is such a cool concept, very nice tool in your toolkit if you are working on external DSLs. I've been playing with them a little bit recently. Combining different parsers using higher order functions is fun, especially if you are using Scala.
Parser combinators are provided as a library in Scala over the core language. Let's use an example to walk through the details ...
Problem: HTTP's accept header provides a way for the client to request what content-type it prefers. For this exercise let's just parse the header into a list of
AcceptHeader objects. Sorting the list based on the quality factor (or q value) is trivial and not related with this discussion, so skipping.
According to the HTTP specification the grammar for the Accept header is --
Accept = "Accept" ":" #( media-range [ accept-params ] )
media-range = ( "*/*"
| ( type "/" "*" )
| ( type "/" subtype )
) *( ";" parameter )
accept-params = ";" "q" "=" qvalue *( accept-extension )
accept-extension = ";" token [ "=" ( token | quoted-string )
Example of such a header is:
application/html;q=0.8, text/*, text/xml, application/json;q=0.9
The goal now is to parse the header value into a list of
AcceptHeader objects, where an
AcceptHeader is defined as a case class:
See below for a possible approach on parsing the accept header using the combinator technique:
Note: I did not implement accept-extension defined in the specification's grammar in this example.
- First look at the
lazy val acceptEntry:
mediaType < ~ "/"indicates that result of parsing slash ("/") is irrelevant and only carry forward the result on the left (that's of the mediaType).
~is a method in the Parsers trait that stands for sequential combinator.
optmethod stands for optional value for quality factor (q).
^^is a method in the Parsers trait -- it has a parser on the left and a function on it's right (that's doing some case matching, in this case). If the parsing effort on the left is successful it applies the function on the right to that parse result.
- The subsequent lines expand and define each of the parsers defined in
- regex is defined for media type and subtype allowing for alpha-numeric, hyphen (-) and asterisk(*) values
- For qualityFactor:
";" ~> "q" ~> "=" ~> floatingPointNumber-- ignores all the parsed results on the left as we are only interested in knowing what the value of q is, which is defined as a
- Now jump back to the first line which says accept is
rep1sepis a method in the Parsers trait. We are saying that the accept entry will repeat one or more times, and each entry is separated by a comma (",)
You may test the functionality via
List(AcceptHeader(application,html,0.8), AcceptHeader(text,*,1.0), AcceptHeader(text,xml,1.0), AcceptHeader(application,json,0.9))
We just scratched the surface here. Debasish Ghosh's DSLs in Action dedicated a chapter for parser combinators, which helped me quite a bit in furthering my understanding. (Highly recommend Ghosh's book if you are contemplating about implementing DSLs).