Wednesday, 22 June 2011
Simple, functional, sound and complete parsing for all context-free grammars
Parsers for context-free grammars can be implemented directly and naturally in a functional style known as ``combinator parsing'', using recursion following the structure of the grammar rules. However, naive implementations fail to terminate on left-recursive grammars, and despite extensive research the only complete parsers for general context-free grammars are constructed using other techniques such as Earley parsing. Our main contribution is to show how to construct simple, sound and complete parser implementations directly from grammar specifications, for all context-free grammars, based on combinator parsing. We then construct a generic parser generator and show that generated parsers are sound and complete. The formal proofs are mechanized using the HOL4 theorem prover. Memoized parsers based on our approach are polynomial-time in the size of the input. Preliminary real-world performance testing on highly ambiguous grammars indicates our parsers are faster than those generated by the popular Happy parser generator.
Keywords: verified, mechanized, formal, parsing, sound, complete, functional, combinator
[ridge11parsing-cpp.pdf] [additional material]