My Thesis: DURA
This text has been updated on January the 3rd 2000. It is probably the last update of this page, since I finished the first version of the text yesterday evening. It will be over in a couple of month. Wow ! What a relief !
Since I have this rather tight relationship with the University, I decided three years ago to present a PhD thesis. My friends at PhiDaNi's have already started calling me Herr Doktor, but there is still quite a lot of work to be done. Originally I planned to present this thesis around february-march 1997. In practice, due to a significant overload of work, I am afraid that december 1997 might be a more realistic deadline now. Well. All in all, perhaps, december 1999 is more like it. The thesis will be about the automatic generation of parsers for very wide classes of languages, beyond the commonly accepted class of context-free languages. And the date has been set to the 18th of march 2000, unless something big happens in the meantime.
DURA has been used succefully in several projects now, including RainCode. We produce parsers for virtually all languages you've heard of, even the most bizarre. COBOL, PL/1, CSP, Natural, etc... The text is rather long (1020 pages) since it includes the entire source code of the tools in literate programming form, but this was one of the original goals of this work: not only to describe the theory, but how can it be put into production.
In order to achieve improved expressivity, DURA (that is the name of tool I am currently designing and implementing, that refers to the latin sentence "Dura Lex Sed Lex") supports improved grammatical operators such as:
DURA automatically translates such expressions to a Yacc-like form so that the LALR table algorithms available in the literature can be used. Synthetic grammar rules are created automatically.
DURA also performs other tasks, such as:
DURA is also meant to be a highly pragmatic tool, meant to be used for real-world languages and possibly huge parsers. For instance, DURA automates most of the tree-building when parsing, by identifying grammar rules and YAFL non-terminals. This identification process makes DURA grammars modular, just as YAFL happens to be modular.
I have started specifying the improved LR algorithm I use using Object-Z, an OO extension to the Z specification language. Producing a reasonably formal description of such complex stuff is a rather tedious task, but it is also very rewarding. This specification is meant to be used as reference material to explain and validate the implementation, and as basis for a formal proof that is related to the thesis, namely, the fact that the generated parser will not cycle except in two very specific situations.
The thesis's text is 1020 pages long, and there are still quite a few typos. The document's length is mostly due to the fact that large amounts of source code is included in literate programming form, using Norman Ramsey's excellent tool noweb.
If after reading all this, you still want to see the real thing, click here for the zipped Postscript version.