ANTLR Creole Parser infrastructure
Company Blogs June 21, 2011 By Miguel Ángel Pastor Olivar
This is the first entry on my Liferay blog, so first of all I would like to introduce myself. My name is Miguel Pastor. I have only been working at Liferay for four months (since March 2011), but I'm very happy to be part of this incredible team.
In this first entry I would like to describe the general ideas behind my first "contribution": the new Creole parser infrastructure. We will take an overview of the main components of its architecture and some ideas that could be introduced in the future.
The parser infrastructure is built on the same techniques used to build programming languages. A typical parsing process goes as follows:
- Parse the Creole source code.
- The result of this parsing process is an Abstract Syntax Tree (AST).
- The AST is traversed by different visitors (right now there are some semantic validation visitors and an XHTML translation visitor).
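The three steps can be illustrated with a deliberately tiny Java sketch. It is not Liferay's implementation: the names (`PipelineSketch`, `Heading`, `Document`, `parse`, `validate`, `toXhtml`) are hypothetical, the hand-written "parser" only understands heading lines such as `== Title` (the real parser is generated by ANTLR from the full Creole grammar), and the two post-processing steps are plain methods rather than visitors to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

final class PipelineSketch {

    /** Hypothetical AST node: a heading with a level and a title. */
    static final class Heading {
        final int level;
        final String title;
        Heading(int level, String title) { this.level = level; this.title = title; }
    }

    /** Hypothetical AST root: an ordered list of headings. */
    static final class Document {
        final List<Heading> headings = new ArrayList<>();
    }

    /** Steps 1 and 2: parse the source and build the (toy) AST. */
    static Document parse(String source) {
        Document doc = new Document();
        for (String line : source.split("\n")) {
            int level = 0;
            while (level < line.length() && line.charAt(level) == '=') {
                level++;
            }
            if (level > 0) {
                doc.headings.add(new Heading(level, line.substring(level).trim()));
            }
            // Non-heading lines are simply ignored by this toy parser.
        }
        return doc;
    }

    /** Step 3a: a semantic validation pass over the AST. */
    static List<String> validate(Document doc) {
        List<String> errors = new ArrayList<>();
        for (Heading h : doc.headings) {
            if (h.level > 6) errors.add("Heading level " + h.level + " has no XHTML equivalent");
            if (h.title.isEmpty()) errors.add("Empty heading title");
        }
        return errors;
    }

    /** Step 3b: translate the AST into XHTML (no escaping in this sketch). */
    static String toXhtml(Document doc) {
        StringBuilder out = new StringBuilder();
        for (Heading h : doc.headings) {
            out.append("<h").append(h.level).append('>')
               .append(h.title)
               .append("</h").append(h.level).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Document doc = parse("= Header 1\n== Header 2");
        System.out.println(validate(doc)); // []
        System.out.println(toXhtml(doc));  // <h1>Header 1</h1><h2>Header 2</h2>
    }
}
```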
The following sections dive deeper into some of these components.
The parsing process
The parser is built with the invaluable help of ANTLR 3. For those who don't know ANTLR, it is a tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions (LL(k) grammars).
The grammar definition includes the actions needed to build the proper abstract syntax tree (see next section) from Creole source code. The grammar is responsible both for validating the source's syntax and for building the AST.
If you are interested, you can take a look at the grammar definition.
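ANTLR 3 turns each grammar rule into a method of a generated recursive-descent parser. To give a feel for what "validating the syntax and building the AST" means, here is a hand-written sketch of one simplified, hypothetical link rule. It is neither the real Creole grammar nor ANTLR output, and `LinkParserSketch`, `LinkNode` and `SyntaxError` are invented names.

```java
final class LinkParserSketch {

    /** Hypothetical AST node produced by the link rule. */
    static final class LinkNode {
        final String target;
        final String text; // null when no '|' description is given
        LinkNode(String target, String text) { this.target = target; this.text = text; }
    }

    static final class SyntaxError extends RuntimeException {
        SyntaxError(String message) { super(message); }
    }

    private final String input;
    private int pos;

    private LinkParserSketch(String input) { this.input = input; }

    /** Entry point: parses exactly one link and rejects trailing input. */
    static LinkNode parseLink(String input) {
        LinkParserSketch p = new LinkParserSketch(input);
        LinkNode node = p.link();
        if (p.pos != input.length()) throw new SyntaxError("Unexpected input at " + p.pos);
        return node;
    }

    // Simplified rule:  link : '[[' TARGET ( '|' TEXT )? ']]' ;
    private LinkNode link() {
        expect("[[");
        String target = readUntil("|", "]]");
        String text = null;
        if (input.startsWith("|", pos)) {
            pos++;
            text = readUntil("]]");
        }
        expect("]]");
        if (target.isEmpty()) throw new SyntaxError("Empty link target");
        return new LinkNode(target, text); // the "action" that builds the AST node
    }

    /** Consumes a literal token or reports a syntax error. */
    private void expect(String token) {
        if (!input.startsWith(token, pos)) {
            throw new SyntaxError("Expected '" + token + "' at " + pos);
        }
        pos += token.length();
    }

    /** Consumes characters until one of the stop tokens (or end of input). */
    private String readUntil(String... stops) {
        int start = pos;
        while (pos < input.length()) {
            for (String stop : stops) {
                if (input.startsWith(stop, pos)) return input.substring(start, pos);
            }
            pos++;
        }
        return input.substring(start, pos);
    }

    public static void main(String[] args) {
        LinkNode n = parseLink("[[http://www.google.com|Link to Google]]");
        System.out.println(n.target + " -> " + n.text); // http://www.google.com -> Link to Google
        try {
            parseLink("[[http://www.google.com"); // missing closing ']]'
        } catch (SyntaxError e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Invalid input never produces a node: the rule either returns a complete `LinkNode` or throws, which is the same contract a generated parser offers.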
Abstract Syntax Tree
Compilers typically use this kind of structure to represent the abstract structure of a program. Later compiler phases then perform multiple operations over it, usually by means of the Visitor Pattern.
The following figure shows a partial view of the class hierarchy used to represent the AST (Composite Pattern).
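The Composite Pattern behind that hierarchy can be sketched in Java as follows. The class names are hypothetical and may not match the real AST classes, which cover many more Creole constructs; what the sketch shows is that leaves (headings, links) and containers (pages, paragraphs) share one base type, so a recursive operation such as counting nodes works uniformly over any subtree.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class CompositeSketch {

    /** Component: every AST node can report its children (leaves have none). */
    abstract static class ASTNode {
        List<ASTNode> getChildren() { return Collections.emptyList(); }

        /** Uniform recursive operation: same code for leaves and composites. */
        int countNodes() {
            int count = 1;
            for (ASTNode child : getChildren()) count += child.countNodes();
            return count;
        }
    }

    /** Composite: a node that owns an ordered list of child nodes. */
    abstract static class CollectionNode extends ASTNode {
        private final List<ASTNode> children = new ArrayList<>();

        CollectionNode add(ASTNode child) { children.add(child); return this; }

        @Override List<ASTNode> getChildren() { return Collections.unmodifiableList(children); }
    }

    // Concrete composites
    static final class WikiPageNode extends CollectionNode { }
    static final class ParagraphNode extends CollectionNode { }

    // Concrete leaves
    static final class HeadingNode extends ASTNode {
        final int level;
        final String text;
        HeadingNode(int level, String text) { this.level = level; this.text = text; }
    }

    static final class LinkNode extends ASTNode {
        final String link;
        final String alt;
        LinkNode(String link, String alt) { this.link = link; this.alt = alt; }
    }

    public static void main(String[] args) {
        ASTNode page = new WikiPageNode()
            .add(new HeadingNode(1, "Header 1"))
            .add(new ParagraphNode().add(new LinkNode("http://www.google.com", "Link to Google")));
        System.out.println(page.countNodes()); // 4
    }
}
```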
Imagine for a moment that we have the following Creole source code:
= Header 1
== Header 2
[[http://www.google.com|Link to Google]]
=== Header 3
The abstract representation of this source code looks something like this:
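The shape of such a tree can be approximated with a few lines of Java. This is a toy, line-based builder (not the ANTLR parser) using a single generic `Node` class with hypothetical labels; the outline it prints for the example above is only an approximation, and in particular wrapping the link in a `Paragraph` node is an assumption about how the real tree is organized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExampleTreeSketch {

    /** Generic tree node: a label plus ordered children. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
        Node add(Node child) { children.add(child); return this; }
    }

    // [[target]] or [[target|text]] occupying a whole line
    private static final Pattern LINK = Pattern.compile("\\[\\[([^|\\]]+)(?:\\|([^\\]]*))?\\]\\]");

    /** Toy line-based builder: understands headings and whole-line links only. */
    static Node build(String source) {
        Node page = new Node("WikiPage");
        for (String line : source.split("\n")) {
            int level = 0;
            while (level < line.length() && line.charAt(level) == '=') level++;
            Matcher m = LINK.matcher(line);
            if (level > 0) {
                page.add(new Node("Heading(" + level + "): " + line.substring(level).trim()));
            } else if (m.matches()) {
                String alt = m.group(2) != null ? m.group(2) : m.group(1);
                page.add(new Node("Paragraph").add(new Node("Link: " + m.group(1) + " | " + alt)));
            } else if (!line.trim().isEmpty()) {
                page.add(new Node("Paragraph").add(new Node("Text: " + line.trim())));
            }
        }
        return page;
    }

    /** Renders the tree as an indented outline, two spaces per depth level. */
    static String render(Node node) {
        StringBuilder out = new StringBuilder();
        render(node, 0, out);
        return out.toString();
    }

    private static void render(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) out.append("  ");
        out.append(node.label).append('\n');
        for (Node child : node.children) render(child, depth + 1, out);
    }

    public static void main(String[] args) {
        String source = "= Header 1\n== Header 2\n"
            + "[[http://www.google.com|Link to Google]]\n=== Header 3";
        System.out.print(render(build(source)));
        // WikiPage
        //   Heading(1): Header 1
        //   Heading(2): Header 2
        //   Paragraph
        //     Link: http://www.google.com | Link to Google
        //   Heading(3): Header 3
    }
}
```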
Code Generation and wiki engine
Once the AST has been built, any number of visitors can traverse it to do some work. The main features implemented right now are code generation and link extraction:
- XHtmlTranslationVisitor provides the basic functionality to traverse the AST and generate XHTML code (generic behaviour). To integrate with the wiki engine infrastructure already built into Liferay, an XHtmlTranslator class (extending the previous one) handles Liferay's particularities, such as link generation and the table of contents.
- LinkNodeCollectionVisitor allows us to extract all the nodes that represent a link in the original source code.
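A simplified Java sketch of these two visitors, loosely modeled on the classes named above. The node types, the `visit`/`accept` signatures, the `translate`/`collect` helpers and the `PortalXhtmlTranslator` subclass (standing in for the Liferay-specific XHtmlTranslator, with an invented `/wiki/` URL scheme) are all assumptions for illustration, not the actual Liferay API.

```java
import java.util.ArrayList;
import java.util.List;

final class VisitorSketch {

    /** One visit method per node type (double dispatch via accept). */
    interface ASTVisitor {
        void visit(WikiPageNode node);
        void visit(HeadingNode node);
        void visit(LinkNode node);
    }

    abstract static class ASTNode {
        abstract void accept(ASTVisitor visitor);
    }

    static final class WikiPageNode extends ASTNode {
        final List<ASTNode> children = new ArrayList<>();
        WikiPageNode add(ASTNode child) { children.add(child); return this; }
        @Override void accept(ASTVisitor visitor) { visitor.visit(this); }
    }

    static final class HeadingNode extends ASTNode {
        final int level;
        final String text;
        HeadingNode(int level, String text) { this.level = level; this.text = text; }
        @Override void accept(ASTVisitor visitor) { visitor.visit(this); }
    }

    static final class LinkNode extends ASTNode {
        final String link;
        final String alt;
        LinkNode(String link, String alt) { this.link = link; this.alt = alt; }
        @Override void accept(ASTVisitor visitor) { visitor.visit(this); }
    }

    /** Generic XHTML generation (hypothetical, heavily simplified). */
    static class XhtmlTranslationVisitor implements ASTVisitor {
        protected final StringBuilder out = new StringBuilder();

        String translate(WikiPageNode page) {
            out.setLength(0);
            page.accept(this);
            return out.toString();
        }

        @Override public void visit(WikiPageNode node) {
            for (ASTNode child : node.children) child.accept(this);
        }

        @Override public void visit(HeadingNode node) {
            out.append("<h").append(node.level).append('>').append(escape(node.text))
               .append("</h").append(node.level).append('>');
        }

        @Override public void visit(LinkNode node) { appendAnchor(node.link, node.alt); }

        protected void appendAnchor(String href, String label) {
            out.append("<a href=\"").append(escape(href)).append("\">")
               .append(escape(label)).append("</a>");
        }

        /** Minimal escaping of XML special characters in text and attributes. */
        static String escape(String s) {
            return s.replace("&", "&amp;").replace("<", "&lt;")
                    .replace(">", "&gt;").replace("\"", "&quot;");
        }
    }

    /** Stand-in for the portal-specific subclass; the /wiki/ URL scheme is invented. */
    static final class PortalXhtmlTranslator extends XhtmlTranslationVisitor {
        @Override public void visit(LinkNode node) {
            boolean external = node.link.contains("://");
            appendAnchor(external ? node.link : "/wiki/" + node.link, node.alt);
        }
    }

    /** Collects every link node in the page. */
    static final class LinkNodeCollectionVisitor implements ASTVisitor {
        private final List<LinkNode> links = new ArrayList<>();

        List<LinkNode> collect(WikiPageNode page) {
            links.clear();
            page.accept(this);
            return new ArrayList<>(links);
        }

        @Override public void visit(WikiPageNode node) {
            for (ASTNode child : node.children) child.accept(this);
        }
        @Override public void visit(HeadingNode node) { /* headings hold no links in this sketch */ }
        @Override public void visit(LinkNode node) { links.add(node); }
    }

    public static void main(String[] args) {
        WikiPageNode page = new WikiPageNode()
            .add(new HeadingNode(1, "Header 1"))
            .add(new LinkNode("http://www.google.com", "Link to Google"));
        System.out.println(new XhtmlTranslationVisitor().translate(page));
        // <h1>Header 1</h1><a href="http://www.google.com">Link to Google</a>
        System.out.println(new LinkNodeCollectionVisitor().collect(page).size()); // 1
    }
}
```

Because `PortalXhtmlTranslator` only overrides `visit(LinkNode)`, everything else is inherited from the generic translator, which mirrors the split between generic XHTML generation and Liferay's particularities.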
Following the same pattern, we could add new visitor classes that traverse the AST and perform any operation we need.
Building on this pattern, I already have two main ideas for improving our parser infrastructure so that it can be extended by external contributions:
- The first is to create a traversable structure into which multiple visitors can be injected. With this mechanism we could add new visitors to give the parser new functions; for example, imagine that we want to translate Creole code to XML instead of XHTML.
- The second is to add extensions (similar to TableOfContents). With a little hacking, the grammar already allows new terms to be included in Creole code using the syntax @@new term@@. Such an extension would be available in the AST as a node of type ExtensionNode (this class does not exist yet, so the final name may differ), and the visitor interface would therefore have a method to handle this kind of node.
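Both ideas can be sketched together in Java, under clearly hypothetical names: `VisitorChain` plays the role of the traversable structure that accepts injected visitors, `XmlVisitor` is the XML-output example, and `ExtensionNode` follows the tentative name mentioned above. The regex-based `parse` method only stands in for the grammar hack that recognizes `@@new term@@`; it is not how the ANTLR grammar would do it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExtensionSketch {

    interface Visitor {
        void visit(TextNode node);
        void visit(ExtensionNode node); // the extra method the proposal calls for
    }

    interface Node { void accept(Visitor visitor); }

    static final class TextNode implements Node {
        final String text;
        TextNode(String text) { this.text = text; }
        public void accept(Visitor v) { v.visit(this); }
    }

    /** Hypothetical node for the proposed @@term@@ syntax (final name may differ). */
    static final class ExtensionNode implements Node {
        final String term;
        ExtensionNode(String term) { this.term = term; }
        public void accept(Visitor v) { v.visit(this); }
    }

    private static final Pattern EXTENSION = Pattern.compile("@@(.+?)@@");

    /** Toy tokenizer: splits one line into text nodes and extension nodes. */
    static List<Node> parse(String line) {
        List<Node> nodes = new ArrayList<>();
        Matcher m = EXTENSION.matcher(line);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) nodes.add(new TextNode(line.substring(last, m.start())));
            nodes.add(new ExtensionNode(m.group(1).trim()));
            last = m.end();
        }
        if (last < line.length()) nodes.add(new TextNode(line.substring(last)));
        return nodes;
    }

    /** Idea 1: a structure that lets callers register any number of visitors. */
    static final class VisitorChain {
        private final List<Visitor> visitors = new ArrayList<>();

        VisitorChain register(Visitor v) { visitors.add(v); return this; }

        /** Runs every registered visitor, in registration order, over the whole AST. */
        void run(List<Node> ast) {
            for (Visitor v : visitors) {
                for (Node n : ast) n.accept(v);
            }
        }
    }

    /** Example injected visitor: emits XML instead of XHTML (no escaping, for brevity). */
    static final class XmlVisitor implements Visitor {
        final StringBuilder out = new StringBuilder();
        public void visit(TextNode n) { out.append("<text>").append(n.text).append("</text>"); }
        public void visit(ExtensionNode n) { out.append("<extension name=\"").append(n.term).append("\"/>"); }
    }

    /** Example injected visitor: records which extension terms a page uses. */
    static final class ExtensionCollector implements Visitor {
        final List<String> terms = new ArrayList<>();
        public void visit(TextNode n) { }
        public void visit(ExtensionNode n) { terms.add(n.term); }
    }

    public static void main(String[] args) {
        List<Node> ast = parse("Intro @@new term@@ outro");
        XmlVisitor xml = new XmlVisitor();
        ExtensionCollector collector = new ExtensionCollector();
        new VisitorChain().register(xml).register(collector).run(ast);
        System.out.println(xml.out);         // <text>Intro </text><extension name="new term"/><text> outro</text>
        System.out.println(collector.terms); // [new term]
    }
}
```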
These ideas have not been implemented yet, but they should not be too complex :).
I'd love to hear your comments!