(Un)Parsing in a Broad Sense
Having multiple representations of the same instance is common in software language engineering: models can be visualised as graphs, edited as text, serialised as XML. When mappings between such representations are considered, many terms are used with incompatible meanings and varying sets of underlying assumptions:
- parsing
- tokenising
- stripping
- concatenating
- imploding
- exploding
- unparsing
- printing
- pretty-printing
- formatting
- visualising
- rendering
- recognising
- …
This is the (mega)model of 12 classes of artefacts found in software language processing. Dotted lines denote mappings that rely on either lexical or syntactic definitions, solid lines denote universally defined mappings, and the loops are examples of transformations.
It can be used to explore the technical research space of bidirectional mappings systematically, to build on the existing body of work, and to discover as yet unused relationships.
Artefacts
- STR: a string, a file, a byte stream
- LEX: a finite sequence of untyped strings (called lexemes) which, when concatenated, yields STR; includes spaces, line breaks, comments, etc., collectively called layout
- TOK: a finite sequence of typed tokens, possibly with layout removed, some classified as numbers, strings, etc.
- REG: a hierarchical source model constructed with regular means but adding grouping to typing; in fact a possibly incomplete tree connecting most tokens together in one structure
- FOR: a forest of parse trees, a parse graph or an ambiguous parse tree with sharing; a tree-like structure that models STR according to a syntactic definition, some collection of possible syntactic interpretations of STR
- PTR: an unambiguous parse tree where the leaves can be concatenated to form STR
- CST: a parse tree with concrete syntax information; structurally similar to PTR, but abstracted from layout and other minor details; comments could still be a part of the CST model, depending on the use case
- AST: a tree which contains only abstract syntax information, the ultimate enriched intermediate representation for language processing
- PIC: a picture, which can be an ad hoc model, a hand-drawn “natural model” or a rendering of a formal model
- DRA: a model of a drawing, expressed in terms of visual primitives, their sizes, coordinates and other configurable attributes; a drawing in the sense of GraphML or SVG, or a metamodel-independent but metametamodel-specific syntax like OMG HUTN
- GRA: a graphical representation of a model (not necessarily a tree), abstracted from concrete visualisation details, mentioning at most layout grid strategies; a “boxes and arrows” model like those used in Graphviz tools
- DIA: a diagram, an abstract graphical model with an explicit advanced metamodel and a known visual notation, like a UML/EMF diagram
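To make the textual end of this list more concrete, here is a minimal sketch in Python of how STR, LEX, TOK, PTR and AST might relate for a toy assignment language. Everything in it is an assumption made for illustration: the function names (`str_to_lex`, `lex_to_tok`, `ptr_to_ast` and so on), the four lexeme classes, and the tuple encoding of trees are hypothetical and are not taken from the paper or from the bx-parsing repository, which is written in Rascal. The parse tree is built by hand because no parser is included.

```python
import re
from typing import List, Tuple, Union

# One pattern with a named alternative per lexeme class; "." with DOTALL
# guarantees that every character of the input ends up in some lexeme.
LEXEME = re.compile(
    r"(?P<layout>\s+)|(?P<num>\d+)|(?P<id>[A-Za-z_]\w*)|(?P<op>.)", re.DOTALL
)

Token = Tuple[str, str]                      # (type, text)
Tree = Union[str, Tuple[str, List["Tree"]]]  # leaf lexeme or (label, children)


def str_to_lex(text: str) -> List[str]:
    """STR -> LEX: split into untyped lexemes, keeping layout."""
    return [m.group() for m in LEXEME.finditer(text)]


def lex_to_str(lexemes: List[str]) -> str:
    """LEX -> STR: concatenation restores the original text."""
    return "".join(lexemes)


def kind_of(lexeme: str) -> str:
    """Classify one lexeme; assumes it was produced by str_to_lex."""
    return LEXEME.fullmatch(lexeme).lastgroup


def lex_to_tok(lexemes: List[str]) -> List[Token]:
    """LEX -> TOK: type every lexeme and drop layout."""
    return [(kind_of(x), x) for x in lexemes if kind_of(x) != "layout"]


def ptr_to_lex(tree: Tree) -> List[str]:
    """PTR -> LEX: collect the leaves from left to right."""
    if isinstance(tree, str):
        return [tree]
    _label, children = tree
    return [leaf for child in children for leaf in ptr_to_lex(child)]


def ptr_to_ast(tree: Tree) -> Tree:
    """PTR -> AST: keep the structure, identifiers and numbers only."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    kept = [
        ptr_to_ast(child)
        for child in children
        if not isinstance(child, str) or kind_of(child) in ("id", "num")
    ]
    return (label, kept)


text = "x = 12 +  y"
lexemes = str_to_lex(text)
tokens = lex_to_tok(lexemes)

# Hand-built parse tree for `text`; a real PTR would come from a parser.
ptr: Tree = ("assign", ["x", " ", "=", " ", ("add", ["12", " ", "+", "  ", "y"])])
ast = ptr_to_ast(ptr)
```

In this sketch, `lex_to_str(str_to_lex(text)) == text` holds by construction, and so does `lex_to_str(ptr_to_lex(ptr)) == text`, which mirrors the requirement that the leaves of a PTR concatenate to STR. The mappings to TOK and AST discard layout, so going back from them to STR would need extra information such as a formatting convention.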
Publications / presentations
- Vadim Zaytsev, Anya Helene Bagge. Parsing in a Broad Sense, MoDELS’14, LNCS 8767, 2014. (slides) (the MAIN paper on this topic)
- Vadim Zaytsev. grammarware/bx-parsing, GitHub, March 2014. (if you like to see some code (in Rascal))
- Paul Klint, Ralf Lämmel, Chris Verhoef. Toward an Engineering Discipline for Grammarware, ACM ToSEM, 2005. (introduces the term “grammars in a broad sense”)
- Vadim Zaytsev. Case Studies in Bidirectionalisation, TFP, 2014. Extended abstract. (slides) (a more FP/BX targeted piece)
- Vadim Zaytsev, Anya Helene Bagge. Modelling Parsing and Unparsing, Parsing@SLE, 2014. (slides) (a more technical piece targeted for parsing researchers)
- Anya Helene Bagge, Ralf Lämmel, Vadim Zaytsev. Reflections on Courses for SLE, EduSymp, 2014. (slides) (contains the megamodel, presents how it was used in teaching)
- Anya Helene Bagge, Vadim Zaytsev. Languages, Models and Megamodels: A Tutorial, SATToSE, 2014. (slides) (uses the megamodel as one of the examples)
- Vadim Zaytsev. Understanding Metalanguage Integration by Renarrating a Technical Space Megamodel, GEMOC’14, 2014. (slides) (argues how megamodels are useful in teaching, uses this megamodel as an example)