GMeta
A Generic Formal Metatheory Framework
for First-Order Representations
GMeta is a generic framework for first-order representations of variable binding that provides once and for all many of the so-called infrastructure lemmas and definitions required in mechanization of formal metatheory. The framework employs datatype-generic programming and modular programming techniques to provide a universe representing a family of datatypes. This universe is generic in two different ways:
- Language-generic: The universe is language-generic in the sense that several object languages can be represented in the universe;
- Representation-generic: The universe is representation-generic in the sense that the particular choice of first-order representations (for example, locally nameless or de Bruijn representations) is parameterizable. This form of generic programming is based on modular programming in the style of ML-modules.
Using this universe, several libraries providing generic infrastructure lemmas and definitions are implemented. These libraries are then used in case studies based on the POPLmark challenge, showing that dealing with challenging binding constructs, like the ones found in System System F with subtyping (F<:), is possible with GMeta. All of GMeta's generic infrastructure is implemented in Coq, ensuring the soundness of the generic infrastructure. Furthermore, due to GMeta's modular design, the libraries can be easily used, extended and customized by end users.
GMeta Library
The figure below shows an overview of the modular structure of GMeta. The structure is hierarchical, with the more general modules at the top and the more specific modules at the bottom:
- DGP Layer: The core DGP infrastructure is defined at the top-most layer. The main component is a universe that acts as a generic language that the lower-level modules use to define the infrastructure lemmas and definitions.
- Representation Layer: This layer is where the generic infrastructure lemmas and definitions for particular first-order representations are defined. GMeta currently supports locally nameless and de Bruijn representations, and we are planning to support more representations in the future.
- Isomorphism Layer: This layer provides simple module signatures for isomorphisms that serve as interfaces between the object language and its representation in the generic language. The adequacy of the object language representation follows from the isomorphism laws.
- Templates Layer: This layer provides templates for the basic infrastructure lemmas and definitions required by particular meta-theoretical developments. Templates are ML-style functors parametrized by isomorphisms between the syntactic sorts of object language and their corresponding representations in the generic language.
- End User Layer: End users will use GMeta's libraries to develop metatheory for particular object languages: the simply typed lambda calculus (STLC) or the System F<: are used in our case studies.
Library Codes
DGP Layer | DGP_Core | Source | HTML |
---|---|---|---|
Representations Layer |
LNameless_Meta | Source | HTML |
LNameless_Meta_Env | Source | HTML | |
deBruijn_Meta | Source | HTML | |
deBruijn_Meta_Env | Source | HTML | |
Isomorphisms Layer |
LNameless_Isomorphism | Source | HTML |
deBruijn_Isomorphism | Source | HTML | |
Templates Layer |
LN_Template | Source | HTML |
LN_Template_One_Sort | Source | HTML | |
LN_Template_Two_Sort | Source | HTML | |
dB_Template | Source | HTML | |
dB_Template_Two_Sort | Source | HTML |
Isomorphism Generator
Download
Installation
There is a "build.sh" file in the directory. To build the generator
simple execute that file:
> ./build.sh
Now you should have a "genIsos" executable in your directory.
- Either copy it to the main directory of your coq development
- Put genIsos somewhere in your PATH
Papers
- GMeta: A Generic Formal Metatheory Framework for First-Order Representations. Gyesik Lee, Bruno C. d. S. Oliveira, Sungkeun Cho, and Kwangkeun Yi. ESOP 2012: The 22nd European Symposium on Programming.
Tutorial
Here we give an explanation how to formalize the metatheory of the simply typed lambda calculus (STLC) using GMeta. The formalization can be done in four steps.
- Defining Syntax: The first step is to define the syntax for the object language
that is going to be formalized, using one of the first-order representations
supported by GMeta.
- In case of STLC, the sets of pseudo-types and pseudo-terms should be defined by the user. Cf. 1. step in Syntax or in Isomorphism.
- Defining isomorphisms for interfacing with GMeta: The second step is
to provide annotations to guide the isomorphism generation tool
to produce the isomorphism modules automatically. Cf. 1. and 2. step
in Syntax.
- The isomorphisms can be created manually. Cf. 2. step in Isomorphism.
- Importing the infrastructure: After having generated the isomorphism modules,
we can use them to instantiate some of GMeta infrastructure
templates, and import these to obtain the desired infrastructure
lemmas and definitions.
- In case of STLC, if the user chooses to use the locally nameless approach for the formalization, the template in LN_Template.v can be used. Cf. 3. step in Formalization.
- Metatheory formalization: Now the user can proceed with the
formalization using the imported infrastructure.
- In case of STLC, cf. 4. step in Formalization.
Formalization of STLC using the locally nameless approach
STLC | Syntax | Source | HTML |
---|---|---|---|
Isomorphism | Source | HTML | |
Formalization | Source | HTML |
Case Studies
In order to verify the effectiveness of GMeta in reducing the infrastructure overhead, we conducted case studies using locally nameless and de Bruijn representations.
Infrastructure
GMeta can reduce the infrastructure overhead because it provides reuse of boilerplate definitions and lemmas. By boilerplate we mean the following:- TYPE 1. Common operations and lemmas: Operations such as free and bound variables, term size or substitution-like operations (such as substitutions for parameters and variables in the locally nameless style; or shifting in the de Bruijn style) are of this type. Also, basic lemmas about the operations (such as several forms of permutation lemmas about substitutions, or lemmas involving some simple conditions about variables) fall in this category.
- TYPE 2. Lemmas involving unary inductive relations (predicates): When formalizing metatheory it is common to use inductive relations. For example several different forms of well-formedness are usually described in this way. For unary inductive relations, we can use the predicate form instead to provide the associated lemmas in GMeta. An example is the closed terms predicate.
- TYPE 3. Lemmas involving general inductive relations: General inductive relations can be dealt in GMeta with a corresponding generic inductive relation. There is a cost in using these generic inductive relations because they require a mapping between the concrete relation defined directly over the object language and the generic relation, so they are not as easy to reuse as type 1 and 2. The typical example is well-formed terms in an environment.
Setup
For each case study, GMeta was employed in two different ways, according to how much the user needs to know about GMeta and DGP in order to successfully achieve reuse:- Basic approach: In the basic approach, the end-user is required to know about the interfaces of GMeta's templates and a few simple tactics that automatically simplify the templates definitions into a form similar to the hand-written versions of those definitions. Occasionally the user may also need to apply the adequacy lemmas directly. However, the user does not need to know about DGP in order to use GMeta.
- Advanced approach: In the advanced approach, it is possible to obtain additional reuse by manually providing a mapping between concrete definitions of well-formed environments and corresponding generic notions defined in the GMeta libraries. This requires some knowledge about DGP and some more tactics in GMeta.
Results
The case studies measure the number of definitions and lemmas that have been saved (both the total number and the number of boilerplate definitions and lemmas). The results are shown below and presented in terms of the percentage of definitions and lemmas saved.
-
Case study using locally nameless approach (Reference is POPLmark solutions by Aydemir et al. (2008)).
Definitions Infrastructure
(lemmas +
definitions)Core
(lemmas +
definitions)Overall inf. overhead total ratio STLC Aydemir et al. 11 13 + 3 4 + 0 17 31 55% GMeta basic 7 4 + 0 4 + 0 1 15 7% System F<: Aydemir et al. 20 48 + 7 17 + 1 60 93 65% GMeta basic 13 26 + 1 17 + 1 25 58 43% GMeta advanced 11 15 + 0 18 + 1 11 45 24% The numbers in bold face are the numbers that were presented by Aydemir et al. (2008) - Case study using de Bruijn indices (Reference is POPLmark solutions
by Vouillon).
Definitions Infrastructure
(lemmas +
definitions)Core
(lemmas +
definitions)Overall inf. overhead total ratio System F<: Vouillon 27 24 + 0 50 + 0 41 101 41% GMeta basic 12 1 + 0 52 + 0 3 65 5% The reason why de Bruijn style, even in the basic case, looks better in saving the boilerplate is twofold:
- There is no need for well-formedness definitions,
hence no need for equivalence proofs between generic and concrete ones.
Note that most of the 11 lemmas/definitions in the boilerplate part of the GMeta advanced line for System F<: using locally nameless approach are related to well-formedness (locally closedness) of terms/types.
- The well-formedness of types in an environment in de Brujin style can be
defined by a very simple recursive function while in the locally nameless approach
a relatively complicated, inductively defined predicates is necessary.
And such a definition involves many auxiliary lemmas.
Note that most of the 14 (= 25 - 11) lemmas/definitions in the boilerplate part of the GMeta basic line for System F<: using locally nameless approach are related to well-formedness of terms/types in an environment.
Except for these two points, almost same boilerplate part is saved in both representation styles.
- There is no need for well-formedness definitions,
hence no need for equivalence proofs between generic and concrete ones.
Case studies using locally nameless approach
STLC | Isomorphism* | Source | HTML |
---|---|---|---|
Basic | Definitions | Source | HTML |
Infrastructure | Source | HTML | |
Soundness | Source | HTML |
System F<: | Isomorphism* | Source | HTML |
---|---|---|---|
Basic | Definitions | Source | HTML |
Infrastructure | Source | HTML | |
Soundness | Source | HTML | |
Advanced | Definitions | Source | HTML |
Infrastructure | Source | HTML | |
Soundness | Source | HTML |
Case studies using de Bruijn indices
System F<: | Isomorphism | Source | HTML |
---|---|---|---|
Basic | Main | Source | HTML |
* Isomorphism can be generated automatically.
Download
The whole libraries including case studies can be downloaded from here. Note that the libraries are implemented using the Coq version 8.2pl1. We are going to update the libraries to the new Coq versions. (Uncompress the tar file and use "make" to compile the whole library. )GMeta 2
We also have a experimental and incomplete version (both in Agda and in Coq) of GMeta that uses a more expressive universe based on:Morris, P. and Thorsten Altenkirch and Conor McBride, Exploring the regular tree types, In Types for Proofs and Programs, 2004
The main purpose is just to illustrate that more expressive universes are essentially orthogonal to the main ideas of GMeta. You can find the Agda version here and the Coq version here.People
- Gyesik Lee (Seoul National University, Korea)
- Bruno C. d. S. Oliveira (Seoul National University, Korea)
- Sungkeun Cho (Seoul National University, Korea)
- Kwangkeun Yi (Seoul National University, Korea)