The LET Project: program analysis for smalL/safE/smarT code

ROPAS: Research On Program Analysis System
National Creative Research Initiatives Center
Seoul National University
[Korean]

Principal investigator: Kwangkeun Yi

Goal

A completely new computing environment emerges when all the computers around the globe are connected via a light-speed network. All of the code for a particular program will not necessarily be ready beforehand in a host computer. Code fragments are dispersed across the globe and will be brought to a host via the light-speed network just when they are needed for the host program's continuation. In the global computing environment an extremely large number of such code fragments will compete for providing the optimal functionality to the hosts in need.

Our goal is to achieve compiler technologies for this global computing environment of the future. In particular, we will focus on the following three compilation problems for higher-order & typed programming languages like ML:

compiler must generate safe code: not only must the compiler assure that the code will not damage the host but the host must be able to verify the established safety of the incoming code.
compiler must generate small code: the code size must be as small as possible, in order to minimize the delivery cost over the network. Compact code will move swiftly over the network, arriving at the host faster than other competing code.
compiler must generate smart code: the code must be able to tailor itself to the most common inputs that occur during its use at the host.

Approach

Our research hypothesis is that semantic-based static analysis will play the key role in solving the above three compilation problems. Static analysis is the technique of estimating the input program's run-time properties before execution. By ``semantic-based,'' we mean that the analysis has to be based on programming language's semantic foundations (e.g. various formalisms of semantics specifications, type theories, computational logics and models, and etc.). Firm foot on formal foundations of the source programming languages will enable us to clearly understand the fundamental structures of the static analysis problems and hence help us to avoid unsound and ad-hoc solutions.

The semantic-based static analysis will be essential in solving the three problems as follows:

safe code: both establishing the code's safety by the compiler (code provider) and verifying the established safety by the user (code consumer) are two instances of statically checking the run-time behaviors of programs. One at the source-level (by the compiler) and the other at the machine-code level (by the user). Three safety problems will be addressed by static analysis technologies: type safety (e.g., code does not ``core dump''), application-dependent invariant safety (e.g., code does not corrupt the host's heap invariants), and resource safety (e.g., code does not consume more than 10MBytes of the host's memory).
small code: in order to generate as dense a code as possible for the same functionality the program's run-time behavior have to be analyzed. If the analysis reports that some values or memory-effects can happen redundantly or uselessly, we have to remove the wasteful expressions from the input program. By static analysis we can also compare the code size and the size of the result values. If the value size is smaller than the code size, it would reduce the transmission cost if we ship the value rather than the code.
smart code: in order for a code to tailor itself to a popular set of inputs, the code must be equipped with another code to supervise such optimizations which will be activated when some inputs are available. By static analysis we can generate such supervisor code because we can foresee a run-time optimization potentiality during the analysis. The supervisor code will activate sharper optimizations given an information about the inputs.

Our research position is to aggressively adopt recent progress in programming language theories into a set of practical compilation techniques. The major thrust for promoting the potential synergy between the language theories and compilation practices comes from our focus on semantic-based static analysis.

The LET Project: program analysis for smalL/safE/smarT code

Goal

Approach

Members/Publications/Softwares