The LET Project: program analysis for
smalL/safE/smarT code
ROPAS: Research On Program Analysis System
National Creative Research Initiatives Center
Seoul National University
[Korean]
Principal investigator: Kwangkeun Yi
Goal
A completely new computing environment emerges when all the computers
around the globe are connected via a light-speed network.
All of the code for a particular program will not necessarily be ready
beforehand in a host computer. Code fragments are dispersed
across the globe and will be brought to a host via the light-speed
network just when they are needed for the host program's continuation.
In the global computing environment an extremely large number of such
code fragments will compete for providing the optimal functionality
to the hosts in need.
Our goal is to achieve compiler technologies for this global computing
environment of the future. In particular, we
will focus on the following three compilation problems for
higher-order & typed programming languages like ML:
- compiler must generate safe code:
not only must the compiler assure that the code will not damage the
host but the host
must be able to verify the established safety of the incoming code.
- compiler must generate small code:
the code size must be as
small as possible, in order to minimize the delivery cost over the
network. Compact code will move swiftly over the network, arriving at the
host faster than other competing code.
- compiler must generate smart code:
the code must be able
to tailor itself to the most common inputs that occur during its use
at the host.
Approach
Our research hypothesis is that
semantic-based static analysis
will play the key role in solving the above three compilation
problems. Static analysis is the technique of estimating the input
program's run-time properties before execution. By ``semantic-based,''
we mean that the analysis has to be based on programming
language's semantic foundations (e.g. various formalisms of semantics
specifications, type theories, computational logics and models, and
etc.). Firm foot on formal foundations of the source programming
languages will enable us to clearly understand the fundamental
structures of the static analysis problems and hence help us to avoid
unsound and ad-hoc solutions.
The semantic-based static analysis will be essential in solving the
three problems as follows:
- safe code: both establishing the code's safety by the compiler (code
provider) and verifying the established safety by the user (code
consumer) are two instances of statically checking the run-time
behaviors of programs. One at the source-level (by the compiler) and
the other at the machine-code level (by the user).
Three safety problems will be addressed by static analysis technologies:
type safety (e.g., code does not ``core dump''),
application-dependent invariant safety (e.g.,
code does not corrupt the host's heap invariants), and
resource safety (e.g., code does not consume
more than 10MBytes of the host's memory).
- small code: in order to generate as dense a code as possible for
the same functionality the program's run-time behavior have to be
analyzed. If the analysis reports that some values or
memory-effects can happen redundantly or uselessly, we have to
remove the wasteful expressions from the input program.
By static analysis we can also compare the code size and the
size of the result values. If the value size is smaller than the
code size, it would reduce the transmission cost if we ship the
value rather than the code.
- smart code: in order for a code to tailor itself
to a popular set of inputs, the code must be equipped with another code to
supervise such optimizations which will be activated when some inputs are
available. By static analysis we can generate such supervisor
code because we can foresee a run-time optimization
potentiality during the analysis. The supervisor code will activate sharper
optimizations given an information about the inputs.
Our research position is to aggressively adopt recent progress in
programming language theories into a set of practical compilation
techniques. The major thrust for promoting the potential synergy between the
language theories and compilation practices comes from our focus on
semantic-based static analysis.