About

This repository contains the source code and the scripts for the algorithm and experiments presented in the paper: Jens Dietrich, Nicholas Hollingum, Bernhard Scholz: Giga-Scale Exhaustive Points-To Analysis for Java in Under a Minute. OOPSLA'15.

Repository Structure

Java Sources

Java source code is contained in three folders:

java/src contains the main Java packages

java/benchmarks contains the Java scripts used to run individual experiments and

java/testcases contains junit tests.

The repository contains implementations of three different algorithms:

nz.ac.massey.gp2.pointsto.TCReachability is an implementation of the algorithm presented in Dietrich, Hollingum, Scholz: Giga-Scale Exhaustive Points-To Analysis for Java in Under a Minute. OOPSLA'15.

Benchmark Data

The algorithms operate on graphs that represent the semantics of a Java program.
The repository contains representations of these graphs encoded in comma-separated file (CSV) format.datasets/dacapo9 contains the dacapo data sets.
For each program in the dacapo benchmark, there are input files (Alloc.csv,Assign.csv,Load.csv and Store.csv) as well as result data (VarPoints.csv).
The result data are computed with the "base-line" Melski-Reps algorithm (see above).

This means that for these datasets we can run experiments using these results as oracles, in particular to check for false negatives (soundness) and for false positives (precision).

Limitations

There are two components missing in this distribution:

The OpenJDK data set is missing for legal reasons.
We expect that we will be able to add this to the repository at a later stage.

The paper also contains a comparison with a LogicBlox-based implementation.
For licensing reasons, LogicBlox is not included in this repository.
However, the datalog program used is included in the repository (in lb/*.logic). A LogicBlox license can be obtained from the LogicBlox web site.

Note that the distribution submitted to the OOPSLA artifact evaluation did contain both the OpenJDK dataset and LogicBlox under a special license arrangement.

Using the Repository

Dependencies

Java JDK (Tested on: 1.7.0_67 and 1.8.0_20)

Apache Ant

optional Bourne Again Shell (BASH)

optional LogicBlox

You can verify a correct download by compiling and running our test framework:

$ cd java
$ ant test
$ <your browser> build/junit-reports/index.html

The experimental framework is designed to run in Unix-like environments.
We have tested on Ubuntu 14.10, Linux Mint 17.1, and use Lubuntu 14.10-Alternate for our expeimentation.
If you wish to use a different environemt, some manual work may be required to execute experiments and understand the output.

LogicBlox Setup

If you do not have LogicBlox, its experiments will be excluded automatically from our framework.
Assuming you have obtained a LogicBlox license, the following setup is necessary:

Running Experiments

These instructions assume you are using a properly setup environment.If these commands do not work on your machine, you can inspect the BASH scripts to determine how to emulate them.

The experimental framework is designed to reproduce our published results on your hardware.
The script run.sh performs all experimentation on a single benchmark, the scripts Table*.sh supply flags to run.sh which restricts the experiments it runs.
To list the available flags, run ./run.sh -h.
To confirm that everything is working correctly, start by running a simple analysis of the sunflow benchmark

$ ./Table1-2.sh -i datasets/dacapo9/sunflow

This prints a statistical summary of sunflow's points-to graph.
All the experimental results in our paper can be reproduced in this way.

Verifying Correctness

In our context, correctness is defined as consistency between implementations.
To this end, DaCapo datasets are destributed with the correct VarPointsTo relation included.
The script correctness.sh can be used to verify the consistency between the supplied VarPointsTo and a chosen algorithm.
The following command checks the consistency of the Transitive-Closure implementation against VarPointsTo for the Avrora benchmark:

$ ./correctness.sh tc datasets/dacapo9/avrora

Input Data

In addition to the DaCapo 2009 dataset provided, we have included a compressed version of the DaCapo 2006 benchmarks, which are commonly used in the research literature.
To use 2006 benchmarks, simply unzip them and direct the run.sh/Table*.sh script to that directory with -i <DaCapo 2006 directory>.

Our minimalistic analysis is designed to work for programs with allocation, assignment, field-load and field-store semantics.
If you wish to supply your own input data, create a directory with the following header-less CSV files:

Alloc.csv, abstract heap allocations of the form [0] = new Obj();, and [1] is an abstract-memory location for the Obj.

Assign.csv, variable assignments and casts of the form [1] = [0];.

Load.csv, field loads of the form [1] = [0].[2].

Store.csv, field stores of the form [1].[2] = [0].

Here [n] refers to the n-th data column of the file (e.g. Load.csv has 3 columns: load-base-variable,assigned-to-variable,loaded-field).
Simple example input files are included in the datasets/.testdata/* directories, which include pictures and java source code.