Introduction

Javalib is a library written in OCaml with the
aim to provide a high level representation of Java.class files. Thus it stands for a good starting
point for people who want to develop static analysis for
Java byte-code programs, benefiting from the strength of
OCaml language.

This document briefly presents the API provided by
Javalib and gives some samples of code in
OCaml.

Global architecture

All modules of Javalib are sub-modules of the package
module Javalib_pack in order to avoid possible namespace
conflicts. The user interface consists of three modules,
JBasics, JCode and Javalib. These
modules are the only ones someone should need to write static
analysis. The other ones provide lower level functions and should
not be used.

JBasics module

This module gives a representation of all the basic types that
appear at the byte-code level.

These types are:

JVM very basic types (such as int, float, double, ...)

JVM more elaborated types (arrays, objects)

class names, method signatures and field signatures

others: constant pool types and stackmaps types

The types representing class names, method signatures and field
signatures are abstract. The directives to build them are in
JBasics and are respectively make_cn,
make_ms and make_fs.

This module also provides some sets and maps containers relative
to the abstract class names, field signatures and method
signatures. These maps and sets are very efficient because they
rely on the hidden indexation of each abstract type, thereby
reducing the cost of comparison operations between types. Moreover
the maps are based on Patricia Trees implementation, that
is known to be really fast.

JCode module

This module provides a representation of the JVM opcodes and of
the code structure at the byte-code level.

It is important to notice that the code is represented by an
array of opcodes, and that the position of an opcode in this array
corresponds to its relative address in the original binary file
(this is the same numbering as javap). In
Javalib representation, an opcode includes an instruction
and its arguments. That's why we needed to introduce a dummy opcode
OpInvalid to keep the correct numbering.

Javalib module

This is the main module of Javalib which contains the
definition of classes, fields and methods types, some methods to
get a high level representation from binary classes and an internal
module JPrint to pretty-print or dump every type defined
in the three user modules.

Tutorial

To begin this tutorial, open an OCaml toplevel, for
instance using the Emacstuareg-mode.

Two standard libraries str, unix, and four
specific libraries, extlib, zip, ptrees
and javalib must be loaded. However, paths to find these
four latter librairies must be specified with the
#directory directives first:

#directory "<package_install_path>extlib"#directory "<package_install_path>camlzip"#directory "<package_install_path>ptrees"#directory "<package_install_path>javalib"(*<package_install_path> is given by command 'ocamlfind printconf'. If it is the same path than standard ocaml library just replace by '+'.*)

You can also build a toplevel including all these libraries
using the command make ocaml in the sources
repository of Javalib. This command builds an executable
named ocaml which is the result of the
ocamlmktop command.

Making class names, field signatures and method signatures

Imagine you want to access the method
m:(Ljava.lang.String;)V and the field
f:I of the class A.

You first need to build the signatures associated to each
entity. According to the Javalib API you will write:

Getting
a class representation from a binary file

The methods you need are in the Javalib module. You can
open this module cause you will need it very often.

open Javalib

Then, you need to build a class_path to specify
where the classes you want to load have to be found:

let class_path = class_path "./"(* for instance *)

You can now load the class ./A.class
corresponding to aname.

let a = get_class class_path aname

When you don't need a classpath any more, close it with
close_class_path if you don't want to get file
descriptors exceptions in a near futur.

Getting fields and
methods from a class

You now have the class a of type
Javalib.interface_or_class. You might want to recover its
method m of type Javalib.jmethod and
field f of type Javalib.any_field.

Simply do:

let m = get_method a ms
let f = get_field a fs

Note:

The methods get_method and
get_field raise the exception
Not_found if the method or field asked for can't
be found.

It's important to notice that a can be a
Class of jclass or an Interface of jinterface
(see type interface_or_class), but that the methods
get_method and get_field work
equaly on it. That's why get_field returns a value
of type any_field which can be ClassField of
class_field or Interface_field of interface_field.
Indeed, according to the JVM specification, we need to make the
distinction between interface fields and class fields.

A more sophisticated
example

Now we would like to write a function that takes a
classpath and a classname as
parameters and that returns, for each method of this class, a set
of the fields accessed for reading (instructions
getstatic and getfield).

Here is the code:

open Javalib_pack
open Javalib
open JBasics
open JCode
let get_accessed_fields (class_path : class_path)
(cn : class_name) =
(* We first recover the interface or class associated to the class name cn. *)let c = get_class class_path cn in(* Then, we get all the methods of c. *)let methods : jcode jmethod MethodMap.t = get_methods c in(* For each method of c, we associate a field set containing all the accessed fields. *)
MethodMap.map
(fun m ->
match m with(* A method can be abstract or concrete. *)
| AbstractMethod _ ->
(* An abstract method has no code to parse. *)
FieldSet.empty
| ConcreteMethod cm ->
(match cm.cm_implementation with(* A concrete method can be native so that we don't know its behaviour. In this case we suppose that no fields have been accessed which is not safe. *)
| Native -> FieldSet.empty
| Java code ->
(* The code is stored in a lazy structure, for performance purposes. Indeed when loading a class the Javalib does not parse its methods. *)let jcode = Lazy.force code in(* We iter on the array of opcodes, building our field set at the same time. *)
Array.fold_left
(fun s op ->
match op with
| OpGetField (_, fs)
| OpGetStatic (_, fs) ->
(* We add the field signature in our field set. In this example, we ignore the classes in which the fields are defined. *)
FieldSet.add fs s
| _ -> s
) FieldSet.empty jcode.c_code
)
) methods

Another use case

We see that the method m might raise an
NullPointer exception if we call the method
equals on an uninitialized string s. To avoid
this, a good practice is to replace the test
s.equals("str") by the expression
"str".equals(s) which will return false rather
than raising an exception.

Now we define a function that takes a classpath and a
classname as parameters and that returns a map associating
each concrete method signature to a list of
(int,string) couples representing
the program points and the strings on which the
java.lang.String.equals method is called.

let get_equals_calls (class_path : class_path)
(cn : class_name) =
(* We first recover the interface or class associated to the class name cn. *)let java_lang_string = make_cn "java.lang.String"inlet equals_ms =
make_ms "equals" [TObject (TClass java_lang_object)]
(Some (TBasic`Bool)) inlet c = get_class class_path cn in(* Then, we get all the concrete methods of c. *)let methods : jcode concrete_method MethodMap.t =
get_concrete_methods c in(* For each concrete method of c, we associate a (int*string) list containing all the strings passed as parameters to String.equals method, associated to the program point where the call occurs. *)
MethodMap.map
(fun m ->
(match m.cm_implementation with(* A concrete method can be native so that we don't know its behaviour. In this case we suppose that no call to String.equals which is not safe. *)
| Native -> []
| Java code ->
(* The code is stored in a lazy structure, for performance purposes. Indeed when loading a class the Javalib does not parse its methods. *)let jcode = Lazy.force code inlet code = jcode.c_code inlet l = ref [] in(* We iter on the array of opcodes, building our list of (int*string) at the same time. *)
Array.iteri
(fun pp op ->
match op with
| OpConst (`String s) ->
(* We detect that a string s is pushed on the stack. The next instruction might be an invokevirtual on String.equals. *)
(match (next_instruction code pp) with
| Some (inst,ppi) ->
(match inst with
| OpInvoke (`Virtual (TClass cn), ms)
when cn = java_lang_string
&& ms = equals_ms ->
(* We add the program point of the invokevirtual and the pushed string in our list. *)
l := (ppi, s) :: !l
| _ -> ()
)
| None -> ()
)
| _ -> ()
) code;
(* We simply return our list, in the reverse order so that the program points appear in ascending order. *)
List.rev !l
)
) methods