Architecture

Lightweight program (which we call filters) operate on program source files and/or data files and produce data files. The filters can be stacked in pipelines, where each filter in the pipeline reads data files generated by prior filters and in turn generates new data files. The design motivation behind this structure is to allow pipelines of filter programs to be constructed to implement program analysis. This modular design is important to isolate the language-specific first pipeline stages from later language-independent modules and in this way support sophisticated analysis for multilingual condebases.

The AST files have very different structures for C, Python and Javascript, but the parsers are designed to handle each kind of AST differently. Those parsers filter the AST files, detecting and recording function calls and their arguments. Initially, the program is capable of detecting literals and variables as arguments. Reaching Definition Analysis has been implemented for C/C++ programs that call Python programs (but none of the other languages) to handle statically assigned variables as arguments to functions. The current version of the program handles part of the Python.h interface between C and Python. It only analyzes “PyRun_SimpleFile” calls. Other mechanisms for calling Python from C will also be implemented in the future. The version can also handle PyV8 's eval function to call a JavaScript program from Python, and JQuerry's ajaz function to call a Python program from JavaScript. In the future, the program will be able to handle cases in which a JavaScript program is called from a C program, and both JavaScript and Python functions call C programs.

When the designated function used to call another program of another language, such as “PyRun_SimpleFile” , JSContext().eval() or $.ajax(), is found, its argument (name of the Python or JavaScript file) is considered a function call and the executable portion of that file is represented as the main function in the original program. That creates the connection between the two files, which allows the subsequent programs to build the call graph.

“mergeFunCall.py” combines all individual csv files from the list of source files into one. This file is then used as input to “generateDot.py”. This program translates the csv file to a dot file, which represents the csv file as a graph. The Dot program builds the final graph via GraphVizand saves it as a PDF file. Circular nodes represent C programs, rectangular nodes represent Python programs, and hexagonal nodes represent JavaScript programs. Recursive functions are denoted by dashed nodes. Errors, such as circularity in a system or unidentifiable interoperability, are denoted by double-lined dashed nodes.