9:00am

9:15am

ORC is a modular re-implementation of MCJIT that allows for more flexible configuration, better memory management, more fine-grained testing, and easier addition of new features. Its feature set includes all of MCJIT's current functionality, plus built-in support for lazy and remote compilation. This talk describes ORC's current features and design concepts, and demonstrates how it can be used.

10:30am

Have you ever experienced significant performance swings in your application after seemingly insignificant changes? A random NOP shifting code addresses causing a 20% speedup or regression? This talk will explore some of the common and not-so-common architectural reasons why code placement and alignment can affect performance on older and newer x86 processors. While ideas will be shared on how to avoid or fix some of these issues in compilers, other very low-level issues have no good compiler solutions, but are still important to be able to recognize and identify.

10:30am

This talk is a sequel to my talk at the 2014 LLVM Developers' Meeting, in which I discussed @llvm.assume; scoped-noalias metadata; and parameter attributes that specify pointer alignment, dereferenceability, and more. The past two years have seen changes to the metadata representation itself (e.g. distinct vs. uniqued metadata), as well as new metadata that specifies pointer alignment and dereferenceability, controls loop optimizations, and more. Several new attributes and intrinsics allow for more detailed control over pointer-aliasing and control-flow properties, and new intrinsics to support masked and scatter/gather memory accesses have been added. Support for older features, such as fast-math flags and the returned attribute, has been significantly extended. I'll explain the semantics of many of these new features, their intended uses, and a few ways they shouldn't be used. Finally, I'll discuss how Clang exposes and leverages these new features to encourage the generation of higher-performance code.

11:15am

Though invented as long ago as 1957, coroutines are gaining popularity in this century. More and more languages adopt them to deal with lazily produced sequences and to simplify asynchronous programming. However, until recently, coroutines in high-level languages were distinctly not a zero-overhead abstraction. We are rectifying that by adding coroutine support to LLVM that finally allows high-level languages to have efficient coroutines.

In this talk, we will look at coroutine examples in C++ and LLVM IR, at the optimization passes that deal with coroutines, and at the LLVM coroutine representation that C++ and other frontends can use to describe coroutines to LLVM.

LLVM coroutines are functions that can suspend their execution and return control back to their callers. Suspended coroutines can be resumed to continue execution when desired.

Though coroutine support in LLVM is motivated primarily by the desire to support C++ Coroutines, the LLVM coroutine representation is language neutral and can be used to support coroutines in other languages as well.

Gor Nishanov is a Principal Software Design Engineer on the Microsoft C++ team. He works on the design and standardization of C++ Coroutines, and on asynchronous programming models. Prior to joining the C++ team, Gor worked on distributed systems in the Windows Clustering team.

11:15am

SVE is a new vector ISA extension for AArch64 targeted at HPC applications; one major distinguishing feature is that vector registers do not have a fixed size from a compiler perspective. This talk will cover the changes made to LLVM IR to support vectorizing loops in a vector length agnostic manner, as well as improvements in vectorization enabled by the predication and gather/scatter features of the extension. See https://community.arm.com/groups/processors/blog/2016/08/22/technology-update-the-scalable-vector-extension-sve-for-the-armv8-a-architecture for more details on the architecture.

12:00pm

Multiple members of the community have recently given serious consideration to a possible move of our repository to git (and GitHub). With hundreds of emails exchanged on the topic, it is worth gathering everyone in the same room to discuss it. We will consider the two main variants of the move proposal detailed here: http://llvm.org/docs/Proposals/GitHubMove.html To help drive the discussion, you're invited to fill out this survey: https://goo.gl/forms/ZYs0Wv9g0w0ikCRQ2

12:00pm

Devirtualization - changing indirect virtual calls into direct calls - is an important C++ optimization. This talk will cover past work on devirtualization, including optimizations made by the frontend and by LLVM using the !invariant.group metadata and the @llvm.assume intrinsic, as well as different LTO tricks. The speaker will also cover interesting problems he faced, future work, and ideas for making devirtualization better.

12:00pm

Currently, the LoopVectorizer in LLVM specializes in auto-vectorizing innermost loops. The SIMD and DECLARE SIMD constructs introduced in OpenMP 4.0 and enhanced in OpenMP 4.5 are gaining popularity among performance-hungry programmers, due to the ability to specify a vectorization region much larger in scope than traditional inner-loop auto-vectorization would handle, and due to several advanced vectorizing compilers delivering impressive performance for such constructs. Hence, there is growing interest in the LLVM developer community in improving the LoopVectorizer to adequately support OpenMP functionality such as outer loop vectorization and whole function vectorization. In this technical talk, we discuss our approach to achieving that goal through a series of incremental steps, and to further extending it for outer loop auto-vectorization.

15 years focused on shared memory parallelism followed by 10 years focused on vectorization for IA32/Intel64 SIMD extensions. SPEC HPG rep from Intel (2001-2007), involved in the development of SPEC OMP2001, HPC2002, and MPI2007 benchmarks.

2:15pm

This year LLVM's loop passes have been greatly improved. Along with enabling new algorithms, such as new advanced loop-unrolling heuristics, some long-standing problems have been addressed, resulting in significant compile-time improvements and, in general, a cleaner pass pipeline. We'll talk about the journey we've taken through the various loop passes, share our thoughts on how to avoid some of the problems we met in the future, and share the methodology we used to find these problems.

2:15pm

rev.ng is an open-source static binary analysis framework based on QEMU and LLVM. Its core component, revamb, is a static binary translator which aims to translate a Linux program compiled for any of the 17 ISAs supported by QEMU and produce an equivalent binary for a, possibly different, architecture supported by the LLVM compiler framework.

revamb aims to translate and re-optimize legacy/closed source programs, but can also be employed for a number of security-related purposes, such as retrofitting binary hardening techniques (e.g., CFI) or instrumenting existing binaries with good performance figures (e.g., for black box fuzzing purposes).

More generally, rev.ng can be used to perform binary analysis on a wide range of architectures in the comfortable LLVM environment. As an example, rev.ng can be used to recover high-level information such as an accurate CFG and function boundaries from a binary program.

Currently, revamb is able to successfully translate the 105 coreutils binaries compiled for ARM, x86-64 and MIPS, and pass over 80% of coreutils's testsuite on all of them. The programs have been linked statically, therefore they include handwritten assembly, and their text is on the order of hundreds of kilobytes.

I'm interested in several topics concerning the computer security field. My main focus is currently static binary analysis for reverse engineering purposes, but I've also been working in the system security and exploitation fields. I also have a strong interest in privacy, end-to-end encrypted communication systems, and the challenges posed by authentication of public keys.
I love GNU/Linux and Free Software in general.

3:00pm

The LLVM project is more than a decade old. It thrives because of its great development community. Students are a constant source of fresh blood in the form of manpower and ideas.

There are many challenges in helping newcomers become more productive and better integrated. Getting started, time to first bugfix, time to first accepted patch, and time to commit rights depend not only on the student's excellence but also on guidance and mentoring. Keeping students around after they complete their tasks is often one of the major challenges of a mentor's career.

I'd like to use this BoF to share experiences of working with students. I'd like to discuss current issues, challenges, and opportunities in working with and raising the next generation of LLVM developers.

3:00pm

Clang was written in part to deliver fast compile times for C & C++ code. However, the traditional way C compilers integrate with build systems places many limitations on how efficiently that can be done. This talk introduces llbuild -- a new framework for building build systems -- which was designed to help solve this problem, and envisions a new architecture for compiling software which would allow us to significantly improve compilation times for large software projects.

3:00pm

Code-hoisting identifies identical computations across the program and hoists them to a common dominator so as to save code size. Although its main goal is not to remove redundancies, it effectively exposes redundancies and enables other passes like LICM to remove more of them. The main goal of code-hoisting is to reduce code size, with the added benefits of exposing more instruction-level parallelism and reducing register pressure.

We present a code hoisting pass that we implemented in LLVM. It is based on the Global Value Numbering infrastructure available in LLVM. The experimental results show an average of 2.5% savings in code size, although the code size increases in many cases because the pass enables more inlining. This is an optimistic algorithm in the sense that we consider all identical computations in a function as potential candidates for hoisting. We make an extra effort to hoist candidates by partitioning them in a way that enables partial hoisting in case common hoisting points for all the candidates cannot be found. We also formalize the cases in which register pressure will be reduced as a result of hoisting.

3:45pm

4:15pm

In this BoF session, we will discuss ways in which LLVM can be extended to support non-default floating-point behavior. Topics will include respecting FP rounding modes in optimization passes, preserving FP exception status, avoiding false FP exceptions and enabling run-time handling of FP exceptions.

I've been a tools developer at Intel for 17 years and have been working with LLVM since 2012, contributing to areas such as MCJIT, LLDB and Windows exception handling. I'm about to dive into LLVM's representation and handling of floating point operations.

I have spent the majority of my adult life getting the Intel compiler to generate superb code for x86 processors. I developed many major pieces of functionality in the Intel compiler's back end including its register allocator. My current focus is to draw on that experience to help improve LLVM's generated code for x86.

4:15pm

1) MemorySSA in Five Minutes - George Burgess IV: MemorySSA is a utility that has recently landed in LLVM. This talk will give a high-level introduction to what MemorySSA is and how we expect to use it.

2) Polly as an analysis pass in LLVM - Utpal Bora: In this talk, we will introduce a new interface to use the polyhedral dependence analysis of Polly in LLVM transformation passes such as the Loop Vectorizer. As part of GSoC 2016, we implemented an interface to Polly, and provided new APIs that can be used as an analysis pass within LLVM's transformation passes. We will describe our implementation and demonstrate some loop transformations using the new interface (PolyhedralInfo). Details on GSoC: http://utpalbora.com/gsoc/2016.html

3) RISC-V: Towards a reference LLVM backend - Alex Bradbury: This talk will present work towards establishing RISC-V as a reference-quality backend in LLVM. By maintaining a regularly updated patchset that implements a production-quality backend alongside a companion tutorial, we can make LLVM development accessible to a much wider audience. We will explore the work that has been done to reach these goals, the problems faced, and how you can contribute.

4) Error -- Structured Error Handling in LLVM - Lang Hames: LLVM's new Error scheme enables rich error handling and recovery by supporting user-defined error types and strict requirements on error checking. This talk provides an overview of how the scheme works, and how it can be used in your code.

5) Reducing the Computational Complexity of RegionInfo - Nandini Singhal: The LLVM RegionInfo pass provides a convenient abstraction to reason about independent single-entry-single-exit regions of the control flow graph. RegionInfo has proven useful in the context of Polly and the AMD GPU backend, but the quadratic complexity of RegionInfo construction, due to the use of DominanceFrontier, makes the use of control flow regions costly and consequently prevents the use of this convenient abstraction. In this work, we present a new approach for RegionInfo construction that replaces the use of DominanceFrontier with a clever combination of LoopInfo, DominanceInfo, and PostDominanceInfo. As these three analyses are (or will soon be) preserved by LLVM and consequently come at zero cost, while the quadratic cost of DominanceFrontier construction is avoided, the overall cost of using RegionInfo is largely reduced, which makes it practical in many more cases. Several other problems in the context of RegionInfo still need to be addressed. These include how to remove the RegionPass framework, which makes little sense in the new pass manager; how to connect Regions and Loops better; and how we can move from a build-everything-upfront analysis to an on-demand analysis which builds, step by step, only the needed regions. We hope to discuss some of these topics with the relevant people of the LLVM community as part of the poster session.

6) Toward Fixed-point Optimization in LLVM - Nathan Wilson: As one might imagine, LLVM's optimization pipeline is not universally optimal. Running the available optimization passes in a different order, and changing the number of times each runs, can improve performance on some programs. Anecdotal evidence has suggested that simply running the current optimization pipeline multiple times often yields better-performing programs. Given compile-time constraints, simply duplicating the current pipeline for all inputs would be unacceptable for many users. How many times is enough? What if we could be smarter about it to get the performance benefits without the compile-time cost? To answer these questions, we implemented a fixed-point optimization scheme in LLVM and evaluated its use when compiling LLVM's test suite. Under this scheme, the function-pass pipeline continues to execute while the input IR differs from the final IR. This experiment revealed that we will be able to capture the performance benefits of repeating the pipeline at reduced cost, and that four times is enough.

7) FileCheck Follies - Paul Robinson: FileCheck is *the* critical tool for LLVM testing. See how to make it NOT do what you want! Watch it silently pass a bogus test! All examples REAL and "ripped from the headlines" of the commits list!

5:00pm

The Loop Vectorizer currently translates scalar operations into a new sequence of SIMD operations, where every operation can be represented very naturally in LLVM IR using its native instructions and vector operands. This sequence passes through common and target-specific optimizations before being lowered to target code. However, we aim to work with composite operations or idioms during vectorization. This requirement stems from the rich vector ISAs supported by targets - SIMD instruction sets include CISC-like operations such as clamping or saturating arithmetic, multiply-and-accumulate pairs, or sum-of-absolute-differences. Two additional categories of such non-primitive vector operations are non-consecutive forms of memory access and masked vector operations. Current LLVM IR can support such idioms via patterns of instructions or intrinsics, making cost estimation and/or subsequent optimization steps problematic. As part of our drive to enhance vectorization in LLVM, we are revisiting the ability of LLVM IR to represent composite SIMD operations. In this session we'd like to discuss the different categories of composite SIMD operations, alternative approaches to representing them, and propose a new generic solution.

5:00pm

Stack-use-after-scope is a check in AddressSanitizer which detects accesses to variables from outside the scope in which they were declared. This talk covers the issues we had to resolve to make the feature usable, the results of applying the check to Google's codebase, and examples of bugs it detected.

2) How is a compiler frontend different from what an IDE needs? - Ilya Biryukov: We've been writing our own C++ frontends at JetBrains for a few years now. Given that most people use clang these days, it may come as a surprise that we don't. In this short talk we'll cover the reasons driving our decision to roll our own implementation and try to highlight how it's different from what's being done in clang.

3) Enabling Polyhedral Optimizations in Julia - Matthias Reisinger: Julia is a relatively young programming language with a focus on technical computing. While dynamic, it is designed to achieve high performance comparable to that of statically compiled languages. The execution of Julia programs is driven by a just-in-time compiler that relies on LLVM to produce efficient code at run-time. This talk highlights the recent integration of Polly into this environment, which has enabled the use of polyhedral optimization in Julia programs.

4) Extending Clang AST Matchers to Preprocessor Constructs - Jeff Trull: Clang's libTooling provides powerful mechanisms for identifying and modifying source code via the AST. However, parts of the source code are hidden or obfuscated from these tools due to the action of the preprocessor. This is particularly true of legacy code, where applying refactoring tools is highly desirable. The speaker will demonstrate how to write an AST Matcher that identifies sections of code associated with preprocessor conditional directives, and will make suggestions on how to improve tooling in this area.

5) SMACK Software Verification Toolchain - Zvonimir Rakamaric: Tool prototyping is an essential step in developing novel software verification algorithms and techniques. However, implementing a verifier prototype that can handle real-world programs is a large endeavor. In this talk, we present the SMACK software verification toolchain. SMACK provides a modular and extensible software verification ecosystem that decouples the front-end source language details from back-end verification algorithms. It achieves that by translating from the LLVM compiler intermediate representation into the Boogie intermediate verification language. SMACK offers the following benefits: (1) it can be used as an automated off-the-shelf software verifier in an applied software verification project, (2) it enables researchers to rapidly develop and release new verification algorithms, and (3) it allows for adding support for new languages in its front-end. We have used SMACK to verify numerous C/C++ programs, including industry examples, showing it is mature and competitive. Likewise, SMACK is already being used in several existing verification projects.

6) Finding code clones in the AST with clang - Raphael Isemann: This talk will introduce clang's new clone detection framework, which uses hash-code comparison to search for groups of AST nodes that are similar in a certain configurable sense.

9:00am

ThinLTO was first introduced at EuroLLVM 2015 as "A Fine-Grained Demand-Driven Infrastructure". The presentation was based on an early prototype built as a proof of concept. Taking this original concept, we redesigned it from scratch in LLVM by extending the bitcode format, redesigning the high-level workflow to remove the "demand-driven" iterative part, and adding new capabilities such as incremental build support. We added support for two linkers: Gold on Linux and ld64 on Darwin.

In this presentation we will go through the final design and how it is implemented in LLVM.

10:00am

While the quality of debug info at -O0 has reached a satisfactory level, debugging code that was optimized by LLVM still poses a challenge, primarily because variable locations may be dropped at any point during compilation.

We will start by presenting statistics aimed at identifying the worst offenders among the compilation stages and highlight known problems including debug value location tracking in the backend, the register allocator, optimizing transformations, and shortcomings of LLVM IR, before opening the floor to a discussion on strategies for improving the quality of debug info for optimized code.

10:00am

The current concept of poison in LLVM is known to be broken, leaving LLVM in a state where certain miscompilation bugs are hard or even impossible to fix. Moreover, the concepts of poison and undef values in LLVM are hard to reason about and are often misunderstood by developers.

However, we need concepts similar to poison and undef to enable certain important optimizations.

In this talk, we will present the motivation behind poison and undef and why they are broken. We'll also present a proposal to kill undef and extend poison, while retaining their performance benefits.

This talk is meant to raise awareness of the issues and motivations behind poison/undef, and to discuss how to fix them.

10:00am

In this presentation we will discuss and demonstrate an approach to building various Formal Methods (FM) tools leveraging LLVM. FM has seen a significant increase in usage in software over the past decade, being used in critical system design, security, and prototyping. We will discuss the benefits and drawbacks of LLVM IR for FM, and the need for an Abstract Representation (AR) that allows for analysis via engineering approximations. In particular, we will talk about our approach and the tools that map to our chosen AR, developed at NASA, and about extending our initial set of analyses into more logical and hierarchical relationships. Lastly, we want to present what we feel are the difficulties, future challenges, and successes of FM tools integrating with the LLVM community.

11:15am

11:15am

Optimization diagnostics have been part of LLVM for years. While important, these diagnostics have had a narrow focus: providing user feedback on the success or failure of auto-vectorization. This work explores the possibility of building on this foundation in order to create a full-fledged performance analysis tool set using the compiler. The talk will first lay out the elements of this tool set. Then we will evaluate and refine it through an exploration of real-world use cases.

11:15am

Last year we presented a proposal to bring up a new instruction selection framework, GlobalISel, in LLVM. This talk will show the progress made on the design and implementation of that proposal, as well as pointing out the areas that still need to be developed.

As a backend developer, you will learn what it takes to start using GlobalISel for your target, and as an LLVM developer, you will see which aspects of GlobalISel require your contributions.

12:00pm

Many members of the LLVM community from both industry and academia are working towards addressing an important problem: shipping software as LLVM IR for more flexible analysis and transformation. Examples of these efforts include technologies such as `-fembed-bitcode`, ThinLTO, and WLLVM.

We propose a BoF for these parties and all interested to meet and discuss the benefits and technical challenges involved, learn about each others' goals and use-cases, and to identify collaboration opportunities across these overlapping projects.

Our interest: We at UIUC are developing a system called "ALLVM" in which all components are represented as LLVM IR first and foremost. Our goal is to explore the potential benefits of the approach for improving performance, strengthening security, and simplifying failure diagnosis for production code. A second goal is to make ALLVM available widely as a platform for research. As part of this ongoing project we are developing and automating the construction of complete LLVM-based representations of real-world software, as well as building an ecosystem of supporting tools.

12:00pm

12:00pm

Many architectures allow addressing parts of a register independently. Be it the infamous high/low 8-bit registers of X86, the 32/64-bit addressing modes of X86-64 and AArch64, or GPUs with wide loads and stores combined with computation on sub-register lanes.

LLVM recently gained support for tracking liveness at sub-register granularity. In combination with improved heuristics for register classes of varying sizes, the average register count decreased by 20% for GPU shader programs.

This talk gives an introduction to typical situations that benefit from sub-register liveness modeling. It shows how a target architecture developer can model them, and explains the register allocation techniques employed by LLVM.

12:45pm

2:15pm

Extending an existing compiler intermediate language with parallel constructs is a challenging task. Maintainability dictates a minimal extension that will not disturb too many of the existing analyses and transformations. At the same time, the parallel constructs need to be powerful enough to express different, orthogonal execution scenarios. For C/C++, OpenMP is one of the most prominent parallelization frameworks that, on its own, allows for multiple parallelization schemes. Additionally, other parallel languages such as OpenCL, CUDA or Cilk++ would profit from translation to lower-level parallel constructs. Finally, automatic parallelizers and new (partially) parallel languages such as Julia can be utilized best with general parallel constructs that allow expressing parallel (or better, concurrent) execution in an independent and intuitive way throughout the compilation.

In this BoF we want to continue the discussion about PIR, a parallel extension of the LLVM IR. The discussion began in the context of the LLVM-HPC working group on IR extensions for parallelization. We will introduce the design and concepts behind PIR and briefly report on the lessons learned during the ongoing development of our prototype. In the course of this introduction we will talk about the goals, common problems, and use cases that motivated our current design. Afterwards we will initiate an open-ended discussion with the audience, for which we allocate the majority of the time (≈20 minutes).

2:15pm

The number one piece of feedback we've heard from Windows users of Clang is that they want to be able to debug their programs in Visual Studio. More than that, though, there is a world of Windows tools, such as profilers, post-mortem crash analyzers, self-debugging tools (dbghelp), and symbol servers, that make it really worth implementing CodeView support in LLVM. Since the last dev meeting, we've been hard at work studying the format and slowly adding support for it to LLVM. This talk will give an overview of the format, and then go back and focus on the aspects that most impacted our design decisions in Clang and LLVM. As others in the community have discovered while working on LTO, LLDB, modules, and llvm-dsymutil, type information can often end up being the dominating factor in the performance of the toolchain. CodeView has some interesting design choices for solving that problem that I will share. I will close by talking about where we want to go in the future, and how we will eventually use LLD to package our CodeView into a PDB file.

2:15pm

The ability to perform interprocedural analysis is one of the most powerful features of the Clang Static Analyzer. This talk is devoted to the ongoing improvement of this feature. We will discuss our implementation of summary-based interprocedural analysis as well as cross-translation-unit analysis. These features allow for faster analysis and a greater number of potentially found bugs. We are going to describe our implementation details and approaches and discuss their pros and cons.

Marshall is a long-time LLVM and Boost participant. He is a principal engineer at Qualcomm, Inc. in San Diego, and the code owner for libc++, the LLVM standard library implementation. He is the author of the Boost.Algorithm library and maintains several other Boost libraries.

3:00pm

In LLVM 3.8 the autoconf build system was deprecated, and it was removed in favor of the newer CMake system starting in 3.9. This talk provides a brief introduction to the CMake programming language to ensure everyone has basic familiarity. It will include a post-mortem on the LLVM autoconf-to-CMake transition, and discuss some of the useful features of the LLVM CMake build system that can improve developer productivity. We will explore a case study on packaging and shipping an LLVM toolchain with CMake, including an in-depth explanation of many of the new features of the LLVM CMake build system. Lastly, it will provide a status report on the current state of the build system, as well as present some of the future improvements on the horizon.

3:00pm

Maintaining a low code size overhead is important in computing domains where memory is a scarce resource. Outlining is an optimization which identifies similar regions of code and replaces them with calls to a function. This talk introduces a novel method of compressing code using an interprocedural outliner on LLVM MIR.

3:45pm

Enabling Polyhedral Optimizations in Julia - Matthias Reisinger: Julia is a relatively young programming language with a focus on technical computing. While dynamic, it is designed to achieve high performance comparable to that of statically compiled languages. The execution of Julia programs is driven by a just-in-time compiler that relies on LLVM to produce efficient code at run-time. This poster highlights the recent integration of Polly into this environment, which has enabled the use of polyhedral optimization in Julia programs.

Towards a generic accelerator offloading approach: implementing OpenMP 4.5 offloading constructs in Clang and LLVM - Gheorghe-Teodor Bercea: The OpenMP 4.5 programming model enables users to run on multiple types of accelerators from a single application source code. Our goal is to integrate a high-performance implementation of OpenMP's programming model for accelerators in the Clang/LLVM project. This poster is a snapshot of our ongoing efforts towards fully supporting the generation of code for OpenMP device offloading constructs. We have submitted several Clang patches that address some of the major issues that, in our view, prevent the adoption of a generic accelerator offloading strategy. At the compiler level, we introduce a new OpenMP-enabled driver implementation which generalizes the current Clang-CUDA approach. The new driver can handle the compilation of several host and device architecture types and can be extended to other offloading programming models such as OpenACC. We developed libomptarget, a runtime library that supports execution of OpenMP 4.5 constructs on NVIDIA architectures and is extensible to other ELF-enabled devices. In this poster we describe two features of libomptarget: the mapping of data to devices and the compilation of code sections for different architectures into a single binary. The aforementioned changes have been integrated locally with the Clang/LLVM repositories, resulting in a fully functional OpenMP 4.5 compliant prototype. We demonstrate the robustness of our extensions and show preliminary performance results on the LULESH proxy application.

Polly as an analysis pass in LLVM - Utpal Bora
In this talk, we will introduce a new interface for using Polly's polyhedral dependence analysis in LLVM transformation passes such as the Loop Vectorizer. As part of GSoC 2016, we implemented an interface to Polly and provided new APIs that can be used as an analysis pass within LLVM's transformation passes. We will describe our implementation and demonstrate some loop transformations using the new interface (PolyhedralInfo). Details on the GSoC project: http://utpalbora.com/gsoc/2016.html

Reducing the Computational Complexity of RegionInfo - Nandini Singhal
The LLVM RegionInfo pass provides a convenient abstraction for reasoning about independent single-entry-single-exit regions of the control flow graph. RegionInfo has proven useful in the context of Polly and the AMDGPU backend, but the quadratic complexity of RegionInfo construction, due to its use of DominanceFrontier, makes control flow regions costly and consequently prevents wider use of this convenient abstraction. In this work, we present a new approach to RegionInfo construction that replaces the use of DominanceFrontier with a combination of LoopInfo, DominanceInfo, and PostDominanceInfo. As these three analyses are (or will soon be) preserved by LLVM and consequently come at zero cost, while the quadratic cost of DominanceFrontier construction is avoided, the overall cost of using RegionInfo is greatly reduced, making it practical in many more cases. Several other problems in the context of RegionInfo still need to be addressed: how to remove the RegionPass framework, which makes little sense in the new pass manager; how to connect Regions and Loops better; and how to move from a build-everything-up-front analysis to an on-demand analysis that builds only the needed regions, step by step. We hope to discuss some of these topics with the relevant people of the LLVM community as part of the poster session.

Binary Decompilation to LLVM IR - Sandeep Dasgupta
This work develops a binary-to-LLVM-IR translator that generates higher-quality IR than existing tools. Such IR includes variable information, type information, and individual stack frames per procedure, which in turn facilitates many sophisticated analyses and optimizations. We are using the open-source tool McSema for this purpose, and our goal is to extend the tool to 1) extract variable and type information, 2) improve the quality of the recovered IR by mitigating some of its limitations, and 3) reconstruct the stack for each procedure. Currently, we have extended the McSema-recovered IR to reconstruct the stack for each procedure, which in turn will help with variable recovery and promotion.

Dynamic Autovectorization - Joshua Cranmer
We present our ongoing work on augmenting LLVM with a dynamic autovectorizer. This tool uses dynamic information to circumvent the shortcomings of imprecise static analysis when performing loop vectorization, and leverages dynamic transformations of code and memory to make autovectorization and other optimization passes more effective. The key transformations we illustrate in this poster are the extraction of hot paths in innermost loops (with a current speedup of 5% on SPEC over vanilla LLVM) and the conversion of memory from an array-of-structs to a struct-of-arrays representation.
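The array-of-structs to struct-of-arrays conversion mentioned above can be sketched with a small, hypothetical example (the type and function names are illustrative): in the SoA layout, a loop over a single field becomes unit-stride, which is what makes it amenable to vectorization.

```c
#include <stddef.h>

// Array-of-structs: the fields of one element are adjacent in memory,
// so a loop over a single field strides over the whole struct.
typedef struct { float x, y, z; } PointAoS;

// Struct-of-arrays: each field is a contiguous array, so a loop over
// one field is unit-stride and easily vectorized.
typedef struct { float *x, *y, *z; } PointsSoA;

// Convert an AoS buffer into a preallocated SoA layout.
void aos_to_soa(const PointAoS *in, PointsSoA *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out->x[i] = in[i].x;
        out->y[i] = in[i].y;
        out->z[i] = in[i].z;
    }
}

// After conversion, a field-wise reduction touches contiguous memory.
float sum_x(const PointsSoA *p, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += p->x[i];
    return s;
}
```

Performing this transformation dynamically, as the poster proposes, additionally requires tracking and rewriting the accesses that still expect the original layout.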

RV: A Unified Region Vectorizer for LLVM - Simon Moll
The Region Vectorizer (RV) is a general-purpose vectorization framework for LLVM. RV provides a unified interface to vectorize code regions, such as inner and outer loops, up to whole functions. Being a vectorization framework, RV is not another vectorization pass; rather, it enables users to vectorize IR directly from their own code. Currently, vectorization in LLVM is performed by stand-alone optimization passes. Users who want to vectorize IR have to roll their own vectorization code or hope that the existing vectorization passes operate as intended. Polly, for example, features a simple built-in vectorizer but also communicates with LLVM's loop vectorizer through metadata. All of these vectorizers pursue the same goal, that of vectorizing some code region, yet their quality varies wildly and their code bases are redundant. In contrast, with RV users vectorize IR directly from their own code through a simple unified API. The current prototype is a complete re-design of the earlier Whole-Function Vectorizer by Ralf Karrenberg. Unlike the Whole-Function Vectorizer or any vectorizer in LLVM, RV operates on regions, which are a more general concept. In terms of RV, a valid region is any connected subgraph of the CFG, including loop nests. Regions make RV applicable to inner and outer loop vectorization. At the same time, RV retains the capability of its predecessor to vectorize functions into SIMD signatures; however, Whole-Function Vectorization is now only one of many possible use cases for RV. The current prototype of RV implements all stages of a full vectorization pipeline, but users can compose these stages as they see fit, inserting and extracting IR and analysis information at any point.

Robustness Enhancement of Baggy Bounds Accurate Checking in SAFECode - Zhengyang Liu
Baggy Bounds Accurate Checking (BBAC) is a compile-time transform and runtime hardening solution that detects out-of-bounds pointer arithmetic errors. The original version of BBAC implemented in SAFECode was not robust or efficient enough for real-world use. Our work has improved the robustness and performance of SAFECode's BBAC implementation by fixing bugs in the compile-time transform passes and runtime checking functions, as well as by inlining several runtime checking functions. The latest implementation of BBAC achieves reasonable robustness and performance on various real-world applications.

Extending Clang C++ Modules System Stability - Bianca-Cristina Cristescu
The current state of the Modules System, although fairly stable, has a few bugs in its C++ support. Currently, the method for ensuring no regressions is a buildbot for libc++, which builds LLVM in self-hosted modules mode. Its main purpose is to find bugs in Clang’s implementation and ensure no regressions during ongoing development. We propose a flow for finding bugs, submitting them alongside a minimal reproducer to the Clang community, and subsequently proposing a fix for the issue which can be reviewed by the community. The poster will emphasise, besides the common complexity of minimising an issue, a comparison of the labour required with and without the proposed methodology.

4:45pm

Loop optimization is important for high-performance computing, but even more so for fast image processing, machine learning, and accelerator programming. Over the last year the Polly loop optimization framework has significantly evolved, with new support for data-layout transformations, optimization of dense linear algebra kernels, and fully automatic accelerator mapping. Many of these transformations have been contributed by developers all over the world, including three Summer of Code students. This BoF serves as a place for core developers to gather, to discuss the current status of Polly, and to shape the 2016 development agenda of Polly. Hot topics are likely the new automatic GPGPU code generation facilities, recent improvements in correctness and compile time, the new outer-loop vectorization, and the recent addition of @polly support in Julia. The Polly code base also relies heavily on scalar evolution and value range analysis, and can serve as a basis for performance, memory footprint, and data transfer modeling. We invite Polly developers and all other interested developers.

4:45pm

This talk will present a proof of concept for an approach that improves compile and link times by replacing the conventional use of object files with an incrementally updated repository, without requiring changes to existing build infrastructure. It aims to present the idea at a high level using a live demo of some trivial tools and to initiate discussion of a real implementation within the LLVM framework.

4:45pm

We rely heavily on many embedded systems in our day-to-day lives, and it is crucial to ensure that these systems are as robust as possible. To this end, it is important to have strong guarantees about the integrity of running code. Achieving this naturally requires close integration between hardware features and compiler toolchain support for those features.

To achieve this, an NXP architecture uses hardware signing to protect the integrity of a program's control flow from modification. Each instruction's interpretation depends on the preceding instruction in the execution flow (and hence on the sequence of all preceding instructions). Basic blocks require a “correction value” to bring the system into a consistent state when arriving from different predecessors. Compiler support is needed so that compiled code can receive the benefits of this feature.

Over the past year we have implemented the infrastructure for this feature, which can be enabled on a per-function level in LLVM for functions written in C and/or assembly. In this talk we will present this system and show how it enforces control flow integrity.

We will explain how we have extended our target’s backend with a pass that produces metadata describing a system’s control flow. This allows branches and calls to be resolved with appropriate correction values. A particular challenge is dealing with function pointers and hence indirect transfers of control. We will also describe the implementation of user attributes to support such functionality in Clang.

The encoding of each instruction, and the correction values, cannot be finally determined until the final program is linked. Using the metadata generated by LLVM, we can recreate the control flow graph for the entire program. From this, each instruction can be signed and the correction value for each basic block inserted into the binary.