Welcome prospective Google Summer of Code 2018 Students! This document is your
starting point to finding interesting and important projects for LLVM, Clang,
and other related sub-projects. These projects are not developed solely for
Google Summer of Code; they are open projects that genuinely need developers
and would greatly benefit the LLVM community.

We encourage you to look through this list and see which projects excite you
and match well with your skill set. We also invite proposals not on this
list. However, you must propose your idea to the LLVM community through our
developers' mailing list (llvm-dev@lists.llvm.org or specific subproject mailing
list). Feedback from the community is a requirement for your proposal to be
considered and hopefully accepted.

The LLVM project has participated in Google Summer of Code for several years
and has had some very successful projects. We hope that this year is no
different and look forward to hearing your proposals. For information on how to
submit a proposal, please visit the Google Summer of Code
main website.

Description of the project:
After instruction selection LLVM uses the MI (Machine Instruction)
representation for programs. We recently added support for reading and
writing this representation to disk
(http://llvm.org/docs/MIRLangRef.html). Usage of this format for writing
tests is growing and so is the desire to improve the format, tools and
workflow. Possible projects:

Create a single consistent format instead of the current mix of YAML + IR + MIR

Description of the project: Debugging optimized code can be frustrating. Variables may appear as "<value optimized out>" in the debugger, or may not appear at all. Line numbers in stack traces may disappear, or worse, become inaccurate. To improve the situation, we have to teach more LLVM optimization passes how to preserve debug info. The primary focus will be on mid-level IR passes which fail to pass verification by the Debugify utility. This utility can identify passes which drop debug info in a targeted way and can simplify test case generation.

Expected Results: This project has two goals. Initially, the student will gather metrics on debug info loss for individual LLVM passes. This will let us measure subsequent improvements. The second goal is to incrementally fix as many debug info loss bugs as possible, with a focus on the hottest areas of the compiler.

Description of the project: The LLVM project has a lot of tools that can be used to inspect binaries, just as any other toolchain project does. However, many people are accustomed to the existing GNU tools, so providing command-line-compatible replacements would make it easy for them to switch. Bonus points for producing similar output so that automated tools built on top of them continue to work reliably.

Expected Results: This project has one goal: produce binary tools that are drop-in compatible with GNU binutils. The student will be expected to focus on a single tool at a time so that we can count each one as "done" as much as possible.

Description of the project: The dominance relation is widely used in many compiler analyses and optimizations. LLVM provides an implementation of the (Semi-NCA) Depth Based Search algorithm to incrementally update Dominator and PostDominator Trees. It is possible to use it directly or through a lazy updater object -- DeferredDominance. The current API is fragmented, and different analyses, transforms, and utilities (e.g. Local.cpp, LoopUnroll.cpp) have to decide how to perform incremental updates.

The fix would be to design and implement a new class for abstracting away how tree updates are performed (eagerly or lazily) and which trees are actually being updated (none, only DomTree, only PostDomTree, both). With this, performing faster incremental updates will become possible by first updating DomTree, and then using the result to prune unnecessary updates to PostDomTree.

Expected Results:

Create a new class that stores the Dominator Trees to update and allows specifying an update policy.

Convert existing API to use the new updater object instead of working directly with DomTree/DeferredDominance.

Design and implement a new algorithm to prune unnecessary PostDomTree updates based on updated DomTree.

Description of the project: LLVM functions can be tagged with several attributes, such as "the function only reads memory" or "the function cannot throw exceptions". These attributes are used by many optimizations when deciding whether a particular transformation is valid. Function attributes can either be given by the frontend or be inferred by LLVM.
The goal of this project is to improve the current function attribute inference algorithms and to infer attributes that are not inferred right now. This will be accomplished via intra- and/or inter-procedural analyses. See this email for a list of opportunities for improvement.

Description of the project: Clang has a newly implemented autocompletion feature, the details of which can be found on the LLVM blog. We would like to improve it by adding more flags to autocompletion, supporting more shells (currently only bash is supported), and exporting this feature to other projects such as llvm-opt. The accepted student will work on the Clang Driver, LLVM Options, and shell scripts.

Expected Results: Autocompletion working on bash and zsh, with support for llvm-opt options.

Description of the project: Just as LLVM is a library to
build compilers, LLDB is a library to build debuggers. LLDB vends
a stable, public SB API. For historical reasons, the LLDB command
line interface is currently implemented on top of LLDB's private
API and it duplicates a lot of functionality that is already
implemented in the public API. Rewriting LLDB's command line
interface on top of the public API would simplify the
implementation, eliminate duplicate code, and most importantly
reduce the testing surface.

This work will also provide an opportunity to clean up the SB API
of commands that have accrued too many overloads over time and
convert them to make use of option classes to both gather up all
the variants and also future-proof the APIs.

Description of the project: LLDB's data formatters allow it to pretty-print objects such as std::vector (from the C++ standard library), or String (from the Swift standard library). These data formatters are implemented in C++ and reside within the debugger, but the data structures are defined in other projects. This means that when the data structures change, lldb's data formatters may not be updated in sync. This also means that it's difficult for projects to define and test custom data formatters for special kinds of objects.

Expected results: The goal of this project would be to define a DSL which makes it possible to implement lldb data formatters for standard C++ containers. These formatters would be moved into libc++ and tested there.

Description of the project: lldb-mi implements a
machine-readable interface that is supported by many IDEs and text
editors. The current support is incomplete and does not implement
enough commands to work with most text editors. More importantly,
it isn't using the right abstraction layer: Instead of executing
textual commands via handleCommand() and scraping LLDB's
textual output, it should be using the methods and data structures
provided by the public SB API.

Description of the project: One of the tensions in the
testsuite is that spinning up a process and getting it to some
point is not a cheap operation, so you'd like to do a bunch of
tests when you get there. But the current testsuite bails at the
first failure, so you don't want to do many tests since the
failure of one fails all the others. On the other hand, there are
some individual test assertions where the failure of the assertion
should cause the whole test to fail. For example, if you
fail to stop at a breakpoint where you want to check some variable
values, then the whole test should fail. But if your test then
wants to check the value of five independent locals, it should be
able to do all five, and then report how many of the five variable
assertions failed. We could do this by adding Start
and End markers for a batch of tests, do all the tests in
the batch without failing the whole test, and then report the
error and fail the whole test if appropriate. There might also be
a nice way to do this in Python using scoped objects for the test
sections.

Description of the project: apt.llvm.org provides Debian and Ubuntu repositories for every maintained version of these distributions. LLVM, Clang, clang extra tools, compiler-rt, polly, LLDB and LLD packages are generated for the stable, stabilization and development branches.
These packages are also shipped as part of Debian and Ubuntu without any changes.
Debian and Ubuntu have separate packages for libc++ and OpenMP.
The goal of this project is to merge libc++ and OpenMP packages as part of the llvm-toolchain packages.
The difficulty of this project is to make different versions of these libraries co-installable while remaining usable for developers. This project will also aim to limit the impact on existing usage of these libraries.

If the project is completed early, the student will also work on the full bootstrap of the llvm-toolchain (i.e. building it with a newly built clang binary).

Expectation: The student must have demonstrated some experience with Debian/Ubuntu packaging. Debian maintains a list of good first bugs. Please mention any packaging related contribution in the GSoC application.

Description of the project: apt.llvm.org provides Debian and Ubuntu repositories for every maintained version of these distributions. LLVM, Clang, clang extra tools, compiler-rt, polly, LLDB and LLD packages are generated for the stable, stabilization and development branches.
Currently, the packages are built using a Jenkins instance and Jenkins-Debian-Glue.
The goal of this project is to deploy an instance of Open Build Service (OBS) and port the various scripts and packages to this platform.

In theory, apt.llvm.org should not change at all from the user's perspective.

Expectation: The student must have demonstrated some experience with Debian/Ubuntu packaging. Debian maintains a list of good first bugs. Please mention any packaging related contribution in the GSoC application.

Description of the project:
The C++ std::string class provides a c_str() method that returns a raw pointer to a string's inner character buffer. When a std::string is destroyed, the character buffer is deallocated. A common bug is to access a dangling raw pointer to the buffer after string deallocation. These "use after free" bugs can cause crashes or other unexpected behavior.
This project will add a new checker to the static analyzer to find when a dangling inner string pointer is used. This will help find bugs not only with std::string and c_str() but also with LLVM's StringRef class and the new C++17 std::string_view.

Description of the project:
The static analyzer finds bugs by exploring many possible paths through a program. To reduce false positives, it uses a very fast but imprecise custom constraint manager to rule out infeasible paths that cannot actually be executed at run time.
This project will extend the analyzer to use the Z3 SMT solver to rule out additional infeasible paths by postprocessing bug reports. This will help the analyzer reduce false positives when the path involves complicated branches that the built-in constraint manager cannot reason about.

Description of the project:
Clang-doc is a new tool for generating documentation for C/C++ code with a modular and extensible approach. It aims to simplify the overhead of generating documentation, leveraging the clang AST to produce results from existing comments and code. The main part of the tool produces an intermediate representation of the docs, which is consumed by a generator targeting a specific output format. Current and in-progress generators emit documentation in YAML and Markdown formats, but we’d like to have another one for HTML format.

Google Summer of Code 2017 contributed a lot to the LLVM project. Below is a
list of some projects that were offered during GSoC 2017. For the list of
accepted and completed projects, please take a look at the Google Summer of Code
website.

Description of the project:
ThinLTO is a cool new technology to perform Link-Time Optimization (see
this talk for more info). It is fairly new
and there are multiple improvements about cross-module optimizations that
can be made there.

Description of the project:
Adding Debug Info (compiling with `clang -g`) shouldn't change the
generated code at all. Unfortunately we have bugs… These are usually not
too hard to fix and a good way to discover new parts of the codebase! A
starting point could be the test-suite. We suggest building object files
both ways and disassembling the text sections, which will give cleaner
diffs than comparing .s files.

Description of the project:
After instruction selection LLVM uses the MI (Machine Instruction)
representation for programs. We recently added support for reading and
writing this representation to disk
(http://llvm.org/docs/MIRLangRef.html). Usage of this format for writing
tests is growing and so is the desire to improve the format, tools and
workflow. Improvements would be welcome:

Create a single consistent format instead of the current mix of
YAML + IR + MIR

Do not print unnecessary information (we often print default
values where the reader could deduce them)

The format of things like MachineInstr/MachineBasicBlock::dump()
should be the same or very close to the .mir format => change the dump
functions.

Allow the representation to deduce successors of a basic block in
common cases

Description of the project:
When instantiating a template, the template arguments are canonicalized
before being substituted into the template pattern. Clang does not preserve
type sugar when subsequently accessing members of the instantiation.

Clang should "re-sugar" the type when performing member access on a class
template specialization, based on the type sugar of the accessed
specialization. For example, given std::vector<std::string> vs;, the type of
vs.front() should be std::string, not std::basic_string<char, [...]>.

Suggested design approach: add a new type node to represent template
argument sugar, and implicitly create an instance of this node whenever a
member of a class template specialization is accessed. When performing a
single-step desugar of this node, lazily create the desugared representation
by propagating the sugared template arguments onto inner type nodes (and in
particular, replacing Subst*Parm nodes with the corresponding sugar). When
printing the type for diagnostic purposes, use the annotated type sugar to
print the type as originally written.

For good results, template argument deduction will also need to be able to
deduce type sugar (and reconcile cases where the same type is deduced twice
with different sugar).

Expected results:
Diagnostics preserve type sugar even when accessing members of a template
specialization. T<unsigned long> and T<size_t> are still the
same type and the same template instantiation, but
T<unsigned long>::type single-step desugars to 'unsigned long' and
T<size_t>::type single-step desugars to 'size_t'.

Description of the project:
Bash and other shells support typing a partial command and then
automatically completing it for the user (or at least providing suggestions
how to complete) when pressing the tab key. This is usually only supported
for popular programs such as package managers (e.g. pressing tab after
typing "apt-get install late" queries the APT package database and lists all
packages that start with "late"). As of now clang's frontend isn't supported
by any common shell.

Suggested design approach: The main goal is to support a variety of
terminals. It would be preferable to keep each shell plugin minimal,
enabling easy addition of new plugins. The implementation ought to extend
the clang driver switches with a flag to request auto-completion of a
partial shell command.

Description of the project:
Every developer has to interact with diff tools daily. The algorithms are
usually based on detecting "longest common subsequences", which is agnostic
to the content of the files. A tool that understands the structure of the
code may provide a better diff experience by being robust against, for
example, clang-format changes.

This check should be easier to write in clang-tidy than in the Clang Static
Analyzer, especially because we don't care about inlining (as long as it
doesn't modify the pointer). More details are in the Bugzilla feature
request.

Description of the project:
Implement a path-sensitive checker that warns if virtual calls are made from
constructors and destructors, which is not valid in case of pure virtual
calls and could be a sign of user error in non-pure calls.
The current virtual calls checker, implemented in VirtualCallChecker.cpp,
needs to be re-implemented in a path-sensitive way. The lack of
path-sensitive reasoning may result in false positives in the
inter-procedural mode, which is disabled now for that reason.
The false positives could happen when a called function uses a member
variable flag to track whether initialization is complete and relies on the
flag to ensure that the virtual member function is not called during
initialization. Further, the path diagnostic should be used to highlight
both the virtual call and the path from the constructor. Last, we will need
to evaluate if the warning should be issued for both calls to pure virtual
functions (which is always an error) and non-pure virtual functions (which
is more of a code smell and may be a false positive).

Description of the project:
Enhance the clang static analyzer by adding models of C++11 and C11 atomic
operations, such as std::atomic_compare_exchange_*. Currently, these
operations are being treated opaquely, which results in loss of precision
when analyzing the code that uses these instructions. To address the
problem, one would need to programmatically construct ASTs that simulate
these APIs and add them to the analyzer's BodyFarm, the API used for
modeling system APIs. Finally, the work would also include writing tests
for the various APIs and checking that the analyzer correctly models
atomics.

Description of the project:
Many of the projects in compiler-rt are only supported on Linux.
Here are some examples: CFI, DFSan, XSan, LSan, XRay. Porting any of them
to other platforms, for example, Mac OS, would be great!

Description of the project:
The goal of the project is to improve the layout/performance of the
generated executable. The primary object format considered for the project
is ELF but this can be extended to other object formats. The project will
touch both LLVM and lld.

Warm-up: lld already provides an option (--symbol-ordering-file)
which takes a symbol ordering file (presumably collected from a
profiler) and builds a layout from it. This aims to reduce startup times. It
would be nice to provide scripts to profile the applications/process
various profilers output to produce an order file/evaluate the
impact of the feature (as it has been tested only on a small class
of applications). There's already some work in the area but nothing
has been integrated in the LLVM build system for ELF. Ideally a
motivated student would do the benchmarking/analysis before the GSoC
starts to familiarize with the problem.

The meat: Use/extend profile information generated by LLVM to help
the linker lay out functions. An obvious way (what gcc uses, [1])
is to pass values to the linker using special `.note` sections. The
linker then can reconstruct the call graph and apply an algorithm
like the one described in [2] (this is just a starting point, other
alternatives can be explored).

Possible extension: Xray can be used to provide data (it's unclear whether
this is feasible easily, see David's comment in [3]).

Description of the project:
Even though Polly's compile time today is not much higher than that of other
non-trivial IR passes, the need to version code in many situations and the lack
of static knowledge about loop iteration counts, hotness of functions, and
parameter values requires Polly to be significantly more conservative than it
would need to be. The goal of this project is to connect Polly with the LLVM
profiling infrastructure and exploit profiling information to decide: 1) when to
run Polly, 2) how aggressively to version the code, 3) which code version
to emit, and 4) which assumptions to take. As a result, in profile-guided builds
Polly can become more aggressive while still having a lower compile time and
code size impact.

Over the last years Chandler Carruth and others introduced a new pass manager
to LLVM which uses a new caching based architecture to allow analysis results
to be computed on demand. Besides resolving many engineering problems, the new
pass manager has three interesting properties: 1) analysis results for multiple
objects (e.g., functions) can be made available at the same time, 2) it is
possible to access the analysis result of one function from another function,
or the analysis results of a function pass from a call-graph pass, and 3) new
pass managers can be instantiated easily.

The goal of this project is to port Polly to the new pass manager and use this
opportunity to improve the overall pass design of Polly. The first step will
be to make Polly future proof by providing the same functionality Polly already
has with the old pass manager, in the context of the new pass manager. Next,
facilities of the new pass manager can be exploited to remove Polly's dependence
on the RegionPass infrastructure and replace it with a Polly-specific
scop-pass manager that executes scop-model-only passes without the need to
piggy-back on some IR-level analysis. Finally, the student will think about how
analysis results can be made available across functions.

If the project is completed early, the student might look into exploiting
the availability of analysis results from multiple functions to perform GPU
code generation across functions.

This document is meant to be a sort of "big TODO list" for LLVM. Each
project in this document is something that would be useful for LLVM to have, and
would also be a great way to get familiar with the system. Some of these
projects are small and self-contained and may be implemented in a couple of
days; others are larger. Several of these projects may lead to interesting
research projects in their own right. In any case, we welcome all
contributions.

If you are thinking about tackling one of these projects, please send a mail
to the LLVM
Developer's mailing list, so that we know the project is being worked on.
Additionally this is a good way to get more information about a specific project
or to suggest other projects to add to this page.

Currently, both Clang and LLVM have a separate target description infrastructure,
with some features duplicated, others "shared" (in the sense that Clang has to create
a full LLVM target description to query specific information).

This separation has grown in parallel, since in the beginning the two were quite
different and served disparate purposes. But as the compiler evolved, more and
more features had to be shared between the two so that the compiler would behave
properly. An example is when targets have default features on specific configurations
for which there are no flags. If the back-end has a different "default" behaviour
than the front-end and the latter has no way of enforcing behaviour, it simply
won't work.

Of course, an alternative would be to create flags for all little quirks, but
first, Clang is not the only front-end or tool that uses LLVM's middle/back ends,
and second, that's what "default behaviour" is there for, so we'd be missing the
point.

Several ideas have been floating around to fix the Clang driver WRT recognizing
architectures, features and so on (table-gen it, user-specific configuration files,
etc) but none of them touch the critical issue: sharing that information with the
back-end.

Recently, the idea has been floated to factor out the target description
infrastructure from both Clang and LLVM into its own library that both use.
This would make sure that all defaults, flags and behaviour are shared, but would
also reduce the complexity (and thus the cost of maintenance) a lot. That would
also allow all tools (lli, llc, lld, lldb, etc) to have the same behaviour
across the board.

The main challenges are:

To make sure the transition doesn't destroy the delicate balance on any
target, as some defaults are implicit and, sometimes, unknown.

To be able to migrate one target at a time, one tool at a time and still
keep the old infrastructure intact.

To make it easy to detect a target's features for both the front-end and the
back-end, and to merge both into a coherent set of properties.

To provide a bridge to the new system for tools that haven't migrated,
especially the out-of-tree ones, which will need some time (one release,
at least) to migrate.

The LLVM bug tracker occasionally
has "code-cleanup" bugs filed in it.
Taking one of these and fixing it is a good way to get your feet wet in the
LLVM code and discover how some of its components work. Some of these include
some major IR redesign work, which is high-impact because it can simplify a lot
of things in the optimizer.

The llvm-test testsuite is
a large collection of programs we use for nightly testing of generated code
performance, compile times, correctness, etc. Having a large testsuite gives
us a lot of coverage of programs and enables us to spot and improve any
problem areas in the compiler.

One extremely useful task, which does not require in-depth knowledge of
compilers, would be to extend our testsuite to include new programs and benchmarks.
In particular, we are interested in cpu-intensive programs that have few
library dependencies, produce some output that can be used for correctness
testing, and that are redistributable in source form. Many different programs
are suitable, for example, see this list for some
potential candidates.

We are always looking for new testcases and benchmarks for use with LLVM. In
particular, it is useful to try compiling your favorite C source code with LLVM.
If it doesn't compile, try to figure out why or report it to the llvm-bugs list. If you
get the program to compile, it would be extremely useful to convert the build
system to be compatible with the LLVM Programs testsuite so that we can check it
into SVN and the automated tester can use it to track progress of the
compiler.

When testing code, try running it with a variety of optimizations and with
all the back-ends: CBE, llc, and lli.

Find benchmarks either using our test results or on your own,
where LLVM code generators do not produce optimal code or simply where another
compiler produces better code. Try to minimize the test case that demonstrates
the issue. Then, either submit a
bug with your testcase and the code that LLVM produces vs. the code that it
should produce, or even better, see if you can improve the code
generator and submit a patch. The basic idea is that it's generally quite easy
for us to fix performance problems if we know about them, but we generally don't
have the resources to go finding out why performance is bad.

The
LNT perf database has some nice features like detecting moving averages,
standard deviations, variations, etc. But the report page gives too much emphasis
to the individual variation (where noise can be higher than signal), e.g.
this case.

The first part of the project would be to create an analysis tool that would
track moving averages and report:

If the current result is higher/lower than the previous moving average by
more than (configurable) S standard deviations

If the current moving average is more than S standard deviations away from the
Base run

If the last A moving averages are in constant increase/decrease of more
than P percent

The second part would be to create a web page which would show all related
benchmarks (possibly configurable, like a dashboard) and show the basic statistics
with red/yellow/green colour codes to show status and links to more detailed
analysis of each benchmark.

A possible third part would be to be able to automatically cross reference
different builds, so that if you group them by architecture/compiler/number
of CPUs, this automated tool would understand that the changes are more common
to one particular group.

The
LLVM Coverage Report has a nice interface to show what source lines are
covered by the tests, but it doesn't mention which tests, revisions, and
architectures are covered.

A project to renovate LCOV would involve:

Making it run on a buildbot, so that we know what commits / architectures
are covered

Updating the web page to show that information

Developing a system that would report every buildbot build into the web page
in a searchable database, like LNT

Another idea is to enable the test suite to run all built backends, not just
the host architecture, so that coverage report can be built in a fast machine
and have one report per commit without needing to update the buildbots.

Completely rewrite bugpoint. In addition to being a mess, bugpoint suffers
from a number of problems where it will "lose" a bug when reducing. It should
be rewritten from scratch to solve these and other problems.

Move more optimizations out of the -instcombine pass and into
InstructionSimplify. The optimizations that should be moved are those that
do not create new instructions, for example turning sub i32 %x, 0
into %x. Many passes use InstructionSimplify to clean up code as
they go, so making it smarter can result in improvements all over the place.

We have a strong base for development of
both pointer analysis based optimizations as well as pointer analyses
themselves. It seems natural to want to take advantage of this:

The globals mod/ref pass basically does really simple and cheap
bottom-up context-sensitive alias analysis. Keeping it simple and cheap
is really important, but there are simple things that we could do to
better capture the effects of functions that access pointer
arguments. This can be really important for C++ methods, which spend
lots of time accessing pointers off 'this'.

The alias analysis API supports the getModRefBehavior method, which
allows the implementation to give a detailed analysis of the functions.
For example, we could implement full knowledge
of printf/scanf side effects, which would be useful. This feature is in
place but not being used for anything right now.

We need some way to reason about errno. Consider a loop like this:

for (...)
  x += sqrt(loopinvariant);

We'd like to transform this into:

t = sqrt(loopinvariant);
for (...)
  x += t;

This transformation is safe, because the value of errno isn't
otherwise changed in the loop and the exit value of errno from the
loop is the same. We currently can't do this, because sqrt clobbers
errno, so it isn't "readonly" or "readnone" and we don't have a good
way to model this.

The hard part of this project is figuring out how to describe errno
in the optimizer: each libc #defines errno to something different it
seems. Maybe the solution is to have a __builtin_errno_addr() or
something and change sys headers to use it.

We now have a unified infrastructure for writing profile-guided
transformations, which will work either at offline-compile-time or in the JIT,
but we don't have many transformations. We would welcome new profile-guided
transformations as well as improvements to the current profiling system.

Ideas for profile-guided transformations:

Superblock formation (with many optimizations)

Loop unrolling/peeling

Profile directed inlining

Code layout

...

Improvements to the existing support:

The current block and edge profiling code that gets inserted is very simple
and inefficient. Through the use of control-dependence information, many fewer
counters could be inserted into the code. Also, if the execution count of a
loop is known to be a compile-time or runtime constant, all of the counters in
the loop could be avoided.
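As a toy illustration of why fewer counters suffice (this is not LLVM's instrumentation code, just the flow-conservation idea): for an if/else diamond, counting the entry and one arm is enough; every other block count is derived.

```cpp
#include <map>
#include <string>

// Flow conservation in an if/else diamond:
//   entry -> {then, else} -> join
// Counting only 'entry' and 'then' recovers the rest, so no counters
// ever need to be inserted into 'else' or 'join'.
std::map<std::string, long> diamondCounts(long EntryCount, long ThenCount) {
  long ElseCount = EntryCount - ThenCount; // everything not through 'then'
  long JoinCount = ThenCount + ElseCount;  // both arms merge again
  return {{"entry", EntryCount}, {"then", ThenCount},
          {"else", ElseCount},   {"join", JoinCount}};
}
```

Control-dependence information generalizes this: a counter is only needed where the execution count is not already determined by counts elsewhere in the CFG.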

You could implement one of the "static profiling" algorithms, which analyze a
piece of code and make educated guesses about the relative execution frequencies
of various parts of the code.

You could add path profiling support, or adapt the existing LLVM path
profiling code to work with the generic profiling interfaces.

LLVM aggressively optimizes for performance, but does not yet optimize for code size.
With a new ARM backend, there is increasing interest in using LLVM for embedded systems
where code size is more of an issue.

Someone interested in working on implementing code compaction in LLVM might want to read
this article, which describes the use of
link-time optimization to reduce code size.

Generalize target-specific backend passes that could be target-independent,
by adding necessary target hooks and making sure all IR/MI features (such as
register masks and predicated instructions) are properly handled. Enable these
for other targets where doing so is demonstrably beneficial.
For example:

lib/Target/Hexagon/RDF*

lib/Target/AArch64/AArch64AddressTypePromotion.cpp

Merge the delay slot filling logic that is duplicated into (at least)
the Sparc and Mips backends into a single target independent pass.
Likewise, the branch shortening logic in several targets should be merged
together into one pass.

Implement 'stack slot coloring' to allocate two frame indexes to the same
stack offset if their live ranges don't overlap. This can reuse a bunch of
analysis machinery from LiveIntervals. Making the stack smaller is good
for cache use and very important on targets where loads have limited
displacement like ppc, thumb, mips, sparc, etc. This should be done as
a pass before prolog epilog insertion. This is now done for register
allocator temporaries, but not for allocas.
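The core of the idea can be sketched as interval assignment (invented data types below, not LLVM's frame-index machinery): walk frame indexes in order of live-range start and reuse any slot whose previous occupant has already died.

```cpp
#include <algorithm>
#include <map>
#include <vector>

// Stack slot coloring sketch: frame indexes whose live ranges do not
// overlap can share one stack slot, shrinking the frame. Live ranges
// are half-open intervals [Start, End).
struct FrameRange { int FI, Start, End; };

std::map<int, int> colorStackSlots(std::vector<FrameRange> Ranges) {
  std::sort(Ranges.begin(), Ranges.end(),
            [](const FrameRange &A, const FrameRange &B) {
              return A.Start < B.Start;
            });
  std::vector<int> SlotEnd; // per slot: end of the last interval placed
  std::map<int, int> Assignment;
  for (const FrameRange &R : Ranges) {
    bool Reused = false;
    for (size_t S = 0; S < SlotEnd.size(); ++S) {
      if (SlotEnd[S] <= R.Start) { // previous occupant is dead: reuse slot
        SlotEnd[S] = R.End;
        Assignment[R.FI] = (int)S;
        Reused = true;
        break;
      }
    }
    if (!Reused) { // every existing slot is still live here: open a new one
      SlotEnd.push_back(R.End);
      Assignment[R.FI] = (int)SlotEnd.size() - 1;
    }
  }
  return Assignment;
}
```

The real pass would get the intervals from LiveIntervals-style analysis rather than computing them ad hoc, but the reuse decision is the same.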

Implement 'shrink wrapping', which is the intelligent placement of
callee-saved register save/restores. Right now PrologEpilogInsertion always
saves every (modified) callee-saved register in the prolog and restores it in
the epilog. However, some paths through a function (e.g. an early exit) may
not use all regs. Sinking the save down the CFG avoids useless work on
these paths. Work has started on this; please inquire on llvm-dev.
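The central question can be sketched with a toy CFG (invented types, not LLVM's MachineBasicBlock API): which blocks can still reach a use of a callee-saved register? The save only has to execute on paths inside that set, so an early exit that never touches a callee-saved register skips the save/restore entirely.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Shrink-wrapping sketch: backward reachability from every block that
// uses a callee-saved register. The returned set is where the save
// still matters; the save can be sunk toward it instead of always
// living in the prologue.
std::set<std::string>
blocksNeedingSave(const std::map<std::string, std::vector<std::string>> &CFG,
                  const std::set<std::string> &UsesCSR) {
  std::map<std::string, std::vector<std::string>> Preds;
  for (const auto &BS : CFG)
    for (const std::string &Succ : BS.second)
      Preds[Succ].push_back(BS.first);
  std::vector<std::string> Work(UsesCSR.begin(), UsesCSR.end());
  std::set<std::string> Reach(UsesCSR.begin(), UsesCSR.end());
  while (!Work.empty()) {
    std::string B = Work.back();
    Work.pop_back();
    for (const std::string &P : Preds[B])
      if (Reach.insert(P).second)
        Work.push_back(P);
  }
  return Reach;
}
```

On a function with an early exit, the exit block falls outside the set, so placing the save at the entry of the region that actually needs it makes the early-exit path free.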

Implement interprocedural register allocation. The CallGraphSCCPass can be
used to implement a bottom-up analysis that will determine the *actual*
registers clobbered by a function. Use the pass to fine tune register usage
in callers based on *actual* registers used by the callee.
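The bottom-up propagation can be sketched as follows (invented names, not LLVM's CallGraphSCCPass API): a function's actual clobber set is its own clobbered registers plus everything its callees clobber, iterated to a fixed point so cycles (mutual recursion) are handled the way SCC-at-a-time processing would.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using RegSet = std::set<std::string>;

// Interprocedural clobber analysis sketch: Direct maps each function to
// the registers its own body clobbers; Calls maps each function to its
// callees. The result maps each function to its transitive clobber set.
std::map<std::string, RegSet>
clobberedRegs(std::map<std::string, RegSet> Direct,
              const std::map<std::string, std::vector<std::string>> &Calls) {
  bool Changed = true;
  while (Changed) { // fixed point: handles recursion and call cycles
    Changed = false;
    for (const auto &FC : Calls) {
      RegSet &Mine = Direct[FC.first];
      for (const std::string &Callee : FC.second)
        for (const std::string &R : Direct[Callee])
          Changed |= Mine.insert(R).second;
    }
  }
  return Direct;
}
```

A caller that sees a callee only clobbers, say, {r0, r1} can then keep values live in the remaining registers across the call instead of assuming the full ABI clobber set.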

Add support for 16-bit x86 assembly and real mode to the assembler and
disassembler, for use by BIOS code. This includes both 16-bit instruction
encodings as well as privileged instructions (lgdt, lldt, ltr, lmsw, clts,
invd, invlpg, wbinvd, hlt, rdmsr, wrmsr, rdpmc, rdtsc) and the control and
debug registers.

Port the Bigloo
Scheme compiler, from Manuel Serrano at INRIA Sophia-Antipolis, to
output LLVM bytecode. It seems that it can already output .NET
bytecode, JVM bytecode, and C, so LLVM would ostensibly be another good
candidate.

Write a new frontend for some other language (Java? OCaml? Forth?)

Random test vector generator: use a C grammar to generate random C code
(e.g., with quest); run it through llvm-gcc, then run a random set of
passes on it using opt. Try to crash opt. When opt crashes, use bugpoint
to reduce the test case and post it to a website or mailing list.
Repeat ad infinitum.

Port Valgrind to use LLVM code generation
and optimization passes instead of its own.

Write LLVM IR level debugger (extend Interpreter?)

Write an LLVM Superoptimizer. It would be interesting to take ideas from
this superoptimizer for x86:
paper #1 and paper #2 and adapt them to run on LLVM code.

It would seem that operating on LLVM code would save a lot of time
because its semantics are much simpler than x86. The cost of operating
on LLVM is that target-specific tricks would be missed.

The outcome would be a new LLVM pass that subsumes at least the
instruction combiner, and probably a few other passes as well. Benefits
would include not missing cases missed by the current combiner and also
more easily adapting to changes in the LLVM IR.

All previous superoptimizers have worked on linear sequences of code.
It would seem much better to operate on small subgraphs of the program
dependency graph.
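A toy version of the search (nothing LLVM-specific, and real superoptimizers prove equivalence with a SAT/SMT solver rather than exhaustive testing) makes the structure concrete: breadth-first over sequence length, so the first match found is the shortest.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Op = std::pair<std::string, std::function<uint8_t(uint8_t)>>;

// Superoptimizer sketch: find the shortest sequence of unary ops that
// equals Target on every 8-bit input. Equivalence is checked by brute
// force over all 256 inputs, which only works at this toy scale.
std::vector<std::string>
superopt(const std::function<uint8_t(uint8_t)> &Target,
         const std::vector<Op> &Ops, unsigned MaxLen = 2) {
  auto Matches = [&](const std::vector<const Op *> &Seq) {
    for (unsigned X = 0; X < 256; ++X) {
      uint8_t V = (uint8_t)X;
      for (const Op *O : Seq)
        V = O->second(V);
      if (V != Target((uint8_t)X))
        return false;
    }
    return true;
  };
  std::vector<std::vector<const Op *>> Frontier = {{}};
  for (unsigned Len = 0; Len <= MaxLen; ++Len) {
    std::vector<std::vector<const Op *>> Next;
    for (const auto &Seq : Frontier) {
      if (Matches(Seq)) { // first hit is shortest: search is by length
        std::vector<std::string> Names;
        for (const Op *O : Seq)
          Names.push_back(O->first);
        return Names;
      }
      for (const Op &O : Ops) {
        auto Extended = Seq;
        Extended.push_back(&O);
        Next.push_back(std::move(Extended));
      }
    }
    Frontier = std::move(Next);
  }
  return {"<not found>"};
}
```

For example, asked for x + 1 given only neg and not, the search discovers the classic identity -(~x) = x + 1. An LLVM-based version would enumerate small IR subgraphs instead of straight-line unary ops.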

In addition to projects that enhance the existing LLVM infrastructure, there
are projects that improve software that uses, but is not included with, the
LLVM compiler infrastructure. These projects include open-source software
projects and research projects that use LLVM. Like projects that enhance the
core LLVM infrastructure, these projects are often challenging and rewarding.

At least one project (and probably more) needs to use analysis information
(such as call graph analysis) from within a MachineFunctionPass. However,
most analysis passes operate at the LLVM IR level. In some cases, a value
(e.g., a function pointer) cannot be mapped from the MachineInstr level back
to the LLVM IR level reliably, making the use of existing LLVM analysis
passes from within a MachineFunctionPass impossible (or at least brittle).

This project is to encode analysis information from the LLVM IR level into
the MachineInstr IR when it is generated so that it is available to a
MachineFunctionPass. The exemplar is call graph analysis (useful for
control-flow integrity instrumentation, analysis of code reuse defenses, and
gadget compilers); however, other LLVM analyses may be useful.

Implement an on-demand function relocator in the LLVM JIT. This can help
improve code locality using runtime profiling information. The idea is to use
a relocation table for every function. The relocation entries need to be
updated upon every function relocation (take a look at
this article).
A (per-function) basic block reordering would be a useful extension.

Slimmer is a prototype tool, built using LLVM, that uses dynamic analysis to
find potential performance bugs in programs. Development on Slimmer started
during Google Summer of Code in 2015 and resulted in an initial prototype,
but evaluation of the prototype and improvements to make it portable and
robust are still needed. This project would have a student pick up and
finish the Slimmer work. The source code of Slimmer and
its current documentation can be found at its
GitHub page.