fcd: An optimizing decompiler (https://zneak.github.io/fcd/)
<p>Wed, 29 Mar 2017</p>
<h1>Helping Johnny to Analyze Malware: Part 1</h1>
<p>When I presented at CSAW, Assistant Professor <a href="https://twitter.com/moyix">Brendan Dolan-Gavitt</a> told me that the authors of the <a href="https://www.internetsociety.org/sites/default/files/11_4_2.pdf"><em>No More Gotos</em> paper</a> released a second paper that describes enhancements that they implemented in their Dream decompiler (now called Dream++). The paper, <a href="https://net.cs.uni-bonn.de/fileadmin/ag/martini/Staff/yakdan/dream_oakland2016.pdf">Helping Johnny to Analyze Malware</a>, explains how some improvements were implemented, and pits the Hex-Rays decompiler against Dream++ in a comparative study.</p>
<p>It took me a while to get to it (basically, until I finally decided to push off type reconstruction again–sorry 😞), and a handful of ideas aren’t directly applicable to fcd. For instance, the code query and transformation tool that they describe seems to be a large amount of work for relatively little benefit considering that fcd already allows users to load in LLVM passes to execute. Of course, their system allows some simplifications that fcd is not superbly good at currently, but fcd still has more fundamental issues than “man, I wish that this <code class="highlighter-rouge">result = a &gt; b ? a : b</code> statement turned into <code class="highlighter-rouge">result = max(a, b)</code>”.</p>
<p>Two things stood out as very interesting, however. First, they introduce additional loop transforms; second, they discuss transforms related to variable congruence. The loop material also inspired me to fix some long-standing problems with loop restructuring in fcd. Together, these improvements help make fcd’s output actually look great in many cases.</p>
<p>Since I tend to write long posts, this will only cover variable congruence. I’ll do loops–again–some other time.</p>
<h2 id="the-running-example">The Running Example</h2>
<p>With <a href="https://twitter.com/jeffreycrowell/status/835986452496297984">Jeff Crowell’s permission</a>, I pulled his Boston Key Party 2017’s <code class="highlighter-rouge">hiddensc</code> challenge into fcd’s test repository. Thanks, Jeff! It contains a short but interestingly demonstrative function that probably looked like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">unsigned</span> <span class="kt">long</span> <span class="nf">rand64</span><span class="p">()</span>
<span class="p">{</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">result</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">64</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">result</span> <span class="o">&lt;&lt;=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">result</span> <span class="o">|=</span> <span class="n">rand</span><span class="p">()</span> <span class="o">&amp;</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">result</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<h1 id="the-initial-state">The Initial State</h1>
<p>Before I got into any of this, fcd often did a poor job with loops and variables. Here’s the output as of March 18<sup>th</sup>:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">uint64_t</span> <span class="nf">rand64</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">arg0</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint64_t</span> <span class="n">phi3</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">phi4</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">phi_in1</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">phi_in2</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">do</span>
<span class="p">{</span>
<span class="n">phi3</span> <span class="o">=</span> <span class="n">phi_in1</span><span class="p">;</span>
<span class="n">phi4</span> <span class="o">=</span> <span class="n">phi_in2</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">phi4</span> <span class="o">&lt;</span> <span class="mi">64</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint32_t</span> <span class="n">anon5</span> <span class="o">=</span> <span class="n">rand</span><span class="p">();</span>
<span class="n">phi_in1</span> <span class="o">=</span> <span class="p">(</span><span class="n">__zext</span> <span class="kt">uint64_t</span><span class="p">)(</span><span class="n">anon5</span> <span class="o">&amp;</span> <span class="mi">1</span><span class="p">)</span> <span class="o">|</span> <span class="n">phi3</span> <span class="o">&lt;&lt;</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">phi_in2</span> <span class="o">=</span> <span class="n">phi4</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">while</span> <span class="p">(</span><span class="n">phi4</span> <span class="o">&lt;</span> <span class="mi">64</span><span class="p">);</span>
<span class="k">return</span> <span class="n">phi3</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Don’t like it much? Can’t blame you. The output is at least twice as long, it indents deeper, and it went from using two variables to using five.</p>
<p>There are some obvious improvements that can be made. First, we can get the number of variables down a bit. We can see that <code class="highlighter-rouge">phi3</code> takes its value from <code class="highlighter-rouge">phi_in1</code>, and looking at the code, it’s clear that we don’t really need two variables for this. How do you formalize it, though? Thankfully, Khaled Yakdan <em>et al.</em> have a solution for us.</p>
<h2 id="congruence-analysis-of-variable">Congruence Analysis of Variables</h2>
<p>The <em>Johnny</em> paper, as I’ll call it from now on, describes a technique called <em>congruence analysis</em> to identify variables that really want to represent the same value. The technique is perfectly applicable to variables like the synthesized Φ variables that fcd creates, and it produces satisfying results. In a few lines, the rules go as follows:</p>
<ul>
<li>You can only merge variables that have the exact same type (for instance, <em>not</em> <code class="highlighter-rouge">int</code> and <code class="highlighter-rouge">short</code>).</li>
<li>You can only merge variables that are assigned to one another.</li>
<li>You can only merge variables whose definitions do not <em>interfere</em>.</li>
</ul>
<p>The first two points are self-explanatory, but the third one could benefit from additional explanation. In this context, two variable definitions <em>interfere</em> when both variables are read sequentially with different values. For instance, in this very simple example:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">a</span> <span class="o">=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">b</span> <span class="o">=</span> <span class="mi">5</span><span class="p">;</span>
<span class="n">foo</span><span class="p">(</span><span class="n">a</span><span class="p">);</span>
<span class="n">foo</span><span class="p">(</span><span class="n">b</span><span class="p">);</span>
<span class="n">a</span> <span class="o">=</span> <span class="n">b</span><span class="p">;</span></code></pre></figure>
<p>Even though <code class="highlighter-rouge">a = b</code> in the end, <code class="highlighter-rouge">a</code> and <code class="highlighter-rouge">b</code> are read with different values at <code class="highlighter-rouge">foo(a)</code> and <code class="highlighter-rouge">foo(b)</code>, so the variables aren’t congruent and we can’t simplify this example to use just one variable.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">a</span> <span class="o">=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">b</span> <span class="o">=</span> <span class="mi">5</span><span class="p">;</span>
<span class="n">foo</span><span class="p">(</span><span class="n">b</span><span class="p">);</span>
<span class="n">a</span> <span class="o">=</span> <span class="n">b</span><span class="p">;</span></code></pre></figure>
<p>Assuming that <code class="highlighter-rouge">foo(b)</code> doesn’t modify either <code class="highlighter-rouge">a</code> or <code class="highlighter-rouge">b</code> through some freak global pointer or any other design faux pas (to be clear, this is just an example and fcd wouldn’t dare to make assumptions about aliasing), it’s now safe to have a single variable for every manipulation: drop the copy <code class="highlighter-rouge">a = b</code>, and use <code class="highlighter-rouge">a</code> everywhere <code class="highlighter-rouge">b</code> is used:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">a</span> <span class="o">=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">a</span> <span class="o">=</span> <span class="mi">5</span><span class="p">;</span>
<span class="n">foo</span><span class="p">(</span><span class="n">a</span><span class="p">);</span></code></pre></figure>
<p>To make these inferences, the <em>Johnny</em> paper explains that you can use the <em>live range</em> of the variables in question. The live ranges of a variable are the disjoint ranges of instructions, each starting at a definition of the variable and ending at the last use of that definition before the variable is defined again.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">a</span> <span class="o">=</span> <span class="mi">4</span><span class="p">;</span> <span class="c1">// definition one of a. a is live
</span><span class="n">foo</span><span class="p">(</span><span class="n">b</span><span class="p">);</span> <span class="c1">// a is live
</span><span class="n">foo</span><span class="p">(</span><span class="n">a</span><span class="p">);</span> <span class="c1">// last use of a's definition one. a is live
</span><span class="n">foo</span><span class="p">(</span><span class="n">c</span><span class="p">);</span> <span class="c1">// a is *dead*
</span><span class="n">a</span> <span class="o">=</span> <span class="mi">5</span><span class="p">;</span> <span class="c1">// new definition of a. a is live (again)</span></code></pre></figure>
<p><em>Johnny</em> uses this property to determine if two variables are interference-free. With variables <code class="highlighter-rouge">a</code> and <code class="highlighter-rouge">b</code> again, <code class="highlighter-rouge">a</code> interferes with <code class="highlighter-rouge">b</code> if <code class="highlighter-rouge">a</code> is assigned a value (other than <code class="highlighter-rouge">b</code>) in an instruction that is part of <code class="highlighter-rouge">b</code>’s live range. If <code class="highlighter-rouge">a</code> is free of interference with <code class="highlighter-rouge">b</code>, and <code class="highlighter-rouge">b</code> is free of interference with <code class="highlighter-rouge">a</code>, then the two variables are said to be congruent, and should be merged.</p>
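<p>The rules above can be sketched as a toy interference check over a straight-line program. This is only an illustration of the idea: the <code class="highlighter-rouge">Stmt</code> representation and the <code class="highlighter-rouge">live_range</code>/<code class="highlighter-rouge">congruent</code> helpers are made up for this post and bear no relation to fcd’s actual data structures.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_STMTS 8
#define MAX_VARS 4

/* One statement in a toy straight-line program: the variable it defines
   (-1 if none), the variable it plain-copies from (-1 if not a copy),
   and the variables it reads. */
typedef struct {
    int def;
    int copy_of;
    bool uses[MAX_VARS];
} Stmt;

/* Mark every statement index where v is live: from each definition of v
   down to the last use of that definition. */
static void live_range(const Stmt *s, int n, int v, bool *live)
{
    memset(live, 0, (size_t)n * sizeof *live);
    int last_def = -1;
    for (int i = 0; i < n; i++) {
        if (s[i].uses[v] && last_def >= 0)
            for (int j = last_def; j <= i; j++)
                live[j] = true;
        if (s[i].def == v)
            last_def = i;
    }
}

/* a and b are congruent when neither is assigned a value (other than the
   other variable) inside the other's live range. */
static bool congruent(const Stmt *s, int n, int a, int b)
{
    bool live_a[MAX_STMTS], live_b[MAX_STMTS];
    live_range(s, n, a, live_a);
    live_range(s, n, b, live_b);
    for (int i = 0; i < n; i++) {
        if (s[i].def == a && s[i].copy_of != b && live_b[i]) return false;
        if (s[i].def == b && s[i].copy_of != a && live_a[i]) return false;
    }
    return true;
}

enum { A, B };

/* a = 4; b = 5; foo(a); foo(b); a = b;  -- both read with different values */
static bool first_example_congruent(void)
{
    Stmt s[] = {
        { A, -1, { false, false } },  /* a = 4   */
        { B, -1, { false, false } },  /* b = 5   */
        { -1, -1, { true, false } },  /* foo(a)  */
        { -1, -1, { false, true } },  /* foo(b)  */
        { A, B, { false, true } },    /* a = b   */
    };
    return congruent(s, 5, A, B);
}

/* a = 4; b = 5; foo(b); a = b;  -- safe to merge */
static bool second_example_congruent(void)
{
    Stmt s[] = {
        { A, -1, { false, false } },  /* a = 4   */
        { B, -1, { false, false } },  /* b = 5   */
        { -1, -1, { false, true } },  /* foo(b)  */
        { A, B, { false, true } },    /* a = b   */
    };
    return congruent(s, 4, A, B);
}
```

On the first example, <code class="highlighter-rouge">b = 5</code> lands inside <code class="highlighter-rouge">a</code>’s live range, so the check rejects the merge; on the second, it accepts it, matching the transformation shown above.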
<p>This is currently implemented in <a href="https://github.com/zneak/fcd/blob/d41571c19f1fa610f348d8c60646215c7ccebc8a/fcd/ast/pass_congruence.cpp"><code class="highlighter-rouge">pass_congruence.cpp</code></a>. It doesn’t solve the unnecessarily complex loop structure of the running example, but it does make things better:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">uint64_t</span> <span class="nf">rand64</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">arg0</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint64_t</span> <span class="n">phi1</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">phi2</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">do</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">phi2</span> <span class="o">&lt;</span> <span class="mi">64</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint32_t</span> <span class="n">anon5</span> <span class="o">=</span> <span class="n">rand</span><span class="p">();</span>
<span class="n">phi1</span> <span class="o">=</span> <span class="p">(</span><span class="n">__zext</span> <span class="kt">uint64_t</span><span class="p">)(</span><span class="n">anon5</span> <span class="o">&amp;</span> <span class="mi">1</span><span class="p">)</span> <span class="o">|</span> <span class="n">phi1</span> <span class="o">&lt;&lt;</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">phi2</span> <span class="o">=</span> <span class="n">phi2</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">while</span> <span class="p">(</span><span class="n">phi2</span> <span class="o">&lt;</span> <span class="mi">64</span><span class="p">);</span>
<span class="k">return</span> <span class="n">phi1</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>The <code class="highlighter-rouge">phi_in</code> variables are gone, which is great. <code class="highlighter-rouge">anon5</code> is still around because fcd hates to (re)move statements that have observable side-effects, though this has also been alleviated: an expression with side-effects can now be moved forward up to the next expression with side-effects, and in this case that allows us to remove the <code class="highlighter-rouge">anon5</code> variable and put <code class="highlighter-rouge">rand()</code> directly in the <code class="highlighter-rouge">phi1</code> assignment.</p>
<h2 id="the-future-more-loop-simplifications">The Future: more loop simplifications</h2>
<p>The latest and greatest version of fcd also performs some loop simplifications. Currently, the output for the function has been reduced to a single, fairly nice loop:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">uint64_t</span> <span class="nf">rand64</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">arg0</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint64_t</span> <span class="n">phi1</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">phi2</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="n">phi2</span> <span class="o">&lt;</span> <span class="mi">64</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">phi1</span> <span class="o">=</span> <span class="p">(</span><span class="n">__zext</span> <span class="kt">uint64_t</span><span class="p">)(</span><span class="n">rand</span><span class="p">()</span> <span class="o">&amp;</span> <span class="mi">1</span><span class="p">)</span> <span class="o">|</span> <span class="n">phi1</span> <span class="o">&lt;&lt;</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">phi2</span> <span class="o">=</span> <span class="n">phi2</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">phi1</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>This will be the topic of another post in due time, when some more of the suggestions in <em>Johnny</em> have been implemented.</p>
<h1>How do compilers optimize divisions?</h1>
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<p>A fun part of writing a decompiler is trying to figure out how a compiler got from point A to point B. Compilers are known to use every last trick in the book to make things just marginally faster. Sometimes, compilers are just a bit clever: for instance, Clang can codegen a switch statement with lots of disjoint cases as a binary search over the cases to get better average- and worst-case number of comparisons. Anybody reading the code can see how it made sense to do that. Sometimes, however, compilers get <em>really really</em> clever and figuring out what happened is not a straightforward process.</p>
<p>One example of a non-obvious optimization is what compilers do for divisions by a constant. Divisions are still hard, and compilers hate to emit a <code class="highlighter-rouge">div</code> or <code class="highlighter-rouge">idiv</code> if they know one side of the equation. The resulting code, however, is puzzling.</p>
<iframe class="godbolt" src="https://godbolt.org/e#compiler:clang391,filters:'colouriseAsm,labels,directives,commentOnly,intel',options:'-O3',k:30,source:'unsigned+udiv19%28unsigned+arg%29+%7B%0A%09return+arg+%2F+19%3B%0A%7D'"></iframe>
<p>Until very recently, fcd wouldn’t see the division in there and would produce output similar to the following:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">uint32_t</span> <span class="nf">udiv19</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">arg0</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint64_t</span> <span class="n">anon1</span> <span class="o">=</span> <span class="p">(</span><span class="n">__zext</span> <span class="kt">uint64_t</span><span class="p">)</span><span class="n">arg0</span> <span class="o">*</span> <span class="mi">2938661835</span> <span class="o">&gt;&gt;</span> <span class="mi">32</span><span class="p">;</span>
<span class="k">return</span> <span class="p">(</span><span class="kt">uint32_t</span><span class="p">)(</span><span class="n">anon1</span> <span class="o">+</span> <span class="p">((</span><span class="n">__zext</span> <span class="kt">uint64_t</span><span class="p">)(</span><span class="n">arg0</span> <span class="o">-</span> <span class="p">(</span><span class="kt">uint32_t</span><span class="p">)</span><span class="n">anon1</span><span class="p">)</span> <span class="o">&gt;&gt;</span> <span class="mi">1</span><span class="p">)</span> <span class="o">&gt;&gt;</span> <span class="mi">4</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0x0fffffff</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<h2 id="what-happened">What happened?!</h2>
<p>This doesn’t quite look like a division by 19. Of course, that’s a problem for fcd, because we want to help people make sense of what they’re looking at.</p>
<p>Generally speaking, you can’t divide without dividing. What happened here is that Clang (and most other compilers on Matt Godbolt’s <a href="https://godbolt.org/">Compiler Explorer</a>) simply resorts to using the one and only type of division that computers are really good at: division by a power of two. There’s no <code class="highlighter-rouge">div</code> instruction in sight, but we do have right shifts. In fact, we can rewrite this code as a questionably better-looking mathematical expression:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
a \cdot \frac{1}{19} &\approx
\frac{a \cdot \frac{2938661835}{2^{32}} +
\frac{a - a \cdot \frac{2938661835}{2^{32}}}{2^1}}{2^4} \\\
a \cdot \frac{1}{19} &\approx
\left(
a \cdot 2938661835 \cdot 2^{-32} +
\left( a - a \cdot 2938661835 \cdot 2^{-32} \right) \cdot 2^{-1}
\right)
\cdot 2^{-4} \\\
a \cdot \frac{1}{19} &\approx
a \cdot \frac{7233629131}{137438953472} \\\
\end{align*} %]]></script>
<p>(The <code class="highlighter-rouge">&amp; 0x0fffffff</code> that fcd shows and that we ignored here is a leftover from the earlier phases of decompilation, where every 32-bit integer is represented as a 64-bit integer masked with <code class="highlighter-rouge">0xffffffff</code>. Most of these masks go away, but in this specific case, the mask was combined with the <code class="highlighter-rouge">&gt;&gt; 4</code> and the result isn’t obvious enough to let fcd get rid of it outright.)</p>
<p>The result of 137438953472 / 7233629131 is 18.999999997649866. In other words, the compiler merely found a big factor (2938661835) with which you could relatively easily compose divisions by powers of two until you’d <em>almost</em> get a division by 19.</p>
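<p>We can convince ourselves by spelling out the multiply-and-shift sequence in plain C and comparing it against a real division. This is a sketch using the constants from the decompiled output; the function names are mine.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The multiply-and-shift sequence from the decompiled output, spelled out. */
static uint32_t udiv19_magic(uint32_t arg)
{
    /* high 32 bits of the 64-bit product arg * 2938661835 */
    uint32_t hi = (uint32_t)(((uint64_t)arg * 2938661835u) >> 32);
    /* hi is always <= arg (the factor is below 2^32), so no underflow */
    return (hi + ((arg - hi) >> 1)) >> 4;
}

/* Sample the 32-bit domain with a prime stride, also probing the exact
   multiples of 19 where an off-by-one error would show up first. */
static bool magic_division_matches(void)
{
    for (uint64_t a = 0; a <= 4294967295u; a += 48611) {
        uint32_t x = (uint32_t)a;
        if (udiv19_magic(x) != x / 19) return false;
        if (udiv19_magic(x / 19 * 19) != x / 19) return false;
    }
    return true;
}
```

<p>In particular, the pattern holds at the very top of the domain: 4294967295 divides down to 226050910 either way.</p>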
<p>How close is close enough? Given that we divide an unsigned 32-bit number, this has to be accurate for integers up to 4294967295. 18.999999997649866 is not quite 19, and since this is integer division, the denominator effectively needs to be <code class="highlighter-rouge">ceil</code>ed to be accurate. That leaves an error margin of <code class="highlighter-rouge">ceil(denom) - denom</code>, and we want to know at which point enough error has accumulated to shift the result by 1. If that point lies beyond 4294967295 / 19, the largest quotient we can produce, then the approximation is valid for every integer in our division domain. We need to check for this:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
\frac{1}{19 - 18.999999997649866} &\ge \frac{4294967295}{19} \\\
425507595.8885449 &\ge 226050910.2631579
\end{align*} %]]></script>
<p>It holds, meaning that the approximation is accurate for any unsigned 32-bit integer that we can throw at it. Yay! This verification is important, because we wouldn’t want to transform something that is not really a division into a division.</p>
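<p>The check is mechanical enough to express directly in code. A sketch, using the constants derived above (the function name is mine):</p>

```c
#include <assert.h>
#include <stdbool.h>

/* Verify that the approximate denominator 2^37 / 7233629131 stays close
   enough to 19 that no 32-bit numerator accumulates an error of 1:
   1 / (ceil(denom) - denom) must be at least the largest possible quotient. */
static bool approximation_holds(void)
{
    double d_approx = 137438953472.0 / 7233629131.0; /* 18.99999999764... */
    double margin = 19.0 - d_approx;                 /* ceil(denom) - denom */
    return 1.0 / margin >= 4294967295.0 / 19.0;
}
```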
<h2 id="generalizing">Generalizing</h2>
<p>Clang generates 5 or 6 different patterns to divide by (or get the remainder of division with) constant integers. Interestingly enough, there is little to no variety in how signed division works, but there are several distinguishable patterns with unsigned division.</p>
<p>For the pattern that was just discussed, as a lazy person, I just threw the formula in <a href="http://www.wolframalpha.com">Wolfram|Alpha</a> and asked it to isolate the denominator in it:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
\frac{a}{D} &\approx \frac{\frac{aC}{2^{X}} + \frac{a - \frac{aC}{2^{X}}}{2^Y}}{2^Z}
\end{align*} %]]></script>
<p>Where <script type="math/tex">a</script> is the numerator (which is variable), <script type="math/tex">D</script> is the denominator (which we want to find), <script type="math/tex">C</script> is the large multiplier constant, and <script type="math/tex">X</script>, <script type="math/tex">Y</script> and <script type="math/tex">Z</script> are the different exponents of two that are used for right shifts. It came back with this:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
D &\approx \frac{2^{X+Y+Z}}{C \cdot \left(2^Y-1\right)+2^X}
\end{align*} %]]></script>
<p>This formula is easy enough to plug into fcd, and the verification code is equally easy to use.</p>
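<p>Plugging the <code class="highlighter-rouge">udiv19</code> constants back in makes a quick sanity check. This is a sketch, not fcd’s code; it uses the form of the denominator that reproduces the 7233629131 figure from earlier, namely <script type="math/tex">C \cdot (2^Y-1) + 2^X</script> with these constants:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Recover the denominator D from the pattern's constants:
   D = round(2^(X+Y+Z) / (C * (2^Y - 1) + 2^X)).
   Valid while the exponents stay below 64. */
static uint64_t recover_denominator(uint64_t c, unsigned x, unsigned y, unsigned z)
{
    double num = (double)(1ULL << (x + y + z));
    double den = (double)c * (double)((1ULL << y) - 1) + (double)(1ULL << x);
    return (uint64_t)(num / den + 0.5); /* round to nearest */
}
```

<p>With <script type="math/tex">C = 2938661835</script>, <script type="math/tex">X = 32</script>, <script type="math/tex">Y = 1</script> and <script type="math/tex">Z = 4</script>, this comes back with 19, as hoped.</p>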
<p>I didn’t find a lot of documentation about this optimization. <a href="http://reverseengineering.stackexchange.com/questions/1397/how-can-i-reverse-optimized-integer-division-modulo-by-constant-operations">RE.SE</a> has two answers to this question; the most upvoted one uses an example to show how you could come up with these numbers and covers one case the hard way. <a href="https://blogs.msdn.microsoft.com/devdev/2005/12/12/integer-division-by-constants/">Compiler people</a> have better information on how to come up with these numbers, but this is somewhat beyond what I’m interested in for fcd’s purposes.</p>
<p>As of this time, fcd gets signed division and remainder right, unsigned division right, and <em>some</em> of unsigned remainder. It turns out that the unsigned remainder operation does a few weird things that I’m not sure how to interpret, and unfortunately, my lazy options are more limited as Wolfram|Alpha has trouble understanding modulos (or, more probably, I have trouble expressing what I want to do in its language). Still, I’m happy to report that <code class="highlighter-rouge">udiv19</code> now decompiles as <code class="highlighter-rouge">arg / 19</code>.</p>
<p>Sun, 19 Feb 2017</p>
<h1>LLVM 4.0, Travis CI, type reconstruction–oh my!</h1>
<p>Well, it’s been <a href="/2016/12/07/type-inference.html">two months since I said</a> that I’d have something to show for type recovery after one month. I’m afraid that I’m still not quite there, but I do have some things to discuss. And fortunately, things haven’t been at a complete standstill either: LLVM is releasing a stable 4.0 in a few weeks, and fcd is getting continuous integration.</p>
<h1 id="llvm-40">LLVM 4.0</h1>
<p>As expected around this time of the year, LLVM has a new stable release coming up. I like to pick up the release candidates because fcd does unusual things with LLVM IR (centered around the fact that it goes from machine code to source instead of the other way around), and I want to make sure that I won’t be locked into the old version for a few months because there’s a bug in the new version that only impacts fcd and that prevents me from picking it up. (Testing 3.9 <a href="http://lists.llvm.org/pipermail/llvm-dev/2016-August/104018.html">did find such a regression</a>.)</p>
<p>With 4.0, LLVM is inching ever closer to removing the element type from pointer types and having a single <code class="highlighter-rouge">*</code> pointer type. Between 3.9 and 4.0, <code class="highlighter-rouge">PointerType</code> stopped being a <code class="highlighter-rouge">SequentialType</code>. This is probably most impactful when you generate <code class="highlighter-rouge">getelementptr</code> instructions by hand; thankfully, fcd doesn’t need to do much code generation by hand.</p>
<p>While there wasn’t too much trouble with the new release around this, I expect that it will become a problem in the future, especially for type recovery. External function calls provide very valuable type information, but fcd largely relies on function signatures having typed pointers to benefit from them. While <code class="highlighter-rouge">PointerType</code> no longer inherits from <code class="highlighter-rouge">SequentialType</code>, it gained a <code class="highlighter-rouge">getElementType()</code> method that does the same thing as the one <code class="highlighter-rouge">SequentialType</code> gave it. I expect this method to disappear in a not-so-distant future. This means that fcd will need to figure something out to carry type information over from C headers to LLVM function declarations.</p>
<p>On its end, Clang brought microscopic improvements to memory management in the tiny API surface that fcd uses. The one improvement I noticed is that <code class="highlighter-rouge">ClangInvocation</code> now uses <code class="highlighter-rouge">unique_ptr</code>/<code class="highlighter-rouge">shared_ptr</code> instead of an <code class="highlighter-rouge">IntrusiveRefCntPtr</code> in front-facing APIs. In all fairness, fcd’s use of Clang is superficial enough that I’m probably just missing out on all the goodness.</p>
<p>As for improvements, what we’re seeing is about on par with what I expected out of an already very mature compiler framework: not much of a change for the purposes of decompiling. The output is usually exactly the same size; frequently a tiny bit smaller; sometimes a tiny bit bigger. There could be some cool new things waiting to be used; I’ll have to look into that.</p>
<h1 id="continuous-integration-with-travis">Continuous Integration with Travis</h1>
<p>I’ve recently flipped the switch to build fcd with Travis. I finally took the bait when prompted about it on <a href="https://github.com/zneak/fcd/pull/17">Trass3r’s pull request on the CMake files</a>.</p>
<p>I had never used Travis before, so this was quite the new experience. Overall, I guess that it’s hard to beat free, but I can’t say that I’m super impressed with the build environment in the context of native programs. My feeling is that Travis was built with Web applications in mind.</p>
<p>Travis gives us either Ubuntu 12.04 (Precise) or 14.04 (Trusty), which are 5- and 3-year-old distributions, respectively. Capstone has packages for neither. The installation of LLVM 3.9 (soon-to-be 4) is non-obvious as well. The standard libraries are outdated and newer versions need to be grabbed from added repositories. The macOS build works, but the environment frequently takes over 45 minutes to come up.</p>
<p>And of course, testing scripts is made more complex by the fact that the only real way to test continuous integration is to commit changes. In my short time using Travis, I haven’t found a reliable way to reproduce their environment locally.</p>
<p>But enough negativity. Beyond answering the simple “does this build” question, through its shell script capabilities, Travis also lets us run automated tests and put results somewhere. I created the <a href="https://github.com/zneak/fcd-tests">fcd-tests</a> repository, where I store my usual test suite of little programs, and I have Travis clone it, decompile every one of them, save the output, and push it back to Github. This affords some relative peace of mind, and will hopefully help figure out faster when something breaks. Yay!</p>
<p>Still, this isn’t a perfect solution. The main reason that I’ve held off on continuous integration for so long is fcd’s nebulous criteria for success. Of course, crashing is a failure, but what about mis-decompilations? There is currently nothing in place that will tell you if fcd’s output is blatantly wrong. At best, checking in results to a Git repository means that once an issue is discovered, it <em>could</em> be doable to go back in time and find the first revision that exhibits the problem.</p>
<p>Comparing two output revisions is possible, but fcd has non-deterministic output. As <a href="https://github.com/zneak/fcd/blob/c0c14509e458e0e60d19c44a08fe21fcafb790c2/FUTURE.md#stabilize-output-across-runs">outlined in some version of fcd’s wishlist file</a>, this is most likely caused by <code class="highlighter-rouge">unordered_map</code>s and <code class="highlighter-rouge">unordered_set</code>s using ASLR’d pointers as keys. I find that the output is fairly stable within basic blocks and on control flow graphs that don’t have cycles, but functions with loops tend to look very different from one run to the next.</p>
<p>Another obvious consequence of this move is that it allows anyone to look at fcd’s output. This is equal parts great and scary. It possibly raises awareness about the project and lets people know what they’re getting into. It makes it simpler to identify low-hanging fruit that could be picked up by newer contributors. However, it also exposes all the things that I’m self-conscious about and want to fix but haven’t fixed yet, which gives me shipping anxiety.</p>
<p>Looking back, using IOCCC programs as test cases for a decompiler is a fun idea, but the reality is that they make liberal use of brutal gotos and other <a href="https://en.wikipedia.org/wiki/Duff's_device">weird constructs</a>, which tends to produce garbage control flow graphs. The problem with these is that they’re not representative of the real world, and fcd doesn’t always do a great job on them, so it probably looks worse than it would on programs written by humans (as opposed to programs written by <a href="http://www.ioccc.org/winners.html">monsters</a>).</p>
<p>One thing that I might/should start looking into is integrating old CTF binaries in this testing pipeline. CTF hosts, let me know if I can take your stuff!</p>
<h1 id="type-reconstruction">Type Reconstruction</h1>
<p>Progress on type reconstruction has been going at a baby-steps pace, but as a wise man once said, baby steps are just as good as adult steps when there are no venture capitalists pressuring you, because you can take as many of them as you need.</p>
<p>The 10,000-meter-high view (that’s over 30,000 feet!) is that fcd uses constraint solving over the whole program to figure out which values are pointers, and organizes them into hierarchical tree structures. For instance, it determines that some value A is a pointer, that A + 16 is also a pointer, and makes the latter a logical descendant of A. A second step (which does not do constraint solving at the moment) runs over the output of the first, and wrangles this tree-like structure into somewhat flat records, again across the whole program.</p>
<p>As somewhat of a novelty, the dominator tree will be used as a heuristic to distinguish subtypes, but as this post is getting quite long, I’m keeping that one for later. Sorry to everyone who hoped that I’d have a true update on type reconstruction! I don’t really.</p>
<p>But I swear, I’m working on it.</p>
Fri, 27 Jan 2017 00:00:00 +0000https://zneak.github.io/fcd/2017/01/27/llvm-travis.html
https://zneak.github.io/fcd/2017/01/27/llvm-travis.htmlKicking off the holidays with type recovery<style>
.small {
font-size: 9pt;
opacity: 0.6;
}
</style>
<p>Last time, I <a href="/2016/11/25/revisiting-regions.html">wrote</a> that fcd frequently hung or crashed because of problems in three broad categories:</p>
<ul>
<li>loop structurization;</li>
<li>complex reaching conditions;</li>
<li>stack frame recovery.</li>
</ul>
<p>Since then, the update to structurization has landed in the master branch, which solved every known occurrence of crashes in the first two categories (though, granted, my test set is still rather limited). It’s not mission accomplished yet, as there are still areas where structurization needs to improve. For instance, <a href="https://github.com/zneak/fcd/issues/33">nested loops are collapsed into one big fat loop</a>. I’m also aware that the No More Gotos authors have published a follow-up paper, <em><a href="https://net.cs.uni-bonn.de/fileadmin/ag/martini/Staff/yakdan/dream_oakland2016.pdf">Helping Johnny to Analyze Malware</a></em>, that I have only quickly glanced over.</p>
<p>Still, stability was a clear winner in this merge. As a consequence of this improvement, stack frame recovery became fcd’s number one cause of failure.</p>
<p>The recovery of the stack frame is a special case of the type recovery problem: recovering the variables of a stack frame is a simpler version of recovering the fields of a structure. Another special case is the recovery of global variables. Fcd doesn’t attempt it at the moment, but a good type recovery algorithm could solve that too.</p>
<p>Ever since an algorithm for stack frame recovery was implemented, the goal has been to replace it with a more general pass at some point in the future. We’re (hopefully) approaching that time, so I’d like to share some observations about how fcd does it.</p>
<p>I initially wanted to make one post for the whole feature, but there are an awful lot of things to cover around type recovery (and a lot of problems to solve), so let’s start with an overview of what needs to be done.</p>
<p>As an aside: this problem is frequently called “type inference”, but I strongly prefer “type recovery”. I find that it describes better what we hope to achieve. With modern languages, “inference” has this calmly-glowing aura around it that the compiler/interpreter <em>just knows</em> what you’re talking about when you declare all of these variables without explicitly typing any. Recovery feels more like “use as much duct-tape and WD-40 as you need to make something useful out of this mess”.</p>
<p>There’s also an implementation detail of fcd that deserves recognition on this topic: because of the SSA form, structure-typed LLVM values rarely exist for a long time. The framework will almost always trivially break them up into individual variables before it gets to pseudo-C code generation. What really matters is the type of memory (and thus of pointers), because LLVM can’t systematically break up structures that live in memory, as much as we would benefit if it could. For this reason, this post focuses on the problems of recovering the <code class="highlighter-rouge">Foo</code> in <code class="highlighter-rouge">Foo*</code>.</p>
<h1 id="the-type-problem">The type problem</h1>
<p>Type recovery is easy in a world where things have one unambiguous type. For instance, a structure with four fields can easily be recognized by the analysis of a function that accepts a pointer to it and manipulates every field individually. Unfortunately, the story for most languages is much more complicated than that.</p>
<p>In most native languages, the program can choose to cast a pointer to an arbitrary different type. For instance, a <code class="highlighter-rouge">uint64_t*</code> can be cast to a <code class="highlighter-rouge">double*</code>. This could cause undefined behavior in some high-level languages like C++, but machine code generally doesn’t have undefined behavior: it has been frozen, and the same sequence of operations is expected to have the same result every time, disregarding randomness-as-a-feature modifiers like ASLR.</p>
<p>Most interesting native languages also have some concept of a union, where a variable has one type out of many predefined possibilities. In C, this is just a <code class="highlighter-rouge">union</code>-tagged record. Other languages, like Swift, have discriminated unions, which are basically unions made safe by storing information about which member is being represented.</p>
<p>Subclasses can also be considered a form of union. In object-oriented languages, it is always possible that a pointer to an object is, in fact, a pointer to a larger object. For instance, with a base class <code class="highlighter-rouge">Base</code> that has a subclass <code class="highlighter-rouge">Derived</code>, if a function accepts a pointer to a <code class="highlighter-rouge">Base</code>, you could pass in a pointer to a <code class="highlighter-rouge">Derived</code>. The function itself is binary-compatible with either type, as <code class="highlighter-rouge">Derived</code> respects the prefix object layout of <code class="highlighter-rouge">Base</code>.</p>
<p>Things get a little weird when you put multiple inheritance in the picture. When this happens, suddenly, you can’t just take <code class="highlighter-rouge">Derived*</code> bits and expect them to represent valid <code class="highlighter-rouge">Base*</code> bits. The language needs to do pointer adjustment. In this simple example:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">struct</span> <span class="n">Base1</span> <span class="p">{</span> <span class="kt">int</span> <span class="n">foo</span><span class="p">;</span> <span class="p">};</span>
<span class="k">struct</span> <span class="n">Base2</span> <span class="p">{</span> <span class="kt">int</span> <span class="n">bar</span><span class="p">;</span> <span class="p">};</span>
<span class="k">struct</span> <span class="n">Derived</span> <span class="o">:</span> <span class="k">public</span> <span class="n">Base1</span><span class="p">,</span> <span class="n">Base2</span> <span class="p">{</span> <span class="kt">int</span> <span class="n">frob</span><span class="p">;</span> <span class="p">};</span></code></pre></figure>
<p>C++ will allow you to pass a <code class="highlighter-rouge">Derived*</code> to a function that accepts a <code class="highlighter-rouge">Base1*</code> or a <code class="highlighter-rouge">Base2*</code>, but the conversion is not a noop at the machine level. The two base classes, obviously, can’t occupy the same memory location. This is one possible flat representation of the <code class="highlighter-rouge">Derived</code> structure:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">struct</span> <span class="n">Derived</span> <span class="p">{</span> <span class="n">Base1</span> <span class="n">base1</span><span class="p">;</span> <span class="n">Base2</span> <span class="n">base2</span><span class="p">;</span> <span class="kt">int</span> <span class="n">frob</span><span class="p">;</span> <span class="p">};</span></code></pre></figure>
<p>When you pass the <code class="highlighter-rouge">Derived*</code> to a function accepting a <code class="highlighter-rouge">Base2*</code>, the compiler will adjust the pointer and pass the equivalent of <code class="highlighter-rouge">&amp;derived-&gt;base2</code>.</p>
<p>The place where things get <em>really</em> weird, however, is when you have a function that casts the <code class="highlighter-rouge">Base2*</code> to a <code class="highlighter-rouge">Derived*</code>. To do this, the compiler’s only option is to adjust the pointer by a negative amount. This manipulation is generally perceived as scary, and my observation is that it is poorly understood by disassemblers and decompilers.</p>
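<p>A minimal sketch of this adjustment, reusing the hierarchy above (the checks are mine; the exact offset is ABI-dependent, so it is only printed, not asserted):</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">#include &lt;cassert&gt;
#include &lt;cstdio&gt;

struct Base1 { int foo; };
struct Base2 { int bar; };
struct Derived : public Base1, public Base2 { int frob; };

int main() {
    Derived d;
    // Implicit upcast: the pointer is adjusted past the Base1 subobject,
    // so Derived* bits are not valid Base2* bits as-is.
    Base2* b2 = &amp;d;
    assert(static_cast&lt;void*&gt;(b2) != static_cast&lt;void*&gt;(&amp;d));
    // The downcast adjusts by the same amount, negated: this is the
    // negative pointer adjustment that trips up disassemblers.
    Derived* back = static_cast&lt;Derived*&gt;(b2);
    assert(back == &amp;d);
    printf("Base2 adjustment: %td bytes\n",
           reinterpret_cast&lt;char*&gt;(b2) - reinterpret_cast&lt;char*&gt;(&amp;d));
    return 0;
}</code></pre></figure>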
<p>(This doesn’t account for virtual inheritance. To be honest, I’ve never even looked at how this one is implemented in any ABI.)</p>
<p>Even when you don’t have to adjust pointers, a mere pointer cast can easily throw a wrench in type recovery. Here is another simple C++ hierarchy:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">struct</span> <span class="n">Base</span> <span class="p">{</span> <span class="kt">int</span> <span class="n">type</span><span class="p">;</span> <span class="p">};</span>
<span class="k">struct</span> <span class="n">Derived1</span> <span class="o">:</span> <span class="k">public</span> <span class="n">Base</span> <span class="p">{</span> <span class="kt">double</span> <span class="n">bar</span><span class="p">;</span> <span class="p">};</span>
<span class="k">struct</span> <span class="n">Derived2</span> <span class="o">:</span> <span class="k">public</span> <span class="n">Base</span> <span class="p">{</span> <span class="kt">uint64_t</span> <span class="n">bar</span><span class="p">;</span> <span class="p">};</span></code></pre></figure>
<p>If you have a function that accepts a <code class="highlighter-rouge">Base*</code> and casts it to a <code class="highlighter-rouge">Derived1*</code> or a <code class="highlighter-rouge">Derived2*</code> depending on whether <code class="highlighter-rouge">Base::type</code> is 0 or 1, it would be an error for a type recovery engine to pretend that this function operates on a single type. While this is often considered poor object-oriented design (<em>something something</em> virtual methods), it is encouraged in languages that have discriminated unions, or in frameworks where developers are savvy of other programming paradigms.</p>
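<p>A hedged sketch of that pattern (the function name is mine), where a single parameter really has one of two types depending on a tag:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">#include &lt;cassert&gt;
#include &lt;cstdint&gt;

struct Base { int type; };
struct Derived1 : public Base { double bar; };
struct Derived2 : public Base { uint64_t bar; };

// A type recovery engine must not pretend that `b` points to a single
// record type: past the `type` field, the two layouts diverge.
double asDouble(Base* b) {
    if (b-&gt;type == 0) {
        return static_cast&lt;Derived1*&gt;(b)-&gt;bar;
    }
    return static_cast&lt;double&gt;(static_cast&lt;Derived2*&gt;(b)-&gt;bar);
}

int main() {
    Derived1 d1; d1.type = 0; d1.bar = 1.5;
    Derived2 d2; d2.type = 1; d2.bar = 42;
    assert(asDouble(&amp;d1) == 1.5);
    assert(asDouble(&amp;d2) == 42.0);
    return 0;
}</code></pre></figure>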
<p>Another problem is that as compilers and languages become smarter, native programs carry fewer and fewer hints about their original data types. For instance, when copying Swift structures, the compiler will happily load the whole thing into a vector register and copy it to another memory location, without any respect for field boundaries. This obviously doesn’t mean that the type was a vector type to begin with; it just happens to be the fastest way to get memory from one place to the other.</p>
<h1 id="the-pointer-problem">The pointer problem</h1>
<p>As the name implies, the high-level objective of type recovery is to recover types. However, you need to be able to identify pointers before you’re able to try to find out what pointers reference. In fcd’s stack frame recovery pass, this is easy: the only pointer that it recognizes is the stack frame pointer, passed as an argument to the function, and tagged with the <code class="highlighter-rouge">fcd.stackptr</code> metadata attribute. (After recovery, this parameter is removed from the function’s signature.) Sadly, the general case is much more complicated.</p>
<ul>
<li>Some pointers are passed in as arguments to a function. This is a generalization of the case that fcd already handles. However, while fcd knows that the stack parameter is <em>always</em> a pointer, it can’t make that assumption about every function parameter.</li>
<li>Some pointer parameters are just <em>part-time pointers</em>. For instance, in the x86_64 System V ABI, a structure like <code class="highlighter-rouge">struct { int type; union { long i; void* p; } value; };</code> passed by value will be spread over argument registers, with <code class="highlighter-rouge">type</code> going in <code class="highlighter-rouge">rdi</code> and <code class="highlighter-rouge">value</code> going in <code class="highlighter-rouge">rsi</code>. This puts <code class="highlighter-rouge">rsi</code> in some tricky superposed state where it is both a pointer and an integer until observed. Of course, this also applies to return values: returning the same structure by value would put <code class="highlighter-rouge">type</code> in <code class="highlighter-rouge">rax</code> and <code class="highlighter-rouge">value</code> in <code class="highlighter-rouge">rdx</code>.
<ul>
<li>This also points to another distinct problem: exact argument type recovery is difficult or impossible when structures are broken up.</li>
</ul>
</li>
<li>Global variables are usually referenced by address in a compiled program. This also covers the case of magic memory locations like memory-mapped device registers.</li>
<li>Some pointers are dynamically allocated and are obtained as the result of a function call, using <code class="highlighter-rouge">malloc</code> or another allocation routine.</li>
<li>Some pointers are obtained as an “out” parameter to a function (like <code class="highlighter-rouge">allocateStuff(size, &amp;stuff)</code>, where <code class="highlighter-rouge">stuff</code> is a pointer). Sometimes, the caller of <code class="highlighter-rouge">allocateStuff</code> won’t even try to reference anything inside <code class="highlighter-rouge">stuff</code> after it gets a value (a case that the stack frame recovery pass handles very poorly at the moment).</li>
<li>Some pointers are obtained by offsetting another pointer by a constant or variable amount of memory, as in <code class="highlighter-rouge">a[b]</code>: the pointer <code class="highlighter-rouge">a</code> offset by the integral value <code class="highlighter-rouge">b</code>, times <code class="highlighter-rouge">sizeof *a</code>.</li>
<li>Some pointers are obtained by dereferencing other pointers.</li>
</ul>
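<p>The “part-time pointer” bullet is worth a small sketch (the code is mine): only the tag says whether the union member currently holds a pointer, and when such a struct is passed by value, the register carrying <code class="highlighter-rouge">value</code> is in exactly the superposed state described above:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">#include &lt;cassert&gt;

struct Variant {
    int type;                          // goes in rdi under the x86_64 System V ABI
    union { long i; void* p; } value;  // goes in rsi: pointer and integer at once
};

// Until `type` is observed, nothing distinguishes the two cases.
long resolve(Variant v) {
    return v.type == 0 ? v.value.i : *static_cast&lt;long*&gt;(v.value.p);
}

int main() {
    long stored = 99;
    Variant asInt;  asInt.type = 0;  asInt.value.i = 7;
    Variant asPtr;  asPtr.type = 1;  asPtr.value.p = &amp;stored;
    assert(resolve(asInt) == 7);
    assert(resolve(asPtr) == 99);
    return 0;
}</code></pre></figure>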
<p>Almost all of these cases have “what if”s and “but”s attached to them, making the challenge of identifying pointers at least as important as finding what they point to.</p>
<p>In the general case, it’s not possible to start from a root value and walk down its branches to find every pointer used by a program, like the stack frame recovery pass does. The only good way to identify pointers appears to be to find memory instructions in a program and identify their memory operands. Memory instructions here refer to <code class="highlighter-rouge">load</code> and <code class="highlighter-rouge">store</code>, of course, but also to <code class="highlighter-rouge">call</code> instructions, which may accept and return pointers. Since we are doing this in part to figure out the type of function parameters, if the program uses recursive functions, the result of this analysis may end up depending on itself.</p>
<p>Another problem is that to get good results, you almost certainly need to unify the type of different pointers: that is, you assume that two pointers point to the same type of memory. This lets you apply your findings to multiple values at once. The obvious issue is that, for any of the reasons mentioned in this section, compounded with any reason mentioned in the previous section, the assumption could be incorrect. Trying to provide a sound solution to this problem looks like a losing battle, but there will need to be some cutoff or heuristic that is “good enough”.</p>
<h1 id="going-forward">Going forward</h1>
<p>I’m writing this because it helps me put my thoughts in order about what I’m going to have to do. I don’t have anything worth showing at the moment, but I swear, it’s getting there. <span class="small">Hopefully this doesn’t become one of these posts that have no follow-up four years later.</span></p>
<p>An issue with all of this is that type recovery is pretty involved and somewhat of an academic field, which puts me outside of my element as I do not pursue an academic career. In fact, I never even had a class on type theory or compiler principles. I feel that it puts me at a handicap when trying to figure out type inference papers.</p>
<p>I looked around and found two papers that felt like they were promising: the <a href="https://users.ece.cmu.edu/~dbrumley/pdf/Lee,%20Avgerinos,%20Brumley_2011_TIE%20Principled%20Reverse%20Engineering%20of%20Types%20in%20Binary%20Programs.pdf">TIE paper</a> (<a href="https://github.com/BinaryAnalysisPlatform/bap">open-source implementation in OCaml here</a>) and the <a href="http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=6079860">paywalled SmartDec paper</a> (<a href="https://github.com/smartdec/smartdec">open-source implementation that hasn’t been touched since the original commit here</a>).</p>
<p>A problem with the TIE paper is that I’ve never been taught the logical language and several concepts that they use in the paper to describe their inference technique; but from what I understand, it would do a poor job at recovering polymorphic types (where polymorphism is used in its object-oriented sense). The part that I think that I understand is the one about recovering the type of values instead of the type of pointed-to memory, which is less relevant in the LLVM world.</p>
<p>A problem with the SmartDec paper is that they parse RTTI, and failing that, they look at constructors and destructors. RTTI is complex to parse, easy to mess with, ABI-specific, and it only exists in C++ programs when it has not been turned off. Constructors and destructors are only useful if they exist (which is not a given, even in C++: they could be trivial or inlined), and if they can be identified. The approach is too specific for what fcd wants to be.</p>
<p>Confronted with a solution that I don’t understand very well and a solution that I don’t believe in, the best way forward might just be to make something up. But then again, given the nature of the task and my background, it might not. This is going to be exciting.</p>
<p>I still think that I have a decent idea to cover most cases of polymorphism using the dominator tree of a function to draw a line between types that it’s okay to keep un-unified. I still need to do first things first, though, and git gud at pointer discovery. I’ll publish an update with nice figures and a new Git branch when things get more concrete.</p>
Wed, 07 Dec 2016 00:00:00 +0000https://zneak.github.io/fcd/2016/12/07/type-inference.html
https://zneak.github.io/fcd/2016/12/07/type-inference.htmlRevisiting Structurization<style type="text/css">
svg .cfg-node{
fill: #ffffff;
stroke: #85888D;
stroke-width: 5;
stroke-linecap: butt;
stroke-linejoin: miter;
stroke-miterlimit: 4;
}
svg .stroke-black{
fill: none;
stroke: #000000;
stroke-width: 5;
stroke-linecap: butt;
stroke-linejoin: miter;
stroke-miterlimit: 4;
}
svg .translucid {
opacity: 0.25;
}
svg text {
font-size: 42.78px;
font-family: Helvetica, Arial, sans-serif;
font-weight: bold;
fill: #000000;
}
.wide svg {
width: 100%;
}
</style>
<p>From my personal set of test programs (which are mostly <a href="http://www.ioccc.org">IOCCC</a> entries), I can identify three main problems that cause fcd to fail to decompile a program:</p>
<ul>
<li>loop structurization breaks;</li>
<li>complex reaching conditions grind fcd to a halt;</li>
<li>stack frame recovery crashes.</li>
</ul>
<p>As interest in fcd trickles in, it’s becoming harder to justify that it fails as often as it does. Now that fcd has caught some attention with fun gimmicks that no one else has, it might be time to work on reliability.</p>
<p>The third problem is a type inference problem. Type inference has proved to be a tough nut to crack, so I decided to focus on the other two for now. They both live in the IR-to-AST layer, making it a prime target for enhancements.</p>
<h1 id="the-loop-problem-again">The Loop Problem, again</h1>
<p>The pattern-independent control flow structuring technique, on which fcd is based, needs loops to have precisely one entry and one exit. Ensuring this property, it turns out, is complex. Back when I had the motivation to <a href="/2016/02/24/seseloop.html">do fancy SVG figures for these posts</a>, I made it look easy enough:</p>
<blockquote>
<p>Once you have your entries, your loop members and your exits, you must ensure that there is a single entry node and a single exit node. If there are more than that, the pass creates a “funnel node” (my term) that collects every entering (or exiting, since the same algorithm is used for both cases) edge, creates a Φ node with a different value for every incoming edge, and directs execution to different blocks depending on its value.</p>
</blockquote>
<p>One major problem with this approach at the IR level is that it gravely mangles the dominator tree. Suppose that you have nodes A and B inside a loop, which respectively go out to nodes C and D outside of it (a case of multiple exits). Also assuming that C and D have no other predecessors, it is obvious that A dominates C, and B dominates D: the only way to get to node C is by passing through node A, and the only way to get to node D is by passing through node B.</p>
<figure class="wide">
<svg class="" x="0" y="0" width="540" height="150" viewBox="0 0 540 260" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<defs>
<clipPath id="c0_1"><path d="M128.5,61.5l0,-59l119,0l0,59Z" /></clipPath>
<clipPath id="c1_1"><path d="M3.5,226.5l0,-178l178,0l0,178Z" /></clipPath>
<clipPath id="c2_1"><path d="M128.5,252.5l0,-36l110,0l0,36Z" /></clipPath>
</defs>
<path d="M299.4,39.1c16.3,16.3,16.3,42.7,0,59c-16.2,16.3,-42.7,16.3,-59,0c-16.2,-16.3,-16.2,-42.7,0,-59c16.3,-16.3,42.8,-16.3,59,0Z" class="cfg-node" />
<text x="255" y="83" dx="0">A</text>
<path d="M299.4,177c16.3,16.3,16.3,42.7,0,59c-16.2,16.3,-42.7,16.3,-59,0c-16.2,-16.3,-16.2,-42.7,0,-59c16.3,-16.3,42.8,-16.3,59,0Z" class="cfg-node" />
<text x="255" y="222" dx="0">B</text>
<path d="M433.8,39.1c16.2,16.3,16.2,42.7,0,59c-16.3,16.3,-42.8,16.3,-59,0c-16.3,-16.3,-16.3,-42.7,0,-59c16.2,-16.3,42.7,-16.3,59,0Z" class="cfg-node" />
<text x="389" y="83" dx="0">C</text>
<path d="M433.8,177c16.2,16.3,16.2,42.7,0,59c-16.3,16.3,-42.8,16.3,-59,0c-16.3,-16.3,-16.3,-42.7,0,-59c16.2,-16.3,42.7,-16.3,59,0Z" class="cfg-node" />
<text x="389" y="222" dx="0">D</text>
<path d="M314,68.6c8.8,0,17.6,0,26.3,0l2.4,0" class="stroke-black" />
<path d="M340.3,78.5l19.9,-9.9l-19.9,-10Z" class="g4_1" />
<path d="M448.3,68.4c21.5,0,42.9,-0.1,64.3,-0.1l2.3,0" class="stroke-black" />
<path d="M512.6,78.2l19.8,-10L512.6,58.4Z" class="g4_1" />
<path d="M448.3,206.5c23.5,0,47,0,70.4,0l2.4,0" class="stroke-black" />
<path d="M518.7,216.4l19.9,-9.9l-19.9,-9.9Z" class="g4_1" />
<path d="M342.7,206.5l-2.4,0c-8.7,0,-17.5,0,-26.3,0" class="stroke-black" />
<path d="M340.3,216.4l19.9,-9.9l-19.9,-9.9Z" class="g4_1" />
<path d="M173.8,106c12.2,-4.7,24.4,-9.5,36.6,-14.2l2.2,-0.9" class="stroke-black" />
<path d="M214,101L228.9,84.6L206.8,82.5Z" class="g4_1" />
<path d="M173.8,169.1c12.2,4.7,24.4,9.5,36.6,14.2l2.2,0.9" class="stroke-black" />
<path d="M206.8,192.5l22.1,-2L214,174Z" class="g4_1" />
<g clip-path="url(#c0_1)">
<path d="M243.2,33.6C206.5,-4.9,171.8,-1.9,139.2,42.5l-1.3,2" class="stroke-black translucid" />
</g>
<path d="M130.9,37.1l-0.8,19.2L147.5,48m-8.3,-5.5l-9.1,13.8" class="translucid" />
<g clip-path="url(#c1_1)">
<path d="M152.6,77.8c33,33,33,86.5,0,119.5c-33,33,-86.5,33,-119.5,0c-33,-33,-33,-86.5,0,-119.5c33,-33,86.5,-33,119.5,0Z" class="cfg-node translucid" />
</g>
<text x="33" y="150" dx="0,0,0,0,0,0" class="translucid">(loop)</text>
<g clip-path="url(#c2_1)">
<path d="M235.5,234c-33.9,20.7,-64,19,-90.2,-5.1l-1.6,-1.8" class="stroke-black translucid" />
</g>
<path d="M152.6,222.2l-18.5,-5.5l3.9,18.9m7.3,-6.7L134.1,216.7" class="translucid" />
</svg>
</figure>
<p>Unfortunately, when you stick a funnel node in this control flow graph, you have to direct both A and B to exit to it, and both C and D to succeed it: the domination relationships are broken. This means that without further adjustments, node C cannot reference a value created in block A, because LLVM does not realize that this value is, in fact, guaranteed to exist if we got to node C.</p>
<figure class="wide">
<svg class="" x="0" y="0" width="673" height="150" viewBox="0 0 673 260" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<defs>
<clipPath id="c0_1"><path d="M126.5,59.5l0,-59l120,0l0,59Z" /></clipPath>
<clipPath id="c1_1"><path d="M1.5,224.5l0,-178l179,0l0,178Z" /></clipPath>
<clipPath id="c2_1"><path d="M126.5,250.5l0,-36l110,0l0,36Z" /></clipPath>
</defs>
<path d="M297.9,37.2c16.3,16.3,16.3,42.7,0,59c-16.3,16.3,-42.7,16.3,-59,0c-16.3,-16.3,-16.3,-42.7,0,-59c16.3,-16.3,42.7,-16.3,59,0Z" class="cfg-node" />
<text x="253" y="82" dx="0">A</text>
<path d="M297.9,175.2c16.3,16.3,16.3,42.7,0,59c-16.3,16.3,-42.7,16.3,-59,0c-16.3,-16.3,-16.3,-42.7,0,-59c16.3,-16.3,42.7,-16.3,59,0Z" class="cfg-node" />
<text x="253" y="219" dx="0">B</text>
<path d="M566.5,37.2c16.3,16.3,16.3,42.7,0,59c-16.3,16.3,-42.7,16.3,-59,0c-16.3,-16.3,-16.3,-42.7,0,-59c16.3,-16.3,42.7,-16.3,59,0Z" class="cfg-node" />
<text x="522" y="82" dx="0">C</text>
<path d="M566.5,175.2c16.3,16.3,16.3,42.7,0,59c-16.3,16.3,-42.7,16.3,-59,0c-16.3,-16.3,-16.3,-42.7,0,-59c16.3,-16.3,42.7,-16.3,59,0Z" class="cfg-node" />
<text x="522" y="221" dx="0">D</text>
<path d="M307.6,86.9c12.8,6.5,25.5,13,38.2,19.6l2.1,1.1" class="stroke-black" />
<path d="M341.3,115.3l22.2,0.3L350.4,97.7Z" class="g4_1" />
<path d="M581.1,66.6c21.4,0,42.8,-0.1,64.2,-0.2l2.4,0" class="stroke-black" />
<path d="M645.4,76.4l19.8,-10L645.3,56.5Z" class="g4_1" />
<path d="M581.1,204.7c23.5,0,46.9,0,70.4,0l2.3,0" class="stroke-black" />
<path d="M651.5,214.6l19.8,-9.9l-19.8,-10Z" class="g4_1" />
<path d="M347.9,163.8l-2.1,1.1c-12.7,6.5,-25.4,13.1,-38.2,19.6" class="stroke-black" />
<path d="M350.4,173.7l13.1,-17.9l-22.2,0.3Z" class="g4_1" />
<path d="M172.3,104.2c12.1,-4.8,24.3,-9.5,36.5,-14.3L211,89.1" class="stroke-black" />
<path d="M212.4,99.2L227.3,82.7l-22.1,-2Z" class="g4_1" />
<path d="M172.3,167.2c12.1,4.8,24.3,9.5,36.5,14.3l2.2,0.8" class="stroke-black" />
<path d="M205.2,190.7l22.1,-2L212.4,172.2Z" class="g4_1" />
<g clip-path="url(#c0_1)">
<path d="M241.6,31.7C204.9,-6.7,170.3,-3.8,137.7,40.7l-1.3,2" class="stroke-black translucid" />
</g>
<path d="M129.4,35.2l-0.9,19.3l17.4,-8.3m-8.2,-5.5l-9.2,13.8" class="stroke-black translucid" />
<g clip-path="url(#c1_1)">
<path d="M151.1,76c33,32.9,33,86.4,0,119.4c-33,33,-86.5,33,-119.5,0C-1.3,162.4,-1.3,108.9,31.6,76c33,-33,86.5,-33,119.5,0Z" class="cfg-node translucid" />
</g>
<text x="31" y="149" dx="0,0,0,0,0,0" class="translucid">(loop)</text>
<g clip-path="url(#c2_1)">
<path d="M233.9,232.1c-33.8,20.7,-63.9,19,-90.2,-5l-1.6,-1.8" class="stroke-black translucid" />
</g>
<path d="M151,220.3l-18.5,-5.4l3.9,18.9m7.3,-6.7L132.5,214.9" class="stroke-black translucid" />
<path d="M432.2,106.2c16.3,16.3,16.3,42.7,0,59c-16.3,16.3,-42.7,16.3,-59,0c-16.3,-16.3,-16.3,-42.7,0,-59c16.3,-16.3,42.7,-16.3,59,0Z" class="cfg-node" />
<text x="388" y="151" dx="0">Φ</text>
<path d="M441.9,155.8c12.8,6.6,25.5,13.1,38.2,19.7l2.1,1" class="stroke-black" />
<path d="M475.6,184.3l22.2,0.2L484.7,166.6Z" class="g4_1" />
<path d="M441.9,115.6c12.8,-6.6,25.5,-13.1,38.2,-19.7l2.1,-1" class="stroke-black" />
<path d="M484.7,104.8L497.8,86.9l-22.2,0.2Z" class="g4_1" />
</svg>
<figcaption>Without looking at what's going on in the Φ block, you cannot know that A only goes to C and B only goes to D.</figcaption>
</figure>
<p>(And these were the two figures for today. Thank you for watching.)</p>
<p>In the best-case scenario, this works, but you now need a ton of new Φ instructions. As every Φ node causes <em>two</em> variables to be emitted, the proliferation of Φ instructions is something we want to avoid. And, of course, in the worst-case scenario, it doesn’t work. Sadly, it didn’t work in a lot of cases: no-longer-dominating values would be missed, or back-edge detection would ironically spin into an endless loop.</p>
<h2 id="the-loop-solution">The Loop Solution</h2>
<p>I determined that the simplest solution to these issues would be to ditch LLVM IR entirely at the point where we have to structurize loops. In itself, this is not a huge change: loop structurization was the second-to-last pass to run before creating the AST, with the last pass being a cleanup pass for loop structurization.</p>
<p>What happens now is that fcd creates a new “AST graph” based on the IR basic block graph. The AST graph initially has one node per IR basic block, and contains an AST representation of that basic block. Then, before structurizing it, we ensure that every loop has a single entry and a single exit. This is done by performing a depth-first search on each strongly-connected component of the control flow graph, starting at an arbitrary entry edge. The depth-first search detects back-edges and collects them. The final step is just to take each of these edges and direct it to a funnel block.</p>
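<p>The back-edge detection step can be sketched like this (a simplified version in my own words, not fcd’s actual code): a depth-first search that flags any edge leading back to a node still on the search stack.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">#include &lt;cassert&gt;
#include &lt;map&gt;
#include &lt;set&gt;
#include &lt;utility&gt;
#include &lt;vector&gt;

using Node = int;
using Edge = std::pair&lt;Node, Node&gt;;
using Graph = std::map&lt;Node, std::vector&lt;Node&gt;&gt;;

void dfs(Node n, const Graph&amp; succ, std::set&lt;Node&gt;&amp; onStack,
         std::set&lt;Node&gt;&amp; visited, std::vector&lt;Edge&gt;&amp; backEdges) {
    visited.insert(n);
    onStack.insert(n);
    auto it = succ.find(n);
    if (it != succ.end()) {
        for (Node next : it-&gt;second) {
            if (onStack.count(next)) {
                // Edge into the current search path: a back edge.
                backEdges.push_back({n, next});
            } else if (!visited.count(next)) {
                dfs(next, succ, onStack, visited, backEdges);
            }
        }
    }
    onStack.erase(n);
}

int main() {
    // A small loop: 0 -&gt; 1 -&gt; 2 -&gt; 1, with the loop exiting at 2 -&gt; 3.
    Graph succ = {{0, {1}}, {1, {2}}, {2, {1, 3}}};
    std::set&lt;Node&gt; onStack, visited;
    std::vector&lt;Edge&gt; backEdges;
    dfs(0, succ, onStack, visited, backEdges);
    // The only back edge is 2 -&gt; 1; directing it to a funnel block
    // would leave the loop with a single entry.
    assert(backEdges.size() == 1);
    assert(backEdges[0] == Edge(2, 1));
    return 0;
}</code></pre></figure>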
<p>Since this graph deals with AST constructs, which are only loosely safe compared to LLVM IR, there is no need to create any new LLVM Φ node. Funnel blocks do not match any IR block, and a single AST variable is introduced to represent what would have been a Φ node in the IR.</p>
<p>The case where loops have no exit is also important to consider. Loops without exits are a problem because the post-dominator tree algorithm starts its work by looking at a function’s exits and walking up to the entry; if a loop never exits, then the algorithm will never reach it. Previously, this problem was solved using the shotgun approach of adding fake roots to the post-dominator tree in any place that looked like it could be necessary. Now that fcd has a flexible graph that can be modified without impacting the LLVM representation, fcd adds a fake edge from the loop header to a fake exit. This edge’s reaching condition is <code class="highlighter-rouge">false</code> and as such never appears in decompiled output. This largely harmless change is all that the post-dominator tree building algorithm needs to be happy again.</p>
<h1 id="the-region-problem-again">The Region Problem, again</h1>
<p>I took the opportunity to revisit my quick choice of doing region detection myself, and try to use the LLVM infrastructure for it. I <a href="/2016/02/17/structuring.html">wrote</a> before:</p>
<blockquote>
<p>Because I didn’t know what I was doing, I eagerly discounted LLVM’s region detection algorithm and ended up writing my own. I now view this as a mistake, and I would eventually like to rework that part of fcd.</p>
</blockquote>
<p>Although not algorithmically or stylistically great, fcd’s region detection code did get the job done. My hope was that I could make the code both algorithmically better and more readable by using LLVM’s region tools this time around.</p>
<p>Unfortunately, even though my reasons to roll my own region detection code at the time were flawed, it turns out that LLVM’s region code is poorly suited to this task.</p>
<p>LLVM’s graph tools are meant to work with any kind of graph that you can throw at them. To achieve this, they are templated to the bone; the graph algorithms will work provided that you implement the simple <code class="highlighter-rouge">GraphTraits</code> interface that they use.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">template</span><span class="o">&lt;&gt;</span>
<span class="k">struct</span> <span class="n">llvm</span><span class="o">::</span><span class="n">GraphTraits</span><span class="o">&lt;</span><span class="n">MyGraphType</span><span class="o">*&gt;</span>
<span class="p">{</span>
<span class="k">typedef</span> <span class="n">MyGraphNode</span> <span class="n">NodeType</span><span class="p">;</span>
<span class="k">typedef</span> <span class="n">NodeType</span><span class="o">*</span> <span class="n">NodeRef</span><span class="p">;</span>
<span class="k">typedef</span> <span class="n">MyGraphNodeIterator</span> <span class="n">ChildIteratorType</span><span class="p">;</span>
<span class="k">typedef</span> <span class="n">MyGraphNodeIterator</span> <span class="n">nodes_iterator</span><span class="p">;</span>
<span class="k">static</span> <span class="n">NodeRef</span> <span class="n">getEntryNode</span><span class="p">(</span><span class="n">MyGraphType</span><span class="o">*</span> <span class="n">node</span><span class="p">);</span>
<span class="k">static</span> <span class="n">nodes_iterator</span> <span class="n">nodes_begin</span><span class="p">(</span><span class="n">MyGraphType</span><span class="o">*</span> <span class="n">f</span><span class="p">);</span>
<span class="k">static</span> <span class="n">nodes_iterator</span> <span class="n">nodes_end</span><span class="p">(</span><span class="n">MyGraphType</span><span class="o">*</span> <span class="n">f</span><span class="p">);</span>
<span class="k">static</span> <span class="n">ChildIteratorType</span> <span class="n">child_begin</span><span class="p">(</span><span class="n">NodeRef</span> <span class="n">node</span><span class="p">);</span>
<span class="k">static</span> <span class="n">ChildIteratorType</span> <span class="n">child_end</span><span class="p">(</span><span class="n">NodeRef</span> <span class="n">node</span><span class="p">);</span>
<span class="p">};</span></code></pre></figure>
<p>With just that, you can get node traversal in just about any order that you like for your graph, fast dominator tree calculation, and a lot of other interesting things.</p>
<p>What you <em>don’t</em> get: regions.</p>
<p>LLVM’s <code class="highlighter-rouge">RegionInfoBase</code> base class, which performs all the heavy lifting of finding regions, has a private constructor, a private destructor, and private fields for the analyses that it needs. Its two concrete subclasses are friended into the class definition, and they manipulate these private fields themselves, locking out everyone else for reasons that I can’t quite discern.</p>
<p>Because of the private constructor and destructor, there is no standard-compliant way to inherit from <code class="highlighter-rouge">RegionInfoBase</code> without either modifying its definition upstream to make these members <code class="highlighter-rouge">protected</code>, or sinfully violating the one-definition rule in one way or another. I received <a href="http://lists.llvm.org/pipermail/llvm-dev/2016-November/107372.html">no response</a> when I asked on the llvm-dev mailing list whether it was meant to be subclassed.</p>
<p>I went with it anyway, at least to give it a shot. To work around these limitations, I violated ODR in the nastiest way.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="c1">// I know that this is nasty and violates ODR, but I don't know what else
// to do. RegionInfoBase has a private constructor and destructor, which
// makes it impossible to create a subclass that is not friended in. This
// macro is ugly enough that we will most likely know right away if it
// expands in unexpected locations.
</span><span class="k">class</span> <span class="nc">PreAstRegionInfo</span><span class="p">;</span>
<span class="cp">#define MachineRegionInfo MachineRegionInfo; \
friend class ::PreAstRegionInfo
#include &lt;llvm/Analysis/RegionInfo.h&gt;
#undef MachineRegionInfo</span></code></pre></figure>
<p>This macro “friends us in” to <code class="highlighter-rouge">RegionInfo</code>. It “works” because <code class="highlighter-rouge">MachineRegionInfo</code> is used exactly once, when it is friended to <code class="highlighter-rouge">RegionInfo</code>.</p>
<p>Moving forward, even using the default implementation of region graph traits turned out to be problematic. Heterogeneous iteration of region members (both regions and basic blocks) using <code class="highlighter-rouge">RegionInfo::element_begin</code> relies on the region’s graph traits, which, for whatever reason, systematically crashed on use. As they are heavily templated and rely on macros, finding out the reason turned out to be more effort than I was interested in expending.</p>
<p>I looked for examples of this in the LLVM codebase. As it turns out, its <em>only</em> <code class="highlighter-rouge">RegionPass</code> is the <code class="highlighter-rouge">StructurizeCFG</code> pass. The <code class="highlighter-rouge">RegionInfoPass</code>, which is an analysis rather than a transformation, is used by a single pass to print regions. When things don’t work as expected, it’s hard to find examples of the right thing to do.</p>
<h2 id="the-good-old-ways">The Good Old Ways</h2>
<p>As a team of one with just a few hours a week to put into the project, I am not particularly interested in breaking new ground around API usage. I finally decided to go back and own the region-finding code instead of relying on LLVM to do it. It still mostly uses the same logic as LLVM’s region detection code, with a handful of tweaks. Instead of producing regions, it keeps a list of visited blocks and, when it identifies a region, folds its blocks into a single block, until just one block is left that represents the whole function. I do think that the code is better and faster now, so there’s that.</p>
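<p>The folding idea can be illustrated in a few lines of Python. This is a simplified sketch with made-up names, not fcd’s actual code: once a set of blocks is identified as a region, the set is collapsed into one node that inherits the region’s outside edges.</p>

```python
# Collapse a region (a set of blocks) of a CFG into a single node. Edges
# internal to the region disappear; edges crossing its boundary are rewired
# to the new node. Repeating this eventually leaves one node per function.
def fold_region(successors, region, label):
    region = set(region)
    folded = {label: []}
    for node, succs in successors.items():
        source = label if node in region else node
        folded.setdefault(source, [])
        for succ in succs:
            target = label if succ in region else succ
            if target != source and target not in folded[source]:
                folded[source].append(target)
    return folded

cfg = {'A': ['B'], 'B': ['C', 'D'], 'C': ['E'], 'D': ['E'], 'E': []}
# B, C, D and E form a single-entry, single-exit diamond region.
print(fold_region(cfg, ['B', 'C', 'D', 'E'], 'R'))  # {'R': [], 'A': ['R']}
```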
<p>Hopefully, I won’t feel the need to re-revisit this for a while. At the time of writing, this development is happening in the <code class="highlighter-rouge">structurize-v2</code> branch of fcd, which <a href="https://github.com/zneak/fcd/pull/31">hasn’t been merged to <code class="highlighter-rouge">master</code> yet</a>. There are still a number of small things that need some love; the upgrade introduced a number of regressions in condition simplification. Progress is being made, however, and the merge will probably happen shortly.</p>
Fri, 25 Nov 2016 00:00:00 +0000https://zneak.github.io/fcd/2016/11/25/revisiting-regions.html
https://zneak.github.io/fcd/2016/11/25/revisiting-regions.htmlfcd at CSAW'16<p>As a three-time Computer Security Awareness Week CTF finalist, I was very happy when NYU Poly’s Brendan Dolan-Gavitt invited me to give a talk about fcd at this edition’s new SOS workshop, where authors of open-source software would come and talk about their projects. This motivated me to finish some features that had been in the works for a while, which I’d like to describe here. Additionally, the talk gave me an opportunity to present fcd to a broader audience. Doing so, I realized that I’ve often written about what fcd does, but not about what I want it to do.</p>
<h1 id="the-goals-of-fcd">The goals of fcd</h1>
<p>Of course, the broad goal of a decompiler is to produce analyzable source code from a binary program. Most decompilers specialize for a given architecture or compiler. This is essentially the current state of fcd: it handles x86_64 ELF programs, and not much else. However, this is not what I want this project to be about.</p>
<p>I like to use cake mix as an analogy for the compilation process. If compiling is like pouring cake mix and other ingredients into a bowl to finally bake it into delicious cake, then decompiling would be about taking the cake and trying to get cake mix back. Compilation is a very transformative process in which a significant amount of information is destroyed. In fact, since the fastest code is code that does not run, and the smallest code is code that does not exist, we usually evaluate compilers by how much code they are able to destroy. As a result, a final executable is usually fast, efficient, and devoid of information that would be useful to recover its original structure.</p>
<p>The problems of filling these gaps are what fcd wants to be great at. Decompiling is a process that can only exist on top of a disassembling process, and after a little more than a year working on fcd, it is my firm conclusion that disassembly-related problems are generally easier than decompilation problems. I’m shoving a lot of necessities under that rug:</p>
<ul>
<li>Parsing executables;</li>
<li>Lifting machine code to IR (fcd’s approach, which I like to refer to as <a href="/2016/02/16/lifting-x86-code.html">codegen on a budget</a>, is about as cheap as codegen comes);</li>
<li>Parsing symbols.</li>
</ul>
<p>The problems that remain once you’ve taken these away are those that I want fcd to be great at solving. They include:</p>
<ul>
<li>Recovering function parameters;</li>
<li>Recovering types;</li>
<li>Producing good, C-like output.</li>
</ul>
<p>As a result, I feel that features like executable parsers are less and less relevant to include in the core C++ code of fcd. I will probably find myself writing more Python extension points and leveraging them to provide new inputs to fcd.</p>
<h1 id="diy-executable-parsers">DIY executable parsers</h1>
<p>Fcd has supported Python optimization passes for almost a year now. What hasn’t been as widely publicized is that fcd can now also accept Python scripts to parse executables. These scripts need to implement a very simple interface:</p>
<ul>
<li>an <code class="highlighter-rouge">init(data)</code> function, where <code class="highlighter-rouge">data</code> is a byte string containing the executable’s data. The function is called before any other member of the module is used;</li>
<li>an <code class="highlighter-rouge">executableType</code> variable that contains an arbitrary string identifying the type of the executable;</li>
<li>an <code class="highlighter-rouge">entryPoints</code> global variable, typed as a list of <code class="highlighter-rouge">(virtualAddress, name)</code> tuples;</li>
<li>a <code class="highlighter-rouge">getStubTarget(jumpTarget)</code> method that accepts the memory location that an import stub function jumps to, and returns a <code class="highlighter-rouge">(library name?, import name)</code> tuple (where the library name can be None if it is unknown, which is the case in executable formats that don’t support two-level namespacing, like ELF);</li>
<li>a <code class="highlighter-rouge">mapAddress(virtualAddress)</code> function that accepts a virtual address and returns the offset in <code class="highlighter-rouge">init</code>’s <code class="highlighter-rouge">data</code> parameter that contains the information at this address.</li>
</ul>
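<p>For illustration, a minimal module satisfying this interface could look like the sketch below, which treats its input as a flat image mapped at a fixed base address. The <code class="highlighter-rouge">BASE</code> constant and the <code class="highlighter-rouge">"flat binary"</code> type string are made up for the example:</p>

```python
# Hypothetical minimal loader script for fcd's Python parser interface.
# It maps the whole file as one flat segment at BASE, with a single entry
# point at the start of the image and no import stubs.
BASE = 0x400000          # arbitrary load address for this example
imageSize = 0

executableType = "flat binary"
entryPoints = []

def init(data):
    global imageSize
    imageSize = len(data)
    entryPoints.append((BASE, "start"))

def getStubTarget(jumpTarget):
    return None          # a flat image has no import stubs

def mapAddress(virtualAddress):
    offset = virtualAddress - BASE
    if 0 <= offset < imageSize:
        return offset
    return None

init(b"\x90" * 16)
print(mapAddress(0x400004))  # 4
```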
<p>This interface has been used to implement Portable Executable parsing, using <a href="https://github.com/erocarrera/pefile">Ero Carrera’s very good <code class="highlighter-rouge">pefile</code> Python module</a>, in about 60 lines.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">pefile</span>
<span class="kn">import</span> <span class="nn">bisect</span>
<span class="n">stubs</span> <span class="o">=</span> <span class="p">{}</span>
<span class="n">sectionStart</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">sectionInfo</span> <span class="o">=</span> <span class="p">{}</span>
<span class="n">executableType</span> <span class="o">=</span> <span class="s">"Portable Executable"</span>
<span class="n">entryPoints</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">def</span> <span class="nf">init</span><span class="p">(</span><span class="n">data</span><span class="p">):</span>
<span class="k">global</span> <span class="n">stubs</span>
<span class="k">global</span> <span class="n">sectionStart</span>
<span class="k">global</span> <span class="n">sectionInfo</span>
<span class="k">global</span> <span class="n">executableType</span>
<span class="k">global</span> <span class="n">entryPoints</span>
<span class="n">pe</span> <span class="o">=</span> <span class="n">pefile</span><span class="o">.</span><span class="n">PE</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">data</span><span class="p">)</span>
<span class="n">machineType</span> <span class="o">=</span> <span class="n">pefile</span><span class="o">.</span><span class="n">MACHINE_TYPE</span><span class="p">[</span><span class="n">pe</span><span class="o">.</span><span class="n">FILE_HEADER</span><span class="o">.</span><span class="n">Machine</span><span class="p">]</span>
<span class="n">executableType</span> <span class="o">=</span> <span class="s">"Portable Executable </span><span class="si">%</span><span class="s">s"</span> <span class="o">%</span> <span class="n">machineType</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="s">"IMAGE_FILE_MACHINE_"</span><span class="p">):]</span>
<span class="n">imageBase</span> <span class="o">=</span> <span class="n">pe</span><span class="o">.</span><span class="n">OPTIONAL_HEADER</span><span class="o">.</span><span class="n">ImageBase</span>
<span class="k">for</span> <span class="n">section</span> <span class="ow">in</span> <span class="n">pe</span><span class="o">.</span><span class="n">sections</span><span class="p">:</span>
<span class="n">virtualAddress</span> <span class="o">=</span> <span class="n">imageBase</span> <span class="o">+</span> <span class="n">section</span><span class="o">.</span><span class="n">VirtualAddress</span>
<span class="n">bisect</span><span class="o">.</span><span class="n">insort</span><span class="p">(</span><span class="n">sectionStart</span><span class="p">,</span> <span class="n">virtualAddress</span><span class="p">)</span>
<span class="n">sectionInfo</span><span class="p">[</span><span class="n">virtualAddress</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">section</span><span class="o">.</span><span class="n">PointerToRawData</span><span class="p">,</span> <span class="n">section</span><span class="o">.</span><span class="n">SizeOfRawData</span><span class="p">)</span>
<span class="k">for</span> <span class="n">entry</span> <span class="ow">in</span> <span class="n">pe</span><span class="o">.</span><span class="n">DIRECTORY_ENTRY_IMPORT</span><span class="p">:</span>
<span class="k">for</span> <span class="n">imp</span> <span class="ow">in</span> <span class="n">entry</span><span class="o">.</span><span class="n">imports</span><span class="p">:</span>
<span class="k">if</span> <span class="n">imp</span><span class="o">.</span><span class="n">name</span><span class="p">:</span>
<span class="n">stubs</span><span class="p">[</span><span class="n">imp</span><span class="o">.</span><span class="n">address</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">entry</span><span class="o">.</span><span class="n">dll</span><span class="p">,</span> <span class="n">imp</span><span class="o">.</span><span class="n">name</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="c"># make up some name based on the ordinal</span>
<span class="n">stubs</span><span class="p">[</span><span class="n">imp</span><span class="o">.</span><span class="n">address</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">entry</span><span class="o">.</span><span class="n">dll</span><span class="p">,</span> <span class="s">"</span><span class="si">%</span><span class="s">s:</span><span class="si">%</span><span class="s">i"</span> <span class="o">%</span> <span class="p">(</span><span class="n">entry</span><span class="o">.</span><span class="n">dll</span><span class="p">,</span> <span class="n">imp</span><span class="o">.</span><span class="n">ordinal</span><span class="p">))</span>
<span class="n">entry</span> <span class="o">=</span> <span class="p">(</span><span class="n">imageBase</span> <span class="o">+</span> <span class="n">pe</span><span class="o">.</span><span class="n">OPTIONAL_HEADER</span><span class="o">.</span><span class="n">AddressOfEntryPoint</span><span class="p">,</span> <span class="s">"pe.start"</span><span class="p">)</span>
<span class="n">entryPoints</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">entry</span><span class="p">)</span>
<span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">pe</span><span class="p">,</span> <span class="s">"DIRECTORY_ENTRY_EXPORT"</span><span class="p">):</span>
<span class="k">for</span> <span class="n">export</span> <span class="ow">in</span> <span class="n">pe</span><span class="o">.</span><span class="n">DIRECTORY_ENTRY_EXPORT</span><span class="o">.</span><span class="n">symbols</span><span class="p">:</span>
<span class="n">exportTuple</span> <span class="o">=</span> <span class="p">(</span><span class="n">imageBase</span> <span class="o">+</span> <span class="n">export</span><span class="o">.</span><span class="n">address</span><span class="p">,</span> <span class="n">export</span><span class="o">.</span><span class="n">name</span><span class="p">)</span>
<span class="n">entryPoints</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">exportTuple</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">getStubTarget</span><span class="p">(</span><span class="n">target</span><span class="p">):</span>
<span class="k">if</span> <span class="n">target</span> <span class="ow">in</span> <span class="n">stubs</span><span class="p">:</span>
<span class="k">return</span> <span class="n">stubs</span><span class="p">[</span><span class="n">target</span><span class="p">]</span>
<span class="k">return</span> <span class="bp">None</span>
<span class="k">def</span> <span class="nf">mapAddress</span><span class="p">(</span><span class="n">address</span><span class="p">):</span>
<span class="n">sectionIndex</span> <span class="o">=</span> <span class="n">bisect</span><span class="o">.</span><span class="n">bisect_right</span><span class="p">(</span><span class="n">sectionStart</span><span class="p">,</span> <span class="n">address</span><span class="p">)</span>
<span class="k">if</span> <span class="n">sectionIndex</span><span class="p">:</span>
<span class="n">sectionMaybeStart</span> <span class="o">=</span> <span class="n">sectionStart</span><span class="p">[</span><span class="n">sectionIndex</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="n">thisSectionInfo</span> <span class="o">=</span> <span class="n">sectionInfo</span><span class="p">[</span><span class="n">sectionMaybeStart</span><span class="p">]</span>
<span class="n">pointerOffset</span> <span class="o">=</span> <span class="n">address</span> <span class="o">-</span> <span class="n">sectionMaybeStart</span>
<span class="k">if</span> <span class="n">pointerOffset</span> <span class="o">&lt;</span> <span class="n">thisSectionInfo</span><span class="p">[</span><span class="mi">1</span><span class="p">]:</span>
<span class="k">return</span> <span class="n">thisSectionInfo</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">+</span> <span class="n">pointerOffset</span>
<span class="k">return</span> <span class="bp">None</span></code></pre></figure>
<p>This script was not announced or widely distributed because fcd doesn’t support Windows executables very well. The main reason is that MSVC++ will very frequently use a custom calling convention for functions that are not externally linked, which is rather poorly supported at the moment. However, a similar Mach-O parser could be implemented and used on x86_64 executables, since Clang consistently uses the x86_64 System V ABI calling convention everywhere on macOS.</p>
<h1 id="using-headers-as-poor-mans-symbols">Using headers as poor man’s symbols</h1>
<p>Over the course of the week, fcd finally gained the ability to reference in-executable functions from headers. This means that if you know the signature of a function contained in an executable, you can use a header file to better inform fcd. The declaration for that function has to be annotated with the special <code class="highlighter-rouge">FCD_ADDRESS</code> attribute macro. For instance, in the <a href="/2016/09/04/parsing-headers.html">original announcement post</a>, you could add this to the header:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>int main(int argc, const char** argv) FCD_ADDRESS(0x040045e);
</code></pre>
</div>
<p>Assuming that the address of the <code class="highlighter-rouge">main</code> function is indeed 0x040045e, fcd will “recover” the prototype as <code class="highlighter-rouge">uint32_t main(uint32_t argc, uint8_t** argv)</code>. As you can see, some information is lost in translation: the signedness of the integers and the constness of the <code class="highlighter-rouge">argv</code> parameter. This is because signedness and constness are concepts that do not exist in LLVM’s type system, and no effort has been made yet to carry that information over.</p>
<p>Under the hood, <code class="highlighter-rouge">FCD_ADDRESS</code> has the following definition:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>#define FCD_ADDRESS(x) __attribute__((annotate("fcd.virtualaddress:" #x)))
</code></pre>
</div>
<p>It uses <code class="highlighter-rouge">strtoull</code> with base 0 to parse <code class="highlighter-rouge">#x</code>, meaning that you should be able to use just about any base that you like (though I expect that base 16 would be the most popular). Of course, this also means that the address has to be an integer literal, and not, say, a C++ constant expression.</p>
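<p>Python’s built-in <code class="highlighter-rouge">int</code> with a base of 0 mimics this behavior, which makes the accepted forms easy to demonstrate:</p>

```python
# strtoull with base 0 infers the base from the prefix: "0x" means
# hexadecimal, a leading "0" means octal, anything else is decimal.
# Python's int(text, 0) behaves the same way, except that Python 3
# spells octal literals with a "0o" prefix instead of a bare "0".
for text in ["0x40045e", "4195422"]:
    print(int(text, 0))  # both lines print 4195422
```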
<p>Eli Friedman was kind enough to <a href="http://lists.llvm.org/pipermail/cfe-dev/2016-October/051371.html">answer my question on the cfe-dev mailing list</a> about which attribute could be used to carry that kind of information. It turns out that <code class="highlighter-rouge">annotate</code> has no predefined meaning, and can be used to convey just about any information that you can serialize to a string. It can be specified multiple times, so I expect to use it again in the future to specify more information.</p>
<p>(Incidentally, in the strange case where the same function exists in two places in an executable, you can use multiple <code class="highlighter-rouge">FCD_ADDRESS</code> attributes to tell fcd that this prototype applies to multiple addresses.)</p>
<p>Fcd will recognize any function with the <code class="highlighter-rouge">FCD_ADDRESS</code> attribute as an entry point. This means that you can now use that to specify additional entry points instead of the <code class="highlighter-rouge">-e</code> command-line switch (unless you’re doing partial disassembly, in which case you still need it).</p>
<p>This change is powered by a new <code class="highlighter-rouge">EntryPointProvider</code> interface. It is currently implemented by two classes (<code class="highlighter-rouge">Executable</code> and <code class="highlighter-rouge">HeaderDeclarations</code>), but it would be a fair candidate for Pythonization. I will probably get to it when I come up with an acceptable design for an interface that provides full-on symbol information: there are already enough ways to specify new entry points as it is, and any new way to do so will almost certainly come with actual symbol information. Headers already do, but I’m thinking about PDB/DWARF symbol parsers, for instance.</p>
<h1 id="thank-you-csaw">Thank you CSAW!</h1>
<p>It was a blast this year again. I thought that last year was going to be my last time, but I’m very happy to have been proved wrong! It was a great experience to talk about fcd and gather feedback.</p>
<p>The presentation was recorded and the link might eventually find its way to this page.</p>
Sat, 12 Nov 2016 00:00:00 +0000https://zneak.github.io/fcd/2016/11/12/csaw16.html
https://zneak.github.io/fcd/2016/11/12/csaw16.htmlParsing headers for fun and profit<p>One of the many challenges of decompiling programs is figuring out the parameters of functions that don’t have a visible body. There are two main situations in which this can happen:</p>
<ol>
<li>the function is called indirectly;</li>
<li>the function immediately jumps indirectly to somewhere else.</li>
</ol>
<p>Of note, dynamic linker stubs create a lot of functions that immediately jump indirectly to somewhere else. Now, the great thing about dynamic linker stubs is that they often reference the name of an external function to call, and it’s quite possible that headers for the library are available.</p>
<h1 id="dynamic-linkage-primer">Dynamic linkage primer</h1>
<p>Since dynamic linkers are complicated beasts, I have no intention of diving very deep into their inner workings. However, for the sake of this post, it’s useful to lay out some basics about linkage (specifically targeting x86_64 Linux ELF executables using <code class="highlighter-rouge">ld.so</code>). The basic idea is that when you pull in a function from a shared object in your program, the compiler creates a stub function. For instance, <code class="highlighter-rouge">objdump</code> might show this for a stub to <code class="highlighter-rouge">puts</code>:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>0000000000400480 &lt;puts@plt&gt;:
400480: jmp QWORD PTR [rip+0x2008a2]
400486: push 0x5
40048b: jmp 400470
</code></pre>
</div>
<p>This function is what your program actually calls when it tries to call <code class="highlighter-rouge">puts</code>. Here, <code class="highlighter-rouge">@plt</code> stands for <em>Procedure Linkage Table</em>, and the address <code class="highlighter-rouge">QWORD PTR [rip+0x2008a2]</code> points right into it. On a call to <code class="highlighter-rouge">puts</code>, execution is transferred to the code pointed to by this entry in the PLT.</p>
<p>To save time on startup, the PLT is populated lazily. Initially, the entry at <code class="highlighter-rouge">QWORD PTR [rip+0x2008a2]</code> actually points right back to <code class="highlighter-rouge">0x400486</code>, which is the instruction just after the initial jump. This means that the first time that you call <code class="highlighter-rouge">puts</code>, the rest of the stub is executed. We can interpret the two instructions as some pseudo-function call to the routine at <code class="highlighter-rouge">0x400470</code> passing 0x5 as a parameter, where <code class="highlighter-rouge">0x400470</code> goes back into <code class="highlighter-rouge">ld.so</code>. From there, the dynamic linker reads the metadata associated with the fifth entry, finding out that the import we’re interested in is named <code class="highlighter-rouge">puts</code>. Then, it iterates dynamic libraries in order, until it finds one that exports that name. <code class="highlighter-rouge">Ld.so</code> then writes back the address of that import in the PLT and transfers execution to it. That way, the next time you call <code class="highlighter-rouge">puts</code>, you won’t need to look it up again.</p>
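<p>The lazy binding mechanism can be mimicked with a toy simulation; this sketch only illustrates the look-up-once-then-patch behavior, and is in no way how <code class="highlighter-rouge">ld.so</code> is implemented:</p>

```python
# Toy model of lazy PLT binding. Each slot starts unresolved; the first call
# looks the symbol up in the loaded libraries (in order), patches the slot,
# and subsequent calls go straight to the resolved function.
class LazyPLT:
    def __init__(self, libraries):
        self.libraries = libraries   # list of {name: function} dicts
        self.slots = {}              # name -> resolved function

    def call(self, name, *args):
        target = self.slots.get(name)
        if target is None:           # first call: do what ld.so would do
            for library in self.libraries:
                if name in library:
                    target = library[name]
                    break
            else:
                raise NameError("unresolved symbol: " + name)
            self.slots[name] = target   # patch the slot for next time
        return target(*args)

plt = LazyPLT([{"puts": lambda s: len(s) + 1}])
print(plt.call("puts", "Hello World!"))  # 13; resolved on this first call
print("puts" in plt.slots)               # True: the slot is now patched
```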
<p>Stub symbols generally don’t have their name stored in symbol tables. Tools that show names like <code class="highlighter-rouge">puts@plt</code> “make them up” by parsing metadata just like <code class="highlighter-rouge">ld.so</code> would have.</p>
<p>Of course, this means that fcd can do it too, and it does.</p>
<p>Unfortunately, in most cases, the function name doesn’t say a lot. For instance, <code class="highlighter-rouge">exit</code> doesn’t say that the function accepts an integer and does not return. And since the function’s body is in a different library, it’s probably not possible (or at least, not practical) to get the implementation of that function.</p>
<h1 id="just-put-a-compiler-in-your-decompiler">Just put a compiler in your decompiler</h1>
<p>Up until recently, fcd had a hard-coded list of approximately 50 glibc functions with their number of parameters, whether they returned a value, and if they were variadic. The parameter and return information did not include actual types. I added entries to that list as I tested programs that used new functions. This approach doesn’t scale very well.</p>
<p>Fortunately, there is an authoritative source of function signatures on almost every system out there: header files. We could solve this problem rather elegantly, and allow extensibility, by parsing .h files and using the information to determine function parameters.</p>
<p>As it turns out, fcd already links against LLVM and requires Clang to <a href="/2016/02/16/lifting-x86-code.html">lift machine code to LLVM IR</a>. Being that it’s already <em>this close</em> to link against Clang, it’s not a huge step to take.</p>
<p>The great thing about Clang, of course, is that it’s an <em>actual</em> compiler. This solution is not some half-working, in-house C parser that explodes at the slightest hint of a macro: it’s an industrial-grade and proven compiler that actually knows what it’s doing. It’s also extremely convenient that Clang is the only compiler (to my knowledge) that will happily parse Linux, Darwin (iOS, macOS) and Windows headers. Even though fcd only supports ELF executables at the moment, it’s good that Clang will not be a limitation in the foreseeable future.</p>
<p>Initially, I tried to integrate Clang to fcd by using <code class="highlighter-rouge">libclang</code>. While it worked for the task of parsing headers, it had a number of downsides:</p>
<ul>
<li><code class="highlighter-rouge">Libclang</code> statically links LLVM too, so fcd’s address space had two copies of LLVM living side-by-side. This is not a huge problem since <code class="highlighter-rouge">libclang</code> doesn’t leak any LLVM object out, so there’s no chance of mixup between the two, but it still feels clumsy.</li>
<li><code class="highlighter-rouge">Libclang</code> only exposes a small subset of the possible attributes that a function can receive. While most attributes have a larger impact on compilation, some of them provide very useful insight for decompilation as well. For instance, a call to a <code class="highlighter-rouge">noreturn</code> function (one of the attributes that <code class="highlighter-rouge">libclang</code> doesn’t expose) terminates a basic block just like a return instruction, but if a decompiler doesn’t know that, it will think that the function continues beyond the call, which will (at best) make a huge mess.</li>
<li>It’s good that <code class="highlighter-rouge">libclang</code> doesn’t leak any of its internal LLVM details when you have two instances of LLVM side by side, but it also means that it can’t be used to take a <code class="highlighter-rouge">clang::FunctionDecl</code> (which is what we essentially get out of <code class="highlighter-rouge">libclang</code>) and extract a <code class="highlighter-rouge">llvm::FunctionType</code> out of it (which is what fcd needs).</li>
</ul>
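<p>To illustrate the <code class="highlighter-rouge">noreturn</code> point, here is a small sketch (the function names are hypothetical, not taken from any real program) of why a decompiler that ignores the attribute makes a mess:</p>

```cpp
#include <cassert>
#include <cstdlib>

// fatal() never returns: on the denom == 0 path, control never reaches
// the division below. A decompiler that doesn't know this will assume
// the call falls through and keep decoding past it.
[[noreturn]] void fatal(const char* msg)
{
    (void)msg;  // a real program would log the message first
    std::exit(1);
}

int checked_divide(int num, int denom)
{
    if (denom == 0)
        fatal("division by zero");
    return num / denom;
}
```

<p>Knowing that <code class="highlighter-rouge">fatal</code> terminates its basic block, a decompiler can treat the call like a return instruction and avoid stitching unreachable code onto that path.</p>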
<p>Because of that, fcd links against the Clang static libraries. They solve these problems at the cost of an inscrutable memory ownership model. (Well, they <em>almost</em> solve these problems; some massaging is still required to get a <code class="highlighter-rouge">FunctionType</code> out of a <code class="highlighter-rouge">FunctionDecl</code>.) If this is of any interest, the current implementation lives in <a href="https://github.com/zneak/fcd/blob/089dba9f01443f9ebac5e8ac2b93a518d5408a08/fcd/header_decls.cpp">fcd/header_decls.cpp</a>.</p>
<h1 id="passing-headers-to-fcd">Passing headers to fcd</h1>
<p>To support this new feature, fcd gains two new command-line options:</p>
<ul>
<li><code class="highlighter-rouge">-I</code> to add an include directory in the search path (can be specified multiple times);</li>
<li><code class="highlighter-rouge">--header</code> to <code class="highlighter-rouge">#include</code> a specific header.</li>
</ul>
<p>While nothing prevents you from writing your own header file and including it, right now, fcd will only use the information for function stubs. Allowing users to somehow pass their knowledge of the program that they’re decompiling to fcd is <strong>definitely</strong> on the radar, though.</p>
<p>Here’s a small program that will write “Hello World!” to a file whose path is passed as the first parameter:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#include &lt;stdio.h&gt;
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">**</span> <span class="n">argv</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">FILE</span><span class="o">*</span> <span class="n">f</span> <span class="o">=</span> <span class="n">fopen</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="s">"w"</span><span class="p">);</span>
<span class="n">fputs</span><span class="p">(</span><span class="s">"Hello World!</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">f</span><span class="p">);</span>
<span class="n">fclose</span><span class="p">(</span><span class="n">f</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>Without header information, fcd does a rather poor job at figuring out what’s going on:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="c1">// $ fcd hello
</span><span class="kt">void</span> <span class="nf">main</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">rip</span><span class="p">,</span> <span class="kt">uint64_t</span> <span class="n">rsi</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fopen</span><span class="p">(</span><span class="mi">4195775</span><span class="p">);</span>
<span class="n">fwrite</span><span class="p">(</span><span class="mi">4195801</span><span class="p">);</span>
<span class="n">fclose</span><span class="p">(</span><span class="mi">4195809</span><span class="p">);</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>However, with header information, we get something that actually makes sense:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="c1">// $ fcd --header stdio.h hello
</span><span class="kt">void</span> <span class="nf">main</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">rip</span><span class="p">,</span> <span class="kt">uint64_t</span> <span class="n">rsi</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">struct</span> <span class="n">_IO_FILE</span><span class="o">*</span> <span class="n">anon1</span> <span class="o">=</span> <span class="n">fopen</span><span class="p">(</span><span class="o">*</span><span class="p">(</span><span class="kt">uint8_t</span><span class="o">**</span><span class="p">)(</span><span class="n">rsi</span> <span class="o">+</span> <span class="mi">8</span><span class="p">),</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="mh">0x400674</span><span class="p">);</span>
<span class="n">fwrite</span><span class="p">((</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="mh">0x400676</span><span class="p">,</span> <span class="mi">13</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">anon1</span><span class="p">);</span>
<span class="n">fclose</span><span class="p">(</span><span class="n">anon1</span><span class="p">);</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Interestingly, we can see that the compiler promoted the call to <code class="highlighter-rouge">fputs</code> into a call to <code class="highlighter-rouge">fwrite</code>.</p>
<p>Of course, this example highlights that fcd doesn’t do a great job with string literals (and that it doesn’t know about <code class="highlighter-rouge">main</code>’s signature). However, getting better type information is an obvious first step in determining what should be displayed as a string literal.</p>
<h2 id="passing-foreign-headers-to-fcd">Passing foreign headers to fcd</h2>
<p>Anyone who’s been watching fcd is probably aware that my main development environment is macOS. However, while macOS standard library headers are generally source-compatible with Linux standard library headers, they are certainly not equivalent. Unfortunately, you can’t easily “just include” your own machine’s headers if you’re decompiling a program that targets a different platform. For instance, on macOS, <code class="highlighter-rouge">fopen</code> boils down to <code class="highlighter-rouge">_fopen</code> (it does not on Linux), and on the other side of the fence, <code class="highlighter-rouge">putc</code> becomes <code class="highlighter-rouge">_IO_putc</code> (it does not on macOS).</p>
<p>Fortunately, getting Linux headers on macOS (or any other platform) is quite easy. For instance, since I know that the program is an x86_64 Linux program using glibc, it’s possible to just <a href="https://packages.debian.org/sid/libc6-dev">head to a package repository</a> and download the package for <code class="highlighter-rouge">libc-devel</code>. After decompressing the .deb and then the data.tar.gz archive, fcd can be pointed to the right header location with the <code class="highlighter-rouge">-I</code> parameter. For instance, I would use this invocation of fcd on macOS:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>$ fcd \
-I /tmp/libc6-dev_2.24-2_amd64/data/usr/include \
-I /tmp/libc6-dev_2.24-2_amd64/data/usr/include/x86_64-linux-gnu \
--header stdio.h \
hello
</code></pre>
</div>
<p>This command has the same result as running fcd on a machine where these headers are actually installed. (Fcd configures Clang to target the platform of the executable that is being decompiled.)</p>
<p>In a future where fcd supports Mach-O programs, the <a href="https://opensource.apple.com/release/os-x-10115/">Libc source</a> that Apple provides for download (this link points to the macOS 10.11.5 release) could probably fulfill a similar role on non-Apple platforms.</p>
<p>The Windows situation is a bit more complicated, as the <a href="https://developer.microsoft.com/en-us/windows/downloads/windows-10-sdk">Windows SDK installer</a> is a .exe program that is more involved than a self-extracting archive. I haven’t looked at the SDK license, so it’s possible that this use would violate the EULA, too. That bridge will be crossed in time.</p>
<h1 id="looking-forward">Looking forward</h1>
<p>This new feature unblocks a lot of type information that can be used for inference, and this will probably be my next focus. It’ll also be interesting to add C++ support here, as the feature introduced a slight regression: the hard-coded list contained mangled C++ names, which are no longer available. C++ has its own set of challenges, notably virtual dispatch and templates.</p>
<p>Additionally, using headers to specify information about functions found in the program itself is a great point of interest. One option that could be explored is adding a new attribute, in fcd only, that specifies the virtual address of a function. However, it is unclear how feasible adding a new attribute is for out-of-tree Clang consumers.</p>
<p>Finally, at the moment, fcd has its own code to take a function prototype and figure out which locations will be used to pass parameters. It is known that LLVM can do that translation too, and it can certainly do it better. However, the specifics are nebulous. This will hopefully be investigated at some point in the future.</p>
Sun, 04 Sep 2016 00:00:00 +0000https://zneak.github.io/fcd/2016/09/04/parsing-headers.html
https://zneak.github.io/fcd/2016/09/04/parsing-headers.htmlUsing bugpoint to find the cause of crashes<p>Fcd needs several custom LLVM transformation passes to generate decent output. Generating pseudocode happens in four more or less distinct steps:</p>
<ol>
<li>Lifting machine code to LLVM IR;</li>
<li>“Pre-optimizing” the module using mostly generic passes (GVN, DSE, instruction combining, CFG simplification);</li>
<li>Running the “customizable optimization pipeline”, which has most of the fcd-specific passes, and in which the user may insert passes using the <code class="highlighter-rouge">--opt</code> option;</li>
<li>Actually producing pseudocode (which is itself a multi-step operation as well).</li>
</ol>
<p>Fcd implements several passes that are meant to be run in the customizable optimization pipeline. Notably, these passes serve the purposes of:</p>
<ul>
<li>recovering a function’s actual arguments by analyzing its IR with the active calling convention in mind;</li>
<li>simplifying conditions as expressed by the combination of x86 CPU flags;</li>
<li>identifying local variables;</li>
<li>eliminating dead loads using the MemorySSA analysis (though I assume that this will eventually be done by LLVM);</li>
<li>simplifying no-op pointer casts;</li>
<li>replacing arithmetic right shifts with sign extension operations.</li>
</ul>
<p>All of these address a specific problem that is often found in the output of fcd’s x86_64 code lifter. While several of these are fairly simple and straightforward, some passes required a significant debugging effort. The SESE loop pass was, up to now, the most difficult pass to get right.</p>
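<p>As an example of the last transform in the list, compilers commonly sign-extend a 32-bit value held in a 64-bit register with a shift pair. The sketch below is my own illustration, not fcd’s code; it shows the idiom and the equivalent cast that such a pass aims to produce:</p>

```cpp
#include <cassert>
#include <cstdint>

// The lifted idiom: shift the low 32 bits up to the top of the register,
// then shift back down arithmetically so the sign bit is replicated.
int64_t sign_extend_via_shifts(int64_t x)
{
    return (x << 32) >> 32;
}

// What the transform rewrites the pair into: an explicit sign extension.
int64_t sign_extend_direct(int64_t x)
{
    return (int64_t)(int32_t)x;
}
```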
<p>This is because currently, debugging an optimization pass with fcd is a tedious process when problems can’t be made immediately obvious. If your pass can’t itself discover the problems that it might cause, then the next step is using the <code class="highlighter-rouge">--print-after-all</code> command-line switch inherited from LLVM and analyzing each pass’s input and output to figure out which transformation made things go wrong. This is usually easier with passes further down the pass pipeline, as very often, most of the code has been chewed away. However, when you need to write a pass that should be run early, you’re left with a lot of noise to go through.</p>
<p>Currently, fcd’s x86 front-end extends <em>every</em> integer to 64 bits and only uses 64-bit math. However, things get ugly and messy when the original program used 32-bit math: the output is riddled with <code class="highlighter-rouge">&amp; 2147483647</code>, and some passes fail to identify patterns when inputs are obscured by AND masks. For this reason, I’m working on an “int narrowing” pass that notably uses LLVM’s <code class="highlighter-rouge">DemandedBits</code> analysis to figure out when values don’t actually need to be 64 bits wide.</p>
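<p>Concretely, here is a sketch of the pattern (my own illustration, with an illustrative mask constant rather than fcd’s actual output): what the lifter effectively emits for a 32-bit add, and what the narrowing pass should recover:</p>

```cpp
#include <cassert>
#include <cstdint>

// What the lifter effectively emits: the values live in 64-bit registers,
// so a 32-bit add becomes a 64-bit add followed by a mask that discards
// the upper bits.
uint64_t lifted_add32(uint64_t rax, uint64_t rcx)
{
    return (rax + rcx) & 0xFFFFFFFFu;
}

// What the narrowing pass should recover once the analysis proves that
// only the low 32 bits of the inputs and result matter: the same
// computation on a genuinely 32-bit type, with no mask at all.
uint32_t narrowed_add32(uint32_t eax, uint32_t ecx)
{
    return eax + ecx;
}
```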
<h2 id="enter-bugpoint">Enter bugpoint</h2>
<p><code class="highlighter-rouge">Bugpoint</code> is a tool that takes IR code that triggers a bug in a pass, and automatically strips it down until everything in the sample is necessary to trigger it. It was written in a way that ensures that it can work without knowing what’s going on with the passes, so it can be used by trained compiler engineers as much as lowly hobbyists like me.</p>
<p>I snubbed <code class="highlighter-rouge">bugpoint</code> for the longest time, but this pass that I’m writing seems to be an excellent use case, so I got my hands dirty and set out to make it work.</p>
<p>Joshua Cranmer has a great introduction to <a href="http://quetzalcoatal.blogspot.ca/2012/03/how-to-use-bugpoint.html">using <code class="highlighter-rouge">bugpoint</code></a> with a standalone (i.e. not <code class="highlighter-rouge">opt</code>) tool, and this is exactly what we need for fcd.</p>
<h3 id="generating-ir-to-work-with">Generating IR to work with</h3>
<p>The first step is to generate IR code ready to be tested. Before today, fcd could only generate an IR dump right after the lifting process and before the “preoptimization pass” (where stable LLVM passes simplify the lifted code as much as they can), which meant that there might still be a lot of work to repeat every time <code class="highlighter-rouge">bugpoint</code> tries something.</p>
<p>To solve this problem, the <code class="highlighter-rouge">--module-in</code> (<code class="highlighter-rouge">-m</code>) and <code class="highlighter-rouge">--module-out</code> (<code class="highlighter-rouge">-n</code>) options can now be specified multiple times. The gist is that the number of times you specify <code class="highlighter-rouge">--module-out</code> determines at which decompilation step fcd will dump its module, and the number of times you specify <code class="highlighter-rouge">--module-in</code> determines at which step it would resume with that module. In general, a module obtained with <code class="highlighter-rouge">-n</code> specified M times should be loaded with <code class="highlighter-rouge">-m</code> specified M times as well.</p>
<p>For one occurrence of <code class="highlighter-rouge">-n</code>, the module will be dumped right after lifting. For two, it will be dumped after pre-optimizations. For three, it will be dumped after the customizable optimization pipeline. It can’t be specified four times because the next step is the AST production.</p>
<p>For one occurrence of <code class="highlighter-rouge">-m</code>, the optimizations will resume just before pre-optimizations. For two, they will resume just before the customizable pipeline. For three, they will resume just before AST generation. It can’t be specified four times either.</p>
<h3 id="specifying-which-passes-to-run">Specifying which passes to run</h3>
<p>Another problem with fcd (before now) is that the pass pipeline was fairly inflexible. It had a single customization point about a third of the way in. You could add passes but you couldn’t remove any, so this would also cause <code class="highlighter-rouge">bugpoint</code> to waste a lot of time going through passes that are unlikely to be broken.</p>
<p>Fcd now solves this problem by introducing the <code class="highlighter-rouge">--opt-pipeline</code> option. This option can be used in three ways:</p>
<ul>
<li>when not specified (or when explicitly set to <code class="highlighter-rouge">default</code>), fcd behaves the same as it did before, and still allows specifying more passes with <code class="highlighter-rouge">-opt</code> (<code class="highlighter-rouge">-O</code>);</li>
<li>when set to a string, the string is whitespace-separated into pass names and fcd will run these passes only;</li>
<li>when set to the empty string, fcd will start your <code class="highlighter-rouge">$EDITOR</code> with a pre-populated list of default passes and allow you to customize it any way you need it.</li>
</ul>
<p>The second behavior is used with <code class="highlighter-rouge">bugpoint</code>.</p>
<h3 id="using-a-debug-calling-convention">Using a debug calling convention</h3>
<p>Since bugpoint doesn’t know anything about fcd’s calling convention system, it may end up trying to delete instructions that are essential to argument identification. When you want to test a single pass that doesn’t depend on argument identification, this causes spurious crashes and can leave you with a module containing a single function whose only instruction is the <code class="highlighter-rouge">unreachable</code> terminator.</p>
<p>To work around this issue, fcd now implements the <code class="highlighter-rouge">anyarch/noargs</code> calling convention. It doesn’t match any executable but it can be passed as a command-line parameter. By performing no analysis at all, this calling convention lets passes assume that functions take no arguments and return nothing, so <code class="highlighter-rouge">bugpoint</code> can’t break it.</p>
<h3 id="running-bugpoint">Running <code class="highlighter-rouge">bugpoint</code></h3>
<p>These new enhancements make it possible to use <code class="highlighter-rouge">bugpoint</code> to figure out problems with fcd’s custom passes. First, you need to generate a module for <code class="highlighter-rouge">bugpoint</code> to play with:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>$ fcd -n -n offending-program &gt; offending-program.ll
</code></pre>
</div>
<p>After this, you need to pass the <strong>path of a program</strong> that <code class="highlighter-rouge">bugpoint</code> will launch, passing the IR file as a parameter. Since this needs to be a path, we can’t just put an fcd invocation with parameters. We have to write a script instead, but it’s fairly straightforward:</p>
<div class="highlighter-rouge"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
fcd -m -m -n -n -n --opt-pipeline<span class="o">=</span><span class="s2">"intnarrowing verify"</span> --cc<span class="o">=</span>x86_64/sysv <span class="nv">$@</span>
</code></pre>
</div>
<p>As a side note, this command makes me wish that LLVM’s CommandLine API supported <code class="highlighter-rouge">getopt</code>-style short options. <code class="highlighter-rouge">-mmnnn</code> looks more natural.</p>
<p>Here, we use <code class="highlighter-rouge">-m -m</code> to specify that our module should be passed directly to the custom optimization pipeline; we use <code class="highlighter-rouge">-n -n -n</code> to stop before AST generation takes place (since we only care to see if the custom optimization pipeline worked).</p>
<p><code class="highlighter-rouge">--opt-pipeline</code> only has the <code class="highlighter-rouge">intnarrowing</code> pass and the verifier pass. The verifier pass, by default, will terminate the process with a non-zero status if verification fails, which is exactly what bugpoint is looking for. (It would also catch crashes, but the pass doesn’t crash.)</p>
<p>Then, <code class="highlighter-rouge">bugpoint</code> can be started with this command:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>$ bugpoint --compile-custom --compile-command=./fcd.sh offending-program.ll
</code></pre>
</div>
<p>In my case, <code class="highlighter-rouge">bugpoint</code> reduced <code class="highlighter-rouge">offending-program.ll</code> into a tiny function:</p>
<figure class="highlight"><pre><code class="language-llvm" data-lang="llvm"><span class="k">define</span> <span class="kt">void</span> <span class="vg">@main</span><span class="p">()</span> <span class="p">{</span>
<span class="nl">entry:</span>
<span class="nv">%0</span> <span class="p">=</span> <span class="k">zext</span> <span class="kt">i32</span> <span class="k">undef</span> <span class="k">to</span> <span class="kt">i64</span>
<span class="k">store</span> <span class="kt">i64</span> <span class="nv">%0</span><span class="p">,</span> <span class="kt">i64</span><span class="p">*</span> <span class="k">undef</span><span class="p">,</span> <span class="k">align</span> <span class="m">8</span>
<span class="nv">%1</span> <span class="p">=</span> <span class="k">sub</span> <span class="k">nsw</span> <span class="kt">i64</span> <span class="nv">%0</span><span class="p">,</span> <span class="k">undef</span>
<span class="nv">%2</span> <span class="p">=</span> <span class="k">and</span> <span class="kt">i64</span> <span class="nv">%1</span><span class="p">,</span> <span class="m">2147483648</span>
<span class="k">br</span> <span class="kt">i1</span> <span class="k">undef</span><span class="p">,</span> <span class="kt">label</span> <span class="nv">%"400643"</span><span class="p">,</span> <span class="kt">label</span> <span class="nv">%"4006f5"</span>
<span class="nl">"4006f5":</span> <span class="c1">; preds = %entry</span>
<span class="k">ret</span> <span class="kt">void</span>
<span class="nl">"400643":</span> <span class="c1">; preds = %entry</span>
<span class="k">unreachable</span>
<span class="p">}</span></code></pre></figure>
<p>And indeed, when fed this input, the <code class="highlighter-rouge">intnarrowing</code> pass produces incorrect output. With a sample this small, it’s much easier to see what’s going wrong.</p>
Wed, 16 Mar 2016 00:00:00 +0000https://zneak.github.io/fcd/2016/03/16/bugpoint.html
https://zneak.github.io/fcd/2016/03/16/bugpoint.htmlUsing a use list in the AST<p>The final part in decompilation is generating pseudocode output. In general, decompilers will follow the inverse path of compilers to the end by generating an abstract syntax tree from the intermediate representation and turning that into text.</p>
<p>Fcd is no exception. It has an <code class="highlighter-rouge">ast</code> folder in which all of its abstract syntax tree-related code resides. That code is approximately 5600 lines, which is more than 25% of the decompiler’s total source lines of code. Comparatively, the x86 emulator is only about 1700 lines of code.</p>
<p>It shouldn’t be very surprising that this much effort goes into this final decompilation step. After all, the output is what people see and what they value your decompiler for, so it better be pretty.</p>
<p>However, fcd got off to a bad start as far as the AST goes. As many other sad pieces of code, it started with good intentions, but it didn’t take very long for problems to show up.</p>
<p>It’s known that taking an SSA form and turning it back into something else is cumbersome, regardless of whether you’re going back to pseudocode or moving forward to machine code. At least, source code has fewer limitations than machine code: most notably, fcd can have as many variables as it needs, whereas x86_64 has a grand total of 16 visible, general-purpose registers, and they aren’t even all that general-purpose. However, with readability objectives in mind, we still have a lot to be concerned about.</p>
<p>I remember that Van Emmerik says somewhere in his thesis that you want to propagate values “just enough” through your SSA representation. If you collapse too many things into variables, your code ends up looking like a mess of definitions, but if you don’t do it enough, expressions are repeated and overall readability is negatively impacted.</p>
<p>LLVM doesn’t even allow the luxury of propagating values “just enough”. With a true SSA representation, every value is collapsed into its own variable. This means that the pseudocode back-end needs to work against LLVM and make things pretty again.</p>
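<p>A contrived example of the trade-off: in the first function below, every intermediate value gets its own name, which is what a naive translation of SSA form looks like; in the second, the single-use values are propagated into one expression:</p>

```cpp
#include <cassert>

// SSA-style output: one variable per value, as in LLVM IR. Correct, but
// the reader drowns in definitions.
int ssa_style(int a, int b)
{
    int v0 = a + b;
    int v1 = v0 * 2;
    int v2 = v1 - a;
    return v2;
}

// The same computation with single-use values propagated "just enough".
int propagated(int a, int b)
{
    return (a + b) * 2 - a;
}
```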
<h2 id="the-old-abstract-syntax-tree">The old abstract syntax tree</h2>
<p>The initial design sounded simple enough to actually work. Every LLVM instruction would have its value represented as a declaration in the abstract syntax tree, and then a pass infrastructure parallel to LLVM’s would be responsible for cleaning up the code and making it readable.</p>
<p>AST elements were separated in two class hierarchies: expressions and statements. Expressions represent values, while statements represent control flow.</p>
<p>The pass pipeline would try to do the following things:</p>
<ul>
<li>combine similar if-else statements;</li>
<li>remove spurious scopes;</li>
<li>propagate variables that are used in only one other place;</li>
<li>simplify expressions (for instance, <code class="highlighter-rouge">!(a == b)</code> should be <code class="highlighter-rouge">a != b</code>);</li>
<li>remove assignments to the special <code class="highlighter-rouge">__undefined</code> value;</li>
<li>combine similar if-else branches again;</li>
<li>print output.</li>
</ul>
<p>Most of these passes need to scan the whole AST to work properly, and some of them never really lived up to their expectations. For one thing, generating AST variables was a messy story. Every value is a variable, but some are more variable than others, and some have special constraints. For instance, a Φ node has its own variable, but it can be assigned to, whereas normal values have a single assignment. Some values were pointer-shared, some were hidden behind an identifier.</p>
<p>Things got the messiest for expression propagation. The basic idea is that it would find variables that are used in just one place and replace them with their initializing expression (and leave a <code class="highlighter-rouge">var = __undefined</code> to be removed at some later point). However, memory operations (like calls, reading from memory and writing to memory) are not allowed to propagate like normal values because their ordering might be important. To make matters worse, there was no way to tell, just looking at a variable, where it was used in the AST, or even what its initial definition was (when it even had one). You had to walk it all by hand, and the visitor classes were brilliantly inept at helping you write less code.</p>
<p>At some point, fcd had a use analysis pass that would try to tell all of that. It was full of problems too and was trashed at some point last Fall.</p>
<p>There were things that were just very hard too. For instance, every variable declaration would sit at the top of the function and then be assigned later on. This would sometimes cause a lot of bloat, and <a href="https://github.com/zneak/fcd/issues/4">changing it would have been non-trivial</a>.</p>
<p>What happened in the end is that I would frequently change things, and see that some input improved, but that then some values went missing or some condition disappeared. I couldn’t make any substantial progress anymore, so it was time for a fundamental change.</p>
<h2 id="the-new-abstract-syntax-tree">The new abstract syntax tree</h2>
<p>For the new AST design, I decided to bring LLVM’s tried and true approaches to fcd.</p>
<p>One of LLVM’s best features is its <code class="highlighter-rouge">Use</code> and <code class="highlighter-rouge">User</code> classes, and the <code class="highlighter-rouge">replaceAllUsesWith</code> operation that they make possible. <code class="highlighter-rouge">replaceAllUsesWith</code> takes a value and replaces every use of it with another one, in linear time over the number of uses. Since this works pretty well, fcd now has <code class="highlighter-rouge">ExpressionUse</code> and <code class="highlighter-rouge">ExpressionUser</code> classes, which support a very similar set of operations, including <code class="highlighter-rouge">replaceAllUsesWith</code>.</p>
<p>This new design makes it a joke to eliminate assignments to the special <code class="highlighter-rouge">__undefined</code> value, for instance. Just ask for it (<code class="highlighter-rouge">AstContext::expressionForUndef()</code>), walk over its uses (<code class="highlighter-rouge">for (ExpressionUse&amp; use : undef.uses())</code>), check if the user is an assignment statement (<code class="highlighter-rouge">use.getUser()</code>), and if <code class="highlighter-rouge">__undefined</code> is on the right-hand side, remove that statement. It doesn’t even matter that much, though, because this new design produces significantly fewer <code class="highlighter-rouge">__undefined</code> values.</p>
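<p>The mechanism can be sketched in a few lines (with toy types of my own, not fcd’s actual <code class="highlighter-rouge">ExpressionUse</code>/<code class="highlighter-rouge">ExpressionUser</code> classes): each expression keeps a list of the operand slots that point at it, so <code class="highlighter-rouge">replaceAllUsesWith</code> just repoints those slots in linear time:</p>

```cpp
#include <cassert>
#include <vector>

struct Expression;

// A use records the operand slot in some user that points at an expression.
struct Use
{
    Expression** slot;
};

struct Expression
{
    std::vector<Use> uses;

    // Make `operandSlot` point at this expression and record the use.
    void addUse(Expression*& operandSlot)
    {
        operandSlot = this;
        uses.push_back(Use{&operandSlot});
    }

    // Repoint every recorded operand slot at `other` and transfer the
    // use records, leaving this expression with no users.
    void replaceAllUsesWith(Expression* other)
    {
        for (Use& use : uses)
        {
            *use.slot = other;
            other->uses.push_back(use);
        }
        uses.clear();
    }
};
```

<p>The real thing obviously tracks richer user information, but even this toy version shows why walking a use list beats rescanning the whole AST for references.</p>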
<p>Instead of immediately creating a variable for each value, each LLVM instruction is given a matching AST expression. Its pointer is shared everywhere it needs to appear, and the use list lets fcd keep track of what is where. Declaration statements have been eliminated: instead, the code printing class is responsible for inserting declaration statements where appropriate.</p>
<p>The printer is also responsible for propagating expressions. An expression that is used a single time will be shown inline with its larger expression. Expressions that are used more than once are eligible to be promoted to local variables; the printer will create the declaration where it is most relevant, and the initial assignment is made inline when possible.</p>
<p>In short, switching to a use list for the AST made my life much better. That said, as shiny as this new design is, it still has a number of pitfalls.</p>
<p>For one thing, while having the printer do all this work makes everything else much simpler, well, it makes the printer that much more complex. I’ve seen a few issues already and I’m certain that I haven’t seen the last.</p>
<p>Another problem is that now that variables aren’t always accompanied with a declaration, the lack of type information in the AST suddenly became glaringly obvious. Variables are declared with a <code class="highlighter-rouge">some_t</code> type, which means absolutely nothing. This needs to be improved shortly.</p>
<p>Also, the use list implementation is based on the LLVM use list implementation, and it’s <em>scary</em>. A use array is allocated <em>in front</em> of users (users refer to it by indexing backwards–<em>shudder</em>), and if I had been caught using this many pointer casts 400 years ago I’d probably be burning at the stake right now.</p>
<p>It should also be noted that while LLVM instructions are rooted in blocks, expressions are just kind of floating around with no clear owner. Fcd’s AST classes are allocated through fast memory pools that don’t even allow individual deallocation (everything is freed when the pool goes out of scope), so this isn’t really a problem for me, but it might be a problem for you if you’re also making a decompiler or something similar.</p>
<p>And finally, while expressions do look much better now, there’s still a significant amount of pain with statements, which have barely changed with the new AST. Matching against statement patterns is still as difficult.</p>
<p>If you’re writing a decompiler and want a convenient way to manipulate your AST, I do recommend that your classes have use lists. However, let me know if you find a more convenient way to manipulate statements.</p>
Tue, 01 Mar 2016 06:08:59 +0000https://zneak.github.io/fcd/2016/03/01/ast.html
https://zneak.github.io/fcd/2016/03/01/ast.htmlUpgrading to LLVM 3.8<p>As I wrote before, one goal for fcd is to stay up-to-date with LLVM. This brings all sorts of niceties, as each version brings its own set of improvements in terms of performance and optimizations.</p>
<p>Upgrading means that fcd has better chances of staying relevant. The LLVM maintainers are certainly not afraid of breaking any and every kind of compatibility that users may be looking for. The bitcode format changes semi-regularly; the assembly syntax changes on a similar schedule; C++ APIs change all the time. The more stable (and much much less powerful) C API has symbols that face aggressive deprecation: old functions are declared obsolete in a release and deleted in the next. Another example is that LLVM 3.7 introduced an accidental API change in <code class="highlighter-rouge">LLVMBuildLandingPad</code> that was reverted in the next dot release, breaking API compatibility on an unusual schedule (and <a href="https://github.com/zneak/fcd/issues/6">biting</a> <a href="https://github.com/zneak/fcd/issues/12">fcd</a> in the process).</p>
<p>This all goes to say that a project that stays behind is likely to become isolated and unusable within a relatively short time frame. Even if the project can read or emit LLVM bitcode or assembly, it’s improbable that the next version of LLVM will still accept its output. An unfortunate example of this is the <a href="https://github.com/trailofbits/mcsema">MC-Semantics framework</a>. With just under 300 commits over almost 2 years, you can tell that non-negligible effort went into it. Sadly, I predict that it will be forgotten soon if it can’t be upgraded—and the cost of upgrading rises with each passing release.</p>
<p>LLVM 3.8 marks the second time that fcd was upgraded. The first time was practically painless given the amount of code that had already gone into the project; this time was a little harder.</p>
<h2 id="what-changed-in-llvm-38">What changed in LLVM 3.8</h2>
<p>As we still wait for official release notes, a comprehensive list of what changed in LLVM 3.8 is hard to come by. The breaking changes that were observed while compiling fcd range from “easy to fix” to “dammit what am I missing?”</p>
<p>A short list would be:</p>
<ul>
<li><code class="highlighter-rouge">ilist</code> iterators and <code class="highlighter-rouge">ilist</code> element pointers are no longer interchangeable. For instance, the result of <code class="highlighter-rouge">function-&gt;arg_begin()</code> is no longer assignable to an <code class="highlighter-rouge">Argument*</code> variable. A static cast can convert an iterator to a pointer (iterators define an explicit conversion operator), and the iterator for an object can be obtained with its <code class="highlighter-rouge">getIterator()</code> method. That one was easy.</li>
<li>The <code class="highlighter-rouge">CloningDirector</code> API was removed. Within LLVM, its only purpose was to help clone landing pads, and LLVM 3.8 does away with that mechanism. This took some refactoring.</li>
<li>The alias analysis infrastructure underwent massive changes. That one was hard.</li>
</ul>
<p>These probably deserve some explanations.</p>
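<p>The <code class="highlighter-rouge">ilist</code> change is mostly mechanical. The same distinction exists with standard containers, where going from an iterator to an element pointer also takes an explicit step; this sketch uses <code class="highlighter-rouge">std::list</code> rather than LLVM’s types:</p>

```cpp
#include <cassert>
#include <list>

// With LLVM 3.8's ilist, iterators no longer convert implicitly to element
// pointers. std::list has the same property: dereference, then take the
// address. (LLVM objects additionally carry a getIterator() method to go the
// other way; standard containers have no direct equivalent.)
int* pointerFromIterator(std::list<int>::iterator iter)
{
	return &*iter; // explicit conversion: iterator -> element pointer
}
```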
<h3 id="removing-the-cloningdirector">Removing the CloningDirector</h3>
<p>As I explained in a <a href="/2016/02/16/lifting-x86-code.html">previous blog post</a>, fcd used <code class="highlighter-rouge">CloneAndPruneIntoFromInst</code> to inline instruction implementation templates into a new function. Up to LLVM 3.7, the function had a <code class="highlighter-rouge">CloningDirector</code> parameter that could be used to perform special actions when certain instructions were encountered.</p>
<p>Fcd used the cloning director to resolve its custom intrinsics. When the director encountered a call to a function like <code class="highlighter-rouge">x86_read_mem</code>, instead of emitting back another call, it generated a <code class="highlighter-rouge">load</code> instruction. With LLVM 3.8, this is no longer possible.</p>
<p>A downside of this implementation was that it heavily coupled the cloning director with the intrinsic logic. While fcd already had isolated code generation classes (a design which could allow other architectures in the future), it had just one cloning director, and it was responsible for transforming x86-specific intrinsics.</p>
<p>Right now, intrinsics represent a useful concept on any processor that fcd could want to support (return, jump, call, read/write memory are all fairly universal), but it might not be the case forever: the large size of IR function templates currently causes major performance issues and one way to solve it would be to create intrinsics that do more work than just a very small and precise operation. These would most certainly be processor-specific.</p>
<p>Since that just wouldn’t build anymore, I took the opportunity to factor out architecture-specific code into the code generator classes.</p>
<p>Now, each time code has been generated for an instruction, the code generator checks its list of intrinsics and replaces every use with the corresponding code. Along with fixing the build, this clears a major obstacle towards supporting multiple processor architectures.</p>
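<p>As a rough sketch of that approach, with a toy instruction representation instead of LLVM IR (all types and names below are invented for the example):</p>

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy stand-ins for IR: each "instruction" is just an opcode and an operand.
struct Instruction
{
	std::string opcode;  // "call", "load", "store", ...
	std::string operand; // callee name or address expression
};

// After generating code, the generator scans the output and replaces calls
// to known intrinsics with the operation they stand for.
class CodeGenerator
{
	std::map<std::string, std::function<Instruction(const Instruction&)>> intrinsics;

public:
	void addIntrinsic(std::string name, std::function<Instruction(const Instruction&)> replace)
	{
		intrinsics[std::move(name)] = std::move(replace);
	}

	void resolveIntrinsics(std::vector<Instruction>& body)
	{
		for (Instruction& inst : body)
		{
			if (inst.opcode != "call") continue;
			auto iter = intrinsics.find(inst.operand);
			if (iter != intrinsics.end())
				inst = iter->second(inst); // substitute in place
		}
	}
};
```

<p>In fcd itself the replacement produces real IR, such as a <code class="highlighter-rouge">load</code> for <code class="highlighter-rouge">x86_read_mem</code>; the point here is only the lookup-and-substitute structure that keeps architecture-specific knowledge in the code generator.</p>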
<h3 id="upgrading-the-alias-analysis">Upgrading the alias analysis</h3>
<p>Alias analysis was the most serious problem that I had upgrading fcd. The problem isn’t so much that the new design is complex as that it is currently wholly undocumented. Until now, if you wanted to build an alias analysis pass, you could go to the <a href="https://web.archive.org/web/20151118064719/http://llvm.org/docs/AliasAnalysis.html">LLVM documentation page about it</a>, take half an hour to read it, and be ready to go. While the new system is similar enough, there’s no equivalent documentation to help you use it.</p>
<p>Before LLVM 3.8, alias analyses belonged to an <em>analysis group</em>. As far as I know, analysis groups were an abstraction invented to allow composing alias analysis passes, and they were never used for anything else. LLVM 3.8 got rid of them in favor of a new design that works best with the new pass infrastructure, but unfortunately, the new pass infrastructure isn’t ready for prime time yet. Several passes haven’t been ported over, so we need to stick to <code class="highlighter-rouge">legacy::PassManager</code> and the second-class alias analysis support that it provides.</p>
<p>Now, instead of having a pass that also inherits from <code class="highlighter-rouge">AliasAnalysis</code>, you create an <code class="highlighter-rouge">AAResult</code> class that inherits from the curiously-recurring template class <code class="highlighter-rouge">AAResultBase</code>.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">class</span> <span class="nc">MyAAResult</span> <span class="o">:</span> <span class="k">public</span> <span class="n">llvm</span><span class="o">::</span><span class="n">AAResultBase</span><span class="o">&lt;</span><span class="n">MyAAResult</span><span class="o">&gt;</span>
<span class="p">{</span>
<span class="cm">/* snip */</span>
<span class="p">};</span></code></pre></figure>
<p>This class should shadow the methods that need to be overridden. They’re mostly the same as in the previous alias analysis infrastructure.</p>
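<p>To show what the curiously recurring pattern buys here, this is a stripped-down, LLVM-free imitation of the arrangement (all names are invented for the example): the base class provides conservative defaults and statically dispatches to whatever the derived class shadows, with no virtual calls involved:</p>

```cpp
#include <string>

enum class AliasResult { MayAlias, NoAlias };

// Stand-in for AAResultBase: the curiously recurring template parameter lets
// the base class call the most derived alias(), no virtual functions needed.
template<typename DerivedT>
class AAResultBaseLike
{
protected:
	DerivedT& derived() { return static_cast<DerivedT&>(*this); }

public:
	// Conservative default, used when the derived class doesn't shadow it.
	AliasResult alias(const std::string&, const std::string&)
	{
		return AliasResult::MayAlias;
	}

	// Base-class helpers statically dispatch to the most derived alias().
	bool provablyDistinct(const std::string& a, const std::string& b)
	{
		return derived().alias(a, b) == AliasResult::NoAlias;
	}
};

// Shadowing alias() (no "override": it isn't virtual) customizes the result.
class RegisterMemoryAA : public AAResultBaseLike<RegisterMemoryAA>
{
public:
	AliasResult alias(const std::string& a, const std::string& b)
	{
		// Toy rule: a machine register never aliases program memory.
		bool exactlyOneIsRegister = (a == "register") != (b == "register");
		return exactlyOneIsRegister ? AliasResult::NoAlias : AliasResult::MayAlias;
	}
};

// A result type that shadows nothing falls back on the defaults.
class NoOpinionAA : public AAResultBaseLike<NoOpinionAA> {};
```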
<p>The trickiest part is to make the alias analysis results available to other passes, since <code class="highlighter-rouge">MyAAResult</code> is not itself a pass. Existing alias analyses wrap <code class="highlighter-rouge">AAResult</code> objects in legacy passes and provide a <code class="highlighter-rouge">getResult</code> method. Of course, just providing that method won’t get you anywhere.</p>
<p>LLVM 3.8 provides a “legacy” <code class="highlighter-rouge">AAResultsWrapperPass</code> that, much like the <code class="highlighter-rouge">DominatorTreeWrapperPass</code> and its friends, can be added as a required analysis and then queried for the wrapped object. To combine alias analysis results, the pass contains a <em>hard-coded list of every known alias analysis pass</em>, tries to see if they were included in the pass manager, and puts them together when it finds them. The obvious problem is that if you’re an out-of-tree user and you wrote your own alias analysis pass, you can’t just put it there without asking people to recompile LLVM.</p>
<p>As a bridge solution, the pass also tries to see if there is an <code class="highlighter-rouge">ExternalAAWrapperPass</code> in the pipeline. If so, it uses a callback on it that is supposed to combine external alias analysis results with the LLVM AA results.</p>
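<p>In miniature, the arrangement looks something like this toy model (the real callback lives on <code class="highlighter-rouge">ExternalAAWrapperPass</code> and receives LLVM’s <code class="highlighter-rouge">AAResults</code> aggregator; everything below is invented for the sketch):</p>

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy aggregator in the spirit of AAResults: it chains individual results
// and reports no-alias as soon as one of them can prove it.
struct AggregatedAA
{
	std::vector<std::function<bool(const std::string&, const std::string&)>> noAliasChecks;

	bool isNoAlias(const std::string& a, const std::string& b)
	{
		for (auto& check : noAliasChecks)
			if (check(a, b))
				return true;
		return false; // conservatively assume "may alias"
	}
};

// Stand-in for the external hook: while building the aggregate, the wrapper
// pass invokes one externally-provided callback that can splice results in.
using ExternalAACallback = std::function<void(AggregatedAA&)>;

AggregatedAA buildAggregate(const ExternalAACallback* external)
{
	AggregatedAA result;
	// ... built-in LLVM analyses would be added here ...
	if (external)
		(*external)(result);
	return result;
}
```

<p>The single-callback shape is what matters: there is exactly one hook, so only one external analysis (or one pre-combined bundle) can be spliced in.</p>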
<p>While this is enough if you have just one alias analysis, or multiple alias analyses that don’t depend on one another, it fell short for fcd. The project contains two alias analysis passes: one to tell that program memory can’t alias with machine registers, and one to figure out the “mod/ref” behavior of functions with respect to the machine register structure, with the intent of identifying parameter registers. This second pass needs accurate alias analysis results within the functions that it analyzes in order to run the <code class="highlighter-rouge">MemorySSA</code> utility. That means that the second pass depends on the first one.</p>
<p>This is a problem because there can only be one <code class="highlighter-rouge">ExternalAA</code> pass in the pipeline, and it can only expose alias analyses that were inserted before it in the pipeline. The choice on offer was either to have accurate alias analysis while identifying parameter registers but be unable to propagate that information to later passes, or to propagate the information but fail to recover parameter registers.</p>
<p>The solution to this situation, of course, had to be a hack. The parameter identification pass builds its own copy of the program memory alias analysis and merges it itself with the <code class="highlighter-rouge">AAResult</code> that it gets from the <code class="highlighter-rouge">AAResultsWrapperPass</code>.</p>
<p>Hopefully, the new pass pipeline will hit a stable release sooner rather than later, and fcd can go back to a proper implementation. Until then, this hack actually works pretty well.</p>
<h3 id="fixing-bindingscpp">Fixing bindings.cpp</h3>
<p>Finally, the last improvement made while upgrading to LLVM 3.8 was to remove the implementation of the <code class="highlighter-rouge">bindings.cpp</code> file from the repository. Of course, the file was as broken as it was under LLVM 3.7.1, so I took the opportunity to solve the problem.</p>
<p>This file was auto-generated by parsing a manually-edited and pre-processed version of the <code class="highlighter-rouge">&lt;llvm-c/Core.h&gt;</code> header, and it was included in the repository. Unfortunately, as we found out with LLVM 3.7.1, the C API changes once in a while.</p>
<p>Now, the build system generates the file, and it was removed from the repository. This ensures that users always get a fresh version that matches their LLVM install instead of a stale one.</p>
<h2 id="the-future">The future</h2>
<p>Overall, upgrading LLVM isn’t a very pleasant task, especially given that documentation isn’t always available with pre-releases.</p>
<p>There are still a lot of breaking changes to come. Two major planned changes are the erasure of pointer types (LLVM will have a single <code class="highlighter-rouge">*</code> pointer type, and <code class="highlighter-rouge">load</code>/<code class="highlighter-rouge">store</code>/<code class="highlighter-rouge">getelementptr</code> instructions will specify the pointee type), and the deployment of the new pass manager infrastructure. I’m trying my best not to depend on features that will disappear, but as the removal of the <code class="highlighter-rouge">CloningDirector</code> showed, this information is not always easy to find. Other changes are just unavoidable.</p>
<p>I’m considering developing fcd against the SVN version of LLVM, so that incompatibilities don’t accumulate until it’s time to upgrade to the next stable release. Fcd releases could be tagged at the same time as a new stable release of LLVM comes out, which has several advantages.</p>
<p>At the same time, I don’t know if I want to build LLVM and Clang every day. Given the aging hardware that I use, building them from scratch takes more than an hour. It wouldn’t be as bad with incremental builds, but it’s still a factor to consider. For now, this remains an open question.</p>
Fri, 26 Feb 2016 05:45:31 +0000
https://zneak.github.io/fcd/2016/02/26/llvm38.html