<p>Maelstrom: http://alangrow.com/blog/ (Alan, alangrow+maelstrom@gmail.com)</p>
<h1>Shell Quirk: Assignment From a Heredoc</h1>
<p>2017-06-10T20:30:00, Alan, http://alangrow.com/blog/shell-quirk-assign-from-heredoc</p>
<p>I have a <strike>fetish for</strike> fascination with POSIX shell corner cases. It all started a decade ago with a segfault: a certain <code>while read</code> loop ran fine on every Unix except AIX. We were stumped, and I was hooked.</p>
<p>Here's a new find. What will the following POSIX shell program print?</p><div class="highlight"><pre><span class="c">#!/bin/sh</span>
<span class="nv">paths</span><span class="o">=</span><span class="sb">`</span>tr <span class="s1">&#39;\n&#39;</span> <span class="s1">&#39;:&#39;</span> | sed -e <span class="s1">&#39;s/:$//&#39;</span><span class="sb">`</span><span class="s">&lt;&lt;EOPATHS</span>
<span class="s">/foo</span>
<span class="s">/bar</span>
<span class="s">/baz</span>
<span class="s">EOPATHS</span>
<span class="nb">echo</span> <span class="s2">&quot;$paths&quot;</span>
</pre></div>
<p>If you said <code>/foo:/bar:/baz</code>, you're right...that is, if you're on Linux and <code>/bin/sh</code> is provided by <a href="https://en.wikipedia.org/wiki/Almquist_shell#dash:_Ubuntu.2C_Debian_and_POSIX_compliance_of_Linux_distributions">dash</a>.</p>
<p>If you're on MacOS <a href="#1">[1]</a> or FreeBSD instead, this same script will wait for input and print nothing. This is probably the behavior on all BSD derivatives, and it's likely the correct behavior too, since the BSDs are usually right about these things.</p>
<p>Correct or not, the <code>dash</code> behavior is a bit more useful. It also points to a fundamental difference in the way <a href="http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_07_04">here-documents</a> work: <code>dash</code> interprets the heredoc <em>before</em> anything else on the line. When the assignment is interpreted next, stdin already has the contents of the heredoc. I'm not even sure what the other POSIX shells do. Is the heredoc interpreted after the assignment? Where does it even go?</p>
<p>Fortunately there's an easy portable alternative: move the heredoc inside the backquotes and attach it to the command that should read it, here <code>tr</code>. (A heredoc placed after <code>sed</code> would replace the pipe as <code>sed</code>'s stdin.)</p><div class="highlight"><pre><span class="c">#!/bin/sh</span>
<span class="nv">paths</span><span class="o">=</span><span class="sb">`</span>tr <span class="s1">&#39;\n&#39;</span> <span class="s1">&#39;:&#39;</span> <span class="s">&lt;&lt;EOPATHS</span> | sed -e <span class="s1">&#39;s/:$//&#39;</span>
<span class="s">/foo</span>
<span class="s">/bar</span>
<span class="s">/baz</span>
<span class="s">EOPATHS</span><span class="sb">`</span>
<span class="nb">echo</span> <span class="s2">&quot;$paths&quot;</span>
</pre></div>
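<p>Another way to sidestep the question, sketched here as an assumption rather than something from the original scripts, is to attach the heredoc to a brace group so that the whole pipeline inherits it as stdin. The sketch uses <code>$(...)</code> instead of backquotes purely for readability.</p>

```shell
#!/bin/sh
# Sketch: the heredoc is a redirection on the brace group, so `tr`
# (the first command in the pipeline) reads it as its stdin.
paths=$(
  { tr '\n' ':' | sed -e 's/:$//'; } <<EOPATHS
/foo
/bar
/baz
EOPATHS
)
echo "$paths"
```

<p>Because the redirection belongs to the group rather than to the assignment, there is no ambiguity about which command ends up reading the heredoc.</p>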
<p><a name="1">[1]</a> Note that on recent MacOS versions, <code>/bin/sh</code> is actually <code>bash</code> in POSIX mode. Don't believe me? Run <code>/bin/sh --help</code> and <code>/bin/sh -c 'echo $POSIXLY_CORRECT'</code>.</p>
<style>
.highlight .s { color: #dd7700; }
</style>
<h1>Blog Refresh: Now With Less</h1>
<p>2017-05-01T02:13:42, Alan, http://alangrow.com/blog/blog-refresh-now-with-less</p>
<p>To readers who enjoyed the 3-column layout, the Edgar Allan Poe quote, and the engraving of the fragile rowboat disappearing into the mighty maelstrom: I'm sorry. It's all gone. To me, minimalism is less an aesthetic than it is the search for time invariants, and well...here we are some years later.</p>
<p>It's actually a bit more practical than all that. After porting this blog from <a href="https://jekyllrb.com/">jekyll</a> to <a href="https://github.com/acg/tinysite">tinysite</a>, I discovered that the very problem I set out to solve -- fast incremental site rebuilds -- was still a problem. No comment on why this seems to be a common failure mode for shiny two-point-oh-y things.</p>
<p>The culprit? That index of posts in the right column. The simple act of fixing a typo on a single page would cause <code>posts.json</code> to rebuild, and then every post would be rebuilt in a cascade, since the right column of every post depended on <code>posts.json</code>. Other static site generators probably learned to avoid this years ago. I finally came around to it this weekend.</p>
<p>In the interim, editing posts has been pretty unpleasant. Doubly so because I had no one to blame but myself. Now incremental site rebuilds are quick and can be accelerated with <code>make -j</code> as before.</p>
<p>With that out of the way, I decided to take advantage of the "let's optimize the shit out of everything" mental state I was in and see what could be done to speed up the publishing side of things. I really like the Heroku / Github Pages approach of "just git push and we'll do the rest," and have spent the last few years building systems to make everything at <a href="https://endcrawl.com">Endcrawl</a> work like that. Maybe those years would have been better spent learning docker or kube. Maybe the people who regard <a href="https://news.ycombinator.com/item?id=5927843">deploying-via-git as an antipattern</a> are right. But I can't shake the idea that we're overengineering the hell out of this problem right now. As <a href="https://news.ycombinator.com/item?id=14216655">one HN commenter</a> put it:</p>
<blockquote>
<p>In the long term I predict that base OS everywhere will improve support for deployment, workload scheduling, resource allocation, endpoint discovery, and dependency management. These will match and eventually surpass the additional capabilities that containers offer, and <strong>then we can all go back to putting files on a server and restarting a process</strong>, which is all that 99% of us actually need.</p>
</blockquote>
<p>There's a bit more to the story than the part I emphasized, but that's one for another day. Suffice to say there's tooling now that fully realizes the <a href="./dream-deploys-atomic-zero-downtime-deployments/">"dream deploys"</a> idea, this site uses it, and who knows, maybe it'll get opensourced one day.</p>
<p>I also took a stab at the horribly clunky <code>{% highlight lang %}</code> template syntax this blog used for code highlighting. When I started there was no good standard for this kind of thing, but now it seems <a href="https://help.github.com/articles/creating-and-highlighting-code-blocks/">fenced code blocks</a> have won. Good for them, they're awesome. Switching <code>tinysite</code> to fenced code turned out to be trivial <a href="https://github.com/acg/tinysite/commit/d6ea6fe0bf58ef6a28776a7f4f0b622f8c47c747">(diff)</a>, mainly because the original approach was a small regex hack rather than a more evolved approach. That Yagni guy they're always invoking knows what's up!</p>
<p>Oh yeah. The Disqus comments section is gone. <a href="http://donw.io/post/github-comments/">Good riddance</a>. It's been broken for years, ever since I migrated from <code>acg.github.io</code> to this domain. I probably made a mistake somewhere in the Disqus migration tool but never could figure it out. If you feel a burning desire to rebut or high-five something, <a href="https://twitter.com/alangrow">hit me up on twitter</a> and I may link to it. Better yet, <a href="https://github.com/acg/alangrow.com/issues/new">open a github issue against this blog</a>.</p>
<h1>Dream Deploys: Atomic, Zero-Downtime Deployments</h1>
<p>2015-06-05T21:11:00, Alan, http://alangrow.com/blog/dream-deploys-atomic-zero-downtime-deployments</p>
<p>Are you afraid to deploy? Do deployments always mean either downtime, leaving your site in an inconsistent state for a while, or both? It doesn't have to be this way!</p>
<p>Let's conquer our fear. Let's deploy whenever we damn well feel like it.</p>
<div class="image">
<img src="../images/blog/donnie-darko-not-afraid-anymore.jpg" width="100%"/>
</div>
<h3>You Don't Need Much</h3>
<p>This is a tiny demo to convince you that Dream Deploys are not only possible, they're easy.</p>
<p>To live the dream, you don't need much:</p>
<ul>
<li>You don't need a fancy load balancer.</li>
<li>You don't need magic "clustering" infrastructure.</li>
<li>You don't need a specific language or framework.</li>
<li>You don't need a queue system.</li>
<li>You don't need a message bus or fancy IPC.</li>
<li><em>You don't even need multiple instances of your server running.</em></li>
</ul>
<p>All you need is a couple old-school Unix tricks.</p>
<h2>A Quick Demo</h2>
<p>Don't take my word for it. Grab the code <a href="https://github.com/acg/dream-deploys">here</a> with:</p><div class="highlight"><pre>git clone git@github.com:acg/dream-deploys.git
<span class="nb">cd </span>dream-deploys
</pre></div>
<p>In a terminal, run this and visit the link:</p><div class="highlight"><pre>./serve
</pre></div>
<p>In a second terminal, deploy whenever you want:</p><div class="highlight"><pre>./deploy
</pre></div>
<p>Refresh the page to see it change.</p>
<p>Edit code, static files, or both under <code>./root.unused</code>. Then leave <code>./root.unused</code> and run <code>./deploy</code> to see your changes appear atomically and with zero downtime.</p>
<h2>Questions &amp; Answers</h2>
<h3>What do you mean by a "zero downtime" deployment?</h3>
<p>At no point is the site unavailable. Requests will continue to be served before, during, and after the deployment. In other words, this is about <strong>availability</strong>.</p>
<h3>What do you mean by an "atomic" deployment?</h3>
<p>For a given connection, either you will talk to the new code working against the new files, or you will talk to the old code working against the old files. You will never see a mix of old and new. In other words, this is about <strong>consistency</strong>.</p>
<h3>How does the zero downtime part work?</h3>
<p>This brings us to Unix trick #1. If you keep the same listen socket open throughout the deployment, clients won't get <code>ECONNREFUSED</code> under normal circumstances. The kernel places them in a listen backlog until our server gets around to calling <code>accept(2)</code>.</p>
<p>This means, however, that our server process can't be the thing to call <code>listen(2)</code> if we want to stop and start it, or we'll incur visible downtime. Something else – some long running process – must call <code>listen(2)</code> and keep the listen socket open across deployments.</p>
<p>The trick in a nutshell, then, is this:</p>
<ul>
<li>
<p>A <a href="https://github.com/acg/dream-deploys/blob/master/tcplisten">tiny, dedicated program</a> calls <code>listen(2)</code> and then passes the listen socket to child processes as descriptor 0 (stdin). This process replaces itself by executing a subordinate program.</p>
</li>
<li>
<p>The subordinate program is <a href="https://github.com/acg/dream-deploys/blob/master/loop-forever">just a loop</a> that repeatedly executes our server program. Because this loop program never exits, the listen socket on descriptor 0 stays open.</p>
</li>
<li>
<p>Our server program, instead of calling <code>bind(2)</code> and <code>listen(2)</code> like everyone <strong>loves to do</strong>, humbly calls <code>accept(2)</code> on stdin in a loop and handles one client connection at a time.</p>
</li>
<li>
<p>When it's time to restart the server process, we tell the server to exit after handling the current connection, if any. That way deployment doesn't disrupt any pending requests. We tell the server process to gracefully exit by sending it a <code>SIGHUP</code> signal.</p>
</li>
</ul>
<p><strong>Note</strong>: a shocking, saddening number of web frameworks force you to call <code>listen(2)</code> in your Big Ball Of App Code That Needs To Be Restarted. The <a href="https://github.com/strongloop-forks/connect/blob/7edb875a9f305e38f4d960fa46ac674038241892/lib/proto.js#L231">connect</a> HTTP server framework used by <a href="https://github.com/strongloop/express">express</a>, the most popular web app framework for <a href="https://nodejs.org/">Node.js</a>, is one of them.</p>
<p>"I'll just use the new <a href="https://lwn.net/Articles/542629/"><code>SO_REUSEPORT</code> socket option in Linux</a>!" you say.</p>
<p>Fine, but take care that at least one server process is always running at any given time. This means some handoff coordination between the old and new server processes. Alternately, you could run an unrelated process on the port that just listens.</p>
<p>At any rate, an <code>accept(2)</code>-based server is simpler. It also has some nice added benefits unrelated to deployments:</p>
<ul>
<li>
<p>An <code>accept(2)</code>-based server is network-agnostic. For instance, you can run it behind a Unix domain socket without modifying a single line of code.</p>
</li>
<li>
<p>An <code>accept(2)</code>-based server is a more secure factoring of concerns. If your server listens directly on a privileged port (80 or 443), you'll need root privileges or a fancy capabilities setup. After binding, a listen server should also drop root privileges (horrifyingly, some don't). The <code>accept(2)</code> factoring means a tiny, well-audited program can bind to the privileged port, drop privileges to a minimally empowered user account, and run a known program. This is a huge security win.</p>
</li>
</ul>
<h3>How does the atomic part work?</h3>
<p>A connection will either be served by the old server process or the new server process. The question is whether the old process might possibly see new files, or the new process might see old files. If we update files in-place then one of these inconsistencies can happen. This forces us to keep two complete copies of the files, an old copy and a new copy.</p>
<p>While we're updating the new files, no server process should use them. If the old server process is restarted during this phase, intentionally or accidentally, it should continue to work off the old files. When the new copy is finally ready, we want to "throw the switch": deactivate the old files and simultaneously activate the new files for future server processes. The trick is to make throwing the switch an atomic operation.</p>
<div class="image">
<img src="../images/blog/mad-scientist-with-switch.jpg" width="100%"/>
</div>
<p>There are a number of <a href="http://rcrowley.org/2010/01/06/things-unix-can-do-atomically.html">things Unix can do atomically</a>. Among them: use <code>rename(2)</code> to replace a symlink with another symlink. If the "switch" is simply a symlink pointing at one directory or the other, then deployments are atomic. This is Unix trick #2.</p>
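<p>A minimal sketch of throwing the switch from the shell. The directory names (<code>root.a</code>, <code>root.b</code>, <code>root</code>) are made up for this demo, and it assumes GNU coreutils, where <code>mv -T</code> treats the destination as a plain file so that the move is a single <code>rename(2)</code> over the old symlink. It is not necessarily how the demo's <code>./deploy</code> script does it.</p>

```shell
#!/bin/sh
set -e
# Sketch only: all names below are hypothetical.
work=$(mktemp -d)
cd "$work"
mkdir root.a root.b
echo old > root.a/version
echo new > root.b/version

ln -s root.a root            # the "switch" starts out pointing at the old copy
before=$(cat root/version)

ln -s root.b root.tmp        # build the replacement symlink under a temporary name
mv -T root.tmp root          # GNU mv: rename(2) the new symlink over the old one
after=$(cat root/version)

echo "$before -> $after"
```

<p>A plain <code>mv root.tmp root</code> would not do here: since <code>root</code> resolves to a directory, <code>mv</code> would move the new symlink <em>into</em> that directory instead of replacing the switch.</p>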
<h3>What about serving inconsistent assets? Browsers open multiple connections.</h3>
<p>This is a problem, but there's also a straightforward solution.</p>
<p>Let's clarify the problem first: during a deployment, a client may request a page from the old server, then open more connections that request assets from the new server. (Remember, consistency is only guaranteed within the same connection.) So you can get old page content mixed with new css, js, images, etc.</p>
<p>The prevailing solution is to build a new tagged set of static assets for every deployment, then have the page refer to all assets via this tag. You can do this by modifying the <a href="https://github.com/acg/dream-deploys/blob/master/deploy"><code>./deploy</code> script</a> like so:</p>
<ul>
<li>Update the new files.</li>
<li>Generate a unique tag <code>$TAG</code>. Epoch timestamps are usually good enough.</li>
<li>Record <code>$TAG</code> in a file inside the new file directory.</li>
<li>Copy all the static assets into a new directory <code>assets.$TAG</code> outside of both file copies.</li>
<li>Continue with the deployment.</li>
</ul>
<p>When the server starts up, it should read <code>$TAG</code> from the file, and make sure all asset URLs it generates contain <code>$TAG</code>.</p>
<p>That's pretty much it. If you keep the old <code>assets.$TAG</code> directories around for a while, even sessions that haven't reloaded the page will continue to get consistent results across deployments. Eventually you'll want to delete them.</p>
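<p>The tagging steps above might look roughly like this in shell. Every path in the sketch (<code>root.unused/static</code>, the <code>TAG</code> file, <code>site.css</code>) is an assumption made for illustration, not the layout of the actual demo repository.</p>

```shell
#!/bin/sh
set -e
# Sketch only: a throwaway directory stands in for the real deployment tree.
work=$(mktemp -d)
cd "$work"
mkdir -p root.unused/static
echo 'body { color: black; }' > root.unused/static/site.css

TAG=$(date +%s)                              # epoch timestamp as the unique tag
echo "$TAG" > root.unused/TAG                # record the tag inside the new file copy
mkdir "assets.$TAG"
cp -R root.unused/static/. "assets.$TAG/"    # tagged assets live outside both copies

# At startup the server would read the tag back and build asset URLs from it.
tag=$(cat root.unused/TAG)
asset_url="/assets.$tag/site.css"
echo "$asset_url"
```

<p>Old pages keep pointing at their own <code>assets.$TAG</code> directory, so they stay consistent for as long as that directory exists.</p>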
<p>The long term solution to this problem is <a href="https://http2.github.io/faq/#why-is-http2-multiplexed">HTTP/2 multiplexing</a>, which makes multiple browser connections unnecessary.</p>
<h3>What about serving inconsistent ajax requests?</h3>
<p>Let's clarify this problem: during a deployment, a client may request a page from the old server, then open more connections that make ajax requests of the new server using old client code.</p>
<p>There's a less technical solution to this one: simply make your API backwards compatible. This is a good idea regardless.</p>
<h3>What about concurrency? Your example only serves one connection at a time.</h3>
<p>You can run as many <code>accept(2)</code>-calling server processes as you want on the same listen socket. The kernel will efficiently multiplex connections to them.</p>
<p>In production, I use a small program I wrote called <code>forkpool</code> that keeps N concurrent child processes running. It doesn't do anything beyond this, which means it doesn't have any bugs at this point and never needs restarting. Remember, children are a precious resource, but without a parent to keep that listen socket open they're <em>orphans</em>.</p>
<h3>What about deployment collisions?</h3>
<p>Yes, you really should prevent concurrent deployments via a lock. That's not demonstrated here, but it's extremely easy and reliable to do with <a href="http://cr.yp.to/daemontools/setlock.html">the setlock(8) program from daemontools</a>.</p>
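<p><code>setlock</code> isn't installed everywhere, so the sketch below swaps in a different technique: a lock directory. It relies on <code>mkdir</code> either creating the directory or failing, so only one caller can win. The lock path is hypothetical, and a real deployment would use a fixed, well-known location rather than a temporary one.</p>

```shell
#!/bin/sh
# Sketch only: a mkdir-based lock, not setlock(8).
lockdir=$(mktemp -d)/deploy.lock    # hypothetical lock path for the demo

acquire() { mkdir "$lockdir" 2>/dev/null; }
release() { rmdir "$lockdir"; }

acquire && first=ok || first=busy     # first deployment takes the lock
acquire && second=ok || second=busy   # a concurrent deployment is refused
release                               # first deployment finishes
acquire && third=ok || third=busy     # the next deployment can proceed
release

echo "$first $second $third"
```

<p>Unlike <code>setlock</code>, a lock directory is not released automatically if the deploy script dies, so a real script would also want a <code>trap</code> that removes it on exit.</p>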
<h3>What about deploying database schema changes?</h3>
<p>This topic has been covered <a href="https://blog.rainforestqa.com/2014-06-27-zero-downtime-database-migrations/">well elsewhere</a>.</p>
<h1>Turn Vim Into Excel: Tips for Editing Tabular Data</h1>
<p>2013-03-29T00:00:00, Alan, http://alangrow.com/blog/turn-vim-into-excel-tips-for-tabular-data-editing</p>
<div class="center image">
<a href="../images/blog/vim-as-spreadsheet.png"><img src="../images/blog/vim-as-spreadsheet-thumbnail.png" /></a><br/>
<small>Vim editing <a href="http://www.census.gov/econ/cbp/download/">US census data on 2010 county business patterns</a></small>
</div>
<p>I tried to edit data in spreadsheet programs, I really did.</p>
<p>But it's a fact: Vim ruins you for life. Power corrupts.</p>
<p>Of course, Vim can edit tabular data too, although there are a few things that will make it more pleasant. For this discussion I'm assuming you're editing files in tab-separated value format (TSV).</p>
<p><em>"But what about CSV files?"</em> <a href="http://en.wikipedia.org/wiki/Comma-separated_values#Lack_of_a_standard">Just</a>. <a href="http://www.catb.org/esr/writings/taoup/html/ch05s02.html">Don't</a>.</p>
<p><strong>Do</strong>: convert your CSV to TSV and back for editing.</p>
<h2>A Note on the TSV Format</h2>
<p>To really do TSV right, you should escape newline and tab characters in data. Here are two scripts, <a href="https://gist.github.com/acg/5312217">csv2tsv</a> and <a href="https://gist.github.com/acg/5312238">tsv2csv</a>, that will handle escaping during CSV &lt;-&gt; TSV conversions.</p>
<p>Converting CSV to TSV, with C-style escaping:</p>
<pre><code>csv2tsv -e &lt; file.csv &gt; file.tsv
</code></pre>
<p>Converting TSV back to CSV, with C-style un-escaping:</p>
<pre><code>tsv2csv -e &lt; file.tsv &gt; file.csv
</code></pre>
<h2>Setting up Tabular Editing in Vim</h2>
<p>Open the file:</p>
<pre><code>:e file.tsv
</code></pre>
<p>Excel numbers the rows, why can't we?</p>
<pre><code>:set number
</code></pre>
<p>Adjust your tab settings so you're editing with hard tabs:</p>
<pre><code>:setlocal noexpandtab
</code></pre>
<p>Now, widen the columns enough so they're aligned:</p>
<pre><code>:setlocal shiftwidth=20
:setlocal softtabstop=20
:setlocal tabstop=20
</code></pre>
<p>Fiddle with that number 20 as needed. As far as I can tell, Vim doesn't support variable tab stops. It would be real nifty if I was wrong about this. It would be even niftier if column width detection / tabstop setting could be automated.</p>
<h2>Tall Spreadsheets: Always-Visible Column Names Above</h2>
<p>Typically, the first line of the tsv file is a header containing the column names. We want those column names to always be visible, no matter how far down in the file we scroll. The way we'll do this is by splitting the current window in two. The top window will only be 1 line high and will show the headers. The bottom window will be for data editing.</p>
<pre><code>:sp
:0
1 CTRL-W _
CTRL-W j
</code></pre>
<p>At this point you should have two windows, one above the other, with the top window showing only the header row. If you don't have very many columns, then you're done.</p>
<h2>Wide Spreadsheets: Horizontal Scrolling</h2>
<p>If you do have lots of columns, or very wide columns, you're probably noticing how confusing it looks when lines wrap. Your columns don't line up so well anymore. So turn off wrapping for both windows:</p>
<pre><code>:set nowrap
CTRL-W k
:set nowrap
CTRL-W j
</code></pre>
<p>One problem remains: when you scroll right to edit columns in the data pane, the header pane doesn't scroll to the right with it. Once again, your columns aren't aligned.</p>
<p>Fortunately Vim has a solution: you can "bind" horizontal scrolling of the two windows. This forces them to scroll left and right in tandem.</p>
<pre><code>:set scrollopt=hor
:set scrollbind
CTRL-W k
:set scrollbind
CTRL-W j
</code></pre>
<h2>But What About Formulas and Calculations?!</h2>
<p>It's true, Excel does far more than just edit tabular data. Vim is just ("just") an editor.</p>
<p>However, if you're using Vim, chances are you're a competent programmer. Chances are you can write programs to manipulate tabular data. So how about this arrangement:</p>
<ol>
<li>A tsv that contains formulas, calculations, and other potentially interpreted data.</li>
<li>A program that will process that tsv and "render" a tsv with calculated data.</li>
<li>The ability to quickly switch between these tsvs.</li>
</ol>
<p>I haven't put this to the test, just throwing out ideas.</p>
<h1>How to printf a length-delimited string</h1>
<p>2012-11-15T00:00:00, Alan, http://alangrow.com/blog/printf-length-delimited-string</p>
<p>Sometimes you're dealing with a string that isn't null-delimited but rather length-delimited, and you wind up doing somersaults just to print it out:</p><div class="highlight"><pre><span class="kt">void</span> <span class="nf">logit</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">string</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">length</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">255</span><span class="p">];</span>
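<span class="cm">/* Copy at most length bytes: string has no terminating NUL to stop at. */</span>
<span class="kt">size_t</span> <span class="n">n</span> <span class="o">=</span> <span class="n">length</span> <span class="o">&lt;</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">buf</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span> <span class="o">?</span> <span class="n">length</span> <span class="o">:</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">buf</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="n">string</span><span class="p">,</span> <span class="n">n</span><span class="p">);</span>
<span class="n">buf</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="o">=</span> <span class="sc">&#39;\0&#39;</span><span class="p">;</span>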
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&quot;debug: %s</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span> <span class="n">buf</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>The extra copying isn't necessary, and you don't have to live with the potential length-truncation either. Did you know <code>printf(3)</code> can format length-delimited strings directly? Buried in the man page is this little gem:</p>
<pre><code>The precision
An optional precision, in the form of a period ('.') followed by an optional decimal digit string. Instead of a decimal digit string one may write "*" or "*m$" (for some decimal integer m) to specify that the precision is given in the next argument, or in the m-th argument, respectively, which must be of type int. This gives ... the maximum number of characters to be printed from a string for s and S conversions.
</code></pre>
<p>With that in mind, we can just write:</p><div class="highlight"><pre><span class="kt">void</span> <span class="nf">logit</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">string</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">length</span><span class="p">)</span> <span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&quot;debug: %.*s</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="n">length</span><span class="p">,</span> <span class="n">string</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<h1>Really, Actiontec?</h1>
<p>2012-10-29T00:00:00, Alan, http://alangrow.com/blog/really-actiontec</p>
<p>From a Verizon-branded Actiontec DSL router. Look for <code>adminPassword</code> in the javascript below...</p>
<pre><code>$ printf "GET / HTTP/1.1\r\n\r\n" | nc 192.168.1.1 80
HTTP/1.1 200 Ok
Server: micro_httpd
Cache-Control: no-cache
Date: Mon, 29 Oct 2012 17:50:28 GMT
Content-Type: text/html
Connection: close
&lt;!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"&gt;
&lt;html&gt;
&lt;head&gt;
&lt;meta http-equiv="Content-Type" content="text/html; charset=utf-8" /&gt;
&lt;title&gt;Actiontec&lt;/title&gt;
&lt;script language="JavaScript" src="js/nav.js"&gt;&lt;/script&gt;
&lt;script language="Javascript"&gt;
var adminPassword = "abc123";
function do_load(){
if(adminPassword == "abc123")
window.top.location.href='login.html';
else
window.top.location.href='index_real.html';
}
&lt;/script&gt;
&lt;/head&gt;
&lt;body onload="do_load()"&gt;
&lt;form name="myform"&gt;
&lt;/form&gt;
&lt;/body&gt;
&lt;/html&gt;
</code></pre>
<h1>Recovering a Dying iPod Disk</h1>
<p>2012-04-03T00:00:00, Alan, http://alangrow.com/blog/recovering-a-dying-ipod-disk</p>
<p>An <a href="http://en.wikipedia.org/wiki/IPod_Classic#Sixth_generation">80GB iPod Classic</a> filled with 4 years of music started to die on us. The symptom: the menu screen suddenly showed "No Music," but disk usage was still nearly 100%. I figured this meant the internal 1.8" hard disk had started to go south and had taken some critical sectors with it.</p>
<p>That turned out to be the case. But here's how we recovered nearly 10,000 files from the iPod anyway...</p>
<h3>The Winning Ticket</h3>
<p>Before things got any worse, I decided to grab an image of the entire disk:</p><div class="highlight"><pre>sudo dd <span class="k">if</span><span class="o">=</span>/dev/sdc <span class="nv">bs</span><span class="o">=</span>1M <span class="nv">conv</span><span class="o">=</span>noerror,sync | pv &gt; ipod.img
</pre></div>
<p>The "conv=noerror" directive tells dd to keep on going if there are disk read errors instead of erroring out. (There were about a dozen. Sectors had probably been going bad for some time, and finally a critical one bit the dust.)</p>
<p>The "conv=sync" directive tells dd to write out an appropriately sized block of zeroes whenever there's an error reading a block. This is necessary, or file offsets will be wrong from the point of the error onward.</p>
<p>The pv command just shows some nice info about how much data is flowing through and how long it's taken. It's not essential here.</p>
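<p>To see what <code>conv=sync</code> does without a failing disk, here is a small sketch in which a short read stands in for a bad block. The 8-byte block size and the 5-byte input are arbitrary choices for the demo.</p>

```shell
#!/bin/sh
# A 5-byte input read with an 8-byte block size yields one short block.
# With conv=sync, dd pads that block with NUL bytes up to the full 8 bytes,
# so everything after it stays at the offset it would have had.
plain=$(printf 'hello' | dd bs=8 2>/dev/null | wc -c | tr -d ' ')
padded=$(printf 'hello' | dd bs=8 conv=sync 2>/dev/null | wc -c | tr -d ' ')
echo "without sync: $plain bytes, with sync: $padded bytes"
```

<p>In the real run the padding is what keeps every later sector of the image at its original offset, even when a read fails.</p>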
<p>As described <a href="#deadends_and_other_things_we_tried">below</a>, I tried to fsck.vfat the first partition of the disk image, but this reported that an unusually high number of free cluster chains would be reclaimed. This indicated that FAT32 metadata had been damaged and that walking the complete filesystem directory structure wouldn't be possible anymore.</p>
<p>The new approach was to say, to hell with directory structure, let's just linearly scan the disk image for files and extract them. This needles-in-the-haystack approach isn't for everybody: you will lose filenames, permissions, directory locality etc. But most mp3s have self-identifying id3 tag metadata so we didn't care too much.</p>
<p>There are a couple programs that can find file needles in a disk image haystack. The one that worked was <a href="http://www.cgsecurity.org/wiki/PhotoRec">PhotoRec</a>, which can actually find much more than just photo files. For an opensource unix program it has a rather strange set of options and user interface. Anyway, I ran it with:</p><div class="highlight"><pre>photorec /log /debug /d rescue ipod.img
</pre></div>
<p>All in all photorec recovered over 8,000 mp3s and some other files to boot.</p>
<pre><code>Pass 1 - Reading sector 135045680/155907592, 9944 files found
Elapsed time 1h14m22s - Estimated time for achievement 0h11m29
mp3: 8339 recovered
mov: 1264 recovered
txt: 129 recovered
apple: 96 recovered
tx?: 63 recovered
jpg: 21 recovered
aif: 13 recovered
riff: 12 recovered
mpg: 3 recovered
gpg: 1 recovered
others: 3 recovered
</code></pre>
<p>Afterwards, the files were scattered randomly in flat directories named rescue.1, rescue.2, rescue.3 etc:</p><div class="highlight"><pre>ls rescue.1 | grep mp3 | head
</pre></div>
<pre><code>f0234384.mp3
f0241008.mp3
f0247536.mp3
f0254352.mp3
f0257680.mp3
f0263664.mp3
f0271120.mp3
f0277872.mp3
f0284784.mp3
f0292176.mp3
</code></pre>
<p>If desired, they can be renamed into Artist + Album + Track + Title directories via a program like <a href="http://search.cpan.org/~acg/supertag-0.2.1/supertag">supertag</a> (disclaimer: I'm the author). But I'm not sure iTunes even cares about filenames.</p>
<p>Addendum: as time has gone on, we've noticed that a fair percentage of the songs were truncated by photorec, something like 1 in 5. One of these rainy weekends I'm going to see if I can patch photorec's mp3 recognition.</p>
<h3>Dead-Ends and Other Things We Tried</h3>
<p>The filesystem was W95 FAT32 but couldn't be mounted due to the bad sectors. Doing an fsck on the block device was also not possible because of read errors. The errors manifested themselves like this in dmesg:</p>
<pre><code>[64658.941382] sd 6:0:0:0: [sdc] Unhandled sense code
[64658.941395] sd 6:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[64658.941407] sd 6:0:0:0: [sdc] Sense Key : Medium Error [current]
[64658.941422] Info fld=0x0
[64658.941428] sd 6:0:0:0: [sdc] Add. Sense: Unrecovered read error
[64658.941442] sd 6:0:0:0: [sdc] CDB: Read(10): 28 00 00 00 00 40 00 00 01 00
[64658.941470] end_request: I/O error, dev sdc, sector 512
[64658.941484] Buffer I/O error on device sdc, logical block 64
</code></pre>
<p>After capturing the disk image, it was possible to run fsck.vfat directly on the partition file; it doesn't actually require a block device, which is cool.</p>
<p>To run fsck on the disk image file, we needed to extract the lone FAT32 partition into a file by itself. The trick here was figuring out where the partition started. Doing an fdisk on the actual block device for the iPod (/dev/sdc) to figure out the disk geometry helped. Using that geometry, this command let us figure out the sector offset of the first partition:</p><div class="highlight"><pre>fdisk -u -C 14991 -b 4096 -l ipod.img
</pre></div>
<pre><code>   Device Boot      Start         End      Blocks   Id  System
ipod.img1              63    19488469    77953628    b  W95 FAT32
</code></pre>
<p>A trick to extract the partition image:</p><div class="highlight"><pre><span class="o">{</span> dd <span class="nv">bs</span><span class="o">=</span>4096 <span class="nv">skip</span><span class="o">=</span>63 <span class="nv">count</span><span class="o">=</span>0 ; pv ; <span class="o">}</span> &lt; ipod.img &gt; ipod.img.part1
</pre></div>
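<p>The same extraction can be sketched in Python. This is only an illustration, assuming the partition starts at sector 63 and that the 4096-byte sector size passed to fdisk above is right; the function name and the 1 MiB chunk size are made up for the example.</p>

```python
import shutil

SECTOR_SIZE = 4096   # matches the -b 4096 passed to fdisk above
START_SECTOR = 63    # "Start" column for ipod.img1

def extract_partition(image_path, out_path, start_sector=START_SECTOR,
                      sector_size=SECTOR_SIZE):
    """Copy everything from the partition's first byte to the end of the image."""
    offset = start_sector * sector_size  # 63 * 4096 = 258048 bytes
    with open(image_path, "rb") as src, open(out_path, "wb") as dst:
        src.seek(offset)                           # same job as dd's skip=63
        shutil.copyfileobj(src, dst, 1024 * 1024)  # copy the rest in 1 MiB chunks
    return offset
```

<p>Calling <code>extract_partition("ipod.img", "ipod.img.part1")</code> would write the same bytes as the dd/pv pipeline, just without the progress meter.</p>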
<p>This took a while. Disks are slow.</p>
<p>Then I ran fsck.vfat on the partition image:</p><div class="highlight"><pre>fsck.vfat -v -n ipod.img.part1
</pre></div>
<pre><code>...
Checking for unused clusters.
Reclaimed 3561014 unused clusters (58343653376 bytes).
...
</code></pre>
<p>As you can see, it thought most of the disk consisted of free clusters -- this is bad. If I had tried to repair the disk via fsck, only a small fraction of the files would have been recovered.</p>
<p>You can see which file paths were traversed with the -l switch:</p><div class="highlight"><pre>fsck.vfat -v -n -l ipod.img.part1
</pre></div>
<p>In our case this helped me verify that only a small number of files were actually going to be recovered by the fsck.</p>
<p>Once I gave up on fsck and embarked on needle-in-haystack file extraction, I tried <a href="http://www.itu.dk/~jobr/magicrescue/">magicrescue</a>. It found mp3s but kept saying "invalid mp3 file" and extracted almost none of them. It was also really slow -- it shells out to perl scripts and mpg123 to test mp3 validity. Yuck.</p>http://alangrow.com/blog/how-many-consonant-pairsHow Many Consonant Pairs Do We Actually Use?2012-02-26T00:00:00Alanalangrow+maelstrom@gmail.com<p>Of all possible pairs of consonants you could start a word with, how many are actually valid in the English language?</p>
<p>The question came up at a party during a disappointing Ouija board session where the spirits conjured gibberish like "QHPEV." Someone wondered aloud how difficult it was to pick a valid pair of consonants at random. Instinctively, we felt that most of them were invalid.</p>
<p>This is a nice little problem for the unix text processing toolset. I used the <a href="http://www.isc.ro/lists/twl06.zip">2006 Scrabble Tournament Word List</a> because /usr/share/dict/words contains many proper names and non-words. To get the count:</p><div class="highlight"><pre>tr <span class="s1">&#39;[A-Z]&#39;</span> <span class="s1">&#39;[a-z]&#39;</span> &lt; TWL06.txt |
sed -nEe <span class="s1">&#39;s/^([a-z]{2}).*$/\1/p&#39;</span> |
grep -v <span class="s1">&#39;[aeiouy]&#39;</span> |
sort -u |
wc -l
82
</pre></div>
<p>There are 20 consonant letters in the alphabet after removing "aeiouy", so that makes 20 × 20 = 400 possible pairs of consonants.</p>
<p>So only 20.5% of all consonant pairs are valid beginnings for an English word.</p>
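<p>For comparison, here is a rough Python equivalent of the counting pipeline and the percentage. It's a sketch, not what I actually ran: the function names are invented, and it takes any iterable of words rather than reading TWL06.txt.</p>

```python
import re

VOWELS = set("aeiouy")

def consonant_pairs(words):
    """Return the set of two-letter consonant prefixes found in `words`."""
    pairs = set()
    for word in words:
        prefix = word.strip().lower()[:2]
        # Same test as the sed + grep stages: two letters, neither one a vowel.
        if re.fullmatch(r"[a-z]{2}", prefix) and not (set(prefix) & VOWELS):
            pairs.add(prefix)
    return pairs

def fraction_valid(num_pairs, num_consonants=20):
    """Fraction of all possible consonant pairs that begin at least one word."""
    return num_pairs / (num_consonants * num_consonants)

print(fraction_valid(82))  # 0.205
```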
<p>To see the 82 valid pairs:</p><div class="highlight"><pre>tr <span class="s1">&#39;[A-Z]&#39;</span> <span class="s1">&#39;[a-z]&#39;</span> &lt; TWL06.txt |
sed -nEe <span class="s1">&#39;s/^([a-z]{2}).*$/\1/p&#39;</span> |
grep -v <span class="s1">&#39;[aeiouy]&#39;</span> |
sort -u |
tr <span class="s1">&#39;\n&#39;</span> <span class="s1">&#39; &#39;</span>
</pre></div>
<pre><code>bd bh bl br bw ch cl cn cr ct cw cz
dh dj dr dw fj fl fr gh gj gl gn gr
gw hm hr hw jn kb kh kl kn kr kv kw
ll lw mb mh mm mn mr ng nt pf ph pl
pn pr ps pt qw rh sc sf sg sh sj sk
sl sm sn sp sq sr st sv sw tc th tm
tr ts tw tz vr wh wr zl zw zz
</code></pre>
<p>To see an example word for each valid pair (remember, this is the Scrabble dictionary, so there's some pretty weird stuff in there):</p><div class="highlight"><pre>tr <span class="s1">&#39;[A-Z]&#39;</span> <span class="s1">&#39;[a-z]&#39;</span> &lt; TWL06.txt |
tr -d <span class="s1">&#39;\r&#39;</span> |
sed -nEe <span class="s1">&#39;s/^([a-z]{2})(.*)$/\1\2 \1/p&#39;</span> |
grep <span class="s1">&#39; [^aeiouy][^aeiouy]&#39;</span> |
sort |
uniq -f1 |
awk <span class="s1">&#39;{ print $2, $1 }&#39;</span>
</pre></div>
<pre><code>bd bdellium
bh bhakta
bl blabbed
br brabble
bw bwana
ch chabazite
cl clabber
cn cnida
cr craal
ct ctenidia
cw cwm
cz czar
dh dhak
dj djebel
dr drabbed
dw dwarf
fj fjeld
fl flabbergasted
fr frabjous
gh gharial
gj gjetost
gl glabellae
gn gnar
gr graal
gw gweduc
hm hm
hr hryvna
hw hwan
jn jnana
kb kbar
kh khaddar
kl klatches
kn knacked
kr kraaled
kv kvases
kw kwacha
ll llama
lw lwei
mb mbaqanga
mh mho
mm mm
mn mnemonically
mr mridangam
ng ngultrum
nt nth
pf pfennige
ph phaeton
pl placabilities
pn pneuma
pr praam
ps psalmbook
pt ptarmigan
qw qwerty
rh rhabdocoele
sc scabbarded
sf sferics
sg sgraffiti
sh shabbatot
sj sjamboked
sk skag
sl slabbed
sm smacked
sn snacked
sp spaceband
sq squabbier
sr sraddha
st stabbed
sv svarajes
sw swabbed
tc tchotchkes
th thacked
tm tmeses
tr trabeated
ts tsaddikim
tw twaddled
tz tzaddikim
vr vroomed
wh whacked
wr wracked
zl zlote
zw zwiebacks
zz zzz
</code></pre>
<p>Aside: finding good and freely available (i.e. open source or Creative Commons) word lists is surprisingly annoying.</p>http://alangrow.com/blog/mutt-tip-attach-multiple-filesMutt Tip: Attach Multiple Files2011-11-25T00:00:00Alanalangrow+maelstrom@gmail.com<p>You can attach multiple files in <a href="http://www.mutt.org/">mutt</a>'s file browser, if they're in the same directory: just use 't' to tag them, then ';'-Enter. You can also view files from the file browser before attaching them: just hit Space. Ten years of mutt and I'm still discovering this stuff...</p>http://alangrow.com/blog/patching-is-a-normal-activityPatching is a Normal Activity2011-11-23T00:00:00Alanalangrow+maelstrom@gmail.com<p>This morning, while adding some songs to my iPod with <a href="http://www.gnu.org/s/gnupod/">gnupod</a>, I hit a <a href="http://savannah.gnu.org/bugs/index.php?34886#discussion">bug</a>.</p>
<p>The error message was descriptive enough that it took all of 5 minutes to patch and be on my way. Gnupod is not great code, but it's Perl, and unlike Rhythmbox -- which has a problem where it syncs partial audio files, and is a big fat gui program written in a compiled language -- I actually have a good chance of making it do what I want.</p>
<p>The two things I found interesting about this:</p>
<ol>
<li>I prefer the crappier but hackable code.</li>
<li>Patching something seems like a perfectly normal activity in the context of "I want to listen to some music."</li>
</ol>http://alangrow.com/blog/python-split-is-inconsistentInconsistent split Behavior in Python2011-11-05T00:00:00Alanalangrow+maelstrom@gmail.com<p>Here's a futile but cathartic <a href="http://bugs.python.org/issue13346">bug report</a> I filed against Python recently.</p>
<p>In Python, string.split and re.split both take an optional argument that limits the number of splits that are done. This is unlike Perl's split builtin, which limits the number of <em>pieces</em>. But it makes sense I guess, and consistency between the two languages is not something I'd necessarily expect.</p>
<p>However, consistency <em>within</em> a language...a reasonable expectation, no?</p>
<p>The inconsistency lies in how string.split and re.split handle the edge cases of "do an unlimited number of splits" and "don't do any splits." The two agree that "unlimited splits" is the default. They don't agree on how to interpret the value of an explicit maxsplit parameter.</p>
<table class="matrix">
<thead>
<tr>
<td class="col-header row-header"></td>
<td class="col-header">maxsplit=0</td>
<td class="col-header">maxsplit=-1</td>
</tr>
</thead>
<tr>
<td class="row-header">string.split</td>
<td>no splits</td>
<td>unlimited splits</td>
</tr>
<tr>
<td class="row-header">re.split</td>
<td>unlimited splits</td>
<td>no splits</td>
</tr>
</table>
<p>I think string.split is doing the sensible thing here.</p>
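<p>The table can be reproduced with a few lines. This is a sketch of the behavior as I understand it on CPython; <code>string.split</code> is spelled <code>str.split</code> in Python 3, and the sample string is arbitrary.</p>

```python
import re

text = "a b c"

# str.split: 0 means "no splits", -1 means "unlimited splits"
print(text.split(" ", 0))    # ['a b c']
print(text.split(" ", -1))   # ['a', 'b', 'c']

# re.split: 0 means "unlimited splits", a negative value means "no splits"
print(re.split(" ", text, maxsplit=0))    # ['a', 'b', 'c']
print(re.split(" ", text, maxsplit=-1))   # ['a b c']
```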
<p>Of course, the "bug" has zero chance of being fixed at this point. I pretty much just filed it to create a search result for others similarly bitten, annoyed, or both.</p>http://alangrow.com/blog/postgresql-tip-bulk-copying-data-between-tablesPostgreSQL Tip: Bulk Copying Data Between Tables2011-06-17T00:00:00Alanalangrow+maelstrom@gmail.com<p>Suppose you have two different PostgreSQL databases, db1 and db2. You want to populate db2.table2 with data from db1.table1. How?</p>
<p>Try this:</p><div class="highlight"><pre>psql -c <span class="s1">&#39;COPY table1 TO STDOUT&#39;</span> db1 | <span class="se">\</span>
psql -c <span class="s1">&#39;COPY table2 FROM STDIN&#39;</span> db2
</pre></div>
<p>Is there a more efficient way to do this if the two databases are hosted by the same server instance? Probably.</p>
<p>Then again, if the databases are on different servers, this works:</p><div class="highlight"><pre>psql -c <span class="s1">&#39;COPY table1 TO STDOUT&#39;</span> db1 | gzip -c | <span class="se">\</span>
ssh host2 <span class="s2">&quot;gunzip -c | psql -c &#39;COPY table2 FROM STDIN&#39; db2&quot;</span>
</pre></div>
<p>Bonus: with <a href="http://www.ivarch.com/programs/pv.shtml">pv(1)</a>, you can see how quickly the data is flowing:</p><div class="highlight"><pre>psql -c <span class="s1">&#39;COPY table1 TO STDOUT&#39;</span> db1 | pv | <span class="se">\</span>
psql -c <span class="s1">&#39;COPY table2 FROM STDIN&#39;</span> db2
</pre></div>
http://alangrow.com/blog/measuring-the-measurersMeasuring the Measurers2011-06-10T00:00:00Alanalangrow+maelstrom@gmail.com<p>"Projects A and B are your top priority now. Oh, and Project C can't be impacted."</p>
<p>Sound familiar?</p>
<p>It's a common complaint of the project-managed: not everything can be top priority. Something has to give. Resources allocated to Project A must be deallocated from elsewhere, either Project C, or some other project. Declaring everything "top priority" is not helpful.</p>
<p>If project management accomplishes one thing, it should help each of us answer the question, "What should I work on next?"</p>
<p>A friend of mine relates a story about a meeting between tech and client services. The tech team came prepared with a list of development tasks in loose priority order. As the meeting progressed, the client services team found more and more reasons to disagree with the priorities.</p>
<p>Eventually, in frustration, the tech lead said, "Here's the list. You order it."</p>
<p>The client services lead was taken aback and refused: "It all has to be done. As soon as possible."</p>
<p>Not helpful.</p>
<p>While I do think there are better ways of scheduling work than imposing a single ordering -- which breaks down when multiple workers are able to proceed in parallel -- I also think the ability to see and maintain consistent priorities is an important thing to look for in a project manager. Or any manager, really.</p>
<p>Which is why I propose the following fun experiment. Present a manager with two randomly sampled work items from their team, side by side, and ask which is higher priority. Repeat until you've got a decent number of comparisons. Remember xkcd's <a href="http://thefunniest.info/">project to find the funniest image in the world</a>? Yeah. It's kinda like that.</p>
<p>Now that we've turned a human being into a comparison operator ;) we can ask how good that operator is. Does it define an ordering? For any reasonable sample size, probably not.</p>
<p>Forget about <a href="http://en.wikipedia.org/wiki/Sorting_algorithm#Stability">stable sort</a>. Viewed as a <a href="http://en.wikipedia.org/wiki/Directed_graph">directed graph</a>, the comparisons will probably contain cycles, like A &gt; B &gt; C &gt; A. In general, you can induce an acyclic digraph from a cyclic digraph by identifying the <a href="http://en.wikipedia.org/wiki/Strongly_connected_component">strongly connected components</a> and collapsing each one into a single node. So one metric would be the ratio of the number of nodes in the induced acyclic graph to the number in the original graph (<code>1/|V|</code> is the worst, <code>|V|/|V|=1.0</code> is the best). Another metric would be the height of the induced acyclic graph over the number of nodes (work items). A perfect comparison operator would produce a line of nodes in a well-defined order, and would score 1.0.</p>
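<p>Both metrics are easy to sketch in Python. The version below is an illustration under my own assumptions: an edge <code>(a, b)</code> means "a was ranked above b", every compared item appears in <code>nodes</code>, "height" counts the nodes on the longest path, and the function names are invented. It uses Kosaraju's algorithm for the components, recursively, so it's only meant for small samples.</p>

```python
from collections import defaultdict

def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm; returns a dict mapping node -> component id."""
    graph, reverse = defaultdict(list), defaultdict(list)
    for a, b in edges:          # edge (a, b) means "a outranks b"
        graph[a].append(b)
        reverse[b].append(a)

    order, seen = [], set()
    def visit(n):               # first pass: record finish order
        seen.add(n)
        for m in graph[n]:
            if m not in seen:
                visit(m)
        order.append(n)
    for n in nodes:
        if n not in seen:
            visit(n)

    component = {}
    def assign(n, cid):         # second pass, on the reversed graph
        component[n] = cid
        for m in reverse[n]:
            if m not in component:
                assign(m, cid)
    for n in reversed(order):
        if n not in component:
            assign(n, n)
    return component

def consistency_scores(nodes, edges):
    """Return (size ratio, height ratio) of the condensed acyclic graph."""
    component = strongly_connected_components(nodes, edges)
    dag = defaultdict(set)
    for a, b in edges:
        if component[a] != component[b]:
            dag[component[a]].add(component[b])
    ids = set(component.values())

    memo = {}
    def height(c):              # number of nodes on the longest path from c
        if c not in memo:
            memo[c] = 1 + max((height(d) for d in dag[c]), default=0)
        return memo[c]

    tallest = max(height(c) for c in ids)
    return len(ids) / len(nodes), tallest / len(nodes)
```

<p>A manager who answers A &gt; B, B &gt; C, A &gt; C scores <code>(1.0, 1.0)</code>. One who answers A &gt; B, B &gt; C, C &gt; A, C &gt; D collapses three items into one component and scores <code>(0.5, 0.5)</code>.</p>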
<p>Another thing to measure would be the consistency of the ordering over time. Yes, priorities change, but resource re-allocation also has a cost.</p>
<p>Measuring the measurers seems like a good thing for a number of reasons. Among them, that it exposes the often subtle problems of <em>conflicting directives</em> and the even subtler problems of <em>competing directives</em>. Too often, only the people carrying out the directives are aware of them.</p>http://alangrow.com/blog/put-everything-in-vi-modePut *Everything* in vi Mode2011-05-17T00:00:00Alanalangrow+maelstrom@gmail.com<p>It's the little things in life. Especially when they add up. Consider, for instance, the calculus of a productivity tweak you should have made half a decade ago.</p>
<p>If you're a vi user like me, try adding these two lines to your <code>~/.inputrc</code> file:</p>
<pre><code>set keymap vi
set editing-mode vi
</code></pre>
<p>Now, every program that uses the readline library for tty input ( <code>perl -d</code>, the <code>python</code> REPL, <code>psql</code>, <code>gdb</code>, anything you run under <code>rlwrap</code>, etc.) has vi key bindings instead of the default emacs bindings.</p>
<p>In short, this means things like:</p>
<ul>
<li><code>0</code> and <code>$</code> for beginning and end of line</li>
<li><code>k</code> and <code>j</code> for navigating history backwards and forwards</li>
<li><code>b</code> and <code>e</code> for skipping words</li>
<li><code>u</code> for undo</li>
</ul>
<p>The full list is in the <a href="http://www.freebsd.org/cgi/man.cgi?query=readline&amp;apropos=0&amp;sektion=0&amp;manpath=FreeBSD+8.2-RELEASE&amp;format=html#DEFAULT_KEY_BINDINGS">readline man page</a>.</p>
<p>I've been using this for years with bash, where one can do <code>set -o vi</code>. Are full vi bindings a recent feature of readline? Or do I really have no excuse for this one?</p>http://alangrow.com/blog/how-i-lost-100-and-blamed-calHow I Lost $100 and Blamed It On cal(1)2011-03-22T00:00:00Alanalangrow+maelstrom@gmail.com<p>True story. Back in September 2008, I decided that this year, I would <strong>not</strong> wait until the last minute to book my Thanksgiving flight home.</p>
<p>What's the rule for Thanksgiving again? Oh right, fourth Thursday in November. So I busted out <a href="http://www.freebsd.org/cgi/man.cgi?query=cal&amp;apropos=0&amp;sektion=0&amp;manpath=FreeBSD+8.2-RELEASE&amp;format=html">cal(1)</a>:</p>
<pre><code>$ cal
   September 2008
Su Mo Tu We Th Fr Sa
    1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30
</code></pre>
<p>Whoops, it only shows the current month. So I passed it the year:</p>
<pre><code>$ cal 08
                               8
      January               February               March
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
 1  2  3  4  5  6  7            1  2  3  4               1  2  3
 8  9 10 11 12 13 14   5  6  7  8  9 10 11   4  5  6  7  8  9 10
15 16 17 18 19 20 21  12 13 14 15 16 17 18  11 12 13 14 15 16 17
22 23 24 25 26 27 28  19 20 21 22 23 24 25  18 19 20 21 22 23 24
29 30 31              26 27 28 29           25 26 27 28 29 30 31
       April                  May                   June
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
 1  2  3  4  5  6  7         1  2  3  4  5                  1  2
 8  9 10 11 12 13 14   6  7  8  9 10 11 12   3  4  5  6  7  8  9
15 16 17 18 19 20 21  13 14 15 16 17 18 19  10 11 12 13 14 15 16
22 23 24 25 26 27 28  20 21 22 23 24 25 26  17 18 19 20 21 22 23
29 30                 27 28 29 30 31        24 25 26 27 28 29 30
        July                 August              September
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
 1  2  3  4  5  6  7            1  2  3  4                     1
 8  9 10 11 12 13 14   5  6  7  8  9 10 11   2  3  4  5  6  7  8
15 16 17 18 19 20 21  12 13 14 15 16 17 18   9 10 11 12 13 14 15
22 23 24 25 26 27 28  19 20 21 22 23 24 25  16 17 18 19 20 21 22
29 30 31              26 27 28 29 30 31     23 24 25 26 27 28 29
                                            30
      October               November              December
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
    1  2  3  4  5  6               1  2  3                     1
 7  8  9 10 11 12 13   4  5  6  7  8  9 10   2  3  4  5  6  7  8
14 15 16 17 18 19 20  11 12 13 14 15 16 17   9 10 11 12 13 14 15
21 22 23 24 25 26 27  18 19 20 21 22 23 24  16 17 18 19 20 21 22
28 29 30 31           25 26 27 28 29 30     23 24 25 26 27 28 29
                                            30 31
</code></pre>
<p>I booked my flight for Tuesday, November 20th, and forgot about it.</p>
<p>The day approached. I called home just to make sure someone could pick me up from the airport. That's when I discovered that Thanksgiving was actually the following week. <strong>I had booked my flight based on the calendar for the year 8 A.D.</strong></p>
<p>What I should have done was this:</p>
<pre><code>$ cal 2008
                              2008
      January               February               March
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
       1  2  3  4  5                  1  2                     1
 6  7  8  9 10 11 12   3  4  5  6  7  8  9   2  3  4  5  6  7  8
13 14 15 16 17 18 19  10 11 12 13 14 15 16   9 10 11 12 13 14 15
20 21 22 23 24 25 26  17 18 19 20 21 22 23  16 17 18 19 20 21 22
27 28 29 30 31        24 25 26 27 28 29     23 24 25 26 27 28 29
                                            30 31
       April                  May                   June
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
       1  2  3  4  5               1  2  3   1  2  3  4  5  6  7
 6  7  8  9 10 11 12   4  5  6  7  8  9 10   8  9 10 11 12 13 14
13 14 15 16 17 18 19  11 12 13 14 15 16 17  15 16 17 18 19 20 21
20 21 22 23 24 25 26  18 19 20 21 22 23 24  22 23 24 25 26 27 28
27 28 29 30           25 26 27 28 29 30 31  29 30
        July                 August              September
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
       1  2  3  4  5                  1  2      1  2  3  4  5  6
 6  7  8  9 10 11 12   3  4  5  6  7  8  9   7  8  9 10 11 12 13
13 14 15 16 17 18 19  10 11 12 13 14 15 16  14 15 16 17 18 19 20
20 21 22 23 24 25 26  17 18 19 20 21 22 23  21 22 23 24 25 26 27
27 28 29 30 31        24 25 26 27 28 29 30  28 29 30
                      31
      October               November              December
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
          1  2  3  4                     1      1  2  3  4  5  6
 5  6  7  8  9 10 11   2  3  4  5  6  7  8   7  8  9 10 11 12 13
12 13 14 15 16 17 18   9 10 11 12 13 14 15  14 15 16 17 18 19 20
19 20 21 22 23 24 25  16 17 18 19 20 21 22  21 22 23 24 25 26 27
26 27 28 29 30 31     23 24 25 26 27 28 29  28 29 30 31
                      30
</code></pre>
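<p>The fourth-Thursday rule can also be computed instead of eyeballed. Here's a minimal sketch with Python's <code>calendar</code> module; the function name is mine. Note that Python uses the proleptic Gregorian calendar throughout, so it won't reproduce cal's output for the year 8, and it takes a year of <code>8</code> just as literally as cal does.</p>

```python
import calendar

def thanksgiving(year):
    """Day of the month of the fourth Thursday in November."""
    weeks = calendar.monthcalendar(year, 11)   # zeros pad days outside the month
    thursdays = [week[calendar.THURSDAY] for week in weeks
                 if week[calendar.THURSDAY] != 0]
    return thursdays[3]

print(thanksgiving(2008))  # 27
```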
<p>When all was said and done -- with the change fee and the fare difference -- the mistake cost me $100. But it "inspired" me to actually learn a thing or two about cal(1).</p>
<p><strong>TL;DR</strong>: RTFM, or you will pay.</p>
<pre><code>CAL(1)
...
A single parameter specifies the year (1 - 5875706) to be displayed; note the year must be fully specified: “cal 89” will not display a calendar
for 1989.
</code></pre>http://alangrow.com/blog/better-inline-syntax-highlightingCoding for the Web: A Proposal for Better Inline Syntax Highlighting2011-03-14T00:00:00Alanalangrow+maelstrom@gmail.com<p><em>Update: I've since switched from this syntax to <a href="https://help.github.com/articles/creating-and-highlighting-code-blocks/">fenced code blocks</a>.</em></p>
<p><a href="http://daringfireball.net/projects/markdown/syntax">Markdown</a> is great for semi-structured text. <a href="http://pygments.org/">Pygments</a> is great for syntax highlighting. This blog uses both: <a href="https://github.com/mojombo/jekyll">jekyll</a>+<a href="http://www.liquidmarkup.org/">liquid</a> passes code snippets surrounded by {<code>% highlight languageX %</code>} and {<code>% endhighlight %</code>} to pygments. The rest gets processed with markdown.</p>
<p>So is there anything to complain about here? As usual, the answer is yes.</p>
<ul>
<li>The {<code>% highlight languageX %</code>} syntax isn't supported by github's default markdown renderer. So if I use it in the README.md file for a project, it will appear literally around the un-highlighted output. This may well confuse the hell out of someone trying to copy and paste some code or shell commands. If the github guys don't want to pay the penalty for parsing and syntax highlighting in markdown everywhere, I completely understand. But then let's try to find a relatively inert way of specifying the language of a code snippet.</li>
<li>The {<code>% highlight languageX %</code>} syntax is also jekyll+liquid-specific. I don't see support for this elsewhere. I do see people rolling their own syntax <a href="http://zerokspot.com/weblog/2008/06/18/syntax-highlighting-in-markdown-with-pygments/">like this one</a>, which was later incorporated into the python markdown package, and looks for code snippets surrounded by <code>[sourcecode:languageX]</code> <code>[/sourcecode]</code>. It's similarly deficient in that code snippets must be surrounded by special beginning and ending tokens that will be confusing if emitted literally.</li>
<li>The {<code>% highlight languageX %</code>} syntax doesn't actually play nice with markdown code blocks: you can't indent the code snippet with 4 spaces and wrap it with {<code>% highlight languageX %</code>} {<code>% endhighlight %</code>}. You must use no indentation for the snippet. This means a markdown processor that doesn't understand the syntax won't even know to emit an html code element; you'll get plain, wrapped text, probably not in a monospace font. Not good. These unindented code snippets also look like shit in <a href="http://www.vim.org/scripts/script.php?script_id=2882">markdown.vim</a>.</li>
</ul>
<p>To summarize, a better solution:</p>
<ul>
<li>Should be "inert", i.e. not confusing or ugly if output literally as part of the snippet.</li>
<li>Should gracefully degrade when a markdown processor doesn't implement special syntax highlighting.</li>
<li>Shouldn't need both beginning and ending tags. Just scope the syntax highlighting to the current code block.</li>
</ul>
<p>On to the specific proposal, which is really nothing fancy or new:</p>
<ul>
<li>Put a shebang line at the beginning of the code block.</li>
</ul>
<p>In a sense, this is a <strong>solved problem</strong>.</p>
<p>My <a href="http://www.vim.org">editor</a> already does syntax highlighting based on the shebang line, and chances are, so does yours. In many cases it also makes the code snippet more complete: if you're going to copy and paste it into a new script, you're going to add the shebang line anyway. But you could also choose to suppress the shebang line when rendering.</p>
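<p>Detecting the language from a shebang line takes very little code. The sketch below is not the implementation linked later in this post; the function name and the interpreter-to-language table are made up for illustration, and a real version would hand the result to pygments.</p>

```python
import os

# Illustrative mapping from interpreter name to a highlighter language name.
INTERPRETER_LANGUAGES = {"sh": "sh", "bash": "bash", "python": "python",
                         "perl": "perl", "ruby": "ruby"}

def language_from_shebang(code):
    """Return a language name for a code block, or None if no shebang is found."""
    lines = code.splitlines()
    first_line = lines[0] if lines else ""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    interpreter = os.path.basename(parts[0])
    if interpreter == "env" and len(parts) > 1:   # "#!/usr/bin/env python"
        interpreter = parts[1]
    return INTERPRETER_LANGUAGES.get(interpreter)
```

<p>A block without a recognizable shebang returns <code>None</code>, so the renderer can fall back to a plain, un-highlighted code block.</p>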
<p>Another solution might be to use a <a href="http://everything2.com/title/modeline">modeline</a>. In either case, you're embedding information in a language-specific comment, and doing so in a way that already has precedent.</p>
<p>Here's an example code snippet from the documentation for <a href="https://github.com/acg/python-percentcoding">python-percentcoding</a>:</p><div class="highlight"><pre><span class="c">#!/usr/bin/env python</span>
<span class="kn">from</span> <span class="nn">percentcoding</span> <span class="kn">import</span> <span class="n">quote</span><span class="p">,</span> <span class="n">unquote</span>
<span class="nb">str</span> <span class="o">=</span> <span class="s">&quot;This is a test!&quot;</span>
<span class="n">escaped</span> <span class="o">=</span> <span class="n">quote</span><span class="p">(</span><span class="nb">str</span><span class="p">)</span>
<span class="k">print</span> <span class="n">escaped</span>
<span class="k">assert</span><span class="p">(</span><span class="nb">str</span> <span class="o">==</span> <span class="n">unquote</span><span class="p">(</span><span class="n">escaped</span><span class="p">))</span>
</pre></div>
<p>I've already implemented the proposal in Python <a href="https://github.com/acg/python-percentcoding/blob/master/hilite_markdown.py">here</a>, in order to generate html documentation for pypi. It would need to be ported over to Ruby for use in jekyll+liquid.</p>http://alangrow.com/blog/two-new-python-c-extensionsTwo New Python C Extensions2011-03-08T00:00:00Alanalangrow+maelstrom@gmail.com<p>Today I'm releasing two new Python C extensions. They've been useful in fast text record processing, but could be used for plenty of other things. YMMV.</p>
<ul>
<li><strong><a href="https://github.com/acg/python-percentcoding">percentcoding</a></strong> -- is a Python C extension for <a href="http://en.wikipedia.org/wiki/Percent-encoding">percent encoding</a> and decoding. URL encoding is a specific instance of percent encoding, with a set of reserved characters defined by <a href="http://tools.ietf.org/html/rfc3986#section-2.1">RFC 3986</a>. The <code>percentcoding</code> library can be used as a 10x faster drop-in replacement for the <a href="http://docs.python.org/library/urllib.html?highlight=urllib#urllib.quote">urllib.quote</a> and <a href="http://docs.python.org/library/urllib.html?highlight=urllib#urllib.unquote">urllib.unquote</a> included with Python. I use it for escaping whitespace and non-printable characters in Unix text record formats.</li>
<li><strong><a href="https://github.com/acg/python-flattery">flattery</a></strong> -- is a Python C extension for converting hierarchical data to and from flat key/value pairs. This comes up in web form processing when you've got many different input elements in a single form -- perhaps even tabular data that can be edited -- and you want to map them onto a nested data structure. I use it together with <a href="https://github.com/acg/python-percentcoding">percentcoding</a> to process hierarchical record data stored in Unix text formats. Which makes them interchangeable with records in json or protocol buffer format, except that they're <code>sort(1)</code>, <code>cut(1)</code>, <code>join(1)</code> etc. friendly.</li>
</ul>
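<p>To make "hierarchical data to and from flat key/value pairs" concrete, here's a pure-Python sketch of the idea. It is not flattery's API or key syntax: the dotted-path convention and both function names are assumptions made for the example.</p>

```python
def flatten(data, prefix=""):
    """Turn nested dicts and lists into a flat {"a.b.0": value} mapping."""
    if isinstance(data, dict):
        items = data.items()
    elif isinstance(data, list):
        items = enumerate(data)
    else:
        return {prefix: data}
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat

def unflatten(flat):
    """Rebuild nested dicts from dotted keys (numeric parts stay dict keys)."""
    root = {}
    for path, value in flat.items():
        node = root
        *parents, leaf = path.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return root
```

<p>Each flat pair fits on one tab-separated line, which is what makes the result friendly to <code>sort(1)</code>, <code>cut(1)</code> and <code>join(1)</code>.</p>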
<p>I've had pure Python implementations of these kicking around for a while. They were slow, but it didn't matter until recently. See also <a href="http://news.ycombinator.com/item?id=2290357">a day in the life of a back-end developer</a>:</p>
<blockquote>
<p>1. Find bottleneck. <br/>
2. Remove bottleneck. <br/>
3. Repeat. <br/>
4. Every once in a while, make a bold move to throw something out that can no longer work that way and replace it with something more scalable. But while this is important, it comes up less often than you might think.</p>
</blockquote>http://alangrow.com/blog/teasing-out-a-new-repositoryTeasing Out a New Git Repository2011-03-02T00:00:00Alanalangrow+maelstrom@gmail.com<p><em>The Ideal Git Law states that the documentation surrounding git(1) will expand to fill all available volume.</em></p>
<p>I'm building a suite of record processing tools. Up to now, the development has taken place inside the <a href="https://github.com/acg/lwpb">lwpb</a> git repository. But it doesn't really belong there, since other record formats besides protobuf are supported: the classic unix tab-separated text format, and soon json.</p>
<p>So how does one extract <em>part</em> of a git repository into a new repository, preserving history where possible?</p>
<p>All of the files I want to extract from the main repository live under the same subdirectory, which should become the top-level directory of the new repository. So a good place to start is this <a href="http://stackoverflow.com/questions/359424/detach-subdirectory-into-separate-git-repository">stack overflow thread</a> which explains <code>git filter-branch --subdirectory-filter subdir</code>. It goes something like this:</p><div class="highlight"><pre>mkdir newrepo
<span class="nb">cd </span>newrepo
git clone --no-hardlinks /oldrepo ./
git filter-branch --subdirectory-filter subdir HEAD
git reset --hard
git gc --aggressive
git prune
</pre></div>
<p>As a comment on the stackoverflow thread mentions, it's also a good idea to remove the old repo as a remote of the new repo, so you don't accidentally push changes back to it:</p><div class="highlight"><pre>git remote rm origin
</pre></div>
<p>So far so good. But I only want <em>some</em> of the files under this subdirectory in the new repo. The rest shouldn't be there. Can I rewrite the commit history again, this time file-wise?</p>
<p>Yes. For this I used <code>git filter-branch --tree-filter command</code>. This works by checking out each commit, running <code>$SHELL -c "$command"</code>, looking at what changes were made to the checkout, and then formulating a new commit. If the command removes a file in the checkout, it will be removed from the commit. If a command creates a file, it will be added to the commit.</p>
<p>In my case, I only want to remove certain files, so the filter command is a shell script that looks like this:</p><div class="highlight"><pre><span class="c">#!/bin/sh</span>
find . -type f -not -path <span class="s2">&quot;*/.git/*&quot;</span> | <span class="se">\</span>
sed -e <span class="s1">&#39;s#^./##&#39;</span> | <span class="se">\</span>
grep -v -E <span class="s1">&#39;^(pb.*\.py|flat\.py|percent.*)$&#39;</span> | <span class="se">\</span>
xargs rm -v
</pre></div>
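<p>Since <code>grep -v</code> inverts the match, it's worth spelling out that the pattern lists the files to <em>keep</em>; everything else gets removed. A rough Python rendering of the selection logic, as a sketch only (the function name is invented, and it lists the doomed files instead of deleting them):</p>

```python
import os
import re

# Same pattern as the grep above: paths that match are kept.
KEEP = re.compile(r"^(pb.*\.py|flat\.py|percent.*)$")

def files_to_remove(top="."):
    """List paths (relative to top) that the filter would delete."""
    doomed = []
    for dirpath, dirnames, filenames in os.walk(top):
        if ".git" in dirnames:
            dirnames.remove(".git")          # never descend into .git
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), top)
            if not KEEP.match(rel):          # no match means "not on the keep list"
                doomed.append(rel)
    return sorted(doomed)
```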
<p>The <code>rm -v</code> lets me see all the deletions this script makes for each commit. I saved this as my-git-filter and ran</p><div class="highlight"><pre>git filter-branch -f --prune-empty --tree-filter my-git-filter HEAD
</pre></div>
<p>The <code>-f</code> option forces the operation even if there's already a backup of the original repo from a previous <code>git filter-branch</code> run.</p>
<p>Follow this up with the same cleanup procedure from the <code>--subdirectory-filter</code> example:</p><div class="highlight"><pre>git reset --hard
git gc --aggressive
git prune
</pre></div>
http://alangrow.com/blog/saving-flash-videosSaving Flash Videos with Linux2011-02-28T00:00:00Alanalangrow+maelstrom@gmail.com<p><em>Update: this was written before <a href="https://rg3.github.io/youtube-dl/">youtube-dl</a>. You should use that instead. The tricks below probably don't work anymore, but I'm leaving this post up for historical curiosity.</em></p>
<p>Sometimes, when I'm watching a flash video in my browser, I'd like to download the video file itself and watch it later, offline.</p>
<p>With older versions of the linux flash plugin this was easy: the flash video file was downloaded to a temporary path like <code>/tmp/FlashXX1sjAm9</code>. You could just copy the file to somewhere outside of <code>/tmp</code>.</p>
<p>The most recent linux flash plugin makes this a bit harder, but it's still no match for a wily unix user. The new problem is that flash deletes the <code>/tmp/FlashXYZblah</code> video file. But the key is that the flash process still has the deleted file open for reading and writing.</p>
<p>The following instructions work for both Firefox and Chrome. (But they certainly won't work forever; I'm sure future versions of the flash plugin will find a way to make this even more convoluted.)</p>
<p>First, load the page with the video, start playing it, and wait for the video to finish buffering.</p>
<p>Next, track down the flash plugin process.</p>
<pre><code>$ ps ax | grep flash
28988 ? Sl 0:03 /usr/lib/firefox-3.6.13/plugin-container /usr/lib/flashplugin-installer/libflashplayer.so 28970 plugin
</code></pre>
<p>Now, we're going to use the <code>/proc</code> filesystem.</p>
<pre><code>$ cd /proc/28988/fd/
$ ls -l
total 0K
lr-x------ 1 user user 64 2011-02-28 13:05 0 -&gt; /dev/null
lrwx------ 1 user user 64 2011-02-28 13:05 1 -&gt; /mnt/common/home/user/.xsession-errors
lrwx------ 1 user user 64 2011-02-28 13:05 10 -&gt; pipe:[7860847]
lrwx------ 1 user user 64 2011-02-28 13:05 11 -&gt; pipe:[7860848]
lrwx------ 1 user user 64 2011-02-28 13:05 12 -&gt; pipe:[7860848]
lrwx------ 1 user user 64 2011-02-28 13:05 13 -&gt; socket:[7860851]
lrwx------ 1 user user 64 2011-02-28 13:05 14 -&gt; /mnt/common/home/user/.mozilla/firefox/abc123.default/cert8.db
l-wx------ 1 user user 64 2011-02-28 13:05 15 -&gt; /mnt/common/home/user/.mozilla/firefox/abc123.default/key3.db
lrwx------ 1 user user 64 2011-02-28 13:05 16 -&gt; /tmp/FlashXX1sjAm9 (deleted)
lrwx------ 1 user user 64 2011-02-28 13:05 17 -&gt; pipe:[7860983]
lrwx------ 1 user user 64 2011-02-28 13:05 18 -&gt; pipe:[7860983]
lr-x------ 1 user user 64 2011-02-28 13:05 19 -&gt; pipe:[7860984]
lrwx------ 1 user user 64 2011-02-28 13:05 2 -&gt; /mnt/common/home/user/.xsession-errors
l-wx------ 1 user user 64 2011-02-28 13:05 20 -&gt; pipe:[7860984]
lr-x------ 1 user user 64 2011-02-28 13:05 21 -&gt; socket:[7860988]
lrwx------ 1 user user 64 2011-02-28 13:05 3 -&gt; socket:[7860769]
lr-x------ 1 user user 64 2011-02-28 13:05 4 -&gt; anon_inode:[eventpoll]
l-wx------ 1 user user 64 2011-02-28 13:05 5 -&gt; socket:[7860844]
lr-x------ 1 user user 64 2011-02-28 13:05 6 -&gt; socket:[7860845]
l-wx------ 1 user user 64 2011-02-28 13:05 7 -&gt; pipe:[7860846]
lr-x------ 1 user user 64 2011-02-28 13:05 8 -&gt; pipe:[7860846]
l-wx------ 1 user user 64 2011-02-28 13:05 9 -&gt; pipe:[7860847]
</code></pre>
<p>Okay, there's a bunch of junk we don't care about. But see file descriptor 16? That's a symlink to the deleted flash video. Save it:</p>
<pre><code>$ cp 16 ~/movie.flv
</code></pre>
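<p>If you'd rather not eyeball the listing, the same check can be scripted. Here is a minimal sketch in Python, assuming a Linux <code>/proc</code> and permission to read the target process's <code>fd</code> directory; the helper name <code>deleted_open_files</code> is made up for this example.</p>

```python
import os

def deleted_open_files(pid):
    """Return (fd, target) pairs for descriptors whose file has been unlinked."""
    fd_dir = "/proc/%d/fd" % pid
    found = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        # The kernel appends " (deleted)" to the link target of unlinked files.
        if target.endswith(" (deleted)"):
            found.append((int(name), target))
    return sorted(found)
```

<p>Calling <code>deleted_open_files(28988)</code> on the process above should report descriptor 16, which you can then copy out of <code>/proc/28988/fd/</code> as before.</p>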
<p>Test that you can play the video:</p>
<pre><code>$ mplayer ~/movie.flv
</code></pre>http://alangrow.com/blog/profiling-every-command-in-a-makefileProfiling every command in a Makefile2011-02-25T00:00:00Alanalangrow+maelstrom@gmail.com<p>Here's the scenario. I've got a batch data processing pipeline implemented as a Makefile. (Hey! It's only a prototype! Trust me, I'm a make hater just like you!) There's already a lot of data, so an end-to-end full run can take about a day, with some of the individual stages taking hours.</p>
<p>Now I'm thinking, wouldn't it be nice to know how long each rule took? Even better, wouldn't it be nice to get a report of how much cpu it consumed, how much memory it used, how much I/O it performed, etc.? Armed with this information, I could start optimizing poorly performing stages.</p>
<p>So, let's suppose we cook up some wrapper program that runs a subordinate program, collects <a href="http://www.freebsd.org/cgi/man.cgi?query=getrusage&amp;apropos=0&amp;sektion=2&amp;format=html">rusage</a> when it exits, and prints out the interesting info. Fortunately, such a wrapper program basically already exists.</p>
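<p>To make the idea concrete, here is a rough Python sketch of what such a wrapper does. It is not the wrapper used below; the function name <code>run_with_rusage</code> and the report format are assumptions for illustration, and it relies on the Unix-only <code>resource</code> module.</p>

```python
import resource
import subprocess
import time

def run_with_rusage(argv):
    """Run argv, wait for it, and summarize the resources its process tree used."""
    start = time.time()
    rc = subprocess.call(argv)
    elapsed = time.time() - start
    # RUSAGE_CHILDREN accumulates over every child this process has waited for,
    # so the numbers are only per-command if the wrapper runs one command.
    ru = resource.getrusage(resource.RUSAGE_CHILDREN)
    report = "rc=%d elapsed=%.2f user=%.2f system=%.2f maxrss=%d" % (
        rc, elapsed, ru.ru_utime, ru.ru_stime, ru.ru_maxrss)
    return rc, report
```

<p>Note that the units of <code>ru_maxrss</code> differ between platforms (kilobytes on Linux), which is one more reason to prefer an existing tool.</p>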
<p>I'd rather not go rewrite every rule in the Makefile, prefixing it with this wrapper program. That wouldn't even work if the rule was a pipeline: since <code>make(1)</code> executes rules by wrapping them with <code>$(SHELL) -c</code>, only the first command in the pipeline would actually run under the wrapper.</p>
<p>The solution is to <a href="http://www.gnu.org/software/make/manual/make.html#Choosing-the-Shell">set the shell</a> in your Makefile to:</p><div class="highlight"><pre><span class="nv">SHELL</span> <span class="o">=</span> rusage sh
</pre></div>
<p>Where <code>rusage</code> is a wrapper shell script that looks like this:</p><div class="highlight"><pre><span class="c">#!/bin/sh</span>
<span class="nb">exec time</span> -f <span class="s1">&#39;rc=%x elapsed=%e user=%U system=%S maxrss=%M avgrss=%t ins=%I outs=%O minflt=%R majflt=%F swaps=%W avgmem=%K avgdata=%D argv=&quot;%C&quot;&#39;</span> <span class="s2">&quot;$@&quot;</span>
</pre></div>
<p>Note that this uses the external <code>/usr/bin/time</code>, <strong>not to be confused</strong> with bash's own <code>time</code> (strictly a shell keyword rather than a builtin), which is what you're probably using 90% of the time at the command line.</p>
<p>Note also, this unfortunately only works with GNU <code>time(1)</code>. The BSD (and probably Darwin, haven't actually checked) versions of <code>time(1)</code> don't support the <code>-f</code> argument to specify a format string. But on BSD derivatives, you should be able to at least get a human readable dump of the rusage structure by using <code>/usr/bin/time -l</code>. Which looks equivalent to the <code>/usr/bin/time -v</code> output from GNU time. (It's just not as convenient if you plan to analyze the logs later.)</p>http://alangrow.com/blog/mapping-python-over-records-with-lwpbMapping Python Code Over Records With lwpb2011-02-08T00:00:00Alanalangrow+maelstrom@gmail.com<p><em>In which we reimplement <code>wc(1)</code> as a <a href="#wc-pbio-example">python one-liner</a>, discover a <a href="#python-exec">neat feature</a> in the Python interpreter, <a href="#top10-pbio-example">rip through</a> a bunch of records in a document database, and generally start to wonder if we're converging on the <code>awk(1)</code> of Protocol Buffers.</em></p>
<p>In my work on <a href="https://github.com/acg/lwpb">lwpb</a>, a library which includes a <a href="https://github.com/acg/lwpb#performance">fast</a> Python encoder and decoder for <a href="http://code.google.com/p/protobuf/">Google Protocol Buffers</a>, one of the first things I needed was a comfortable way to convert between a protobuf stream and a plain old text stream. You know, the usual Unix tab- and newline-delimited records thing. Once you've got this conversion, your old friends <code>grep(1)</code>, <code>cut(1)</code>, <code>sort(1)</code>, et al. can help you again.</p>
<p>The conversion tool I came up with is called <a href="https://github.com/acg/lwpb/blob/python/python/pbio.py">pbio</a>. It converts in both directions, and can also do some other things like extract a range of records. So far, pretty pedestrian right?</p>
<p>But with a mere <a href="https://github.com/acg/lwpb/commit/a64f2f9eeb497cc83e66f4471ddd7ccdebb05c13">8 line patch</a>, <em>pbio</em> has suddenly become immensely more useful: you can now map Python code supplied at the command line over your records, producing new calculated fields. This is a big step towards <a href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a>-style programming, but without the overhead of having to write a separate program each time which defines a distinct <em>map</em> function. As a programmer, I'm always looking for ways to write less code and still get the job done.</p>
<p>In the <em>pbio</em> case, there is <strong>zero</strong> overhead code required to calculate and output a new field, a surprising and mostly accidental consequence of <a href="https://github.com/acg/lwpb">lwpb's</a> decision to encode and decode using dictionaries, together with a serendipitous feature of Python's <a href="http://docs.python.org/reference/simple_stmts.html#grammar-token-exec_stmt">exec</a> statement. More on that in a second.</p>
<p>To frame all of this with an example, suppose you have the following simple <em>schema.proto</em> file for a document database.</p>
<pre><code>package org;
message Document {
  required uint64 docid = 1;
  required string url = 2;
  required string content = 3;
};
</code></pre>
<p>Perhaps this database is populated by a web crawler. You'd like to know the length, in bytes, of each document.</p>
<p>Sure, you could dump the entire content of each document with <em>pbio</em> and pipe that to a script that calculates lengths, but that's a bit wasteful. And you're also going to have to grok the percent-escaped sequences that <em>pbio</em> uses.</p>
<p>Here's a better way:</p><div class="highlight"><pre>pbio.py -F <span class="s1">&#39;url,length&#39;</span> -e <span class="s1">&#39;length=len(content)&#39;</span> -p schema.pb -m org.Document &lt; docs.pb
</pre></div>
<p>Let's break down what's going on here:</p>
<ul>
<li><em>pbio</em> is inputting protobuf records, and outputting text records (the default mode)</li>
<li><code>-F 'url,length'</code> tells <em>pbio</em> to output text records with these two fields</li>
<li><code>-p schema.pb</code> specifies a compiled version of <em>schema.proto</em> you've created with <code>protoc</code></li>
<li><code>-m org.Document</code> says records in <em>docs.pb</em> conform to the <em>org.Document</em> message type</li>
<li><code>-e 'length=len(content)'</code> calculates a new field in the output record named <em>length</em></li>
</ul>
<p>The <code>-F</code>, <code>-p</code>, and <code>-m</code> options are just about input and output, while the <code>-e</code> option is more interesting. How does <em>pbio</em> know to populate a new field in the output record named <em>length</em> with the calculated value? Does it parse the code, looking for assignments that match up with fields specified by <code>-F</code>? And how has a field of the input record, <code>content</code>, become available as a local variable?</p>
<p><span id="python-exec"></span>
This is where Python's <a href="http://docs.python.org/reference/simple_stmts.html#grammar-token-exec_stmt">exec()</a> comes in:</p>
<blockquote>
<p><code>exec_stmt ::= "exec" or_expr ["in" expression ["," expression]]</code></p>
<p>In all cases, if the optional parts are omitted, the code is executed in the current scope. If only the first expression after <em>in</em> is specified, it should be a dictionary, which will be used for both the global and the local variables... As a side effect, an implementation may insert additional keys into the dictionaries given besides those corresponding to variable names set by the executed code.</p>
</blockquote>
<p>The input record, which has been decoded by <em>lwpb</em> into a dictionary, becomes the scope in which the user-supplied code executes. Each input field becomes a local variable in the new scope -- in our example, this means the user-supplied code automatically sees local variables <code>docid</code>, <code>url</code>, and <code>content</code>, one for each field in the input record. After execution, any new local variables created by assignments become new fields in the output record.</p>
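<p>The mechanism fits in a few lines. The sketch below is not <em>pbio</em>'s actual code: <code>map_record</code> is a hypothetical helper, and it uses the Python 3 function form <code>exec(code, scope)</code> rather than the Python 2 statement quoted above.</p>

```python
def map_record(code, record, fields):
    """Run user-supplied code with the record's fields as variables,
    then pick the requested output fields out of the resulting scope."""
    scope = dict(record)   # copy, so the caller's record is left untouched
    exec(code, scope)      # a single dict serves as both globals and locals
    # exec() also inserts extras such as "__builtins__" into the dict,
    # so select fields by name instead of emitting the whole scope.
    return [scope[name] for name in fields]

record = {"docid": 1, "url": "http://example.com/", "content": "hello world"}
row = map_record("length=len(content)", record, ["url", "length"])
print(row)  # ['http://example.com/', 11]
```

<p>No parsing of the user's code is needed: whatever names the code assigns simply show up as keys in the dictionary afterwards.</p>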
<p><span id="wc-pbio-example"></span>
More complicated code is possible. For instance, here's <code>wc(1)</code>:</p><div class="highlight"><pre>pbio.py -F <span class="s1">&#39;lines,words,chars,url&#39;</span> -p schema.pb -m org.Document &lt; docs.pb -e <span class="s1">&#39;</span>
<span class="s1">chars=len(content)</span>
<span class="s1">words=len(content.split())</span>
<span class="s1">lines=content.count(&quot;\n&quot;)&#39;</span>
</pre></div>
<p><span id="top10-pbio-example"></span>
To wrap up the original example, let's find the top 10 longest documents:</p><div class="highlight"><pre>pbio.py -F <span class="s1">&#39;url,length&#39;</span> -e <span class="s1">&#39;length=len(content)&#39;</span> -p schema.pb -m org.Document &lt; docs.pb | sort -k2 -nr | head -10
</pre></div>
http://alangrow.com/blog/bouncing-hopping-tunneling-with-tcpforwardBouncing, Hopping and Tunneling with tcpforward2011-02-07T00:00:00Alanalangrow+maelstrom@gmail.com<p>This weekend I dusted off a little network utility of mine called <a href="https://github.com/acg/tcpforward">tcpforward</a>. It proved its worth once again, so instead of throwing it back into the rusty toolbox like I always do, here's why you might want to throw it into your very own rusty toolbox. ;)</p>
<ul class="toc">
<li><a href="#bouncing">Scenario: Remote Assistance, AKA "Bouncing Your Signal Off The Moon"</a></li>
<li><a href="#hopping">Scenario: Hopping Over the Middleman</a></li>
<li><a href="#tunneling">Scenario: Tunneling Through Corporate Firewalls </a></li>
<li><a href="#how-it-works">How it Works</a></li>
</ul>
<p><span id="bouncing"></span> </p>
<h3>Scenario: Remote Assistance, AKA "Bouncing Your Signal Off The Moon"</h3>
<p>Suppose you need to SSH to a friend's machine, but you're both behind NATs.</p>
<p>If your friend is savvy enough to compile it, and you've got time for that, you could use <a href="http://samy.pl/pwnat/">pwnat</a>. You could also have your friend configure port forwarding on his router -- again, only if your friend is savvy enough, and doesn't mind punching a hole in his firewall. Yet another option: give your friend an SSH account on a public machine, and go look up the SSH arguments for reverse port forwarding for the bazillionth time.</p>
<p>The lowest-hassle option I can think of is to use tcpforward. Suppose you and your friend can both reach a 3rd machine, a public server you own called <em>moon</em>.</p>
<p>Run the following on <em>moon</em>:</p><div class="highlight"><pre>tcpforward -v -N 1 -l moon:9922 -l moon:9921
</pre></div>
<p>Arrange for your friend to run the following on his local machine:</p><div class="highlight"><pre>./tcpforward -v -N 1 -c moon:9922 -c localhost:22
</pre></div>
<p>Now, on your machine, run:</p><div class="highlight"><pre>ssh -p 9921 moon
</pre></div>
<p>And voila, your SSH connection is forwarded past your friend's NAT, to his machine. The <code>-N 1</code> option makes this a one-shot connection. The <code>-v</code> option gives him something to watch while you go to work -- some realtime transfer statistics.</p>
<p>(This example assumes ports 9921 and 9922 are open on <em>moon</em>, and that your friend is running sshd.)</p>
<p><span id="hopping"></span> </p>
<h3>Scenario: Hopping Over the Middleman</h3>
<p>Ever wanted to copy files to a machine you could only reach from an intermediate machine? For no particular reason, let's call these machines <em>production</em> and <em>gateway</em>. I bet you usually end up scp'ing or rsync'ing files to <em>gateway</em>, ssh'ing to <em>gateway</em>, then running scp or rsync again, then cleaning up the files, etc.</p>
<p>"There must be a better way!" I hear you scream.</p>
<p>Yes. First, ssh to <em>gateway</em> and run:</p><div class="highlight"><pre>tcpforward -v -k -l 0.0.0.0:9922 -c production:22
</pre></div>
<p>In another tty on your local machine, you can now run:</p><div class="highlight"><pre>scp -o <span class="nv">Port</span><span class="o">=</span>9922 somefile gateway:somefile
</pre></div>
<p>Or, rsync:</p><div class="highlight"><pre>rsync -e <span class="s2">&quot;ssh -p 9922&quot;</span> -avzp somedir/ gateway:somedir/
</pre></div>
<p>Remember to kill the tcpforward session on <em>gateway</em>, or your sysadmin may get angry, annoyed, frightened, or all of the above.</p>
<p>(Once again, this assumes port 9922 is open on <em>gateway</em>.)</p>
<p><span id="tunneling"></span> </p>
<h3>Scenario: Tunneling Through Corporate Firewalls</h3>
<p>Let's continue with the slightly subversive examples. Suppose you're behind a corporate firewall that doesn't allow SSH connections out, only web traffic. You've got a public server out there called <em>freedom</em>, and you want to log in once in a while.</p>
<p>You could run <code>hts</code> from <a href="http://www.nocrew.org/software/httptunnel.html">httptunnel</a> on <em>freedom</em>. That's a fair bit of C code to expose to the world though. ;)</p>
<p>Alternately, let's say you're not running anything on <em>freedom:443</em>. Most corporate firewalls will allow https out, and most of them don't do deep packet inspection to verify that the initial handshake actually conforms to the TLS protocol.</p>
<p>Before going off to work, run the following on <em>freedom</em>:</p><div class="highlight"><pre>tcpforward -v -k -l 0.0.0.0:443 -c localhost:22
</pre></div>
<p>From work:</p><div class="highlight"><pre>ssh -p 443 freedom <span class="c"># scream FREEEEEEDOOOOMMM!!! as you&#39;re doing this</span>
</pre></div>
<p><span id="how-it-works"></span> </p>
<h3>How it Works</h3>
<p>The time has come to pull back the curtain, revealing the wizened figure of a <a href="https://github.com/acg/tcpforward/blob/master/tcpforward">160 line Perl script</a>.</p>
<p>How does it work?</p>
<p>Well, you always run <code>tcpforward</code> with two arguments that specify a pair of TCP sockets to set up, then copy bytes between. Each socket argument is either a listen / accept socket -- if you specify the <code>-l</code> flag -- or a connect socket, if you specify the <code>-c</code> flag. Once both sockets of a pair are accepted or connected, a little async I/O copy loop runs until both sockets close for reading. If you pass the <code>-k</code> flag, the I/O copy loop runs in a forked process and another socket pair is immediately ready for setup.</p>
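<p>For a feel of what the copy loop does, here is a simplified sketch in Python rather than Perl. It is not a port of tcpforward: the name <code>copy_loop</code> is made up, writes are blocking (<code>sendall</code>) instead of fully asynchronous, and socket setup and the <code>-k</code> fork are left out.</p>

```python
import select
import socket

def copy_loop(a, b, bufsize=4096):
    """Shuttle bytes between two connected sockets until both hit EOF."""
    peer = {a: b, b: a}
    open_for_read = {a, b}
    while open_for_read:
        readable, _, _ = select.select(list(open_for_read), [], [])
        for sock in readable:
            data = sock.recv(bufsize)
            if data:
                peer[sock].sendall(data)
            else:
                # EOF on this side: stop reading it and pass the EOF along.
                open_for_read.discard(sock)
                try:
                    peer[sock].shutdown(socket.SHUT_WR)
                except OSError:
                    pass  # peer already closed
```

<p>Half-closing the peer on EOF, rather than closing it outright, is what lets the loop keep running "until both sockets close for reading".</p>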
<p>There's more documentation in the <a href="https://github.com/acg/tcpforward/blob/master/README.md">POD</a>.</p>
<p>Happy connection hacking!</p>http://alangrow.com/blog/python-default-refsA Python Gotcha: References as Default Parameters2011-02-05T00:00:00Alanalangrow+maelstrom@gmail.com<p>Suppose you're writing a Python function like <a href="https://github.com/acg/lwpb/blob/python/python/flat.py">this one</a> that unpacks data into a dictionary; optionally, an existing dictionary instead of an empty one.</p>
<p><em>Surprise</em>!</p><div class="highlight"><pre><span class="err">$</span> <span class="n">python</span>
<span class="n">Python</span> <span class="mf">2.6</span><span class="o">.</span><span class="mi">4</span>
<span class="p">[</span><span class="n">GCC</span> <span class="mf">4.4</span><span class="o">.</span><span class="mi">1</span><span class="p">]</span> <span class="n">on</span> <span class="n">linux2</span>
<span class="o">&gt;&gt;&gt;</span> <span class="k">def</span> <span class="nf">hashcopy</span><span class="p">(</span><span class="n">src</span><span class="p">,</span> <span class="n">dst</span><span class="o">=</span><span class="p">{}):</span>
<span class="o">...</span>     <span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">src</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
<span class="o">...</span>         <span class="n">dst</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="o">=</span> <span class="n">v</span>
<span class="o">...</span>     <span class="k">return</span> <span class="n">dst</span>
<span class="o">...</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">hashcopy</span><span class="p">({</span><span class="mi">1</span><span class="p">:</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">:</span><span class="mi">4</span><span class="p">})</span>
<span class="p">{</span><span class="mi">1</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">:</span> <span class="mi">4</span><span class="p">}</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">hashcopy</span><span class="p">({</span><span class="mi">5</span><span class="p">:</span><span class="mi">6</span><span class="p">,</span><span class="mi">7</span><span class="p">:</span><span class="mi">8</span><span class="p">})</span>
<span class="p">{</span><span class="mi">1</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">:</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">:</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">7</span><span class="p">:</span> <span class="mi">8</span><span class="p">}</span>
</pre></div>
<p>The cause: Python evaluates default parameter values once, when the <code>def</code> statement runs, not on each call. Every call that omits <code>dst</code> therefore shares, and mutates, the same dictionary object.</p>
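<p>The usual way around this in Python is a <code>None</code> sentinel, with the dictionary created inside the function body. A minimal sketch of the same function written that way:</p>

```python
def hashcopy(src, dst=None):
    # A fresh dict is created on every call that omits dst,
    # so nothing leaks from one call to the next.
    if dst is None:
        dst = {}
    for k, v in src.items():
        dst[k] = v
    return dst

print(hashcopy({1: 2, 3: 4}))  # {1: 2, 3: 4}
print(hashcopy({5: 6, 7: 8}))  # {5: 6, 7: 8}
```

<p>Passing an existing dictionary still works as before; only the default case changes.</p>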
<p>In Perl 5 you typically only set default parameters at runtime, so the empty hashref you get is always the freshest in the land:</p><div class="highlight"><pre><span class="k">sub </span><span class="nf">hashcopy</span>
<span class="p">{</span>
  <span class="k">my</span> <span class="nv">$src</span> <span class="o">=</span> <span class="nb">shift</span><span class="p">;</span>
  <span class="k">my</span> <span class="nv">$dst</span> <span class="o">=</span> <span class="nb">shift</span> <span class="o">||</span> <span class="p">{};</span>
  <span class="nv">%$dst</span> <span class="o">=</span> <span class="p">(</span><span class="nv">%$dst</span><span class="p">,</span> <span class="nv">%$src</span><span class="p">);</span>
  <span class="k">return</span> <span class="nv">$dst</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>All other things equal, this is undoubtedly slower, but considerably less wtf-subtle.</p>http://alangrow.com/blog/thinkpad-key-replacementThinkpad T43 Key Removal, Assembly2007-02-18T00:00:00Alanalangrow+maelstrom@gmail.com<p>Within a few days of the <a href="../05/lcd-smashed-so-ratpoison.html">destruction of my T40</a>, I got a T43 from a guy on craigslist. The left control key promptly broke so I swapped it for the right one. There's relatively little info out there about how to assemble and disassemble keys, so here's some info on the process. Before we begin, get out your jeweler's eyepiece...</p>
<p>You can pry off the key face gently as described <a href="http://dqd.com/~mayoff/notes/thinkpad/key/">here</a>: just push away from you and up with a flat object. The face snaps into a cage mechanism consisting of three parts: a top plate and two wickets which anchor it from the north and south, respectively. Each wicket has a bar that wraps over the top plate, and two legs with pegs that secure it to the keyboard bevel. Viewed from the east or west sides, the wickets cross over each other, making an X. There is enough play in the cage's anchoring that you can squish the whole thing down flat. The only thing that impedes you is a little rubber spring glued to the keyboard bevel. This spring is primarily responsible for that distinctive Thinkpad key feel.</p>
<p>By squishing the cage flat, you can hook or unhook the wickets. To reassemble and replace a key, I found it easiest to build the cage first. Start by crossing the wickets--they are fitted to each other. While pressing the X sides of the cage in, you can slip in the face plate. Don't put on the key face yet. Attach the cage to the keyboard bevel by putting it in place and hooking in the south wicket's legs first. Getting the north wicket in is a bit of a stretch. Flatten the cage by pressing down on it until the north legs slip in. Now you can attach the key face by setting it on top of the cage and applying gentle downward force. You should hear it snap.</p>http://alangrow.com/blog/lcd-smashed-so-ratpoisonLCD Smashed, So...Ratpoison2007-02-05T00:00:00Alanalangrow+maelstrom@gmail.com<p>I just had a really terrible and wonderful thing happen to me. I dropped my thinkpad T40, shattering the LCD panel at pixel y=514 and below. That's the terrible part. The wonderful part is that I am now running <a href="http://www.nongnu.org/ratpoison/">ratpoison</a> with the following fdump to compensate...the upper 2/3 of my screen is usable enough to find a replacement laptop / screen on.</p><div class="highlight"><pre><span class="p">(</span><span class="nf">frame</span> <span class="nv">:number</span> <span class="mi">2</span> <span class="nv">:x</span> <span class="mi">0</span> <span class="nv">:y</span> <span class="mi">0</span> <span class="nv">:width</span> <span class="mi">1024</span> <span class="nv">:height</span> <span class="mi">514</span> <span class="nv">:screenw</span> <span class="mi">1024</span> <span class="nv">:screenh</span> <span class="mi">768</span> <span class="nv">:window</span> <span class="mi">12582974</span> <span class="nv">:last-access</span> <span class="mi">126</span> <span class="nv">:dedicated</span> <span class="mi">0</span><span class="p">)</span><span class="o">,</span><span class="p">(</span><span class="nf">frame</span> <span 
class="nv">:number</span> <span class="mi">0</span> <span class="nv">:x</span> <span class="mi">0</span> <span class="nv">:y</span> <span class="mi">514</span> <span class="nv">:width</span> <span class="mi">1024</span> <span class="nv">:height</span> <span class="mi">254</span> <span class="nv">:screenw</span> <span class="mi">1024</span> <span class="nv">:screenh</span> <span class="mi">768</span> <span class="nv">:window</span> <span class="mi">16777278</span> <span class="nv">:last-access</span> <span class="mi">0</span> <span class="nv">:dedicated</span> <span class="mi">0</span><span class="p">)</span>
</pre></div>
http://alangrow.com/blog/tai64-for-all-timeTAI64 For All Time2006-09-14T00:00:00Alanalangrow+maelstrom@gmail.com<p>From Bernstein's <a href="http://cr.yp.to/libtai/tai64.html#tai64">tai64 page</a>:</p>
<blockquote>
<p>"Integers 2^63 and larger are reserved for future extensions. Under many cosmological theories, the integers under 2^63 are adequate to cover the entire expected lifetime of the universe; in this case no extensions will be necessary."</p>
</blockquote>
<p>Phew!</p>
<p>Dealing with <a href="http://cr.yp.to/daemontools/multilog.html">multilog</a>'s TAI64 timestamps is always a bit annoying, but I suppose old djb may very well be laughing his head off in <a href="http://www.unixtimestamp.com/index.php">2038</a>. Still, the idea of writing software "for all time" has enough allure to the developer mind that it feels like a trap.</p>http://alangrow.com/blog/ssh-authorize-keySSH Pubkey Setup In One Command2005-02-14T00:00:00Alanalangrow+maelstrom@gmail.com<p>Transfer your ssh public key to a remote host, for passwordless logins, in one command:</p><div class="highlight"><pre>ssh &lt; <span class="s2">&quot;$key&quot;</span> <span class="s2">&quot;$@&quot;</span> <span class="s1">&#39;</span>
<span class="s1"> cat &gt; $HOME/authorized_keys &amp;&amp; </span>
<span class="s1"> mkdir -p .ssh &amp;&amp;</span>
<span class="s1"> cat $HOME/authorized_keys &gt;&gt; $HOME/.ssh/authorized_keys &amp;&amp;</span>
<span class="s1"> rm -f $HOME/authorized_keys &amp;&amp;</span>
<span class="s1"> chmod 0700 .ssh &amp;&amp;</span>
<span class="s1"> chmod 0600 $HOME/.ssh/authorized_keys&#39;</span>
</pre></div>
<p>Note that newer versions of ssh now have <code>ssh-copy-id(1)</code>.</p>http://alangrow.com/blog/colorful-prompt-generatorColorful Bash Prompt Generator2004-12-30T00:00:00Alanalangrow+maelstrom@gmail.com<p>(A very old post, but I've used this prompt ever since.)</p>
<p>Setting your bash prompt is one of those geek machismo things that usually culminates in something like</p><div class="highlight"><pre><span class="nb">export </span><span class="nv">PS1</span><span class="o">=</span><span class="s1">&#39;\[\e]0;\w\a\]\n\[\e[32m\]\u@\h \[\e[33m\]\w\n\[\e[0m\]$ &#39;</span>
</pre></div>
<p>the idea being that lots of escape sequences = eliteness. (After a while you only see blondes, brunettes, and redheads.) Though, I'd guess most people just copy someone else's bash prompt and foist it off as their own, rather than learn ansi / xterm / bash escape sequences. Like <a href="http://blogs.thegotonerd.com/maelstrom/archives/000453.html">me initially</a>. :)</p>
<p>However, you can easily make your prompt setup readable by breaking it down. I've started doing this in my <a href="http://thegotonerd.com/scripts/agrow/conf/prompt.html">prompt</a> file.</p><div class="highlight"><pre><span class="c"># ansi color escape sequences</span>
<span class="nv">prompt_black</span><span class="o">=</span><span class="s1">&#39;\[\e[30m\]&#39;</span>
<span class="nv">prompt_red</span><span class="o">=</span><span class="s1">&#39;\[\e[31m\]&#39;</span>
<span class="nv">prompt_green</span><span class="o">=</span><span class="s1">&#39;\[\e[32m\]&#39;</span>
<span class="nv">prompt_yellow</span><span class="o">=</span><span class="s1">&#39;\[\e[33m\]&#39;</span>
<span class="nv">prompt_blue</span><span class="o">=</span><span class="s1">&#39;\[\e[34m\]&#39;</span>
<span class="nv">prompt_magenta</span><span class="o">=</span><span class="s1">&#39;\[\e[35m\]&#39;</span>
<span class="nv">prompt_cyan</span><span class="o">=</span><span class="s1">&#39;\[\e[36m\]&#39;</span>
<span class="nv">prompt_white</span><span class="o">=</span><span class="s1">&#39;\[\e[37m\]&#39;</span>
<span class="nv">prompt_default_color</span><span class="o">=</span><span class="s1">&#39;\[\e[0m\]&#39;</span>
</pre></div>
<p>My motivation initially was to avoid beeping console prompts. The xterm escape sequence to set the window title contains a bell character, which was of course interpreted by xterm and friends, but not when I'd sit down at system consoles (where usually <code>TERM=cons25</code>). I needed to set <code>$PS1</code> according to <code>$TERM</code>.</p>
<p>In the course of things, I discovered the <code>\t</code> bash escape sequence, which gives you the current time in <code>hh:mm:ss</code> form. Nice. By incorporating this into the prompt you can now tell by inspection how long you've been sitting with your jaw open trying to remember what you were about to do. Or, how severe one's random spastic <code>ls</code>-ing has gotten.</p>
<div class="image">
<a href="../images/blog/bash-prompt-with-time.png">
<img src="../images/blog/bash-prompt-with-time-small.png" />
</a>
</div>
<p>For emergencies, there's also the no-color prompt.</p><div class="highlight"><pre><span class="nv">prompt_nocolor</span><span class="o">=</span><span class="s1">&#39;\n\u@\h \w\n$ &#39;</span>
</pre></div>
<p>For nostalgia (or out of masochism) there's the old dos prompt.</p><div class="highlight"><pre><span class="nv">prompt_dos</span><span class="o">=</span><span class="s1">&#39;\n\w&gt;&#39;</span>
</pre></div>