Victoria Drake, software developer on victoria.dev
https://victoria.dev/
Recent content in Victoria Drake, software developer on victoria.dev
Hugo -- gohugo.io
en-us
hello@victoria.dev (Victoria Drake)
Thu, 17 Jan 2019 13:47:16 -0500
How to choose and care for a secure open source project
https://victoria.dev/blog/how-to-choose-and-care-for-a-secure-open-source-project/
Mon, 25 May 2020 05:53:09 -0400
hello@victoria.dev (Victoria Drake)
A few tricks for assessing the security of an open source project.
<p>There is a rather progressive sect of the software development world that believes that most people would be a lot happier and get a lot more work done if they just stopped building things that someone else has already built and is offering up for free use. They&rsquo;re called the open source community. They want you to take their stuff.</p>
<p><img src="wheels.png" alt="A comic I drew about using other people&rsquo;s stuff, with the wheel as an example."></p>
<p>Besides existing without you having to lift a finger, open source tools and software have some distinct advantages. Especially in the case of well-established projects, it&rsquo;s highly likely that someone else has already worked out all the most annoying bugs for you. Thanks to the ease with which users can view and modify source code, it&rsquo;s also more likely that a program has been tinkered with, improved, and secured over time. When many developers contribute, they bring their own unique expertise and experiences. This can result in a product far more robust and capable than one a single developer can produce.</p>
<p>Of course, being as varied as the people who build them, not all open source projects are created equal, nor maintained to be equally secure. There are many factors that affect a project&rsquo;s suitability for your use case. Here are a few general considerations that make a good starting point when choosing an open source project.</p>
<h2 id="how-to-choose-an-open-source-project">How to choose an open source project</h2>
<p>As its most basic requirements, a good software project is reliable, easy to understand, and has up-to-date components and security. There are several indicators that can help you make an educated guess about whether an open source project satisfies these criteria.</p>
<h3 id="whos-using-it">Who&rsquo;s using it</h3>
<p>Taken in context, the number of people already using an open source project may be indicative of how good it is. If a project has a hundred users, for instance, it stands to reason that someone has tried to use it at least a hundred times before you found it. Thus by the ancient customs of &ldquo;I don&rsquo;t know what&rsquo;s in that cave, you go first,&rdquo; it&rsquo;s more likely to be fine.</p>
<p>You can draw conclusions about a project&rsquo;s user base by looking at available statistics. Depending on your platform, these may include the number of downloads, reviews, issues or tickets, comments, contributions, forks, or &ldquo;stars,&rdquo; whatever those are.</p>
<p>Evaluate social statistics on platforms like GitHub with a grain of salt. They can help you determine how popular a project may be, but only in the same way that restaurant review apps can help you figure out if you should eat at Foo&rsquo;s Grill &amp; Bar. Depending on where Foo&rsquo;s Grill &amp; Bar is, when it opened, and how likely people are to be near it when the inevitable steak craving should call, having twenty-six reviews may be a good sign or a terrible one. While you would not expect a project that addresses a very obscure use case or technology to have hundreds of users, having a few active users is, in such a case, just as confidence-inspiring.</p>
<p>External validation can also be useful. For example, packages that are included in a Linux operating system distribution (distro) must conform to stringent standards and undergo vetting. Choosing software that is included in a distro&rsquo;s default repositories can mean it&rsquo;s more likely to be secure.</p>
<p>Perhaps one of the best indications to look for is whether a project&rsquo;s development team is using their own project. Look for issues, discussions, or blog posts that show that the project&rsquo;s creators and maintainers are using what they&rsquo;ve built themselves. Commonly referred to as <a href="https://en.wikipedia.org/wiki/Eating_your_own_dog_food">&ldquo;eating your own dog food,&rdquo;</a> or &ldquo;dogfooding,&rdquo; it&rsquo;s an indicator that the project is most likely to be well-maintained by its developers.</p>
<h3 id="whos-building-it">Who&rsquo;s building it</h3>
<p>The main enemy of good open source software is usually a lack of interest. The parties involved in an open source project can make the difference between a flash-in-the-pan library and a respected long-term utility. Multiple committed maintainers, even making contributions in their spare time, have a much higher success rate of sustaining a project and generating interest.</p>
<p>Projects with healthy interest are usually supported by, and in turn cultivate, a community of contributors and users. New contributors may be actively welcomed, clear guides are available explaining how to help, and project maintainers are available and approachable when people have inevitable questions. Some communities even have chat rooms or forums where people can interact outside of contributions. Active communities help sustain project interest, relevance, and its ensuing quality.</p>
<p>In a less organic fashion, a project can also be sustained through organizations that sponsor it. Governments and companies with financial interest are open source patrons too, and a project that enjoys public sector use or financial backing has added incentive to remain relevant and useful.</p>
<h3 id="how-alive-is-it">How alive is it</h3>
<p>The recency and frequency of an open source project&rsquo;s activity are perhaps the best indicators of how much attention is likely paid to its security. Look at releases, commit history, changelogs, or documentation revisions to determine if a project is active. As projects vary in size and scope, here are some general things to look for.</p>
<p>Maintaining security is an ongoing endeavor that requires regular monitoring and updates, especially for projects with third-party components. These may be libraries or any part of the project that relies on something outside itself, such as a payment gateway integration. An inactive project is more likely to have outdated code or use outdated versions of components. For a more concrete determination, you can research a project&rsquo;s third-party components and compare their most recent patches or updates with the project&rsquo;s last updates.</p>
<p>Projects without third-party components may have no outside updates to apply. In these cases, you can use recent activity and release notes to determine how committed a project&rsquo;s maintainers may be. Generally, active projects should show updates within the last few months, with a notable release within the last year. This can be a good indication of whether the project is using an up-to-date version of its language or framework.</p>
<p>You can also judge how active a project may be by looking at the project maintainers themselves. Active maintainers quickly respond to feedback or new issues, even if it&rsquo;s just to say, &ldquo;We&rsquo;re on it.&rdquo; If the project has a community, its maintainers are a part of it. They may have a dedicated website or write regular blogs. They may offer ways to contact them directly and privately, especially to raise security concerns.</p>
<h3 id="can-you-understand-it">Can you understand it</h3>
<p>Having documentation is a baseline requirement for a project that&rsquo;s intended for anyone but its creator to use. Good open source projects have documentation that is easy to follow, honest, and thorough.</p>
<p>Having <a href="https://victoria.dev/blog/word-bugs-in-software-documentation-and-how-to-fix-them/">well-written documentation</a> is one way a project can stand out and demonstrate the thoughtfulness and dedication of its maintainers. A &ldquo;Getting Started&rdquo; section may detail all the requirements and initial setup for running the project. An accurate list of topics in the documentation enables users to quickly find the information they need. A clear license statement leaves no doubt as to how the project can be used, and for what purposes. These are characteristic aspects of documentation that serves its users.</p>
<p>A project that is following sound coding practices likely has code that is as readable as its documentation. Code that is easy to read lends itself to being understood. Generally, it has clearly defined and appropriately-named functions and variables, a logical flow, and apparent purpose. Readable code is easier to fix, secure, and build upon.</p>
<h3 id="how-compatible-is-it">How compatible is it</h3>
<p>A few factors will determine how compatible a project is with your goals. These are objective qualities, and can be determined by looking at a project&rsquo;s repository files. They include:</p>
<ul>
<li>Code language</li>
<li>Specific technologies or frameworks</li>
<li>License compatibility</li>
</ul>
<p>Compatibility doesn&rsquo;t necessarily mean a direct match. Different code languages can interact with each other, as can various technologies and frameworks. You should carefully read a project&rsquo;s license to understand if it permits usage for your goal, or if it is compatible with a license you would like to use.</p>
<p>Ultimately, a project that satisfies all these criteria may still not quite suit your use case. Part of the beauty of open source software, however, is that you may still benefit from it by making alterations that better suit your usage. If those alterations make the project better for everyone, you can pay it back and pay it forward by contributing your work to the project.</p>
<h2 id="proper-care-and-feeding-of-an-open-source-project">Proper care and feeding of an open source project</h2>
<p>Once you adopt an open source project, a little attention is required to make sure it continues to be a boon to your goals. While its maintainers will look after the upstream project files, you alone are responsible for your own copy. Like all software, your open source project must be well-maintained in order to remain as secure and useful as possible.</p>
<p>Have a system that provides you with notifications when updates for your software are made available. Update software promptly, treating each patch as if it were vital to security; it may well be. Keep in mind that open source project creators and maintainers are, in most cases, acting only out of the goodness of their own hearts. If you&rsquo;ve got a particularly awesome one, its developers may make updates and security patches available on a regular basis. It&rsquo;s up to you to keep tabs on updates and promptly apply them.</p>
<p>As with most things in software, keeping your open source additions modular can come in handy. You might use <a href="https://git-scm.com/book/en/v2/Git-Tools-Submodules">git submodules</a>, branches, or environments to isolate your additions. This can make it easier to apply updates or pinpoint the source of any bugs that arise.</p>
<p>So although an open source project may cost no money, <em>caveat emptor,</em> which means, &ldquo;Jimmy, if we get you a puppy, it&rsquo;s your responsibility to take care of it.&rdquo;</p>
If you want to build a treehouse, start at the bottom
https://victoria.dev/blog/if-you-want-to-build-a-treehouse-start-at-the-bottom/
Mon, 11 May 2020 05:46:47 -0400
hello@victoria.dev (Victoria Drake)
How threat modeling and pushing left help create a stable foundation for secure software.
<p>If you&rsquo;ve ever watched a kid draw a treehouse, you have some idea of how applications are built when security isn&rsquo;t made a priority. It&rsquo;s far more fun to draw the tire swing, front porch, and swimming pool than to worry about how a ten-thousand-gallon bucket of water stays suspended in midair. With too much attention spent on fun and flashy features, foundations suffer.</p>
<p><img src="for-the-turrets.png" alt="A comic I drew about building castles with poor foundations. It&rsquo;s not that funny."></p>
<p>Of course, spending undue hours building a back end like Fort Knox may not be necessary for your application, either. Being an advocate for security doesn&rsquo;t mean always wearing your tinfoil hat (although you do look dashing in it) but does mean building in an appropriate amount of security.</p>
<p>How much security is appropriate? The answer, frustratingly, is, &ldquo;it depends.&rdquo; The right amount of security for your application depends on who&rsquo;s using it, what it does, and most importantly, what undesirable things it could be made to do. It takes some analysis to make decisions about the kinds of risks your application faces and how you&rsquo;ll prepare to handle them. Okay, now&rsquo;s a good time to don your tinfoil hat. Let&rsquo;s imagine the worst.</p>
<h2 id="threat-modeling-whats-the-worst-that-could-happen">Threat modeling: what&rsquo;s the worst that could happen</h2>
<p>A <em>threat model</em> is a stuffy term for the result of trying to imagine the worst things that could happen to an application. Using your imagination to assess risks (fittingly called <em>risk assessment</em>) is a conveniently non-destructive method for finding ways an application can be attacked. You won&rsquo;t need any tools; just an understanding of how the application might work, and a little imagination. You&rsquo;ll want to record your results with pen and paper. For the younger folks, that means the notes app on your phone.</p>
<p>A few different methodologies for application risk assessment can be found in the software world, including the in-depth <a href="https://csrc.nist.gov/publications/detail/sp/800-30/rev-1/final">NIST Special Publication 800-30</a>. Each method&rsquo;s framework has specific steps and output, and will go into various levels of detail when it comes to defining threats. If following a framework, first choose the one you&rsquo;re most likely to complete. You can always add more depth and detail from there.</p>
<p>Even informal risk assessments are beneficial. Typically taking the form of a set of questions, they may be oriented around possible threats, the impact to assets, or ways a vulnerability could be exploited. Here are some examples of questions addressing each orientation:</p>
<ul>
<li>What kind of adversary would want to break my app? What would they be after?</li>
<li>If the control of <em>x</em> fell into the wrong hands, what could an attacker do with it?</li>
<li>Where could an <em>x</em> vulnerability occur in my app?</li>
</ul>
<p>A basic threat model explains the technical, business, and human considerations for each risk. It will typically detail:</p>
<ul>
<li>The vulnerabilities or components that can cause the risk</li>
<li>The impact that a successful execution of the risk would have on the application</li>
<li>The consequences for the application&rsquo;s users or organization</li>
</ul>
<p>The result of a risk assessment exercise is your threat model; in other words, a list of things you would very much like not to occur. It is usually sorted in a hierarchy of risks, from the worst to the mildest. The worst risks have the most negative impact, and are most important to protect against. The mildest risks are the most acceptable: while still an undesirable outcome, they have the least negative impact on the application and users.</p>
<p>You can use this resulting hierarchy as a guide to determine how much of your cybersecurity efforts to apply to each risk area. An appropriate amount of security for your application will eliminate (where possible) or mitigate the worst risks.</p>
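<p>To make the hierarchy concrete, here&rsquo;s a small sketch in Go of a threat model as a sorted list. The <code>Risk</code> type, its field names, the 1-to-5 impact scale, and the example entries are all assumptions made up for illustration; a real risk assessment framework will define its own fields and scoring.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// Risk is one entry in a threat model. The field names and the 1-5 impact
// scale are illustrative assumptions, not part of any formal framework.
type Risk struct {
	Name        string
	Cause       string // vulnerability or component that can cause the risk
	Impact      int    // 1 (mildest) to 5 (worst) effect on the application
	Consequence string // what it means for users or the organization
}

// rank returns a copy of the risks ordered from worst to mildest,
// leaving the caller's slice untouched.
func rank(risks []Risk) []Risk {
	ranked := append([]Risk(nil), risks...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return ranked[i].Impact > ranked[j].Impact
	})
	return ranked
}

func main() {
	// Hypothetical risks for an imaginary web application.
	model := rank([]Risk{
		{"Verbose error pages", "debug mode left on", 2, "internal paths disclosed"},
		{"Account takeover", "weak session handling", 5, "user data exposed"},
		{"Comment spam", "no rate limiting", 1, "cluttered discussion threads"},
	})

	for _, r := range model {
		fmt.Printf("%d  %s (%s): %s\n", r.Impact, r.Name, r.Cause, r.Consequence)
	}
}
```

<p>The entries at the top of the list are where most of your security effort should go; the ones at the bottom are candidates for mitigation later, or for accepting as-is.</p>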
<h2 id="pushing-left">Pushing left</h2>
<p>Although it sounds like a dance move meme, <em>pushing left</em> refers instead to building in as much of your planned security as possible in the early stages of software development.</p>
<p>Building software is a lot like building a treehouse, just without the pleasant fresh air. You start with the basic supporting components, such as attaching a platform to a tree. Then comes the framing, walls, and roof, and finally, your rustic-modern Instagram-worthy wall hangings and deer bust.</p>
<p>The further along in the build process you are, the harder and more costly it becomes to make changes to a component that you&rsquo;ve already installed. If you discover a problem with the walls only after the roof is put in place, you may need to change or remove the roof in order to fix it. Similar parallels can be drawn for software components, only without similar ease in detangling the attached parts.</p>
<p>In the case of a treehouse, it&rsquo;s rather impossible to start with decorations or even a roof, since you can&rsquo;t really suspend them in midair. In the case of software development, it is, unfortunately, possible to build many top-layer components and abstractions without a sufficient supporting architecture. A push-left approach views each additional layer as adding cost and complication. Pushing left means attempting to mitigate security risks as much as possible at each development stage before proceeding to the next.</p>
<h2 id="building-bottom-to-top">Building bottom-to-top</h2>
<p>By considering your threat model in the early stages of developing your application, you reduce the chances of necessitating a costly remodel later on. You can make choices about architecture, components, and code that support the main security goals of your particular application.</p>
<p>While it&rsquo;s not possible to foresee all the functionality your application may one day need to support, it is possible to prepare a solid foundation that allows additional functionality to be added more securely. Building in appropriate security from the bottom to the top will help make mitigating security risks much easier in the future.</p>
Hugo vs Jekyll: an epic battle of static site generator themes
https://victoria.dev/blog/hugo-vs-jekyll-an-epic-battle-of-static-site-generator-themes/
Mon, 27 Apr 2020 06:34:41 -0400
hello@victoria.dev (Victoria Drake)
A comparison of nuances of creating themes for the top two static site generators.
<p>I recently took on the task of creating a documentation site theme for two projects. Both projects needed the same basic features, but one uses Jekyll while the other uses Hugo.</p>
<p>In typical developer rationality, there was clearly only one option. I decided to create the same theme in both frameworks, and to give you, dear reader, a side-by-side comparison.</p>
<p>This post isn&rsquo;t a comprehensive theme-building guide, but is intended to familiarize you with the process of building a theme in either generator. Here&rsquo;s what we&rsquo;ll cover:</p>
<ul>
<li><a href="#how-theme-files-are-organized">How theme files are organized</a></li>
<li><a href="#where-to-put-content">Where to put content</a></li>
<li><a href="#how-templating-works">How templating works</a></li>
<li><a href="#creating-a-top-level-menu-with-the-pages-object">Creating a top-level menu with the <code>pages</code> object</a></li>
<li><a href="#creating-a-menu-with-nested-links-from-a-data-list">Creating a menu with nested links from a data list</a></li>
<li><a href="#putting-the-template-together">Putting the template together</a></li>
<li><a href="#create-a-stylesheet">Create a stylesheet</a>
<ul>
<li><a href="#sass-and-css-in-jekyll">Sass and CSS in Jekyll</a></li>
<li><a href="#sass-and-hugo-pipes-in-hugo">Sass and Hugo Pipes in Hugo</a></li>
</ul>
</li>
<li><a href="#configure-and-deploy-to-github-pages">Configure and deploy to GitHub Pages</a>
<ul>
<li><a href="#configure-jekyll">Configure Jekyll</a></li>
<li><a href="#configure-hugo">Configure Hugo</a></li>
<li><a href="#deploy-to-github-pages">Deploy to GitHub Pages</a></li>
</ul>
</li>
<li><a href="#showtime">Showtime!</a></li>
<li><a href="#wait-who-won">Wait who won?</a></li>
</ul>
<p>Here&rsquo;s a crappy wireframe of the theme I&rsquo;m going to create.</p>
<p><img src="wireframe.jpg" alt="A sketch of the finished page"></p>
<p>If you&rsquo;re planning to build along, it may be helpful to serve the theme locally as you build it; both generators offer this functionality. For Jekyll, run <code>jekyll serve</code>, and for Hugo, <code>hugo serve</code>.</p>
<p>There are two main elements: the main content area, and the all-important sidebar menu. To create them, you&rsquo;ll need template files that tell the site generator how to generate the HTML page. To organize theme template files in a sensible way, you first need to know what directory structure the site generator expects.</p>
<h2 id="how-theme-files-are-organized">How theme files are organized</h2>
<p>Jekyll supports gem-based themes, which users can install like any other Ruby gems. This method hides theme files in the gem, so for the purposes of this comparison, we aren&rsquo;t using gem-based themes.</p>
<p>When you run <code>jekyll new-theme &lt;name&gt;</code>, Jekyll will scaffold a new theme for you. Here&rsquo;s what those files look like:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">.
├── assets
├── Gemfile
├── _includes
├── _layouts
│   ├── default.html
│   ├── page.html
│   └── post.html
├── LICENSE.txt
├── README.md
├── _sass
└── &lt;name&gt;.gemspec
</code></pre></div><p>The directory names are appropriately descriptive. The <code>_includes</code> directory is for small bits of code that you reuse in different places, in much the same way you&rsquo;d put butter on everything. (Just me?) The <code>_layouts</code> directory contains templates for different types of pages on your site. The <code>_sass</code> folder is for <a href="https://sass-lang.com/documentation/syntax">Sass</a> files used to build your site&rsquo;s stylesheet.</p>
<p>You can scaffold a new Hugo theme by running <code>hugo new theme &lt;name&gt;</code>. It has these files:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">.
├── archetypes
│   └── default.md
├── layouts
│   ├── 404.html
│   ├── _default
│   │   ├── baseof.html
│   │   ├── list.html
│   │   └── single.html
│   ├── index.html
│   └── partials
│       ├── footer.html
│       ├── header.html
│       └── head.html
├── LICENSE
├── static
│   ├── css
│   └── js
└── theme.toml
</code></pre></div><p>You can see some similarities. Hugo&rsquo;s page template files are tucked into <code>layouts/</code>. Note that the <code>_default</code> page type has files for a <code>list.html</code> and a <code>single.html</code>. Unlike Jekyll, Hugo uses these specific file names to distinguish between <a href="https://gohugo.io/templates/lists/">list pages</a> (like a page with links to all your blog posts on it) and <a href="https://gohugo.io/templates/single-page-templates/">single pages</a> (like one of your blog posts). The <code>layouts/partials/</code> directory contains the buttery reusable bits, and stylesheet files have a spot picked out in <code>static/css/</code>.</p>
<p>These directory structures aren&rsquo;t set in stone, as both site generators allow some measure of customization. For example, Jekyll lets you define <a href="https://jekyllrb.com/docs/collections/">collections</a>, and Hugo makes use of <a href="https://gohugo.io/content-management/page-bundles/">page bundles</a>. These features let you organize your content in multiple ways, but for now, let&rsquo;s look at where to put some simple pages.</p>
<h2 id="where-to-put-content">Where to put content</h2>
<p>To create a site menu that looks like this:</p>
<div class="highlight"><pre class="chroma"><code class="language-md" data-lang="md">Introduction
    Getting Started
    Configuration
    Deploying
Advanced Usage
    All Configuration Settings
    Customizing
    Help and Support
</code></pre></div><p>You&rsquo;ll need two sections (&ldquo;Introduction&rdquo; and &ldquo;Advanced Usage&rdquo;) containing their respective subsections.</p>
<p>Jekyll isn&rsquo;t strict with its content location. It expects pages in the root of your site, and will build whatever&rsquo;s there. Here&rsquo;s how you might organize these pages in your Jekyll site root:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">.
├── 404.html
├── assets
├── Gemfile
├── _includes
├── index.markdown
├── intro
│   ├── config.md
│   ├── deploy.md
│   ├── index.md
│   └── quickstart.md
├── _layouts
│   ├── default.html
│   ├── page.html
│   └── post.html
├── LICENSE.txt
├── README.md
├── _sass
├── &lt;name&gt;.gemspec
└── usage
    ├── customizing.md
    ├── index.md
    ├── settings.md
    └── support.md
</code></pre></div><p>You can change the location of the site source in your <a href="https://jekyllrb.com/docs/configuration/default/">Jekyll configuration</a>.</p>
<p>In Hugo, all rendered content is expected in the <code>content/</code> folder. This prevents Hugo from trying to render pages you don&rsquo;t want, such as <code>404.html</code>, as site content. Here&rsquo;s how you might organize your <code>content/</code> directory in Hugo:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">.
├── _index.md
├── intro
│   ├── config.md
│   ├── deploy.md
│   ├── _index.md
│   └── quickstart.md
└── usage
    ├── customizing.md
    ├── _index.md
    ├── settings.md
    └── support.md
</code></pre></div><p>To Hugo, <code>_index.md</code> and <code>index.md</code> mean different things: <code>_index.md</code> marks a Branch bundle, a section that can contain child pages, while <code>index.md</code> marks a Leaf bundle, which has no children. It can be helpful to know what kind of <a href="https://gohugo.io/content-management/page-bundles/">Page Bundle</a> you want for each section.</p>
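<p>As a quick way to remember the naming rule, here&rsquo;s a tiny Go sketch that classifies a content file by its base name. The function <code>bundleKind</code> is a made-up helper for illustration, not part of Hugo; Hugo applies this convention internally when it reads your <code>content/</code> directory.</p>

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// bundleKind is a hypothetical helper that mirrors Hugo's naming convention:
// _index.* marks a branch bundle (a section that can have child pages),
// index.* marks a leaf bundle (a single page with no children),
// and anything else is treated here as a regular page.
func bundleKind(file string) string {
	base := path.Base(file)
	name := strings.TrimSuffix(base, path.Ext(base))
	switch name {
	case "_index":
		return "branch"
	case "index":
		return "leaf"
	default:
		return "regular page"
	}
}

func main() {
	// Paths taken from the content/ layout shown above.
	for _, f := range []string{"intro/_index.md", "intro/config.md", "usage/_index.md"} {
		fmt.Printf("%-18s %s\n", f, bundleKind(f))
	}
}
```

<p>In the layout above, both <code>intro/</code> and <code>usage/</code> use <code>_index.md</code>, since each section needs to hold the subsection pages for the menu.</p>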
<p>Now that you have some idea of where to put things, let&rsquo;s look at how to build a page template.</p>
<h2 id="how-templating-works">How templating works</h2>
<p>Jekyll page templates are built with the <a href="https://jekyllrb.com/docs/liquid/">Liquid templating language</a>. It uses braces to output variable content to a page, such as the page&rsquo;s title: <code>{{ page.title }}</code>.</p>
<p>Hugo&rsquo;s templates also use braces, but they&rsquo;re built with <a href="https://gohugo.io/templates/introduction/">Go Templates</a>. The syntax is similar, but different: <code>{{ .Title }}</code>.</p>
<p>Both Liquid and Go Templates can handle logic. Liquid uses <em>tags</em> syntax to denote logic operations:</p>
<pre><code class="language-liquid" data-lang="liquid">{% if user %}
Hello {{ user.name }}!
{% endif %}
</code></pre><p>And Go Templates places its functions and arguments in its braces syntax:</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="p">{{</span> <span class="k">if</span> <span class="p">.</span><span class="nx">User</span> <span class="p">}}</span>
<span class="nx">Hello</span> <span class="p">{{</span> <span class="p">.</span><span class="nx">User</span> <span class="p">}}!</span>
<span class="p">{{</span> <span class="nx">end</span> <span class="p">}}</span>
</code></pre></div><p>Templating languages allow you to build one skeleton HTML page, then tell the site generator to put variable content in areas you define. Let&rsquo;s compare two possible <code>default</code> page templates for Jekyll and Hugo.</p>
<p>Jekyll&rsquo;s scaffold <code>default</code> theme is bare, so we&rsquo;ll look at their starter theme <a href="https://github.com/jekyll/minima">Minima</a>. Here&rsquo;s <code>_layouts/default.html</code> in Jekyll (Liquid):</p>
<div class="highlight"><pre class="chroma"><code class="language-html" data-lang="html"><span class="cp">&lt;!DOCTYPE html&gt;</span>
<span class="p">&lt;</span><span class="nt">html</span> <span class="na">lang</span><span class="o">=</span><span class="s">&#34;{{ page.lang | default: site.lang | default: &#34;</span><span class="na">en</span><span class="err">&#34;</span> <span class="err">}}&#34;</span><span class="p">&gt;</span>
{%- include head.html -%}
<span class="p">&lt;</span><span class="nt">body</span><span class="p">&gt;</span>
{%- include header.html -%}
<span class="p">&lt;</span><span class="nt">main</span> <span class="na">class</span><span class="o">=</span><span class="s">&#34;page-content&#34;</span> <span class="na">aria-label</span><span class="o">=</span><span class="s">&#34;Content&#34;</span><span class="p">&gt;</span>
<span class="p">&lt;</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">&#34;wrapper&#34;</span><span class="p">&gt;</span>
{{ content }}
<span class="p">&lt;/</span><span class="nt">div</span><span class="p">&gt;</span>
<span class="p">&lt;/</span><span class="nt">main</span><span class="p">&gt;</span>
{%- include footer.html -%}
<span class="p">&lt;/</span><span class="nt">body</span><span class="p">&gt;</span>
<span class="p">&lt;/</span><span class="nt">html</span><span class="p">&gt;</span>
</code></pre></div><p>Here&rsquo;s Hugo&rsquo;s scaffold theme <code>layouts/_default/baseof.html</code> (Go Templates):</p>
<div class="highlight"><pre class="chroma"><code class="language-html" data-lang="html"><span class="cp">&lt;!DOCTYPE html&gt;</span>
<span class="p">&lt;</span><span class="nt">html</span><span class="p">&gt;</span>
{{- partial &#34;head.html&#34; . -}}
<span class="p">&lt;</span><span class="nt">body</span><span class="p">&gt;</span>
{{- partial &#34;header.html&#34; . -}}
<span class="p">&lt;</span><span class="nt">div</span> <span class="na">id</span><span class="o">=</span><span class="s">&#34;content&#34;</span><span class="p">&gt;</span>
{{- block &#34;main&#34; . }}{{- end }}
<span class="p">&lt;/</span><span class="nt">div</span><span class="p">&gt;</span>
{{- partial &#34;footer.html&#34; . -}}
<span class="p">&lt;/</span><span class="nt">body</span><span class="p">&gt;</span>
<span class="p">&lt;/</span><span class="nt">html</span><span class="p">&gt;</span>
</code></pre></div><p>Different syntax, same idea. Both templates pull in reusable bits for <code>head.html</code>, <code>header.html</code>, and <code>footer.html</code>. These show up on a lot of pages, so it makes sense not to have to repeat yourself. Both templates also have a spot for the main content, though the Jekyll template uses a variable (<code>{{ content }}</code>) while Hugo uses a block (<code>{{- block &quot;main&quot; . }}{{- end }}</code>). <a href="https://gohugo.io/templates/base/#readout">Blocks</a> are just another way Hugo lets you define reusable bits.</p>
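<p>If blocks feel a bit abstract, here&rsquo;s a minimal sketch of the mechanism using Go&rsquo;s standard <code>text/template</code> package directly. Hugo wires this up for you based on file names; the constants <code>base</code> and <code>single</code> and the <code>render</code> function are stand-ins invented for this example, playing the roles of <code>baseof.html</code> and a page template that fills in the <code>main</code> block.</p>

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// base plays the role of baseof.html: a skeleton with a "main" block
// that renders placeholder text unless another template overrides it.
const base = `<div id="content">{{ block "main" . }}default content{{ end }}</div>`

// single plays the role of a page template that defines the "main" block.
const single = `{{ define "main" }}<h1>{{ .Title }}</h1>{{ end }}`

// render parses the skeleton, layers the override on top of it,
// and executes the skeleton with the given page title.
func render(title string) (string, error) {
	tmpl, err := template.New("baseof").Parse(base)
	if err != nil {
		return "", err
	}
	// Parsing text that contains only definitions redefines "main"
	// and leaves the "baseof" skeleton itself in place.
	if _, err := tmpl.Parse(single); err != nil {
		return "", err
	}
	var out strings.Builder
	err = tmpl.Execute(&out, map[string]string{"Title": title})
	return out.String(), err
}

func main() {
	page, err := render("Getting Started")
	if err != nil {
		panic(err)
	}
	fmt.Println(page)
}
```

<p>The skeleton never needs to know which page it&rsquo;s wrapping. Each page type supplies its own <code>main</code> definition, which is the job <code>list.html</code> and <code>single.html</code> do in a Hugo theme.</p>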
<p>Now that you know how templating works, you can build the sidebar menu for the theme.</p>
<h2 id="creating-a-top-level-menu-with-the-pages-object">Creating a top-level menu with the <code>pages</code> object</h2>
<p>You can programmatically create a top-level menu from your pages. It will look like this:</p>
<div class="highlight"><pre class="chroma"><code class="language-md" data-lang="md">Introduction
Advanced Usage
</code></pre></div><p>Let&rsquo;s start with Jekyll. You can display links to site pages in your Liquid template by iterating through the <code>site.pages</code> object that Jekyll provides and building a list:</p>
<div class="highlight"><pre class="chroma"><code class="language-html" data-lang="html"><span class="p">&lt;</span><span class="nt">ul</span><span class="p">&gt;</span>
{% for page in site.pages %}
<span class="p">&lt;</span><span class="nt">li</span><span class="p">&gt;&lt;</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">&#34;{{ page.url | absolute_url }}&#34;</span><span class="p">&gt;</span>{{ page.title }}<span class="p">&lt;/</span><span class="nt">a</span><span class="p">&gt;&lt;/</span><span class="nt">li</span><span class="p">&gt;</span>
{% endfor %}
<span class="p">&lt;/</span><span class="nt">ul</span><span class="p">&gt;</span>
</code></pre></div><p>This returns all of the site&rsquo;s pages, including ones you might not want, like <code>404.html</code>. You can filter for the pages you actually want with a couple more tags, for example by including a page only if it has a <code>section: true</code> parameter set in its front matter:</p>
<div class="highlight"><pre class="chroma"><code class="language-html" data-lang="html"><span class="p">&lt;</span><span class="nt">ul</span><span class="p">&gt;</span>
{% for page in site.pages %}
{%- if page.section -%}
<span class="p">&lt;</span><span class="nt">li</span><span class="p">&gt;&lt;</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">&#34;{{ page.url | absolute_url }}&#34;</span><span class="p">&gt;</span>{{ page.title }}<span class="p">&lt;/</span><span class="nt">a</span><span class="p">&gt;&lt;/</span><span class="nt">li</span><span class="p">&gt;</span>
{%- endif -%}
{% endfor %}
<span class="p">&lt;/</span><span class="nt">ul</span><span class="p">&gt;</span>
</code></pre></div><p>You can achieve the same effect with slightly less code in Hugo. Loop through Hugo&rsquo;s <code>.Pages</code> object using Go Template&rsquo;s <a href="https://golang.org/pkg/text/template/#hdr-Actions"><code>range</code> action</a>:</p>
<div class="highlight"><pre class="chroma"><code class="language-html" data-lang="html"><span class="p">&lt;</span><span class="nt">ul</span><span class="p">&gt;</span>
{{ range .Pages }}
<span class="p">&lt;</span><span class="nt">li</span><span class="p">&gt;</span>
<span class="p">&lt;</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">&#34;{{.Permalink}}&#34;</span><span class="p">&gt;</span>{{.Title}}<span class="p">&lt;/</span><span class="nt">a</span><span class="p">&gt;</span>
<span class="p">&lt;/</span><span class="nt">li</span><span class="p">&gt;</span>
{{ end }}
<span class="p">&lt;/</span><span class="nt">ul</span><span class="p">&gt;</span>
</code></pre></div><p>This template uses the <code>.Pages</code> object to return all the top-level pages in <code>content/</code> of your Hugo site. Since Hugo uses a specific folder for the site content you want rendered, there&rsquo;s no additional filtering necessary to build a simple menu of site pages.</p>
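<p>By contrast, the Jekyll filter shown earlier only picks up pages that opt in through their front matter. As a minimal sketch, assuming a hypothetical page at <code>intro.md</code> in the site root (the file name and title are illustrative, not taken from the theme), the front matter might look like this:</p>
<div class="highlight"><pre class="chroma"><code class="language-yml" data-lang="yml">---
title: Introduction
section: true  # custom parameter read by the page.section check in the Liquid loop
---
</code></pre></div><p>Pages that don&rsquo;t set the parameter, like <code>404.html</code>, are skipped because <code>page.section</code> is not defined for them.</p>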
<h2 id="creating-a-menu-with-nested-links-from-a-data-list">Creating a menu with nested links from a data list</h2>
<p>Both site generators can use a separately defined data list of links to render a menu in your template. This is more suitable for creating nested links, like this:</p>
<div class="highlight"><pre class="chroma"><code class="language-md" data-lang="md">Introduction
    Getting Started
    Configuration
    Deploying
Advanced Usage
    All Configuration Settings
    Customizing
    Help and Support
</code></pre></div><p>Jekyll supports <a href="https://jekyllrb.com/docs/datafiles/">data files</a> in a few formats, including YAML. Here&rsquo;s the definition for the menu above in <code>_data/menu.yml</code>:</p>
<div class="highlight"><pre class="chroma"><code class="language-yml" data-lang="yml"><span class="k">section</span><span class="p">:</span><span class="w">
</span><span class="w">  </span>- <span class="k">page</span><span class="p">:</span><span class="w"> </span>Introduction<span class="w">
</span><span class="w">    </span><span class="k">url</span><span class="p">:</span><span class="w"> </span>/intro<span class="w">
</span><span class="w">    </span><span class="k">subsection</span><span class="p">:</span><span class="w">
</span><span class="w">      </span>- <span class="k">page</span><span class="p">:</span><span class="w"> </span>Getting<span class="w"> </span>Started<span class="w">
</span><span class="w">        </span><span class="k">url</span><span class="p">:</span><span class="w"> </span>/intro/quickstart<span class="w">
</span><span class="w">      </span>- <span class="k">page</span><span class="p">:</span><span class="w"> </span>Configuration<span class="w">
</span><span class="w">        </span><span class="k">url</span><span class="p">:</span><span class="w"> </span>/intro/config<span class="w">
</span><span class="w">      </span>- <span class="k">page</span><span class="p">:</span><span class="w"> </span>Deploying<span class="w">
</span><span class="w">        </span><span class="k">url</span><span class="p">:</span><span class="w"> </span>/intro/deploy<span class="w">
</span><span class="w">  </span>- <span class="k">page</span><span class="p">:</span><span class="w"> </span>Advanced<span class="w"> </span>Usage<span class="w">
</span><span class="w">    </span><span class="k">url</span><span class="p">:</span><span class="w"> </span>/usage<span class="w">
</span><span class="w">    </span><span class="k">subsection</span><span class="p">:</span><span class="w">
</span><span class="w">      </span>- <span class="k">page</span><span class="p">:</span><span class="w"> </span>All<span class="w"> </span>Configuration<span class="w"> </span>Settings<span class="w">
</span><span class="w">        </span><span class="k">url</span><span class="p">:</span><span class="w"> </span>/usage/settings<span class="w">
</span><span class="w">      </span>- <span class="k">page</span><span class="p">:</span><span class="w"> </span>Customizing<span class="w">
</span><span class="w">        </span><span class="k">url</span><span class="p">:</span><span class="w"> </span>/usage/customizing<span class="w">
</span><span class="w">      </span>- <span class="k">page</span><span class="p">:</span><span class="w"> </span>Help<span class="w"> </span>and<span class="w"> </span>Support<span class="w">
</span><span class="w">        </span><span class="k">url</span><span class="p">:</span><span class="w"> </span>/usage/support<span class="w">
</span></code></pre></div><p>Here&rsquo;s how to render the data in the sidebar template:</p>
<div class="highlight"><pre class="chroma"><code class="language-html" data-lang="html">{% for a in site.data.menu.section %}
<span class="p">&lt;</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">&#34;{{ a.url }}&#34;</span><span class="p">&gt;</span>{{ a.page }}<span class="p">&lt;/</span><span class="nt">a</span><span class="p">&gt;</span>
<span class="p">&lt;</span><span class="nt">ul</span><span class="p">&gt;</span>
{% for b in a.subsection %}
<span class="p">&lt;</span><span class="nt">li</span><span class="p">&gt;&lt;</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">&#34;{{ b.url }}&#34;</span><span class="p">&gt;</span>{{ b.page }}<span class="p">&lt;/</span><span class="nt">a</span><span class="p">&gt;&lt;/</span><span class="nt">li</span><span class="p">&gt;</span>
{% endfor %}
<span class="p">&lt;/</span><span class="nt">ul</span><span class="p">&gt;</span>
{% endfor %}
</code></pre></div><p>This method allows you to build a custom menu, two nesting levels deep. The nesting levels are limited by the <code>for</code> loops in the template. For a recursive version that handles further levels of nesting, see <a href="https://jekyllrb.com/tutorials/navigation/#scenario-9-nested-tree-navigation-with-recursion">Nested tree navigation with recursion</a>.</p>
<p>Hugo does something similar with its <a href="https://gohugo.io/templates/menu-templates/#section-menu-for-lazy-bloggers">menu templates</a>. You can define menu links in your <a href="https://gohugo.io/getting-started/configuration/">Hugo site config</a>, and even add useful properties that Hugo understands, like weighting. Here&rsquo;s a definition of the menu above in <code>config.yaml</code>:</p>
<div class="highlight"><pre class="chroma"><code class="language-yml" data-lang="yml"><span class="k">sectionPagesMenu</span><span class="p">:</span><span class="w"> </span>main<span class="w">
</span><span class="w">
</span><span class="w"></span><span class="k">menu</span><span class="p">:</span><span class="w">
</span><span class="w">  </span><span class="k">main</span><span class="p">:</span><span class="w">
</span><span class="w">    </span>- <span class="k">identifier</span><span class="p">:</span><span class="w"> </span>intro<span class="w">
</span><span class="w">      </span><span class="k">name</span><span class="p">:</span><span class="w"> </span>Introduction<span class="w">
</span><span class="w">      </span><span class="k">url</span><span class="p">:</span><span class="w"> </span>/intro/<span class="w">
</span><span class="w">      </span><span class="k">weight</span><span class="p">:</span><span class="w"> </span><span class="m">1</span><span class="w">
</span><span class="w">    </span>- <span class="k">name</span><span class="p">:</span><span class="w"> </span>Getting<span class="w"> </span>Started<span class="w">
</span><span class="w">      </span><span class="k">parent</span><span class="p">:</span><span class="w"> </span>intro<span class="w">
</span><span class="w">      </span><span class="k">url</span><span class="p">:</span><span class="w"> </span>/intro/quickstart/<span class="w">
</span><span class="w">      </span><span class="k">weight</span><span class="p">:</span><span class="w"> </span><span class="m">1</span><span class="w">
</span><span class="w">    </span>- <span class="k">name</span><span class="p">:</span><span class="w"> </span>Configuration<span class="w">
</span><span class="w">      </span><span class="k">parent</span><span class="p">:</span><span class="w"> </span>intro<span class="w">
</span><span class="w">      </span><span class="k">url</span><span class="p">:</span><span class="w"> </span>/intro/config/<span class="w">
</span><span class="w">      </span><span class="k">weight</span><span class="p">:</span><span class="w"> </span><span class="m">2</span><span class="w">
</span><span class="w">    </span>- <span class="k">name</span><span class="p">:</span><span class="w"> </span>Deploying<span class="w">
</span><span class="w">      </span><span class="k">parent</span><span class="p">:</span><span class="w"> </span>intro<span class="w">
</span><span class="w">      </span><span class="k">url</span><span class="p">:</span><span class="w"> </span>/intro/deploy/<span class="w">
</span><span class="w">      </span><span class="k">weight</span><span class="p">:</span><span class="w"> </span><span class="m">3</span><span class="w">
</span><span class="w">    </span>- <span class="k">identifier</span><span class="p">:</span><span class="w"> </span>usage<span class="w">
</span><span class="w">      </span><span class="k">name</span><span class="p">:</span><span class="w"> </span>Advanced<span class="w"> </span>Usage<span class="w">
</span><span class="w">      </span><span class="k">url</span><span class="p">:</span><span class="w"> </span>/usage/<span class="w">
</span><span class="w">    </span>- <span class="k">name</span><span class="p">:</span><span class="w"> </span>Customizing<span class="w">
</span><span class="w">      </span><span class="k">parent</span><span class="p">:</span><span class="w"> </span>usage<span class="w">
</span><span class="w">      </span><span class="k">url</span><span class="p">:</span><span class="w"> </span>/usage/customizing/<span class="w">
</span><span class="w">      </span><span class="k">weight</span><span class="p">:</span><span class="w"> </span><span class="m">2</span><span class="w">
</span><span class="w">    </span>- <span class="k">name</span><span class="p">:</span><span class="w"> </span>All<span class="w"> </span>Configuration<span class="w"> </span>Settings<span class="w">
</span><span class="w">      </span><span class="k">parent</span><span class="p">:</span><span class="w"> </span>usage<span class="w">
</span><span class="w">      </span><span class="k">url</span><span class="p">:</span><span class="w"> </span>/usage/settings/<span class="w">
</span><span class="w">      </span><span class="k">weight</span><span class="p">:</span><span class="w"> </span><span class="m">1</span><span class="w">
</span><span class="w">    </span>- <span class="k">name</span><span class="p">:</span><span class="w"> </span>Help<span class="w"> </span>and<span class="w"> </span>Support<span class="w">
</span><span class="w">      </span><span class="k">parent</span><span class="p">:</span><span class="w"> </span>usage<span class="w">
</span><span class="w">      </span><span class="k">url</span><span class="p">:</span><span class="w"> </span>/usage/support/<span class="w">
</span><span class="w">      </span><span class="k">weight</span><span class="p">:</span><span class="w"> </span><span class="m">3</span><span class="w">
</span></code></pre></div><p>Hugo uses the <code>identifier</code>, which must match the section name, along with the <code>parent</code> variable to handle nesting. Here&rsquo;s how to render the menu in the sidebar template:</p>
<div class="highlight"><pre class="chroma"><code class="language-html" data-lang="html"><span class="p">&lt;</span><span class="nt">ul</span><span class="p">&gt;</span>
{{ range .Site.Menus.main }}
{{ if .HasChildren }}
<span class="p">&lt;</span><span class="nt">li</span><span class="p">&gt;</span>
<span class="p">&lt;</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">&#34;{{ .URL }}&#34;</span><span class="p">&gt;</span>{{ .Name }}<span class="p">&lt;/</span><span class="nt">a</span><span class="p">&gt;</span>
<span class="p">&lt;/</span><span class="nt">li</span><span class="p">&gt;</span>
<span class="p">&lt;</span><span class="nt">ul</span> <span class="na">class</span><span class="o">=</span><span class="s">&#34;sub-menu&#34;</span><span class="p">&gt;</span>
{{ range .Children }}
<span class="p">&lt;</span><span class="nt">li</span><span class="p">&gt;</span>
<span class="p">&lt;</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">&#34;{{ .URL }}&#34;</span><span class="p">&gt;</span>{{ .Name }}<span class="p">&lt;/</span><span class="nt">a</span><span class="p">&gt;</span>
<span class="p">&lt;/</span><span class="nt">li</span><span class="p">&gt;</span>
{{ end }}
<span class="p">&lt;/</span><span class="nt">ul</span><span class="p">&gt;</span>
{{ else }}
<span class="p">&lt;</span><span class="nt">li</span><span class="p">&gt;</span>
<span class="p">&lt;</span><span class="nt">a</span> <span class="na">href</span><span class="o">=</span><span class="s">&#34;{{ .URL }}&#34;</span><span class="p">&gt;</span>{{ .Name }}<span class="p">&lt;/</span><span class="nt">a</span><span class="p">&gt;</span>
<span class="p">&lt;/</span><span class="nt">li</span><span class="p">&gt;</span>
{{ end }}
{{ end }}
<span class="p">&lt;/</span><span class="nt">ul</span><span class="p">&gt;</span>
</code></pre></div><p>The <code>range</code> action iterates over the menu data, and Hugo&rsquo;s <code>.Children</code> variable handles nested pages for you.</p>
<h2 id="putting-the-template-together">Putting the template together</h2>
<p>With your menu in your reusable sidebar bit (<code>_includes/sidebar.html</code> for Jekyll and <code>partials/sidebar.html</code> for Hugo), you can add it to the <code>default.html</code> template.</p>
<p>In Jekyll:</p>
<div class="highlight"><pre class="chroma"><code class="language-html" data-lang="html"><span class="cp">&lt;!DOCTYPE html&gt;</span>
<span class="p">&lt;</span><span class="nt">html</span> <span class="na">lang</span><span class="o">=</span><span class="s">&#34;{{ page.lang | default: site.lang | default: &#34;</span><span class="na">en</span><span class="err">&#34;</span> <span class="err">}}&#34;</span><span class="p">&gt;</span>
{%- include head.html -%}
<span class="p">&lt;</span><span class="nt">body</span><span class="p">&gt;</span>
{%- include sidebar.html -%}
{%- include header.html -%}
<span class="p">&lt;</span><span class="nt">div</span> <span class="na">id</span><span class="o">=</span><span class="s">&#34;content&#34;</span> <span class="na">class</span><span class="o">=</span><span class="s">&#34;page-content&#34;</span> <span class="na">aria-label</span><span class="o">=</span><span class="s">&#34;Content&#34;</span><span class="p">&gt;</span>
{{ content }}
<span class="p">&lt;/</span><span class="nt">div</span><span class="p">&gt;</span>
{%- include footer.html -%}
<span class="p">&lt;/</span><span class="nt">body</span><span class="p">&gt;</span>
<span class="p">&lt;/</span><span class="nt">html</span><span class="p">&gt;</span>
</code></pre></div><p>In Hugo:</p>
<div class="highlight"><pre class="chroma"><code class="language-html" data-lang="html"><span class="cp">&lt;!DOCTYPE html&gt;</span>
<span class="p">&lt;</span><span class="nt">html</span><span class="p">&gt;</span>
{{- partial &#34;head.html&#34; . -}}
<span class="p">&lt;</span><span class="nt">body</span><span class="p">&gt;</span>
{{- partial &#34;sidebar.html&#34; . -}}
{{- partial &#34;header.html&#34; . -}}
<span class="p">&lt;</span><span class="nt">div</span> <span class="na">id</span><span class="o">=</span><span class="s">&#34;content&#34;</span> <span class="na">class</span><span class="o">=</span><span class="s">&#34;page-content&#34;</span> <span class="na">aria-label</span><span class="o">=</span><span class="s">&#34;Content&#34;</span><span class="p">&gt;</span>
{{- block &#34;main&#34; . }}{{- end }}
<span class="p">&lt;/</span><span class="nt">div</span><span class="p">&gt;</span>
{{- partial &#34;footer.html&#34; . -}}
<span class="p">&lt;/</span><span class="nt">body</span><span class="p">&gt;</span>
<span class="p">&lt;/</span><span class="nt">html</span><span class="p">&gt;</span>
</code></pre></div><p>When the site is generated, each page will contain all the code from your <code>sidebar.html</code>.</p>
<h2 id="create-a-stylesheet">Create a stylesheet</h2>
<p>Both site generators accept Sass for creating CSS stylesheets. Jekyll <a href="https://jekyllrb.com/docs/assets/">has Sass processing built in</a>, and Hugo uses <a href="https://gohugo.io/hugo-pipes/scss-sass/">Hugo Pipes</a>. Both options have some quirks.</p>
<h3 id="sass-and-css-in-jekyll">Sass and CSS in Jekyll</h3>
<p>To process a Sass file in Jekyll, create your style definitions in the <code>_sass</code> directory. For example, in a file at <code>_sass/style-definitions.scss</code>:</p>
<div class="highlight"><pre class="chroma"><code class="language-scss" data-lang="scss"><span class="nv">$background-color</span><span class="o">:</span> <span class="mh">#eef</span> <span class="nv">!default</span><span class="p">;</span>
<span class="nv">$text-color</span><span class="o">:</span> <span class="mh">#111</span> <span class="nv">!default</span><span class="p">;</span>
<span class="nt">body</span> <span class="p">{</span>
<span class="nt">background-color</span><span class="nd">:</span> <span class="err">$</span><span class="nt">background-color</span><span class="p">;</span>
<span class="nt">color</span><span class="nd">:</span> <span class="err">$</span><span class="nt">text-color</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div><p>Jekyll won&rsquo;t generate this file directly, as it only processes files with front matter. To set the final path of your site&rsquo;s stylesheet, create a placeholder file with empty front matter where you want the <code>.css</code> file to appear, for example <code>assets/css/style.scss</code>. In this file, simply import your styles:</p>
<div class="highlight"><pre class="chroma"><code class="language-scss" data-lang="scss"><span class="nt">---</span>
<span class="nt">---</span>
<span class="o">@</span><span class="nt">import</span> <span class="s2">&#34;style-definitions&#34;</span><span class="p">;</span>
</code></pre></div><p>This rather hackish configuration has an upside: you can use Liquid template tags and variables in your placeholder file. This is a nice way to allow users to set variables from the site <code>_config.yml</code>, for example.</p>
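<p>As a rough sketch of that idea, assuming a hypothetical <code>background_color</code> key in <code>_config.yml</code> (the key name is made up for illustration, and its value needs quoting in YAML, as in <code>background_color: &#34;#ddf&#34;</code>, so that <code>#</code> isn&rsquo;t read as a comment), the placeholder file could set the Sass variable before the import:</p>
<div class="highlight"><pre class="chroma"><code class="language-scss" data-lang="scss">---
---
// Hypothetical key: read a color from _config.yml, falling back to #eef if it is unset
$background-color: {{ site.background_color | default: '#eef' }};
@import &#34;style-definitions&#34;;
</code></pre></div><p>Because the definitions in <code>_sass/style-definitions.scss</code> use <code>!default</code>, a value assigned before the import takes precedence over the one in that file.</p>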
<p>The resulting CSS stylesheet in your generated site has the path <code>/assets/css/style.css</code>. You can link to it in your site&rsquo;s <code>head.html</code> using:</p>
<div class="highlight"><pre class="chroma"><code class="language-html" data-lang="html"><span class="p">&lt;</span><span class="nt">link</span> <span class="na">rel</span><span class="o">=</span><span class="s">&#34;stylesheet&#34;</span> <span class="na">href</span><span class="o">=</span><span class="s">&#34;{{ &#34;</span><span class="err">/</span><span class="na">assets</span><span class="err">/</span><span class="na">css</span><span class="err">/</span><span class="na">style</span><span class="err">.</span><span class="na">css</span><span class="err">&#34;</span> <span class="err">|</span> <span class="na">relative_url</span> <span class="err">}}&#34;</span> <span class="na">media</span><span class="o">=</span><span class="s">&#34;screen&#34;</span><span class="p">&gt;</span>
</code></pre></div><h3 id="sass-and-hugo-pipes-in-hugo">Sass and Hugo Pipes in Hugo</h3>
<p>Hugo uses <a href="https://gohugo.io/hugo-pipes/scss-sass/">Hugo Pipes</a> to process Sass to CSS. You can achieve this by using Hugo&rsquo;s asset processing function, <code>resources.ToCSS</code>, which expects a source in the <code>assets/</code> directory. It takes the SCSS file as an argument. With your style definitions in a Sass file at <code>assets/sass/style.scss</code>, here&rsquo;s how to get, process, and link your Sass in your theme&rsquo;s <code>head.html</code>:</p>
<div class="highlight"><pre class="chroma"><code class="language-html" data-lang="html">{{ $style := resources.Get &#34;/sass/style.scss&#34; | resources.ToCSS }}
<span class="p">&lt;</span><span class="nt">link</span> <span class="na">rel</span><span class="o">=</span><span class="s">&#34;stylesheet&#34;</span> <span class="na">href</span><span class="o">=</span><span class="s">&#34;{{ $style.RelPermalink }}&#34;</span> <span class="na">media</span><span class="o">=</span><span class="s">&#34;screen&#34;</span><span class="p">&gt;</span>
</code></pre></div><p>Processing Sass with Hugo Pipes <a href="https://gohugo.io/troubleshooting/faq/#i-get-tocss--this-feature-is-not-available-in-your-current-hugo-version">requires extended Hugo</a>, which you may not have by default. You can get extended Hugo from the <a href="https://github.com/gohugoio/hugo/releases">releases page</a>.</p>
<h2 id="configure-and-deploy-to-github-pages">Configure and deploy to GitHub Pages</h2>
<p>Before your site generator can build your site, it needs a configuration file to set some necessary parameters. Configuration files live in the site root directory. Among other settings, you can declare the name of the theme to use when building the site.</p>
<h3 id="configure-jekyll">Configure Jekyll</h3>
<p>Here&rsquo;s a minimal <code>_config.yml</code> for Jekyll:</p>
<div class="highlight"><pre class="chroma"><code class="language-yml" data-lang="yml"><span class="k">title</span><span class="p">:</span><span class="w"> </span>Your<span class="w"> </span>awesome<span class="w"> </span>title<span class="w">
</span><span class="w"></span><span class="k">description</span><span class="p">:</span><span class="w"> </span>&gt;- <span class="c"># this means to ignore newlines until &#34;baseurl:&#34;</span><span class="w">
</span><span class="w"> </span>Write<span class="w"> </span>an<span class="w"> </span>awesome<span class="w"> </span>description<span class="w"> </span>for<span class="w"> </span>your<span class="w"> </span>new<span class="w"> </span>site<span class="w"> </span>here.<span class="w"> </span>You<span class="w"> </span>can<span class="w"> </span>edit<span class="w"> </span>this<span class="w">
</span><span class="w"> </span>line<span class="w"> </span>in<span class="w"> </span>_config.yml.<span class="w"> </span>It<span class="w"> </span>will<span class="w"> </span>appear<span class="w"> </span>in<span class="w"> </span>your<span class="w"> </span>document<span class="w"> </span>head<span class="w"> </span>meta<span class="w"> </span>(for<span class="w">
</span><span class="w"> </span>Google<span class="w"> </span>search<span class="w"> </span>results)<span class="w"> </span>and<span class="w"> </span>in<span class="w"> </span>your<span class="w"> </span>feed.xml<span class="w"> </span>site<span class="w"> </span>description.<span class="w">
</span><span class="w"></span><span class="k">baseurl</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;&#34;</span><span class="w"> </span><span class="c"># the subpath of your site, e.g. /blog</span><span class="w">
</span><span class="w"></span><span class="k">url</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;&#34;</span><span class="w"> </span><span class="c"># the base hostname &amp; protocol for your site, e.g. http://example.com</span><span class="w">
</span><span class="w"></span><span class="k">theme</span><span class="p">:</span><span class="w"> </span><span class="c"># for gem-based themes</span><span class="w">
</span><span class="w"></span><span class="k">remote_theme</span><span class="p">:</span><span class="w"> </span><span class="c"># for themes hosted on GitHub, when used with GitHub Pages</span><span class="w">
</span></code></pre></div><p>With <code>remote_theme</code>, any <a href="https://help.github.com/en/github/working-with-github-pages/adding-a-theme-to-your-github-pages-site-using-jekyll#adding-a-jekyll-theme-in-your-sites-_configyml-file">Jekyll theme hosted on GitHub can be used</a> with sites hosted on GitHub Pages.</p>
<p>Jekyll has a <a href="https://jekyllrb.com/docs/configuration/default/">default configuration</a>, so any parameters added to your configuration file will override the defaults. Here are <a href="https://jekyllrb.com/docs/configuration/options/">additional configuration settings</a>.</p>
<h3 id="configure-hugo">Configure Hugo</h3>
<p>Here&rsquo;s a minimal example of Hugo&rsquo;s <code>config.yaml</code>:</p>
<div class="highlight"><pre class="chroma"><code class="language-yml" data-lang="yml"><span class="k">baseURL</span><span class="p">:</span><span class="w"> </span>https<span class="p">:</span>//example.com/<span class="w"> </span><span class="c"># The full domain your site will live at</span><span class="w">
</span><span class="w"></span><span class="k">languageCode</span><span class="p">:</span><span class="w"> </span>en-us<span class="w">
</span><span class="w"></span><span class="k">title</span><span class="p">:</span><span class="w"> </span>Hugo<span class="w"> </span>Docs<span class="w"> </span>Site<span class="w">
</span><span class="w"></span><span class="k">theme</span><span class="p">:</span><span class="w"> </span><span class="c"># theme name</span><span class="w">
</span></code></pre></div><p>Hugo makes no assumptions, so if a necessary parameter is missing, you&rsquo;ll see a warning when building or serving your site. Here are <a href="https://gohugo.io/getting-started/configuration/#all-configuration-settings">all configuration settings for Hugo</a>.</p>
<h3 id="deploy-to-github-pages">Deploy to GitHub Pages</h3>
<p>Both generators build your site with a command.</p>
<p>For Jekyll, use <code>jekyll build</code>. See <a href="https://jekyllrb.com/docs/configuration/options/#build-command-options">further build options here</a>.</p>
<p>For Hugo, use <code>hugo</code>. You can run <code>hugo help</code> or see <a href="https://gohugo.io/getting-started/usage/#test-installation">further build options here</a>.</p>
<p>You&rsquo;ll have to choose the source for your GitHub Pages site; once done, your site will update each time you push a new build. Of course, you can also automate your GitHub Pages build using GitHub Actions. Here&rsquo;s one for <a href="https://github.com/victoriadrake/hugo-latest-cd">building and deploying with Hugo</a>, and one for <a href="https://github.com/victoriadrake/jekyll-cd">building and deploying Jekyll</a>.</p>
<h2 id="showtime">Showtime!</h2>
<p>All the substantial differences between these two generators are under the hood; all the same, let&rsquo;s take a look at the finished themes, in two color variations.</p>
<p>Here&rsquo;s Hugo:</p>
<p><img src="ogd_hugo.png" alt="OpenGitDocs theme for Hugo"></p>
<p>Here&rsquo;s Jekyll:</p>
<p><img src="ogd_jekyll.png" alt="OpenGitDocs theme for Jekyll"></p>
<p>Spiffy!</p>
<h2 id="wait-who-won">Wait, who won?</h2>
<p>🤷</p>
<p>Both Hugo and Jekyll have their quirks and conveniences.</p>
<p>From this developer&rsquo;s perspective, Jekyll is a workable choice for simple sites without complicated organizational needs. If you&rsquo;re looking to render some one-page posts in an <a href="https://jekyllrb.com/docs/themes/">available theme</a> and host with GitHub Pages, Jekyll will get you up and running fairly quickly.</p>
<p>Personally, I use Hugo. I like the organizational capabilities of its Page Bundles, and it&rsquo;s backed by a dedicated and conscientious team that really seems to strive to facilitate convenience for their users. This is evident in Hugo&rsquo;s many functions, and handy tricks like <a href="https://gohugo.io/content-management/image-processing/">Image Processing</a> and <a href="https://gohugo.io/content-management/shortcodes/">Shortcodes</a>. They seem to release new fixes and versions about as often as I make a new cup of coffee - which, depending on your use case, may be fantastic, or annoying.</p>
<p>If you still can&rsquo;t decide, don&rsquo;t worry. The <a href="https://github.com/opengitdocs">OpenGitDocs documentation theme</a> I created is available for both Hugo and Jekyll. Start with one, switch later if you want. That&rsquo;s the benefit of having options.</p>
Outsourcing security with 1Password, Authy, and Privacy.comhttps://victoria.dev/blog/outsourcing-security-with-1password-authy-and-privacy.com/
Mon, 16 Mar 2020 08:12:32 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/outsourcing-security-with-1password-authy-and-privacy.com/Take some work off your plate while beefing up security with three changes you can make today.
]]>
<p>Unstable times are insecure times, and we&rsquo;ve already got enough going on to deal with. When humans are busy and under stress, we tend to get lax in less-obviously-pressing areas, like the security of our online accounts. These areas only become an obvious problem when it&rsquo;s too late for prevention. Thankfully, most of the work necessary to keep up our cybersecurity measures can be outsourced.</p>
<p>Implementing proper cybersecurity measures can be fiddly, and I especially dislike fiddling with things that I could avoid fiddling with. These fiddly things include resetting forgotten passwords, transferring multifactor authentication (MFA) codes when I change devices, and dealing with the fallout of compromised payment details in the event one of my accounts is still breached.</p>
<p>Here are three changes I&rsquo;ve made that significantly reduce the chances of needing to fiddle with any of these things again. You can too.</p>
<h2 id="1password">1Password</h2>
<p>I&rsquo;ve historically avoided password managers because of an irrational knee-jerk reaction to putting all my eggs in one basket. You know what&rsquo;s great for irrational reactions? Education. To figure out if putting all my passwords into a password manager is more secure than not using one, I set out to see what some smart people wrote about it.</p>
<p>First, we need to know a thing or two about passwords. Troy Hunt figured out almost a decade ago that <a href="https://www.troyhunt.com/only-secure-password-is-one-you-cant/">trying to remember strong passwords doesn&rsquo;t work</a>. In more recent times, Alex Weinert expanded on this in <a href="https://techcommunity.microsoft.com/t5/azure-active-directory-identity/your-pa-word-doesn-t-matter/ba-p/731984">Your Pa$$word doesn&rsquo;t matter</a>. TL;DR: our brains aren&rsquo;t better at passwords than computers, and please use MFA.</p>
<p>So passwords don&rsquo;t matter, but complicated passwords are still better than memorable and guessable ones. Since I&rsquo;ve next to no hope of remembering a dozen variations of <code>p/q2-q4!</code> (I&rsquo;m not a <a href="https://inbox.vuxu.org/tuhs/CAG=a+rj8VcXjS-ftaj8P2_duLFSUpmNgB4-dYwnTsY_8g5WdEA@mail.gmail.com/">chess player</a>), this is a task I can outsource to <a href="https://1password.com/">1Password</a>. I&rsquo;ll still need to remember one, long, complicated master password - 1Password uses this to encrypt my data, so I really can&rsquo;t lose it - but I can handle just one.</p>
<p>Using 1Password specifically has another, decidedly obvious, advantage. I chose 1Password because of their <a href="https://support.1password.com/watchtower/">Watchtower</a> feature. <a href="https://www.troyhunt.com/have-i-been-pwned-is-now-partnering-with-1password/">Thanks to Troy Hunt&rsquo;s Have I Been Pwned</a>, Watchtower will alert you if any of your passwords show up in a breach so you can change them. Passwords still don&rsquo;t completely work, but this is probably the best band-aid there is.</p>
<p>One last bonus is that using a password manager is a heck of a lot more convenient. Complicated passwords need not take two tries to type. When it comes to sites that I only rarely use, and don&rsquo;t consider important, I&rsquo;m typically far more likely to end up (re)setting those passwords to something memorable, and thus something easily hacked. Even - perhaps especially - unimportant sites can open doors to your more important ones. Using 1Password and generated passwords, those sites are now also first-class citizens in the land of strong passwords, instead of being half-abandoned and half-open attack vectors.</p>
<p>So, yes, all my eggs are in one basket. A well-protected, complex, and monitored basket, as opposed to being scattered about in several of those paper cartons from the grocery store that don&rsquo;t really close and certainly can&rsquo;t survive a <em>rather gentle bump</em> as you come in the doorway, Victoria, how many times do I need to remind you to be careful.</p>
<h2 id="authy">Authy</h2>
<p>Okay - so it&rsquo;s more like one-and-a-half baskets. 🤷🏻</p>
<p><a href="https://authy.com/">Authy</a>, from the folks over at <a href="https://www.twilio.com">Twilio</a>, provides a 2FA solution that&rsquo;s more secure than SMS. (I find this to be an interesting intersection, coming from Twilio, and I applaud.) <a href="https://authy.com/blog/authy-vs-google-authenticator/">Unlike Google Authenticator</a>, Authy lets you choose to back up your 2FA codes in case you lose or change your phone. (1Password offers 2FA functionality as well - but, you know, redundancies.)</p>
<p>With Authy, your backup is encrypted with your password, similar to how 1Password works. This makes it the second password you can&rsquo;t forget, if you don&rsquo;t want to lose access to your codes. If you reset your account, they all go away. I can deal with remembering two passwords; I&rsquo;ll take that trade.</p>
<p>I&rsquo;ve tried other methods of MFA, including hardware keys, which can make accessing accounts on your phone more complicated than I care to put up with. I find the combination of 1Password and Authy to be the most practical balance of convenience and security that I know of.</p>
<h2 id="privacycom">Privacy.com</h2>
<p>Finally, there&rsquo;s one last line of defense you can put in place in the unfortunate event that one of your accounts is still compromised. All the strong passwords and MFA in the world won&rsquo;t help if you open the doors yourself, and scams and phishing are a thing.</p>
<p>Since it&rsquo;s rather impractical to use a different real credit card every place you shop, virtual cards are just a great idea. There&rsquo;s no good reason to spend an afternoon (or more) resetting your payment information on every account just to thwart a misbehaving merchant or patch up a data breach from that online shop for cute salt shakers you made a purchase at last year (just me?).</p>
<p>By setting up a separate virtual card for each merchant, in the event that one of those merchants is compromised, I can simply pause or delete that card. None of my other accounts or actual bank details are caught up in the process. Cards can have time-oriented limits or be one-off burner numbers, making them ideal for setting up subscriptions.</p>
<p>This is the sort of basic functionality that I hope, one day, becomes more prevalent from banks and credit cards. In the meantime, I&rsquo;ll keep using <a href="https://privacy.com/join/Q6V3V">Privacy.com</a>. That&rsquo;s my referral link; if you&rsquo;d like to thank me by using it, we&rsquo;ll both get five bucks as a bonus.</p>
<h2 id="outsource-better-security">Outsource better security</h2>
<p>All together, implementing these changes will probably take up an afternoon, depending on how many accounts you have. It&rsquo;s worth it for the time you&rsquo;d otherwise spend resetting passwords, setting up new devices, or (knock on wood) recovering from compromised banking details. Best of all, you&rsquo;ll have continual protection just running in the background - an effortless boost to your <a href="https://victoria.dev/blog/personal-cybersecurity-posture-for-when-youre-just-this-guy-you-know/">personal cybersecurity posture</a>.</p>
<p>We have the technology. Free up some brain cycles to focus on other things - or simply remove some unnecessary stress from your life by outsourcing the fiddly bits.</p>
<h2 id="encore-notes-on-browser-based-password-managers">Encore: notes on browser-based password managers</h2>
<p><em>Thanks to <a href="https://dev.to/savagepixie/comment/ml7o">SavagePixie on dev.to</a> for raising the question.</em></p>
<p>Browser password managers <a href="http://blog.elliottkember.com/chromes-insane-password-security-strategy">used to be a lot less secure than they are now</a> (2013), but even recently, I&rsquo;ve seen some claim <a href="https://hackernoon.com/why-you-should-never-save-passwords-on-chrome-or-firefox-96b770cfd0d0">they&rsquo;re not very well thought-out</a>. I haven&rsquo;t tested them myself, so I can&rsquo;t confirm or deny.</p>
<p>I&rsquo;ll note, though, that not all browsers continue to pay the same level of attention. Firefox has <a href="https://www.mozilla.org/en-US/firefox/lockwise/">Lockwise</a>, which is sort of a browser-based password manager, but one much more similar to 1Password than any other I&rsquo;ve seen. It has standalone apps, and uses your Firefox account to encrypt your synced data using your password with (they say) 256-bit encryption. Mozilla also has a partnership with Have I Been Pwned, so you&rsquo;ll get alerts if it detects previously breached credentials. As a bonus, it&rsquo;s <a href="https://github.com/mozilla-lockwise">open source</a>.</p>
<p>Sounds perfect; so why am I not using Lockwise? I&rsquo;m a Mozilla fan in general; open source, even more so. Unfortunately, in the world of password managers, Lockwise is relatively new. Even apps built by excellent people need some time to <a href="https://security.stackexchange.com/a/220985">work out</a> <a href="https://medium.com/@JoeKreydt/how-secure-is-firefox-lockwise-password-manager-51d44dcf4dbc">the kinks</a>. I&rsquo;ll probably check back in a couple years and re-evaluate.</p>
SQLite in production with WAL 🔥
https://victoria.dev/blog/sqlite-in-production-with-wal/
Thu, 05 Mar 2020 10:14:43 -0500
hello@victoria.dev (Victoria Drake)
An underappreciated candidate for light and fast database transactions.
<p><a href="https://sqlite.org/index.html">SQLite</a> (&ldquo;see-quell-lite&rdquo;) is a lightweight Sequel, or Structured Query Language (<a href="https://en.wikipedia.org/wiki/SQL">SQL</a>), database engine. Instead of using the client-server database management system model, SQLite is self-contained in a single file. It is library, database, and data, all in one package.</p>
<p>For certain applications, SQLite is a solid choice for a production database. It&rsquo;s lightweight, ultra-portable, and has no external dependencies. Remember when MacBook Air first came out? It&rsquo;s nothing like that.</p>
<p>SQLite is best suited for production use in applications that:</p>
<ul>
<li>Desire fast and simple set up.</li>
<li>Require high reliability in a small package.</li>
<li>Have, and want to retain, a small footprint.</li>
<li>Are read-heavy but not write-heavy.</li>
<li>Don&rsquo;t need multiple user accounts or features like multiversion concurrency snapshots.</li>
</ul>
<p>If your application can benefit from SQLite&rsquo;s serverless convenience, you may like to know about the different modes available for managing database changes.</p>
<h2 id="with-and-without-wal">With and without WAL</h2>
<p>POSIX <a href="https://linux.die.net/man/2/fsync">system call <code>fsync()</code></a> commits buffered data (data saved in the operating system cache) referred to by a specified file descriptor to permanent storage or disk. This is relevant to understanding the difference between SQLite&rsquo;s two modes, as <code>fsync()</code> will block until the device reports the transfer is complete.</p>
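<p>To make the blocking behaviour concrete, here is a minimal sketch using Python&rsquo;s <code>os.fsync()</code>, which wraps the same system call. The scratch file is a throwaway created only for this illustration; SQLite itself does this work in C, not like this.</p>

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(b"important bytes")
        f.flush()              # push Python's own buffer down to the OS cache
        os.fsync(f.fileno())   # block until the OS reports the data is on the device
    with open(path, "rb") as f:
        contents = f.read()
finally:
    os.remove(path)

print(contents)
```

<p>Until <code>os.fsync()</code> returns, the program can do nothing else on that thread, which is why the number of syncs per transaction matters in what follows.</p>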
<p>SQLite uses <a href="https://sqlite.org/atomiccommit.html">atomic commits</a>, meaning that either all of the database changes within a single transaction occur, or none of them do. This makes the many separate writes in a transaction appear to reach the database file simultaneously. Atomic commits are performed using one of two modes: a rollback journal, or a write-ahead log (WAL).</p>
<h3 id="rollback-journal">Rollback journal</h3>
<p>A <a href="https://www.sqlite.org/lockingv3.html#rollback">rollback journal</a> is essentially a back-up file created by SQLite before write changes occur on a database file. It has the advantage of providing high reliability by helping SQLite restore the database to its original state in case a write operation is compromised during the disk-writing process.</p>
<p>Assuming a cold cache, SQLite first needs to read the relevant pages from a database file before it can write to it. Information is read out into the operating system cache, then transferred into user space. SQLite obtains a reserved lock on the database file, preventing other processes from writing to the database. At this point, other processes may still read from the database.</p>
<p>SQLite creates a separate file, the rollback journal, with the original content of the pages that will be changed. Initially existing in the cache, the rollback journal is written to persistent disk storage with <code>fsync()</code> to enable SQLite to restore the database should its next operations be compromised.</p>
<p>SQLite then obtains an exclusive lock preventing other processes from reading or writing, and writes the page changes to the database file in cache. Since writing to disk is slower than interaction with the cache, writing to disk doesn&rsquo;t occur immediately. The rollback journal continues to exist until changes are safely written to disk, with a second <code>fsync()</code>. From a user-space process point of view, the change to the disk (the COMMIT, or end of the transaction) happens instantaneously once the rollback journal is deleted - hence, atomic commits. However, the two <code>fsync()</code> operations required to complete the COMMIT make this option, from a transactional standpoint, slower than SQLite&rsquo;s lesser known WAL mode.</p>
<h3 id="write-ahead-logging-wal">Write-ahead logging (WAL)</h3>
<p>While the rollback journal method uses a separate file to preserve the original database state, the <a href="https://www.sqlite.org/wal.html">WAL method</a> uses a separate WAL file to instead record the changes. Instead of a COMMIT depending on writing changes to the database file, a COMMIT in WAL mode occurs when a record marking the commit is appended to the WAL. Because the database file itself is left untouched at that point, readers can keep reading from it while a writer appends to the WAL, so reads and writes no longer block each other.</p>
<p>WAL mode introduces the concept of the checkpoint, which is when the WAL file is synced to persistent storage before all its transactions are transferred to the database file. You can optionally specify when this occurs, but SQLite provides reasonable defaults. The checkpoint is the WAL version of the atomic commit.</p>
<p>In WAL mode, write transactions are performed faster than in the traditional rollback journal mode. Each transaction involves writing the changes only once to the WAL file instead of twice - to the rollback journal, and then to disk - before the COMMIT signals that the transaction is over.</p>
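<p>As a minimal sketch of switching modes, assuming Python&rsquo;s built-in <code>sqlite3</code> module: the <code>example.db</code> file name, the <code>notes</code> table, and the <code>enable_wal</code> helper are placeholders made up for illustration. The two <code>PRAGMA</code> statements are SQLite&rsquo;s own.</p>

```python
import os
import sqlite3
import tempfile


def enable_wal(path):
    """Open the database at `path`, switch it to WAL mode, return (connection, mode)."""
    conn = sqlite3.connect(path)
    # PRAGMA journal_mode returns the journal mode now in effect as a single row.
    mode = conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
    return conn, mode


# Hypothetical throwaway database in a temporary directory.
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "example.db")

conn, mode = enable_wal(db_path)
conn.execute("CREATE TABLE notes (body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello, WAL",))
conn.commit()  # the commit is recorded in the WAL file (example.db-wal)

# Ask SQLite to transfer the WAL contents into the main database file now,
# rather than waiting for an automatic checkpoint.
conn.execute("PRAGMA wal_checkpoint(TRUNCATE);")

row_count = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
conn.close()

print(mode, row_count)
```

<p>The journal mode is stored in the database file, so a database switched to WAL stays in WAL mode for later connections until it is changed back.</p>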
<h2 id="the-simplicity-of-sqlite">The simplicity of SQLite</h2>
<p>For medium-sized read-heavy applications, SQLite may be a great choice. Using SQLite in WAL mode may make it an even better one. Benchmarks on the smallest EC2 instance, with no provisioned <a href="https://en.wikipedia.org/wiki/IOPS">IOPS</a>, put this little trooper at 400 write transactions per second, and thousands of reads. That&rsquo;s some perfectly adequate capability, in a perfectly compact package.</p>
Multithreaded Python: slithering through an I/O bottleneck
https://victoria.dev/blog/multithreaded-python-slithering-through-an-i/o-bottleneck/
Fri, 28 Feb 2020 09:31:02 -0500
hello@victoria.dev (Victoria Drake)
How taking advantage of parallelism in Python can make your software orders of magnitude faster.
<p>I recently developed a project that I called <a href="https://github.com/victoriadrake/hydra-link-checker">Hydra</a>: a multithreaded link checker written in Python. Unlike many Python site crawlers I found while researching, Hydra uses only standard libraries, with no external dependencies like BeautifulSoup. It&rsquo;s intended to be run as part of a CI/CD process, so part of its success depended on being fast.</p>
<p>Multiple threads in Python is a bit of a bitey subject (not sorry) in that the Python interpreter doesn&rsquo;t actually let multiple threads execute at the same time. Python&rsquo;s <a href="https://wiki.python.org/moin/GlobalInterpreterLock">Global Interpreter Lock</a>, or GIL, prevents multiple threads from executing Python bytecodes at once. Each thread that wants to execute must first wait for the GIL to be released by the currently executing thread. The GIL is pretty much the microphone in a low-budget conference panel, except where no one gets to shout.</p>
<p>This has the advantage of preventing <a href="https://en.wikipedia.org/wiki/Race_condition">race conditions</a>. It does, however, lack the performance advantages afforded by running multiple tasks in parallel. (If you&rsquo;d like a refresher on concurrency, parallelism, and multithreading, see <a href="https://victoria.dev/blog/concurrency-parallelism-and-the-many-threads-of-santa-claus/">Concurrency, parallelism, and the many threads of Santa Claus</a>.) While I prefer Go for its convenient first-class primitives that support concurrency (see <a href="https://tour.golang.org/concurrency/1">Goroutines</a>), this project&rsquo;s recipients were more comfortable with Python. I took it as an opportunity to test and explore!</p>
<p>Simultaneously performing multiple tasks in Python isn&rsquo;t impossible; it just takes a little extra work. For Hydra, the main advantage is in overcoming the input/output (I/O) bottleneck.</p>
<p>In order to get web pages to check, Hydra needs to go out to the Internet and fetch them. Compared to tasks performed by the CPU alone, going out over the network is far slower. How much slower?</p>
<p>Here are approximate timings for tasks performed on a typical PC:</p>
<table>
<thead>
<tr>
<th></th>
<th>Task</th>
<th>Time</th>
</tr>
</thead>
<tbody>
<tr>
<td>CPU</td>
<td>execute typical instruction</td>
<td>1/1,000,000,000 sec = 1 nanosec</td>
</tr>
<tr>
<td>CPU</td>
<td>fetch from L1 cache memory</td>
<td>0.5 nanosec</td>
</tr>
<tr>
<td>CPU</td>
<td>branch misprediction</td>
<td>5 nanosec</td>
</tr>
<tr>
<td>CPU</td>
<td>fetch from L2 cache memory</td>
<td>7 nanosec</td>
</tr>
<tr>
<td>RAM</td>
<td>Mutex lock/unlock</td>
<td>25 nanosec</td>
</tr>
<tr>
<td>RAM</td>
<td>fetch from main memory</td>
<td>100 nanosec</td>
</tr>
<tr>
<td>Network</td>
<td>send 2K bytes over 1Gbps network</td>
<td>20,000 nanosec</td>
</tr>
<tr>
<td>RAM</td>
<td>read 1MB sequentially from memory</td>
<td>250,000 nanosec</td>
</tr>
<tr>
<td>Disk</td>
<td>fetch from new disk location (seek)</td>
<td>8,000,000 nanosec (8ms)</td>
</tr>
<tr>
<td>Disk</td>
<td>read 1MB sequentially from disk</td>
<td>20,000,000 nanosec (20ms)</td>
</tr>
<tr>
<td>Network</td>
<td>send packet US to Europe and back</td>
<td>150,000,000 nanosec (150ms)</td>
</tr>
</tbody>
</table>
<p>Peter Norvig first published these numbers some years ago in <a href="http://norvig.com/21-days.html#answers">Teach Yourself Programming in Ten Years</a>. Since computers and their components change year over year, the exact numbers shown above aren&rsquo;t the point. What these numbers help to illustrate is the difference, in orders of magnitude, between operations.</p>
<p>Compare the difference between fetching from main memory and sending a simple packet over the Internet. While both these operations occur in less than the blink of an eye (literally) from a human perspective, you can see that sending a simple packet over the Internet is over a million times slower than fetching from RAM. It&rsquo;s a difference that, in a single-thread program, can quickly accumulate to form troublesome bottlenecks.</p>
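<p>The comparison can be worked out directly from the two rows in question. This is only arithmetic on the approximate table values; the constant names are mine.</p>

```python
import math

# Approximate latencies from the table above, in nanoseconds.
MAIN_MEMORY_FETCH_NS = 100
US_EUROPE_ROUND_TRIP_NS = 150_000_000

ratio = US_EUROPE_ROUND_TRIP_NS / MAIN_MEMORY_FETCH_NS
orders_of_magnitude = math.log10(ratio)

print(f"{ratio:,.0f} times slower")                       # 1,500,000 times slower
print(f"{orders_of_magnitude:.2f} orders of magnitude")   # 6.18 orders of magnitude
```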
<p>In Hydra, the task of parsing response data and assembling results into a report is relatively fast, since it all happens on the CPU. The slowest portion of the program&rsquo;s execution, by over six orders of magnitude, is network latency. Not only does Hydra need to fetch packets, but whole web pages! One way of improving Hydra&rsquo;s performance is to find a way for the page fetching tasks to execute without blocking the main thread.</p>
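<p>Python has a couple of options for doing tasks in parallel: multiple processes, or multiple threads. Separate processes sidestep the GIL entirely, since each has its own interpreter; threads still share one GIL, but a thread that is blocked waiting on I/O releases it so that other threads can run in the meantime.</p>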
<h2 id="multiple-processes">Multiple processes</h2>
<p>To execute parallel tasks using multiple processes, you can use Python&rsquo;s <a href="https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor"><code>ProcessPoolExecutor</code></a>. A concrete subclass of <a href="https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor"><code>Executor</code></a> from the <a href="https://docs.python.org/3/library/concurrent.futures.html"><code>concurrent.futures</code> module</a>, <code>ProcessPoolExecutor</code> uses a pool of processes spawned with the <a href="https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing"><code>multiprocessing</code> module</a> to avoid the GIL.</p>
<p>By default, the pool uses as many worker processes as there are processors on the machine. The <code>multiprocessing</code> module allows you to parallelize function execution across those processes, which can really speed up compute-bound (or <a href="https://en.wikipedia.org/wiki/CPU-bound">CPU-bound</a>) tasks.</p>
<p>Since the main bottleneck for Hydra is I/O and not the processing to be done by the CPU, I&rsquo;m better served by using multiple threads.</p>
<h2 id="multiple-threads">Multiple threads</h2>
<p>Fittingly named, Python&rsquo;s <a href="https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor"><code>ThreadPoolExecutor</code></a> uses a pool of threads to execute asynchronous tasks. Also a subclass of <a href="https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor"><code>Executor</code></a>, it uses a defined number of maximum worker threads (at least five by default, according to the formula <code>min(32, os.cpu_count() + 4)</code>) and reuses idle threads before starting new ones, making it pretty efficient.</p>
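<p>Here is a small sketch of the default worker count quoted above, plus a pool running a handful of blocking tasks. <code>fake_fetch</code> and the <code>example.com</code> URLs are made-up stand-ins for real network calls, and the formula is reproduced as stated in the text; the exact default may differ between Python versions.</p>

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def default_max_workers():
    """The default quoted above: min(32, os.cpu_count() + 4)."""
    # os.cpu_count() can return None, so fall back to 1 in that case.
    return min(32, (os.cpu_count() or 1) + 4)


def fake_fetch(url):
    """Hypothetical stand-in for a network request: it just sleeps briefly."""
    time.sleep(0.1)
    return url, 200


urls = [f"https://example.com/page/{n}" for n in range(10)]

start = time.perf_counter()
# With no max_workers argument, the executor picks its own default size.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(fake_fetch, urls))
elapsed = time.perf_counter() - start

print(default_max_workers(), len(results), f"{elapsed:.2f}s")
```

<p>Run one after another, ten 0.1-second waits would take about a second. Because sleeping threads release the GIL, the pooled version should finish in a fraction of that.</p>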
<p>Here is a snippet of Hydra with comments showing how Hydra uses <code>ThreadPoolExecutor</code> to achieve parallel multithreaded bliss:</p>
<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="c1"># Imports this snippet relies on</span>
<span class="kn">from</span> <span class="nn">concurrent</span> <span class="kn">import</span> <span class="n">futures</span>
<span class="kn">from</span> <span class="nn">queue</span> <span class="kn">import</span> <span class="n">Empty</span><span class="p">,</span> <span class="n">Queue</span>


<span class="c1"># Create the Checker class</span>
<span class="k">class</span> <span class="nc">Checker</span><span class="p">:</span>
    <span class="c1"># Queue of links to be checked</span>
    <span class="n">TO_PROCESS</span> <span class="o">=</span> <span class="n">Queue</span><span class="p">()</span>
    <span class="c1"># Maximum workers to run</span>
    <span class="n">THREADS</span> <span class="o">=</span> <span class="mi">100</span>
    <span class="c1"># Maximum seconds to wait for HTTP response</span>
    <span class="n">TIMEOUT</span> <span class="o">=</span> <span class="mi">60</span>

    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">url</span><span class="p">):</span>
        <span class="o">...</span>
        <span class="c1"># Create the thread pool</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pool</span> <span class="o">=</span> <span class="n">futures</span><span class="o">.</span><span class="n">ThreadPoolExecutor</span><span class="p">(</span><span class="n">max_workers</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">THREADS</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">run</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="c1"># Run until the TO_PROCESS queue is empty</span>
        <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
            <span class="k">try</span><span class="p">:</span>
                <span class="n">target_url</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">TO_PROCESS</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">block</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
                <span class="c1"># If we haven&#39;t already checked this link</span>
                <span class="k">if</span> <span class="n">target_url</span><span class="p">[</span><span class="s2">&#34;url&#34;</span><span class="p">]</span> <span class="ow">not</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">visited</span><span class="p">:</span>
                    <span class="c1"># Mark it as visited</span>
                    <span class="bp">self</span><span class="o">.</span><span class="n">visited</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">target_url</span><span class="p">[</span><span class="s2">&#34;url&#34;</span><span class="p">])</span>
                    <span class="c1"># Submit the link to the pool</span>
                    <span class="n">job</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">pool</span><span class="o">.</span><span class="n">submit</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">load_url</span><span class="p">,</span> <span class="n">target_url</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">TIMEOUT</span><span class="p">)</span>
                    <span class="n">job</span><span class="o">.</span><span class="n">add_done_callback</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">handle_future</span><span class="p">)</span>
            <span class="k">except</span> <span class="n">Empty</span><span class="p">:</span>
                <span class="k">return</span>
            <span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
                <span class="k">print</span><span class="p">(</span><span class="n">e</span><span class="p">)</span>
</code></pre></div><p>You can view the full code in <a href="https://github.com/victoriadrake/hydra-link-checker">Hydra&rsquo;s GitHub repository</a>.</p>
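<p>If you&rsquo;d like to try the queue, pool, and callback pattern without touching the network, here is a self-contained sketch. It is not Hydra&rsquo;s actual code: <code>MiniChecker</code>, the in-memory <code>FAKE_SITE</code>, the four workers, and the half-second queue timeout are all assumptions made for illustration.</p>

```python
import threading
from concurrent import futures
from queue import Empty, Queue

# Hypothetical in-memory "site": each page maps to the links found on it.
FAKE_SITE = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1", "/missing"],
    "/blog/post-1": ["/"],
}


class MiniChecker:
    def __init__(self, start):
        self.to_process = Queue()
        self.visited = set()
        self.broken = []
        self.lock = threading.Lock()
        self.pool = futures.ThreadPoolExecutor(max_workers=4)
        self.to_process.put(start)

    def load_url(self, url):
        """Stand-in for an HTTP request; raises KeyError for unknown pages."""
        return url, FAKE_SITE[url]

    def handle_future(self, future):
        """Runs when a job finishes: queue new links or record the failure."""
        try:
            _, links = future.result()
        except KeyError as missing:
            with self.lock:
                self.broken.append(missing.args[0])
            return
        for link in links:
            self.to_process.put(link)

    def run(self):
        # Keep pulling links until the queue stays empty for the timeout.
        while True:
            try:
                url = self.to_process.get(block=True, timeout=0.5)
            except Empty:
                break
            if url not in self.visited:
                self.visited.add(url)
                job = self.pool.submit(self.load_url, url)
                job.add_done_callback(self.handle_future)
        self.pool.shutdown(wait=True)


checker = MiniChecker("/")
checker.run()
print(sorted(checker.visited), checker.broken)
```

<p>The main thread only reads the queue and submits work; the callbacks feed newly discovered links back into the same queue, which is what keeps the loop going until nothing new turns up.</p>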
<h2 id="single-thread-to-multithread">Single thread to multithread</h2>
<p>To see the full effect, I compared the run times for checking my website with a prototype single-thread program and with the <del>multiheaded</del> multithreaded Hydra.</p>
<div class="highlight"><pre class="chroma"><code class="language-text" data-lang="text">time python3 slow-link-check.py https://victoria.dev
real 17m34.084s
user 11m40.761s
sys 0m5.436s
time python3 hydra.py https://victoria.dev
real 0m15.729s
user 0m11.071s
sys 0m2.526s
</code></pre></div><p>The single-thread program, which blocks on I/O, ran in about seventeen minutes. When I first ran the multithreaded version, it finished in 1m13.358s - after some profiling and tuning, it took a little under sixteen seconds. Again, the exact times don&rsquo;t mean all that much; they&rsquo;ll vary depending on factors such as the size of the site being crawled, your network speed, and your program&rsquo;s balance between the overhead of thread management and the benefits of parallelism.</p>
<p>The more important thing, and the result I&rsquo;ll take any day, is a program that runs nearly two orders of magnitude faster.</p>
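<p>For the curious, the speedup can be worked out from the <code>real</code> times reported above. This is just arithmetic on those three figures; the helper name is mine.</p>

```python
def to_seconds(minutes, seconds):
    """Convert a `time`-style XmY.YYYs reading to seconds."""
    return minutes * 60 + seconds


single_thread = to_seconds(17, 34.084)
first_multithreaded = to_seconds(1, 13.358)
tuned_multithreaded = to_seconds(0, 15.729)

first_speedup = single_thread / first_multithreaded
tuned_speedup = single_thread / tuned_multithreaded

print(f"first multithreaded run: {first_speedup:.1f}x faster")
print(f"after profiling and tuning: {tuned_speedup:.1f}x faster")
```

<p>That works out to roughly 14 times faster on the first multithreaded run, and roughly 67 times faster after tuning.</p>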
Breaking bottlenecks 🍾
https://victoria.dev/blog/breaking-bottlenecks/
Tue, 25 Feb 2020 12:50:29 -0500
hello@victoria.dev (Victoria Drake)
A talk on the benefits of non-blocking functions for programs, developers, and organizations.
<p><em>I recently gave a lecture on the benefits of building non-blocking processes. This is a write-up of the full talk, minus any &ldquo;ums&rdquo; that may have occurred. You can <a href="https://victoria.dev/slides/bottlenecks">view the slides here</a>.</em></p>
<p>I&rsquo;ve been helping out a group called the Open Web Application Security Project (OWASP). They&rsquo;re a non-profit foundation that produces some of the foremost application testing guides and cybersecurity resources. OWASP&rsquo;s publications, checklists, and reference materials are a help to security professionals, penetration testers, and developers all over the world. Most of the individual teams that create these materials are run almost entirely by volunteers.</p>
<p>OWASP is a great group doing important work. I&rsquo;ve seen this firsthand as part of the core team that produces the Web Security Testing Guide. However, while OWASP inspires a large volunteer base, it is lacking in central organization.</p>
<p>This lack of organization was most recently apparent in the group&rsquo;s website, <a href="https://owasp.org">OWASP.org</a>. A big organization with an even bigger website to match, OWASP.org enjoys hundreds of thousands of visitors. Unfortunately, many of its pages - individually managed by disparate projects - are infrequently updated. Some are abandoned. The website as a whole lacks a centralized quality assurance process, and as a result, OWASP.org is peppered with broken links.</p>
<h2 id="the-trouble-with-broken-links">The trouble with broken links</h2>
<p>Customers don&rsquo;t like broken links; attackers really do. That&rsquo;s because broken links are a security vulnerability. Broken links can signal opportunities for attacks like <a href="https://edoverflow.com/2017/broken-link-hijacking/">broken link hijacking</a> and <a href="https://www.hackerone.com/blog/Guide-Subdomain-Takeovers">subdomain takeovers</a>. At their least effective, these attacks can be embarrassing; at their worst, severely damaging to businesses and organizations. One OWASP group, the Application Security Verification Standard (ASVS) project, writes about <a href="https://github.com/OWASP/ASVS/blob/d9e0ac99828ef3c1e9233bd8a1f691f2a6958aa3/4.0/en/0x18-V10-Malicious.md#v103-deployed-application-integrity-controls">integrity controls</a> that can help to reduce the likelihood of these attacks. This knowledge, unfortunately, has not yet propagated throughout the rest of OWASP.</p>
<p>This is the story of how I created a fast and efficient tool to help OWASP solve this problem.</p>
<h2 id="the-job">The job</h2>
<p>I took on the task of creating a program that could run as part of a CI/CD process to detect and report broken links. The program needed to:</p>
<ul>
<li>Find and enumerate all the broken links on OWASP.org in a report.</li>
<li>Keep track of the parent pages the broken links were on so they could be fixed.</li>
<li>Run efficiently as part of a CI/CD pipeline.</li>
</ul>
<p>Essentially, I needed to build a web crawler.</p>
<p>I originally worked through this process in Python, as that was a comfortable language choice for everyone in the OWASP group. Personally, I prefer to use Go for higher performance as it offers more convenient concurrency primitives. Between the task and this talk, I wrote three programs: a prototype single-thread Python program, a multithreaded Python program, and a Go program using goroutines. We&rsquo;ll see a comparison of how each worked out near the end of the talk - first, let&rsquo;s explore how to build a web crawler.</p>
<h2 id="prototyping-a-web-crawler">Prototyping a web crawler</h2>
<p>Here&rsquo;s what our web crawler will need to do:</p>
<ol>
<li>Get the HTML data of the first page of the website (for example, <code>https://victoria.dev</code>)</li>
<li>Check all of the links on the page
<ol>
<li>Keep track of the links we&rsquo;ve already visited so we don&rsquo;t end up checking them twice</li>
<li>Record any broken links we find</li>
</ol>
</li>
<li>Fetch more HTML data from any valid links on the page, as long as they&rsquo;re in the same domain (<code>https://victoria.dev</code> and not <code>https://github.com</code>, for instance)</li>
<li>Repeat steps #2 and #3 until all of the links on the site have been checked</li>
</ol>
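<p>The steps above can be sketched with nothing but the standard library. To keep it runnable offline, this version reads from a made-up <code>FAKE_PAGES</code> dictionary instead of making real HTTP requests; <code>fetch</code>, <code>crawl</code>, <code>LinkParser</code>, and the <code>example.com</code> pages are all hypothetical names for illustration, not the prototype from the talk.</p>

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical pages standing in for real HTTP responses.
FAKE_PAGES = {
    "https://example.com/": '<a href="/about">About</a> <a href="https://github.com/">GitHub</a>',
    "https://example.com/about": '<a href="/">Home</a> <a href="/gone">Gone</a>',
    "https://github.com/": "<p>External page</p>",
}


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Stand-in for an HTTP GET; returns HTML, or None if the link is broken."""
    return FAKE_PAGES.get(url)


def crawl(start_url):
    domain = urlparse(start_url).netloc
    visited, broken = set(), []
    to_check = [(start_url, None)]  # (link, parent page it was found on)
    while to_check:
        url, parent = to_check.pop()
        if url in visited:  # already checked, don't check it twice
            continue
        visited.add(url)
        html = fetch(url)  # get the HTML data for this link
        if html is None:
            broken.append((url, parent))  # record the broken link and its parent
            continue
        if urlparse(url).netloc != domain:
            continue  # other domains are checked, but not crawled further
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:  # queue every link found on the page
            to_check.append((urljoin(url, href), url))
    return visited, broken


visited, broken = crawl("https://example.com/")
print(len(visited), broken)
```

<p>Keeping the parent page alongside each link is what lets the final report say where a broken link needs fixing.</p>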
<p>Here&rsquo;s what the execution flow will look like:</p>
<figure class="screenshot">
<img src="execution_flow.png"
alt="A flow chart showing program execution"/>
</figure>
<p>As you can see, the nodes &ldquo;GET page&rdquo; -&gt; &ldquo;HTML&rdquo; -&gt; &ldquo;Parse links&rdquo; -&gt; &ldquo;Valid link&rdquo; -&gt; &ldquo;Check visited&rdquo; all form a loop. These are what enable our web crawler to continue crawling until all the links on the site have been accounted for in the &ldquo;Check visited&rdquo; node. When the crawler encounters links it&rsquo;s already checked, it will &ldquo;Stop.&rdquo; This loop will become more important in a moment.</p>
<p>For now, the question on everyone&rsquo;s mind (I hope): how do we make it fast?</p>
<h2 id="how-fast-can-you-do-the-thing">How fast can you do the thing?</h2>
<p>Here are some approximate timings for tasks performed on a typical PC:</p>
<table>
<thead>
<tr>
<th></th>
<th>Task</th>
<th>Time</th>
</tr>
</thead>
<tbody>
<tr>
<td>CPU</td>
<td>execute typical instruction</td>
<td>1/1,000,000,000 sec = 1 nanosec</td>
</tr>
<tr>
<td>CPU</td>
<td>fetch from L1 cache memory</td>
<td>0.5 nanosec</td>
</tr>
<tr>
<td>CPU</td>
<td>branch misprediction</td>
<td>5 nanosec</td>
</tr>
<tr>
<td>CPU</td>
<td>fetch from L2 cache memory</td>
<td>7 nanosec</td>
</tr>
<tr>
<td>RAM</td>
<td>Mutex lock/unlock</td>
<td>25 nanosec</td>
</tr>
<tr>
<td>RAM</td>
<td>fetch from main memory</td>
<td>100 nanosec</td>
</tr>
<tr>
<td>RAM</td>
<td>read 1MB sequentially from memory</td>
<td>250,000 nanosec</td>
</tr>
<tr>
<td>Disk</td>
<td>fetch from new disk location (seek)</td>
<td>8,000,000 nanosec (8ms)</td>
</tr>
<tr>
<td>Disk</td>
<td>read 1MB sequentially from disk</td>
<td>20,000,000 nanosec (20ms)</td>
</tr>
<tr>
<td>Network</td>
<td>send packet US to Europe and back</td>
<td>150,000,000 nanosec (150ms)</td>
</tr>
</tbody>
</table>
<p>Peter Norvig first published these numbers some years ago in <a href="http://norvig.com/21-days.html#answers">Teach Yourself Programming in Ten Years</a>. They crop up now and then in articles titled along the lines of, &ldquo;Latency numbers every developer should know.&rdquo;</p>
<p>Since computers and their components change year over year, the exact numbers shown above aren&rsquo;t the point. What these numbers help to illustrate is the difference, in orders of magnitude, between operations.</p>
<p>Compare the difference between fetching from main memory and sending a simple packet over the Internet. While both these operations occur in less than the blink of an eye (literally) from a human perspective, you can see that sending a simple packet over the Internet is over a million times slower than fetching from RAM. It&rsquo;s a difference that, in a single-thread program, can quickly accumulate to form troublesome bottlenecks.</p>
<h2 id="bottleneck-network-latency">Bottleneck: network latency</h2>
<p>The numbers above mean that the difference in time it takes to send something over the Internet compared to fetching data from main memory is over six orders of magnitude. Remember the loop in our execution chart? The &ldquo;GET page&rdquo; node, in which our crawler fetches page data over the network, is going to be <em>a million times slower</em> than the next slowest thing in the loop!</p>
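<p>To make the comparison concrete, here is a small sketch that works out the ratio from the two table entries. The constant and function names are mine, chosen for illustration.</p>

```go
package main

import (
	"fmt"
	"math"
)

const (
	mainMemoryFetchNs = 100.0 // "fetch from main memory" in the table
	packetRoundTripNs = 150e6 // "send packet US to Europe and back"
)

// slowdown returns how many times slower the slow operation is,
// and the same figure expressed in orders of magnitude (log base 10).
func slowdown(slowNs, fastNs float64) (ratio, orders float64) {
	ratio = slowNs / fastNs
	return ratio, math.Log10(ratio)
}

func main() {
	ratio, orders := slowdown(packetRoundTripNs, mainMemoryFetchNs)
	fmt.Printf("network round trip: %.0fx slower, about %.1f orders of magnitude\n", ratio, orders)
}
```

<p>The ratio comes out at 1,500,000, which is where &ldquo;over a million times slower&rdquo; and &ldquo;over six orders of magnitude&rdquo; come from.</p>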
<p>We don&rsquo;t need to run our prototype to see what that means in practical terms; we can estimate it. Let&rsquo;s take OWASP.org, which has upwards of 12,000 links, as an example:</p>
<div class="highlight"><pre class="chroma"><code class="language-text" data-lang="text"> 150 milliseconds
x 12,000 links
---------
1,800,000 milliseconds (30 minutes)
</code></pre></div><p>A whole half hour, just for the network tasks. It may even be much slower than that, since web pages are frequently much larger than a packet. This means that in our single-thread prototype web crawler, our biggest bottleneck is network latency. Why is this problematic?</p>
<h3 id="feedback-loops">Feedback loops</h3>
<p>I previously wrote about <a href="https://victoria.dev/blog/how-to-set-up-a-short-feedback-loop-as-a-solo-coder/">feedback loops</a>. In essence, in order to improve at doing anything, you first need to be able to get feedback from your last attempt. That way, you have the necessary information to make adjustments and get closer to your goal on your next iteration.</p>
<p>For a software developer, bottlenecks can contribute to long and inefficient feedback loops. If our bottlenecked web crawler were part of a CI/CD pipeline, I&rsquo;d be sitting around for a minimum of a half hour before learning whether or not changes in my last push were successful, or whether they broke <code>master</code> (hopefully <code>staging</code>).</p>
<p>Multiply a slow and inefficient feedback loop by many runs per day, over many days, and you&rsquo;ve got a slow and inefficient developer. Multiply that by many developers in an organization bottlenecked on the same process, and you&rsquo;ve got a slow and inefficient company.</p>
<h3 id="the-cost-of-bottlenecks">The cost of bottlenecks</h3>
<p>To add insult to injury, not only are you waiting on a bottlenecked process to run; you&rsquo;re also paying to wait. Take the serverless example - AWS Lambda, for instance. Here&rsquo;s a chart showing the cost of functions by compute time and CPU usage.</p>
<figure>
<img src="lambda-chart.png"
alt="Chart showing Total Lambda compute cost by function execution"/> <figcaption>
<p>Source: <a href="https://serverless.com/blog/understanding-and-controlling-aws-lambda-costs/">Understanding and Controlling AWS Lambda Costs</a></p>
</figcaption>
</figure>
<p>Again, the numbers change over the years, but the main concepts remain the same: the bigger the function and the longer its compute time, the bigger the cost. For applications taking advantage of serverless, these costs can add up dramatically.</p>
<p>Bottlenecks are a recipe for failure, for both productivity and the bottom line.</p>
<p>The good news is that bottlenecks are mostly unnecessary. If we know how to identify them, we can strategize our way out of them. To understand how, let&rsquo;s get some tacos.</p>
<h2 id="tacos-and-threading">Tacos and threading</h2>
<p>Everyone, meet Bob. He&rsquo;s a gopher who works at the taco stand down the street as the cashier. Say &ldquo;Hi,&rdquo; Bob.</p>
<div class="highlight"><pre class="chroma"><code class="language-text" data-lang="text">🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮
🌮 🌳
🌮
🌮 ╔══════════════╗
🌮 Hi I&#39;m Bob 🌳
🌮 ╚══════════════╝ \
🌮 🐹 🌮
🌮
🌮
🌮 🌳
🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮
</code></pre></div><p>Bob works very hard at being a cashier, but he&rsquo;s still just one gopher. The customers who frequent Bob&rsquo;s taco stand can eat tacos really quickly; but in order to get the tacos to eat them, they&rsquo;ve got to order them through Bob. Here&rsquo;s what our bottlenecked, single-thread taco stand currently looks like:</p>
<div class="highlight"><pre class="chroma"><code class="language-text" data-lang="text">🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮
🌮 🌳
🌮
🌮
🌮 🌳
🌮 🐹 🧑💵🧑💵🧑💵🧑💵🧑💵🧑💵🧑💵🧑💵🧑💵
🌮
🌮
🌮
🌮 🌳
🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮
</code></pre></div><p>As you can see, all the customers are queued up, right out the door. Poor Bob handles one customer&rsquo;s transaction at a time, starting and finishing with that customer completely before moving on to the next. Bob can only do so much, so our taco stand is rather inefficient at the moment. How can we make Bob faster?</p>
<p>We can try splitting the queue:</p>
<div class="highlight"><pre class="chroma"><code class="language-text" data-lang="text">🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮
🌮 🌳
🌮
🌮 🧑💵🧑💵🧑💵🧑💵
🌮 🌳
🌮 🐹
🌮
🌮 🧑💵🧑💵🧑💵🧑💵🧑💵
🌮
🌮 🌳
🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮
</code></pre></div><p>Now Bob can do some multitasking. For example, he can start a transaction with a customer in one queue; then, while that customer counts their bills, Bob can pop over to the second queue and get started there. This arrangement, known as a <a href="https://en.wikipedia.org/wiki/Concurrency_(computer_science)">concurrency model</a>, helps Bob go a little bit faster by jumping back and forth between lines. However, it&rsquo;s still just one Bob, which limits our improvement possibilities. If we were to make four queues, they&rsquo;d all be shorter; but Bob would be very thinly stretched between them. Can we do better?</p>
<p>We could get two Bobs:</p>
<div class="highlight"><pre class="chroma"><code class="language-text" data-lang="text">🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮
🌮 🌳
🌮
🌮 🌳
🌮 🐹 🧑💵🧑💵🧑💵🧑💵
🌮 🌳
🌮 🐹 🧑💵🧑💵🧑💵🧑💵🧑💵
🌮 🌳
🌮
🌮 🌳
🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮
</code></pre></div><p>With twice the Bobs, each can handle a queue of his own. This is our most efficient solution for our taco stand so far, since two Bobs can handle much more than one Bob can, even if each customer is still attended to one at a time.</p>
<p>We can do even better than that:</p>
<div class="highlight"><pre class="chroma"><code class="language-text" data-lang="text">🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮
🌮 🌳
🌮 🐹 🧑💵🧑💵
🌮 🌳
🌮 🐹 🧑💵🧑💵
🌮 🌳
🌮 🐹 🧑💵🧑💵
🌮 🌳
🌮 🐹 🧑💵🧑💵🧑💵
🌮 🌳
🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮🌮
</code></pre></div><p>With quadruple the Bobs, we have some very short queues, and a much more efficient taco stand. In computing, the concept of having multiple workers do tasks in parallel is called <a href="https://en.wikipedia.org/wiki/Multithreading_(computer_architecture)">multithreading</a>.</p>
<p>In Go, we can apply this concept using goroutines. Here are some illustrative snippets from my Go solution.</p>
<h2 id="setting-up-a-go-web-crawler">Setting up a Go web crawler</h2>
<p>In order to share data between our <a href="https://tour.golang.org/concurrency/1">goroutines</a>, we&rsquo;ll need to create some data structures. Our <code>Checker</code> structure will be shared, so it will have a <code>Mutex</code> (<a href="https://en.wikipedia.org/wiki/Mutual_exclusion">mutual exclusion</a>) to allow our goroutines to lock and unlock it. The <code>Checker</code> structure will also hold a list of <code>brokenLinks</code> results, and <code>visitedLinks</code>. The latter will be a map of strings to booleans, which we&rsquo;ll use to directly and efficiently check for visited links. By using a map instead of iterating over a list, our <code>visitedLinks</code> lookup will have a constant complexity of O(1) as opposed to a linear complexity of O(n), thus avoiding the creation of another bottleneck. For more on time complexity, see my <a href="https://victoria.dev/blog/a-coffee-break-introduction-to-time-complexity-of-algorithms/">coffee-break introduction to time complexity of algorithms</a> article.</p>
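<p>As a quick illustration of the difference between the two lookups, here is a sketch with two hypothetical helpers, <code>visitedInSlice</code> and <code>visitedInMap</code>. Neither is part of my solution; they only show why the map is the better fit.</p>

```go
package main

import "fmt"

// visitedInSlice scans the stored links until it finds a match: O(n).
func visitedInSlice(visited []string, link string) bool {
	for _, v := range visited {
		if v == link {
			return true
		}
	}
	return false
}

// visitedInMap does a single hash lookup: O(1) on average.
// A missing key yields the zero value, false.
func visitedInMap(visited map[string]bool, link string) bool {
	return visited[link]
}

func main() {
	list := []string{"https://victoria.dev/", "https://victoria.dev/blog/"}
	set := make(map[string]bool, len(list))
	for _, l := range list {
		set[l] = true
	}
	seen := "https://victoria.dev/blog/"
	unseen := "https://victoria.dev/about/"
	fmt.Println(visitedInSlice(list, seen), visitedInMap(set, seen))     // true true
	fmt.Println(visitedInSlice(list, unseen), visitedInMap(set, unseen)) // false false
}
```

<p>With that in mind, here are the structures:</p>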
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kd">type</span> <span class="nx">Checker</span> <span class="kd">struct</span> <span class="p">{</span>
<span class="nx">startDomain</span> <span class="kt">string</span>
<span class="nx">brokenLinks</span> <span class="p">[]</span><span class="nx">Result</span>
<span class="nx">visitedLinks</span> <span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">bool</span>
<span class="nx">workerCount</span><span class="p">,</span> <span class="nx">maxWorkers</span> <span class="kt">int</span>
<span class="nx">sync</span><span class="p">.</span><span class="nx">Mutex</span>
<span class="p">}</span>
<span class="o">...</span>
<span class="c1">// Page allows us to retain parent and sublinks
</span><span class="c1"></span><span class="kd">type</span> <span class="nx">Page</span> <span class="kd">struct</span> <span class="p">{</span>
<span class="nx">parent</span><span class="p">,</span> <span class="nx">loc</span><span class="p">,</span> <span class="nx">data</span> <span class="kt">string</span>
<span class="p">}</span>
<span class="c1">// Result adds error information for the report
</span><span class="c1"></span><span class="kd">type</span> <span class="nx">Result</span> <span class="kd">struct</span> <span class="p">{</span>
<span class="nx">Page</span>
<span class="nx">reason</span> <span class="kt">string</span>
<span class="nx">code</span> <span class="kt">int</span>
<span class="p">}</span>
</code></pre></div><p>To extract links from HTML data, here&rsquo;s a parser I wrote on top of <a href="https://pkg.go.dev/golang.org/x/net/html?tab=doc">package <code>html</code></a>:</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="c1">// Extract links from HTML
</span><span class="c1"></span><span class="kd">func</span> <span class="nf">parse</span><span class="p">(</span><span class="nx">parent</span><span class="p">,</span> <span class="nx">data</span> <span class="kt">string</span><span class="p">)</span> <span class="p">([]</span><span class="kt">string</span><span class="p">,</span> <span class="p">[]</span><span class="kt">string</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">doc</span><span class="p">,</span> <span class="nx">err</span> <span class="o">:=</span> <span class="nx">html</span><span class="p">.</span><span class="nf">Parse</span><span class="p">(</span><span class="nx">strings</span><span class="p">.</span><span class="nf">NewReader</span><span class="p">(</span><span class="nx">data</span><span class="p">))</span>
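<span class="k">if</span> <span class="nx">err</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span>
<span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">(</span><span class="s">&#34;Could not parse: &#34;</span><span class="p">,</span> <span class="nx">err</span><span class="p">)</span>
<span class="k">return</span> <span class="kc">nil</span><span class="p">,</span> <span class="kc">nil</span> <span class="c1">// doc is nil on error, so there is nothing to walk
</span><span class="c1"></span><span class="p">}</span>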
<span class="nx">goodLinks</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">([]</span><span class="kt">string</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="nx">badLinks</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">([]</span><span class="kt">string</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="kd">var</span> <span class="nx">f</span> <span class="kd">func</span><span class="p">(</span><span class="o">*</span><span class="nx">html</span><span class="p">.</span><span class="nx">Node</span><span class="p">)</span>
<span class="nx">f</span> <span class="p">=</span> <span class="kd">func</span><span class="p">(</span><span class="nx">n</span> <span class="o">*</span><span class="nx">html</span><span class="p">.</span><span class="nx">Node</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="nx">n</span><span class="p">.</span><span class="nx">Type</span> <span class="o">==</span> <span class="nx">html</span><span class="p">.</span><span class="nx">ElementNode</span> <span class="o">&amp;&amp;</span> <span class="nf">checkKey</span><span class="p">(</span><span class="nb">string</span><span class="p">(</span><span class="nx">n</span><span class="p">.</span><span class="nx">Data</span><span class="p">))</span> <span class="p">{</span>
<span class="k">for</span> <span class="nx">_</span><span class="p">,</span> <span class="nx">a</span> <span class="o">:=</span> <span class="k">range</span> <span class="nx">n</span><span class="p">.</span><span class="nx">Attr</span> <span class="p">{</span>
<span class="k">if</span> <span class="nf">checkAttr</span><span class="p">(</span><span class="nb">string</span><span class="p">(</span><span class="nx">a</span><span class="p">.</span><span class="nx">Key</span><span class="p">))</span> <span class="p">{</span>
<span class="nx">j</span><span class="p">,</span> <span class="nx">err</span> <span class="o">:=</span> <span class="nf">formatURL</span><span class="p">(</span><span class="nx">parent</span><span class="p">,</span> <span class="nx">a</span><span class="p">.</span><span class="nx">Val</span><span class="p">)</span>
<span class="k">if</span> <span class="nx">err</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span>
<span class="nx">badLinks</span> <span class="p">=</span> <span class="nb">append</span><span class="p">(</span><span class="nx">badLinks</span><span class="p">,</span> <span class="nx">j</span><span class="p">)</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nx">goodLinks</span> <span class="p">=</span> <span class="nb">append</span><span class="p">(</span><span class="nx">goodLinks</span><span class="p">,</span> <span class="nx">j</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">break</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">for</span> <span class="nx">c</span> <span class="o">:=</span> <span class="nx">n</span><span class="p">.</span><span class="nx">FirstChild</span><span class="p">;</span> <span class="nx">c</span> <span class="o">!=</span> <span class="kc">nil</span><span class="p">;</span> <span class="nx">c</span> <span class="p">=</span> <span class="nx">c</span><span class="p">.</span><span class="nx">NextSibling</span> <span class="p">{</span>
<span class="nf">f</span><span class="p">(</span><span class="nx">c</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nf">f</span><span class="p">(</span><span class="nx">doc</span><span class="p">)</span>
<span class="k">return</span> <span class="nx">goodLinks</span><span class="p">,</span> <span class="nx">badLinks</span>
<span class="p">}</span>
</code></pre></div><p>If you&rsquo;re wondering why I didn&rsquo;t use a more full-featured package for this project, I highly recommend <a href="https://www.theregister.co.uk/2016/03/23/npm_left_pad_chaos/">the story of <code>left-pad</code></a>. The short of it: more dependencies, more problems.</p>
<p>Here are snippets of the <code>main</code> function, where we pass in our starting URL and create a queue (or <a href="https://tour.golang.org/concurrency/2">channels</a>, in Go) to be filled with links for our goroutines to process.</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kd">func</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="o">...</span>
<span class="nx">startURL</span> <span class="o">:=</span> <span class="nx">flag</span><span class="p">.</span><span class="nf">String</span><span class="p">(</span><span class="s">&#34;url&#34;</span><span class="p">,</span> <span class="s">&#34;http://example.com&#34;</span><span class="p">,</span> <span class="s">&#34;full URL of site&#34;</span><span class="p">)</span>
<span class="o">...</span>
<span class="nx">firstPage</span> <span class="o">:=</span> <span class="nx">Page</span><span class="p">{</span>
<span class="nx">parent</span><span class="p">:</span> <span class="o">*</span><span class="nx">startURL</span><span class="p">,</span>
<span class="nx">loc</span><span class="p">:</span> <span class="o">*</span><span class="nx">startURL</span><span class="p">,</span>
<span class="p">}</span>
<span class="nx">toProcess</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">(</span><span class="kd">chan</span> <span class="nx">Page</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="nx">toProcess</span> <span class="o">&lt;-</span> <span class="nx">firstPage</span>
<span class="kd">var</span> <span class="nx">wg</span> <span class="nx">sync</span><span class="p">.</span><span class="nx">WaitGroup</span>
</code></pre></div><p>The last significant piece of the puzzle is to create our workers, which we&rsquo;ll do here:</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="k">for</span> <span class="nx">i</span> <span class="o">:=</span> <span class="k">range</span> <span class="nx">toProcess</span> <span class="p">{</span>
<span class="nx">wg</span><span class="p">.</span><span class="nf">Add</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="nx">checker</span><span class="p">.</span><span class="nf">addWorker</span><span class="p">()</span>
<span class="err">🐹</span> <span class="k">go</span> <span class="nf">worker</span><span class="p">(</span><span class="nx">i</span><span class="p">,</span> <span class="o">&amp;</span><span class="nx">checker</span><span class="p">,</span> <span class="o">&amp;</span><span class="nx">wg</span><span class="p">,</span> <span class="nx">toProcess</span><span class="p">)</span>
<span class="k">if</span> <span class="nx">checker</span><span class="p">.</span><span class="nx">workerCount</span> <span class="p">&gt;</span> <span class="nx">checker</span><span class="p">.</span><span class="nx">maxWorkers</span> <span class="p">{</span>
<span class="nx">time</span><span class="p">.</span><span class="nf">Sleep</span><span class="p">(</span><span class="mi">1</span> <span class="o">*</span> <span class="nx">time</span><span class="p">.</span><span class="nx">Second</span><span class="p">)</span> <span class="c1">// throttle down
</span><span class="c1"></span> <span class="p">}</span>
<span class="p">}</span>
<span class="nx">wg</span><span class="p">.</span><span class="nf">Wait</span><span class="p">()</span>
</code></pre></div><p>A <a href="https://golang.org/pkg/sync/#WaitGroup">WaitGroup</a> does just what it says on the tin: it waits for our group of goroutines to finish. When they have, we&rsquo;ll know our Go web crawler has finished checking all the links on the site.</p>
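<p>To see how the mutex, the visited map, and the WaitGroup fit together, here is a self-contained sketch. It is not my actual crawler: the <code>site</code> map is a hypothetical in-memory stand-in for real pages, the <code>checker</code> type and its methods are simplified for illustration, and there is no worker limit.</p>

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// site is a hypothetical stand-in for the network: page -> links on that page.
// Links with no entry ("/missing", "/gone") play the role of broken links.
var site = map[string][]string{
	"/":          {"/blog", "/about", "/missing"},
	"/blog":      {"/", "/blog/post", "/about"},
	"/about":     {"/"},
	"/blog/post": {"/blog", "/gone"},
}

type checker struct {
	sync.Mutex
	visited map[string]bool
	broken  []string
}

// markVisited records link under the lock and reports whether it was new.
func (c *checker) markVisited(link string) bool {
	c.Lock()
	defer c.Unlock()
	if c.visited[link] {
		return false
	}
	c.visited[link] = true
	return true
}

// crawl "fetches" one page and starts a goroutine for each unvisited link.
func (c *checker) crawl(link string, wg *sync.WaitGroup) {
	defer wg.Done()
	links, ok := site[link]
	if !ok {
		c.Lock()
		c.broken = append(c.broken, link)
		c.Unlock()
		return
	}
	for _, l := range links {
		if c.markVisited(l) {
			wg.Add(1) // safe: this goroutine has not called Done yet
			go c.crawl(l, wg)
		}
	}
}

// check crawls from start and returns the number of links seen
// plus a sorted list of broken ones.
func check(start string) (int, []string) {
	c := &checker{visited: map[string]bool{start: true}}
	var wg sync.WaitGroup
	wg.Add(1)
	go c.crawl(start, &wg)
	wg.Wait()
	sort.Strings(c.broken)
	return len(c.visited), c.broken
}

func main() {
	n, broken := check("/")
	fmt.Println("links checked:", n)
	fmt.Println("broken links:", broken)
}
```

<p>Because each link is marked as visited before its goroutine starts, no page is processed twice, and <code>wg.Wait()</code> returns only once every goroutine has finished.</p>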
<h2 id="did-we-do-the-thing-fast">Did we do the thing fast?</h2>
<p>Here&rsquo;s a comparison of the three programs I wrote on this journey. First, the prototype single-thread Python version:</p>
<div class="highlight"><pre class="chroma"><code class="language-text" data-lang="text">time python3 slow-link-check.py https://victoria.dev
real 17m34.084s
user 11m40.761s
sys 0m5.436s
</code></pre></div><p>This finished crawling my website in about seventeen-and-a-half minutes, which is rather long for a site at least an order of magnitude smaller than OWASP.org.</p>
<p>The multithreaded Python version did a bit better:</p>
<div class="highlight"><pre class="chroma"><code class="language-text" data-lang="text">time python3 hydra.py https://victoria.dev
real 1m13.358s
user 0m13.161s
sys 0m2.826s
</code></pre></div><p>My multithreaded Python program (which I dubbed <a href="https://github.com/victoriadrake/hydra-link-checker">Hydra</a>) finished in one minute and thirteen seconds.</p>
<p>How did Go do?</p>
<div class="highlight"><pre class="chroma"><code class="language-text" data-lang="text">time ./go-link-check --url=https://victoria.dev
real 0m7.926s
user 0m9.044s
sys 0m0.932s
</code></pre></div><p>At just under eight seconds, I found the Go version to be extremely palatable.</p>
<h2 id="breaking-bottlenecks">Breaking bottlenecks</h2>
<p>As fun as it is to simply enjoy the speedups, we can directly relate these results to everything we&rsquo;ve learned so far. Consider taking a process that used to soak up seventeen minutes and turning it into an eight-second affair instead. Not only will that give developers a much shorter and more efficient feedback loop, it will give companies the ability to develop faster, and thus grow more quickly - while costing less. To drive the point home: a process that runs in seventeen-and-a-half minutes when it could take eight seconds will also cost over a hundred and thirty times as much to run!</p>
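<p>The ratios can be checked with a short sketch that uses the <code>real</code> timings reported above. The <code>speedup</code> helper is a name I made up for this example, and treating cost as directly proportional to run time is a simplifying assumption.</p>

```go
package main

import (
	"fmt"
	"time"
)

// speedup returns how many times longer slow takes than fast.
// Both arguments use Go duration syntax, such as "17m34.084s".
func speedup(slow, fast string) (float64, error) {
	s, err := time.ParseDuration(slow)
	if err != nil {
		return 0, err
	}
	f, err := time.ParseDuration(fast)
	if err != nil {
		return 0, err
	}
	return s.Seconds() / f.Seconds(), nil
}

func main() {
	runs := []struct{ name, real string }{
		{"single-thread Python", "17m34.084s"},
		{"multithreaded Python", "1m13.358s"},
	}
	for _, run := range runs {
		x, err := speedup(run.real, "7.926s")
		if err != nil {
			fmt.Println("could not parse timing:", err)
			continue
		}
		fmt.Printf("%s took about %.0fx as long as Go\n", run.name, x)
	}
}
```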
<p>A better work day for developers, and a better bottom line for companies. There&rsquo;s a lot of benefit to be had in making functions, code, and processes as efficient as possible - by breaking bottlenecks.</p>
Command line tricks for managing your messy open source repositoryhttps://victoria.dev/blog/command-line-tricks-for-managing-your-messy-open-source-repository/
Mon, 17 Feb 2020 08:05:06 -0500hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/command-line-tricks-for-managing-your-messy-open-source-repository/A handy toolbox for the terminal to help open source maintainers make their projects sparkle.
]]>
<p>Effective collaboration, especially in open source software development, starts with effective organization. To make sure that nothing gets missed, &ldquo;one issue, one pull request&rdquo; is a nice rule of thumb.</p>
<p>Instead of opening an issue with a large scope like, &ldquo;Fix all the broken links in the documentation,&rdquo; open source projects will have more luck attracting contributors with several smaller and more manageable issues. In the preceding example, you might scope broken links by section or by page. This allows more contributors to jump in and dedicate small windows of their time, rather than waiting for one person to take on a larger and more tedious contribution effort.</p>
<p>Smaller scoped issues also help project maintainers see where work has been completed and where it hasn&rsquo;t. This reduces the chances that some part of the issue is missed, assumed to be completed, and later leads to bugs or security vulnerabilities.</p>
<p>That&rsquo;s all well and good; but what if you&rsquo;ve already opened several massively-scoped issues, some PRs have already been submitted or merged, and you currently have no idea where the work started or stopped?</p>
<p>It&rsquo;s going to take a little sorting out to get the state of your project back under control. Thankfully, there are a number of command line tools to help you scan, sort, and make sense of a messy repository. Here&rsquo;s a small selection of ones I use.</p>
<p>Jump to:</p>
<ul>
<li><a href="#interactive-search-and-replace-with-vim">Interactive search-and-replace with <code>vim</code></a></li>
<li><a href="#find-dead-links-in-markdown-files-with-a-node-module">Find dead links in Markdown files with a node module</a></li>
<li><a href="#list-subdirectories-with-or-without-a-git-repository-with-find">List subdirectories with or without a git repository with <code>find</code></a></li>
<li><a href="#pull-multiple-git-repositories-from-a-list-with-xargs">Pull multiple git repositories from a list with <code>xargs</code></a></li>
<li><a href="#list-issues-by-number-with-jot">List issues by number with <code>jot</code></a></li>
<li><a href="#cli-powered-open-source-organization">CLI-powered open source organization</a></li>
</ul>
<h2 id="interactive-search-and-replace-with-vim">Interactive search-and-replace with <code>vim</code></h2>
<p>You can open a file in Vim, then interactively search and replace with:</p>
<div class="highlight"><pre class="chroma"><code class="language-vim" data-lang="vim"><span class="p">:</span>%<span class="nx">s</span><span class="sr">/\&lt;word\&gt;/</span><span class="nx">newword</span>/<span class="nx">gc</span><span class="err">
</span></code></pre></div><p>The <code>%</code> indicates to look in all lines of the current file; <code>s</code> is for substitute; <code>\&lt;word\&gt;</code> matches the whole word; and the <code>g</code> for &ldquo;global&rdquo; is for every occurrence. The <code>c</code> at the end will let you view and confirm each change before it&rsquo;s made. You can run it automatically, and much faster, without <code>c</code>; however, you put yourself at risk of complicating things if you&rsquo;ve made a pattern-matching error.</p>
<h2 id="find-dead-links-in-markdown-files-with-a-node-module">Find dead links in Markdown files with a node module</h2>
<p>The <a href="https://github.com/tcort/markdown-link-check">markdown-link-check</a> node module has a great <a href="https://github.com/tcort/markdown-link-check#command-line-tool">CLI buddy</a>.</p>
<p>I use this so often I turned it into a <a href="https://victoria.dev/blog/how-to-do-twice-as-much-with-half-the-keystrokes-using-.bashrc/#bash-functions">Bash alias function</a>. To do the same, add this to your <code>.bashrc</code>:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="c1"># Markdown link check in a folder, recursive</span>
<span class="k">function</span> mlc <span class="o">()</span> <span class="o">{</span>
find <span class="s2">&#34;</span><span class="nv">$1</span><span class="s2">&#34;</span> -name <span class="se">\*</span>.md -exec markdown-link-check -p <span class="o">{}</span> <span class="se">\;</span>
<span class="o">}</span>
</code></pre></div><p>Then run with <code>mlc &lt;directory&gt;</code>.</p>
<h2 id="list-subdirectories-with-or-without-a-git-repository-with-find">List subdirectories with or without a git repository with <code>find</code></h2>
<p>Print all subdirectories that are git repositories, or in other words, have a <code>.git</code> in them:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">find . -maxdepth <span class="m">1</span> -type d -exec <span class="nb">test</span> -e <span class="s1">&#39;{}/.git&#39;</span> <span class="s1">&#39;;&#39;</span> -printf <span class="s2">&#34;is git repo: %p\n&#34;</span>
</code></pre></div><p>To print all subdirectories that are not git repositories, negate the test with <code>!</code>:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">find . -maxdepth <span class="m">1</span> -type d -exec <span class="nb">test</span> <span class="s1">&#39;!&#39;</span> -e <span class="s1">&#39;{}/.git&#39;</span> <span class="s1">&#39;;&#39;</span> -printf <span class="s2">&#34;not git repo: %p\n&#34;</span>
</code></pre></div><h2 id="pull-multiple-git-repositories-from-a-list-with-xargs">Pull multiple git repositories from a list with <code>xargs</code></h2>
<p>I initially used this as part of <a href="https://victoria.dev/blog/how-to-set-up-a-fresh-ubuntu-desktop-using-only-dotfiles-and-bash-scripts/">automatically re-creating my laptop with Bash scripts</a>, but it&rsquo;s pretty handy when you&rsquo;re working with cloud instances or Dockerfiles.</p>
<p>Given a file, <code>repos.txt</code>, with a repository&rsquo;s SSH link on each line (and your SSH keys set up), run:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">xargs -n1 git clone &lt; repos.txt
</code></pre></div><p>If you want to pull and push many repositories, I previously wrote about <a href="https://victoria.dev/blog/how-to-write-bash-one-liners-for-cloning-and-managing-github-and-gitlab-repositories/">how to use a Bash one-liner to manage your repositories</a>.</p>
<h2 id="list-issues-by-number-with-jot">List issues by number with <code>jot</code></h2>
<p>I&rsquo;m a co-author and maintainer for the <a href="https://github.com/OWASP/wstg/">OWASP Web Security Testing Guide</a> repository where I recently took one large issue (yup, it was &ldquo;Fix all the broken links in the documentation&rdquo; - how&rsquo;d you guess?) and broke it up into several smaller, more manageable issues. A whole thirty-seven smaller, more manageable issues.</p>
<p>I wanted to enumerate all the issues that the original one became, but the idea of typing out thirty-seven issue numbers (#275 through #312) seemed awfully tedious and time-consuming. So, in natural programmer fashion, I spent the same amount of time I would have used to type out all those numbers and crafted a way to automate it instead.</p>
<p>The <code>jot</code> utility (<code>apt install athena-jot</code>) is a tiny tool that&rsquo;s a big help when you want to print out some numbers. Just tell it how many you want, and where to start and stop.</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="c1"># jot [ reps [ begin [ end ] ] ]</span>
jot <span class="m">37</span> <span class="m">275</span> <span class="m">312</span>
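</code></pre></div><p>This prints 37 numbers, spaced evenly from 275 to 312, each on a new line. Because the inclusive range 275 to 312 holds 38 integers, asking for 37 reps makes <code>jot</code> round its evenly spaced values and skip one integer (294 in the output below); use <code>jot 38 275 312</code> to print every number in the range. To make these into issue number notations that GitHub and many other platforms automatically recognize and turn into links, you can pipe the output to <code>awk</code>.</p>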
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">jot <span class="m">37</span> <span class="m">275</span> <span class="m">312</span> <span class="p">|</span> awk <span class="s1">&#39;{printf &#34;#&#34;$0&#34;, &#34;}&#39;</span>
<span class="c1">#275, #276, #277, #278, #279, #280, #281, #282, #283, #284, #285, #286, #287, #288, #289, #290, #291, #292, #293, #295, #296, #297, #298, #299, #300, #301, #302, #303, #304, #305, #306, #307, #308, #309, #310, #311, #312</span>
</code></pre></div><p>You can also use <code>jot</code> to generate random or redundant data, mainly for development or testing purposes.</p>
<h2 id="cli-powered-open-source-organization">CLI-powered open source organization</h2>
<p>A well-organized open source repository is a well-maintained open source project. Save this post for handy reference, and use your newfound CLI superpowers for good! 🚀</p>
The past ten years, or, how to get better at anythinghttps://victoria.dev/blog/the-past-ten-years-or-how-to-get-better-at-anything/
Tue, 31 Dec 2019 08:27:31 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/the-past-ten-years-or-how-to-get-better-at-anything/Thoughts on learning software development, technical blogging, and what the past ten years have taught me.
]]>
<p>If you want to get better at anything:</p>
<ol>
<li>Solve your own problems,</li>
<li>Write about it,</li>
<li>Teach others.</li>
</ol>
<h2 id="1-searching-a-decade-ago">1. Searching, a decade ago</h2>
<p>I was a young graduate with newly-minted freedoms, and I was about to fall in love. I had plenty of imagination, a couple handfuls of tenacity, and no sense of direction at all.</p>
<p>For much of my youth, when I encountered a problem, I just sort of bumped up against it. I tried using whatever was in my head from past experiences or my own imagination to find a solution. For some problems, like managing staff duties at work, my experience was sufficient guidance. For other, more complicated problems, it wasn&rsquo;t.</p>
<p>When you don&rsquo;t have a wealth of experience to draw upon, relying on it is a poor strategy. Like many people at my age then, I thought I knew enough. Like many people at my age now, I recognize how insufficient &ldquo;enough&rdquo; can be. A lack of self-directed momentum meant being dragged in any direction life&rsquo;s currents took me. When falling in love turned out to mean falling from a far greater height than I had anticipated, I tumbled on, complacent. When higher-ups at work handed me further responsibilities, I accepted them without considering if I wanted them at all. When, inevitably, life became more and more complicated, I encountered even more problems I didn&rsquo;t know how to solve. I felt stuck.</p>
<p>Though I was morbidly embarrassed about it at the time, I&rsquo;m not shy to say it now. At one point, it had to be pointed out to me that I could search the Internet for the solution to any of my problems. Anything I wanted to solve - interactions with people at work, a floundering relationship, or the practicalities of filing taxes - I was lucky enough to have the greatest collection of human knowledge ever assembled at my disposal.</p>
<p>Instead of bumbling along in the flotsam of my own trial and error, I started to take advantage of the collective experiences of all those who have been here before me. They weren&rsquo;t always right, and I often found information only somewhat similar to my own experience. Still, it always got me moving in the right direction. Eventually, I started to steer.</p>
<p>There&rsquo;s a learning curve, even when just searching for a problem. Distilling the jumble of confusion in your head to the right search terms is a learned skill. It helped me to understand <a href="https://www.google.com/search/howsearchworks/crawling-indexing/">how search engines like Google work</a>:</p>
<blockquote>
<p>We use software known as web crawlers to discover publicly available webpages. Crawlers look at webpages and follow links on those pages, much like you would if you were browsing content on the web. They go from link to link and bring data about those webpages back to Google’s servers&hellip;</p>
<p>When crawlers find a webpage, our systems render the content of the page, just as a browser does. We take note of key signals — from keywords to website freshness — and we keep track of it all in the Search index.</p>
</blockquote>
<p>Sometimes, I find what I need by using the right keyword. Other times, I discover the keyword by searching for text that might surround it on the content of the page. For software development, I search for the weirdest word or combination of words attached to what I&rsquo;m trying to learn. I rarely find whole solutions in my search results, but I always find direction for solving the problem myself.</p>
<p>Solving my own problems, even just a few little ones at a time, gave me confidence and built momentum. I began to pursue the experiences I wanted, instead of waiting for experiences to happen to me.</p>
<h2 id="2-updating-the-internet-some-years-ago">2. Updating the Internet, some years ago</h2>
<p>I&rsquo;d solved myself out of a doomed relationship and stagnant job. I found myself, rather gleefully, country-hopping with just <a href="https://heronebag.com">one backpack</a> of possessions. I met, though I didn&rsquo;t know it at the time, my future husband. I found a new sense of freedom, of having options, that I knew I never wanted to give up. I had to find a means to sustain myself by working remotely.</p>
<p>When I first tried to make a living on the Internet, I felt like a right amateur. Sitting on the bed, hunched over my laptop, I started a crappy WordPress blog with a modified theme that didn&rsquo;t entirely work. I posted about how I tried and failed to start a dropshipping business. My site was terrible, and I knew it. My first forays into being a &ldquo;real&rdquo; developer were to solve my own problems: how to get my blog working, how to set up a custom domain, how to get and use a security certificate. I found some guidance in blogs and answers that others had written, but much of it was outdated, or not entirely correct. Still, it helped me.</p>
<p>I can&rsquo;t imagine a world in which people did nothing to pass on their knowledge to future generations. Our stories are all we have beyond instinct and determination.</p>
<p>I stopped posting about dropshipping and started writing about the technical problems I was solving. I wrote about what I tried, and ultimately what worked. I started hearing from people who thanked me for explaining the solution they were looking for. Even in posts where all I&rsquo;d done was link to the correct set of instructions on some other website, people thanked me for leading them to it. I still thought my website was terrible, but I realized I was doing something useful. The more problems I solved, the better I got at solving them, and the more I wrote about it in turn.</p>
<p>One day, someone offered me money for one of my solutions. To my great delight, they weren&rsquo;t the last to do so.</p>
<p>As I built up my skills, I started taking on more challenging offers to solve problems. I discovered, as others have before me, that especially in software development, not every solution is out there waiting for you. The most frustrating part of working on an unsolved problem is that, at least to your knowledge, there&rsquo;s no one about to tell you how to solve it. If you&rsquo;re lucky, you&rsquo;ve at least got a heading from someone&rsquo;s cold trail in an old blog post. If you&rsquo;re lucky and tenacious, you&rsquo;ll find a working solution.</p>
<p>Don&rsquo;t leave it scribbled in the corner of a soon-forgotten notepad, never to ease the path of someone who comes along later. Update that old blog post by commenting on it, or sending a note to the author. Put your solution on the Internet, somewhere. Ideally, blog about it yourself in as much detail as you can recall. Some of the people who find your post might have the same problem, and might even be willing to pay you to solve it. And, if my own experience and some scattered stories hold true, one of the people who&rsquo;ll come along later, looking for that same solution, will be you.</p>
<h2 id="3-paying-it-forwards-backwards-and-investing-two-years-ago">3. Paying it forwards, backwards, and investing; two years ago</h2>
<p>Already being familiar with how easy it is to stop steering and start drifting, I sought new ways to challenge myself and my skills. I wanted to do more than just sustain my lifestyle. I wanted to offer something to others; something that mattered.</p>
<p>A strange thing started happening when I decided, deliberately, to write an in-depth technical blog about topics I was only beginning to become familiar with. I started to deeply understand some fundamental computer science topics - and trust me, that was strange enough - but odder than that was that others started to see me as a resource. People asked me questions because they thought I had the answers. I didn&rsquo;t, at least, not always - but I knew enough now to not let that stop me. I went to find the answers, to test and understand them, and then I wrote about them to teach those who had asked. I hardly noticed, along the way, that I was learning too.</p>
<p>When someone&rsquo;s outdated blog post leads you to an eventual solution, you can pay them back by posting an update, or blogging about it yourself. When you solve an unsolved problem, you pay it forward by recording that solution for the next person who comes along (sometimes you). In either case, by writing about it - honestly, and with your best effort to be thorough and correct - you end up investing in yourself.</p>
<p>Explaining topics you&rsquo;re interested in to other people helps you find the missing pieces in your own knowledge. It helps you fill those gaps with learning, and integrate the things you learn into a new, greater understanding. Teaching something to others helps you become better at it yourself. Getting better at something - anything - means you have more to offer.</p>
<h2 id="the-past-decade-and-the-next-decade">The past decade, and the next decade</h2>
<p>It&rsquo;s the end of a decade. I went from an aimless drift through life to being captain of my ship. I bettered my environment, learned new skills, made myself a resource, and became a wife to my best friend. I&rsquo;m pretty happy with all of it.</p>
<p>It&rsquo;s the end of 2019. Despite a whole lot of life happening just this year, I&rsquo;ve written one article on this blog for each week since I started in July. That&rsquo;s 23 articles for 23 weeks, plus one Christmas bonus. I hear from people almost every day who tell me that an article I wrote was helpful to them, and it makes me happy and proud to think that I&rsquo;ve been doing something that matters. The first week of January will make this blog two years old.</p>
<p>The past several months have seen me change tack, slightly. I&rsquo;ve become very interested in cybersecurity, and have been lending my skills to the Open Web Application Security Project. I&rsquo;m now an author and maintainer of the <a href="https://github.com/OWASP/wstg">Web Security Testing Guide</a>, version 5. I&rsquo;m pretty happy with that, too.</p>
<p>Next year, I&rsquo;ll be posting a little less, though writing even more, as I pursue an old dream of publishing a book, as well as develop my new cybersecurity interests. I aim to get better at quite a few things. Thankfully, I know just how to do it - and now, so do you:</p>
<ol>
<li>Solve your own problems,</li>
<li>Write about it,</li>
<li>Teach others.</li>
</ol>
<p>Have a very happy new decade, dear reader.</p>
Healthy habits for good cybersecurityhttps://victoria.dev/blog/healthy-habits-for-good-cybersecurity/
Thu, 26 Dec 2019 08:27:31 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/healthy-habits-for-good-cybersecurity/A few helpful cybersecurity resolutions to take into the new year. Happy holidays!
]]>
<p>In a similar fashion to everyone getting the flu now and again, the risk of catching a cyberattack is a common one. Both a sophisticated social engineering attack and a grammatically-lacking email phishing scam can cause real damage. No one who communicates over the Internet is immune.</p>
<p>Like proper hand washing and getting a flu shot, good habits can lower your risk of inadvertently allowing cybergerms to spread. Since the new year is an inspiring time for beginning new habits, I offer a few suggestions for ways to help protect yourself and those around you.</p>
<h2 id="1-get-a-follow-up">1. Get a follow-up</h2>
<p>Recognizing a delivery method for a cyberattack is getting more difficult. Messages with malicious links do not always come from strangers. They may appear to be routine communications, or seem to originate from someone you know or work with. Attacks use subtle but deeply ingrained cognitive biases to override your common sense. Your natural response ensures you click.</p>
<p>Thankfully, there&rsquo;s a simple low-tech habit you can use to deter these attacks: before you act, follow up.</p>
<p>You may get an email from a friend who needs help, or from your boss who&rsquo;s about to get on a plane. It could be as enticing and mysterious as a direct message from an acquaintance who sends a link asking, &ldquo;Lol. Is this you?&rdquo; It takes presence of mind to override the panic these attacks prey on, but the deterrent itself is quick and straightforward. Send a text message, pick up the phone and call, or walk down the hall, and ask, &ldquo;Did you send me this?&rdquo;</p>
<p>If the message is genuine, there&rsquo;s no harm in a few extra minutes to double check. If it&rsquo;s not, you&rsquo;ll immediately alert the originating party that they may be compromised, and you may have deterred a cyberattack!</p>
<h2 id="2-use-and-encourage-others-to-use-end-to-end-encrypted-messaging">2. Use, and encourage others to use, end-to-end encrypted messaging</h2>
<p>When individuals in a neighborhood get the flu shot, others in that neighborhood are safer for it. Encryption is similarly beneficial. Encourage your friends, coworkers, and Aunt Matilda to switch to an app like Signal. By doing so, you&rsquo;ll reduce everyone&rsquo;s exposure to more exploitable messaging systems.</p>
<p>This doesn&rsquo;t mean that you must stop using other methods of communication entirely. Instead, think of it as a hierarchy. Use Signal for important messages that should be trusted, like requests for money or making travel arrangements. Use all other methods of messaging, like SMS or social sites, only for &ldquo;unimportant&rdquo; communications. Now, if requests or links that seem important come to you through your unimportant methods, you&rsquo;ll be all the more likely to second-guess them.</p>
<h2 id="3-dont-put-that-dirty-usb-plug-into-your-">3. Don&rsquo;t put that dirty USB plug into your ***</h2>
<p>You wouldn&rsquo;t brush your teeth with a toothbrush you found on the sidewalk. Why would you plug in a USB device if you don&rsquo;t know where it&rsquo;s been?! While we might ascribe <a href="https://en.wikipedia.org/wiki/2008_cyberattack_on_United_States">putting a random found USB drive in your computer</a> to a clever exploitation of natural human curiosity, we&rsquo;re far less likely to suspect using <a href="https://www.howtogeek.com/444267/how-safe-are-public-charging-stations/">a public phone-charging station</a> or <a href="https://www.theverge.com/2019/8/15/20807854/apple-mac-lightning-cable-hack-mike-grover-mg-omg-cables-defcon-cybersecurity">a USB cable</a> we bought ourselves. Even seemingly-innocuous USB <a href="https://www.cbsnews.com/news/why-your-usb-device-is-a-security-risk/">peripherals</a> or <a href="https://www.us-cert.gov/ncas/current-activity/2010/03/08/Energizer-DUO-USB-Battery-Charger-Software-Allows-Remote-System">rechargeable</a> devices can be a risk.</p>
<p>Unlike email and some file-sharing services that scan and filter files before they reach your computer, plugging in via USB is as direct and <a href="https://www.wired.com/2014/07/usb-security/">unprotected</a> as connection gets. Once this connection is made, the user doesn&rsquo;t need to do anything else for a whole host of bad things to happen. Through USB connections, problems like malware and ransomware can easily infect your computer or phone.</p>
<p>There&rsquo;s no need to swear off the convenience of USB connectivity, or to avoid these devices altogether. Instead of engaging in questionable USB behavior, don&rsquo;t cheap out on USB devices and cables. If it&rsquo;s going to get plugged into your computer, ensure you&rsquo;re being extra cautious. Buy it from the manufacturer (like the Apple Store) or from a reputable company or reseller with supply chain control. When juicing up USB-rechargeables, don&rsquo;t plug them into your computer. Use <a href="https://heronebag.com/blog/40-hours-drive-time-my-road-trip-charging-essentials/">a wall charger with a USB port</a> instead.</p>
<h2 id="practice-healthy-cybersecurity-habits">Practice healthy cybersecurity habits</h2>
<p>Keeping your devices healthy and happy is a matter of practicing good habits. Like battling the flu, good habits can help protect you and those around you. Incorporate some conscientious cybersecurity practices in your New Year&rsquo;s resolutions - or start them right away.</p>
<p>Have a safe and happy holiday!</p>
Concurrency, parallelism, and the many threads of Santa Claus 🎅https://victoria.dev/blog/concurrency-parallelism-and-the-many-threads-of-santa-claus/
Mon, 23 Dec 2019 19:29:01 -0500hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/concurrency-parallelism-and-the-many-threads-of-santa-claus/A seasonal exploration of the difference between concurrent, parallel, and multithreaded processes.
]]>
<p>Consider the following: Santa brings toys to all the good girls and boys.</p>
<p>There are <a href="https://en.wikipedia.org/wiki/Demographics_of_the_world#Current_population_distribution">7,713,468,100 people</a> in the world in 2019, <a href="https://en.wikipedia.org/wiki/Demographics_of_the_world#Age_structure">around 26.3%</a> of whom are under 15 years old. This works out to 2,028,642,110 children (persons under 15 years of age) in the world this year.</p>
<p>Santa doesn&rsquo;t seem to visit children of every religion, so we&rsquo;ll generalize and only include Christians and non-religious folks. Collectively that makes up <a href="https://en.wikipedia.org/wiki/List_of_religious_populations#Adherent_estimates_in_2019">approximately 44.72%</a> of the population. If we assume that all kids take after their parents, then 907,208,751.6 children would appear to be Santa-eligible.</p>
<p>What percentage of those children are good? It&rsquo;s impossible to know; however, we can work on a few assumptions. One is that Santa Claus functions more on optimism than economics and would likely have prepared for the possibility that every child is a good child in any given year. Thus, he would be prepared to give a toy to every child. Let&rsquo;s assume it&rsquo;s been a great year and that all 907,208,751.6 children are getting toys.</p>
<p>That&rsquo;s a lot of presents, and, as we know, they&rsquo;re all made by Santa&rsquo;s elves at his North <del>China</del> Pole workshop. Given that there are 365 days in a year and one of them is Christmas, let&rsquo;s assume that Santa&rsquo;s elves collectively have 364 days to create and gift wrap 907,208,752 (rounded up) presents. That works out to 2,492,331.74 presents per day.</p>
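<p>For anyone who wants to check the arithmetic so far, here is a minimal sketch in Python. The constant and function names are my own for illustration; the figures are the ones quoted above, and the child count is rounded to a whole number before the next step, as in the text.</p>

```python
import math

# Figures quoted in the text above (2019 estimates).
WORLD_POPULATION = 7_713_468_100
SHARE_UNDER_15 = 0.263
SHARE_SANTA_ELIGIBLE = 0.4472
WORKING_DAYS = 364  # every day of the year except Christmas


def santa_workload():
    """Return (children, eligible children, presents, presents per day)."""
    children = round(WORLD_POPULATION * SHARE_UNDER_15)
    eligible = children * SHARE_SANTA_ELIGIBLE
    presents = math.ceil(eligible)  # no child gets a fraction of a toy
    return children, eligible, presents, presents / WORKING_DAYS


if __name__ == "__main__":
    children, eligible, presents, per_day = santa_workload()
    print(f"children under 15:  {children:,}")
    print(f"Santa-eligible:     {eligible:,.1f}")
    print(f"presents to build:  {presents:,}")
    print(f"presents per day:   {per_day:,.2f}")
```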
<p>Almost two-and-a-half million presents per day is a heavy workload for any workshop. Let&rsquo;s look at two paradigms that Santa might employ to hit this goal: concurrency, and parallelism.</p>
<h2 id="a-sequential-process">A sequential process</h2>
<p>Suppose that Santa&rsquo;s workshop is staffed by exactly one, very hard working, very tired elf. The production of one present involves four steps:</p>
<ol>
<li>Cutting wood</li>
<li>Assembly and glueing</li>
<li>Painting</li>
<li>Gift-wrapping</li>
</ol>
<p>With a single elf, only one step for one present can be happening at any instant in time. If the elf were to produce one present at a time from beginning to end, that process would be executed <em>sequentially</em>. It&rsquo;s not the most efficient method for producing two-and-a-half million presents per day; for instance, the elf would have to wait around doing nothing while the glue on the present was drying before moving on to the next step.</p>
<p><img src="sequence.png" alt="Illustration of sequence"></p>
<h2 id="concurrency">Concurrency</h2>
<p>In order to be more efficient, the elf works on all presents <em>concurrently</em>.</p>
<p>Instead of completing one present at a time, the elf first cuts all the wood for all the toys, one by one. When everything is cut, the elf assembles and glues the toys together, one after the other. This <a href="https://en.wikipedia.org/wiki/Concurrent_computing">concurrent processing</a> means that the glue from the first toy has time to dry (without needing more attention from the elf) while the remaining toys are glued together. The same goes for painting, one toy at a time, and finally wrapping.</p>
<p><img src="concurrency.png" alt="Illustration of concurrency"></p>
<p>Since one elf can only do one task at a time, a single elf is using the day as efficiently as possible by concurrently producing presents.</p>
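<p>To make the difference concrete, here is a rough sketch using Python&rsquo;s <code>asyncio</code>, which runs everything on a single thread, much like our single elf. The toy names and the drying time are made up for the demo, and the interleaving is a little different from the stage-by-stage order described above: here the elf simply picks up another present whenever glue is drying.</p>

```python
import asyncio
import time

GLUE_DRYING_SECONDS = 0.1  # hypothetical drying time, shortened for the demo


async def build_present(name, log):
    """One present: the elf's hands are only free while the glue dries."""
    log.append(f"cut wood for {name}")
    log.append(f"glue {name}")
    await asyncio.sleep(GLUE_DRYING_SECONDS)  # waiting, not working
    log.append(f"paint {name}")
    log.append(f"wrap {name}")


async def sequential(toys):
    """Finish each present completely before starting the next one."""
    log = []
    for toy in toys:
        await build_present(toy, log)
    return log


async def concurrent(toys):
    """Still one elf (one thread), switching presents while glue dries."""
    log = []
    await asyncio.gather(*(build_present(toy, log) for toy in toys))
    return log


def timed(workflow, toys):
    """Run a workflow and return its log and the elapsed seconds."""
    start = time.perf_counter()
    log = asyncio.run(workflow(toys))
    return log, time.perf_counter() - start


if __name__ == "__main__":
    toys = ["train", "doll", "robot"]
    for workflow in (sequential, concurrent):
        log, seconds = timed(workflow, toys)
        print(f"{workflow.__name__}: {seconds:.2f}s")
```

<p>Both versions do exactly the same steps with exactly one elf. The concurrent one finishes sooner only because the elf never stands around watching glue dry.</p>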
<h2 id="parallelism">Parallelism</h2>
<p>Hopefully, Santa&rsquo;s workshop has more than just one elf. With more elves, more toys can be built simultaneously over the course of a day. This simultaneous work means that the presents are being produced in <em>parallel</em>. <a href="https://en.wikipedia.org/wiki/Parallel_computing">Parallel processing</a> carried out by multiple elves means more work happens at the same time.</p>
<p><img src="parallel.png" alt="Illustration of parallel processes"></p>
<p>Elves working in parallel can also employ concurrency. One elf can still tackle only one task at a time, so it&rsquo;s most efficient to have multiple elves concurrently producing presents.</p>
<p>Of course, if Santa&rsquo;s workshop has, say, two-and-a-half million elves, each elf would only need to finish a maximum of one present per day. In this case, working sequentially doesn&rsquo;t detract from the workshop&rsquo;s efficiency. There would still be 7,668.26 elves left over to fetch coffee and lunch.</p>
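<p>Splitting the daily quota among elves is simple division. The sketch below assumes the work divides evenly and uses function names I made up for illustration; it reproduces the left-over elf count from the paragraph above.</p>

```python
import math

PRESENTS_PER_DAY = 2_492_331.74  # daily quota worked out earlier in the post


def presents_per_elf(elves):
    """Whole presents each elf must finish per day if the quota is split evenly."""
    return math.ceil(PRESENTS_PER_DAY / elves)


def idle_elves(elves):
    """Elves left without a present when each elf builds at most one per day."""
    return max(0.0, round(elves - PRESENTS_PER_DAY, 2))


if __name__ == "__main__":
    for elves in (1, 1_000, 2_500_000):
        print(f"{elves:,} elves: {presents_per_elf(elves):,} presents each per day")
    print(f"elves free for coffee runs: {idle_elves(2_500_000):,.2f}")
```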
<h2 id="santa-claus-and-threading">Santa Claus, and threading</h2>
<p>After all the elves&rsquo; hard work is done, it&rsquo;s up to Santa Claus to deliver the presents &ndash; all 907,208,752 of them.</p>
<p>Santa doesn&rsquo;t need to make a visit to every kid; just to the one household tree. So how many trees does Santa need to visit? Again with broad generalization, we&rsquo;ll say that the average number of children per household worldwide is <a href="https://en.wikipedia.org/wiki/Demographics_of_the_world#Total_fertility_rate">2.45, based on the year&rsquo;s predicted fertility rates</a>. That makes 370,289,286.4 houses to visit. Let&rsquo;s round that up to 370,289,287.</p>
<p>How long does Santa have? The lore says one night, which means one earthly rotation, and thus 24 hours. <a href="https://www.noradsanta.org/">NORAD confirms</a>.</p>
<p>This means Santa must visit 370,289,287 households in 24 hours (86,400 seconds), at a rate of 4,285.75 households per second, never mind the time it takes to put presents under the tree and grab a cookie.</p>
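<p>Here is the same household estimate as a short Python sketch. The names are mine, and the inputs are the figures from the text, so treat it as a back-of-the-sleigh calculation and nothing more rigorous.</p>

```python
import math

ELIGIBLE_CHILDREN = 907_208_751.6  # from the estimate earlier in the post
CHILDREN_PER_HOUSEHOLD = 2.45
SECONDS_PER_NIGHT = 24 * 60 * 60  # 86,400 seconds in one rotation of the Earth


def households_to_visit():
    """Round up: a partial household still has a tree that needs presents."""
    return math.ceil(ELIGIBLE_CHILDREN / CHILDREN_PER_HOUSEHOLD)


def households_per_second():
    """Average delivery rate needed to finish within one night."""
    return households_to_visit() / SECONDS_PER_NIGHT


if __name__ == "__main__":
    print(f"{households_to_visit():,} households to visit")
    print(f"more than {math.floor(households_per_second()):,} households every second")
```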
<p>Clearly, Santa doesn&rsquo;t exist in our dimension. This is especially likely given that despite being chubby and plump, he fits down a chimney (with a lit fire, while remaining unhurt) carrying a sack of toys containing presents for all the household&rsquo;s children. We haven&rsquo;t even considered the fact that his sleigh carries enough toys for every believing boy and girl around the world, and flies.</p>
<p>Does Santa exist outside our rules of physics? How could one entity manage to travel around the world, delivering packages, in under 24 hours at a rate of 4,285.75 households per second, and still have time for milk and cookies and kissing mommy?</p>
<p>One thing is certain: Santa uses the Internet. No other technology has yet enabled packages to travel quite so far and quite so quickly. Even so, attempting to reach upwards of four thousand households per second is no small task, even with the best gigabit Internet hookup the North Pole has to offer. How might Santa increase his efficiency?</p>
<p>There&rsquo;s clearly only one logical conclusion to this mystery: Santa Claus is a multithreaded process.</p>
<h2 id="a-single-thread">A single thread</h2>
<p>Let&rsquo;s work outward. Think of a <a href="https://en.wikipedia.org/wiki/Thread_(computing)">thread</a> as one particular task, or the most granular sequence of instructions that Santa might execute. One thread might execute the task, <code>put present under tree</code>. A thread is a component of a process, in this case, Santa&rsquo;s process of delivering presents.</p>
<p>If Santa Claus is <a href="https://en.wikipedia.org/wiki/Thread_(computing)#Single_threading">single-threaded</a>, he, as a process, would only be able to accomplish one task at a time. Since he&rsquo;s old and a bit forgetful, he probably has a set of instructions for delivering presents, as well as a schedule to abide by. These two things guide Santa&rsquo;s thread until his process is complete.</p>
<p><img src="single.png" alt="A single Santa Claus emoji"></p>
<p>Single-threaded Santa Claus might work something like this:</p>
<ol>
<li>Land sleigh at Timmy&rsquo;s house</li>
<li>Get Timmy&rsquo;s present from sleigh</li>
<li>Enter house via chimney</li>
<li>Locate Christmas tree</li>
<li>Place Timmy&rsquo;s present under Christmas tree</li>
<li>Exit house via chimney</li>
<li>Take off in sleigh</li>
</ol>
<p>Rinse and repeat&hellip; another 370,289,286 times.</p>
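<p>As a sketch, single-threaded Santa is just a loop: every step for one household runs to completion before the next household begins. The function names and the second child&rsquo;s name are invented for this example.</p>

```python
DELIVERY_STEPS = (
    "land sleigh at {child}'s house",
    "get {child}'s present from sleigh",
    "enter house via chimney",
    "locate Christmas tree",
    "place {child}'s present under Christmas tree",
    "exit house via chimney",
    "take off in sleigh",
)


def deliver(child):
    """Run every step for one household, in order, and return what was done."""
    return [step.format(child=child) for step in DELIVERY_STEPS]


def single_threaded_santa(children):
    """One thread of execution: finish each household before starting the next."""
    journal = []
    for child in children:
        journal.extend(deliver(child))
    return journal


if __name__ == "__main__":
    for entry in single_threaded_santa(["Timmy", "Sally"]):
        print(entry)
```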
<h2 id="multithreading">Multithreading</h2>
<p><a href="https://en.wikipedia.org/wiki/Thread_(computing)#Multithreading">Multithreaded</a> Santa Claus, by contrast, is the <a href="https://dc.fandom.com/wiki/Jonathan_Osterman_(Watchmen)">Doctor Manhattan</a> of the North Pole. There&rsquo;s still only one Santa Claus in the world; however, he has the amazing ability to multiply his consciousness and accomplish multiple instruction sets of tasks simultaneously. These additional task workers, or worker threads, are created and controlled by the main process of Santa delivering presents.</p>
<p><img src="cover.png" alt="Multiple Santa threads"></p>
<p>Each worker thread acts independently to complete its instructions. Since they all belong to Santa&rsquo;s consciousness, they share Santa&rsquo;s memory and know everything that Santa knows, including what planet they&rsquo;re running around on, and where to get the presents from.</p>
<p>With this shared knowledge, each thread is able to execute its set of instructions in parallel with the other threads. This multithreaded parallelism makes the one and only Santa Claus as efficient as possible.</p>
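<p>Here is a small sketch of that idea using Python&rsquo;s <code>concurrent.futures</code> thread pool. The household names, the worker count, and the function names are all assumptions for illustration. Note that in CPython, threads take turns executing Python code, so this shows worker threads and shared memory more than raw parallel speed.</p>

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Shared memory: every worker thread reads the same route and writes to the same log.
# The household names are made up for illustration.
ROUTE = ["Timmy", "Sally", "Ravi", "Mei", "Ana", "Olu"]
delivered = []
delivered_lock = threading.Lock()


def deliver(child):
    """Task run by a worker thread: one complete delivery to one household."""
    with delivered_lock:  # writes to shared state are coordinated with a lock
        delivered.append(child)
    return threading.current_thread().name


def multithreaded_santa(worker_threads=3):
    """The main thread spawns workers, hands out deliveries, and waits for them."""
    with ThreadPoolExecutor(max_workers=worker_threads, thread_name_prefix="santa") as pool:
        return list(pool.map(deliver, ROUTE))


if __name__ == "__main__":
    workers = multithreaded_santa()
    for child, worker in zip(ROUTE, workers):
        print(f"{worker} delivered to {child}")
```

<p>There is still only one Santa process. The pool of worker threads belongs to it, and all of them see the same <code>ROUTE</code> and the same <code>delivered</code> list.</p>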
<p>If an average present delivery run takes one second, Santa need only spawn 4,286 worker threads. With each making one delivery trip per second, Santa will have completed all 370,289,287 trips by the end of the night.</p>
<p>Of course, in theory, Santa could even spawn 370,289,287 worker threads, each flying to one household to deliver presents for all the children in it! That would make Santa&rsquo;s process extremely efficient, and also explain how he manages to consume all those milk-dunked cookies without getting full. 🥛🍪🍪🍪</p>
<h2 id="an-efficient-and-merry-multithreaded-christmas">An efficient and merry multithreaded Christmas</h2>
<p>Thanks to modern computing, we now finally understand how Santa Claus manages the seemingly-impossible task of delivering toys to good girls and boys the world over. From my family to yours, I hope you have a wonderful Christmas. Don&rsquo;t forget to hang up your stockings on the router shelf.</p>
<p>Of course, none of this explains how reindeer manage to fly.</p>
Word bugs in software documentation and how to fix themhttps://victoria.dev/blog/word-bugs-in-software-documentation-and-how-to-fix-them/
Wed, 18 Dec 2019 09:01:23 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/word-bugs-in-software-documentation-and-how-to-fix-them/A collection of mistakes that make documentation less awesome than it could be.
]]>
<p>I&rsquo;ve been an editor longer than I&rsquo;ve been a developer, so this topic for me is a real root issue. 🥁 When I see a great project with poorly-written docs, it hits close to <code>/home</code>. Okay, okay, I&rsquo;m done.</p>
<p>I help the <a href="https://github.com/OWASP">Open Web Application Security Project (OWASP)</a> with their <a href="https://github.com/OWASP/wstg">Web Security Testing Guide (WSTG)</a>. I was recently tasked with writing a <a href="https://github.com/OWASP/wstg/blob/master/style_guide.md">style guide</a> and article template that show how to write technical instruction for testing software applications.</p>
<p>I thought parts of the guide would benefit more people than just OWASP&rsquo;s contributors, so I&rsquo;m sharing some here.</p>
<p>Many of the projects I participate in are open source. This is a wonderful way for people to share solutions and to build on each other&rsquo;s ideas. Unfortunately, it&rsquo;s also a great way for misused and non-existent words to catch on. Here&rsquo;s an excerpt of the guide with some mistakes I&rsquo;ve noticed and how you can fix them in your technical documents.</p>
<hr>
<h2 id="use-correct-words">Use Correct Words</h2>
<p>The following are frequently misused words and how to correct them.</p>
<h3 id="_andor_"><em>and/or</em></h3>
<p>While sometimes used in legal documents, <em>and/or</em> leads to ambiguity and confusion in technical writing. Instead, use <em>or</em>, which in the English language includes <em>and</em>. For example:</p>
<blockquote>
<p>Bad: &ldquo;The code will output an error number and/or description.&rdquo;<br>
Good: &ldquo;The code will output an error number or description.&rdquo;</p>
</blockquote>
<p>The latter sentence does not exclude the possibility of having both an error number and description.</p>
<p>If you need to specify all possible outcomes, use a list:</p>
<blockquote>
<p>&ldquo;The code will output an error number, or a description, or both.&rdquo;</p>
</blockquote>
<h3 id="_frontend-backend_"><em>frontend, backend</em></h3>
<p>While it&rsquo;s true that the English language evolves over time, these are not yet words.</p>
<p>When referring to nouns, use <em>front end</em> and <em>back end</em>. For example:</p>
<blockquote>
<p>Security is equally important on the front end as it is on the back end.</p>
</blockquote>
<p>As a descriptive adjective, use the hyphenated <em>front-end</em> and <em>back-end</em>.</p>
<blockquote>
<p>Both front-end developers and back-end developers are responsible for application security.</p>
</blockquote>
<h3 id="_whitebox_-_blackbox_-_greybox_"><em>whitebox</em>, <em>blackbox</em>, <em>greybox</em></h3>
<p>These are not words.</p>
<p>As nouns, use <em>white box</em>, <em>black box</em>, and <em>grey box</em>. These nouns rarely appear in connection with cybersecurity.</p>
<blockquote>
<p>My cat enjoys jumping into that grey box.</p>
</blockquote>
<p>As adjectives, use the hyphenated <em>white-box</em>, <em>black-box</em>, and <em>grey-box</em>. Do not use capitalization unless the words are in a title.</p>
<blockquote>
<p>While white-box testing involves knowledge of source code, black-box testing does not. A grey-box test is somewhere in-between.</p>
</blockquote>
<h3 id="_ie_-_eg_"><em>ie</em>, <em>eg</em></h3>
<p>These are letters.</p>
<p>The abbreviation <em>i.e.</em> refers to the Latin <em>id est</em>, which means &ldquo;in other words.&rdquo; The abbreviation <em>e.g.</em> is for <em>exempli gratia</em>, translating to &ldquo;for example.&rdquo; To use these in a sentence:</p>
<blockquote>
<p>Write using proper English, i.e. correct spelling and grammar. Use common words over uncommon ones, e.g. &ldquo;learn&rdquo; instead of &ldquo;glean.&rdquo;</p>
</blockquote>
<h3 id="_etc_"><em>etc</em></h3>
<p>These are also letters.</p>
<p>The Latin phrase <em>et cetera</em> translates to &ldquo;and the rest.&rdquo; It is abbreviated <em>etc.</em> and typically placed at the end of a list that seems redundant to complete:</p>
<blockquote>
<p>WSTG authors like rainbow colors, such as red, yellow, green, etc.</p>
</blockquote>
<p>In technical writing, the use of <em>etc.</em> is problematic. It assumes the reader knows what you&rsquo;re talking about, and they may not. Violet is one of the colors of the rainbow, but the example above does not explicitly tell you if violet is a color that WSTG authors like.</p>
<p>It is better to be explicit and thorough than to make assumptions about the reader. Only use <em>etc.</em> to avoid completing a list that was given in full earlier in the document.</p>
<h3 id="__-ellipsis"><em>&hellip;</em> (ellipsis)</h3>
<p>The ellipsis punctuation mark can indicate that words have been left out of a quote:</p>
<blockquote>
<p>Linus Torvalds once said, &ldquo;Once you realize that documentation should be laughed at&hellip; THEN, and only then, have you reached the level where you can safely read it and try to use it to actually implement a driver.&rdquo;</p>
</blockquote>
<p>As long as the omission does not change the meaning of the quote, this is acceptable usage of ellipsis in the WSTG.</p>
<p>All other uses of ellipsis, such as to indicate an unfinished thought, are not.</p>
<h3 id="_ex_"><em>ex</em></h3>
<p>While this is a word, it is likely not the word you are looking for. The word <em>ex</em> has particular meaning in the fields of finance and commerce, and may refer to a person if you are discussing your past relationships. None of these topics should appear in the WSTG.</p>
<p>The abbreviation <em>ex.</em> may be used to mean &ldquo;example&rdquo; by lazy writers. Please don&rsquo;t be lazy, and write <em>example</em> instead.</p>
<hr>
<h2 id="go-forth-and-write-docs">Go forth and write docs</h2>
<p>If these reminders are helpful, please share them freely and use them when writing your own READMEs and documentation! If there are some I&rsquo;ve missed, I&rsquo;d love to know.</p>
<p>And if you&rsquo;re here for the comments&hellip;</p>
<p><img src="crowder-change-my-mind.png#center" alt="Change my mind meme"></p>
<p>There are none on my blog. You can still <a href="https://victoria.dev/contact">@ me</a>.</p>
<p>If you&rsquo;d like to help contribute to the OWASP WSTG, please read <a href="https://github.com/OWASP/wstg/blob/master/CONTRIBUTING.md">the contribution guide</a>. See the <a href="https://github.com/OWASP/wstg/blob/master/style_guide.md">full style guide here</a>.</p>
User interface security for the front-end developerhttps://victoria.dev/blog/user-interface-security-for-the-front-end-developer/
Wed, 11 Dec 2019 08:27:31 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/user-interface-security-for-the-front-end-developer/Best practices that contribute to cybersecurity, for the user interface.
]]>
<p>While cybersecurity is often thought of in terms of databases and architecture, much of a strong security posture relies on elements in the domain of the front-end developer. For certain potentially devastating vulnerabilities like <a href="https://www.owasp.org/index.php/Top_10-2017_A1-Injection">SQL injection</a> and <a href="https://www.owasp.org/index.php/Top_10-2017_A7-Cross-Site_Scripting_(XSS)">Cross-Site Scripting (XSS)</a>, a well-considered user interface is the first line of defense.</p>
<p>Here are a few areas of focus for front-end developers who want to help fight the good fight.</p>
<h2 id="control-user-input">Control user input</h2>
<p>A whole whack of <a href="https://victoria.dev/blog/sql-injection-and-xss-what-white-hat-hackers-know-about-trusting-user-input/">crazy things</a> can happen when developers build a form that fails to control user input. To combat vulnerabilities like injection, it&rsquo;s important to validate or sanitize user input.</p>
<p>Input can be validated by constraining it to known values, such as by using <a href="https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/HTML5/Constraint_validation#Semantic_input_types">semantic input types</a> or <a href="https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/HTML5/Constraint_validation#Validation-related_attributes">validation-related attributes</a> in forms. Frameworks like <a href="https://www.djangoproject.com/">Django</a> also help by providing <a href="https://docs.djangoproject.com/en/3.0/ref/models/fields/#field-types">field types</a> for this purpose. Sanitizing data can be done by removing or replacing contextually-dangerous characters, such as by using a whitelist or escaping the input data.</p>
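<p>As a rough illustration of validating against known values, here is a sketch in Bash rather than in a form. The helper name <code>is_valid_username</code> and the allowed character set are assumptions made up for this example, not part of any framework. The idea is to accept only the characters you expect and reject everything else:</p>

```bash
# Hypothetical helper: accept only 3-16 characters from a known-safe set.
is_valid_username() {
  local input=$1
  [[ $input =~ ^[A-Za-z0-9_]{3,16}$ ]]
}

is_valid_username "samy_k" && echo "accepted"
is_valid_username "<script>alert(1)</script>" || echo "rejected"
```

<p>The same allowlist idea applies whether the check lives in a form attribute, a framework field type, or a server-side handler.</p>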
<p>While it may not be intuitive, even data that a user submits to their own area on a site should be validated. One of the fastest viruses to proliferate was the <a href="https://en.wikipedia.org/wiki/Samy_(computer_worm)">Samy worm</a> on MySpace (yes, I&rsquo;m old), thanks to code that Samy Kamkar was able to inject into his own profile page. Don&rsquo;t directly return any input to your site without thorough validation or sanitization.</p>
<p>For some further guidance on battling injection attacks, see the <a href="https://github.com/OWASP/CheatSheetSeries/blob/master/cheatsheets/Injection_Prevention_Cheat_Sheet.md">OWASP Injection Prevention Cheat Sheet</a>.</p>
<h2 id="beware-hidden-fields">Beware hidden fields</h2>
<p>Adding <code>type=&quot;hidden&quot;</code> is an enticingly convenient way to hide sensitive data in pages and forms, but unfortunately not an effective one. With tools like <a href="https://www.zaproxy.org/">OWASP ZAP</a> and even inspection tools in plain ol&rsquo; web browsers, users can easily click to reveal tasty bits of invisible information. Hiding checkboxes can be a neat hack for creating CSS-only switches, but hidden fields do little to contribute to security.</p>
<p>If you must use hidden fields, here are some <a href="https://www.owasp.org/index.php/Data_Validation#Hidden_fields">helpful guidelines</a>.</p>
<h2 id="carefully-consider-autofill-fields">Carefully consider autofill fields</h2>
<p>When a user chooses to give you their <a href="https://en.wikipedia.org/wiki/Personal_data">Personally Identifiable Information</a> (PII), it should be a conscious choice. Autofill form fields can be convenient - for both users and attackers. <a href="https://freedom-to-tinker.com/2017/12/27/no-boundaries-for-user-identities-web-trackers-exploit-browser-login-managers/">Exploits using hidden fields can harvest PII</a> previously captured by an autocomplete field.</p>
<p>Many users aren&rsquo;t even aware what information their browser&rsquo;s autofill has stored up. Use these fields sparingly, and disable autofilled forms for particularly sensitive data.</p>
<p>It&rsquo;s important to also weigh your risk profile against its trade-offs. If your project must be <a href="https://www.w3.org/WAI/standards-guidelines/wcag/">WCAG</a> compliant, disabling autocomplete can break your input for different modalities. For more, see <a href="https://www.w3.org/WAI/WCAG21/Understanding/identify-input-purpose.html">1.3.5: Identify Input Purpose in WCAG 2.1</a>.</p>
<h2 id="keep-errors-generic">Keep errors generic</h2>
<p>While it may seem helpful to let users know whether a piece of data exists, it&rsquo;s also very helpful to attackers. When dealing with accounts, emails, and PII, it&rsquo;s most secure to err (🥁) on the side of less. Instead of returning &ldquo;Your password for this account is incorrect,&rdquo; try the more ambiguous feedback &ldquo;Incorrect login information,&rdquo; and avoid revealing whether the username or email is in the system.</p>
<p>In order to be more helpful, provide a prominent way to contact a human in case an error should arise. Avoid revealing information that isn&rsquo;t necessary. If nothing else, for heaven&rsquo;s sake, don&rsquo;t suggest data that&rsquo;s a close match to the user input.</p>
<h2 id="be-a-bad-guy">Be a bad guy</h2>
<p>When considering security, it&rsquo;s helpful to take a step back, observe the information on display, and ask yourself how a malicious attacker would be able to utilize it. Play devil&rsquo;s advocate. If a bad guy saw this page, what new information would they gain? Does the view show any PII?</p>
<p>Ask yourself if everything on the page is actually necessary for a genuine user. If not, redact or remove it. Less is safer.</p>
<h2 id="security-starts-at-the-front-door">Security starts at the front door</h2>
<p>These days, there&rsquo;s a lot more overlap between coding on the front end and the back end. To create a well-rounded and secure application, it helps to have a general understanding of ways attackers can get their foot in the front door.</p>
The surprisingly difficult task of printing newlines in a terminalhttps://victoria.dev/blog/the-surprisingly-difficult-task-of-printing-newlines-in-a-terminal/
Wed, 04 Dec 2019 09:17:35 -0500hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/the-surprisingly-difficult-task-of-printing-newlines-in-a-terminal/Your guide to string interpolation quirks that confound the best of us.
]]>
<p>Surprisingly, getting computers to give humans readable output is no easy feat. With the introduction of <a href="https://en.wikipedia.org/wiki/Standard_streams">standard streams</a> and specifically standard output, programs gained a way to talk to each other using plain text streams; humanizing and displaying stdout is another matter. Technology throughout the computing age has tried to solve this problem, from the use of <a href="https://en.wikipedia.org/wiki/Computer_terminal#Early_VDUs">ASCII characters in video computer displays</a> to modern shell commands like <code>echo</code> and <code>printf</code>.</p>
<p>These advancements have not been seamless. The job of printing output to a terminal is fraught with quirks for programmers to navigate, as exemplified by the deceptively nontrivial task of expanding an <a href="https://en.wikipedia.org/wiki/Escape_sequence">escape sequence</a> to print newlines. The expansion of the placeholder <code>\n</code> can be accomplished in a multitude of ways, each with its own unique history and complications.</p>
<h2 id="using-echo">Using <code>echo</code></h2>
<p>From its appearance in <a href="https://en.wikipedia.org/wiki/Multics">Multics</a> to its modern-day Unix-like system ubiquity, <code>echo</code> remains a familiar tool for getting your terminal to say &ldquo;Hello world!&rdquo; Unfortunately, inconsistent implementations across operating systems make its usage tricky. Where <code>echo</code> on some systems will automatically expand escape sequences, <a href="https://man.cat-v.org/unix_8th/1/echo">others</a> require the <code>-e</code> option to do the same:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="nb">echo</span> <span class="s2">&#34;the study of European nerves is \neurology&#34;</span>
<span class="c1"># the study of European nerves is \neurology</span>
<span class="nb">echo</span> -e <span class="s2">&#34;the study of European nerves is \neurology&#34;</span>
<span class="c1"># the study of European nerves is</span>
<span class="c1"># eurology</span>
</code></pre></div><p>Because of these inconsistencies in implementations, <code>echo</code> is considered non-portable. Additionally, its usage in conjunction with user input is relatively easy to corrupt through <a href="https://en.wikipedia.org/wiki/Code_injection#Shell_injection">shell injection attack</a> using command substitutions.</p>
<p>In modern systems, it is retained only to provide compatibility with the many programs that still use it. The <a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html#tag_20_37_16">POSIX specification recommends</a> the use of <code>printf</code> in new programs.</p>
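<p>A minimal sketch of that recommendation, assuming Bash: the wrapper name <code>say</code> is made up for illustration. With <code>%s</code>, <code>printf</code> prints its argument literally, so the result does not depend on how a given <code>echo</code> treats backslashes. Escape sequences are expanded only when you ask for it with <code>%b</code>:</p>

```bash
# Hypothetical helper: print arguments literally, followed by a newline.
say() {
  printf '%s\n' "$*"
}

# %s leaves backslash sequences alone.
literal=$(say "the study of European nerves is \neurology")
# %b asks printf to interpret backslash escapes in the argument.
expanded=$(printf '%b\n' "the study of European nerves is \neurology")

printf '%s\n' "$literal"
printf '%s\n' "$expanded"
```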
<h2 id="using-printf">Using <code>printf</code></h2>
<p>Since 4th <a href="https://en.wikipedia.org/wiki/Research_Unix#Versions">Edition</a> Unix, the portable <a href="https://en.wikipedia.org/wiki/Printf_(Unix)"><code>printf</code> command</a> has essentially been the new and better <code>echo</code>. It allows you to use <a href="https://en.wikipedia.org/wiki/Printf_format_string#Format_placeholder_specification">format specifiers</a> to humanize input. To interpret backslash escape sequences, use <code>%b</code>. The character sequence <code>\n</code> ensures the output ends with a newline:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="nb">printf</span> <span class="s2">&#34;%b\n&#34;</span> <span class="s2">&#34;Many females in Oble are \noblewomen&#34;</span>
<span class="c1"># Many females in Oble are</span>
<span class="c1"># oblewomen</span>
</code></pre></div><p>Though <code>printf</code> has further options that make it a far more powerful replacement of <code>echo</code>, this utility is not foolproof and can be vulnerable to an <a href="https://en.wikipedia.org/wiki/Uncontrolled_format_string">uncontrolled format string</a> attack. It&rsquo;s important for programmers to ensure they <a href="https://victoria.dev/blog/sql-injection-and-xss-what-white-hat-hackers-know-about-trusting-user-input/">carefully handle user input</a>.</p>
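<p>One way to reduce that risk is to keep the format string fixed and pass anything user-supplied as an argument. The sketch below uses a made-up <code>user_input</code> value to contrast the two approaches:</p>

```bash
# Pretend this came from a user (assumed example input).
user_input='100%s and a literal \n'

# Risky: the input is used as the format string, so %s and \n are interpreted.
as_format=$(printf "$user_input")

# Safer: a fixed format string; the input is only ever treated as data.
as_data=$(printf '%s' "$user_input")

printf '%s\n' "$as_data"
```

<p>In the risky version, the <code>%s</code> in the input silently consumes a (missing) argument and the <code>\n</code> becomes a newline; in the safer version, the input comes back unchanged.</p>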
<h2 id="putting-newlines-in-variables">Putting newlines in variables</h2>
<p>In an effort to improve portability amongst compilers, the committee that produced the <a href="https://en.wikipedia.org/wiki/ANSI_C">ANSI C Standard</a> was formed in 1983, and the standard itself was ratified in 1989. With <a href="https://www.gnu.org/software/bash/manual/html_node/ANSI_002dC-Quoting.html#ANSI_002dC-Quoting">ANSI-C quoting</a> using <code>$'...'</code>, <a href="https://en.wikipedia.org/wiki/Escape_sequences_in_C#Table_of_escape_sequences">escape sequences</a> are replaced in output according to the standard.</p>
<p>This allows us to store strings with newlines in variables that are printed with the newlines interpreted. You can do this by setting the variable, then calling it with <code>printf</code> using <code>$</code>:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="nv">puns</span><span class="o">=</span><span class="s1">$&#39;\number\narrow\nether\nice&#39;</span>
<span class="nb">printf</span> <span class="s2">&#34;%b\n&#34;</span> <span class="s2">&#34;These words started with n but don&#39;t make </span><span class="nv">$puns</span><span class="s2">&#34;</span>
<span class="c1"># These words started with n but don&#39;t make</span>
<span class="c1"># umber</span>
<span class="c1"># arrow</span>
<span class="c1"># ether</span>
<span class="c1"># ice</span>
</code></pre></div><p>Because the variable was assigned using ANSI-C quoting, it already holds real newline characters by the time it is expanded, and these are passed as-is to <code>printf</code>. As always, it is important to properly handle the input.</p>
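<p>To see what ANSI-C quoting changes, it can help to compare it with ordinary single quotes. This sketch (variable names are illustrative) measures the length of each string with <code>${#parameter}</code>:</p>

```bash
plain='a\nb'   # ordinary single quotes: backslash and n stay as two characters
ansi=$'a\nb'   # ANSI-C quoting: \n becomes a single newline character

printf '%s\n' "${#plain}"   # 4
printf '%s\n' "${#ansi}"    # 3
```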
<h2 id="bonus-round-shell-parameter-expansion">Bonus round: shell parameter expansion</h2>
<p>In my article explaining <a href="https://victoria.dev/blog/bash-and-shell-expansions-lazy-list-making/">Bash and braces</a>, I covered the magic of <a href="https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html">shell parameter expansion</a>. We can use one expansion, <code>${parameter@operator}</code>, to interpret escape sequences, too. We use <code>printf</code>'s <code>%s</code> specifier to print as a string, and the <code>E</code> operator will properly expand the escape sequences in our variable:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="nb">printf</span> <span class="s2">&#34;%s\n&#34;</span> <span class="si">${</span><span class="nv">puns</span><span class="p">@E</span><span class="si">}</span>
<span class="c1"># umber</span>
<span class="c1"># arrow</span>
<span class="c1"># ether</span>
<span class="c1"># ice</span>
</code></pre></div><h2 id="the-ongoing-challenge-of-humanizing-output">The ongoing challenge of humanizing output</h2>
<p><a href="https://en.wikipedia.org/wiki/String_interpolation">String interpolation</a> continues to be a chewy problem for programmers. Besides getting languages and shells to agree on what certain placeholders mean, properly using the correct escape sequences requires an eye for detail.</p>
<p>Poor string interpolation can lead to silly-looking output, as well as introduce security vulnerabilities, such as from <a href="https://en.wikipedia.org/wiki/Code_injection">injection attacks</a>. Until the next evolution of the terminal has us talking in emojis, we&rsquo;d best pay attention when printing output for humans.</p>
The care and feeding of an IoT devicehttps://victoria.dev/blog/the-care-and-feeding-of-an-iot-device/
Wed, 27 Nov 2019 08:59:35 -0500hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/the-care-and-feeding-of-an-iot-device/Why IoT devices are, basically, puppies, and whether or not you should give somebody one for Christmas.
]]>
<p>Giving someone a puppy for Christmas might work really well in a movie, but in real life often comes hitched to a multitude of responsibilities that the giftee may not be fully prepared to take on. The same is true for Internet of Things (IoT) devices, including Amazon&rsquo;s Alexa-enabled devices, Google Home, and other Internet-connected appliances like cameras, lightbulbs, and toasters. Yes, they have those now.</p>
<p>Like puppies, IoT devices are still young. Many contain <a href="https://threatpost.com/iot-devices-vulnerable-takeover/144167/">known vulnerabilities</a> that remote attackers can use to gain access to device owners&rsquo; networks. These attacks are sometimes as laughably simple as using a default username and password that the <a href="https://gdpr.report/news/2019/06/12/research-reveals-the-most-vulnerable-iot-devices/">device owner cannot change</a>.</p>
<p>Does all this mean you shouldn&rsquo;t give Grandma Mabel a new app-enabled coffee maker or Ring doorbell for Christmas? Probably, although not necessarily. Like puppies, properly-maintained IoT devices are capable of warming your heart without causing <em>too</em> much havoc; but they take a lot of work to care for. Here are a few responsibilities to keep in mind for the care and feeding of an IoT device.</p>
<h2 id="immature-security">Immature security</h2>
<p>Many manufacturers of IoT devices <a href="https://www.infosecurity-magazine.com/news/vulnerabilities-in-iot-devices/">have not made security a priority</a>. There aren&rsquo;t yet any enforced <a href="https://blog.rapid7.com/2019/03/27/the-iot-cybersecurity-improvement-act-of-2019/">security requirements</a> for this industry, which leaves the protection of your device and the network it&rsquo;s connected to in the hands of the manufacturer.</p>
<p>It&rsquo;s not just obscure no-name toasters, either; malicious third-party apps have snuck onto Amazon&rsquo;s and Google&rsquo;s more reputable devices and enabled attackers to <a href="https://www.cnet.com/news/alexa-and-google-voice-assistants-app-exploits-left-it-vulnerable-to-eavesdropping/">eavesdrop</a> on unsuspecting owners.</p>
<p>Until security regulations are put in place and enforced, it&rsquo;s buyer beware for both devices and third-party applications. To the extent possible, potential owners must do ample research to weed out vulnerable devices and untrustworthy apps.</p>
<h2 id="protecting-your-network">Protecting your network</h2>
<p>If you think hackers aren&rsquo;t likely to find your device in the vast expanse of the Internet, you might be wrong. These days, obscurity doesn&rsquo;t provide security. It&rsquo;s no longer left up to a potential attacker&rsquo;s fallible human eyes to find your insecure front door camera in a cacophony of wireless traffic; <a href="https://money.cnn.com/2013/04/08/technology/security/shodan/index.html">IoT search engines</a> like <a href="https://www.shodan.io/">Shodan</a> will do that for them. Thankfully, these search engines are also used for good, enabling white hat hackers and penetration testers to find and fix insecure devices.</p>
<p>Just like locking your own front door, IoT owners are responsible for locking down access to their devices. This may mean searching through device settings to make sure default credentials are changed, or checking to make sure that a device used on your private home network doesn&rsquo;t by default have public Internet access.</p>
<p>Where the options are available, HTTPS and <a href="https://victoria.dev/blog/personal-cybersecurity-posture-for-when-youre-just-this-guy-you-know/#1-use-multifactor-authentication">multifactor authentication</a> should be enabled. The use of a <a href="https://victoria.dev/blog/personal-cybersecurity-posture-for-when-youre-just-this-guy-you-know/#2-use-a-vpn">VPN</a> can also keep your devices from being found.</p>
<h2 id="keeping-them-patched">Keeping them patched</h2>
<p>Unlike puppies, many IoT devices are &ldquo;headless&rdquo; and have no inherent way of interfacing with a human. An app-controlled lightbulb, for example, may be all but useless without the software that makes it shine. As convenient as it may be to have your 1500K mood lighting come on automatically at dusk, it also means automatically ceding control of the device to its software developers.</p>
<p>When vulnerabilities in your phone&rsquo;s operating system are discovered and patched, it&rsquo;s likely that automatic updates are pushed and installed overnight, possibly without you even knowing. Your IoT device, on the other hand, may have no such support. In those cases, it&rsquo;s completely up to the user to discover that an update is needed, find and download the patch, then correctly update their device. Even for owners with some technical expertise, this process takes significant effort. Many <a href="https://www.machinedesign.com/industrial-automation/software-updates-are-new-hurdle-iot-security">device owners aren&rsquo;t even aware</a> that their software is dangerously outdated.</p>
<p>In practical terms, this means that users without the time, knowledge, or willingness to keep their devices updated should reconsider owning them. Alternatively, some research can help prospective owners choose devices that receive automatic push updates from their (hopefully responsible) manufacturers over WiFi.</p>
<h2 id="being-responsible">Being responsible</h2>
<p>Raising a healthy and happy IoT device is no small task, especially for potential owners with little time or willingness to put in the required effort. With the proper attention and maintenance, your Internet-connected appliance can bring joy and convenience to your life; but without, it introduces a potential security risk and a whole lot of trouble.</p>
<p>Before getting or giving IoT, be sure the potential owner is up to the task of caring for it.</p>
<p>You can learn more about basic cybersecurity for IoT (as a user or maker) by reading <a href="https://csrc.nist.gov/publications/detail/nistir/8259/draft">NIST&rsquo;s draft guidelines publication</a>.</p>
Bash and shell expansions: lazy list-makinghttps://victoria.dev/blog/bash-and-shell-expansions-lazy-list-making/
Mon, 18 Nov 2019 07:07:24 -0500hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/bash-and-shell-expansions-lazy-list-making/A tour of brace expansion, shell parameter expansions, and playing with substrings in Bash.
]]>
<p>It&rsquo;s that time of year again! When stores start putting up colourful sparkly lit-up plastic bits, we all begin to feel a little festive, and by festive I mean let&rsquo;s go shopping. Specifically, holiday gift shopping! (Gifts for yourself are still gifts, technically.)</p>
<p>Just so this doesn&rsquo;t all go completely madcap, you ought to make some gift lists. Bash can help.</p>
<h2 id="brace-expansion">Brace expansion</h2>
<p>These are not braces: <code>()</code></p>
<p>Neither are these: <code>[]</code></p>
<p><em>These</em> are braces: <code>{}</code></p>
<p>Braces tell Bash to do something with the arbitrary string or strings it finds between them. Multiple strings are comma-separated: <code>{a,b,c}</code>. You can also add an optional preamble and postscript to be attached to each expanded result. Mostly, this can save some typing, such as with common file paths and extensions.</p>
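<p>A quick way to preview what Bash will generate is to hand the expansion to <code>echo</code> first. In this sketch, <code>gift-</code> is the preamble and <code>.txt</code> is the postscript; the names are only for illustration:</p>

```bash
# Each comma-separated string is combined with the preamble and postscript.
expanded=$(echo gift-{Amy,Bryan,Charlie}.txt)
echo "$expanded"
# gift-Amy.txt gift-Bryan.txt gift-Charlie.txt
```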
<p>Let&rsquo;s make some lists for each person we want to give stuff to. The following commands are equivalent:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">touch /home/me/gift-lists/Amy.txt /home/me/gift-lists/Bryan.txt /home/me/gift-lists/Charlie.txt
</code></pre></div><div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">touch /home/me/gift-lists/<span class="o">{</span>Amy,Bryan,Charlie<span class="o">}</span>.txt
</code></pre></div><div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">tree gift-lists
/home/me/gift-lists
├── Amy.txt
├── Bryan.txt
└── Charlie.txt
</code></pre></div><p>Oh darn, &ldquo;Bryan&rdquo; spells his name with an &ldquo;i.&rdquo; I can fix that.</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">mv -v /home/me/gift-lists/<span class="o">{</span>Bryan,Brian<span class="o">}</span>.txt
renamed <span class="s1">&#39;/home/me/gift-lists/Bryan.txt&#39;</span> -&gt; <span class="s1">&#39;/home/me/gift-lists/Brian.txt&#39;</span>
</code></pre></div><h2 id="shell-parameter-expansions">Shell parameter expansions</h2>
<p><a href="https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html">Shell parameter expansion</a> allows us to make all sorts of changes to parameters enclosed in braces, such as manipulating and substituting text.</p>
<p>There are a few stocking stuffers that all our giftees deserve. Let&rsquo;s make that a variable:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="nv">STUFF</span><span class="o">=</span><span class="s1">$&#39;socks\nlump of coal\nwhite chocolate&#39;</span>
<span class="nb">echo</span> <span class="s2">&#34;</span><span class="nv">$STUFF</span><span class="s2">&#34;</span>
socks
lump of coal
white chocolate
</code></pre></div><p>Now to add these items to each of our lists with some help from <a href="https://en.wikipedia.org/wiki/Tee_(command)">the <code>tee</code> command</a> to get <code>echo</code> and expansions to play nice.</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="nb">echo</span> <span class="s2">&#34;</span><span class="nv">$STUFF</span><span class="s2">&#34;</span> <span class="p">|</span> tee <span class="o">{</span>Amy,Brian,Charlie<span class="o">}</span>.txt
cat <span class="o">{</span>Amy,Brian,Charlie<span class="o">}</span>.txt
socks
lump of coal
white chocolate
socks
lump of coal
white chocolate
socks
lump of coal
white chocolate
</code></pre></div><h3 id="pattern-match-substitution">Pattern match substitution</h3>
<p>On second thought, maybe the lump of coal isn&rsquo;t such a nice gift. You can replace it with something better using a pattern match substitution in the form of <code>${parameter/pattern/string}</code>:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="nb">echo</span> <span class="s2">&#34;</span><span class="si">${</span><span class="nv">STUFF</span><span class="p">/lump of coal/candy cane</span><span class="si">}</span><span class="s2">&#34;</span> <span class="p">|</span> tee <span class="o">{</span>Amy,Brian,Charlie<span class="o">}</span>.txt
cat <span class="o">{</span>Amy,Brian,Charlie<span class="o">}</span>.txt
socks
candy cane
white chocolate
socks
candy cane
white chocolate
socks
candy cane
white chocolate
</code></pre></div><p>This replaces the first instance of &ldquo;lump of coal&rdquo; with &ldquo;candy cane.&rdquo; To replace all instances (if there were multiple), use <code>${parameter//pattern/string}</code>. This doesn&rsquo;t change our <code>$STUFF</code> variable, so we can still reuse the original list for someone naughty later.</p>
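<p>The difference between the single-slash and double-slash forms is easier to see with a repeated item. Here is a small sketch using a made-up <code>list</code> variable:</p>

```bash
list=$'coal\nsocks\ncoal'   # made-up list with a repeated item

first_only=${list/coal/candy}     # replaces the first match only
all_matches=${list//coal/candy}   # replaces every match

printf '%s\n' "$first_only"
printf '%s\n' "$all_matches"
```

<p>As with <code>$STUFF</code>, the original <code>list</code> variable is left untouched by both expansions.</p>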
<h3 id="substrings">Substrings</h3>
<p>While we&rsquo;re improving things, our giftees may not all like white chocolate. We&rsquo;d better add some regular chocolate to our lists just in case. Since I&rsquo;m super lazy, I&rsquo;m just going to hit the up arrow and modify a previous Bash command. Luckily, the last word in the <code>$STUFF</code> variable is &ldquo;chocolate,&rdquo; which is nine characters long, so I&rsquo;ll tell Bash to keep just that part using <code>${parameter:offset}</code>. I&rsquo;ll use <code>tee</code>'s <code>-a</code> flag to <code>a</code>ppend to my existing lists:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="nb">echo</span> <span class="s2">&#34;</span><span class="si">${</span><span class="nv">STUFF</span><span class="p">: -9</span><span class="si">}</span><span class="s2">&#34;</span> <span class="p">|</span> tee -a <span class="o">{</span>Amy,Brian,Charlie<span class="o">}</span>.txt
cat <span class="o">{</span>Amy,Brian,Charlie<span class="o">}</span>.txt
socks
candy cane
white chocolate
chocolate
socks
candy cane
white chocolate
chocolate
socks
candy cane
white chocolate
chocolate
</code></pre></div><p>You can also:</p>
<table>
<thead>
<tr>
<th>Do this</th>
<th>With this</th>
</tr>
</thead>
<tbody>
<tr>
<td>Get substring from <em>n</em> characters onwards</td>
<td><code>${parameter:n}</code></td>
</tr>
<tr>
<td>Get substring for <em>x</em> characters starting at <em>n</em></td>
<td><code>${parameter:n:x}</code></td>
</tr>
</tbody>
</table>
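<p>For a quick feel for the offsets, here is a sketch using a throwaway <code>gift</code> variable. Offsets count from zero, and a negative offset needs a space before it so Bash doesn&rsquo;t read it as the <code>:-</code> expansion:</p>

```bash
gift='white chocolate'

from_six=${gift:6}       # everything from offset 6 onwards
first_five=${gift:0:5}   # 5 characters starting at offset 0
last_nine=${gift: -9}    # negative offset counts back from the end

printf '%s\n' "$from_six" "$first_five" "$last_nine"
# chocolate
# white
# chocolate
```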
<p>There! Now our base lists are finished. Let&rsquo;s have some eggnog.</p>
<h3 id="testing-variables">Testing variables</h3>
<p>You know, it may be the eggnog, but I think I started a list for Amy yesterday and stored it in a variable that I might have called <code>amy</code>. Let&rsquo;s see if I did. I&rsquo;ll use the <code>${parameter:?word}</code> expansion. It&rsquo;ll write <code>word</code> to standard error and exit if there&rsquo;s no <code>amy</code> parameter.</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="nb">echo</span> <span class="s2">&#34;</span><span class="si">${</span><span class="nv">amy</span><span class="p">:?no such</span><span class="si">}</span><span class="s2">&#34;</span>
bash: amy: no such
</code></pre></div><p>I guess not. Maybe it was Brian instead?</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="nb">echo</span> <span class="s2">&#34;</span><span class="si">${</span><span class="nv">brian</span><span class="p">:?no such</span><span class="si">}</span><span class="s2">&#34;</span>
Lederhosen
</code></pre></div><p>You can also:</p>
<table>
<thead>
<tr>
<th>Do this</th>
<th>With this</th>
</tr>
</thead>
<tbody>
<tr>
<td>Substitute <code>word</code> if <code>parameter</code> is unset or null</td>
<td><code>${parameter:-word}</code></td>
</tr>
<tr>
<td>Substitute <code>word</code> if <code>parameter</code> is set and not null</td>
<td><code>${parameter:+word}</code></td>
</tr>
<tr>
<td>Assign <code>word</code> to <code>parameter</code> if <code>parameter</code> is unset or null</td>
<td><code>${parameter:=word}</code></td>
</tr>
</tbody>
</table>
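<p>Here is a sketch of how those three behave, using the same hypothetical <code>amy</code> variable. The <code>:</code> builtin is a no-op that gives the <code>:=</code> expansion somewhere to run without printing anything:</p>

```bash
unset amy                         # make sure the variable is unset for the demo

fallback=${amy:-no list yet}      # amy is unset, so the fallback is used; amy stays unset
alternate=${amy:+has a list}      # amy is unset, so this expands to nothing
: "${amy:=sweater}"               # assigns "sweater" to amy because it was unset
after_assign=${amy:+has a list}   # amy is now set, so the alternate text is used

printf '%s\n' "$fallback" "$amy" "$after_assign"
```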
<h3 id="changing-case">Changing case</h3>
<p>That&rsquo;s right! Brian said he wanted some lederhosen and so I made myself a note. This is pretty important, so I&rsquo;ll add it to Brian&rsquo;s list in capital letters with the <code>${parameter^^pattern}</code> expansion. The <code>pattern</code> part is optional. We&rsquo;re only writing to Brian&rsquo;s list, so I&rsquo;ll just use <code>&gt;&gt;</code> instead of <code>tee -a</code>.</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="nb">echo</span> <span class="s2">&#34;</span><span class="si">${</span><span class="nv">brian</span><span class="p">^^</span><span class="si">}</span><span class="s2">&#34;</span> &gt;&gt; Brian.txt
cat Brian.txt
socks
candy cane
white chocolate
chocolate
LEDERHOSEN
</code></pre></div><p>You can also:</p>
<table>
<thead>
<tr>
<th>Do this</th>
<th>With this</th>
</tr>
</thead>
<tbody>
<tr>
<td>Capitalize the first letter</td>
<td><code>${parameter^pattern}</code></td>
</tr>
<tr>
<td>Lowercase the first letter</td>
<td><code>${parameter,pattern}</code></td>
</tr>
<tr>
<td>Lowercase all letters</td>
<td><code>${parameter,,pattern}</code></td>
</tr>
</tbody>
</table>
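<p>A short sketch of those three, assuming Bash 4 or later (older versions, such as the Bash 3.2 that ships with macOS, don&rsquo;t support the case-modification expansions). The variable names are illustrative:</p>

```bash
gift='lederhosen'
shout='LEDERHOSEN'

capitalized=${gift^}    # Lederhosen
lower_first=${shout,}   # lEDERHOSEN
lower_all=${shout,,}    # lederhosen

printf '%s\n' "$capitalized" "$lower_first" "$lower_all"
```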
<h3 id="expanding-arrays">Expanding arrays</h3>
<p>You know what, all this gift-listing business is a lot of work. I&rsquo;m just going to make <a href="https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Arrays">an array</a> of things I saw at the store:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="nv">gifts</span><span class="o">=(</span>sweater gameboy wagon pillows chestnuts hairbrush<span class="o">)</span>
</code></pre></div><p>I can use substring expansion in the form of <code>${parameter:offset:length}</code> to make this simple. I&rsquo;ll add the first two to Amy&rsquo;s list, the middle two to Brian&rsquo;s, and the last two to Charlie&rsquo;s. I&rsquo;ll use <code>printf</code> to help with newlines.</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="nb">printf</span> <span class="s1">&#39;%s\n&#39;</span> <span class="s2">&#34;</span><span class="si">${</span><span class="nv">gifts</span><span class="p">[@]:</span><span class="nv">0</span><span class="p">:</span><span class="nv">2</span><span class="si">}</span><span class="s2">&#34;</span> &gt;&gt; Amy.txt
<span class="nb">printf</span> <span class="s1">&#39;%s\n&#39;</span> <span class="s2">&#34;</span><span class="si">${</span><span class="nv">gifts</span><span class="p">[@]:</span><span class="nv">2</span><span class="p">:</span><span class="nv">2</span><span class="si">}</span><span class="s2">&#34;</span> &gt;&gt; Brian.txt
<span class="nb">printf</span> <span class="s1">&#39;%s\n&#39;</span> <span class="s2">&#34;</span><span class="si">${</span><span class="nv">gifts</span><span class="p">[@]: -2</span><span class="si">}</span><span class="s2">&#34;</span> &gt;&gt; Charlie.txt
</code></pre></div><div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">cat Amy.txt
socks
candy cane
white chocolate
chocolate
sweater
gameboy
cat Brian.txt
socks
candy cane
white chocolate
chocolate
LEDERHOSEN
wagon
pillows
cat Charlie.txt
socks
candy cane
white chocolate
chocolate
chestnuts
hairbrush
</code></pre></div><p>There! Now we&rsquo;ve got a comprehensive set of super personalized gift lists. Thanks Bash! Too bad it can&rsquo;t do the shopping for us, too.</p>
A cron job that could save you from a ransomware attackhttps://victoria.dev/blog/a-cron-job-that-could-save-you-from-a-ransomware-attack/
Wed, 13 Nov 2019 08:27:31 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/a-cron-job-that-could-save-you-from-a-ransomware-attack/How a simple scheduled job can help you quickly recover from ransomware.
]]>
<p>It&rsquo;s 2019, and ransomware has become a thing.</p>
<p>Systems that interact with the public, like companies, educational institutions, and public services, are most susceptible. While delivery methods for ransomware vary from the physical realm to communication via social sites and email, all methods only require one person to make one mistake in order for ransomware to proliferate.</p>
<p>Ransomware, as you may have heard, is a malicious program that encrypts your files, rendering them unreadable and useless to you. It can include instructions for paying a ransom, usually by sending cryptocurrency, in order to obtain the decryption key. Successful ransomware attacks typically exploit vital, time-sensitive systems. Victims like public services and medical facilities are more likely to have poor or zero recovery processes, leaving governments or insurance providers to reward attackers with ransom payments.</p>
<p>Individuals, especially less-than-tech-savvy ones, are no less at risk. Ransomware can lock away personal documents and family photos that may only exist on one machine.</p>
<p>Thankfully, a fairly low-tech solution exists for rendering ransomware ineffective: back up your data!</p>
<p>You could achieve this with a straightforward system like plugging in an external hard drive and dragging files over once a day, but this method has a few hurdles. Manually transferring files may be slow or incomplete, and besides, you&rsquo;ll first have to remember to do it.</p>
<p>In my constant pursuit of automating all the things, there&rsquo;s one tool I often return to for its simplicity and reliability: <code>cron</code>. Cron does one thing, and does it well: it runs commands on a schedule.</p>
<p>I first used it a few months shy of three years ago (Have I really been blogging that long?!) to create <a href="https://victoria.dev/blog/how-i-created-custom-desktop-notifications-using-terminal-and-cron/">custom desktop notifications on Linux</a>. Using the crontab configuration file, which you can edit by running <code>crontab -e</code>, you can specify a schedule for running any commands you like. Here&rsquo;s what the scheduling syntax looks like, from the <a href="https://en.wikipedia.org/wiki/Cron">Wikipedia cron page</a>:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="c1"># ┌───────────── minute (0 - 59)</span>
<span class="c1"># │ ┌───────────── hour (0 - 23)</span>
<span class="c1"># │ │ ┌───────────── day of the month (1 - 31)</span>
<span class="c1"># │ │ │ ┌───────────── month (1 - 12)</span>
<span class="c1"># │ │ │ │ ┌───────────── day of the week (0 - 6)</span>
<span class="c1"># │ │ │ │ │</span>
<span class="c1"># │ │ │ │ │</span>
<span class="c1"># │ │ │ │ │</span>
<span class="c1"># * * * * * command to execute</span>
</code></pre></div><p>For example, a cron job that runs every day at 00:00 would look like:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="m">0</span> <span class="m">0</span> * * *
</code></pre></div><p>To run a job every twelve hours, the syntax is:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="m">0</span> */12 * * *
</code></pre></div><p>This <a href="https://crontab.guru/">great tool</a> can help you wrap your head around the cron scheduling syntax.</p>
<p>What does a scheduler have to do with backing up? By itself, not much. The simple beauty of cron is that it runs commands - any shell commands, and any scripts that you&rsquo;d normally run on the command line. As you may have gleaned from my other posts, I&rsquo;m of the strong opinion that you can do just about anything on the command line, including backing up your files. Storage options are plentiful, ranging from nearly free local and cloud storage to paid managed services too numerous to list. For CLI tooling, we have utilitarian classics like <code>rsync</code>, and CLI tools for specific cloud providers like AWS.</p>
<h2 id="backing-up-with-rsync">Backing up with <code>rsync</code></h2>
<p><a href="https://en.wikipedia.org/wiki/Rsync">The <code>rsync</code> utility</a> is a classic choice, and can back up your files to an external hard drive or remote server while making intelligent determinations about which files to update. It uses file size and modification times to recognize file changes, and then only transfers changed files, saving time and bandwidth.</p>
<p>The <a href="https://download.samba.org/pub/rsync/rsync.html"><code>rsync</code> syntax</a> can be a little nuanced; for example, a trailing forward slash will copy just the contents of the directory, instead of the directory itself. I found examples to be helpful in understanding the usage and syntax.</p>
<p>Here&rsquo;s one for backing up a local directory to a local destination, such as an external hard drive:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">rsync -a /home/user/directory /media/user/destination
</code></pre></div><p>The first argument is the source, and the second is the destination. Reversing these in the above example would copy files from the mounted drive to the local home directory.</p>
<p>The <code>a</code> flag for archive mode is one of <code>rsync</code>&rsquo;s superpowers. Equivalent to flags <code>-rlptgoD</code>, it:</p>
<ul>
<li>Syncs files recursively through directories (<code>r</code>);</li>
<li>Preserves symlinks (<code>l</code>), permissions (<code>p</code>), modification times (<code>t</code>), groups (<code>g</code>), and owner (<code>o</code>); and</li>
<li>Copies device and special files (<code>D</code>).</li>
</ul>
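<p>To see archive mode and the trailing-slash rule in action without touching real files, here&rsquo;s a sketch that works entirely in throwaway temp directories. The directory names <code>dest_a</code> and <code>dest_b</code> are made up for the demonstration.</p>

```sh
#!/usr/bin/env bash
# Sketch only: everything happens in a temporary directory, not a real backup location.

if command -v rsync >/dev/null 2>&1; then
  work=$(mktemp -d)
  mkdir -p "$work/directory" "$work/dest_a" "$work/dest_b"
  echo "hello" > "$work/directory/file.txt"

  # No trailing slash on the source: the directory itself lands in the destination.
  rsync -a "$work/directory" "$work/dest_a"

  # Trailing slash on the source: only the directory's contents are copied.
  rsync -a "$work/directory/" "$work/dest_b"

  # Lists dest_a/directory/file.txt and dest_b/file.txt
  find "$work/dest_a" "$work/dest_b" -type f | sort
else
  echo "rsync is not installed; nothing to demonstrate"
fi
```

<p>Adding <code>--dry-run</code> (or <code>-n</code>) to either command shows what would be transferred without copying anything, which is a comfortable way to check a new backup command.</p>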
<p>Here&rsquo;s another example, this time for backing up the contents of a local directory to a directory on a remote server using SSH:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">rsync -avze ssh /home/user/directory/ user@remote.host.net:/home/user/directory
</code></pre></div><p>The <code>v</code> flag turns on verbose output, which is helpful if you like realtime feedback on which files are being transferred. During large transfers, however, it can tend to slow things down. The <code>z</code> flag can help with that, as it indicates that files should be compressed during transfer.</p>
<p>The <code>e</code> flag, followed by <code>ssh</code>, tells <code>rsync</code> to use SSH as the remote shell when connecting to the host named in the final argument.</p>
<h2 id="backing-up-with-aws-cli">Backing up with AWS CLI</h2>
<p>Amazon Web Services offers a command line interface tool for doing just about everything with your AWS setup, including a straightforward <a href="https://docs.aws.amazon.com/ja_jp/cli/latest/reference/s3/sync.html"><code>s3 sync</code> command</a> for recursively copying new and updated files to your S3 storage buckets. As a storage method for backup data, S3 is a stable and inexpensive choice. You can even <a href="https://docs.aws.amazon.com/AmazonS3/latest/dev/Versioning.html">turn on versioning in your bucket</a>.</p>
<p>The <a href="https://docs.aws.amazon.com/ja_jp/cli/latest/reference/s3/index.html#directory-and-s3-prefix-operations">syntax for interacting with directories</a> is fairly straightforward, and you can directly indicate your S3 bucket as an <code>S3Uri</code> argument in the form of <code>s3://mybucket/mykey</code>. To back up a local directory to your S3 bucket, the command is:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">aws s3 sync /home/user/directory s3://mybucket
</code></pre></div><p>Similar to <code>rsync</code>, reversing the source and destination would download files from the S3 bucket.</p>
<p>The <code>sync</code> command has sensible defaults. It will guess the MIME type of uploaded files, and it includes files discovered by following symlinks. A variety of options exist to control these and other defaults, including flags to specify the server-side encryption to be used.</p>
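<p>As a sketch of how those options fit together, the snippet below builds an <code>aws s3 sync</code> command in an array and prints it rather than running it, so you can try it without AWS credentials. The path and bucket are the placeholders from the example above, and which flags you actually want is an assumption on my part.</p>

```sh
#!/usr/bin/env bash
# Sketch: assemble the command so it can be inspected before it is run.
# The source path and bucket name are placeholders, not real resources.

src="/home/user/directory"
dest="s3://mybucket"

sync_cmd=(
  aws s3 sync "$src" "$dest"
  --dryrun                 # show what would be transferred without uploading
  --no-follow-symlinks     # skip files that would be found by following symlinks
  --no-guess-mime-type     # do not guess the MIME type of uploaded files
  --sse AES256             # request server-side encryption with S3-managed keys
)

# Print the assembled command. To execute it for real, run: "${sync_cmd[@]}"
printf '%s ' "${sync_cmd[@]}"
printf '\n'
```

<p>Once the dry run output looks right, drop <code>--dryrun</code> from the array.</p>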
<h2 id="setting-up-your-cronjob-back-up">Setting up your cron job backup</h2>
<p>You can edit your machine&rsquo;s cron file by running:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">crontab -e
</code></pre></div><p>Obvious as it may be, it&rsquo;s worth mentioning that your backup commands will only run when your computer is turned on and the cron daemon is running. With this in mind, choose a schedule for your cron job that aligns with times when your machine is powered on, and maybe not overloaded with other work.</p>
<p>To back up to an S3 bucket every day at 8AM, for example, you&rsquo;d put a line in your crontab that looks like:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="m">0</span> <span class="m">8</span> * * * aws s3 sync /home/user/directory s3://mybucket
</code></pre></div><p>If you&rsquo;re curious whether your cron job is currently running, find the PID of cron with:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">pstree -ap <span class="p">|</span> grep cron
</code></pre></div><p>Then run <code>pstree -ap &lt;PID&gt;</code>.</p>
<p>This rabbit hole goes deeper; a quick search can reveal different ways of organizing and scheduling cron jobs, or help you find different utilities to run cron jobs when your computer is asleep. To protect against the possibility of ransomware-affected files being transferred to your backup, incrementally separated archives are a good idea. In essence, however, this basic setup is all you really need to create a reliable, automatic backup system.</p>
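<p>One simple way to get separated archives is to write a dated tarball on each run, so that yesterday&rsquo;s copy is never overwritten by today&rsquo;s. The script below is only a sketch: <code>SRC</code>, <code>DEST</code>, the <code>backup-</code> file name prefix, and the 30-day pruning window are all assumptions for illustration, and the defaults point at temp directories so it&rsquo;s safe to try.</p>

```sh
#!/usr/bin/env bash
# Sketch of a dated-archive backup. SRC and DEST are hypothetical settings;
# if unset, they fall back to throwaway temp directories.
set -eu

SRC=${SRC:-$(mktemp -d)}
DEST=${DEST:-$(mktemp -d)}

# Give the demo something to archive when SRC is empty.
[ -n "$(ls -A "$SRC")" ] || echo "sample" > "$SRC/sample.txt"

stamp=$(date +%F)                      # ISO date, YYYY-MM-DD
archive="$DEST/backup-$stamp.tar.gz"

# -C changes into SRC first, so the archive stores relative paths.
tar -czf "$archive" -C "$SRC" .

# Optional pruning: remove archives older than 30 days (an arbitrary window).
find "$DEST" -name 'backup-*.tar.gz' -mtime +30 -delete

echo "wrote $archive"
```

<p>Saved as an executable script at a path of your choosing, this can be called from a crontab line in place of a bare command. Keeping the <code>date</code> call inside a script also sidesteps a crontab quirk: in many cron implementations, an unescaped <code>%</code> in the command field is treated as a newline.</p>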
<h2 id="dont-feed-the-trolls">Don&rsquo;t feed the trolls</h2>
<p>Humans are fallible; that&rsquo;s why cyberattacks work. The success of a ransomware attack depends on the victim having no choice but to pay up in order to return to business as usual. A highly accessible recent backup undermines attackers who depend on us being unprepared. By blowing away a system and restoring from yesterday&rsquo;s backup, we may lose a day of progress; ransomers, however, gain nothing at all.</p>
<p>For further resources on ransomware defense for users and organizations, check out <a href="https://www.us-cert.gov/Ransomware">CISA&rsquo;s advice on ransomware</a>.</p>
Publishing GitHub event data with GitHub Actions and Pageshttps://victoria.dev/blog/publishing-github-event-data-with-github-actions-and-pages/
Mon, 04 Nov 2019 09:13:23 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/publishing-github-event-data-with-github-actions-and-pages/A guide to overcoming the GitHub event data horizon with a little command-line magic.
]]>
<p>Teams who work on GitHub rely on event data to collaborate. The data recorded as issues, pull requests, and comments becomes vital to understanding the project.</p>
<p>With the general availability of GitHub Actions, we have a chance to programmatically access and preserve GitHub event data in our repository. Making the data part of the repository itself is a way of preserving it outside of GitHub, and also gives us the ability to feature the data on a front-facing website, such as with GitHub Pages, through an automated process that&rsquo;s part of our CI/CD pipeline.</p>
<p>And, if you&rsquo;re like me, you can turn <a href="https://github.com/victoriadrake/github-guestbook/issues/1">GitHub issue comments</a> into an <a href="https://victoria.dev/github-guestbook/">awesome 90s guestbook page</a>.</p>
<p>No matter the usage, the principal concepts are the same. We can use Actions to access, preserve, and display GitHub event data - with just one workflow file. To illustrate the process, I&rsquo;ll take you through the <a href="https://github.com/victoriadrake/github-guestbook/blob/master/.github/workflows/publish-comments.yml">workflow code</a> that makes my guestbook shine on.</p>
<p>For an introductory look at GitHub Actions including how workflows are triggered, see <a href="https://victoria.dev/blog/a-lightweight-tool-agnostic-ci/cd-flow-with-github-actions/">A lightweight, tool-agnostic CI/CD flow with GitHub Actions</a>.</p>
<h2 id="accessing-github-event-data">Accessing GitHub event data</h2>
<p>An Action workflow runs in an environment with some default environment variables. A lot of convenient information is available here, including event data. The most complete way to access the event data is using the <code>$GITHUB_EVENT_PATH</code> variable, the path of the file with the complete JSON event payload.</p>
<p>The expanded path looks like <code>/home/runner/work/_temp/_github_workflow/event.json</code> and its data corresponds to its webhook event. You can find the documentation for webhook event data in GitHub REST API <a href="https://developer.github.com/webhooks/#events">Event Types and Payloads</a>. To make the JSON data available in the workflow environment, you can use a tool like <code>jq</code> to parse the event data and put it in an environment variable.</p>
<p>Below, I grab the comment ID from an <a href="https://developer.github.com/v3/activity/events/types/#issuecommentevent">issue comment event</a>:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="nv">ID</span><span class="o">=</span><span class="s2">&#34;</span><span class="k">$(</span>jq <span class="s1">&#39;.comment.id&#39;</span> <span class="nv">$GITHUB_EVENT_PATH</span><span class="k">)</span><span class="s2">&#34;</span>
</code></pre></div><p>Most event data is also available via the <a href="https://help.github.com/en/github/automating-your-workflow-with-github-actions/contexts-and-expression-syntax-for-github-actions#github-context"><code>github.event</code> context variable</a> without needing to parse JSON. The fields are accessed using dot notation, as in the example below where I grab the same comment ID:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="nv">ID</span><span class="o">=</span><span class="si">${</span><span class="p">{ github.event.comment.id </span><span class="si">}</span><span class="o">}</span>
</code></pre></div><p>For my guestbook, I want to display entries with the user&rsquo;s handle, and the date and time. I can capture this event data like so:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="nv">AUTHOR</span><span class="o">=</span><span class="si">${</span><span class="p">{ github.event.comment.user.login </span><span class="si">}</span><span class="o">}</span>
<span class="nv">DATE</span><span class="o">=</span><span class="si">${</span><span class="p">{ github.event.comment.created_at </span><span class="si">}</span><span class="o">}</span>
</code></pre></div><p>Shell variables are handy for accessing data; however, they&rsquo;re ephemeral. The workflow environment is created anew each run, and even shell variables set in one step do not persist to other steps. To persist the captured data, you have two options: use artifacts, or commit it to the repository.</p>
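<p>If you&rsquo;d like to experiment with the <code>jq</code> approach outside of a workflow, here&rsquo;s a sketch that parses a stand-in for the file at <code>$GITHUB_EVENT_PATH</code>. The payload is made up and heavily trimmed (a real issue comment event has many more fields), and <code>event_file</code> is a hypothetical name.</p>

```sh
#!/usr/bin/env bash
# Sketch: a trimmed, invented payload standing in for the real event file.

event_file=$(mktemp)
cat > "$event_file" <<'EOF'
{
  "comment": {
    "id": 12345,
    "body": "This is a comment!",
    "created_at": "2019-11-04T00:28:36Z",
    "user": { "login": "victoriadrake" }
  }
}
EOF

if command -v jq >/dev/null 2>&1; then
  # -r prints raw strings, without the surrounding JSON quotes.
  ID=$(jq -r '.comment.id' "$event_file")
  AUTHOR=$(jq -r '.comment.user.login' "$event_file")
  DATE=$(jq -r '.comment.created_at' "$event_file")
  echo "$ID $AUTHOR $DATE"
else
  echo "jq is not installed"
fi
```

<p>Quoting the file path, as above, keeps the command working even if the path ever contains spaces.</p>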
<h2 id="preserving-event-data-using-artifacts">Preserving event data: using artifacts</h2>
<p>Using artifacts, you can persist data between workflow jobs without committing it to your repository. This is handy when, for example, you wish to transform or incorporate the data before putting it somewhere more permanent. It&rsquo;s necessary to persist data between workflow jobs because:</p>
<blockquote>
<p>Each job in a workflow runs in a fresh instance of the virtual environment. When the job completes, the runner terminates and deletes the instance of the virtual environment. <em>(<a href="https://help.github.com/en/github/automating-your-workflow-with-github-actions/persisting-workflow-data-using-artifacts">Persisting workflow data using artifacts</a>)</em></p>
</blockquote>
<p>Two actions assist with using artifacts: <code>upload-artifact</code> and <code>download-artifact</code>. You can use these actions to make files available to other jobs in the same workflow. For a full example, see <a href="https://help.github.com/en/github/automating-your-workflow-with-github-actions/persisting-workflow-data-using-artifacts#passing-data-between-jobs-in-a-workflow">passing data between jobs in a workflow</a>.</p>
<p>The <code>upload-artifact</code> action&rsquo;s <code>action.yml</code> contains an <a href="https://github.com/actions/upload-artifact/blob/master/action.yml">explanation</a> of the keywords. The uploaded files are saved in <code>.zip</code> format. Another job in the same workflow run can use the <code>download-artifact</code> action to utilize the data in another step.</p>
<p>You can also manually download the archive on the workflow run page, under the repository&rsquo;s Actions tab.</p>
<p>Persisting workflow data between jobs does not make any changes to the repository files, as the artifacts generated live only in the workflow environment. Personally, being comfortable working in a shell environment, I see a narrow use case for artifacts, though I&rsquo;d have been remiss not to mention them. Besides passing data between jobs, they could be useful for creating <code>.zip</code> format archives of, say, test output data. In the case of my guestbook example, I simply ran all the necessary steps in one job, negating any need for passing data between jobs.</p>
<h2 id="preserving-event-data-pushing-workflow-files-to-the-repository">Preserving event data: pushing workflow files to the repository</h2>
<p>To preserve data captured in the workflow in the repository itself, it is necessary to add and push this data to the Git repository. You can do this in the workflow by creating new files with the data, or by appending data to existing files, using shell commands.</p>
<h3 id="creating-files-in-the-workflow">Creating files in the workflow</h3>
<p>To work with the repository files in the workflow, use the <a href="https://github.com/actions/checkout"><code>checkout</code> action</a> to first get a copy to work with:</p>
<div class="highlight"><pre class="chroma"><code class="language-yml" data-lang="yml">- <span class="k">uses</span><span class="p">:</span><span class="w"> </span>actions/checkout@master<span class="w">
</span><span class="w"> </span><span class="k">with</span><span class="p">:</span><span class="w">
</span><span class="w"> </span><span class="k">fetch-depth</span><span class="p">:</span><span class="w"> </span><span class="m">1</span><span class="w">
</span></code></pre></div><p>To add comments to my guestbook, I turn the event data captured in shell variables into proper files, using substitutions in <a href="https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html">shell parameter expansion</a> to sanitize user input and translate newlines to paragraphs. I wrote previously about <a href="https://victoria.dev/blog/sql-injection-and-xss-what-white-hat-hackers-know-about-trusting-user-input/">why user input should be treated carefully</a>.</p>
<div class="highlight"><pre class="chroma"><code class="language-yml" data-lang="yml">- <span class="k">name</span><span class="p">:</span><span class="w"> </span>Turn<span class="w"> </span>comment<span class="w"> </span>into<span class="w"> </span>file<span class="w">
</span><span class="w"> </span><span class="k">run</span><span class="p">:</span><span class="w"> </span><span class="sd">|
</span><span class="sd"> ID=${{ github.event.comment.id }}</span><span class="w">
</span><span class="w"> </span>AUTHOR=${{<span class="w"> </span>github.event.comment.user.login<span class="w"> </span>}}<span class="w">
</span><span class="w"> </span>DATE=${{<span class="w"> </span>github.event.comment.created_at<span class="w"> </span>}}<span class="w">
</span><span class="w"> </span>COMMENT=$(echo<span class="w"> </span><span class="s2">&#34;${{ github.event.comment.body }}&#34;</span>)<span class="w">
</span><span class="w"> </span>NO_TAGS=${COMMENT//<span class="p">[</span>&lt;&gt;<span class="p">]</span>/\`}<span class="w">
</span><span class="w"> </span>FOLDER=comments<span class="w">
</span><span class="w">
</span><span class="w"> </span>printf<span class="w"> </span><span class="s1">&#39;%b\n&#39;</span><span class="w"> </span><span class="s2">&#34;&lt;div class=\&#34;comment\&#34;&gt;&lt;p&gt;${AUTHOR} says:&lt;/p&gt;&lt;p&gt;${NO_TAGS//$&#39;\n&#39;/\&lt;\/p\&gt;\&lt;p\&gt;}&lt;/p&gt;&lt;p&gt;${DATE}&lt;/p&gt;&lt;/div&gt;\r\n&#34;</span><span class="w"> </span>&gt;<span class="w"> </span>${FOLDER}/${ID}.html<span class="w">
</span></code></pre></div><p>By using <code>printf</code> and directing its output with <code>&gt;</code> to a new file, the event data is transformed into an HTML file, named with the comment ID number, that contains the captured event data. Formatted, it looks like:</p>
<div class="highlight"><pre class="chroma"><code class="language-html" data-lang="html"><span class="p">&lt;</span><span class="nt">div</span> <span class="na">class</span><span class="o">=</span><span class="s">&#34;comment&#34;</span><span class="p">&gt;</span>
<span class="p">&lt;</span><span class="nt">p</span><span class="p">&gt;</span>victoriadrake says:<span class="p">&lt;/</span><span class="nt">p</span><span class="p">&gt;</span>
<span class="p">&lt;</span><span class="nt">p</span><span class="p">&gt;</span>This is a comment!<span class="p">&lt;/</span><span class="nt">p</span><span class="p">&gt;</span>
<span class="p">&lt;</span><span class="nt">p</span><span class="p">&gt;</span>2019-11-04T00:28:36Z<span class="p">&lt;/</span><span class="nt">p</span><span class="p">&gt;</span>
<span class="p">&lt;/</span><span class="nt">div</span><span class="p">&gt;</span>
</code></pre></div><p>When working with comments, one effect of naming files using the comment ID is that a new file with the same ID will overwrite the previous. This is handy for a guestbook, as it allows any edits to a comment to replace the original comment file.</p>
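<p>The two substitutions in that step can be tried on their own in a terminal. This sketch uses a made-up comment and hypothetical variable names (<code>sep</code>, <code>AS_PARAGRAPHS</code>) to show what each expansion does. Swapping angle brackets for backticks is the approach taken in the workflow above; it isn&rsquo;t a complete HTML sanitizer.</p>

```sh
#!/usr/bin/env bash
# Sketch: an invented two-line comment containing HTML tags.

COMMENT=$'Hello <b>world</b>\nSecond line'

# Replace every < or > with a backtick so tags cannot render as HTML.
NO_TAGS=${COMMENT//[<>]/\`}

# Replace each newline with a paragraph break. The replacement is kept
# in a variable to avoid quoting surprises inside the expansion.
sep='</p><p>'
AS_PARAGRAPHS=${NO_TAGS//$'\n'/$sep}

printf '<p>%s</p>\n' "$AS_PARAGRAPHS"
# prints: <p>Hello `b`world`/b`</p><p>Second line</p>
```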
<p>If you&rsquo;re using a static site generator like Hugo, you could build a Markdown format file, stick it in your <code>content/</code> folder, and the regular site build will take care of the rest. In the case of my simplistic guestbook, I have an extra step to consolidate the individual comment files into a page. Each time it runs, it overwrites the existing <code>index.html</code> with the <code>header.html</code> portion (<code>&gt;</code>), then finds and appends (<code>&gt;&gt;</code>) all the comment files&rsquo; contents in descending order, and lastly appends the <code>footer.html</code> portion to end the page.</p>
<div class="highlight"><pre class="chroma"><code class="language-yml" data-lang="yml">- <span class="k">name</span><span class="p">:</span><span class="w"> </span>Assemble<span class="w"> </span>page<span class="w">
</span><span class="w"> </span><span class="k">run</span><span class="p">:</span><span class="w"> </span><span class="sd">|
</span><span class="sd"> cat header.html &gt; index.html</span><span class="w">
</span><span class="w"> </span>find<span class="w"> </span>comments/<span class="w"> </span>-name<span class="w"> </span><span class="s2">&#34;*.html&#34;</span><span class="w"> </span>|<span class="w"> </span>sort<span class="w"> </span>-r<span class="w"> </span>|<span class="w"> </span>xargs<span class="w"> </span>-I<span class="w"> </span>%<span class="w"> </span>cat<span class="w"> </span>%<span class="w"> </span>&gt;&gt;<span class="w"> </span>index.html<span class="w">
</span><span class="w"> </span>cat<span class="w"> </span>footer.html<span class="w"> </span>&gt;&gt;<span class="w"> </span>index.html<span class="w">
</span></code></pre></div><h3 id="committing-changes-to-the-repository">Committing changes to the repository</h3>
<p>Since the <code>checkout</code> action is not quite the same as cloning the repository, at the time of writing, there are some <a href="https://github.community/t5/GitHub-Actions/Checkout-Action-does-not-create-local-master-and-has-no-options/td-p/31575">issues</a> still to work around. A couple of extra steps are necessary to <code>pull</code>, <code>checkout</code>, and successfully <code>push</code> changes back to the <code>master</code> branch, but this is pretty trivially done in the shell.</p>
<p>Below is the step that adds, commits, and pushes changes made by the workflow back to the repository&rsquo;s <code>master</code> branch.</p>
<div class="highlight"><pre class="chroma"><code class="language-yml" data-lang="yml">- <span class="k">name</span><span class="p">:</span><span class="w"> </span>Push<span class="w"> </span>changes<span class="w"> </span>to<span class="w"> </span>repo<span class="w">
</span><span class="w"> </span><span class="k">run</span><span class="p">:</span><span class="w"> </span><span class="sd">|
</span><span class="sd"> REMOTE=https://${{ secrets.GITHUB_TOKEN }}@github.com/${{ github.repository }}</span><span class="w">
</span><span class="w"> </span>git<span class="w"> </span>config<span class="w"> </span>user.email<span class="w"> </span><span class="s2">&#34;${{ github.actor }}@users.noreply.github.com&#34;</span><span class="w">
</span><span class="w"> </span>git<span class="w"> </span>config<span class="w"> </span>user.name<span class="w"> </span><span class="s2">&#34;${{ github.actor }}&#34;</span><span class="w">
</span><span class="w">
</span><span class="w"> </span>git<span class="w"> </span>pull<span class="w"> </span>${REMOTE}<span class="w">
</span><span class="w"> </span>git<span class="w"> </span>checkout<span class="w"> </span>master<span class="w">
</span><span class="w"> </span>git<span class="w"> </span>add<span class="w"> </span>.<span class="w">
</span><span class="w"> </span>git<span class="w"> </span>status<span class="w">
</span><span class="w"> </span>git<span class="w"> </span>commit<span class="w"> </span>-am<span class="w"> </span><span class="s2">&#34;Add new comment&#34;</span><span class="w">
</span><span class="w"> </span>git<span class="w"> </span>push<span class="w"> </span>${REMOTE}<span class="w"> </span>master<span class="w">
</span></code></pre></div><p>The remote, which is in fact our own repository, is specified using the <code>github.repository</code> context variable. For our workflow to be allowed to push to <code>master</code>, we authenticate by including <a href="https://help.github.com/en/github/automating-your-workflow-with-github-actions/authenticating-with-the-github_token">the default <code>secrets.GITHUB_TOKEN</code> variable</a> in the remote URL.</p>
<p>Since the workflow environment is shiny and newborn, we need to configure Git. In the above example, I&rsquo;ve used the <code>github.actor</code> context variable to input the username of the account initiating the workflow. The email is similarly configured using the <a href="https://help.github.com/en/github/setting-up-and-managing-your-github-user-account/setting-your-commit-email-address#setting-your-commit-email-address-on-github">default <code>noreply</code> GitHub email address</a>.</p>
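<p>One thing to be aware of: <code>git commit</code> exits with a non-zero status when there is nothing to commit, which would fail the step if a run produces no changes. Here&rsquo;s a sketch of one way to guard against that. It isn&rsquo;t part of the guestbook workflow; the <code>commit_if_changed</code> function and the example identity are hypothetical, and it runs in a throwaway local repository.</p>

```sh
#!/usr/bin/env bash
# Sketch: commit only when something is actually staged.

commit_if_changed() {
  git add .
  # --quiet makes git diff exit 0 when the index matches the last commit.
  if git diff --cached --quiet; then
    echo "nothing to commit"
  else
    git commit -q -m "$1"
    echo "committed"
  fi
}

if command -v git >/dev/null 2>&1; then
  repo=$(mktemp -d)
  cd "$repo" || exit 1
  git init -q
  git config user.email "example@users.noreply.github.com"  # placeholder identity
  git config user.name "example"

  mkdir -p comments
  echo '<p>hi</p>' > comments/1.html

  first=$(commit_if_changed "Add new comment")    # a new file is staged
  second=$(commit_if_changed "Add new comment")   # nothing changed since
  echo "$first / $second"
  # prints: committed / nothing to commit
fi
```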
<h2 id="displaying-event-data">Displaying event data</h2>
<p>If you&rsquo;re using GitHub Pages with the default <code>secrets.GITHUB_TOKEN</code> variable and without a site generator, pushing changes to the repository in the workflow will only update the repository files. The GitHub Pages build will fail with an error, &ldquo;Your site is having problems building: Page build failed.&rdquo;</p>
<p>To enable Actions to trigger a Pages site build, you&rsquo;ll need to create a Personal Access Token. This token can be <a href="https://help.github.com/en/github/automating-your-workflow-with-github-actions/creating-and-using-encrypted-secrets">stored as a secret in the repository</a> settings and passed into the workflow in place of the default <code>secrets.GITHUB_TOKEN</code> variable. I wrote more about <a href="https://victoria.dev/blog/a-lightweight-tool-agnostic-ci/cd-flow-with-github-actions/#environment-and-variables">Actions environment and variables in this post</a>.</p>
<p>With the use of a Personal Access Token, a push initiated by the Actions workflow will also update the Pages site. You can see it for yourself by <a href="https://github.com/victoriadrake/github-guestbook/issues/1">leaving a comment</a> in my <a href="https://victoria.dev/github-guestbook/">guestbook</a>! The comment creation event triggers the workflow, which then takes around 30 seconds to run and update the guestbook page.</p>
<p>Where a site build is necessary for changes to be published, such as when using Hugo, an Action can do this too. However, in order to avoid creating unintended loops, <a href="https://help.github.com/en/github/automating-your-workflow-with-github-actions/events-that-trigger-workflows#about-workflow-events">one Action workflow will not trigger another</a>. Instead, it&rsquo;s extremely convenient to handle the process of <a href="https://victoria.dev/blog/a-portable-makefile-for-continuous-delivery-with-hugo-and-github-pages/">building the site with a Makefile</a>, which any workflow can then run. Simply add running the Makefile as the final step in your workflow job, with the repository token where necessary:</p>
<div class="highlight"><pre class="chroma"><code class="language-yml" data-lang="yml">- <span class="k">name</span><span class="p">:</span><span class="w"> </span>Run<span class="w"> </span>Makefile<span class="w">
</span><span class="w"> </span><span class="k">env</span><span class="p">:</span><span class="w">
</span><span class="w"> </span><span class="k">TOKEN</span><span class="p">:</span><span class="w"> </span>${{<span class="w"> </span>secrets.GITHUB_TOKEN<span class="w"> </span>}}<span class="w">
</span><span class="w"> </span><span class="k">run</span><span class="p">:</span><span class="w"> </span>make<span class="w"> </span>all<span class="w">
</span></code></pre></div><p>This ensures that the final step of your workflow builds and deploys the updated site.</p>
<h2 id="no-more-event-data-horizon">No more event data horizon</h2>
<p>GitHub Actions provides a neat way to capture and utilize event data so that it&rsquo;s not only available within GitHub. The possibilities are only as limited as your imagination! Here are a few ideas for things this lets us create:</p>
<ol>
<li>A public-facing issues board, where customers without GitHub accounts can view and give feedback on project issues.</li>
<li>An automatically-updating RSS feed of new issues, comments, or PRs for any repository.</li>
<li>A comments system for static sites, utilizing GitHub issue comments as an input method.</li>
<li>An awesome 90s guestbook page.</li>
</ol>
<p>Did I mention I made a <a href="https://victoria.dev/github-guestbook/">90s guestbook page</a>? My inner-Geocities-nerd is a little excited.</p>
A lightweight, tool-agnostic CI/CD flow with GitHub Actionshttps://victoria.dev/blog/a-lightweight-tool-agnostic-ci/cd-flow-with-github-actions/
Mon, 28 Oct 2019 08:28:52 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/a-lightweight-tool-agnostic-ci/cd-flow-with-github-actions/How to take advantage of a simple GitHub Actions workflow without sacrificing agnostic tooling.
]]>
<p>Agnostic tooling is the clever notion that you should be able to run your code in various environments. With many continuous integration and continuous delivery (CI/CD) apps available, agnostic tooling gives developers a big advantage: portability.</p>
<p>Of course, having your CI/CD work <em>everywhere</em> is a tall order. Popular <a href="https://github.com/marketplace/category/continuous-integration">CI apps for GitHub repositories</a> alone use a multitude of configuration languages spanning <a href="https://groovy-lang.org/syntax.html">Groovy</a>, <a href="https://yaml.org/">YAML</a>, <a href="https://github.com/toml-lang/toml">TOML</a>, <a href="https://json.org/">JSON</a>, and more&hellip; all with differing syntax, of course. Porting workflows from one tool to another is more than a one-cup-of-coffee process.</p>
<p>The introduction of <a href="https://github.com/features/actions">GitHub Actions</a> has the potential to add yet another tool to the mix; or, for the right setup, greatly simplify a CI/CD workflow.</p>
<p>Prior to this article, I accomplished my CD flow with several lashed-together apps. I used AWS Lambda to trigger site builds on a schedule. I had Netlify build on push triggers, as well as run image optimization, and then push my site to the public Pages repository. I used Travis CI in the public repository to test the HTML. All this worked in conjunction with GitHub Pages, which actually hosts the site.</p>
<p>I&rsquo;m now using the GitHub Actions beta to accomplish all the same tasks, with one <a href="https://victoria.dev/blog/a-portable-makefile-for-continuous-delivery-with-hugo-and-github-pages/">portable Makefile</a> of build instructions, and without any other CI/CD apps.</p>
<h2 id="appreciating-the-shell">Appreciating the shell</h2>
<p>What do most CI/CD tools have in common? They run your workflow instructions in a shell environment! This is wonderful, because that means that most CI/CD tools can do anything that you can do in a terminal&hellip; and you can do pretty much <em>anything</em> in a terminal.</p>
<p>Especially for a contained use case like building my static site with a generator like Hugo, running it all in a shell is a no-brainer. To tell the magic box what to do, we just need to write instructions.</p>
<p>While a shell script is certainly the most portable option, I use the still-very-portable <a href="https://en.wikipedia.org/wiki/Make_(software)">Make</a> to write my process instructions. This provides me with some advantages over simple shell scripting, like the use of variables and <a href="https://en.wikipedia.org/wiki/Make_(software)#Macros">macros</a>, and the modularity of <a href="https://en.wikipedia.org/wiki/Makefile#Rules">rules</a>.</p>
<p>I got into the <a href="https://victoria.dev/blog/a-portable-makefile-for-continuous-delivery-with-hugo-and-github-pages/">nitty-gritty of my Makefile in my last post</a>. Let&rsquo;s look at how to get GitHub Actions to run it.</p>
<h2 id="using-a-makefile-with-github-actions">Using a Makefile with GitHub Actions</h2>
<p>To our point on portability, my magic Makefile is stored right in the repository root. Since it&rsquo;s included with the code, I can run the Makefile locally on any system where I can clone the repository, provided I set the environment variables. Using GitHub Actions as my CI/CD tool is as straightforward as making Make go worky-worky.</p>
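<p>As a rough sketch of the &ldquo;provided I set the environment variables&rdquo; part, a small pre-flight check can fail early when something is missing. The <code>require_env</code> helper is a hypothetical name of my own; <code>TOKEN</code> and <code>GITHUB_WORKSPACE</code> are the two variables the Makefile reads.</p>

```sh
# Hypothetical pre-flight check: fail early if a required variable is empty or unset.
require_env() {
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX sh has no ${!name}).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing environment variable: $name" >&2
            return 1
        fi
    done
}

# Example: only run the build when both variables are present.
# require_env TOKEN GITHUB_WORKSPACE && make all
```

<p>Locally, this could run right before <code>make all</code> so that a missing token shows up as a clear message rather than a failed push.</p>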
<p>I found the <a href="https://help.github.com/en/articles/workflow-syntax-for-github-actions">GitHub Actions workflow syntax guide</a> to be pretty straightforward, though also lengthy on options. Here&rsquo;s the necessary setup for getting the Makefile to run.</p>
<p>The workflow file at <code>.github/workflows/workflow.yml</code> contains the following:</p>
<div class="highlight"><pre class="chroma"><code class="language-yml" data-lang="yml"><span class="k">name</span><span class="p">:</span><span class="w"> </span>make-master<span class="w">
</span><span class="w">
</span><span class="w"></span><span class="k">on</span><span class="p">:</span><span class="w">
</span><span class="w"> </span><span class="k">push</span><span class="p">:</span><span class="w">
</span><span class="w"> </span><span class="k">branches</span><span class="p">:</span><span class="w">
</span><span class="w"> </span>- master<span class="w">
</span><span class="w"> </span><span class="k">schedule</span><span class="p">:</span><span class="w">
</span><span class="w"> </span>- <span class="k">cron</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;20 13 * * *&#39;</span><span class="w">
</span><span class="w">
</span><span class="w"></span><span class="k">jobs</span><span class="p">:</span><span class="w">
</span><span class="w"> </span><span class="k">build</span><span class="p">:</span><span class="w">
</span><span class="w"> </span><span class="k">runs-on</span><span class="p">:</span><span class="w"> </span>ubuntu-latest<span class="w">
</span><span class="w"> </span><span class="k">steps</span><span class="p">:</span><span class="w">
</span><span class="w"> </span>- <span class="k">uses</span><span class="p">:</span><span class="w"> </span>actions/checkout@master<span class="w">
</span><span class="w"> </span><span class="k">with</span><span class="p">:</span><span class="w">
</span><span class="w"> </span><span class="k">fetch-depth</span><span class="p">:</span><span class="w"> </span><span class="m">1</span><span class="w">
</span><span class="w"> </span>- <span class="k">name</span><span class="p">:</span><span class="w"> </span>Run<span class="w"> </span>Makefile<span class="w">
</span><span class="w"> </span><span class="k">env</span><span class="p">:</span><span class="w">
</span><span class="w"> </span><span class="k">TOKEN</span><span class="p">:</span><span class="w"> </span>${{<span class="w"> </span>secrets.TOKEN<span class="w"> </span>}}<span class="w">
</span><span class="w"> </span><span class="k">run</span><span class="p">:</span><span class="w"> </span>make<span class="w"> </span>all<span class="w">
</span></code></pre></div><p>I&rsquo;ll explain the components that make this work.</p>
<h2 id="triggering-the-workflow">Triggering the workflow</h2>
<p>Actions support multiple <a href="https://help.github.com/en/articles/events-that-trigger-workflows">triggers for a workflow</a>. Using the <code>on</code> syntax, I&rsquo;ve defined two triggers for mine: a <a href="https://help.github.com/en/articles/workflow-syntax-for-github-actions#onpushpull_requestbranchestags">push event</a> to the <code>master</code> branch only, and a <a href="https://help.github.com/en/articles/events-that-trigger-workflows#scheduled-events-schedule">scheduled</a> <code>cron</code> job.</p>
<p>Once the <code>workflow.yml</code> file is in your repository, either of your triggers will cause Actions to run your Makefile. To see how the last run went, you can also <a href="https://help.github.com/en/articles/configuring-a-workflow#adding-a-workflow-status-badge-to-your-repository">add a fun badge</a> to the README.</p>
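<p>If the <code>cron</code> string looks cryptic, here is a small sketch that labels its five fields. The <code>describe_cron</code> function is purely illustrative and not part of any GitHub tooling. Scheduled workflows are evaluated in UTC, so <code>'20 13 * * *'</code> means every day at 13:20 UTC.</p>

```sh
# Illustrative helper: label the five fields of a cron expression.
describe_cron() {
    # Disable globbing so the "*" fields are not expanded to file names.
    set -f
    # Split the expression on whitespace into positional parameters.
    set -- $1
    set +f
    printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

describe_cron '20 13 * * *'
# prints: minute=20 hour=13 day-of-month=* month=* day-of-week=*
```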
<h3 id="one-hacky-thing">One hacky thing</h3>
<p>Because the Makefile runs on every push to <code>master</code>, I sometimes would get errors when the site build had no changes. When Git, via <a href="https://victoria.dev/blog/a-portable-makefile-for-continuous-delivery-with-hugo-and-github-pages/">my Makefile</a>, attempted to commit to the Pages repository, no changes were detected and the commit would fail annoyingly:</p>
<div class="highlight"><pre class="chroma"><code class="language-txt" data-lang="txt">nothing to commit, working tree clean
On branch master
Your branch is up to date with &#39;origin/master&#39;.
nothing to commit, working tree clean
Makefile:62: recipe for target &#39;deploy&#39; failed
make: *** [deploy] Error 1
##[error]Process completed with exit code 2.
</code></pre></div><p>I came across some solutions that proposed using <code>diff</code> to check if a commit should be made, but this may not work for <a href="https://github.com/benmatselby/hugo-deploy-gh-pages/issues/4">reasons</a>. As a workaround, I simply added the <a href="https://gohugo.io/functions/format/#use-local-and-utc">current UTC time</a> to my index page so that every build would contain a change to be committed.</p>
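<p>For reference, the <code>diff</code>-style check mentioned above usually looks something like the sketch below. This is not what I ended up using, and as noted it may not suit every setup; <code>commit_if_changed</code> is a hypothetical helper name.</p>

```sh
# Hypothetical helper: stage everything, then commit only if the index has changes.
commit_if_changed() {
    git add .
    # "git diff --cached --quiet" exits 0 when nothing is staged, 1 otherwise.
    if git diff --cached --quiet; then
        echo "No changes to commit; skipping"
    else
        git commit -q -m "$1"
    fi
}
```

<p>Because the function returns success in both branches, a Make recipe that calls it would not stop with <code>Error 1</code> on an unchanged build.</p>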
<h2 id="environment-and-variables">Environment and variables</h2>
<p>You can define the <a href="https://help.github.com/en/github/automating-your-workflow-with-github-actions/virtual-environments-for-github-hosted-runners#supported-runners-and-hardware-resources">virtual environment</a> for your workflow to run in using the <code>runs-on</code> syntax. The <del>obvious best choice</del> one I chose is Ubuntu. Using <code>ubuntu-latest</code> gets me the most updated version, whatever that happens to be when you&rsquo;re reading this.</p>
<p>GitHub sets some <a href="https://help.github.com/en/github/automating-your-workflow-with-github-actions/using-environment-variables#default-environment-variables">default environment variables</a> for workflows. The <a href="https://github.com/actions/checkout"><code>actions/checkout</code> action</a> with <code>fetch-depth: 1</code> creates a copy of just the most recent commit of your repository in the directory named by the <code>GITHUB_WORKSPACE</code> variable. This allows the workflow to access the Makefile at <code>$GITHUB_WORKSPACE/Makefile</code>. Without using the checkout action, the Makefile won&rsquo;t be found, and I get an error that looks like this:</p>
<div class="highlight"><pre class="chroma"><code class="language-txt" data-lang="txt">make: *** No rule to make target &#39;all&#39;. Stop.
Running Makefile
##[error]Process completed with exit code 2.
</code></pre></div><p>While there is a <a href="https://help.github.com/en/github/automating-your-workflow-with-github-actions/authenticating-with-the-github_token">default <code>GITHUB_TOKEN</code> secret</a>, this is not the one I used. The default is only locally scoped to the current repository. To be able to push to my separate GitHub Pages repository, I created a <a href="https://github.com/settings/tokens">personal access token</a> scoped to <code>public_repo</code> and pass it in as the <code>secrets.TOKEN</code> encrypted variable. For a step-by-step, see <a href="https://help.github.com/en/github/automating-your-workflow-with-github-actions/creating-and-using-encrypted-secrets">Creating and using encrypted secrets</a>.</p>
<h2 id="portable-tooling">Portable tooling</h2>
<p>The nice thing about using a simple Makefile to define the bulk of my CI/CD process is that it&rsquo;s completely portable. I can run a Makefile anywhere I have access to a shell environment, which covers most CI/CD apps, virtual instances, and, of course, my local machine.</p>
<p>One of the reasons I like GitHub Actions is that getting my Makefile to run was pretty straightforward. I think the syntax is well done - easy to read, and intuitive when it comes to finding an option you&rsquo;re looking for. For someone already using GitHub Pages, Actions provides a pretty seamless CD experience; and if that should ever change, I can run my Makefile elsewhere. ¯\_(ツ)_/¯</p>
A portable Makefile for continuous delivery with Hugo and GitHub Pageshttps://victoria.dev/blog/a-portable-makefile-for-continuous-delivery-with-hugo-and-github-pages/
Mon, 21 Oct 2019 09:09:06 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/a-portable-makefile-for-continuous-delivery-with-hugo-and-github-pages/My Makefile for building this site, optimizing images, and running my CI/CD GitHub Actions flow.
]]>
<p>Fun fact: I first launched this GitHub Pages site 1,018 days ago.</p>
<p>Since then, we&rsquo;ve grown together. From early cringe-worthy commit messages, through eighty-six versions of <a href="https://gohugo.io/">Hugo</a>, and up until last week, a less-than-streamlined multi-app continuous integration and deployment (CI/CD) workflow.</p>
<p>If you know me at all, you know I love to automate things. I&rsquo;ve been using a combination of AWS Lambda, Netlify, and Travis CI to automatically build and publish this site. My workflow for the task includes:</p>
<ul>
<li>Build with <a href="https://gohugo.io/">Hugo</a> on push to master, and on a schedule (Netlify and Lambda);</li>
<li>Optimize and resize images (Netlify);</li>
<li>Test with <a href="https://github.com/gjtorikian/html-proofer">HTMLProofer</a> (Travis CI); and</li>
<li>Deploy to my <a href="https://victoria.dev/blog/two-ways-to-deploy-a-public-github-pages-site-from-a-private-hugo-repository/">separate, public, GitHub Pages repository</a> (Netlify).</li>
</ul>
<p>Thanks to the introduction of GitHub Actions, I&rsquo;m able to do all the above with just one portable <a href="https://en.wikipedia.org/wiki/Makefile">Makefile</a>.</p>
<p>Next week I&rsquo;ll cover my Actions set up; today, I&rsquo;ll take you through the nitty-gritty of my Makefile so you can write your own.</p>
<h2 id="makefile-portability">Makefile portability</h2>
<p><a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/make.html">POSIX-standard-flavour Make</a> runs on every Unix-like system out there. <a href="https://en.wikipedia.org/wiki/Make_(software)#Derivatives">Make derivatives</a>, such as <a href="https://www.gnu.org/software/make/">GNU Make</a> and several flavours of BSD Make also run on Unix-like systems, though their particular use requires installing the respective program. To write a truly portable Makefile, mine follows the POSIX standard. (For a more thorough summation of POSIX-compatible Makefiles, I found this article helpful: <a href="https://nullprogram.com/blog/2017/08/20/">A Tutorial on Portable Makefiles</a>.) I run Ubuntu, so I&rsquo;ve tested the portability aspect using the BSD Make programs <code>bmake</code>, <code>pmake</code>, and <code>fmake</code>. Compatibility with non-Unix-like systems is a little more complicated, since shell commands differ. With derivatives such as Nmake, it&rsquo;s better to write a separate Makefile with appropriate Windows commands.</p>
<p>While much of my particular use case could be achieved with shell scripting, I find Make offers some worthwhile advantages. I enjoy the ease of using variables and <a href="https://en.wikipedia.org/wiki/Make_(software)#Macros">macros</a>, and the modularity of <a href="https://en.wikipedia.org/wiki/Makefile#Rules">rules</a> when it comes to organizing my steps.</p>
<p>The writing of rules mostly comes down to shell commands, which is the main reason Makefiles are as portable as they are. The best part is that you can do pretty much <em>anything</em> in a terminal, and certainly handle all the workflow steps listed above.</p>
<h2 id="my-continuous-deployment-makefile">My continuous deployment Makefile</h2>
<p>Here&rsquo;s the portable Makefile that handles my workflow. Yes, I put emojis in there. I&rsquo;m a monster.</p>
<div class="highlight"><pre class="chroma"><code class="language-Makefile" data-lang="Makefile"><span class="nf">.POSIX</span><span class="o">:</span>
<span class="nv">DESTDIR</span><span class="o">=</span>public
<span class="nv">HUGO_VERSION</span><span class="o">=</span>0.58.3
<span class="nv">OPTIMIZE</span> <span class="o">=</span> find <span class="k">$(</span>DESTDIR<span class="k">)</span> -not -path <span class="s2">&#34;*/static/*&#34;</span> <span class="se">\(</span> -name <span class="s1">&#39;*.png&#39;</span> -o -name <span class="s1">&#39;*.jpg&#39;</span> -o -name <span class="s1">&#39;*.jpeg&#39;</span> <span class="se">\)</span> -print0 <span class="p">|</span> <span class="se">\
</span><span class="se"></span><span class="err">xargs</span> <span class="err">-0</span> <span class="err">-P8</span> <span class="err">-n2</span> <span class="err">mogrify</span> <span class="err">-strip</span> <span class="err">-thumbnail</span> <span class="s1">&#39;1000&gt;&#39;</span>
<span class="nf">.PHONY</span><span class="o">:</span> <span class="n">all</span>
<span class="nf">all</span><span class="o">:</span> <span class="n">get_repository</span> <span class="n">clean</span> <span class="n">get</span> <span class="n">build</span> <span class="n">test</span> <span class="n">deploy</span>
<span class="nf">.PHONY</span><span class="o">:</span> <span class="n">get_repository</span>
<span class="nf">get_repository</span><span class="o">:</span>
@echo <span class="s2">&#34;🛎 Getting Pages repository&#34;</span>
git clone https://github.com/victoriadrake/victoriadrake.github.io.git <span class="k">$(</span>DESTDIR<span class="k">)</span>
<span class="nf">.PHONY</span><span class="o">:</span> <span class="n">clean</span>
<span class="nf">clean</span><span class="o">:</span>
@echo <span class="s2">&#34;🧹 Cleaning old build&#34;</span>
<span class="nb">cd</span> <span class="k">$(</span>DESTDIR<span class="k">)</span> <span class="o">&amp;&amp;</span> rm -rf *
<span class="nf">.PHONY</span><span class="o">:</span> <span class="n">get</span>
<span class="nf">get</span><span class="o">:</span>
@echo <span class="s2">&#34;❓ Checking for hugo&#34;</span>
@if ! <span class="o">[</span> -x <span class="s2">&#34;</span><span class="nv">$$</span><span class="s2">(command -v hugo)&#34;</span> <span class="o">]</span><span class="p">;</span> <span class="k">then</span><span class="se">\
</span><span class="se"></span> <span class="nb">echo</span> <span class="s2">&#34;🤵 Getting Hugo&#34;</span><span class="p">;</span><span class="se">\
</span><span class="se"></span> wget -q -P tmp/ https://github.com/gohugoio/hugo/releases/download/v<span class="k">$(</span>HUGO_VERSION<span class="k">)</span>/hugo_extended_<span class="k">$(</span>HUGO_VERSION<span class="k">)</span>_Linux-64bit.tar.gz<span class="p">;</span><span class="se">\
</span><span class="se"></span> tar xf tmp/hugo_extended_<span class="k">$(</span>HUGO_VERSION<span class="k">)</span>_Linux-64bit.tar.gz -C tmp/<span class="p">;</span><span class="se">\
</span><span class="se"></span> sudo mv -f tmp/hugo /usr/bin/<span class="p">;</span><span class="se">\
</span><span class="se"></span> rm -rf tmp/<span class="p">;</span><span class="se">\
</span><span class="se"></span> hugo version<span class="p">;</span><span class="se">\
</span><span class="se"></span> <span class="k">fi</span>
<span class="nf">.PHONY</span><span class="o">:</span> <span class="n">build</span>
<span class="nf">build</span><span class="o">:</span>
@echo <span class="s2">&#34;🍳 Generating site&#34;</span>
hugo --gc --minify -d <span class="k">$(</span>DESTDIR<span class="k">)</span>
@echo <span class="s2">&#34;🧂 Optimizing images&#34;</span>
<span class="k">$(</span>OPTIMIZE<span class="k">)</span>
<span class="nf">.PHONY</span><span class="o">:</span> <span class="n">test</span>
<span class="nf">test</span><span class="o">:</span>
@echo <span class="s2">&#34;🍜 Testing HTML&#34;</span>
docker run -v <span class="k">$(</span>GITHUB_WORKSPACE<span class="k">)</span>/<span class="k">$(</span>DESTDIR<span class="k">)</span>/:/mnt 18fgsa/html-proofer mnt --disable-external
<span class="nf">.PHONY</span><span class="o">:</span> <span class="n">deploy</span>
<span class="nf">deploy</span><span class="o">:</span>
@echo <span class="s2">&#34;🎁 Preparing commit&#34;</span>
@cd <span class="k">$(</span>DESTDIR<span class="k">)</span> <span class="se">\
</span><span class="se"></span> <span class="o">&amp;&amp;</span> git config user.email <span class="s2">&#34;hello@victoria.dev&#34;</span> <span class="se">\
</span><span class="se"></span> <span class="o">&amp;&amp;</span> git config user.name <span class="s2">&#34;Victoria via GitHub Actions&#34;</span> <span class="se">\
</span><span class="se"></span> <span class="o">&amp;&amp;</span> git add . <span class="se">\
</span><span class="se"></span> <span class="o">&amp;&amp;</span> git status <span class="se">\
</span><span class="se"></span> <span class="o">&amp;&amp;</span> git commit -m <span class="s2">&#34;🤖 CD bot is helping&#34;</span> <span class="se">\
</span><span class="se"></span> <span class="o">&amp;&amp;</span> git push -f -q https://<span class="k">$(</span>TOKEN<span class="k">)</span>@github.com/victoriadrake/victoriadrake.github.io.git master
@echo <span class="s2">&#34;🚀 Site is deployed!&#34;</span>
</code></pre></div><p>Sequentially, this workflow:</p>
<ol>
<li>Clones the public Pages repository;</li>
<li>Cleans (deletes) the previous build files;</li>
<li>Downloads and installs the specified version of Hugo, if Hugo is not already present;</li>
<li>Builds the site;</li>
<li>Optimizes images;</li>
<li>Tests the built site with HTMLProofer; and</li>
<li>Prepares a new commit and pushes to the public Pages repository.</li>
</ol>
<p>If you&rsquo;re familiar with the command line, most of this may look familiar. Here are a couple of bits that might warrant a little explanation.</p>
<h3 id="checking-if-a-program-is-already-installed">Checking if a program is already installed</h3>
<p>I think this bit is pretty tidy:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="k">if</span> ! <span class="o">[</span> -x <span class="s2">&#34;</span><span class="nv">$$</span><span class="s2">(command -v hugo)&#34;</span> <span class="o">]</span><span class="p">;</span> <span class="k">then</span><span class="se">\
</span><span class="se"></span>...
<span class="k">fi</span>
</code></pre></div><p>I use a negated <code>if</code> conditional in conjunction with <code>command -v</code> to check if an executable (<code>-x</code>) called <code>hugo</code> exists. If one is not present, the script gets the specified version of Hugo and installs it. <a href="https://stackoverflow.com/a/677212">This Stack Overflow answer</a> has a nice summation of why <code>command -v</code> is a more portable choice than <code>which</code>.</p>
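<p>Outside of a Makefile, the same check can be written with a single <code>$</code>, since the doubled <code>$$</code> is only there to stop Make from expanding it. Here&rsquo;s a minimal standalone sketch; <code>is_installed</code> is a hypothetical helper name, and the install step is left as a placeholder.</p>

```sh
# Hypothetical helper: succeeds when the named program resolves to an executable file.
is_installed() {
    [ -x "$(command -v "$1")" ]
}

if ! is_installed hugo; then
    echo "hugo not found; an install step would go here"
fi
```

<p>When the program is missing, <code>command -v</code> prints nothing, so the <code>-x</code> test runs against an empty string and fails.</p>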
<h3 id="image-optimization">Image optimization</h3>
<p>My Makefile uses <code>mogrify</code> to batch resize and compress images in particular folders. It finds them automatically using the file extension, and only modifies images that are larger than the target size of 1000px in any dimension. I wrote more about the <a href="https://victoria.dev/blog/how-to-quickly-batch-resize-compress-and-convert-images-with-a-bash-one-liner/">batch-processing one-liner in this post</a>.</p>
<p>There are a few different ways to achieve this same task, one of which, theoretically, is to take advantage of Make&rsquo;s <a href="https://en.wikipedia.org/wiki/Make_(software)#Suffix_rules">suffix rules</a> to run commands only on image files. I find the shell script to be more readable.</p>
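<p>If the one-line macro feels dense, the same idea can be sketched as two small shell functions. The names <code>list_images</code> and <code>optimize_images</code> are hypothetical, and I&rsquo;ve written <code>!</code> here, which is the POSIX spelling of <code>-not</code>. The second function assumes ImageMagick&rsquo;s <code>mogrify</code> is on the <code>PATH</code>.</p>

```sh
# List candidate images under a directory, NUL-delimited,
# skipping anything inside a directory called "static".
list_images() {
    find "$1" ! -path "*/static/*" \
        \( -name '*.png' -o -name '*.jpg' -o -name '*.jpeg' \) -print0
}

# Strip metadata and shrink images larger than 1000px, in place.
# Runs up to 8 mogrify processes at once, 2 files per invocation.
optimize_images() {
    list_images "$1" | xargs -0 -P8 -n2 mogrify -strip -thumbnail '1000>'
}

# Example (modifies files in place): optimize_images public
```

<p>Splitting the <code>find</code> from the <code>mogrify</code> call makes it easy to preview which files would be touched before anything is modified.</p>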
<h3 id="using-dockerized-htmlproofer">Using Dockerized HTMLProofer</h3>
<p>HTMLProofer is installed with <code>gem</code>, and uses Ruby and <a href="https://nokogiri.org/tutorials/ensuring_well_formed_markup.html">Nokogiri</a>, which adds up to a lot of installation time for a CI workflow. Thankfully, <a href="https://github.com/18F">18F</a> has a <a href="https://github.com/18F/html-proofer-docker">Dockerized version</a> that is much faster to implement. Its usage requires starting the container with the built site directory <a href="https://docs.docker.com/storage/volumes/#start-a-container-with-a-volume">mounted as a data volume</a>, which is easily achieved by appending to the <code>docker run</code> command.</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">docker run -v /absolute/path/to/site/:/mounted-site 18fgsa/html-proofer /mounted-site
</code></pre></div><p>In my Makefile, I specify the absolute site path using the <a href="https://help.github.com/en/articles/virtual-environments-for-github-actions#environment-variables">default environment variable</a> <code>GITHUB_WORKSPACE</code>. I&rsquo;ll dive into this and other GitHub Actions features in the next post.</p>
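<p>Since <code>GITHUB_WORKSPACE</code> only exists inside Actions, a local run needs some other absolute path. One possible approach, sketched below, falls back to the current directory; <code>proofer_volume</code> is a hypothetical helper and not something my Makefile contains.</p>

```sh
# Hypothetical helper: build the -v argument for docker run.
# Uses GITHUB_WORKSPACE when set, otherwise the current directory.
proofer_volume() {
    workspace="${GITHUB_WORKSPACE:-$(pwd)}"
    printf '%s/%s/:/mnt' "$workspace" "${1:-public}"
}

# Example:
# docker run -v "$(proofer_volume public)" 18fgsa/html-proofer /mnt --disable-external
```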
<p>In the meantime, happy Making!</p>
How to quickly batch resize, compress, and convert images with a Bash one-linerhttps://victoria.dev/blog/how-to-quickly-batch-resize-compress-and-convert-images-with-a-bash-one-liner/
Mon, 14 Oct 2019 08:27:49 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/how-to-quickly-batch-resize-compress-and-convert-images-with-a-bash-one-liner/A fast command line interface solution for batch image processing.
]]>
<p>Part of my Hugo site continuous deployment workflow is the processing of 210 images, at time of writing.</p>
<p>Here&rsquo;s my one-liner:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">find public/ -not -path <span class="s2">&#34;*/static/*&#34;</span> <span class="se">\(</span> -name <span class="s1">&#39;*.png&#39;</span> -o -name <span class="s1">&#39;*.jpg&#39;</span> -o -name <span class="s1">&#39;*.jpeg&#39;</span> <span class="se">\)</span> -print0 <span class="p">|</span> xargs -0 -P8 -n2 mogrify -strip -thumbnail <span class="s1">&#39;1000&gt;&#39;</span> -format jpg
</code></pre></div><p>I use <code>find</code> to target only certain image file formats in certain directories. With <a href="https://www.imagemagick.org/script/mogrify.php"><code>mogrify</code>, part of ImageMagick</a>, I resize only the images that are larger than a certain dimension, compress them, and strip the metadata. I tack on the <code>format</code> flag to create jpg copies of the images.</p>
<p>Here&rsquo;s the one-liner again (broken up for better reading):</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="c1"># Look in the public/ directory</span>
find public/ <span class="se">\
</span><span class="se"></span><span class="c1"># Ignore directories called &#34;static&#34; regardless of location</span>
-not -path <span class="s2">&#34;*/static/*&#34;</span> <span class="se">\
</span><span class="se"></span><span class="c1"># Print the file paths of all files ending with any of these extensions</span>
<span class="se">\(</span> -name <span class="s1">&#39;*.png&#39;</span> -o -name <span class="s1">&#39;*.jpg&#39;</span> -o -name <span class="s1">&#39;*.jpeg&#39;</span> <span class="se">\)</span> -print0 <span class="se">\
</span><span class="se"></span><span class="c1"># Pipe the file paths to xargs and use 8 parallel workers to process 2 arguments</span>
<span class="p">|</span> xargs -0 -P8 -n2 <span class="se">\
</span><span class="se"></span><span class="c1"># Tell mogrify to strip metadata, and...</span>
mogrify -strip <span class="se">\
</span><span class="se"></span><span class="c1"># ...compress and resize any images larger than the target size (1000px in either dimension)</span>
-thumbnail <span class="s1">&#39;1000&gt;&#39;</span> <span class="se">\
</span><span class="se"></span><span class="c1"># Convert the files to jpg format</span>
-format jpg
</code></pre></div><p>That&rsquo;s it. That&rsquo;s the post.</p>
Personal cybersecurity posture for when you're just this guy, you know?https://victoria.dev/blog/personal-cybersecurity-posture-for-when-youre-just-this-guy-you-know/
Mon, 07 Oct 2019 08:30:12 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/personal-cybersecurity-posture-for-when-youre-just-this-guy-you-know/Security best practices for the average person.
]]>
<blockquote>
<p>&ldquo;Zaphod&rsquo;s just this guy, you know?&rdquo;</p>
<p><em>&ndash; Halfrunt, Hitchhiker&rsquo;s Guide to the Galaxy by Douglas Adams. The book, not the movie. Definitely not the movie.</em></p>
</blockquote>
<p>Some people (🙋🏻‍) are really into cybersecurity, end-to-end encryption, and totally geeked out when they first learned how the <a href="https://en.wikipedia.org/wiki/Enigma_machine">Enigma</a> worked. These people are likely to have an innate interest in building a less-than-laughable personal cybersecurity posture.</p>
<p>Most people, unfortunately, consider cybersecurity optional. Most people say things like:</p>
<p><em>&ldquo;There&rsquo;s no one targeting lil ol&rsquo; me.&rdquo;</em><br>
<em>&ldquo;I have nothing to hide, anyway.&rdquo;</em><br>
<em>&ldquo;I&rsquo;m too busy to learn all this stuff. Why can&rsquo;t someone just give me a simple summary of best practices that I can skim in approximately seven minutes?&rdquo;</em></p>
<p>To those people, I say, hello, hypothetical incorporeal reader! Here is a simple summary of best practices that you can skim in approximately seven minutes.</p>
<h2 id="wait-why-do-i-care">Wait why do I care</h2>
<p>You may have a hard time understanding why cybersecurity matters when you&rsquo;re just an average person. Sure, you don&rsquo;t want your devices hacked or your personal data stolen, but it&rsquo;s not like anyone is coming after <em>you</em>, specifically, right?</p>
<p>Hey Alex, I&rsquo;ll take &ldquo;right,&rdquo; for $400. It&rsquo;s unlikely anyone is attempting to steal your <em>particular</em> stuff, although I must admit that Persian rug of yours would really tie the room together. Instead, it can help to understand cybersecurity if you think of it in terms of low-hanging fruit.</p>
<p>You&rsquo;ve got some fruit, I&rsquo;ve got some fruit. Joe from down the block has a 1.21 gigawatt flux-capacitor-powered fruit-snatching robot. Joe doesn&rsquo;t know either of us exist, but his robot goes (very quickly) from door to door, all the way around the block, looking for fruit. If my front door is locked and yours is standing open, whose fruit is Joe&rsquo;s robot going to snatch?</p>
<p>If that sounds like boring, old, <em>regular</em> security, you&rsquo;re correct! Cybersecurity isn&rsquo;t about finding some magic spell that makes your fruit maximally secure. It&rsquo;s about making your fruit more secure than the fruit next to you. You do this by employing some thoughtful habits, in much the same way as you learned to lock your front door to guard against fruit-snatching robots.</p>
<p>Security breaches and incidents happen every day. Most of them occur because an automated scanner cast a wide net and found a person or company with lax security that a hacker could then exploit. Don&rsquo;t be that guy.</p>
<h2 id="wait-whats-a-security-posture-anyway">Wait what&rsquo;s a security posture anyway</h2>
<p>Here is how the National Institute of Standards and Technology defines security posture:</p>
<blockquote>
<p>The security status of an enterprise’s networks, information, and systems based on information assurance resources (e.g., people, hardware, software, policies) and capabilities in place to manage the defense of the enterprise and to react as the situation changes. <em>(<a href="https://csrc.nist.gov/publications/detail/sp/800-30/rev-1/final#pubs-topics">NIST Special Publication 800-30, B-11</a>)</em></p>
</blockquote>
<p>The important bit above is, <em>&ldquo;capabilities in place to manage the defense of the enterprise.&rdquo;</em> In the context of personal security, you are the enterprise. Congratulations. May you boldly go where no man has gone before.</p>
<p>Before you explore strange new worlds (it <em>is</em> the Internet, after all), there are steps you can take to manage your defenses. The word &ldquo;capabilities&rdquo; is apt, as having certain things in place will pretty much give you cybersecurity superpowers. Here are the three steps I consider most important and beneficial:</p>
<ol>
<li>Use multifactor authentication</li>
<li>Use a VPN</li>
<li>Develop healthy skepticism</li>
</ol>
<p>With these three keys in hand, your cybersecurity posture goes from being robot lunch to War Games - where the winning move for an attacker is not to play.</p>
<h2 id="1-use-multifactor-authentication">1. Use multifactor authentication</h2>
<p>Passwords are dead. Computationally, they are a solved problem, and cracking passwords is just <a href="https://howsecureismypassword.net/">a matter of time</a>. Unfortunately, many people still help to speed up the process by using the same <a href="https://haveibeenpwned.com/Passwords">compromised passwords</a> for multiple accounts, putting themselves at risk for inconceivable benefit. <a href="https://pages.nist.gov/800-63-3/sp800-63b.html#a2-length">Pass phrases</a> are longer and more complicated, and would take a lot more time to crack. I highly recommend them; even so, <a href="https://techcommunity.microsoft.com/t5/Azure-Active-Directory-Identity/Your-Pa-word-doesn-t-matter/ba-p/731984">your password ultimately doesn&rsquo;t matter</a>.</p>
<p>The answer, at least for now, is <a href="https://en.wikipedia.org/wiki/Multi-factor_authentication">multifactor authentication</a> (MFA). MFA is made up of three kinds of authentication factors:</p>
<ol>
<li>Something you know, like a pass phrase;</li>
<li>Something you have, like a chip-and-PIN card or phone; and</li>
<li>Something that you are, like your face or fingerprint.</li>
</ol>
<p><img src="mfa.png" alt="Also the name of my next beatboxing team."></p>
<p>Two or more of these factors are infinitely better than a password alone, especially if <a href="https://en.wikipedia.org/wiki/List_of_the_most_common_passwords">your password is on this list</a>.</p>
<p>Multiple authentication factors are now widely supported by account providers and social media sites. If you have the choice, avoid using text messages as a way of receiving authentication codes. SMS authentication leaves you vulnerable to the <a href="https://en.wikipedia.org/wiki/SIM_swap_scam">SIM swap attack</a> - please direct further questions to <a href="https://www.nytimes.com/2019/09/05/technology/sim-swap-jack-dorsey-hack.html">Jack Dorsey</a>. Instead, use an authenticator app like <a href="https://google-authenticator.com/">Google Authenticator</a> to generate codes on your device. This ensures that you alone, using that particular device, will have the correct authentication code. No power in the &lsquo;verse can stop you.</p>
<p>The Google Authenticator app works with the specific device you set it up on, so when you get a new device you will need to <a href="https://support.google.com/accounts/troubleshooter/4430955?hl=en#ts=4430956">move Google Authenticator to your new phone</a>. Hardware authentication keys such as the <a href="https://www.yubico.com/">YubiKey</a> may present less hassle when switching devices, but aren&rsquo;t yet as widely supported as authentication apps.</p>
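<p>For the curious, the six-digit codes that authenticator apps display are typically time-based one-time passwords (TOTP, described in RFC 6238). The sketch below is a minimal illustration of that idea using only the Python standard library; the function name and defaults are my own choices for this example, and it is not the code any particular app runs.</p>

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Return the time-based one-time password for a base32 shared secret."""
    key = base64.b32decode(secret_b32, casefold=True)
    now = time.time() if at_time is None else at_time
    # The moving factor is the number of whole time steps since the Unix epoch.
    counter = int(now // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick a 4-byte window.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

<p>Because the code depends on both the shared secret and the current time, someone who intercepts one code has only a short window in which it is useful, and someone without the secret can&rsquo;t generate the next one.</p>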
<h2 id="2-use-a-vpn">2. Use a VPN</h2>
<p>The difference between using a VPN and not using one is like how The Dark Knight Rises was really good and Batman v Superman was really, really bad. Same franchise, totally different standards.</p>
<p>Let&rsquo;s say you send a lot of mail, but never bother to put your letters in envelopes or even fold them in half. Anyone who bothers to look will know that you&rsquo;re not really the Dread Pirate Roberts after all. When you use a Virtual Private Network, especially if you often connect to public WiFi, it&rsquo;s like putting your letters into cryptographically-sealed envelopes and sending them via a special invisible courier service. No one but the intended recipient can read your letters, and no one but you and the courier knows to whom the letters are sent.</p>
<p><img src="vpnmail.png" alt="Encrypted mail still won&rsquo;t stop you from the accidental &ldquo;reply all&rdquo; unfortunately."></p>
<p>VPNs prevent others from reading your communications, like opportunistic attackers who scan open WiFi, and even your own Internet Service Provider (ISP) who may sell your usage data for advertising dollars.</p>
<p>Choosing a trustworthy VPN provider requires some research, and is in itself material enough for a separate article. As a starting point, look for providers with firm policies against logging, and expect to pay between $5 and $10 USD monthly for the service. Avoid free VPN apps and services with ambiguous privacy policies; they&rsquo;ll typically cost you much more than you&rsquo;ll know.</p>
<h2 id="3-develop-healthy-skepticism">3. Develop healthy skepticism</h2>
<p>Ultimately, the weakest link in your cybersecurity defense is you. All the MFA and VPNs on the Internet won&rsquo;t protect you if a scam or malware bot can trick you into opening the front gates. Yes, I know it&rsquo;s a very nice looking wooden horse. Also free. Did you order it? No? Then it can stay outside.</p>
<p><img src="horse.png" alt="Always look a Trojan gift horse in the mouth."></p>
<p>Develop the habit of second-guessing things delivered to your virtual doorstep. Email, phone, and messaging scams range in sophistication, from rickety robot-assembled shotgun blasts to elaborate social engineering attacks that <a href="https://www.youtube.com/watch?v=8bAuA1isCz0">use cognitive biases very effectively</a>. Don&rsquo;t assume you&rsquo;re too clever for them; humans are very predictable creatures. After all, nobody expects the Spanish Inquisition.</p>
<p>Instead, ask questions. Double check communications that ask you to click on links or visit a website, even if they come from someone you know or a company you use. If you&rsquo;re not certain, based on a previous in-person interaction, that your friend or bank or mother sent this email, pick up the phone and call them. Even if you think you are certain, pick up the phone and check. You don&rsquo;t call your mother enough, anyway.</p>
<p>Oh, and if the person on the phone is from your local tax office or the IRS or the CRA and they&rsquo;re about to freeze your accounts because a case of mistaken identity has resulted in you being criminally charged for not repaying a loan on a 600-foot yacht in Malibu, just hang up. You know better than that. Tax agencies don&rsquo;t have phones.</p>
<h2 id="your-personal-cybersecurity-starter-pack">Your personal cybersecurity starter pack</h2>
<p>You now have three keys to open three gates to a robust personal cybersecurity posture. If those keys have also unlocked your curiosity, there&rsquo;s plenty more rabbit hole to go down. I highly recommend the <a href="https://securityinfive.com/">Security in Five podcast</a> for Binary Blogger&rsquo;s great advice, which inspired much of this post. <a href="https://ssd.eff.org/">Surveillance Self Defense</a> offers the Electronic Frontier Foundation&rsquo;s tips on securing online communication. Troy Hunt also has a YouTube series entitled <a href="https://www.troyhunt.com/get-to-grips-with-internet-security-basics-courtesy-of-varonis/">Internet Security Basics</a> that goes into more depth on how to protect yourself online.</p>
<p>For now, I hope you use your newfound cybersecurity powers for good. Mind what you have learned. Save you it can.</p>
Secure application architecture basics: separation, configuration, and accesshttps://victoria.dev/blog/secure-application-architecture-basics-separation-configuration-and-access/
Mon, 30 Sep 2019 08:03:12 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/secure-application-architecture-basics-separation-configuration-and-access/A starting point for building secure application architecture, for busy developers.
]]>
<p>Software developers today are encouraged to focus on building, and that&rsquo;s a great thing. We&rsquo;re benefitting from maker culture, an attitude of &ldquo;always be shipping,&rdquo; open source collaboration, and a bevy of apps that help us prioritize and execute with maximum efficiency. We&rsquo;re in an environment of constant creation, where both teams and solo entrepreneurs can be maximally productive.</p>
<p>Sometimes, this breakneck-speed productivity shows its downsides.</p>
<p>As I learn more about security best practices, I can&rsquo;t help but see more and more applications that just don&rsquo;t have a clue. A lack of awareness of security seems to lead to a lack of prioritization of tasks that don&rsquo;t directly support bringing the product to launch. The market seems to have made it more important to launch a usable product than a secure one, with the prevailing attitude being, &ldquo;we can do the security stuff later.&rdquo;</p>
<p>Cobbling together a foundation based more on expediency than longevity is a bad way to build applications and a great way to build security debt. Security debt, like technical debt, amasses when developers make (usually hasty) decisions that can make it more difficult to secure the application later on. If you&rsquo;re familiar with the concept of &ldquo;pushing left&rdquo; (or if you read my <a href="https://victoria.dev/blog/hackers-are-googling-your-plain-text-passwords-preventing-sensitive-data-exposure/">article about sensitive data exposure</a>), you&rsquo;ll know that when it comes to security, sometimes there isn&rsquo;t a version of &ldquo;later&rdquo; that isn&rsquo;t <em>too</em> late. It&rsquo;s a shame, especially since following some basic security practices with high benefit yield early on in the development process doesn&rsquo;t take significantly more time than <em>not</em> following them. Often, it comes down to having some basic but important knowledge that enables making the more secure decision.</p>
<p>While application architecture specifics vary, there are a few basic principles we can commonly apply. This article will provide a high-level overview of areas that I hope will help point developers in the right direction.</p>
<p>There must be a reason we call it application &ldquo;architecture.&rdquo; I like to think it&rsquo;s because the architecture of software is similar in some basic ways to the architecture of a building. (Or at least, in my absolute zero building-creation expertise, how I imagine a pretty utilitarian building to be.) Here&rsquo;s how I like to summarize three basic points of building secure application architecture:</p>
<ol>
<li>Separated storage</li>
<li>Customized configuration</li>
<li>Controlled access and user scope</li>
</ol>
<p>This is only a jumping-off point meant to get us started on the right foot; a complete picture of a fully-realized application&rsquo;s security posture includes areas outside the scope of this post, including authentication, logging and monitoring, integration, and sometimes compliance.</p>
<h2 id="1-separated-storage">1. Separated storage</h2>
<p>From a security standpoint, the concept of separation refers to storing files that serve different purposes in different places. When we&rsquo;re constructing our building and deciding where all the rooms go, we similarly create the lobby on the ground floor and place administrative offices on higher floors, perhaps off the main path. While both are rooms, we understand that they serve different purposes, have different functional needs, and possibly very different security requirements.</p>
<p><img src="separation.png" alt="Separation of building floors"></p>
<p>When it comes to our files, the benefit is perhaps easiest for us to understand if we consider a simple file structure:</p>
<div class="highlight"><pre class="chroma"><code class="language-txt" data-lang="txt">application/
├───html/
│   └───index.html
├───assets/
│   ├───images/
│   │   ├───rainbows.jpg
│   │   └───unicorns.jpg
│   └───style.css
└───super-secret-configurations/
    └───master-keys.txt
</code></pre></div><p>In our simplified example, let&rsquo;s say that all our application&rsquo;s images are stored in the <code>application/assets/images/</code> directory. When one of our users creates a profile and uploads their picture to it, this picture is also stored in this folder. Makes sense, right? It&rsquo;s an image, and that&rsquo;s where the images go. What&rsquo;s the issue?</p>
<p>If you&rsquo;re familiar with navigating a file structure in a terminal, you may have seen this syntax before: <code>../../</code>. The two dots are a handy way of saying, &ldquo;go up one directory.&rdquo; If we execute the command <code>cd ../../</code> in the <code>images/</code> directory of our simple file structure above, we&rsquo;d go up into <code>assets/</code>, then up again to the root directory, <code>application/</code>. This is a problem because of a wee little vulnerability dubbed <a href="https://cwe.mitre.org/data/definitions/22.html">path traversal</a>.</p>
<p>While the dot syntax saves us some typing, it also introduces the interesting advantage of not actually needing to know what the parent directory is called in order to go to it. Consider an attack payload script, delivered into the <code>images/</code> folder of our insecure application, that went up one directory using <code>cd ../</code> and then sent everything it found to the attacker, on repeat. Eventually, it would reach the root application directory and access the <code>super-secret-configurations/</code> folder. Not good.</p>
<p>While other measures should be in place to prevent path traversal and related user upload vulnerabilities, the simplest prevention by far is a separation of storage. Core application files and assets should not be combined with other data, and especially not with <a href="https://victoria.dev/blog/sql-injection-and-xss-what-white-hat-hackers-know-about-trusting-user-input/">user input</a>. It&rsquo;s best to keep user-uploaded files and activity logs (which may contain juicy data and can be vulnerable to injection attacks) separate from the main application.</p>
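<p>As one of those &ldquo;other measures,&rdquo; here is a minimal Python sketch of a check that refuses any upload path resolving outside a dedicated upload directory. The <code>/srv/uploads</code> location and the function name are assumptions made for illustration; the point is that the upload root lives apart from the application files, and that anything containing <code>../</code> tricks is rejected before a file is written.</p>

```python
from pathlib import Path

# Hypothetical location, deliberately outside the application/ tree.
UPLOAD_ROOT = Path("/srv/uploads")


def safe_upload_path(filename: str, root: Path = UPLOAD_ROOT) -> Path:
    """Resolve a user-supplied name and refuse anything outside the root."""
    root = root.resolve()
    # resolve() collapses "../" segments, so traversal attempts show up
    # as a final path that no longer sits underneath the upload root.
    candidate = (root / filename).resolve()
    if root not in candidate.parents:
        raise ValueError(f"refusing path outside upload root: {filename!r}")
    return candidate
```

<p>With this in place, <code>safe_upload_path("unicorns.jpg")</code> returns a path under the upload root, while a name like <code>../../super-secret-configurations/master-keys.txt</code> raises an error instead of quietly walking up the tree.</p>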
<p>Separation can be achieved in a few ways, such as by using a different server, different instance, separate IP range, or separate domain.</p>
<h2 id="2-customized-configuration">2. Customized configuration</h2>
<p>While wasting time on customization can hinder productivity, one area that we definitely want to customize is configuration settings. <a href="https://github.com/OWASP/Top10/blob/cb5f8967bba106e14a350761ac4f93b8aec7f8fa/2017/en/0xa6-security-misconfiguration.md">Security misconfiguration</a> is listed in the OWASP Top 10. A significant number of security incidents occur because a server, firewall, or administrative account is running in production with default settings. Upon the opening of our new building, we&rsquo;d hopefully be more careful to ensure we haven&rsquo;t left any keys in the locks.</p>
<p><img src="defaultkey.png" alt="Three keys"></p>
<p>Usually, the victims of attacks related to default settings aren&rsquo;t specifically targeted. Rather, they are found by automated scanning tools that attackers run over many possible targets, effectively prodding at many different systems to see if any roll over and expose some useful exploit. The automated nature of this attack means that it&rsquo;s important for us to review settings for every piece of our architecture. Even if an individual piece doesn&rsquo;t seem significant, it may provide a vulnerability that allows an attacker to use it as a gateway to our larger application.</p>
<p>In particular, examine architecture components for unattended areas such as:</p>
<ul>
<li>Default accounts, especially with default passwords, left in service;</li>
<li>Example web pages, tutorial applications, or sample data left in the application;</li>
<li>Unnecessary ports left in service, or ports left open to the Internet;</li>
<li>Unrestricted permitted HTTP methods;</li>
<li>Sensitive information stored in automated logs;</li>
<li>Default configured permissions in managed services; and,</li>
<li>Directory listings, or sensitive file types, left accessible by default.</li>
</ul>
<p>This list isn&rsquo;t exhaustive. Specific architecture components, such as cloud storage or web servers, will have other configurable features that should be reviewed. In general, reduce the application&rsquo;s attack surface by using minimal architecture components. If we use fewer components or don&rsquo;t install modules we don&rsquo;t need, we&rsquo;ll have fewer possible attack entry points to configure and safeguard.</p>
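<p>A few of the checks in the list above lend themselves to automation. The following Python sketch audits a settings dictionary for some of them. The dictionary layout, the key names, and the &ldquo;expected&rdquo; values are all hypothetical; a real review would read these from each component&rsquo;s actual configuration.</p>

```python
# Hypothetical defaults to flag; extend these for the components in use.
DEFAULT_CREDENTIALS = {("admin", "admin"), ("admin", "password"), ("root", "root")}
EXPECTED_PORTS = {443}
EXPECTED_METHODS = {"GET", "POST"}


def audit(config):
    """Return a list of human-readable findings for a component's settings."""
    findings = []
    for account in config.get("accounts", []):
        if (account["user"], account["password"]) in DEFAULT_CREDENTIALS:
            findings.append(f"default credentials still active for {account['user']!r}")
    for port in sorted(set(config.get("open_ports", [])) - EXPECTED_PORTS):
        findings.append(f"unexpected open port: {port}")
    for method in sorted(set(config.get("http_methods", [])) - EXPECTED_METHODS):
        findings.append(f"unrestricted HTTP method: {method}")
    if config.get("directory_listing", False):
        findings.append("directory listing is enabled")
    return findings
```

<p>Running a check like this against every component, not only the ones that seem significant, mirrors how automated scanners approach our application from the outside.</p>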
<h2 id="3-controlled-access-and-user-scope">3. Controlled access and user scope</h2>
<p>One of the more difficult security problems to test in an application is misconfigured access control. Automated testing tools have limited capability to find areas of an application that one user shouldn&rsquo;t be able to access. Thus, this is often left to manual testing or source code review to discover. By considering this vulnerability early on in the software development lifecycle when architectural decisions are being made, we reduce the risk that it becomes a problem that&rsquo;s harder to fix later. After all, we wouldn&rsquo;t simply leave our master keys out of reach on a high ledge and hope no one comes along with a ladder.</p>
<p><img src="access.png" alt="A cartoon of a user attempting to elevate privilege"></p>
<p><a href="https://github.com/OWASP/Top10/blob/master/2017/en/0xa5-broken-access-control.md">Broken access control</a> is listed in the OWASP Top 10, which goes into more detail on its various forms. As a simple example, consider an application with two levels of access: administrators, and users. We want to build a new feature - the ability to moderate or ban users - with the intention that only administrators would be allowed to use it.</p>
<p>If we&rsquo;re aware of the possibility of access control misconfigurations or exploits, we may decide to build the moderation feature in a completely separate area from the user-accessible space, such as on a different domain, or as part of a model that users don&rsquo;t share. This greatly reduces the risk that an access control misconfiguration or elevation of privilege vulnerability might allow a user to improperly access the moderation feature later on.</p>
<p>Of course, robust access control in our application needs more support to be effective. Consider factors such as sensitive tokens, or keys passed as URL parameters, or whether a control fails securely or insecurely. Nevertheless, by considering authorization at the architectural stage, we can set ourselves up to make further reinforcements easier to implement.</p>
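<p>To make &ldquo;fails securely&rdquo; concrete, here is a small Python sketch of a deny-by-default permission check for the administrator and user example. The role names, action names, and functions are invented for illustration; the design choice worth noting is that an unknown role or an unlisted action results in a refusal rather than in access.</p>

```python
# Hypothetical permission table: anything not listed here is denied.
PERMISSIONS = {
    "admin": {"view_profile", "moderate_user", "ban_user"},
    "user": {"view_profile"},
}


def is_allowed(role, action):
    """Deny by default: unknown roles and unlisted actions return False."""
    return action in PERMISSIONS.get(role, set())


def ban_user(actor_role, target):
    """Moderation entry point that checks authorization before acting."""
    if not is_allowed(actor_role, "ban_user"):
        raise PermissionError(f"role {actor_role!r} may not ban users")
    return f"{target} has been banned"
```

<p>If a new role is added later and someone forgets to configure it, the control above locks that role out of moderation instead of silently letting it in.</p>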
<h2 id="security-basics-for-maximum-benefit">Security basics for maximum benefit</h2>
<p>Similar to avoiding racking up technical debt by choosing a well-vetted framework, developers can avoid security debt by becoming aware of common vulnerabilities and the simple architectural decisions we can make to help mitigate them. For a much more detailed resource on how to bake security into our applications from the start, the <a href="https://github.com/OWASP/ASVS">OWASP Application Security Verification Standard</a> is a robust guide.</p>
Migrating to the cloud but without screwing it up, or how to move househttps://victoria.dev/blog/migrating-to-the-cloud-but-without-screwing-it-up-or-how-to-move-house/
Mon, 23 Sep 2019 08:03:12 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/migrating-to-the-cloud-but-without-screwing-it-up-or-how-to-move-house/A practical guide to moving to cloud services with minimal downtime, using AWS examples.
]]>
<p>For an application that&rsquo;s ready to scale, not using managed cloud architecture these days is like insisting on digging your own well for water. It&rsquo;s far more labour-intensive, requires buying all your own equipment, takes a lot more time, and there&rsquo;s a higher chance you&rsquo;re going to get it wrong because you don&rsquo;t personally have a whole lot of experience digging wells, anyway.</p>
<p>That said - let&rsquo;s just get this out of the way first - there is no cloud. It&rsquo;s just someone else&rsquo;s computer.</p>
<p>Of course, these days, cloud services go far beyond the utility we&rsquo;d expect from a single computer. Besides being able to quickly set up and utilize the kind of computing power that previously required a new office lease agreement to house, there are now a multitude of monitoring, management, and analysis tools at our giddy fingertips. While it&rsquo;s important to understand that the cloud isn&rsquo;t a better option in every case, for applications that can take advantage of it, we can do more, do it faster, and do it for less money than if we were to insist on building our own on-premises infrastructure.</p>
<p>That&rsquo;s all great, and easily said; moving to the cloud, however, can look from the outset like a pretty daunting task. How, exactly, do we go about shifting what may be years of on-premises data and built-up systems to <em>someone else&rsquo;s computer?</em> You know, without being able to see it, touch it, and without completely screwing up our stuff.</p>
<p>While it probably takes less work and money than setting up or maintaining the same architecture on-premises, it does take some work to move to the cloud initially. It&rsquo;s important that our application is prepared to migrate, and capable of using the benefits of cloud services once it gets there. To accomplish this with a smooth transition, preparation is key. In fact, it&rsquo;s a whole lot like moving to a new house.</p>
<p>In this article, we&rsquo;ll take a high-level look at the general stages of taking an on-premises or self-hosted application and moving it to the cloud. This guide is meant to serve as a starting point for designing the appropriate process for your particular situation, and to enable you to better understand the cloud migration process. While cloud migration may not be the best choice for some applications - such as ones without scalable architecture or where very high computing resources are needed - a majority of modular and modern applications stand to benefit from a move to the cloud.</p>
<p>It&rsquo;s certainly possible, as I discovered at a recent event put on by <a href="https://aws.amazon.com/">Amazon Web Services</a> (AWS) Solutions Architects, to migrate smoothly and efficiently, with near-zero loss of availability to customers. I&rsquo;ll specifically reference some services provided by AWS; however, similar functionality can be found with other cloud providers. I&rsquo;ve found the offerings from AWS to be pleasantly modular in scope, which is why I use them myself and why they make good examples for discussing general concepts.</p>
<p>To have our move go as smoothly as possible, here are the things we&rsquo;ll want to consider:</p>
<ol>
<li>The type of move we&rsquo;re making;</li>
<li>The things we&rsquo;ll take, and the things we&rsquo;ll clean up;</li>
<li>How to choose the right type and size for the infrastructure we&rsquo;re moving into; and</li>
<li>How to do test runs to practice for the big day.</li>
</ol>
<h2 id="the-type-of-move-were-making">The type of move we&rsquo;re making</h2>
<p>While it&rsquo;s important to understand why we&rsquo;re moving our application to cloud services, we should also have an idea of what we&rsquo;d like it to look like when it gets there. There are three main ways to move to the cloud: re-host, re-platform, or re-factor.</p>
<h3 id="re-host">Re-host</h3>
<p>A re-host scenario is the most straightforward type of move. It involves no change to the way our application is built or how it runs. For example, if we currently have Python code, use PostgreSQL, and serve our application with Apache, a re-host move would mean we use all the same components, combined in just the same way, only now they&rsquo;re in the cloud. It&rsquo;s a lot like moving into a new house that has the exact same floor plan as the current one. All the furniture goes into the same room it&rsquo;s in now, and it&rsquo;s going to feel pretty familiar when we get there.</p>
<p>The main draw of a re-host move is that it may offer the least amount of complication necessary in order to take advantage of going to the cloud. Scalable applications, for example, can gain the ability to automatically manage necessary application resources.</p>
<p>While re-hosting makes scaling more automatic, it&rsquo;s important to note that it won&rsquo;t in itself make an application scalable. If the application infrastructure is not organized in such a way that gives it the ability to scale, a re-factor may be necessary instead.</p>
<h3 id="re-platform">Re-platform</h3>
<p>If a component of our current application setup isn&rsquo;t working out well for us, we&rsquo;re probably going to want to re-platform. In this case, we&rsquo;re making a change to at least one component of our architecture; for example, switching our database from Oracle to MySQL on <a href="https://aws.amazon.com/rds/">Amazon Relational Database Service</a> (RDS).</p>
<p>Like moving from a small apartment in Tokyo to an equally small apartment in New York, a re-platform doesn&rsquo;t change the basic nature of our application, but does change its appearance and environment. In the database change example, we&rsquo;ll have all the same data, just organized or formatted a little differently. In most cases, we won&rsquo;t have to make these changes manually. A tool such as <a href="https://aws.amazon.com/dms/">Amazon Database Migration Service</a> (DMS) can help to seamlessly shift our data over to the new database.</p>
<p>We might re-platform in order to enable us to better meet a business demand in the future, such as scaling up, integrating with other technological components, or choosing a more modern technology stack.</p>
<h3 id="re-factor">Re-factor</h3>
<p>A move in which we re-factor our application is necessarily more complicated than our other options; however, it may provide the most overall benefit for companies or applications that have reason to make this type of move. As with code, refactoring is done when fundamental changes need to be made in order for our application to meet a business need. The specifics necessarily differ case by case, but typically involve changes to architectural components or how those components relate to one another. This type of move may also involve changing application code in order to optimize the application&rsquo;s performance in a cloud environment. We can think of it like moving out from our parents&rsquo; basement in the suburbs and getting a nice townhouse in the city. There&rsquo;s no way we&rsquo;re taking that ancient hand-me-down sofa, so we&rsquo;ll need some new furniture, and for our neighbours&rsquo; sake, probably window dressings.</p>
<p>Refactoring may enable us to modernize a dated application, or make it more efficient in general. With greater efficiency, we can better take advantage of services that cloud providers typically offer, like bursting resources or attaining deep analytical insight.</p>
<p>If a re-factor is necessary but time is scarce, it may be better to re-host or re-platform first, then re-factor later. That way, we&rsquo;ll have a job well done later instead of a hasty, botched migration (and more problems) sooner.</p>
<h2 id="what-to-take-and-what-to-clean-up">What to take, and what to clean up</h2>
<p>Over the years of living in one place, stuff tends to pile up unnoticed in nooks and crannies. When moving house, it&rsquo;s usually a great opportunity to sort everything out and decide what is useful enough to keep, and what should be discarded or given away. Moving to the cloud is a similarly great opportunity to do the same when it comes to our application.</p>
<p>While cloud storage is inexpensive nowadays, there may be some things that don&rsquo;t make sense to store any longer, or at least not keep stored with our primary application. If data cannot be discarded due to policy or regulations, we may choose a different storage class to house data that we don&rsquo;t expect to need anytime soon outside of our main application.</p>
<p>In the case of <a href="https://aws.amazon.com/s3/">Amazon&rsquo;s Simple Storage Service</a> (S3), we can choose to use different <a href="https://aws.amazon.com/s3/storage-classes/">storage classes</a> that accomplish this goal. While the data that our business relies on every day can take advantage of the Standard class, which is designed for 99.99% availability, data meant for long-term cold storage such as archival backups can be put into the Glacier class, which has longer retrieval time and lower cost.</p>
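<p>As a rough sketch of how that might be automated, the Python function below builds an S3 lifecycle configuration that transitions objects under a given prefix to the Glacier storage class after a number of days. The function name, the <code>backups/</code> prefix, the 90-day threshold, and the bucket name in the comment are assumptions for illustration only.</p>

```python
def archive_lifecycle(prefix, days=90):
    """Build a lifecycle configuration that moves old objects to Glacier."""
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


# Applying it needs the boto3 package and AWS credentials, so it is left
# as a comment here ("example-bucket" is a placeholder name):
#
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",
#       LifecycleConfiguration=archive_lifecycle("backups/"),
#   )
```

<p>Keeping the rule as plain data makes it easy to review the retention policy before anything is applied to a real bucket.</p>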
<h2 id="the-right-type-and-size">The right type and size</h2>
<p>Choosing the type and size of cloud infrastructure appropriate for our business is often the most confusing part. How should we predict, in a new environment or for a growing company, the computing power we&rsquo;ll need?</p>
<p>Part of the beauty of not procuring hardware on our own is that we won&rsquo;t have to make predictions like these. Using cloud storage and instances, expanding or scaling back resources can be done in a matter of minutes, sometimes seconds. With managed services, it can even be done automatically for us. With the proper support for scalability in our application, it&rsquo;s like having a magical house that instantly generates any type of room and amenity we need at that moment. The ability to continually ensure that we&rsquo;re using appropriate, cost-effective resources is at our fingertips, and often clearly visualized in charts and dashboards.</p>
<p>For applications new to the cloud, some leeway for experimentation may be necessary. While cloud services enable us to quickly spin up and try out different architectures, there&rsquo;s no guarantee that all of those setups will work well for our application. For example, running a single instance may be <a href="http://einaregilsson.com/serverless-15-percent-slower-and-eight-times-more-expensive/">less expensive than going serverless</a>, but we&rsquo;d be hard pressed to know this until we tried it out.</p>
<p>As a starting point, we simply need enough storage and computing power to support the application as it is currently running, today. For example, in the case of storage, consider the size of the current database - the actual database data, not the total storage capacity of hardware on-premises. For a detailed cost exploration, AWS even offers a <a href="https://calculator.s3.amazonaws.com/index.html">Simple Monthly Calculator</a> with use case samples to help guide expectations.</p>
<h2 id="do-test-runs-before-the-big-day">Do test runs before the big day</h2>
<p>Running a trial cloud migration may be an odd concept, but it is an essential component to ensuring that the move goes as planned with minimal service interruption. Imagine the time and energy that would be saved in the moving house example if we could automate test runs! Invariably, some box or still-hung picture is forgotten and left out of the main truck, necessitating additional trips in other vehicles. With multiple chances to ensure we&rsquo;ve got it down pat, we minimize the possibility that our move causes any break in normal day-to-day business.</p>
<p>Generally, to do a test run, we create a duplicate version of our application. The more we can duplicate, the more thorough the test run will be, particularly if our data set is large. Though duplication may seem tedious, working with the actual components we intend to migrate is essential to ensuring the migration goes as planned. After all, if we only did a moving-house test run with one box, it wouldn&rsquo;t be very representative.</p>
<p>Test runs can help to validate our migration plan against any challenges we may encounter. These challenges might include:</p>
<ul>
<li>Downtime restrictions;</li>
<li>Encrypting data in transit and immediately when at rest on the target;</li>
<li>Schema conversion to a new target schema (the <a href="https://aws.amazon.com/dms/schema-conversion-tool/">AWS Schema Conversion Tool</a> can also help);</li>
<li>Access to databases, such as through firewalls or VPNs;</li>
<li>Developing a process to ensure that all the data successfully migrated, such as by using a hash function.</li>
</ul>
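<p>As a rough illustration of the last point, here is a minimal sketch of comparing a hash of the source data against a hash of the migrated data. The <code>table_digest</code> helper and the sample rows are hypothetical stand-ins for whatever query results we would pull from each database; a real check would likely work table by table or in batches.</p>

```python
import hashlib


def table_digest(rows):
    """Return a SHA-256 digest over rows, independent of row order."""
    digest = hashlib.sha256()
    # Sort a canonical text form of each row so that ordering differences
    # between source and target do not change the result.
    for line in sorted(repr(tuple(row)) for row in rows):
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# Hypothetical sample rows standing in for query results from each database.
source_rows = [(1, "alice"), (2, "bob")]
target_rows = [(2, "bob"), (1, "alice")]

# True when both sides contain exactly the same rows.
migrated_ok = table_digest(source_rows) == table_digest(target_rows)
```

<p>If the two digests differ, at least one row was lost, duplicated, or altered along the way, and the test run has done its job by telling us so before the big day.</p>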
<p>Test runs also help to give us a more accurate picture of the overall time that a migration will take, as well as affording us the opportunity to fine-tune it. Factors that may affect the overall speed of a migration include:</p>
<ul>
<li>The sizes of the source and target instances;</li>
<li>Available bandwidth for moving data;</li>
<li>Schema configurations; and</li>
<li>Transaction pressure on the source, such as changes to the data and the volume of incoming transactions.</li>
</ul>
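<p>To get a feel for how bandwidth alone shapes the timeline, here is a back-of-the-envelope sketch. The function name, the sample figures, and the <code>utilization</code> factor (the share of the link we assume is really available to the migration) are all assumptions for illustration; actual throughput should come from a test run.</p>

```python
def estimate_transfer_hours(data_gb, bandwidth_mbps, utilization=0.8):
    """Estimate the hours needed to move data_gb gigabytes over a network link."""
    megabits = data_gb * 8 * 1000  # 1 GB = 8,000 megabits (decimal units)
    effective_mbps = bandwidth_mbps * utilization  # assumed usable share of the link
    return megabits / effective_mbps / 3600


# Hypothetical example: a 500 GB database over a 100 Mbps connection.
hours = estimate_transfer_hours(500, 100)
```

<p>With these assumed numbers the transfer takes roughly 14 hours, before accounting for schema conversion or changes arriving at the source in the meantime.</p>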
<p>Once the duplicate application has been migrated via one or more <a href="https://aws.amazon.com/cloud-data-migration/">options</a>, we test the heck out of the application that&rsquo;s now running in the cloud to ensure it performs as expected. Ideally, on the big day, we&rsquo;d follow this same general process to move up-to-date duplicate data, and then seamlessly point the &ldquo;real&rdquo; application or web address to the new location in the cloud. This means that our customers experience near-zero downtime; essentially, only the amount of time that the change in location-pointing would need to propagate to their device.</p>
<p>In the case of very large or complex applications with many components or many teams working together at the same time, a more gradual approach may be more appropriate than the &ldquo;Big Bang&rdquo; approach, and may help to mitigate risk of any interruptions. This means migrating in stages, component by component, and running tests between stages to ensure that all parts of the application are communicating with each other as expected.</p>
<h2 id="preparation-is-essential-to-a-smooth-migration">Preparation is essential to a smooth migration</h2>
<p>I hope this article has enabled a more practical understanding of how cloud migration can be achieved. With thorough preparation, it&rsquo;s possible to take advantage of all the cloud has to offer, with minimal hassle to get there.</p>
<p>My thanks to the AWS Solutions Architects who presented at Pop-Up Loft and shared their knowledge on these topics, in particular: Chandra Kapireddy, Stephen Moon, John Franklin, Michael Alpaugh, and Priyanka Mahankali.</p>
<p>One last nugget of wisdom, courtesy of John: &ldquo;Friends don&rsquo;t let friends use DMS to create schema objects.&rdquo;</p>
How users and applications stay safe on the Internet: it's proxy servers all the way downhttps://victoria.dev/blog/how-users-and-applications-stay-safe-on-the-internet-its-proxy-servers-all-the-way-down/
Mon, 16 Sep 2019 09:35:28 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/how-users-and-applications-stay-safe-on-the-internet-its-proxy-servers-all-the-way-down/An overview of how proxy servers form the basis of online anonymity, and how their use in various forms helps both users and web applications.
]]>
<p>Both Internet users and Internet-connected applications can benefit from investing in cybersecurity. One core aspect of online privacy is the use of a proxy server, though this basic building block may not be initially visible underneath its more recognizable forms. Proxy servers are a useful thing to know about nowadays for developers, software product owners, and the average dog on the Internet alike. Let&rsquo;s explore what makes proxy servers an important piece of cybersecurity support.</p>
<blockquote>
<p>&ldquo;On the Internet, nobody knows you&rsquo;re a dog.&rdquo;</p>
</blockquote>
<p>When <a href="https://en.wikipedia.org/wiki/On_the_Internet,_nobody_knows_you%27re_a_dog">Peter Steiner&rsquo;s caption</a> was first published in The New Yorker in 1993, it reportedly went largely unnoticed. Only later did the ominous and slightly frightening allusion to online anonymity touch the public consciousness with the icy fingers of the unknown. As Internet usage became more popular, users became concerned that other people could represent themselves online in any manner they chose, without anyone else knowing who they truly were.</p>
<p>This, to make a gross understatement, is no longer the case. Thanks to <a href="https://support.mozilla.org/en-US/kb/enable-and-disable-cookies-website-preferences">tracking cookies</a>, <a href="https://robertheaton.com/2017/10/17/we-see-you-democratizing-de-anonymization/">browser fingerprinting</a>, <a href="https://www.privacypolicies.com/blog/isp-tracking-you/">Internet Service Providers (ISPs) selling our browsing logs to advertisers</a>, and our own inexplicable inclination to put our names and faces on social networks, online anonymity is out like last year&rsquo;s LaCroix flavours. While your next-door neighbor may not know how to find you online (well, except for through that location-based secondhand marketplace app you&rsquo;re using), you can be certain that at least one large advertising company has a series of zeroes and ones somewhere that represent you, the specific details of your market demographic, and all your online habits, including your preferred flavour of LaCroix.</p>
<p>There are ways to add <em>some</em> layers of obscurity, like using a corporate firewall that hides your IP, or <a href="https://www.torproject.org/">using Tor</a>. The underlying mechanism of both these methods is the same. Like being enshrouded in the layers of an onion, we&rsquo;re using one or more <a href="https://en.wikipedia.org/wiki/Proxy_server">proxy servers</a> to shield our slightly sulfuric selves from third-party tracking.</p>
<h2 id="whats-a-proxy-server-anyway">What&rsquo;s a proxy server, anyway?</h2>
<p>A proxy, in the traditional English definition, is the &ldquo;authority or power to act for another.&rdquo; (<a href="https://www.merriam-webster.com/dictionary/proxy">Merriam-Webster</a>) A proxy server, in the computing context, is a server that acts on behalf of another server, or a user&rsquo;s machine.</p>
<p>By using a proxy to browse the Internet, for example, a user can avoid being personally identifiable. All of the user&rsquo;s Internet traffic appears to come from the proxy server instead of their machine.</p>
<h2 id="proxy-servers-are-for-users">Proxy servers are for users</h2>
<p>There are a few ways that we, as the client, can use a proxy server to conceal our identity when we go online. It&rsquo;s important to know that these methods offer differing levels of anonymity, and that no single method will really provide <em>true</em> anonymity; if others are actively seeking to find you on the Internet, for whatever reason, further steps should be taken to make your activity truly difficult to identify. (Those steps are beyond the scope of this article, but you can get started with the <a href="https://ssd.eff.org/">Electronic Frontier Foundation&rsquo;s (EFF) Surveillance Self-Defense</a> resource.) For the average user, however, here is a small menu of options ranging from least to most anonymous.</p>
<h3 id="use-a-proxy-in-your-web-browser">Use a proxy in your web browser</h3>
<p>Certain web browsers, including Firefox and Safari on Mac, allow us to configure them to send our Internet traffic through a proxy server. The proxy server attempts to <a href="https://en.wikipedia.org/wiki/Anonymizer">anonymize</a> our requests by replacing our originating IP address with the proxy server&rsquo;s own IP. This provides us with some anonymity, as the website we&rsquo;re trying to reach will not see our originating IP address; however, the proxy server that we choose to use will know exactly who originated the request. This method also doesn&rsquo;t necessarily encrypt traffic, block cookies, or stop social media and cross-site trackers from following us around; on the upside, it&rsquo;s the method least likely to prevent websites that use cookies from functioning properly.</p>
<p><img src="browser-proxy.png" alt="A cartoon of a proxy server guarding a browser"></p>
<p>Public proxy servers are out there, and deciding whether or not we should use any one of them is on par with deciding whether we should eat a piece of candy handed to us by a smiling stranger. If your academic institution or company provides a proxy server address, it is (hopefully) a private server with some security in place. My preferred method, if we have a little time and a few monthly dollars to invest in our security, is to set up our own virtual instance with a company such as <a href="https://aws.amazon.com/ec2/">Amazon Web Services</a> or <a href="https://www.digitalocean.com/products/droplets/">Digital Ocean</a> and use this as our proxy server.</p>
<p>To use a proxy through our browser, we can <a href="https://support.mozilla.org/en-US/kb/connection-settings-firefox">edit our Connection Settings in Firefox</a>, or <a href="https://support.apple.com/guide/safari/set-up-a-proxy-server-ibrw1053/mac">set up a proxy server using Safari on Mac</a>.</p>
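<p>The same idea applies to programs other than browsers. As a minimal sketch, here is how a Python script might route its requests through a proxy using the standard library. The proxy address is a hypothetical placeholder, and <code>build_proxied_opener</code> is a helper name made up for this example.</p>

```python
import urllib.request

# Hypothetical proxy address; replace it with a proxy server you trust.
PROXY_URL = "http://proxy.example.com:8080"


def build_proxied_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_proxied_opener(PROXY_URL)
# Calling opener.open("https://example.com/") would now go through the proxy,
# so the destination sees the proxy's IP address rather than ours.
```

<p>As with the browser setting, this only changes where requests appear to come from; the proxy itself still knows who originated them.</p>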
<p>With regard to choosing a browser, I would happily recommend <a href="https://www.mozilla.org/en-US/firefox/new/">Firefox</a> to any Internet user who wants to beef up the security of their browsing experience right out of the box. Mozilla has championed a privacy-first approach for as long as I&rsquo;ve known of them, and recently made some well-received changes to <a href="https://blog.mozilla.org/blog/2019/06/04/firefox-now-available-with-enhanced-tracking-protection-by-default/">Enhanced Tracking Protection in Firefox Browser</a>, which now blocks social media trackers, cross-site tracking cookies, fingerprinters, and cryptominers by default.</p>
<h3 id="use-a-vpn-on-your-device">Use a VPN on your device</h3>
<p>In order to take advantage of a proxy server for all our Internet usage instead of just through one browser, we can use a Virtual Private Network (VPN). A VPN is a service, usually paid, that sends our Internet traffic through their servers, thus acting as a proxy. A VPN can be used on our laptop as well as phone and tablet devices, and since it encompasses all our Internet traffic, it doesn&rsquo;t require much extra effort to use other than ensuring our device is connected. Using a VPN is an effective way to keep nosy ISPs from snooping on our requests.</p>
<p><img src="vpn.png" alt="A cartoon depicting a private VPN"></p>
<p>To use a paid, third-party VPN service, we&rsquo;d usually sign up on their website and download their app. It&rsquo;s important to keep in mind that whichever provider we choose, we&rsquo;re entrusting them with our data. VPN providers anonymize our activity from the Internet, but can themselves see all our requests. Providers vary in terms of their privacy policies and the data they choose to log, so a little research may be necessary to determine which, if any, we are comfortable trusting.</p>
<p>We can also roll our own VPN service by using a virtual instance and <a href="https://openvpn.net/">OpenVPN</a>. OpenVPN is an open source VPN protocol, and can be used with a few virtual instance providers, such as <a href="https://openvpn.net/amazon-cloud/">Amazon VPC</a>, <a href="https://openvpn.net/microsoft-azure/">Microsoft Azure</a>, <a href="https://openvpn.net/google-cloud-vpn/">Google Cloud</a>, and <a href="https://openvpn.net/digital-ocean-vpn/">Digital Ocean Droplets</a>. I previously wrote a tutorial on <a href="https://victoria.dev/blog/how-to-set-up-openvpn-on-aws-ec2-and-fix-dns-leaks-on-ubuntu-18.04-lts/">setting up your own personal VPN service with AWS</a> using an EC2 instance. I&rsquo;ve been running this solution personally for about a month, and it&rsquo;s cost me almost $4 USD in total, which is a price I&rsquo;m quite comfortable paying for some peace of mind.</p>
<h3 id="use-tor">Use Tor</h3>
<p>Tor takes the anonymity offered by a proxy server and compounds it by forwarding our requests through a <a href="https://en.wikipedia.org/wiki/Relay_network">relay network</a> of other servers, each called a &ldquo;node.&rdquo; Our traffic passes through three nodes on its way to a destination: the <em>guard</em>, <em>middle</em>, and <em>exit</em> nodes. At each step, the request is encrypted and anonymized such that the current node only knows where to send it, and nothing more about what the request contains. This separation of knowledge means that, of the options discussed, Tor provides the most complete version of anonymity. (For a more complete explanation, see <a href="https://robertheaton.com/2019/04/06/how-does-tor-work/">Robert Heaton&rsquo;s article on how Tor works</a>, which is so excellently done that I wish I&rsquo;d written it myself.)</p>
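<p>The layering idea can be sketched in a few lines of code. To be clear, this is a toy: the XOR &ldquo;cipher&rdquo; below is not secure and is not how Tor encrypts anything, and the keys and hop names are invented for the example. It only illustrates how each node can peel off one layer and learn the next hop without seeing what lies further inside.</p>

```python
import hashlib


def keystream_xor(key, data):
    """Toy XOR cipher: NOT secure, only here to illustrate layering."""
    stream = b""
    counter = 0
    while len(data) > len(stream):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap(message, route):
    """Wrap message in one layer per (key, next_hop) pair; route runs first node to last."""
    payload = message
    # Build from the inside out, so the first node's layer ends up outermost.
    for key, next_hop in reversed(route):
        payload = keystream_xor(key, next_hop + b"|" + payload)
    return payload


def peel(key, payload):
    """Remove one layer, returning (next_hop, inner_payload)."""
    next_hop, _, inner = keystream_xor(key, payload).partition(b"|")
    return next_hop, inner


# Hypothetical keys and hop names for the guard, middle, and exit nodes.
route = [
    (b"guard-key", b"middle"),
    (b"middle-key", b"exit"),
    (b"exit-key", b"example.com"),
]
onion = wrap(b"GET /", route)

hops = []
payload = onion
for key, _ in route:
    # Each node uses only its own key and learns only where to forward next.
    next_hop, payload = peel(key, payload)
    hops.append(next_hop)
```

<p>After the last layer is removed, <code>payload</code> holds the original request, while the guard and middle nodes only ever handled data they could not read.</p>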
<p><img src="tor.png" alt="Tor onion holding a Free Hugs sign"></p>
<p>That said, this level of anonymity comes with its own cost. Not monetary, as <a href="https://www.torproject.org/download/">Tor Browser</a> is free to download and use. It is, however, slower than using a VPN or simple proxy server through a browser, due to the circuitous route our requests take.</p>
<h2 id="proxy-servers-are-for-servers-too">Proxy servers are for servers too</h2>
<p>We&rsquo;re now familiar with proxy servers in the context of protecting users as they surf the web, but proxies aren&rsquo;t just for clients. Websites and Internet-connected applications can use <a href="https://en.wikipedia.org/wiki/Reverse_proxy">reverse proxy servers</a> for obfuscation too. The &ldquo;reverse&rdquo; part just means that the proxy is acting on behalf of the server, instead of the client.</p>
<p>Why would a web server care about anonymity? Generally, they don&rsquo;t, at least not in the same way some users do. Web servers can benefit from using a proxy for a few different reasons; for example, they typically offer faster service to users by <a href="https://en.wikipedia.org/wiki/Web_cache">caching</a> or <a href="https://en.wikipedia.org/wiki/HTTP_compression">compressing</a> content to optimize delivery. From a cybersecurity perspective, however, a reverse proxy can improve an application&rsquo;s security posture by obfuscating the underlying infrastructure.</p>
<p><img src="syllables.png" alt="A cartoon making fun of the big words I used"></p>
<p>Basically, by placing another web server (the &ldquo;proxy&rdquo;) in front of the web server that directly accesses all the files and assets, we make it more difficult for an attacker to pinpoint our &ldquo;real&rdquo; web server and mess with our stuff. Like when you want to see the store manager and the clerk you&rsquo;re talking to says, &ldquo;I speak for the manager,&rdquo; and you&rsquo;re not really sure there even <em>is</em> a manager, anyway, but you successfully exchange the hot pink My Little Pony they sold you for a <em>fuchsia</em> one, thankyouverymuch, so now you&rsquo;re no longer concerned with who the manager is and whether or not they really exist, and if you passed them on the street you would not be able to stop them and call them out for passing off hot pink as fuchsia, and the manager is just fine with that.</p>
<p>Some common web servers can also act as reverse proxies, often with just a minimal and straightforward configuration change. While the best choice for your particular architecture is unknown to me, I will offer a couple common examples here.</p>
<h3 id="using-nginx-as-a-reverse-proxy">Using NGINX as a reverse proxy</h3>
<p>NGINX uses the <code>proxy_pass</code> directive in its <a href="https://docs.nginx.com/nginx/admin-guide/basic-functionality/managing-configuration-files/">configuration file</a> (<code>nginx.conf</code> by default) to turn itself into a reverse proxy server. The setup requires the following lines to be placed in the configuration file:</p>
<pre><code>location /requested/path/ {
    proxy_pass http://www.example.com/target/path/;
}
</code></pre><p>This specifies that all requests for the path <code>/requested/path/</code> are forwarded to <code>http://www.example.com/target/path/</code>. The target can be given as a domain name or an IP address, and may also include a port.</p>
<p>The full <a href="https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/">guide to using NGINX as a reverse proxy</a> is part of the NGINX documentation.</p>
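<p>To make the forwarding rule concrete, here is a small sketch that mimics how a request path is mapped onto the target when <code>proxy_pass</code> includes a path: the part of the request that matches the location is swapped for the target path. This is an illustration of the mapping only, not NGINX code, and the function and constant names are made up for the example.</p>

```python
LOCATION_PREFIX = "/requested/path/"
UPSTREAM_BASE = "http://www.example.com/target/path/"


def upstream_url(request_path):
    """Return the URL a request would be forwarded to, or None if the location does not match."""
    if not request_path.startswith(LOCATION_PREFIX):
        return None
    # Replace the matched location prefix with the target path.
    return UPSTREAM_BASE + request_path[len(LOCATION_PREFIX):]


forwarded = upstream_url("/requested/path/page.html")
```

<p>Here a request for <code>/requested/path/page.html</code> would be passed along to <code>http://www.example.com/target/path/page.html</code>, while the visitor only ever sees the proxy&rsquo;s address.</p>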
<h3 id="using-apache-httpd-as-a-reverse-proxy">Using Apache httpd as a reverse proxy</h3>
<p>Apache httpd similarly requires some straightforward configuration to act as a reverse proxy server. In the <a href="https://httpd.apache.org/docs/current/configuring.html">configuration file</a>, usually <code>httpd.conf</code>, set the following directives:</p>
<pre><code>ProxyPass &quot;/requested/path/&quot; &quot;http://www.example.com/target/path/&quot;
ProxyPassReverse &quot;/requested/path/&quot; &quot;http://www.example.com/target/path/&quot;
</code></pre><p>The <code>ProxyPass</code> directive ensures that all requests for the path <code>/requested/path/</code> are forwarded to <code>http://www.example.com/target/path/</code>. The <code>ProxyPassReverse</code> directive rewrites the <code>Location</code>, <code>Content-Location</code>, and <code>URI</code> headers in HTTP redirect responses from the backend web server so that they point to the reverse proxy server instead.</p>
<p>The full <a href="https://httpd.apache.org/docs/2.4/howto/reverse_proxy.html">reverse proxy guide for Apache HTTP server</a> is available in their documentation.</p>
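<p>The reverse direction is easier to picture with an example. The sketch below imitates what happens to a redirect header coming back from the backend; it is an illustration rather than Apache&rsquo;s implementation, and the proxy&rsquo;s public address is a hypothetical value.</p>

```python
PROXY_PATH = "/requested/path/"
BACKEND_BASE = "http://www.example.com/target/path/"
# Hypothetical public address of the reverse proxy.
PROXY_ORIGIN = "https://proxy.example.net"


def rewrite_location(header_value):
    """Point a backend redirect at the proxy; leave unrelated URLs untouched."""
    if header_value.startswith(BACKEND_BASE):
        return PROXY_ORIGIN + PROXY_PATH + header_value[len(BACKEND_BASE):]
    return header_value


rewritten = rewrite_location("http://www.example.com/target/path/login")
```

<p>Without this rewriting, a redirect from the backend would send visitors straight to <code>www.example.com</code>, bypassing the proxy and revealing the server behind it.</p>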
<h2 id="proxy-servers-_most-of_-the-way-down">Proxy servers <em>most of</em> the way down</h2>
<p>I concede that my title is a little facetious, as cybersecurity best practices aren&rsquo;t really some eternal infinite-regression mystery (though they may sometimes seem to be). Regardless, I hope this post has helped in your understanding of what proxy servers are, how they contribute to online anonymity for both clients and servers, and that they are an integral building block of cybersecurity practices.</p>
<p>If you&rsquo;d like to learn more about personal best practices for online security, I highly recommend exploring the articles and resources provided by <a href="https://www.eff.org/">EFF</a>. For a guide to securing web sites and applications, the <a href="https://github.com/OWASP/CheatSheetSeries">OWASP Cheat Sheet Series</a> is a fantastic resource.</p>
Hackers are Googling your plain text passwords: preventing sensitive data exposurehttps://victoria.dev/blog/hackers-are-googling-your-plain-text-passwords-preventing-sensitive-data-exposure/
Mon, 09 Sep 2019 09:10:11 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/hackers-are-googling-your-plain-text-passwords-preventing-sensitive-data-exposure/Why sensitive data controls need to be established long before you think you need them, as demonstrated by Google dorking.
]]>
<p>Last week, I wrote about <a href="https://victoria.dev/blog/sql-injection-and-xss-what-white-hat-hackers-know-about-trusting-user-input/">the importance of properly handling user input</a> in our websites and applications. I alluded to an overarching security lesson that I hope to make explicit today: the security of our software, application, and customer data is built from the ground up, long before the product goes live.</p>
<p>The <a href="https://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project">OWASP Top 10</a> is a comprehensive guide to web application security risks. It is relied upon by technology professionals, corporations, and those interested in cybersecurity or information security. The most recent publication lists Sensitive Data Exposure as the third most critical web application security risk. Here&rsquo;s how the risk is described:</p>
<blockquote>
<p>Many web applications and APIs do not properly protect sensitive data, such as financial, healthcare, and PII. Attackers may steal or modify such weakly protected data to conduct credit card fraud, identity theft, or other crimes. Sensitive data may be compromised without extra protection, such as encryption at rest or in transit, and requires special precautions when exchanged with the browser.</p>
</blockquote>
<p>&ldquo;Sensitive Data Exposure&rdquo; is a sort of catch-all category for leaked data resulting from many sources, ranging from weak cryptographic algorithms to unenforced encryption. The simplest source of this security risk, however, takes far fewer syllables to describe: people.</p>
<p>The phrase &ldquo;an ounce of prevention is worth a pound of cure&rdquo; applies to medicine as well as secure software development. In the world of the latter, this is referred to as &ldquo;pushing left,&rdquo; a rather unintuitive term for establishing security best practices earlier, rather than later, in the software development life cycle (SDLC). Establishing procedures &ldquo;to the left&rdquo; of the SDLC can help ensure that the people involved in creating a software product are properly taking care of sensitive data from day one.</p>
<p>Unfortunately, a good amount of security testing often seems to occur much farther to the right side of the SDLC; too late for some security issues, such as sensitive data leakage, to be prevented.</p>
<p>I&rsquo;m one of the authors contributing to the upcoming <a href="https://github.com/OWASP/OWASP-Testing-Guide-v5">OWASP Testing Guide</a> and recently expanded a section on search engine discovery reconnaissance, or what the kids these days call &ldquo;Google dorking.&rdquo; This is one method, and arguably the most accessible method, by which a security tester (or black hat hacker) could find exposed sensitive data on the Internet. Here&rsquo;s an excerpt from that section (currently a work in progress on GitHub, to be released in v5):</p>
<blockquote>
<h3 id="search-operators">Search Operators</h3>
<p>A search operator is a special keyword that extends the capabilities of regular search queries, and can help obtain more specific results. They generally take the form of <code>operator:query</code>. Here are some commonly supported search operators:</p>
<ul>
<li><code>site:</code> will limit the search to the provided URL.</li>
<li><code>inurl:</code> will only return results that include the keyword in the URL.</li>
<li><code>intitle:</code> will only return results that have the keyword in the page title.</li>
<li><code>intext:</code> or <code>inbody:</code> will only search for the keyword in the body of pages.</li>
<li><code>filetype:</code> will match only a specific filetype, i.e. png, or php.</li>
</ul>
<p>For example, to find the web content of owasp.org as indexed by a typical search engine, the syntax required is:</p>
<p><code>site:owasp.org</code></p>
<p>&hellip;</p>
<h3 id="google-hacking-or-dorking">Google Hacking, or Dorking</h3>
<p>Searching with operators can be a very effective discovery reconnaissance technique when combined with the creativity of the tester. Operators can be chained to effectively discover specific kinds of sensitive files and information. This technique, called <a href="https://en.wikipedia.org/wiki/Google_hacking">Google hacking</a> or Google dorking, is also possible using other search engines, as long as the search operators are supported.</p>
<p>A database of dorks, such as <a href="https://www.exploit-db.com/google-hacking-database">Google Hacking Database</a>, is a useful resource that can help uncover specific information.</p>
</blockquote>
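<p>Chaining operators, as the excerpt describes, amounts to joining <code>operator:query</code> pairs with spaces. Here is a small sketch of a helper a tester might write to assemble such queries; <code>build_dork</code> is a made-up name, and the values are only examples.</p>

```python
def build_dork(keywords="", **operators):
    """Join operator:value pairs and optional free-text keywords into one search query."""
    parts = [f"{name}:{value}" for name, value in operators.items()]
    if keywords:
        parts.append(keywords)
    return " ".join(parts)


# Example: PDF files on owasp.org with "testing" in the page title.
query = build_dork(site="owasp.org", filetype="pdf", intitle="testing")
```

<p>The resulting string can be pasted into any search engine that supports these operators.</p>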
<p>Regularly reviewing search engine results can be a fruitful task for security testers. However, when a search for <code>site:myapp.com passwords</code> turns up no results, it may still be a little too early to break for lunch. Here are a couple other places a security tester might like to look for sensitive data exposed in the wild.</p>
<h2 id="pastebin">Pastebin</h2>
<p>The self-declared &ldquo;#1 paste tool since 2002,&rdquo; <a href="https://pastebin.com">Pastebin</a> allows users to temporarily store any kind of text. It&rsquo;s mostly used for sharing information with others, or retrieving your own &ldquo;paste&rdquo; on another machine, perhaps in another location. Pastebin makes it easy to share large amounts of complicated text, like error logs, source code, configuration files, tokens, API keys&hellip; what&rsquo;s that? Oh, yes, it&rsquo;s public by default.</p>
<p>Here are some screenshots of a little dorking I did for a public bug bounty program.</p>
<figure class="screenshot">
<img src="pastebin_apikey.png"
alt="A screenshot of exposed api key in Google search"/> <figcaption>
<p>API keys in plain view.</p>
</figcaption>
</figure>
<figure class="screenshot">
<img src="pastebin_pass.png"
alt="A screenshot of exposed username and password in Google search"/> <figcaption>
<p>Log-in details out in the open.</p>
</figcaption>
</figure>
<p>Thanks in part to the convenience of using Pastebin and similar websites, it would appear that some people fail to think twice before making sensitive data publicly available.</p>
<h3 id="but-why">But why?</h3>
<p>Granted, non-technical employees with access to the application may not have an understanding of which items should or should not be freely shared. Someone unfamiliar with what encrypted data is or what it looks like may not realize the difference between an encrypted string and an unencrypted token made up of many random letters and numbers. Even technical staff can miss things, make mistakes, or act carelessly after a hard day at work. It may be easy to call this a training problem and move on; however, none of these rationalizations address the root cause of the issue.</p>
<p>When people turn to outside solutions for an issue they face, it&rsquo;s usually because they haven&rsquo;t been provided with an equally-appealing internal solution, or are unaware that one exists. Employees using pastes to share or move sensitive data do so because they don&rsquo;t have an easier, more convenient, and secure internal solution to use instead.</p>
<h3 id="mitigation">Mitigation</h3>
<p>Everyone involved in the creation and maintenance of a web application should be briefed on a few basic things with regard to sensitive data protection:</p>
<ol>
<li>what constitutes sensitive data,</li>
<li>the difference between plain text and encrypted data, and</li>
<li>how to properly transmit and store sensitive data.</li>
</ol>
<p>When it comes to third-party services, ensure people are aware that some transmission may not be encrypted, or may be publicly searchable. If there is no system currently in place for safely sharing and storing sensitive data internally, this is a good place to start. The security of application data is in the hands of everyone on the team, from administrative staff to C-level executives. Ensure people have the tools they need to work securely.</p>
<h2 id="public-repositories">Public repositories</h2>
<p>Developers are notorious for leaving sensitive information hanging out where it doesn&rsquo;t belong (yes, I&rsquo;ve done it too!). Without a strong push-left approach in place for handling tokens, secrets, and keys, these little gems can end up in full public view on sites like GitHub, GitLab, and Bitbucket (to name a few). <a href="https://www.ndss-symposium.org/wp-content/uploads/2019/02/ndss2019_04B-3_Meli_paper.pdf">A 2019 study</a> found that thousands of new, unique secrets are leaked every day on GitHub alone.</p>
<figure class="screenshot">
<img src="github_tok.png"
alt="A screenshot of a Google search for tokens on GitHub"/>
</figure>
<p>GitHub has implemented measures like <a href="https://github.blog/2018-10-17-behind-the-scenes-of-github-token-scanning/">token scanning</a>, and GitLab 11.9 <a href="https://about.gitlab.com/2019/03/22/gitlab-11-9-released/">introduced secret detection</a>. While these tools aim to reduce the chances that a secret might accidentally be committed, to put it bluntly, it&rsquo;s really not their job. Secret scanning won&rsquo;t stop developers from committing the data in the first place.</p>
<h3 id="but-why-1">But why?</h3>
<p>Without an obvious process in place for managing secrets, developers may tend too much towards their innate sense of just-get-it-done-ness. Sometimes this leads to the expedient but irresponsible practice of storing keys as unencrypted variables within the program, perhaps with the intention of it being temporary. Nonetheless, these variables inevitably fall from front of mind and end up in a commit.</p>
<h3 id="mitigation-1">Mitigation</h3>
<p>Having a strong push-left culture means ensuring that sensitive data is properly stored and can be securely retrieved long before anyone is ready to make a commit. Tools and strategies for doing so are readily available for those who seek them. Here are some examples of tools that can support a push-left approach:</p>
<ul>
<li>Use a management tool to store and control access to keys and secrets, such as <a href="https://aws.amazon.com/kms/">Amazon Key Management Service</a> or Microsoft&rsquo;s <a href="https://azure.microsoft.com/en-us/services/key-vault/">Azure Key Vault</a>.</li>
<li>Make use of encrypted environment variables in CI tools, such as <a href="https://www.netlify.com/docs/continuous-deployment/#environment-variables">Netlify&rsquo;s environment variables</a> or <a href="https://help.github.com/en/articles/virtual-environments-for-github-actions#creating-and-using-secrets-encrypted-variables">virtual environments in GitHub Actions</a>.</li>
<li>Craft a robust <code>.gitignore</code> file that everyone on the team can contribute to and use.</li>
</ul>
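<p>On the application side, the counterpart to these tools is code that reads secrets at run time instead of carrying them in the source. Here is a minimal sketch, assuming the secret has been provided as an environment variable; the helper and the variable name <code>MYAPP_API_KEY</code> are hypothetical.</p>

```python
import os


def get_secret(name):
    """Read a secret from the environment instead of hard-coding it."""
    value = os.environ.get(name)
    if not value:
        # Fail loudly rather than fall back to a default baked into the code.
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Hypothetical usage; in CI the variable would be set in the tool's
# encrypted settings rather than in the repository:
# api_key = get_secret("MYAPP_API_KEY")
```

<p>Because the value never appears in the code, there is nothing sensitive to commit by accident.</p>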
<p>We also need not rely entirely on the public repository to catch those mistakes that may still slip through. It&rsquo;s possible to set up Git pre-commit hooks that scan staged changes for secrets using <a href="https://en.wikipedia.org/wiki/Regular_expression">regular expressions</a>, before the commit is ever created. There are some open-source programs available for this, such as <a href="https://github.com/thoughtworks/talisman">Talisman from ThoughtWorks</a> and <a href="https://github.com/awslabs/git-secrets">git-secrets from AWS Labs</a>.</p>
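<p>To give a sense of what such a scan involves, here is a simplified sketch of regular expression matching over a block of text. The patterns are illustrative assumptions and far from complete, and this is not how either of the tools above is implemented; the sample key is the placeholder value used in AWS documentation.</p>

```python
import re

# Illustrative patterns only; real tools ship far more thorough rule sets.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "hard-coded password": re.compile(r"(?i)password\s*=\s*['\"][^'\"]+['\"]"),
}


def find_secrets(text):
    """Return (line_number, pattern_name) pairs for lines that look like secrets."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings


# Sample input standing in for the staged changes of a commit.
sample = 'region = "us-east-1"\naws_key = "AKIAIOSFODNN7EXAMPLE"\n'
findings = find_secrets(sample)
```

<p>A pre-commit hook could run a check like this over the output of <code>git diff --cached</code> and exit with a non-zero status when anything is found, which stops the commit.</p>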
<h2 id="pushing-left-to-prevent-sensitive-data-exposure">Pushing left to prevent sensitive data exposure</h2>
<p>A little perspective can go a long way in demonstrating why it&rsquo;s important to begin managing sensitive data even before any sensitive data exists. By establishing security best practices on the left of the SDLC, we give our people the best chance of ensuring that any future dorking on our software product looks more like this.</p>
<p><img src="no_results.png#screenshot" alt="No results found in Google Search"></p>
<p>Another great resource for checking up on the security of our data is Troy Hunt&rsquo;s <a href="https://haveibeenpwned.com/">Have I Been Pwned</a>, a service that compares your data (such as your email) to data that has been leaked in previous data breaches.</p>
<p>To learn about more ways we can be proactive with our application security, the <a href="https://www.owasp.org/index.php/OWASP_Proactive_Controls">OWASP Proactive Controls</a> publication is a great resource. There&rsquo;s also more about creating a push-left approach to security in the upcoming <a href="https://github.com/OWASP/OWASP-Testing-Guide-v5">OWASP Testing Guide</a>. If these topics interest you, I encourage you to read, learn, and contribute so more people will make it harder for sensitive data to be found.</p>
SQL injection and XSS: what white hat hackers know about trusting user inputhttps://victoria.dev/blog/sql-injection-and-xss-what-white-hat-hackers-know-about-trusting-user-input/
Mon, 02 Sep 2019 09:01:23 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/sql-injection-and-xss-what-white-hat-hackers-know-about-trusting-user-input/A primer on SQL injection and cross site scripting, and how to handle user input in software development.
]]>
<p>Software developers have a lot on their minds. There are a myriad of questions to ask when it comes to creating a website or application: <em>What technologies will we use? How will the architecture be set up? What functions do we need? What will the UI look like?</em> Especially in a software market where shipping new apps seems more like a race for reputation than a well-considered process, one of the most important questions often falls to the bottom of the &ldquo;Urgent&rdquo; column: how will our product be secured?</p>
<p>If you&rsquo;re using a robust, open-source framework for building your product (and if one is applicable and available, why wouldn&rsquo;t you?) then some basic security concerns, like CSRF tokens and password encryption, may already be handled for you. Still, fast-moving developers would be well served to brush up on their knowledge of common threats and pitfalls, if only to avoid some embarrassing rookie mistakes. Usually, the weakest point in the security of your software is <em>you.</em></p>
<p>I&rsquo;ve recently become more interested in information security in general, and practicing ethical hacking in particular. An ethical hacker, sometimes called a &ldquo;white hat&rdquo; hacker, and sometimes just a &ldquo;hacker,&rdquo; is someone who searches for possible security vulnerabilities and responsibly (privately) reports them to project owners. By contrast, a malicious or &ldquo;black hat&rdquo; hacker, also called a &ldquo;cracker,&rdquo; is someone who exploits these vulnerabilities for amusement or personal gain. Both white hat and black hat hackers might use the same tools and resources, and generally try to get into places they aren&rsquo;t supposed to be; however, white hats do this with permission, and with the intention of fortifying defences instead of destroying them. Black hats are the bad guys.</p>
<p>When it comes to learning how to find security vulnerabilities, it should come as no surprise that I&rsquo;ve been devouring whatever information I can get my hands on; this post is a distillation of some key areas that are specifically helpful to developers when handling user input. These lessons have been collectively gleaned from these excellent resources:</p>
<ul>
<li>The <a href="https://www.owasp.org/index.php/Main_Page">Open Web Application Security Project</a> guides</li>
<li>The Hacker101 playlist from <a href="https://www.youtube.com/channel/UCsgzmECky2Q9lQMWzDwMhYw/">HackerOne&rsquo;s YouTube channel</a></li>
<li><a href="https://leanpub.com/web-hacking-101">Web Hacking 101</a> by Peter Yaworski</li>
<li><a href="https://brutelogic.com.br/blog/">Brute Logic&rsquo;s blog</a></li>
<li>The <a href="https://www.youtube.com/channel/UC9-y-6csu5WGm29I7JiwpnA">Computerphile</a> YouTube channel</li>
<li>Videos featuring Jason Haddix (<a href="https://github.com/jhaddix/">@jhaddix</a>) and Tom Hudson (<a href="https://github.com/tomnomnom/">@tomnomnom</a>) (two accomplished ethical hackers with different, but both effective, methodologies)</li>
</ul>
<p>You may be familiar with the catchphrase, &ldquo;sanitize your inputs!&rdquo; However, as I hope this post demonstrates, developing an application with robust security isn&rsquo;t quite so straightforward. I suggest an alternate phrase: pay attention to your inputs. Let&rsquo;s elaborate by examining the most common attacks that take advantage of vulnerabilities in this area: SQL injection and cross site scripting.</p>
<h2 id="sql-injection-attacks">SQL injection attacks</h2>
<p>If you&rsquo;re not yet familiar with SQL (Structured Query Language) injection attacks, or SQLi, here is a great <a href="https://www.youtube.com/watch?v=_jKylhJtPmI">explain-like-I&rsquo;m-five video on SQLi</a>. You may already know of this attack from <a href="https://xkcd.com/327/">xkcd&rsquo;s Little Bobby Tables</a>. Essentially, malicious actors may be able to send SQL commands that affect your application through some input on your site, like a search box that pulls results from your database. Sites coded in PHP can be especially susceptible to these, and a successful SQL attack can be devastating for software that relies on a database (as in, your Users table is now a pot of petunias).</p>
<figure class="center">
<img src="sqli.png"
alt="A monitor with an SQL Select command that gets all your base"/> <figcaption>
<p>You have no chance to survive make your time.</p>
</figcaption>
</figure>
<p>You can test your own site to see if you&rsquo;re susceptible to this kind of attack. (Please only test sites that you own, since running SQL injections where you don&rsquo;t have permission to be doing so is, possibly, illegal in your locality; and definitely, universally, not very funny.) The following payloads can be used to test inputs:</p>
<ul>
<li><code>' OR 1='1</code> evaluates to a constant true, and when successful, returns all rows in the table.</li>
<li><code>' AND 0='1</code> evaluates to a constant false, and when successful, returns no rows.</li>
</ul>
<p><a href="https://www.youtube.com/watch?v=ciNHn38EyRc">This video demonstrates the above tests</a>, and does a great job of showing how impactful an SQL injection attack can be.</p>
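<p>To make the two tests above concrete, here is a minimal sketch in Python using the standard library&rsquo;s <code>sqlite3</code> module and an in-memory database. The <code>users</code> table, its rows, and the <code>vulnerable_search</code> function are hypothetical, invented for illustration. The payloads are written in a fully quoted form (<code>' OR '1'='1</code>) because SQLite does not consider the number 1 equal to the string '1'; other databases may behave differently.</p>

```python
import sqlite3


def build_demo_db():
    """Create a throwaway in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users (name, email) VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def vulnerable_search(conn, term):
    """DELIBERATELY UNSAFE: user input is concatenated straight into the SQL."""
    query = "SELECT name FROM users WHERE name = '" + term + "'"
    return conn.execute(query).fetchall()


conn = build_demo_db()

# An ordinary search returns only the matching row.
normal = vulnerable_search(conn, "alice")

# Constant-true payload: the WHERE clause matches every row in the table.
always_true = vulnerable_search(conn, "' OR '1'='1")

# Constant-false payload: even a valid name now returns nothing.
always_false = vulnerable_search(conn, "alice' AND '0'='1")

print(normal, always_true, always_false)
```

<p>If an input on your own site behaves like <code>vulnerable_search</code> does here, returning everything for the first payload and nothing for the second, that is a strong hint that user input is reaching the database as raw SQL.</p>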
<p>Thankfully, there are ways to mitigate SQL injection attacks, and they all boil down to one basic concept: don&rsquo;t trust user input.</p>
<h2 id="sql-injection-mitigation">SQL injection mitigation</h2>
<p>In order to effectively mitigate SQL injections, developers must prevent users from being able to successfully submit raw SQL commands to any part of the site.</p>
<p>Some frameworks will do most of the heavy lifting for you. For example, Django implements the concept of <a href="https://en.wikipedia.org/wiki/Object-relational_mapping">Object-Relational Mapping</a>, or ORM, with its use of <a href="https://docs.djangoproject.com/en/2.2/topics/db/queries/">QuerySets</a>. We can think of these as wrapper functions that help your application query the database using pre-defined methods that avoid the use of raw SQL.</p>
<p>Being able to use a framework, however, is never a guarantee. When dealing directly with a database, there are other methods we can use to safely abstract our SQL queries from user input, though they vary in efficacy. These are, in order of most to least preferred, and with links to relevant examples:</p>
<ol>
<li>Prepared statements with variable binding (or <a href="https://cheatsheetseries.owasp.org/cheatsheets/Query_Parameterization_Cheat_Sheet.html">parameterized queries</a>),</li>
<li><a href="https://cheatsheetseries.owasp.org/cheatsheets/SQL_Injection_Prevention_Cheat_Sheet.html#defense-option-2-stored-procedures">Stored procedures</a>; and</li>
<li><a href="https://cheatsheetseries.owasp.org/cheatsheets/SQL_Injection_Prevention_Cheat_Sheet.html#defense-option-3-whitelist-input-validation">Whitelisting</a> or <a href="https://cheatsheetseries.owasp.org/cheatsheets/SQL_Injection_Prevention_Cheat_Sheet.html#defense-option-4-escaping-all-user-supplied-input">escaping</a> user input.</li>
</ol>
<p>If you want to implement the above techniques, the linked cheatsheets are a great starting point for digging deeper. Suffice to say, the use of these techniques to obtain data instead of using raw SQL queries helps to minimize the chances that SQL will be processed by any part of your application that takes input from users, thus mitigating SQL injection attacks.</p>
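<p>As a rough illustration of the first and most preferred option, here is a minimal sketch of a parameterized query using Python&rsquo;s standard <code>sqlite3</code> module. The table, the sample rows, and the <code>find_user</code> function are assumptions made up for this example; the placeholder syntax (<code>?</code> here) varies between database drivers.</p>

```python
import sqlite3

# Throwaway in-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [("alice", "alice@example.com"), ("bob", "bob@example.com")],
)


def find_user(conn, name):
    """Look up a user with a bound parameter instead of string concatenation.

    The driver sends the value separately from the SQL text, so the input
    is only ever treated as data, never as part of the command.
    """
    query = "SELECT name, email FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


legit = find_user(conn, "alice")

# The injection payload is compared literally against the name column,
# so it simply matches no rows.
attack = find_user(conn, "' OR '1'='1")

print(legit, attack)
```

<p>The point of the design is that the query text is fixed at development time; user input can change which rows match, but not what the statement does.</p>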
<p>The battle, however, is only half won&hellip;</p>
<h2 id="cross-site-scripting-xss-attacks">Cross Site Scripting (XSS) attacks</h2>
<p>If you&rsquo;re a malicious coder, JavaScript is pretty much your best friend. The right commands will do anything a legitimate user could do (and even some things they aren&rsquo;t supposed to be able to) on a web page, sometimes without any interaction on the part of an actual user. <a href="https://en.wikipedia.org/wiki/Cross-site_scripting">Cross Site Scripting</a> attacks, or XSS, occur when JavaScript code is injected into a web page and changes that page&rsquo;s behavior. Its effects can range from prank nuisance occurrences to more severe authentication bypasses or credential stealing. <a href="https://blogs.apache.org/infra/entry/apache_org_04_09_2010">This incident report from Apache in 2010</a> is a good example of how XSS can be chained in a larger attack to take over accounts and machines.</p>
<figure>
<img src="xss.png"
alt="An HTML dance party with a little JS cutting in"/> <figcaption>
<p>The annual DOM dance-off receives an unexpected guest);</p>
</figcaption>
</figure>
<p>XSS can occur on the server or on the client side, and generally comes in three flavors: DOM (<a href="https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model/Introduction">Document Object Model</a>) based, stored, and reflected XSS. The differences amount to where the attack payload is injected into the application.</p>
<h3 id="dom-based-xss">DOM based XSS</h3>
<p><a href="https://www.owasp.org/index.php/DOM_Based_XSS">DOM based XSS</a> occurs when a JavaScript payload affects the structure, behavior, or content of the web page the user has loaded in their browser. These are most commonly executed through modified URLs, such as in <a href="https://www.owasp.org/index.php/Phishing">phishing emails</a>.</p>
<p>To see how easy it would be for injected JavaScript to manipulate a page, we can create a working example with an HTML web page. Try creating a file on your local system called <code>xss-test.html</code> (or whatever you like) with the following HTML and JavaScript code:</p>
<div class="highlight"><pre class="chroma"><code class="language-html" data-lang="html"><span class="p">&lt;</span><span class="nt">html</span><span class="p">&gt;</span>
<span class="p">&lt;</span><span class="nt">head</span><span class="p">&gt;</span>
<span class="p">&lt;</span><span class="nt">title</span><span class="p">&gt;</span>My XSS Example<span class="p">&lt;/</span><span class="nt">title</span><span class="p">&gt;</span>
<span class="p">&lt;/</span><span class="nt">head</span><span class="p">&gt;</span>
<span class="p">&lt;</span><span class="nt">body</span><span class="p">&gt;</span>
<span class="p">&lt;</span><span class="nt">h1</span> <span class="na">id</span><span class="o">=</span><span class="s">&#34;greeting&#34;</span><span class="p">&gt;</span>Hello there!<span class="p">&lt;/</span><span class="nt">h1</span><span class="p">&gt;</span>
<span class="p">&lt;</span><span class="nt">script</span><span class="p">&gt;</span>
<span class="kd">var</span> <span class="nx">name</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">URLSearchParams</span><span class="p">(</span><span class="nb">document</span><span class="p">.</span><span class="nx">location</span><span class="p">.</span><span class="nx">search</span><span class="p">).</span><span class="nx">get</span><span class="p">(</span><span class="s1">&#39;name&#39;</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">name</span> <span class="o">!==</span> <span class="s1">&#39;null&#39;</span><span class="p">)</span> <span class="p">{</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">getElementById</span><span class="p">(</span><span class="s1">&#39;greeting&#39;</span><span class="p">).</span><span class="nx">innerHTML</span> <span class="o">=</span> <span class="s1">&#39;Hello &#39;</span> <span class="o">+</span> <span class="nx">name</span> <span class="o">+</span> <span class="s1">&#39;!&#39;</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">&lt;/</span><span class="nt">script</span><span class="p">&gt;</span>
<span class="p">&lt;/</span><span class="nt">body</span><span class="p">&gt;</span>
<span class="p">&lt;/</span><span class="nt">html</span><span class="p">&gt;</span>
</code></pre></div><p>This web page will display the heading &ldquo;Hello there!&rdquo; unless it receives a <a href="https://en.wikipedia.org/wiki/Query_string">URL parameter from a query string</a> with a value for <code>name</code>. To see the script work, open the page in a browser with an appended URL parameter, like so:</p>
<p><code>file:///path/to/file/xss-test.html?name=Victoria</code></p>
<p>Fun, right? Our insecure (in the safety sense, not the emotional one) page takes the URL parameter value for <code>name</code> and displays it in the DOM. The page is expecting the value to be a nice friendly string, but what if we change it to something else? Since the page is owned by us and only exists on our local system, we can test it all we like. What happens if we change the <code>name</code> parameter to, say, <code>&lt;img+src+onerror=alert(&quot;pwned&quot;)&gt;</code>?</p>
<p><img src="pwned.png#screenshot" alt="A screenshot of the XSS page example"></p>
<p>This is just one example, largely based on one from <a href="https://brutelogic.com.br/blog/dom-based-xss-the-3-sinks/">Brute&rsquo;s post</a>, that demonstrates how an XSS attack could be executed. Funny pop-up alerts may be amusing, but JavaScript can do a lot of harm, including helping malicious attackers steal passwords and personal information.</p>
<h3 id="stored-and-reflected-xss">Stored and reflected XSS</h3>
<p><a href="https://en.wikipedia.org/wiki/Cross-site_scripting#Persistent_(or_stored)">Stored XSS</a> occurs when the attack payload is stored on the server, such as in a database. The attack affects a victim whenever that stored data is retrieved and rendered in the browser. For example, instead of using a URL query string, an attacker might update their profile page on a social site to include a hidden script in, say, their &ldquo;About Me&rdquo; section. The script, improperly stored on the site&rsquo;s server, would successfully execute at a later time when another user views the attacker&rsquo;s profile.</p>
<p>One of the most famous examples of this is the <a href="https://en.wikipedia.org/wiki/Samy_(computer_worm)">Samy worm</a> that all but took over MySpace in 2005. It propagated by sending HTTP requests that replicated it onto a victim&rsquo;s profile page whenever an infected profile was viewed. Within just 20 hours, it had spread to over a million users.</p>
<p><a href="https://en.wikipedia.org/wiki/Cross-site_scripting#Non-persistent_(reflected)">Reflected XSS</a> similarly occurs when the injected payload travels to the server; however, the malicious code does not end up stored in a database. It is instead immediately returned to the browser by the web application. An attack like this might be executed by luring the victim to click a malicious link that sends a request to the vulnerable website&rsquo;s server. The server would then reflect the payload back in its response to the victim, where it executes in the victim&rsquo;s browser. This may result in the attacker being able to obtain passwords, or perpetrate actions that appear to originate from the victim.</p>
<h2 id="xss-attack-mitigation">XSS attack mitigation</h2>
<p>In all of these cases, XSS attacks can be mitigated with two key strategies: validating form fields, and avoiding the direct injection of user input on the web page.</p>
<h3 id="validating-form-fields">Validating form fields</h3>
<p>Frameworks can again help us out when it comes to making sure that user-submitted forms are on the up-and-up. One example is <a href="https://docs.djangoproject.com/en/2.2/ref/forms/fields/#built-in-field-classes">Django&rsquo;s built-in <code>Field</code> classes</a>, which provide fields that validate to some commonly used types and also specify sane defaults. Django&rsquo;s <code>EmailField</code>, for instance, uses a set of rules to determine if the input provided is a valid email. If the submitted string has characters in it that are not typically present in email addresses, or if it doesn&rsquo;t imitate the common format of an email address, then Django won&rsquo;t consider the field valid and the form will fail validation.</p>
<p>If relying on a framework isn&rsquo;t an option, we can implement our own input validation. This can be accomplished with a few different techniques, including <a href="https://en.wikipedia.org/wiki/Type_conversion">type conversion</a>, for example, ensuring that a number is of type <code>int()</code>; checking minimum and maximum range values for numbers and lengths for strings; using a pre-defined array of choices that avoids arbitrary input, for example, months of the year; and checking data against strict <a href="https://en.wikipedia.org/wiki/Regular_expression">regular expressions</a>.</p>
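<p>The techniques just listed can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the age range, and the username pattern are hypothetical choices, not rules taken from any particular framework.</p>

```python
import re

# Pre-defined choices: anything outside this tuple is rejected.
MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)

# Strict pattern (an assumption for this sketch): 3 to 20 letters,
# digits, or underscores, and nothing else.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,20}")


def parse_age(raw, minimum=0, maximum=150):
    """Type conversion plus a range check; int() raises ValueError on junk."""
    age = int(raw)
    if not minimum <= age <= maximum:
        raise ValueError("age out of range")
    return age


def validate_month(raw):
    """Accept only values from the pre-defined list of months."""
    if raw not in MONTHS:
        raise ValueError("not a month")
    return raw


def validate_username(raw):
    """Check the whole string against a strict regular expression."""
    if not USERNAME_PATTERN.fullmatch(raw):
        raise ValueError("invalid username")
    return raw


print(parse_age("42"), validate_month("May"), validate_username("victoria_d"))
```

<p>Each function either returns a cleaned value or raises <code>ValueError</code>, so calling code can reject a bad submission in one place.</p>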
<p>Thankfully, we needn&rsquo;t start from scratch. Open source resources are available to help, such as the <a href="https://www.owasp.org/index.php/OWASP_Validation_Regex_Repository">OWASP Validation Regex Repository</a>, which provides patterns to match against for some common forms of data. Many programming languages offer validation libraries specific to their syntax, and we can find <a href="https://github.com/search?q=validation+library">plenty of these on GitHub</a>. Additionally, the <a href="https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet">XSS Filter Evasion Cheat Sheet</a> has a couple of suggestions for test payloads we can use to test our existing applications.</p>
<p>While it may seem tedious, properly implemented input validation can help protect our application from being susceptible to XSS.</p>
<h3 id="avoiding-direct-injection">Avoiding direct injection</h3>
<p>Elements of an application that directly return user input to the browser may not, on a casual inspection, be obvious. We can determine areas of our application that may be at risk by exploring a few questions:</p>
<ul>
<li>How does data flow through our application?</li>
<li>What does a user expect to happen when they interact with this input?</li>
<li>Where on our page does data appear? Does it become embedded in a string or an attribute?</li>
</ul>
<p>Here are some sample payloads that we can play with in order to test inputs on our site (again, only our own site!) courtesy of <a href="https://www.hacker101.com/">Hacker101</a>. The successful execution of any of these samples can indicate a possible XSS vulnerability due to direct injection.</p>
<ul>
<li><code>&quot;&gt;&lt;h1&gt;test&lt;/h1&gt;</code></li>
<li><code>'+alert(1)+'</code></li>
<li><code>&quot;onmouseover=&quot;alert(1)</code></li>
<li><code>http://&quot;onmouseover=&quot;alert(1)</code></li>
</ul>
<p>As a general rule, if you are able to design around directly injecting input, do so. Alternatively, be sure to completely understand the effect of the methods you choose; for example, using <code>innerText</code> instead of <code>innerHTML</code> in JavaScript will ensure that content will be set as plain text instead of (potentially vulnerable) HTML.</p>
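<p>When user input has to be placed into HTML that is built on the server, the same idea applies: treat it as text, not markup. Here is a minimal sketch using <code>html.escape</code> from Python&rsquo;s standard library. The <code>render_greeting</code> function is a hypothetical stand-in for whatever builds your page; most template engines offer their own escaping, which is usually preferable to doing it by hand.</p>

```python
import html


def render_greeting(name):
    """Build a greeting heading with the user-supplied name escaped.

    html.escape() replaces &, <, > and quote characters with HTML
    entities, so the browser shows them as text instead of parsing
    them as tags or attributes.
    """
    safe_name = html.escape(name, quote=True)
    return '<h1 id="greeting">Hello ' + safe_name + "!</h1>"


payload = '<img src onerror=alert("pwned")>'
rendered = render_greeting(payload)
print(rendered)
```

<p>The payload from the earlier example ends up displayed on the page as literal characters rather than creating an <code>img</code> element.</p>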
<h2 id="pay-attention-to-your-inputs">Pay attention to your inputs</h2>
<p>Software developers are at a marked disadvantage when it comes to competing with black hat, or malicious, hackers. For all the work we do to secure each and every input that could potentially compromise our application, an attacker need only find the one we missed. It&rsquo;s like installing deadbolts on all the doors, but leaving a window open!</p>
<p>By learning to think along the same lines as an attacker, however, we can better prepare our software to stand up against bad actors. Exciting as it may be to ship features as quickly as possible, we&rsquo;ll avoid racking up a lot of security debt if we take the time beforehand to think through our application&rsquo;s flow, follow the data, and pay attention to our inputs.</p>
How to set up OpenVPN on AWS EC2 and fix DNS leaks on Ubuntu 18.04 LTShttps://victoria.dev/blog/how-to-set-up-openvpn-on-aws-ec2-and-fix-dns-leaks-on-ubuntu-18.04-lts/
Mon, 26 Aug 2019 09:01:23 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/how-to-set-up-openvpn-on-aws-ec2-and-fix-dns-leaks-on-ubuntu-18.04-lts/A guide for setting up your own private VPN service, and understanding and fixing a DNS leak.
]]>
<p>While rolling your own Virtual Private Network (VPN) is far more complicated than choosing a VPN provider from someone&rsquo;s &ldquo;best VPN 2019&rdquo; list, the more I learn about why someone should use a VPN at all, the less appealing the latter option becomes. Besides the dangers of trusting a fake VPN app or falling victim to a lookalike URL, even <em>legit</em> VPN service providers have pressures and motivations that may not be aligned with the privacy you hope to be purchasing.</p>
<p>Usually, the point of using a VPN is to gain a layer of privacy by disguising your location. If you aren&rsquo;t currently using one, you can see what the Internet knows about where you are at <a href="https://dnsleaktest.com/">the DNS leak test website</a>. You&rsquo;ll see a big hello, your IP address, and your location. If that&rsquo;s a little unsettling, know that a VPN can help to shield your location and online activities from wandering eyes and opportunistic advertisers. The former might be a too-curious or even malicious public-WiFi-cafe-goer, but the latter, counterintuitively, might be your own household Internet Service Provider (ISP).</p>
<p>Using a VPN means that the Internet can&rsquo;t easily see your location, and your ISP can&rsquo;t see your unencrypted web traffic (and neither can your curious coffee shop neighbor). Your ISP <em>can</em> see the amount of data you&rsquo;re sending, in its encrypted form, and that you&rsquo;re sending it to your VPN server - but that&rsquo;s all.</p>
<p>Unless you have a <a href="https://dnsleaktest.com/what-is-a-dns-leak.html">DNS leak</a>.</p>
<p>If you are still using your ISP&rsquo;s DNS server, they are still able to see all the domain names the server is resolving for you. So they&rsquo;ll know you asked for <code>lastminutebackwax.com</code>, although they won&rsquo;t be able to decrypt the data that was exchanged with the site. (Is it just me, or does that seem even worse, somehow?)</p>
<p>Setting up your own instance and VPN service offers some peace of mind over trusting yet another company to do right with your data. Note that a VPN will <em>not</em> give you complete online anonymity; there are many other ways your Internet presence can be tracked and your location discovered. However, if properly set up, without DNS leaks, you&rsquo;ll have about as much Internet privacy as can be afforded without using <a href="https://en.wikipedia.org/wiki/Tor_(anonymity_network)">Tor</a>.</p>
<h1 id="setting-up-our-vpn">Setting up our VPN</h1>
<p>This post will cover how to set up the <a href="https://aws.amazon.com/marketplace/pp/B00MI40CAE/">OpenVPN Access Server</a> product on AWS Marketplace, running on an <a href="https://aws.amazon.com/ec2/">Amazon EC2 instance</a>. Then, we&rsquo;ll look at how to fix a <a href="https://gitlab.gnome.org/GNOME/NetworkManager-openvpn/issues/10">known NetworkManager bug in Ubuntu 18.04 that might cause DNS leaks</a>. The whole process should take about fifteen minutes, so grab a ☕ and let&rsquo;s do some adulting.</p>
<p><em>Note: IDs and IP addresses shown for demonstration in this tutorial are invalid.</em></p>
<h2 id="1-launch-the-openvpn-access-server-on-aws-marketplace">1. Launch the OpenVPN Access Server on AWS Marketplace</h2>
<p>The <a href="https://aws.amazon.com/marketplace/pp/B00MI40CAE">OpenVPN Access Server</a> is available on AWS Marketplace. The Bring Your Own License (BYOL) model doesn&rsquo;t actually require a license for up to two connected devices; to connect more clients, you can get <a href="https://aws.amazon.com/marketplace/seller-profile/ref=srh_res_product_vendor?ie=UTF8&amp;id=aac3a8a3-2823-483c-b5aa-60022894b89d">bundled billing</a> for five, ten, or twenty-five clients, or <a href="https://openvpn.net/pricing/">purchase a minimum of ten OpenVPN licenses a la carte</a> for $15/device/year. For most of us, the two free connected devices will suffice; and if using an EC2 Micro instance, our setup will be <a href="https://aws.amazon.com/free/">AWS Free Tier eligible</a> as well.</p>
<p>Start by clicking &ldquo;Continue to Subscribe&rdquo; for the <a href="https://aws.amazon.com/marketplace/pp/B00MI40CAE">OpenVPN Access Server</a>, which will bring you to a page that looks like this:</p>
<p><img src="1-subscribe.jpg#screenshot" alt="Subscription details page for OpenVPN Access Server"></p>
<p>Click &ldquo;Continue to Configuration.&rdquo;</p>
<p><img src="2-configure.jpg#screenshot" alt="Configure this software page for OpenVPN Access Server"></p>
<p>You may notice that the EC2 instance type in the right side bar (and consequently, the Monthly Estimate) isn&rsquo;t the one we want - that&rsquo;s okay, we can change it soon. Just ensure that the &ldquo;Region&rdquo; chosen is where we want the instance to be located. Generally, the closer it is to the physical location of your client (your laptop, in this case), the faster your VPN will be. Click &ldquo;Continue to Launch.&rdquo;</p>
<p><img src="3-launch.jpg#screenshot" alt="Launch this software page"></p>
<p>On this page, we&rsquo;ll change three things:</p>
<h3 id="1-the-ec2-instance-type">1. The EC2 Instance type</h3>
<p>Different types of EC2 (Elastic Compute Cloud) instances will offer us different levels of computing power. If you plan to use your instance for something more than just this VPN, you may want to choose something with higher memory or storage capacity, depending on how you plan to use it. We can view each instance offering on the <a href="https://aws.amazon.com/ec2/instance-types/">Amazon EC2 Instance Types page</a>.</p>
<p>For simple VPN use, the <code>t2.nano</code> or <code>t2.micro</code> instances are likely sufficient. Only the Micro instance is Free Tier eligible.</p>
<h3 id="2-the-security-group-settings">2. The Security Group settings</h3>
<p>A <a href="https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html">Security Group</a> is a profile, or collection of settings, that Amazon uses to control access to our instance. If you&rsquo;ve set up other AWS products before, you may already have some groups with their own rules defined. We should be careful to understand the reasons for our Security Group settings, as these define how public or private our instance is, and consequently, who has access to it.</p>
<p>If we click &ldquo;Create New Based on Seller Settings,&rdquo; the OpenVPN server defines some recommended settings for a default Security Group.</p>
<p><img src="4-security-group.jpg#screenshot" alt="Security group settings"></p>
<p>The default recommended settings are all <code>0.0.0.0/0</code> for TCP ports 22, 943, 443, and 945, and UDP port 1194. OpenVPN offers an <a href="https://openvpn.net/vpn-server-resources/amazon-web-services-ec2-byol-appliance-quick-start-guide/#Instance_Launch_Options">explanation of how the ports are used</a> on their website. With the default settings, all these ports are left open to support various features of the OpenVPN server. We may wish to restrict access to these ports to a specific IP address or block of addresses (like that of your own ISP) to increase the security of our instance. However, if your IP address frequently changes (like when you travel and connect to a different WiFi network), restricting the ports may not be as helpful as we hope.</p>
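<p>To get a feel for what a source range like <code>0.0.0.0/0</code> means compared with a restricted block, here is a small sketch using Python&rsquo;s standard <code>ipaddress</code> module. It only models the matching logic and does not talk to AWS; the <code>is_allowed</code> helper and the example addresses (taken from ranges reserved for documentation) are assumptions for illustration.</p>

```python
import ipaddress


def is_allowed(source_ip, allowed_cidr):
    """Return True if source_ip falls inside the allowed CIDR block."""
    return ipaddress.ip_address(source_ip) in ipaddress.ip_network(allowed_cidr)


OPEN_TO_ALL = "0.0.0.0/0"      # the seller's default: every IPv4 address
HOME_BLOCK = "203.0.113.0/24"  # hypothetical block, e.g. your ISP's range

# A stranger's address passes the default rule but not the restricted one.
stranger_default = is_allowed("198.51.100.7", OPEN_TO_ALL)
stranger_restricted = is_allowed("198.51.100.7", HOME_BLOCK)

# An address inside the restricted block is still let through.
home_restricted = is_allowed("203.0.113.25", HOME_BLOCK)

print(stranger_default, stranger_restricted, home_restricted)
```

<p>This also shows the trade-off mentioned above: if your own address moves outside the restricted block, you are treated just like the stranger.</p>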
<p>In any case, our instance will require SSH keys to connect to, and the OpenVPN server will be password protected. Unless you have other specific security goals, it&rsquo;s fine to accept the default settings for now.</p>
<p>Let&rsquo;s give the Security Group a name and brief description, so we know what it&rsquo;s for. Then click &ldquo;Save.&rdquo;</p>
<h3 id="3-the-key-pair-settings">3. The Key Pair settings</h3>
<p>The aforementioned SSH keys are access credentials that we&rsquo;ll use to connect to our instance. We can <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html#having-ec2-create-your-key-pair">create a key pair</a> in this section, or you can choose a key pair you may already be using with AWS.</p>
<p><img src="5-keys.jpg#screenshot" alt="Key Pair Settings link"></p>
<p>To create a new set of access credentials, click &ldquo;Create a key pair in EC2&rdquo; to open a new window. Then, click the &ldquo;Create Key Pair&rdquo; blue button. Once you give your key pair a name, it will be created and the private key will automatically download to your machine. It&rsquo;s a file ending with the extension <code>.pem</code>. Store this key in a secure place on your computer. We&rsquo;ll need to refer to it when we connect to our new EC2 instance.</p>
<p>We can return to the previous window to select the key pair we just created. If it doesn&rsquo;t show up, hit the little &ldquo;refresh&rdquo; icon next to the drop-down. Once it&rsquo;s selected, hit the shiny yellow &ldquo;Launch&rdquo; button.</p>
<p>We should see a message like this:</p>
<p><img src="6-launched.jpg#screenshot" alt="Launch success message"></p>
<p>Great stuff! Now that our instance exists, let&rsquo;s make sure we can access it and start up our VPN. For a shortcut to the next step, click on the &ldquo;EC2 Console&rdquo; link in the success message.</p>
<h2 id="2-associate-an-elastic-ip">2. Associate an Elastic IP</h2>
<p>Amazon&rsquo;s <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html">Elastic IP Addresses</a> feature provides us with a public IPv4 address controlled by our account, unlike the public IP address tied to our EC2 instance. It&rsquo;s considered a best practice to create one and associate it with our VPN instance. If anything should go wrong with our instance, or if we want to use a new instance for our VPN in the future, the Elastic IP can be disassociated from the current instance and reassociated with our new one. This makes the transition seamless for our connected clients. Think of the Elastic IP like a web domain address that we register - we can point it at whatever we choose.</p>
<p>We can create a new Elastic IP address on the <a href="https://console.aws.amazon.com/ec2/">Amazon EC2 Console</a>. If you clicked the link from the success message above, we&rsquo;re already there.</p>
<p><img src="7-ec2.jpg#screenshot" alt="EC2 console"></p>
<p>If you have more than one instance, take note of the Instance ID of the one we&rsquo;ve just launched.</p>
<p>In the left sidebar under &ldquo;Network &amp; Security,&rdquo; choose &ldquo;Elastic IPs.&rdquo; Then click the blue &ldquo;Allocate new address&rdquo; button.</p>
<p><img src="8-elasticip.jpg#screenshot" alt="Allocate new address page"></p>
<p>Choose &ldquo;Amazon Pool,&rdquo; then click &ldquo;Allocate.&rdquo;</p>
<p><img src="9-elasticip.jpg#screenshot" alt="Allocate elastic IP success message"></p>
<p>Success! We can click &ldquo;Close&rdquo; to return to the Elastic IP console.</p>
<p><img src="10-associateip.jpg#screenshot" alt="Associate elastic IP"></p>
<p>Now that we have an Elastic IP, let&rsquo;s associate it with our instance. Select the IP address, then click &ldquo;Actions,&rdquo; and choose &ldquo;Associate address.&rdquo;</p>
<p><img src="11-associateip.jpg#screenshot" alt="Associate elastic IP with instance"></p>
<p>Ensure the &ldquo;Instance&rdquo; option is selected, then click the drop-down menu. We should see our EC2 instance ID there. Select it, then click &ldquo;Associate.&rdquo;</p>
<p><img src="12-associateip.jpg#screenshot" alt="Associate elastic IP success message"></p>
<p>Success! Now that we&rsquo;re able to access our VPN instance, let&rsquo;s get our VPN service up and running.</p>
<h2 id="3-initialize-openvpn-on-the-ec2-server">3. Initialize OpenVPN on the EC2 server</h2>
<p>First, we&rsquo;ll need to connect to the EC2 instance via our terminal. We&rsquo;ll use the private key we created earlier.</p>
<p>Open a new terminal window and navigate to the directory containing the private key <code>.pem</code> file. We&rsquo;ll need to set its permissions with:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">sudo chmod <span class="m">400</span> &lt;name&gt;.pem
</code></pre></div><p>Be sure to substitute <code>&lt;name&gt;</code> with the name of your key.</p>
<p>This sets the file permissions to <code>-r--------</code> so that it can only be read by the user (you). It helps to protect the private key from read and write operations by other users, but more pertinently, it will prevent SSH from refusing to use the key with an &ldquo;unprotected private key file&rdquo; error when we try to connect to our instance.</p>
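<p>As a quick sanity check before connecting, a small helper can confirm that the key is not accessible to group or other users. This is only a sketch: it assumes GNU <code>stat</code> (the Ubuntu default), and <code>key_mode_ok</code> is a hypothetical name.</p>

```shell
# Succeed only if the file grants no permissions to group or others,
# i.e. its octal mode ends in "00" (400, 600, ...).
# Assumes GNU stat, as shipped with Ubuntu.
key_mode_ok() {
    mode=$(stat -c '%a' "$1") || return 1
    case "$mode" in
        *00) return 0 ;;
        *) echo "$1 has mode $mode; try: chmod 400 $1" >&2; return 1 ;;
    esac
}
```

<p>For example, <code>key_mode_ok &lt;name&gt;.pem &amp;&amp; echo &quot;ready to connect&quot;</code>.</p>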
<p>We can now do just that by running:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">ssh -i &lt;name&gt;.pem openvpnas@&lt;elastic ip&gt;
</code></pre></div><p>The user <code>openvpnas</code> is set up by the OpenVPN Access Server to allow us to connect to our instance. Replace <code>&lt;elastic ip&gt;</code> with the Elastic IP address we just associated.</p>
<p>We may get a message saying that the authenticity of our host can&rsquo;t be established. As long as we&rsquo;ve typed the Elastic IP correctly, we can go ahead and answer &ldquo;yes&rdquo; to the prompt.</p>
<p>Upon the initial connection to the OpenVPN instance, a setup wizard called &ldquo;Initial Configuration Tool&rdquo; should automatically run. (If, for some reason, it doesn&rsquo;t, or you panic-mashed a button, we can restart it with <code>sudo ovpn-init --ec2</code>.) We&rsquo;ll be asked to accept the agreement, then the wizard will walk us through some configuration settings for our VPN server.</p>
<p>You may generally accept the default settings. However, there are a couple of questions you may like to answer knowledgeably. They are:</p>
<blockquote>
<p><strong>Should client traffic be routed by default through the VPN?</strong></p>
</blockquote>
<blockquote>
<p><em>Why you might like to answer &ldquo;yes&rdquo;:</em> Answering &ldquo;yes&rdquo; sends all of our traffic through the VPN. This prevents <a href="https://en.wikipedia.org/wiki/Split_tunneling">split tunneling</a>, a setup in which only some traffic uses the VPN while the rest goes directly over the local network (such as a public WiFi network), bypassing the VPN.</p>
</blockquote>
<blockquote>
<p><strong>Should client DNS traffic be routed by default through the VPN?</strong></p>
</blockquote>
<blockquote>
<p><em>Why you might like to answer &ldquo;yes&rdquo;:</em> This setting can help prevent DNS leaks by specifying that DNS requests should be handled by the VPN. If you answer &ldquo;yes&rdquo; to the previous question, it will be enabled regardless.</p>
</blockquote>
<p>When asked for our &ldquo;OpenVPN-AS license key&rdquo;, we can leave it blank to use the VPN with up to two clients. If you&rsquo;ve purchased a key, enter it here.</p>
<p>Once the configuration wizard finishes running, we should see the message &ldquo;Initial Configuration Complete!&rdquo; Before we move on, we should set a password for our server&rsquo;s administration account. To do this, run:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">sudo passwd openvpn
</code></pre></div><p>Then enter your chosen password twice. Now we&rsquo;re ready to get connected!</p>
<p>To close the ssh connection, type <code>exit</code>.</p>
<h2 id="4-connect-the-client-to-the-vpn">4. Connect the client to the VPN</h2>
<p>To connect our client (in this case, our laptop) to the VPN and start reaping the benefits, we&rsquo;ll need to do two things: first, obtain our connection profile; second, install the <code>openvpn</code> daemon.</p>
<h3 id="1-get-your-ovpn-connection-profile">1. Get your <code>.ovpn</code> connection profile</h3>
<p>We&rsquo;ll need to download a connection profile for ourselves; this is a personal configuration file containing the information, including keys, that the VPN server needs in order to allow our connection. We can get it by visiting port 943 of our Elastic IP address in a browser and logging in with the password we just set. The URL looks like:</p>
<pre><code>https://&lt;elastic ip&gt;:943/
</code></pre><p>The <code>https</code> part is important; without it, the instance won&rsquo;t send any data.</p>
<p>When we go to this URL, we may see a page warning us that this site&rsquo;s certificate issuer is unknown or invalid. As long as we&rsquo;ve typed our Elastic IP correctly, it&rsquo;s safe to proceed. If you&rsquo;re using Firefox, click &ldquo;Advanced,&rdquo; and then &ldquo;Accept the Risk and Continue.&rdquo; In Chrome, click &ldquo;Advanced,&rdquo; then &ldquo;Proceed to &hellip;&rdquo; the elastic IP.</p>
<p><img src="13-warning.jpg#screenshot" alt="Security warning page"></p>
<p>Log in with the username <code>openvpn</code> and the password we just set. We&rsquo;ll now be presented with a link to download our user-locked connection profile:</p>
<p><img src="14-profile.jpg#screenshot" alt="Connection profile download page"></p>
<p>When we click the link, a file named <code>client.ovpn</code> will download.</p>
<h3 id="2-install-and-start-openvpn-on-your-ubuntu-1804-client">2. Install and start <code>openvpn</code> on your Ubuntu 18.04 client</h3>
<p>The <code>openvpn</code> daemon will allow our client to connect to our VPN server. It can be installed through the default Ubuntu repositories. Run:</p>
<pre><code>sudo apt install openvpn
</code></pre><p>In order for OpenVPN to automatically start when we boot up our computer, we&rsquo;ll need to rename and move the connection profile file. I suggest using a <a href="https://en.wikipedia.org/wiki/Symbolic_link">symlink</a> to accomplish this, as it leaves our original file more easily accessible for editing, and allows us to store it in any directory we choose. We can create a symlink by running this command in the directory where our file is located:</p>
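<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">sudo ln -s <span class="s2">&#34;</span><span class="nv">$PWD</span><span class="s2">/client.ovpn&#34;</span> /etc/openvpn/&lt;name&gt;.conf
</code></pre></div><p>This creates a symbolic link for the connection profile in the appropriate folder for <code>systemd</code> to find it. The <code>&lt;name&gt;</code> can be anything. We give the link an absolute target (<code>$PWD</code> expands to the current directory) because a relative target would be resolved relative to <code>/etc/openvpn/</code>, not the directory we ran the command in, and the link would dangle. When the Linux kernel has booted, <code>systemd</code> is used to initialize the services and daemons that the user has set up to run; one of these will now be OpenVPN. Giving the link the extension <code>.conf</code> will let the <code>openvpn</code> daemon know to use it as our connection file.</p>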
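<p>Symlink targets are stored as plain text and resolved relative to the directory the link lives in, so it&rsquo;s safest to link to an absolute path. Here is a hedged sketch of a helper that does so; <code>link_profile</code> is a hypothetical name, and it assumes GNU <code>readlink</code> as found on Ubuntu.</p>

```shell
# Link a profile into a config directory using an absolute target, so the
# link resolves no matter which directory it lives in.
# Usage: link_profile <profile> <link path>
# (needs write access to the link's directory, e.g. from a root shell)
link_profile() {
    target=$(readlink -f "$1") || return 1   # canonical absolute path
    [ -f "$target" ] || { echo "no such file: $1" >&2; return 1; }
    ln -s "$target" "$2"
}
```

<p>Afterwards, <code>readlink -f /etc/openvpn/&lt;name&gt;.conf</code> should print the full path of the original <code>.ovpn</code> file.</p>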
<p>For now, we can manually start and connect to OpenVPN by running:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">sudo openvpn --config client.ovpn
</code></pre></div><p>We&rsquo;ll be asked for a username and password, which will be the same credentials we used before. Once the service finishes starting up, we&rsquo;ll see &ldquo;Initialization Sequence Complete.&rdquo; If we now visit <a href="https://www.dnsleaktest.com/">the DNS leak test website</a>, we should see the Elastic IP and the location of our EC2 server. Yay!</p>
<p>If you&rsquo;re on a later version of Ubuntu, you may check for DNS leaks by clicking on one of the &ldquo;test&rdquo; buttons. If all the ISPs shown are Amazon and none are your own service provider&rsquo;s, congratulations! No leaks! You can move on to <a href="#3-set-up-openvpn-as-networkmanager-system-connection">Step 3 in the second section</a> below, after which, you&rsquo;ll be finished.</p>
<p>If you&rsquo;re using Ubuntu 18.04 LTS, however, we&rsquo;re not yet done.</p>
<h1 id="what-a-dns-leak-looks-like">What a DNS leak looks like</h1>
<p>To see what a DNS leak looks like, click on one of the &ldquo;test&rdquo; buttons on <a href="https://www.dnsleaktest.com/">the DNS leak test page</a>. When we do, we&rsquo;ll see not only our Amazon.com IP addresses, but also our own ISP and location.</p>
<p>We can also see the leak by running <code>systemd-resolve --status</code> in our terminal. Our results will contain two lines under different interfaces that both have entries for DNS Servers. It&rsquo;ll look something like this:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">Link <span class="m">7</span> <span class="o">(</span>tun0<span class="o">)</span>
Current Scopes: DNS
LLMNR setting: yes
MulticastDNS setting: no
DNSSEC setting: no
DNSSEC supported: no
DNS Servers: 172.31.0.2
DNS Domain: ~.
Link <span class="m">3</span> <span class="o">(</span>wlp4s0<span class="o">)</span>
Current Scopes: none
LLMNR setting: yes
MulticastDNS setting: no
DNSSEC setting: no
DNSSEC supported: no
DNS Servers: 192.168.0.1
DNS Domain: ~.
</code></pre></div><p>The <a href="https://unix.stackexchange.com/questions/434916/how-to-fix-openvpn-dns-leak">DNS leak problem in Ubuntu 18.04</a> stems from Ubuntu&rsquo;s DNS resolver, <code>systemd-resolved</code>, failing to properly handle our OpenVPN configuration. In an effort to be a good, efficient DNS resolver, <code>systemd-resolved</code> sends DNS lookup requests in parallel to each interface that has a DNS server configured, then uses the fastest response. In our case, we only want to use our VPN&rsquo;s DNS servers. Sorry, <code>systemd-resolved</code>. You tried.</p>
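<p>To spot the extra DNS server at a glance, we could filter the status output down to one line per interface. This is a rough sketch that assumes output shaped like the sample above; <code>dns_by_link</code> is a made-up helper name, and it only reports the server listed on the <code>DNS Servers:</code> line itself.</p>

```shell
# Read `systemd-resolve --status` output on stdin and print one
# "<interface> <DNS server>" line per link that has a DNS server set.
# Entries in the Global section (before any "Link" line) are skipped.
dns_by_link() {
    awk '
        $1 == "Link" { gsub(/[()]/, "", $3); iface = $3 }
        /DNS Servers:/ && iface != "" { print iface, $NF }
    '
}
```

<p>Running <code>systemd-resolve --status | dns_by_link</code> while connected should list <code>tun0</code>; any other interface that shows up alongside it is a candidate for a leak.</p>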
<h1 id="how-to-fix-openvpn-dns-leak-on-ubuntu-1804">How to fix OpenVPN DNS leak on Ubuntu 18.04</h1>
<p>Luckily, there is a fix that we can implement. We&rsquo;ll need to install a few helpers from the Ubuntu repositories, update our configuration file, then set up OpenVPN using NetworkManager. Let&rsquo;s do it!</p>
<h2 id="1-install-some-helpers">1. Install some helpers</h2>
<p>To properly integrate OpenVPN with <code>systemd-resolved</code>, we&rsquo;ll need a bit more help. In a terminal, run:</p>
<pre><code>sudo apt install -y openvpn-systemd-resolved network-manager-openvpn network-manager-openvpn-gnome
</code></pre><p>This will install a helper script that integrates OpenVPN and <code>systemd-resolved</code>, a NetworkManager plugin for OpenVPN, and its GUI counterpart for the GNOME desktop environment.</p>
<h2 id="2-add-dns-implementation-to-your-connection-profile">2. Add DNS implementation to your connection profile</h2>
<p>We&rsquo;ll need to edit the connection profile file we downloaded earlier. Since it&rsquo;s symbolically linked, we can accomplish this by changing the <code>.ovpn</code> file, wherever it&rsquo;s stored. Run <code>vim client.ovpn</code> (or substitute your file&rsquo;s name, if you renamed it) to open it in Vim, then add the following lines at the bottom. Explanation in the comments:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="c1"># Allow OpenVPN to call user-defined scripts</span>
script-security <span class="m">2</span>
<span class="c1"># Tell systemd-resolved to send all DNS queries over the VPN</span>
dhcp-option DOMAIN-ROUTE .
<span class="c1"># Use the update-systemd-resolved script when TUN/TAP device is opened,</span>
<span class="c1"># and also run the script on restarts and before the TUN/TAP device is closed</span>
up /etc/openvpn/update-systemd-resolved
up-restart
down /etc/openvpn/update-systemd-resolved
down-pre
</code></pre></div><p>For the full list of OpenVPN options, see <a href="https://openvpn.net/community-resources/reference-manual-for-openvpn-2-1/">OpenVPN Scripting and Environment Variables</a>. You may also like <a href="https://en.wikipedia.org/wiki/TUN/TAP">more information about TUN/TAP</a>.</p>
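<p>If you&rsquo;d rather not edit the file by hand, the same lines can be appended from the terminal. The following is a minimal sketch, not part of the OpenVPN tooling; <code>add_resolved_options</code> is a hypothetical name, and the check simply avoids adding the lines twice.</p>

```shell
# Append the systemd-resolved integration options to an OpenVPN profile,
# unless the profile already references the helper script.
# Usage: add_resolved_options <profile>
add_resolved_options() {
    profile="$1"
    if grep -q 'update-systemd-resolved' "$profile"; then
        echo "already configured: $profile"
        return 0
    fi
    cat >> "$profile" <<'EOF'
script-security 2
dhcp-option DOMAIN-ROUTE .
up /etc/openvpn/update-systemd-resolved
up-restart
down /etc/openvpn/update-systemd-resolved
down-pre
EOF
}
```

<p>For example: <code>add_resolved_options client.ovpn</code>.</p>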
<h2 id="3-set-up-openvpn-as-networkmanager-system-connection">3. Set up OpenVPN as NetworkManager system connection</h2>
<p>We&rsquo;ll use the GUI to set up our VPN with NetworkManager. Open up Network Settings, which should look something like this:</p>
<p><img src="15-networksettings.png#screenshot" alt="Network Settings window on Ubuntu 18.04"></p>
<p>Then click the &ldquo;+&rdquo; button. On the window that pops up, counterintuitively, choose &ldquo;Import from file&hellip;&rdquo; instead of the OpenVPN option.</p>
<p><img src="16-importvpn.jpg#screenshot" alt="Add VPN window"></p>
<p>Navigate to, and then select, your <code>.ovpn</code> file. We should now see something like this:</p>
<p><img src="17-vpnsettings.png#screenshot" alt="The filled VPN connection settings"></p>
<p>Add your username and password for the server (<code>openvpn</code> and the password we set in <a href="#3-initialize-openvpn-on-the-ec2-server">the first section&rsquo;s Step 3</a>), and your user key password (the same one again, if you&rsquo;ve followed this tutorial), then click the &ldquo;Add&rdquo; button.</p>
<h2 id="4-edit-your-openvpn-networkmanager-configuration">4. Edit your OpenVPN NetworkManager configuration</h2>
<p>Nearly there! Now that we&rsquo;ve added the VPN as a NetworkManager connection, we&rsquo;ll need to make a quick change to it. We can see a list of NetworkManager connections by running:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">ls -la /etc/NetworkManager/system-connections/*
</code></pre></div><p>The one for our VPN is probably called <code>openvpn</code>, so let&rsquo;s edit it by running:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">sudo vim /etc/NetworkManager/system-connections/openvpn
</code></pre></div><p>Under <code>[ipv4]</code>, we&rsquo;ll need to add the line <code>dns-priority=-42</code>. It should end up looking like this:</p>
<p><img src="18-connsettings.jpg#screenshot" alt="Connection settings for ipv4"></p>
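<p>Setting a negative number is a workaround that prioritizes this DNS server. In NetworkManager, a negative <code>dns-priority</code> does more than rank this connection first: it excludes the DNS servers of any connection with a numerically greater priority, so only our VPN&rsquo;s servers are used. The actual number is arbitrary (<code>-1</code> should also work) but I like 42. ¯\_(ツ)_/¯</p>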
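<p>The same edit can be sketched as a small filter, which is handy if we want to script it. This assumes the plain INI-style keyfile format shown in the screenshot; <code>set_dns_priority</code> is a made-up name, and it prints the result rather than changing the file in place.</p>

```shell
# Print a NetworkManager keyfile with dns-priority set under [ipv4].
# Any existing dns-priority line in that section is dropped first.
# Usage: set_dns_priority <keyfile> <priority>
set_dns_priority() {
    awk -v prio="$2" '
        /^\[/ { section = $0 }                            # track current section
        section == "[ipv4]" && /^dns-priority=/ { next }  # drop the old value
        { print }
        $0 == "[ipv4]" { print "dns-priority=" prio }     # add the new value
    ' "$1"
}
```

<p>We could review the output of <code>set_dns_priority &lt;keyfile&gt; -42</code>, then save it over the original from a root shell.</p>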
<h2 id="5-restart-connect-profit">5. Restart, connect, profit!!!</h2>
<p>In a terminal, run:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">sudo service network-manager restart
</code></pre></div><p>Then in the Network Settings, click the magic button that turns on the VPN:</p>
<p><img src="19-vpnon.jpg#screenshot" alt="Network Settings window"></p>
<p>Finally, visit <a href="https://www.dnsleaktest.com/">the DNS leak test website</a> and click on &ldquo;Extended test&rdquo; to verify the fix. If everything&rsquo;s working properly, we should now see a list containing only our VPN ISP.</p>
<p><img src="20-noleaks.png#screenshot" alt="Successful DNS leak test results"></p>
<p>And we&rsquo;re done! Congratulations on rolling your very own VPN server and stopping DNS leaks with OpenVPN. Enjoy surfing in (relative) privacy. Now your only worry at the local coffeeshop is who&rsquo;s watching you surf from the seat behind you.</p>
How to do twice as much with half the keystrokes using `.bashrc`https://victoria.dev/blog/how-to-do-twice-as-much-with-half-the-keystrokes-using-.bashrc/
Wed, 21 Aug 2019 09:17:02 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/how-to-do-twice-as-much-with-half-the-keystrokes-using-.bashrc/An overview of time-saving aliases, functions, and creating a useful Bash prompt.
]]>
<p>In my <a href="https://victoria.dev/blog/how-to-set-up-a-fresh-ubuntu-desktop-using-only-dotfiles-and-bash-scripts/">recent post about setting up Ubuntu with Bash scripts</a>, I briefly alluded to the magic of <code>.bashrc</code>. This didn&rsquo;t really do it justice, so here&rsquo;s a quick post that offers a bit more detail about what the Bash configuration file can do.</p>
<p>My current configuration hugely improves my workflow, and saves me well over 50% of the keystrokes I would have to employ without it! Let&rsquo;s look at some examples of aliases, functions, and prompt configurations that can improve our workflow by helping us be more efficient with fewer key presses.</p>
<h1 id="bash-aliases">Bash aliases</h1>
<p>A smartly written <code>.bashrc</code> can save a whole lot of keystrokes. We can take advantage of this in the literal sense by using <a href="https://www.gnu.org/software/bash/manual/html_node/Aliases.html">bash aliases</a>, or strings that expand to larger commands. For an indicative example, here is a Bash alias for copying files in the terminal:</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash"><span class="c1"># Always copy contents of directories (r)ecursively and explain (v) what was done</span>
<span class="nb">alias</span> <span class="nv">cp</span><span class="o">=</span><span class="s1">&#39;cp -rv&#39;</span>
</code></pre></div><p>The <code>alias</code> command defines the string we&rsquo;ll type, followed by what that string will expand to. We can override existing commands like <code>cp</code> above. On its own, the <code>cp</code> command will only copy files, not directories, and succeeds silently. With this alias, we need not remember to pass those two flags, nor <code>cd</code> or <code>ls</code> the location of our copied file to confirm that it&rsquo;s there! Now, just those two key presses (for <code>c</code> and <code>p</code>) will do all of that for us.</p>
<p>Here are a few more <code>.bashrc</code> aliases for passing flags with common commands.</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash"><span class="c1"># List contents with colors for file types, (A)lmost all hidden files (without . and ..), in (C)olumns, with class indicators (F)</span>
<span class="nb">alias</span> <span class="nv">ls</span><span class="o">=</span><span class="s1">&#39;ls --color=auto -ACF&#39;</span>
<span class="c1"># List contents with colors for file types, (a)ll hidden entries (including . and ..), use (l)ong listing format, with class indicators (F)</span>
<span class="nb">alias</span> <span class="nv">ll</span><span class="o">=</span><span class="s1">&#39;ls --color=auto -alF&#39;</span>
<span class="c1"># Explain (v) what was done when moving a file</span>
<span class="nb">alias</span> <span class="nv">mv</span><span class="o">=</span><span class="s1">&#39;mv -v&#39;</span>
<span class="c1"># Create any non-existent (p)arent directories and explain (v) what was done</span>
<span class="nb">alias</span> <span class="nv">mkdir</span><span class="o">=</span><span class="s1">&#39;mkdir -pv&#39;</span>
<span class="c1"># Always try to (c)ontinue getting a partially-downloaded file</span>
<span class="nb">alias</span> <span class="nv">wget</span><span class="o">=</span><span class="s1">&#39;wget -c&#39;</span>
</code></pre></div><p>Aliases come in handy when we want to avoid typing long commands, too. Here are a few I use when working with Python environments:</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash"><span class="nb">alias</span> <span class="nv">pym</span><span class="o">=</span><span class="s1">&#39;python3 manage.py&#39;</span>
<span class="nb">alias</span> <span class="nv">mkenv</span><span class="o">=</span><span class="s1">&#39;python3 -m venv env&#39;</span>
<span class="nb">alias</span> <span class="nv">startenv</span><span class="o">=</span><span class="s1">&#39;source env/bin/activate &amp;&amp; which python3&#39;</span>
<span class="nb">alias</span> <span class="nv">stopenv</span><span class="o">=</span><span class="s1">&#39;deactivate&#39;</span>
</code></pre></div><p>For further inspiration on ways Bash aliases can save time, I highly recommend <a href="https://www.digitalocean.com/community/tutorials/an-introduction-to-useful-bash-aliases-and-functions">the examples in this article</a>.</p>
<h1 id="bash-functions">Bash functions</h1>
<p>One downside of the aliases above is that they&rsquo;re rather static - they&rsquo;ll always expand to exactly the text declared. For a Bash alias that takes arguments, we&rsquo;ll need to create a function. We can do this like so:</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash"><span class="c1"># Show contents of the directory after changing to it</span>
<span class="k">function</span> <span class="nb">cd</span> <span class="o">()</span> <span class="o">{</span>
<span class="nb">builtin</span> <span class="nb">cd</span> <span class="s2">&#34;</span><span class="nv">$@</span><span class="s2">&#34;</span> <span class="o">&amp;&amp;</span> ls -ACF
<span class="o">}</span>
</code></pre></div><p>I can&rsquo;t begin to tally how many times I&rsquo;ve typed <code>cd</code> and then <code>ls</code> immediately after to see the contents of the directory I&rsquo;m now in. With this function set up, it all happens with just those two letters! The function passes its arguments, <code>$@</code>, along to the real <code>cd</code>, so a plain <code>cd</code> still takes us home. If the directory change succeeds, it then prints the contents of that directory in nicely formatted columns with file type indicators. The <code>builtin</code> part is necessary because our function has the same name as the command it wraps: it tells Bash to run the built-in <code>cd</code> rather than call our function again, endlessly.</p>
<p>Bash functions are very useful when it comes to downloading or upgrading software, too. I previously spent at least a few minutes every couple weeks downloading the new extended version of the <a href="https://gohugo.io/categories/releases">static site generator Hugo</a>, thanks to their excellent shipping frequency. With a function, I only need to pass in the version, and the upgrade happens in a few seconds.</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash"><span class="c1"># Hugo install or upgrade</span>
<span class="k">function</span> gethugo <span class="o">()</span> <span class="o">{</span>
wget -q -P tmp/ https://github.com/gohugoio/hugo/releases/download/v<span class="s2">&#34;</span><span class="nv">$@</span><span class="s2">&#34;</span>/hugo_extended_<span class="s2">&#34;</span><span class="nv">$@</span><span class="s2">&#34;</span>_Linux-64bit.tar.gz
tar xf tmp/hugo_extended_<span class="s2">&#34;</span><span class="nv">$@</span><span class="s2">&#34;</span>_Linux-64bit.tar.gz -C tmp/
sudo mv -f tmp/hugo /usr/local/bin/
rm -rf tmp/
hugo version
<span class="o">}</span>
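</code></pre></div><p>The <code>$@</code> notation expands to all the arguments given to the function, replacing its spot in the function. Since we only ever pass one argument here, the version number, <code>$1</code> would work just as well. To run the above function and download Hugo version 0.57.2, we use the command <code>gethugo 0.57.2</code>.</p>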
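<p>For a function that expects exactly one argument, it can be worth checking that argument before using it. Here&rsquo;s a small sketch that builds the same download URL from <code>$1</code> only; <code>hugo_release_url</code> is a hypothetical helper, not something the function above defines.</p>

```shell
# Build the download URL for a given Hugo extended release.
# Uses "$1" (the first argument only) and rejects anything that does not
# look like a version number such as 0.57.2.
hugo_release_url() {
    case "$1" in
        ''|*[!0-9.]*) echo "usage: hugo_release_url <version>" >&2; return 1 ;;
    esac
    echo "https://github.com/gohugoio/hugo/releases/download/v${1}/hugo_extended_${1}_Linux-64bit.tar.gz"
}
```

<p>Something like <code>wget -q -P tmp/ &quot;$(hugo_release_url 0.57.2)&quot;</code> would then fetch the archive, and a mistyped version fails early instead of producing a confusing download error.</p>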
<p>I&rsquo;ve got one for <a href="https://golang.org/">Golang</a>, too:</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash"><span class="k">function</span> getgolang <span class="o">()</span> <span class="o">{</span>
sudo rm -rf /usr/local/go
wget -q -P tmp/ https://dl.google.com/go/go<span class="s2">&#34;</span><span class="nv">$@</span><span class="s2">&#34;</span>.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf tmp/go<span class="s2">&#34;</span><span class="nv">$@</span><span class="s2">&#34;</span>.linux-amd64.tar.gz
rm -rf tmp/
go version
<span class="o">}</span>
</code></pre></div><p>Or how about a function that adds a remote origin URL for GitLab to the current repository?</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash"><span class="k">function</span> glab <span class="o">()</span> <span class="o">{</span>
git remote set-url origin --add git@gitlab.com:<span class="s2">&#34;</span><span class="nv">$@</span><span class="s2">&#34;</span>/<span class="s2">&#34;</span><span class="si">${</span><span class="nv">PWD</span><span class="p">##*/</span><span class="si">}</span><span class="s2">&#34;</span>.git
git remote -v
<span class="o">}</span>
</code></pre></div><p>With <code>glab username</code>, we can create a new <code>origin</code> URL for the current Git repository with our <code>username</code> on GitLab.com. Pushing to a new remote URL <a href="https://victoria.dev/blog/how-to-write-bash-one-liners-for-cloning-and-managing-github-and-gitlab-repositories/#a-bash-one-liner-to-create-and-push-many-repositories-on-gitlab">automatically creates a new private GitLab repository</a>, so this is a useful shortcut for creating backups!</p>
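<p>The <code>${PWD##*/}</code> part may look cryptic. It&rsquo;s a parameter expansion that strips the longest prefix matching <code>*/</code>, leaving only the last component of the current path. A sketch that isolates the idea, using a hypothetical <code>gitlab_remote_url</code> helper:</p>

```shell
# Build the GitLab SSH remote URL for a user and a repository directory.
# "${dir##*/}" removes everything up to and including the last "/",
# which is how the repository name is derived from the path.
# Usage: gitlab_remote_url <username> [directory, defaults to $PWD]
gitlab_remote_url() {
    user="$1"
    dir="${2:-$PWD}"
    echo "git@gitlab.com:${user}/${dir##*/}.git"
}
```

<p>Called as <code>gitlab_remote_url username ~/github/myrepo</code>, it should print <code>git@gitlab.com:username/myrepo.git</code>.</p>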
<p>Bash functions are really only limited by the possibilities of scripting, which, practically speaking, has few limits. If there&rsquo;s anything we do on a frequent basis that requires typing a few lines into a terminal, we can probably create a Bash function for it!</p>
<h1 id="bash-prompt">Bash prompt</h1>
<p>Besides directory contents, it&rsquo;s also useful to see the full path of the directory we&rsquo;re in. The Bash prompt can show us this path, along with other useful information like our current Git branch. To make it more readable, we can define colours for each part of the prompt. Here&rsquo;s how we can set up our prompt in <code>.bashrc</code> to accomplish this:</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash"><span class="c1"># Colour codes are cumbersome, so let&#39;s name them</span>
<span class="nv">txtcyn</span><span class="o">=</span><span class="s1">&#39;\[\e[0;96m\]&#39;</span> <span class="c1"># Cyan</span>
<span class="nv">txtpur</span><span class="o">=</span><span class="s1">&#39;\[\e[0;35m\]&#39;</span> <span class="c1"># Purple</span>
<span class="nv">txtwht</span><span class="o">=</span><span class="s1">&#39;\[\e[0;37m\]&#39;</span> <span class="c1"># White</span>
<span class="nv">txtrst</span><span class="o">=</span><span class="s1">&#39;\[\e[0m\]&#39;</span> <span class="c1"># Text Reset</span>
<span class="c1"># Which (C)olour for what part of the prompt?</span>
<span class="nv">pathC</span><span class="o">=</span><span class="s2">&#34;</span><span class="si">${</span><span class="nv">txtcyn</span><span class="si">}</span><span class="s2">&#34;</span>
<span class="nv">gitC</span><span class="o">=</span><span class="s2">&#34;</span><span class="si">${</span><span class="nv">txtpur</span><span class="si">}</span><span class="s2">&#34;</span>
<span class="nv">pointerC</span><span class="o">=</span><span class="s2">&#34;</span><span class="si">${</span><span class="nv">txtwht</span><span class="si">}</span><span class="s2">&#34;</span>
<span class="nv">normalC</span><span class="o">=</span><span class="s2">&#34;</span><span class="si">${</span><span class="nv">txtrst</span><span class="si">}</span><span class="s2">&#34;</span>
<span class="c1"># Get the name of our branch and put parentheses around it</span>
gitBranch<span class="o">()</span> <span class="o">{</span>
git branch 2&gt; /dev/null <span class="p">|</span> sed -e <span class="s1">&#39;/^[^*]/d&#39;</span> -e <span class="s1">&#39;s/* \(.*\)/(\1)/&#39;</span>
<span class="o">}</span>
<span class="c1"># Build the prompt</span>
<span class="nb">export</span> <span class="nv">PS1</span><span class="o">=</span><span class="s2">&#34;</span><span class="si">${</span><span class="nv">pathC</span><span class="si">}</span><span class="s2">\w </span><span class="si">${</span><span class="nv">gitC</span><span class="si">}</span><span class="s2">\$(gitBranch) </span><span class="si">${</span><span class="nv">pointerC</span><span class="si">}</span><span class="s2">\$</span><span class="si">${</span><span class="nv">normalC</span><span class="si">}</span><span class="s2"> &#34;</span>
</code></pre></div><p>Result:</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash">~/github/myrepo <span class="o">(</span>master<span class="o">)</span> $
</code></pre></div><p>Naming the colours helps to easily identify where one colour starts and stops, and where the next one begins. The prompt that we see in our terminal is defined by the string following <code>export PS1</code>, with each component of the prompt set with an <a href="https://www.tldp.org/HOWTO/Bash-Prompt-HOWTO/bash-prompt-escape-sequences.html">escape sequence</a>. Let&rsquo;s break that down:</p>
<ul>
<li><code>\w</code> displays the current working directory,</li>
<li><code>\$(gitBranch)</code> calls the <code>gitBranch</code> function defined above, which displays the current Git branch,</li>
<li><code>\$</code> will display a &ldquo;$&rdquo; if you are a normal user or in normal user mode, and a &ldquo;#&rdquo; if you are root.</li>
</ul>
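<p>To see what the <code>sed</code> expressions in <code>gitBranch</code> do, we can feed them some sample <code>git branch</code> output by hand. In this sketch, <code>current_branch_label</code> is a made-up name for the same pipeline reading from standard input, so it can be tried outside a repository.</p>

```shell
# Same sed expressions as gitBranch, but reading branch names from stdin.
# The first expression deletes every line that does not start with "*";
# the second rewrites "* name" as "(name)".
current_branch_label() {
    sed -e '/^[^*]/d' -e 's/* \(.*\)/(\1)/'
}

# Sample input in the shape `git branch` prints; only the starred line survives.
printf '  develop\n* master\n  feature/x\n' | current_branch_label
```

<p>This prints <code>(master)</code>, the same text that appears in the prompt.</p>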
<p>The <a href="https://www.gnu.org/software/bash/manual/html_node/Controlling-the-Prompt.html">full list of Bash escape sequences</a> can help us display many more bits of information, including even the time and date! Bash prompts are highly customizable and individual, so feel free to set it up any way you please.</p>
<p>Here are a few options that put information front and centre and can help us to work more efficiently.</p>
<h2 id="for-the-procrastination-averse">For the procrastination-averse</h2>
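<p>Username and current time with seconds, in 24-hour HH:MM:SS format (<code>userC</code> is assumed to be one more colour variable, defined the same way as <code>pathC</code> above):</p>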
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash"><span class="nb">export</span> <span class="nv">PS1</span><span class="o">=</span><span class="s2">&#34;</span><span class="si">${</span><span class="nv">userC</span><span class="si">}</span><span class="s2">\u </span><span class="si">${</span><span class="nv">normalC</span><span class="si">}</span><span class="s2">at \t &gt;&#34;</span>
</code></pre></div><pre><code>user at 09:35:55 &gt;
</code></pre><h2 id="for-those-who-always-like-to-know-where-they-stand">For those who always like to know where they stand</h2>
<p>Full file path on a separate line, and username:</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash"><span class="nb">export</span> <span class="nv">PS1</span><span class="o">=</span><span class="s2">&#34;</span><span class="si">${</span><span class="nv">pathC</span><span class="si">}</span><span class="s2">\w</span><span class="si">${</span><span class="nv">normalC</span><span class="si">}</span><span class="s2">\n\u:&#34;</span>
</code></pre></div><pre><code>~/github/myrepo
user:
</code></pre><h2 id="for-the-minimalist">For the minimalist</h2>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash"><span class="nb">export</span> <span class="nv">PS1</span><span class="o">=</span><span class="s2">&#34;&gt;&#34;</span>
</code></pre></div><pre><code>&gt;
</code></pre><p>We can build many practical prompts with just the basic escape sequences; once we start to integrate functions with prompts, as in the Git branch example, things can get really complicated. Whether this amount of complication is an addition or a detriment to your productivity, only you can know for sure!</p>
<p>Many fancy Bash prompts are possible with programs readily available with a quick search. I&rsquo;ve intentionally not provided samples here because, well, if you tend to get as excited about this stuff as I do, it might be a couple hours before you get back to what you were doing before you started reading this post, and I just can&rsquo;t have that on my conscience. 🥺</p>
<p>We&rsquo;ve hopefully struck a nice balance now between time invested and usefulness gained from our Bash configuration file! I hope you use your newly-recovered keystroke capacity for good.</p>
How to set up a fresh Ubuntu desktop using only dotfiles and bash scriptshttps://victoria.dev/blog/how-to-set-up-a-fresh-ubuntu-desktop-using-only-dotfiles-and-bash-scripts/
Mon, 19 Aug 2019 07:58:18 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/how-to-set-up-a-fresh-ubuntu-desktop-using-only-dotfiles-and-bash-scripts/Configure settings, install programs, and customize your desktop environment with a single bash command.
]]>
<p>One of my favourite things about open source files on GitHub is the ability to see how others do (what some people might call) mundane things, like set up their <code>.bashrc</code> and other dotfiles. While I&rsquo;m not as enthusiastic about ricing as I was when I first came to the Linux side, I still get pretty excited when I find a config setting that makes things prettier and faster, and thus, better.</p>
<p>I recently came across a few such things, particularly in <a href="https://github.com/tomnomnom">Tom Hudson&rsquo;s</a> dotfiles. Tom seems to like to script things, and some of those things include automatically setting up symlinks, and installing Ubuntu repository applications and other programs. This got me thinking. Could I automate the setup of a new machine to replicate my current one?</p>
<p>Being someone generally inclined to take things apart in order to see how they work, I know I&rsquo;ve messed up my laptop on occasion. (Usually when I&rsquo;m away from home, and my backup hard drive isn&rsquo;t.) In those rare but really inconvenient situations when my computer becomes a shell of its former self, (ba-dum-ching) it&rsquo;d be quite nice to have a fast, simple way of putting Humpty Dumpty back together again, just the way I like.</p>
<p>In contrast to creating a <a href="https://askubuntu.com/questions/19901/how-to-make-a-disk-image-and-restore-from-it-later">disk image and restoring it later</a>, a collection of bash scripts is easier to create, maintain, and move around. They require no special utilities, only an external transportation method. It&rsquo;s like passing along the recipe, instead of the whole bundt cake. (Mmm, cake.)</p>
<p>Additionally, functionality like this would be super useful when setting up a virtual machine, or VM, or even just a virtual private server, or VPS. (Both of which, now that I write this, would probably make more forgiving targets for my more destructive experimentations&hellip; live and learn!)</p>
<p>Well, after some grepping and Googling and digging around, I now have a suite of scripts that can do this:</p>
<!-- raw HTML omitted -->
<p>This is the tail end of a test run of the set up scripts on a fresh Ubuntu desktop, loaded off a bootable USB. It had all my programs and settings restored in under three minutes!</p>
<p>This post will cover how to achieve the automatic set up of a computer running Ubuntu Desktop (in my case, Ubuntu 18.04 LTS) using bash scripts. The majority of the information covered is applicable to all the Linux desktop flavours, though some syntax may differ. The bash scripts cover three main areas: linking dotfiles, installing software from Ubuntu and elsewhere, and setting up the desktop environment. We&rsquo;ll cover each of these areas and go over the important bits so that you can begin to craft your own scripts.</p>
<h1 id="dotfiles">Dotfiles</h1>
<p>Dotfiles are what most Linux enthusiasts call configuration files. They typically live in the user&rsquo;s home directory (denoted in bash scripts with the environment variable <code>$HOME</code>) and control the appearance and behaviour of all kinds of programs. The file names begin with <code>.</code>, which denotes hidden files in Linux (hence &ldquo;dot&rdquo; files). Here are some common dotfiles and ways in which they&rsquo;re useful.</p>
<h2 id="bashrc"><code>.bashrc</code></h2>
<p>The <code>.bashrc</code> file is a list of commands executed at startup by interactive, non-login shells. <a href="https://www.tldp.org/LDP/abs/html/intandnonint.html">Interactive vs non-interactive shells</a> can be a little confusing, but aren&rsquo;t necessary for us to worry about here. For our purposes, any time you open a new terminal, see a prompt, and can type commands into it, your <code>.bashrc</code> was executed.</p>
<p>Lines in this file can help improve your workflow by creating aliases that reduce keystrokes, or by displaying a helpful prompt with useful information. It can even run user-created programs, like <a href="https://github.com/victoriadrake/eddie-terminal">Eddie</a>. For more ideas, you can have a look at <a href="https://github.com/victoriadrake/dotfiles/blob/master/.bashrc">my <code>.bashrc</code> file on GitHub</a>.</p>
<h2 id="vimrc"><code>.vimrc</code></h2>
<p>The <code>.vimrc</code> dotfile configures the champion of all text editors, <a href="https://www.vim.org/about.php">Vim</a>. (If you haven&rsquo;t yet wielded the powers of the keyboard shortcuts, I highly recommend <a href="https://vim-adventures.com/">a fun game to learn Vim with</a>.)</p>
<p>In <code>.vimrc</code>, we can set editor preferences such as display settings, colours, and custom keyboard shortcuts. You can take a look at <a href="https://github.com/victoriadrake/dotfiles/blob/master/.vimrc">my <code>.vimrc</code> on GitHub</a>.</p>
<p>Other dotfiles may be useful depending on the programs you use, such as <code>.gitconfig</code> or <code>.tmux.conf</code>. Exploring dotfiles on GitHub is a great way to get a sense of what&rsquo;s available and useful to you!</p>
<h1 id="linking-dotfiles">Linking dotfiles</h1>
<p>We can use a script to create symbolic links, or <a href="https://en.wikipedia.org/wiki/Symbolic_link#POSIX_and_Unix-like_operating_systems">symlinks</a>, for all our dotfiles. This allows us to keep all the files in a central repository, where they can easily be managed, while also providing a sort of placeholder in the spot where our programs expect the configuration file to be found. This is typically, but not always, the user home directory. For example, since I store my dotfiles on GitHub, I keep them in a directory with a path like <code>~/github/dotfiles/</code> while the files themselves are symlinked, resulting in a path like <code>~/.vimrc</code>.</p>
<p>To programmatically check for and handle any existing files and symlinks, then create new ones, we can use <a href="https://github.com/victoriadrake/dotfiles/blob/master/scripts/symlink.sh">this elegant shell script</a>. I compliment it only because I blatantly stole the core of it from <a href="https://github.com/tomnomnom/dotfiles/blob/master/setup.sh">Tom&rsquo;s setup script</a>, so I can&rsquo;t take the credit for how lovely it is.</p>
<p>The <code>symlink.sh</code> script works by attempting to create symlinks for each dotfile in our <code>$HOME</code>. It first checks to see if a symlink already exists, or if a regular file or directory with the same name exists. In the former case, the symlink is removed and remade; in the latter, the file or directory is renamed, then the symlink is made.</p>
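<p>As a rough illustration of that check-then-link logic, here&rsquo;s a minimal sketch. It is not the actual <code>symlink.sh</code>: the function name <code>link_dotfile</code> and the <code>.old</code> backup suffix are hypothetical choices of mine.</p>

```bash
#!/usr/bin/env bash
# Sketch only: link one dotfile from a repository into a target directory.
# The ".old" backup suffix is a hypothetical choice, not taken from symlink.sh.
link_dotfile() {
  local src="$1"                       # the real file, e.g. ~/github/dotfiles/.vimrc
  local dest="$2/$(basename "$src")"   # where the program expects it, e.g. ~/.vimrc

  if [ -L "$dest" ]; then
    rm "$dest"                         # an existing symlink is removed and remade
  elif [ -e "$dest" ]; then
    mv "$dest" "$dest.old"             # an existing file or directory is renamed
  fi

  ln -s "$src" "$dest"
}

# Example use (left commented out so nothing changes until you opt in):
# for name in .bashrc .vimrc; do
#   link_dotfile "$HOME/github/dotfiles/$name" "$HOME"
# done
```

<p>Trying a function like this against a scratch directory first, rather than your real <code>$HOME</code>, is a cheap way to make sure it renames and links the way you expect.</p>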
<h1 id="installing-software">Installing software</h1>
<p>One of the beautiful things about exploring shell scripts is discovering how much can be achieved using only the command line. As someone whose first exposure to computers was through a graphical operating system, I find working in the terminal to be refreshingly fast.</p>
<p>With Ubuntu, most programs we likely require are available through the default Ubuntu software repositories. We typically search for these with the command <code>apt search &lt;program&gt;</code> and install them with <code>sudo apt install &lt;program&gt;</code>. Some software we&rsquo;d like may not be in the default repositories, or may not be offered there in the most current version. In these cases, we can still install these programs in Ubuntu using a <a href="https://en.wikipedia.org/wiki/Ubuntu#Package_Archives">PPA, or Personal Package Archive</a>. We&rsquo;ll just have to be careful that the PPAs we choose are from the official sources.</p>
<p>If a program we&rsquo;d like doesn&rsquo;t appear in the default repositories or doesn&rsquo;t seem to have a PPA, we may still be able to install it via command line. A quick search for &ldquo;&lt;program&gt; installation command line&rdquo; should get some answers.</p>
<p>Since bash scripts are just a collection of commands that we could run individually in the terminal, creating a script to install all our desired programs is as straightforward as putting all the commands into a script file. I chose to organize my installation scripts between the default repositories, which are installed by <a href="https://github.com/victoriadrake/dotfiles/blob/master/scripts/aptinstall.sh">my <code>aptinstall.sh</code> script</a>, and programs that involve external sources, handled with <a href="https://github.com/victoriadrake/dotfiles/blob/master/scripts/programs.sh">my <code>programs.sh</code> script</a>.</p>
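<p>To give a feel for what such an installation script might contain, here&rsquo;s a small sketch. It isn&rsquo;t taken from <code>aptinstall.sh</code>; the helper name <code>install_if_missing</code> and the program list are hypothetical, and it assumes the package name matches the command name, which isn&rsquo;t always true.</p>

```bash
#!/usr/bin/env bash
# Sketch only: install a program from the default repositories unless a
# command of the same name is already available.
install_if_missing() {
  local program="$1"
  if command -v "$program" >/dev/null 2>&1; then
    echo "Already installed: $program"
  else
    echo "Installing: $program"
    sudo apt install -y "$program"
  fi
}

# Example use (hypothetical program list):
# for program in curl git tmux; do
#   install_if_missing "$program"
# done
```

<p>Skipping programs that already exist keeps repeat runs quick, which is handy when we&rsquo;re still tweaking the list.</p>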
<h1 id="setting-up-the-desktop-environment">Setting up the desktop environment</h1>
<p>On the recent occasions when I&rsquo;ve gotten a fresh desktop (intentionally or otherwise) I always seem to forget how long it takes to remember, find, and then change all the desktop environment settings. Keyboard shortcuts, workspaces, sound settings, night mode&hellip; it adds up!</p>
<p>Thankfully, all these settings have to be stored somewhere in a non-graphical format, which means that if we can discover how that&rsquo;s done, we can likely find a way to easily manipulate the settings with a bash script. Lo and behold the terminal command, <code>gsettings list-recursively</code>.</p>
<p>There are a heck of a lot of settings for the GNOME desktop environment. We can make the list easier to scroll through (if, like me, you&rsquo;re sometimes the type of person to say &ldquo;Just let me look at everything and figure out what I want!&rdquo;) by piping to <code>less</code>: <code>gsettings list-recursively | less</code>. Alternatively, if we have an inkling as to what we might be looking for, we can use <code>grep</code>: <code>gsettings list-recursively | grep 'keyboard'</code>.</p>
<p>We can manipulate our settings with the <code>gsettings set</code> command. It can sometimes be difficult to find the syntax for the setting we want, so when we&rsquo;re first building our script, I recommend using the GUI to make the changes, then finding the <code>gsettings</code> line we changed and recording its value.</p>
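<p>One way to find the line that changed is to save the output of <code>gsettings list-recursively</code> to a file before and after making the change in the GUI, then compare the two. Here&rsquo;s a sketch of that idea; the function name <code>changed_settings</code> and the snapshot file names are hypothetical, and it assumes the changed values contain no double quotes or dollar signs.</p>

```bash
#!/usr/bin/env bash
# Sketch only: turn the differences between two settings snapshots into
# `gsettings set` lines we could paste into a script.
changed_settings() {
  local before="$1" after="$2"
  # Keep lines that appear in "after" but not, verbatim, in "before"
  grep -Fxv -f "$before" "$after" |
    while read -r schema key value; do
      # Each snapshot line has the form: schema key value
      printf 'gsettings set %s %s "%s"\n' "$schema" "$key" "$value"
    done
}

# Example use (hypothetical file names):
# gsettings list-recursively > before.txt
#   ...change something in the GUI...
# gsettings list-recursively > after.txt
# changed_settings before.txt after.txt
```

<p>It&rsquo;s worth reading over the printed lines before adding them to a script, since some settings change on their own between snapshots.</p>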
<p>For some inspiration, you can view <a href="https://github.com/victoriadrake/dotfiles/blob/master/scripts/desktop.sh">my <code>desktop.sh</code> settings script on GitHub</a>.</p>
<h1 id="putting-it-all-together">Putting it all together</h1>
<p>Having modular scripts (one for symlinks, two for installing programs, another for desktop settings) is useful for both keeping things organized and for being able to run some but not all of the automated set up. For instance, if I were to set up a VPS in which I only use the command line, I wouldn&rsquo;t need to bother with installing graphical programs or desktop settings.</p>
<p>In cases where I do want to run all the scripts, however, doing so one-by-one is a little tedious. Thankfully, since bash scripts can themselves be run by terminal commands, we can simply write another master script to run them all!</p>
<p>Here&rsquo;s my master script to handle the set up of a new Ubuntu desktop machine:</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash"><span class="cp">#!/bin/bash
</span><span class="cp"></span>
./symlink.sh
./aptinstall.sh
./programs.sh
./desktop.sh
<span class="c1"># Get all upgrades</span>
sudo apt upgrade -y
<span class="c1"># See our bash changes</span>
<span class="nb">source</span> ~/.bashrc
<span class="c1"># Fun hello</span>
figlet <span class="s2">&#34;... and we&#39;re back!&#34;</span> <span class="p">|</span> lolcat
</code></pre></div><p>I threw in the upgrade line for good measure. It will make sure that the programs installed on our fresh desktop have the latest updates. Now a simple, single bash command will take care of everything!</p>
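<p>If you&rsquo;d like to run only some of the modular scripts, or stop as soon as one of them fails, a small wrapper is one option. This is a sketch rather than part of my setup: the function name <code>run_steps</code> is hypothetical, and it assumes the scripts are executable and sit in the current directory.</p>

```bash
#!/usr/bin/env bash
# Sketch only: run the named setup scripts in order, stopping at the first failure.
run_steps() {
  local step
  for step in "$@"; do
    echo "Running $step"
    "./$step.sh" || { echo "$step failed" >&2; return 1; }
  done
}

# Example use: everything for a desktop, or only the parts a VPS needs
# run_steps symlink aptinstall programs desktop
# run_steps symlink aptinstall
```

<p>Stopping early means a failed install doesn&rsquo;t get buried under the output of the scripts that follow it.</p>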
<p>You may have noticed that, while our desktop now looks and runs familiarly, these scripts don&rsquo;t cover one very important area: our files. Hopefully, you have a back up method for those that involves some form of reliable external hardware. If not, and if you tend to put your work in external repository hosts like GitHub or GitLab, I do have a way to <a href="https://victoria.dev/blog/how-to-write-bash-one-liners-for-cloning-and-managing-github-and-gitlab-repositories/">automatically clone and back up your GitHub repositories with bash one-liners</a>.</p>
<p>Relying on external repository hosts doesn&rsquo;t offer 100% coverage, however. Files that you wouldn&rsquo;t put in an externally hosted repository (private or otherwise) consequently can&rsquo;t be pulled. Git ignored objects that can&rsquo;t be generated from included files, like private keys and secrets, will not be recreated. Those files, however, are likely small enough that you could fit a whole bunch on a couple encrypted USB flash drives (and if you don&rsquo;t have private key backups, maybe you ought to do that first?).</p>
<p>That said, I hope this post has given you at least some inspiration as to how dotfiles and bash scripts can help to automate setting up a fresh desktop. If you come up with some settings you find useful, please help others discover them by sharing your dotfiles, too!</p>
How to write Bash one-liners for cloning and managing GitHub and GitLab repositorieshttps://victoria.dev/blog/how-to-write-bash-one-liners-for-cloning-and-managing-github-and-gitlab-repositories/
Tue, 06 Aug 2019 10:55:19 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/how-to-write-bash-one-liners-for-cloning-and-managing-github-and-gitlab-repositories/Using xargs and awk in Bash to automate managing remote-hosted repositories.
]]>
<p>Few things are more satisfying to me than one elegant line of Bash that automates hours of tedious work. As part of some recent explorations into <a href="https://victoria.dev/blog/how-to-set-up-a-fresh-ubuntu-desktop-using-only-dotfiles-and-bash-scripts/">automatically re-creating my laptop with Bash scripts</a>, I wanted to find a way to easily clone my GitHub-hosted repositories to a new machine. After a bit of digging around, I wrote a one-liner that did just that. Then, in the spirit of not putting all our eggs in the same basket, I wrote another one-liner to automatically create and push to GitLab-hosted backups as well. Here they are.</p>
<h1 id="a-bash-one-liner-to-clone-all-your-github-repositories">A Bash one-liner to clone all your GitHub repositories</h1>
<p>Caveat: you&rsquo;ll need a list of the GitHub repositories you want to clone. The good thing about that is it gives you full agency to choose just the repositories you want on your machine, instead of going in whole-hog.</p>
<p>You can easily clone GitHub repositories without entering your password each time by using HTTPS with your <a href="https://help.github.com/en/articles/caching-your-github-password-in-git">15-minute cached credentials</a> or, my preferred method, by <a href="https://help.github.com/en/articles/connecting-to-github-with-ssh">connecting to GitHub with SSH</a>. For brevity I&rsquo;ll assume we&rsquo;re going with the latter, and our SSH keys are set up.</p>
<p>Given a list of GitHub URLs in the file <code>gh-repos.txt</code>, like this:</p>
<div class="highlight"><pre class="chroma"><code class="language-txt" data-lang="txt">git@github.com:username/first-repository.git
git@github.com:username/second-repository.git
git@github.com:username/third-repository.git
</code></pre></div><p>We run:</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash">xargs -n1 git clone &lt; gh-repos.txt
</code></pre></div><p>This clones all the repositories on the list into the current folder. This same one-liner works for GitLab repositories as well, if you substitute the appropriate URLs.</p>
<h2 id="whats-going-on-here">What&rsquo;s going on here?</h2>
<p>There are two halves to this one-liner: the input, counterintuitively on the right side, and the part that makes stuff happen, on the left. We could make the order of these parts more intuitive (maybe?) by writing the same command like this:</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash">&lt;gh-repos.txt xargs -n1 git clone
</code></pre></div><p>To run a command for each line of our input, <code>gh-repos.txt</code>, we use <code>xargs -n1</code>. The tool <code>xargs</code> reads items from its input and runs the command we give it with those items appended as arguments (it runs <code>echo</code> if we don&rsquo;t give it a command). By default, it treats both spaces and newlines as item separators; putting one URL per line makes our list easier to read. The flag <code>-n1</code> tells <code>xargs</code> to use at most <code>1</code> argument, or in our case, one line, per command. We build our command with <code>git clone</code>, which <code>xargs</code> then executes for each line. Ta-da.</p>
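<p>If you&rsquo;d like to check what <code>xargs</code> is going to do before it clones anything, putting <code>echo</code> in front of the command prints each one instead of running it. Here&rsquo;s a sketch using a temporary file as a stand-in for <code>gh-repos.txt</code>; the variable names are mine.</p>

```bash
#!/usr/bin/env bash
# Sketch only: preview the commands xargs would run, without cloning anything.
repos_file=$(mktemp)   # stand-in for gh-repos.txt
printf '%s\n' \
  'git@github.com:username/first-repository.git' \
  'git@github.com:username/second-repository.git' > "$repos_file"

# echo prints each "git clone <url>" command instead of executing it
preview=$(xargs -n1 echo git clone < "$repos_file")
printf '%s\n' "$preview"

rm -f "$repos_file"
```

<p>Once the printed commands look right, removing <code>echo</code> runs them for real.</p>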
<h1 id="a-bash-one-liner-to-create-and-push-many-repositories-on-gitlab">A Bash one-liner to create and push many repositories on GitLab</h1>
<p>GitLab, unlike GitHub, lets us do this nifty thing where we don&rsquo;t have to use the website to make a new repository first. We can <a href="https://gitlab.com/help/gitlab-basics/create-project#push-to-create-a-new-project">create a new GitLab repository from our terminal</a>. The newly created repository defaults to being set as Private, so if we want to make it Public on GitLab, we&rsquo;ll have to do that manually later.</p>
<p>The GitLab docs tell us to push to create a new project using <code>git push --set-upstream</code>, but I don&rsquo;t find this to be very convenient for using GitLab as a backup. As I work with my repositories in the future, I&rsquo;d like to run one command that pushes to both GitHub <em>and</em> GitLab without additional effort on my part.</p>
<p>To make this Bash one-liner work, we&rsquo;ll also need a list of repository URLs for GitLab (ones that don&rsquo;t exist yet). We can easily do this by copying our GitHub repository list, opening it up with Vim, and doing a <a href="https://vim.fandom.com/wiki/Search_and_replace">search-and-replace</a>:</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash">cp gh-repos.txt gl-repos.txt
vim gl-repos.txt
:%s/<span class="se">\&lt;</span>github<span class="se">\&gt;</span>/gitlab/g
:wq
</code></pre></div><p>This produces <code>gl-repos.txt</code>, which looks like:</p>
<div class="highlight"><pre class="chroma"><code class="language-txt" data-lang="txt">git@gitlab.com:username/first-repository.git
git@gitlab.com:username/second-repository.git
git@gitlab.com:username/third-repository.git
</code></pre></div><p>We can create these repositories on GitLab, add the URLs as remotes, and push our code to the new repositories by running:</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash">awk -F<span class="s1">&#39;\/|(\.git)&#39;</span> <span class="s1">&#39;{system(&#34;cd ~/FULL/PATH/&#34; $2 &#34; &amp;&amp; git remote set-url origin --add &#34; $0 &#34; &amp;&amp; git push&#34;)}&#39;</span> gl-repos.txt
</code></pre></div><p>Hang tight and I&rsquo;ll explain it; for now, take note that <code>~/FULL/PATH/</code> should be the full path to the directory containing our GitHub repositories.</p>
<p>We do have to make note of a couple assumptions:</p>
<ol>
<li>The name of the directory on your local machine that contains the repository is the same as the name of the repository in the URL (this will be the case if it was cloned with the one-liner above);</li>
<li>Each repository is currently checked out to the branch you want pushed, e.g. <code>master</code>.</li>
</ol>
<p>The one-liner could be expanded to handle these assumptions, but it is the humble opinion of the author that at that point, we really ought to be writing a Bash script.</p>
<h2 id="whats-going-on-here-1">What&rsquo;s going on here?</h2>
<p>Our Bash one-liner uses each line (or URL) in the <code>gl-repos.txt</code> file as input. With <code>awk</code>, it splits off the name of the directory containing the repository on our local machine, and uses these pieces of information to build our larger command. If we were to <code>print</code> the output of <code>awk</code>, we&rsquo;d see:</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash"><span class="nb">cd</span> ~/FULL/PATH/first-repository <span class="o">&amp;&amp;</span> git remote set-url origin --add git@gitlab.com:username/first-repository.git <span class="o">&amp;&amp;</span> git push
<span class="nb">cd</span> ~/FULL/PATH/second-repository <span class="o">&amp;&amp;</span> git remote set-url origin --add git@gitlab.com:username/second-repository.git <span class="o">&amp;&amp;</span> git push
<span class="nb">cd</span> ~/FULL/PATH/third-repository <span class="o">&amp;&amp;</span> git remote set-url origin --add git@gitlab.com:username/third-repository.git <span class="o">&amp;&amp;</span> git push
</code></pre></div><p>Let&rsquo;s look at how we build this command.</p>
<h3 id="splitting-strings-with-awk">Splitting strings with <code>awk</code></h3>
<p>The tool <code>awk</code> can split input based on <a href="https://www.gnu.org/software/gawk/manual/html_node/Command-Line-Field-Separator.html">field separators</a>. The default separator is a whitespace character, but we can change this by passing the <code>-F</code> flag. Besides single characters, we can also use a <a href="https://www.gnu.org/software/gawk/manual/html_node/Regexp-Field-Splitting.html#Regexp-Field-Splitting">regular expression field separator</a>. Since our repository URLs have a set format, we can grab the repository names by asking for the substring between the slash character <code>/</code> and the end of the URL, <code>.git</code>.</p>
<p>One way to accomplish this is with our regex <code>\/|(\.git)</code>:</p>
<ul>
<li><code>\/</code> is an escaped <code>/</code> character;</li>
<li><code>|</code> means &ldquo;or&rdquo;, telling awk to match either expression;</li>
<li><code>(\.git)</code> is the group that matches the &ldquo;.git&rdquo; at the end of our URL, with an escaped <code>.</code> character. This is a bit of a cheat, as &ldquo;.git&rdquo; isn&rsquo;t strictly splitting anything (there&rsquo;s nothing on the other side) but it&rsquo;s an easy way for us to take this bit off.</li>
</ul>
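<p>To check a separator like this against a single URL before using it on the whole list, we can ask <code>awk</code> to print each piece it ends up with. This is a sketch with a hypothetical function name, <code>show_pieces</code>. I&rsquo;ve written the dot as <code>[.]</code> rather than <code>\.</code> here because, as far as I know, backslash handling in <code>-F</code> values can differ between <code>awk</code> implementations; that spelling is my substitution, not the one used in the one-liner.</p>

```bash
#!/usr/bin/env bash
# Sketch only: print the numbered, non-empty pieces awk produces for one URL.
show_pieces() {
  printf '%s\n' "$1" |
    awk -F'/|[.]git' '{ for (i = 1; i <= NF; i++) if ($i != "") print i ": " $i }'
}

show_pieces 'git@gitlab.com:username/first-repository.git'
# prints:
# 1: git@gitlab.com:username
# 2: first-repository
```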
<p>Once we&rsquo;ve told <code>awk</code> where to split, we can grab the right substring with the <a href="https://www.gnu.org/software/gawk/manual/html_node/Fields.html#index-_0024-_0028dollar-sign_0029_002c-_0024-field-operator">field operator</a>. We refer to our fields with a <code>$</code> character, then by the field&rsquo;s column number. In our example, we want the second field, <code>$2</code>. Here&rsquo;s what all the substrings look like:</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash">1: git@gitlab.com:username
2: first-repository
</code></pre></div><p>To use the whole string, or in our case, the whole URL, we use the field operator <code>$0</code>. To write the command, we just substitute the field operators for the repository name and URL. Running this with <code>print</code> as we&rsquo;re building it can help to make sure we&rsquo;ve got all the spaces right.</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash">awk -F<span class="s1">&#39;\/|(\.git)&#39;</span> <span class="s1">&#39;{print &#34;cd ~/FULL/PATH/&#34; $2 &#34; &amp;&amp; git remote set-url origin --add &#34; $0 &#34; &amp;&amp; git push&#34;}&#39;</span> gl-repos.txt
</code></pre></div><h3 id="running-the-command">Running the command</h3>
<p>We build our command inside the parentheses of <code>system()</code>. By using this as the output of <code>awk</code>, each command will run as soon as it is built and output. The <code>system()</code> function creates a <a href="https://en.wikipedia.org/wiki/Child_process">child process</a> that executes our command, then returns once the command is completed. In plain English, this lets us perform the Git commands on each repository, one-by-one, without breaking from our main process in which <code>awk</code> is doing things with our input file. Here&rsquo;s our final command again, all put together.</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash">awk -F<span class="s1">&#39;\/|(\.git)&#39;</span> <span class="s1">&#39;{system(&#34;cd ~/FULL/PATH/&#34; $2 &#34; &amp;&amp; git remote set-url origin --add &#34; $0 &#34; &amp;&amp; git push&#34;)}&#39;</span> gl-repos.txt
</code></pre></div><h3 id="using-our-backups">Using our backups</h3>
<p>By adding the GitLab URLs as remotes, we&rsquo;ve simplified the process of pushing to both externally hosted repositories. If we run <code>git remote -v</code> in one of our repository directories, we&rsquo;ll see:</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash">origin git@github.com:username/first-repository.git <span class="o">(</span>fetch<span class="o">)</span>
origin git@github.com:username/first-repository.git <span class="o">(</span>push<span class="o">)</span>
origin git@gitlab.com:username/first-repository.git <span class="o">(</span>push<span class="o">)</span>
</code></pre></div><p>Now, simply running <code>git push</code> without arguments will push the current branch to both remote repositories.</p>
<p>We should also note that <code>git pull</code> will generally only try to pull from the remote repository you originally cloned from (the URL marked <code>(fetch)</code> in our example above). Pulling from multiple Git repositories at the same time is possible, but complicated, and beyond the scope of this post. Here&rsquo;s an <a href="https://astrofloyd.wordpress.com/2015/05/05/git-pushing-to-and-pulling-from-multiple-remote-locations-remote-url-and-pushurl/">explanation of pushing and pulling to multiple remotes</a> to help get you started, if you&rsquo;re curious. The <a href="https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes">Git documentation on remotes</a> may also be helpful.</p>
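<p>To see how a second URL on <code>origin</code> is stored without touching a real repository, we can try it in a throwaway one. This sketch assumes a reasonably recent Git; the variable name <code>demo_repo</code> is mine, and nothing is contacted over the network because we never fetch or push.</p>

```bash
#!/usr/bin/env bash
# Sketch only: show how a second URL on "origin" appears in `git remote -v`.
demo_repo=$(mktemp -d)   # throwaway repository
git -C "$demo_repo" init -q
git -C "$demo_repo" remote add origin git@github.com:username/first-repository.git
git -C "$demo_repo" remote set-url --add origin git@gitlab.com:username/first-repository.git

# Lists one (fetch) line and two (push) lines for origin
git -C "$demo_repo" remote -v

# Both URLs are stored under the same configuration key
git -C "$demo_repo" config --get-all remote.origin.url
```

<p>The first URL listed is the one used for fetching, which is why <code>git pull</code> keeps talking to GitHub.</p>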
<h1 id="to-elaborate-on-the-succinctness-of-bash-one-liners">To elaborate on the succinctness of Bash one-liners</h1>
<p>Bash one-liners, when understood, can be fun and handy shortcuts. At the very least, being aware of tools like <code>xargs</code> and <code>awk</code> can help to automate and alleviate a lot of tediousness in our work. However, there are some downsides.</p>
<p>In terms of an easy-to-understand, maintainable, and approachable tool, Bash one-liners suck. They&rsquo;re usually more complicated to write than a Bash script using <code>if</code> or <code>while</code> loops, and certainly more complicated to read. It&rsquo;s likely that when we write them, we&rsquo;ll miss a single quote or closing parenthesis somewhere; and as I hope this post demonstrates, they can take quite a bit of explaining, too. So why use them?</p>
<p>Imagine reading a recipe for baking a cake, step by step. You understand the methods and ingredients, and gather your supplies. Then, as you think about it, you begin to realize that if you just throw all the ingredients at the oven in precisely the right order, a cake will instantly materialize. You try it, and it works!</p>
<p>That would be pretty satisfying, wouldn&rsquo;t it?</p>
A quick guide to changing your GitHub usernamehttps://victoria.dev/blog/a-quick-guide-to-changing-your-github-username/
Sun, 28 Jul 2019 15:19:13 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/a-quick-guide-to-changing-your-github-username/Some additional steps to consider after making a change to your username on GitHub.
]]>
<p>This being the 2,38947234th and probably last time I&rsquo;ll change my username, (marriage is permanent, right?) I thought I&rsquo;d better write a quick post on how this transition can be achieved as smoothly as possible. You can read <a href="https://help.github.com/en/articles/changing-your-github-username">official instructions on how to change your GitHub username</a> here, and they will tell you how to do it and what happens. The following is a quick guide to some things to consider <em>afterwards.</em></p>
<h2 id="where-to-make-changes">Where to make changes</h2>
<ol>
<li>Change username in <a href="https://github.com/settings/admin">GitHub account settings.</a></li>
<li>If using GitHub Pages, change name of your &ldquo;username.github.io&rdquo; repository.</li>
<li>If using other services that point to your &ldquo;username.github.io&rdquo; repository address, update them.</li>
<li>If using Netlify, you <em>may</em> want to sign in and reconnect your repositories. (Mine still worked, but due to a possibly unrelated issue, I&rsquo;m not positive.)</li>
<li>Sign in to Travis CI and other integrations (find them in your repository Settings tab -&gt; Integrations &amp; services). This will update your username there.</li>
<li>Update your local files and repository links with <em>very carefully executed</em> <code>find</code> and <code>sed</code> commands, and push back changes to GitHub.</li>
<li>Redeploy any websites you may have with your updated GitHub link.</li>
<li>Fix any links around the web to your profile, your repositories, or Gists you may have shared.</li>
</ol>
<h2 id="local-file-updates">Local file updates</h2>
<p>Here are some suggestions for strings to search and replace your username in.</p>
<ul>
<li><code>github.com/username</code> (References to your GitHub page in READMEs or in website copy)</li>
<li><code>username.github.io</code> (Links to your GitHub Page)</li>
<li><code>git@github.com:username</code> (Git config remote ssh urls)</li>
<li><code>travis-ci.com/username</code> (Travis badges in READMEs)</li>
<li><code>shields.io/github/.../username</code> (Shields badges in READMEs, types include <code>contributors</code>, <code>stars</code>, <code>tags</code>, and more)</li>
</ul>
<p>You can quickly identify where the above strings are located using this command for each string:</p>
<p><code>grep -rnw -e 'foobar'</code></p>
<p>This will recursively (<code>r</code>) search all files under the current directory for whole-word (<code>w</code>) matches of the pattern (<code>e</code>) provided, and prefix results with their line numbers (<code>n</code>) so you can easily find them.</p>
<p>Using <code>find</code> and <code>sed</code> can make these changes much faster. See <a href="https://victoria.dev/blog/how-to-replace-a-string-in-a-dozen-old-blog-posts-with-one-sed-terminal-command/">this article on search and replace</a>.</p>
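<p>As a rough idea of what a careful replacement could look like, here&rsquo;s a sketch that uses <code>grep</code> to list the files containing a string and <code>sed</code> to edit only those files. The function name <code>replace_in_files</code> is hypothetical, and the sketch assumes GNU <code>grep</code> and GNU <code>sed</code> (as on Ubuntu) and that neither string contains a <code>|</code> character.</p>

```bash
#!/usr/bin/env bash
# Sketch only: replace one string with another in every file that contains it.
# Assumes GNU grep and GNU sed, and no "|" in either string.
replace_in_files() {
  local old="$1" new="$2" dir="$3"
  # -r recurse, -l list file names only, -F treat the pattern as plain text
  grep -rlF --exclude-dir=.git -e "$old" "$dir" |
    while IFS= read -r file; do
      # Note: sed reads "$old" as a regular expression, so "." matches any character
      sed -i "s|$old|$new|g" "$file"
    done
}

# Example use (hypothetical usernames and path):
# replace_in_files 'github.com/oldname' 'github.com/newname' ~/github
```

<p>Running the <code>grep</code> command on its own first, and committing any pending work beforehand, makes it much easier to undo a replacement that went further than intended.</p>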
<p>Enjoy your new handle! (I hope it sticks.)</p>
Two ways to deploy a public GitHub Pages site from a private Hugo repositoryhttps://victoria.dev/blog/two-ways-to-deploy-a-public-github-pages-site-from-a-private-hugo-repository/
Mon, 22 Apr 2019 10:05:15 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/two-ways-to-deploy-a-public-github-pages-site-from-a-private-hugo-repository/Keep your drafts out of the public eye by making use of continuous deployment tools to publish your public GitHub Pages site - from a separate private repository.
]]>
<p>Tools like Travis CI and Netlify offer some pretty nifty features, like seamlessly deploying your GitHub Pages site when changes are pushed to its repository. Along with a static site generator like Hugo, keeping a blog up to date is pretty painless.</p>
<p>I&rsquo;ve used Hugo to build my site for years, but until this past week I&rsquo;d never hooked up my Pages repository to any deployment service. Why? Because using a tool that built my site before deploying it seemed to require having the whole recipe in one place - and if you&rsquo;re using GitHub Pages with the free version of GitHub, <a href="https://help.github.com/en/articles/configuring-a-publishing-source-for-github-pages">that place is public</a>. That means that all my three-in-the-morning bright ideas and messy unfinished (and unfunny) drafts would be publicly available - and no amount of continuous convenience was going to convince me to do that.</p>
<p>So I kept things separated, with Hugo&rsquo;s messy behind-the-scenes stuff in a local Git repository, and the generated <code>public/</code> folder pushing to my GitHub Pages remote repository. Each time I wanted to deploy my site, I&rsquo;d have to get on my laptop and <code>hugo</code> to build my site, then <code>cd public/ &amp;&amp; git add . &amp;&amp; git commit</code>&hellip; etc etc. And all was well, except for the nagging feeling that there was a better way to do this.</p>
<p>I wrote another article a little while back about <a href="https://victoria.dev/blog/a-remote-sync-solution-for-ios-and-linux-git-and-working-copy/">using GitHub and Working Copy</a> to make changes to my repositories on my iPad whenever I&rsquo;m out and about. It seemed off to me that I could do everything except deploy my site from my iPad, so I set out to change that.</p>
<p>A couple three-in-the-morning bright ideas and a revoked access token later (oops), I now have not one but <em>two</em> ways to deploy to my public GitHub Pages repository from an entirely separated, private GitHub repository. In this post, I&rsquo;ll take you through achieving this with <a href="https://travis-ci.com/">Travis CI</a> or using <a href="http://netlify.com/">Netlify</a> and <a href="https://www.gnu.org/software/make/">Make</a>.</p>
<p>There&rsquo;s nothing hackish about it - my public GitHub Pages repository still looks the same as it did when I pushed to it locally from my terminal. Only now, I&rsquo;m able to take advantage of a couple of great deployment tools to have the site update whenever I push to my private repo, whether I&rsquo;m on my laptop or out and about with my iPad.</p>
<figure>
<img src="im-on-a-bridge.jpg"
alt="Hashtag: you did not push from there"/> <figcaption>
<p>#YouDidNotPushFromThere</p>
</figcaption>
</figure>
<p>This article assumes you have working knowledge of Git and GitHub Pages. If not, you may like to spin off some browser tabs from my articles on <a href="https://victoria.dev/blog/a-remote-sync-solution-for-ios-and-linux-git-and-working-copy/">using GitHub and Working Copy</a> and <a href="https://victoria.dev/blog/how-i-ditched-wordpress-and-set-up-my-custom-domain-https-site-for-almost-free/">building a site with Hugo and GitHub Pages</a> first.</p>
<p>Let&rsquo;s do it!</p>
<h1 id="private-to-public-github-pages-deployment-with-travis-ci">Private-to-public GitHub Pages deployment with Travis CI</h1>
<p>Travis CI has the built-in ability (♪) to <a href="https://docs.travis-ci.com/user/deployment/pages/">deploy to GitHub Pages</a> following a successful build. They do a decent job in the docs of explaining how to add this feature, especially if you&rsquo;ve used Travis CI before&hellip; which I haven&rsquo;t. Don&rsquo;t worry, I did the bulk of the figuring-things-out for you.</p>
<ul>
<li>Travis CI gets all its instructions from a configuration file in the root of your repository called <code>.travis.yml</code></li>
<li>You need to provide a <a href="https://help.github.com/en/articles/creating-a-personal-access-token-for-the-command-line">GitHub personal access token</a> as a secure encrypted variable, which you can generate using <code>travis</code> on the command line</li>
<li>Once your script successfully finishes doing what you&rsquo;ve told it to do (not necessarily what you <em>want</em> it to do but that&rsquo;s a whole other blog post), Travis will deploy your build directory to a repository you can specify with the <code>repo</code> configuration variable.</li>
</ul>
<h2 id="setting-up-the-travis-configuration-file">Setting up the Travis configuration file</h2>
<p>Create a new configuration file for Travis with the filename <code>.travis.yml</code> (note the leading &ldquo;.&rdquo;). These scripts are very customizable and I struggled to find a relevant example to use as a starting point - luckily, you don&rsquo;t have that problem!</p>
<p>Here&rsquo;s my basic <code>.travis.yml</code>:</p>
<div class="highlight"><pre class="chroma"><code class="language-yml" data-lang="yml"><span class="k">git</span><span class="p">:</span><span class="w">
</span><span class="w">  </span><span class="k">depth</span><span class="p">:</span><span class="w"> </span><span class="kc">false</span><span class="w">
</span><span class="w">
</span><span class="w"></span><span class="k">env</span><span class="p">:</span><span class="w">
</span><span class="w">  </span><span class="k">global</span><span class="p">:</span><span class="w">
</span><span class="w">    </span>- HUGO_VERSION=<span class="s2">&#34;0.54.0&#34;</span><span class="w">
</span><span class="w">  </span><span class="k">matrix</span><span class="p">:</span><span class="w">
</span><span class="w">    </span>- YOUR_ENCRYPTED_VARIABLE<span class="w">
</span><span class="w">
</span><span class="w"></span><span class="k">install</span><span class="p">:</span><span class="w">
</span><span class="w">  </span>- wget<span class="w"> </span>-q<span class="w"> </span>https<span class="p">:</span>//github.com/gohugoio/hugo/releases/download/v${HUGO_VERSION}/hugo_${HUGO_VERSION}_Linux-64bit.tar.gz<span class="w">
</span><span class="w">  </span>- tar<span class="w"> </span>xf<span class="w"> </span>hugo_${HUGO_VERSION}_Linux-64bit.tar.gz<span class="w">
</span><span class="w">  </span>- mv<span class="w"> </span>hugo<span class="w"> </span>~/bin/<span class="w">
</span><span class="w">
</span><span class="w"></span><span class="k">script</span><span class="p">:</span><span class="w">
</span><span class="w">  </span>- hugo<span class="w"> </span>--gc<span class="w"> </span>--minify<span class="w">
</span><span class="w">
</span><span class="w"></span><span class="k">deploy</span><span class="p">:</span><span class="w">
</span><span class="w">  </span><span class="k">provider</span><span class="p">:</span><span class="w"> </span>pages<span class="w">
</span><span class="w">  </span><span class="k">skip-cleanup</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span><span class="w">  </span><span class="k">github-token</span><span class="p">:</span><span class="w"> </span>$GITHUB_TOKEN<span class="w">
</span><span class="w">  </span><span class="k">keep-history</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span><span class="w">  </span><span class="k">local-dir</span><span class="p">:</span><span class="w"> </span>public<span class="w">
</span><span class="w">  </span><span class="k">repo</span><span class="p">:</span><span class="w"> </span>gh-username/gh-username.github.io<span class="w">
</span><span class="w">  </span><span class="k">target-branch</span><span class="p">:</span><span class="w"> </span>master<span class="w">
</span><span class="w">  </span><span class="k">verbose</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span><span class="w">  </span><span class="k">on</span><span class="p">:</span><span class="w">
</span><span class="w">    </span><span class="k">branch</span><span class="p">:</span><span class="w"> </span>master<span class="w">
</span></code></pre></div><p>This script downloads and installs Hugo, builds the site with the garbage collection and minify <a href="https://gohugo.io/commands/hugo/#synopsis">flags</a>, then deploys the <code>public/</code> directory to the specified <code>repo</code> - in this example, your public GitHub Pages repository. You can read about each of the <code>deploy</code> configuration options <a href="https://docs.travis-ci.com/user/deployment/pages/#further-configuration">here</a>.</p>
<p>To <a href="https://docs.travis-ci.com/user/environment-variables#defining-encrypted-variables-in-travisyml">add the GitHub personal access token as an encrypted variable</a>, you don&rsquo;t need to manually edit your <code>.travis.yml</code>. The <code>travis</code> gem commands below will encrypt and add the variable for you when you run them in your repository directory.</p>
<p>First, install <code>travis</code> with <code>sudo gem install travis</code>.</p>
<p>Then <a href="https://help.github.com/en/articles/creating-a-personal-access-token-for-the-command-line">generate your GitHub personal access token</a>, copy it (it only shows up once!) and run the commands below in your repository root, substituting your token for the kisses:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">travis login --pro --github-token xxxxxxxxxxxxxxxxxxxxxxxxxxx
travis encrypt <span class="nv">GITHUB_TOKEN</span><span class="o">=</span>xxxxxxxxxxxxxxxxxxxxxxxxxxx --add env.matrix
</code></pre></div><p>Your encrypted token magically appears in the file. Once you&rsquo;ve committed <code>.travis.yml</code> to your private Hugo repository, Travis CI will run the script and if the build succeeds, will deploy your site to your public GitHub Pages repo. Magic!</p>
<p>Travis will run a build each time you push to your private repository. If you don&rsquo;t want a particular commit to trigger a build, <a href="https://docs.travis-ci.com/user/customizing-the-build/#skipping-a-build">add <code>[skip ci]</code> to your commit message</a>.</p>
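<p>As a rough illustration, here&rsquo;s what a commit that Travis CI should skip might look like. The throwaway repository, file name, and commit message are made up for the example; the only part that matters is the <code>[skip ci]</code> marker in the message.</p>

```sh
# Hypothetical throwaway repository, used only to show the commit message format.
repo_dir="$(mktemp -d)"
cd "$repo_dir"
git init -q
git config user.email "you@youremail.com"
git config user.name "Your Name"

echo "draft" > notes.md
git add notes.md

# The [skip ci] marker in the message asks Travis CI not to build this commit.
git commit -q -m "Tidy up draft notes [skip ci]"

# Print the subject line of the commit we just made.
last_message="$(git log -1 --pretty=%s)"
echo "$last_message"
```

<p>In your real repository you&rsquo;d only need the <code>git commit</code> line, with your own message in front of the marker.</p>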
<p><em>Yo that&rsquo;s cool but I like Netlify.</em></p>
<p>Okay fine.</p>
<h1 id="deploying-to-a-separate-repository-with-netlify-and-make">Deploying to a separate repository with Netlify and Make</h1>
<p>We can get Netlify to do our bidding by using a Makefile, which we&rsquo;ll run with Netlify&rsquo;s build command.</p>
<p>Here&rsquo;s what our <code>Makefile</code> looks like:</p>
<div class="highlight"><pre class="chroma"><code class="language-makefile" data-lang="makefile"><span class="nv">SHELL</span><span class="o">:=</span>/bin/bash
<span class="nv">BASEDIR</span><span class="o">=</span><span class="k">$(</span>CURDIR<span class="k">)</span>
<span class="nv">OUTPUTDIR</span><span class="o">=</span>public

<span class="nf">.PHONY</span><span class="o">:</span> <span class="n">all</span>
<span class="nf">all</span><span class="o">:</span> <span class="n">clean</span> <span class="n">get_repository</span> <span class="n">build</span> <span class="n">deploy</span>

<span class="nf">.PHONY</span><span class="o">:</span> <span class="n">clean</span>
<span class="nf">clean</span><span class="o">:</span>
	@echo <span class="s2">&#34;Removing public directory&#34;</span>
	rm -rf <span class="k">$(</span>BASEDIR<span class="k">)</span>/<span class="k">$(</span>OUTPUTDIR<span class="k">)</span>

<span class="nf">.PHONY</span><span class="o">:</span> <span class="n">get_repository</span>
<span class="nf">get_repository</span><span class="o">:</span>
	@echo <span class="s2">&#34;Getting public repository&#34;</span>
	git clone https://github.com/gh-username/gh-username.github.io.git public

<span class="nf">.PHONY</span><span class="o">:</span> <span class="n">build</span>
<span class="nf">build</span><span class="o">:</span>
	@echo <span class="s2">&#34;Generating site&#34;</span>
	hugo --gc --minify

<span class="nf">.PHONY</span><span class="o">:</span> <span class="n">deploy</span>
<span class="nf">deploy</span><span class="o">:</span>
	@echo <span class="s2">&#34;Preparing commit&#34;</span>
	@cd <span class="k">$(</span>OUTPUTDIR<span class="k">)</span> <span class="se">\
</span><span class="se"></span>	 <span class="o">&amp;&amp;</span> git config user.email <span class="s2">&#34;you@youremail.com&#34;</span> <span class="se">\
</span><span class="se"></span>	 <span class="o">&amp;&amp;</span> git config user.name <span class="s2">&#34;Your Name&#34;</span> <span class="se">\
</span><span class="se"></span>	 <span class="o">&amp;&amp;</span> git add . <span class="se">\
</span><span class="se"></span>	 <span class="o">&amp;&amp;</span> git status <span class="se">\
</span><span class="se"></span>	 <span class="o">&amp;&amp;</span> git commit -m <span class="s2">&#34;Deploy via Makefile&#34;</span> <span class="se">\
</span><span class="se"></span>	 <span class="o">&amp;&amp;</span> git push -f -q https://<span class="k">$(</span>GITHUB_TOKEN<span class="k">)</span>@github.com/gh-username/gh-username.github.io.git master
	@echo <span class="s2">&#34;Pushed to remote&#34;</span>
</code></pre></div><p>To preserve the Git history of our separate GitHub Pages repository, we&rsquo;ll first clone it, build our new Hugo site to it, and then push it back to the Pages repository. This script first removes any existing <code>public/</code> folder that might contain files or a Git history. It then clones our Pages repository to <code>public/</code>, builds our Hugo site (essentially updating the files in <code>public/</code>), then takes care of committing the new site to the Pages repository.</p>
<p>In the <code>deploy</code> section, you&rsquo;ll notice lines starting with <code>&amp;&amp;</code>. These are chained commands. Since Make <a href="https://www.gnu.org/software/make/manual/html_node/Execution.html#Execution">invokes a new sub-shell for each line</a>, it starts over with every new line from our root directory. To get our <code>cd</code> to stick and avoid running our Git commands in the project root directory, we&rsquo;re chaining the commands and using the backslash character to <a href="http://clarkgrubb.com/makefile-style-guide#breaking-long-lines">break long lines</a> for readability.</p>
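<p>If the sub-shell behaviour feels abstract, this small shell sketch mimics it without needing Make at all. Each <code>sh -c</code> call stands in for one recipe line; using <code>/tmp</code> as the target directory is just an assumption for the demo.</p>

```sh
start_dir="$(pwd)"

# Two separate shells, like two separate recipe lines: the cd is forgotten.
sh -c 'cd /tmp'
separate="$(sh -c 'pwd')"

# One shell running a chained command, like recipe lines joined with && and \.
chained="$(sh -c 'cd /tmp && pwd')"

echo "started in:             $start_dir"
echo "separate lines ran in:  $separate"
echo "chained command ran in: $chained"
```

<p>The second <code>pwd</code> reports the starting directory because the <code>cd</code> happened in a shell that had already exited, while the chained version stays in the directory it changed to.</p>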
<p>By chaining our commands, we&rsquo;re able to <a href="https://stackoverflow.com/questions/6116548/how-to-tell-git-to-use-the-correct-identity-name-and-email-for-a-given-project">configure our Git identity</a>, add all our updated files, and create a commit for our Pages repository.</p>
<p>As with Travis CI, we&rsquo;ll need to pass in a <a href="https://github.com/settings/tokens">GitHub personal access token</a> to push to our public GitHub Pages repository - only Netlify doesn&rsquo;t provide a straightforward way to encrypt the token in our Makefile.</p>
<p>Instead, we&rsquo;ll use Netlify&rsquo;s <a href="https://www.netlify.com/docs/continuous-deployment/#build-environment-variables">Build Environment Variables</a>, which live safely in our site settings in the Netlify app. We can then reference the token in the Makefile as <code>$(GITHUB_TOKEN)</code> and use it to push to our Pages repository by <a href="https://stackoverflow.com/questions/44773415/how-to-push-a-commit-to-github-from-a-circleci-build-using-a-personal-access-tok">passing it in the remote URL</a>.</p>
<p>Two things keep the token out of Netlify&rsquo;s logs: the push runs quietly with <code>-q</code>, and the leading <code>@</code> character suppresses <a href="https://www.gnu.org/software/make/manual/html_node/Echoing.html#Echoing">recipe echoing</a> for that line.</p>
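<p>If you&rsquo;d like the build to fail early when the variable is missing, a guard along these lines could run before the push. This is only a sketch: <code>check_token</code> is a hypothetical helper of my own naming, not something Netlify or Make provides, and it deliberately never prints the token itself.</p>

```sh
# Hypothetical helper: returns non-zero when GITHUB_TOKEN is unset or empty.
# It reports only whether the token is present, never its value.
check_token() {
  if [ -z "${GITHUB_TOKEN:-}" ]; then
    echo "GITHUB_TOKEN is not set; refusing to push" >&2
    return 1
  fi
  echo "GITHUB_TOKEN is set"
}

# Example call: print a hint instead of attempting a push without a token.
check_token || echo "Add GITHUB_TOKEN in your Netlify site settings"
```

<p>Called from a wrapper script or chained in front of the <code>git push</code>, it would stop the deploy step before Git tries to push with an empty credential.</p>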
<p>With your Makefile in the root of your private GitHub repository, you can set up Netlify to run it for you.</p>
<h2 id="setting-up-netlify">Setting up Netlify</h2>
<p>Getting set up with Netlify via the <a href="https://app.netlify.com/">web UI</a> is straightforward. Once you sign in with GitHub, choose the private GitHub repository where your Hugo site lives. The next page Netlify takes you to lets you enter deploy settings:</p>
<p><img src="netlify-new-site.png" alt="Create a new site page"></p>
<p>You can specify the build command that will run your Makefile (<code>make all</code> for this example). The branch to deploy and the publish directory don&rsquo;t matter too much in our specific case, since we&rsquo;re only concerned with pushing to a separate repository. You can enter the typical <code>master</code> deploy branch and <code>public</code> publish directory.</p>
<p>Under &ldquo;Advanced build settings&rdquo; click &ldquo;New variable&rdquo; to add your GitHub personal access token as a Build Environment Variable. In our example, the variable name is <code>GITHUB_TOKEN</code>. Click &ldquo;Deploy site&rdquo; to make the magic happen.</p>
<p>If you&rsquo;ve already previously set up your repository with Netlify, find the settings for Continuous Deployment under Settings &gt; Build &amp; deploy.</p>
<p>Netlify will build your site each time you push to the private repository. If you don&rsquo;t want a particular commit to trigger a build, <a href="https://www.netlify.com/docs/continuous-deployment/#skipping-a-deploy">add <code>[skip ci]</code> in your Git commit message</a>.</p>
<h2 id="same-same-but-different">Same same but different</h2>
<p>One effect of using Netlify this way is that your site will be built in two places: one is the separate, public GitHub Pages repository that the Makefile pushes to, and the other is your Netlify site that deploys on their CDN from your linked private GitHub repository. The latter is useful if you&rsquo;re going to play with <a href="https://www.netlify.com/blog/2016/07/20/introducing-deploy-previews-in-netlify/">Deploy Previews</a> and other Netlify features, but those are outside the scope of this post.</p>
<p>The main point is that your GitHub Pages site is now updated in your public repo. Yay!</p>
<h1 id="go-forth-and-deploy-fearlessly">Go forth and deploy fearlessly</h1>
<p>I hope the effect of this new information is that you feel more able to update your sites, wherever you happen to be. The possibilities are endless - at home on your couch with your laptop, out cafe-hopping with your iPad, or in the middle of a first date on your phone. Endless!</p>
<figure>
<img src="date-deploy.png"
alt="Don&#39;t update your site from your phone on a date"/> <figcaption>
<p>Don&rsquo;t do stuff on your phone when you&rsquo;re on a date. Not if you want a second one, anyway.</p>
</figcaption>
</figure>
A remote sync solution for iOS and Linux: Git and Working Copyhttps://victoria.dev/blog/a-remote-sync-solution-for-ios-and-linux-git-and-working-copy/
Fri, 15 Mar 2019 11:55:28 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/a-remote-sync-solution-for-ios-and-linux-git-and-working-copy/How to set up a cross-platform cloud sync solution for working anywhere using Git on iOS.
]]>
<p>I previously wrote about a (hackish) way to use a <a href="https://victoria.dev/blog/how-i-set-up-a-single-dropbox-folder-on-my-dual-boot-windows-and-linux-system/">single Dropbox folder on a dual-boot Windows and Linux machine</a>. I&rsquo;ve since <del>gained some sense</del> gone full Linux with Ubuntu 18.04 LTS, but the Dropbox setup seems to have stopped being an option in any case. Fortunately, I&rsquo;ve since found a much better (far less hackish) way to remote-sync files across different file systems. Reflecting my current setup, I&rsquo;m talking about iOS (iPad and iPhone) and my Linux machine.</p>
<p>The new sync system is based on Git, very customizable, and conveniently extensible. Beyond text files, you can sync anything that Git can (which is almost everything - if you want to edit your <code>.gitignore</code>d files on the go I&rsquo;m not sure I can help). If you&rsquo;re already familiar with Git, getting set up will be a walk in the park. If Git is new to you, I think these tools help make the concepts of Git cloning, pulling, and pushing straightforward to understand.</p>
<h1 id="components">Components</h1>
<ul>
<li><a href="https://workingcopy.app">Working Copy app</a> ($15.99 one-time pro-unlock and well worth it, iOS only)</li>
<li><a href="https://ia.net/writer">iA Writer app</a> ($8.99 one-time purchase for iOS, also available on Mac, Windows, and Android)</li>
<li>GitHub repositories (<a href="https://github.blog/2019-01-07-new-year-new-github/">private</a> or public, both free)</li>
</ul>
<p>I was inspired by <a href="https://www.macstories.net/ios/my-markdown-writing-and-collaboration-workflow-powered-by-working-copy-3-6-icloud-drive-and-github/">this article</a> as well as <a href="http://blog.joncairns.com/2011/10/how-to-use-git-submodules/">this one</a>.</p>
<h1 id="get-set-up">Get set up</h1>
<p>Here are the setup steps I&rsquo;ll walk you through in this article.</p>
<ol>
<li>Create your remote repository</li>
<li>Clone repository to iPad with Working Copy</li>
<li>Open and edit files with iA Writer</li>
<li>Push changes back to remote</li>
<li>Pull changes from repository on your computer</li>
</ol>
<p>This system is straightforward to set up whether you&rsquo;re a command line whiz or just getting into Git. Let&rsquo;s do it!</p>
<h2 id="create-your-remote-repository">Create your remote repository</h2>
<p>GitHub now offers free <a href="https://github.blog/2019-01-07-new-year-new-github/">private repositories</a> for up to three collaborators. Choose &ldquo;Private&rdquo; on GitHub&rsquo;s repository creation page:</p>
<p><img src="github-private-repo.png#screenshot" alt="Selection options for public and private repository"></p>
<p>Create the repository. If you&rsquo;d like to, you can follow GitHub&rsquo;s instructions to push some files to it from your computer, or you can add files later from your iPad.</p>
<h2 id="clone-repository-to-ipad-with-working-copy">Clone repository to iPad with Working Copy</h2>
<p>Download <a href="https://workingcopy.app">Working Copy</a> from the App Store. It&rsquo;s one of the more expensive apps I&rsquo;ve purchased, but I think it&rsquo;s well worth it. Developer <a href="https://twitter.com/palmin">Anders Borum</a> has a steady track record of frequent updates and incorporating the latest features for iOS apps, like <a href="https://workingcopy.app/manual/dragdrop">drag and drop</a> on iPad. I think he&rsquo;s fairly priced his product in light of the work he puts into maintaining and enhancing it.</p>
<p>In Working Copy, find the gear icon in the top left corner and touch to open Settings.</p>
<p><img src="workingcopy-settings.png#screenshot" alt="Settings menu in Working Copy"></p>
<p>Tap on SSH Keys, and you&rsquo;ll see this screen:</p>
<p><img src="workingcopy-ssh.png#screenshot" alt="SSH Key for Working Copy on iPad"></p>
<p>SSH keys, or Secure Shell keys, are access credentials used in the <a href="https://en.wikipedia.org/wiki/Secure_Shell">SSH protocol</a>. Your key works like a password that your device uses to securely connect with your remote repository host - GitHub, in our example. Since anyone with your SSH keys can potentially pretend to be you and gain access to your files, it&rsquo;s important not to share them accidentally, like in a screenshot on a blog post.</p>
<p>Tap on the second line that looks like &ldquo;WorkingCopy@iPad-xxxxxxxx&rdquo; to get this screen:</p>
<p><img src="workingcopy-ssh-connect.png#screenshot" alt="Connect to GitHub or Bitbucket in Working Copy"></p>
<p>Working Copy supports easy connection to both Bitbucket and GitHub. Tap &ldquo;Connect With GitHub&rdquo; or Bitbucket to bring up some familiar sign-in screens that will authorize Working Copy to access your account(s).</p>
<p>Once connected, tap the &ldquo;+&rdquo; symbol in the top right of the side bar to add a new repository. Choose &ldquo;Clone repository&rdquo; to bring up this screen:</p>
<p><img src="workingcopy-read-repos.png#screenshot" alt="Loading repositories from remote"></p>
<p>Here, you can either manually input the remote URL, or simply choose from the list of repositories that Working Copy fetches from your connected account. When you make your choice, the app clones the repository to your iPad and it will show up in the sidebar. You&rsquo;re connected!</p>
<h2 id="open-and-edit-files-with-ia-writer">Open and edit files with iA Writer</h2>
<p>One of the (many) reasons I adore <a href="https://ia.net/writer">iA Writer</a> is its ability to select your freshly cloned remote repository as a Library Location. To do this in the iA Writer app:</p>
<ol>
<li>From the main Library list, in the top right of the sidebar, tap &ldquo;Edit&rdquo;</li>
<li>Tap &ldquo;Add Location&hellip;&rdquo;</li>
<li>A helpful popup appears. Tap OK.</li>
<li>From the Working Copy location, tap &ldquo;Select&rdquo; in the top right, then choose the repository folder.</li>
<li>Tap &ldquo;Open&rdquo;, then &ldquo;Done&rdquo;</li>
</ol>
<p>Your remote repository now appears as a Location in the sidebar. Tap on it to work within this directory.</p>
<p>While inside this location, new files you create (by tapping the pencil-and-paper icon in the top right corner) will be saved to this folder locally. As you work, iA Writer automatically saves your progress. Next, we&rsquo;ll look at pushing those files and changes back to your remote.</p>
<h2 id="push-changes-back-to-remote">Push changes back to remote</h2>
<p>Once you&rsquo;ve made changes to your files, open Working Copy again. You should see a yellow dot on your changed repository.</p>
<p><img src="workingcopy-changed-repo.png#screenshot" alt="Yellow dot indicating changes to repository"></p>
<p>Tap on your repository name, then on &ldquo;Repository Status and Configuration&rdquo; at the top of the sidebar. Your changed files will be indicated by yellow dots or green &ldquo;+&rdquo; symbols. These mean that you&rsquo;ve modified or added files, respectively.</p>
<p>Working Copy is a sweet iOS Git client, and you can tap on your files to see additional information including a comparison of changes (&ldquo;diff&rdquo;) as well as status and Git history. You can even edit files right within the app, with <a href="https://workingcopyapp.com/manual/edit">syntax highlighting</a> for its many supported languages. For now, we&rsquo;ll look at how to push your changed work to your remote repository.</p>
<p><img src="workingcopy-changes-to-commit.png#screenshot" alt="Changes to commit"></p>
<p>On the &ldquo;Repository Status and Configuration&rdquo; page, you&rsquo;ll see right at the top that there are changes to be committed. If you&rsquo;re new to Git, this is like &ldquo;saving your changes&rdquo; to your Git history, something typically done with the terminal command <a href="https://git-scm.com/docs/git-commit"><code>git commit</code></a>. You can think of this as saving the files that we&rsquo;ll want to send to the GitHub repository. Tap &ldquo;Commit changes.&rdquo;</p>
<p><img src="workingcopy-commit-changes.png#screenshot" alt="Add a commit message and select files to commit"></p>
<p>Enter your commit message, and select the files you want to add. Turn on the &ldquo;Push&rdquo; switch to send everything to your remote repository when you commit the files. Then tap &ldquo;Commit.&rdquo;</p>
<p>You&rsquo;ll see a progress bar as your files are uploaded, and then a confirmation message on the status screen.</p>
<p><img src="workingcopy-commit-success.png#screenshot" alt="Commit success message"></p>
<p>Congratulations! Your changes are now present in your remote repository on GitHub. You&rsquo;ve successfully synced your files remotely!</p>
<h2 id="pull-changes-from-repository-on-your-computer">Pull changes from repository on your computer</h2>
<p>To bring your updated files full circle to your computer, you pull them from the GitHub repository. I prefer to use the terminal for this as it&rsquo;s quick and easy, but GitHub also offers a <a href="https://help.github.com/en/desktop/getting-started-with-github-desktop">graphical client</a> if terminal commands seem a little alien for now.</p>
<p>If you started with the GitHub repository, you can clone it to a folder on your computer by following <a href="https://help.github.com/en/articles/cloning-a-repository">these instructions</a>.</p>
<h2 id="staying-in-sync">Staying in sync</h2>
<p>When you update your work on your computer, you&rsquo;ll use Git to push your changes to the remote repository. To do this, you can use GitHub&rsquo;s <a href="https://help.github.com/en/desktop/getting-started-with-github-desktop">graphical client</a>, or follow <a href="https://help.github.com/en/articles/adding-an-existing-project-to-github-using-the-command-line">these instructions</a>.</p>
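<p>For the terminal-inclined, here&rsquo;s a rough sketch of the whole round trip. To keep it runnable offline, a local bare repository stands in for GitHub and a second clone stands in for Working Copy on the iPad; the directory names and <code>post.md</code> are assumptions for the demo.</p>

```sh
set -e
work="$(mktemp -d)"

# Assumption: a local bare repository stands in for the GitHub remote.
git init -q --bare "$work/remote.git"

# "Computer" clone: commit a file and push it.
git clone -q "$work/remote.git" "$work/computer" 2>/dev/null
cd "$work/computer"
git config user.email "you@youremail.com"
git config user.name "Your Name"
echo "First draft" > post.md
git add post.md
git commit -q -m "Add first draft"
branch="$(git symbolic-ref --short HEAD)"
git push -q origin "$branch"

# "iPad" clone: stands in for Working Copy cloning the same remote.
git clone -q --branch "$branch" "$work/remote.git" "$work/ipad"

# Back on the computer: make another change and push it.
echo "Second paragraph" >> post.md
git commit -q -am "Extend draft"
git push -q origin "$branch"

# On the "iPad": pull to pick up the new commit.
cd "$work/ipad"
git pull -q origin "$branch"
line_count="$(wc -l < post.md | tr -d ' ')"
echo "post.md now has $line_count lines"
```

<p>With a real GitHub repository you&rsquo;d swap the bare repository path for your remote URL; the commit, push, and pull steps stay the same.</p>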
<p>On your iOS device, Working Copy makes pulling and pushing as simple as a single tap. On the Repository Status and Configuration page, tap on the remote name under &ldquo;Remotes&rdquo;.</p>
<p><img src="workingcopy-git-remote.png#screenshot" alt="List of Remotes in Working Copy"></p>
<p>Then tap &ldquo;Synchronize&rdquo;. Working Copy will take care of the details of pushing your committed changes and/or pulling any new changes it finds from the remote repository.</p>
<h1 id="not-bad-right">Not bad, right?</h1>
<p>For a Git-based developer and work-anywhere-aholic like me, this setup couldn&rsquo;t be more convenient. Working Copy really makes staying in sync with my remote repositories seamless, never mind the ability to work with any of my GitHub repos on the go.</p>
<p>For editing on the go, here&rsquo;s a useful tip. Use <code>.gitignore</code> in your sync repository if you don&rsquo;t need to move large files, like images, around with you. This will stop the ignored files from being pushed to GitHub and pulled to your iOS device - they&rsquo;ll only remain on your computer&rsquo;s larger hard drive. The <code>.gitignore</code> file of one of my sync repositories looks like this:</p>
<pre><code>*.png
*.jpeg
*.jpg
*.mp4
*.gif
</code></pre><p>This means all the media files stay on my computer, and I can pull just the text file content to my iPad from GitHub to work on while I&rsquo;m out and about.</p>
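<p>If you want to double-check which files a <code>.gitignore</code> like this will hold back, <code>git check-ignore</code> can tell you. A minimal sketch, using a throwaway repository and two made-up file names:</p>

```sh
set -e
repo_dir="$(mktemp -d)"
cd "$repo_dir"
git init -q

# Same patterns as the .gitignore shown above.
printf '%s\n' '*.png' '*.jpeg' '*.jpg' '*.mp4' '*.gif' > .gitignore

# Hypothetical files: one image, one text file.
touch cover.png draft.md

# git check-ignore exits 0 when a path is ignored and 1 when it is not.
if git check-ignore -q cover.png; then png_status="ignored"; else png_status="not ignored"; fi
if git check-ignore -q draft.md; then md_status="ignored"; else md_status="not ignored"; fi

echo "cover.png: $png_status"
echo "draft.md: $md_status"
```

<p>Running <code>git check-ignore -v</code> on a path in your own repository also shows which pattern matched it, which helps when a file unexpectedly goes missing from a sync.</p>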
<p>I most recently used this setup to get some writing done while hanging out in the atrium of Washington DC&rsquo;s National Portrait Gallery, which is pleasantly photogenic.</p>
<p><img src="washington-portrait-gallery.jpg" alt="The atrium of the National Portrait Gallery"></p>
<p><a href="https://twitter.com/victoriadotdev">I&rsquo;d love to hear</a> how this setup works for you and how you use it. In the meantime, happy working!</p>
On doing great thingshttps://victoria.dev/blog/on-doing-great-things/
Fri, 08 Mar 2019 18:36:15 -0500hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/on-doing-great-things/Some thoughts inspired by International Women's Day, Grace Hopper, and making contributions to the world of tech.
<p>It&rsquo;s International Women&rsquo;s Day, and I&rsquo;m thinking about Grace Hopper.</p>
<p><a href="https://en.m.wikipedia.org/wiki/Grace_Hopper">Grace Hopper</a> was an amazing lady who did great things. She envisioned and helped create programming languages that translate English terms into machine code. She persevered in her intention to join the US Navy from the time she was rejected at 34 years old, to being sworn in to the US Navy Reserve three years later, to retiring with the rank of commander at age 60&hellip; then was recalled (twice) and promoted to the rank of captain at the age of 67. She advocated for distributed networks and developed computer testing standards we use today, among other achievements too numerous to list here.</p>
<p>By my read, throughout her life, she kept her focus on her work. She did great things because she could do them, and felt some duty to do them. Her work speaks for itself.</p>
<p>I recently came across a sizeable rock denoting a rather small, quiet park. It looks like this:</p>
<p><img src="grace-murray-hopper-park.jpeg#center" alt="Signage on a rock denoting Grace Murray Hopper Park"></p>
<p>When I first saw this park, I thought it in no way did this great lady justice. But upon some reflection, its lack of assumption and grandeur grew on me. And today, it drew to the forefront something that&rsquo;s been on my mind.</p>
<p>I try and contribute regularly to the wide world of technology, usually through building things, writing, and mentorship. I sometimes get asked to participate in female-focused tech events. I hear things like, &ldquo;too few developers are women,&rdquo; or &ldquo;we need more women in blockchain,&rdquo; or &ldquo;we need more female coders.&rdquo;</p>
<p>For some time I haven&rsquo;t been sure how to respond, because while my answer isn&rsquo;t &ldquo;yes,&rdquo; it&rsquo;s not exactly &ldquo;no,&rdquo; either. It&rsquo;s really, &ldquo;no, because&hellip;&rdquo; and it&rsquo;s because I&rsquo;m afraid. I&rsquo;m afraid of misrepresenting myself, my values, and my goals.</p>
<p>Discrimination and racism are real things. They exist in the minds and attitudes of a very small percentage of very loud people, as they always will. These people aren&rsquo;t, however, the majority. They are small.</p>
<p>I think that on the infrequent occasions when we encounter these people, we should do our best to lead by example. We should have open minds, tell our stories, listen to theirs. Try and learn something. That&rsquo;s all.</p>
<p>When I present myself, I don&rsquo;t point out that I&rsquo;m a woman. I don&rsquo;t align myself with &ldquo;women in tech&rdquo; or seek to represent them. I don&rsquo;t go to women-only meetings or support organizations that discriminate against men, or anyone at all. It&rsquo;s not because I&rsquo;m insecure as a woman, or ashamed that I&rsquo;m a woman, or some other inflammatory adjective that lately shows up in conjunction with being female. It&rsquo;s because I&rsquo;ve no reason to point out my gender, any more than needing to point out that my hair is black, or that I&rsquo;m short. It&rsquo;s obvious and simultaneously irrelevant.</p>
<p>When I identify with a group, I talk about the go-getters who wake up at 0500 every day and go work out - no matter the weather, or whether they feel like it. I tell stories about the people I met in different countries around the world, who left home, struck out on their own, and had an adventure, because they saw value in the experience. I identify with people who constantly build things, try things, design and make things, and then share those things with the world, because they love to do so. This is how I see myself. This is what matters to me.</p>
<p>Like the unassuming park named after an amazing woman, when truly great things are done, they are done relatively quietly. Not done for the fanfare of announcing them to the world, but for the love of the thing itself. So go do great things, please. The world still needs them.</p>
Git commit practices your future self will thank you forhttps://victoria.dev/blog/git-commit-practices-your-future-self-will-thank-you-for/
Mon, 06 Aug 2018 08:54:56 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/git-commit-practices-your-future-self-will-thank-you-for/How squash commits, vimrc, and git tags can help you make great Git commits.
]]>
<p>A history of clean commits can be evidence of a lot of things: attention to detail, good work ethic, and genuine investment in the project. What do your Git commits say about you?</p>
<p>If, like me, there <em>might</em> be one or two less than stellar ones, well, we&rsquo;re only human. A nice part of being human is having the ability to learn new and complex things fairly quickly, and continuously improve ourselves. In that spirit, I&rsquo;d like to share some things I&rsquo;ve learned about creating clean, useful, and responsible Git commits.</p>
<h1 id="what-does-it-mean-to-commit-responsibly">What does it mean to commit responsibly?</h1>
<p>Whether our code will be seen by the entire open source community or just future versions of ourselves, either one will be grateful if we commit responsibly today. Being responsible can mean a lot of things to different people, so I enlisted some of <a href="https://mastodon.technology/@victoria/">mastodon.technology</a> and <a href="https://dev.to/victoria/what-does-it-mean-to-commit-responsibly-22mi">dev.to</a> to help round out my list. From those (really great) threads, I distilled these main points:</p>
<blockquote>
<p><strong>Committing responsibly</strong></p>
<ol>
<li>Provide and/or use tests to avoid committing bugs or broken builds</li>
<li>Write clean code that meets style specifications</li>
<li>Use descriptive commit messages that reference related discussion</li>
<li>Make only one change per commit and avoid including unrelated changes</li>
</ol>
</blockquote>
<p>Some of the above is achieved through maintaining a short feedback loop that helps you improve your code quality while staying accountable to yourself. <a href="https://victoria.dev/blog/how-to-set-up-a-short-feedback-loop-as-a-solo-coder/">I wrote another article</a> that discusses this in detail, especially the part about <a href="https://victoria.dev/blog/how-to-set-up-a-short-feedback-loop-as-a-solo-coder/#block-out-time-for-code-review">code review</a>. Other items on this list have to do specifically with making commits in Git. There are some features of Git that can benefit us in these areas, as can harnessing tools like Vim. I&rsquo;ll cover those topics here.</p>
<p>If the majority of your Git commits so far have been created with something like <code>git commit -m &quot;Bug fixes&quot;</code> then this is the article for you!</p>
<h1 id="write-great-git-commit-messages-with-a-template">Write great Git commit messages with a template</h1>
<p>I think <a href="https://github.com/torvalds/subsurface-for-dirk/commit/b6590150d68df528efd40c889ba6eea476b39873">Linus</a> would be very happy if we didn&rsquo;t use <code>git commit -m &quot;Fix bug&quot;</code> in a public repository ever again. As very well put in <a href="https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html">this classic post</a> and <a href="https://chris.beams.io/posts/git-commit/">the seven rules of a great Git commit message</a>:</p>
<blockquote>
<p>A properly formed Git commit subject line should always be able to complete the following sentence:</p>
<p>If applied, this commit will <em>your subject line here</em></p>
</blockquote>
<p><a href="http://who-t.blogspot.com/2009/12/on-commit-messages.html">This other classic post</a> also discusses three questions that the body of the commit message should answer:</p>
<blockquote>
<p>Why is it necessary?<br>
How does it address the issue?<br>
What effects does the patch have?</p>
</blockquote>
<p>This can be a lot to remember to cover, but there&rsquo;s a slick way to have these prompts at hand right when you need them. You can set up a commit message template by using the <code>commit.template</code> <a href="https://git-scm.com/book/en/v2/Customizing-Git-Git-Configuration">configuration value</a>.</p>
<p>To set it, configure Git to use a template file (for example, <code>.gitmessage</code> in your home directory), then create the template file with Vim:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">git config --global commit.template ~/.gitmessage
vim ~/.gitmessage
</code></pre></div><p>When we run <code>git commit</code> without the <code>-m</code> message flag, the editor will open with our helpful template ready to go. Here&rsquo;s my commit message template:</p>
<pre><code class="language-console" data-lang="console"># If applied, this commit will...
# [Add/Fix/Remove/Update/Refactor/Document] [issue #id] [summary]


# Why is it necessary? (Bug fix, feature, improvements?)
-
# How does the change address the issue?
-
# What side effects does this change have?
-
</code></pre><p>I&rsquo;m a fan of this format because commented lines are not included in the final message. I can simply fill in the blank lines with text and bullet points under the prompts, and it comes out looking something like this:</p>
<div class="highlight"><pre class="chroma"><code class="language-txt" data-lang="txt">Fix #16 missing CSS variables

- Fix for unstyled elements
- Add background color, height for code blocks
- Only affects highlight class
</code></pre></div><h2 id="reference-related-discussion">Reference related discussion</h2>
<p>Issue trackers in <a href="https://help.github.com/articles/closing-issues-using-keywords/">GitHub</a> and <a href="https://confluence.atlassian.com/bitbucket/resolve-issues-automatically-when-users-push-code-221451126.html">Bitbucket</a> both recognize the keywords <code>close</code>, <code>fix</code>, and <code>resolve</code> followed immediately by the issue or pull request number. These keywords conveniently close the referenced issue or pull request for us, which helps maintain a clear trail of changes. <a href="https://about.gitlab.com/2016/03/08/gitlab-tutorial-its-all-connected/">GitLab</a> and issue trackers like <a href="https://confluence.atlassian.com/jirasoftwarecloud/referencing-issues-in-your-development-work-777002789.html">Jira</a> offer similar functionality.</p>
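<p>As a rough sketch of how a filled-in template turns into a final message that references an issue, the snippet below runs a sample message through <code>git stripspace --strip-comments</code>, which removes <code>#</code> lines much as Git&rsquo;s default cleanup does after the editor closes. The temporary file and the issue number <code>#16</code> are illustrative assumptions, not part of any real project.</p>

```shell
#!/bin/sh
# Sketch: preview the message Git would keep from a filled-in template.
# The temporary file and issue number are made up for illustration.
set -e

msg_file=$(mktemp)
cat > "$msg_file" <<'EOF'
# If applied, this commit will...
Fix #16 missing CSS variables

# Why is it necessary? (Bug fix, feature, improvements?)
- Fix for unstyled elements
# How does the change address the issue?
- Add background color, height for code blocks
# What side effects does this change have?
- Only affects highlight class
EOF

# Drop the "#" prompt lines and tidy up stray blank lines
final_message=$(git stripspace --strip-comments < "$msg_file")
printf '%s\n' "$final_message"

rm -f "$msg_file"
```

<p>The first line that survives is the subject, and the <code>Fix #16</code> keyword in it is what a tracker would pick up. The blank line after the subject matters, since Git uses it to separate the subject from the body.</p>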
<h2 id="use-helpful-vim-settings-for-git-commit-messages">Use helpful Vim settings for git commit messages</h2>
<p>By adding a few lines to our Vim configuration, we can make writing great git commit messages easy. We can add these lines to <code>~/.vimrc</code> to turn on syntax highlighting in general, and spell check and text wrapping for commit messages in particular:</p>
<div class="highlight"><pre class="chroma"><code class="language-vimrc" data-lang="vimrc"><span class="c">&#34; Filetype detection, plugins, and indent rules</span><span class="err">
</span><span class="err"></span><span class="nx">filetype</span> <span class="nx">plugin</span> <span class="nx">indent</span> <span class="nx">on</span><span class="err">
</span><span class="err"></span><span class="c">
</span><span class="c">&#34; Syntax highlighting</span><span class="err">
</span><span class="err"></span><span class="nx">syntax</span> <span class="nx">on</span><span class="err">
</span><span class="err"></span><span class="c">
</span><span class="c">&#34; Spell check and line wrap just for git commit messages</span><span class="err">
</span><span class="err"></span><span class="nx">autocmd</span> <span class="nx">Filetype</span> <span class="nx">gitcommit</span> <span class="nx">setlocal</span> <span class="nx">spell</span> <span class="nx">textwidth</span><span class="p">=</span><span class="m">72</span><span class="err">
</span></code></pre></div><p>If you&rsquo;re curious, <a href="https://gist.github.com/victoriadrake/81699ada73748ecf7603c7708a5385ff">my full <code>~/.vimrc</code> is on GitHub</a>.</p>
<p>Other editors have settings that can help us out as well. I came across <a href="https://dev.to/shreyasminocha/how-i-do-my-git-commits-34d">these for Sublime Text 3</a> and <a href="https://github.com/Microsoft/vscode-docs/blob/master/docs/getstarted/tips-and-tricks.md#language-specific-settings">language specific settings for VS Code</a>.</p>
<h1 id="one-change-per-commit-how-to-squash-git-commits">One change per commit: how to squash Git commits</h1>
<p><img src="git-commit-squash.png" alt="A doodle of squash"></p>
<!-- raw HTML omitted -->
<p>Let&rsquo;s get one thing out of the way first: rewriting Git history just for the sake of having a pretty tree, especially with public repositories, is generally not advisable. It&rsquo;s kind of like going back in time, where changes you make to your version of the project cause it to look completely different from a version that someone else forked from a point in history that you&rsquo;ve now erased - I mean, haven&rsquo;t you seen <em>Back to the Future Part II</em>? (If you&rsquo;d rather maintain that only one <em>Back to the Future</em> movie was ever made, thus sparing your future self from having to watch the sequels, I get it.)</p>
<p>Here&rsquo;s the main point. If you&rsquo;ve pushed messy commits to a public repository, I say go right ahead and leave them be, instead of complicating things further. (We all learn from our embarrassments, especially the public ones - I&rsquo;m looking at you, past-Vicky.) If your messy commits currently only exist on your local version, great! We can tidy them up into one clean, well-described commit that we&rsquo;ll be proud to push, and no one will be the wiser.</p>
<p>There are a couple different ways to squash commits, and choosing the appropriate one depends on what we need to achieve.</p>
<p>The following examples are illustrated using <code>git log --graph</code>, with some options for brevity. We can set a handy alias to see this log format in our terminal with:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">git config --global alias.plog <span class="s2">&#34;log --graph --pretty=format:&#39;%h -%d %s %n&#39; --abbrev-commit --date=relative --branches&#34;</span>
</code></pre></div><p>Then we just do <code>git plog</code> to see the pretty log.</p>
<h2 id="method-1-one-commit-to-rule-the-master-branch">Method #1: one commit to rule the master branch</h2>
<p>This is appropriate when:</p>
<ul>
<li>We&rsquo;re committing directly to master</li>
<li>We don&rsquo;t intend to open a pull request to merge a feature</li>
<li>We don&rsquo;t want to preserve history of branches or changes we haven&rsquo;t yet pushed</li>
</ul>
<p>This method takes a Git tree that looks like this:</p>
<pre><code class="language-console" data-lang="console">* 3e8fd79 - (HEAD -&gt; master) Fix a thing
|
* 4f0d387 - Tweak something
|
* 0a6b8b3 - Merge branch 'new-article'
|\
| * 33b5509 - (new-article) Update article again again
| |
| * 1782e63 - Update article again
| |
| * 3c5b6a8 - Update article
| |
* | f790737 - Tweak unrelated article
|/
|
* 65af7e7 - Add social media link
|
* 0e3fa32 - (origin/master, origin/HEAD) Update theme
</code></pre><p>And makes it look like this:</p>
<pre><code class="language-console" data-lang="console">* 7f9a127 - (HEAD -&gt; master) Add new article
|
* 0e3fa32 - (origin/master, origin/HEAD) Update theme
</code></pre><p>Here&rsquo;s how to do it - hold on to your hoverboards, it&rsquo;s super complicated:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">git reset --soft origin/master
git commit
</code></pre></div><p>Yup, that&rsquo;s all. We can delete the unwanted branch with <code>git branch -D new-article</code>.</p>
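<p>To try this without risking a real project, here is a minimal sketch that builds a throwaway repository with a local bare repository standing in for the remote. The directory names, file names, and commit messages are all assumptions made for the demo.</p>

```shell
#!/bin/sh
# Sketch: squash every unpushed commit on master into one.
# A local bare repository stands in for the real remote (an assumption for this demo).
set -e

work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master   # make sure the branch is named master
git config user.name "Example Author"
git config user.email "author@example.com"
git remote add origin "$work/remote.git"

# One commit that gets pushed, so origin/master exists
echo "theme" > theme.txt
git add theme.txt
git commit -q -m "Update theme"
git push -q origin master

# Three messy local commits that were never pushed
for n in 1 2 3; do
  echo "draft $n" >> article.md
  git add article.md
  git commit -q -m "Update article $n"
done

# Move the branch back to origin/master, keeping all changes staged
git reset --soft origin/master
git commit -q -m "Add new article"

git log --format=%s
```

<p>After the reset, the three draft commits are gone from the branch history, but their combined changes are still staged, so the single new commit contains all of them.</p>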
<h2 id="method-2-not-_that_-much">Method #2: not <em>that</em> much!</h2>
<p>This is appropriate when:</p>
<ul>
<li>We want to squash the last <em>x</em> commits but not <em>all</em> commits since <code>origin/master</code></li>
<li>We want to open a pull request to merge a branch</li>
</ul>
<p>This method takes a Git tree that looks like this:</p>
<pre><code class="language-console" data-lang="console">* 13a070f - (HEAD -&gt; new-article) Finish new article
|
* 78e728a - Edit article draft
|
* d62603c - Add example
|
* 1aeb20e - Update draft
|
* 5a8442a - Add new article draft
|
| * 65af7e7 - (master) Add social media link
|/
|
* 0e3fa32 - (origin/master, origin/HEAD) Update theme
</code></pre><p>And makes it look like this:</p>
<pre><code class="language-console" data-lang="console">* 90da69a - (HEAD -&gt; new-article) Add new article
|
| * 65af7e7 - (master) Add social media link
|/
|
* 0e3fa32 - (origin/master, origin/HEAD) Update theme
</code></pre><p>To squash the last five commits on branch <code>new-article</code> into one, we use:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">git reset --soft HEAD~5
git commit -m <span class="s2">&#34;New message for the combined commit&#34;</span>
</code></pre></div><p>Where <code>--soft</code> leaves our files untouched and staged, and <code>5</code> can be thought of as &ldquo;the number of previous commits I want to combine.&rdquo;</p>
<p>We can then do <code>git merge master</code> and create our pull request.</p>
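<p>Here&rsquo;s a small sketch of the same idea in a disposable repository, to show that <code>--soft</code> really does leave the work staged. For simplicity it squashes on the default branch instead of a <code>new-article</code> branch, and the file names and messages are invented for the demo.</p>

```shell
#!/bin/sh
# Sketch: squash the last five commits with a soft reset.
# Repository contents and messages are invented for this demo.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "theme" > theme.txt
git add theme.txt
git commit -q -m "Update theme"

# Five small commits we would rather present as one
for n in 1 2 3 4 5; do
  echo "paragraph $n" >> article.md
  git add article.md
  git commit -q -m "Draft change $n"
done

# Step back five commits; the working tree and index stay as they are
git reset --soft HEAD~5
staged=$(git diff --cached --name-only)
echo "Still staged: $staged"

git commit -q -m "Add new article"
git log --format=%s
```

<p>The staged file list printed in the middle is the reassuring part: nothing is lost between the reset and the new commit.</p>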
<h2 id="method-3-getting-picky">Method #3: getting picky</h2>
<p>Say we had a really confusing afternoon and our Git tree looks like this:</p>
<pre><code class="language-console" data-lang="console">* dc89918 - (HEAD -&gt; master) Add link
|
* 9b6780f - Update image asset
|
* 6379956 - Fix CSS bug
|
* 16ee1f3 - Merge master into branch
|\
| |
| * ccec365 - Update list page
| |
* | 033dee7 - Fix typo
| |
* | 90da69a - Add new article
|/
|
* 0e3fa32 - (origin/master, origin/HEAD) Update theme
</code></pre><p>We want to retain some of this history, but clean up the commits. We also want to change the messages for some of the commits. To achieve this, we&rsquo;ll use <code>git rebase</code>.</p>
<p>This is appropriate when:</p>
<ul>
<li>We want to squash only some commits</li>
<li>We want to edit previous commit messages</li>
<li>We want to delete or reorder specific commits</li>
</ul>
<p>Git <code>rebase</code> is a powerful tool, and handy once we&rsquo;ve got the hang of it. To change all the commits since <code>origin/master</code>, we do:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">git rebase -i origin/master
</code></pre></div><p>Or, we can do:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">git rebase -i 0e3fa32
</code></pre></div><p>Where the commit hash is the last commit we want to retain as-is.</p>
<p>The <code>-i</code> option lets us run the interactive rebase tool, which launches our editor with, essentially, a script for us to modify. We&rsquo;ll see a list of our commits in reverse order to the git log, with the oldest at the top:</p>
<pre><code class="language-console" data-lang="console">pick 90da69a Add new article
pick 033dee7 Fix typo
pick ccec365 Update list page
pick 6379956 Fix CSS bug
pick 9b6780f Update image asset
pick dc89918 Add link
# Rebase 0e3fa32..dc89918 onto 0e3fa32 (6 commands)
#
# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like &quot;squash&quot;, but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
# d, drop = remove commit
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out
#
~
</code></pre><p>The comments give us a handy guide as to what we&rsquo;re able to do. For now, let&rsquo;s squash the commits with small changes into the more significant commits. In our editor, we change the script to look like this:</p>
<pre><code class="language-console" data-lang="console">pick 90da69a Add new article
squash 033dee7 Fix typo
pick ccec365 Update list page
squash 6379956 Fix CSS bug
squash 9b6780f Update image asset
squash dc89918 Add link
</code></pre><p>Once we save the changes, the interactive tool continues to run. It will execute our instructions in sequence. In this case, we see the editor again with the following:</p>
<pre><code class="language-console" data-lang="console"># This is a combination of 2 commits.
# This is the 1st commit message:
Add new article
# This is the commit message #2:
Fix typo
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# interactive rebase in progress; onto 0e3fa32
# Last commands done (2 commands done):
# pick 90da69a Add new article
# squash 033dee7 Fix typo
# Next commands to do (4 remaining commands):
# pick ccec365 Update list page
# squash 6379956 Fix CSS bug
# You are currently rebasing branch 'master' on '0e3fa32'.
#
# Changes to be committed:
# modified: ...
#
~
</code></pre><p>Here&rsquo;s our chance to create a new commit message for this first squash, if we want to. Once we save it, the interactive tool will go on to the next instructions. Unless&hellip;</p>
<pre><code class="language-console" data-lang="console">[detached HEAD 3cbad01] Add new article
1 file changed, 129 insertions(+), 19 deletions(-)
Auto-merging content/dir/file.md
CONFLICT (content): Merge conflict in content/dir/file.md
error: could not apply ccec365... Update list page
Resolve all conflicts manually, mark them as resolved with
&quot;git add/rm &lt;conflicted_files&gt;&quot;, then run &quot;git rebase --continue&quot;.
You can instead skip this commit: run &quot;git rebase --skip&quot;.
To abort and get back to the state before &quot;git rebase&quot;, run &quot;git rebase --abort&quot;.
Could not apply ccec365... Update list page
</code></pre><p>Again, the tool offers some very helpful instructions. Once we fix the merge conflict, we can resume the process with <code>git rebase --continue</code>. Our interactive rebase picks up where it left off.</p>
<p>Once all the squashing is done, our Git tree looks like this:</p>
<pre><code class="language-console" data-lang="console">* 3564b8c - (HEAD -&gt; master) Update list page
|
* 3cbad01 - Add new article
|
* 0e3fa32 - (origin/master, origin/HEAD) Update theme
</code></pre><p>Phew, much better.</p>
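<p>If you&rsquo;d like to rehearse a squash before doing it by hand, the sketch below scripts the same kind of rebase in a throwaway repository. It assumes <code>sed</code> is available and uses the <code>GIT_SEQUENCE_EDITOR</code> environment variable to edit the rebase script in place of an interactive editor, with <code>GIT_EDITOR=true</code> accepting the combined message unchanged. The repository contents are made up.</p>

```shell
#!/bin/sh
# Sketch: run an "interactive" rebase without opening an editor.
# The repository, file name, and messages are made up for illustration.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

commit_line() {
  # Append a line to article.md and commit it, using the line as the message
  echo "$1" >> article.md
  git add article.md
  git commit -q -m "$1"
}
commit_line "Update theme"
commit_line "Add new article"
commit_line "Fix typo"

# The sequence editor changes line 2 of the rebase script from pick to squash.
# GIT_EDITOR=true keeps the combined commit message exactly as Git proposes it.
GIT_SEQUENCE_EDITOR="sed -i.bak '2s/^pick/squash/'" GIT_EDITOR=true \
  git rebase -q -i HEAD~2

git log --format=%s
```

<p>The typo fix is melded into the &ldquo;Add new article&rdquo; commit, and its message ends up in that commit&rsquo;s body. In day-to-day use, editing the script by hand as described above gives more control over the final messages.</p>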
<h1 id="git-stash">Git stash</h1>
<p>If we&rsquo;re in the middle of some work and it&rsquo;s not a good time to commit, but we need to switch branches, <a href="https://git-scm.com/book/en/v1/Git-Tools-Stashing">stashing</a> can be a good option. Stashing lets us save our unfinished work without needing to create a half-assed commit. It&rsquo;s like that pile of paper on your desk representing all the stuff you&rsquo;ve been in the middle of doing since two weeks ago. Yup, that one.</p>
<p>It&rsquo;s as easy as typing <code>git stash</code>:</p>
<pre><code class="language-console" data-lang="console">Saved working directory and index state WIP on master: 3564b8c Update list page
</code></pre><p>The dirty work we&rsquo;re in the midst of is safely tucked away, and our working directory is clean - just as it was after our last commit. To see what&rsquo;s in our stash stack, we do <code>git stash list</code>:</p>
<pre><code class="language-console" data-lang="console">stash@{0}: WIP on master: 3564b8c Update list page
stash@{1}: WIP on master: 90da69a Add new article
stash@{2}: WIP on cleanup: 0e3fa32 Update theme
</code></pre><p>To restore our work in progress, we use <code>git stash apply</code>. Git will try and apply our most recent stashed work. To apply an older stash, we use <code>git stash apply stash@{1}</code> where <code>1</code> is the stash to apply. If changes since stashing our work prevent the stash from reapplying cleanly, Git will give us a merge conflict to resolve.</p>
<p>Applying a stash doesn&rsquo;t remove it from our list. To remove a stash from our stack, we do <code>git stash drop stash@{0}</code> where <code>0</code> is the one we want to remove.</p>
<p>We can also use <code>git stash pop</code> to apply the most recent stash and then immediately remove it from the stack.</p>
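<p>The whole round trip can be sketched in a scratch repository like this. The file name and its contents are placeholders, not anything from a real project.</p>

```shell
#!/bin/sh
# Sketch: stash unfinished work, confirm a clean tree, then bring the work back.
# File name and contents are placeholders.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "finished work" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Start something we are not ready to commit
echo "half-finished idea" >> notes.txt

git stash -q                              # tuck the change away
clean_status=$(git status --porcelain)    # empty when the working directory is clean
git stash list                            # shows one entry, stash@{0}

git stash pop -q                          # reapply the change and drop the stash entry
restored=$(tail -n 1 notes.txt)
echo "Restored line: $restored"
```

<p>Between the stash and the pop, the working directory matches the last commit, which is the window in which it&rsquo;s safe to switch branches.</p>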
<h1 id="tag-release-versions-using-annotated-git-tags">Tag release versions using annotated Git tags</h1>
<p>In the spirit of having a beautiful, clean Git history, there&rsquo;s one more thing we can do to help make our commit log inspire infinite joy in its viewers. If you&rsquo;ve never heard of <code>git tag</code>, your master branch history might look like this&hellip;</p>
<pre><code class="language-console" data-lang="console">* 0377782 - Update theme
|
* ecf8128 - Add about page (#25)
|
* 33e432f - Fix #23 navigation bug
|
* 08b853b - Create blog section
|
* 63d18b4 - Add theme (#12)
|
* 233e23f - Add main content (#6)
</code></pre><p>Wouldn&rsquo;t it be nice if it looked like this instead?</p>
<pre><code class="language-console" data-lang="console">* 0377782 - (tag: v2.1.0) Update theme
|
* ecf8128 - Add about page (#25)
|
* 33e432f - Fix #23 navigation bug
|
* 08b853b - (tag: v2.0.0) Create blog section
|
* 63d18b4 - Add theme (#12)
|
* 233e23f - (tag: v1.1.0) Add main content (#6)
</code></pre><p>We can tag Git commits with anything, but tags are especially helpful for semantic versioning of releases. Sites like <a href="https://help.github.com/articles/creating-releases/">GitHub</a> and <a href="https://docs.gitlab.com/ce/workflow/releases.html">GitLab</a> have pages for repositories that list tags, letting viewers of our project browse the release versions. This can be helpful for public projects to differentiate major releases, updates with bug fixes, or beta versions.</p>
<p>There are two types of Git tags: lightweight and annotated. For adding a version tag to commits, we use annotated Git tags.</p>
<p>The <a href="https://git-scm.com/docs/git-tag">Git tag documentation</a> explains it this way:</p>
<blockquote>
<p>Tag objects (created with -a, -s, or -u) are called &ldquo;annotated&rdquo; tags; they contain a creation date, the tagger name and e-mail, a tagging message, and an optional GnuPG signature. Whereas a &ldquo;lightweight&rdquo; tag is simply a name for an object (usually a commit object).</p>
<p>Annotated tags are meant for release while lightweight tags are meant for private or temporary object labels. For this reason, some git commands for naming objects (like git describe) will ignore lightweight tags by default.</p>
</blockquote>
<p>We can think of lightweight tags as bookmarks, and annotated tags as release records that note who tagged the commit and when, and that can optionally be signed.</p>
<p>For public repositories, annotated tags allow us to:</p>
<ul>
<li>See who tagged the commit, which may differ from the commit author</li>
<li>Find the most recent tag reachable from a commit with <code>git describe</code>, which ignores lightweight tags by default</li>
<li>Avoid conflicting tag names</li>
</ul>
<p>To create an annotated Git tag and attach it to our current (last) commit, we do:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">git tag -a v1.2.0 -m <span class="s2">&#34;Clever release title&#34;</span>
</code></pre></div><p>This tags the commit on our local repository. To push our commits along with any annotated tags that point to them, we do:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">git push --follow-tags
</code></pre></div><p>We can also set our Git configuration to push our annotated tags by default:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">git config --global push.followTags <span class="nb">true</span>
</code></pre></div><p>If we then want to skip pushing tags this time, we pass <code>--no-follow-tags</code>.</p>
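<p>To see the difference between the two tag types for ourselves, here is a minimal sketch in a scratch repository. The tag names <code>v2.0.0</code> and <code>bookmark</code>, the file, and the commit messages are examples only.</p>

```shell
#!/bin/sh
# Sketch: compare an annotated tag with a lightweight one.
# Tag names, file name, and messages are examples only.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "blog" > site.txt
git add site.txt
git commit -q -m "Create blog section"
git tag -a v2.0.0 -m "Clever release title"   # annotated: creates a tag object

echo "nav fix" >> site.txt
git commit -q -am "Fix navigation bug"
git tag bookmark                               # lightweight: just a name for the commit

git cat-file -t v2.0.0     # object type behind the annotated tag
git cat-file -t bookmark   # object type behind the lightweight tag

git describe               # considers annotated tags only
git describe --tags        # also considers lightweight tags
```

<p>The first <code>git describe</code> reports the position relative to <code>v2.0.0</code> and skips <code>bookmark</code>, even though <code>bookmark</code> points directly at the current commit. That default is a good reason to reserve annotated tags for releases.</p>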
<h1 id="practice-responsible-commits">Practice responsible commits</h1>
<p>A little time invested in getting familiar with these tools and practices can make your commits even more useful and well-crafted. With a little practice, these processes will become second nature. You can make it even easier by creating a personal commit checklist on paper to keep handy while you work - or if that isn&rsquo;t fun enough, <a href="https://victoria.dev/blog/an-automatic-interactive-pre-commit-checklist-in-the-style-of-infomercials/">make it an interactive pre-commit hook.</a></p>
<p>Creating clean, useful, and responsible Git commits says a lot about you. Especially in the current climate of remote work, Git commits may be a primary way that people interact with you over projects. With a little practice and effort, you can make your commit habits an even better reflection of your best work - work that is evidently created with care and pride.</p>
<h1 id="reference-links">Reference links</h1>
<p><a href="https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History">Git Tools - Rewriting History</a><br>
<a href="https://semver.org/">Semantic Versioning 2.0.0</a><br>
<a href="https://git-scm.com/docs/git-describe">git-describe</a><br>
<a href="https://git-scm.com/book/en/v2/Git-Tools-Signing-Your-Work">Git Tools - Signing Your Work</a><br>
<a href="https://git-scm.com/book/en/v2/Customizing-Git-Git-Configuration">Customizing Git - Git Configuration</a><br>
<a href="https://git-scm.com/book/en/v1/Git-Tools-Stashing">Git Tools - Stashing</a><br>
<a href="https://git-scm.com/book/en/v2/Git-Basics-Tagging">Git Basics - Tagging</a></p>
An automatic interactive pre-commit checklist, in the style of infomercialshttps://victoria.dev/blog/an-automatic-interactive-pre-commit-checklist-in-the-style-of-infomercials/
Mon, 23 Jul 2018 09:38:09 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/an-automatic-interactive-pre-commit-checklist-in-the-style-of-infomercials/How to set up an interactive checklist using a Git pre-commit hook script.
<p>What&rsquo;s that, you say? You&rsquo;ve become tired of regular old boring <em>paper checklists?</em> Well, my friend, today is your lucky day! You, yes, <em>you,</em> can become the proud owner of a brand-spanking-new <em>automatic interactive pre-commit hook checklist!</em> You&rsquo;re gonna love this! Your life will be so much easier! Just wait until your friends see you.</p>
<h1 id="whats-a-pre-commit-hook">What&rsquo;s a pre-commit hook?</h1>
<p>Did you know that nearly <em>1 out of 5 coders</em> are too embarrassed to ask this question? Don&rsquo;t worry, it&rsquo;s perfectly normal. In the next 60 seconds we&rsquo;ll tell you all you need to know to pre-commit with confidence.</p>
<p>A <a href="https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks">Git hook</a> is a feature of Git that triggers custom scripts at useful moments. They can be used for all kinds of reasons to help you automate your work, and best of all, you already have them! In every repository that you initialize with <code>git init</code>, you&rsquo;ll have a set of example scripts living in <code>.git/hooks</code>. They all end with <code>.sample</code> and activating them is as easy as renaming the file to remove the <code>.sample</code> part.</p>
<p>Git hooks are not copied when a repository is cloned, so you can make them as personal as you like.</p>
<p>The useful moment in particular that we&rsquo;ll talk about today is the <em>pre-commit</em>. This hook is run after you do <code>git commit</code>, and before you write a commit message. Exiting this hook with a non-zero status will abort the commit, which makes it extremely useful for last-minute quality checks. Or, a bit of fun. Why not both!</p>
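<p>But wait, does a non-zero exit <em>really</em> stop the commit? Here&rsquo;s a quick sketch to check, using a scratch repository and a deliberately grumpy hook. The hook&rsquo;s message and the file name are invented for the demo, and it assumes no custom <code>core.hooksPath</code> is configured.</p>

```shell
#!/bin/sh
# Sketch: a pre-commit hook that always exits non-zero, to show the commit is aborted.
# Hook message and file name are invented; assumes core.hooksPath is not set.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# Install the hook and make it executable
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
echo "Not today!"
exit 1
EOF
chmod +x .git/hooks/pre-commit

echo "hello" > file.txt
git add file.txt

# The hook runs before the commit message step; its exit status decides what happens
if git commit -q -m "Add file"; then
  result="committed"
else
  result="aborted"
fi
echo "Commit was $result"
```

<p>Swap <code>exit 1</code> for <code>exit 0</code> and the same commit goes through, which is all the machinery an interactive checklist needs.</p>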
<h1 id="how-do-i-get-a-pre-commit-checklist">How do I get a pre-commit checklist?</h1>
<p>I only want the best for my family and my commits, and that&rsquo;s why I choose an interactive pre-commit checklist. Not only is it fun to use, it helps to keep my projects safe from unexpected off-spec mistakes!</p>
<p>It&rsquo;s so easy! I just write a bash script that can read user input, and plop it into <code>.git/hooks</code> as a file named <code>pre-commit</code>. Then I do <code>chmod +x .git/hooks/pre-commit</code> to make it executable, and I&rsquo;m done!</p>
<p>Oh look, here comes an example bash script now!</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="cp">#!/bin/bash
</span><span class="cp"></span>
<span class="nb">echo</span> <span class="s2">&#34;Would you like to play a game?&#34;</span>
<span class="c1"># Read user input, assign stdin to keyboard</span>
<span class="nb">exec</span> &lt; /dev/tty
<span class="k">while</span> <span class="nb">read</span> -p <span class="s2">&#34;Have you double checked that only relevant files were added? (Y/n) &#34;</span> yn<span class="p">;</span> <span class="k">do</span>
<span class="k">case</span> <span class="nv">$yn</span> in
<span class="o">[</span>Yy<span class="o">]</span> <span class="o">)</span> break<span class="p">;;</span>
<span class="o">[</span>Nn<span class="o">]</span> <span class="o">)</span> <span class="nb">echo</span> <span class="s2">&#34;Please ensure the right files were added!&#34;</span><span class="p">;</span> <span class="nb">exit</span> 1<span class="p">;;</span>
* <span class="o">)</span> <span class="nb">echo</span> <span class="s2">&#34;Please answer y (yes) or n (no):&#34;</span> <span class="o">&amp;&amp;</span> <span class="k">continue</span><span class="p">;</span>
<span class="k">esac</span>
<span class="k">done</span>
<span class="k">while</span> <span class="nb">read</span> -p <span class="s2">&#34;Has the documentation been updated? (Y/n) &#34;</span> yn<span class="p">;</span> <span class="k">do</span>
<span class="k">case</span> <span class="nv">$yn</span> in
<span class="o">[</span>Yy<span class="o">]</span> <span class="o">)</span> break<span class="p">;;</span>
<span class="o">[</span>Nn<span class="o">]</span> <span class="o">)</span> <span class="nb">echo</span> <span class="s2">&#34;Please add or update the docs!&#34;</span><span class="p">;</span> <span class="nb">exit</span> 1<span class="p">;;</span>
* <span class="o">)</span> <span class="nb">echo</span> <span class="s2">&#34;Please answer y (yes) or n (no):&#34;</span> <span class="o">&amp;&amp;</span> <span class="k">continue</span><span class="p">;</span>
<span class="k">esac</span>
<span class="k">done</span>
<span class="k">while</span> <span class="nb">read</span> -p <span class="s2">&#34;Do you know which issue or PR numbers to reference? (Y/n) &#34;</span> yn<span class="p">;</span> <span class="k">do</span>
<span class="k">case</span> <span class="nv">$yn</span> in
<span class="o">[</span>Yy<span class="o">]</span> <span class="o">)</span> break<span class="p">;;</span>
<span class="o">[</span>Nn<span class="o">]</span> <span class="o">)</span> <span class="nb">echo</span> <span class="s2">&#34;Better go check those tracking numbers!&#34;</span><span class="p">;</span> <span class="nb">exit</span> 1<span class="p">;;</span>
* <span class="o">)</span> <span class="nb">echo</span> <span class="s2">&#34;Please answer y (yes) or n (no):&#34;</span> <span class="o">&amp;&amp;</span> <span class="k">continue</span><span class="p">;</span>
<span class="k">esac</span>
<span class="k">done</span>
<span class="nb">exec</span> &lt;<span class="p">&amp;</span>-
</code></pre></div><h1 id="take-my-money">Take my money!</h1>
<p>Don&rsquo;t delay! Take advantage <em>right now</em> of this generous <em>one-time offer!</em> An interactive pre-commit hook checklist can be yours, today, for the low, low price of&hellip; free? Wait, who wrote this script?</p>
How to set up a short feedback loop as a solo coderhttps://victoria.dev/blog/how-to-set-up-a-short-feedback-loop-as-a-solo-coder/
Mon, 02 Jul 2018 10:08:41 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/how-to-set-up-a-short-feedback-loop-as-a-solo-coder/Strategies for continuous improvement when you're a freelance developer.
]]>
<p>I&rsquo;ve spent the last couple years as a solo freelance developer. Comparing this experience to previously working in companies, I&rsquo;ve noticed that those of us who work alone can have fewer iterative opportunities for improvement than developers who work on teams. Integral to having opportunities to improve is the concept of a short feedback loop: a process of incorporating new learning from observation and previous experience continuously over a short period of time. This process has to be <em>manufactured</em> by people working mostly alone, instead of, as is often the case, <em>adopted</em> when you join a team.</p>
<p>In this post I hope to share what I&rsquo;ve learned about setting yourself up to improve quickly and continuously as a solo coder.</p>
<h1 id="about-feedback-loops">About feedback loops</h1>
<p>United States Air Force Colonel John Boyd developed the concept of the <a href="https://en.wikipedia.org/wiki/OODA_loop">OODA loop</a>, OODA being an acronym for <em>observe, orient, decide, act</em>. In military operations, this illustrates a process of decision-making based on the constant ingestion of new information:</p>
<p><strong>Observe:</strong> Obtain raw information about unfolding circumstances and the current environment.<br>
<strong>Orient:</strong> Put raw observations in context. Consider such things as relevancy to the current situation and previously gained knowledge and expertise.<br>
<strong>Decide:</strong> Make a plan for moving towards your goal.<br>
<strong>Act:</strong> Execute the plan.</p>
<p>Since it&rsquo;s a loop, the <em>act</em> stage leads directly back into the <em>observe</em> stage. This is the critical &ldquo;feed back&rdquo; concept that enables increasingly successful iterations. It&rsquo;s widely applicable beyond military operations - you may recognize its resemblance to the <a href="https://en.wikipedia.org/wiki/PDCA">PDCA</a> (plan-do-check-act) method.</p>
<p>I like the OODA loop for being a succinct illustration of a general feedback loop. Many concepts and working methods build on the idea of feedback loops, including <a href="https://en.wikipedia.org/wiki/DevOps">DevOps</a> and <a href="https://en.wikipedia.org/wiki/Agile_software_development">Agile software development</a> methods.</p>
<h2 id="development-team-feedback-loop">Development team feedback loop</h2>
<p>Let&rsquo;s look at what some components of a feedback loop for a developer on a team might look like:</p>
<ol>
<li>Direction from product owners or reviews from users</li>
<li>Daily scrum/standup with whole team</li>
<li>Prioritization with developer team</li>
<li>Individual coding and testing</li>
<li>Peer code review</li>
<li>Deployment and performance monitoring</li>
</ol>
<p>Implicit in these steps is the support of co-workers and management - in other words, someone to answer to. How can a solo freelance developer create a similar environment of accountability?</p>
<h2 id="solo-developer-feedback-loop">Solo developer feedback loop</h2>
<p>Here are some possible steps that an individual freelance developer can implement to create a short feedback loop:</p>
<ol>
<li>Build discipline</li>
<li>Clarify concrete top-level goals</li>
<li>Prioritize and plan mid-level and low-level goals</li>
<li>Automate your work</li>
<li>Block out time for code review</li>
<li>Block out time for process review</li>
<li>Update your goals and processes with the results of your reviews</li>
</ol>
<p>I&rsquo;ll cover each of these stages in detail below.</p>
<h1 id="build-discipline">Build discipline</h1>
<p>More of a prerequisite than a stage in itself, building discipline is what enables our short feedback loop to work. Nothing else in this article will be helpful unless we have the skill to do something we don&rsquo;t want to do. Discipline is most certainly a skill. It can be learned, trained, and improved just like any other.</p>
<p>Why is discipline so important? Because when we&rsquo;re crunching to get a project completed this Friday evening, we&rsquo;re not going to want to write a good commit message. We&rsquo;re not going to want to clean up the code comments. We just want to see the darn thing go, <em>Hello, git push -f</em>. It&rsquo;s in those moments that discipline enables us to not miss an opportunity to practice, learn, and improve our work process. Discipline helps us avoid Friday night commits that turn into Monday morning <code>git reset --hard</code>s.</p>
<h1 id="clarify-concrete-top-level-goals">Clarify concrete top-level goals</h1>
<p><img src="feedback-topgoal.png" alt="Envisioning a peanut butter and jelly sandwich"></p>
<p>Whether working for a client or bootstrapping our own best-new-app-ever, we won&rsquo;t be able to measure any progress or improvements without something to measure them against.</p>
<p>When I&rsquo;m discussing a new project with a client, I always speak in terms of concrete achievements. This could take the form of accomplishing a specific feature by a certain date, or deciding what the MVP looks like to a user. This is as much to my benefit as my client&rsquo;s. By agreeing, in writing, <em>what</em> will be achieved and <em>when</em>, my client and I have clearly defined top-level goals and can both assess how the project is progressing. When I&rsquo;m working for myself, I treat myself as I would a client. I make a commitment, in writing, describing what will be achieved, and when. This can be something as simple as a goals list for the week, or as detailed as a kanban board.</p>
<p>The point of having a concrete goal, however, is not to stick to it at all costs. It&rsquo;s important to set an expectation, with ourselves and with our clients, that the goals will be revisited at mutually-agreeable dates over the course of the project. This enables the all-important &ldquo;feed back&rdquo; part of the loop.</p>
<h1 id="prioritize-and-plan-mid-level-and-low-level-goals">Prioritize and plan mid-level and low-level goals</h1>
<p><img src="feedback-stepgoals.png" alt="The components of a peanut butter and jelly sandwich"></p>
<p>Few goals are achieved all in one step. Even the simple process of making a peanut butter and jelly sandwich (a favourite computer programming <a href="https://www.youtube.com/watch?v=y62zj9ozPOM&amp;t=1016s">teaching example</a>) can be broken down into successively smaller, more precise instructions. While we humans may not require the granularity that a computer program does, goals that are chunked into time-boxed, achievable steps are much more easily digested. 🥪</p>
<p>Start with the mid-level goals, and make each step concrete. If the goal is to release a new open source web app, for example, the steps might look like this:</p>
<ol>
<li>Complete app JavaScript</li>
<li>Create front end and stylesheet</li>
<li>Do local tests</li>
<li>Set up cloud server</li>
<li>Deploy app to cloud</li>
<li>Do tests</li>
<li>Add repository to GitHub</li>
<li>Post on Hacker News</li>
<li>Profit!!!</li>
</ol>
<p>Each of the above examples encapsulates many smaller, low-level goals - we can think of these as our to-do list items. For example, &ldquo;Set up cloud server&rdquo; might involve:</p>
<ol>
<li>Research cloud providers</li>
<li>Decide on service and sign up</li>
<li>Set up server/instance</li>
<li>Add integrations</li>
<li>Test deployment</li>
</ol>
<p>Our parameters for chunk sizes and what constitutes a &ldquo;step&rdquo; may be different from one another, and will likely change from project to project. If your mid-level and low-level steps clearly define a concrete path for achieving the top-level goals you set, then you&rsquo;re in good shape. Later on, evaluating the decision process that brought us to these mid-level and low-level goals enables us to bring our feedback loop full circle.</p>
<h1 id="automate-your-work">Automate your work</h1>
<p><img src="feedback-autorobot.png" alt="Peanut butter and jelly sandwich robot"></p>
<p>I recently read a great article entitled <a href="https://queue.acm.org/detail.cfm?id=3197520">Manual Work is a Bug</a>. It discusses a process by which successful developers document and eventually automate their work. The beauty of this idea is in its simplicity. By writing down the things we do manually, we&rsquo;re able to correct and refine our processes. By refining our processes, we can more easily translate them into code snippets and scripts. With a collection of scripts that we can string together, we can automate our work.</p>
<p>Automating work isn&rsquo;t only about saving time. It reduces haven&rsquo;t-had-my-coffee-yet errors, minimizes cognitive load allowing more room for creativity, and allows our processes to be repeatable across collaborators and projects. It helps shorten our feedback loop by ensuring we aren&rsquo;t doing the same thing three times in three different ways.</p>
<p>We can begin to automate by starting our own personal wiki. If we build a habit of writing down every manual thing we do, no matter how basic it may seem at the time, we give ourselves more opportunities to spot patterns, and thus possible integrations and improvements.</p>
<p>The first time we do something manually, we write out the steps. The second time, we follow the steps. This gives us the opportunity to correct and refine them based on what we&rsquo;ve learned since the first time. Over successive iterations, we might replace parts of manual commands with variables; we might find handy snippets of bash scripts that automate just a part of our task. As long as we keep revising and improving our personal wiki, we&rsquo;re moving towards automation.</p>
<h1 id="block-out-time-for-code-review">Block out time for code review</h1>
<p><img src="cover_feedback-pbjreview.png" alt="Reviewing a peanut butter and jelly sandwich with a clipboard"></p>
<p>It&rsquo;s all too easy to commit messy code when we work alone. We think, <em>who&rsquo;s going to see it? I&rsquo;ll fix it later.</em> Each time that happens, though, we&rsquo;re building a habit. It&rsquo;s a bad one.</p>
<p>Working alone means there&rsquo;s no one likely to give feedback on our commits when we&rsquo;re doing something that doesn&rsquo;t make sense, or that could be improved. Instead, we have to actively seek out opportunities to improve. Open source communities are amazing for this. There&rsquo;s a wealth of information available to us in terms of coding styles, examples of refactored code, and a smorgasbord of snippets that achieve that-thing-we-were-trying-to-do but in fewer lines. We can learn all we please, if we just block out the time to do it.</p>
<p>Schedule your own code review at a time that makes sense for you and the project you&rsquo;re working on. This might be each time you finish a fix or feature, or at regular intervals daily or weekly. If you have someone who can help, book them. There are also <a href="https://victoria.dev/blog/top-free-resources-for-developing-coding-superpowers/">chatrooms full of people</a> happy to lend a hand.</p>
<p>Do some research on basic best practices for what you&rsquo;re working on. Set yourself a time limit though, and take what you read with a grain of salt. There are a lot of rabbit holes in that field. As a starting point, I&rsquo;d recommend learning about DRY code, and watching <a href="https://www.youtube.com/watch?v=p0O1VVqRSK0&amp;feature=youtu.be&amp;t=330">Uncle Bob demand professionalism in software development</a>.</p>
<h2 id="code-review-checklist">Code review checklist</h2>
<p>Here&rsquo;s my personal code review checklist, based off some general best practices. Feel free to use it as a starting point for your own!</p>
<blockquote>
<p><strong>Victoria&rsquo;s Code Review Extravaganza!</strong></p>
<ul>
<li><input disabled="" type="checkbox"> This solves a high-priority item.</li>
<li><input disabled="" type="checkbox"> This is a complete implementation that follows the specification.</li>
<li><input disabled="" type="checkbox"> Off-topic changes were not included and have been added to backlog.</li>
<li><input disabled="" type="checkbox"> Variable names are meaningful and there are no magic numbers.</li>
<li><input disabled="" type="checkbox"> Correct and useful error messages are returned at every opportunity.</li>
<li><input disabled="" type="checkbox"> No debugging print statements were left in.</li>
<li><input disabled="" type="checkbox"> This code is DRY and modular.</li>
<li><input disabled="" type="checkbox"> This code is secure. Private and public code are well separated.</li>
<li><input disabled="" type="checkbox"> This code is its own documentation, or the documentation is up to date.</li>
<li><input disabled="" type="checkbox"> A five-year-old could follow this, seriously it&rsquo;s that readable.</li>
<li><input disabled="" type="checkbox"> Unit tests successfully pass.</li>
<li><input disabled="" type="checkbox"> Master was merged into the branch and tested.</li>
<li><input disabled="" type="checkbox"> Formatting follows style guidelines.</li>
<li><input disabled="" type="checkbox"> I cannot find any further edge cases or known defects.</li>
<li><input disabled="" type="checkbox"> I would be happy if this code was publicly attributed to me.</li>
<li><input disabled="" type="checkbox"> I fully understand what the code does and the impact of the changes I made.</li>
<li><input disabled="" type="checkbox"> I actually verified that it actually does what I said it does.</li>
</ul>
</blockquote>
<p><a href="https://dev.to/gonedark/writing-clean-code">Here is an excellent example</a> of cleaning up code with some of the above points in mind.</p>
<h1 id="block-out-time-for-process-review">Block out time for process review</h1>
<p><img src="feedback-robotreview.png" alt="Reviewing sandwich making robot with clipboard"></p>
<p>Just as we learn from reviewing our code, we refine our processes by reviewing them as well. Process review is most beneficial when visited at regular intervals throughout the project, not just after the project&rsquo;s completion. For short-term projects, a good starting point for scheduling process reviews is at each half-mark - once midway through, and again after completion. Long-term projects may have reviews at each quarter-mark.</p>
<h2 id="process-review-questions">Process review questions</h2>
<p>Process review can be as simple as a short list of questions:</p>
<ol>
<li>What were my top-level goals for this period? Did I meet them?</li>
<li>What were my mid-level and low-level goals for this period? Did I meet them?</li>
<li>Would I have been better served with different or more specific goals? Why?</li>
<li>Did I successfully remove or automate obstacles?</li>
<li>Did I stick to my code review schedule? Why or why not?</li>
<li>How might I remove obstacles next time?</li>
</ol>
<p>Setting aside dedicated time for our process review can help us to answer questions like these thoughtfully and honestly. This allows us to squeeze out every bit of learning we can from our review, helping to shorten our feedback loop.</p>
<h1 id="update-your-goals-and-processes-with-the-results-of-your-reviews">Update your goals and processes with the results of your reviews</h1>
<p><img src="feedback-updategoals.png" alt="Adding additional arms to robot while envisioning a multi-layer PB&J"></p>
<p>All the performance data in the world is no good to us if we don&rsquo;t put it into practice. With each successive code review, we can refine and add to our checklist. With what we learn from each process review, we can fine tune and improve our processes. The more we can invent concrete and observable ways to implement our learning, the more success we&rsquo;ll have.</p>
<p>Making a conscious effort to utilize and practice the things we&rsquo;ve learned is the final, vital, component of our feedback loop. The more often we incorporate new learning, the shorter our loop becomes, allowing us to improve that much faster.</p>
Adorable bookmarklets want to help delete your social media datahttps://victoria.dev/blog/adorable-bookmarklets-want-to-help-delete-your-social-media-data/
Thu, 14 Jun 2018 13:12:02 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/adorable-bookmarklets-want-to-help-delete-your-social-media-data/Bookmarklets you can use in your browser to help clean up your social media data.
]]>
<p>A little while ago I wrote about <a href="https://victoria.dev/blog/why-im-automatically-deleting-my-old-tweets-using-aws-lambda/">a Lambda function I called ephemeral</a> for deleting my old tweets. While it&rsquo;s a great project for someone familiar with or wanting to learn to use Lambda, it isn&rsquo;t simple for a non-technical person to set up. There are services out there that will delete your tweets for you, but require your access credentials. There didn&rsquo;t seem to be anything that provided convenience without also requiring authentication.</p>
<p>So, I went oldschool and created the ephemeral bookmarklet.</p>
<p>If that didn&rsquo;t make you instantly nostalgic, a <a href="https://en.wikipedia.org/wiki/Bookmarklet">bookmarklet</a> is a little application that lives as a bookmark in your web browser. You &ldquo;install&rdquo; it by dragging the link to your bookmarks toolbar, or right-clicking on the link and choosing &ldquo;Bookmark this link&rdquo; (Firefox). You click it to execute the program on the current page.</p>
<p>Here&rsquo;s what the ephemeral bookmarklet will do:</p>
<!-- raw HTML omitted -->
<p>The ephemeral bookmarklet is part of a new suite of tools for personal data management that I&rsquo;m co-creating with Adam Drake. You can <a href="https://adamdrake.github.io/pdmtools/">get all the bookmarklets on this page</a>, and they&rsquo;re also open source <a href="https://github.com/adamdrake/pdmtools">on GitHub</a>.</p>
<p>There are currently bookmarklets for managing your data on LinkedIn and Twitter. We&rsquo;re looking for testers and contributors to help make this a comprehensive toolset for your social media data management. If you write code, I invite you to contribute and help this toolset grow.</p>
<p>∩{｡◕‿◕｡}∩ &ndash; Bookmarklet says hi!</p>
A coffee-break introduction to time complexity of algorithmshttps://victoria.dev/blog/a-coffee-break-introduction-to-time-complexity-of-algorithms/
Wed, 30 May 2018 14:08:28 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/a-coffee-break-introduction-to-time-complexity-of-algorithms/A groundwork understanding of algorithm time complexity in about fifteen minutes.
]]>
<p>Just like writing your very first <code>for</code> loop, understanding time complexity is an integral milestone to learning how to write efficient complex programs. Think of it as having a superpower that allows you to know exactly what type of program might be the most efficient in a particular situation - before even running a single line of code.</p>
<p>The fundamental concepts of complexity analysis are well worth studying. You&rsquo;ll be able to better understand how the code you&rsquo;re writing will interact with the program&rsquo;s input, and as a result, you&rsquo;ll waste a lot less time writing slow and problematic code. It won&rsquo;t take long to go over all you need to know in order to start writing more efficient programs - in fact, we can do it in about fifteen minutes. You can go grab a coffee right now (or tea, if that&rsquo;s your thing) and I&rsquo;ll take you through it before your coffee break is over. Go ahead, I&rsquo;ll wait.</p>
<p>All set? Let&rsquo;s do it!</p>
<h2 id="what-is-time-complexity-anyway">What is &ldquo;time complexity&rdquo; anyway?</h2>
<p>The time complexity of an algorithm is an <strong>approximation</strong> of how long that algorithm will take to process some input. It describes the efficiency of the algorithm by the magnitude of its operations. This is different than the number of times an operation repeats; I&rsquo;ll expand on that later. Generally, the fewer operations the algorithm has, the faster it will be.</p>
<p>We write about time complexity using <a href="https://en.wikipedia.org/wiki/Big_O_notation">Big O notation</a>, which looks something like <em>O</em>(<em>n</em>). There&rsquo;s rather a lot of math involved in its formal definition, but informally we can say that Big O notation gives us our algorithm&rsquo;s approximate run time in the <strong>worst case</strong>, or in other words, its upper bound.<sup>[<a href="#sources">2</a>]</sup> It is inherently relative and comparative.<sup>[<a href="#sources">3</a>]</sup> We&rsquo;re describing the algorithm&rsquo;s efficiency relative to the increasing size of its input data, <em>n</em>. If the input is a string, then <em>n</em> is the length of the string. If it&rsquo;s a list of integers, <em>n</em> is the length of the list.</p>
<p>It&rsquo;s easiest to picture what Big O notation represents with a graph:</p>
<figure class="screenshot">
<img src="graph.png"
alt="A graph showing different classes of time complexity"/> <figcaption>
<p>Lines made with the very excellent Desmos graph calculator. You can <a href="https://www.desmos.com/calculator/xpfyjl1lbn">play with this graph here</a>.</p>
</figcaption>
</figure>
<p>Here are the most important points to remember as you read the rest of this article:</p>
<ul>
<li>Time complexity is an approximation</li>
<li>An algorithm&rsquo;s time complexity approximates its worst case run time</li>
</ul>
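<p>To make the &ldquo;worst case&rdquo; idea concrete, here is a minimal sketch that counts how many items a simple linear search looks at. The function name <code>countComparisons</code> and the sample data are my own illustration, not part of any library.</p>

```go
package main

import "fmt"

// countComparisons returns how many elements a simple linear search
// examines before it finds target or runs out of elements.
// (Hypothetical helper, written for this illustration only.)
func countComparisons(items []int, target int) int {
	comparisons := 0
	for _, item := range items {
		comparisons++
		if item == target {
			break
		}
	}
	return comparisons
}

func main() {
	cupcakes := []int{7, 3, 9, 4, 1}

	fmt.Println(countComparisons(cupcakes, 7))  // best case: target is the first item
	fmt.Println(countComparisons(cupcakes, 1))  // target is the last item
	fmt.Println(countComparisons(cupcakes, 42)) // worst case: target is not present
}
```

<p>In the best case the search stops after one comparison, but in the worst case it has to look at all <em>n</em> items. Big O describes that upper bound, so this search is <em>O</em>(<em>n</em>).</p>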
<h2 id="determining-time-complexity">Determining time complexity</h2>
<p>There are different classes of complexity that we can use to quickly understand an algorithm. I&rsquo;ll illustrate some of these classes using nested loops and other examples.</p>
<h2 id="polynomial-time-complexity">Polynomial time complexity</h2>
<p>A <strong>polynomial</strong>, from the Greek <em>poly</em> meaning &ldquo;many,&rdquo; and Latin <em>nomen</em> meaning &ldquo;name,&rdquo; describes an expression comprised of variables and constants, combined using addition, multiplication, and exponentiation to a non-negative integer power.<sup>[<a href="#sources">4</a>]</sup> That&rsquo;s a super math-y way to say that it contains variables usually denoted by letters, and symbols that look like these:</p>
<p><img src="polynomial.png" alt="A polynomial example"></p>
<p>The below classes describe polynomial algorithms. Some have food examples.</p>
<h3 id="constant">Constant</h3>
<p>A <strong>constant time</strong> algorithm doesn&rsquo;t change its running time in response to the input data. No matter the size of the data it receives, the algorithm takes the same amount of time to run. We denote this as a time complexity of <em>O</em>(1).</p>
<figure class="screenshot">
<img src="graph%281%29.png"
alt="A graph showing constant time complexity."/>
</figure>
<p>Here&rsquo;s one example of a constant algorithm that takes the first item in a slice.</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kd">func</span> <span class="nf">takeCupcake</span><span class="p">(</span><span class="nx">cupcakes</span> <span class="p">[]</span><span class="kt">int</span><span class="p">)</span> <span class="kt">int</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">cupcakes</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="p">}</span>
</code></pre></div><figure>
<img src="cupcakes.png"
alt="Types of cupcakes"/> <figcaption>
<p>Choice of flavours are: vanilla cupcake, strawberry cupcake, mint chocolate cupcake, lemon cupcake, and wibbly wobbly, timey wimey cupcake.</p>
</figcaption>
</figure>
<p>With this constant-time algorithm, no matter how many cupcakes are on offer, you just get the first one. Oh well. Flavours are overrated anyway.</p>
<h3 id="linear">Linear</h3>
<p>The running duration of a <strong>linear</strong> algorithm grows in direct proportion to the size of its input. It will process the input in <em>n</em> number of operations. This is often the best possible (most efficient) case for time complexity where all the data must be examined.</p>
<figure class="screenshot">
<img src="graph%28n%29.png"
alt="A graph showing linear time complexity."/>
</figure>
<p>Here&rsquo;s an example of code with time complexity of <em>O</em>(<em>n</em>):</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kd">func</span> <span class="nf">eatChips</span><span class="p">(</span><span class="nx">bowlOfChips</span> <span class="kt">int</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="nx">chip</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">chip</span> <span class="o">&lt;</span> <span class="nx">bowlOfChips</span><span class="p">;</span> <span class="nx">chip</span><span class="o">++</span> <span class="p">{</span>
<span class="c1">// dip chip
</span><span class="c1"></span> <span class="p">}</span>
<span class="p">}</span>
</code></pre></div><p>Here&rsquo;s another example of code with time complexity of <em>O</em>(<em>n</em>):</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kd">func</span> <span class="nf">eatChips</span><span class="p">(</span><span class="nx">bowlOfChips</span> <span class="kt">int</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="nx">chip</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">chip</span> <span class="o">&lt;</span> <span class="nx">bowlOfChips</span><span class="p">;</span> <span class="nx">chip</span><span class="o">++</span> <span class="p">{</span>
<span class="c1">// double dip chip
</span><span class="c1"></span> <span class="p">}</span>
<span class="p">}</span>
</code></pre></div><p>It doesn&rsquo;t matter whether the code inside the loop executes once, twice, or any fixed number of times per chip. Both these loops process the input by a constant factor of <em>n</em>, and thus can be described as linear.</p>
<figure>
<img src="dip.png"
alt="Lifeguard MIQ the chip says no double dipping"/> <figcaption>
<p>Don&rsquo;t double dip in a shared bowl.</p>
</figcaption>
</figure>
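<p>If you&rsquo;d like to see why a constant factor doesn&rsquo;t change the class, here is a small sketch that counts operations for single and double dipping. The <code>dips</code> function and its <code>dipsPerChip</code> parameter are assumptions made up for this example.</p>

```go
package main

import "fmt"

// dips counts the operations performed when every chip in a bowl of
// n chips is dipped dipsPerChip times. (Hypothetical helper.)
func dips(n, dipsPerChip int) int {
	operations := 0
	for chip := 0; chip < n; chip++ {
		operations += dipsPerChip
	}
	return operations
}

func main() {
	for _, n := range []int{10, 100, 1000} {
		single := dips(n, 1)
		double := dips(n, 2)
		// The ratio between the two stays at 2 no matter how large n gets.
		fmt.Printf("n=%d single=%d double=%d ratio=%d\n", n, single, double, double/single)
	}
}
```

<p>Double dipping always costs exactly twice as much, whether the bowl holds ten chips or a thousand. Since that factor never grows with <em>n</em>, Big O notation drops it and both versions are <em>O</em>(<em>n</em>).</p>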
<h3 id="quadratic">Quadratic</h3>
<figure class="screenshot">
<img src="graph%28n2%29.png"
alt="A graph showing quadratic time complexity"/>
</figure>
<p>Now here&rsquo;s an example of code with time complexity of <em>O</em>(<em>n</em><sup>2</sup>):</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kd">func</span> <span class="nf">pizzaDelivery</span><span class="p">(</span><span class="nx">pizzas</span> <span class="kt">int</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="nx">pizza</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">pizza</span> <span class="o">&lt;</span> <span class="nx">pizzas</span><span class="p">;</span> <span class="nx">pizza</span><span class="o">++</span> <span class="p">{</span>
<span class="c1">// slice pizza
</span><span class="c1"></span> <span class="k">for</span> <span class="nx">slice</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">slice</span> <span class="o">&lt;=</span> <span class="nx">pizza</span><span class="p">;</span> <span class="nx">slice</span><span class="o">++</span> <span class="p">{</span>
<span class="c1">// eat slice of pizza
</span><span class="c1"></span> <span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div><p>Because there are two nested loops, or nested linear operations, the algorithm processes the input on the order of <em>n</em><sup>2</sup> times.</p>
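<p>Here is a rough sketch that counts the inner-loop iterations for nested loops of this shape. To keep the arithmetic tidy I&rsquo;ve assumed both loops start at 1, which differs slightly from the loop bounds above; <code>countSlices</code> is a made-up name.</p>

```go
package main

import "fmt"

// countSlices counts how many times the inner loop body runs when the
// inner loop's bound depends on the outer loop's counter.
// (Hypothetical helper; loops start at 1 to simplify the arithmetic.)
func countSlices(pizzas int) int {
	eaten := 0
	for pizza := 1; pizza <= pizzas; pizza++ {
		for slice := 1; slice <= pizza; slice++ {
			eaten++
		}
	}
	return eaten
}

func main() {
	for _, n := range []int{10, 100, 1000} {
		// The count is n*(n+1)/2, which grows like n*n.
		fmt.Printf("n=%d iterations=%d n*n=%d\n", n, countSlices(n), n*n)
	}
}
```

<p>The exact count is <em>n</em>(<em>n</em>+1)/2, roughly half of <em>n</em><sup>2</sup>. Big O ignores the constant factor of one half and the smaller <em>n</em> term, which leaves <em>O</em>(<em>n</em><sup>2</sup>).</p>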
<h3 id="cubic">Cubic</h3>
<figure class="screenshot">
<img src="graph%28n3%29.png"
alt="A graph showing cubic time complexity"/>
</figure>
<p>Extending the previous example, this code with three nested loops has time complexity of <em>O</em>(<em>n</em><sup>3</sup>):</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kd">func</span> <span class="nf">pizzaDelivery</span><span class="p">(</span><span class="nx">boxesDelivered</span> <span class="kt">int</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="nx">pizzaBox</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">pizzaBox</span> <span class="o">&lt;</span> <span class="nx">boxesDelivered</span><span class="p">;</span> <span class="nx">pizzaBox</span><span class="o">++</span> <span class="p">{</span>
<span class="c1">// open box
</span><span class="c1"></span> <span class="k">for</span> <span class="nx">pizza</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">pizza</span> <span class="o">&lt;=</span> <span class="nx">pizzaBox</span><span class="p">;</span> <span class="nx">pizza</span><span class="o">++</span> <span class="p">{</span>
<span class="c1">// slice pizza
</span><span class="c1"></span> <span class="k">for</span> <span class="nx">slice</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">slice</span> <span class="o">&lt;=</span> <span class="nx">pizza</span><span class="p">;</span> <span class="nx">slice</span><span class="o">++</span> <span class="p">{</span>
<span class="c1">// eat slice of pizza
</span><span class="c1"></span> <span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div><figure>
<img src="unsliced.png"
alt="A pizza pie in a box with a pizza slicer dependency"/> <figcaption>
<p>Seriously though, who delivers unsliced pizza??</p>
</figcaption>
</figure>
<h3 id="logarithmic">Logarithmic</h3>
<p>A <strong>logarithmic</strong> algorithm is one that reduces the size of the input by a constant fraction (often by half) at every step.
We denote this time complexity as <em>O</em>(log <em>n</em>), where <strong>log</strong>, the logarithm function, has this shape:</p>
<figure class="screenshot">
<img src="graph%28logn%29.png"
alt="A graph showing logarithmic time complexity"/>
</figure>
<p>One example of this is a <a href="https://en.wikipedia.org/wiki/Binary_search_algorithm">binary search algorithm</a> that finds the position of an element within a sorted array. Here&rsquo;s how it would work, assuming we&rsquo;re trying to find the element <em>x</em>:</p>
<ol>
<li>If <em>x</em> matches the middle element <em>m</em> of the array, return the position of <em>m</em></li>
<li>If <em>x</em> doesn&rsquo;t match <em>m</em>, see if <em>m</em> is larger or smaller than <em>x</em>
<ul>
<li>If larger, discard all array items greater than <em>m</em></li>
<li>If smaller, discard all array items smaller than <em>m</em></li>
</ul>
</li>
<li>Continue by repeating steps 1 and 2 on the remaining array until <em>x</em> is found or no items remain</li>
</ol>
<p>I find the clearest analogy for understanding binary search is imagining the process of locating a book in a bookstore aisle. If the books are organized by author&rsquo;s last name and you want to find &ldquo;Terry Pratchett,&rdquo; you know you need to look for the &ldquo;P&rdquo; section.</p>
<p>You can approach the shelf at any point along the aisle and look at the author&rsquo;s last name there. If you&rsquo;re looking at a book by Neil Gaiman, you know you can ignore all the books to your left, since no letters that come before &ldquo;G&rdquo; in the alphabet happen to be &ldquo;P.&rdquo; You would then move down the aisle to the right any amount, and repeat this process until you&rsquo;ve found the Terry Pratchett section, which should be rather sizable if you&rsquo;re at any decent bookstore because wow did he write a lot of books. Binary search is the disciplined version of this: it always checks the middle of what remains, so every look rules out half of the remaining books.</p>
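<p>The numbered steps above can be sketched in Go. This is a minimal illustration rather than a definitive implementation; the function name <code>binarySearch</code> and the choice to return <code>-1</code> when the element is missing are assumptions made for the example.</p>

```go
package main

// binarySearch returns the index of x in the sorted slice items,
// or -1 if x is not present. Every pass through the loop halves
// the range that still has to be searched.
func binarySearch(items []int, x int) int {
	low, high := 0, len(items)-1
	for low <= high {
		// middle of the remaining range, written to avoid overflow
		mid := low + (high-low)/2
		switch {
		case items[mid] == x:
			return mid
		case items[mid] > x:
			// m is larger than x: discard m and everything after it
			high = mid - 1
		default:
			// m is smaller than x: discard m and everything before it
			low = mid + 1
		}
	}
	return -1
}
```

<p>Calling <code>binarySearch([]int{1, 3, 5, 7, 9, 11}, 7)</code> inspects at most three elements of the six before returning. Doubling the length of the slice adds only one more step.</p>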
<h3 id="quasilinear">Quasilinear</h3>
<figure class="screenshot">
<img src="graph%28nlogn%29.png"
alt="A graph showing quasilinear time complexity"/>
</figure>
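<p>Often seen with sorting algorithms, the time complexity <em>O</em>(<em>n</em> log <em>n</em>) can describe an algorithm that performs an <em>O</em>(log <em>n</em>) operation for each of its <em>n</em> input elements. One example of this is <a href="https://en.wikipedia.org/wiki/Quicksort">quick sort</a>, a divide-and-conquer algorithm that runs in <em>O</em>(<em>n</em> log <em>n</em>) time on average, although its worst case is <em>O</em>(<em>n</em><sup>2</sup>).</p>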
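<p>Quick sort works by dividing up an unsorted array into smaller chunks that are easier to process. It picks one element as a pivot, splits the rest into the elements smaller than the pivot and the elements larger than it, then sorts each of those sub-arrays the same way, and thus the whole array. Think about it like trying to put a deck of cards in order. It&rsquo;s faster if you split up the cards and get five friends to help you.</p>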
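<p>Here&rsquo;s a minimal sketch of the divide-and-conquer idea in Go. It favors readability over speed: it assumes the last element is an acceptable pivot and returns a sorted copy rather than sorting in place, which real implementations usually avoid. The name <code>quickSort</code> is just for illustration.</p>

```go
package main

// quickSort returns a sorted copy of items. It uses the last element
// as the pivot, splits the rest into smaller and larger values, and
// sorts each part recursively.
func quickSort(items []int) []int {
	// zero or one element is already sorted; return a copy
	if len(items) < 2 {
		return append([]int(nil), items...)
	}
	pivot := items[len(items)-1]
	var smaller, larger []int
	for _, v := range items[:len(items)-1] {
		if v <= pivot {
			smaller = append(smaller, v)
		} else {
			larger = append(larger, v)
		}
	}
	// sorted smaller values, then the pivot, then sorted larger values
	sorted := append(quickSort(smaller), pivot)
	return append(sorted, quickSort(larger)...)
}
```

<p>With this pivot choice, an input that is already sorted produces the most lopsided splits possible, which is the situation where quick sort degrades toward quadratic time.</p>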
<h3 id="non-polynomial-time-complexity">Non-polynomial time complexity</h3>
<p>The classes of algorithms below are non-polynomial.</p>
<h3 id="factorial">Factorial</h3>
<figure class="screenshot">
<img src="graph%28nfac%29.png"
alt="A graph showing factorial time complexity"/>
</figure>
<p>An algorithm with time complexity <em>O</em>(<em>n</em>!) often iterates through all permutations of the input elements. One common example is a <a href="https://en.wikipedia.org/wiki/Brute-force_search">brute-force search</a> seen in the <a href="https://en.wikipedia.org/wiki/Travelling_salesman_problem#Computing_a_solution">travelling salesman problem</a>. It tries to find the least costly path between a number of points by enumerating all possible permutations and finding the ones with the lowest cost.</p>
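<p>To make the permutation idea concrete, here&rsquo;s a rough sketch of a brute-force search for the travelling salesman problem. The function name <code>shortestTour</code> and the <code>dist</code> matrix layout (where <code>dist[i][j]</code> is the cost of going from city <code>i</code> to city <code>j</code>) are assumptions made for this example. It fixes city 0 as the starting point and tries every ordering of the remaining cities.</p>

```go
package main

// shortestTour returns the cost of the cheapest round trip that starts
// and ends at city 0 and visits every other city exactly once.
// It checks every permutation of the other cities, so the work grows
// factorially with the number of cities.
func shortestTour(dist [][]int) int {
	n := len(dist)
	if n < 2 {
		return 0
	}
	// rest holds the cities to be ordered: 1, 2, ..., n-1
	rest := make([]int, 0, n-1)
	for c := 1; c < n; c++ {
		rest = append(rest, c)
	}
	best := -1
	var permute func(k int)
	permute = func(k int) {
		if k == len(rest) {
			// a complete ordering: add up the cost of the round trip
			cost, prev := 0, 0
			for _, c := range rest {
				cost += dist[prev][c]
				prev = c
			}
			cost += dist[prev][0]
			if best < 0 || cost < best {
				best = cost
			}
			return
		}
		// try each remaining city in position k, then undo the swap
		for i := k; i < len(rest); i++ {
			rest[k], rest[i] = rest[i], rest[k]
			permute(k + 1)
			rest[k], rest[i] = rest[i], rest[k]
		}
	}
	permute(0)
	return best
}
```

<p>Four cities mean only 6 orderings to check, but eleven cities already mean 3,628,800 of them.</p>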
<h3 id="exponential">Exponential</h3>
<p>An <strong>exponential</strong> algorithm often iterates through all subsets of the input elements. It is denoted <em>O</em>(2<sup><em>n</em></sup>) and is often seen in brute-force algorithms. It is similar to factorial time except in its rate of growth, which, as you may not be surprised to hear, is exponential. The larger the data set, the steeper the curve becomes.</p>
<figure class="screenshot">
<img src="graph%282n%29.png"
alt="A graph showing exponential time complexity"/>
</figure>
<p>In cryptography, a brute-force attack may systematically check all possible elements of a password by iterating through subsets. Using an exponential algorithm to do this, it becomes incredibly resource-expensive to brute-force crack a long password versus a shorter one. This is one reason that a long password is considered more secure than a shorter one.</p>
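<p>As a small illustration of iterating through all subsets, the sketch below builds every subset of a slice using a bitmask. The function name <code>subsets</code> is hypothetical, and the approach assumes the input is short enough that 2<sup><em>n</em></sup> fits in an <code>int</code>.</p>

```go
package main

// subsets returns every subset of items, including the empty one.
// A slice of n items has 2^n subsets, so each additional item
// doubles the amount of work.
func subsets(items []string) [][]string {
	total := 1 << uint(len(items))
	result := make([][]string, 0, total)
	for mask := 0; mask < total; mask++ {
		var subset []string
		for i, item := range items {
			// bit i of mask decides whether item i is included
			if mask&(1<<uint(i)) != 0 {
				subset = append(subset, item)
			}
		}
		result = append(result, subset)
	}
	return result
}
```

<p>Three items produce 8 subsets, ten items produce 1,024, and twenty items produce over a million.</p>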
<p>There are further time complexity classes less commonly seen that I won&rsquo;t cover here, but you can read about these and find examples in <a href="https://en.wikipedia.org/wiki/Time_complexity#Table_of_common_time_complexities">this handy table</a>.</p>
<h3 id="recursion-time-complexity">Recursion time complexity</h3>
<p>As I described in my article <a href="https://victoria.dev/blog/understanding-array.prototype.reduce-and-recursion-using-apple-pie/">explaining recursion using apple pie</a>, a recursive function calls itself under specified conditions. Its time complexity depends on how many times the function is called and the time complexity of a single function call. In other words, it&rsquo;s the product of the number of times the function runs and a single execution&rsquo;s time complexity.</p>
<p>Here&rsquo;s a recursive function that eats pies until no pies are left:</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kd">func</span> <span class="nf">eatPies</span><span class="p">(</span><span class="nx">pies</span> <span class="kt">int</span><span class="p">)</span> <span class="kt">int</span> <span class="p">{</span>
<span class="k">if</span> <span class="nx">pies</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">pies</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nf">eatPies</span><span class="p">(</span><span class="nx">pies</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div><p>The time complexity of a single execution is constant. No matter how many pies are input, the program will do the same thing: check to see if the input is 0. If so, return, and if not, call itself with one fewer pie.</p>
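<p>The initial number of pies could be any number, and we need to process all of them, so we can describe the input as <em>n</em>. The function runs about <em>n</em> times, and each run takes constant time, so the time complexity of this recursive function is the product <em>O</em>(<em>n</em>) &times; <em>O</em>(1) = <em>O</em>(<em>n</em>).</p>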
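<p>One way to check this reasoning is to count the calls. The variant below is a sketch for illustration only; <code>eatPiesCounted</code> and its <code>calls</code> parameter are not part of the original function.</p>

```go
package main

// eatPiesCounted behaves like eatPies, but also increments *calls
// every time it runs so the number of executions can be inspected.
func eatPiesCounted(pies int, calls *int) int {
	*calls++
	if pies == 0 {
		return pies
	}
	return eatPiesCounted(pies-1, calls)
}
```

<p>Starting with 3 pies, the function runs 4 times: once per pie, plus the final call that finds the pies are gone. That&rsquo;s <em>n</em> + 1 constant-time calls, which simplifies to <em>O</em>(<em>n</em>).</p>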
<figure>
<img src="piespile.png"
alt="A pile of pizza boxes with pies to be eaten"/> <figcaption>
<p>This function&rsquo;s return value is zero, plus some indigestion.</p>
</figcaption>
</figure>
<h3 id="worst-case-time-complexity">Worst case time complexity</h3>
<p>So far, we&rsquo;ve talked about the time complexity of a few nested loops and some code examples. Most algorithms, however, are built from many combinations of these. How do we determine the time complexity of an algorithm containing many of these elements strung together?</p>
<p>Easy. We can describe the total time complexity of the algorithm by finding the largest complexity among all of its parts. This is because the slowest part of the code is the bottleneck, and time complexity is concerned with describing the worst case for the algorithm&rsquo;s run time.</p>
<p>Say we have a program for an office party. If our program looks like this:</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kn">package</span> <span class="nx">main</span>
<span class="kn">import</span> <span class="s">&#34;fmt&#34;</span>
<span class="kd">func</span> <span class="nf">takeCupcake</span><span class="p">(</span><span class="nx">cupcakes</span> <span class="p">[]</span><span class="kt">int</span><span class="p">)</span> <span class="kt">int</span> <span class="p">{</span>
<span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">(</span><span class="s">&#34;Have cupcake number&#34;</span><span class="p">,</span><span class="nx">cupcakes</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="k">return</span> <span class="nx">cupcakes</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="p">}</span>
<span class="kd">func</span> <span class="nf">eatChips</span><span class="p">(</span><span class="nx">bowlOfChips</span> <span class="kt">int</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">(</span><span class="s">&#34;Have some chips!&#34;</span><span class="p">)</span>
<span class="k">for</span> <span class="nx">chip</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">chip</span> <span class="o">&lt;=</span> <span class="nx">bowlOfChips</span><span class="p">;</span> <span class="nx">chip</span><span class="o">++</span> <span class="p">{</span>
<span class="c1">// dip chip
</span><span class="c1"></span> <span class="p">}</span>
<span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">(</span><span class="s">&#34;No more chips.&#34;</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">func</span> <span class="nf">pizzaDelivery</span><span class="p">(</span><span class="nx">boxesDelivered</span> <span class="kt">int</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">(</span><span class="s">&#34;Pizza is here!&#34;</span><span class="p">)</span>
<span class="k">for</span> <span class="nx">pizzaBox</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">pizzaBox</span> <span class="o">&lt;=</span> <span class="nx">boxesDelivered</span><span class="p">;</span> <span class="nx">pizzaBox</span><span class="o">++</span> <span class="p">{</span>
<span class="c1">// open box
</span><span class="c1"></span> <span class="k">for</span> <span class="nx">pizza</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">pizza</span> <span class="o">&lt;=</span> <span class="nx">pizzaBox</span><span class="p">;</span> <span class="nx">pizza</span><span class="o">++</span> <span class="p">{</span>
<span class="c1">// slice pizza
</span><span class="c1"></span> <span class="k">for</span> <span class="nx">slice</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">slice</span> <span class="o">&lt;=</span> <span class="nx">pizza</span><span class="p">;</span> <span class="nx">slice</span><span class="o">++</span> <span class="p">{</span>
<span class="c1">// eat slice of pizza
</span><span class="c1"></span> <span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">(</span><span class="s">&#34;Pizza is gone.&#34;</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">func</span> <span class="nf">eatPies</span><span class="p">(</span><span class="nx">pies</span> <span class="kt">int</span><span class="p">)</span> <span class="kt">int</span> <span class="p">{</span>
<span class="k">if</span> <span class="nx">pies</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">{</span>
<span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">(</span><span class="s">&#34;Someone ate all the pies!&#34;</span><span class="p">)</span>
<span class="k">return</span> <span class="nx">pies</span>
<span class="p">}</span>
<span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">(</span><span class="s">&#34;Eating pie...&#34;</span><span class="p">)</span>
<span class="k">return</span> <span class="nf">eatPies</span><span class="p">(</span><span class="nx">pies</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">func</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="nf">takeCupcake</span><span class="p">([]</span><span class="kt">int</span><span class="p">{</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">})</span>
<span class="nf">eatChips</span><span class="p">(</span><span class="mi">23</span><span class="p">)</span>
<span class="nf">pizzaDelivery</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
<span class="nf">eatPies</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
<span class="nx">fmt</span><span class="p">.</span><span class="nf">Println</span><span class="p">(</span><span class="s">&#34;Food gone. Back to work!&#34;</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div><p>We can describe the time complexity of all the code by the complexity of its most complex part. This program is made up of functions we&rsquo;ve already seen, with the following time complexity classes:</p>
<table>
<thead>
<tr>
<th>Function</th>
<th>Class</th>
<th>Big O</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>takeCupcake</code></td>
<td>constant</td>
<td><em>O</em>(1)</td>
</tr>
<tr>
<td><code>eatChips</code></td>
<td>linear</td>
<td><em>O</em>(<em>n</em>)</td>
</tr>
<tr>
<td><code>pizzaDelivery</code></td>
<td>cubic</td>
<td><em>O</em>(<em>n</em><sup>3</sup>)</td>
</tr>
<tr>
<td><code>eatPies</code></td>
<td>linear (recursive)</td>
<td><em>O</em>(<em>n</em>)</td>
</tr>
</tbody>
</table>
<p>To describe the time complexity of the entire office party program, we choose the worst case. This program would have the time complexity <em>O</em>(<em>n</em><sup>3</sup>).</p>
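<p>To see why the largest term wins, we can tally rough step counts for each part of the party when every input has the same size <em>n</em>. This is a sketch under that simplifying assumption; <code>partyOperations</code> is a hypothetical helper that mirrors the loop bounds used in the program above.</p>

```go
package main

// partyOperations returns approximate step counts for each part of the
// office party program when every input has size n.
func partyOperations(n int) (cupcake, chips, pizza, pies int) {
	// takeCupcake always does one thing
	cupcake = 1
	// eatChips dips one chip per loop iteration
	for chip := 0; chip <= n; chip++ {
		chips++
	}
	// pizzaDelivery has three nested loops
	for box := 0; box <= n; box++ {
		for p := 0; p <= box; p++ {
			for s := 0; s <= p; s++ {
				pizza++
			}
		}
	}
	// eatPies calls itself once per pie, plus the final call
	pies = n + 1
	return
}
```

<p>For <em>n</em> = 100, the chips and pies take 101 steps each while the pizza takes 176,851. As <em>n</em> grows, everything except the cubic part becomes a rounding error.</p>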
<p>Here&rsquo;s the office party soundtrack, just for fun.</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash">Have cupcake number <span class="m">1</span>
Have some chips!
No more chips.
Pizza is here!
Pizza is gone.
Eating pie...
Eating pie...
Eating pie...
Someone ate all the pies!
Food gone. Back to work!
</code></pre></div><h2 id="p-vs-np-np-complete-and-np-hard">P vs NP, NP-complete, and NP-hard</h2>
<p>You may come across these terms in your explorations of time complexity. Informally, <strong>P</strong> (for Polynomial time) is a class of problems that are quick to solve. <strong>NP</strong>, for Nondeterministic Polynomial time, is a class of problems where a proposed answer can be verified in polynomial time. NP encompasses P, but also another class of problems called <strong>NP-complete</strong>, for which no fast solution is known.<sup>[<a href="#sources">5</a>]</sup> Overlapping NP at NP-complete, and extending beyond it, is yet another class called <strong>NP-hard</strong>, which includes problems that are at least as hard as every problem in NP and whose answers may not even be quickly verifiable.<sup>[<a href="#sources">6</a>]</sup></p>
<figure class="screenshot">
<img src="pnpeuler.svg"
alt="Euler diagram"/> <figcaption>
<p>P vs NP Euler diagram, <a href="https://commons.wikimedia.org/w/index.php?curid=3532181">by Behnam Esfahbod, CC BY-SA 3.0</a></p>
</figcaption>
</figure>
<p><a href="https://en.wikipedia.org/wiki/P_versus_NP_problem">P versus NP</a> is an unsolved, open question in computer science.</p>
<p>Anyway, you don&rsquo;t generally need to know about NP and NP-hard problems to begin taking advantage of understanding time complexity. They&rsquo;re a whole other Pandora&rsquo;s box.</p>
<h2 id="approximate-the-efficiency-of-an-algorithm-before-you-write-the-code">Approximate the efficiency of an algorithm before you write the code</h2>
<p>So far, we&rsquo;ve identified some different time complexity classes and how we might determine which one an algorithm falls into. So how does this help us before we&rsquo;ve written any code to evaluate?</p>
<p>By combining a little knowledge of time complexity with an awareness of the size of our input data, we can take a guess at an efficient algorithm for processing our data within a given time constraint. We can base our estimation on the fact that a modern computer can perform some hundreds of millions of operations in a second.<sup>[<a href="#sources">1</a>]</sup> The following table from the <a href="#sources">Competitive Programmer&rsquo;s Handbook</a> offers some estimates on required time complexity to process the respective input size in a time limit of one second.</p>
<table>
<thead>
<tr>
<th>Input size</th>
<th>Required time complexity for 1s processing time</th>
</tr>
</thead>
<tbody>
<tr>
<td>n ≤ 10</td>
<td><em>O</em>(<em>n</em>!)</td>
</tr>
<tr>
<td>n ≤ 20</td>
<td><em>O</em>(2<sup><em>n</em></sup>)</td>
</tr>
<tr>
<td>n ≤ 500</td>
<td><em>O</em>(<em>n</em><sup>3</sup>)</td>
</tr>
<tr>
<td>n ≤ 5000</td>
<td><em>O</em>(<em>n</em><sup>2</sup>)</td>
</tr>
<tr>
<td>n ≤ 10<sup>6</sup></td>
<td><em>O</em>(<em>n</em> log <em>n</em>) or <em>O</em>(<em>n</em>)</td>
</tr>
<tr>
<td>n is large</td>
<td><em>O</em>(1) or <em>O</em>(log <em>n</em>)</td>
</tr>
</tbody>
</table>
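<p>The table&rsquo;s estimates can be roughly reproduced with a little arithmetic. The sketch below assumes a budget of 500 million simple operations per second, which is my own stand-in for &ldquo;some hundreds of millions&rdquo;; the names <code>operations</code> and <code>fitsInOneSecond</code> are hypothetical, and the results are ballpark figures rather than measurements.</p>

```go
package main

import "math"

// opsPerSecond is an assumed budget, not a measured value.
const opsPerSecond = 5e8

// operations estimates how many steps an algorithm of the given
// complexity class needs for an input of size n.
func operations(class string, n float64) float64 {
	switch class {
	case "n!":
		return math.Gamma(n + 1) // Gamma(n+1) equals n!
	case "2^n":
		return math.Pow(2, n)
	case "n^3":
		return n * n * n
	case "n^2":
		return n * n
	case "n log n":
		return n * math.Log2(n)
	case "n":
		return n
	case "log n":
		return math.Log2(n)
	}
	// anything else is treated as constant time
	return 1
}

// fitsInOneSecond reports whether the estimated number of steps
// stays within the assumed one-second budget.
func fitsInOneSecond(class string, n float64) bool {
	return operations(class, n) <= opsPerSecond
}
```

<p>Under these assumptions, <code>fitsInOneSecond("n^3", 500)</code> is true, while <code>fitsInOneSecond("2^n", 40)</code> is false, since 2<sup>40</sup> is more than a trillion steps.</p>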
<p>Keep in mind that time complexity is an approximation, and not a guarantee. We can save a lot of time and effort by immediately ruling out algorithm designs that are unlikely to suit our constraints, but we must also consider that Big O notation doesn&rsquo;t account for <strong>constant factors</strong>. Here&rsquo;s some code to illustrate.</p>
<p>The following two algorithms both have <em>O</em>(<em>n</em>) time complexity.</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kd">func</span> <span class="nf">makeCoffee</span><span class="p">(</span><span class="nx">scoops</span> <span class="kt">int</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="nx">scoop</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">scoop</span> <span class="o">&lt;=</span> <span class="nx">scoops</span><span class="p">;</span> <span class="nx">scoop</span><span class="o">++</span> <span class="p">{</span>
<span class="c1">// add instant coffee
</span><span class="c1"></span> <span class="p">}</span>
<span class="p">}</span>
</code></pre></div><div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kd">func</span> <span class="nf">makeStrongCoffee</span><span class="p">(</span><span class="nx">scoops</span> <span class="kt">int</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="nx">scoop</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">scoop</span> <span class="o">&lt;=</span> <span class="mi">3</span><span class="o">*</span><span class="nx">scoops</span><span class="p">;</span> <span class="nx">scoop</span><span class="o">++</span> <span class="p">{</span>
<span class="c1">// add instant coffee
</span><span class="c1"></span> <span class="p">}</span>
<span class="p">}</span>
</code></pre></div><p>The first function makes a cup of coffee with the number of scoops we ask for. The second function also makes a cup of coffee, but it triples the number of scoops we ask for. To see an illustrative example, let&rsquo;s ask both these functions for a cup of coffee with a million scoops.</p>
<p>Here&rsquo;s the output of the Go test:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">Benchmark_makeCoffee-4 <span class="m">1000000000</span> 0.29 ns/op
Benchmark_makeStrongCoffee-4 <span class="m">1000000000</span> 0.86 ns/op
</code></pre></div><p>Our first function, <code>makeCoffee</code>, completed in an average of 0.29 nanoseconds. Our second function, <code>makeStrongCoffee</code>, completed in an average of 0.86 nanoseconds. While those may both seem like pretty small numbers, consider that the stronger coffee took nearly three times as long to make. This should make sense intuitively, since we asked it to triple the scoops. Big O notation alone wouldn&rsquo;t tell you this, since the constant factor of the tripled scoops isn&rsquo;t accounted for.</p>
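<p>Timings vary from machine to machine, so another way to see the constant factor is to count loop iterations instead. The two functions below are hypothetical counting variants of <code>makeCoffee</code> and <code>makeStrongCoffee</code>, written only for this illustration.</p>

```go
package main

// coffeeSteps counts the loop iterations makeCoffee performs
// for the requested number of scoops.
func coffeeSteps(scoops int) int {
	steps := 0
	for scoop := 0; scoop <= scoops; scoop++ {
		steps++ // add instant coffee
	}
	return steps
}

// strongCoffeeSteps counts the loop iterations makeStrongCoffee
// performs, which loops up to three times the requested scoops.
func strongCoffeeSteps(scoops int) int {
	steps := 0
	for scoop := 0; scoop <= 3*scoops; scoop++ {
		steps++ // add instant coffee
	}
	return steps
}
```

<p>For a million scoops, the first function takes 1,000,001 steps and the second takes 3,000,001. Both grow in a straight line with the input, so both are <em>O</em>(<em>n</em>), but one line is three times as steep.</p>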
<h2 id="improve-time-complexity-of-existing-code">Improve time complexity of existing code</h2>
<p>Becoming familiar with time complexity gives us the opportunity to write code, or refactor code, to be more efficient. To illustrate, I&rsquo;ll give a concrete example of one way we can refactor a bit of code to improve its time complexity.</p>
<p>Let&rsquo;s say a bunch of people at the office want some pie. Some people want pie more than others. How much each person wants pie is represented by an <code>int</code> &gt; 0:</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="nx">diners</span> <span class="o">:=</span> <span class="p">[]</span><span class="kt">int</span><span class="p">{</span><span class="mi">2</span><span class="p">,</span> <span class="mi">88</span><span class="p">,</span> <span class="mi">87</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">42</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">34</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">43</span><span class="p">,</span> <span class="mi">56</span><span class="p">}</span>
</code></pre></div><p>Unfortunately, we&rsquo;re bootstrapped and there are only three forks to go around. Since we&rsquo;re a cooperative bunch, the three people who want pie the most will receive the forks to eat it with. Even though they&rsquo;ve all agreed on this, no one seems to want to sort themselves out and line up in an orderly fashion, so we&rsquo;ll have to make do with everybody jumbled about.</p>
<p>Without sorting the list of diners, return the three largest integers in the slice.</p>
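<p>Here&rsquo;s a function that solves this problem by scanning the whole list of diners once for every fork. With only three forks that is roughly 3<em>n</em> steps, but if the number of forks grew along with the number of diners, it would approach <em>O</em>(<em>n</em><sup>2</sup>) time complexity:</p>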
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kd">func</span> <span class="nf">giveForks</span><span class="p">(</span><span class="nx">diners</span> <span class="p">[]</span><span class="kt">int</span><span class="p">)</span> <span class="p">[]</span><span class="kt">int</span> <span class="p">{</span>
<span class="c1">// make a slice to store diners who will receive forks
</span><span class="c1"></span> <span class="kd">var</span> <span class="nx">withForks</span> <span class="p">[]</span><span class="kt">int</span>
<span class="c1">// loop over three forks
</span><span class="c1"></span> <span class="k">for</span> <span class="nx">i</span> <span class="o">:=</span> <span class="mi">1</span><span class="p">;</span> <span class="nx">i</span> <span class="o">&lt;=</span> <span class="mi">3</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span> <span class="p">{</span>
<span class="c1">// variables to keep track of the highest integer and where it is
</span><span class="c1"></span> <span class="kd">var</span> <span class="nx">max</span><span class="p">,</span> <span class="nx">maxIndex</span> <span class="kt">int</span>
<span class="c1">// loop over the diners slice
</span><span class="c1"></span> <span class="k">for</span> <span class="nx">n</span> <span class="o">:=</span> <span class="k">range</span> <span class="nx">diners</span> <span class="p">{</span>
<span class="c1">// if this integer is higher than max, update max and maxIndex
</span><span class="c1"></span> <span class="k">if</span> <span class="nx">diners</span><span class="p">[</span><span class="nx">n</span><span class="p">]</span> <span class="p">&gt;</span> <span class="nx">max</span> <span class="p">{</span>
<span class="nx">max</span> <span class="p">=</span> <span class="nx">diners</span><span class="p">[</span><span class="nx">n</span><span class="p">]</span>
<span class="nx">maxIndex</span> <span class="p">=</span> <span class="nx">n</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// remove the highest integer from the diners slice for the next loop
</span><span class="c1"></span> <span class="nx">diners</span> <span class="p">=</span> <span class="nb">append</span><span class="p">(</span><span class="nx">diners</span><span class="p">[:</span><span class="nx">maxIndex</span><span class="p">],</span> <span class="nx">diners</span><span class="p">[</span><span class="nx">maxIndex</span><span class="o">+</span><span class="mi">1</span><span class="p">:]</span><span class="o">...</span><span class="p">)</span>
<span class="c1">// keep track of who gets a fork
</span><span class="c1"></span> <span class="nx">withForks</span> <span class="p">=</span> <span class="nb">append</span><span class="p">(</span><span class="nx">withForks</span><span class="p">,</span> <span class="nx">max</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">withForks</span>
<span class="p">}</span>
</code></pre></div><p>This program works, and eventually returns diners <code>[88 87 56]</code>. Everyone gets a little impatient while it&rsquo;s running though, since it takes rather a long time (about 120 nanoseconds) just to hand out three forks, and the pie&rsquo;s getting cold. How could we improve it?</p>
<p>By thinking about our approach in a slightly different way, we can refactor this program to hand out all three forks in a single pass over the diners, giving it <em>O</em>(<em>n</em>) time complexity:</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kd">func</span> <span class="nf">giveForks</span><span class="p">(</span><span class="nx">diners</span> <span class="p">[]</span><span class="kt">int</span><span class="p">)</span> <span class="p">[]</span><span class="kt">int</span> <span class="p">{</span>
<span class="c1">// make a slice to store diners who will receive forks
</span><span class="c1"></span> <span class="kd">var</span> <span class="nx">withForks</span> <span class="p">[]</span><span class="kt">int</span>
<span class="c1">// create variables for each fork
</span><span class="c1"></span> <span class="kd">var</span> <span class="nx">first</span><span class="p">,</span> <span class="nx">second</span><span class="p">,</span> <span class="nx">third</span> <span class="kt">int</span>
<span class="c1">// loop over the diners
</span><span class="c1"></span> <span class="k">for</span> <span class="nx">i</span> <span class="o">:=</span> <span class="k">range</span> <span class="nx">diners</span> <span class="p">{</span>
<span class="c1">// assign the forks
</span><span class="c1"></span> <span class="k">if</span> <span class="nx">diners</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span> <span class="p">&gt;</span> <span class="nx">first</span> <span class="p">{</span>
<span class="nx">third</span> <span class="p">=</span> <span class="nx">second</span>
<span class="nx">second</span> <span class="p">=</span> <span class="nx">first</span>
<span class="nx">first</span> <span class="p">=</span> <span class="nx">diners</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="nx">diners</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span> <span class="p">&gt;</span> <span class="nx">second</span> <span class="p">{</span>
<span class="nx">third</span> <span class="p">=</span> <span class="nx">second</span>
<span class="nx">second</span> <span class="p">=</span> <span class="nx">diners</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="nx">diners</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span> <span class="p">&gt;</span> <span class="nx">third</span> <span class="p">{</span>
<span class="nx">third</span> <span class="p">=</span> <span class="nx">diners</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// list the final result of who gets a fork
</span><span class="c1"></span> <span class="nx">withForks</span> <span class="p">=</span> <span class="nb">append</span><span class="p">(</span><span class="nx">withForks</span><span class="p">,</span> <span class="nx">first</span><span class="p">,</span> <span class="nx">second</span><span class="p">,</span> <span class="nx">third</span><span class="p">)</span>
<span class="k">return</span> <span class="nx">withForks</span>
<span class="p">}</span>
</code></pre></div><p>Here&rsquo;s how the new program works:</p>
<p>Initially, diner <code>2</code> (the first in the list) is assigned the <code>first</code> fork. The other forks remain unassigned.</p>
<p>Then, diner <code>88</code> is assigned the first fork instead. Diner <code>2</code> gets the <code>second</code> one.</p>
<p>Diner <code>87</code> isn&rsquo;t greater than <code>first</code>, which is currently <code>88</code>, but it is greater than <code>2</code>, who has the <code>second</code> fork. So, the <code>second</code> fork goes to <code>87</code>. Diner <code>2</code> gets the <code>third</code> fork.</p>
<p>Continuing in this violent and rapid fork exchange, diner <code>16</code> is then assigned the <code>third</code> fork instead of <code>2</code>, and so on.</p>
<p>We can add a print statement in the loop to see how the fork assignments play out:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="m">0</span> <span class="m">0</span> <span class="m">0</span>
<span class="m">2</span> <span class="m">0</span> <span class="m">0</span>
<span class="m">88</span> <span class="m">2</span> <span class="m">0</span>
<span class="m">88</span> <span class="m">87</span> <span class="m">2</span>
<span class="m">88</span> <span class="m">87</span> <span class="m">16</span>
<span class="m">88</span> <span class="m">87</span> <span class="m">42</span>
<span class="m">88</span> <span class="m">87</span> <span class="m">42</span>
<span class="m">88</span> <span class="m">87</span> <span class="m">42</span>
<span class="m">88</span> <span class="m">87</span> <span class="m">42</span>
<span class="m">88</span> <span class="m">87</span> <span class="m">43</span>
<span class="o">[</span><span class="m">88</span> <span class="m">87</span> 56<span class="o">]</span>
</code></pre></div><p>This program is much faster, and the whole epic struggle for fork domination is over in 47 nanoseconds.</p>
<p>As you can see, with a little change in perspective and some refactoring, we&rsquo;ve made this simple bit of code faster and more efficient.</p>
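<p>If the office ever buys more forks, the same single-pass idea can be generalized. The sketch below is one possible approach, not the only one: <code>topDiners</code> and its <code>forks</code> parameter are names invented for this example, and it assumes every diner&rsquo;s value is greater than zero, as stated earlier. It keeps a small list of current leaders in descending order and leaves the original slice untouched.</p>

```go
package main

// topDiners returns the `forks` largest values from diners, largest
// first, without sorting or modifying diners.
func topDiners(diners []int, forks int) []int {
	top := make([]int, 0, forks)
	for _, d := range diners {
		// walk backwards to find where d belongs among the leaders
		pos := len(top)
		for pos > 0 && d > top[pos-1] {
			pos--
		}
		if pos == forks {
			// the list is full and d is not bigger than any leader
			continue
		}
		if len(top) < forks {
			// make room for one more leader
			top = append(top, 0)
		}
		// shift smaller leaders right, dropping the last one if full
		copy(top[pos+1:], top[pos:])
		top[pos] = d
	}
	return top
}
```

<p>Each diner is compared against at most <code>forks</code> leaders, so the work is proportional to the number of diners multiplied by the number of forks. With three forks and the diners above, it returns <code>[88 87 56]</code>.</p>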
<p>Well, it looks like our fifteen-minute coffee break is up! I hope I&rsquo;ve given you a comprehensive introduction to calculating time complexity. Time to get back to work, hopefully applying your new knowledge to write more effective code! Or maybe just sound smart at your next office party. :)</p>
<h2 id="sources">Sources</h2>
<p>&ldquo;If I have seen further it is by standing on the shoulders of Giants.&rdquo; &ndash;Isaac Newton, 1675</p>
<ol>
<li>Antti Laaksonen. <em><a href="https://cses.fi/book.pdf">Competitive Programmer&rsquo;s Handbook (pdf)</a>,</em> 2017</li>
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/Big_O_notation">Big O notation</a></li>
<li>StackOverflow: <a href="https://stackoverflow.com/a/487278">What is a plain English explanation of “Big O” notation?</a></li>
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/Polynomial">Polynomial</a></li>
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/NP-completeness">NP-completeness</a></li>
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/NP-hardness">NP-hardness</a></li>
<li><a href="https://www.desmos.com/">Desmos graph calculator</a></li>
</ol>
Knapsack problem algorithms for my real-life carry-on knapsackhttps://victoria.dev/blog/knapsack-problem-algorithms-for-my-real-life-carry-on-knapsack/
Wed, 09 May 2018 21:00:35 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/knapsack-problem-algorithms-for-my-real-life-carry-on-knapsack/Using a greedy algorithm and dynamic programming to pack my full-time nomad travel bag.
<p>I&rsquo;m a nomad and live out of one carry-on bag. This means that the total weight of all my worldly possessions must fall under airline cabin baggage weight limits - usually 10kg. On some smaller airlines, however, this weight limit drops to 7kg. Occasionally, I have to decide not to bring something with me to adjust to the smaller weight limit.</p>
<p>As a practical exercise, deciding what to leave behind (or get rid of altogether) entails laying out all my things and choosing which ones to keep. That decision is based on the item&rsquo;s usefulness to me (its worth) and its weight.</p>
<figure>
<img src="knapsack-stuff.jpeg"
alt="All my stuff, laid out flat"/> <figcaption>
<p>This is all my stuff, and my Minaal Carry-on bag.</p>
</figcaption>
</figure>
<p>Being a programmer, I&rsquo;m aware that decisions like this could be made more efficiently by a computer. It&rsquo;s done so frequently and so ubiquitously, in fact, that many will recognize this scenario as the classic <em>packing problem</em> or <em>knapsack problem.</em> How do I go about telling a computer to put as many important items in my bag as possible while coming in at or under a weight limit of 7kg? With algorithms! Yay!</p>
<p>I&rsquo;ll discuss two common approaches to solving the knapsack problem: one called a <em>greedy algorithm,</em> and another called <em>dynamic programming</em> (a little harder, but better, faster, stronger&hellip;).</p>
<p>Let&rsquo;s get to it.</p>
<h2 id="the-set-up">The set up</h2>
<p>I prepared my data in the form of a CSV file with three columns: the item&rsquo;s name (a string), a representation of its worth (an integer), and its weight in grams (an integer). There are 40 items in total. I represented worth by ranking each item from 40 to 1, with 40 being the most important and 1 equating with something like &ldquo;why do I even have this again?&rdquo; (If you&rsquo;ve never listed out all your possessions and ranked them by order of how useful they are to you, I highly recommend you try it. It can be a very revealing exercise.)</p>
<p><strong>Total weight of all items and bag:</strong> 9003g</p>
<p><strong>Bag weight:</strong> 1415g</p>
<p><strong>Airline limit:</strong> 7000g</p>
<p><strong>Maximum weight of items I can pack:</strong> 5585g</p>
<p><strong>Total possible worth of items:</strong> 820</p>
<p><strong>The challenge:</strong> Pack as many items as the limit allows while maximizing the total worth.</p>
<h2 id="data-structures">Data structures</h2>
<h3 id="reading-in-a-file">Reading in a file</h3>
<p>Before we can begin thinking about how to solve the knapsack problem, we have to solve the problem of reading in and storing our data. Thankfully, the Go standard library&rsquo;s <code>io/ioutil</code> package makes the first part straightforward.</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kn">package</span> <span class="nx">main</span>
<span class="kn">import</span> <span class="p">(</span>
<span class="s">&#34;fmt&#34;</span>
<span class="s">&#34;io/ioutil&#34;</span>
<span class="p">)</span>
<span class="kd">func</span> <span class="nf">check</span><span class="p">(</span><span class="nx">e</span> <span class="kt">error</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="nx">e</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span>
<span class="nb">panic</span><span class="p">(</span><span class="nx">e</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kd">func</span> <span class="nf">readItems</span><span class="p">(</span><span class="nx">path</span> <span class="kt">string</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">dat</span><span class="p">,</span> <span class="nx">err</span> <span class="o">:=</span> <span class="nx">ioutil</span><span class="p">.</span><span class="nf">ReadFile</span><span class="p">(</span><span class="nx">path</span><span class="p">)</span>
<span class="nf">check</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="nx">fmt</span><span class="p">.</span><span class="nf">Print</span><span class="p">(</span><span class="nb">string</span><span class="p">(</span><span class="nx">dat</span><span class="p">))</span>
<span class="p">}</span>
</code></pre></div><p>The <code>ReadFile()</code> function takes a file path and returns the file&rsquo;s contents as a byte slice, along with an error (<code>nil</code> if the call is successful), so we convert the bytes with <code>string()</code> before printing them. We&rsquo;ve also created a <code>check()</code> function to handle any errors that might be returned. In a real-world application we probably would want to do something more sophisticated than <code>panic</code>, but that&rsquo;s not important right now.</p>
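<p>For the curious, here&rsquo;s a minimal sketch of what &ldquo;more sophisticated than <code>panic</code>&rdquo; might look like: return the error to the caller with a bit of context attached. The helper name <code>readFileText()</code> is my own invention for illustration, and the sketch assumes Go 1.16 or newer, where <code>os.ReadFile</code> does the same job as <code>ioutil.ReadFile</code>.</p>

```go
package main

import (
	"fmt"
	"os"
)

// readFileText is a hypothetical helper (not part of the original program).
// It returns the file's contents as a string, or an error that names the path.
func readFileText(path string) (string, error) {
	// Since Go 1.16, ioutil.ReadFile is a thin wrapper around os.ReadFile.
	dat, err := os.ReadFile(path)
	if err != nil {
		return "", fmt.Errorf("reading %q: %w", path, err)
	}
	return string(dat), nil
}

func main() {
	text, err := readFileText("objects.csv")
	if err != nil {
		// The caller decides what to do: here we report the problem and stop.
		fmt.Println("could not read items:", err)
		return
	}
	fmt.Print(text)
}
```

<p>Wrapping with <code>%w</code> keeps the original error available to <code>errors.Is</code> and <code>errors.As</code>, should a caller want to inspect it.</p>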
<h3 id="creating-a-struct">Creating a struct</h3>
<p>Now that we&rsquo;ve got our data, we should probably do something with it. Since we&rsquo;re working with real-life items and a real-life bag, let&rsquo;s create some types to represent them and make it easier to conceptualize our program. A <code>struct</code> in Go is a typed collection of fields. Here are our two types:</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kd">type</span> <span class="nx">item</span> <span class="kd">struct</span> <span class="p">{</span>
<span class="nx">name</span> <span class="kt">string</span>
<span class="nx">worth</span><span class="p">,</span> <span class="nx">weight</span> <span class="kt">int</span>
<span class="p">}</span>
<span class="kd">type</span> <span class="nx">bag</span> <span class="kd">struct</span> <span class="p">{</span>
<span class="nx">bagWeight</span><span class="p">,</span> <span class="nx">currItemsWeight</span><span class="p">,</span> <span class="nx">maxItemsWeight</span><span class="p">,</span> <span class="nx">totalWeight</span><span class="p">,</span> <span class="nx">totalWorth</span> <span class="kt">int</span>
<span class="nx">items</span> <span class="p">[]</span><span class="nx">item</span>
<span class="p">}</span>
</code></pre></div><p>It is helpful to use field names that are very descriptive. You can see that the structs are set up just as we&rsquo;ve described the things they represent. An <code>item</code> has a <code>name</code> (string), and a <code>worth</code> and <code>weight</code> (integers). A <code>bag</code> has several fields of type <code>int</code> representing its attributes, and also has the ability to hold <code>items</code>, represented in the struct as a slice of <code>item</code> type thingamabobbers.</p>
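<p>To see the types in action, here&rsquo;s a small sketch that builds a bag from the numbers in the set up. The <code>newBag()</code> constructor is a hypothetical helper of mine, not something the rest of the program needs; it just shows where the 5585g item budget comes from.</p>

```go
package main

import "fmt"

type item struct {
	name          string
	worth, weight int
}

type bag struct {
	bagWeight, currItemsWeight, maxItemsWeight, totalWeight, totalWorth int
	items                                                               []item
}

// newBag is a hypothetical constructor: the weight budget for items is the
// airline limit minus the weight of the empty bag. Every other field starts
// at its zero value (0 for ints, nil for the items slice).
func newBag(bagWeight, airlineLimit int) bag {
	return bag{bagWeight: bagWeight, maxItemsWeight: airlineLimit - bagWeight}
}

func main() {
	minaal := newBag(1415, 7000)
	fmt.Println(minaal.maxItemsWeight)                     // 5585
	fmt.Println(len(minaal.items), minaal.currItemsWeight) // 0 0
}
```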
<h3 id="parsing-and-storing-our-data">Parsing and storing our data</h3>
<p>Several comprehensive Go packages exist that we could use to parse our CSV data&hellip; but where&rsquo;s the fun in that? Let&rsquo;s go basic with some string splitting and a for loop. Here&rsquo;s our updated <code>readItems()</code> function:</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kd">func</span> <span class="nf">readItems</span><span class="p">(</span><span class="nx">path</span> <span class="kt">string</span><span class="p">)</span> <span class="p">[]</span><span class="nx">item</span> <span class="p">{</span>
<span class="nx">dat</span><span class="p">,</span> <span class="nx">err</span> <span class="o">:=</span> <span class="nx">ioutil</span><span class="p">.</span><span class="nf">ReadFile</span><span class="p">(</span><span class="nx">path</span><span class="p">)</span>
<span class="nf">check</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="nx">lines</span> <span class="o">:=</span> <span class="nx">strings</span><span class="p">.</span><span class="nf">Split</span><span class="p">(</span><span class="nb">string</span><span class="p">(</span><span class="nx">dat</span><span class="p">),</span> <span class="s">&#34;\n&#34;</span><span class="p">)</span>
<span class="nx">itemList</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">([]</span><span class="nx">item</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">for</span> <span class="nx">i</span><span class="p">,</span> <span class="nx">v</span> <span class="o">:=</span> <span class="k">range</span> <span class="nx">lines</span> <span class="p">{</span>
<span class="k">if</span> <span class="nx">i</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">{</span>
<span class="k">continue</span>
<span class="p">}</span>
<span class="nx">s</span> <span class="o">:=</span> <span class="nx">strings</span><span class="p">.</span><span class="nf">Split</span><span class="p">(</span><span class="nx">v</span><span class="p">,</span> <span class="s">&#34;,&#34;</span><span class="p">)</span>
<span class="nx">newItemWorth</span><span class="p">,</span> <span class="nx">_</span> <span class="o">:=</span> <span class="nx">strconv</span><span class="p">.</span><span class="nf">Atoi</span><span class="p">(</span><span class="nx">s</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="nx">newItemWeight</span><span class="p">,</span> <span class="nx">_</span> <span class="o">:=</span> <span class="nx">strconv</span><span class="p">.</span><span class="nf">Atoi</span><span class="p">(</span><span class="nx">s</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
<span class="nx">newItem</span> <span class="o">:=</span> <span class="nx">item</span><span class="p">{</span><span class="nx">name</span><span class="p">:</span> <span class="nx">s</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="nx">worth</span><span class="p">:</span> <span class="nx">newItemWorth</span><span class="p">,</span> <span class="nx">weight</span><span class="p">:</span> <span class="nx">newItemWeight</span><span class="p">}</span>
<span class="nx">itemList</span> <span class="p">=</span> <span class="nb">append</span><span class="p">(</span><span class="nx">itemList</span><span class="p">,</span> <span class="nx">newItem</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">itemList</span>
<span class="p">}</span>
</code></pre></div><p>Using <code>strings.Split</code>, we split our <code>dat</code> on newlines. We then create an empty <code>itemList</code> to hold our items.</p>
<p>In our for loop, we skip the first line of our CSV file (the headers) then iterate over each line. We use <code>strconv.Atoi</code> (read &ldquo;A to i&rdquo;) to convert the values for each item&rsquo;s worth and weight into integers. We then create a <code>newItem</code> with these field values and append it to the <code>itemList</code>. Finally, we return <code>itemList</code>.</p>
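<p>Going basic has a couple of sharp edges worth knowing about: a trailing newline at the end of the file produces an empty last line (and an index out of range panic at <code>s[1]</code>), and discarding the error from <code>strconv.Atoi</code> turns a malformed number into a silent <code>0</code>. If that worries you, here&rsquo;s a hedged sketch of the same idea using the standard library&rsquo;s <code>encoding/csv</code> package. The <code>parseItems()</code> name and the header row in the example are assumptions for illustration.</p>

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

type item struct {
	name          string
	worth, weight int
}

// parseItems is a hypothetical alternative to readItems that takes the CSV
// text directly and reports problems instead of ignoring them.
func parseItems(data string) ([]item, error) {
	// The csv reader ignores blank lines, so a trailing newline is harmless.
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, nil
	}
	items := make([]item, 0, len(records)-1)
	for _, rec := range records[1:] { // skip the header row
		if len(rec) < 3 {
			return nil, fmt.Errorf("expected 3 fields, got %d", len(rec))
		}
		worth, err := strconv.Atoi(strings.TrimSpace(rec[1]))
		if err != nil {
			return nil, fmt.Errorf("bad worth for %q: %w", rec[0], err)
		}
		weight, err := strconv.Atoi(strings.TrimSpace(rec[2]))
		if err != nil {
			return nil, fmt.Errorf("bad weight for %q: %w", rec[0], err)
		}
		items = append(items, item{name: rec[0], worth: worth, weight: weight})
	}
	return items, nil
}

func main() {
	// The header names here are made up; the rows borrow values from my list.
	data := "name,worth,weight\n5 pairs black socks,26,95\nLightweight Merino Wool Buff,20,50\n"
	items, err := parseItems(data)
	if err != nil {
		fmt.Println("could not parse items:", err)
		return
	}
	fmt.Println(len(items), items[0].name, items[1].weight)
}
```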
<p>Here&rsquo;s what our set up looks like so far:</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kn">package</span> <span class="nx">main</span>
<span class="kn">import</span> <span class="p">(</span>
<span class="s">&#34;io/ioutil&#34;</span>
<span class="s">&#34;strconv&#34;</span>
<span class="s">&#34;strings&#34;</span>
<span class="p">)</span>
<span class="kd">type</span> <span class="nx">item</span> <span class="kd">struct</span> <span class="p">{</span>
<span class="nx">name</span> <span class="kt">string</span>
<span class="nx">worth</span><span class="p">,</span> <span class="nx">weight</span> <span class="kt">int</span>
<span class="p">}</span>
<span class="kd">type</span> <span class="nx">bag</span> <span class="kd">struct</span> <span class="p">{</span>
<span class="nx">bagWeight</span><span class="p">,</span> <span class="nx">currItemsWeight</span><span class="p">,</span> <span class="nx">maxItemsWeight</span><span class="p">,</span> <span class="nx">totalWeight</span><span class="p">,</span> <span class="nx">totalWorth</span> <span class="kt">int</span>
<span class="nx">items</span> <span class="p">[]</span><span class="nx">item</span>
<span class="p">}</span>
<span class="kd">func</span> <span class="nf">check</span><span class="p">(</span><span class="nx">e</span> <span class="kt">error</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="nx">e</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span>
<span class="nb">panic</span><span class="p">(</span><span class="nx">e</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kd">func</span> <span class="nf">readItems</span><span class="p">(</span><span class="nx">path</span> <span class="kt">string</span><span class="p">)</span> <span class="p">[]</span><span class="nx">item</span> <span class="p">{</span>
<span class="nx">dat</span><span class="p">,</span> <span class="nx">err</span> <span class="o">:=</span> <span class="nx">ioutil</span><span class="p">.</span><span class="nf">ReadFile</span><span class="p">(</span><span class="nx">path</span><span class="p">)</span>
<span class="nf">check</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="nx">lines</span> <span class="o">:=</span> <span class="nx">strings</span><span class="p">.</span><span class="nf">Split</span><span class="p">(</span><span class="nb">string</span><span class="p">(</span><span class="nx">dat</span><span class="p">),</span> <span class="s">&#34;\n&#34;</span><span class="p">)</span>
<span class="nx">itemList</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">([]</span><span class="nx">item</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">for</span> <span class="nx">i</span><span class="p">,</span> <span class="nx">v</span> <span class="o">:=</span> <span class="k">range</span> <span class="nx">lines</span> <span class="p">{</span>
<span class="k">if</span> <span class="nx">i</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">{</span>
<span class="k">continue</span> <span class="c1">// skip the headers on the first line
</span><span class="c1"></span> <span class="p">}</span>
<span class="nx">s</span> <span class="o">:=</span> <span class="nx">strings</span><span class="p">.</span><span class="nf">Split</span><span class="p">(</span><span class="nx">v</span><span class="p">,</span> <span class="s">&#34;,&#34;</span><span class="p">)</span>
<span class="nx">newItemWorth</span><span class="p">,</span> <span class="nx">_</span> <span class="o">:=</span> <span class="nx">strconv</span><span class="p">.</span><span class="nf">Atoi</span><span class="p">(</span><span class="nx">s</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="nx">newItemWeight</span><span class="p">,</span> <span class="nx">_</span> <span class="o">:=</span> <span class="nx">strconv</span><span class="p">.</span><span class="nf">Atoi</span><span class="p">(</span><span class="nx">s</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
<span class="nx">newItem</span> <span class="o">:=</span> <span class="nx">item</span><span class="p">{</span><span class="nx">name</span><span class="p">:</span> <span class="nx">s</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="nx">worth</span><span class="p">:</span> <span class="nx">newItemWorth</span><span class="p">,</span> <span class="nx">weight</span><span class="p">:</span> <span class="nx">newItemWeight</span><span class="p">}</span>
<span class="nx">itemList</span> <span class="p">=</span> <span class="nb">append</span><span class="p">(</span><span class="nx">itemList</span><span class="p">,</span> <span class="nx">newItem</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">itemList</span>
<span class="p">}</span>
</code></pre></div><p>Now that we&rsquo;ve got our data structures set up, let&rsquo;s get packing (🥁) on the first approach.</p>
<h2 id="greedy-algorithm">Greedy algorithm</h2>
<p>A greedy algorithm is the most straightforward approach to solving the knapsack problem, in that it is a one-pass algorithm that constructs a single final solution. At each stage of the problem, the greedy algorithm picks the option that is locally optimal, meaning it looks like the most suitable option right now. It does not revise its previous choices as it progresses through our data set.</p>
<h3 id="building-our-greedy-algorithm">Building our greedy algorithm</h3>
<p>The steps of the algorithm we&rsquo;ll use to solve our knapsack problem are:</p>
<ol>
<li>Sort items by worth, in descending order.</li>
<li>Start with the highest worth item. Put items into the bag until the next item on the list cannot fit.</li>
<li>Try to fill any remaining capacity with the next item on the list that can fit.</li>
</ol>
<p>If you read my <a href="https://victoria.dev/blog/how-to-code-a-satellite-algorithm-and-cook-paella-from-scratch/">article about solving problems and making paella</a>, you&rsquo;ll know that I always start by figuring out what the next most important question is. In this case, there are three main operations we need to figure out how to do:</p>
<ul>
<li>Sort items by worth.</li>
<li>Put an item in the bag.</li>
<li>Check to see if the bag is full.</li>
</ul>
<p>The first one is just a docs lookup away. Here&rsquo;s how we sort a slice in Go:</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="nx">sort</span><span class="p">.</span><span class="nf">Slice</span><span class="p">(</span><span class="nx">is</span><span class="p">,</span> <span class="kd">func</span><span class="p">(</span><span class="nx">i</span><span class="p">,</span> <span class="nx">j</span> <span class="kt">int</span><span class="p">)</span> <span class="kt">bool</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">is</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">worth</span> <span class="p">&gt;</span> <span class="nx">is</span><span class="p">[</span><span class="nx">j</span><span class="p">].</span><span class="nx">worth</span>
<span class="p">})</span>
</code></pre></div><p>The <code>sort.Slice()</code> function orders our items according to the less function we provide. In this case, it will order the highest worth items before the lowest worth items.</p>
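<p>If you&rsquo;d like to watch the sort do its thing on its own, here&rsquo;s a small self-contained sketch. The <code>sortByWorth()</code> wrapper is a name I made up for the example; the three items come from my list.</p>

```go
package main

import (
	"fmt"
	"sort"
)

type item struct {
	name          string
	worth, weight int
}

// sortByWorth orders items from highest to lowest worth, in place.
// The less function answers: "should element i come before element j?"
func sortByWorth(is []item) {
	sort.Slice(is, func(i, j int) bool {
		return is[i].worth > is[j].worth
	})
}

func main() {
	is := []item{
		{name: "5 pairs black socks", worth: 26, weight: 95},
		{name: "Lightweight Merino Wool Buff", worth: 20, weight: 50},
		{name: "1 fancy tank top", worth: 24, weight: 71},
	}
	sortByWorth(is)
	for _, it := range is {
		fmt.Println(it.worth, it.name)
	}
}
```

<p>One thing to keep in mind: <code>sort.Slice</code> doesn&rsquo;t promise to keep equal elements in their original order. My worth values are unique ranks, so that doesn&rsquo;t matter here, but <code>sort.SliceStable</code> exists for data with ties.</p>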
<p>Given that we don&rsquo;t want to put an item in the bag if it doesn&rsquo;t fit, we&rsquo;ll complete the last two tasks in reverse. First, we&rsquo;ll check to see if the item fits. If so, it goes in the bag.</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kd">func</span> <span class="p">(</span><span class="nx">b</span> <span class="o">*</span><span class="nx">bag</span><span class="p">)</span> <span class="nf">addItem</span><span class="p">(</span><span class="nx">i</span> <span class="nx">item</span><span class="p">)</span> <span class="kt">error</span> <span class="p">{</span>
<span class="k">if</span> <span class="nx">b</span><span class="p">.</span><span class="nx">currItemsWeight</span><span class="o">+</span><span class="nx">i</span><span class="p">.</span><span class="nx">weight</span> <span class="o">&lt;=</span> <span class="nx">b</span><span class="p">.</span><span class="nx">maxItemsWeight</span> <span class="p">{</span>
<span class="nx">b</span><span class="p">.</span><span class="nx">currItemsWeight</span> <span class="o">+=</span> <span class="nx">i</span><span class="p">.</span><span class="nx">weight</span>
<span class="nx">b</span><span class="p">.</span><span class="nx">items</span> <span class="p">=</span> <span class="nb">append</span><span class="p">(</span><span class="nx">b</span><span class="p">.</span><span class="nx">items</span><span class="p">,</span> <span class="nx">i</span><span class="p">)</span>
<span class="k">return</span> <span class="kc">nil</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">errors</span><span class="p">.</span><span class="nf">New</span><span class="p">(</span><span class="s">&#34;could not fit item&#34;</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div><p>Notice the <code>*</code> in our first line there. That indicates that <code>bag</code> is a pointer receiver (as opposed to a value receiver). It&rsquo;s a concept that can be slightly confusing if you&rsquo;re new to Go. Here are <a href="https://github.com/golang/go/wiki/CodeReviewComments#receiver-type">some things to consider</a> that might help you decide when to use a value receiver and when to use a pointer receiver. For the purposes of our <code>addItem()</code> function, this case applies:</p>
<blockquote>
<p>If the method needs to mutate the receiver, the receiver must be a pointer.</p>
</blockquote>
<p>Our use of a pointer receiver tells our method we want to operate on <em>this specific bag in particular</em>, not a copy of it. It&rsquo;s important because with a value receiver, <code>addItem()</code> would update a throwaway copy: the original bag would stay empty, and every item would always appear to fit! A little detail like this can make the difference between code that works and code that keeps you up until 4am chugging Red Bull and muttering to yourself. (Go to bed on time even if your code doesn&rsquo;t work - you&rsquo;ll thank me later.)</p>
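<p>Here&rsquo;s a tiny sketch of the difference, using a stripped-down <code>bag</code> with a single field. The two method names are hypothetical and exist only to put the receiver types side by side.</p>

```go
package main

import "fmt"

type bag struct {
	currItemsWeight int
}

// addByValue has a value receiver: b is a copy of the caller's bag,
// so the change below is lost as soon as the method returns.
func (b bag) addByValue(weight int) {
	b.currItemsWeight += weight
}

// addByPointer has a pointer receiver: b points at the caller's bag,
// so the change sticks.
func (b *bag) addByPointer(weight int) {
	b.currItemsWeight += weight
}

func main() {
	var minaal bag

	minaal.addByValue(500)
	fmt.Println(minaal.currItemsWeight) // 0

	minaal.addByPointer(500)
	fmt.Println(minaal.currItemsWeight) // 500
}
```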
<p>Now that we&rsquo;ve got our components, let&rsquo;s put together our greedy algorithm:</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kd">func</span> <span class="nf">greedy</span><span class="p">(</span><span class="nx">is</span> <span class="p">[]</span><span class="nx">item</span><span class="p">,</span> <span class="nx">b</span> <span class="o">*</span><span class="nx">bag</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">sort</span><span class="p">.</span><span class="nf">Slice</span><span class="p">(</span><span class="nx">is</span><span class="p">,</span> <span class="kd">func</span><span class="p">(</span><span class="nx">i</span><span class="p">,</span> <span class="nx">j</span> <span class="kt">int</span><span class="p">)</span> <span class="kt">bool</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">is</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">worth</span> <span class="p">&gt;</span> <span class="nx">is</span><span class="p">[</span><span class="nx">j</span><span class="p">].</span><span class="nx">worth</span>
<span class="p">})</span>
<span class="k">for</span> <span class="nx">i</span> <span class="o">:=</span> <span class="k">range</span> <span class="nx">is</span> <span class="p">{</span>
<span class="nx">b</span><span class="p">.</span><span class="nf">addItem</span><span class="p">(</span><span class="nx">is</span><span class="p">[</span><span class="nx">i</span><span class="p">])</span>
<span class="p">}</span>
<span class="nx">b</span><span class="p">.</span><span class="nx">totalWeight</span> <span class="p">=</span> <span class="nx">b</span><span class="p">.</span><span class="nx">bagWeight</span> <span class="o">+</span> <span class="nx">b</span><span class="p">.</span><span class="nx">currItemsWeight</span>
<span class="k">for</span> <span class="nx">_</span><span class="p">,</span> <span class="nx">v</span> <span class="o">:=</span> <span class="k">range</span> <span class="nx">b</span><span class="p">.</span><span class="nx">items</span> <span class="p">{</span>
<span class="nx">b</span><span class="p">.</span><span class="nx">totalWorth</span> <span class="o">+=</span> <span class="nx">v</span><span class="p">.</span><span class="nx">worth</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div><p>Then in our <code>main()</code> function, we&rsquo;ll create our bag, read in our data, and call our greedy algorithm. We hand <code>greedy()</code> a pointer to the bag so that the items and totals it records are still there when the function returns, rather than vanishing with a copy. Here&rsquo;s what it looks like, all set up and ready to go:</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kd">func</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="nx">minaal</span> <span class="o">:=</span> <span class="nx">bag</span><span class="p">{</span><span class="nx">bagWeight</span><span class="p">:</span> <span class="mi">1415</span><span class="p">,</span> <span class="nx">currItemsWeight</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="nx">maxItemsWeight</span><span class="p">:</span> <span class="mi">5585</span><span class="p">}</span>
<span class="nx">itemList</span> <span class="o">:=</span> <span class="nf">readItems</span><span class="p">(</span><span class="s">&#34;objects.csv&#34;</span><span class="p">)</span>
<span class="nf">greedy</span><span class="p">(</span><span class="nx">itemList</span><span class="p">,</span> <span class="o">&amp;</span><span class="nx">minaal</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div><h3 id="greedy-algorithm-results">Greedy algorithm results</h3>
<p>So how does this algorithm do when it comes to efficiently packing our bag to maximize its total worth? Here&rsquo;s the result:</p>
<p><strong>Total weight of bag and items:</strong> 6987g</p>
<p><strong>Total worth of packed items:</strong> 716</p>
<p>Here are the items our greedy algorithm chose, sorted by worth:</p>
<table>
<thead>
<tr>
<th>Item</th>
<th>Worth</th>
<th>Weight</th>
</tr>
</thead>
<tbody>
<tr>
<td>Lenovo X1 Carbon (5th Gen)</td>
<td>40</td>
<td>112</td>
</tr>
<tr>
<td>10 pairs thongs</td>
<td>39</td>
<td>80</td>
</tr>
<tr>
<td>5 Underarmour Strappy</td>
<td>38</td>
<td>305</td>
</tr>
<tr>
<td>1 pair Uniqlo leggings</td>
<td>37</td>
<td>185</td>
</tr>
<tr>
<td>2 Lululemon Cool Racerback</td>
<td>36</td>
<td>174</td>
</tr>
<tr>
<td>Chargers and cables in Mini Bomber Travel Kit</td>
<td>35</td>
<td>665</td>
</tr>
<tr>
<td>The Roost Stand</td>
<td>34</td>
<td>170</td>
</tr>
<tr>
<td>ThinkPad Compact Bluetooth Keyboard with trackpoint</td>
<td>33</td>
<td>460</td>
</tr>
<tr>
<td>Seagate Backup Plus Slim</td>
<td>32</td>
<td>159</td>
</tr>
<tr>
<td>1 pair black denim shorts</td>
<td>31</td>
<td>197</td>
</tr>
<tr>
<td>2 pairs Nike Pro shorts</td>
<td>30</td>
<td>112</td>
</tr>
<tr>
<td>2 pairs Lululemon shorts</td>
<td>29</td>
<td>184</td>
</tr>
<tr>
<td>Isabella T-Strap Croc sandals</td>
<td>28</td>
<td>200</td>
</tr>
<tr>
<td>2 Underarmour HeatGear CoolSwitch tank tops</td>
<td>27</td>
<td>138</td>
</tr>
<tr>
<td>5 pairs black socks</td>
<td>26</td>
<td>95</td>
</tr>
<tr>
<td>2 pairs Injinji Women&rsquo;s Run Lightweight No-Show Toe Socks</td>
<td>25</td>
<td>54</td>
</tr>
<tr>
<td>1 fancy tank top</td>
<td>24</td>
<td>71</td>
</tr>
<tr>
<td>1 light and stretchy long-sleeve shirt (Gap Fit)</td>
<td>23</td>
<td>147</td>
</tr>
<tr>
<td>Uniqlo Ultralight Down insulating jacket</td>
<td>22</td>
<td>235</td>
</tr>
<tr>
<td>Patagonia Torrentshell</td>
<td>21</td>
<td>301</td>
</tr>
<tr>
<td>Lightweight Merino Wool Buff</td>
<td>20</td>
<td>50</td>
</tr>
<tr>
<td>1 LBD (H&amp;M)</td>
<td>19</td>
<td>174</td>
</tr>
<tr>
<td>Field Notes Pitch Black Memo Book Dot-Graph</td>
<td>18</td>
<td>68</td>
</tr>
<tr>
<td>Innergie PocketCell USB-C 6000mAh power bank</td>
<td>17</td>
<td>14</td>
</tr>
<tr>
<td>JBL Reflect Mini Bluetooth Sport Headphones</td>
<td>13</td>
<td>14</td>
</tr>
<tr>
<td>Oakley Latch Sunglasses</td>
<td>11</td>
<td>30</td>
</tr>
<tr>
<td>Petzl E+LITE Emergency Headlamp</td>
<td>8</td>
<td>27</td>
</tr>
</tbody>
</table>
<p>It&rsquo;s clear that the greedy algorithm is a straightforward way to quickly find a feasible solution. For small data sets, it will probably be close to the optimal solution, although a greedy approach comes with no guarantee of that. The algorithm packed a total item worth of 716 (104 points less than the maximum possible value), while filling the bag with just 13g left over.</p>
<p>As we learned earlier, the greedy algorithm doesn&rsquo;t improve upon the solution it returns. It simply adds the next highest worth item it can to the bag.</p>
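<p>To see how that can go wrong, here&rsquo;s a sketch with three made-up items (not from my packing list) where grabbing the highest worth item first blocks a better combination. Both function names are hypothetical, and the brute-force check is only practical for a handful of items since it tries every subset.</p>

```go
package main

import (
	"fmt"
	"sort"
)

type item struct {
	name          string
	worth, weight int
}

// greedyWorth packs items in descending order of worth, skipping any item
// that does not fit, and returns the total worth packed.
func greedyWorth(is []item, capacity int) int {
	sorted := append([]item(nil), is...) // copy so the caller's order is untouched
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].worth > sorted[j].worth
	})
	total, used := 0, 0
	for _, it := range sorted {
		if used+it.weight <= capacity {
			used += it.weight
			total += it.worth
		}
	}
	return total
}

// bestWorth tries every subset of items and returns the highest total worth
// that fits within capacity. Each bit of mask marks one item as in or out.
func bestWorth(is []item, capacity int) int {
	best := 0
	for mask := 0; mask < 1<<len(is); mask++ {
		worth, weight := 0, 0
		for i, it := range is {
			if mask&(1<<i) != 0 {
				worth += it.worth
				weight += it.weight
			}
		}
		if weight <= capacity && worth > best {
			best = worth
		}
	}
	return best
}

func main() {
	// Made-up items: name, worth, weight.
	is := []item{{"heavy", 10, 10}, {"medium", 9, 5}, {"light", 8, 5}}
	fmt.Println(greedyWorth(is, 10)) // 10: "heavy" fills the bag on its own
	fmt.Println(bestWorth(is, 10))   // 17: "medium" and "light" together
}
```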
<p>Let&rsquo;s look at another method for solving the knapsack problem that will give us the optimal solution - the highest possible total worth under the weight limit.</p>
<h2 id="dynamic-programming">Dynamic programming</h2>
<p>The name &ldquo;dynamic programming&rdquo; can be a bit misleading. It&rsquo;s not a style of programming, as the name might cause you to infer, but simply another approach to solving a problem: break it into smaller, overlapping subproblems and reuse their answers.</p>
<p>Dynamic programming differs from the straightforward greedy algorithm in a few key ways. Firstly, a dynamic programming bag packing solution accounts for the entire solution space: every combination of items that could be used to pack our bag is covered by some subproblem, even though the algorithm never lists those combinations one by one. Where a greedy algorithm chooses the best <em>local</em> option at each step, a dynamic programming algorithm is able to find the optimal <em>global</em> solution.</p>
<p>Secondly, dynamic programming uses memoization to store the results of previously computed operations and returns the cached result when the operation occurs again. This allows it to &ldquo;remember&rdquo; previous combinations, which takes less time than computing the same answer again.</p>
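<p>Here&rsquo;s a minimal sketch of memoization in its most literal form: a recursive function with a map as its cache. This is a top-down variation I&rsquo;m including purely for illustration (the names <code>key</code> and <code>bestWorthMemo()</code> are hypothetical); the matrix we&rsquo;ll build in the next section plays the same role as the map, filled in row by row instead.</p>

```go
package main

import "fmt"

type item struct {
	name          string
	worth, weight int
}

// key identifies one subproblem:
// "the best worth using only the first n items, with capacity w".
type key struct{ n, w int }

// bestWorthMemo is a hypothetical top-down solver. Each subproblem is
// computed once, stored in memo, and looked up on every later request.
func bestWorthMemo(is []item, n, w int, memo map[key]int) int {
	if n == 0 {
		return 0 // no items left to consider
	}
	k := key{n, w}
	if v, ok := memo[k]; ok {
		return v // already computed: return the cached result
	}
	best := bestWorthMemo(is, n-1, w, memo) // option 1: leave item n out
	if it := is[n-1]; it.weight <= w {
		// option 2: put item n in, and solve for the capacity that remains
		with := bestWorthMemo(is, n-1, w-it.weight, memo) + it.worth
		if with > best {
			best = with
		}
	}
	memo[k] = best
	return best
}

func main() {
	// Made-up items: name, worth, weight.
	is := []item{{"heavy", 10, 10}, {"medium", 9, 5}, {"light", 8, 5}}
	fmt.Println(bestWorthMemo(is, len(is), 10, map[key]int{})) // 17
}
```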
<h3 id="building-our-dynamic-programming-algorithm">Building our dynamic programming algorithm</h3>
<p>To use dynamic programming to find the optimal recipe for packing our bag, we&rsquo;ll need to:</p>
<ol>
<li>Create a matrix representing all subsets of the items (the solution space) with rows representing items and columns representing the bag&rsquo;s remaining weight capacity</li>
<li>Loop through the matrix and calculate the worth that can be obtained by each combination of items at each stage of the bag&rsquo;s capacity</li>
<li>Examine the completed matrix to determine which items to add to the bag in order to produce the maximum possible worth for the bag in total</li>
</ol>
<p>It will be most helpful to visualize our solution space. Here&rsquo;s a representation of what we&rsquo;re building with our code:</p>
<figure>
<img src="knapsack-matrix.jpg"
alt="A sketch of the matrix with rows for items and columns for grams of weight."/> <figcaption>
<p>The empty knapsackian multiverse.</p>
</figcaption>
</figure>
<p>In Go, we can create this matrix as a slice of slices.</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="nx">matrix</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">([][]</span><span class="kt">int</span><span class="p">,</span> <span class="nx">numItems</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span> <span class="c1">// rows representing items
</span><span class="c1"></span><span class="k">for</span> <span class="nx">i</span> <span class="o">:=</span> <span class="k">range</span> <span class="nx">matrix</span> <span class="p">{</span>
<span class="nx">matrix</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span> <span class="p">=</span> <span class="nb">make</span><span class="p">([]</span><span class="kt">int</span><span class="p">,</span> <span class="nx">capacity</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span> <span class="c1">// columns representing grams of weight
</span><span class="c1"></span><span class="p">}</span>
</code></pre></div><p>We&rsquo;ve padded the rows and columns by <code>1</code> so that the indices match the item and weight numbers.</p>
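<p>The padding does a second job as well. Row <code>0</code> stands for &ldquo;no items considered yet&rdquo; and column <code>0</code> for &ldquo;no capacity&rdquo;, and since <code>make</code> fills a new slice with zeros, both already hold the right worth: nothing. Here&rsquo;s a small sketch that wraps the allocation in a hypothetical <code>newMatrix()</code> helper and checks the dimensions.</p>

```go
package main

import "fmt"

// newMatrix is a hypothetical helper around the allocation shown above.
// It returns (numItems+1) rows of (capacity+1) columns, all set to zero.
func newMatrix(numItems, capacity int) [][]int {
	matrix := make([][]int, numItems+1) // rows representing items
	for i := range matrix {
		matrix[i] = make([]int, capacity+1) // columns representing grams of weight
	}
	return matrix
}

func main() {
	m := newMatrix(3, 10)
	fmt.Println(len(m), len(m[0])) // 4 11
	fmt.Println(m[0][10], m[3][0]) // 0 0
}
```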
<p>Now that we&rsquo;ve created our matrix, we&rsquo;ll fill it by looping over the rows and the columns:</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="c1">// loop through table rows
</span><span class="c1"></span><span class="k">for</span> <span class="nx">i</span> <span class="o">:=</span> <span class="mi">1</span><span class="p">;</span> <span class="nx">i</span> <span class="o">&lt;=</span> <span class="nx">numItems</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span> <span class="p">{</span>
<span class="c1">// loop through table columns
</span><span class="c1"></span> <span class="k">for</span> <span class="nx">w</span> <span class="o">:=</span> <span class="mi">1</span><span class="p">;</span> <span class="nx">w</span> <span class="o">&lt;=</span> <span class="nx">capacity</span><span class="p">;</span> <span class="nx">w</span><span class="o">++</span> <span class="p">{</span>
<span class="c1">// do stuff in each element
</span><span class="c1"></span> <span class="p">}</span>
<span class="p">}</span>
</code></pre></div><p>Then for each element, we&rsquo;ll calculate the worth value to ascribe to it. We do this with code that represents the following:</p>
<blockquote>
<p>If the item at the index matching the current row fits within the weight capacity represented by the current column, take the maximum of either:</p>
</blockquote>
<blockquote>
<ol>
<li>The total worth of the best bag at this capacity without the current item, or</li>
<li>The current item&rsquo;s worth, plus the total worth of the best bag that still leaves enough room for it</li>
</ol>
</blockquote>
<p>In other words, as our algorithm considers each item, we&rsquo;re asking it to decide whether adding this item to the bag would produce a higher total worth than the best combination found so far at the current weight capacity. If the current item is a better choice, put it in; if not, leave it out.</p>
<p>Here&rsquo;s the code that accomplishes this:</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="c1">// if weight of item matching this index can fit at the current capacity column...
</span><span class="c1"></span><span class="k">if</span> <span class="nx">is</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">].</span><span class="nx">weight</span> <span class="o">&lt;=</span> <span class="nx">w</span> <span class="p">{</span>
<span class="c1">// worth of this subset without this item
</span><span class="c1"></span> <span class="nx">valueOne</span> <span class="o">:=</span> <span class="nb">float64</span><span class="p">(</span><span class="nx">matrix</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">][</span><span class="nx">w</span><span class="p">])</span>
<span class="c1">// worth of this item plus the best worth at the capacity left after adding it
</span><span class="c1"></span> <span class="nx">valueTwo</span> <span class="o">:=</span> <span class="nb">float64</span><span class="p">(</span><span class="nx">is</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">].</span><span class="nx">worth</span> <span class="o">+</span> <span class="nx">matrix</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">][</span><span class="nx">w</span><span class="o">-</span><span class="nx">is</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">].</span><span class="nx">weight</span><span class="p">])</span>
<span class="c1">// take maximum of either valueOne or valueTwo
</span><span class="c1"></span> <span class="nx">matrix</span><span class="p">[</span><span class="nx">i</span><span class="p">][</span><span class="nx">w</span><span class="p">]</span> <span class="p">=</span> <span class="nb">int</span><span class="p">(</span><span class="nx">math</span><span class="p">.</span><span class="nf">Max</span><span class="p">(</span><span class="nx">valueOne</span><span class="p">,</span> <span class="nx">valueTwo</span><span class="p">))</span>
<span class="c1">// if the item does not fit at this capacity, carry over the previous worth
</span><span class="c1"></span><span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nx">matrix</span><span class="p">[</span><span class="nx">i</span><span class="p">][</span><span class="nx">w</span><span class="p">]</span> <span class="p">=</span> <span class="nx">matrix</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">][</span><span class="nx">w</span><span class="p">]</span>
<span class="p">}</span>
</code></pre></div><p>This process of comparing item combinations will continue until every item has been considered at every possible stage of the bag&rsquo;s increasing total weight. When all the above have been considered, we&rsquo;ll have enumerated the solution space - filled the matrix - with all possible total worth values.</p>
<p>We&rsquo;ll have a big chart of numbers, and in the last column at the last row we&rsquo;ll have our highest possible value.</p>
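<p>To see the filling step on a small scale, here&rsquo;s a minimal sketch with three made-up items and a capacity of 5. The <code>item</code> struct and the <code>fillMatrix()</code> helper are stand-ins invented for this illustration, and a plain integer comparison takes the place of <code>math.Max</code>.</p>

```go
package main

import "fmt"

// item is a hypothetical stand-in for the item type used in this article.
type item struct {
	name   string
	worth  int
	weight int
}

// fillMatrix builds the worth table: rows are items, columns are capacities.
func fillMatrix(is []item, capacity int) [][]int {
	matrix := make([][]int, len(is)+1)
	for i := range matrix {
		matrix[i] = make([]int, capacity+1)
	}
	for i := 1; i <= len(is); i++ {
		for w := 1; w <= capacity; w++ {
			// start by leaving this item out
			matrix[i][w] = matrix[i-1][w]
			// if it fits, check whether packing it gives a higher worth
			if is[i-1].weight <= w {
				with := is[i-1].worth + matrix[i-1][w-is[i-1].weight]
				if with > matrix[i][w] {
					matrix[i][w] = with
				}
			}
		}
	}
	return matrix
}

func main() {
	// made-up items for illustration only
	is := []item{
		{"socks", 3, 1},
		{"mug", 4, 3},
		{"jacket", 5, 4},
	}
	for _, row := range fillMatrix(is, 5) {
		fmt.Println(row)
	}
	// prints:
	// [0 0 0 0 0 0]
	// [0 3 3 3 3 3]
	// [0 3 3 4 7 7]
	// [0 3 3 4 7 8]
}
```

<p>The bottom-right cell holds 8: the socks and the jacket together use all 5 units of capacity.</p>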
<figure>
<img src="knapsack-matrix-filled.jpg"
alt="A strictly representative representation of the filled matrix."/> <figcaption>
<p>A strictly representative representation of the filled matrix.</p>
</figcaption>
</figure>
<p>That&rsquo;s great, but how do we find out which combination of items was put in the bag to achieve that worth?</p>
<h3 id="getting-our-optimized-item-list">Getting our optimized item list</h3>
<p>To see which items combine to create our optimal packing list, we&rsquo;ll need to examine our matrix in reverse to the way we created it. Since we know the highest possible value is in the last row in the last column, we&rsquo;ll start there. To find the items, we:</p>
<ol>
<li>Get the value of the current cell</li>
<li>Compare the value of the current cell to the value in the cell directly above it</li>
<li>If the values differ, there was a change to the bag items; find the next cell to examine by moving backwards through the columns according to the current item&rsquo;s weight (find the value of the bag before this current item was added)</li>
<li>If the values match, there was no change to the bag items; move up to the cell in the row above and repeat</li>
</ol>
<p>The nature of the action we&rsquo;re trying to achieve lends itself well to a recursive function. If you recall from <a href="https://victoria.dev/blog/understanding-array.prototype.reduce-and-recursion-using-apple-pie/">my previous article about making apple pie</a>, recursive functions are simply functions that call themselves under certain conditions. Here&rsquo;s what it looks like:</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kd">func</span> <span class="nf">checkItem</span><span class="p">(</span><span class="nx">b</span> <span class="o">*</span><span class="nx">bag</span><span class="p">,</span> <span class="nx">i</span> <span class="kt">int</span><span class="p">,</span> <span class="nx">w</span> <span class="kt">int</span><span class="p">,</span> <span class="nx">is</span> <span class="p">[]</span><span class="nx">item</span><span class="p">,</span> <span class="nx">matrix</span> <span class="p">[][]</span><span class="kt">int</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="nx">i</span> <span class="o">&lt;=</span> <span class="mi">0</span> <span class="o">||</span> <span class="nx">w</span> <span class="o">&lt;=</span> <span class="mi">0</span> <span class="p">{</span>
<span class="k">return</span>
<span class="p">}</span>
<span class="nx">pick</span> <span class="o">:=</span> <span class="nx">matrix</span><span class="p">[</span><span class="nx">i</span><span class="p">][</span><span class="nx">w</span><span class="p">]</span>
<span class="k">if</span> <span class="nx">pick</span> <span class="o">!=</span> <span class="nx">matrix</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">][</span><span class="nx">w</span><span class="p">]</span> <span class="p">{</span>
<span class="nx">b</span><span class="p">.</span><span class="nf">addItem</span><span class="p">(</span><span class="nx">is</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
<span class="nf">checkItem</span><span class="p">(</span><span class="nx">b</span><span class="p">,</span> <span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="nx">w</span><span class="o">-</span><span class="nx">is</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">].</span><span class="nx">weight</span><span class="p">,</span> <span class="nx">is</span><span class="p">,</span> <span class="nx">matrix</span><span class="p">)</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nf">checkItem</span><span class="p">(</span><span class="nx">b</span><span class="p">,</span> <span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="nx">w</span><span class="p">,</span> <span class="nx">is</span><span class="p">,</span> <span class="nx">matrix</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div><p>Our <code>checkItem()</code> function calls itself if the condition we described in step 4 is true. If step 3 is true, it also calls itself, but with different arguments.</p>
<p>Recursive functions require a base case. In this example, we want the function to stop once we run out of values of worth to compare. Thus our base case is when either <code>i</code> or <code>w</code> reaches <code>0</code>.</p>
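<p>The same walk through the matrix can also be written as a loop, where the loop condition plays the role of the base case. This is a sketch of an alternative rather than the version used in this article: <code>pickItems()</code> and the <code>item</code> struct are hypothetical names, and the matrix is hard-coded for three made-up items at a capacity of 5.</p>

```go
package main

import "fmt"

// item is a hypothetical stand-in for the item type used in this article.
type item struct {
	name   string
	worth  int
	weight int
}

// pickItems walks the filled matrix from the bottom-right cell upwards
// and returns the items that were packed.
func pickItems(is []item, matrix [][]int, capacity int) []item {
	var picked []item
	w := capacity
	// stop when we run out of rows or capacity (the base case)
	for i := len(is); i > 0 && w > 0; i-- {
		// a change from the row above means this item was packed
		if matrix[i][w] != matrix[i-1][w] {
			picked = append(picked, is[i-1])
			w -= is[i-1].weight
		}
	}
	return picked
}

func main() {
	// made-up items and their filled matrix, for illustration only
	is := []item{{"socks", 3, 1}, {"mug", 4, 3}, {"jacket", 5, 4}}
	matrix := [][]int{
		{0, 0, 0, 0, 0, 0},
		{0, 3, 3, 3, 3, 3},
		{0, 3, 3, 4, 7, 7},
		{0, 3, 3, 4, 7, 8},
	}
	for _, it := range pickItems(is, matrix, 5) {
		fmt.Println(it.name, it.worth, it.weight)
	}
	// prints:
	// jacket 5 4
	// socks 3 1
}
```

<p>Whether to recurse or loop here is mostly a matter of taste; both visit at most one cell per row.</p>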
<p>Here&rsquo;s how the dynamic programming approach looks when it&rsquo;s all put together:</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kd">func</span> <span class="nf">checkItem</span><span class="p">(</span><span class="nx">b</span> <span class="o">*</span><span class="nx">bag</span><span class="p">,</span> <span class="nx">i</span> <span class="kt">int</span><span class="p">,</span> <span class="nx">w</span> <span class="kt">int</span><span class="p">,</span> <span class="nx">is</span> <span class="p">[]</span><span class="nx">item</span><span class="p">,</span> <span class="nx">matrix</span> <span class="p">[][]</span><span class="kt">int</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="nx">i</span> <span class="o">&lt;=</span> <span class="mi">0</span> <span class="o">||</span> <span class="nx">w</span> <span class="o">&lt;=</span> <span class="mi">0</span> <span class="p">{</span>
<span class="k">return</span>
<span class="p">}</span>
<span class="nx">pick</span> <span class="o">:=</span> <span class="nx">matrix</span><span class="p">[</span><span class="nx">i</span><span class="p">][</span><span class="nx">w</span><span class="p">]</span>
<span class="k">if</span> <span class="nx">pick</span> <span class="o">!=</span> <span class="nx">matrix</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">][</span><span class="nx">w</span><span class="p">]</span> <span class="p">{</span>
<span class="nx">b</span><span class="p">.</span><span class="nf">addItem</span><span class="p">(</span><span class="nx">is</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
<span class="nf">checkItem</span><span class="p">(</span><span class="nx">b</span><span class="p">,</span> <span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="nx">w</span><span class="o">-</span><span class="nx">is</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">].</span><span class="nx">weight</span><span class="p">,</span> <span class="nx">is</span><span class="p">,</span> <span class="nx">matrix</span><span class="p">)</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nf">checkItem</span><span class="p">(</span><span class="nx">b</span><span class="p">,</span> <span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="nx">w</span><span class="p">,</span> <span class="nx">is</span><span class="p">,</span> <span class="nx">matrix</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kd">func</span> <span class="nf">dynamic</span><span class="p">(</span><span class="nx">is</span> <span class="p">[]</span><span class="nx">item</span><span class="p">,</span> <span class="nx">b</span> <span class="o">*</span><span class="nx">bag</span><span class="p">)</span> <span class="o">*</span><span class="nx">bag</span> <span class="p">{</span>
<span class="nx">numItems</span> <span class="o">:=</span> <span class="nb">len</span><span class="p">(</span><span class="nx">is</span><span class="p">)</span> <span class="c1">// number of items in knapsack
</span><span class="c1"></span> <span class="nx">capacity</span> <span class="o">:=</span> <span class="nx">b</span><span class="p">.</span><span class="nx">maxItemsWeight</span> <span class="c1">// capacity of knapsack
</span><span class="c1"></span>
<span class="c1">// create the empty matrix
</span><span class="c1"></span> <span class="nx">matrix</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">([][]</span><span class="kt">int</span><span class="p">,</span> <span class="nx">numItems</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span> <span class="c1">// rows representing items
</span><span class="c1"></span> <span class="k">for</span> <span class="nx">i</span> <span class="o">:=</span> <span class="k">range</span> <span class="nx">matrix</span> <span class="p">{</span>
<span class="nx">matrix</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span> <span class="p">=</span> <span class="nb">make</span><span class="p">([]</span><span class="kt">int</span><span class="p">,</span> <span class="nx">capacity</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span> <span class="c1">// columns representing grams of weight
</span><span class="c1"></span> <span class="p">}</span>
<span class="c1">// loop through table rows
</span><span class="c1"></span> <span class="k">for</span> <span class="nx">i</span> <span class="o">:=</span> <span class="mi">1</span><span class="p">;</span> <span class="nx">i</span> <span class="o">&lt;=</span> <span class="nx">numItems</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span> <span class="p">{</span>
<span class="c1">// loop through table columns
</span><span class="c1"></span> <span class="k">for</span> <span class="nx">w</span> <span class="o">:=</span> <span class="mi">1</span><span class="p">;</span> <span class="nx">w</span> <span class="o">&lt;=</span> <span class="nx">capacity</span><span class="p">;</span> <span class="nx">w</span><span class="o">++</span> <span class="p">{</span>
<span class="c1">// if weight of item matching this index can fit at the current capacity column...
</span><span class="c1"></span> <span class="k">if</span> <span class="nx">is</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">].</span><span class="nx">weight</span> <span class="o">&lt;=</span> <span class="nx">w</span> <span class="p">{</span>
<span class="c1">// worth of this subset without this item
</span><span class="c1"></span> <span class="nx">valueOne</span> <span class="o">:=</span> <span class="nb">float64</span><span class="p">(</span><span class="nx">matrix</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">][</span><span class="nx">w</span><span class="p">])</span>
<span class="c1">// worth of this item plus the best worth at the capacity left after adding it
</span><span class="c1"></span> <span class="nx">valueTwo</span> <span class="o">:=</span> <span class="nb">float64</span><span class="p">(</span><span class="nx">is</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">].</span><span class="nx">worth</span> <span class="o">+</span> <span class="nx">matrix</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">][</span><span class="nx">w</span><span class="o">-</span><span class="nx">is</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">].</span><span class="nx">weight</span><span class="p">])</span>
<span class="c1">// take maximum of either valueOne or valueTwo
</span><span class="c1"></span> <span class="nx">matrix</span><span class="p">[</span><span class="nx">i</span><span class="p">][</span><span class="nx">w</span><span class="p">]</span> <span class="p">=</span> <span class="nb">int</span><span class="p">(</span><span class="nx">math</span><span class="p">.</span><span class="nf">Max</span><span class="p">(</span><span class="nx">valueOne</span><span class="p">,</span> <span class="nx">valueTwo</span><span class="p">))</span>
<span class="c1">// if the item does not fit at this capacity, carry over the previous worth
</span><span class="c1"></span> <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nx">matrix</span><span class="p">[</span><span class="nx">i</span><span class="p">][</span><span class="nx">w</span><span class="p">]</span> <span class="p">=</span> <span class="nx">matrix</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">][</span><span class="nx">w</span><span class="p">]</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nf">checkItem</span><span class="p">(</span><span class="nx">b</span><span class="p">,</span> <span class="nx">numItems</span><span class="p">,</span> <span class="nx">capacity</span><span class="p">,</span> <span class="nx">is</span><span class="p">,</span> <span class="nx">matrix</span><span class="p">)</span>
<span class="c1">// add other statistics to the bag
</span><span class="c1"></span> <span class="nx">b</span><span class="p">.</span><span class="nx">totalWorth</span> <span class="p">=</span> <span class="nx">matrix</span><span class="p">[</span><span class="nx">numItems</span><span class="p">][</span><span class="nx">capacity</span><span class="p">]</span>
<span class="nx">b</span><span class="p">.</span><span class="nx">totalWeight</span> <span class="p">=</span> <span class="nx">b</span><span class="p">.</span><span class="nx">bagWeight</span> <span class="o">+</span> <span class="nx">b</span><span class="p">.</span><span class="nx">currItemsWeight</span>
<span class="k">return</span> <span class="nx">b</span>
<span class="p">}</span>
</code></pre></div><h3 id="dynamic-programming-results">Dynamic programming results</h3>
<p>We expect that the dynamic programming approach will give us a more optimized solution than the greedy algorithm. So did it? Here are the results:</p>
<p><strong>Total weight of bag and items:</strong> 6982g</p>
<p><strong>Total worth of packed items:</strong> 757</p>
<p>Here are the items our dynamic programming algorithm chose, sorted by worth:</p>
<table>
<thead>
<tr>
<th>Item</th>
<th>Worth</th>
<th>Weight</th>
</tr>
</thead>
<tbody>
<tr>
<td>10 pairs thongs</td>
<td>39</td>
<td>80</td>
</tr>
<tr>
<td>5 Underarmour Strappy</td>
<td>38</td>
<td>305</td>
</tr>
<tr>
<td>1 pair Uniqlo leggings</td>
<td>37</td>
<td>185</td>
</tr>
<tr>
<td>2 Lululemon Cool Racerback</td>
<td>36</td>
<td>174</td>
</tr>
<tr>
<td>Chargers and cables in Mini Bomber Travel Kit</td>
<td>35</td>
<td>665</td>
</tr>
<tr>
<td>The Roost Stand</td>
<td>34</td>
<td>170</td>
</tr>
<tr>
<td>ThinkPad Compact Bluetooth Keyboard with trackpoint</td>
<td>33</td>
<td>460</td>
</tr>
<tr>
<td>Seagate Backup Plus Slim</td>
<td>32</td>
<td>159</td>
</tr>
<tr>
<td>1 pair black denim shorts</td>
<td>31</td>
<td>197</td>
</tr>
<tr>
<td>2 pairs Nike Pro shorts</td>
<td>30</td>
<td>112</td>
</tr>
<tr>
<td>2 pairs Lululemon shorts</td>
<td>29</td>
<td>184</td>
</tr>
<tr>
<td>Isabella T-Strap Croc sandals</td>
<td>28</td>
<td>200</td>
</tr>
<tr>
<td>2 Underarmour HeatGear CoolSwitch tank tops</td>
<td>27</td>
<td>138</td>
</tr>
<tr>
<td>5 pairs black socks</td>
<td>26</td>
<td>95</td>
</tr>
<tr>
<td>2 pairs Injinji Women&rsquo;s Run Lightweight No-Show Toe Socks</td>
<td>25</td>
<td>54</td>
</tr>
<tr>
<td>1 fancy tank top</td>
<td>24</td>
<td>71</td>
</tr>
<tr>
<td>1 light and stretchy long-sleeve shirt (Gap Fit)</td>
<td>23</td>
<td>147</td>
</tr>
<tr>
<td>Uniqlo Ultralight Down insulating jacket</td>
<td>22</td>
<td>235</td>
</tr>
<tr>
<td>Patagonia Torrentshell</td>
<td>21</td>
<td>301</td>
</tr>
<tr>
<td>Lightweight Merino Wool Buff</td>
<td>20</td>
<td>50</td>
</tr>
<tr>
<td>1 LBD (H&amp;M)</td>
<td>19</td>
<td>174</td>
</tr>
<tr>
<td>Field Notes Pitch Black Memo Book Dot-Graph</td>
<td>18</td>
<td>68</td>
</tr>
<tr>
<td>Innergie PocketCell USB-C 6000mAh power bank</td>
<td>17</td>
<td>148</td>
</tr>
<tr>
<td>Important papers</td>
<td>16</td>
<td>228</td>
</tr>
<tr>
<td>Deuter First Aid Kit Active</td>
<td>15</td>
<td>144</td>
</tr>
<tr>
<td>Stanley Classic Vacuum Camp Mug 16oz</td>
<td>14</td>
<td>454</td>
</tr>
<tr>
<td>JBL Reflect Mini Bluetooth Sport Headphones</td>
<td>13</td>
<td>14</td>
</tr>
<tr>
<td>Anker SoundCore nano Bluetooth Speaker</td>
<td>12</td>
<td>80</td>
</tr>
<tr>
<td>Oakley Latch Sunglasses</td>
<td>11</td>
<td>30</td>
</tr>
<tr>
<td>Ray Ban Wayfarer Classic</td>
<td>10</td>
<td>45</td>
</tr>
<tr>
<td>Petzl E+LITE Emergency Headlamp</td>
<td>8</td>
<td>27</td>
</tr>
<tr>
<td>Peak Design Cuff Camera Wrist Strap</td>
<td>6</td>
<td>26</td>
</tr>
<tr>
<td>Travelon Micro Scale</td>
<td>5</td>
<td>125</td>
</tr>
<tr>
<td>Humangear GoBites Duo</td>
<td>3</td>
<td>22</td>
</tr>
</tbody>
</table>
<p>Our dynamic programming solution is an obvious improvement over what the greedy algorithm gave us. Our total worth of 757 is 41 points greater than the greedy algorithm&rsquo;s solution of 716, and for a few grams less weight too!</p>
<h3 id="input-sort-order">Input sort order</h3>
<p>While testing my dynamic programming solution, I implemented the <a href="https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle">Fisher-Yates shuffle algorithm</a> on the input before passing it into my function, just to ensure that the answer wasn&rsquo;t somehow dependent on the sort order of the input. Here&rsquo;s what the shuffle looks like in Go:</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="nx">rand</span><span class="p">.</span><span class="nf">Seed</span><span class="p">(</span><span class="nx">time</span><span class="p">.</span><span class="nf">Now</span><span class="p">().</span><span class="nf">UnixNano</span><span class="p">())</span>
<span class="k">for</span> <span class="nx">i</span> <span class="o">:=</span> <span class="k">range</span> <span class="nx">itemList</span> <span class="p">{</span>
<span class="nx">j</span> <span class="o">:=</span> <span class="nx">rand</span><span class="p">.</span><span class="nf">Intn</span><span class="p">(</span><span class="nx">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
<span class="nx">itemList</span><span class="p">[</span><span class="nx">i</span><span class="p">],</span> <span class="nx">itemList</span><span class="p">[</span><span class="nx">j</span><span class="p">]</span> <span class="p">=</span> <span class="nx">itemList</span><span class="p">[</span><span class="nx">j</span><span class="p">],</span> <span class="nx">itemList</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span>
<span class="p">}</span>
</code></pre></div><p>Of course I then realized that Go 1.10 now has a built-in shuffle&hellip; it works precisely the same way and looks like this:</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="nx">rand</span><span class="p">.</span><span class="nf">Shuffle</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">itemList</span><span class="p">),</span> <span class="kd">func</span><span class="p">(</span><span class="nx">i</span><span class="p">,</span> <span class="nx">j</span> <span class="kt">int</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">itemList</span><span class="p">[</span><span class="nx">i</span><span class="p">],</span> <span class="nx">itemList</span><span class="p">[</span><span class="nx">j</span><span class="p">]</span> <span class="p">=</span> <span class="nx">itemList</span><span class="p">[</span><span class="nx">j</span><span class="p">],</span> <span class="nx">itemList</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span>
<span class="p">})</span>
</code></pre></div><p>So did the order in which the items were processed affect the outcome? Well&hellip;</p>
<h4 id="suddenly-a-rogue-weight-appears">Suddenly&hellip; a rogue weight appears!</h4>
<p>As it turns out, in a way, the answer did depend on the order of the input. When I ran my dynamic programming algorithm several times, I sometimes saw a different total weight for the bag, though the total worth remained at 757. I initially thought this was a bug before examining the two sets of items that accompanied the two different total weight values. The two sets were identical except for the items accounting for 14 of the 757 worth points: one run packed the Stanley camp mug (worth 14), while the other packed the bag of toiletries, the tripod, and the Vapur bottle (worth 9, 4, and 1).</p>
<p>In this case, there were two equally optimal solutions based only on the success metric of the highest total possible worth. Shuffling the input seemed to affect the placement of the items in the matrix and thus, the path that the <code>checkItem()</code> function took as it went through the matrix to find the chosen items. Since the success metric of having the highest possible worth was the same in both item sets, we don&rsquo;t have a single unique solution; there are two!</p>
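<p>This effect can be reproduced on a small scale. The sketch below uses the worth and weight values of the four items that differ between the two results, with shortened names; the <code>solve()</code> helper and the capacity of 454 are assumptions chosen for the demonstration, not part of the original program.</p>

```go
package main

import "fmt"

// item is a hypothetical stand-in for the item type used in this article.
type item struct {
	name   string
	worth  int
	weight int
}

// solve fills the worth matrix, then traces back through it
// to find which items were packed.
func solve(is []item, capacity int) (worth, weight int, names []string) {
	matrix := make([][]int, len(is)+1)
	for i := range matrix {
		matrix[i] = make([]int, capacity+1)
	}
	for i := 1; i <= len(is); i++ {
		for w := 1; w <= capacity; w++ {
			matrix[i][w] = matrix[i-1][w]
			if it := is[i-1]; it.weight <= w {
				if with := it.worth + matrix[i-1][w-it.weight]; with > matrix[i][w] {
					matrix[i][w] = with
				}
			}
		}
	}
	w := capacity
	for i := len(is); i > 0 && w > 0; i-- {
		if matrix[i][w] != matrix[i-1][w] {
			names = append(names, is[i-1].name)
			weight += is[i-1].weight
			w -= is[i-1].weight
		}
	}
	return matrix[len(is)][capacity], weight, names
}

func main() {
	mug := item{"mug", 14, 454}
	toiletries := item{"toiletries", 9, 236}
	tripod := item{"tripod", 4, 150}
	bottle := item{"bottle", 1, 41}

	// same items, two input orders; 454 is an assumed capacity for this demo
	fmt.Println(solve([]item{mug, toiletries, tripod, bottle}, 454))
	// prints: 14 454 [mug]
	fmt.Println(solve([]item{toiletries, tripod, bottle, mug}, 454))
	// prints: 14 427 [bottle tripod toiletries]
}
```

<p>Both orders reach a worth of 14, but the walk back through the matrix lands on a different set of items, 27 grams apart.</p>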
<p>As an academic exercise, both these sets of items are correct answers. We may choose to optimize further by another metric, say, the total weight of all the items. The highest possible worth at the least possible weight could be seen as an ideal solution.</p>
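<p>One way to build that second metric into the algorithm is to store both worth and weight in each cell and break ties in favour of the lighter bag. The following is a minimal sketch of that idea, not the code used for the results in this article: the <code>cell</code> type and the <code>better()</code> and <code>lightestBest()</code> helpers are hypothetical, the capacity of 454 is assumed, and only the totals are returned, so two rows suffice in place of the full matrix.</p>

```go
package main

import "fmt"

// item is a hypothetical stand-in for the item type used in this article.
type item struct {
	name   string
	worth  int
	weight int
}

// cell records the best worth at a capacity and the lightest weight achieving it.
type cell struct {
	worth  int
	weight int
}

// better reports whether a beats b: higher worth first, then lower weight.
func better(a, b cell) bool {
	if a.worth != b.worth {
		return a.worth > b.worth
	}
	return a.weight < b.weight
}

// lightestBest returns the highest possible worth and the least weight that reaches it.
func lightestBest(is []item, capacity int) cell {
	prev := make([]cell, capacity+1)
	for _, it := range is {
		curr := make([]cell, capacity+1)
		for w := 0; w <= capacity; w++ {
			// start by leaving this item out
			curr[w] = prev[w]
			if it.weight <= w {
				rest := prev[w-it.weight]
				with := cell{rest.worth + it.worth, rest.weight + it.weight}
				if better(with, curr[w]) {
					curr[w] = with
				}
			}
		}
		prev = curr
	}
	return prev[capacity]
}

func main() {
	mug := item{"mug", 14, 454}
	toiletries := item{"toiletries", 9, 236}
	tripod := item{"tripod", 4, 150}
	bottle := item{"bottle", 1, 41}

	// both input orders now agree on the lighter of the two optimal bags
	fmt.Println(lightestBest([]item{mug, toiletries, tripod, bottle}, 454)) // prints: {14 427}
	fmt.Println(lightestBest([]item{toiletries, tripod, bottle, mug}, 454)) // prints: {14 427}
}
```

<p>Because a tie in worth is now settled by weight, the totals no longer depend on the order of the input.</p>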
<p>Here&rsquo;s the second, lighter, dynamic programming result:</p>
<p><strong>Total weight of bag and items:</strong> 6955g</p>
<p><strong>Total worth of packed items:</strong> 757</p>
<table>
<thead>
<tr>
<th>Item</th>
<th>Worth</th>
<th>Weight</th>
</tr>
</thead>
<tbody>
<tr>
<td>10 pairs thongs</td>
<td>39</td>
<td>80</td>
</tr>
<tr>
<td>5 Underarmour Strappy</td>
<td>38</td>
<td>305</td>
</tr>
<tr>
<td>1 pair Uniqlo leggings</td>
<td>37</td>
<td>185</td>
</tr>
<tr>
<td>2 Lululemon Cool Racerback</td>
<td>36</td>
<td>174</td>
</tr>
<tr>
<td>Chargers and cables in Mini Bomber Travel Kit</td>
<td>35</td>
<td>665</td>
</tr>
<tr>
<td>The Roost Stand</td>
<td>34</td>
<td>170</td>
</tr>
<tr>
<td>ThinkPad Compact Bluetooth Keyboard with trackpoint</td>
<td>33</td>
<td>460</td>
</tr>
<tr>
<td>Seagate Backup Plus Slim</td>
<td>32</td>
<td>159</td>
</tr>
<tr>
<td>1 pair black denim shorts</td>
<td>31</td>
<td>197</td>
</tr>
<tr>
<td>2 pairs Nike Pro shorts</td>
<td>30</td>
<td>112</td>
</tr>
<tr>
<td>2 pairs Lululemon shorts</td>
<td>29</td>
<td>184</td>
</tr>
<tr>
<td>Isabella T-Strap Croc sandals</td>
<td>28</td>
<td>200</td>
</tr>
<tr>
<td>2 Underarmour HeatGear CoolSwitch tank tops</td>
<td>27</td>
<td>138</td>
</tr>
<tr>
<td>5 pairs black socks</td>
<td>26</td>
<td>95</td>
</tr>
<tr>
<td>2 pairs Injinji Women&rsquo;s Run Lightweight No-Show Toe Socks</td>
<td>25</td>
<td>54</td>
</tr>
<tr>
<td>1 fancy tank top</td>
<td>24</td>
<td>71</td>
</tr>
<tr>
<td>1 light and stretchy long-sleeve shirt (Gap Fit)</td>
<td>23</td>
<td>147</td>
</tr>
<tr>
<td>Uniqlo Ultralight Down insulating jacket</td>
<td>22</td>
<td>235</td>
</tr>
<tr>
<td>Patagonia Torrentshell</td>
<td>21</td>
<td>301</td>
</tr>
<tr>
<td>Lightweight Merino Wool Buff</td>
<td>20</td>
<td>50</td>
</tr>
<tr>
<td>1 LBD (H&amp;M)</td>
<td>19</td>
<td>174</td>
</tr>
<tr>
<td>Field Notes Pitch Black Memo Book Dot-Graph</td>
<td>18</td>
<td>68</td>
</tr>
<tr>
<td>Innergie PocketCell USB-C 6000mAh power bank</td>
<td>17</td>
<td>148</td>
</tr>
<tr>
<td>Important papers</td>
<td>16</td>
<td>228</td>
</tr>
<tr>
<td>Deuter First Aid Kit Active</td>
<td>15</td>
<td>144</td>
</tr>
<tr>
<td>JBL Reflect Mini Bluetooth Sport Headphones</td>
<td>13</td>
<td>14</td>
</tr>
<tr>
<td>Anker SoundCore nano Bluetooth Speaker</td>
<td>12</td>
<td>80</td>
</tr>
<tr>
<td>Oakley Latch Sunglasses</td>
<td>11</td>
<td>30</td>
</tr>
<tr>
<td>Ray Ban Wayfarer Classic</td>
<td>10</td>
<td>45</td>
</tr>
<tr>
<td>Zip bag of toiletries</td>
<td>9</td>
<td>236</td>
</tr>
<tr>
<td>Petzl E+LITE Emergency Headlamp</td>
<td>8</td>
<td>27</td>
</tr>
<tr>
<td>Peak Design Cuff Camera Wrist Strap</td>
<td>6</td>
<td>26</td>
</tr>
<tr>
<td>Travelon Micro Scale</td>
<td>5</td>
<td>125</td>
</tr>
<tr>
<td>BlitzWolf Bluetooth Tripod/Monopod</td>
<td>4</td>
<td>150</td>
</tr>
<tr>
<td>Humangear GoBites Duo</td>
<td>3</td>
<td>22</td>
</tr>
<tr>
<td>Vapur Bottle 1L</td>
<td>1</td>
<td>41</td>
</tr>
</tbody>
</table>
<h2 id="which-approach-is-better">Which approach is better?</h2>
<h3 id="go-benchmarking">Go benchmarking</h3>
<p>The Go standard library&rsquo;s <code>testing</code> package makes it straightforward for us to <a href="https://golang.org/pkg/testing/#hdr-Benchmarks">benchmark</a> these two approaches. We can find out how long it takes each algorithm to run, and how much memory each uses. Here&rsquo;s a simple <code>main_test.go</code> file:</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kn">package</span> <span class="nx">main</span>
<span class="kn">import</span> <span class="p">(</span>
<span class="s">&#34;testing&#34;</span>
<span class="p">)</span>
<span class="kd">func</span> <span class="nf">Benchmark_greedy</span><span class="p">(</span><span class="nx">b</span> <span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">B</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">itemList</span> <span class="o">:=</span> <span class="nf">readItems</span><span class="p">(</span><span class="s">&#34;objects.csv&#34;</span><span class="p">)</span>
<span class="k">for</span> <span class="nx">i</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="p">&lt;</span> <span class="nx">b</span><span class="p">.</span><span class="nx">N</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span> <span class="p">{</span>
<span class="nx">minaal</span> <span class="o">:=</span> <span class="nx">bag</span><span class="p">{</span><span class="nx">bagWeight</span><span class="p">:</span> <span class="mi">1415</span><span class="p">,</span> <span class="nx">currItemsWeight</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="nx">maxItemsWeight</span><span class="p">:</span> <span class="mi">5585</span><span class="p">}</span>
<span class="nf">greedy</span><span class="p">(</span><span class="nx">itemList</span><span class="p">,</span> <span class="nx">minaal</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kd">func</span> <span class="nf">Benchmark_dynamic</span><span class="p">(</span><span class="nx">b</span> <span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">B</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">itemList</span> <span class="o">:=</span> <span class="nf">readItems</span><span class="p">(</span><span class="s">&#34;objects.csv&#34;</span><span class="p">)</span>
<span class="k">for</span> <span class="nx">i</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="p">&lt;</span> <span class="nx">b</span><span class="p">.</span><span class="nx">N</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span> <span class="p">{</span>
<span class="nx">minaal</span> <span class="o">:=</span> <span class="nx">bag</span><span class="p">{</span><span class="nx">bagWeight</span><span class="p">:</span> <span class="mi">1415</span><span class="p">,</span> <span class="nx">currItemsWeight</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="nx">maxItemsWeight</span><span class="p">:</span> <span class="mi">5585</span><span class="p">}</span>
<span class="nf">dynamic</span><span class="p">(</span><span class="nx">itemList</span><span class="p">,</span> <span class="o">&amp;</span><span class="nx">minaal</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div><p>We can run <code>go test -bench=. -benchmem</code> to see these results:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">Benchmark_greedy-4 <span class="m">1000000</span> <span class="m">1619</span> ns/op <span class="m">2128</span> B/op <span class="m">9</span> allocs/op
Benchmark_dynamic-4 <span class="m">1000</span> <span class="m">1545322</span> ns/op <span class="m">2020332</span> B/op <span class="m">49</span> allocs/op
</code></pre></div><h4 id="greedy-algorithm-performance">Greedy algorithm performance</h4>
<p>After running the greedy algorithm 1,000,000 times, the speed of the algorithm was reliably measured to be 0.001619 milliseconds (translation: very fast). It required 2128 Bytes or 2-ish kilobytes of memory and 9 distinct memory allocations per iteration.</p>
<h4 id="dynamic-programming-performance">Dynamic programming performance</h4>
<p>The dynamic programming algorithm was run 1,000 times. Its speed was measured to be 1.545322 milliseconds or 0.001545322 seconds (translation: still pretty fast). It required 2,020,332 Bytes or 2-ish Megabytes, and 49 distinct memory allocations per iteration.</p>
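<p>The unit conversions above are easy to double-check. Here&rsquo;s a minimal sketch that redoes the arithmetic using the figures from the benchmark output; the helper names <code>nsToMs</code> and <code>ratio</code> are my own for illustration and aren&rsquo;t part of the benchmarked program.</p>

```go
package main

import "fmt"

// nsToMs converts nanoseconds to milliseconds.
func nsToMs(ns float64) float64 {
	return ns / 1e6
}

// ratio reports how many times larger b is than a.
func ratio(a, b float64) float64 {
	return b / a
}

func main() {
	// Figures copied from the benchmark output above.
	greedyNs, dynamicNs := 1619.0, 1545322.0
	greedyBytes, dynamicBytes := 2128.0, 2020332.0

	fmt.Printf("greedy:  %.6f ms/op\n", nsToMs(greedyNs))
	fmt.Printf("dynamic: %.6f ms/op\n", nsToMs(dynamicNs))
	fmt.Printf("time ratio:   %.0fx\n", ratio(greedyNs, dynamicNs))
	fmt.Printf("memory ratio: %.0fx\n", ratio(greedyBytes, dynamicBytes))
}
```

<p>Both ratios come out a little under 1,000, which is a handy number to keep in mind when weighing the two approaches.</p>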
<h3 id="the-verdict">The verdict</h3>
<p>Part of choosing the right approach to solving any programming problem is taking into account the size of the input data set. In this case, it&rsquo;s a small one. In this scenario, a one-pass greedy algorithm will always be faster and less resource-needy than dynamic programming, simply because it has fewer steps. Our greedy algorithm was almost three orders of magnitude (roughly 950 times) faster and less memory-hungry than our dynamic programming algorithm.</p>
<p>Not having those extra steps, however, means that getting the best possible solution from the greedy algorithm is unlikely.</p>
<p>It&rsquo;s clear that the dynamic programming algorithm gave us better numbers: a lower weight, and higher overall worth.</p>
<table>
<thead>
<tr>
<th></th>
<th>Greedy algorithm</th>
<th>Dynamic programming</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Total weight:</strong></td>
<td>6987g</td>
<td>6955g</td>
</tr>
<tr>
<td><strong>Total worth:</strong></td>
<td>716</td>
<td>757</td>
</tr>
</tbody>
</table>
<p>Where dynamic programming on small data sets lacks in performance, it makes up in optimization. The question then becomes whether that additional optimization is worth the performance cost.</p>
<p>&ldquo;Better,&rdquo; of course, is a subjective judgement. If speed and low resource usage are our success metrics, then the greedy algorithm is clearly better. If the total worth of items in the bag is our success metric, then dynamic programming is clearly better. However, our scenario is a practical one, and only one of these algorithm designs returned an answer I&rsquo;d choose. In optimizing for the overall greatest possible total worth of the items in the bag, the dynamic programming algorithm left out my highest-worth, but also heaviest, item: my laptop. The chargers and cables, Roost stand, and keyboard that were included aren&rsquo;t much use without it.</p>
<h3 id="better-algorithm-design">Better algorithm design</h3>
<p>There&rsquo;s a simple way to alter the dynamic programming approach so that the laptop is always included: we can modify the data so that the worth of the laptop is greater than the sum of the worth of all the other items. (Try it out!)</p>
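<p>Here&rsquo;s a rough sketch of that data tweak. The <code>item</code> struct and the sample values are made up for illustration (the real program reads its items from <code>objects.csv</code>), and it assumes the laptop fits in the bag on its own. Once one item is worth more than everything else combined, any selection without it scores lower than the item alone, so an optimal solution has to include it.</p>

```go
package main

import "fmt"

// item is a hypothetical stand-in for one row of objects.csv.
type item struct {
	name   string
	weight int
	worth  int
}

// prioritize returns a copy of items in which the named item's worth
// is one more than the combined worth of every other item.
func prioritize(items []item, name string) []item {
	out := make([]item, len(items))
	copy(out, items)

	others := 0
	for _, it := range out {
		if it.name != name {
			others += it.worth
		}
	}
	for i := range out {
		if out[i].name == name {
			out[i].worth = others + 1
		}
	}
	return out
}

func main() {
	// Made-up sample data, not the values from the article.
	items := []item{
		{"laptop", 1500, 90},
		{"keyboard", 500, 40},
		{"charger", 300, 30},
	}
	for _, it := range prioritize(items, "laptop") {
		fmt.Printf("%-10s weight=%dg worth=%d\n", it.name, it.weight, it.worth)
	}
}
```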
<p>Perhaps in re-designing the dynamic programming algorithm to be more practical, we might choose another success metric that better reflects an item&rsquo;s importance, instead of a subjective worth value. There are many possible metrics we can use to represent the value of an item. Here are a few examples of a good proxy:</p>
<ul>
<li>Amount of time spent using the item</li>
<li>Initial cost of purchasing the item</li>
<li>Cost of replacement if the item were lost today</li>
<li>Dollar value of the product of using the item</li>
</ul>
<p>By the same token, the greedy algorithm&rsquo;s results might be improved with the use of one of these alternate metrics.</p>
<p>On top of choosing an appropriate approach to solving the knapsack problem in general, it is helpful to design our algorithm in a way that translates the practicalities of a scenario into code.</p>
<p>There are many considerations for better algorithm design beyond the scope of this introductory post. One of these is <strong>time complexity</strong>, and I&rsquo;ve <a href="https://victoria.dev/blog/a-coffee-break-introduction-to-time-complexity-of-algorithms/">written about it here</a>. A future algorithm may very well decide my bag&rsquo;s contents on the next trip, but we&rsquo;re not quite there yet. Stay tuned!</p>
Why I'm automatically deleting my old tweets using AWS Lambdahttps://victoria.dev/blog/why-im-automatically-deleting-my-old-tweets-using-aws-lambda/
Thu, 12 Apr 2018 10:51:15 -0500hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/why-im-automatically-deleting-my-old-tweets-using-aws-lambda/From now on, my tweets are ephemeral. Here’s why I’m deleting all my old tweets, and the AWS Lambda function I’m using to do all this for free.
<p>From now on, my tweets are ephemeral. Here’s why I’m deleting all my old tweets, and the AWS Lambda function I’m using to do all this for free.</p>
<p><a href="https://victoria.dev/blog/why-im-automatically-deleting-my-old-tweets-using-aws-lambda/#ephemeral-tweets"><em>Click here to skip to the code part.</em></a></p>
<h1 id="stuff-and-opinions">Stuff and opinions</h1>
<p>I&rsquo;ve only been a one-bag nomad for a little over a year and a half. Before that, I lived as most people do in an apartment or a house. I owned furniture, more clothing than I strictly needed, and enough &ldquo;stuff&rdquo; to fill at least a few moving boxes. If I went to live somewhere else, moving for school or family or work, I packed up all my things and brought them with me. Over the years, I accumulated more and more stuff.</p>
<p>Adopting what many would call a minimalist lifestyle has rapidly changed a lot of my longstanding views. Giving away all my stuff (an idea I once thought to be interesting in principle but practically a little bit ridiculous) has become normal. It&rsquo;s normal for me, now, to not own things that I don&rsquo;t use on a regular basis. I don&rsquo;t keep wall shelves packed with old books or dishes or clothing or childhood toys because those items aren&rsquo;t relevant to me anymore. I just keep fond memories, instead.</p>
<p>Imagine, for a moment, that I still lived in a house. Imagine that in that house, on the fridge, is a drawing I made when I was six years old. In the bottom right corner of that drawing, scribbled in green crayon, are the words &ldquo;broccoli is dumb - Victoria, Age 6.&rdquo;</p>
<p>If you were in my house and saw that drawing on the fridge, would you assume that the statement &ldquo;broccoli is dumb&rdquo; comprised an accurate and current account of my opinions on broccoli? Of course not. I was six when I wrote that. I&rsquo;ve had plenty of time to change my mind.</p>
<h1 id="social-media-isnt-social">Social media isn&rsquo;t social</h1>
<p>I have a friend whom I&rsquo;ve known since we were both in kindergarten. We went through grade school together, then spoke to and saw each other on infrequent occasions across the years. We&rsquo;re both adults now. Sometimes when we chat, we&rsquo;ll recall some amusing memory from when we were younger. The nature of memory being what it is, I have no illusion that what we recall is recounted with much accuracy. Our impressions of things that happened - mistakes we made and moments of victory alike - are coloured by the experiences we&rsquo;ve had since then, and all the things we&rsquo;ve learned. An awkward moment at a school colleague&rsquo;s birthday party becomes an example of a child learning to socialize, instead of the world-ending moment of embarrassment it probably felt like at the time.</p>
<p>This is how memory works. In a sense, it gets updated, as well it should. People living in small communities remember things that their neighbour did many years ago, but recall them in the context of who their neighbour is now, and what their current relationship is like. This re-colouring of history is an important part of how people <a href="https://www.smithsonianmag.com/science-nature/how-our-brains-make-memories-14466850/">heal</a>, <a href="http://news.feinberg.northwestern.edu/2014/02/memory_rewrite/">make good decisions</a>, and <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3709095/">socialize</a>.</p>
<p>Social media does not do this. Your perfectly preserved tweet from five days or five years ago can be recalled with absolute accuracy. For most people, this is not particularly worrying. We tend to tweet about pretty mundane things - things that pop into mind when we&rsquo;re bored and want someone to notice us. Individually, usually, our old tweets are pretty insignificant. In aggregate, however, they paint a pretty complete picture of a person&rsquo;s random, unintentionally telling thoughts. This is the problem.</p>
<p>The assumption made of things written in social media and on Twitter specifically is a very different assumption than you might make about someone&rsquo;s notepad scribble from last week. I&rsquo;m not endeavoring to speculate why - I&rsquo;ve just seen enough cases of someone getting publicly flogged for something they posted years ago to know that it does happen. This is weird. If you wouldn&rsquo;t assume that a notepad scribble from last week or a crayon drawing from decades ago reflects the essence of who someone is <em>now,</em> why would you assume that an old tweet does?</p>
<p>You are not the same person you were last month - you&rsquo;ve seen things, read things, understood and learned things that have, in some small way, changed you. While a person may have the same sense of self and identity through most of their life, even this grows and changes over the years. We change our opinions, our desires, our habits. We are not stagnant beings, and we should not let ourselves be represented as such, however unintentionally.</p>
<h1 id="ephemeral-tweets">Ephemeral tweets</h1>
<p>If you look at my Twitter profile page today, you&rsquo;ll see fewer tweets there than you have fingers (I hope). I&rsquo;m using <a href="https://github.com/victoriadrake/ephemeral">ephemeral</a> - a lightweight utility I wrote for use on <a href="https://aws.amazon.com/lambda/">AWS Lambda</a> - to delete all my tweets older than a few days. I&rsquo;m doing this for the same reason that I don&rsquo;t hang on to stuff that I no longer use - that stuff isn&rsquo;t relevant to me anymore. It doesn&rsquo;t represent me, either.</p>
<p>The code that makes up ephemeral is written in Go. AWS Lambda creates an environment for each Lambda function, so ephemeral utilizes environment variables for your private Twitter API keys and the maximum age of the tweets you want to keep, represented in hours, like <code>72h</code>.</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kd">var</span> <span class="p">(</span>
<span class="nx">consumerKey</span> <span class="p">=</span> <span class="nf">getenv</span><span class="p">(</span><span class="s">&#34;TWITTER_CONSUMER_KEY&#34;</span><span class="p">)</span>
<span class="nx">consumerSecret</span> <span class="p">=</span> <span class="nf">getenv</span><span class="p">(</span><span class="s">&#34;TWITTER_CONSUMER_SECRET&#34;</span><span class="p">)</span>
<span class="nx">accessToken</span> <span class="p">=</span> <span class="nf">getenv</span><span class="p">(</span><span class="s">&#34;TWITTER_ACCESS_TOKEN&#34;</span><span class="p">)</span>
<span class="nx">accessTokenSecret</span> <span class="p">=</span> <span class="nf">getenv</span><span class="p">(</span><span class="s">&#34;TWITTER_ACCESS_TOKEN_SECRET&#34;</span><span class="p">)</span>
<span class="nx">maxTweetAge</span> <span class="p">=</span> <span class="nf">getenv</span><span class="p">(</span><span class="s">&#34;MAX_TWEET_AGE&#34;</span><span class="p">)</span>
<span class="nx">logger</span> <span class="p">=</span> <span class="nx">log</span><span class="p">.</span><span class="nf">New</span><span class="p">()</span>
<span class="p">)</span>
<span class="kd">func</span> <span class="nf">getenv</span><span class="p">(</span><span class="nx">name</span> <span class="kt">string</span><span class="p">)</span> <span class="kt">string</span> <span class="p">{</span>
<span class="nx">v</span> <span class="o">:=</span> <span class="nx">os</span><span class="p">.</span><span class="nf">Getenv</span><span class="p">(</span><span class="nx">name</span><span class="p">)</span>
<span class="k">if</span> <span class="nx">v</span> <span class="o">==</span> <span class="s">&#34;&#34;</span> <span class="p">{</span>
<span class="nb">panic</span><span class="p">(</span><span class="s">&#34;missing required environment variable &#34;</span> <span class="o">+</span> <span class="nx">name</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">v</span>
<span class="p">}</span>
</code></pre></div><p>The program uses the <a href="https://github.com/ChimeraCoder/anaconda">anaconda</a> library. It fetches your timeline up to the Twitter API&rsquo;s limit of 200 tweets per request, then compares each tweet&rsquo;s date of creation to your <code>MAX_TWEET_AGE</code> variable to decide whether it&rsquo;s old enough to be deleted. After deleting all the expired tweets, the Lambda function terminates.</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kd">func</span> <span class="nf">deleteFromTimeline</span><span class="p">(</span><span class="nx">api</span> <span class="o">*</span><span class="nx">anaconda</span><span class="p">.</span><span class="nx">TwitterApi</span><span class="p">,</span> <span class="nx">ageLimit</span> <span class="nx">time</span><span class="p">.</span><span class="nx">Duration</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">timeline</span><span class="p">,</span> <span class="nx">err</span> <span class="o">:=</span> <span class="nf">getTimeline</span><span class="p">(</span><span class="nx">api</span><span class="p">)</span>
<span class="k">if</span> <span class="nx">err</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span>
<span class="nx">log</span><span class="p">.</span><span class="nf">Error</span><span class="p">(</span><span class="s">&#34;Could not get timeline&#34;</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">for</span> <span class="nx">_</span><span class="p">,</span> <span class="nx">t</span> <span class="o">:=</span> <span class="k">range</span> <span class="nx">timeline</span> <span class="p">{</span>
<span class="nx">createdTime</span><span class="p">,</span> <span class="nx">err</span> <span class="o">:=</span> <span class="nx">t</span><span class="p">.</span><span class="nf">CreatedAtTime</span><span class="p">()</span>
<span class="k">if</span> <span class="nx">err</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span>
<span class="nx">log</span><span class="p">.</span><span class="nf">Error</span><span class="p">(</span><span class="s">&#34;Couldn&#39;t parse time &#34;</span><span class="p">,</span> <span class="nx">err</span><span class="p">)</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="k">if</span> <span class="nx">time</span><span class="p">.</span><span class="nf">Since</span><span class="p">(</span><span class="nx">createdTime</span><span class="p">)</span> <span class="p">&gt;</span> <span class="nx">ageLimit</span> <span class="p">{</span>
<span class="nx">_</span><span class="p">,</span> <span class="nx">err</span> <span class="o">:=</span> <span class="nx">api</span><span class="p">.</span><span class="nf">DeleteTweet</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">Id</span><span class="p">,</span> <span class="kc">true</span><span class="p">)</span>
<span class="c1">// Only report a deletion once the API call has succeeded.
</span><span class="c1"></span><span class="k">if</span> <span class="nx">err</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span>
<span class="nx">log</span><span class="p">.</span><span class="nf">Error</span><span class="p">(</span><span class="s">&#34;Failed to delete! &#34;</span><span class="p">,</span> <span class="nx">err</span><span class="p">)</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nx">log</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;DELETED: Age - &#34;</span><span class="p">,</span> <span class="nx">time</span><span class="p">.</span><span class="nf">Since</span><span class="p">(</span><span class="nx">createdTime</span><span class="p">).</span><span class="nf">Round</span><span class="p">(</span><span class="mi">1</span><span class="o">*</span><span class="nx">time</span><span class="p">.</span><span class="nx">Minute</span><span class="p">),</span> <span class="s">&#34; - &#34;</span><span class="p">,</span> <span class="nx">t</span><span class="p">.</span><span class="nx">Text</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nx">log</span><span class="p">.</span><span class="nf">Info</span><span class="p">(</span><span class="s">&#34;No more tweets to delete.&#34;</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div><p>Read the full code <a href="https://github.com/victoriadrake/ephemeral/blob/master/main.go">here</a>.</p>
<p>For a use case like this, AWS Lambda&rsquo;s free tier costs nothing. If you&rsquo;re any level of developer, it&rsquo;s an extremely useful tool to become familiar with. For a full walkthrough with screenshots of how to set up a Lambda function that tweets for you, you can read <a href="https://victoria.dev/blog/running-a-free-twitter-bot-on-aws-lambda/">this article</a>. The setup for ephemeral is the same; it just has the opposite function. :)</p>
<p>I forked ephemeral from Adam Drake&rsquo;s <a href="https://github.com/adamdrake/harold">Harold</a>, a Twitter tool that has many useful functions beyond keeping your timeline trimmed. If you have more than 200 tweets to delete at first pass, please use Harold to do that first. You can run Harold with the <code>deletetimeline</code> flag from your terminal.</p>
<p>For sentimental value, you may like to <a href="https://twitter.com/settings/your_twitter_data">download all your tweets</a> before deleting them.</p>
<h1 id="why-use-twitter-at-all">Why use Twitter at all?</h1>
<p>In anticipation of the question, let me say that yes, I do use Twitter besides just as a bucket for my Lambda functions to fill and empty. It has its benefits, most related to what I perceive to be its original intended purpose: to be a means of near-instant communication for short, digestible pieces of information reaching a widespread pool of people.</p>
<p>I use it as a way to keep tabs on what&rsquo;s happening <em>right now.</em> I use it to comment on, joke about, and commiserate with things tweeted by the people I follow <em>right now.</em> By keeping my timeline restricted to only the most recent few days, I feel like I&rsquo;m using Twitter more like it was meant to be used: a way to join the conversation and see what&rsquo;s happening in the world <em>right now</em> - instead of just another place to amass more &ldquo;stuff.&rdquo;</p>
Running a free Twitter bot on AWS Lambdahttps://victoria.dev/blog/running-a-free-twitter-bot-on-aws-lambda/
Mon, 05 Mar 2018 10:29:15 -0500hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/running-a-free-twitter-bot-on-aws-lambda/How to save some time with automated link sharing on Twitter - for free!
<p>If you read <a href="https://victoria.dev/blog/about-time/">About time</a>, you&rsquo;ll know that I&rsquo;m a big believer in spending time now on building things that save time in the future. To this end I built a simple Twitter bot in Go that would occasionally post links to my articles and keep my account interesting even when I&rsquo;m too busy to use it. The tweets help drive traffic to my sites, and I don&rsquo;t have to lift a finger.</p>
<p>I ran the bot on an Amazon EC2 instance for about a month. My AWS usage has historically been pretty inexpensive (less than the price of a coffee in most of North America), so I was surprised when the little instance I was using racked up a bill 90% bigger than the month before. I don&rsquo;t think AWS is expensive, to be clear, but still&hellip; I&rsquo;m cheap. I want my Twitter bot, and I want it for less.</p>
<p>I&rsquo;d been meaning to explore AWS Lambda, and figured this was a good opportunity. Unlike an EC2 instance that is constantly running (and charging you for it), Lambda charges you per request and according to the duration of time your function takes to run. There&rsquo;s a free tier, too: the first 1 million requests, plus a certain amount of compute time, are free. Roughly translated to running a Twitter bot that posts for you, say, twice a day, your monthly cost for using Lambda would total&hellip; carry the one&hellip; nothing. I&rsquo;ve been running my Lambda function for a couple weeks now, completely free.</p>
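<p>If you&rsquo;d like to see the &ldquo;carry the one&rdquo; part spelled out, here&rsquo;s a tiny sketch of the request arithmetic. It only looks at the request count mentioned above and ignores compute time; the function names are my own.</p>

```go
package main

import "fmt"

// freeRequests is the free-tier request allowance quoted above.
const freeRequests = 1000000

// monthlyRequests returns how many invocations a schedule produces.
func monthlyRequests(perDay, days int) int {
	return perDay * days
}

// withinFreeTier reports whether a request count stays inside the allowance.
func withinFreeTier(requests int) bool {
	return requests <= freeRequests
}

func main() {
	used := monthlyRequests(2, 31) // two tweets a day in a 31-day month
	fmt.Printf("%d of %d free requests used, within free tier: %v\n",
		used, freeRequests, withinFreeTier(used))
}
```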
<p>When it recently fell to me to take the reins of the <a href="https://twitter.com/freeCodeCampTO">@freeCodeCampTO</a> Twitter, I decided to employ a similar strategy, and also use this opportunity to document the process for you, dear reader.</p>
<p>So if you&rsquo;re currently using a full-time running instance for a task that could be served by a cron job, this is the article for you. I&rsquo;ll cover how to write your function for Lambda, how to get it set up to run automatically, and as a sweet little bonus, a handy bash script that updates your function from the command line whenever you need to make a change. Let&rsquo;s do it!</p>
<h1 id="is-lambda-right-for-you">Is Lambda right for you?</h1>
<p>When I wrote the code for my Twitter bot in Go, I intended to have it run on an AWS instance and borrowed heavily from <a href="https://github.com/campoy/justforfunc/tree/master/14-twitterbot">Francesc&rsquo;s awesome Just for Func episode</a>. Some time later I modified it to randomly choose an article from my RSS feeds and tweet the link, twice a day. I wanted to do something similar for the @freeCodeCampTO bot, and have it tweet an inspiring quote about programming every morning.</p>
<p>This is a good use case for Lambda because:</p>
<ul>
<li>The program should execute once</li>
<li>It runs on a regular schedule, using time as a trigger</li>
<li>It doesn&rsquo;t need to run constantly</li>
</ul>
<p>The important thing to keep in mind is that Lambda runs a function once in response to an event that you define. The most widely applicable trigger is a simple cron expression, but there are many other trigger events you can hook up. You can get an overview <a href="https://aws.amazon.com/lambda/">here</a>.</p>
<h1 id="write-a-lambda-function">Write a Lambda function</h1>
<p>I found this really straightforward to do in Go. First, grab the <a href="https://github.com/aws/aws-lambda-go">aws-lambda-go</a> library:</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash">go get github.com/aws/aws-lambda-go/lambda
</code></pre></div><p>Then make this your <code>func main()</code>:</p>
<div class="highlight"><pre class="chroma"><code class="language-go" data-lang="go"><span class="kd">func</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="nx">lambda</span><span class="p">.</span><span class="nf">Start</span><span class="p">(</span><span class="nx">tweetFeed</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div><p>Where <code>tweetFeed</code> is the name of the function that makes everything happen. While I won&rsquo;t go into writing the whole Twitter bot here, you can view my code <a href="https://gist.github.com/victoriadrake/7859dab68df87e28f40d6715d08383c7">on GitHub</a>.</p>
<h1 id="setting-up-aws-lambda">Setting up AWS Lambda</h1>
<p>I&rsquo;m assuming you already have an AWS account. If not, first things first here: <a href="https://aws.amazon.com/free">https://aws.amazon.com/free</a></p>
<h2 id="1-create-your-function">1. Create your function</h2>
<p>Find AWS Lambda in the list of services, then look for this shiny button:</p>
<p><img src="lambda-01.png#screenshot" alt="Create function"></p>
<p>We&rsquo;re going to author a function from scratch. Name your function, then under <strong>Runtime</strong> choose &ldquo;Go 1.x&rdquo;.</p>
<p>Under <strong>Role name</strong> write any name you like. It&rsquo;s a required field but irrelevant for this use case.</p>
<p>Click <strong>Create function.</strong></p>
<p><img src="lambda-02.png#screenshot" alt="Author from scratch"></p>
<h2 id="2-configure-your-function">2. Configure your function</h2>
<p>You&rsquo;ll see a screen for configuring your new function. Under <strong>Handler</strong> enter the name of your Go program.</p>
<p><img src="lambda-03.png#screenshot" alt="Configure your function"></p>
<p>If you scroll down, you&rsquo;ll see a spot to enter environment variables. This is a great place to enter the Twitter API tokens and secrets, using the variable names that your program expects. The AWS Lambda function will create the environment for you using the variables you provide here.</p>
<p><img src="lambda-04.png#screenshot" alt="Environment variables"></p>
<p>No further settings are necessary for this use case. Click <strong>Save</strong> at the top of the page.</p>
<h2 id="3-upload-your-code">3. Upload your code</h2>
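<p>You can upload your function code as a zip file on the configuration screen. Since we&rsquo;re using Go, you&rsquo;ll want to <code>go build</code> first, then zip the resulting executable before uploading that to Lambda. Lambda runs your code on Linux, so if you&rsquo;re building on another operating system, set <code>GOOS=linux</code> when you build.</p>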
<p>&hellip;Of course I&rsquo;m not going to do that manually every time I want to tweak my function. That&rsquo;s what <code>awscli</code> and this bash script are for!</p>
<p><code>update.sh</code></p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">go build <span class="o">&amp;&amp;</span> <span class="se">\
</span><span class="se"></span>zip fcc-tweet.zip fcc-tweet <span class="o">&amp;&amp;</span> <span class="se">\
</span><span class="se"></span>rm fcc-tweet <span class="o">&amp;&amp;</span> <span class="se">\
</span><span class="se"></span>aws lambda update-function-code --function-name fcc-tweet --zip-file fileb://fcc-tweet.zip <span class="o">&amp;&amp;</span> <span class="se">\
</span><span class="se"></span>rm fcc-tweet.zip
</code></pre></div><p>Now whenever I make a tweak, I just run <code>bash update.sh</code>.</p>
<p>If you&rsquo;re not already using <a href="https://aws.amazon.com/cli/">AWS Command Line Interface</a>, do <code>pip install awscli</code> and thank me later. Find instructions for getting set up and configured in a few minutes <a href="https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html">here</a> under <strong>Quick Configuration</strong>.</p>
<h2 id="4-test-your-function">4. Test your function</h2>
<p>Wanna see it go? Of course you do! Click &ldquo;Configure test events&rdquo; in the dropdown at the top.</p>
<p><img src="lambda-05.png#screenshot" alt="Configure test events"></p>
<p>Since you&rsquo;ll use a time-based trigger for this function, you don&rsquo;t need to enter any code to define test events in the popup window. Simply write any name under <strong>Event name</strong> and empty the JSON in the field below. Then click <strong>Create</strong>.</p>
<p><img src="lambda-06.png#screenshot" alt="Configuring an empty test event"></p>
<p>Click <strong>Test</strong> at the top of the page, and if everything is working correctly you should see&hellip;</p>
<p><img src="lambda-07.png#screenshot" alt="Test success notification"></p>
<h2 id="5-set-up-cloudwatch-events">5. Set up CloudWatch Events</h2>
<p>To run our function as we would a cron job - as a regularly scheduled time-based event - we&rsquo;ll use CloudWatch. Click <strong>CloudWatch Events</strong> in the <strong>Designer</strong> sidebar.</p>
<p><img src="lambda-08.png#screenshot" alt="CloudWatch Events trigger"></p>
<p>Under <strong>Configure triggers</strong>, you&rsquo;ll create a new rule. Choose a descriptive name for your rule without spaces or punctuation, and ensure <strong>Schedule expression</strong> is selected. Then input the time you want your program to run as either a <em>rate expression</em> or a <em>cron expression</em>.</p>
<p>A cron expression looks like this: <code>cron(0 12 * * ? *)</code></p>
<table>
<thead>
<tr>
<th>Minutes</th>
<th>Hours</th>
<th>Day of month</th>
<th>Month</th>
<th>Day of week</th>
<th>Year</th>
<th>In English</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>12</td>
<td>*</td>
<td>*</td>
<td>?</td>
<td>*</td>
<td>Run at noon (UTC) every day</td>
</tr>
</tbody>
</table>
<p>For more on how to write your cron expressions, read <a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/ScheduledEvents.html">this.</a></p>
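<p>To make the six fields in the table above a little more concrete, here&rsquo;s a sketch that splits an expression apart and labels each piece. It only checks the overall shape of the expression, not whether each field holds a valid value, and <code>splitCron</code> is a name I made up for the example.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// fieldNames lists the six fields in the order shown in the table.
var fieldNames = []string{
	"Minutes", "Hours", "Day of month", "Month", "Day of week", "Year",
}

// splitCron returns the six fields inside a cron(...) schedule expression.
// It validates the shape only, not the values of the fields.
func splitCron(expr string) ([]string, error) {
	if !strings.HasPrefix(expr, "cron(") || !strings.HasSuffix(expr, ")") {
		return nil, fmt.Errorf("expected cron(...), got %q", expr)
	}
	inner := strings.TrimSuffix(strings.TrimPrefix(expr, "cron("), ")")
	fields := strings.Fields(inner)
	if len(fields) != len(fieldNames) {
		return nil, fmt.Errorf("expected %d fields, got %d", len(fieldNames), len(fields))
	}
	return fields, nil
}

func main() {
	fields, err := splitCron("cron(0 12 * * ? *)")
	if err != nil {
		fmt.Println(err)
		return
	}
	for i, f := range fields {
		fmt.Printf("%-13s %s\n", fieldNames[i]+":", f)
	}
}
```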
<p>To find out what the current time in UTC is, click <a href="https://codepen.io/victoriadrake/full/OQabar/">here.</a></p>
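<p>If you want your program to run twice a day, say once at 10am and again at 3pm, you can list both hours in a single cron expression, like <code>cron(0 10,15 * * ? *)</code>. If the two times don&rsquo;t share the same minutes, you&rsquo;ll need to set two separate CloudWatch Events triggers and cron expression rules.</p>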
<p>Click <strong>Add</strong>.</p>
<p><img src="lambda-09.png#screenshot" alt="Set cron expression rule"></p>
<h1 id="watch-it-go">Watch it go</h1>
<p>That&rsquo;s all you need to get your Lambda function up and running! Now you can sit back, relax, and do more important things than share your RSS links on Twitter.</p>
Moving to a new domain without breaking old links with AWS & Disqushttps://victoria.dev/blog/moving-to-a-new-domain-without-breaking-old-links-with-aws-disqus/
Wed, 10 Jan 2018 08:56:20 -0500hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/moving-to-a-new-domain-without-breaking-old-links-with-aws-disqus/I moved my site's blog to its own domain without breaking old links or losing comments. Here's how.
<p>I started blogging about my nomadic travels last year, and so far the habit has stuck. As with all side projects, I won&rsquo;t typically invest heavily in setting up web properties before I can be reasonably certain that such an investment is worth my time or enjoyment. In other words: don&rsquo;t buy the domain until you&rsquo;ve proven to yourself that you&rsquo;ll stick with it!</p>
<p>After some months of regular posting I felt I was ready to commit (short courtship, I know, but we&rsquo;re all adults here) and I bought a dedicated domain, <a href="https://heronebag.com">herOneBag.com</a>.</p>
<p>Up until recently, my #NomadLyfe blog was just a subdirectory of my main personal site. Now it&rsquo;s all grown up and ready to strike out into the world alone! Here&rsquo;s the setup for the site:</p>
<ul>
<li>Static site in Amazon Web Services S3 bucket</li>
<li>Route 53 handling the DNS</li>
<li>CloudFront for distribution and a custom SSL certificate</li>
<li>Disqus for comments</li>
</ul>
<p>If you&rsquo;d like a walk-through for how to set up a new domain with this structure, it&rsquo;s over here: <a href="https://victoria.dev/verbose/aws-static-site/">Hosting your static site with AWS S3, Route 53, and CloudFront</a>. In this post, I&rsquo;ll just detail how I managed to move my blog to the new site without breaking the old links or losing any comments.</p>
<h1 id="preserve-old-links-with-redirection-rules">Preserve old links with redirection rules</h1>
<p>I wanted to avoid breaking links that have been posted around the web by forwarding visitors to the new URL. The change looks like this:</p>
<p>Old URL: <code>https://victoria.dev/meta/5-bag-lessons/</code><br>
New URL: <code>https://heronebag.com/blog/5-bag-lessons/</code></p>
<p>You can see that the domain name as well as the subdirectory have changed, but the slug for the blog post remains the same. (I love static sites.)</p>
<p>To redirect links from the old site, we&rsquo;ll need to set redirection rules in the old site&rsquo;s S3 bucket. AWS provides a way to set up a conditional redirect. This is set in the &ldquo;Redirection rules&rdquo; section of your S3 bucket&rsquo;s properties, under &ldquo;Static website hosting.&rdquo; You can <a href="https://docs.aws.amazon.com/AmazonS3/latest/dev/how-to-page-redirect.html#advanced-conditional-redirects">find the documentation here.</a></p>
<p><img src="aws-redirect.png#screenshot" alt="Redirection rules placement"></p>
<p>There are a few examples given, but none that represent the redirect I want. In addition to changing the prefix of the object key, we&rsquo;re also changing the domain. The latter is achieved with the <code>&lt;HostName&gt;</code> tag.</p>
<p>To redirect requests for the old blog URL to the new top level domain, we&rsquo;ll use the code below.</p>
<div class="highlight"><pre class="chroma"><code class="language-xml" data-lang="xml"><span class="nt">&lt;RoutingRules&gt;</span>
<span class="nt">&lt;RoutingRule&gt;</span>
<span class="nt">&lt;Condition&gt;</span>
<span class="nt">&lt;KeyPrefixEquals&gt;</span>oldblog/<span class="nt">&lt;/KeyPrefixEquals&gt;</span>
<span class="nt">&lt;/Condition&gt;</span>
<span class="nt">&lt;Redirect&gt;</span>
<span class="nt">&lt;HostName&gt;</span>newdomain.com<span class="nt">&lt;/HostName&gt;</span>
<span class="nt">&lt;ReplaceKeyPrefixWith&gt;</span>newblog/<span class="nt">&lt;/ReplaceKeyPrefixWith&gt;</span>
<span class="nt">&lt;/Redirect&gt;</span>
<span class="nt">&lt;/RoutingRule&gt;</span>
<span class="nt">&lt;/RoutingRules&gt;</span>
</code></pre></div><p>This rule ensures that requests for <code>olddomain.com/oldblog/specific-blog-post</code> will redirect to <code>newdomain.com/newblog/specific-blog-post</code>.</p>
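<p>To sanity-check a rule like this before saving it, you could mimic what it does to a URL. This is only a sketch of the prefix swap in JavaScript, not how S3 implements it. The <code>applyRoutingRule</code> helper and the shape of the <code>rule</code> object are made up for illustration.</p>

```javascript
// Hypothetical helper: imitates the KeyPrefixEquals / ReplaceKeyPrefixWith rule above.
function applyRoutingRule(requestUrl, rule) {
  const url = new URL(requestUrl);
  // An S3 object key is the path without its leading slash
  const key = url.pathname.replace(/^\//, '');
  if (!key.startsWith(rule.keyPrefixEquals)) {
    // The condition doesn't match, so the request is left alone
    return requestUrl;
  }
  url.hostname = rule.hostName;
  url.pathname = '/' + rule.replaceKeyPrefixWith + key.slice(rule.keyPrefixEquals.length);
  return url.toString();
}

const rule = {
  keyPrefixEquals: 'oldblog/',
  hostName: 'newdomain.com',
  replaceKeyPrefixWith: 'newblog/'
};

console.log(applyRoutingRule('https://olddomain.com/oldblog/specific-blog-post', rule));
// https://newdomain.com/newblog/specific-blog-post
```

<p>Requests that don&rsquo;t start with the old prefix fall through unchanged, which is why the rest of the old site keeps working.</p>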
<h1 id="migrate-disqus-comments">Migrate Disqus comments</h1>
<p>Disqus provides a tool for migrating the comment threads from your old blog site to the new one. You can find it in your Disqus admin tools at <code>your-short-name.disqus.com/admin/discussions/migrate/</code>.</p>
<p>To migrate posts from the old blog address to the new one, we&rsquo;ll use the URL mapper tool. Click &ldquo;Start URL mapper,&rdquo; then &ldquo;you can download a CSV here.&rdquo;</p>
<p><img src="aws-disqus.png#screenshot" alt="URL mapping for Disqus."></p>
<p>Disqus has decent instructions for how this tool works, and you can <a href="https://help.disqus.com/customer/en/portal/articles/912757-url-mapper">read them here.</a> Basically, you&rsquo;ll input the new blog URLs into the second column of the CSV file you downloaded, then pass it back to Disqus to process. If you&rsquo;re using a program to edit the CSV, be sure to save the resulting file in CSV format.</p>
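<p>If you have more than a handful of URLs, filling in that second column by hand gets old fast. Here&rsquo;s a rough JavaScript sketch that builds the rows for you, assuming each row is simply the old URL, a comma, then the new URL (check the Disqus instructions for the exact format they expect). The <code>buildUrlMap</code> helper is hypothetical.</p>

```javascript
// Hypothetical helper: builds "old URL,new URL" rows for the URL mapper CSV.
function buildUrlMap(oldUrls, oldPrefix, newPrefix) {
  return oldUrls
    // Skip anything that doesn't live under the old blog address
    .filter(url => url.startsWith(oldPrefix))
    // Swap the old prefix for the new one and keep the slug
    .map(url => url + ',' + newPrefix + url.slice(oldPrefix.length))
    .join('\n');
}

const csv = buildUrlMap(
  ['https://victoria.dev/meta/5-bag-lessons/'],
  'https://victoria.dev/meta/',
  'https://heronebag.com/blog/'
);

console.log(csv);
// https://victoria.dev/meta/5-bag-lessons/,https://heronebag.com/blog/5-bag-lessons/
```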
<p>Unless you have a bazillion URLs, the tool works pretty quickly, and you&rsquo;ll get an email when it&rsquo;s finished. Don&rsquo;t forget to update the name of your site in the Disqus admin, too.</p>
<h1 id="transfer-other-settings">Transfer other settings</h1>
<p>Update links in your social profiles and any other sites you may have around the web. If you&rsquo;re using other services attached to your website like Google Analytics or IFTTT, don&rsquo;t forget to update those details too!</p>
A Unicode substitution cipher algorithmhttps://victoria.dev/blog/a-unicode-substitution-cipher-algorithm/
Sat, 06 Jan 2018 20:00:28 -0500hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/a-unicode-substitution-cipher-algorithm/How a fun but useless project turned into a Unicode substitution cipher algorithm.
]]>
<p>Full transparency: I occasionally waste time messing around on Twitter. <em>(Gasp! Shock!)</em> One of the ways I waste time messing around on Twitter is by writing my name in my profile with different Unicode character &ldquo;fonts,&rdquo; 𝖑𝖎𝖐𝖊 𝖙𝖍𝖎𝖘 𝖔𝖓𝖊. I previously did this by searching for different Unicode characters on Google, then one-by-one copying and pasting them into the &ldquo;Name&rdquo; field on my Twitter profile. Since this method of wasting time was a bit of a time waster, I decided (in true programmer fashion) to write a tool that would help me save some time while wasting it.</p>
<p>I originally dubbed the tool &ldquo;uni-pretty,&rdquo; (a pun on LEGO&rsquo;s <a href="https://www.lego.com/en-us/themes/unikitty/characters/unikitty-84aef06dc1164a718aded854976efeeb">Unikitty</a> from a movie I&rsquo;d just watched that absolutely no one found funny) but have since renamed it <a href="https://fancyunicode.com">fancy unicode</a>. It builds from <a href="https://github.com/victoriadrake/fancy-unicode">this GitHub repo</a>. It lets you type any characters into a field and then converts them into Unicode characters that also represent letters, giving you fancy &ldquo;fonts&rdquo; that override a website&rsquo;s CSS, like in your Twitter profile. (Sorry, Internet.)</p>
<p><img src="screenshot.png#screenshot" alt="fancy-unicode screenshot"></p>
<p>The tool&rsquo;s first naive iteration existed for about twenty minutes while I copy-pasted Unicode characters into a data structure. This approach of storing the characters in the JavaScript file, called hard-coding, is fraught with issues. Besides having to store every character from every font style, it&rsquo;s painstaking to build, hard to update, and more code means it&rsquo;s susceptible to more possible errors.</p>
<p>Fortunately, working with Unicode means that there&rsquo;s a way to avoid the whole mess of having to store all the font characters: Unicode numbers are sequential. More importantly, the special characters in Unicode that could be used as fonts (meaning that there&rsquo;s a matching character for most or all of the letters of the alphabet) are laid out in the following sequence: capital A-Z, lowercase a-z. (A few styles, such as plain Fraktur and script, skip letters that Unicode had already encoded elsewhere, so check a style&rsquo;s range before relying on it.)</p>
<p>For example, in the fancy Unicode above, the lowercase letter &ldquo;L&rdquo; character has the Unicode number <code>U+1D591</code> and HTML code <code>&amp;#120209;</code>. The next letter in the sequence, a lowercase letter &ldquo;M,&rdquo; has the Unicode number <code>U+1D592</code> and HTML code <code>&amp;#120210;</code>. Notice how the numbers in those codes increment by one.</p>
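<p>You can check this in a JavaScript console. The sketch below builds the two characters from the Unicode numbers quoted above and compares their decimal values. It uses <code>codePointAt</code> rather than <code>charCodeAt</code> because these characters sit outside the range a single UTF-16 unit can hold.</p>

```javascript
// Build the fancy "l" and "m" from the Unicode numbers quoted above
const fancyL = String.fromCodePoint(0x1D591);
const fancyM = String.fromCodePoint(0x1D592);

// Decimal values, matching the HTML codes
console.log(fancyL.codePointAt(0)); // 120209
console.log(fancyM.codePointAt(0)); // 120210

// The difference between neighbouring letters
console.log(fancyM.codePointAt(0) - fancyL.codePointAt(0)); // 1
```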
<p>Why&rsquo;s this relevant? Since each special character can be referenced by a number, and we know that the order of the sequence is always the same (capital A-Z, lowercase a-z), we&rsquo;re able to produce any character simply by knowing the first number of its font sequence (the capital &ldquo;A&rdquo;). If this reminds you of anything, you can borrow my decoder pin.</p>
<p>In cryptography, the Caesar cipher (or shift cipher) is a simple method of encryption that utilizes substitution of one character for another in order to encode a message. This is typically done using the alphabet and a shift &ldquo;key&rdquo; that tells you which letter to substitute for the original one. For example, if I were trying to encode the word &ldquo;cat&rdquo; with a right shift of 3, it would look like this:</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash">c a t
f d w
</code></pre></div><p>With this concept, encoding our plain text letters as a Unicode &ldquo;font&rdquo; is a simple process. All we need is an array of our plain text letters to look up indexes in, and the code number of our Unicode capital &ldquo;A&rdquo; representation. Since Unicode numbers are written in hexadecimal and so can include letters (which are sequential, but an unnecessary complication), and since the intent is to display the page in HTML, we&rsquo;ll use the decimal number from the HTML code <code>&amp;#120172;</code>, with the <code>&amp;#</code> and <code>;</code> removed for brevity.</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="kd">var</span> <span class="nx">plain</span> <span class="o">=</span> <span class="p">[</span><span class="s1">&#39;A&#39;</span><span class="p">,</span> <span class="s1">&#39;B&#39;</span><span class="p">,</span> <span class="s1">&#39;C&#39;</span><span class="p">,</span> <span class="s1">&#39;D&#39;</span><span class="p">,</span> <span class="s1">&#39;E&#39;</span><span class="p">,</span> <span class="s1">&#39;F&#39;</span><span class="p">,</span> <span class="s1">&#39;G&#39;</span><span class="p">,</span> <span class="s1">&#39;H&#39;</span><span class="p">,</span> <span class="s1">&#39;I&#39;</span><span class="p">,</span> <span class="s1">&#39;J&#39;</span><span class="p">,</span> <span class="s1">&#39;K&#39;</span><span class="p">,</span> <span class="s1">&#39;L&#39;</span><span class="p">,</span> <span class="s1">&#39;M&#39;</span><span class="p">,</span> <span class="s1">&#39;N&#39;</span><span class="p">,</span> <span class="s1">&#39;O&#39;</span><span class="p">,</span> <span class="s1">&#39;P&#39;</span><span class="p">,</span> <span class="s1">&#39;Q&#39;</span><span class="p">,</span> <span class="s1">&#39;R&#39;</span><span class="p">,</span> <span class="s1">&#39;S&#39;</span><span class="p">,</span> <span class="s1">&#39;T&#39;</span><span class="p">,</span> <span class="s1">&#39;U&#39;</span><span class="p">,</span> <span class="s1">&#39;V&#39;</span><span class="p">,</span> <span class="s1">&#39;W&#39;</span><span class="p">,</span> <span class="s1">&#39;X&#39;</span><span class="p">,</span> <span class="s1">&#39;Y&#39;</span><span class="p">,</span> <span class="s1">&#39;Z&#39;</span><span class="p">,</span> <span class="s1">&#39;a&#39;</span><span class="p">,</span> <span class="s1">&#39;b&#39;</span><span class="p">,</span> <span class="s1">&#39;c&#39;</span><span class="p">,</span> <span class="s1">&#39;d&#39;</span><span class="p">,</span> <span 
class="s1">&#39;e&#39;</span><span class="p">,</span> <span class="s1">&#39;f&#39;</span><span class="p">,</span> <span class="s1">&#39;g&#39;</span><span class="p">,</span> <span class="s1">&#39;h&#39;</span><span class="p">,</span> <span class="s1">&#39;i&#39;</span><span class="p">,</span> <span class="s1">&#39;j&#39;</span><span class="p">,</span> <span class="s1">&#39;k&#39;</span><span class="p">,</span> <span class="s1">&#39;l&#39;</span><span class="p">,</span> <span class="s1">&#39;m&#39;</span><span class="p">,</span> <span class="s1">&#39;n&#39;</span><span class="p">,</span> <span class="s1">&#39;o&#39;</span><span class="p">,</span> <span class="s1">&#39;p&#39;</span><span class="p">,</span> <span class="s1">&#39;q&#39;</span><span class="p">,</span> <span class="s1">&#39;r&#39;</span><span class="p">,</span> <span class="s1">&#39;s&#39;</span><span class="p">,</span> <span class="s1">&#39;t&#39;</span><span class="p">,</span> <span class="s1">&#39;u&#39;</span><span class="p">,</span> <span class="s1">&#39;v&#39;</span><span class="p">,</span> <span class="s1">&#39;w&#39;</span><span class="p">,</span> <span class="s1">&#39;x&#39;</span><span class="p">,</span> <span class="s1">&#39;y&#39;</span><span class="p">,</span> <span class="s1">&#39;z&#39;</span><span class="p">];</span>
<span class="kd">var</span> <span class="nx">fancyA</span> <span class="o">=</span> <span class="mi">120172</span><span class="p">;</span>
</code></pre></div><p>Since we know that the letter sequence of the fancy Unicode is the same as our plain text array, any letter can be found by using its index in the plain text array as an offset from the fancy capital &ldquo;A&rdquo; number. For example, capital &ldquo;B&rdquo; in fancy Unicode is the capital &ldquo;A&rdquo; number, <code>120172</code> plus B&rsquo;s index, which is <code>1</code>: <code>120173</code>.</p>
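<p>As a quick check of that arithmetic, here&rsquo;s a small sketch that looks up a single letter. The <code>fancyLetter</code> helper is just for illustration and isn&rsquo;t part of the tool. Unlike the tool, it returns the character itself rather than an HTML code.</p>

```javascript
// Same plain text sequence as above: capital A-Z, then lowercase a-z
const plain = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'.split('');
const fancyA = 120172;

// Hypothetical helper: returns the fancy character for one plain letter
function fancyLetter(letter) {
  const i = plain.indexOf(letter);
  // Anything that isn't in the plain array is returned unchanged
  return i === -1 ? letter : String.fromCodePoint(fancyA + i);
}

console.log(fancyLetter('B').codePointAt(0)); // 120173
console.log(fancyLetter('l').codePointAt(0)); // 120209
```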
<p>Here&rsquo;s our conversion function:</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="kd">function</span> <span class="nx">convert</span><span class="p">(</span><span class="nx">string</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Create a variable to store our converted letters
</span><span class="c1"></span> <span class="kd">let</span> <span class="nx">converted</span> <span class="o">=</span> <span class="p">[];</span>
<span class="c1">// Break string into substrings (letters)
</span><span class="c1"></span> <span class="kd">let</span> <span class="nx">arr</span> <span class="o">=</span> <span class="nx">string</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">&#39;&#39;</span><span class="p">);</span>
<span class="c1">// Search plain array for indexes of letters
</span><span class="c1"></span> <span class="nx">arr</span><span class="p">.</span><span class="nx">forEach</span><span class="p">(</span><span class="nx">element</span> <span class="p">=&gt;</span> <span class="p">{</span>
<span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="nx">plain</span><span class="p">.</span><span class="nx">indexOf</span><span class="p">(</span><span class="nx">element</span><span class="p">);</span>
<span class="c1">// If the letter isn&#39;t a letter (not found in the plain array)
</span><span class="c1"></span> <span class="k">if</span> <span class="p">(</span><span class="nx">i</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Return as a whitespace
</span><span class="c1"></span> <span class="nx">converted</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="c1">// Get relevant character from fancy number + index
</span><span class="c1"></span> <span class="kd">let</span> <span class="nx">unicode</span> <span class="o">=</span> <span class="nx">fancyA</span> <span class="o">+</span> <span class="nx">i</span><span class="p">;</span>
<span class="c1">// Return as HTML code
</span><span class="c1"></span> <span class="nx">converted</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="s1">&#39;&amp;#&#39;</span> <span class="o">+</span> <span class="nx">unicode</span> <span class="o">+</span> <span class="s1">&#39;;&#39;</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">});</span>
<span class="c1">// Print the converted letters as a string
</span><span class="c1"></span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">converted</span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="s1">&#39;&#39;</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div><p>A neat possibility for this method of encoding requires a departure from my original purpose, which was to create a human-readable representation of the original string. If the purpose was instead to produce a cipher, this could be done by using any Unicode index in place of <code>fancyA</code> as long as the character indexed isn&rsquo;t a representation of a capital &ldquo;A.&rdquo;</p>
<p>Here&rsquo;s the same code set up with a simplified plain text array, and a non-letter-representation Unicode key:</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="kd">var</span> <span class="nx">plain</span> <span class="o">=</span> <span class="p">[</span><span class="s1">&#39;a&#39;</span><span class="p">,</span> <span class="s1">&#39;b&#39;</span><span class="p">,</span> <span class="s1">&#39;c&#39;</span><span class="p">,</span> <span class="s1">&#39;d&#39;</span><span class="p">,</span> <span class="s1">&#39;e&#39;</span><span class="p">,</span> <span class="s1">&#39;f&#39;</span><span class="p">,</span> <span class="s1">&#39;g&#39;</span><span class="p">,</span> <span class="s1">&#39;h&#39;</span><span class="p">,</span> <span class="s1">&#39;i&#39;</span><span class="p">,</span> <span class="s1">&#39;j&#39;</span><span class="p">,</span> <span class="s1">&#39;k&#39;</span><span class="p">,</span> <span class="s1">&#39;l&#39;</span><span class="p">,</span> <span class="s1">&#39;m&#39;</span><span class="p">,</span> <span class="s1">&#39;n&#39;</span><span class="p">,</span> <span class="s1">&#39;o&#39;</span><span class="p">,</span> <span class="s1">&#39;p&#39;</span><span class="p">,</span> <span class="s1">&#39;q&#39;</span><span class="p">,</span> <span class="s1">&#39;r&#39;</span><span class="p">,</span> <span class="s1">&#39;s&#39;</span><span class="p">,</span> <span class="s1">&#39;t&#39;</span><span class="p">,</span> <span class="s1">&#39;u&#39;</span><span class="p">,</span> <span class="s1">&#39;v&#39;</span><span class="p">,</span> <span class="s1">&#39;w&#39;</span><span class="p">,</span> <span class="s1">&#39;x&#39;</span><span class="p">,</span> <span class="s1">&#39;y&#39;</span><span class="p">,</span> <span class="s1">&#39;z&#39;</span><span class="p">];</span>
<span class="kd">var</span> <span class="nx">key</span> <span class="o">=</span> <span class="mi">9016</span><span class="p">;</span>
</code></pre></div><p>You might be able to imagine that decoding a cipher produced by this method would be relatively straightforward, once you knew the encoding secret. You&rsquo;d simply need to subtract the key from the HTML code numbers of the encoded characters, then find the relevant plain text letters at the remaining indexes.</p>
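<p>Here&rsquo;s a minimal sketch of that round trip, using the simplified array and key from above. The <code>encode</code> and <code>decode</code> helpers are hypothetical and work on the characters themselves rather than on HTML codes. Like the conversion function, they turn anything that isn&rsquo;t in the plain array into a space.</p>

```javascript
const plain = 'abcdefghijklmnopqrstuvwxyz'.split('');
const key = 9016;

// Hypothetical helper: shift each letter's index up by the key
function encode(text) {
  return Array.from(text, ch => {
    const i = plain.indexOf(ch);
    return i === -1 ? ' ' : String.fromCodePoint(key + i);
  }).join('');
}

// Hypothetical helper: subtract the key, then look up the plain letter
function decode(cipher) {
  return Array.from(cipher, ch => {
    const i = ch.codePointAt(0) - key;
    return i >= 0 && i < plain.length ? plain[i] : ' ';
  }).join('');
}

const secret = encode('cat');
console.log(decode(secret)); // cat
```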
<p>Well, that&rsquo;s it for today. Be sure to drink your Ovaltine and we&rsquo;ll see you right here next Monday at 5:45!</p>
<p>Oh, and&hellip; ⍔⍠⍟⍘⍣⍒⍥⍦⍝⍒⍥⍚⍠⍟⍤ ⍒⍟⍕ ⍨⍖⍝⍔⍠⍞⍖ ⍥⍠ ⍥⍙⍖ ⍔⍣⍪⍡⍥⍚⍔ ⍦⍟⍚⍔⍠⍕⍖ ⍤⍖⍔⍣⍖⍥ ⍤⍠⍔⍚⍖⍥⍪</p>
<p>:)</p>
Hosting your static site with AWS S3, Route 53, and CloudFronthttps://victoria.dev/blog/hosting-your-static-site-with-aws-s3-route-53-and-cloudfront/
Wed, 13 Dec 2017 20:46:12 -0500hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/hosting-your-static-site-with-aws-s3-route-53-and-cloudfront/A guide to using Amazon Web Services to serve your site with HTTPS. For (almost) free.
]]>
<p>Some time ago I decided to stop freeloading on GitHub Pages and move one of my sites to Amazon Web Services (AWS). It turns out that I&rsquo;m still mostly freeloading (yay free tier) so it amounted to a learning experience. Here are the components that let me host and serve the site at my custom domain with HTTPS.</p>
<ul>
<li>Static site in Amazon Web Services S3 bucket</li>
<li>Route 53 handling the DNS</li>
<li>CloudFront for distribution and a custom SSL certificate</li>
</ul>
<p>I set all that up most of a year ago. At the time, I found the AWS documentation to be rather fragmented and inconvenient to follow - it was hard to find what you were looking for without knowing what a specific setting might be called, or where it was, or if it existed at all. When I recently set up a new site and stumbled through this process again, I didn&rsquo;t find it any easier. Hopefully this post can help to collect the relevant information into a more easily followed process and serve as an accompanying guide to save future me (and you) some time.</p>
<p>Rather than replace existing documentation, this post is meant to supplement it. Think of me as your cool tech-savvy friend on the phone with you at 4am, troubleshooting your website. (Please don&rsquo;t actually call me at 4am.) I&rsquo;ll walk through the setup while providing links for the documentation that was ultimately helpful (mostly so I can find it again later&hellip;).</p>
<h1 id="hosting-a-static-site-with-amazon-s3-and-a-custom-domain">Hosting a static site with Amazon S3 and a custom domain</h1>
<p>If you&rsquo;re starting from scratch, you&rsquo;ll need an AWS account. It behooves you to get one, even if you don&rsquo;t like paying for services - there&rsquo;s a free tier that will cover most of the experimental stuff you&rsquo;re going to want to do in the first year, and even the things I do pay for cost me less than a dollar a month. You can sign up at <a href="https://aws.amazon.com/free">https://aws.amazon.com/free</a>.</p>
<p>Getting your static site hosted and available at your custom domain is your first mission, should you choose to accept it. <a href="http://docs.aws.amazon.com/AmazonS3/latest/dev/website-hosting-custom-domain-walkthrough.html">Your instructions are here.</a></p>
<p>Creating the buckets for site hosting on S3 is the most straightforward part of this process in my opinion, and the AWS documentation walkthrough covers what you&rsquo;ll need to do quite well. It gets a little unclear around <em>Step 3: Create and Configure Amazon Route 53 Hosted Zone</em>, so come back and read on once you&rsquo;ve reached that point. I&rsquo;ll make some tea in the meantime.</p>
<p>&hellip;</p>
<p>Ready? Cool. See, I&rsquo;m here for you.</p>
<h1 id="set-up-route-53">Set up Route 53</h1>
<p>The majority of the work in this section amounts to creating the correct record sets for your custom domain. If you&rsquo;re already familiar with how record sets work, the documentation is a bit of a slog. Here&rsquo;s how it should look when you&rsquo;re finished:</p>
<p><img src="aws-recordsets.png#screenshot" alt="Route 53 record sets."></p>
<p>The &ldquo;NS&rdquo; and &ldquo;SOA&rdquo; records are created automatically for you. The only records you need to create are the &ldquo;A&rdquo; records.</p>
<p>Hop over to <a href="https://console.aws.amazon.com/route53/home">Route 53</a> and follow <a href="http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/MigratingDNS.html">this walkthrough</a> to create a &ldquo;hosted zone.&rdquo; The values of the <strong>NS</strong> (Name Servers) records are what you&rsquo;ll have to provide to your domain name registrar (wherever you bought your custom domain, such as this super subtle <a href="https://affiliate.namecheap.com/?affId=109417">Namecheap.com affiliate link</a> right here).</p>
<p>If you created two buckets in the first section (one for <code>yourdomain.com</code> and one for <code>www.yourdomain.com</code>), you&rsquo;ll need two separate A records in Route 53. Initially, these have the value of the endpoints for your matching S3 buckets (looks like <code>s3-website.us-east-2.amazonaws.com</code>). Later, you&rsquo;ll change them to your CloudFront domain name.</p>
<p>If you went with Namecheap as your registrar, <a href="http://docs.aws.amazon.com/AmazonS3/latest/dev/website-hosting-custom-domain-walkthrough.html#root-domain-walkthrough-update-ns-record">Step 4</a> looks like this:</p>
<p><img src="aws-namecheapdns.png#screenshot" alt="Namecheap&rsquo;s Custom DNS settings."></p>
<p>Waiting is the hardest part&hellip; I&rsquo;ve gotten into the habit of working on another project or setting up the DNS change before going to bed so that changes have time to propagate without me feeling like I need to fiddle with it. ^^;</p>
<p>When the transfer&rsquo;s ready, you&rsquo;ll see your site at <code>http://yourdomain.com</code>. Next, you&rsquo;ll want to set up CloudFront so that becomes <code>https://yourdomain.com</code>.</p>
<h1 id="set-up-cloudfront-and-ssl">Set up CloudFront and SSL</h1>
<p><a href="http://docs.aws.amazon.com/AmazonS3/latest/dev/website-hosting-cloudfront-walkthrough.html">Here are the instructions for setting up CloudFront.</a> There are a few important points to make sure you don&rsquo;t miss on the &ldquo;Create Distribution&rdquo; page:</p>
<ul>
<li><strong>Origin Domain Name:</strong> Make sure to use your S3 bucket endpoint, and not select the bucket from the dropdown menu that appears.</li>
<li><strong>Viewer Protocol Policy:</strong> If you want requests for <code>http://yourdomain.com</code> to always result in <code>https://yourdomain.com</code>, choose &ldquo;Redirect HTTP to HTTPS.&rdquo;</li>
<li><strong>Alternate Domain Names:</strong> Enter <code>yourdomain.com</code> and <code>www.yourdomain.com</code> on separate lines.</li>
<li><strong>SSL Certificate:</strong> See below.</li>
<li><strong>Default Root Object:</strong> Enter the name of the html file that should be returned when your users go to <code>https://yourdomain.com</code>. This is usually &ldquo;index.html&rdquo;.</li>
</ul>
<h2 id="ssl-certificate">SSL Certificate</h2>
<p>To show your content with HTTPS at your custom domain, you&rsquo;ll need to choose &ldquo;Custom SSL Certificate.&rdquo; You can easily get an SSL Certificate with AWS Certificate Manager. Click on &ldquo;Request or Import a Certificate with ACM&rdquo; to get started in a new window.</p>
<p><a href="http://docs.aws.amazon.com/acm/latest/userguide/gs-acm-request.html">Here are instructions for setting up a certificate.</a> I don&rsquo;t think they&rsquo;re very good, personally. Don&rsquo;t worry, I got you.</p>
<p>To account for &ldquo;<a href="http://www.yourdomain.com">www.yourdomain.com</a>&rdquo; as well as any subdomains, you&rsquo;ll want to add two domain names to the certificate, like so:</p>
<p><img src="aws-acmdomains.png#screenshot" alt="Adding domain names to ACM."></p>
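<p>Why two names? Assuming the two entries are <code>yourdomain.com</code> and the wildcard <code>*.yourdomain.com</code>, a wildcard only stands in for a single subdomain label, so it covers <code>www.yourdomain.com</code> but not the bare domain. The sketch below illustrates that matching rule in JavaScript. The <code>isCovered</code> helper is hypothetical and has nothing to do with how ACM checks things internally.</p>

```javascript
// Hypothetical helper: does any name on the certificate cover this hostname?
function isCovered(hostname, certNames) {
  return certNames.some(name => {
    if (name.startsWith('*.')) {
      const suffix = name.slice(1); // e.g. ".yourdomain.com"
      if (!hostname.endsWith(suffix)) {
        return false;
      }
      // The wildcard must match exactly one non-empty label
      const label = hostname.slice(0, -suffix.length);
      return label.length > 0 && !label.includes('.');
    }
    return hostname === name;
  });
}

const wildcardOnly = ['*.yourdomain.com'];
const bothNames = ['yourdomain.com', '*.yourdomain.com'];

console.log(isCovered('www.yourdomain.com', wildcardOnly)); // true
console.log(isCovered('yourdomain.com', wildcardOnly));     // false
console.log(isCovered('yourdomain.com', bothNames));        // true
```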
<p>Click &ldquo;Next.&rdquo; You&rsquo;ll be asked to choose a validation method. Choose &ldquo;DNS validation&rdquo; and click &ldquo;Review.&rdquo; If everything is as it should be, click &ldquo;Confirm and request.&rdquo;</p>
<p>You&rsquo;ll see a &ldquo;Validation&rdquo; page that looks like this. You&rsquo;ll have to click the little arrow next to both domain names to get the important information to show:</p>
<p><img src="aws-acmvalidation.png#screenshot" alt="Validation instructions for ACM."></p>
<p>Under both domain names, click the button for &ldquo;Create record in Route 53.&rdquo; This will automatically create a CNAME record set in Route 53 with the given values, which ACM will then check in order to validate that you own those domains. You could create the records manually, if you wanted to for some reason. I don&rsquo;t know, maybe you&rsquo;re killing time. ¯\_(ツ)_/¯</p>
<p>Click &ldquo;Continue.&rdquo; You&rsquo;ll see a console that looks like this:</p>
<p><img src="aws-acmcertificates.png#screenshot" alt="List of certificates you own."></p>
<p>It may take some time for the validation to complete, at which point the &ldquo;Pending validation&rdquo; status will change to &ldquo;Issued.&rdquo; Again with the waiting. You can close this window to return to the CloudFront setup. Once the certificate is validated, you&rsquo;ll see it in the dropdown menu under &ldquo;Custom SSL Certificate.&rdquo; You can click &ldquo;Create Distribution&rdquo; to finish setting up CloudFront.</p>
<p>In your CloudFront Distributions console, you&rsquo;ll see &ldquo;In Progress&rdquo; until AWS has done its thing. Once it&rsquo;s done, it&rsquo;ll change to &ldquo;Deployed.&rdquo;</p>
<h2 id="one-last-thing">One last thing</h2>
<p>Return to your <a href="https://console.aws.amazon.com/route53/">Route 53 console</a> and click on &ldquo;Hosted zones&rdquo; in the sidebar, then your domain name from the list. For both A records, change the &ldquo;Alias Target&rdquo; from the S3 endpoint to your CloudFront distribution domain, which should look something like <code>dj4p1rv6mvubz.cloudfront.net</code>. It appears in the dropdown after you clear the field.</p>
<h1 id="youre-done">You&rsquo;re done!</h1>
<p>Well, usually. If you navigate to your new HTTPS domain and don&rsquo;t see your beautiful new site where it should be, here are some things you can do:</p>
<ol>
<li>Check S3 bucket policy - ensure that the bucket for <code>yourdomain.com</code> in the <a href="https://s3.console.aws.amazon.com/s3/home">S3 console</a> shows &ldquo;Public&rdquo; in the &ldquo;Access&rdquo; column.</li>
<li>Check S3 bucket index document - in the &ldquo;Properties&rdquo; tab for the bucket, open &ldquo;Static website hosting&rdquo; and confirm the index document is set. This is usually &ldquo;index.html&rdquo;.</li>
<li>Check CloudFront Origin - the &ldquo;Origin&rdquo; column in the <a href="https://console.aws.amazon.com/cloudfront/home">CloudFront Console</a> should show the S3 bucket&rsquo;s endpoint (<code>s3-website.us-east-2.amazonaws.com</code>), not the bucket name (<code>yourdomain.com.s3.amazonaws.com</code>).</li>
<li>Check CloudFront Default Root Object - clicking on the distribution name should take you to a details page that shows &ldquo;Default Root Object&rdquo; in the list with the value that you set, usually &ldquo;index.html&rdquo;.</li>
<li>Wait. Sometimes changes take up to 48hrs to propagate. ¯\_(ツ)_/¯</li>
</ol>
<p>I hope that helps you get set up with your new static site on AWS! Feel free to <a href="https://twitter.com/victoriadotdev">share your link</a> with me, I&rsquo;d love to see what you&rsquo;ve created. :)</p>
About timehttps://victoria.dev/blog/about-time/
Wed, 22 Nov 2017 14:05:14 -0500hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/about-time/Some thoughts and concrete suggestions on saving society through programming.
<p>This morning I read an article that&rsquo;s been making the rounds lately: <a href="http://nautil.us/issue/52/the-hive/modern-media-is-a-dos-attack-on-your-free-will">Modern Media Is a DoS Attack on Your Free Will</a>.</p>
<p>It&rsquo;s made me think, which I must admit, I at first didn&rsquo;t like. See, when I wake up in the morning (and subsequently wake up my computer) the first thing I do is go on Twitter to catch up on everything I missed while I was asleep. All this before my first coffee, mind you. Links on Twitter usually lead to stories on Medium, newly released apps on ProductHunt, and enticing sales on a new gadget or two on Amazon. Wherever it goes, in those blissfully half-awake mental recesses, the last thing I&rsquo;m trying to do is think.</p>
<p>However, yesterday, I also happened to listen to a podcast from freeCodeCamp. It was <a href="https://twitter.com/ossia/status/932698783863001089">#7: The code I&rsquo;m still ashamed of</a>. This led to thoughts on the responsibilities of programmers - the people tasked with designing and building apps and systems meant to steer the very course of your life.</p>
<p>This morning, the combined swirling mess of notions brought on by these two sources of information had, even before my first coffee, the unfortunate effect of making me think.</p>
<p>Mostly, I thought about intention, and time.</p>
<p>I don&rsquo;t believe it&rsquo;s wildly inaccurate to say that when you go about doing something in your daily life, you have a general awareness of your reason for doing it. If you leave your building and go down the street to Starbucks and buy a coffee, more often than not, it&rsquo;s because you wanted a coffee. If you go to the corner store and buy a litre of milk, you probably intend to drink it. If you find yourself nicely dressed on a Friday night waiting at a well-decorated restaurant to meet another human being with whom you share an apparent mutual attraction, I can risk a guess that you&rsquo;re after some form of pleasant human interaction.</p>
<p>In each of these, and many more examples you can think up, the end goal is clearly defined. There is an expected final step to the process; an expected response; a return value.</p>
<p>What is the return value of opening up the Twitter app? Browsing Facebook? Instagram? In fact, any social media?</p>
<p>The concrete answer is that there isn&rsquo;t one. Perhaps in those of us with resilient self-discipline, there may at least be some sort of time limitation. That&rsquo;s the most we can hope for, however, and no wonder - that&rsquo;s what these and other similar services have been <em>designed</em> for. They&rsquo;re built to be open-ended black holes for our most precious resource&hellip; time.</p>
<blockquote>
<p>In the case of the Analytical Engine we have undoubtedly to lay out a certain capital of analytical labour in one particular line; but this is in order that the engine may bring us in a much larger return in another line.</p>
<p><em>Ada Augusta (Ada Lovelace)</em> - <a href="https://www.fourmilab.ch/babbage/sketch.html">Notes on <em>Sketch of The Analytical Engine</em></a></p>
</blockquote>
<p>Okay, so I did some more reading. Specifically, #ThrowbackThursday to the mid-1800s and something my good friend Ada Lovelace once scribbled in a book. Widely considered one of the first computer programmers, she and Charles Babbage pioneered many concepts that programmers today take for granted. The one I&rsquo;m going to hang my point on is, I think, nicely encapsulated in the above quote: the things programmers make are supposed to save you time.</p>
<p>Save it. Not lose it.</p>
<p>I think Ada and Charles would agree that, observing the effects of social media apps, clickbait news sites, and many other forms of attention-hogging interactivity that we haven&rsquo;t even classified yet - something&rsquo;s gone horribly wrong.</p>
<p>What if, as programmers, we actually did something about it?</p>
<p>Consider that collectively - no, even individually - we who design and build the workings of modern technology have an <em>incredible</em> amount of power. The next indie app that goes viral on ProductHunt will consume hundreds of hours of time from its users. Where is all that untapped, pure potential going to? Some open-ended, inoffensive amusement? Another advertising platform thinly veiled as a game? Perhaps another drop of oil to smooth the machinery of The Great Engine of Commerce?</p>
<p>I get it - programmers will build what they&rsquo;re paid to build. That&rsquo;s capitalism, that&rsquo;s feeding your family, survival - life. I&rsquo;m not trying to suggest we all quit our jobs, go live in the woods, and volunteer as humanitarians. That would be nice, but it&rsquo;s impractical.</p>
<p>But we all have side projects. Free time. What are you doing with yours?</p>
<hr>
<p>Before I&rsquo;m accused of being too hand-wavy and idealistic, I want to offer a concrete suggestion. Build things that save time. Not in the &ldquo;I&rsquo;ve made yet another to-do list app for you to download,&rdquo; kind of way, but in the &ldquo;Here&rsquo;s a one-liner to automate this mundane thing that would have taken you hours,&rdquo; kind of way. Here, have a <a href="https://victoria.dev/blog/batch-renaming-images-including-image-resolution-with-awk/">shameless plug</a>.</p>
<p>I also really like this idea from the first article I mentioned, so hang on tight while I bring this full circle:</p>
<blockquote>
<p><strong>What’s one concrete thing companies could do now to stop subverting our attention?</strong></p>
<p>I would just like to know what is the ultimate design goal of that site or that system that’s shaping my behavior or thinking. What are they really designing my experience for? Companies will say that their goal is to make the world open and connected or whatever. These are lofty marketing claims. But if you were to actually look at the dashboards that they’re designing, the high-level metrics they’re designing for, you probably wouldn’t see those things. You’d see other things, like frequency of use, time on site, this type of thing. If there was some way for the app to say, to the user, “Here’s generally what this app wants from you, from an attentional point of view,” that would be huge. It would probably be the primary way I would decide which apps I download and use.</p>
</blockquote>
<p>There are so many ways I&rsquo;d love to see this put into practice, from the obvious to the subversive. A little <code>position: sticky;</code> banner? A custom meta tag in the header? Maybe a call to action like this takes more introspection and honesty than a lot of app makers are ready for&hellip; but maybe it just takes a little of our time.</p>
Batch renaming images, including image resolution, with awkhttps://victoria.dev/blog/batch-renaming-images-including-image-resolution-with-awk/
Mon, 20 Nov 2017 13:59:30 -0500hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/batch-renaming-images-including-image-resolution-with-awk/How to batch rename images with custom values using file, awk, and rename - in rainbow colors!
<p>The most recent item on my list of &ldquo;Geeky things I did that made me feel pretty awesome&rdquo; is an hour&rsquo;s adventure that culminated in this code:</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash">$ file IMG* <span class="p">|</span> awk <span class="s1">&#39;BEGIN{a=0} {print substr($1, 1, length($1)-5),a++&#34;_&#34;substr($8,1, length($8)-1)}&#39;</span> <span class="p">|</span> <span class="k">while</span> <span class="nb">read</span> fn fr<span class="p">;</span> <span class="k">do</span> <span class="nb">echo</span> <span class="k">$(</span>rename -v <span class="s2">&#34;s/</span><span class="nv">$fn</span><span class="s2">/img_</span><span class="nv">$fr</span><span class="s2">/g&#34;</span> *<span class="k">)</span><span class="p">;</span> <span class="k">done</span>
IMG_20170808_172653_425.jpg renamed as img_0_4032x3024.jpg
IMG_20170808_173020_267.jpg renamed as img_1_3024x3506.jpg
IMG_20170808_173130_616.jpg renamed as img_2_3024x3779.jpg
IMG_20170808_173221_425.jpg renamed as img_3_3024x3780.jpg
IMG_20170808_173417_059.jpg renamed as img_4_2956x2980.jpg
IMG_20170808_173450_971.jpg renamed as img_5_3024x3024.jpg
IMG_20170808_173536_034.jpg renamed as img_6_4032x3024.jpg
IMG_20170808_173602_732.jpg renamed as img_7_1617x1617.jpg
IMG_20170808_173645_339.jpg renamed as img_8_3024x3780.jpg
IMG_20170909_170146_585.jpg renamed as img_9_3036x3036.jpg
IMG_20170911_211522_543.jpg renamed as img_10_3036x3036.jpg
IMG_20170913_071608_288.jpg renamed as img_11_2760x2760.jpg
IMG_20170913_073205_522.jpg renamed as img_12_2738x2738.jpg
// ... etc etc
</code></pre></div><p>The last item on the aforementioned list is &ldquo;TODO: come up with a shorter title for this list.&rdquo;</p>
<p>I previously wrote about the power of command line tools like <a href="https://victoria.dev/blog/how-to-replace-a-string-in-a-dozen-old-blog-posts-with-one-sed-terminal-command/">sed</a>. This post expands on how to string all this magical functionality into one big, long, rainbow-coloured, viscous stream of awesome.</p>
<h2 id="rename-files">Rename files</h2>
<p>The tool that actually handles the renaming of our files is, appropriately enough, <code>rename</code>. The syntax is: <code>rename -n &quot;s/original_filename/new_filename/g&quot; *</code> where <code>-n</code> does a dry run that only prints what would be renamed. Substituting <code>-v</code> for <code>-n</code> actually renames the files and prints each change as it happens. The <code>s</code> is the substitution operator, and <code>g</code> for &ldquo;global&rdquo; replaces every occurrence of the search string within a file name. The trailing <code>*</code> is a shell wildcard that expands to the names of the files in the current directory, so <code>rename</code> applies the substitution to each of them.</p>
<p>We&rsquo;ll come back to this later.</p>
<h2 id="get-file-information">Get file information</h2>
<p>When I run <code>$ file IMG_20170808_172653_425.jpg</code> in the image directory, I get this output:</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash">IMG_20170808_172653_425.jpg: JPEG image data, baseline, precision 8, 4032x3024, frames <span class="m">3</span>
</code></pre></div><p>Since we can get the image resolution (&ldquo;4032x3024&rdquo; above), we know that we&rsquo;ll be able to use it in our new filename.</p>
<h2 id="isolate-the-information-we-want">Isolate the information we want</h2>
<p>I love <code>awk</code> for its simplicity. It takes lines of text and makes individual bits of information available to us with built-in variables that we can then refer to as column numbers denoted by <code>$1</code>, <code>$2</code>, etc. By default, <code>awk</code> splits up columns on whitespace. To take the example above:</p>
<pre><code>| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
-------------------------------------------------------------------------------------------------------------
| IMG_20170808_172653_425.jpg: | JPEG | image | data, | baseline, | precision | 8, | 4032x3024, | frames | 3 |
</code></pre><p>We can denote different values to use as a splitter with, for example, <code>-F','</code> if we wanted to use commas as the column divisions. For our current project, spaces are fine.</p>
<p>There are a couple of issues we need to solve before we can plug the information into our new filenames. Column <code>$1</code> has the original filename we want, but there&rsquo;s an extra &ldquo;:&rdquo; character on the end. We don&rsquo;t need the &ldquo;.jpg&rdquo; either. Column <code>$8</code> has an extra &ldquo;,&rdquo; that we don&rsquo;t want as well. To get just the information we need, we&rsquo;ll take a substring of the column with <code>substr()</code>:</p>
<p><code>substr($1, 1, length($1)-5)</code> - This gives us the file name from the beginning of the string to the end of the string, minus the last 5 characters, which are &ldquo;.jpg:&rdquo; (&ldquo;length minus 5&rdquo;).<br>
<code>substr($8,1, length($8)-1)</code> - This gives us the image size, without the extra comma (&ldquo;length minus 1&rdquo;).</p>
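<p>If awk&rsquo;s 1-based <code>substr()</code> arithmetic is hard to picture, here is a rough JavaScript sketch of the same two trims, assuming the sample <code>file</code> output shown earlier. The variable names are made up for illustration, and none of this is part of the final one-liner:</p>

```javascript
// One line of `file` output, copied from the example earlier in this post.
var line = 'IMG_20170808_172653_425.jpg: JPEG image data, baseline, precision 8, 4032x3024, frames 3';

// Split on runs of whitespace, which is what awk does by default.
var fields = line.trim().split(/\s+/);

// awk columns are 1-based, so $1 is fields[0] and $8 is fields[7].
var first = fields[0];
var eighth = fields[7];

// substr($1, 1, length($1)-5): drop the trailing ".jpg:" (5 characters).
var baseName = first.slice(0, first.length - 5);

// substr($8, 1, length($8)-1): drop the trailing comma (1 character).
var resolution = eighth.slice(0, eighth.length - 1);

console.log(baseName);
console.log(resolution);
```

<p>Both trims assume a fixed-length ending: a file ending in <code>.jpeg</code> would need a different offset than 5.</p>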
<h2 id="avoid-duplicate-file-names">Avoid duplicate file names</h2>
<p>To ensure that two images with the same resolutions don&rsquo;t create identical, competing file names, we&rsquo;ll append a unique incrementing number to the filename.</p>
<p><code>BEGIN{a=0}</code> - Using <code>BEGIN</code> tells <code>awk</code> to run the following code only once, at the (drumroll) beginning. Here, we&rsquo;re declaring the variable <code>a</code> to be <code>0</code>.<br>
<code>a++</code> - Later in our code, at the appropriate spot for our file name, we print the current value of <code>a</code> and then increment it.</p>
<p>When <code>awk</code> prints a string, it concatenates everything that isn&rsquo;t separated by a comma. <code>{print a b c}</code> would create &ldquo;abc&rdquo; and <code>{print a,b,c}</code> would create &ldquo;a b c&rdquo;, for example.</p>
<p>We can add additional characters to our file name, such as an underscore, by inserting it in quotations: <code>&quot;_&quot;</code>.</p>
<h2 id="string-it-all-together">String it all together</h2>
<p>To feed the output of one command into another command, we use &ldquo;pipe,&rdquo; written as <code>|</code>.</p>
<p>If we only used pipe in this instance, all our data from <code>file</code> and <code>awk</code> would get fed into <code>rename</code> all at once, making for one very, very long and probably invalid file name. To run the <code>rename</code> command line by line, we can use <code>while</code> and <code>read</code>. Similarly to <code>awk</code>, <code>read</code> takes input and splits it into variables we can assign and use. In our code, it takes the first bit of output from <code>awk</code> (the original file name) and assigns it to the variable <code>$fn</code>. It takes the second bit of output (our incrementing number and the image resolution) and assigns that to <code>$fr</code>. The variable names are arbitrary; you can call them whatever you want.</p>
<p>To run our <code>rename</code> commands as if we&rsquo;d manually entered them in the terminal one by one, we can use <code>echo $(some command)</code>. Finally, <code>done</code> ends our <code>while</code> loop.</p>
<h2 id="bonus-round-rainbow-output">Bonus round: rainbow output!</h2>
<p>I wasn&rsquo;t kidding with that <a href="https://github.com/tehmaze/lolcat">&ldquo;rainbow-coloured&rdquo; bit&hellip;</a></p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash">$ pip install lolcat
</code></pre></div><p>Here&rsquo;s our full code:</p>
<div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash">$ file IMG* <span class="p">|</span> awk <span class="s1">&#39;BEGIN{a=0} {print substr($1, 1, length($1)-5),a++&#34;_&#34;substr($8,1, length($8)-1)}&#39;</span> <span class="p">|</span> <span class="k">while</span> <span class="nb">read</span> fn fr<span class="p">;</span> <span class="k">do</span> <span class="nb">echo</span> <span class="k">$(</span>rename -v <span class="s2">&#34;s/</span><span class="nv">$fn</span><span class="s2">/img_</span><span class="nv">$fr</span><span class="s2">/g&#34;</span> *<span class="k">)</span><span class="p">;</span> <span class="k">done</span> <span class="p">|</span> lolcat
</code></pre></div><p>Enjoy!</p>
How to code a satellite algorithm and cook paella from scratchhttps://victoria.dev/blog/how-to-code-a-satellite-algorithm-and-cook-paella-from-scratch/
Fri, 08 Sep 2017 16:50:24 -0400hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/how-to-code-a-satellite-algorithm-and-cook-paella-from-scratch/A guide to expertly tackling seemingly complicated problems that you'd rather never tackle in the first place.
]]>
<p>What if I told you that by the end of this article, you&rsquo;ll be able to calculate the orbital period of satellites around Earth using their average altitudes and&hellip; You tuned out already, didn&rsquo;t you?</p>
<p>Okay, how about this: I&rsquo;m going to teach you how to make paella!</p>
<p><em>And</em> you&rsquo;ll have written a function that does <em>the stuff I mentioned above</em>, just like I did for this <a href="https://www.freecodecamp.org/challenges/map-the-debris">freeCodeCamp challenge</a>.</p>
<p>I promise there&rsquo;s an overarching moral lesson that will benefit you every day for the rest of your life. Or at least, feed you for one night. Let&rsquo;s get started.</p>
<h2 id="the-only-thing-i-know-about-paella-is-that-its-an-emoticon">The only thing I know about paella is that it&rsquo;s an emoticon</h2>
<figure>
<img src="solve-unicode-paella.jpg"
alt="Unicode paella emoji."/> <figcaption>
<p>Unless you&rsquo;re reading this on a Samsung phone, in which case you&rsquo;re looking at a Korean hotpot.</p>
</figcaption>
</figure>
<p>One of my favorite things about living in the world today is that it&rsquo;s <em>totally fine</em> to know next-to-nothing about something. A hundred years ago you might have gone your whole life not knowing anything more about paella other than that it&rsquo;s an emoticon.* But today? You can simply <a href="https://en.wikipedia.org/wiki/Paella">look it up</a>.</p>
<p>*That was a joke.</p>
<p>As with all things in life, when we are unsure, we turn to the internet - in this case, the entry for <em>paella</em> on Wikipedia, which reads:</p>
<blockquote>
<p>Paella &hellip;is a Valencian rice dish. Paella has ancient roots, but its modern form originated in the mid-19th century near the Albufera lagoon on the east coast of Spain adjacent to the city of Valencia. Many non-Spaniards view paella as Spain&rsquo;s national dish, but most Spaniards consider it to be a regional Valencian dish. Valencians, in turn, regard paella as one of their identifying symbols.</p>
</blockquote>
<blockquote>
<p>Types of paella include Valencian paella, vegetarian/vegan paella (Spanish: paella de verduras), seafood paella (Spanish: paella de marisco), and mixed paella (Spanish: paella mixta), among many others. (<a href="https://en.wikipedia.org/wiki/Paella">Wikipedia</a>)</p>
</blockquote>
<p>At this point, you&rsquo;re probably full of questions. Do I need to talk to a Valencian? Should I take an online course on the history of Spain? What type of paella should I try to make? What is the common opinion of modern chefs when it comes to paella types?</p>
<p>If you set out with the intention of answering all these questions, one thing is certain: you&rsquo;ll never end up actually making paella. You&rsquo;ll spend hours upon hours typing questions into search engines and years later wake up with a Master&rsquo;s in Valencian Cuisine.</p>
<h2 id="the-most-important-question-method">The &ldquo;Most Important Question&rdquo; method</h2>
<p>When I talk to myself out loud in public (doesn&rsquo;t everyone?) I refer to this as &ldquo;MIQ&rdquo; (rhymes with &ldquo;Nick&rdquo;). I also imagine MIQ to be a rather crunchy and quite adorable anthropomorphized tortilla chip. Couldn&rsquo;t tell you why.</p>
<p><img src="solve-miq.png#center" alt="MIQ the chip."></p>
<p>MIQ swings his crunchy triangular body around to point me in the right direction, and the right direction always takes the form of the most important question that you need to ask yourself at any stage of problem solving. The first most important question is always this:</p>
<p><strong>What is the scope of the objective I want to achieve?</strong></p>
<p>Well, you want to make paella.</p>
<p>The next MIQ then becomes: how much do I actually need to know about paella in order to start making it?</p>
<p>You&rsquo;ve heard this advice before: any big problem can be broken down into multiple, but more manageable, bite-size problems. In this little constellation of bite-size problems, there&rsquo;s only <em>one</em> that you need to solve in order to get <em>most of the way</em> to a complete solution.</p>
<p>In the case of making paella, we need a recipe. That&rsquo;s a bite-size problem that a search engine can solve for us:</p>
<blockquote>
<p><strong>Simple Paella Recipe</strong></p>
<ol>
<li>In a medium bowl, mix together 2 tablespoons olive oil, paprika, oregano, and salt and pepper. Stir in chicken pieces to coat. Cover, and refrigerate.</li>
<li>Heat 2 tablespoons olive oil in a large skillet or paella pan over medium heat. Stir in garlic, red pepper flakes, and rice. Cook, stirring, to coat rice with oil, about 3 minutes. Stir in saffron threads, bay leaf, parsley, chicken stock, and lemon zest. Bring to a boil, cover, and reduce heat to medium low. Simmer 20 minutes.</li>
<li>Meanwhile, heat 2 tablespoons olive oil in a separate skillet over medium heat. Stir in marinated chicken and onion; cook 5 minutes. Stir in bell pepper and sausage; cook 5 minutes. Stir in shrimp; cook, turning the shrimp, until both sides are pink.</li>
<li>Spread rice mixture onto a serving tray. Top with meat and seafood mixture. (<a href="http://allrecipes.com/recipe/84137/easy-paella/">allrecipes.com</a>)</li>
</ol>
</blockquote>
<p>And <em>voila</em>! Believe it or not, we&rsquo;re <em>most of the way</em> there already.</p>
<p>Having a set of step-by-step instructions that are easy to understand is really most of the work done. All that&rsquo;s left is to go through the motions of gathering the ingredients and then making paella. From this point on, your MIQs may become fewer and farther between, and they may slowly decrease in importance in relation to the overall problem. (Where do I buy paprika? How do I know when sausage is cooked? How do I set the timer on my phone for 20 minutes? How do I stop thinking about this delicious smell? Which Instagram filter best captures the ecstasy of this paella right now?)</p>
<figure>
<img src="solve-insta-paella.jpg"
alt="The answer to that last one is Nashville"/> <figcaption>
<p>The answer to that last one is Nashville</p>
</figcaption>
</figure>
<h2 id="i-still-know-nothing-about-calculating-the-orbital-periods-of-satellites">I still know nothing about calculating the orbital periods of satellites</h2>
<p>Okay. Let&rsquo;s examine the problem:</p>
<blockquote>
<p>Return a new array that transforms the element&rsquo;s average altitude into their orbital periods.</p>
</blockquote>
<blockquote>
<p>The array will contain objects in the format {name: &lsquo;name&rsquo;, avgAlt: avgAlt}.</p>
</blockquote>
<blockquote>
<p>You can read about orbital periods on wikipedia.</p>
</blockquote>
<blockquote>
<p>The values should be rounded to the nearest whole number. The body being orbited is Earth.</p>
</blockquote>
<blockquote>
<p>The radius of the earth is 6367.4447 kilometers, and the GM value of earth is 398600.4418 km<sup>3</sup>s<sup>-2</sup>.</p>
</blockquote>
<blockquote>
<p><code>orbitalPeriod([{name : &quot;sputnik&quot;, avgAlt : 35873.5553}])</code> should return <code>[{name: &quot;sputnik&quot;, orbitalPeriod: 86400}]</code>.</p>
</blockquote>
<p>Well, as it turns out, in order to calculate the orbital period of satellites, we also need a recipe. Amazing, the things you can find on the internet these days.</p>
<p>Courtesy of <a href="http://www.dummies.com/education/science/physics/how-to-calculate-the-period-and-orbiting-radius-of-a-geosynchronous-satellite/">dummies.com</a> (yup! #noshame), here&rsquo;s our recipe:</p>
<figure>
<img src="solve-orbital-period.png"
alt="Orbital period formula"/> <figcaption>
<p>It&rsquo;s kind of cute, in a way.</p>
</figcaption>
</figure>
<p>That might look pretty complicated, but as we&rsquo;ve already seen, we just need to answer the next MIQ: how much do I actually need to know about this formula in order to start using it?</p>
<p>In the case of this challenge, not too much. We&rsquo;re already given <code>earthRadius</code>, and <code>avgAlt</code> is part of our arguments object. Together, they form the radius, <em>r</em>. With a couple search queries and some mental time-travel to your elementary math class, we can describe this formula in a smattering of English:</p>
<p><strong><em>T</em>, the orbital period, equals 2 multiplied by Pi, in turn multiplied by the square root of the radius <em>r</em> cubed divided by <em>GM</em>, Earth&rsquo;s standard gravitational parameter.</strong></p>
<p>JavaScript has a <code>Math.PI</code> property, as well as the <code>Math.sqrt()</code> and <code>Math.pow()</code> functions. Using those combined with simple arithmetic, we can represent this equation in a single line assigned to a variable:</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="kd">var</span> <span class="nx">orbitalPeriod</span> <span class="o">=</span> <span class="mi">2</span> <span class="o">*</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">PI</span> <span class="o">*</span> <span class="p">(</span><span class="nb">Math</span><span class="p">.</span><span class="nx">sqrt</span><span class="p">(</span><span class="nb">Math</span><span class="p">.</span><span class="nx">pow</span><span class="p">((</span><span class="nx">earthRadius</span> <span class="o">+</span> <span class="nx">avgAlt</span><span class="p">),</span> <span class="mi">3</span><span class="p">)</span> <span class="o">/</span> <span class="nx">GM</span><span class="p">));</span>
</code></pre></div><p>From the inside out:</p>
<ol>
<li>Add <code>earthRadius</code> and <code>avgAlt</code></li>
<li>Cube the result of step 1</li>
<li>Divide the result of step 2 by GM</li>
<li>Take the square root of the result of step 3</li>
<li>Multiply 2 times Pi times the result of step 4</li>
<li>Assign the returned value to <code>orbitalPeriod</code></li>
</ol>
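<p>To see those steps one at a time, here is a small sketch that gives each intermediate result its own variable and runs the challenge&rsquo;s sputnik example through it. The helper name <code>periodInSeconds</code> and the intermediate variable names are made up for illustration; the constants come straight from the challenge text.</p>

```javascript
// Constants given in the challenge text (kilometers and km^3 s^-2).
var GM = 398600.4418;
var earthRadius = 6367.4447;

// Hypothetical helper: one named intermediate per step in the list above.
function periodInSeconds(avgAlt) {
  var radius = earthRadius + avgAlt;      // step 1: add earthRadius and avgAlt
  var radiusCubed = Math.pow(radius, 3);  // step 2: cube the result
  var ratio = radiusCubed / GM;           // step 3: divide by GM
  var root = Math.sqrt(ratio);            // step 4: take the square root
  return 2 * Math.PI * root;              // step 5: multiply by 2 times Pi
}

// The altitude from the challenge's own example.
var sputnikPeriod = Math.round(periodInSeconds(35873.5553));
console.log(sputnikPeriod);
```

<p>Rounded to the nearest whole number, this should land on the 86400 seconds the challenge asks for, which is one day.</p>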
<p>Believe it or not, we&rsquo;re already most of the way there.</p>
<p>The next MIQ for this challenge is to take the arguments object, extract the information we need, and return the result of our equation in the required format. There are a multitude of ways to do this, but I&rsquo;m happy with a straightforward <code>for</code> loop:</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="kd">function</span> <span class="nx">orbitalPeriod</span><span class="p">(</span><span class="nx">arr</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">resultArr</span> <span class="o">=</span> <span class="p">[];</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">teapot</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">teapot</span> <span class="o">&lt;</span> <span class="nx">arguments</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nx">length</span><span class="p">;</span> <span class="nx">teapot</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">GM</span> <span class="o">=</span> <span class="mf">398600.4418</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">earthRadius</span> <span class="o">=</span> <span class="mf">6367.4447</span><span class="p">;</span>
<span class="kd">var</span> <span class="nx">avgAlt</span> <span class="o">=</span> <span class="nx">arguments</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="nx">teapot</span><span class="p">][</span><span class="s1">&#39;avgAlt&#39;</span><span class="p">];</span>
<span class="kd">var</span> <span class="nx">name</span> <span class="o">=</span> <span class="nx">arguments</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="nx">teapot</span><span class="p">][</span><span class="s1">&#39;name&#39;</span><span class="p">];</span>
<span class="kd">var</span> <span class="nx">orbitalPeriod</span> <span class="o">=</span> <span class="mi">2</span> <span class="o">*</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">PI</span> <span class="o">*</span> <span class="p">(</span><span class="nb">Math</span><span class="p">.</span><span class="nx">sqrt</span><span class="p">(</span><span class="nb">Math</span><span class="p">.</span><span class="nx">pow</span><span class="p">((</span><span class="nx">earthRadius</span> <span class="o">+</span> <span class="nx">avgAlt</span><span class="p">),</span> <span class="mi">3</span><span class="p">)</span> <span class="o">/</span> <span class="nx">GM</span><span class="p">));</span>
<span class="kd">var</span> <span class="nx">result</span> <span class="o">=</span> <span class="p">{</span>
<span class="nx">name</span><span class="o">:</span> <span class="nx">name</span><span class="p">,</span>
<span class="nx">orbitalPeriod</span><span class="o">:</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">round</span><span class="p">(</span><span class="nx">orbitalPeriod</span><span class="p">)</span>
<span class="p">};</span>
<span class="nx">resultArr</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">result</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">resultArr</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div><p>If you need a refresher on iterating through arrays, have a look at my <a href="https://victoria.dev/blog/iterating-over-objects-and-arrays-frequent-errors/">article on iterating, featuring breakfast arrays</a>! (5 minute read)</p>
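<p>The <code>for</code> loop is only one of those ways. As a rough alternative sketch, the same transformation can be written with <code>Array.prototype.map()</code>, reading from the named parameter instead of <code>arguments</code>. The function name <code>orbitalPeriodWithMap</code> is invented here so it doesn&rsquo;t clash with the solution above:</p>

```javascript
// Alternative sketch: build the result array with map() instead of a for loop.
function orbitalPeriodWithMap(arr) {
  // Constants from the challenge text.
  var GM = 398600.4418;
  var earthRadius = 6367.4447;

  return arr.map(function (satellite) {
    // Same formula as before, applied to one satellite at a time.
    var seconds = 2 * Math.PI * Math.sqrt(Math.pow(earthRadius + satellite.avgAlt, 3) / GM);
    // Return a new object so the input array is left untouched.
    return { name: satellite.name, orbitalPeriod: Math.round(seconds) };
  });
}

var periods = orbitalPeriodWithMap([{ name: 'sputnik', avgAlt: 35873.5553 }]);
console.log(periods);
```

<p>Whether this reads better than the loop is a matter of taste; both produce one result object per satellite.</p>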
<p>Don&rsquo;t look now, but you just gained the ability to calculate the orbital period of satellites. You could even do it <em>while</em> making paella, if you wanted to. Seriously. Put it on your resume.</p>
<h2 id="tldr-the-overarching-moral-lesson">Tl;dr: the overarching moral lesson</h2>
<p>Whether it&rsquo;s cooking, coding, or anything else, problems may at first seem confusing, insurmountable, or downright boring. If you&rsquo;re faced with such a challenge, just remember: they&rsquo;re a lot more digestible with a side of bite-sized MIQ chips.</p>
<p><img src="solve-miq-bowl.png#center" alt="Bowl of MIQs."></p>
Making sandwiches with closures in JavaScripthttps://victoria.dev/blog/making-sandwiches-with-closures-in-javascript/
Sun, 28 May 2017 09:16:35 +0700hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/making-sandwiches-with-closures-in-javascript/An explanation of closures and how they can emulate private methods in JavaScript.
]]>
<p>Say you&rsquo;re having a little coding get-together, and you need some sandwiches. You happen to know that everyone prefers a different type of sandwich, like chicken, ham, or peanut butter and mayo. You could make all these sandwiches yourself, but that would be tedious and boring.</p>
<p>Luckily, you know of a nearby sandwich shop that delivers. They have the ability and ingredients to make any kind of sandwich in the world, and all you have to do is order through their app.</p>
<p>The sandwich shop looks like this:</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="kd">function</span> <span class="nx">makeMeASandwich</span><span class="p">(</span><span class="nx">x</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">ingredients</span> <span class="o">=</span> <span class="nx">x</span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">);</span>
<span class="k">return</span> <span class="kd">function</span> <span class="nx">barry</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">ingredients</span><span class="p">.</span><span class="nx">concat</span><span class="p">(</span><span class="s1">&#39; sandwich&#39;</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div><p>Notice that we have an outer function, <code>makeMeASandwich()</code>, that takes an argument, <code>x</code>. This outer function has the local variable <code>ingredients</code>, which is just <code>x</code> mushed together.</p>
<p>Barry? Who&rsquo;s Barry? He&rsquo;s the guy who works at the sandwich shop. You&rsquo;ll never talk with Barry directly, but he&rsquo;s the reason your sandwiches are made, and why they&rsquo;re so delicious. Barry takes <code>ingredients</code> and mushes them together with <code>&#39; sandwich&#39;</code>.</p>
<p>Barry is able to access <code>ingredients</code> because the variable is in his outer scope. If you were to take Barry out of the sandwich shop, he&rsquo;d no longer be able to access it. This is an example of <em>lexical scoping</em>: &ldquo;Nested functions have access to variables declared in their outer scope.&rdquo; (<a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Closures#Lexical_scoping">MDN</a>)</p>
<p>Barry, happily at work in the sandwich shop, is an example of a closure.</p>
<blockquote>
<p><strong>Closures</strong> are functions that refer to independent (free) variables (variables that are used locally, but defined in an enclosing scope). In other words, these functions &lsquo;remember&rsquo; the environment in which they were created. (<a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Closures">MDN</a>)</p>
</blockquote>
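<p>To make &ldquo;remembering the environment&rdquo; a little more concrete, here is a small sketch that reuses the sandwich shop from above. It assumes nothing beyond that function; the two orders are just examples. Each call to <code>makeMeASandwich()</code> creates a fresh <code>ingredients</code> variable, and each returned Barry keeps hold of his own:</p>

```javascript
// The sandwich shop, repeated so this sketch runs on its own.
function makeMeASandwich(x) {
  var ingredients = x.join(' ');
  return function barry() {
    return ingredients.concat(' sandwich');
  };
}

// Two separate orders create two separate closures.
var pbmOrder = makeMeASandwich(['peanut butter', 'mayo']);
var hamOrder = makeMeASandwich(['ham']);

console.log(pbmOrder()); // peanut butter mayo sandwich
console.log(hamOrder()); // ham sandwich
console.log(pbmOrder()); // peanut butter mayo sandwich (the ham order did not overwrite it)

// Outside the shop, the ingredients variable is not visible at all.
console.log(typeof ingredients); // undefined
```

<p>That last line is the flip side of lexical scoping: code outside <code>makeMeASandwich()</code> has no way to reach <code>ingredients</code> except by asking Barry.</p>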
<p>When you order, the app submits your sandwich request like so:</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="kd">var</span> <span class="nx">pbm</span> <span class="o">=</span> <span class="nx">makeMeASandwich</span><span class="p">([</span><span class="s1">&#39;peanut butter&#39;</span><span class="p">,</span> <span class="s1">&#39;mayo&#39;</span><span class="p">]);</span>
<span class="nx">pbm</span><span class="p">();</span>
</code></pre></div><p>And in thirty-minutes-or-it&rsquo;s-free, you get: <code>peanut butter mayo sandwich</code>.</p>
<p>The nice thing about the sandwich shop app is that it remembers the sandwiches you&rsquo;ve ordered before. Your peanut butter and mayo sandwich is now available to you as <code>pbm()</code> for you to order anytime. It&rsquo;s pretty convenient since, each time you order, there&rsquo;s no need to specify that the sandwich you want is the same one you got before with peanut butter and mayo and it&rsquo;s a sandwich. Using <code>pbm()</code> is much more concise.</p>
<p>Let&rsquo;s order the sandwiches you need for the party:</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="kd">var</span> <span class="nx">pmrp</span> <span class="o">=</span> <span class="nx">makeMeASandwich</span><span class="p">([</span><span class="s1">&#39;prosciutto&#39;</span><span class="p">,</span> <span class="s1">&#39;mozzarella&#39;</span><span class="p">,</span> <span class="s1">&#39;red pepper&#39;</span><span class="p">]);</span>
<span class="kd">var</span> <span class="nx">pbt</span> <span class="o">=</span> <span class="nx">makeMeASandwich</span><span class="p">([</span><span class="s1">&#39;peanut butter&#39;</span><span class="p">,</span> <span class="s1">&#39;tuna&#39;</span><span class="p">]);</span>
<span class="kd">var</span> <span class="nx">hm</span> <span class="o">=</span> <span class="nx">makeMeASandwich</span><span class="p">([</span><span class="s1">&#39;ham&#39;</span><span class="p">]);</span>
<span class="kd">var</span> <span class="nx">pbm</span> <span class="o">=</span> <span class="nx">makeMeASandwich</span><span class="p">([</span><span class="s1">&#39;peanut butter&#39;</span><span class="p">,</span> <span class="s1">&#39;mayo&#39;</span><span class="p">]);</span>
<span class="nx">pmrp</span><span class="p">();</span>
<span class="nx">pbt</span><span class="p">();</span>
<span class="nx">hm</span><span class="p">();</span>
<span class="nx">pbm</span><span class="p">();</span>
</code></pre></div><p>Your order confirmation reads:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">prosciutto mozzarella red pepper sandwich
peanut butter tuna sandwich
ham sandwich
peanut butter mayo sandwich
</code></pre></div><p>Plot twist! The guy who wanted a ham sandwich now wants a ham <em>and cheese</em> sandwich. Luckily, the sandwich shop just released a new version of their app that will let you add cheese to any sandwich.</p>
<p>With this added feature, the sandwich shop now looks like this:</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="kd">function</span> <span class="nx">makeMeASandwich</span><span class="p">(</span><span class="nx">x</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">ingredients</span> <span class="o">=</span> <span class="nx">x</span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">);</span>
<span class="kd">var</span> <span class="nx">slices</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kd">function</span> <span class="nx">barry</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">ingredients</span><span class="p">.</span><span class="nx">concat</span><span class="p">(</span><span class="s1">&#39; sandwich&#39;</span><span class="p">);</span>
<span class="p">}</span>
<span class="kd">function</span> <span class="nx">barryAddCheese</span><span class="p">()</span> <span class="p">{</span>
<span class="nx">slices</span> <span class="o">+=</span> <span class="mi">2</span><span class="p">;</span>
<span class="k">return</span> <span class="nx">ingredients</span><span class="p">.</span><span class="nx">concat</span><span class="p">(</span><span class="s1">&#39; sandwich with &#39;</span><span class="p">,</span> <span class="nx">slices</span><span class="p">,</span> <span class="s1">&#39; slices of cheese&#39;</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="p">{</span>
<span class="nx">noCheese</span><span class="o">:</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">barry</span><span class="p">();</span>
<span class="p">},</span>
<span class="nx">addCheese</span><span class="o">:</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">barryAddCheese</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div><p>You amend the order to look like this:</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="nx">pmrp</span><span class="p">.</span><span class="nx">noCheese</span><span class="p">();</span>
<span class="nx">pbt</span><span class="p">.</span><span class="nx">noCheese</span><span class="p">();</span>
<span class="nx">hm</span><span class="p">.</span><span class="nx">addCheese</span><span class="p">();</span>
<span class="nx">pbm</span><span class="p">.</span><span class="nx">noCheese</span><span class="p">();</span>
</code></pre></div><p>And your order confirmation reads:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">prosciutto mozzarella red pepper sandwich
peanut butter tuna sandwich
ham sandwich with <span class="m">2</span> slices of cheese
peanut butter mayo sandwich
</code></pre></div><p>You&rsquo;ll notice that when you order a sandwich with cheese, Barry puts 2 slices of cheese on it. In this way, the sandwich shop controls how much cheese you get. You can&rsquo;t get to Barry to tell him you want more than 2 slices at a time. That&rsquo;s because your only access to the sandwich shop is through the public functions <code>noCheese</code> or <code>addCheese</code>.</p>
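<p>To check that claim, here&rsquo;s a sketch that repeats the shop code from above (so it runs on its own) and then tries to reach Barry and the cheese count from outside. The variable names <code>keys</code> and <code>order</code> are added for this illustration.</p>

```javascript
// The sandwich shop from above, repeated so this snippet is self-contained.
function makeMeASandwich(x) {
  var ingredients = x.join(' ');
  var slices = 0;
  function barry() {
    return ingredients.concat(' sandwich');
  }
  function barryAddCheese() {
    slices += 2;
    return ingredients.concat(' sandwich with ', slices, ' slices of cheese');
  }
  return {
    noCheese: function() { return barry(); },
    addCheese: function() { return barryAddCheese(); }
  };
}

var hm = makeMeASandwich(['ham']);

// Only the two public functions are exposed on the returned object.
var keys = Object.keys(hm);
console.log(keys); // [ 'noCheese', 'addCheese' ]

// Barry and the cheese count are not reachable from outside.
console.log(typeof hm.barry);  // undefined
console.log(typeof hm.slices); // undefined

// Setting hm.slices only adds a new property; Barry's own count is untouched.
hm.slices = 100;
var order = hm.addCheese();
console.log(order); // ham sandwich with 2 slices of cheese
```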
<p>Of course, there&rsquo;s a way to cheat the system&hellip;</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="nx">hm</span><span class="p">.</span><span class="nx">addCheese</span><span class="p">();</span>
<span class="nx">hm</span><span class="p">.</span><span class="nx">addCheese</span><span class="p">();</span>
<span class="nx">hm</span><span class="p">.</span><span class="nx">addCheese</span><span class="p">();</span>
</code></pre></div><p>By ordering the same ham sandwich with cheese three times, you get: <code>ham sandwich with 6 slices of cheese</code>.</p>
<p>This happens because <code>hm</code> refers to the same sandwich each time, and that sandwich&rsquo;s closure remembers its own <code>slices</code> count between orders. Every call to <code>addCheese()</code> adds 2 more slices to that count before Barry makes the sandwich.</p>
<p>The app could prevent you from adding lots of cheese to the same sandwich, either by adding a maximum or by appending unique order numbers to the variable names&hellip; but this is our fantasy sandwich shop, and we get to pile on as much cheese as we want.</p>
<p><img src="closures-cheesestack.jpg#center" alt="All the cheese."></p>
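<p>If the shop ever did want a maximum, one way it might look is sketched below. The function name <code>makeMeACappedSandwich</code> and the <code>maxSlices</code> parameter are assumptions made up for this example, not part of the app above.</p>

```javascript
// Hypothetical variation on the shop; the name and maxSlices are assumptions.
function makeMeACappedSandwich(x, maxSlices) {
  var ingredients = x.join(' ');
  var slices = 0;
  return {
    noCheese: function() {
      return ingredients.concat(' sandwich');
    },
    addCheese: function() {
      // Add 2 slices per order, but never go past the cap.
      slices = Math.min(slices + 2, maxSlices);
      return ingredients.concat(' sandwich with ', slices, ' slices of cheese');
    }
  };
}

var cappedHm = makeMeACappedSandwich(['ham'], 4);
cappedHm.addCheese();
cappedHm.addCheese();
var lastOrder = cappedHm.addCheese();
console.log(lastOrder); // ham sandwich with 4 slices of cheese
```

<p>Because <code>slices</code> and the cap both live inside the closure, there&rsquo;s still no way to change either from outside.</p>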
<p>By using closures, we can have JavaScript emulate private methods found in languages like Ruby and Java. Closures are a useful way to extend the functionality of JavaScript, and also order sandwiches.</p>
Understanding Array.prototype.reduce() and recursion using apple piehttps://victoria.dev/blog/understanding-array.prototype.reduce-and-recursion-using-apple-pie/
Thu, 18 May 2017 11:40:06 +0700hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/understanding-array.prototype.reduce-and-recursion-using-apple-pie/An explanation of JavaScript reduce() method and recursive functions using delicious, attention-retaining apples.
]]>
<p>I was having trouble understanding <code>reduce()</code> and recursion in JavaScript, so I wrote this article to explain it to myself (hey, look, recursion!). I hope you find my examples both helpful and delicious.</p>
<p>Given an array with nested arrays:</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="kd">var</span> <span class="nx">arr</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="p">[[</span><span class="mi">4</span><span class="p">]]]]</span>
</code></pre></div><p>We want to produce this:</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="kd">var</span> <span class="nx">flat</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">]</span>
</code></pre></div><h1 id="using-for-loops-and-if-statements">Using for loops and if statements</h1>
<p>Naively, if we know the maximum number of nested arrays we&rsquo;ll encounter (there are 4 in this example), we can use <code>for</code> loops to iterate through each array item, then <code>if</code> statements to check if each item is in itself an array, and so on&hellip;</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="kd">function</span> <span class="nx">flatten</span><span class="p">()</span> <span class="p">{</span>
<span class="kd">var</span> <span class="nx">flat</span> <span class="o">=</span> <span class="p">[];</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="nx">i</span><span class="o">&lt;</span><span class="nx">arr</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">Array</span><span class="p">.</span><span class="nx">isArray</span><span class="p">(</span><span class="nx">arr</span><span class="p">[</span><span class="nx">i</span><span class="p">]))</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">ii</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="nx">ii</span><span class="o">&lt;</span><span class="nx">arr</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">length</span><span class="p">;</span> <span class="nx">ii</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">Array</span><span class="p">.</span><span class="nx">isArray</span><span class="p">(</span><span class="nx">arr</span><span class="p">[</span><span class="nx">i</span><span class="p">][</span><span class="nx">ii</span><span class="p">]))</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">iii</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="nx">iii</span><span class="o">&lt;</span><span class="nx">arr</span><span class="p">[</span><span class="nx">i</span><span class="p">][</span><span class="nx">ii</span><span class="p">].</span><span class="nx">length</span><span class="p">;</span> <span class="nx">iii</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">Array</span><span class="p">.</span><span class="nx">isArray</span><span class="p">(</span><span class="nx">arr</span><span class="p">[</span><span class="nx">i</span><span class="p">][</span><span class="nx">ii</span><span class="p">][</span><span class="nx">iii</span><span class="p">]))</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">iiii</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="nx">iiii</span><span class="o">&lt;</span><span class="nx">arr</span><span class="p">[</span><span class="nx">i</span><span class="p">][</span><span class="nx">ii</span><span class="p">][</span><span class="nx">iii</span><span class="p">].</span><span class="nx">length</span><span class="p">;</span> <span class="nx">iiii</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">flat</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">arr</span><span class="p">[</span><span class="nx">i</span><span class="p">][</span><span class="nx">ii</span><span class="p">][</span><span class="nx">iii</span><span class="p">][</span><span class="nx">iiii</span><span class="p">]);</span>
<span class="p">}</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nx">flat</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">arr</span><span class="p">[</span><span class="nx">i</span><span class="p">][</span><span class="nx">ii</span><span class="p">][</span><span class="nx">iii</span><span class="p">]);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nx">flat</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">arr</span><span class="p">[</span><span class="nx">i</span><span class="p">][</span><span class="nx">ii</span><span class="p">]);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="nx">flat</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">arr</span><span class="p">[</span><span class="nx">i</span><span class="p">]);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// [1, 2, 3, 4]
</span></code></pre></div><p>&hellip;Which works, but of course looks ridiculous. Besides looking ridiculous, a) it only works if we know how many nested arrays we&rsquo;ll process, b) it&rsquo;s hard to read and harder to understand, and c) can you imagine having to debug this mess?! (Gee, I think there&rsquo;s an extra <code>i</code> somewhere.)</p>
<h1 id="using-reduce">Using reduce</h1>
<p>JavaScript has a couple of methods we can use to make our code a little less ridiculous. One of these is <code>reduce()</code> and it looks like this:</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="kd">var</span> <span class="nx">flat</span> <span class="o">=</span> <span class="nx">arr</span><span class="p">.</span><span class="nx">reduce</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">done</span><span class="p">,</span><span class="nx">curr</span><span class="p">){</span>
<span class="k">return</span> <span class="nx">done</span><span class="p">.</span><span class="nx">concat</span><span class="p">(</span><span class="nx">curr</span><span class="p">);</span>
<span class="p">},</span> <span class="p">[]);</span>
<span class="c1">// [ 1, 2, 3, [ [ 4 ] ] ]
</span></code></pre></div><p>It&rsquo;s a lot less code, but we haven&rsquo;t taken care of some of the nested arrays. Let&rsquo;s first walk through <code>reduce()</code> together and examine what it does to see how we&rsquo;ll correct this.</p>
<blockquote>
<p><strong>Array.prototype.reduce()</strong></p>
</blockquote>
<blockquote>
<p>The reduce() method applies a function against an accumulator and each element in the array (from left to right) to reduce it to a single value. (<a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/Reduce?v=example">MDN</a>)</p>
</blockquote>
<p>It&rsquo;s not quite as complicated as it seems. Let&rsquo;s think of <code>reduce()</code> as an out-of-work developer (AI took all the dev jobs) with an empty basket. We&rsquo;ll call him Adam. Adam&rsquo;s main function (ba-dum ching) is now to take apples from a pile, shine them up, and put them one-by-one into the basket. This basket of shiny apples is destined to become delicious apple pies. It&rsquo;s a very important job.</p>
<p><img src="recursion-apple-formula.jpg#center" alt="Pile of apples + Adam: apple pie."></p>
<p>In our above example, the pile of apples is our array, <code>arr</code>. Our basket is <code>done</code>, the accumulator. The initial value of <code>done</code> is an empty array, which we see as <code>[]</code> at the end of our reduce function. The apple that our out-of-work dev is currently shining, you guessed it, is <code>curr</code>. Once Adam processes the current apple, he places it into the basket (<code>.concat()</code>). When there are no more apples in the pile, he returns the basket of polished apples to us, and then probably goes home to his cat, or something.</p>
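<p>To watch Adam at work, here&rsquo;s a sketch that records what&rsquo;s in the basket and which apple he&rsquo;s holding at each step. The <code>steps</code> array is added purely for illustration; the rest is the same reduce call as above.</p>

```javascript
var arr = [1, [2], [3, [[4]]]];
var steps = []; // added for illustration: what Adam sees at each step

var flat = arr.reduce(function(done, curr) {
  // Record a copy of the basket and the current apple before concatenating.
  steps.push({ basket: done.slice(), apple: curr });
  return done.concat(curr);
}, []);

console.log(steps.length); // 3, one step per top-level item in arr
console.log(flat);         // [ 1, 2, 3, [ [ 4 ] ] ]
```

<p>Note that <code>.concat()</code> unpacks one level of array on its own, which is why <code>[2]</code> lands in the basket as <code>2</code> while <code>[[4]]</code> stays boxed up.</p>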
<h1 id="using-reduce-recursively-to-address-nested-arrays">Using reduce recursively to address nested arrays</h1>
<p>So that&rsquo;s all well and good, and now we have a basket of polished apples. But we still have some nested arrays to deal with. Going back to our analogy, let&rsquo;s say that some of the apples in the pile are in boxes. Within each box there could be more apples, and/or more boxes containing smaller, cuter apples.</p>
<p><img src="recursion-nested-apples.jpg#center" alt="Box within a box within a box with apples."></p>
<p>Here&rsquo;s what we want our apple-processing-function/Adam to do:</p>
<ol>
<li>If the pile of apples is a pile of apples, take an apple from the pile.</li>
<li>If the apple is an apple, polish it, put it in the basket.</li>
<li>If the apple is a box, open the box. If the box contains an apple, go to step 2.</li>
<li>If the box contains another box, open this box, and go to step 3.</li>
<li>When the pile is no more, give us the basket of shiny apples.</li>
<li>If the pile of apples is not a pile of apples, give back whatever it is.</li>
</ol>
<p>A recursive reduce function that accomplishes this is:</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="kd">function</span> <span class="nx">flatten</span><span class="p">(</span><span class="nx">arr</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">Array</span><span class="p">.</span><span class="nx">isArray</span><span class="p">(</span><span class="nx">arr</span><span class="p">))</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">arr</span><span class="p">.</span><span class="nx">reduce</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">done</span><span class="p">,</span><span class="nx">curr</span><span class="p">){</span>
<span class="k">return</span> <span class="nx">done</span><span class="p">.</span><span class="nx">concat</span><span class="p">(</span><span class="nx">flatten</span><span class="p">(</span><span class="nx">curr</span><span class="p">));</span>
<span class="p">},</span> <span class="p">[]);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="k">return</span> <span class="nx">arr</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// [ 1, 2, 3, 4 ]
</span></code></pre></div><p>Bear with me and I&rsquo;ll explain.</p>
<blockquote>
<p><strong>Recursion</strong></p>
</blockquote>
<blockquote>
<p>An act of a function calling itself. Recursion is used to solve problems that contain smaller sub-problems. A recursive function can receive two inputs: a base case (ends recursion) or a recursive case (continues recursion). (<a href="https://developer.mozilla.org/en-US/docs/Glossary/Recursion">MDN</a>)</p>
</blockquote>
<p>If you examine our code above, you&rsquo;ll see that <code>flatten()</code> appears twice. The first time it appears, it tells Adam what to do with the pile of apples. The second time, it tells him what to do with the thing he&rsquo;s currently holding, providing instructions in the case it&rsquo;s an apple, and in the case it&rsquo;s not an apple. The thing to note is that these instructions are a <em>repeat of the original instructions we started with</em> - and that&rsquo;s recursion.</p>
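<p>One way to see the instructions repeating is to count how many times they get read. The sketch below adds a hypothetical <code>calls</code> counter to <code>flatten()</code>; everything else is the same function. Our example holds 5 arrays (counting the outer one) and 4 numbers, and each of them gets handed to <code>flatten()</code> exactly once.</p>

```javascript
var calls = 0; // added for illustration; not part of the original function

function flatten(arr) {
  calls += 1; // count every time the instructions are started from the top
  if (Array.isArray(arr)) {
    return arr.reduce(function(done, curr) {
      return done.concat(flatten(curr));
    }, []);
  } else {
    return arr;
  }
}

var result = flatten([1, [2], [3, [[4]]]]);
console.log(result); // [ 1, 2, 3, 4 ]
console.log(calls);  // 9
```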
<p>We&rsquo;ll break it down line-by-line for clarity:</p>
<ol>
<li><code>function flatten(arr) {</code> - we name our overall function and specify that it will take an argument, <code>arr</code>.</li>
<li><code> if (Array.isArray(arr)) {</code> - we examine the provided &ldquo;arrgument&rdquo; (I know, I&rsquo;m very funny) to determine if it is an array.</li>
<li><code> return arr.reduce(function(done,curr){</code> - if the previous line is true and the argument is an array, we want to reduce it. This is our recursive case. We&rsquo;ll apply the following function to each array item&hellip;</li>
<li><code> return done.concat(flatten(curr));</code> - an unexpected plot twist appears! The function we want to apply is the very function we&rsquo;re in. Colloquially: take it from the top.</li>
<li><code> }, []);</code> - we tell our reduce function to start with an empty accumulator (<code>done</code>), and wrap it up.</li>
<li><code> } else {</code> - this resolves our if statement at line 2. If the provided argument isn&rsquo;t an array&hellip;</li>
<li><code> return arr;</code> - return whatever the <code>arr</code> is. (Hopefully a cute apple.) This is our base case that breaks us out of recursion.</li>
<li><code> }</code> - end the else statement.</li>
<li><code>}</code> - end the overall function.</li>
</ol>
<p>And we&rsquo;re done! We&rsquo;ve gone from our 24 line, 4-layers-deep nested <code>for</code> loop solution to a much more concise, 9 line recursive reduce solution. Reduce and recursion can seem a little impenetrable at first, but they&rsquo;re valuable tools that will save you lots of future effort once you grasp them.</p>
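<p>As a quick check that depth no longer matters, this sketch runs the same function on a more deeply nested array (made up for this example). It also compares the result with the built-in <code>Array.prototype.flat()</code>, which was standardized in ES2019; that last part assumes you&rsquo;re running an engine new enough to have it.</p>

```javascript
function flatten(arr) {
  if (Array.isArray(arr)) {
    return arr.reduce(function(done, curr) {
      return done.concat(flatten(curr));
    }, []);
  } else {
    return arr;
  }
}

// A made-up array, nested deeper than the original example.
var deep = [1, [2, [3, [4, [5, [6]]]]]];
var flatDeep = flatten(deep);
console.log(flatDeep); // [ 1, 2, 3, 4, 5, 6 ]

// Assumes an ES2019+ engine where Array.prototype.flat exists.
var builtIn = deep.flat(Infinity);
console.log(JSON.stringify(builtIn) === JSON.stringify(flatDeep)); // true
```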
<p>And don&rsquo;t worry about Adam, our out-of-work developer. He got so much press after being featured in this article that he opened up his very own AI-managed apple pie factory. He&rsquo;s very happy.</p>
<p><img src="recursion-adams-apples.jpg#center" alt="Adam&rsquo;s apple pie factory, &ldquo;Adam&rsquo;s Apples.&rdquo;"></p>
Iterating over objects and arrays: frequent errorshttps://victoria.dev/blog/iterating-over-objects-and-arrays-frequent-errors/
Tue, 16 May 2017 10:46:46 +0700hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/iterating-over-objects-and-arrays-frequent-errors/A quick reference to why your code isn't working, or some errors I frequently make with iteration when the coffee wears off.
]]>
<p>Here&rsquo;s <del>some complaining</del> a quick overview of some code that has confounded me more than once. I&rsquo;m told even very experienced developers encounter these situations regularly, so if you find yourself on your third cup of coffee scratching your head over why your code is doing exactly what you told it to do (and not what you <em>want</em> it to do), maybe this post can help you.</p>
<p>The example code is JavaScript, since that&rsquo;s what I&rsquo;ve been working in lately, but I believe the concepts to be pretty universal.</p>
<h1 id="quick-reference-for-equivalent-statements">Quick reference for equivalent statements</h1>
<table>
<thead>
<tr>
<th>This&hellip;</th>
<th>&hellip;is the same as this</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>i++;</code></td>
<td><code>i = i + 1;</code></td>
</tr>
<tr>
<td><code>i--;</code></td>
<td><code>i = i - 1;</code></td>
</tr>
<tr>
<td><code>apples += 5</code></td>
<td><code>apples = apples + 5;</code></td>
</tr>
<tr>
<td><code>apples -= 5</code></td>
<td><code>apples = apples - 5;</code></td>
</tr>
<tr>
<td><code>apples *= 5</code></td>
<td><code>apples = apples * 5;</code></td>
</tr>
<tr>
<td><code>apples /= 5</code></td>
<td><code>apples = apples / 5;</code></td>
</tr>
</tbody>
</table>
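<p>Here&rsquo;s a small sketch to confirm a few rows of the table. One caveat that the table doesn&rsquo;t cover: the two forms match when they stand alone as statements, but when <code>i++</code> is used inside a larger expression, it hands back the value from <em>before</em> the increment. The variable <code>before</code> is made up for this example.</p>

```javascript
var i = 1;
i++;            // same effect as i = i + 1
console.log(i); // 2

var apples = 10;
apples += 5;    // same as apples = apples + 5
apples /= 5;    // same as apples = apples / 5
console.log(apples); // 3

// Used inside an expression, i++ gives back the old value first.
var before = i++;
console.log(before, i); // 2 3
```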
<h1 id="quick-reference-for-logical-statements">Quick reference for logical statements</h1>
<table>
<thead>
<tr>
<th>This&hellip;</th>
<th>&hellip;gives this</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>3 == '3'</code></td>
<td><code>true</code> (type converted)</td>
</tr>
<tr>
<td><code>3 === '3'</code></td>
<td><code>false</code> (type matters; a number is not a string)</td>
</tr>
<tr>
<td><code>3 != '3'</code></td>
<td><code>false</code> (type converted; 3 equals 3)</td>
</tr>
<tr>
<td><code>3 !== '3'</code></td>
<td><code>true</code> (type matters; a number is not a string)</td>
</tr>
<tr>
<td><code>||</code></td>
<td>logical &ldquo;or&rdquo;: true if either side is true</td>
</tr>
<tr>
<td><code>&amp;&amp;</code></td>
<td>logical &ldquo;and&rdquo;: true only if both sides are true</td>
</tr>
</tbody>
</table>
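<p>The comparisons in the table can be checked directly, as in the sketch below. It also shows a detail that trips me up when the coffee wears off: in JavaScript, <code>||</code> and <code>&amp;&amp;</code> return one of their operands rather than a strict <code>true</code> or <code>false</code>, and they stop evaluating as soon as the answer is known. The variables <code>coffee</code>, <code>drink</code>, and <code>refill</code> are made up for this example.</p>

```javascript
var looseEqual = (3 == '3');      // true: the string is converted before comparing
var strictEqual = (3 === '3');    // false: a number is never strictly equal to a string
var looseNotEqual = (3 != '3');   // false
var strictNotEqual = (3 !== '3'); // true
console.log(looseEqual, strictEqual, looseNotEqual, strictNotEqual);

// || stops at the first truthy value; && stops at the first falsy one.
var coffee = 0;
var drink = coffee || 'tea';     // 'tea', because 0 is falsy
var refill = coffee && 'refill'; // 0, the right side is never evaluated
console.log(drink, refill);      // tea 0
```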
<h1 id="objects">Objects</h1>
<p>Given a breakfast object that looks like this:</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="kd">var</span> <span class="nx">breakfast</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">&#39;eggs&#39;</span><span class="o">:</span> <span class="mi">2</span><span class="p">,</span>
<span class="s1">&#39;waffles&#39;</span><span class="o">:</span> <span class="mi">2</span><span class="p">,</span>
<span class="s1">&#39;fruit&#39;</span><span class="o">:</span> <span class="p">{</span>
<span class="s1">&#39;blueberries&#39;</span><span class="o">:</span> <span class="mi">5</span><span class="p">,</span>
<span class="s1">&#39;strawberries&#39;</span><span class="o">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="p">},</span>
<span class="s1">&#39;coffee&#39;</span><span class="o">:</span> <span class="mi">1</span>
<span class="p">}</span>
</code></pre></div><p>Or like this:</p>
<p><img src="cover.png#center" alt="Breakfast object."></p>
<h2 id="iterate-over-object-properties">Iterate over object properties</h2>
<p>We can iterate through each breakfast item using a for loop as follows:</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">item</span> <span class="k">in</span> <span class="nx">breakfast</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">&#39;item: &#39;</span><span class="p">,</span> <span class="nx">item</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div><p>This produces:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">item: eggs
item: waffles
item: fruit
item: coffee
</code></pre></div><h2 id="get-object-property-value">Get object property value</h2>
<p>We can access the value of the property or nested properties (in this example, the number of items) like this:</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">&#39;How many waffles? &#39;</span><span class="p">,</span> <span class="nx">breakfast</span><span class="p">[</span><span class="s1">&#39;waffles&#39;</span><span class="p">])</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">&#39;How many strawberries? &#39;</span><span class="p">,</span> <span class="nx">breakfast</span><span class="p">[</span><span class="s1">&#39;fruit&#39;</span><span class="p">][</span><span class="s1">&#39;strawberries&#39;</span><span class="p">])</span>
</code></pre></div><p>Or equivalent syntax:</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">&#39;How many waffles? &#39;</span><span class="p">,</span> <span class="nx">breakfast</span><span class="p">.</span><span class="nx">waffles</span><span class="p">)</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">&#39;How many strawberries? &#39;</span><span class="p">,</span> <span class="nx">breakfast</span><span class="p">.</span><span class="nx">fruit</span><span class="p">.</span><span class="nx">strawberries</span><span class="p">)</span>
</code></pre></div><p>This produces:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">How many waffles? <span class="m">2</span>
How many strawberries? <span class="m">1</span>
</code></pre></div><h2 id="get-object-property-from-the-value">Get object property from the value</h2>
<p>If instead I want to access the property via the value, for example, to find out which items are served in twos, I can do so by iterating like this:</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">item</span> <span class="k">in</span> <span class="nx">breakfast</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">breakfast</span><span class="p">[</span><span class="nx">item</span><span class="p">]</span> <span class="o">==</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">&#39;Two of: &#39;</span><span class="p">,</span> <span class="nx">item</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div><p>Which gives us:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">Two of: eggs
Two of: waffles
</code></pre></div><h2 id="alter-nested-property-values">Alter nested property values</h2>
<p>Say I want to increase the number of fruits in breakfast, because sugar is bad for me and I like things that are bad for me. I can do that like this:</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="kd">var</span> <span class="nx">fruits</span> <span class="o">=</span> <span class="nx">breakfast</span><span class="p">[</span><span class="s1">&#39;fruit&#39;</span><span class="p">];</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">f</span> <span class="k">in</span> <span class="nx">fruits</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">fruits</span><span class="p">[</span><span class="nx">f</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">fruits</span><span class="p">);</span>
</code></pre></div><p>Which gives us:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="o">{</span> blueberries: 6, strawberries: <span class="m">2</span> <span class="o">}</span>
</code></pre></div><h1 id="arrays">Arrays</h1>
<p>Given an array of waffles that looks like this:</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="kd">var</span> <span class="nx">wafflesIAte</span> <span class="o">=</span> <span class="p">[</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">11</span> <span class="p">];</span>
</code></pre></div><p>Or like this:</p>
<p><img src="iteration-waffles.png" alt="Waffle array."></p>
<h2 id="iterate-through-array-items">Iterate through array items</h2>
<p>We can iterate through each item in the array using a for loop:</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="k">for</span> <span class="p">(</span><span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o">&lt;</span> <span class="nx">wafflesIAte</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">&#39;array index: &#39;</span><span class="p">,</span> <span class="nx">i</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">&#39;item from array: &#39;</span><span class="p">,</span> <span class="nx">wafflesIAte</span><span class="p">[</span><span class="nx">i</span><span class="p">]);</span>
<span class="p">}</span>
</code></pre></div><p>This produces:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">array index: <span class="m">0</span>
item from array: <span class="m">1</span>
array index: <span class="m">1</span>
item from array: <span class="m">3</span>
array index: <span class="m">2</span>
item from array: <span class="m">2</span>
array index: <span class="m">3</span>
item from array: <span class="m">0</span>
array index: <span class="m">4</span>
item from array: <span class="m">5</span>
array index: <span class="m">5</span>
item from array: <span class="m">2</span>
array index: <span class="m">6</span>
item from array: <span class="m">11</span>
</code></pre></div><p>Some things to remember:
<code>i</code> in the above context is a placeholder; we could substitute anything we like (<code>x</code>, <code>n</code>, <code>underpants</code>, etc). It simply denotes each instance of the iteration.</p>
<p><code>i &lt; wafflesIAte.length</code> tells our for loop to continue as long as <code>i</code> is less than the array&rsquo;s length (in this case, 7).</p>
<p><code>i++</code> is equivalent to <code>i = i + 1</code> and means we&rsquo;re incrementing through our array by one each time. We could also use <code>i += 2</code> to proceed with every other item in the array, for example.</p>
<h2 id="access-array-item-by-index">Access array item by index</h2>
<p>We can specify an item in the array using the array index, written as <code>wafflesIAte[i]</code> where <code>i</code> is any index of the array. This gives the item at that location.</p>
<p>Array indexes always start at <code>0</code>, so the first item is accessed with <code>wafflesIAte[0]</code>. Using <code>wafflesIAte[1]</code> gives us the second item in the array, which is &ldquo;3&rdquo;.</p>
<h2 id="ways-to-get-mixed-up-over-arrays">Ways to get mixed up over arrays</h2>
<p>Remember that <code>wafflesIAte.length</code> and the index of the last item in the array are different. The former is <code>7</code>, the latter is <code>6</code>.</p>
<p>When incrementing <code>i</code>, remember that <code>[i+1]</code> and <code>[i]+1</code> are different:</p>
<div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">&#39;[i+1] gives next array index: &#39;</span><span class="p">,</span> <span class="nx">wafflesIAte</span><span class="p">[</span><span class="mi">0</span><span class="o">+</span><span class="mi">1</span><span class="p">]);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">&#39;[i]+1 gives index value + 1: &#39;</span><span class="p">,</span> <span class="nx">wafflesIAte</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">+</span><span class="mi">1</span><span class="p">);</span>
</code></pre></div><p>Produces:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="o">[</span>i+1<span class="o">]</span> gives next array index: <span class="m">3</span>
<span class="o">[</span>i<span class="o">]</span>+1 gives index value + 1: <span class="m">2</span>
</code></pre></div><h1 id="practice-makes-better">Practice makes&hellip; better.</h1>
<p>The more often you code and correct your errors, the better you&rsquo;ll remember it next time!</p>
<p>That&rsquo;s all for now. If you have a correction, best practice, or another common error for me to add, please let me know!</p>
How to replace a string in a dozen old blog posts with one sed terminal commandhttps://victoria.dev/blog/how-to-replace-a-string-in-a-dozen-old-blog-posts-with-one-sed-terminal-command/
Sat, 06 May 2017 20:04:53 +0800hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/how-to-replace-a-string-in-a-dozen-old-blog-posts-with-one-sed-terminal-command/How to use sed to update a URL in all your old blog posts with simple find and replace.
]]>
<p><em>June 1, 2018: This post was previously titled &ldquo;That time 30 seconds, StackOverflow, and sed saved me 30 minutes&rdquo; and has since been revised and updated with more examples and a couple new doodles. It&rsquo;s better now.</em></p>
<p>I&rsquo;ve had more than a few usernames, URLs, and Twitter handles over the years. Whether it was changing to something that better reflected my current interests or briefly getting caught up in the &ldquo;.io&rdquo; domain craze, there always seemed to be a great reason for an Internet presence refresh. The downside to all this fresh rebranding is that it often means needing to update a lot of links. (If you want to redirect an old blog post URL, check out <a href="https://victoria.dev/blog/moving-to-a-new-domain-without-breaking-old-links-with-aws-disqus/">this article</a> too!)</p>
<p>This week, I launched my new website and changed my Twitter username to match. I was about to spend time manually going through all my old blog posts to find and update the URLs when this very future blog post popped up on my screen like <a href="http://knowyourmeme.com/memes/clippy">Clippy</a> and shook its pixelated head disapprovingly.</p>
<p>Here&rsquo;s a worthwhile new habit for you: anytime you find yourself going &ldquo;Ughhh I have to do <em>that?</em> It&rsquo;ll take forever!&rdquo; head on over to <a href="https://duckduckgo.com/">DuckDuckGo</a> and search for &ldquo;terminal command (the thing you&rsquo;re trying to do)&rdquo;.</p>
<p><img src="sed-duck.png" alt="Superhero DuckDuckGo doodle"></p>
<p>Here&rsquo;s what I found to save myself a whole bunch of mindless tedium.</p>
<h1 id="update-a-string-in-dozens-of-blog-posts-using-sed">Update a string in dozens of blog posts using sed</h1>
<p>Meet your new friend <code>sed</code>. This amazingly powerful tool lives in your terminal and is available to be totally underused for things like finding and replacing strings in files. (I seem to have a habit of suggesting ways to totally underuse powerful tools, as in my exploration of how to <a href="https://victoria.dev/blog/how-i-created-custom-desktop-notifications-using-terminal-and-cron/">use cron to create desktop notifications</a>, but I digress.)</p>
<h2 id="current-directory-non-recursive">Current directory, non-recursive</h2>
<p><strong>Non-recursive</strong> means sed won&rsquo;t change files in any subdirectories of the current folder.</p>
<p>Run this command to search all the files in your current directory and replace a given string.</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh"><span class="c1"># to replace &#39;foo&#39; with &#39;bar&#39;</span>
$ sed -i -- <span class="s1">&#39;s/foo/bar/g&#39;</span> *
</code></pre></div><p>Here&rsquo;s what each component of the command does:</p>
<p><code>-i</code> will change the original, and stands for &ldquo;in-place.&rdquo;<br>
<code>s</code> is for substitute, so we can find and replace.<br>
<code>foo</code> is the string we&rsquo;ll be taking away,<br>
<code>bar</code> is the string we&rsquo;ll use instead today.<br>
<code>g</code> as in &ldquo;global&rdquo; means &ldquo;all occurrences, please.&rdquo;<br>
<code>*</code> matches every non-hidden file in the current directory. (No more rhymes. What a tease.)</p>
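<p>To try these pieces out without touching any real posts, here is a minimal sketch that works on a throwaway file. The temporary directory, the <code>post.txt</code> file name, and the <code>.bak</code> backup suffix are illustrative additions, not part of the original command.</p>

```shell
# Practice in a throwaway directory so no real posts are touched.
demo_dir=$(mktemp -d)
printf 'foo one, foo two\n' > "$demo_dir/post.txt"

# Without g, only the first match on each line is replaced.
# Without -i, the result is printed and the file is left alone.
first_only=$(sed 's/foo/bar/' "$demo_dir/post.txt")
echo "$first_only"

# With -i and g, the file itself is edited and every match is replaced.
# The .bak suffix (an addition for this sketch) keeps a backup copy.
sed -i.bak 's/foo/bar/g' "$demo_dir/post.txt"
edited=$(cat "$demo_dir/post.txt")
backup=$(cat "$demo_dir/post.txt.bak")
echo "$edited"
```

<p>Keeping a backup suffix on <code>-i</code> is a cheap safety net while you check that the replacement did what you expected.</p>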
<p>You can limit the operation to one file type, such as <code>txt</code>, by using:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">sed -i -- <span class="s1">&#39;s/foo/bar/g&#39;</span> *.txt
</code></pre></div><h2 id="current-directory-and-subdirectories-recursive">Current directory and subdirectories, recursive</h2>
<p>We can supplement <code>sed</code> with <code>find</code> to expand our scope to all the current folder&rsquo;s subdirectories. This will include any hidden files.</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">find . -type f -exec sed -i <span class="s1">&#39;s/foo/bar/g&#39;</span> <span class="o">{}</span> +
</code></pre></div><p>To ignore hidden files (such as <code>.git</code>) you can pass the negation modifier <code>-not -path '*/\.*'</code>.</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">find . -type f -not -path <span class="s1">&#39;*/\.*&#39;</span> -exec sed -i <span class="s1">&#39;s/foo/bar/g&#39;</span> <span class="o">{}</span> +
</code></pre></div><p>This will exclude any file that has the string <code>/.</code> in its path.</p>
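<p>Before letting <code>sed -i</code> loose on a whole tree, it can help to preview which files <code>find</code> would hand over. The sketch below swaps <code>-exec</code> for <code>-print</code> so nothing is edited; the directory layout and file names are made up for illustration.</p>

```shell
# Build a small tree with one visible file and one hidden directory.
tree_dir=$(mktemp -d)
mkdir -p "$tree_dir/posts" "$tree_dir/.git"
printf 'foo\n' > "$tree_dir/posts/a.md"
printf 'foo\n' > "$tree_dir/.git/config"

# Same filters as the command above, but -print only lists the files
# that would be passed to sed instead of editing them.
matched=$(cd "$tree_dir" && find . -type f -not -path '*/\.*' -print)
echo "$matched"
```

<p>Once the list looks right, replacing <code>-print</code> with the <code>-exec sed</code> part runs the substitution on exactly those files.</p>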
<p>Alternatively, you can limit the operation to file names that end in a certain extension, like Markdown:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">find . -type f -name <span class="s2">&#34;*.md&#34;</span> -exec sed -i <span class="s1">&#39;s/foo/bar/g&#39;</span> <span class="o">{}</span> +
</code></pre></div><h2 id="working-with-urls-change-the-separator">Working with URLs: change the separator</h2>
<p>In the case of needing to update a URL, the <code>/</code> separator in your strings will need escaping. It ends up looking like this&hellip;</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">find . -type f -exec sed -i <span class="s1">&#39;s/https:\/\/www.oldurl.com\/blog/https:\/\/www.newurl.com\/blog/g&#39;</span> <span class="o">{}</span> +
</code></pre></div><p>You can avoid some confusion and mistakes by changing the separator to any non-conflicting character. The character that follows the <code>s</code> will be treated as the separator. In our case, using a <code>,</code> or <code>_</code> would do. This doesn&rsquo;t require escaping and is much more readable:</p>
<div class="highlight"><pre class="chroma"><code class="language-sh" data-lang="sh">find . -type f -exec sed -i <span class="s1">&#39;s_https://www.oldurl.com/blog_https://www.newurl.com/blog_g&#39;</span> <span class="o">{}</span> +
</code></pre></div><h1 id="maybe-endless-possibilities">(Maybe) endless possibilities!</h1>
<p>There&rsquo;s a lot more that <code>sed</code> can do. I&rsquo;ll be adding to this living post as I find more examples that are useful. For now, <a href="http://www.folkstalk.com/2012/01/sed-command-in-unix-examples.html">here are some other use cases</a> that you may find handy.</p>
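<p>As a quick check of the separator trick from earlier, here is a small sketch that runs the substitution on a single sample line instead of a file. The sample text is invented, and escaping the dots in the search pattern is an extra precaution beyond the original command: an unescaped <code>.</code> in a sed pattern matches any character.</p>

```shell
# A sample line containing the old URL (oldurl.com and newurl.com are the post's placeholders).
line='Read more at https://www.oldurl.com/blog/hello'

# Using _ as the separator means the slashes in the URL need no escaping.
# The dots in the search pattern are escaped so they match literal dots only.
updated=$(printf '%s\n' "$line" | sed 's_https://www\.oldurl\.com/blog_https://www.newurl.com/blog_g')
echo "$updated"
```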
Top free resources for developing coding superpowershttps://victoria.dev/blog/top-free-resources-for-developing-coding-superpowers/
Thu, 27 Apr 2017 08:14:09 +0800hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/top-free-resources-for-developing-coding-superpowers/The go-to resources from my bookmarks folder for anyone who wants to learn to code.
]]>
<p>I&rsquo;m frequently asked for my opinion on how to get started with being a freelance developer. If you&rsquo;re hoping to live the life of a remote working digital nomad, whichever career you choose, having a little coding expertise in your back pocket will be a big benefit.</p>
<p>Here&rsquo;s a quick list of resources that you should definitely look at first if you&rsquo;re hoping to gain some coding superpowers for free.</p>
<h2 id="freecodecamp-hahahugoshortcode-s0-hbhb">freeCodeCamp (<a href="https://freecodecamp.org/" target="_blank" rel="noopener noreferrer">freecodecamp.org</a>)</h2>
<p><strong>An amazingly high value curriculum that can take you from zero to full-stack.</strong> This is always my top recommendation for someone looking to test the waters and see if a development career is interesting enough to pursue. The toughest part about learning to code on your own is getting stuck and not having quick help - this is the problem that I think freeCodeCamp (fCC) solves best by allowing you to immerse yourself in a hugely supportive social community. Through their forum, you can get quick advice if you get stuck on a challenge, and even team up with someone to tackle projects in-depth. The fCC community is lively and diverse with people from all over the world, and many local chapters even host regular meetups.</p>
<h2 id="hackerrank-hahahugoshortcode-s1-hbhb">HackerRank (<a href="https://www.hackerrank.com/" target="_blank" rel="noopener noreferrer">hackerrank.com</a>)</h2>
<p><strong>Solve challenges tailored for every level of coder over a variety of relevant topics. Enter competitions and increase your chances of getting hired.</strong> I love HackerRank especially for its algorithm and statistics challenges - if you&rsquo;re hoping to get into data science, this is an area that you&rsquo;ll need to be especially sharp in. Seasoned developers return to HackerRank to hone their skills and enter competitions that can win you swag and get you noticed for jobs.</p>
<h2 id="stack-overflow-hahahugoshortcode-s2-hbhb">Stack Overflow (<a href="https://stackoverflow.com/" target="_blank" rel="noopener noreferrer">stackoverflow.com/</a>)</h2>
<p><strong>Even seasoned developers have questions.</strong> This is the top search hit that comes up when you Google that error message you thought only you were getting. If you&rsquo;re shy about asking a question you can&rsquo;t find the answer to - don&rsquo;t be! Simply asking it will be of help to the next person who comes looking for the exact same solution.</p>
<h2 id="the-odin-project-hahahugoshortcode-s3-hbhb">The Odin Project (<a href="http://www.theodinproject.com/" target="_blank" rel="noopener noreferrer">theodinproject.com/</a>)</h2>
<p><strong>A curriculum for web developers built on a collection of resources designed to take you from &ldquo;What&rsquo;s the Internet?&rdquo; to web dev hire.</strong> For those specifically interested in web development, there&rsquo;s a community here for you. The Odin Project (TOP) has plenty of tutorials and practice projects to flesh out your knowledge of web dev essentials.</p>
<h2 id="a-coding-glossary-for-kids---and-for-the-rest-of-us-too">A coding glossary for kids - and for the rest of us, too!</h2>
<p><em>March 4, 2020 update:</em> I recently received a nice email on behalf of Amelia and her class, from Ms. Lincoln. Amelia was kind enough to suggest adding the below resource for young people like her who want to learn how to program!</p>
<p>Here&rsquo;s a handy glossary and list of games and resources for kids who want to have some fun while learning how to code: <a href="https://www.smartadvocate.com/News/Blog/software-programming-and-coding-glossary-for-kids" target="_blank" rel="noopener noreferrer">Software Programming and Coding Glossary for Kids</a>. Thanks for paying it forward, Amelia, and helping others find resources like these!</p>
<h3 id="what-are-you-waiting-for">What are you waiting for?</h3>
<p>Dive in - it&rsquo;s free! Good luck on your journey to coding superpowers!</p>
<p>Have one I missed? Let me know!</p>
Things you need to know about becoming a Data Scientisthttps://victoria.dev/blog/things-you-need-to-know-about-becoming-a-data-scientist/
Fri, 31 Mar 2017 13:19:19 +0900hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/things-you-need-to-know-about-becoming-a-data-scientist/Five Data Scientists discuss a day in the life and what it takes to be a successful Data Scientist.
]]>
<p>I recently attended a panel discussion hosted by General Assembly in Singapore entitled, &ldquo;So you want to be a Data Scientist/Analyst&rdquo;. The panel featured professionals in different stages of their careers and offered a wealth of information to an audience of hopefuls, including tips on how to land a job as a data scientist, and stories debunking myths that color this field.</p>
<h1 id="the-panelists">The panelists</h1>
<p><strong>Misrab Faizullah-Khan</strong> - VP of Data Science, <em>GO_JEK</em><br>
<strong>Anthony Ta</strong> - Data Scientist, <em>Tech in Asia</em><br>
<strong>Leow Guo Jun</strong> - Data Scientist, <em>GO_JEK</em><br>
<strong>Gabriel Jiang</strong> - Data Scientist<br>
<strong>Adam Drake</strong> - Chief Data Officer, <em>Atazzo</em></p>
<p>Here&rsquo;s a rundown of the major points discussed, paraphrased for brevity.</p>
<h2 id="whats-a-day-in-the-life-like">What&rsquo;s a day-in-the-life like?</h2>
<p>We&rsquo;re mostly &ldquo;data janitors.&rdquo; A large part of working with data begins with and consists of data sanitation. Without quality data, you won&rsquo;t get accurate results. Understanding how data should be sanitized largely encompasses skills that aren&rsquo;t directly related to data analytics. To fully understand the problem you&rsquo;re hoping to solve, you need to talk with the people involved. It&rsquo;s important that everyone understands all the elements of a project, and exactly what those elements are being called. &ldquo;Sales,&rdquo; as an example, may be calculated differently depending on who you&rsquo;re talking to.</p>
<h2 id="whats-a-data-scientist-vs-data-analyst">What&rsquo;s a data &ldquo;scientist&rdquo; vs. data &ldquo;analyst&rdquo;?</h2>
<p>It largely depends on the company you work for. &ldquo;Data [insert modifier]&rdquo; is only a recent distinction for a job field that has historically been called &ldquo;Business Analytics.&rdquo; In a smaller company, as with any other position, one person may handle a variety of data-related tasks under the title of &ldquo;Data Scientist.&rdquo; In a larger company with more staff and finer-grained specialization, you may have a &ldquo;Data Analyst&rdquo; that handles less technical aspects, and a &ldquo;Data Scientist&rdquo; whose work is very technical and involves quantitative learning or machine learning.</p>
<p>The field of data science/analytics is fresh enough that standard definitions for job titles really haven&rsquo;t been agreed upon yet. When considering a position, focus on the company rather than the title.</p>
<h2 id="should-i-join-a-startup-or-large-company">Should I join a startup or large company?</h2>
<p>There&rsquo;s no wrong answer. Being aware of your own working style and preferences will help guide your decision.</p>
<p>Startups generally offer more freedom and less micromanaging. This also means that you&rsquo;ll necessarily receive less guidance, and will need to be able to figure stuff out, learn, and make progress under your own power.</p>
<p>In a big company, you&rsquo;re likely to experience more structure, and be expected to follow very clearly defined pre-existing processes. Your job scope will likely be more focused than it would be at a startup. You&rsquo;ll experience less freedom in general, but also more certainty in what&rsquo;s expected of you.</p>
<p>In the end, especially at the beginning of your career, don&rsquo;t put too much stock in choosing one or the other. If you like the company, big or small, give it a try. If you&rsquo;re not happy there after a few months, then try another one. No career decision is ever permanent.</p>
<p>It&rsquo;s also worthwhile to note that even if you find a company you like the first time around, it&rsquo;s in your best interest to change companies after one or two years. The majority of the salary raises you&rsquo;ll earn in your lifetime will occur in the first ten years of your career. Say you&rsquo;re hired by Company A as a junior data scientist for two years - after two years, you&rsquo;re no longer a junior. You can now earn, say, a 30% higher salary in a data scientist position, but it&rsquo;s unlikely that Company A will give you a 30% raise after two years. At that point it&rsquo;s time to find Company B and put a few more years of experience on your resume, then probably change companies again. You don&rsquo;t earn the big bucks sticking with one company for decades - you&rsquo;ll always be the junior developer.</p>
<p><img src="datasci-offstage.jpg" alt="Talking offstage."></p>
<h2 id="what-do-you-look-for-when-hiring-a-candidate">What do you look for when hiring a candidate?</h2>
<p>Overall, the most important skills for a data science candidate are soft skills. Curiosity, tenacity, and good communication skills are vital. Persistence, especially when it comes to adapting to a quickly changing industry, is important. The most promising candidates are passionate enough about the field to be learning everything they can, even outside of their work scope. Hard skills like coding and algorithms can be taught - it&rsquo;s the soft skills that set good candidates apart.</p>
<p>Hacking skills are also vital. This doesn&rsquo;t necessarily mean you can write code. Someone who has a grasp of overall concepts, knows algorithms, and has curiosity enough to continuously learn is going to go farther than someone who can just write code. It takes creativity to build hacking skills on top of being familiar with the basic navigation points. Having the ability to come up with solutions that use available tools in new ways - that&rsquo;s hacking skill.</p>
<p>Design thinking is another important asset. Being able to understand how systems integrate on both technical and business levels is very valuable. If you&rsquo;re able to see the big picture, you&rsquo;re more likely to find different ways to accomplish the overall objective.</p>
<p>You might think that putting buzzwords on your resume makes you look more attractive as a candidate - more often, they stand out as a red flag. Putting &ldquo;advanced machine learning&rdquo; on your CV and then demonstrating that you don&rsquo;t know basic algorithms doesn&rsquo;t look good. It&rsquo;s your projects and your interests outside of the job you&rsquo;re applying for that say the most about you. Popular topics in this industry change fast - you&rsquo;re better off having a solid grasp of the fundamentals as well as a broad array of experience than name-dropping whatever&rsquo;s trending.</p>
<h2 id="is-there-a-future-for-humans-in-the-data-science-field-when-will-the-machines-replace-us">Is there a future for humans in the data science field? When will the machines replace us?</h2>
<p>This isn&rsquo;t a question unique to data science, and many historical examples already exist. Financial investment is a good example - where you used to have a human do calculations and make predictions, computers now do a lot of that automatically, making decisions about risk and possible payoff every day.</p>
<p>Where humans won&rsquo;t be replaced, just as in other industries that have embraced automation, is in the human element. You&rsquo;ll still need people to handle communication, be creative, be curious, make interpretations and understand problems&hellip; all those things are fundamentally human aspects of enterprise.</p>
<p>Ultimately, machines and more automation will make human work less of a grind. By automating the mundane stuff, like data sanitization for example, human minds are freed up to develop more interesting things.</p>
<h2 id="what-are-the-future-applications-for-data-driven-automation">What are the future applications for data-driven automation?</h2>
<p>Legal is a good next candidate for automation. There&rsquo;s a lot there that can be handled by programs using data to assess risk.</p>
<p>Medicine is another field ripe for advances through data. Radiologists, your days are numbered: image detection is coming for you. The whole field of diagnostics is about to drastically change.</p>
<p>A particularly interesting recent application for data science is in language translation. By looking at similarities in sentence structure and colloquial speech across different languages, we&rsquo;re able to sort similar words based on the &ldquo;space&rdquo; they occupy within the language structure.</p>
<p>Insurance - the original data science industry - already is and will continue to become very automated. With increased ability to use data to assess risk, we&rsquo;re beginning to see new creative insurance products being introduced. E-commerce companies can now buy insurance on the risk a customer will return a product - hard to do without the accessibility of data that we have today.</p>
<h2 id="how-do-i-push-data-driven-decisions-and-get-my-boss-to-agree-with-me">How do I push data-driven decisions and get my boss to agree with me?</h2>
<p>It&rsquo;s a loaded question. The bottom line is that it depends on the company&rsquo;s data culture and decision path. We&rsquo;ve experienced working for management who say, &ldquo;We&rsquo;ve already made the decisions, we just need the data to prove it.&rdquo; Obviously, that&rsquo;s a tough position to work from.</p>
<p>Generally, ask yourself, &ldquo;Am I making my boss look good?&rdquo; You might hear that and think, &ldquo;Why would I let my boss get all the credit?&rdquo; - but who cares? Let them take the credit. If you&rsquo;re producing good work, you&rsquo;re making your team look good. If you make your team look good, you&rsquo;re indispensable to your team and your boss. People who are indispensable are listened to.</p>
<h2 id="whats-your-best-advice-for-a-budding-data-scientist">What&rsquo;s your best advice for a budding data scientist?</h2>
<p>Don&rsquo;t be too keen to define yourself too quickly. If you narrow your focus too much, especially when you&rsquo;re learning, you can get stuck in a situation of having become an expert in &ldquo;Technology A, version 3&rdquo; when companies are looking to hire for experts in version 4. It happens.</p>
<p>A broad understanding of fundamentals will be far more valuable to you on the whole. Maybe you start out writing code, and decide you don&rsquo;t like it, but discover that you&rsquo;re really good at designing big picture stuff and leading teams, and you end up as a technical lead. It could even vary depending on the company you work for - so stay flexible.</p>
<p>Your best bet is to follow what you&rsquo;re passionate about, and try to understand a wide range of overall concepts. Spend the majority of your efforts learning things that are timeless, like the base technologies under hot-topic items like TensorFlow. Arm yourself with a broad understanding of the terrain, different companies, and the products that are out there.</p>
<p>If you focus on learning code specifically, learning one language well makes it easier to learn others. Make sure you understand the basics.</p>
<h3 id="tldr-it">TL;dr it:</h3>
<p><strong>Adam:</strong> Talk more and don&rsquo;t give up.<br>
<strong>Anthony:</strong> [Be] courageous, and hands-on.<br>
<strong>Gabriel:</strong> Be creative.<br>
<strong>Guo Jun:</strong> It&rsquo;s worth the pain.<br>
<strong>Misrab:</strong> Evaluate yourself and maintain a feedback loop.</p>
<p><img src="datasci-crowd.jpg" alt="The crowd at GA Singapore">
</p>
<p><a href="https://twitter.com/GA_Singapore">General Assembly</a> is one of many schools and resources available to those interested in a career in data science. I&rsquo;ve listed a few others in <a href="https://victoria.dev/blog/top-free-resources-for-developing-coding-superpowers/">this post</a> if you&rsquo;re looking for more. Good luck!</p>
How I created custom desktop notifications using terminal and cronhttps://victoria.dev/blog/how-i-created-custom-desktop-notifications-using-terminal-and-cron/
Tue, 21 Feb 2017 10:48:38 +0700hello@victoria.dev (Victoria Drake)https://victoria.dev/blog/how-i-created-custom-desktop-notifications-using-terminal-and-cron/How you can use tools your Linux system already has to create custom desktop notifications.
<p>In my last post I talked about moving from Windows 10 to running i3 on Linux, built up from Debian Base System. Among other things, this change has taught me about the benefits of using basic tools and running a minimal, lightweight system. You can achieve a lot of functionality with just command line tools and simple utilities. One example I&rsquo;d like to illustrate in this post is setting up desktop notifications.</p>
<p>I use <a href="http://knopwob.org/dunst/">dunst</a> for desktop notifications. It&rsquo;s a simple, lightweight tool that is easy to configure, doesn&rsquo;t have many dependencies, and can be used across various distributions.</p>
<h1 id="battery-statuslow-battery-notification">Battery status/low battery notification</h1>
<p>I was looking for a simple, versatile setup to create notifications for my battery status without having to rely on separate, standalone GUI apps or services. In my search I came across a simple one-line cron task that seemed to be the perfect fit. I adapted it to my purpose and it looks like this:</p>
<pre><code class="language-conf" data-lang="conf"># m h dom mon dow command
*/5 * * * * acpi --battery | awk -F, '/Discharging/ { if (int($2) &lt; 20) print }' | xargs -ri env DISPLAY=:0 notify-send -u critical -i &quot;/usr/share/icons/Paper/16x16/status/xfce-battery-critical.png&quot; -t 3000 &quot;{}\nBattery low!&quot;
</code></pre><blockquote>
<p><em>Psst&hellip; <a href="https://crontab.guru/">here&rsquo;s a great tool</a> for formatting your crontab times.</em></p>
</blockquote>
<p>There&rsquo;s a lot going on here, so let&rsquo;s break it down:<br>
<code>*/5 * * * *</code><br>
Every five minutes, do the following.</p>
<p><code>acpi --battery</code><br>
Execute <code>acpi</code> and show battery information, which on its own returns something akin to:<br>
<code>Battery 0: Discharging, 65%, 03:01:27 remaining</code></p>
<p>Pretty straightforward so far. At any point you could input <code>acpi --battery</code> in a terminal and receive the status output. Today&rsquo;s post, however, is about receiving this information passively in a desktop notification. So, moving on:</p>
<p><code>| awk -F, '/Discharging/ { if (int($2) &lt; 20) print }'</code><br>
Pipe (<code>|</code>) the result of the previous command to <code>awk</code>. (If you don&rsquo;t know what pipe does, here&rsquo;s <a href="http://superuser.com/questions/756158/what-does-the-linux-pipe-symbol-do">an answer from superuser.com</a> that explains it pretty well, I think.) <code>awk</code> can do a lot of things, but in this case, we&rsquo;re using it to examine the status of our battery. Let&rsquo;s zoom in on the <code>awk</code> command:</p>
<p><code>awk -F, '/Discharging/ { if (int($2) &lt; 20) print }'</code><br>
Basically, we&rsquo;re saying, &ldquo;Hey, awk, look at that input you just got and try to find the word &lsquo;Discharging,&rsquo; then look to see if the number after the first comma is less than 20. If so, print the whole input.&rdquo;</p>
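<p>You can test the <code>awk</code> filter on its own by feeding it sample text instead of real <code>acpi</code> output. In this sketch the two battery lines are hypothetical samples in the same format as the example above, and <code>check</code> is just a helper name chosen for illustration.</p>

```shell
# Hypothetical acpi output, so no real battery is needed.
low='Battery 0: Discharging, 12%, 00:25:10 remaining'
ok='Battery 0: Discharging, 65%, 03:01:27 remaining'

# -F, splits each line on commas; int($2) turns " 12%" into the number 12.
check() {
  printf '%s\n' "$1" | awk -F, '/Discharging/ { if (int($2) < 20) print }'
}

low_result=$(check "$low")   # the whole line, because 12 is below 20
ok_result=$(check "$ok")     # empty, because 65 is not below 20
echo "low: $low_result"
echo "ok: $ok_result"
```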
<p><code>| xargs -ri</code><br>
Pipe the result of the previous command to <code>xargs</code>, which takes it as its input and does more stuff with it. <code>-ri</code> is equivalent to <code>-r</code> (run the next command only if it receives arguments) and <code>-i</code> (look for &ldquo;{}&rdquo; and replace it with the input). So in this example, xargs serves as our gatekeeper and messenger for the next command.</p>
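<p>Here is a small sketch of that gatekeeper behavior, using <code>echo</code> as a stand-in for <code>notify-send</code> so it runs without a desktop. It assumes GNU <code>xargs</code> and spells the replacement option as <code>-I{}</code>, since the GNU manual marks the lowercase <code>-i</code> form as deprecated. The sample text is made up.</p>

```shell
# With input: {} is replaced by the incoming line and the command runs once.
with_input=$(printf 'Battery 0: Discharging, 12%%\n' | xargs -r -I{} echo "notify: {}")

# Without input: -r means the command is not run at all, so nothing is printed.
no_input=$(printf '' | xargs -r -I{} echo "notify: {}")

echo "$with_input"
echo "no input gives: '$no_input'"
```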
<p><code>env DISPLAY=:0</code><br>
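Set the <code>DISPLAY</code> environment variable for the following utility so that it runs on the specified display, in this case, the first display of the local machine. Cron jobs don&rsquo;t inherit your desktop session&rsquo;s environment, so without this, <code>notify-send</code> wouldn&rsquo;t know where to show the notification.</p>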
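<p>A quick way to see what <code>env</code> does here is to run a command that simply prints the variable. This is only a sketch, with <code>sh -c</code> standing in for <code>notify-send</code>:</p>
<pre><code class="language-sh" data-lang="sh"># env sets DISPLAY to :0 for this one command only;
# the shell you run this from keeps its own value.
env DISPLAY=:0 sh -c 'echo &quot;DISPLAY is $DISPLAY&quot;'
</code></pre>
<p>The child command sees <code>:0</code> regardless of what the calling environment had set.</p>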
<p><code>notify-send -u critical -i &quot;/usr/share/icons/Paper/16x16/status/xfce-battery-critical.png&quot; -t 3000 &quot;{}\nLow battery!&quot;</code><br>
Shows a desktop notification with <code>-u critical</code> (critical urgency), <code>-i</code> (the specified icon), <code>-t 3000</code> (display time/expires after 3000 milliseconds), and finally <code>{}</code> (the output of awk, replaced by xargs).</p>
<p>Not bad for a one-liner! I made a few modifications for different states of my battery. Here they all are in my crontab:</p>
<pre><code class="language-conf" data-lang="conf"># m h dom mon dow command
*/5 * * * * acpi --battery | awk -F, '/Discharging/ { if ( (int($2) &lt; 30) &amp;&amp; (int($2) &gt;= 15) ) print }' | xargs -ri env DISPLAY=:0 notify-send -a &quot;Battery status&quot; -u normal -i &quot;/usr/share/icons/Paper/16x16/status/xfce-battery-low.png&quot; -t 3000 &quot;{}\nBattery low!&quot;
*/5 * * * * acpi --battery | awk -F, '/Discharging/ { if (int($2) &lt; 15) print }' | xargs -ri env DISPLAY=:0 notify-send -a &quot;Battery status&quot; -u critical -i &quot;/usr/share/icons/Paper/16x16/status/xfce-battery-critical.png&quot; -t 3000 &quot;{}\nSeriously, plug me in.&quot;
*/60 * * * * acpi --battery | awk -F, '/Discharging/ { if (int($2) &gt;= 30) print }' | xargs -ri env DISPLAY=:0 notify-send -a &quot;Battery status&quot; -u normal -i &quot;/usr/share/icons/Paper/16x16/status/xfce-battery-ok.png&quot; &quot;{}&quot;
*/60 * * * * acpi --battery | awk -F, '/Charging/ { print }' | xargs -ri env DISPLAY=:0 notify-send -a &quot;Battery status&quot; -u normal -i &quot;/usr/share/icons/Paper/16x16/status/xfce-battery-ok-charging.png&quot; &quot;{}&quot;
*/60 * * * * acpi --battery | awk -F, '/Charging/ { if (int($2) == 100) print }' | xargs -ri env DISPLAY=:0 notify-send -a &quot;Battery status&quot; -u normal -i &quot;/usr/share/icons/Paper/16x16/status/xfce-battery-full-charging.png&quot; &quot;Fully charged.&quot;
</code></pre><p>By the way, you can open your crontab in the editor of your choice by accessing it as root from the <code>/var/spool/cron/crontabs/</code> directory (its location on Debian-based systems). However, it&rsquo;s generally best practice to make changes to your crontab with the command <code>crontab -e</code>.</p>
<p>You can see that each notification makes use of the <code>{}</code> placeholder that tells xargs to put its input there, except for the last one. This is interesting because in this case, we&rsquo;re only using <code>xargs -ri</code> as a kind of switch to present the notification. The input that xargs receives isn&rsquo;t needed in the notification text itself.</p>
<h1 id="additional-notifications-with-command-line-tools">Additional notifications with command line tools</h1>
<p>With cron and just a few combinations of simple command line tools, you can create interesting and useful notifications. Consider the following:</p>
<h2 id="periodically-check-your-dhcp-address">Periodically check your DHCP address</h2>
<pre><code>*/60 * * * * journalctl | awk -F: '/dhcp/ &amp;&amp; /address/ { print $5 }' | tail -1 | xargs -ri env DISPLAY=:0 notify-send -a &quot;dhcp address&quot; -u normal &quot;{}&quot;
</code></pre><p>Which does the following:<br>
<code>*/60 * * * *</code><br>
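Every 60 minutes. (In the minute field, a step of 60 only ever matches minute 0, so this runs at the top of each hour, the same as <code>0 * * * *</code>.)</p>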
<p><code>journalctl</code><br>
Take the contents of your system log.</p>
<p><code>| awk -F: '/dhcp/ &amp;&amp; /address/ { print $5 }'</code><br>
Find log lines containing both &ldquo;dhcp&rdquo; and &ldquo;address&rdquo; and output the 5th portion as separated by &ldquo;:&rdquo;. The colons inside the timestamp count as separators too, so the time alone accounts for the first two splits.</p>
<p><code>| tail -1 </code><br>
Take the last line of the output.</p>
<p><code>| xargs -ri env DISPLAY=:0 notify-send -a &quot;dhcp address&quot; -u normal &quot;{}&quot;</code><br>
Create the desktop notification including the output.</p>
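<p>Because the field numbering depends on how many colons a log line contains, it can help to print every field with its index before settling on <code>$5</code>. Here&rsquo;s a minimal sketch using a made-up log line; the host name, process, and address are hypothetical, and real <code>journalctl</code> output on your system may split differently:</p>
<pre><code class="language-sh" data-lang="sh"># Hypothetical log line standing in for real journalctl output.
line='Jan 17 13:47:16 myhost NetworkManager[700]: dhcp4 (wlan0): address 192.168.1.23'

# Hypothetical helper: print each colon-separated field with its index.
show_fields() {
  awk -F: '{ for (i = 1; i &lt;= NF; i++) printf &quot;%d=[%s]\n&quot;, i, $i }'
}

printf '%s\n' &quot;$line&quot; | show_fields

# The same filter as in the crontab entry, applied to the sample line.
printf '%s\n' &quot;$line&quot; | awk -F: '/dhcp/ &amp;&amp; /address/ { print $5 }'
</code></pre>
<p>For this sample line, the timestamp takes up fields 1 through 3 and the address lands in field 5. If your own logs put it elsewhere, adjust the field number to match.</p>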
<h2 id="periodically-display-the-time-and-date">Periodically display the time and date</h2>
<pre><code>*/60 * * * * timedatectl status | awk '/Local time/ { print }' | xargs -ri env DISPLAY=:0 notify-send -a &quot;Current Time&quot; -u normal &quot;{}&quot;
</code></pre><h2 id="system-log-activity">System log activity</h2>
<p>You can also search your system logs (try <code>journalctl</code>) for any number of things using awk, enabling you to get periodic notifications of virtually any logged events.</p>
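<p>As one possible starting point, here&rsquo;s a sketch that counts log lines matching a pattern and prints a one-line summary only when there&rsquo;s something to report. The helper name <code>count_matches</code>, the pattern, and the sample lines are all hypothetical:</p>
<pre><code class="language-sh" data-lang="sh"># Hypothetical helper: count input lines matching the pattern in $1
# and print a summary only when there is at least one match.
count_matches() {
  awk -v pat=&quot;$1&quot; '$0 ~ pat { n++ } END { if (n) print n &quot; matching log lines&quot; }'
}

# Made-up sample lines standing in for journalctl output.
printf '%s\n' \
  'Jan 17 13:00:01 myhost sshd[901]: Failed password for root' \
  'Jan 17 13:00:05 myhost sshd[901]: Accepted password for victoria' \
  'Jan 17 13:02:11 myhost sshd[915]: Failed password for admin' \
  | count_matches 'Failed password'
</code></pre>
<p>In a crontab entry, you would pipe <code>journalctl</code> into the same awk program and finish with the <code>xargs -ri env DISPLAY=:0 notify-send</code> portion from the examples above. When nothing matches, awk prints nothing and xargs skips the notification.</p>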
<h1 id="experiment">Experiment!</h1>
<p>As with all things, you are only limited by your imagination! I hope this post has given you some idea about the endless possibilities of these simple utilities. Thanks for reading!</p>
How I ditched WordPress and set up my custom domain HTTPS site for (almost) free
https://victoria.dev/blog/how-i-ditched-wordpress-and-set-up-my-custom-domain-https-site-for-almost-free/
Sat, 28 Jan 2017 13:16:17 +0700, hello@victoria.dev (Victoria Drake)
A guide (for the minimally tech-savvy) to setting up a website with HTTPS using Hugo, Cloudflare and GitHub Pages.
<p>I got annoyed with WordPress.com. While using the service has its pros (like HTTPS, a mobile-responsive website, and being very visual and beginner-friendly), it&rsquo;s limiting. As someone who&rsquo;s comfortable enough to tweak CSS but who&rsquo;s not interested in creating their own theme (or paying upwards of $50 for one), I felt I wasn&rsquo;t really the type of consumer WordPress.com was suited to.</p>
<p>To start with, if you want to remove WordPress advertising and use a custom domain name, it&rsquo;s a minimum of $3 per month. If, like me, the free themes provided aren&rsquo;t just what you&rsquo;re looking for, you&rsquo;re stuck with two choices: buy a theme for $50+, or pay $8.25 per month to do <em>some</em> css customization. I don&rsquo;t know about you, but I f