Hartley Brody
https://blog.hartleybrody.com

Hartley is a 20-something trying to learn as much as he can while adjusting to the lifestyle of a grown-up. He works on the marketing team at HubSpot, where he gets to build cool things and work with great people. He's a world-class marketer with a hacker mentality. Author of Marketing for Hackers.

Minimum Viable Git Best Practices for Small Teams
https://blog.hartleybrody.com/git-small-teams/
Wed, 21 Jan 2015

When I started as the first employee at Burstworks, the cofounders and I could easily keep track in our heads of who was working on what at any given moment.

But as we worked on new projects and the engineering team grew in scope and size, nearly all of our code stayed in one central repository:

Our high-performance ad server

Data Pipeline

One-off scripts

Nightly jobs

Everything…

While we generally weren’t working on the exact same files at the same time, there was still lots of stepping on toes. Having your git push rejected was a common occurrence.

Inevitably, we had issues with merge conflicts, which led me to send this tweet from our company account:

Commits Should be Small and Frequent

You should be committing code whenever you have made a single logical change. This allows you to write concise but descriptive commit messages, which offers great context for others who might be reading through your code.

Committing small changes frequently makes it much easier to handle bugs or other bad situations:

When did we push this bug? Oh I see the commit… but a bunch of other things changed too. What was being worked on here?

I need to roll back that commit but what else might break if I do that?

Here are some code smells or signs that usually mean you’re not committing frequently enough:

A 100+ line file is being committed for the first time

You’re changing more than 20 lines of a file in one commit

You’re only committing when you take breaks (e.g. lunch, end of day, etc.)

You have trouble succinctly describing what has changed (see the section below)
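If you want a nudge when a commit is getting too big, a minimal sketch like the one below could help. It reads the output of git diff --cached --numstat, which prints the added and deleted line counts for each staged file separated by tabs, and flags files above the 20-line rule of thumb. The function names and the threshold are illustrative assumptions, and because a modified line counts as one addition plus one deletion, the total is only a rough measure.

```python
import subprocess

# Threshold taken from the rule of thumb above; tune it for your team.
MAX_CHANGED_LINES = 20


def oversized_changes(numstat_output, limit=MAX_CHANGED_LINES):
    """Return (path, changed_lines) for files whose staged diff exceeds `limit`.

    `numstat_output` is the text printed by `git diff --cached --numstat`:
    one "added<TAB>deleted<TAB>path" line per file.
    """
    flagged = []
    for line in numstat_output.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        # Binary files are reported as "-" for both counts; skip them.
        if added == "-" or deleted == "-":
            continue
        # A modified line appears as one addition plus one deletion.
        changed = int(added) + int(deleted)
        if changed > limit:
            flagged.append((path, changed))
    return flagged


def check_staged_changes():
    """Print a warning for each staged file that changes too many lines."""
    output = subprocess.run(
        ["git", "diff", "--cached", "--numstat"],
        capture_output=True, text=True, check=True,
    ).stdout
    for path, changed in oversized_changes(output):
        print(f"{path}: {changed} lines changed; consider splitting with git add -p")
```

Calling check_staged_changes() from inside a repository prints any staged files over the limit; you could also call it from a pre-commit hook.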

Pro tip:
Sometimes when you’re in the zone, you end up making a bunch of different logical changes to the code base without stopping to commit each one. That’s okay: git add -p to the rescue!

Running git add in patch mode (sometimes called “partial” mode) lets you stage a few lines out of a file for a commit, leaving the rest of the changes to a file unstaged.

Git will automatically show you chunks of changes from the file, and ask if you want to stage them or skip them. There’s also an option to break down the current chunk into smaller ones for really fine-grained control.

Now you can take the huge refactor you just cranked through and break it down into smaller, logical commits.

Commit Messages Should be Semantic

Every commit message should describe why the code was changed — or what a change accomplished — at an appropriate level of detail.

You shouldn’t just use words to describe what parts of the code have changed — anyone can see that from reading the diff.

If someone wonders why a line of code was created or edited, the commit message should make it clear.

Here are some code smells or signs that you’re writing a bad commit message:

The message is less than 3 words

The message is more than 10 words

Your message is too high-level (it’s hard to be too low level)

You don’t know what changes are being committed (see the section above)
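The word-count rules above are easy to check automatically. Here is a minimal sketch of a checker for the first line of a commit message; the function name and the 3 and 10 word bounds (taken from the list above) are assumptions you'd tune, and it skips lines starting with #, which git treats as comments in the message template.

```python
# Word-count bounds from the rules of thumb above.
MIN_WORDS = 3
MAX_WORDS = 10


def commit_message_warnings(message, min_words=MIN_WORDS, max_words=MAX_WORDS):
    """Return a list of warnings about the summary line of a commit message."""
    # Ignore blank lines and git's "#" comment lines; the first remaining
    # line is the summary that shows up in one-line logs.
    lines = [
        line for line in message.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    summary = lines[0] if lines else ""
    word_count = len(summary.split())

    warnings = []
    if word_count < min_words:
        warnings.append(
            f"summary has {word_count} word(s); describe why the code changed"
        )
    if word_count > max_words:
        warnings.append(
            f"summary has {word_count} words; try to say it in {max_words} or fewer"
        )
    return warnings
```

Git runs a commit-msg hook with the path to the file holding the proposed message as its first argument, so a hook script could read that file, pass its contents to commit_message_warnings(), and print any warnings. A word count can't tell you whether the message explains why something changed, so treat this as a reminder rather than a gate.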

Pro tip:
Did you just make a commit with a bad message like “refactor” or “business logic”?

It happens to the best of us. Just use:

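git commit --amend

which opens your configured editor (often vim) so you can change the last commit’s message. Only amend commits you haven’t pushed yet, since amending replaces the old commit with a new one.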

Use Feature Branches

If you’ll be making multiple commits that are related to each other, they belong on a separate branch. One of the nicest things about using branches is that both GitHub and Bitbucket support pull requests, which allow for a discussion about a collection of commits before they’re merged back into master.

Using branches makes the merge into master feel very concrete and important. It gives you a chance to see all of your final changes while ignoring work-in-progress commits. It also means that the master branch is always ready to deploy, with no half-ready changes mixed into the code.

Signs you should be using a branch:

You will be committing “work in progress” changes to save progress that leave your application in a broken state and shouldn’t go to production

It will take multiple logical changes to the codebase that are part of a larger project

There will be several commits in a row that all depend on each other and should be in order

Signs you don’t need a feature branch — that is, it’s okay to commit to master:

You’re making a small change that fits nicely inside one commit

Small bug fixes or hotfixes, like fixing typos

Previous/future commits won’t affect this commit

When you’re working on a branch, make sure you run git pull origin master frequently (at least once a day) so that your branch doesn’t get left behind and merge conflicts are less likely.

Once we switched to using branches, we found that we spent a lot more time reading each other’s code. This helped us learn from each other and gave us all a chance to recognize new or potentially troubling idioms and have a discussion around them, “in the code”.

It also allowed us to catch mistakes sooner and keep problematic code from being merged into master and shipped to production.

More Git Resources

If you want to learn more, here are some of the resources I used when coming up with these tips.

I’m hoping to avoid starting a flame war between different workflow models, but if you have constructive suggestions or more git power tips, feel free to leave a comment or drop me a note on Twitter!

Lightning Fast Data Serialization in Python
https://blog.hartleybrody.com/python-serialize/
Tue, 09 Dec 2014

A few months ago, I got a chance to dive into some Python code that was performing slower than expected.

The code in question was taking tiny bits of data off of a queue, translating some values from strings to primary keys, and then saving the data back to another queue for another worker to process.

The translation step should have been fast. We were loading the data into memory from a MySQL database during initialization, and had organized the data structure so that the string -> id lookups were constant time.

In order to keep messages on the queue for other workers to pick up, we were serializing the Python dicts into JSON strings using the standard library’s json package.

Our worker was reading the text data from the queue, deserializing it into a Python dict, changing a few values and then serializing it back into text data to save onto a new queue.

The translation steps were taking up about 40% of the total runtime.

So I set out to see if there was a faster way to serialize a Python dict.

Our Concerns

When you’re optimizing code, it’s helpful to think about what sort of gains you’re looking for.

Gaining a few percentage points faster isn’t usually too challenging

Gaining several times faster (i.e., 200-800%) requires more strategic thinking

Gaining orders of magnitude in speed often requires rearchitecture or starting over

Since we were already using Python’s built-in json module, we knew it’d be hard to eke out an order of magnitude improvement. But a few percentage points wasn’t going to cut it. It had to be a meaningful speedup in order to take a big chunk out of that 40% of time spent doing serialization/deserialization.

Each message of data was small — 5 keys with small values tipped the scales at a few dozen bytes each — so we weren’t worried about saturating the network card. Bandwidth and latency also weren’t a huge factor since the queue and all the workers were in the same availability zone on EC2.

I should note that all of the workers that’d be touching this data were in-house, so interoperability with common data serialization standards wasn’t a huge concern.

If the fastest way to encode data was to string it together with pipes (|) and backslashes (\), that was fine. We could update all of the workers to accommodate it.
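To make the idea concrete, here is a minimal sketch of one such home-brewed format: the values for a fixed, agreed-upon list of keys are joined with pipes, and any literal pipe or backslash inside a value is escaped with a backslash. The field names are hypothetical, it only handles string values, and it isn't necessarily the scheme I actually tested.

```python
# Hypothetical field order; every worker would need to agree on it.
FIELDS = ("campaign", "creative", "country", "device", "event")


def escape(value):
    # Escape backslashes first, then pipes, so the two can't be confused.
    return value.replace("\\", "\\\\").replace("|", "\\|")


def serialize(record):
    """Encode a dict of string values as a single pipe-delimited line."""
    return "|".join(escape(record[key]) for key in FIELDS)


def deserialize(line):
    """Decode a line produced by serialize() back into a dict."""
    values, current = [], []
    chars = iter(line)
    for ch in chars:
        if ch == "\\":
            # The next character is taken literally (an escaped | or \).
            current.append(next(chars))
        elif ch == "|":
            values.append("".join(current))
            current = []
        else:
            current.append(ch)
    values.append("".join(current))
    return dict(zip(FIELDS, values))
```

Because every worker has to know the field order, adding or renaming a key means updating all of them at once, which is the interoperability trade-off described above.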

I tried searching for as many Python data serialization libraries as I could find — as well as coming up with my own serialization schemes.

The Code

I learned a bunch about the performance of different string-building functions while building my own home-brewed serialization process. If you have any more ideas, let me know and I’ll be sure to add them!

Note that some packages require a file handle in order to write the serialized data, while others just dump it to an in-memory string.

The overhead of opening and closing the file was undetectable on the order of time I was examining, but I commented the file-handling code out for the packages that didn’t need it, to simulate the actual cost of using each package in production.
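The original benchmark code isn't reproduced here, but a minimal harness in the same spirit might look like this. It uses only the standard library (json, pickle, and marshal for serialization, timeit for timing) and a hypothetical five-key message; packages like ujson could be added to the SERIALIZERS dict if they're installed. Absolute timings will vary by machine and Python version.

```python
import json
import marshal
import pickle
import timeit

# Hypothetical message: five small keys, similar in size to the ones described above.
MESSAGE = {"campaign": "1234", "creative": "5678", "country": "US",
           "device": "ios", "event": "click"}

# Each entry maps a name to a (serialize, deserialize) pair of callables.
SERIALIZERS = {
    "json": (json.dumps, json.loads),
    "pickle": (pickle.dumps, pickle.loads),
    "marshal": (marshal.dumps, marshal.loads),
}


def benchmark(message=MESSAGE, number=100_000):
    """Return total seconds for `number` serialize+deserialize round trips per format."""
    results = {}
    for name, (dump, load) in SERIALIZERS.items():
        # Sanity check: the format must reproduce the original dict.
        assert load(dump(message)) == message
        results[name] = timeit.timeit(lambda: load(dump(message)), number=number)
    return results
```

Calling benchmark() returns the total seconds per format, so sorting its items by value gives a quick ranking for your own message shapes.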

We switched to using ujson and saw a roughly one-third overall increase in our pipeline processing speed, which was in line with our expectations from the test results.
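If you want to make a similar switch without breaking workers that don't have ujson installed, one common pattern is to import it under the json name and fall back to the standard library. This is a sketch rather than our production code: ujson mirrors the basic dumps() and loads() calls, but it doesn't accept every keyword argument the standard json module does, and its output formatting can differ (for example in whitespace), so compare decoded values rather than raw strings.

```python
try:
    # ujson provides dumps()/loads() with the same basic call shape as json.
    import ujson as json
except ImportError:
    # Fall back to the standard library if ujson isn't installed.
    import json


def encode(message):
    """Serialize a dict to a JSON string for the next queue."""
    return json.dumps(message)


def decode(text):
    """Deserialize a JSON string read from the queue."""
    return json.loads(text)
```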

Any packages I missed? Different ideas for home brewed serialization? Shoot me a note on Twitter.

Preventing Web Scraping: Best Practices for Keeping Your Content Safe
https://blog.hartleybrody.com/prevent-scrapers/
Tue, 12 Aug 2014

Many content producers or site owners get understandably anxious about the thought of a web scraper culling all of their data, and wonder if there’s any technical means for stopping automated harvesting.

Unfortunately, if your website presents information in a way that a browser can access and render for the average visitor, then that same content can be scraped by a script or application.

Any content that can be viewed on a webpage can be scraped. Period.

You can try checking the headers of the requests — like User-Agent or Cookie — but those are so easily spoofed that it’s not even worth doing.

You can see if the client executes JavaScript, but bots can run that as well. Anything a browser does can be copied by a determined and skilled web scraper.

But while it may be impossible to completely prevent your content from being lifted, there are still many things you can do to make the life of a web scraper difficult enough that they’ll give up or not even attempt your site at all.

Having written a book on web scraping and spent a lot of time thinking about these things, here are a few things I’ve found that a site owner can do to throw major obstacles in the way of a scraper.

Rate Limit Individual IP Addresses

If you’re receiving thousands of requests from a single computer, there’s a good chance that the person behind it is making automated requests to your site.

Blocking requests from computers that are making them too fast is usually one of the first measures sites will employ to stop web scrapers.

Keep in mind that some proxy services, VPNs, and corporate networks present all outbound traffic as coming from the same IP address, so you might inadvertently block lots of legitimate users who all happen to be connecting through the same machine.

If a scraper has enough resources, they can circumvent this sort of protection by setting up multiple machines to run their scraper on, so that only a few requests are coming from any one machine.

Alternatively, if time allows, they may just slow their scraper down so that it waits between requests and appears to be just another user clicking links every few seconds.
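A minimal sketch of per-IP rate limiting, assuming a single process that sees every request: it keeps a sliding window of recent request timestamps for each IP address and refuses requests once a limit is reached. The class name and the default of 60 requests per 60 seconds are hypothetical; with several servers you'd need shared state or to enforce this at a proxy or load balancer, and the shared-IP caveat above still applies.

```python
import time
from collections import defaultdict, deque


class IPRateLimiter:
    """Allow at most `max_requests` per IP address within a sliding window of `window` seconds."""

    def __init__(self, max_requests=60, window=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        # Maps IP address -> timestamps of its recent allowed requests, oldest first.
        self.history = defaultdict(deque)

    def allow(self, ip):
        """Return True if this request from `ip` should be served."""
        now = self.clock()
        timestamps = self.history[ip]
        # Drop requests that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False
        timestamps.append(now)
        return True
```

Instead of refusing outright, a site could use the same check to decide when to show a CAPTCHA, which is gentler on legitimate users who happen to share an IP address.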

Require a Login for Access

HTTP is an inherently stateless protocol, meaning that no information is preserved from one request to the next, although most HTTP clients (like browsers) will store things like session cookies.

This means that a scraper doesn’t usually need to identify itself if it is accessing a page on a public website. But if that page is protected by a login, then the scraper has to send some identifying information along with each request (the session cookie) in order to view the content, which can then be traced back to see who is doing the scraping.

This won’t stop the scraping, but will at least give you some insight into who’s performing automated access to your content.

Change Your Website’s HTML Regularly

Scrapers rely on finding patterns in a site’s HTML markup, and they then use those patterns as clues to help their scripts find the right data in your site’s HTML soup.

If your site’s markup changes frequently or is thoroughly inconsistent, then you might be able to frustrate the scraper enough that they give up.

This doesn’t mean you need a full-blown website redesign; simply changing the class and id attributes in your HTML (and the corresponding CSS files) should be enough to break most scrapers.

Note that you might also end up driving your web designers insane as well.
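To spare your designers, the renaming could be automated as a build step. Here is a minimal, regex-based sketch that derives new class names from a salt (for example, a per-deploy value) and applies the same mapping to class attributes in HTML and to class selectors in CSS. It's an illustration under simplifying assumptions: it handles only double-quoted class attributes and plain .name selectors, ignores ids and any JavaScript that references class names, and a real build tool would use proper HTML and CSS parsers.

```python
import hashlib
import re


def obfuscated_name(name, salt):
    """Map a class name to a short, salt-dependent replacement."""
    digest = hashlib.sha256((salt + ":" + name).encode("utf-8")).hexdigest()[:8]
    # Prefix with a letter so the result is always a valid CSS identifier.
    return "c" + digest


def rewrite_html(html, salt):
    """Replace every name inside class="..." attributes."""
    def replace(match):
        names = match.group(1).split()
        return 'class="' + " ".join(obfuscated_name(n, salt) for n in names) + '"'
    return re.sub(r'class="([^"]*)"', replace, html)


def rewrite_css(css, class_names, salt):
    """Replace .name selectors for the given class names."""
    for name in class_names:
        # (?![\w-]) stops ".price" from also matching ".price-tag".
        pattern = r"\." + re.escape(name) + r"(?![\w-])"
        css = re.sub(pattern, "." + obfuscated_name(name, salt), css)
    return css
```

Changing the salt on each deploy changes every class name at once, while the HTML and CSS stay consistent with each other.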

Embed Information Inside Media Objects

Most web scrapers assume that they’ll simply be pulling a string of text out of an HTML file.

If the content on your website is inside an image, movie, PDF, or other non-text format, then you’ve added a huge extra step for a scraper: parsing text out of a media object.

Note that this might make your site slower to load for the average user, way less accessible for blind or otherwise disabled users, and make it a pain to update content.

Use CAPTCHAs When Necessary

CAPTCHAs are specifically designed to separate humans from computers by presenting problems that humans generally find easy, but computers have a difficult time with.

While humans tend to find the problems easy, they also tend to find them extremely annoying. CAPTCHAs can be useful, but should be used sparingly.

Maybe only show a CAPTCHA if a particular client has made dozens of requests in the past few seconds.

Create “Honey Pot” Pages

Honey pots are pages that a human visitor would never visit, but a robot that’s clicking every link on a page might accidentally stumble across. Maybe the link is set to display:none in CSS or disguised to blend in with the page’s background.

Honey pots are designed more for web crawlers — that is, bots that don’t know all of the URLs they’re going to visit ahead of time, and must simply click all the links on a site to traverse its content.

Once a particular client visits a honey pot page, you can be relatively sure they’re not a human visitor, and start throttling or blocking all requests from that client.
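Here's a minimal sketch of the bookkeeping behind a honey pot, assuming you can identify clients by IP address or session: any client that requests one of the trap URLs is flagged, and every later request from it is refused. The paths and the class name are hypothetical.

```python
# Hypothetical honey pot URLs, linked only from elements hidden with CSS.
HONEY_POT_PATHS = {"/specials-archive", "/internal/price-list"}


class HoneyPotTracker:
    """Flag any client that requests a honey pot page, and block its later requests."""

    def __init__(self, trap_paths=HONEY_POT_PATHS):
        self.trap_paths = set(trap_paths)
        self.flagged_clients = set()

    def should_block(self, client_id, path):
        """Return True if this request should be refused."""
        if path in self.trap_paths:
            # A human following visible links should never get here.
            self.flagged_clients.add(client_id)
        return client_id in self.flagged_clients
```

You'd probably also want to list the trap URLs under Disallow in robots.txt, so that well-behaved crawlers like Googlebot, which honor it, don't get caught.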

Don’t Post the Information on Your Website

Ultimately, web scraping is just a way to automate access to a given website. If you’re fine sharing your content with anyone who visits your site, then maybe you don’t need to worry about web scrapers.

After all, Google is the largest scraper in the world and people don’t seem to mind when Google indexes their content. But if you’re worried about it “falling into the wrong hands” then maybe it shouldn’t be up there in the first place.

Any steps that you take to limit web scrapers will probably also harm the experience of the average web viewer. If you’re posting information on your website for the public to view, then you probably want to allow fast and easy access to it.

This is not only convenient for your visitors, it’s great for web scrapers as well.

The Full Time Employee's Guide to Generating Freelance Clients on the Side
https://blog.hartleybrody.com/freelance-employee/
Mon, 07 Jul 2014

There are many reasons to begin freelancing while still maintaining a full time job. Whether you want to work on new projects outside the scope of your current role or make some extra money each month (or maybe you’re hoping to eventually jump into freelancing full time), getting your first few clients can be really rewarding.

But how should you get started? I’ll cover that in this article.

We’ll go over ways to build your expertise and generate demand for your time. Then we’ll talk about generating potential leads and how to convert them into paying clients. Finally, we’ll talk about some tips for your pricing conversations.

In a future article, I’ll talk about different styles of project management, easy ways to exceed your clients’ expectations, how to handle some of the legal and administrative issues you’ll face, and how to feel comfortable raising your rates. Make sure to subscribe for updates!

Personally, I’ve worked with dozens of clients over the past few years across several different business problem domains. Some of my clients are one-person operations while others are large organizations that have run Super Bowl ads.

I learned a bunch from both my successes and my mistakes along the way as I was getting started, so I figured I’d put together a guide for other people who might want to follow a similar path.

Building Credibility: Creating Demand for Your Time

No matter what sort of work you want to do, the first question is always: where can I find leads that will make good clients?

Old School Outbound
A lot of times, people think that they have to turn to “the big sites” and sift through tons of vague project descriptions and compete with the bottom-of-the-barrel prices from overseas development farms. Or else they start attending networking events hoping to strike up a conversation with the right person.

The problem is that you waste a ton of time wading through projects and having dead-end conversations. And as a full time employee, you probably have even less time than the average freelancer to do all this work of finding clients.

Pulling in clients this way means that every new client can only be won after hard-fought, hand-to-hand combat. You have to connect with them, find out about their project, show them a meaningful portfolio, convince them you’re smart and trustworthy, and beat out anyone else they’re considering for the work.

Not an easy task to accomplish if you have no connection to the person and no experience or referrals to back up your work.

Building an Inbound Pipeline
Instead, you should focus on building a pipeline of inbound interest for your talents and skills. Ideally, you’d want people to already know that you’re really good at what you do. Then, you’re in a position where people are reaching out to you, hoping you can help them.

It’s no longer an uphill battle to convince the potential client you can deliver value; your reputation precedes you. This lets you pick and choose the projects you work on.

Since it took no effort to find this client, you don’t feel like you have to work for them. This lowers the marginal costs for finding new clients and learning about their projects.

Starting From Nothing
Now you might be thinking that this only happens once you’re an established freelancer and have a large portfolio of companies spreading word-of-mouth referrals.

But the reality is that you can build up interest in your skills without having any previous clients at all.

The secret? Creating helpful content around a problem domain.

Do some inbound marketing and write some articles around the types of problems you like to solve:

Write about tips for getting started with solving that problem

Write about common mistakes people make when solving that problem

Write about power tips someone might not know about solving that problem

Collect a list of other great resources where people can learn more about solving that problem

Make some tutorials on using common technology that solves that problem

Publish that content on your own personal blog (you do have a website, right?) but also see if you can submit a guest post on other sites that operate in that problem space, and submit your content to relevant tutorial websites.

Once you have a few articles out there around that problem or topic, you start to become the most well-known “expert” on the internet around that topic.

People who are doing research or have that problem and are looking for help will see your name over and over again and will eventually want to reach out to this person who seems to know so much.

You’ll start getting inbound interest for your time and you won’t need to chase after prospective clients, convincing them you’re great. They’ll come to you, already trusting that you know what you’re talking about.

Picking a Problem Domain
You might be thinking, “Great, I’m pretty good at writing code in {{programming_language}} or using {{web_framework}}.” And while there are plenty of people out there who market themselves successfully as a “Pythonista” or “Ruby on Rails Mercenary”, I’d argue that marketing yourself around a particular technology is a bad idea.

Instead, you should think about a business problem or domain of problems. Things like:

Improving the speed of business reporting tools and analytics queries

Designing and building user interfaces

Integrating enterprise software systems from different vendors

Quickly building minimum viable web applications

Web scraping and data collection

Thinking about your current job, you’re most likely not entirely focused on working with Technology X, but on solving some class of business problems. If you were to describe your job to a non-technical person, what would you say you work on?

Those are the kinds of things you should market yourself around.

Demonstrating that you’re an expert at solving business problems, rather than simply wielding a particular technology, will make you more likely to be found by non-technical clients (i.e., the vast majority of them) and make it much easier for you to ask for more money, which we’ll get to in a bit.

Generating Your First Leads

Obviously, if you’ve got people reading your articles all over the internet, you want to be able to capitalize on that and make it easy for them to get in touch if they’re looking for more help on a project.

The easiest thing to do is have a single “Contact Me” page that you can always direct people to. But don’t just put a contact form or list your email address on that page and pat yourself on the back just yet.

If someone is visiting this page, they’re probably going to be a bit anxious.

They’re about to reach out to a stranger across the internet (you) and tell that stranger about a problem they’re having and need help with. Wouldn’t you be a bit nervous?

You want this contact page to focus on alleviating their anxiety and making them feel more comfortable reaching out to you.

You should include a little blurb about who you are, where you’re located (if it matters), what sorts of projects you like to work on and a bit about the organizations you’d like to work with.

You might also consider giving people multiple options for getting in touch with you, besides just email. Some people just love doing phone calls, and while I usually try to avoid them for a first contact, I wouldn’t want to lose a big client because of that preference. I’ve also encouraged people to reach out on Twitter, and that’s led to a few great relationships as well.

Turning Leads Into Clients

If someone is interested in your expertise and decides to reach out, that’s awesome! You’ve got your first inbound interest for your time.

But you haven’t sealed the deal yet. Even if someone thinks you’re knowledgeable enough to reach out to, the next thing they have to decide is whether they think you’re a person they trust and want to do business with.

When you’re responding to inbound emails, it’s important to make sure you’re having productive conversations while also appearing professional.

Make sure that you do the following:

Write polite, well-formed emails that get to the point quickly

Ask questions so that you understand what their expectations are

Refer to previous work and clients, if applicable

Include a professional email signature

All of these ensure that the client takes you seriously, and they transition you in their mind from “some expert stranger on the internet” to someone they know and trust to do good work.

After you’ve gone through this sort of conversation a few times, you’ll get a much better sense for the type of information you need to learn about the project.

I’d recommend writing a form email or canned response that asks several pointed questions, so that you can respond quickly to any new leads.

Here are the things you’re probably trying to learn:

What is the scope of the project?

What is the deliverable, specifically?

Am I giving them code? data?

Am I hosting it for them?

Do the technologies/frameworks I use matter?

Does it need to integrate with any of their existing systems? How?

What is their timeline for getting the work done?

What is their budget for this project?

Is this a core part of their business or a side venture?

The template I use looks something like this:

Hey FIRST_NAME,

Thanks so much for reaching out! I’d be happy to learn more about the project. Do you have a proposal you could send over? Specifically, I’d like to know:

<a big list of questions, and I delete ones that aren’t relevant for this project>

Let me know if you have answers to those questions and we can see if it’ll be a good fit!

Best,
Hartley

It’s quick and to the point, and focuses mostly on their needs. I might mix up the list of questions I include depending on how much information they gave in their initial email.

Usually, if the project is sufficiently complex, you’ll have to hop on a phone call with the client to go over details. Pro tip: always include time zones when scheduling calls or meetings with new clients. The internet is a big place, and you don’t want to miss a call and lose a client because you each assumed the other was in your time zone. Learn from my mistakes!

Tips for Having Pricing Conversations

Pricing conversations are hard and it’s impossible for me to present a one-size-fits-all solution. Sometimes you’ll have clients with a clear budget in mind that they won’t deviate from, other times they don’t really know what their price range is and might need you to help set their expectations.

Here are a few tips to keep in mind when you’re having those discussions:

Don’t Position Yourself as a “Coder”
When you’re first engaging with a client about working on a project, your goal is first and foremost to understand their business problem and how your expertise can (or can’t) help.

If you don’t know anything about their particular industry, ask them questions about their business — not just tech questions — in order to get the business context of their problem.

At the beginning of your conversation, try to avoid talking about or even mentioning technical topics like programming languages, frameworks, databases or hosting providers, unless the client insists on discussing it.

Instead, find out more about the potential users of the thing they’re asking you to build. What are their goals, and what are they currently doing to try and solve them? What are the main issues that caused this person to even reach out to you in the first place and what other options were they considering?

These might not all be appropriate questions in every circumstance, but your initial conversations should always strive to learn as much as you can about their problem at a business level, in order to focus the conversation on how you can help them reach their business goals.

Tell Them When They Shouldn’t Pay You
This also means that you should be willing and ready to suggest non-technical solutions to their problem, or solutions that don’t involve them paying you. There might be an API that already exists or a service they should try that’d be cheaper than paying you — tell them this.

Even though it seems like you’re walking away from work, you now have a person who trusts you to look out for their interests and help them solve their problems.

I’ve had clients that offered to pay for my time on the phone because I gave them advice on how to solve their problem for much cheaper than paying me to do the work.

You’re now their go-to “technical guy” and they’ll mention you to their friends and reach out again the next time they have a potential job. Now you’re building a network of people who trust you to help them, and those people are helping you by spreading word-of-mouth referrals. This is a great position to put yourself in if you want regular, repeat clients.

Don’t Bring up Pricing Too Early
Try to avoid nailing down a price until you’re sure you really understand their project and the business context behind it. You also want to give yourself a chance to convince them that you care about solving their problems and not just writing code.

I usually try to mention previous experience or anecdotes of things that have or haven’t worked in the past before pricing comes up, to show that I have experience solving their problem and that I’ll be able to bring that knowledge and wisdom to tackle their problem.

Avoid Per-Hour Pricing
For well-defined projects, go with an overall project price, instead of a per-hour price. Clients generally like this since they know what their bill will be up-front, and it’s also great as the freelancer to have this number in mind too.

Now, for this to work well, you need to make sure the project is well-defined in writing before you get started. And then during the project, you need to make sure you communicate early and clearly when you feel like things are outside of the original scope and will cost the client extra.

Some people like to charge per-hour since it discourages clients from adding extra fixes and changes along the way. If you have a client that’s like this, then maybe charging by the hour is a good thing. But that also incentivizes slow development.

I have found that charging a per-project price forces you to find the most efficient way to get things done, and really aligns incentives between you and your client.

Have a “Minimum Project Size”
Whenever you’re taking on a new client, you should have your “minimum” project price in mind. Never take on “just a quick favor” for $20, because that means your client is really looking for something cheap. Steer these people to “the big sites” like Elance or oDesk.

Whenever I’ve made an exception to my minimum project price, I’ve regretted it. Clients that expect to get you for less are usually very fussy and will balk at the slightest push-back you give about needing to charge more if they expand the scope or problems arise. Having a minimum filters out the yippy clients who will be a drain on your time and not compensate you properly for it.

At some point in the discussion, you should try to say the sentence “I’m probably not the cheapest person to do this work for you.” If this scares the client away, then they’re optimizing for low price over everything else and most likely would have been a bad client. But if they’re willing to pay a premium for your time, they think you’re worth it and will treat you as such.

Stick With Your Price Tag
Finally, never negotiate against yourself if they balk at your price. Sometimes you’ll offer a price and they’ll kinda drag their feet and wait for you to lower it for them.

These clients are bad at communicating and are going to keep expecting you to offer more for less throughout the relationship.

It’s not worth lowering your price to take on a bad client who’s going to demand more of your time anyways. That’s a lose-lose.

You’re doing this nights & weekends so presumably you don’t need their money. That puts you in a good position to stand by your quote.

I’ve had people get mad when I quoted them a price. By being upfront about my price and sticking with it, I was able to avoid what would’ve been a terrible client, and I’m much better off for not having engaged with them at all.

Get Started!

As a full time employee, you won’t have a ton of time to start finding clients for freelance work, so it’s important that you build up an efficient pipeline. At a high level, you should focus on:

Moving a Static Site to S3 Before My Girlfriend Got Out of the Shower
https://blog.hartleybrody.com/static-site-s3/
Fri, 06 Jun 2014 21:42:51 +0000

I’ve got an old Rackspace instance that I’ve been running a bunch of small sites on over the past 4 years. Lately it’s been causing me problems, and sites will sporadically go down.

I have been meaning to move several of the static sites onto a more appropriate static-file hosting service like Amazon’s Simple Storage Service, also known as simply “S3”.

I’m on a trip in Denver with my girlfriend right now, so when I woke up to an email that one of my sites was down again, the last thing I wanted to do was waste precious vacation time doing server ops.

Fortunately, moving the static sites to S3 was so easy that I was able to get it done before my girlfriend even got out of the shower. No vacation time wasted!

0. Create an Amazon Web Services Account

I already had one, so I didn’t need to do this step, but it does take a few extra minutes to get signed up. I believe you still need a credit card to sign up, but if you’re doing fewer than 20k GET requests per month (including requests for each CSS/JS file), then hosting your site on S3 is free under the AWS Free Tier, which applies to new accounts for their first 12 months.

1. Create a bucket called mydomain.com

30 seconds – The bucket has to have the same name as the hostname you’ll be pointing to it. Choose a region and Amazon will automatically create a bucket URL for you like:

mydomain.com.s3-website-us-east-1.amazonaws.com/.

2. Upload your static files

30 seconds – Since my site only has a few files, I used Amazon’s interface to upload them. Note that my homepage document is called index.html but any file name could work. You’ll set that in less than a minute.
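If you’d rather script the upload than click through the console, here’s a minimal sketch in Python. It assumes the boto3 library and a local folder of site files, and the helper names (`s3_key`, `content_type_for`, `upload_site`) are hypothetical. The detail that matters most is the Content-Type header. S3 serves each file with whatever type was stored at upload time, and boto3’s `upload_file` generally won’t infer one for you, so the sketch guesses the type from the file extension.

```python
import mimetypes
from pathlib import Path


def s3_key(site_root: Path, path: Path) -> str:
    """Object key for a local file, relative to the site folder (e.g. "css/style.css")."""
    return path.relative_to(site_root).as_posix()


def content_type_for(filename: str) -> str:
    """Guess a MIME type from the extension, falling back to a generic binary type."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


def upload_site(s3_client, bucket: str, site_dir: str) -> list:
    """Upload every file under site_dir to the bucket and return the keys written.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    root = Path(site_dir)
    keys = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        key = s3_key(root, path)
        # Set ContentType explicitly so browsers render HTML/CSS instead of downloading it
        s3_client.upload_file(
            str(path), bucket, key,
            ExtraArgs={"ContentType": content_type_for(path.name)},
        )
        keys.append(key)
    return keys
```

For example, `upload_site(boto3.client("s3"), "mydomain.com", "site/")` would store `site/index.html` under the key `index.html` with the type `text/html`.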

3. Make all files public

10 seconds – You’ll need to instruct Amazon to allow the files in your bucket to be downloaded by anyone on the internet, otherwise visitors will see a “403 Forbidden” error when visiting your site.

Just check the box next to each file and then go to Actions > Make Public and click “Okay” on the confirmation screen.
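Checking boxes works fine for a handful of files, but any file you upload later has to be made public again. A common alternative, which is not what I did here, is a bucket policy that grants anonymous read access to every object in the bucket. The sketch below builds that policy document in Python using the standard public-read pattern. `mydomain.com` is the placeholder bucket name from step 1. On newer AWS accounts, you may also need to turn off the bucket’s Block Public Access settings before S3 will accept a public policy.

```python
import json


def public_read_policy(bucket: str) -> dict:
    """Bucket policy letting anyone (Principal "*") GET any object in the bucket."""
    return {
        "Version": "2012-10-17",  # the current IAM policy language version
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                # "/*" covers every object key, including files uploaded later
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


policy_json = json.dumps(public_read_policy("mydomain.com"), indent=2)
print(policy_json)
```

You could paste that JSON into the bucket’s policy editor in the console, or apply it with boto3’s `put_bucket_policy(Bucket="mydomain.com", Policy=policy_json)`.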

4. Enable static website hosting for the bucket

30 seconds – In order for S3 to return files from your bucket when requests are made for your domain, you’ll need to turn on static website hosting for this bucket.

Click “All Buckets” at the top, then right-click on your domain’s bucket and click “Properties”. Open the “Static Website Hosting” section and select the radio button to enable static website hosting.

You’ll need to enter the name of the file that you want visitors to see when they navigate to the base path like http://mydomain.com/ so that you don’t need to add /index.html to the end of all of your URLs.
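You can also apply the same setting from code. Here’s a minimal sketch that again assumes boto3. `put_bucket_website` takes the index document as a “suffix”, which is why a request for `http://mydomain.com/` gets `index.html` without anyone typing it. The `website_endpoint` helper is hypothetical and only covers the dash-style endpoint format shown in step 1. Some regions use a dot (`s3-website.<region>`) instead, so check the endpoint the console shows you.

```python
def website_configuration(index_document="index.html", error_document=None):
    """WebsiteConfiguration dict in the shape boto3's put_bucket_website expects."""
    config = {"IndexDocument": {"Suffix": index_document}}
    if error_document:
        # Optional page S3 returns for errors such as a 404
        config["ErrorDocument"] = {"Key": error_document}
    return config


def website_endpoint(bucket, region="us-east-1"):
    """Dash-style website endpoint, e.g. mydomain.com.s3-website-us-east-1.amazonaws.com."""
    return f"http://{bucket}.s3-website-{region}.amazonaws.com"


def enable_website_hosting(s3_client, bucket, index_document="index.html", region="us-east-1"):
    """Turn on static website hosting; s3_client is assumed to be boto3.client("s3")."""
    s3_client.put_bucket_website(
        Bucket=bucket,
        WebsiteConfiguration=website_configuration(index_document),
    )
    return website_endpoint(bucket, region)
```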

5. Point your site’s DNS at your S3 bucket

2 minutes – At this point, you should be able to visit your bucket’s Amazon S3 URL and see your website. Now you just need to point your site’s current domain at your S3 bucket so that your site’s visitors hit your S3 bucket instead of your current web host.

If you’re using Route 53 — Amazon’s DNS service — you create an A record for your domain, make it an alias, and then point that alias at your site’s S3 bucket. More instructions here.

Since I use Cloudflare to manage the DNS records for the site I was moving and they support CNAME flattening, I just made a CNAME record for my domain and made its value the full S3 URL to the root of my bucket.

—

Voila, you’re done in less than 5 minutes.

By the time she’s drying her hair, your site will be running on infrastructure designed for 99.99% availability and 99.999999999% durability (yes, 11 nines). You won’t pay anything for hosting if you’re doing fewer than 20k file requests a month, and you won’t have to waste any more time messing with servers.
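A quick back-of-the-envelope calculation makes those two numbers more concrete. It treats 99.99% availability as a fraction of time over a 365.25-day year. It treats the durability figure the way Amazon’s S3 FAQ frames it, as a per-object chance of surviving a given year. Both numbers are design targets rather than guarantees.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def downtime_minutes_per_year(availability: float) -> float:
    """Minutes per year a service can be unavailable at the given availability."""
    return (1 - availability) * MINUTES_PER_YEAR


def years_between_losses(num_objects: int, annual_loss_probability: float) -> float:
    """Average years between losing a single object, across num_objects objects."""
    expected_losses_per_year = num_objects * annual_loss_probability
    return 1 / expected_losses_per_year


# 99.99% availability: roughly 52.6 minutes of downtime a year
print(round(downtime_minutes_per_year(0.9999), 1))

# 11 nines of durability is a 1e-11 annual loss probability per object,
# so with 10 million objects you'd lose one about every 10,000 years on average
print(round(years_between_losses(10_000_000, 1e-11)))
```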

Becoming a Cold Weather Adventurer: Notes from MIT Outing Club's Winter School
https://blog.hartleybrody.com/winter-school/
Sat, 15 Feb 2014 20:15:02 +0000

Growing up, I was always an outdoorsy person, but cold New England winters kept me cooped up inside for a big chunk of the year. Last winter, I decided to take my first winter mountaineering and ice climbing lessons to start building the skills to become a year-round adventurer.

Earlier this winter, a friend told me about MIT Outing Club’s annual Winter School in January. It was 16 hours of lectures, demonstrations and stories from trip leaders and outside speakers. The course was a great introduction for anyone looking to get outside more in the winter.

I’ve compiled some of my notes from the course here, and added my own anecdotes that I’ve picked up over the past year. While reading about this stuff is a great way to whet your appetite, some of the skills and more technical aspects should really be practiced before you go out and try using them.

Winter Layering

Regulating your body’s temperature is one of the most important parts of being outdoors in the winter. Your core temperature can vary a lot depending on whether you’re moving or stopped, and weather conditions can change quickly.

Somewhat counter-intuitively, the most important thing to think about is preventing yourself from sweating. Having moisture on your skin draws more heat away from your body, and then if you stop moving or it gets colder, that moisture chills quickly and can make you uncomfortably cold.

You ideally want a system of many layers that each add a bit of warmth, so that you can easily add or remove layers as your temperature and activity level change.

Put on your warmest, puffy layer as soon as you stop moving, and take off layers or open up zippers and vents before you start sweating.

The ideal layering system has 3 parts:

Base Layer
You want something that will wick and pull sweat away from your body. For winter activities, this will usually be something synthetic like Patagonia’s capilene material, or something wool.

Definitely no cotton, and that goes even for underwear and t-shirts. Studies have shown that you lose more heat wearing wet cotton layers than you would if you were naked. Cotton kills, and it’s especially dangerous in cold winter conditions. That also means no denim, khaki or most types of flannel.

Base layers usually consist of

Synthetic T-Shirt

Liner Gloves

Pants/Tights

Liner Socks

Insulation Layer
The goal of this layer is to maintain a pocket of trapped, warm air as a barrier against the cold. Usually this will be fleece or wool, but it could also be something warmer like down or synthetic fill materials.

You don’t need to go crazy with this layer — on my recent Mt. Washington climb with temps in the single digits and wind chills well below zero, a light fleece jacket kept me warm over a base layer and under a shell. You’ll generate a lot of heat while moving around, so make sure you have something with good venting options.

Your insulating layer will probably be

Warm Hat or Balaclava

Light Jacket or Sweater

Gloves or Mittens

Warm Pants

Wool Socks

Shell Layer
This layer is to protect you from the wind and rain. It’s usually made of a coated nylon, or else Gore-Tex or some similarly waterproof+breathable material.

If you’re traveling above tree line at high elevations, or there’s any snow or wind in the forecast, this layer will be essential in keeping you warm and cozy.

Most people already own “ski jackets” which tend to be heavier and have both shell- and insulation-like properties. But this makes it hard to layer, since it’s effectively two layers in one. You’re better off going with a light insulation layer and a rain jacket so that you have more options.

To stay protected, you should have

Snow or Ski Goggles

Rain Jacket or Shell

Outer Mittens

Windproof Pants/Bibs

Nylon Gaiters

Waterproof Boots

Another easy way to adjust the dial on your body’s core temperature is to bring a few different hats of varying weight. Since you lose a lot of your body’s heat through your head, a warm hat can really keep things toasty. But if you’re feeling a bit too steamy, taking it off for a few minutes or switching to a lighter one is a great way to dump excess heat.

It’s worth repeating that the point of a layering system is to make it easy to adjust your temperature and keep your body happy. Pay attention to how you’re feeling and be sure to shed layers before you sweat.

Winter Footwear and Traction

Your feet are what carry you into and out of the backcountry, and it’s important that they’re protected and cozy so that you don’t have a bad time.

Boots
Usually, you want your winter boots to be either all leather or else a combination of plastic outer boot and inner insulating boot.

Those “snow boots” that you see in department stores won’t cut it. Your boots need to be both well-insulated and waterproof, and you also want a very rigid sole if you’re going to be using crampons or microspikes. But the most important thing is that the boots fit well.

You probably don’t want to commit to buying a pair of boots for your first winter trip since they tend to be expensive, but you can usually rent them from a guide program or outing club, or borrow a pair from a friend. These usually have the added benefit of already being broken in for you.

Socks
Under your boots, you’ll want a good sock system. Most people wear two layers: a light synthetic inner sock and a warm outer sock, usually wool.

Just as with the layering system on your body, you need the inner layer to wick sweat away and keep your feet dry, while the outer layer holds warmth and cushions your feet.

If you start to feel a blister coming on (called a “hotspot”), it’s important to stop immediately and take care of it before it turns into a full-blown blister and ruins your trip. Dry the area off if it’s sweaty, and put some kind of protection over it. Some people recommend moleskin, but regular old duct tape works well too. If you think you’re at risk of getting a hotspot, put some duct tape over the area before you head out, so that the sock rubs against the tape instead of your skin.

Gaiters
Over your boots, you’ll usually want to have a pair of gaiters to protect your lower legs and keep snow and mud from getting in over the top of your boot.

These are usually waterproof and made of heavy duty nylon. If you’re wearing crampons, they’ll also protect your pants, as it’s easy to accidentally kick yourself with the spikes on a crampon while you’re walking and tear a hole in your pants. Gaiters are usually ~$40 and are much easier to replace.

Crampons
Crampons are a set of metal spikes that attach to the bottom of your boot to give you super stable traction on steep snow and ice. These are usually only necessary for serious mountain or ice climbing.

Depending on the type of boot you have, there are different types of crampon designs. They can also have different construction designs, spike patterns and strap systems depending on what sort of terrain you’ll be covering and how technical your needs are. REI has a good guide.

When you’re wearing crampons, keep in mind that they’re sharp and will cut your clothes or bags if they get snagged. You want to walk with a slightly wider gait to avoid them catching on your pants.

When you’re not wearing them, you want to make sure they’re stored inside a specially-designed bag (they usually come with one) to keep them from tearing up the other gear when they’re in your pack.

You should also beware of snow “balling up” underfoot, meaning getting caught between the spikes and effectively turning your crampons into platform heels.

If the snow is soft and sticky, you’ll need to clear it from under your feet by banging your ice axe against the side of your foot.

Other Traction Aids
If you just want to be able to hike your favorite trail in the winter and won’t encounter anything too steep, then you’ll probably be okay with lighter traction devices like Microspikes or Yaktrax.

While crampons require winter mountaineering boots with stiff soles and special notches, lighter traction systems slip on over your normal hiking boots or sneakers.

I bought a pair of Microspikes a few winters ago, and they’ve come in handy many times on winter trails. They’re a great buy if you just want to do some basic winter hiking instead of being trapped indoors from December through the spring.

Snowshoes
Snowshoes are great when the snow is deep — you don’t want to “posthole” or sink your boot deep down into the snow with every step.

By distributing your weight across a much wider surface, a snowshoe reduces the amount of pressure you put on the snow and allows you to “float”.

Snowshoes usually have some claws or teeth on the bottom which offer a bit of traction, but not nearly as much as crampons. When walking in snowshoes, you want to take wide, high steps so that you don’t trip yourself and the snowshoe doesn’t get caught in the snow.

If you’re with a group, it’s a good idea to have someone up front breaking the trail through the snow, with everyone else following along that trail, making it increasingly packed down. Make sure you switch leaders often since it’s much more tiring to break trail than it is to follow in someone else’s.

Backpacks and Day Trip Gear

If you’re used to hiking in the summer, then you probably already know most of what you’ll need to select a backpack for a winter outing. You want an internal frame backpack which keeps weight on your hips and the load close to your back so it’s not unwieldy.

Keep in mind that the gear you bring out in the winter is usually bigger and heavier than summer gear, so you might need a bigger bag than you’re used to.

While a 15 liter bag might work well for summer day hikes, usually you’ll want about 30 liters of storage for a winter day hike bag, or 65+ liters for winter overnights.

When packing, you want to keep your heavy stuff closer to your back and lower down in your pack, just as you would when packing for a summer trip.

You generally want to keep traction gear — like crampons and ice axes — on the outside of your bag so that the sharp spikes don’t tear up all of your other gear.

Many winter packs have dedicated pockets and attachments for shovels, crampons, ice axes, skis and haul systems, which you usually don’t see on summer packs.

Before you put on the pack, you want to loosen all of the straps. Then put it on and first secure and tighten the hip belt, then tighten the shoulder straps, and finally adjust the load lifters. Adjusting and readjusting the pack every time you put it back on will help prevent soreness and fatigue.

Inside your pack, you always want to have the “ten essentials”, as follows:

Navigation (map and compass)

Sun protection (sunglasses and sunscreen)

Insulation (extra clothing)

Illumination (headlamp or flashlight)

Basic First-aid kit

Fire (fire starting device as well as tinder or candles)

Repair kit and multitool

Nutrition (high energy snacks)

Hydration (water or sports drink)

Emergency shelter

You should always have the ten essentials with you whenever you go out, but they’re especially important in the winter, when your margin for error is much smaller.

In the summer, you could probably survive an uncomfortable night out without some of them in an emergency, but in the winter getting stranded without them could have serious consequences.

Hydration, Nutrition & “Going” in the Winter

When hiking in the winter, it’s normal for someone to burn around 5,000 calories a day — more than twice what most people normally burn.

Keeping your body fueled and hydrated is extremely important, and you have to pay extra special attention to nutrition to ensure that you don’t end up cold and miserable.

Hydration
Since the air is usually very dry in the winter, you’ll lose a lot more water than you might think. You generally want to have 2-3 liters per person for a day hike.

It’s usually a good idea to fill one water bottle with something warm — like hot chocolate or soup — in the morning when you head out. Wrap the bottle in a sock or other insulating material if it’s really cold out.

Because the water in the bottle sloshes around with every step you take, it usually won’t freeze unless temps are well below freezing. If they are, store your water bottles upside down. The top surface of the water freezes first, so flipping the bottle keeps that ice away from the mouth instead of letting it seal in the rest of your water.

You want to have one bottle that’s easily accessible for quick stops, and another bottle or two that are buried deeper inside your pack where they’ll stay warmer and resist freezing.

Nutrition
In terms of nutrition, you should think about the following 4 nutritional aspects:

Sugars for quick energy

Complex carbs for sustained energy

Fats for sustained energy over long periods

Proteins for energy and tissue rebuilding

On the trail, dried fruits, nuts, energy bars and hard candy make great snacks. For morale, it’s often nice to stop and have a big meal for lunch. Bagels, beef jerky, chocolate, cheese and crackers are healthy, delicious and easy to pack.

Eat a lot, because no matter how much you eat, you’re probably still going to run a calorie deficit. Always bring more than you think you’ll need, since sharing your extra snacks is a great way to make friends on the trail.

Keep in mind that your snacks will most likely freeze. Apples and other fruits, soft candies and anything else with high water content can become rock hard within hours.

If you don’t want to lose a tooth biting into a Snickers bar (which has happened), throw your food in the freezer a few days before your trip and see what happens to it.

On my Mt. Washington trip, my Clif Bars froze pretty solid, so I kept one or two inside my jacket to warm them up before eating. Your body will be giving off a ton of heat that you can use to thaw or even pre-heat your snacks!

Bathroom Tips
When you’re wearing lots of layers and everything around you is frozen solid, “going” in a sanitary and environmentally friendly way gets very tricky in the winter. But that shouldn’t stop you from answering nature’s call — you should be peeing every few hours if you’re staying properly hydrated.

If you’re going to wander off from the hard packed snow on the trail, make sure you don’t sink into the soft snow as you venture off. You probably want to take off your pack and any outer mittens, but leave liner gloves on.

Depending on the pants or bib you’re wearing, zipping things off might take a few moments. Sometimes you’ll find that the shock of the cold on your privates makes you not have to go anymore — that’s normal.

While you might dig a quick cat hole to poop in over the summer, it’ll be impossible with deep snowpack and frozen ground — but if you just leave it in the snow then it’ll be totally exposed when everything melts in the summer. Gross!

With #2, you need to poop on or into something so that you can carry it out. You should have a small kit of plastic bags, disposable latex gloves and toilet paper to keep things sanitary and easy to clean up. Put the latex gloves on over your liner gloves so your hands stay clean and warm.

Also, beware of hand sanitizer in the winter. Alcohol freezes at a much lower temperature than water, so your Purell may still be liquid but literally colder than ice, and you probably don’t want to dump a bunch of it on your bare hands. Keeping your liner gloves on and covering them with disposable latex gloves is the easy, warm way to keep things clean.

Winter Navigation & Avalanche Hazards

As with any trip, it’s important to have a detailed plan before you head out. You should know what your goal is, what emergency evacuation routes are nearby and have a schedule for when you should reach the major milestones along the way.

If you’re just going out for a day hike, you should have a hard “turn around time” at which point you need to abandon the goal and head home, so that you make it back before darkness falls or conditions get bad.

There should always be someone back home who knows exactly what your plan is, and when you’re scheduled to return, so that they can send help if you don’t make it back on time.

When you’re navigating in whiteout conditions or other precarious situations, it’s important you take the time to check your progress using a map and compass or GPS.

You want to avoid “relative navigation” where you make decisions about where you are based on where you’re already heading and where you think you just came from.

This causes errors to accumulate and is one of the leading causes of hikers getting lost. Stop and locate yourself on a map often, so that you know exactly where you are.

Avalanches
It’s especially important that you don’t wander aimlessly in the winter due to avalanche danger. Whenever there has been recent snowfall on top of ice or dense snow, you have the potential for an avalanche.

Avalanche conditions are affected by wind, precipitation and changes in temperature, so it’s always a good idea to check the most recent avalanche conditions for the area you’re traveling in — usually, those forecasts are updated daily.

If you are going to be somewhere where avalanches are a real concern, you should have a beacon, probe and shovel, and know how to use all three.

People who get swept up in an avalanche often only have minutes before they suffocate, so whoever is on scene immediately becomes a first responder — which might be you.

Check the snow pack throughout the day by digging out a deep cross section of snow and looking for alternating hard and soft layers which might break off or slide over each other.

Make sure you stagger your group through exposed areas — don’t all hike above each other in a line because you’d all get swept up if an avalanche is triggered.

If you are swept up in an avalanche, there’s various advice on what to do, but it’s been hard to test or prove these techniques — your goal should always be to avoid getting swept up in the first place.

If you can, some say you should drop your pack or try to use your arms in a swimming motion to stay on the surface. Since suffocation is the main concern for people who are buried, you want to cover your mouth to keep it from filling with snow, and try to keep the other arm up to help clear an airway to the surface or visually signal rescuers.

Technical Traveling

If you like hiking in more aggressive terrain, you’ll often hear people refer to the “tree line” or being “above tree line.” This is the area where wind and weather are so severe that plants larger than a small bush have a hard time growing. And because there are no tall trees to protect you, hiking above tree line is far more exposed than other types of hiking.

Above Tree Line
The appeal of traveling above tree line is that you often get great views and have a more challenging, rewarding trip. But the dangers are very real. In strong winds, it might be hard to communicate with your group, or even think.

Taking off your pack to fetch an extra layer can become a massive chore if you have 50mph+ winds whipping everything around you. I saw a guide’s metal ice axe get blown over the edge of a cliff in 80mph winds near the top of Mt. Washington.

Wind also increases the rate at which you lose heat. If you’ll be traveling above tree line, it’s important to have a windproof outer shell to help stave off the heat loss.

Frostbite and hypothermia are very real possibilities when you’re above tree line. You don’t want any exposed skin — that means face masks or balaclavas and goggles with a tight strap.

The other, more subtle danger of being above tree line is that weather conditions often change very rapidly. From the base of a mountain, you might be able to see a storm rolling in hours before it arrives, but mountain tops often have their own weather systems and strong storms can whip up with very little warning, offering little time to descend or find shelter.

Sometimes you’ll see “cairns” above tree line along the trail. These are carefully placed piles of rocks used as navigation aids.

On a clear day, you might wonder why a trail has cairns every few yards, but in whiteout conditions, even traveling a few feet from one to the next can be a dangerous navigation challenge.

Ice Axes
If you’re going to be hiking above tree line, you’ll usually want to bring an ice axe with you. There are many different kinds of ice axes depending on what terrain you’re navigating. But usually the standard mountaineering ice axe is all you need for a hike.

You can use axes for all sorts of things — cutting steps or seats into hard snow or ice, setting up belays or anchors, or even as a walking stick. But the most important use of an ice axe is for self-arresting in the event that you slip and start sliding down the side of a mountain.

The basic idea is to bury the head of the axe into the ground while rolling your weight onto it to bring yourself to a stop. Self-arresting is definitely something you want to practice a few times on easy terrain before you head somewhere where you might actually need to use it.

Be Smart About the Risks

Whenever you venture out into the backcountry, there are hazards, regardless of the season. To ensure a fun trip, you need to plan ahead for the hazards you know you’ll face — exposure, hunger, thirst — and make sure you avoid adding any additional hazards.

As the Boy Scouts say, “be prepared”. Know your gear and how to use it. Go through your first aid kit so you know what’s inside. Don’t just pick up an ice axe or strap on some crampons and assume you’ll “figure it out”. Take the time to learn, and ask someone if you’re not sure.

It’s also really important to listen to your body. While winter trips can be cold and sometimes challenging, you should never feel uncomfortable or scared.

If something feels off — maybe you need a snack break or your boot is causing a blister — tell the rest of the people you’re with. It’s much better to stop and take care of things sooner than to press on and let issues worsen.

Finally, make sure you respect nature. Mountains are steep, winds are cold, darkness can lead to trouble. Always check the conditions before you go out — that means weather and avalanche forecasts — and make sure you’re comfortable going out in those conditions.

Stick to your agreed-upon turnaround time, and cut your trip short if conditions get bad. Most climbing accidents happen on the way down the mountain, usually around dusk. Make sure you have plenty of time to get back unless you’re planning to hike in the dark.

“Put yourself in a position to be lucky, don’t rely on luck”
– Quote from “Alpine Climbing: Techniques to Take You Higher”

I’m hugely grateful to all of the MIT Outing Club leaders who gave presentations and shared their stories. I learned a ton and feel much more comfortable heading out on winter trips now. And shout out to Matt Stein Photography for some of the sweet shots in this article.

Peeling Back the ORM: Demystifying Relational Databases For New Web Developers
https://blog.hartleybrody.com/databases-intro/
Wed, 20 Nov 2013 03:48:50 +0000

Most web developers building dynamic websites interact with databases every day. Relational databases like MySQL or Postgres are usually the first tool people reach for when their application needs to store data.

But with the recent proliferation of web frameworks like Rails and Django, many web developers rely totally on Object-Relational Mappers (ORMs) for interacting with their database.

In fact, many new web developers see “writing raw SQL” or interacting directly with the database as something scary that should be avoided at all costs.

The reality is that relational databases are actually fairly easy to tame, and are built on top of lots of great ideas. Understanding the relational database that your application runs on will give you a much richer understanding of your web stack and make you a more powerful, proficient developer.

This article is a version of some notes I wrote for the new web developers who just started at Burstworks. At the end I link to the major resources I used, in case you want to learn more about this stuff.

Why Use Databases At All?

Before we dive into relational databases, we should spend a minute thinking about why we’d want to use them at all. For example, our application could simply write JSON-encoded data to a file on disk, or use some CSV-like “tables”.

The problems with these home-brewed solutions are many. For one, you don’t want to have to chase pointers around on the disk, reading and writing to files yourself. You also don’t get any “advanced” querying from these systems, and there’s really no way to optimize your lookups.

Once you have more than a few records in your application, you really need to use something more powerful, like a relational database.
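To make the difference concrete, here’s a minimal sketch in Python that compares a home-brewed JSON-lines store with SQLite, the small relational database bundled with Python’s standard library. The `users` table, the data and the helper names are all hypothetical. The file version has to decode records one by one until it finds a match. The SQLite version can jump straight to the row through an index, and `EXPLAIN QUERY PLAN` shows it doing so.

```python
import json
import sqlite3


def write_json_lines(f, records):
    """Home-brewed storage: one JSON object per line in an open text file."""
    for record in records:
        f.write(json.dumps(record) + "\n")


def find_in_json_lines(lines, email):
    """Linear scan over an open file (or any iterable of JSON lines) until the email matches."""
    for line in lines:
        record = json.loads(line)
        if record["email"] == email:
            return record
    return None


def build_db(records):
    """Same data in an in-memory SQLite table with an index on email."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    conn.execute("CREATE UNIQUE INDEX idx_users_email ON users (email)")
    conn.executemany("INSERT INTO users (id, email) VALUES (:id, :email)", records)
    conn.commit()
    return conn


def find_in_db(conn, email):
    row = conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchone()
    return None if row is None else {"id": row[0], "email": row[1]}


def query_plan(conn, email):
    """SQLite's description of how it will run the lookup (last column of each plan row)."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchall()
    return [row[-1] for row in rows]


users = [{"id": i, "email": f"user{i}@example.com"} for i in range(5000)]
conn = build_db(users)
print(find_in_db(conn, "user4999@example.com"))
print(query_plan(conn, "user4999@example.com"))
```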

Relational database management systems (RDBMS) tend to have the following characteristics:

Massive. They can store terabytes of data (usually much more than can fit into memory), using disk for storage since it’s much cheaper.

Persistent. Data outlives a script or application’s execution, and can be accessed multiple ways at the same time.

Multi User. There’s some sort of concurrency control that handles situations where multiple users read/write data at the same time.

Convenient Access. There are some high-level abstractions that make accessing and manipulating data much nicer for the application developer.

Efficient. Databases can usually run thousands of simple queries per second and can optimize complex lookups so they’re fast.
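Concurrency control in a relational database is built on transactions. A group of changes either all take effect or none do, and other users never see the group half-finished. Here’s a minimal sketch of the all-or-nothing part using Python’s bundled sqlite3 module. The `accounts` table and the `transfer` helper are hypothetical. A `CHECK` constraint makes an overdraft fail. Because both updates run inside one transaction, the credit that already happened is rolled back as well.

```python
import sqlite3


def make_accounts(balances):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:  # commits the inserts
        conn.executemany("INSERT INTO accounts VALUES (?, ?)", balances.items())
    return conn


def transfer(conn, src, dst, amount):
    """Move money between two accounts as a single transaction."""
    with conn:  # commits if the block succeeds, rolls back if anything raises
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        # If this debit would go negative, the CHECK constraint raises IntegrityError,
        # and the rollback also undoes the credit above.
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = make_accounts({"alice": 100, "bob": 0})
transfer(conn, "alice", "bob", 30)
try:
    transfer(conn, "alice", "bob", 500)  # alice only has 70 left
except sqlite3.IntegrityError:
    pass
print(balances(conn))
```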

Databases have also been around for a really long time, so there’s lots of information to help you if you get stuck.

Convinced databases are worth using? Next we’ll look at some of the more academic underpinnings of relational databases, and some of the “math” behind them.

Relational Databases & Your Objects

Unless you have a specific reason, the data in your application should be organized in a fairly “normalized” fashion, mapping the real world objects that your application uses into classes.

These are your models, and they make up the “M” in most MVC frameworks. Most frameworks will automatically take the models you define in your code and translate them directly into relations (i.e. tables).

Formally speaking, a database is a set of “relations” (or tables). Each relation has a set of “attributes” (or columns), and each attribute has a type — either something atomic (float, int, string, etc) or more structured (date, time, etc). Within the relations, each “tuple” (or row) has a value for each attribute. You can imagine it looks something like this:

The names in the quotes above are the more formal labels for each concept, but people usually refer to them using the names I included in parentheses. For the rest of the article, I’ll refer to them as tables, columns and rows.

The ORM in your favorite web framework is saving each instance of a model as a new row in that model’s table. So if you have a model called “Movies” and you’ve created 20 movies in your application, then you have a table called “movies” and it’s got 20 rows.

The attributes of your models are what make up the columns in the table. That’s why adding a new attribute to your models usually requires a database migration — the RDBMS needs to change the structure of the corresponding table in the database to accommodate the new column.

Each row also has a key, which is an attribute whose value is unique to that row. Keys are sometimes defined as a set of attributes whose values are unique when taken together.

Keys are important because they’re used to uniquely identify a row inside a table, and they’re also used to define relationships between rows in different tables.

This is why a lot of ORMs will perform searches by primary key by default — those lookups are fast (thanks to indexes, which we look at below) and guaranteed to return a single, unique result.
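To make tables, rows, and keys concrete, here’s a minimal sketch using Python’s built-in sqlite3 module with an in-memory database. SQLite is only an assumption for illustration (nothing above is tied to a particular RDBMS), and the movies table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE movies (
        id    INTEGER PRIMARY KEY,  -- the key: unique for every row
        title TEXT NOT NULL,
        year  INTEGER
    )
""")
# Each INSERT adds one row; SQLite assigns ids 1, 2, 3 automatically
conn.executemany(
    "INSERT INTO movies (title, year) VALUES (?, ?)",
    [("Alien", 1979), ("Heat", 1995), ("Aliens", 1986)],
)

# Looking up by primary key returns at most one row
row = conn.execute("SELECT id, title, year FROM movies WHERE id = ?", (2,)).fetchone()
print(row)  # (2, 'Heat', 1995)

# Reusing an existing key is rejected, because keys must be unique
try:
    conn.execute("INSERT INTO movies (id, title, year) VALUES (2, 'Duplicate', 2000)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```

An ORM does essentially this on your behalf: each saved model instance becomes one of these INSERTs, and a “get by id” becomes the primary-key SELECT.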

Relational Algebra

Relational algebra may sound scary, but it’s not really that math-heavy. If you want to understand relational databases, you should really understand the basics of relational algebra.

Relational algebra is a formal language (meaning it’s just high-level concepts) but it forms the underpinnings of implemented languages like SQL (which we’ll look at in a bit).

Here are the core topics you need to know from relational algebra:

Select (choose only certain rows)

Project (choose only certain columns)

Join (span relationships across tables)

… and a bunch of stuff from set theory (union, cross product, etc)

If you want to learn more about what each of these means, skip down to the “resources” section at the bottom.
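As a rough sketch of what select, project, and join do, here they are as plain Python functions over lists of dicts. This is an illustration under simplifying assumptions (rows as dicts, a natural join on identically named columns, made-up sample data), not how an RDBMS implements them. Every function takes relations and returns a relation, which is why they can be nested.

```python
# Toy relations: each is a list of rows, each row a dict of column -> value
movies = [
    {"id": 1, "title": "Alien", "director_id": 10, "year": 1979},
    {"id": 2, "title": "Heat", "director_id": 20, "year": 1995},
    {"id": 3, "title": "Aliens", "director_id": 30, "year": 1986},
]
directors = [
    {"director_id": 10, "name": "Ridley Scott"},
    {"director_id": 20, "name": "Michael Mann"},
    {"director_id": 30, "name": "James Cameron"},
]

def select(relation, predicate):
    """Select: keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, columns):
    """Project: keep only the named columns, dropping duplicate rows (relations are sets)."""
    result = []
    for row in relation:
        reduced = {c: row[c] for c in columns}
        if reduced not in result:
            result.append(reduced)
    return result

def natural_join(left, right):
    """Join: combine pairs of rows whose shared columns have equal values."""
    shared = set(left[0]) & set(right[0]) if left and right else set()
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[c] == r[c] for c in shared)
    ]

# "Titles and directors of movies made before 1990"
result = project(
    natural_join(select(movies, lambda m: m["year"] < 1990), directors),
    ["title", "name"],
)
print(result)
# [{'title': 'Alien', 'name': 'Ridley Scott'}, {'title': 'Aliens', 'name': 'James Cameron'}]
```

In SQL the same question would be a single SELECT with a WHERE clause and a JOIN; the database, not the programmer, decides in which order to apply the steps.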

Because each of these operations is so formally specified, databases can look at a given query and come up with the least costly way of performing that operation.

This is called algebraic optimization and is one of the major reasons you should use relational databases if you want to run complex queries.

Just as ((2*z) + (3*z) + 0) / 1 can be simplified to z * (2 + 3), making it two operations instead of five, a relational database looks at the query you give it and figures out the most efficient way to run it, without doing things like scanning the entire table multiple times.

Other data-processing systems (like MapReduce) leave those optimizations up to the programmer.

Another important concept of relational algebra is closure. Specifically, every operation that applies to a table also returns a table. This important property allows operations to be chained together and queries to be nested inside each other.

There’s “closure” in the sense that you can’t escape the system by performing operations. Just as doing arithmetic on numbers always returns another number (and not, say, letters), running an operation on a table always returns another table.
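Closure is what lets SQL nest one query inside another. In this minimal sketch (SQLite via Python’s sqlite3, with a hypothetical movies table), the inner SELECT produces a table, and the outer SELECT filters and projects it exactly as if it were a stored table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO movies (title, year) VALUES (?, ?)",
    [("Alien", 1979), ("Heat", 1995), ("Aliens", 1986), ("Titanic", 1997)],
)

# The inner SELECT returns a table (named "recent" here), so the outer
# SELECT can filter and project it just like a stored table
rows = conn.execute("""
    SELECT title
    FROM (SELECT title, year FROM movies WHERE year > 1980) AS recent
    WHERE year < 1996
    ORDER BY year
""").fetchall()
print(rows)  # [('Aliens',), ('Heat',)]
```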

Structured Query Language (SQL)

In a more hands-on sense, the language you actually use to talk to a relational database is some form of SQL, which stands for Structured Query Language.

SQL is a highly standardized query language that has been around since the early 1970s, making it one of the oldest technologies that modern web developers still use on a daily basis.

The official standard is thousands of pages long, but you can get started pretty simply. There are tons of great tutorials and references around the internet, so I won’t cover the details here. But the most commonly used SQL operations provide exactly what you need to build a CRUD app:

Create: INSERT INTO table ...;

Read: SELECT ... FROM table;

Update: UPDATE table SET ...;

Destroy: DELETE FROM table WHERE ...;
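Here’s a minimal sketch of all four CRUD operations against SQLite from Python (the table and data are hypothetical). The ? placeholders let the driver pass values separately from the SQL text, rather than building queries by string concatenation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")

# Create: add two rows (one with a wrong year, fixed below)
conn.execute("INSERT INTO movies (title, year) VALUES (?, ?)", ("Heat", 1994))
conn.execute("INSERT INTO movies (title, year) VALUES (?, ?)", ("Alien", 1979))

# Read
print(conn.execute("SELECT id, title, year FROM movies ORDER BY id").fetchall())
# [(1, 'Heat', 1994), (2, 'Alien', 1979)]

# Update: correct Heat's release year
conn.execute("UPDATE movies SET year = ? WHERE title = ?", (1995, "Heat"))

# Delete: the WHERE clause limits which rows are removed
conn.execute("DELETE FROM movies WHERE year < ?", (1990,))

rows = conn.execute("SELECT id, title, year FROM movies ORDER BY id").fetchall()
print(rows)  # [(1, 'Heat', 1995)]
```

Note that each statement says what should change, not how to find the affected rows; locating them is left to the database.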

SQL is a declarative language — as we’ve seen, you leave the how up to the database and simply tell it what you want.

By contrast, you’re probably used to building your applications using an imperative programming language, where you describe a sequence of commands you want the computer to take.

It takes a bit of getting used to at first, but it’s a really nice abstraction. You don’t have to describe how you want the RDBMS to scan the table and perform each operation. You just tell it what you want and let algebraic optimization take care of finding the most efficient way to get it done.

Indexes

Indexes are an incredibly powerful feature of relational databases. In fact, using indexes properly can often improve the performance of your queries by an order of magnitude.

Indexes allow you to “ask questions” (ie find rows) without having to scan the entire table on disk. They’re really useful for answering “needle in a haystack” types of queries, where you’d otherwise have to look through many rows to pick out only the few that you need.

An index is a data structure that’s stored persistently on disk, and is typically implemented as either a B-tree or a hash table.

Without an index, performing a SELECT ... FROM table WHERE ...; statement requires the RDBMS to scan through every single row in the table to see if it matches the WHERE clause. If you have more than a few hundred rows in a table, this takes a non-negligible amount of time.

But by adding an index to a column that you frequently run lookups on, you can decrease this amount of time substantially. Instead of having to scan n rows (where n is the total number of rows in the table), the database can answer your query by looking at either roughly log(n) rows (if indexes are implemented as a B-tree) or simply one row (if indexes are implemented as a hash table).
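To get a feel for n versus log(n) versus a single lookup, here’s a small pure-Python simulation. It’s only an analogy, assuming a sorted list searched with binary search as a stand-in for a B-tree (both take a logarithmic number of steps) and a Python set as a stand-in for a hash index; real indexes live on disk and are organized in pages.

```python
import random

# Hypothetical column of 100,000 distinct user IDs in no particular order (an unindexed table)
random.seed(0)
ids = random.sample(range(10_000_000), 100_000)
target = ids[-1]  # worst case for a scan: the match is the very last row

def full_scan(rows, value):
    """No index: check rows one by one, up to n comparisons."""
    comparisons = 0
    for row in rows:
        comparisons += 1
        if row == value:
            return True, comparisons
    return False, comparisons

def sorted_lookup(sorted_rows, value):
    """Sorted index (B-tree stand-in): binary search halves the range each step."""
    lo, hi, comparisons = 0, len(sorted_rows), 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_rows[mid] < value:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_rows) and sorted_rows[lo] == value
    return found, comparisons

scan_found, scan_cost = full_scan(ids, target)
index_found, index_cost = sorted_lookup(sorted(ids), target)
hash_index = set(ids)  # hash table stand-in: one hashed lookup on average
hash_found = target in hash_index

print(scan_found, scan_cost)    # True 100000
print(index_found, index_cost)  # True, with at most 17 comparisons
print(hash_found)               # True
```

Building the sorted list and the set costs extra memory, which mirrors the disk-space trade-off of real indexes.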

Once you’ve set an index on a particular column, you don’t need to do anything to make subsequent queries use it. The query planner in the RDBMS will automatically use the index if it needs to.
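You can watch the query planner make this choice. In this sketch, assuming SQLite (other databases have their own EXPLAIN commands), the same query is planned before and after creating a hypothetical idx_movies_year index; the query text itself never changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
query = "SELECT title FROM movies WHERE year = 1999"

def plan(sql):
    """Return SQLite's human-readable plan steps for a query (the last column of each row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(query)  # no index on year yet: a full table scan
conn.execute("CREATE INDEX idx_movies_year ON movies (year)")
after = plan(query)   # same query text; the planner now chooses the index by itself

print(before)
print(after)
```

Before the index exists, the plan describes a SCAN of movies; afterwards it describes a SEARCH using idx_movies_year. The exact wording of these strings varies between SQLite versions.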

While these are awesome benefits to using indexes, there are a few downsides to be aware of. Since the index is stored as a separate data structure from the rest of the data in your table, indexes take up additional space on disk.

There’s also the overhead of index creation and maintenance. Whenever a row is inserted, deleted, or has an indexed column updated, the index’s data structure must also be updated so that the index remains accurate.

If you’re willing to sacrifice a bit of performance on writes to get much better performance on reads, then indexes are a great tool to use.

Transactions & ACID Compliance

If you care about not losing your data, transactions are one of the most essential features you’ll find in a relational database.

Transactions solve two independent problems:

Multiple users accessing the same row at the same time.

Resilience to system crashes and failures.

A transaction is simply one or more operations that are treated as a single, atomic unit. Either the entire transaction happens, or none of it does. You are guaranteed to never have a partial completion.

A common example that’s used when talking about transactions is the notion of transferring funds from one bank account to another. You might have multiple operations such as debiting one account and crediting another, and it would be really bad if only some of those operations were successful while others failed. To be safe, all of those operations should be wrapped in a single transaction to ensure things stay consistent.

More formally, transactions are also used to guarantee that relational databases are “ACID compliant”. The four ACID properties are the core tenets of reliability for relational databases:

Atomicity. If there’s a crash or failure during a transaction before it’s been fully committed, the database knows how to roll back to the way things were before the transaction. Nothing is left half done.

Consistency. A query can’t violate integrity constraints (ie writing a string to an integer field or leaving a NOT NULL field blank) or leave the database in an inconsistent state.

Isolation. Changes made in one transaction aren’t visible to any other transaction until the first one is completely finished. It must appear to each client as if transactions ran one at a time, in some serial order.

Durability. Changes that have been committed are stored permanently, and survive even if the system crashes right afterward.

These may seem like obvious features, but there’s a lot that goes on under the hood to ensure these conditions are always met.

For example, the database will lock certain rows or tables when updates are being performed to ensure multiple clients aren’t writing to the same rows at the same time. But those locks are granular enough that other parts of the database can still be updated so that multiple, unrelated queries can be processed at the same time.

To force a series of operations to behave as a single transaction, you wrap them in BEGIN ... COMMIT. Some databases run in “autocommit” mode by default, where each individual statement is automatically wrapped in its own transaction.
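Here’s the bank-transfer example as a minimal sketch, assuming SQLite through Python’s sqlite3 module with its implicit transaction handling turned off, so that BEGIN, COMMIT, and ROLLBACK are issued explicitly. The accounts table and its CHECK (balance >= 0) rule are hypothetical. A CHECK constraint is used rather than a type mismatch because SQLite, which uses type affinity, doesn’t reject a string written to an INTEGER column by default.

```python
import sqlite3

# isolation_level=None: the sqlite3 module won't open transactions for us,
# so BEGIN / COMMIT / ROLLBACK are issued explicitly below
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("""
    CREATE TABLE accounts (
        name    TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- hypothetical integrity rule
    )
""")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")

def transfer(src, dst, amount):
    """Credit dst and debit src as one atomic unit: both happen, or neither does."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # undo the credit that already ran
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

first = transfer("alice", "bob", 30)    # succeeds
second = transfer("alice", "bob", 500)  # debit would go negative, so everything is rolled back
print(first, second, balances())  # True False {'alice': 70, 'bob': 80}
```

The credit to bob runs first, so when the debit from alice violates the CHECK constraint the transaction is already half done; the ROLLBACK undoes the credit and both balances end up exactly where they were.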

Downsides of Relational Databases

While relational databases come with a ton of powerful, battle-tested features out of the box, there are definitely some downsides to using them that are important to be aware of.

The most common complaint you hear about today is that relational databases “don’t scale” meaning they’re not really designed to be run across multiple machines in a cluster. Relational databases were developed in an era where data could easily fit on one hard drive and queries rarely took more than a few seconds. They’re not designed to work with huge amounts of data or queries that can run for hours and days.

If you’re really pushing the limits of what your database can store, you can try “sharding” your data: splitting it across separate databases on multiple machines based on some shard key. But sharding is very difficult to implement, and hard to change once you’ve set it up. If your data and/or query load doesn’t fit on one machine, you might need to use something other than an RDBMS.
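A minimal sketch of hash-based shard routing, assuming string shard keys and a hypothetical cluster of 4 database servers. It also shows one reason sharding is hard to change: with simple modulo routing, adding a fifth server reassigns most keys.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard number with a stable hash.

    Python's built-in hash() of a str is randomized per process, so it can't be used here.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

user_ids = [f"user-{i}" for i in range(10_000)]

# Route every user to one of 4 hypothetical database servers
before = {uid: shard_for(uid, 4) for uid in user_ids}
# Adding a fifth server changes the modulus...
after = {uid: shard_for(uid, 5) for uid in user_ids}
# ...so most rows would have to migrate to a different machine
moved = sum(before[uid] != after[uid] for uid in user_ids) / len(user_ids)
print(f"{moved:.0%} of rows would move")
```

Roughly 80% of keys move, because a key only stays put when its hash leaves the same remainder mod 4 and mod 5.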

Relational databases also have somewhat poor fault tolerance. While they have guarantees to ensure your data won’t become corrupted even in the event of hardware failure, they also require that you re-run any failed queries and simply try again. But if you’re trying to run long operations or use multiple machines, the likelihood of a failure mid-operation becomes nearly 100% and you won’t get much done if you need to keep restarting things.

Finally, while relational databases are good about enforcing integrity constraints, that doesn’t always make sense for every application. The schemas in a relational database are enforced when the data enters the RDBMS, making it impossible to store “dirty” data that still needs to be cleaned or processed in some way.

Resources

If you want to learn more about relational databases, I’d highly recommend checking out the following (free) video resources:

The "Ultimate Guide to Web Scraping" is Now Available

https://blog.hartleybrody.com/web-scraping-guide/

Mon, 05 Aug 2013

I wrote an article on web scraping last winter that has since been viewed almost 100,000 times. Clearly there are people who want to learn about this stuff, so I decided I’d write a book.

No prior knowledge of web scraping is necessary to follow along — the book is designed to walk you from beginner to expert, honing your skills and helping you become a master craftsman in the art of web scraping.

The book talks about the reasons why web scraping is a valid way to harvest information — despite common complaints. It also examines various ways that information is sent from a website to your computer, and how you can intercept and parse it. We’ll also look at common traps and anti-scraping tactics and how you might be able to thwart them.

There are code samples in both Ruby and Python (I had to learn Ruby just so I could write them!). If anyone’s willing to translate the sample code into PHP or JavaScript, I’ll give you a free copy of the book. Get in touch.

—

Check out the table of contents:

Introduction to Web Scraping

Web Scraping as a Legitimate Data Collection Tool

Understand Web Technologies: A Brief Introduction to HTTP and the DOM

Finding The Data: Discovering Your “API”

Extracting the Data: Finding Structure in an HTML Document

Sample Code to Get You Started

Avoiding Common Scraping Traps

Being a Good Web Scraping Citizen

As a special deal for my blog subscribers, get 20% off with the code BLOGSUB. That coupon code is only good for a limited time, so order your copy today!

How HTTPS Secures Connections: What Every Web Dev Should Know

https://blog.hartleybrody.com/https-certificates/

Thu, 25 Jul 2013

How does HTTPS actually work? That was the question I set out to solve a few days ago for a project at work.

As a web developer, I knew that using HTTPS to protect users’ sensitive data was A Very Good Idea, but I didn’t have much understanding about how it actually worked.

How was data protected? How can a client and server create a secure connection if someone is already listening in on the wire? What is a security certificate and why do I need to pay someone to get one?

A Series of Tubes

Before we dive into how it all works, let’s talk briefly about why it’s important to secure connections in the first place, and what sorts of things HTTPS guards against.

When you make a request to visit your favorite website, that request must pass through many different networks, any of which could be used to potentially eavesdrop on or tamper with your connection.

From your own computer to other machines on your local network, to the access point itself, through routers and switches all the way to the ISP and through the backbone providers, there are a lot of different organizations who ferry a request along. If a malicious user got into any one of those systems, they would have the potential to see what’s traveling through the wire.

But sometimes, as the developer of a web application, you know that sensitive information like passwords or credit card data will be going over the connection, so it’s necessary to take extra precautions against snooping on those pages.

Transport Layer Security (TLS)

We’re about to dive into the world of cryptography, but you shouldn’t need much experience to keep up. We’ll really only be scratching the surface.

Cryptography is the practice of securing communications against potential adversaries — people who might want to interfere with the communication, or just listen in.

TLS — the successor to SSL — is a protocol that’s most often used to implement secure HTTP connections (ie HTTPS). TLS sits at a lower level in the OSI model than HTTP, which is basically a fancy way of saying that, during a web request the TLS connection stuff happens before the HTTP connection stuff.

TLS is a hybrid cryptographic system, meaning it makes use of multiple crypto paradigms, both of which we’ll look at next:

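Public Key Cryptography for shared secret generation and authentication (making sure you are who you say you are).

Symmetric Key Cryptography for encrypting the actual data, once a shared secret has been established.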

Public Key Encryption

Public key encryption is a type of cryptographic system where each party has both a private and a public key, which are mathematically linked to each other. The public key is used for encrypting plaintext to “ciphertext” (essentially, gibberish), while the private key is used for decrypting that gibberish back into plaintext.

Once a message has been encrypted by a public key, it can only be decrypted with the corresponding private key. Neither key can perform both functions by itself. The public key can be published freely without compromising the security of the system, but the private key must not be revealed to anyone who isn’t authorized to decrypt messages. Hence the names, public and private.

One of the cool benefits of public key cryptography is that two parties with no prior knowledge of each other can create a secure connection while initially communicating over an open, insecure connection.

The client and the server can both use their own private keys — along with some shared, public information — to agree upon a shared secret key for the session.

That means that even if someone is sitting in between the client and server and watches the connection happen, they still can’t determine the private keys of either the client or the server, or the secret key for the session.

How is this possible? Math!

Diffie-Hellman
One of the most common ways this exchange is performed is with a Diffie-Hellman key exchange. This process allows the client and server to agree upon a shared secret, without having to transmit that secret over the connection. Again, snoopers can’t determine the shared secret even if they’re watching every packet on the connection.

Once the initial DH exchange takes place, the resulting shared secret can be used to encrypt further communications in that session using a much simpler symmetric key encryption, which we’ll look at in a bit.

A Bit of Math…
The math behind it is actually fairly simple to calculate one way, but essentially impossible to reverse. This is where the importance of having really large prime numbers comes into play.

If Alice and Bob are two parties performing a DH key exchange, they start by agreeing on a root (generally a small number, like 2, 3 or 5) and a large prime (300+ digits), both of which can be sent in the clear without compromising the security of the exchange.

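Remember, Alice and Bob each have their own private key (100+ digits) that should never be shared, either between them or with anyone else. What each of them sends publicly over the network is a mixture of their own private key plus the root and the prime. Specifically:

$$\text{Alice's mixture} = \text{root}^{\,\text{Alice's secret}} \bmod \text{prime}$$

$$\text{Bob's mixture} = \text{root}^{\,\text{Bob's secret}} \bmod \text{prime}$$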

So Alice creates her mixture using the agreed upon constants (root and prime) plus her private key, and Bob does the same. Once they’ve received each other’s mixture, they then perform some more math to derive the shared secret for the session. Specifically:

Alice calculates:

$$(\text{Bob's mixture})^{\text{Alice's secret}} \bmod \text{prime}$$

Bob calculates:

$$(\text{Alice's mixture})^{\text{Bob's secret}} \bmod \text{prime}$$

This calculation generates the same number for both Alice and Bob: each side ends up with the root raised to the product of the two secrets (mod prime), just computed in a different order. That number becomes the shared secret for this session. Note that neither party had to send their private key to the other one, and the resulting shared secret was also never sent over the connection. Brilliant!
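Here’s the exchange as a minimal Python sketch using the built-in three-argument pow for modular exponentiation. The first part uses tiny toy numbers so you can check the arithmetic by hand; the second uses random secrets and the Mersenne prime 2**127 - 1, which is still far too small and unvetted for real use. Real TLS implementations use much larger, carefully chosen groups.

```python
import secrets

# Toy numbers small enough to check by hand (hopelessly insecure)
prime, root = 23, 5
alice_secret, bob_secret = 6, 15

alice_mixture = pow(root, alice_secret, prime)  # sent in the clear
bob_mixture = pow(root, bob_secret, prime)      # sent in the clear

alice_shared = pow(bob_mixture, alice_secret, prime)  # Alice's calculation
bob_shared = pow(alice_mixture, bob_secret, prime)    # Bob's calculation
print(alice_mixture, bob_mixture, alice_shared, bob_shared)  # 8 19 2 2

# Same exchange with random secrets and a bigger (still toy) prime
big_prime = 2**127 - 1
a = secrets.randbelow(big_prime - 2) + 1
b = secrets.randbelow(big_prime - 2) + 1
big_alice_mixture = pow(root, a, big_prime)
big_bob_mixture = pow(root, b, big_prime)
print(pow(big_bob_mixture, a, big_prime) == pow(big_alice_mixture, b, big_prime))  # True
```

Only the mixtures (8 and 19 in the toy case) ever cross the wire; recovering a secret from them means solving a discrete logarithm, which is what becomes infeasible at realistic sizes.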

For those who are less math-inclined, the Wikipedia article on Diffie-Hellman has a great illustration that uses mixing colors.

Notice how the starting color (yellow) ends up getting “mixed” with both Alice’s color and Bob’s color. That’s how it ends up being the same for both parties at the end. The only thing that’s sent over the connection is the half-way-done mixture which is meaningless to anyone watching the connection.

Symmetric Key Encryption

This public key exchange only needs to happen once per session, the first time the client and server connect. Once they’ve agreed on a shared secret, the client and server communicate using a symmetric-key crypto system, which is much more efficient: symmetric ciphers are far cheaper to compute than public key operations, and there’s no need for an extra round-trip to negotiate keys on each exchange.

With the shared secret they agreed upon earlier, plus an agreed-upon cipher suite (essentially a collection of encryption algorithms), the client and server can now communicate securely, encrypting and decrypting each others’ messages using the shared secret, with a snooper just seeing gibberish going back and forth.
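To illustrate the symmetric idea (the same key both encrypts and decrypts), here’s a deliberately toy stream cipher built from SHA-256 in counter mode. It’s a sketch under loud assumptions, not what TLS does: real cipher suites use vetted algorithms such as AES-GCM or ChaCha20-Poly1305, which also detect tampering. The shared secret and the per-message nonce are hypothetical values.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key + nonce + counter) blocks, concatenated and truncated."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call both encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, nonce, len(data))))

shared_secret = 2  # hypothetical result of a Diffie-Hellman exchange
key = hashlib.sha256(shared_secret.to_bytes(32, "big")).digest()
nonce = b"msg-0001"  # hypothetical per-message value; never reuse one with the same key

message = b"password=hunter2"
ciphertext = xor_cipher(key, nonce, message)    # what a snooper sees
plaintext = xor_cipher(key, nonce, ciphertext)  # the other side, holding the same key
print(ciphertext.hex())
print(plaintext)  # b'password=hunter2'
```

Both sides derive the same key from the shared secret, so no key ever needs to travel over the connection after the initial exchange.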

Authentication

The Diffie-Hellman key exchange allows two parties to create a private, shared secret. But how do the two parties know they’re talking to the correct entity? We haven’t talked about authentication yet.

What if I picked up the phone and called my friend and we performed a Diffie-Hellman key exchange, but it turns out my call was intercepted and I was actually talking to someone else? I’d still be able to communicate securely with that person — no one else would be able to decode our communication once we negotiated the shared secret — but they’re not who I thought I would be talking to. That’s not very secure!

To solve the authentication problem, we need a Public Key Infrastructure to make sure that entities are who they say they are. These infrastructures are set up to create, manage, distribute and revoke signed certificates. Certificates are those annoying things you have to pay for in order to serve your site over HTTPS.

But what exactly is a certificate, and how does it make things more secure?

Certificates

At a high level: a public key certificate is a file that uses a digital signature (more on that in a minute) to bind a machine’s public key with an identity. The digital signature on the certificate is someone vouching for the fact that a particular public key belongs to a particular individual or organization.

Certificates essentially associate domain names (the identities) with a particular public key. This prevents a snooper from presenting their own public key, pretending to be the server a client is trying to reach.

In the phone call example above, the attacker could try presenting his public key, pretending he’s my friend — but the signature on that certificate wouldn’t be from someone I trusted.

In order to be trusted by the average web browser, certificates have to be signed by a trusted Certificate Authority (CA). CAs are companies that perform manual inspection and review, to make sure that the applying entity is both:

a real person or business that exists in the public record

in control of the domain they’re applying for a signed certificate for

Once the CA verifies that the applicant is real and really owns the domain, the CA will “sign” the site’s certificate, essentially putting their stamp of approval on the fact that this site’s public key really belongs to them and should be trusted.

Your browser comes preloaded with a list of trusted CAs. If a server returns a certificate that isn’t signed by a trusted CA, the browser will flash a big red warning. Otherwise anyone could go around “signing” bogus certificates. There needs to be a layer of trust in the system.

So even if an attacker were to take their machine’s own public key and generate a certificate saying that public key was associated with facebook.com, a browser wouldn’t trust it since that certificate isn’t signed by a trusted CA.
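Under the hood, the CA’s stamp is a digital signature: the CA signs a hash of the certificate with its private key, and the browser checks it with the CA’s public key. Here’s a toy textbook-RSA sketch of that idea. The key built from two Mersenne primes is far too small, and real certificates use X.509 encoding and padded signature schemes, so treat the certificate bytes and key sizes as hypothetical.

```python
import hashlib

# Hypothetical CA key pair from two known Mersenne primes (toy-sized, textbook RSA)
p, q = 2**61 - 1, 2**89 - 1
n = p * q
e = 65537                           # CA's public exponent, shipped with the browser
d = pow(e, -1, (p - 1) * (q - 1))   # CA's private exponent (needs Python 3.8+)

def digest(cert: bytes) -> int:
    """Hash the certificate contents down to a number smaller than n."""
    return int.from_bytes(hashlib.sha256(cert).digest(), "big") % n

def ca_sign(cert: bytes) -> int:
    """Only the holder of d can produce this number."""
    return pow(digest(cert), d, n)

def browser_verifies(cert: bytes, signature: int) -> bool:
    """Anyone with the CA's public key (n, e) can check the signature."""
    return pow(signature, e, n) == digest(cert)

real_cert = b"domain=facebook.com; public_key=<site key>"
signature = ca_sign(real_cert)

forged_cert = b"domain=facebook.com; public_key=<attacker key>"
print(browser_verifies(real_cert, signature))    # True
print(browser_verifies(forged_cert, signature))  # False
```

The forged certificate reuses the genuine signature, but its hash no longer matches, so verification fails; producing a valid signature for it would require the CA’s private exponent d.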

Other Things to Know About Certificates

When granting an extended validation certificate, the CAs must do even more checking into the identity of the entity who owns the domain (usually requiring passport or utility bills).

This type of certificate turns the browser bar green, in addition to showing the usual padlock icon.

Serving Multiple Websites from the Same Server
Because the TLS handshake occurs before the HTTP connection begins, there can be problems if there are multiple websites hosted on the same server, at the same IP address.

Name-based virtual host routing happens in the web server, but the TLS handshake happens before the connection reaches that point. The single certificate for that system needs to be sent on requests to any of the sites hosted on that machine, which can create problems for shared hosting environments.

If you’re using a web hosting company, they’ll usually require that you purchase a dedicated IP address before you can get HTTPS set up for your website. Otherwise they’d constantly need to get new certificates (and get them re-verified by CAs) every time a site on that machine updated.

—

Wikipedia is a great resource for this stuff, and this Coursera course looks especially interesting. Thanks to the guys in the security.stackexchange.com chat room for answering some of my questions this morning.

Are You A Hacker, Developer or Engineer? (And Why it Matters)

https://blog.hartleybrody.com/hacker-developer-engineer/

Sun, 23 Jun 2013

In the software world, the terms “developer” and “engineer” are often used interchangeably to mean “someone who builds things with code.”

Sometimes the word “hacker” gets thrown into the mix if the company is a startup or is trying to make an open job position sound more enticing.

But what does it mean for someone to be a “developer” versus an “engineer”? Does it matter? If you’re trying to “level up” or make a career out of writing code, I’d say it matters a lot.

But he begins the presentation by describing his path to becoming a front-end engineer, and each of the steps most people follow along the way. In the process, he offers a nice definition of “hacker,” “developer” and “engineer” that highlights their differences quite nicely:

A hacker can come up with solutions, but maybe they can’t look back after they’ve finished and realize how they came up with the solution. They just kinda poke at things until they get something that works.
…
At some point, you level up and become a developer and a developer understands best practices. They’ve heard other developers say things like “you should put your scripts at the bottom of the webpage” … and you use those best practices to craft solutions but you don’t really understand beneath the best practices, beneath the abstractions.
…
An engineer is someone who can get things done, craft a solution — they understand the best practices, but they also understand why they’re using the best practices that they are … [they] move into an understanding of the platform as a whole.

The labels themselves aren’t as important as the degree of awareness and understanding that each new “level” brings.

From Hacker to Engineer

Listening to David’s explanation, I was immediately reminded of my own progression as a web developer.

When I first got started, I used MySQL as my database because that’s what all the tutorials said to use and I had no idea what the other options were. I copy/pasted some code that I found online and built my own tiny blogging engine in PHP. I had no idea what I was doing, but it worked for my use cases and that was good enough.

About a year ago, I started learning MongoDB because that seemed to be all the rage on Hacker News. Surely if others are using it, it must be really good. I read about indexing and normalization because people seemed to be talking about those a lot and I wanted to learn all of the best practices.

Now, I’m taking a Coursera course on Data Science and another called Introduction to Databases and am starting to understand the why. Why were relational databases created in the first place, what problems did they solve? Why would you choose a schema-less database versus a relational one? Why do indexes make your queries faster, and what are the drawbacks to using them?

Moving Through the Stages

Not only does the degree of understanding change across the levels, but so does the type of learning that takes place. As a hacker, you can play with other people’s code, copy/pasting it around and making tweaks until things seem to work well enough. When you hit a road block you Google it, then move on to the next one.

As a developer, you need to start paying more attention to the community. Subscribe to blogs or Twitter accounts that talk about technologies you’re interested in so that you learn best practices and see how others are approaching similar problems. Sites like Hacker News or technology-specific subreddits can be good for seeing which articles are the most valuable in the community. Your learning is fairly passive and you follow the lead of others.

But you can’t really expect to become an engineer by reading the most popular blog posts in software communities. You need to start learning about things that you didn’t know you didn’t know. Here’s where reading a book or taking a full-fledged academic course really becomes essential. They usually offer the breadth to survey the whole field and give you perspective, showing you the how and why.

—

David’s definitions really crystallized these stages for me, and I think it’s important that everyone who aspires to make a living writing code is aware of them, and is realistic about where they fall on the spectrum from hacker to engineer.