Another that’s been out for a while is Amazon’s data warehouse offering, Redshift.

1. Old-fashioned SQL interface

Ok, yes Redshift can support petabyte databases and this in itself is staggering to consider. But just after you digest that little fact, you’ll probably discover that it’s SQL compatible.

This is a godsend. It means the platform can leverage all of the analytical tools already in the marketplace, ones that your organization is already familiar with. Many are already certified on Redshift, such as Looker and Chartio.

I’m in the process of evaluating FlyData Sync. This is a service-based solution which connects to your Amazon RDS for MySQL instance, capturing binlog data much like Oracle’s GoldenGate does, and ships it across to Redshift for you.

If you have constantly changing data, this may be ideal, since the basic COPY command approach implies a one-shot data load rather than a continuous sync.

3. Very fast or very big nodes

There are essentially two types of compute nodes for Redshift. DW2 nodes are dense compute, running on SSD. As we all know, these are very fast solid state drives, and they bring huge disk I/O benefits. Perfect for a data warehouse. They cost about $1.50/Tb per hour.

The second type is DW1 or so-called dense storage nodes. These can scale up to a petabyte of storage. They are running on traditional storage disks so aren’t SSD fast. They’re also around $0.50/Tb per year. So a lot cheaper.

Amazon recommends that if you have less than 1Tb of data, you go with Dense Compute or DW2. That makes sense as you get SSD speed right out of the gate.

4. distkeys, sortkeys & compression

The nice thing about NoSQL databases is you don’t have to jump through all the hoops trying to shard your data with a traditional database like MySQL. That’s because distribution is supported right out of the box.

When you create tables you’ll choose a distkey. You can only have one on a table, so be sure it’s the column you join on most often. A timestamp field, or user_id, perhaps would make sense. You’ll choose a diststyle as well. ALL means keep an entire copy of the table on each node, KEY means distribute rows based on the distkey, and EVEN, the default, means spread rows evenly across the cluster in round-robin fashion.

RedShift also has sortkeys. You can have more than one of these on your table, and they are something like b-tree indexes. They order values, and speed up sorting.
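To make the distkey, diststyle and sortkey choices concrete, here is a minimal sketch in Python that assembles a CREATE TABLE statement as a string. The function name, the `events` table and its columns are all hypothetical, invented for illustration. It only builds text; running it against a cluster is left to whatever client you already use.

```python
def create_table_ddl(table, columns, distkey=None, sortkeys=(), diststyle=None):
    """Build a Redshift-style CREATE TABLE statement as a string.

    columns is a list of (name, sql_type) pairs. Table and column names
    are illustrative and are not validated or quoted here.
    """
    # Default to KEY when a distkey is given, otherwise EVEN.
    if diststyle is None:
        diststyle = "KEY" if distkey else "EVEN"
    diststyle = diststyle.upper()
    if diststyle not in ("ALL", "KEY", "EVEN"):
        raise ValueError("diststyle must be ALL, KEY or EVEN")
    # A distkey is required for, and only makes sense with, DISTSTYLE KEY.
    if (diststyle == "KEY") != (distkey is not None):
        raise ValueError("distkey and DISTSTYLE KEY must be used together")

    names = [name for name, _ in columns]
    for key in ([distkey] if distkey else []) + list(sortkeys):
        if key not in names:
            raise ValueError(f"unknown column: {key}")

    cols = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    parts = [f"CREATE TABLE {table} (\n{cols}\n)", f"DISTSTYLE {diststyle}"]
    if distkey:
        parts.append(f"DISTKEY ({distkey})")  # only one distkey per table
    if sortkeys:
        parts.append(f"SORTKEY ({', '.join(sortkeys)})")  # several allowed
    return "\n".join(parts) + ";"


if __name__ == "__main__":
    print(create_table_ddl(
        "events",
        [("user_id", "BIGINT"), ("created_at", "TIMESTAMP"), ("action", "VARCHAR(64)")],
        distkey="user_id",
        sortkeys=["created_at", "user_id"],
    ))
```

Here user_id is the distkey because it is the assumed join column, while created_at leads the sortkey on the assumption that most queries filter or order by time.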

5. Compression, defragmentation & constraints

Being a columnar database, Redshift also supports column encodings or compression. LZO is often used for varchar columns, and bytedict and runlength are also common. One way to determine these is to load a sample of data, say 100,000 rows. From there you can run ANALYZE COMPRESSION on the table, and Redshift will make recommendations.

A much easier way however, is to use the COPY command with COMPUPDATE ON. During the initial load, this will tell RedShift to analyze data as it is loaded and set the column compression types. This is by far the most streamlined approach.

Redshift also supports table constraints, however they don’t restrict data. Sounds useless, right? Except they do inform the optimizer. What’s that mean? If you know you have a primary key id column, tell Redshift about it. No, it won’t enforce that, but since your source database does, you’re able to pass along that information to Redshift for optimizing queries.

You’ll also find some of the defragmentation options from Oracle & MySQL present in Redshift. There is vacuum which reorganizes the table & resets the high water mark, while it is still online for updates. And then there is Deep Copy which is more thorough, but takes the table offline to do it. It’s faster, but locks the table.
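For reference, here is a small Python sketch of the statements involved in each option. The deep copy shown uses the CREATE TABLE ... LIKE variant, which is only one of several ways to do it. The function names and the `_copy` suffix are my own illustrative choices.

```python
def vacuum_sql(table):
    """Reclaim space and re-sort rows while the table stays online."""
    return f"VACUUM {table};"


def deep_copy_sql(table, suffix="_copy"):
    """Return the statements for a deep copy via CREATE TABLE ... LIKE.

    The table is recreated and bulk-loaded, then swapped in by name.
    Avoid writing to the original table while these statements run.
    """
    new_table = f"{table}{suffix}"
    return [
        f"CREATE TABLE {new_table} (LIKE {table});",
        f"INSERT INTO {new_table} (SELECT * FROM {table});",
        f"DROP TABLE {table};",
        f"ALTER TABLE {new_table} RENAME TO {table};",
    ]


if __name__ == "__main__":
    print(vacuum_sql("events"))
    for statement in deep_copy_sql("events"):
        print(statement)
```

The order matters: the original table is dropped only after the copy is fully loaded, and the rename happens last so queries find the table under its usual name again.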

1. Big availability gains

One of the big improvements that Aurora seems to offer is around availability. You can replicate with Aurora’s own replication, or alternatively with MySQL binlog-type replication as well. They’re also duplicating data two times in each of three different availability zones, for six copies of data.

All this is done over their SSD storage network which means it’ll be very fast indeed.

2. SSD means 5x faster

The Amazon RDS Aurora FAQ claims it’ll be 5x faster than MySQL on equivalent hardware, by making use of its proprietary SSD storage network. This will be a welcome feature to anyone already running on MySQL or MySQL for RDS.

3. Failover automation

Unplanned failover takes just a few minutes. Here customers will really be benefiting from the automation that Amazon has built around this process. Existing customers can do all of this of course, but typically require operations teams to anticipate & script the necessary steps.

4. Incremental backups & recovery

The new Aurora supports incremental backups & point-in-time recovery. This is traditionally a fairly manual process. In my experience MySQL customers are either unaware of the feature, or not interested in using it due to complexity. Restore last night’s backup and we avoid the hassle.

5. Warm restarts

RDS Aurora separates the buffer cache from the MySQL process. Amazon has probably accomplished this by some recoding of the stock MySQL kernel. What that means is this cache can survive a restart. Your database will then start with a warm cache, avoiding any service brownout.

I would expect this is a feature that looks great on paper, but one customers will rarely benefit from.

Application infrastructure is not something we learned in my college, and it’s definitely not something I will learn anytime soon in my current job (I work as a mobile developer for a mid-sized startup). I also think it’s not something you can just goof around with on your own computer. Do companies prepare their software engineers when hiring infrastructure engineers, or do they all expect you to know your skills and tools?
For example, my guess is that Facebook has a huge infrastructure team making the site usable and fast for as many people as possible. Where can you learn those skills, or get prepared for that type of job? Do you think it is possible to self-learn those skills?

Here’s my take on some of this. Since the invention of Linux, experimenting with infrastructure has been within reach. In the present day there are some even better reasons to experiment & teach yourself about this important aspect of devops & backend server management.

Early Linux circa 1992

Before Linux (in the 80s we’re talking about) it was a lot harder. Into the 90s Linux came on the scene and you could cobble together parts, video, motherboard, memory, IDE or SCSI bus & disks & build a 486 tower. You could then start building Linux. I mean because of course everything had to be hand rolled (compiled by hand & debugged usually)!

Present day virtualization

What to learn

Start learning Vagrant. It automates the provisioning of virtual machines on your own desktop. You can boot those Linux boxes to your heart’s content, network between them, hack them, run services on them, build your skills.

I’d also recommend digging into Docker. It is the lightning-fast younger brother to virtualization.

Why I write about hiring

I’ve worked as a consultant for almost twenty years. Technology & professional services are pretty far removed from hiring, so why would I write about it?

As it turns out, finding projects, working with clients, and selling your skills & solutions has quite a lot in common with hiring.

As a services consultant, you’re more often a peer to technology directors & CTOs, while hiring for traditional roles is more of a boss-employee relationship.

Recruiters

I’ve run into a lot of recruiters & HR folks over the years. Usually it means I’m talking to the wrong folks, as they’re gatekeepers & not decision makers. I wrote Why I don’t work with recruiters after some ups & downs.

Still they’re all a fact of life, and each of us has a role to play. So let’s play fair!

Games

I’ve always wondered, Is hiring a numbers game? That is, does it bend more to persistence & throwing spaghetti at the wall, or to deliberate, precision searches?

MySQL interview

These are helpful not just to candidates, but to hiring managers, HR, recruiters & everyone in between.

Mythical talent

For as far back as I can remember, DBAs have been in short supply. In the 90s I was doing primarily Oracle work. There were never enough technical DBAs. Many came from business backgrounds, and didn’t have operating system & hardware fundamentals.

Costs

A stack of…

These days the full stack of an internet or mobile startup involves a lot of varied components, from Chef, Puppet & Ansible, to Nginx, HAProxy, Redis, Solr and some database like MySQL or Postgres on the relational end of the spectrum, or MongoDB, HBase or Cassandra on the NoSQL side. What type of challenges does this pose to a team? I’m curious, Do startups assemble at their own risk?

Let things fail

Young founders

I worked at one startup with a CTO just out of college. Although they were flush with cash & had real problems scaling, communication problems ultimately soured the engagement. Are you too young to be a boss?

80 million fix

Sometimes fixing serious performance bottlenecks can get a site back up on its feet. In this success story they went on to get acquired weeks after the fix. In tongue-in-cheek fashion I ask Where’s my 80 million dollars?

I wonder if I can blog about devops without first level setting on what the term means. Yes I’ll agree it’s used broadly, sometimes as a buzzword, sometimes as a catch-all phrase. Luckily I already wrote a post like that… What is devops and why is it important?

Fear of automation

There’s a lot of automation happening in the cloud. A lot more configuration management (Chef, Puppet, Ansible) is in use. I’ve seen some platform as a service companies (Heroku & EngineYard are examples of these) argue that you can now spend more on devs. You won’t need an operations staff. This raises the question Is automation killing old-school ops?

NoSQL taking over…

If you look left some startup is building on MongoDB, and look right and another is building on Cassandra. It makes you wonder, Are SQL databases dead?

Death of MySQL?

While we’re on the topic of relational databases, it’s been six years since Oracle’s purchase of Sun Microsystems. Some are still worried, Will Oracle kill MySQL?

Db operations

On resistance

Another week, another war story. Sometimes the job of an op, systems administrator or DBA is actually to say “no”. In this story the CTO was shouting, and tons of money was being lost every minute. Supposedly. So I wrote Does a devop need to practice the art of resistance?

Perspectives & mandates

Ops & devs look at the world in different ways. I argue that’s because the business asks them to do very different things. Devs are tasked with bringing change, through new code & product features. Ops are tasked with continuity, stability, uptime & performance. That often means resistance to change. So I wonder Does a four letter word divide dev & ops?

Database as a service?

You’re looking at Amazon Web Services, and wondering, should I use their RDS database service or build my own MySQL? Here are 10 use cases for RDS or MySQL.

So here’s a peek into the archives, at some of the very best of Scalable Startups. Enjoy!

1. I blog about consulting

When you spend years doing consulting, professional services & freelance work, you learn all sorts of things. You stumble, you find yourself in unfamiliar territory, you learn. All that makes great fodder for blogging about business, and war stories. So here’s some of my best writing on the topic.

I had one experience where a prospect was still on the fence. That may be positive spin, as the title was When prospects mislead. It turned out to be more a case of free consulting advice than anything else.

At networking events, I meet other freelancers and consultants. There’s always debate about this topic, so I wrote Why I ask clients for a deposit. There are reasons for both client and consultant, and I touch on the lessons I’ve learned.

It might seem strange that I’d write a post titled Why I can’t raise the bar at every firm, but there are prospects that aren’t the right fit for me. Here are some of the pre-qualifying questions on both sides of the fence.

You’re ready to hire a consultant. What’s next? As it turns out, professional services is more a peer relationship with CEOs, CTOs & managers. So the typical “send me your resume” and so forth may not be best. Here are 5 conversational ways to evaluate consultants that provide an alternate approach to finding the best services.

One of the hardest things for engineers can be sales. Along the way to consulting success, I wrote Can an engineer learn to love sales? Eventually it’s a skill that you have to improve at, if you want to stay in business for yourself.

Not every consulting engagement is about your own triumphs. The conclusion isn’t always the wonderful things you’ve done for the firm. I wrote When you have to take the fall after an engagement where it wasn’t a celebration at the end.

1. Dizzying array of technologies in use

I’ve been working with startups since the mid-nineties. In those days most application stacks consisted of a PHP application running on Apache, with Oracle on the backend. Both webserver & db ran on Sun Solaris. Hardware was reliable. Most attention was focused on fitting everything in memory, and monitoring the servers for swapping, and disk failure. Boy have those days changed.

I see dozens of startups each year, so I see a lot of very cutting edge environments. Here’s a peek at what I’m seeing these days:

Database: MySQL, Postgres & Oracle, to MongoDB, Cassandra & Couchbase

Caching: Memcache or Redis

Search: Solr

Webservers: Apache, Nginx, Lighttpd

Load balancers: HAProxy, Zen

Languages: PHP, Python & Ruby

Publishing: Drupal, WordPress, Joomla

Continuous Integration: Jenkins

Metrics: Cacti, collectd, NewRelic

Monitoring: Nagios, Ganglia, Munin, OpenNMS

Automation: Ansible, Chef, Puppet, Docker & Vagrant

Logs: Logstash

DDoS & CDN: Cloudflare, UltraDNS

Whew… That’s a long list!! And we’re not even considering the APIs that many applications are now building on.

3. More things to break & master

o features in current versions
o bugs of current versions
o vulnerabilities of various versions
o troubleshooting
o best practices
o backup & reliability

For example, at a lot of shops where I dig into the database, I find low hanging fruit, such as misconfigured startup settings, table layout or index usage.

I see similar things when a networking expert pores over the HAProxy configuration, or runs ping tests across the network. Most of these components are set up with fairly vanilla configurations, leaving loose ends and frayed threads.

5. Long term support & viability

At one five-year-old firm, I was brought in to address scalability problems. I met with the team and was asked to provide a comprehensive review. The first thing I found was that all the original engineers had long since left, so the code was new for everyone. As I dug in, I found multiple versions of Apache along with Nginx on some other servers. Their stack was built on a patchwork of Python, Ruby & PHP. Then digging in further, we found a complicated web of dependencies for digital assets, mounted across servers & unmonitored.

Lack of standards is common in environments like these. Without an operational or architectural lead, developers are left to make decisions with what is directly in front of them. Though a decision of what language to use may appear simple at the outset, it carries long term consequences.

Will that language or technology be supported in five years? Will the community survive? Will your firm be able to hire people with that skill set? Will engineers still be excited about it?

One of the main takeaways from that work is the idea of “getting out of the building”. It means essentially that before you get too far along with your idea, building your product, and too heavily vested and invested in one direction, go do real research with real potential customers.

Right from the beginning test your ideas, and talk to customers. It’s not easy, but if done right will be very revealing.

The book can be had for free at Talking To Humans as an e-book, or send it straight to your Kindle for $0.99! With a foreword by Steve Blank & Tom Fishburne’s funny cartoons, and at only 98 pages, it’s well worth an hour or two of your time.

2. Fight cognitive biases with metrics

We all have biases. We think our customers are soccer moms, or 20-somethings who like lattes. By calculating metrics, we find out which market segments actually want our product and why. Keep calculating metrics, and draw conclusions from real data.

At the same time beware the dynamic of mistaking statistics for facts. Remain skeptical!
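As one small illustration of checking a hunch against data, the Python sketch below computes a conversion rate per market segment. The segment names, the sample data and the choice of conversion rate as the metric are all made up for the example. The visitor count is returned alongside each rate so that a rate built on a handful of visits is easy to spot and treat with suspicion.

```python
from collections import defaultdict


def conversion_by_segment(visits):
    """Summarize conversions per segment.

    visits is an iterable of (segment, converted) pairs, where
    converted is True if that visitor signed up or bought.
    """
    totals = defaultdict(lambda: [0, 0])  # segment -> [visitors, conversions]
    for segment, converted in visits:
        totals[segment][0] += 1
        if converted:
            totals[segment][1] += 1
    return {
        segment: {"visitors": n, "conversions": c, "rate": c / n}
        for segment, (n, c) in totals.items()
    }


if __name__ == "__main__":
    # Invented sample data, far too small to conclude anything from.
    sample = [
        ("soccer moms", False),
        ("soccer moms", False),
        ("students", True),
        ("students", False),
    ]
    for segment, stats in conversion_by_segment(sample).items():
        print(segment, stats)
```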