A guide to analyst relations for startups

When it comes to go-to-market and marketing there are lots of pieces in a tool chest that all work together. One that comes a bit later, but if used properly (much like a PR agency) can be valuable, is industry analysts. How to work with a PR agency can become clear fairly quickly; how to work with analysts so it is productive on both sides can take a bit longer to figure out, or at least it did for me. Even before you start working with them there’s the question of if or when you should. Here’s hoping this primer makes it a bit faster and easier for others.

What is an analyst

Apologies to all analysts, but of all the parts of this post, this is the one I’m most likely to butcher.

Analysts talk to a lot of companies, both the ones making products and the ones purchasing them. I’m not actually sure what the spread is; I’d guess 80-20. A large output of this and other activities is creating various reports and rankings. Gartner’s Magic Quadrant is probably the most well known industry ranking. Much of what they create isn’t freely available for consumption, so you likely don’t see the sheer volume of insights they put out.

Why would you engage with an analyst

So what do they do for you? There are really two major buckets:

Help with sales/marketing – Given they’re informing and influencing buying decisions of businesses they can be one more person on your side. If a launch in Techcrunch makes business foo aware of product bar, then an analyst report or ranking can help sway a decision on whether to try bar vs. baz.

Consulting – The other major opportunity is for the analyst to give some form of guidance. In a larger company when you already have an established product they should absolutely be part of your launch process (more on that in a future post). They’re actively following your market and space, hopefully just as you are to some extent. They can offer an outside perspective and help with broad areas of focus and messaging.

More – In reality it’s not as clean cut as the above. They may be able to introduce you to good candidates for hiring. They may be able to introduce you to a large company interested in acquiring some capability which you have. They may be able to connect you with investors. All of these things can and do happen, but the above buckets are typically the primary drivers.

When?

First, engaging with analysts should always come after you have some confidence in the product, after you’ve started some marketing drumbeat, and after you’ve made some sales. In short, don’t be in a huge rush here; you’ll get there. As you start to get some attention and momentum it’s just as likely they’ll engage with you first as it is that you’ll reach out to them. Also, marketing != sales, more on that in a future post.

But let’s assume you’ve got a product which targets businesses (analysts aren’t just for tech companies, though you’ll see the benefit here sooner if you’re, say, a database company rather than an HR product). Let’s also assume you’ve got some sales and some good launches under your belt. If industry reports or rankings start to come up in sales calls, or you’re hearing about competitors having more validation in such reports, that may be an indicator. As a general rule of thumb, once you’ve got in-house PR they should be able to help guide and steer you to the right time.

So how do you engage with an analyst

If you’re engaging in some form of report or article, that should be pretty self-explanatory. They’ll send you a questionnaire, you fill it out, and you go back and forth a little bit.

However, the majority of my interactions aren’t on those articles and reports; for every one time I fill out a long list of questions to help some report or ranking, I have 20 calls with an analyst.

There are two primary calls you can have, an inquiry and a briefing.

Inquiries

Inquiry is just a fancy word for a consulting call. You will always be paying for an inquiry.

A small detour here. The regularity and consistency with which you engage with an analyst makes a difference. They’re also people at the end of the day, so while firms have certain styles, it’s further multiplied by the whole thing being very people driven. In your interactions you’ll have a different rapport with different people; it’s at a minimum important to be aware of this.

So back to an inquiry. Within an inquiry your goal is to pull back the curtain and give some backstage insight into what you’re doing and where you’re headed. This is typically under NDA, and you can trust the NDA of an analyst. It’s worthwhile to be as candid as you can here; yes it feels weird, but you’ll get the most value. They’re not like a reporter looking for a scoop (not that you can’t trust reporters, just know if you say it, it’s on record). You don’t have to relegate the entire call to one area, but areas of coverage are often:

Upcoming products and major releases you’re working on

Broader strategy and roadmap

Get input on what they’re seeing and hearing from customers

Briefings

The other type of call is a briefing. This is a little similar to a press briefing. You’ll get on the call and walk through some upcoming launch, or just give an update on your company and progress. The latter is more common if they’re unfamiliar with you or your product.

Analyst briefings are good to do earlier than your press briefings; compared to press they’re like a bike with training wheels. It’s best if you still maintain your balance (the ride will be smoother), but there’s a little less risk of completely toppling over. One key difference is you often have a PowerPoint deck you get to walk through during an analyst briefing. I’ve found this is helpful for pacing and key messages; I used to be skeptical, but now very much feel it’s always worth doing.

Pro-tip: You can create a deck and use it for press too. No, they won’t want to get on a GoToMeeting, but you can send it over so they have more content later. More importantly, you can also walk through it on your own screen if it helps with pacing.

Within a briefing you’ll have some ability to ask them questions at points. Does this resonate? Are you hearing similar things? What are you seeing in the market? Don’t turn it into an inquiry, but knowing the parts that hit home for them allows you to refine your pitch for the next call.

Engaging – the tactical parts

“Analysts are pretty much paid to talk and write” – @cote. So expect that when you occupy their time there’s often a price to it. Finding them should be pretty easy: to build the list of ones in your space, look for who’s quoted or referenced in various media outlets covering it. You may also just naturally crop up in a report.

If you create a regular relationship with them you’ll have some contract of hours over the course of a quarter or year. At an early stage company this is often owned and managed by whoever runs your PR from an internal perspective.

Conclusion

If you’re about to engage with analysts for the first time or haven’t figured out how to get the most out of your interactions I hope the broad overview is helpful. If there’s some glaring parts you feel I’ve missed let me know @craigkerstiens. And for further reading/watching I’d encourage checking out the great talk from @cote in the Heavybit library.

As far as take-aways and a recap:

Don’t be too eager to jump in with analysts. They can absolutely provide value, but you have to put some time in before it really starts to pay off. It’s not an overnight change and takes building a rapport with them.

At the same time, analysts can be useful in many B2B areas not just tech ones.

When in an inquiry be open and as transparent as possible.

Powerpoint/Keynote/Google presentations are useful in briefings, even if it’s just for you to follow along.

An intro PR guide for startups

You’ve built your product and you’re now ready for your first major launch. Or you’ve been through a launch or two, but are looking to scale the process as you’re doing more launches and announcements. You really have two options: do it all on your own, or work with a PR agency. One frequent crossroad is that you’re not at the point of a full time PR person, but unsure what a PR agency can offer you, and further, what’s the best way to work with them so you’re getting the maximum value.

As I’ve talked to more startups lately, it’s become clear that effectively working with PR teams and the media is mostly learned by doing. Because there’s not much guidance out there, here’s an attempt at some basic guidelines.

On PR

First, there are two types here and they’re not mutually exclusive. In-house PR is a full time person or team that works within your company, and here you’ll often have a pretty different experience. From my experience, in-house PR people tend to understand a company’s message and vision because they are living and breathing your company values every day.

The other alternative is hiring a PR agency. An agency will have several (sometimes hundreds!) of clients. The relationship that you’ll have with an agency is much different than in-house; you’ll use them just like you would a consultant or contractor. Most startups end up with the agency approach first, because of the perception of “more people working for a cheaper cost than hiring in-house.” However, it’s worth noting an agency doesn’t alleviate you of doing work, nor should you want them to handle all parts of it.

Messaging

An agency may offer to help with messaging, but take this somewhat lightly. I don’t doubt that some are very good at it, but in most cases I’ve found they don’t have the same amount of customer interaction as you do as a founder or early employee. Further, your vision of the direction and impact on the market may reach further out than theirs. You should expect to own your messaging, just like you own your product.

Where they can heavily help is providing structured frameworks for getting to your messaging. Some pretty basic templates of standard questions for customers and partners can go a long way in helping you actually uncover what they feel your value is.

On your key messaging/value prop, there are two pieces I’ll drop in here. While I’d love to write another long post on it, I wonder when I’ll actually get it out. The first is to pitch the problem you’re trying to solve; Dave McClure talks about this as well as anyone. The second is don’t pitch features, pitch the use cases and solutions. Pitch what’s possible.

Pitching

This is the number one area where I’ve found that having PR makes a huge difference. In the world of reporting, different reporters have different beats (areas of coverage), styles, outreach preferences, and most importantly, different relationships with companies and people. Knowing all of this and how to pitch a story to them is key. Yes, you can spend hours researching and creating a perfect story just for them, and do that again and again and hopefully land some coverage. But I’d argue that’s not the best use of your time.

With a good PR person or agency you’ll be able to strike a mix of:

Here’s the outlets I want to be in and why (have a good reason for why).

Understanding the audience and readership.

What outlets you feel like your key customers are reading, and validate this with the agency.

From there, if you’ve found a good agency they already have relationships with your key journalists / publications. So if you have a compelling product, you just need to give them the right messaging of the particular launch or news.

What else to expect from your agency

A surprise for some is how the whole process works. The agency is going to be there on the phone with you. You’re not going to be hanging out over beers, being chummy while pitching. The reporter is listening to multiple other pitches; it’s likely they had one right before you and another right after. The agency is there listening, helping keep time and track of the conversation for reporter fact-checking after the interview.

Hopefully they’re also keeping notes. They should be able to provide you with some high level notes of what message resonated with each reporter and what didn’t, what you covered, and what they asked. This is especially useful for future interactions.

Similarly, you should get a briefing one-pager ahead of time. You should be able to skim this; you don’t have to memorize it. It’ll include key things such as recent articles written by the reporter, their beat, topics to dive into and ones to stay away from. If you can connect the dots, those notes from an initial call start to feed into the one-pagers for future calls.

Onto the briefing

Of course it’s important to land the briefing in the first place, but just as important is getting it right. Coming into it, the reporter will have already gotten the high level pitch… it’s why they took the call. You’ll get a mixed bag, from those that are open to teeing up the opportunity to those that want to get right to the news. Roll with what they prefer, but also don’t be afraid of trying to hit some of your key points.

Have your key messages ready

Sound bites help hugely here. Analogies, customer references, whatever you want to hit: have it ready. Also, if you’ve got a great sound bite that helps tell the story, it can make the reporter’s job easier. Just don’t swing too far into happy-go-lucky marketing land. It’s important to remember that you’re talking to a person. Have a conversation – don’t talk at them.

Go slow

It may seem obvious when you think about it, but as you’re talking the reporter is writing. Or at least you hope they are. Some do it by hand and type up notes later, some type right then and there. When you hear a pause it doesn’t always mean keep going, and it seldom means hurry up. Become extra comfortable with pauses. Check in on whether you’re going too fast, if they’re following, and if they have any questions. I’ve had people bring me a beer before because I’d had multiple cups of coffee through a few pitches and they were trying to slow me down a bit. Know your pace, and then slow it down.

Questions

It’s okay if they don’t have a lot of questions; they may have none at all. Yes, pause and give them a chance, or even ask if they have any. But don’t stress too much if they have no questions.

On the flip side of that, your PR person should have prepared a list of questions for you beforehand that the reporter could possibly throw your way. Be sure you’ve thought through and practiced all the Q&A scenarios before the interview so you aren’t caught off-guard when you’re in front of the reporter.

In conclusion

If it’s your first go around, don’t stress too much. Have the headlines you want and your key messages in mind, or better yet write them out. Personally I write key things on a whiteboard, nice and large, before I’m on the call. Finally, once you’re all done, enjoy reading the coverage. But you’re not actually done after you get some coverage: look back and run a retrospective just like you would for a software project. What worked well? Why did or didn’t something work? What can you improve next time?

Full disclosure: this is based on interactions with a small sample size of different PR agencies and individuals. Mileage may differ heavily from PR firm to PR firm, but hopefully the above provides at least some roadmap for more clarity vs. flying blind. As always, if you’ve got feedback or questions, feel free to let me know @craigkerstiens.

Finally, a special thanks to Paul Katsen for much of the inspiration for creating this post, and to him and Katie Boysen for review.

Moving past averages in SQL

Often when you’re tracking a metric for the first time you take a look at your average. For example, what is your ARPU – Average Revenue Per User? In theory this tells you, if you acquire a new user, how much you’ll make off that user. Or maybe it’s your average lifetime value of a customer. Yet for those more familiar with looking at and extracting meaning from data, the median or a few different views of percentiles can be much more meaningful.

And while you can very easily get the AVG in Postgres, with a small amount more effort you can report on percentiles as well. Window functions have been around for some time in Postgres. They allow you to order your result set over a certain group. The most basic example is if you want to order by date but also know which row falls at place 10 in that order; you can use a window function and project out the rank().

Beyond outputting the rank yourself and doing extra manipulation, Postgres has some great utilities to make the most common uses even easier. Being able to compute things such as the 95th percentile directly on the data, or lay out for every record in the result where it falls within a percentile, is hugely useful. Let’s take a look:

Assuming you have a table called purchases, which has a total column in it, we could try:

SELECT id,
       total,
       ntile(100) OVER (ORDER BY total) AS perc_rank
FROM purchases;

From the results we could see, for example, that less than 5% of our purchases have a total over 751. From here you can start to dig in and extract all sorts of different meanings, and by doing it directly in SQL you’re closer to the data and have one less processing step.

Percentiles get even more fun with the ordered-set aggregate functions that came out in Postgres 9.4. They even allow you to project out hypothetical values in certain cases. For now I’d encourage adding ntile to your toolbox anytime you’re analyzing averages or medians; it will make your world a bit better. Then consider exploring the ordered-set functions further.
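For a quick taste of those ordered-set aggregates, here’s a minimal sketch against the same purchases table, computing the median and 95th percentile with percentile_cont (available as of 9.4):

-- median and 95th percentile of purchase totals, computed directly in SQL
SELECT percentile_cont(0.5)  WITHIN GROUP (ORDER BY total) AS median,
       percentile_cont(0.95) WITHIN GROUP (ORDER BY total) AS perc_95
FROM purchases;

percentile_disc works the same way if you want an actual value from the data set rather than an interpolated one.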

Sure, we’re still several months away from Postgres 9.5 being released; anywhere from 3-6 months as a best guess. That doesn’t mean we can’t take a first look at this feature. Though before we get into it, a few special call outs of thanks: to Peter Geoghegan of the Heroku Postgres team for being the primary author on it, to Andres Freund, who recently joined Citus Data, for his heavy contributions, and to Heikki Linnakangas as well for his contributions.

And now onto the exploration. Upsert is the common name, but if you’re unfamiliar, upsert is essentially create-or-update: create this new record, but if a conflict exists, update it instead. Let’s take a practical example.

Assume you have a web scraper that imports product information into a table. Each product has a UPC code, title, description, and link. There’s a unique constraint on the UPC code. Now, if your web scraper tries to insert a new product and a product with the same UPC already exists, you’d usually get an error. But you don’t want the query to fail; you want to update the existing product instead. Maybe with a new image, maybe a new description, what have you, but you don’t want it to blow up… you simply want to capture the new data and save it.

So before: insert a record… exception, this violates a unique constraint… let your app figure out what to do. Pro-tip: often applications would try to work around this, but you run the risk of a race condition and duplicate records if there’s a conflict. TL;DR: it’s not a perfect solution.

Now: Insert a record… There’s a unique constraint violation… Okay, let’s just update all the new record’s fields inside a single transaction
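As a rough sketch of what the new syntax looks like for the scraper example above (assuming a products table with a unique constraint on upc; the values and column names are illustrative):

INSERT INTO products (upc, title, description, link)
VALUES ('012345678905', 'New title', 'New description', 'http://example.com/item')
ON CONFLICT (upc) DO UPDATE
SET title       = EXCLUDED.title,
    description = EXCLUDED.description,
    link        = EXCLUDED.link;

EXCLUDED refers to the row that was proposed for insertion, so the existing row simply gets refreshed with the scraper’s latest data.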

It’s been a long time coming, and it makes building applications that need this kind of behavior even easier. While it would have been great for this to be available years ago, kudos to Postgres and its community for taking the approach that is safe for your data. The result we have now both provides the desired create-or-update behavior and is performant, without the risk of race conditions for your data.

A PM blueprint

I find myself having more conversations with startups – both small and large – about product management. I’ve blogged about some of the tools in my chest here, but I haven’t talked much about my “blueprint” for product management, which I find myself laying out in many conversations over coffee. What follows is the process I’ve used a few times over with new teams to get product and engineering moving together, shipping in a predictable manner, and tackling bigger and more strategic projects.

Trust

I need to know how to work with my team, what their working styles are, and how we interact. This starts by simply interacting – specifically, outside of the office. I heard a similar opinion recently from Chris Fry (who ran engineering at Salesforce and Twitter) when he remarked something to the effect of: “you can tell a good PM from a bad one based on whether he goes to drinks with his team.” Without getting hung up on whether it’s beers or coffee, it’s about socialization with your team and time outside the office. My personal approach: expect a dinner invite over to my place when I take on running product for a new team.

Velocity

Once you’ve started to build some rapport, it’s time to get down to business. If quickly committing to and shipping something isn’t a problem for you, then it’s easy to just assume this is working. In reality most teams I encounter that need PM support don’t have shipping nailed down. You probably already know whether you fall into the category of feeling like you can commit and ship vs. not, so if you’re not able to do that, a few tips:

There are some projects that everyone wants to ship that have been tried over and over; don’t tackle those first.

Shipping something is better than nothing. It doesn’t have to be the right thing.

Sometimes you don’t have to ship something new to get velocity; you can launch things you already have.

Kill scope. Test things earlier and more iteratively; the more you can validate or try something without requiring a large investment, the better everyone feels about the direction you’re heading.

The key here is to commit to projects, deliver, and move on. Your velocity depends solely on delivery: not tasks, not sprints, not projects. If you haven’t shipped anything in a year, then your velocity for the year is zero. At a later point you should move from the focus on shipping anything to shipping the right things; it’s more important to ship one thing that moves the needle than ten that don’t, but that’s a later concern.

Killing things

On the note of killing scope… I’ve heard it articulated at times that some engineers are happy when certain PMs show up because it means less work for them. When you go over to an engineer’s desk, are you creating more work or less? The answer should be less, some large percentage of the time. If you can find a way to accomplish your goals with less effort, it’s always a win. Every project everywhere always needs more time or money; what’s more innovative is how you can help a project ship without one of those two.

At a broader perspective than just scope, one of the biggest ways product can help engineering is by pushing harder for killing off features and narrowing the scope of a product. There’s a good test of whether something is ready to ship: if you tell beta users you’re killing it and they yell at you that you shouldn’t, then it’s ready to ship.

If you’ve already shipped things, but they’re not delivering value or not being used, kill them. It’s that simple. It may have been a great idea at the time, but either invest in making sure it’s used or kill it so you don’t have to maintain it.

Roadmap planning

Usually getting velocity and killing things takes 3-6 months to really take full effect. At this point a team feels like they’re not under a pile of technical debt, and they can commit to shipping projects. This is the point when product and engineering are melding and you can really start to have fun with where you’re headed. I’ve seen a huge mix in how actively engineers are engaged in this process. And the reality is it’s everyone’s job to be thinking about where you’re headed as a company – at least that’s the case for any company that classifies itself as a startup.

My favorite tool for this is a team gridding exercise; you can read more about this here and here. It’s often best conducted at an off-site where you have an opportunity for casual conversation, which can foster broader thinking beyond the obvious bug fixes or smaller product improvements.

One item of note I’ve heard from teams that have done this or similar exercises is that they still have trouble deciding what to do after the fact. The role of product is to get to that decision. The most important part is getting to a decision, not the perfect one: gather data, decide, and revisit as you go along. None of this is to say that it’s an arbitrary decision; customers, data, and the effort-to-impact matrix exercise all inform it. But in the end a clear direction isn’t executed on consensus.

In conclusion

There’s really no end or done when it comes to the role and the work.

There’s always another milestone and the market is always moving around you. But once you’re able to execute predictably and think in an ordered sense about your roadmap, you’re in a position to be able to monitor and adapt to the market, and even more so experiment and shape the market yourself. At that point you have to keep doing it, and then the hard part becomes finding ways of keeping a fresh perspective (pro-tip: customers are an important part of that equation).

Have tips/tricks/practices that I completely missed here or that you disagree with? I’m always happy to talk with others so drop me a note craig.kerstiens@gmail.com.

A simple guide for DB migrations

Most web applications will add/remove columns over time. This is extremely common early on, and even mature applications will continue modifying their schemas with new columns. An all too common pitfall when adding new columns is setting a not null constraint in Postgres.

Not null constraints

What happens when you add a column with a not null constraint is that Postgres will re-write the entire table. Under the covers Postgres is really just an append-only log, so when you update or delete data it’s really just writing new data. This means when you add a column that needs a value in every row, it has to write a new version of each record. If you require the column to be not null at creation time, you’re re-writing your entire table.

Where this becomes problematic for larger applications is it will hold a lock preventing you from writing new data during this time.

A better way

Of course you may want to disallow nulls and you may want to set a default value; the problem simply comes when you try to do this all at once. The safest approach, at least in terms of uptime for your table, data, and application, is to break these steps apart (a rough SQL sketch follows the steps below):

Start by simply adding the column, allowing nulls but setting a default value

Run a background job that will go and retroactively update the new column to your default value

Add your not null constraint.
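Here’s a minimal sketch of those steps, assuming a hypothetical users table and a new plan column; splitting the ADD COLUMN from the default keeps the first step from touching existing rows:

-- 1. Add the column allowing nulls, then set the default for newly written rows
ALTER TABLE users ADD COLUMN plan text;
ALTER TABLE users ALTER COLUMN plan SET DEFAULT 'standard';

-- 2. Backfill existing rows from a background job, ideally in batches
UPDATE users SET plan = 'standard' WHERE plan IS NULL;

-- 3. Only once every row has a value, add the constraint
ALTER TABLE users ALTER COLUMN plan SET NOT NULL;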

Yes, it’s a few extra steps, but having walked through this with a number of developers and their apps, I can say it makes for a much smoother process for making changes to your apps.

My Postgres wishlist for 9.5

As I followed along with the 9.4 release of Postgres I had a few posts on things I was excited about, some things that missed the release, and a bit of a wrap-up. I thought this year (year in the sense of PG releases) I’d jump the gun and lay out the areas I’d love to see addressed in PostgreSQL 9.5. Here it goes:

Upsert

Merge/upsert/insert-or-update, whatever you want to call it, it’s still a huge wart that this doesn’t exist. There have been a few implementations show up on mailing lists, and to the best of my understanding there’s been debate on whether it’s performant enough, or whether some people would prefer another implementation, or I don’t know what other excuse. The short of it is this really needs to happen; until then you can always implement it with a CTE, which can have a race condition.
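For reference, the common writable-CTE workaround looks roughly like the sketch below (the table and column names are made up); the race is that two concurrent sessions can both miss the UPDATE and both attempt the INSERT:

WITH updated AS (
  UPDATE products
  SET    title = 'New title'
  WHERE  upc   = '012345678905'
  RETURNING upc
)
INSERT INTO products (upc, title)
SELECT '012345678905', 'New title'
WHERE NOT EXISTS (SELECT 1 FROM updated);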

Foreign Data Wrappers

Ability to accept a DSN to a utility function to create foreign user and tables.

Better security around creds of foreign tables

More out of the box FDWs

Stats/Analytics

Today there’s MADlib for machine learning, and 9.4 got support for ordered-set aggregates, but even still Postgres needs to keep moving forward here. PL/R and PL/Python can help a good bit as well, but having more out-of-the-box functions for stats can keep it at the front of the pack as a database that’s not only safe for your data, but powerful for analysis too.

Multi-master

This is definitely more of a dream than not. Full multi-master replication would be amazing, and it’s getting closer to possible. The sad truth is even once it lands it will probably require a year of maturing, so even more reason for it to hopefully hit in 9.5.

Logical Replication

The foundation made it in for 9.4, which is huge. This means we’ll probably see good, working, out-of-the-box logical replication in 9.5. For those less familiar, this means the replication is SQL based vs. the binary WAL stream. That makes things like using replication to upgrade across versions possible. So not quite zero downtime, but roughly a minute or two to upgrade versions, even for large DBs.

An official GUI

Alright, this one is probably a pipe dream. And to kick it off, no, pgAdmin doesn’t cut it. A good end user tool for connecting and querying would be huge. Fortunately the ecosystem is improving here with JackDB (web based) and PG Commander (Mac app), but these still aren’t discoverable enough for most users.

What do you want?

When to ship, when to kill

A few weeks ago at lunch I had the opportunity to catch up with a company in the current YC batch that’s building something very similar to dataclips. We talked about a lot of things, from what we’ve learned from dataclips to marketing and other areas. One area we talked about was product and when to ship vs. when to kill things, and I realized I hadn’t talked publicly about my fairly simple but clear view on this, so here it is.

A large part of the credit goes to Adam Wiggins for laying out this model early on at Heroku and his approach to shipping product.

A precursor to shipping

First, a little background on shipping. In shipping something I’m going to assume you have some process of alpha/beta testing with users. This is actually fairly key; if you’re not testing it with users then the rest of this is all moot. Alpha and beta testing is pretty simple: you need some early users. These can be friends, people within a network, or random users you select. There’s different value in how you select these, but that’s a topic for another time and place.

On to shipping

So how do you know it’s ready? The basic idea is super simple. Give it to some users in alpha/beta testing, or start to roll it out following a one -> some -> many -> all principle (maybe to 5% or 10% of your userbase). Then take that brand new feature away.

There are a couple of ways to do this as far as mechanics. If you’re in contact with users, such as alpha/beta users you were higher touch with, just email them. Tell them you’re removing the feature, or if you want to approach it more softly, ask them how much they’d miss it if it were gone tomorrow. If you’re rolling it out more broadly, perhaps behind a feature flag, flip it off and watch for feedback.

Once you take the feature away (or threaten to), if you don’t have users with pitchforks almost immediately, then it’s not ready to ship.

Go back to the drawing board and work on it more, or simply kill it. As @james_heroku would say: “So you’re saying the reason to ship the shitty thing now is because you’ve spent a lot of time on it?” Stepping back, it’s all logical, but all too often it’s not put into practice when shipping.

Your metrics can lie

Relying on just seeing users spend some time on the new feature can often be misleading vs. the above approach. There’s a great talk by Des Traynor over at intercom.io that hits on this in part; the basic premise is that users shifting time from feature X to Y doesn’t mean it was a success, it just means they’re spending time on something different. In launching new things you want to increase the overall value of your product, not simply shift users’ focus to the new flavor of the week.

On scribing

In the process of growing a company there are several hurdles based on the size of the company. What worked at 5 doesn’t work at 20, what works at 20 doesn’t work at 50, and what worked at 50 doesn’t work at 150. There’s a lot of talk about two-pizza teams and scaling development teams out there. One thing I haven’t seen quite enough of is details around scribing and documenting things.

Planning

At teams of 2 and 3 you get everyone in a room. Perhaps 1 person says what you’re going to do and you all rally around it, or maybe it’s a day of debate and persuasion from all sides.

In the end, though, you all leave and get heads down, but all know what goal you’re working towards. At a larger company planning doesn’t scale quite this way. I’ve seen roadmapping and planning done a variety of ways as companies scale, but most times the thing they miss for far too long is documenting what comes out of it. Many may produce some level of artifact, but a cohesive wrap-up is often missed. Such an artifact should be easily digestible within a couple of minutes, but also deep enough to answer many of the initial questions raised by the high level pieces.

Meetings

Meetings are a smaller-level item than broader planning, and tend to go without thorough note taking more often than higher level planning does. With growth you’ll have more meetings; trust me, you will. The more meetings you have, the more likely you’ll miss one or two you’re interested in. Or perhaps it’s as simple as some team members being out. Summer is especially hard for this. For a team of 10 it’s not uncommon that you may go all summer with at least one person not in the meeting, and often two.

Keeping those that miss the meeting well informed of what happened at it is critical as you scale. This is slightly less important (though still valuable) at an extremely large company, but critical while you’re growing. As you’re scaling, things are changing faster, and context can more easily get lost.

So how do you improve this?

Some practical tips:

Having a set of running notes, with someone consistently scribing, is a great standard to set. If you missed a meeting you know where to go.

Recording who was and was not at the meeting can be incredibly valuable. I’ve heard statements like “I said X at Y meeting”; the only problem with that statement is I wasn’t at Y meeting.

Not only recording the meeting notes, but explicitly calling out who’s not there can help to know if that information should be explicitly passed along vs. just missed.

Within your long running document have a summary to wrap it up. While scribing is great it can lead to not seeing the forest for the trees at times.

And a few from others:

Meetings need a purpose and an agenda. If I don’t know why I’m having a meeting, or what will be covered, I won’t go. If I’m organizing a meeting and can’t spare the time to produce an agenda and goal, I shouldn’t waste other people’s time with the meeting – @jacobian

Any meeting over about 15-20 isn’t a meeting, it’s a presentation (which is OK too but make it clear that it’s a download, not a discussion). – @jacobian

Email

If you aren’t aware, I’m a big fan of email. With email it’s almost guaranteed that someone will at least open it (at least if it’s to them or a clear enough list). If you have something you want someone to read – email it. You can have a canonical wiki, or Trello board, or a variety of tools, but email will get more eyeballs than any of these. At the same time, don’t email things that are already documented elsewhere.

Emails are great for highlighting the things people absolutely need to know about. Short and concise emails will also help to improve reach. Be careful to make these emails have a high ratio of value to length. If you have a lot of extra follow-on content, send readers somewhere else for it.

On connection pooling

Connection pooling is quickly becoming one of the more frequent questions I hear. So here’s a primer on it. If there’s enough demand I’ll follow up a bit further with some detail on specific Postgres connection poolers and setting them up.

The basics

For those unfamiliar, a connection pool is a group of database connections sitting around waiting to be handed out and used. This means when a request comes in, a connection is already there, whether in your framework or some other pooling process, and is given to your application for that specific request or transaction. In contrast, without any connection pooling your application has to reach out to your database to establish a connection each time. While in the most basic sense you may think connecting to a database is quick, there’s often some overhead here. An example is the SSL negotiation that may have to occur, which means you’re looking at not 1-2 ms but often closer to 30-50.

The options

There are really three major options when it comes to connection pooling:

Framework pooling

Standalone pooler

Persistent connections

Framework pooling

Today many modern application frameworks have at least some basic level of connection pooling. This means as your application server starts up it will create a pool of connections to use. It’s worth noting that while most modern frameworks have pooling, not all do, and further it may not be enabled by default.

If you’re using the Sequel ORM for Ruby or SQLAlchemy for Python you’re well covered here. Rails is in pretty good shape also, though you may want to configure the pool size. For Django it’s a bit of a mixed story: for some time Django did not have pooling at all, but as of Django 1.6 you have persistent connections by default and the ability to enable a pool.

Persistent connections

Persistent connections don’t offer all of the benefits of pooling, but can often work well enough. Persistent connections are simply the act of keeping the connection to your database open once it’s established. In the case where you have an overhead of 30-50 ms each time you connect, this can be quite helpful. At the same time, you’re limited in the number of things that can interact with your database, since you’re limited to one connection per entry point to your webserver.

Standalone pooling

Postgres can be a bit of a sore spot when it comes to handling a ton of connections. Each connection to your database carries some memory overhead; casual observations have seen it be between 5 and 10 MB assuming some basic query workload. And even if you have the memory to spare on your Postgres instance, there comes a point where the management of connections becomes a limiting factor; we’ve seen this somewhere in the hundreds. While framework-level connection poolers can give better performance and lengthen the time before you have to deal with something more complex, if you’re successful that time may come.

A rule of thumb I’d use is if you have over 100 connections you want to look at something more robust

In this case that something more robust is a standalone pooler specifically for Postgres. A standalone pooler is much more configurable overall, letting you specify how it pools: by Postgres session, transaction, or statement. Further, these are designed specifically to handle a very large pool of connections to Postgres without adding much overhead. In contrast to the roughly 5 MB for a standard connection to Postgres, PgBouncer uses about 2 kB per connection.

So once you’re at the point of needing one there’s really two options.

PG Bouncer

My short and sweet recommendation is PgBouncer. Contrary to how it’s named, PgPool is a multi-purpose tool that does a lot of things (pooling, load balancing, replication, and more). PgBouncer takes the philosophy of doing one thing and doing it extremely well. I tend to favor these types of tools, which is the same reason I lean towards WAL-E to help with Postgres replication.
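To give a feel for the configuration involved, here’s a minimal pgbouncer.ini sketch; the database name, address, auth file path, and pool sizes are placeholders to adjust for your setup:

; route client connections for "mydb" to the local Postgres server
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; hand a server connection out per transaction rather than per session
pool_mode = transaction
max_client_conn = 500
default_pool_size = 20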

Need more?

Need more guidance with setting up and running PGBouncer? Give this guide a look or try the pgbouncer buildpack if running on Heroku. If you’re still interested in a deeper guide let me know @craigkerstiens and I’ll work on getting it into the queue.

Finally, make sure to sign-up below to get updates on Postgres content and first access to training.


Personas, data science, k-means

If one of the industry lingo terms in the title didn’t make your skin crawl a little, then I need to try harder. At the same time, you’ve probably heard someone use one of them in a non-trolling way in the last month. All three of these can often mean the same or similar things; people just approach them differently from their own world perspective.

Personas don’t have to be marketing only speak, and data science doesn’t have to be only for stats people. My goal here is to simply set a context for the rest of the meat which talks about how you can simply look at your data and let it surface things you may not have known.

Personas

I most commonly hear this term from “business people.” In fact, not too long ago I recall interacting with someone that wanted to define personas for a company. They wanted to give them names: Joe and Mary. Joe is a father of 2; he works between 8 and 5 because he has to pick the kids up from school; he’s always worked at Fortune 100 companies. Mary is single; she’s a small business owner; she likes using tools instead of building things herself. If you think this is an exaggeration of what you might expect, that’s fair. Let’s take a company I’m fond of, Travis CI. If someone were to do this for them it might look like:

Enterprise QA developer

Startup full stack engineer

Open source contributor

While this is all fine and good, a name and what they do doesn’t help in the substantial way I’d like. Sure, use personas if it helps you think about who you’re building the product for, but don’t expect customers to neatly fit into just one bucket when you try to create classifications like this.

Let’s rephrase this to be super simple: not groupings of people, but groupings of things that have a likely outcome based on various inputs. Perhaps a better term for it is archetype.

Data science

Data science is the application of math or statistics to learn something about your business. It doesn’t have to be big data or NoSQL; it’s simply the application of an algorithm to learn something. Extending it a bit, let’s assume it’s to do something actionable. This is a bit of a chicken and egg, because you can’t look at different data the same way every time and get a valuable interpretation. Sometimes it requires using several methods and examining the quality of the results. We can apply a little more clarity and judgement to ease this process though.

k-means

Alright, onto the meat of what I was hoping to dig into here. Well, actually, first a little more of a detour. Tracking key data for your business should be extremely clear. Hopefully you’re already doing this; if you’re not already tracking month over month growth then go implement it today. If you don’t know your lifetime value or attrition rate then get on those too. But if you do have that and are still unclear how to move the needle on some goal (maybe that goal is increasing lifetime value), then we’re at the right place.

An extremely old algorithm for grouping things together, and one fairly commonly known in stats communities, is k-means. It groups things together based on their likeness into some set of k groups; that’s where the k comes from. It’s also known as an unsupervised clustering method, because you simply put the data in and let it create these groupings for you. But why or how is it useful? You know you want to influence lifetime value, so you should just find what makes people increase it and move that… well, we may be able to get there with k-means.

Practicality

Most commonly when you search for k-means you’ll find some image similar to the one at the top of the post. That image graphically represents the clustering and the centers of those clusters. And while visually interesting, it doesn’t actually tell you how to act upon it. A clearer way is often to examine the clusters and what’s common within each; this tells you how to actually treat that archetype differently.

In his book Data Smart, John Foreman does a great job of laying this out in a practical way. I’m particularly partial to his example also because it uses wine. His example generates a variety of groupings, and looking at the surrounding metadata it’s then possible to discover that:

Grouping 1 likes Pinot

Grouping 2 likes buying in bulk

Grouping 3 likes buying small volume

Grouping 4 likes bubbly

From here you can then start to get some idea of what you’d do with this. Perhaps you’d create a deal each month so that it appeals to all groups, or target them with different deals. Or maybe you’d simply not send an email to a group if you didn’t have a deal for them that month. Of course you could go more granular, down into a recommendation engine that gives a personalized recommendation for each customer, but for a lot of smaller apps/sites that’s simply not feasible.

So in this case the output would look less like the image at the top and more like a set of 4 groups, plus a CSV of every user and which grouping they fall in. Yes, it’s a less sexy graph, but a much more applicable CSV or Excel output.

In the end what we’ve really done is define personas, or archetypes, based on what’s similar between customers vs. arbitrary perceptions we may come in with.

What’s next

Up next I’ll dig in on a real world example. Alex over at HackDesign was kind enough to give me access to their data to create a more practical example of this. While I’m just now digging in, there should be a tangible example of this to follow.

Postgres datatypes: the ones you’re not using

Postgres has a variety of datatypes, in fact quite a few more than most other databases. Most commonly applications take advantage of the standard ones – integers, text, numeric, etc. Almost every application needs these basic types; the rarer ones may be needed less frequently. And while they’re not needed in every application, when you do need them they can be extremely handy. So without further ado, let’s look at some of these rarer but awesome types.

hstore

Yes, I’ve talked about this one before, yet still not enough people are using it. Of this list of datatypes, this is the one that could benefit most if not all applications. Hstore is a key-value store directly within Postgres. This means you can easily add new keys and values (optionally), without having to run a migration to set up new columns. Further, you can still get great performance by using GIN and GiST indexes with them, which automatically index all keys and values in the hstore.

It’s of note that hstore is an extension and not enabled by default. If you want the ins and outs of getting hands on with it, give the article on Postgres Guide a read.
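As a quick sketch of what that looks like in practice (the products table and attributes column here are made up):

-- hstore ships as an extension, so enable it first
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE products (
  id         serial PRIMARY KEY,
  attributes hstore
);

INSERT INTO products (attributes)
VALUES ('color => "blue", size => "M"');

-- one GIN index covers all keys and values in the column
CREATE INDEX products_attributes_idx ON products USING GIN (attributes);

-- containment queries like this can then use the index
SELECT * FROM products WHERE attributes @> 'color => "blue"';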

Range types

If there is ever a time when you have two columns in your database, one being a from and another being a to, you probably want to be using range types. Range types are just that: a range of values. A super common use of them is when doing anything with calendaring. The place where they really become useful is in their ability to apply constraints on those ranges. This means you can make sure you don’t have overlapping time issues, and don’t have to build heavy application logic to accomplish it.
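The classic illustration is an exclusion constraint over a time range; here’s a rough sketch with a hypothetical reservations table (btree_gist is needed so the plain integer column can take part in the GiST constraint):

CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE room_reservations (
  room_id  int,
  reserved tstzrange,
  -- no two reservations for the same room may overlap
  EXCLUDE USING GIST (room_id WITH =, reserved WITH &&)
);

INSERT INTO room_reservations VALUES (1, '[2015-07-01 10:00, 2015-07-01 11:00)');
-- this second insert errors out instead of silently double-booking
INSERT INTO room_reservations VALUES (1, '[2015-07-01 10:30, 2015-07-01 11:30)');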

Timestamp with Timezone

Timestamps are annoying, plain and simple. If you’ve re-invented handling different timezones within your application, you’ve wasted plenty of time and likely done it wrong. If you’re using plain timestamps within your application, there’s a good chance they don’t even mean what you think they mean. Timestamp with timezone, or timestamptz, automatically handles the timezone along with the timestamp. This makes it easy to convert between timezones, know exactly what you’re dealing with, and will in short save you a ton of time. There’s seldom a case where you shouldn’t be using these.
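A small sketch of what this buys you (the events table is illustrative): the value is stored as an absolute point in time, and you render it in whatever zone you need.

CREATE TABLE events (
  id          serial PRIMARY KEY,
  occurred_at timestamptz NOT NULL DEFAULT now()
);

SELECT occurred_at AT TIME ZONE 'America/Los_Angeles' AS pacific,
       occurred_at AT TIME ZONE 'UTC'                 AS utc
FROM events;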

UUID

Integers as primary keys aren’t great. Sure, if you’re running a small blog they work fine, but if your application has to scale to a large size then integers can create problems. First you can run out of them; second, they can make other details such as sharding a little more annoying. At the same time they are super readable. However, using the actual UUID datatype and the extension to automatically generate them can be incredibly handy if you have to scale an application.

Similar to hstore, there’s an extension that makes the UUID much more useful.
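A minimal sketch, assuming the uuid-ossp extension and a hypothetical accounts table:

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE accounts (
  id         uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
  email      text NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);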

Binary JSON

This isn’t available yet, but will be in Postgres 9.4. Binary JSON is of course JSON directly within your database, but it also lets you add GIN indexes directly onto JSON. This means a much simpler setup for not only inserting JSON, but having fast reads. If you want to learn a bit more about this, sign up to get notified of training regarding the upcoming PostgreSQL 9.4 release.

Money

Please don’t use this… The money datatype assumes a single currency type, and generally brings with it more caveats than simply using a numeric type.

More

It’s already been pointed out on twitter that I missed a few others worth a look.

In conclusion

What’d I miss? What are your favorite types? Let me know @craigkerstiens, or sign up below for updates on Postgres content and first access to training.


What you need to know about April 7 and the web
On April 7 a vulnerability, nicknamed Heartbleed, was discovered in a programming library that helps power somewhere over half of the internet. In the most basic sense, this library allowed external parties to deliberately acquire data that was thought to be safe and secure from whomever was running a vulnerable website. Little to no one was exempt from this, regardless of their security practices; many major, widely used sites were affected.

The short of it is that you (yes you, as in everyone) should rotate your passwords once all websites are safe. For further details, please continue reading.

What does the vulnerability mean

In this case it allowed an external party to acquire a moderate amount of data from the computers running your website. Extremely clear demonstrations showed random third parties easily acquiring the usernames and passwords of recently logged-in Yahoo Mail users.

The first step

The first step in resolving this is actually not a step required by you at all, unless you’re running a production website. The first step requires the developers running a site to update it so it is no longer vulnerable. This was able to happen as early as April 7, and many major sites were fully updated and safe again as of April 8.

Still area for concern

With security vulnerabilities there are two key things to consider. First is the vulnerability itself; second is whether it’s theoretical or can easily be acted upon. Yes, there’s a range here. One of the most unfortunate pieces, from talking to those that know security, is that this was extremely trivial to act upon.

This is made even worse in that this vulnerability has existed for two years without many knowing about it, meaning people have had the ability to snoop and collect parts of your data for two years.

What to do?

First things first, be extremely cautious with any major website you use for anything important. For any account where you have a password and you care about the account, you should cease logging into it until you know it’s safe. As of the morning of April 8, here is a list of sites that were safe and ones that were vulnerable. You can check any site today here.

Once it’s clear that a site is now updated and safe, either via that list or the latter tool, you should change your password. Given how long this has existed and how easy it was to exploit, it’s safe to assume all of your internet passwords and data within those accounts could have been compromised. This means you should change the password for any website you have logged into within the last two years. Changing your passwords limits anyone being able to access those accounts again.

I am not a security expert or analyst, but I have interacted heavily with many that are in dealing with this incident. This advice is high level and aimed at non-technical readers; if you have any questions or feedback please let me know on twitter @craigkerstiens.

Some short, untypical marketing tips

Marketing is generally unexciting to a ton of engineers, until it brings eyeballs, which bring feedback and dollars. Marketing doesn’t always have to be cheesy campaigns or ads; it can often just be surfacing the things your customers actually want to care about. My favorite type of marketing is when a service sells me on something at the exact time I want it. Here are a few short tips on some non-traditional marketing that won’t seem sleazy but can still work quite well.

Email subscriptions to your blog

RSS is pretty dead; Google went and killed it along with Google Reader. Sure, there are some decent replacements if you’re really tied to it; in particular Newsblur by @samuelclay is a great reader. But nowadays content emerges on Twitter, Facebook, and ranking services, then later is discovered via search. These work pretty well, but Twitter is ephemeral for so many. Email still converts incredibly well; if people are abandoning RSS but still care about your content, give them the ability to have it put right in front of their face via email.

Market in transactional emails

Have emails that include receipts? Account confirmations? General notices? No, not a monthly newsletter! Transactional emails are obviously valuable to your users. Why not include a small call-out to your latest announcement? Have a central hook that your emails can check, and simply include a small call to action in there.

Retarget to your existing users

In a similar vein to notifying your existing customers of news in transactional emails, you should be doing this all over the web. Retargeting is great for converting people once you’ve already got them on a landing page, but it’s also incredibly useful for getting existing users to use a specific feature. If you track whether they’ve used a feature, retargeting is a great way to make them aware of it, and once they’ve used it just count it as a conversion.

My favorite retargeting provider, Perfect Audience, makes this quite convenient as they allow a bit more control than most retargeting services.

In conclusion

Marketing doesn’t have to be throwing your product and messaging in someone’s face, but you should make your users aware of it. The more engaged they are, the more they’ll stick around and be happy about using your product, assuming you’ve built a good one. What are some of your favorite tips?

A year of Postgres

A couple years back I started blogging more regularly. Though I’ve done this off and on before, this time I kept some regularity. A common theme started to emerge, with some content on Postgres about once a month, because most of what was out there was much more reference oriented. A bit after that I connected with petercooper, who runs quite a few weekly email newsletters. As someone that’s been interested in helping give others a good reason to create content, the obvious idea of Postgres Weekly emerged.

Since then the newsletter has been running for over a year, helped surface quite a bit of content, and grown to over 5,000 subscribers. First, if you’re not subscribed, go subscribe now.

And if you need some inspiration or just want to reminisce with me… here’s a look back at a few highlights over the past year:

Aggregate Knowledge released Postgres HyperLogLog, a new Postgres datatype, hll, that strikes a balance between a full HyperLogLog and a simple set. This data type solves the problem of calculating uniques for a given data set efficiently, both in performance and storage.

The above is still one of my favorite extensions that most of the world doesn’t know about.
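
For the curious, here’s a rough sketch of what using it looks like; the table and column names are hypothetical, and this assumes the hll extension is installed:

CREATE EXTENSION hll;

-- Hypothetical rollup table: one hll per day approximating distinct users
CREATE TABLE daily_uniques (
  day   date PRIMARY KEY,
  users hll
);

-- Hash each user id and fold the hashes into that day's hll
INSERT INTO daily_uniques
SELECT date_trunc('day', created_at)::date,
       hll_add_agg(hll_hash_integer(user_id))
FROM events
GROUP BY 1;

-- Approximate distinct users per day
SELECT day, hll_cardinality(users) FROM daily_uniques ORDER BY day;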

A common question for anyone new to or even experienced with Postgres is: what’s the best editor out there? Most people asking this are asking for a GUI editor; this post highlights much of the power in the CLI psql editor.

A mix of notable entries

After the heavily publicized and very serious security vulnerability was patched last week, Blackwing Intelligence took the chance to dig in. Read more on the details of the vulnerability, such as what damage can be done and the basics of how it’s exploitable.

Tom Lane, one of the major contributors to Postgres and a member of the Postgres core team, was in San Francisco last week and gave a talk at the SF Postgres Users Group. Here’s the video from the talk where Tom explains the innards of the PostgreSQL query planner. Whether you’re a noob or a knowledgeable Postgres user this is a must watch.

After a fresh install, there are probably a few knobs you want to tweak on Postgres. If you’re new to doing this, it can be a bit overwhelming. Here’s a quick primer on tuning a brand new server to be more properly configured.

In conclusion

]]>2014-03-24T00:00:00-07:00http://www.craigkerstiens.com/2014/03/24/Postgres-9.4-Looking-upJust a few weeks back I wrote an article discussing many of the things that were likely to miss making the 9.4 PostgreSQL release. Since that post the landscape has already changed, and much for the positive.

JSONB

JSON has existed for a while in Postgres, though the JSON type that exists today simply validates that your text is valid JSON, then goes on to store it in a text field. This is fine, but not overly performant. If you need some schema flexibility and performance without much effort then hstore may already work for you today; you can of course read more on this in an old post comparing hstore to json.

But let’s assume you do want JSON and a full document store, which is perfectly reasonable. Your best option today is still the JSON datatype. If you’re retrieving full documents this is fine; however, if you’re searching/filtering on values within those documents then you need to take advantage of functional indexing. You can do this with some of the built-in operators or with full JavaScript in Postgres. This is a little more work, but it’s also very possible to get good performance.
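
As a rough sketch of the functional index approach (the docs table and author key here are hypothetical):

CREATE TABLE docs (id serial PRIMARY KEY, data json);

-- Expression index on one extracted field, so filtering on that field can use it
CREATE INDEX docs_author_idx ON docs ((data ->> 'author'));

SELECT * FROM docs WHERE data ->> 'author' = 'craig';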

Finally, on to the perfect world, where JSON isn’t just text in your database. For some time there’s been a discussion around hstore and its future progress, and of course the future of JSON in Postgres. These two efforts have finally converged for PostgreSQL 9.4, giving you the best of both worlds: what was known as hstore2, by the Russians, provides the machinery under the covers, and the collective effort on JSONB (a binary representation of JSON) layers on all the JSON interfaces you’d expect. We now have full document storage and awesome performance with little effort.

Digging in a little further, why does it matter that it’s a binary representation? Under the covers, building on the hstore functionality brings along some of the awesome index types in Postgres, namely GIN and possibly, in the future, GiST. These indexes will automatically index all keys and values within a document, meaning you don’t have to manually create individual functional indexes. Oh, and they’re fast and often small on disk as well.
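
To make that concrete, here’s a minimal sketch against 9.4, again with a hypothetical docs table:

CREATE TABLE docs (id serial PRIMARY KEY, data jsonb);

-- A single GIN index covers all keys and values in the document
CREATE INDEX docs_data_idx ON docs USING GIN (data);

-- Containment queries like this one can then use the index
SELECT * FROM docs WHERE data @> '{"author": "craig"}';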

Logical Decoding

Logical replication was another feature I talked about that was likely to miss the release. Here there isn’t the same positive news as with JSONB, as there’s not a 100% usable feature available, yet there is a big silver lining. Committed just over a week ago was logical decoding. This means that we can decode the WAL (Write-Ahead Log) into logical changes. In layman’s terms, something that’s unreadable to anything but Postgres (and version dependent in cases) can be interpreted as a series of INSERTs, UPDATEs, DELETEs, etc. With logical changes available you could then start to get closer to cross-version upgrades and eventually multi-master.

This commit doesn’t mean all the pieces are there in the core of Postgres today. What it does mean is that the part required of the Postgres core is done. The rest of this, which includes sending the logical replication stream somewhere and then having something apply it, can be developed fully as an extension.
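
If you want to poke at it, here’s a small sketch using the test_decoding example plugin; it assumes wal_level is set to logical and max_replication_slots is greater than zero:

-- Create a logical replication slot using the example output plugin
SELECT * FROM pg_create_logical_replication_slot('my_slot', 'test_decoding');

-- Peek at decoded changes (INSERTs/UPDATEs/DELETEs) without consuming them
SELECT * FROM pg_logical_slot_peek_changes('my_slot', NULL, NULL);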

In Conclusion

Postgres 9.4 isn’t 100% complete yet, as the commitfest is still going on. You can follow along on the postgres hackers mailing list or on the commitfest app where you can follow specific patches or even chip in on reviewing. And of course I’ll do my best to continue to highlight useful features here and surface them on Postgres Weekly as well.


]]>2014-02-26T00:00:00-08:00http://www.craigkerstiens.com/2014/02/26/Tracking-MoM-growth-in-SQL In analyzing a business I commonly look at reports through two lenses: one is various cohort analyses, the other is Month over Month or Week over Week or some other X over X growth in terms of a percentage. This second way of looking at data is relevant when you’re in a SaaS business or essentially anything that does recurring billing. In such a business, focusing on your MRR and working to grow it is how success is often measured.

I’ll jump right in. First, let’s assume you have some method of querying your revenue. In this case you may have a basic query similar to:

SELECT date_trunc('month', mydate) as date,
sum(mymoney) as revenue
FROM foo
GROUP BY date
ORDER BY date ASC;

Now this is great, but the first thing I want to do is start to see what my percentage growth month over month is. Surprise, surprise, I can do this directly in SQL. To do so I’ll use a window function along with the lag function. According to the Postgres docs:

lag(value any [, offset integer [, default any ]]) same type as value returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead return default. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null

Essentially it orders it based on the window function and then pulls in the value from the row before. So in action it looks something like:

SELECT date_trunc('month', mydate) as date,
       sum(mymoney) as revenue,
       lag(sum(mymoney), 1) OVER w as previous_month_revenue
FROM foo
GROUP BY date_trunc('month', mydate)
WINDOW w AS (ORDER BY date_trunc('month', mydate))
ORDER BY date ASC;

Combining the two, we can make it a bit prettier by casting to numeric and formatting the result as a percentage.
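
A minimal sketch of what that combined query might look like, using the same hypothetical foo table as above:

SELECT date_trunc('month', mydate) as date,
       sum(mymoney) as revenue,
       round(
         (100.0 * (sum(mymoney) - lag(sum(mymoney), 1) OVER w)
                / nullif(lag(sum(mymoney), 1) OVER w, 0))::numeric
       , 2) as pct_growth
FROM foo
GROUP BY date_trunc('month', mydate)
WINDOW w AS (ORDER BY date_trunc('month', mydate))
ORDER BY date ASC;

The nullif guards against dividing by a zero-revenue month, and the first month simply comes back as null since there’s no prior row to compare against.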


]]>2014-02-15T00:00:00-08:00http://www.craigkerstiens.com/2014/02/15/PostgreSQL-9.4-What-I-WantedThere’s no doubt that the 9.4 release of PostgreSQL will have some great improvements. However, for all of the improvements it is delivering, it had the promise of being perhaps the most impactful release of Postgres yet. Several of the features that would have given it my stamp of best release in at least 5 years are already not making it, and a few others are still on the border. Here’s a look at a few of the things that were hoped for but are not to be, at least for another 18 months.

Upsert

Upsert, merge, whatever you want to call it, this has been a sore spot for some time now. Essentially it’s: insert a row with this key, or if that key already exists, update the other values. This was something being worked on pretty early in this release cycle, and it continued to make progress throughout. Yet as progress was made, so were extended discussions about syntax, approach, etc. In the end, two differing views on how it should be implemented have left the patch sitting there, with other thoughts on an implementation but no code ready to commit.

At the same time, I’ll acknowledge upsert is a hard problem to address. The locking and concurrency issues are non-trivial, but regardless of those, having this in would mostly kill the final argument for anyone to choose MySQL.
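
For context, one common workaround today is a writable CTE that tries the update first and inserts only if nothing was updated; the accounts table here is hypothetical, and note this still isn’t fully safe under concurrency, which is exactly why a built-in upsert is hard to get right:

WITH updated AS (
  UPDATE accounts
     SET balance = 100
   WHERE id = 1
   RETURNING id
)
INSERT INTO accounts (id, balance)
SELECT 1, 100
WHERE NOT EXISTS (SELECT 1 FROM updated);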

Better JSON

JSON in Postgres is super flexible, powerful, and generally slow. Postgres does validation and some parsing of JSON, but without something like PLV8 or functional indexes you may not get great performance. This is because under the covers the JSON is represented as text, and as a result many of the more powerful indexes that could lend a benefit, such as GIN or GiST, simply don’t apply here.

As a related effort, hstore, the key/value store, is being updated. This new support will add types and nesting, making it much more usable overall. However, the syntax and matching of how JSON functions isn’t guaranteed to be part of it. The proposal and actual work is still there and not rejected yet, but it looks heavily at risk. Backing a new binary representation of JSON with hstore 2 would deliver so many benefits, further building upon the foundation of hstore, JSON, and PLV8 that exists today for Postgres.

apt-get for your extensions

I’m almost not even sure where to start with this one. The notion within the Postgres community is that packaging for distros is super simple and extensions should just be packaged for them. Then there’s PGXN, the Postgres extension network, where you can download, compile, and muck with annoying settings to get extensions to build. This proposal would have delivered a built-in installer, much like npm, RubyGems, or PyPI, and the ability for someone to simply install an extension from a centralized repository. No, it wasn’t setting out to solve the issue of having a single repository, but it would make it much easier for people to run one.

For all the awesomeness that exists in extensions such as HyperLogLog, foreign data wrappers, and MADlib, there’s hundreds of other extensions that could be written and be valuable. They don’t even all require C; they could fully exist in JavaScript with PLV8. Yet I’m on the fence about encouraging people to write them, because if no one uses an extension then much of the point of its reusability is lost. Here’s hoping that opinion changes in the future: that packaging isn’t actually a solved problem, and that creating an ecosystem for others to contribute to the Postgres world without knowing C is a positive thing.

Logical replication

When I first heard this might have some shot at making it into 9.4 I was shocked. This is something that, while some may not take notice of it, I’ve felt the pain of for many years. Logical replication means, in short, enabling upgrades across PostgreSQL versions without a dump and restore, but even more so laying the groundwork for more complicated architectures like perhaps multi-master. Yes, even with logical replication in, there’s still plenty of work to do, but having the groundwork laid goes a long way. There are options for it today with third party tools, but the management of these is painful at best.

In conclusion

The positive of this one is that the building blocks are in and it’s continuing to make progress. It’s just that we’ll have to wait roughly 18 months, until the release of PostgreSQL 9.5, before it’s in our hands.


]]>2014-02-07T00:00:00-08:00http://www.craigkerstiens.com/2014/02/07/my-email-hacksIn a conversation with @alexbaldwin yesterday the topic of email came up, with each of us quickly diving into various observations: how it’s both awesome and a great form of communication/engagement, and how most people still do it really badly. Alex has some good experience with it, with Hack Design having over 100,000 subscribers. In a tangent during an entirely unrelated meeting with @mschoening and others, it was suggested that instead of emailing a list you send out a ton of individual emails. Both of these reminded me that email is incredibly powerful, but taking advantage of its power has to be intentional.

This is not about ways to get to inbox zero or better manage your inflow of emails. Rather it’s about how to get the maximum output out of the emails that you send, or the minimum output, depending on what you prefer.

1 email to 100 vs. 100 emails to 1

This is perhaps my favorite approach to get more efficient feedback and also know how broad an impact something has. Most smaller companies, or groups within a company, have a mailing list that’s all@yourcompany.com or ourgroup@mycompany.com. When people want to communicate out to the entire list it’s a great mechanism; however, when you want feedback from the entire company it’s not.

The reason is that most people know how many are on that list and assume that someone else will pick it up. This concept is fairly common in physical settings, known as the bystander effect: individuals often do not offer help to a victim when there are other bystanders present.

Finally, in certain situations you’ll want to hear the same thing 100 times. Hearing something once doesn’t represent how many others echo it. You’ll only see so many +1s on a thread; getting 100 individual responses ensures you get not only the breadth of responses but the amplitude of them.

FWIW, I ran a test of this, sending an email to essentially all@heroku, then an individualized email in a similar form. The one directly addressed to people received a 5x response rate as well as more thorough responses in the same time frame.

Scaling requests for input

The issue with the above is that most of the time you don’t want 100 responses from 100 people. Most of the time you want feedback from 2 or 3, then feedback from 4 or 5, then smaller feedback or revisions from the rest of that 100. This is actually how I craft blog posts: I start with broad messaging/theming. At that level there’s truly 100 different directions it could go, and that kind of input is not helpful when I have to narrow it down to a single one. When collecting product/roadmap input it can be helpful. Knowing which of the two I’m aiming for is critical in deciding a method.

Being explicit about the before and the ask

On the note of crafting a blog post, I usually start with a request from 2 or 3 people to get general direction. This takes the form of: is this interesting? From there, though, there’s still further refinement. The next phase is: does this flow, does it make sense? Here having a broader list is helpful, so it’ll usually hit around 4 to 5 people. Finally I’ll revert to the 1-email-to-100-people approach on a mailing list, asking for grammar input because mine is crap. Here I don’t mind the bystander effect because I want people to intentionally filter, so it works well.

The key at each step of the process is being extremely clear about what’s already been done. With a blog post as an example: if I don’t explain that people have already reviewed and agreed on the goals, and that several have been over it for flow, then reviewers won’t know that what I’m looking for now is grammar feedback.

Circulating through people

Email and requests are a time burden on people. I commonly diversify and cycle through a set of people. Much in the same way I reach out to people to have drinks or coffee every so often, I aim to not ask the same person every week, and only that person, with the exception of my wife.

Rotating through people increases their enthusiasm to provide input. If I’m always going back to the same people they may feel slightly drained by my constant requests, and quite rightfully so. At the same time the input is good, but diversifying where you receive it from gives a broader perspective.

Delayed sending

This is one that may be a little more obvious to people, but delaying an email to slow down a thread, to not seem over-eager, or for whatever other reason you may have is hugely useful. There are really two tools I look to here: 1. Boomerang and 2. Yesware. Both have slightly different benefits: Boomerang has a much simpler interface, Yesware better integration with Salesforce. Regardless of which you choose, if you ever want to type an email but send it at some point later, one of these is critical.

Fin.

While this list is less of a defined process and more a collection of random practices, several of these I’d be much less effective without, and together they make getting appropriate reactions from email incredibly useful. I’d love to hear what hacks you use to elicit positive impact from the emails you send; as always, if you have feedback please drop me a note.

]]>2014-02-02T00:00:00-08:00http://www.craigkerstiens.com/2014/02/02/Examining-PostgreSQL-9.4PostgreSQL is currently entering its final commit fest. While it’s still going, which means there could still be more great features to come, we can start to take a look at what you can expect from it now. This release seems to bring a lot of minor increments versus some of the bigger highlights of previous ones. At the same time there’s still a lot on the bubble that may or may not make it, which could entirely change the shape of this one. For a peek back at some of the past ones:

Highlights of 9.2

Highlights of 9.3

On to 9.4

With 9.4, instead of a simple list, let’s dive in a little deeper to the more noticeable ones.

pg_prewarm

I’ll lead with one where those who need it should see huge gains (read: larger apps that have a read replica they may eventually fail over to). pg_prewarm will pre-warm your cache by loading data into memory. You may be interested in running pg_prewarm before bringing up a new Postgres DB, or on a replica to keep it fresh.

Why it matters – If you have a read replica it won’t have the same cache as the leader. This can work great, as you can send queries to it and it’ll optimize its own cache. However, if you’re using it as a failover, when you do have to fail over you’ll be running in a degraded mode while your cache warms up. Running pg_prewarm against it on a periodic basis will make the experience when you do fail over a much better one.
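
Usage is about as simple as it gets; a quick sketch (the table name here is hypothetical):

CREATE EXTENSION pg_prewarm;

-- Load the relation's blocks into cache; returns the number of blocks read
SELECT pg_prewarm('my_big_table');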

Refresh materialized view concurrently

Materialized views just came into Postgres in 9.3. The problem is they were largely unusable, because they 1. didn’t auto-refresh and 2. when you did refresh them it would lock the view while the refresh ran, making it unreadable during that time.

Materialized views are often most helpful on large reporting tables that can take some time to generate. Often such a query can take 10-30 minutes or even more to run. If you’re unable to access the view during that time it greatly dampens its usefulness. Now running REFRESH MATERIALIZED VIEW CONCURRENTLY foo will regenerate it without locking out reads, so long as you have a unique index on the view.
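
A small sketch of the workflow, with a hypothetical reporting view:

CREATE MATERIALIZED VIEW monthly_signups AS
  SELECT date_trunc('month', created_at) AS month,
         count(*) AS signups
  FROM users
  GROUP BY 1;

-- CONCURRENTLY requires a unique index on the materialized view
CREATE UNIQUE INDEX ON monthly_signups (month);

-- Rebuilds the view while still allowing reads against the old contents
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_signups;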

Ordered Set Aggregates

I’m almost not really sure where to begin with this one; the name itself almost makes me not want to take advantage of it. That said, what this enables is a few really awesome things you could do before, but that would have required a few extra steps.

While there are plenty of aggregate functions in Postgres, getting something like the 95th or 99th percentile takes a little more effort. First you must order the entire set, then iterate over it again to find the position you want. This is something I’ve commonly done by using a window function coupled with a CTE. Now it’s much easier:

SELECT percentile_disc(0.95)
WITHIN GROUP (ORDER BY response_time)
FROM pageviews;

In addition to the varying percentile functions you can get quite a few others, including the following (a couple of which are sketched just after the list):

Mode

percentile_disc

percentile_cont

rank

dense_rank
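
For example, mode and the continuous (interpolated) median follow the same WITHIN GROUP pattern, again against the hypothetical pageviews table from above:

SELECT mode() WITHIN GROUP (ORDER BY response_time),
       percentile_cont(0.5) WITHIN GROUP (ORDER BY response_time)
FROM pageviews;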

More to come

As I mentioned earlier, the commit fest is still ongoing, which means some things are still in flight. Here’s a few that still offer huge promise but haven’t been committed yet:

Insert on duplicate key or better known as Upsert

HStore 2 – various improvements to HStore

JSONB – Binary format of JSON built on top of HStore

Logical replication – this one looks like some pieces will make it, but not a wholly usable implementation.