Jackson Gabbard's Bloghttps://jg.gg
A Scattered RepositoryMon, 16 Apr 2018 05:59:54 +0000enhourly1http://wordpress.com/https://jg340627569.files.wordpress.com/2018/01/cropped-screenshot-2018-01-21-14-48-13.png?w=32Jackson Gabbard's Bloghttps://jg.gg
3232Communicating via AES 256 GCM between NodeJS and Go(lang)https://jg.gg/2018/01/22/communicating-via-aes-256-gcm-between-nodejs-and-golang/
https://jg.gg/2018/01/22/communicating-via-aes-256-gcm-between-nodejs-and-golang/#respondMon, 22 Jan 2018 13:16:31 +0000http://jg.gg/?p=293Continue reading Communicating via AES 256 GCM between NodeJS and Go(lang)→]]>I recently had the mixed fortune of needing to send encrypted payloads back and forth between a service running in NodeJS and a service built in Golang. It was not a straightforward thing to do, which made me appreciate just how particular and difficult these libraries are to use, doubly so to communicate across services written in different languages. Hence, I decided to write a blog post about the intricacies of the process.

First Issue: Transport Encoding

Enciphered payloads are just strings of bits. These bit strings are not ASCII or UTF8, though. You can’t output them to your terminal, for instance, because they can completely destroy the terminal due to byte sequences that the terminal interprets as instructions. Nor can you include them as HTTP request bodies, for the same reason. They can include characters that are interpreted by HTTP parsers in weird ways, most likely causing message truncation.

So, you need a transport encoding for sending ciphertext around. Base64 works great for this… with a catch.

Golang has multiple Base64 encoding methods. Specifically, StdEncoding and URLEncoding. URLEncoding does what it says on the tin — it creates Base64 output that is safe for use in a URL. Standard Base64 includes characters like the “=” sign, which requires escaping in order to work as part of a URL. (It’s worth noting that URLEncoding Base64 is also a lot longer than StdEncoding.)

I’d written a crypto wrapper library for my Golang service that was using URLEncoding internally. However, NodeJS doesn’t have native support for URLEncoded JSON. There are some NPMs for this, but if you’re like me and you hate adding a million dependencies, you just want this to work out of the box. So, I had to switch my existing Go library to use StdEncoding. Not a big deal, but it’s a Gotcha that tripped me up because I just didn’t expect that NodeJS wouldn’t support this. Ho hum.

Second Issue: Terminology

If you read the NodeJS crypto library documentation, it’s a) sparse, b) full of unexplained jargon and initials, and c) uses different words from the Golang library. Because of course they wouldn’t be the same.

So, what NodeJS calls an “IV,” which is a shortening of the term Initialization Vector, Golang refers to as a nonce. Nonce <sarcasm>*obviously*</sarcasm> is a mash-word for “number used once.” And, <sarcasm>as all of us crypto experts know</sarcasm>, you should never use the same Initialization Vector more than once. Don’t we all feel better, wiser, and superior to the mere mortals who don’t know all this already.

In truth, there is a subtle difference between the two terms. An initialisation vector means “choose some random bytes, used to lock in security of the encryption algorithm.” A none on the other hand, refers to “choose a random number with the correct number of bytes, used to lock in the security of the encryption algorithm” Initialization vectors are usually appended to the message, meaning your ciphertext is slightly longer than your message. A nonce (in theory) can be derived from context. In practice however, it seems the two are used interchangeably.

For AES 256 GCM, your nonce/IV/initialization vector ought to be 12 bytes long (i.e. 96 bits). This has to do with the size of the blocks in the block cipher. An IV needs to be sized correctly for the length of the block size in the block cipher and for the specific counter method used. For GCM with AES256, 12-bytes is standard.

This is important to get right for security and also because if you try to use the wrong sized nonce, you’ll get obscure errors like:

Error: Unsupported state or unable to authenticate data

Super helpful, that. Especially because you can get this error a bunch of different ways. In truth, ambiguous errors when attempting to decrypt a ciphertext is actually an important part of the security of the algorithm. If you can get the algorithm to output different errors by tweaking the ciphertext, you can use chosen ciphertext attacks to infer details about the plaintext. In some cases, you can even decrypt the ciphertext this way.

Unrelated, if you’re using a different cipher algorithm (i.e. aes256 rather than aes-256-gcm), you might get this error instead:

Error: Invalid IV length

My solution to this problem was to generate 12 bytes of randomness to use as the nonce. I chose to prepend these bytes to the start of my encoded blob.

Lastly, secure encryption requires a mechanism for message integrity. This is done by one of many algorithms which calculate a MAC or Message Authentication Code. I’m not going to attempt to discuss the details of this here, but there is a terminology hurdle with this notion. What some algorithms and papers refer to as a MAC others will refer to as an Auth Tag, which is short for Authentication Tag which is just one of an endless string of unexplained synonyms in the crypto world. Now you know.

Third Issue: Some disassembly required

Here’s where things get really gnarly. If you look at the Golang API for sealing an AES 256 GCM-enciphered plaintext message, you’ll note that it deals only with a nonce, a plaintext, and the mysterious “additionalData”:

What’s important about this is that there is something missing.

If we look at NodeJS’s implementation of the same thing, we’ve got an extra piece of required information — the AuthTag.

So that’s an interesting little mystery. Where is this AuthTag component in the Golang library?

Also, note that the deciphering will not work in NodeJS without the AuthTag. You’ll get the extremely useful, highly unique error message:

Error: Unsupported state or unable to authenticate data

Ugh.

To resolve this mystery, I had to dig into the actual source code of the Golang library for crypto. If we crawl into the source a bit, we can see that they’re helping us out by hiding the auth tag from us entirely:

Here you can see that Golang is using/anticipating it’s own structure for the ciphertext and the AuthTag. This isn’t documented in the Golang docs, of course. However, once you know that the ciphertext produced by Golang’s GCM sealing code, you can easily write your own code to splice out these bits. My code looks like this:

The assumption here is that the bundle is a Base64 encoded string of bits with the first 12 bytes being the nonce, the last 16 bytes being the AuthTag, and all the bytes in the middle being the ciphertext. Golang auto-appends the AuthTag to the end and I wrote some assembly code to prepend this with the nonce. With each of these pieces extracted from the Buffer, performing the decryption is finally possible.

(You might note the weird ordering of parameters to this function. It is strange to have callbacks followed by additional params. This was done because this particular callback is used in a chain of callbacks and I curried some of the arguments. Without doing so, it would have meant a lot more forwarding of arguments along the chain, which results in messy code. This way, the function that decrypts the AES key only needs to call onKeySuccess with a single argument, because the other three were already bound to the onKeySuccess function. Yay JavaScript.)

Conclusion

Writing cross-service, secure communication is hard. Different libraries choose to implement the ciphertext and AuthTag packaging differently. Golang is particularly problematic because it does not tell you it has hidden away some of the details. Because it’s not part of the public API, it means that relying on this particular implementation is unsafe. They might choose to put it the other way around without telling you because you were never supposed to reach inside the blackbox is the first place. Alas. Alack.

]]>https://jg.gg/2018/01/22/communicating-via-aes-256-gcm-between-nodejs-and-golang/feed/0akismet-adc1aa15771466a23ddcebfb8453227eHow not to join the wrong companyhttps://jg.gg/2017/02/11/how-not-to-join-the-wrong-company/
Sat, 11 Feb 2017 21:30:45 +0000http://jg.gg/?p=279Continue reading How not to join the wrong company→]]>I recently had the experience of joining and then leaving a startup. I won’t get into the details of the company, but let’s just say it rhymes it “kitty slapper.” I stayed a month, which was long enough to understand the culture of the company, the mission, and the people. In that time, it became really apparent to me that I’d joined the wrong company for me. Once I knew, I didn’t waste any time. I let the team know and transitioned out.

So, on one front, I think I did okay. When I knew it wasn’t going to work, I didn’t belabour it. I didn’t foolishly stick to my plan despite the evidence I had that it wasn’t going to work. A younger, more driven, and more naïve me would have pushed ahead trying to make it work at all costs. I did that at Facebook and mostly just gave myself more grey hairs.

Now, I wouldn’t pretend that I left Facebook in the most mature, conscientious way. No, in truth I was really frustrated and hurt when I left. Things weren’t adding up and my two separate chains of accountability were giving me incongruous feedback. If I’m honest though, I think there was a way forward for me there. It would have just required a lot of letting go of hard feelings and finding a new team to call home. I wasn’t willing to let go of my goals or area of focus at Facebook. So, instead of moving away from security or quitting when it seemed like there was an impasse, I just kept grinding. I gritted my teeth and pushed forward. The result? I still quit a year later, even more frustrated and demoralised than the year before.

Okay, so I didn’t repeat that mistake this time. I recognised the way was blocked by unresolvable issues and I walked away.

But there’s room for improvement here. The question I keep coming back to since I left a couple weeks ago is how did I miss these signals in the first place? I’m a very, very seasoned interviewer. So much so that when someone is interviewing me, it typically works more like I’m interviewing them. So how the hell did I miss majorly important signals that would’ve prevented me from joining that company in the first place?

After another week of self-reflection, I think it comes down to just a few things.

#1 Don’t fall subject to the authority bias

The first book of the Bible explains what happens when we disobey a great authority: we get ejected from paradise. This is also what less celestial authorities would have us believe.

When I went for one of several interviews with the CEO, I noted a clear authoritarian streak in his communication. Shockingly (at least to myself in retrospect), rather than investigate this, I bowed down. I was excited about the company and scared of not getting to join it. So, even though I had signal that *already* was telling me something wasn’t right, I ignored it. This is probably a bit of confirmation bias on my part as well, but the fact that I didn’t take the opportunity to ask the hard questions I should have is something different. It’s the authority-fearing part of me (and I guess part of everyone) who is afraid of causing trouble — even when the consequences of *not* causing that trouble turn out to be much worse.

#2 Don’t let “winning” override making a smart decision

In the interview phase of this company, I sent over some outlines that explained how I would run a team doing the work I had been discussing with the leadership team. It was very technical (as one might imagine it ought to be if I’m being interviewed to be a lead engineer). Now, from the technical people, I got a thumbs up. In fact, after a month at the company, I can see that the outline I sent over was basically spot on. Yay me. The important part about this is that I also got told that my communication “wasn’t a successful one for informing business people.”

At this point, if I’m honest with myself, I let my ego get the better of me. Incredulity overtook me and I got caught up in a haughty, “Why would business people expect to be able to understand an outline explaining how to tackle a hard technical problem?” train of thought. Now — that question is actually the perfect question to ask. However, instead of facing the reality that I was having to ask such a ridiculous thing and walking away, I tried to “win” the debate. I should have just accepted that at this particular company the top of the food chain was obviously business people rather than engineers. I should have owned the reality that their expectation would be that I would communicate extremely complex things upwardly in some sort of digest format. That if I didn’t summarise complexity digestibly enough, the communication would be a failure and blame would be assigned to me.

But I didn’t do any of that. I got caught in the trap of “No way, I’m a great communicator — I’ll prove it!” I can see that now, but it was not at all obvious to me then. In fact, after being told that my hours of free effort devising a way to tackle a hard problem with a team were wasted because a business person didn’t understand it, I was nervous. I worried that I had screwed up my prospects of getting hired. I spent one night tossing and turning hoping that my failure hadn’t undone all the positive interactions I’d had prior. Silly me.

#3 Pay attention to all the signals you have

Going into this company, I had several reports that all was not unspoilt in the state of Denmark. I had some direct feedback from current employees and I had some very telling Glassdoor reviews. Did I weigh these correctly? Nope! Again, I let confirmation bias take over my reasoning. Rather than asking the hard questions about these things, I instead asked the questions that I knew gave my interviewers enough rhetorical space to give answers I’d accept.

For instance, I should have asked, “I read on Glassdoor that X, Y, and Z appears to happen at your company. Is that true? If so, how often? Has anyone left the company as a result?” That question leaves no room wriggling around. Instead of using the signal I had usefully, I instead asked questions like, “You guys have a bit of a reputation. How does that play out in the day-to-day running of the company?” This question gives *plenty* of room for subjectivity and soft answers. This is silly of me. Wasting a strong signal about a company — especially one that invalidates the assumption that it’s a good one to join — is about as rookie a mistake as one can make. 10+ years into my career, here I am making exactly that mistake.

#4 Checksum your intuitions about how your job will work

As I look back at my interview performances, I realise I didn’t bother to do something really important. I didn’t bother to check if the company worked in a sensible way relative to my expectations. I think my years at Facebook set me up perfectly for this. At a company like Facebook, even a “meh” team still produces a lot of great work, has amazingly talented people on it, and has plenty of remit to do their jobs well. It never occurred to me that these things might not be true at a different company. So, rather than asking straightforward questions like:

If I need to coordinate with other teams in order to do my job well, what might that look like?

What does the reporting structure look like for engineers?

If I see an opportunity to improve processes across the engineering team, how can I push that forward?

If I see an opportunity to improve processes across the company, how can I make them happen?

If I’d asked these questions and pushed for authentic answers, I would have known that I was barking up the wrong tree. But, I didn’t. I just assumed that every company works as well as Facebook by default.

Nope.

]]>akismet-adc1aa15771466a23ddcebfb8453227eDo you need strong mental math to pass an engineering interview?https://jg.gg/2016/10/17/do-you-need-strong-mental-math-to-pass-an-engineering-interview/
Mon, 17 Oct 2016 09:38:42 +0000http://jg.gg/?p=274Continue reading Do you need strong mental math to pass an engineering interview?→]]>I got a great question from a viewer of the Intro to Architecture and Systems Design Interviews video I created (https://youtu.be/ZgdS0EUmn70). The question is: If my mental math is really weak, is it OK to whip out a calculator app?

For mental math vs. busting out a calculator, I think it’s probably inconsistent from one interviewer to the next. In no cases do I think it alone could possibly cost you the interview unless you’re interviewing at some extremely mathsy place (like an algorithmic trading company, for instance).

I’ve seen a spectrum in terms of judgement towards lack of mental math. One end has people like me, who couldn’t care less if you use a calculator. I would probably award you points for self knowledge and for using tools to make you more effective.

The other end of the spectrum would be someone who is very strong in maths and who doesn’t think the problems you’re tasked with are of sufficient complexity to warrant a calculator. With an interviewer on that end of the spectrum, you might lose some love, but it wouldn’t counterbalance an overall strong performance. For instance, in hundreds of decision-making discussions, I’ve never seen lack of mental math come up as a deal breaker.

There is one caveat to all of this. If you’re using a calculator for numbers that engineers *should* know, that could hurt you*. For instance, if 2^5 comes up and you have to bang it out on your TI-83, you’re probably going to lose real points. Even as a more math-lenient interviewer, I would have some serious questions about someone if powers of two don’t seem familiar. Likewise powers of two on the big end. 2^16 and 2^32 are both important numbers that should be in your mind as a programmer.

Also sums and differences of common powers of two. If you need a calculator to sum 4096 and 4096, I would knock you down a rung in my estimation.

Welcome back for Episode 06 of The Unqualified Engineer! I wanted to switch things up a bit this time. On an earlier episode, one of our viewers Vahid Noormofidi asked about system design interviews, so today we’re going to go there.

We’ve talked a lot about coding skills on this channel so far. We’ve walked through a bunch of examples of coding questions you might face in a coding interview. Some good, some bad. Some completely shit and unfair. While coding interviews are super important to succeed at in order to get a job offer, a less well-known detail of Silicon Valley style interviews is that your ‘level’ comes primarily from your performance in a design and architecture interview.

For those who don’t already know, Silicon Valley interviews have about five or six common forms, with three being ubiquitous. There’s the coding interview, which we’ve covered a lot. There’s the background/behaviour interview, which tries to determine if you’re a good person to work with and someone who will be a good addition to the team. There’s also the design & architecture interview, which focuses on your ability to take a big fuzzy problem and come up with a broad, detailed plan for solving it.

Broad and detailed? WTF?

If it feels weird to hear ‘broad *and* detailed,’ it should. Design and architecture interviews are impossibly big problems that definitely cannot be solved in 45 minutes. The goal isn’t to come up with a bullet-proof, 100% complete solution to a massive problem. Rather, it’s designed to give you the chance to show the interviewer what aspects of the problem you think about, what solutions you can come up with, and how much technical depth and diversity you’re bringing to the table. This is why it so strongly influences what ‘level’ you get hired at.

If you’re wondering what I mean about “levels,” it’s probably because you haven’t worked in a company that is as structured as a Google or a Facebook. The basic idea is that engineers (well, all employees actually) get broken into numbered groups that map coarsely to a scales of a person’s ability to contribute value to the company.

For instance, at Facebook, a new graduate from university might get hired in at Level 3. Why Level 3? Fuck if I know. I think it’s probably just self importance — we’re all just *too* special and *too* awesome to start at Level 1… Anyway, that’s how it works. Your level determines your compensation, it determines what equity stake you might get in the company, and it determines what amount of value you have to add to the company in order to keep your job. Brutal, yes, but also somewhat reasonable in practice.

So anyway, back to design and architecture interviews. They have a few subforms like product design interviews, database/data storage design interviews, etc.. The most common type is a generic systems design interview.

A typical question for an interview like this has a really common form. You’ll be presented with a big problem that has only a few concrete details. For instance, in a Facebook interview, you might get asked a question like “Design a mobile app logging infrastructure that can handle a user base of 500 million monthly active users. The system needs to handle any sort of logs from mobile devices like application start, stop, errors, etc..”

Breadth

If you don’t come from a systems background, you might initially think an interview question like this is grossly unfair. Aand, well, you’re kind right to feel that way. A question like this is much easier for someone with experience designing systems like this, at that scale. It’s harder for someone who hasn’t. Before you self-cycle into existential rage, try to remember that’s exactly the point of the interview.

On top of that, bear in mind that there’s no correct answer to a question like this.

There are certainly incorrect answers (i.e. don’t say “We can solve this with Mongo DB, because it’s web scale.”). However, the universe of good answers is vast. Good interviewers are going to look for you to use your experience to give the strongest performance in the domain you should be strongest in.

For instance, given the question above, let’s say you’re a mobile product engineer. The interviewer is going to be looking for you to think about the product impact of a system like this. When should you send logs from a device? What will the performance impact be of collecting logs on the device? What about supporting iOS and Android and mobile web? Will application engineers be able to log arbitrary data or is it a strictly defined log set? Those are the kinds of details your interviewer should expect you to be strong with.

On the other if you’re a distributed filesystem engineer, your interviewer is going to expect you to be amazingly good at describing how to handle the scale of the problem on the server side and to think about strategies for reducing the amount of time it takes for new log data to become available. The interviewer probably wouldn’t expect you to think about the battery life implications in the same detail.

If you can deliver awesome answers to both the client and the server side of the problem, all the better. Some people can.

The important part is to play to your strengths while also showing reasonable technical breadth. If a mobile app engineer answered this question for me and kicked ass with the client-side aspects but literally didn’t address the server side at all, it would be tough for me to feel confident in hiring her. On the other hand, if she thought through as much as detail as she could on the server and gave me clear signals about what she does and don’t know, that’s better. Maybe she wouldn’t invent a server side solution as well as a distributed file system engineer, but that’s OK. That makes sense given her skills and specialisation as a mobile engineer.

Break the problem down

One of the most critical things for you to do in this interview is to break the problem down even more coherently than the interviewer presented it. Going back to the example problem I proposed, you might have noticed that some of the details aren’t in a usable form. If you did, good. If you didn’t, you’ve got some work to do.

For instance, I said “500 million monthly active users.” What does that even mean? Can you use a *monthly* active number to design a real-time system? The answer is yes, but not directly. You need to break it down.

The rookie might do this by taking 500 million and dividing it by 30 days in a month and then divide that numbers by 86400* to figure out the concurrent users. The reason this is a naive way to approach the problem is that people live near each other and live life in daily cycles. That means that you can’t just evenly split the user base by seconds in a month as though all people are evenly spread across the world and active, using your app evenly 24 hours a day.

So how do you capacity plan? If you don’t have an immediate answer, you should start by interacting with the interviewer. Ask about the geographic distribution of users. Ask about the biggest markets for the app. What timezones are most of the users in?

You should be able to arrive at the notion that (assuming the app is doing well!), some large fraction of the users will be active every single day and mostly during specific times of day. You can reasonably smoke up these numbers. Maybe you would estimate that 70% of the monthly active users are also daily actives and that 70% of those people are active at roughly the same time.

Why? Well, it’s very likely that night time in the big population centers of the world is going to be much quiter than the day time. A key insight you need to have is that to handle ‘the user base,’ you really need to worry about the peak-time traffic. So, if 70% of the users are going to be using your app every day and 70% of that 70% will be using the app all at once during peak times, your 500 million users is actually only 245 million at peak times. If we assume that means 245 million people in an hour, then we can start doing even finer-grained estimation. 245M / 60 minutes / 60 seconds nets out to about 68K second-to-second active users. That number is much, much more reasonable to plan for than a hazy “500 million.” *How many servers do you need to handle 500 million people?* is almost meaningless compared to the real question — *How many servers do you need to handle 70K concurrent users during peak hours? *

Meta point about breaking the problem down

Did I just make up a bunch of numbers based on no data? Yes. Could they be “wrong” numbers? Well yes, in fact they probably are. Does that matter? Not as much as you might think.

The interview is not trying to figure out if you can psychically derive the exact user stats and behaviours of an imaginary, complex system. Rather, it’s designed to see if you can handle a big fuzzy need in a reasonable way. Can you take a big, insane requirement and grind through it to figure out what sort of actual technical problem it poses.

Veteran systems engineers, devops engineers, and people who have had the chance to work on large scale problems before will have a leg up in this domain. That might seem unfair, but it’s really not. Remember, we’re not talking about school where you can study the hardest and get the best grade. This is real life. Valuable skills are valuable and who has them and who doesn’t is not “fair,” it’s reality. If you’re expecting it to be fair, you’re going spend a lot of time being frustrated.

Breaking the problem down continued

So, back to the problem, — we’ve decided we’re talking about 70K concurrent users. Now we have to think about what that means in terms of actual throughput. If you were trying to solve this problem in the real world, your bottom line for the server side will be *How much log data are we actually talking about here?*

Right now, you might be thinking — it’s impossible to know! If you are, you’re either not thinking critically about mobile use cases or you’re not experienced enough in this domain to have an intuition about this. The second is better than the first. Let’s think about some real-world bounds. *How much network bandwidth does a phone have?* *How much storage space?* If we assume that the logs are going to be generated by user actions — remember the problem details “application start, application end, errors, etc.” — then we have an excellent frame of reference for breaking it down further. How many actions per minute can a person do on their phone? Maybe the average person could send and receive fifteen messages. Or maybe they could consume twenty photos on Instagram. Maybe they can switch apps roughly four times. In any of those cases, we’re not looking at a massive amount of data here.

The novice will assume that we just need to guess some reasonable average number and stick with that (E.g. “half a megabyte, tops.”). The better engineer will grind through some concrete estimations based on the size of data. Maybe we’re talking about 20+ 32-bit integers that refer to entity IDs (photo IDs, user IDs, etc.) at least, probably some overhead for a data encoding like JSON, probably some strings for things like event types, and in the error case hopefully something like a stack trace to aid in debugging. So, at least a few hundred bytes per minute, but with an upper bound that could be pretty big in error cases (them stack traces, yo).

The best way to answer here — in my opinion — is to have an opinion and to make intentional, reasonable tradeoffs. Burning through hundreds of bytes of data per minute is a guaranteed shitty and expensive experience for someone using your product. So, be a empathetic system designer. Come up with a guideline and set some numbers based on giving a shit about the user experience. Perhaps that number is 5KB of log data per minute as a maximum, anything more gets discarded. That means that if a person uses your product for an hour, they’ll only rack up 300KB of log data that they’re paying for out of their own hard earned money. That’s like asking them to download an extra image every hour or so in order to give your company the data you need to make sure the app is working and to understand why it’s not. A reasonable trade off.

And again, yes, this is a made up number. The specific number is not as important as why you chose it and whether or not that choice was reasonable. I could have said 500KB per minute (I wouldn’t because not even Facebook’s bloated apps use that much bandwidth for metadata), but the most important part is *why* that number.

The reason I think taking a strong stance towards how a system should work is the better option is because it’s show experience, leadership, and ownership. Sure, I could just let the users’ activity dictate some reasonable number of logs per minute but what if my estimation leads me to some big number like 5MB per hour? My inner nurturer needs to kick in and tell me that logging that aggressively is abusive of user trust. My inner data analyst should be telling me that 5MB of data per hour per user is a glut of data that will mostly be giving redundant signal. User empathy, experience and technical leadership should be the guiding light here over a raw calculation.

In a real company doing real work, that’s what actually happens, after all. Product decisions are made by people trying to make the smartest tradeoffs. There are no right answers. The only wrong answers are ones that drive away users or lead the company to fail. The right answers are many and varied. This is why design and architecture interviews are so important for determining people’s levels.

Keep Going

This discussion is already massive and ranging, yet we haven’t even talked about the concrete details of a client implementation, a server implementation. We haven’t talked about client-side caching. About mobile networks versus WiFi delivery of logs. We haven’t talked about data-saving strategies like user sampling. We haven’t talked about how to collect log data from the server side, storing it long-term. We haven’t talked about how many people it would take to build a system like this. We haven’t talked about how many servers we’ll need to handle the peak traffic. We haven’t talked about how much that will cost. We haven’t talked about countless absolutely critical aspects of the problem. Nor could we in 45 minutes. That’s why you, as the interviewee, must keep going.

At any point if you’re looking to your interviewer for guidance about how to proceed, you’re losing points. Now, don’t get me wrong on this point. Needing to bounce ideas off the interviewer or get clarification is fine. Also, you might be legitimately stuck on the problem. If so, by god man — ask for a pointer. I’ve seen many interviews that resulted in a job offer include feedback like “The candidate started off strong but needed a small pointer on X. After that, she did really well.” Don’t be afraid to ask for help. Still, you should bear in mind that the more help you need, the more likely it is that you’re showing lack of experience, creativity, and/or technical leadership.

I don’t know anything about image processing pipelines, for instance. But, if you give me a problem related to image processing pipelines at scale, I have high confidence that I can keep exploring the problem space making meaningful progress for 45 minutes. In all honesty, I love these kinds of challenges and find them really energising. Not everyone does, but everyone who is a strong technical leader *can* do these kinds of explorations.

Failing this interview

One interesting thing to consider on this point is that Facebook, at least, don’t even do design/architecture interviews for new graduates. They used to, but consistently found that the candidates just didn’t have enough experience with applied systems problems to even begin solving the problem.

Not having experience enough to be able to digest a big problem like this doesn’t always mean you’re fucked. It is an important signal though.

For instance, I was once on an interview loop where we all decided to hire the candidate even though he faired poorly in the architecture interview. His coding skills were strong and he seemed passionate about our company as well as a good person to work with. We didn’t hire him though.

When the interview feedback made it up to the director level, the director stopped us and rejected the candidate. I was confused. Turns out the director did some mental math that I didn’t have the foresight to do. Yes, the coding skills were good and the candidate was a reasonably good communicator, team player, etc.. However, he was lacking in design/architecture ability. This on it’s own is no failure, but the candidate had 8 years of industry experience. He had worked in a strongly technical company for that long without ever taking on enough leadership to get good at reasoning about large scale systems. That is a red flag.

Like Bob Dylan says, you’re either busy being born or busy dying. In this case, our candidate had plateaued in his career — so, he was busy dying. It could have been for a million reasons, but to the director making the call it means making a tough decision about whether or not we can expect this candidate to advance in his career or not.

At Facebook at least, you have to keep advancing at least to level 5, which is roughly equivalent to “senior engineer” or “tech lead” in other environments. After that, you’re doing valuable enough work to stick around without getting promoted again. This candidate would not have gotten a level 5 offer and if he’s already 8 years into his career, it’s unlikely he ever would. So, we would be sentencing him to a short-term career at Facebook at best. It was definitely the right call.

Final points

Architecture interviews are formidable, open-ended problems that you definitely cannot exhaustively solve in the time allotted. If you have no idea how to solve these kinds of problems, you might start by checking out the engineering blogs of companies like Google, Apple, Dropbox, and others. The amazing thing about architecture is that most of the best companies are sharing all their work.

Even if you have no background in the work, you can familiarise yourself with the common patterns of system design by reading diversely from the blogs on the topic, watching YouTube videos of tech talks from conferences, etc.. If that feels like cheating, it shouldn’t. After all, the reason engineers at big tech companies are good at solving these kinds of problems isn’t usually because they do it all day, every day. Rather, it’s because they get exposure to the solutions in internal tech talks and write-ups. The information is available. Go turn it into knowledge.

]]>https://jg.gg/2016/07/31/architecture-and-systems-design-interview/feed/0akismet-adc1aa15771466a23ddcebfb8453227eLargest Rectangle in a Histogram Coding Interview Problemhttps://jg.gg/2016/06/02/largest-rectangle-in-a-histogram-coding-interview-problem/
https://jg.gg/2016/06/02/largest-rectangle-in-a-histogram-coding-interview-problem/#respondThu, 02 Jun 2016 16:32:51 +0000http://jg.gg/?p=257Here’s the full text of the code from Episode 05. This is coded in JavaScript and uses the common approach of using a stack to keep track of the open rectangles.

This problem revolves around generating permutations of a sequence incrementally. Rather than pouring out a huge vector of output, this problem asks for a generator that can output the next permutation of a vector with every iteration.

For this video, I code in C++ and uses a functor to create the generator.

]]>https://jg.gg/2016/05/26/permutation-generator-coding-interview-question/feed/0akismet-adc1aa15771466a23ddcebfb8453227eCalendar Conflicts Coding Interview Questionhttps://jg.gg/2016/05/14/episode-03-calendar-conflicts-coding-interview-question/
https://jg.gg/2016/05/14/episode-03-calendar-conflicts-coding-interview-question/#respondSat, 14 May 2016 12:55:31 +0000http://jg.gg/?p=225Continue reading Calendar Conflicts Coding Interview Question→]]>Here’s the video and code for the Calendar Conflicts video which walks through a linear complexity solution to a common coding interview problem. The idea is that you’re given a list of calendar events and you must find any events in the calendar which conflict.

There is an easy brute force solution to this problem, but the optimal solution is linear. The code here is written in vanilla PHP. If you’re interviewing at a company like Google, Facebook, Apple, etc., this is exactly the kind of coding interview question you might have to face.

]]>https://jg.gg/2016/05/14/episode-03-calendar-conflicts-coding-interview-question/feed/0Calendar Conflicts Titleakismet-adc1aa15771466a23ddcebfb8453227eDinner Party Coding Interview Problemhttps://jg.gg/2016/05/03/the-unqualified-engineer-episode-02/
https://jg.gg/2016/05/03/the-unqualified-engineer-episode-02/#respondTue, 03 May 2016 00:53:01 +0000http://jg.gg/?p=212Continue reading Dinner Party Coding Interview Problem→]]>You’ll find the text-form code from the second episode here. Following some very good critical feedback I got from friends and viewers, this episode is much, much shorter and focuses entirely on a coding challenge. The full video is here:

With this coding problem, you’re main challenge is generating combinations. There are some really fancy ways to achieve this. My favourite approach involves using bitmasks and Hamming Weights. Such an approach would just count up from 0 to 2^n looking for any number with a Hamming Weight of table_size. While that’s super snazzy, it’s also not super representative of what a person not familiar with these types of problems might come up with, so I didn’t take that approach for this video.

Instead, this video focuses on recursion. Reasoning about recursion can be a real challenge for our poor human brains. To ease this, in the video I draw the recursion chart shown here:

A diagram of a 4 choose 3 combination result created via a recursive function call.

From this, you can get a basic idea of how a recursive approach to this problem might be solved. The really tricky thing for someone tackling such a problem for the first time is that you have to recurse twice in every function execution. Many people think of recursion as an approach where the function calls itself. The exciting thing about this algorithm is that the function calls itself not once, but twice!

One flaw with this code I noticed only after finishing the video and uploading it is that my groups argument is actually a bug. Because the default arguments are static across all calls to a function, this list will continue to grow and grow every time the find_dinner_parties function is invoked. It’s an easy thing to fix, but I ran out of whiteboard anyway. For those playing along at home, please add a list for groups in the call to combine_friends inside find_dinner_parties.

]]>https://jg.gg/2016/05/03/the-unqualified-engineer-episode-02/feed/0akismet-adc1aa15771466a23ddcebfb8453227eA diagram of a 4 choose 3 combination result created via a recursive function call.Least Disruptive Subrange Coding Interview Problemhttps://jg.gg/2016/04/24/the-unqualified-engineer-episode-01/
https://jg.gg/2016/04/24/the-unqualified-engineer-episode-01/#commentsSun, 24 Apr 2016 17:38:56 +0000http://jg.gg/?p=163Continue reading Least Disruptive Subrange Coding Interview Problem→]]>Here’s the code from the first episode. This is a basic coding interview question that you might be asked in an interview at a tech company. You can see the video associated with this code here:

The problem here can be stated pretty simply. Imagine that you need a function that can take two inputs: 1) a list of integers and 2) another list of integers. The first list could be thousands of integers long. It can contain positive and negative numbers. It’s not sorted in any way. Any integer could be at any position. The second list is similar except that it’s equal in length to the first list or shorter.

What we want to do with these lists is find where in the first list we could substitute the second list, integer for integer, that would create the least amount of change in each integer from the original list. For this problem, we consider change to be measured in number line distance (i.e. absolute value). So, you can’t use some negative distance to offset some positive distance. If you substitute -2 for 2, that’s a change of 4.

An example would be something like this:

original = [1, 2, 3, 4, 5]
replacement = [3, 5, 3]

In this example, the “disruption” created by each possible substitution looks like this:

You can see from this, that the best replacement choice here would be the 2nd index, which would create a subrange disruption of just 3, compared to all the other options.

So how might you solve this? Well, here’s a bit of JavaScript that aims to tackle the problem. This algorithm runs in O(n*m) time, where n refers to the length of the first input and m is the length of the second input. Interestingly, the longer the second input is, the shorter the run of the algorithm will be. So, the worst case is something like the length of replacement being half the length of original. In that case, the algorithm will do something along the lines of m*m work. It’s a constant memory problem in that it uses no additional arrays or objects to store state. I guess you could achieve this with a hash table if you just felt like wasting RAM.

While this algorithm (as far as I know) is correct, it does leave many details unserved. For instance, I don’t address the potential to integer underflow by subtracting from the minimum integer value in JavaScript. Likewise, I could integer overflow by adding to distance until it bubbles over the maximum integer value in JavaScript. My test cases are also fairly limited and don’t test for cases like empty list inputs. Also, because JavaScript, I should be checking for null inputs and handling that case reasonably. I’m also not checking for the case where replacement could be longer than original. I’m sure there are like a dozen other defects here.

The point is not to create a bullet-proof library function. Rather, I was aiming to simulate a real coding interview focusing on what you’d have time to accomplish. Let me know what you think. Especially let me know if I got it all wrong!