Slide 1

TECH BANKRUPTCY

Luka Kladaric
Sekura Collective
luka@sekura.io
@kll

Slide 2

DO IT QUICKLY OR DO IT WELL?

Since the dawn of software development, weʼve been faced with the same impossible choice every single day: do it quickly or do it well. We do our best to make the right choice for the task at hand, and we move on.

Slide 3

MOVE FAST & BREAK THINGS

Then came the lean startups and “Move fast and break things”, and they put their thumbs on the scale in favor of the hacks, the MVPs, the just-ship-its. And the Product Managers just ate. it. up.

Slide 4

FINDING MARKET FIT

That's great for proving a concept or finding market fit. But what happens when that's all you do? When the entire organization, top to bottom, has collectively forgotten how to write quality software? When you become unable to make the correct technical decision even by accident?

Slide 5

REVERSE-ENGINEERING BAD TECHNICAL DECISIONS

We will take a deep dive into a mission-critical web application that is basically unusable on its best day, and trace the trivial bad decisions that got it there. By the end, I hope you will never take a shortcut again.

Slide 6

WHAT ARE WE TALKING ABOUT HERE?

Slide 7

Small-time startup, mobile app: think Foursquare/Swarm. People check into locations and leave tips and photos for each other. You charge a small fee for a no-ads experience. The data accumulates and isn't used.

Slide 8

THE HACKATHON
@kll #GrowITconf 2018

You organize a hackathon, and someone builds a concierge dashboard. It lets you peek at individual users' data and reach out to them with suggestions over chat.

Slide 9

THE PIVOT

Ad revenue is drying up and nobody is buying the no-ads experience, so you pivot to selling the concierge service. You start small: a few people working on it, helping just a few clients selected for the trial.

Slide 10

ENTER PROBLEMS OF SCALE

The app is essentially a collection of hacks that pull user data out of the database and aggregate it, even though that data was never meant to be aggregated.

Slide 11

TOO MANY INTERNAL USERS

Built for a dozen people, now used by hundreds. Initially each helping a dozen clients, now thousands.

Slide 12

NO LONGER A SMALL TRUSTED TEAM

Slide 13

IT’S SLOW / TIMING OUT

"the browser can't render that many messages, but the API doesn't have pagination"

pagination in the browser is not pagination

Slide 14

IT’S SLOW / TIMING OUT

"just give me all the data via API"

no pagination on lists that keep growing forever

deep responses that keep growing in scope

Imagine if viewing a tweet also gave you a full list of everyone who liked it, along with all the data about them. You could build a Twitter clone with a single API endpoint that returns everything: all tweets, all users, all tweets by each user. But even if only 10 of your friends used it, it would become unusable within a few months.

Slide 15

IT’S SLOW / TIMING OUT

"the backend team is busy, let's just reuse this meaningless field for meaningful signals"

sorting in the UI, based on data from deep responses

Slide 16

IT’S SLOW / TIMING OUT

"realtime chat is difficult, let's just refresh everything every time there's a change"

incremental updates > refreshing everything

pubsub was used only as a trigger for a full refetch. A huge surge of messages means a new refetch is triggered while the old one is still completing, and eventually everything times out.
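A minimal sketch of the incremental alternative: each pubsub event carries the new message itself, and the client appends it to local state instead of refetching everything. The `Message` and `ChatState` names are hypothetical, and the sketch assumes events for a conversation arrive in order; deduplicating by id covers redelivered events.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    id: int
    conversation_id: int
    text: str


@dataclass
class ChatState:
    """Client-side cache of conversations, kept current from pubsub events (hypothetical model)."""
    messages: dict[int, list[Message]] = field(default_factory=dict)
    seen_ids: set[int] = field(default_factory=set)

    def apply_event(self, msg: Message) -> None:
        # One event = one append; the cost does not grow with conversation history.
        if msg.id in self.seen_ids:
            return  # redelivered event, already applied
        self.seen_ids.add(msg.id)
        self.messages.setdefault(msg.conversation_id, []).append(msg)


# Usage: a surge of 1,000 events costs 1,000 appends, not 1,000 full refetches.
state = ChatState()
for i in range(1000):
    state.apply_event(Message(id=i, conversation_id=7, text=f"message {i}"))
state.apply_event(Message(id=5, conversation_id=7, text="message 5"))  # duplicate delivery
```

If a full refetch cannot be avoided, coalescing triggers so that a new refetch never starts while the previous one is still running would at least stop the pile-up described above.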

Slide 17

IT’S SLOW / TIMING OUT

"we don't have profile image thumbnails"

just like sorting in the browser, resizing images in the browser is very inefficient

eventually, you hit 800 MB page loads with images included. Resized: 12 MB. If the user list were paginated, it'd be a fraction of that.

unusable on slower computers, iPads, ...

Slide 18

IT’S SLOW / TIMING OUT

"Why is everything down?"

Large responses = slow responses

Slow responses + surge = timeouts

We ran out of workers to service requests on the app backend.
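The chain on this slide can be made concrete with Little's law: the average number of requests in flight equals the arrival rate times the time each request spends in the system. The pool size and traffic figures below are hypothetical, chosen only to show that slower responses exhaust a fixed worker pool just as surely as more traffic does.

```python
def workers_needed(requests_per_second: float, seconds_per_request: float) -> float:
    """Little's law: average requests in flight = arrival rate x time per request."""
    return requests_per_second * seconds_per_request


def pool_exhausted(requests_per_second: float, seconds_per_request: float, pool_size: int) -> bool:
    # Once demand exceeds the pool, new requests queue until they time out.
    return workers_needed(requests_per_second, seconds_per_request) > pool_size


POOL_SIZE = 32  # hypothetical number of backend workers

normal = workers_needed(40, 0.25)  # fast responses: 10 workers busy on average
slow = workers_needed(40, 2.0)     # same traffic, large slow responses: 80 needed
surge = workers_needed(80, 2.0)    # slow responses plus a surge: 160 needed
```

In this model, cutting response time by some factor frees exactly as much capacity as multiplying the worker pool by that factor, and shrinking responses is usually the cheaper of the two.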

Slide 19

WHO DID THAT?? (1)

"We don't know who sent the user a message full of profanity"

Because there's no way to grant several users access to a client, staff share accounts and passwords. Once this goes on long enough, everyone has plausible deniability over any action, like sending users profanity, even if messages are tied to someone's account.
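One way out of shared accounts, sketched under assumptions: model access as grants from individual staff accounts to a client, so several people can work on the same client while every message stays tied to one person. The `access` mapping and the `grant`, `revoke` and `send_message` helpers are hypothetical; a real system would keep grants in the database.

```python
from collections import defaultdict

# client_id -> ids of staff members allowed to act on that client (hypothetical model)
access: dict[int, set[str]] = defaultdict(set)
sent: list[dict] = []  # stand-in for the message store


def grant(staff_id: str, client_id: int) -> None:
    access[client_id].add(staff_id)


def revoke(staff_id: str, client_id: int) -> None:
    access[client_id].discard(staff_id)


def send_message(staff_id: str, client_id: int, text: str) -> dict:
    # Every message goes out under the sender's own identity; no shared password needed.
    if staff_id not in access[client_id]:
        raise PermissionError(f"{staff_id} has no access to client {client_id}")
    msg = {"client_id": client_id, "sent_by": staff_id, "text": text}
    sent.append(msg)
    return msg


grant("ana", 101)
grant("ben", 101)  # two people on one client, each with their own login
send_message("ben", 101, "Found you a table for tonight.")
```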

Slide 20

WHO DID THAT?? (2)

"We don't know who moved a bunch of users from one concierge to another."

Because this app is an afterthought to the API, it has its own backend and its own account scheme. It then talks to the actual API with a single shared token, so there's no way for the API to know and record who requested an action. No meaningful audit trail. There's also no record of who got moved, and no way to undo the entire operation without spelunking through each client's data individually.
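A sketch of what the missing pieces could look like, assuming each request reaches the API with the acting staff member's identity rather than a shared token: a bulk reassignment is stored as one operation that records the actor and every client's previous concierge, so it can be attributed and undone in one step. The in-memory dictionaries and function names are hypothetical stand-ins for database tables.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Operation:
    op_id: str
    actor: str                 # the staff member who asked, not a shared API token
    at: datetime
    previous: dict[int, str]   # client_id -> concierge before the move
    new_concierge: str


assignments: dict[int, str] = {1: "ana", 2: "ana", 3: "ben"}  # client_id -> concierge
operations: dict[str, Operation] = {}


def reassign_clients(actor: str, client_ids: list[int], new_concierge: str) -> str:
    # Snapshot the old assignments before changing anything.
    op = Operation(
        op_id=str(uuid.uuid4()),
        actor=actor,
        at=datetime.now(timezone.utc),
        previous={cid: assignments[cid] for cid in client_ids},
        new_concierge=new_concierge,
    )
    operations[op.op_id] = op
    for cid in client_ids:
        assignments[cid] = new_concierge
    return op.op_id


def undo_operation(op_id: str) -> None:
    op = operations[op_id]
    for cid, old in op.previous.items():
        # Only revert clients nobody has moved again since; later changes win.
        if assignments.get(cid) == op.new_concierge:
            assignments[cid] = old


op_id = reassign_clients("carla", [1, 2], "ben")
undo_operation(op_id)
```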

Slide 21

API IMPLEMENTATION 101

For data-heavy apps, API design has to be done right, because you will rarely get a chance to refactor it. APIs are by definition meant to be stable, long-term contracts on how different apps interact. Changing them usually requires coordination across multiple teams, or at least multiple people.

Slide 22

RULES FOR BUILDING AN API

There are many resources that talk about what the API should look like from the outside. Iʼm here to talk about what it should look like on the inside.

Slide 23

PAGINATE EARLY AND OFTEN

if it returns a list that is at all likely to grow past 100 elements — PAGINATE

and no total count!
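A minimal sketch of keyset (cursor) pagination with Python's standard-library `sqlite3` module. The client passes the last id it saw; the query fetches one row more than the page size to learn whether another page exists, so no `COUNT(*)` over the whole list is ever needed. The `messages` table and the `list_messages` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, conversation_id INTEGER, body TEXT)"
)
# Index matching the WHERE + ORDER BY, so each page is a short index range scan.
conn.execute("CREATE INDEX idx_messages_conv ON messages (conversation_id, id)")
conn.executemany(
    "INSERT INTO messages (conversation_id, body) VALUES (?, ?)",
    [(1, f"message {i}") for i in range(120)],
)


def list_messages(conn, conversation_id, after_id=None, limit=50):
    """Return (page, next_cursor); next_cursor is None on the last page."""
    rows = conn.execute(
        "SELECT id, body FROM messages "
        "WHERE conversation_id = ? AND id > ? "
        "ORDER BY id LIMIT ?",
        # ids start at 1, so 0 means "from the beginning"; the extra row is a has-more probe
        (conversation_id, after_id if after_id is not None else 0, limit + 1),
    ).fetchall()
    page = rows[:limit]
    next_cursor = page[-1][0] if len(rows) > limit else None
    return page, next_cursor


# Walk every page the way a client would.
all_ids, cursor = [], None
while True:
    page, cursor = list_messages(conn, 1, after_id=cursor, limit=50)
    all_ids.extend(row[0] for row in page)
    if cursor is None:
        break
```

Unlike `OFFSET`, a cursor stays cheap on page 1,000 and is less prone to skipping or repeating rows when the list changes while a client is paging through it.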

Slide 24

AVOID DEEP RESPONSES

they will inevitably grow and require pagination themselves, and that's a whole new level of hell

Slide 25

NEVER COMBINE LISTS WITH DEEP RESPONSES

an endpoint either returns a list or has a complex deep response, never both; otherwise you're setting yourself up for some real performance pain
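A sketch of the Twitter example from earlier, reshaped along these rules: the list endpoint returns flat rows with ids, the detail endpoint returns one object with at most a small fixed-size embed, and the relation that keeps growing (likers) gets its own paginated endpoint. The in-memory dictionaries stand in for database tables, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str


@dataclass
class Tweet:
    id: int
    author_id: int
    text: str
    liker_ids: list[int]


# Hypothetical in-memory data standing in for database tables.
USERS = {1: User(1, "ana"), 2: User(2, "ben"), 3: User(3, "carla")}
TWEETS = {10: Tweet(10, 1, "hello", [2, 3]), 11: Tweet(11, 2, "hi", [1])}


def list_tweets(after_id: int = 0, limit: int = 20) -> list[dict]:
    # List endpoint: flat rows with ids only; no nested users, no likers.
    ids = sorted(i for i in TWEETS if i > after_id)[:limit]
    return [{"id": i, "author_id": TWEETS[i].author_id, "text": TWEETS[i].text} for i in ids]


def get_tweet(tweet_id: int) -> dict:
    # Detail endpoint: one object, with at most a small, fixed-size embedded summary.
    t = TWEETS[tweet_id]
    author = USERS[t.author_id]
    return {"id": t.id, "text": t.text, "author": {"id": author.id, "name": author.name}}


def list_likers(tweet_id: int, after_id: int = 0, limit: int = 20) -> list[dict]:
    # A relation that keeps growing gets its own paginated endpoint instead of being nested.
    ids = sorted(i for i in TWEETS[tweet_id].liker_ids if i > after_id)[:limit]
    return [{"id": i, "name": USERS[i].name} for i in ids]
```

The response size of each call is now bounded by `limit` or by a fixed shape, no matter how popular a tweet gets.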

Slide 26

SORT AND FILTER EARLY

sorting and filtering belong in the database, or in the backend. Sorting in the browser is 1000x more expensive.
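A sketch of a concierge inbox query with Python's `sqlite3`: the filter (whose clients, waiting on a reply) and the sort (most recent activity first) both happen in the database, backed by an index, and only one page of rows ever leaves it. Table, column, and function names are hypothetical; timestamps are stored as ISO 8601 strings, which sort chronologically as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clients (id INTEGER PRIMARY KEY, concierge TEXT, "
    "needs_reply INTEGER, last_message_at TEXT)"
)
# An index on the filter columns followed by the sort key lets SQLite read rows in order.
conn.execute("CREATE INDEX idx_inbox ON clients (concierge, needs_reply, last_message_at)")
conn.executemany(
    "INSERT INTO clients (concierge, needs_reply, last_message_at) VALUES (?, ?, ?)",
    [
        ("ana", 1, "2018-05-01T10:00"),
        ("ana", 0, "2018-05-01T11:00"),
        ("ana", 1, "2018-05-01T12:00"),
        ("ben", 1, "2018-05-01T13:00"),
    ],
)


def inbox(conn, concierge, limit=20):
    """Clients waiting on a reply, most recent first, filtered and sorted by the database."""
    return conn.execute(
        "SELECT id, last_message_at FROM clients "
        "WHERE concierge = ? AND needs_reply = 1 "
        "ORDER BY last_message_at DESC LIMIT ?",
        (concierge, limit),
    ).fetchall()
```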

Slide 27

DO AUTHENTICATION WELL

what do I mean by this?

Slide 28

AUTHENTICATION (1): USER MANAGEMENT CONSOLE

You won't believe the creative ways people try to get out of building a dashboard where you can create and delete users, set up their permissions, and reset their passwords. Building it while there's not much to it makes it easy to add other things as you need them.

Slide 29

AUTHENTICATION (2)

You will inevitably need to separate how your staff logs in from what the end users see. You will also probably need the ability to log in as someone else (for testing, monitoring, management, or quality assurance purposes).
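A minimal sketch of "log in as someone else" that keeps accountability, under the assumption that staff and end users share one session type: the session records both the real actor and the impersonated user. Data access uses the effective user, while anything written to an audit trail keeps the real actor. `Session`, `impersonate` and `audit_identity` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Session:
    actor_id: str                        # who actually authenticated
    is_staff: bool                       # staff log in through a separate flow
    impersonating: Optional[str] = None  # end user being viewed as, if any

    @property
    def effective_user(self) -> str:
        # Data access and UI run as the impersonated user...
        return self.impersonating or self.actor_id


def impersonate(session: Session, target_user: str) -> Session:
    if not session.is_staff:
        raise PermissionError("only staff can log in as another user")
    return Session(session.actor_id, is_staff=True, impersonating=target_user)


def audit_identity(session: Session) -> dict:
    # ...but anything recorded keeps the real actor, so "logged in as" is never anonymous.
    return {"actor": session.actor_id, "on_behalf_of": session.impersonating}


staff = Session("carla", is_staff=True)
as_user = impersonate(staff, "user-42")
```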

Slide 30

AUTHENTICATION (3): AUDIT LOG FOR ALL DESTRUCTIVE OPERATIONS

Any destructive operation needs to be logged with the identity of the user who did it. You will eventually experience bad actors within your organization. It's important to be able to identify them.
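A minimal sketch of this rule: a decorator that requires the acting user's id as the first argument of every destructive function and records who ran it, with what arguments, when, and whether it succeeded. Failed attempts are logged too. `AUDIT_LOG` is a hypothetical stand-in for an append-only table, and `delete_user` is an illustrative example, not a real API.

```python
import functools
import json
from datetime import datetime, timezone

AUDIT_LOG: list[dict] = []      # stand-in for an append-only audit table
USERS = {42: "ana", 43: "ben"}  # hypothetical user store


def audited(action: str):
    """Decorator: record who ran a destructive action, with what arguments, when, and how it ended."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(actor_id: str, *args, **kwargs):
            entry = {
                "action": action,
                "actor": actor_id,
                "args": json.dumps([args, kwargs], default=str),
                "at": datetime.now(timezone.utc).isoformat(),
            }
            try:
                result = fn(actor_id, *args, **kwargs)
                entry["outcome"] = "ok"
                return result
            except Exception:
                entry["outcome"] = "error"  # failed attempts are worth keeping too
                raise
            finally:
                AUDIT_LOG.append(entry)
        return wrapper
    return decorator


@audited("delete_user")
def delete_user(actor_id: str, user_id: int) -> None:
    del USERS[user_id]


delete_user("carla", 42)
```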

Slide 31

There's a fine line between shipping features all the time and foundation work.