Analysing user event logs in real time is pretty easy if you are willing to throw money at the problem. But if you want to process hundreds of millions of events with very little memory, you will want to look at probabilistic data structures.

Probabilistic data structures let you process huge volumes of data on low-memory machines at the cost of some accuracy, which in some cases is a valid trade-off.
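To make the trade-off concrete, here is a minimal sketch of one such structure, a Bloom filter, written in plain Python. It is only an illustration and not the code from the talk: the class name, the sizes and the SHA-256 based hashing scheme are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Set-membership sketch: may report false positives, never false negatives."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        # Assumed defaults: 65,536 bits (8 KB) and 4 hash functions.
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive one bit position per hash function by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True only if every bit for this item is set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


seen = BloomFilter()
for user_id in ("user-1", "user-2", "user-3"):
    seen.add(user_id)

print("user-1" in seen)  # an added item is always reported as present
```

The filter always occupies the same 8 KB no matter how many items are added. What grows instead is the chance that an item which was never added is reported as present.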

This is a talk I gave at Hasgeek’s Miniconf on Cloud Server Management in Chennai. In this talk I explain the lessons we learnt when we migrated from a monolithic architecture to a microservices architecture at Mad Street Den.

There are two ways to solve any problem: accurately or approximately. Accurate data structures have their disadvantages: they use too much memory and do not scale to real-time data. In this talk I explained how to take advantage of the newly released Redis 4.0, with its pluggable modules, to build a data pipeline that uses probabilistic data structures to get real-time insights.

Many different insights and metrics can be obtained from log event data. Processing the data in real time and getting accurate results is possible in theory. In practice, it is not so easy.

Gone are the days when you had to provision and maintain servers full time and pay huge costs for them (even though they are idle 99% of the time). The world is going serverless, where someone else takes care of running your code automatically whenever you want.

AWS Lambda is one such service. It runs your piece of code when an event occurs, which could be an HTTP API call, a message put in a queue or a file put in an S3 bucket.
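As a rough sketch of what that looks like in Python, the handler below branches on the three kinds of event mentioned above. The event shapes are trimmed-down versions of the S3 notification, SQS and API Gateway proxy formats. The function name and return values are illustrative assumptions, not code from the post.

```python
import json


def lambda_handler(event, context):
    """Entry point that Lambda invokes with the triggering event and a runtime context."""
    records = event.get("Records", [])

    if records and "s3" in records[0]:
        # S3 notification: one record per object that was created or changed.
        keys = [record["s3"]["object"]["key"] for record in records]
        return {"source": "s3", "keys": keys}

    if records and records[0].get("eventSource") == "aws:sqs":
        # SQS delivers messages in batches; each record carries the message body.
        return {"source": "sqs", "messages": [record["body"] for record in records]}

    if "httpMethod" in event:
        # API Gateway proxy integration expects a status code and a string body.
        return {"statusCode": 200, "body": json.dumps({"path": event.get("path")})}

    return {"source": "unknown"}
```

Because the handler is an ordinary function, it can be exercised locally by calling it with a hand-written event dictionary and `None` for the context.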

Not all web applications are as simple as following a 20-minute blog tutorial in a web framework. There are a lot of components which handle the business logic, and they might not all run on the same machine (or even be built using the same programming language).

This was PyCon India’s first edition, a two-day event at IISc Bangalore. A lot of interesting talks and discussions happened at the conference. I proposed a talk about an open source project I was working on: Waffle.

Waffle is an open source Python library for storing data in a schema-less, document-oriented way using a relational database. It was an open source clone of a similar datastore used by FriendFeed, as mentioned in CTO Bret Taylor’s blog.
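The general idea of keeping schema-less documents in a relational database can be sketched with the standard library's `sqlite3` module. This is not Waffle's API: the table names, the `put` and `find_by_user` helpers and the choice of JSON for the document body are all assumptions made for illustration.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# One generic table holds every document as an opaque serialized body.
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
# A separate index table makes one chosen field ("user") searchable.
conn.execute("CREATE TABLE index_user (user TEXT NOT NULL, entity_id TEXT NOT NULL)")


def put(doc):
    """Store a document of any shape and refresh its index row."""
    entity_id = doc.setdefault("id", uuid.uuid4().hex)
    conn.execute(
        "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
        (entity_id, json.dumps(doc)),
    )
    conn.execute("DELETE FROM index_user WHERE entity_id = ?", (entity_id,))
    if "user" in doc:
        conn.execute(
            "INSERT INTO index_user (user, entity_id) VALUES (?, ?)",
            (doc["user"], entity_id),
        )
    return entity_id


def find_by_user(user):
    """Look up documents through the index table, then decode their bodies."""
    rows = conn.execute(
        "SELECT e.body FROM entities e "
        "JOIN index_user i ON i.entity_id = e.id WHERE i.user = ?",
        (user,),
    ).fetchall()
    return [json.loads(body) for (body,) in rows]


put({"user": "cnu", "title": "Hello", "tags": ["python"]})
put({"user": "someone-else", "link": "http://example.com"})
print(find_by_user("cnu"))
```

Adding a new field to a document needs no schema change, because the body is opaque to the database. Only fields that must be queried get their own index table.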

I am Srinivasan Rangarajan, AKA cnu. I love talking about technology, startups, product design, marketing and related stuff. I have helped many startups build and scale their SaaS products to millions of users. Currently I head the Engineering Team at Mad Street Den.