How to Build an Analytics System

Dec 10, 2024

A few months ago, I decided it was time to add analytics to Bizmo.site. It sounded straightforward: how hard could it be to count clicks and track user actions, right? Armed with confidence and Postgres, I dove in. Spoiler: I took a wrong turn.

The Plan

My brilliant (or so I thought) plan was to use Postgres to process incoming analytics events. Every time someone clicked a button, submitted a form, or just sneezed on their screen, the event would be recorded, its session time tracked, and the whole thing batched and processed. Sounds efficient, doesn’t it?
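
For context, the ingestion side looked roughly like this. The table and column names here are illustrative, not my exact schema:

    -- Hypothetical events table in Postgres (names are illustrative).
    CREATE TABLE events (
        id          BIGSERIAL   PRIMARY KEY,
        session_id  UUID        NOT NULL,
        event_type  TEXT        NOT NULL,   -- e.g. 'click', 'form_submit'
        payload     JSONB,                  -- arbitrary event metadata
        created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
    );

    -- Every tracked action became a single-row insert like this one.
    INSERT INTO events (session_id, event_type, payload)
    VALUES (gen_random_uuid(), 'click', '{"button": "signup"}');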

The Reality Check

At first, everything worked fine. I had some event processing going on, and it looked cool on a dashboard I whipped up. But as I started running stress tests, cracks began to show. Queries were taking forever, especially when analyzing large datasets. Every click started feeling like a weight dragging down the system.

Turns out, Postgres is great for many things, but crunching analytics data in real time isn’t one of them. My system was buckling under the load, and I realized I was trying to turn Postgres into something it was never meant to be. I reached out to a few people who had dealt with similar issues, and they confirmed my suspicions: it was time to rethink my approach.

Why Postgres Struggled

OLTP vs. OLAP: Two Different Worlds

Postgres is an OLTP (Online Transaction Processing) database, which means it’s designed for transactional tasks like inserting, updating, or retrieving individual rows of data. Think of it like a teacher who can quickly tell you how a specific student is doing in Math.

Analytics, on the other hand, is an OLAP (Online Analytical Processing) problem. OLAP databases like ClickHouse are optimized for crunching large datasets and running aggregations. They’re more like a principal who can give you a school-wide average for test scores but doesn’t track individual students as closely.

How Postgres Stores Data

Postgres stores data in rows, which is perfect for transactional queries. For example, when you need to update a user’s profile, Postgres can efficiently retrieve and modify their row.

But for analytics—like calculating the average session time across millions of events—this row-based design becomes a bottleneck. Postgres has to scan every row, read the relevant column (e.g., session time), and aggregate the data. The more rows you have, the slower the query.
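
To make that concrete, here is the kind of query my dashboard ran. The session_duration_ms column is hypothetical, but the shape is typical: Postgres reads entire rows off disk even though only one column matters.

    -- Average session time across millions of events.
    -- Row storage forces a scan of whole rows just to pull one value each.
    SELECT avg(session_duration_ms)
    FROM events
    WHERE created_at > now() - interval '30 days';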


The Pivot: Discovering ClickHouse

After some research (and a few desperate conversations with friends in tech), I discovered ClickHouse, a columnar database designed specifically for analytics.

Where Postgres stores data in rows, ClickHouse stores it in columns. This simple shift has profound implications. Switching from Postgres to ClickHouse wasn’t just about swapping one database for another; it required rethinking how I handled events:

  1. Instead of processing events and pushing them into Postgres, I began dumping them into ClickHouse.
  2. I optimized the schema for analytics use cases, focusing on the columns my queries actually needed (e.g., timestamps, event types); a sketch follows this list.
  3. While Postgres worked fine for real-time event inserts, I moved to batch processing for ClickHouse, which reduced overhead and improved performance.
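
A rough sketch of what that ended up looking like; the schema is simplified, and the exact columns and ORDER BY are assumptions for illustration:

    -- Simplified ClickHouse table, ordered for the queries I ran most often.
    CREATE TABLE events (
        event_type  LowCardinality(String),
        session_id  UUID,
        payload     String,
        created_at  DateTime
    )
    ENGINE = MergeTree
    ORDER BY (event_type, created_at);

    -- Events were buffered in the app and flushed periodically,
    -- so each INSERT carries a batch of rows instead of one.
    INSERT INTO events (event_type, session_id, payload, created_at) VALUES
        ('click',       generateUUIDv4(), '{"button": "signup"}', now()),
        ('form_submit', generateUUIDv4(), '{"form": "contact"}',  now());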

Queries that previously took minutes now ran in seconds. My stress tests no longer broke the system.


Practical Examples: Postgres vs. ClickHouse

To illustrate why ClickHouse outperformed Postgres, here are two scenarios that highlight their strengths and weaknesses:

Scenario 1: Postgres Shines

You run a job board, and a user wants to update their profile to reflect a new job title.

  • Postgres: This is a transactional query. It locates the user’s row, updates the job title, and writes the changes back to the database. Postgres handles this seamlessly.
  • ClickHouse: Updating individual rows in ClickHouse is inefficient because it stores data by columns. An update there is a heavyweight mutation that rewrites whole data parts in the background, which is overkill for this use case (compare the two statements below).
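
In SQL terms, the contrast looks something like this (tables and values are illustrative):

    -- Postgres: a cheap, targeted single-row update.
    UPDATE users SET job_title = 'Staff Engineer' WHERE id = 42;

    -- ClickHouse: an update is an asynchronous mutation that rewrites
    -- whole data parts in the background; heavy machinery for one row.
    ALTER TABLE users UPDATE job_title = 'Staff Engineer' WHERE id = 42;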

Scenario 2: ClickHouse Crushes It

You run an e-commerce platform and want to calculate the total revenue generated in the last 24 hours.

  • Postgres: To calculate the total revenue, Postgres would need to scan every row in the sales table, retrieve the price column, and sum the values. For a large dataset, this process is painfully slow.
  • ClickHouse: Since ClickHouse stores data by columns, it reads only the price column rather than whole rows and aggregates it in a single pass. The same query finishes in seconds even on a very large dataset.
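
The aggregate itself is almost the same SQL on both systems (shown here in Postgres flavor; ClickHouse would write the interval as INTERVAL 24 HOUR). What differs is how much data the engine must touch to answer it:

    -- Total revenue over the last 24 hours.
    -- Postgres reads every full row in the sales table;
    -- ClickHouse reads only the compressed price and created_at columns.
    SELECT sum(price)
    FROM sales
    WHERE created_at >= now() - interval '24 hours';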

Lessons Learned

Here’s what I took away from this experience:

  1. Know your tools: Postgres is a beast, but not for everything. Use tools built for the job.
  2. Plan for scale: What works for 100 events per second might not work for 10,000. Test early, test often.
  3. Be ready to pivot: Sometimes, you just have to admit you took the wrong approach and switch gears.

Wrapping Up

Building an analytics system taught me a lot, not just about databases but also about approaching problems by doing thorough research and talking to experts. If you're planning to build something similar, save yourself some headaches and start with tools like ClickHouse. Or, if you're like me, embrace the chaos and learn as you go.
