Crushing Tech Education

Crushing Tech Education

Share this post

Crushing Tech Education
Crushing Tech Education
Top K problem System Design (Heavy Hitters)
User's avatar
Discover more from Crushing Tech Education
Join a community of engineers and technical managers dedicated to learning System Design.
Over 3,000 subscribers
Already have an account? Sign in
System Design

Top K problem System Design (Heavy Hitters)

The Heavy Hitters system design problem

SWE's avatar
SWE
Apr 02, 2023

Share this post

Crushing Tech Education
Crushing Tech Education
Top K problem System Design (Heavy Hitters)
Share

The Heavy Hitters system design problem refers to the challenge of identifying the most frequently occurring items, or "heavy hitters," in a large stream of data in real-time.

The main challenge

The main challenge in the Heavy Hitters problem is to handle a massive amount of data, potentially terabytes or petabytes, and identify the most frequent items in real-time. This requires designing an efficient and scalable algorithm that can process data streams in real-time, and handle high throughput with low latency.

Thanks for reading Crushing Tech Interview! Subscribe for free to receive new posts and support my work.

The Count-Min sketch algorithm

One common approach to solving the Heavy Hitters problem is to use the Count-Min Sketch algorithm. This algorithm is based on hashing and uses a fixed-size hash table to estimate the frequency of items. The algorithm can efficiently estimate the frequency of heavy hitters with a high probability, while minimizing the memory and computational resources required.

Flajolet-Martin algorithm

Another approach to solving the Heavy Hitters problem is to use the Flajolet-Martin algorithm. This algorithm uses probabilistic counting to estimate the number of unique items in a data stream, and can be used to identify heavy hitters based on their frequency of occurrence. The Flajolet-Martin algorithm is known for its accuracy and efficiency in handling large-scale data streams.

Database schema

To support the scale of the system, we need to store the aggregates data in the NoSQL database. The table schema has to be designed for specific queries we want to support. The common approach is to store the data in a time series fashion so it supports time range queries. We can have 2 different tables: real time and batch aggregates partitioned by the same key - in our case its day, hour, bucket + additional dimensions like country, city, and tag. This way we can support time series queries + extra aggregations like country, city and tag. The next optimization would be to store the dimensions as a JSON field and make the dynamic, so we can add more dimensions without re-aggregating the historical data. Check out this design in more details in a deep dive system design course on crushingtechinterview.com.

Conclusion


Overall, solving the Heavy Hitters system design problem requires a combination of efficient algorithms, distributed systems, and real-time data processing techniques, making it a challenging but essential problem in the field of data engineering and analytics.

Here's the video of me explaining this in-depth 👇‍


Thank you so much for reading this edition of the newsletter!

Thanks for reading Crushing Tech Interview! Subscribe for free to receive new posts and support my work.

Share this post

Crushing Tech Education
Crushing Tech Education
Top K problem System Design (Heavy Hitters)
Share

Discussion about this post

User's avatar
Starting an Architecture Review Team
How to start, manage, and deliver!
Feb 25, 2024 • 
SWE
 and 
Fran Soto
13

Share this post

Crushing Tech Education
Crushing Tech Education
Starting an Architecture Review Team
3
Design a Coding Contest Platform like Leetcode
Remote code execution engine and database design.
Feb 3, 2024 • 
SWE
12

Share this post

Crushing Tech Education
Crushing Tech Education
Design a Coding Contest Platform like Leetcode
6
Design Stock Exchange | HLD | Data Model | Reliability
Design a stock exchange, process buy and sell orders efficiently in memory
Jan 6, 2024
2

Share this post

Crushing Tech Education
Crushing Tech Education
Design Stock Exchange | HLD | Data Model | Reliability

Ready for more?

© 2025 Crushing Tech Education
Privacy ∙ Terms ∙ Collection notice
Start writingGet the app
Substack is the home for great culture

Share

Create your profile

User's avatar

Only paid subscribers can comment on this post

Already a paid subscriber? Sign in

Check your email

For your security, we need to re-authenticate you.

Click the link we sent to , or click here to sign in.