SCALE FROM ZERO TO MILLIONS OF USERS — Part 2

Sep 9, 2023 (5 min read)

Let’s cover the second part of our series on building a system that supports millions of users. If you missed part one, I highly recommend reading it first and then coming back to join us.

Stateful architecture & Stateless architecture
Stateful and stateless servers have some key differences. A stateful server remembers client data (state) from one request to the next. A stateless server keeps no state information.

Now we move the session data out of the web tier and store it in a persistent, shared data store. The shared data store could be a relational database, Memcached/Redis, NoSQL, etc. Because any web server can then handle any request, a stateless system is simpler, more robust, and more scalable. It also makes autoscaling straightforward: autoscaling means adding or removing web servers automatically based on the traffic load. In the picture below, NoSQL is our state-keeping store.
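To make the idea concrete, here is a minimal sketch of a stateless web tier. It is only an illustration: the class names (SharedSessionStore, StatelessWebServer) are hypothetical, and a plain Python dict stands in for the real shared store (Redis, Memcached, NoSQL, etc.).

```python
# Hypothetical sketch: a dict stands in for the shared data store.
class SharedSessionStore:
    def __init__(self):
        self._data = {}

    def save(self, session_id, session):
        self._data[session_id] = dict(session)

    def load(self, session_id):
        # Return a copy so servers never hold on to shared state.
        return dict(self._data.get(session_id, {}))


class StatelessWebServer:
    def __init__(self, name, store):
        self.name = name
        self.store = store  # the only place where session state lives

    def handle(self, session_id, item=None):
        # Fetch state on every request instead of remembering it locally.
        session = self.store.load(session_id)
        cart = session.get("cart", [])
        if item is not None:
            cart = cart + [item]
        self.store.save(session_id, {"cart": cart})
        return {"served_by": self.name, "cart": cart}


store = SharedSessionStore()
server_a = StatelessWebServer("web-1", store)
server_b = StatelessWebServer("web-2", store)

# The same user is served by two different servers and keeps the same cart.
server_a.handle("user-42", item="book")
response = server_b.handle("user-42", item="pen")
print(response)
```

Since neither server keeps anything between requests, a load balancer could send each request to any server, and servers could be added or removed without losing sessions.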

Data centers
Put simply, in a multi-data-center setup users are geoDNS-routed, also known as geo-routed, to the closest data center. geoDNS is a DNS service that resolves domain names to IP addresses based on the user’s location, as in the image below.

If a data center goes down for any reason, we direct all traffic to a healthy data center, as in the image below.
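The routing and failover behavior can be sketched roughly as follows. This is a simplification, not how a real geoDNS service is implemented: the data center names and coordinates are made up, and "closest" is approximated by great-circle distance from a known user location.

```python
import math

# Hypothetical data centers with rough (latitude, longitude) coordinates.
DATA_CENTERS = {
    "us-east": (39.0, -77.5),
    "us-west": (45.6, -121.2),
}


def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))


def route(user_location, healthy):
    """Pick the closest data center among the ones currently healthy."""
    candidates = [name for name in DATA_CENTERS if name in healthy]
    if not candidates:
        raise RuntimeError("no healthy data center available")
    return min(candidates,
               key=lambda name: distance_km(user_location, DATA_CENTERS[name]))


new_york = (40.7, -74.0)
normal = route(new_york, healthy={"us-east", "us-west"})
failover = route(new_york, healthy={"us-west"})  # us-east is down
print(normal, failover)
```

In normal operation the user is sent to the nearer data center. When that one is marked unhealthy, all of its traffic goes to the remaining healthy one.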

Considerations for data centers
1. Traffic redirection: Effective tools such as geoDNS are needed to direct traffic to the correct data center.
2. Data synchronization: Data should be replicated across data centers so that, in failover cases, traffic is not routed to a data center where the data is unavailable.
3. Test and deployment: With multiple data centers, the system should be tested from different regions.

To further scale our system, we need to decouple its components so they can be scaled independently. A message queue is a key strategy employed by many real-world distributed systems to solve this problem.

Message queue
A message queue is a durable component, stored in memory, that supports asynchronous communication. The basic architecture of a message queue is simple. Input services, called producers/publishers, create messages, and publish them to a message queue. Other services or servers, called consumers/subscribers, connect to the queue, and perform actions defined by the messages. With the message queue, the producer can post a message to the queue when the consumer is unavailable to process it. The consumer can read messages from the queue even when the producer is unavailable.

Suppose your application supports photo customization, including cropping, sharpening, blurring, etc. Those customization tasks take time to complete. Web servers publish photo processing jobs to the message queue. Photo processing workers pick up jobs from the message queue and perform the customization tasks asynchronously. The producer and the consumer can be scaled independently. When the queue grows large, more workers are added to reduce the processing time. However, if the queue is empty most of the time, the number of workers can be reduced, as in the image above.
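Here is a small sketch of the photo processing flow using only the Python standard library. It rests on a few assumptions: queue.Queue is an in-process stand-in for a real message broker (so it is not durable), process_photo is a hypothetical placeholder for the actual image work, and NUM_WORKERS is the knob you would turn to scale consumers.

```python
import queue
import threading

job_queue = queue.Queue()      # stand-in for a real message queue
results = []
results_lock = threading.Lock()


def process_photo(job):
    # Hypothetical placeholder for the real (slow) image processing.
    return f"{job['photo']}:{job['task']}:done"


def worker():
    # Consumer: keep pulling jobs until a None sentinel arrives.
    while True:
        job = job_queue.get()
        if job is None:
            job_queue.task_done()
            break
        outcome = process_photo(job)
        with results_lock:
            results.append(outcome)
        job_queue.task_done()


# Producer: the web server publishes jobs even though no worker is running yet.
for task in ("crop", "sharpen", "blur"):
    job_queue.put({"photo": "cat.jpg", "task": task})

# Consumers start later and drain the queue; raise NUM_WORKERS to scale out.
NUM_WORKERS = 2
workers = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in workers:
    t.start()
for _ in workers:
    job_queue.put(None)        # one stop signal per worker
for t in workers:
    t.join()

print(sorted(results))
```

The producer never waits for a worker, and the number of workers can change without touching the producer code, which is the decoupling described above.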

Logging, Metrics, Automation
When working with a small website that runs on a few servers, logging, metrics, and automation support are good practices but not a necessity. As the site grows, investing in them becomes essential.
Logging: Monitoring error logs is important for both your system and your business.
Metrics: You should monitor CPU, memory, disk I/O, database and cache performance, and business metrics such as daily active users.
Automation: Automation improves developer productivity. You can use CI/CD tools to automate the build, test, and deploy process.

As the data grows every day, your database becomes more overloaded. It is time to scale the database.

Vertical scaling for database
Vertical scaling, also known as scaling up, means adding more power (CPU, RAM, disk, etc.) to an existing machine. Some very powerful database servers are available, for example through Amazon Relational Database Service (RDS). However, vertical scaling has some limitations:
1. Hardware limits.
2. Greater risk of a single point of failure.
3. High cost.

Horizontal scaling for database
Horizontal scaling, also known as sharding, is the practice of adding more servers. Sharding separates large databases into smaller, more easily managed parts called shards. Each shard shares the same schema, though the actual data on each shard is unique to that shard, as in the image below.
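As a rough illustration, one simple way to decide which shard holds a row is to hash a sharding key. The sketch below assumes user_id is the sharding key and uses user_id % NUM_SHARDS as the hash function. Both choices are assumptions made for this example, and each dict stands in for a separate database server.

```python
NUM_SHARDS = 4

# Each dict stands in for one database server holding the same schema.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(user_id):
    # Assumed hash function: the remainder decides the shard.
    return user_id % NUM_SHARDS


def save_user(user_id, record):
    shards[shard_for(user_id)][user_id] = record


def get_user(user_id):
    # Reads go to exactly one shard, the one that owns this user_id.
    return shards[shard_for(user_id)].get(user_id)


for uid in range(8):
    save_user(uid, {"name": f"user-{uid}"})

print(shard_for(5), get_user(6), [len(s) for s in shards])
```

Every shard stores records of the same shape, but each user lives on exactly one of them, so reads and writes for a user touch a single server.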

Considerations for sharding
1. Resharding data: If one shard becomes overloaded, its data needs to be redistributed across shards.
2. Celebrity problem: If one shard holds many celebrity-type users, it will be overwhelmed with requests, so those users need to be spread across different shards.
3. Join and de-normalization: Once data is sharded across multiple servers, it is hard to perform joins and queries across shards.
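To see why resharding deserves attention, here is a small sketch of how much data would have to move if a shard were added. It assumes, purely for illustration, that rows are placed with user_id % number_of_shards. Real systems may use other placement schemes, and consistent hashing is a commonly used technique to reduce this kind of movement.

```python
def shard_for(user_id, num_shards):
    # Assumed placement rule for this sketch: simple modulo hashing.
    return user_id % num_shards


def count_moved(user_ids, old_count, new_count):
    """Count users whose shard changes when the number of shards changes."""
    return sum(
        1 for uid in user_ids
        if shard_for(uid, old_count) != shard_for(uid, new_count)
    )


user_ids = range(1000)
moved = count_moved(user_ids, old_count=4, new_count=5)
fraction_moved = moved / len(user_ids)
print(moved, fraction_moved)
```

Under this placement rule, going from 4 to 5 shards relocates 800 of the 1000 users, which is why resharding has to be planned rather than done casually.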

So here is the final illustration of our system.

Millions of users and beyond

Keep web tier stateless
Build redundancy at every tier
Cache data as much as you can
Support multiple data centers
Host static assets in CDN
Scale your data tier by sharding
Split tiers into individual services
Monitor your system and use automation tools

Note: Everything I know about this system design comes from the great book “System Design Interview: An Insider’s Guide” by Alex Xu, and all image credits go to this book too.

You can read from here: https://github.com/mohasin-dev/books

Feel free to leave a comment if you have any feedback or questions, or if there is something you would like me to write about.


Written by Mohasin Hossain
