Exploring the System Design of a Q&A Site

As the digital landscape evolves, the demand for robust question-and-answer platforms continues to rise. Understanding the underlying architecture of such platforms, like Quora, is essential for engineers and developers aiming to create scalable and efficient systems. In this exploration, we dissect the functional and non-functional requirements, building blocks, workflow, and potential limitations of a Q&A site.

Functional Requirements

When designing a Q&A site, we focus on the following core features, while keeping the design extensible so that more features can be added later.

  • Questions and answers functionality

  • Commenting on questions, along with upvote and downvote capabilities

  • Recommendation system for personalized home feeds and advertising

  • Ranking mechanism for answers

Non-Functional Requirements

Because the site serves a large number of users all over the world, it should have the following characteristics:

  • Scalability: The architecture should accommodate additional features and support a growing user base seamlessly.

  • Consistency: Questions and answers should look the same to every set of users. This does not imply that newly added Q&A must be visible to all users right away.

  • Availability: The system should be highly available. It should handle a large number of concurrent requests and keep operating even if a few servers fail.

  • Performance: The system should serve the user without noticeable delay.

Resource Estimation

Consider 300 million active users, each making 20 requests per day. With these assumptions, we are going to estimate the following:

  • Number of servers

  • Storage size that includes database and blob storage

  • Network bandwidth


  • Each server can handle 8,000 requests per second

  • Total requests per day: TOTAL_REQUEST_PER_DAY = (3 x 10^8) * 20

  • Requests per second: TOTAL_REQUEST_PER_SECOND = TOTAL_REQUEST_PER_DAY / (24 * 60 * 60)

  • Number of servers required: TOTAL_REQUEST_PER_SECOND / 8000
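
The server estimate above can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures stated in this section; the function and constant names are illustrative, and the result reflects average load only, so a real deployment would likely add headroom for peaks and failures.

```python
import math

ACTIVE_USERS = 300_000_000        # 300 million active users
REQUESTS_PER_USER_PER_DAY = 20
SERVER_CAPACITY_RPS = 8_000       # requests per second handled by one server
SECONDS_PER_DAY = 24 * 60 * 60


def servers_required(users, requests_per_user, capacity_rps):
    """Return (requests per day, requests per second, server count)."""
    requests_per_day = users * requests_per_user
    requests_per_second = requests_per_day / SECONDS_PER_DAY
    # Round up: a fraction of a server still needs a whole machine.
    servers = math.ceil(requests_per_second / capacity_rps)
    return requests_per_day, requests_per_second, servers


per_day, per_second, servers = servers_required(
    ACTIVE_USERS, REQUESTS_PER_USER_PER_DAY, SERVER_CAPACITY_RPS
)
print(f"{per_day:,} requests/day, {per_second:,.0f} requests/s, {servers} servers")
```

With these inputs, the average load works out to roughly 69,444 requests per second, which fits on 9 servers at 8,000 requests per second each.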


  • 15% of Q&As will have an image

  • 5% of Q&As will have a video

  • A Q&A contains either an image, a video, or neither. A single Q&A cannot contain both an image and a video

  • Consider 1 question from each active user per day

  • Two responses to each question

Calculate the storage

Let's assume

  • Each image size is 250 KB

  • Each video size is 5 MB

  • Text content and metadata for a Q&A take 100 KB

Image Storage

  • 15% of all Q&As have an image

  • Size: (15% of 300 million) * 250 KB = ~11 TB

Video Storage

  • 5% of all Q&As have a video

  • Size: (5% of 300 million) * 5 MB = ~75 TB

Text Content

  • 1 Q&A per active user

  • 300 million active users in total

  • Each Q&A has an estimated 100 KB of textual content and metadata

  • Total: 300 million * 100 KB = 30 TB

Each day, we will require 11 + 75 + 30 = ~116 TB of storage.
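
The storage figures can be reproduced with the sketch below. It assumes decimal units (1 MB = 1,000 KB, 1 TB = 10^9 KB) and one Q&A per active user per day, as stated above; the function and key names are illustrative.

```python
KB_PER_MB = 1_000
KB_PER_TB = 1_000_000_000   # decimal units: 1 TB = 10^9 KB

IMAGE_KB = 250
VIDEO_KB = 5 * KB_PER_MB
TEXT_KB = 100


def daily_storage_tb(qa_per_day=300_000_000):
    """Estimate daily storage in TB, split by content type."""
    image_kb = qa_per_day * 15 // 100 * IMAGE_KB   # 15% of Q&As carry an image
    video_kb = qa_per_day * 5 // 100 * VIDEO_KB    # 5% of Q&As carry a video
    text_kb = qa_per_day * TEXT_KB                 # every Q&A has text and metadata
    sizes = {
        "image": image_kb / KB_PER_TB,
        "video": video_kb / KB_PER_TB,
        "text": text_kb / KB_PER_TB,
    }
    sizes["total"] = sum(sizes.values())
    return sizes


storage = daily_storage_tb()
for kind, terabytes in storage.items():
    print(f"{kind:>5}: {terabytes:.2f} TB")
```

The unrounded total is 116.25 TB per day, which matches the ~116 TB estimate.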


Incoming Bandwidth

  • We will receive 116 TB = (116 * 1000 * 8) Gb = 928,000 Gb of data per day

  • Bandwidth per second: 928,000 Gb / (24 * 60 * 60 s) = ~11 Gb/s
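
As a quick check of the unit conversion, the sketch below converts a daily volume in terabytes to an average rate in gigabits per second. It assumes decimal units (1 TB = 1,000 GB, 1 byte = 8 bits); the helper name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
GIGABITS_PER_TERABYTE = 1_000 * 8   # 1 TB = 1,000 GB and 1 GB = 8 Gb


def average_gbps(terabytes_per_day):
    """Average bandwidth in Gb/s needed to move the given daily volume."""
    return terabytes_per_day * GIGABITS_PER_TERABYTE / SECONDS_PER_DAY


incoming = average_gbps(116)
print(f"Incoming bandwidth: {incoming:.2f} Gb/s")   # prints 10.74 Gb/s
```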

Outgoing Bandwidth

  • Consider 300 million active users

  • Assume each user views 20 Q&As each day

    • 300 million * 20 Q&As, each with 100 KB of text and metadata

    • ~600 TB

  • Consider that 15% of Q&As have an image

    • 300 million active users

    • Each user fetches 20 Q&As

    • 15% of these Q&As have an image

    • Each image is 250 KB

    • ~225 TB

  • Consider that 5% of Q&As have a video

    • 300 million active users

    • Each user fetches 20 Q&As

    • 5% of these Q&As have a video

    • Each video is 5 MB

    • ~1500 TB

  • Total data size: 600 + 225 + 1500 = 2325 TB, rounded up to ~2500 TB = 2,500,000 GB = 20,000,000 Gb

  • Outgoing bandwidth: 20,000,000 Gb / (24 * 60 * 60 s) = ~231 Gb/s
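
The outgoing estimate can be reproduced in the same way. This sketch assumes decimal units and the per-item sizes given earlier; the names are illustrative. It reports the rate for both the exact total (2325 TB) and the rounded-up figure (2500 TB) used above.

```python
SECONDS_PER_DAY = 24 * 60 * 60
KB_PER_TB = 1_000_000_000           # decimal units: 1 TB = 10^9 KB
GIGABITS_PER_TERABYTE = 1_000 * 8   # 1 TB = 1,000 GB and 1 GB = 8 Gb


def outgoing_tb_per_day(users=300_000_000, views_per_user=20):
    """Daily outgoing volume in TB for text, images, and videos."""
    views = users * views_per_user
    text_tb = views * 100 / KB_PER_TB                 # 100 KB of text per Q&A
    image_tb = views * 15 // 100 * 250 / KB_PER_TB    # 15% of views include a 250 KB image
    video_tb = views * 5 // 100 * 5_000 / KB_PER_TB   # 5% of views include a 5 MB video
    return text_tb, image_tb, video_tb


def average_gbps(terabytes_per_day):
    return terabytes_per_day * GIGABITS_PER_TERABYTE / SECONDS_PER_DAY


text_tb, image_tb, video_tb = outgoing_tb_per_day()
total_tb = text_tb + image_tb + video_tb
print(f"text={text_tb:.0f} TB, image={image_tb:.0f} TB, video={video_tb:.0f} TB")
print(f"exact total {total_tb:.0f} TB -> {average_gbps(total_tb):.0f} Gb/s")
print(f"rounded total 2500 TB -> {average_gbps(2500):.0f} Gb/s")
```

The exact total gives about 215 Gb/s; rounding the volume up to 2500 TB yields the more conservative ~231 Gb/s.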

Building Blocks

  • Load Balancer: Distribute traffic between servers and services

  • Database: Store textual content in the DB

  • Distributed Caching: Cache frequently accessed data to reduce load on the database and services

  • Blob Store: Store the images and videos

Web and Application Server

  • To handle requests, we use

    • a web server for the manager processes

    • an application server for the worker processes

  • Application servers maintain in-memory queues to process different user requests

  • A router library sits between the web and application servers

  • The manager process enqueues the tasks

  • The application (worker) process dequeues the tasks
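
The manager and worker interaction described above can be sketched with Python's standard `queue` and `threading` modules. This is a single-process illustration only; in the design above the manager and workers run on separate web and application servers, and the `run_workers` function and task strings are hypothetical.

```python
import queue
import threading


def run_workers(tasks, num_workers=2):
    """Enqueue tasks as the manager and process them with worker threads."""
    task_queue = queue.Queue()      # stands in for the in-memory queue
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            task = task_queue.get()          # worker dequeues a task
            if task is None:                 # sentinel: no more work
                task_queue.task_done()
                break
            with results_lock:
                results.append(f"processed {task}")
            task_queue.task_done()

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in workers:
        thread.start()

    for task in tasks:                       # manager enqueues tasks
        task_queue.put(task)
    for _ in workers:                        # one sentinel per worker
        task_queue.put(None)

    task_queue.join()
    for thread in workers:
        thread.join()
    return results


processed = run_workers(["post question", "post answer", "add comment"])
print(sorted(processed))
```

Because the queue lives in memory, its contents are lost if the process dies, which is the failure mode the limitations discussion addresses.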

Data Stores

  • A relational database (MySQL) for storing Q&A and comments, as it offers a high level of consistency

  • HBase for storing metadata and stats, as it offers very high throughput for storing and retrieving data. These stats can be used for recommendations later

  • Blob storage for the images and videos

Distributed Cache

  • Memcached for caching critical data

  • Redis for upvote-type data, as it supports atomic in-place increments

  • CDN for serving images and videos
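
To illustrate why an atomic increment suits upvote counts, the sketch below uses a small in-memory stand-in rather than a real Redis client, so it runs without a server. The `InMemoryCounterStore` class and the key name are hypothetical; in Redis itself the equivalent operation is the `INCR` command.

```python
import threading
from collections import defaultdict


class InMemoryCounterStore:
    """Stand-in for a counter store with an atomic increment operation."""

    def __init__(self):
        self._counts = defaultdict(int)
        self._lock = threading.Lock()

    def incr(self, key, amount=1):
        # The lock makes read-modify-write a single atomic step,
        # so concurrent upvotes are never lost.
        with self._lock:
            self._counts[key] += amount
            return self._counts[key]

    def get(self, key):
        with self._lock:
            return self._counts.get(key, 0)


store = InMemoryCounterStore()


def upvote_many(times):
    for _ in range(times):
        store.incr("answer:42:upvotes")   # hypothetical key format


threads = [threading.Thread(target=upvote_many, args=(1_000,)) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(store.get("answer:42:upvotes"))
```

With the increment done inside the store, clients never read a value, add one, and write it back, so no update is lost under concurrency.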

Compute Servers

  • Used for the recommendation engine and ranking

  • Processing runs both online and offline

  • Probably running some ML operations

  • May need lots of memory and high processing power


Posting Q&A and comments

  • The web server receives the request and passes it to the application server

  • Web servers also update the web page, for example to show that the request is in progress

  • The worker process operates on the database, e.g., fetching or saving data

  • Tasks are prioritized using different queues

    • User requests are served first

    • The weekly digest has lower priority

  • Images and videos will be stored in the blob storage

  • The answer ranking system will rank the answers based on upvotes, views, dates, and some other properties. An ML engine will back it up.

  • Extract and store the metadata of answers, comments, images, and videos in HBase, and feed this metadata to the ML engine to rank answers offline
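
The task prioritization in this workflow can be sketched as follows. For brevity, this example uses a single heap with priority levels instead of separate queues, which is a simplification of the design above; the class name, priority constants, and task strings are hypothetical.

```python
import heapq
import itertools

USER_REQUEST = 0    # lower number = served earlier
WEEKLY_DIGEST = 1


class PriorityTaskQueue:
    """Serve higher-priority tasks first, FIFO within the same priority."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()   # tie-breaker that preserves arrival order

    def enqueue(self, priority, task):
        heapq.heappush(self._heap, (priority, next(self._order), task))

    def dequeue(self):
        _priority, _order, task = heapq.heappop(self._heap)
        return task

    def __len__(self):
        return len(self._heap)


tasks = PriorityTaskQueue()
tasks.enqueue(WEEKLY_DIGEST, "send weekly digest")
tasks.enqueue(USER_REQUEST, "save answer")
tasks.enqueue(USER_REQUEST, "save comment")

served = [tasks.dequeue() for _ in range(len(tasks))]
print(served)
```

Even though the digest was enqueued first, both user requests are dequeued before it.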

Recommendation System

  • Runs both online and offline, and is used for

    • User feeds

    • Finding duplicates

    • Generating ads


  • Build an index, store it in the DB, and keep frequently used entries in HBase

  • Build the index by tokenizing Q&A, labels, and comments
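
A tokenized index of this kind is commonly implemented as an inverted index, which maps each token to the documents that contain it. The sketch below is a minimal in-memory version under that assumption; the tokenizer rules, function names, and sample documents are illustrative and not taken from the design above.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    matches = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        matches &= index.get(token, set())
    return matches


documents = {
    "q1": "How does a load balancer work?",
    "a1": "A load balancer distributes traffic between servers.",
    "c1": "Great answer about servers!",
}
index = build_index(documents)
print(sorted(search(index, "load balancer")))
print(sorted(search(index, "servers")))
```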


Latencies of web and application servers: Communication between the web servers and the application servers adds latency to each request.

In-Memory Queue Failure: Tasks wait in in-memory queues. If a queue fails, a lot of manual engineering will be required to recover. Replicating the queue can be a solution, but it requires extra memory. Tasks like view counts should not hamper comparatively more important tasks like saving answers or questions.

MySQL QPS: Since we offer a lot of services, our MySQL server may receive a very large number of queries. This will result in high latency when fetching query results.

HBase Latency: Although HBase has high throughput, it also has high latency. On top of that, since we rely on ML, its performance will eventually become poor.

Adjustments for the Mentioned Limitations

Latencies of web and application servers: Use a service host, i.e., one large, powerful machine that runs all web and application processes together in a single place.

In-Memory Queue Failure: Use Kafka instead.

MySQL QPS: Use vertical sharding. If there are joins involved between two tables, put them in the same shard.

HBase Latency: Use MyRocks instead. It offers improved latency, and data transfer tools are available between RocksDB and MySQL.
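
The vertical sharding rule for MySQL (tables that are joined together stay in the same shard) can be sketched as a grouping problem. The example below uses a small union-find to group tables connected by joins and assigns each group to one shard. The table names and join pairs are hypothetical, and a real deployment would also weigh table size and query load when placing shards.

```python
def assign_shards(tables, join_pairs):
    """Assign each table a shard ID so that joined tables share a shard."""
    parent = {table: table for table in tables}

    def find(table):
        # Follow parent links to the group representative, compressing the path.
        while parent[table] != table:
            parent[table] = parent[parent[table]]
            table = parent[table]
        return table

    for left, right in join_pairs:
        parent[find(left)] = find(right)    # merge the two groups

    shard_ids = {}
    assignment = {}
    for table in tables:
        root = find(table)
        if root not in shard_ids:
            shard_ids[root] = len(shard_ids)    # next unused shard ID
        assignment[table] = shard_ids[root]
    return assignment


tables = ["questions", "answers", "comments", "users", "ads"]
join_pairs = [("questions", "answers"), ("answers", "comments")]
assignment = assign_shards(tables, join_pairs)
print(assignment)
```

Here `questions`, `answers`, and `comments` land on one shard because they are linked by joins, while `users` and `ads` each get their own.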


Understanding the system design of a Q&A site like Quora offers useful insights into creating scalable, efficient, and reliable platforms. By addressing functional and non-functional requirements, designing robust building blocks, establishing efficient workflows, and implementing mitigation strategies for potential limitations, developers can build high-performance systems capable of meeting the demands of millions of users worldwide.