Santhosh Aditya


An Introduction to How a Single Web Server Handles Concurrent Requests

September 16th, 2023

I have been interested in this question for quite some time. From a user's POV, it seems that a server handles several requests at a time. So how do servers handle multiple requests concurrently, or in parallel? (Note that the scope of this article is limited to a single server instance. You can always scale out horizontally.)

But before that, I wanted to share a rather late realisation of mine. Ruby on Rails, Django, FastAPI and Spring Boot are not web servers; they are just frameworks. Web servers are different: they handle the connections and pass the request object to the web framework so that it can operate on it.

Some of the popular servers are Tomcat and Netty in Java; Unicorn and Puma in Ruby; Gunicorn and Tornado in Python.
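To make that split concrete, here is a minimal sketch of the WSGI contract that Python servers such as Gunicorn speak (the handler body is made up for illustration): the server owns the socket and parses the HTTP request, then calls the framework's callable with the request details.

```python
# app.py -- the server/framework split in Python's WSGI contract.
# A server like Gunicorn accepts the connection, parses the HTTP request,
# and calls this callable once per request; the "framework" is just code
# that turns the request (environ) into a response.
def app(environ, start_response):
    path = environ["PATH_INFO"]   # parsed request details live in environ
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"You requested {path}\n".encode()]

# Run it under a real web server with:  gunicorn app:app
```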

Back to the question. I had always assumed that each request runs in an isolated environment, which is true, but web servers achieve this in several different ways.

The first and oldest approach is to create a new thread or process for every request. Unicorn, for example, runs single-threaded workers and scales concurrency with the number of separate worker processes, while Puma adds a pool of threads inside each worker. So the total concurrency becomes: no. of workers * no. of threads * no. of containers (if you scale horizontally).
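Here is a toy sketch of the thread-per-request model (an illustration of the idea, not any real server's internals):

```python
# A toy thread-per-request server: one new OS thread per accepted
# connection. A real server would cap this with a thread pool, and a
# pre-fork server like Unicorn would also fork several worker processes.
import socket
import threading

def handle(conn):
    with conn:
        conn.recv(1024)   # read (and ignore) the request
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")

with socket.socket() as server:
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", 8000))
    server.listen()
    while True:
        conn, _addr = server.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()
```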

If you have separate processes, the requests are served in parallel. If you have separate threads, the requests are served concurrently (at least in GIL languages like Python and Ruby, where only one thread runs at a time).

Concurrency vs Parallelism is explained here

But this approach does not scale when you have thousands of concurrent requests; clients start getting timeouts. This is why the term "C10k problem" was coined: the challenge of serving 10,000 concurrent connections.

This is when event-driven programming got popular. Node.js, Netty, Tornado, Goliath and Starlette+Uvicorn are some async web servers. Basically, instead of creating a new thread per connection, a single thread context-switches between different requests. This approach works very well when you have lots of I/O operations: the thread asynchronously calls a DB/socket/I/O device and subscribes to its result. Meanwhile, the thread does work for other requests and comes back to the first one when its result is ready. Since the context switch and each slice of request processing take no more than milliseconds, requests appear to run in parallel even on a single-threaded, single server instance. (Reminiscing my first sentence in the article.)
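A minimal asyncio sketch of the idea, with `asyncio.sleep` standing in for a database or socket call:

```python
# One OS thread, one event loop: while a "request" awaits I/O, the loop
# runs another one. asyncio.sleep stands in for a DB or socket call.
import asyncio

async def handle_request(request_id):
    await asyncio.sleep(1)    # yield to the event loop during the "I/O"
    return f"response {request_id}"

async def main():
    # 1000 concurrent requests complete in about 1 second total, because
    # the single thread never sits idle waiting on any one of them.
    responses = await asyncio.gather(*(handle_request(i) for i in range(1000)))
    print(len(responses), "requests served")

asyncio.run(main())
```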

The con of this type of web server is that the code becomes a little more complex, because you have to asynchronously call each function or I/O device and await its result. (Thanks to async-await syntax, this became much easier than it was with callbacks and promises in JavaScript.)
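For a feel of the difference, here is a contrived Python sketch (the function names are made up; `call_later` and `asyncio.sleep` stand in for I/O) of the same flow written callback-style and with async-await:

```python
import asyncio

# Callback style: the "rest of the work" lives in a separate function,
# and each additional I/O step nests one level deeper.
def fetch_user_then(loop, user_id, on_done):
    def db_returned():                 # runs when the pretend DB call finishes
        on_done({"id": user_id})
    loop.call_later(0.1, db_returned)  # pretend DB round-trip

# async-await style: the same flow reads top to bottom.
async def fetch_user(user_id):
    await asyncio.sleep(0.1)           # pretend DB round-trip
    return {"id": user_id}

async def main():
    print(await fetch_user(42))

asyncio.run(main())
```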

Another approach is using green threads, which are user-level threads, as opposed to the threads provided by the kernel. Since these are user-level, cheap, and don't call into the kernel to switch, this method is performant, and it is popular in languages like Python (e.g. Gunicorn's gevent workers) and Ruby, which have a Global Interpreter Lock (the interpreter can run only one thread at a time).
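A small sketch with gevent, the library behind Gunicorn's gevent worker type (the request handler here is made up for illustration):

```python
# Green threads with gevent. monkey.patch_all() swaps blocking stdlib
# calls (sockets, time.sleep, ...) for cooperative ones, so a greenlet
# yields to the others instead of blocking the OS thread.
from gevent import monkey
monkey.patch_all()

import time
import gevent

def handle_request(request_id):
    time.sleep(1)   # now cooperative: other greenlets run during this "I/O"
    print(f"request {request_id} done")

# 100 green threads wait on "I/O" inside one OS thread, so this finishes
# in about 1 second rather than 100.
gevent.joinall([gevent.spawn(handle_request, i) for i in range(100)])
```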

Beware: these considerations should only come up at scale, and only after some preliminary tests and POCs. A single-threaded server goes a long way too.

Sources which helped me learn this:
- How Netflix scaled its Push Notification System
- Switching from a synchronous web server (Puma) to an asynchronous one
