Zoom-in: WebSocket

Karify98

You type a message in a chat window, hit send, and the other person sees it instantly without refreshing the page or pulling to reload.

graph LR
    C(["💻 Client 1"]) -->|"Message sent"| S(["🖥️ Server"])
    S -->|"Push directly"| C2(["💻 Client 2"])
    style C fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
    style C2 fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
    style S fill:#1a3a2a,stroke:#22c55e,color:#86efac

Let's zoom in on this behavior.


Layer 1 — Polling: Early Hacks for Real-Time Web

Traditional HTTP follows a strict request-response model. The client sends a request, the server responds, and the exchange is over (even if the TCP connection is kept alive for reuse). The server cannot initiate a message or push data to the client when a new event occurs.

To simulate real-time updates, developers originally relied on Polling.

sequenceDiagram
    participant C as Client
    participant S as Server

    C->>S: 1. Any new messages? (HTTP GET)
    S-->>C: None.
    Note over C: Wait 2 seconds...
    C->>S: 2. Any new messages? (HTTP GET)
    S-->>C: None.
    Note over C: Wait 2 seconds...
    C->>S: 3. Any new messages? (HTTP GET)
    S-->>C: Yes! "Hello." (Data)

The client sends a new HTTP request every few seconds, whether or not anything has changed. This wastes network and server resources: every request carries full HTTP headers (cookies, user-agent, and so on), and most responses come back empty. It also adds latency: with a 2-second interval, a new message can wait up to 2 seconds before the client asks for it.
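To put rough numbers on that waste, here is a minimal back-of-the-envelope sketch. The function name `polling_overhead` and the figure of 500 header bytes per request are assumptions chosen for illustration, not measurements from a real application.

```python
def polling_overhead(interval_s: float, duration_s: float, header_bytes: int) -> dict:
    """Estimate the cost of fixed-interval polling over a time window."""
    # Number of requests the client fires during the window.
    requests = int(duration_s // interval_s)
    return {
        "requests": requests,
        # Header bytes sent regardless of whether there was anything new.
        "header_bytes": requests * header_bytes,
        # A message that arrives just after a poll waits one full interval.
        "worst_case_latency_s": interval_s,
    }


if __name__ == "__main__":
    # One client, polling every 2 seconds for an hour,
    # with an assumed 500 bytes of request headers.
    print(polling_overhead(interval_s=2, duration_s=3600, header_bytes=500))
```

Under these assumptions a single idle client sends 1,800 requests and about 900 KB of headers per hour; multiply by the number of connected clients to see why this approach does not scale well.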

Remaining problem: we need a way to open a single, persistent connection where both client and server can send messages to each other at any time.

Layer 2 — Handshake & Upgrade: Upgrading the HTTP Port

Instead of opening a new port (which corporate firewalls might block), WebSocket reuses the existing HTTP ports (80, or 443 over TLS) by starting with an HTTP handshake known as a protocol upgrade.

sequenceDiagram
    participant C as Client
    participant S as Server

    C->>S: GET /chat HTTP/1.1 (Upgrade: websocket)
    S-->>C: HTTP/1.1 101 Switching Protocols
    Note over C,S: Connection upgraded to WebSocket (TCP persistent)

The client sends a standard HTTP GET request that includes Upgrade: websocket and Connection: Upgrade, along with Sec-WebSocket-Version: 13 and a random Sec-WebSocket-Key.

If the server supports WebSockets, it responds with 101 Switching Protocols and a Sec-WebSocket-Accept header derived from the client's Sec-WebSocket-Key. The same TCP connection then stays open and carries the WebSocket protocol instead of HTTP. Both sides can now send data at any time, independently of each other (full-duplex).
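One detail of the handshake is worth a sketch: per RFC 6455, the server proves it understood the upgrade by hashing the client's Sec-WebSocket-Key together with a fixed GUID and returning the result in Sec-WebSocket-Accept. The helper name `accept_key` below is hypothetical; the GUID and the sample key come from the RFC itself.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455; it is the same for every connection.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    # SHA-1 over the key concatenated with the GUID, then Base64-encoded.
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Sample key from RFC 6455, section 1.3.
    print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))
```

This check is not encryption or authentication. It only guards against a non-WebSocket server or a cache replaying a response that happens to look like a successful upgrade.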

Remaining problem: since we are no longer sending raw HTTP text, how is the data structured on the wire to remain lightweight and easy to parse?

Layer 3 — WebSocket Framing: Ultra-Lightweight Packets

Once the upgrade is complete, data is no longer sent as plain HTTP text. Instead, it is packetized into minimal binary envelopes called Frames.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4b) |A|     (7b)    |             (16/64b)          |
|N|V|V|V|       |S|             |                               |
+-+-+-+-+-------+-+-------------+-------------------------------+

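A WebSocket frame header is only 2 to 14 bytes long: 2 base bytes, up to 8 bytes of extended payload length, and a 4-byte masking key on frames sent by the client. It stores control data such as whether the payload is text or binary, the payload size, and the masking key (clients must mask every frame they send, which protects intermediaries such as proxies from cache poisoning). The rest of the frame is the actual payload.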
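The header layout can be made concrete with a minimal encoder sketch. It builds only single, unfragmented frames (FIN set, reserved bits zero), and the function name `encode_frame` is hypothetical; the byte layout follows RFC 6455, section 5.2.

```python
import struct
from typing import Optional

OPCODE_TEXT = 0x1  # opcode for a UTF-8 text frame


def encode_frame(payload: bytes, opcode: int = OPCODE_TEXT,
                 mask_key: Optional[bytes] = None) -> bytes:
    """Build one unfragmented WebSocket frame (FIN=1, RSV bits 0)."""
    first = 0x80 | (opcode & 0x0F)           # FIN bit plus 4-bit opcode
    mask_bit = 0x80 if mask_key is not None else 0x00
    n = len(payload)
    if n <= 125:                             # length fits in the 7-bit field
        header = struct.pack("!BB", first, mask_bit | n)
    elif n <= 0xFFFF:                        # 126 signals a 16-bit length
        header = struct.pack("!BBH", first, mask_bit | 126, n)
    else:                                    # 127 signals a 64-bit length
        header = struct.pack("!BBQ", first, mask_bit | 127, n)
    if mask_key is None:                     # server-to-client: no masking
        return header + payload
    if len(mask_key) != 4:
        raise ValueError("mask key must be exactly 4 bytes")
    # Client-to-server: XOR each payload byte with the rotating 4-byte key.
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return header + mask_key + masked


if __name__ == "__main__":
    # Unmasked text frame "Hello": 2 header bytes + 5 payload bytes.
    print(encode_frame(b"Hello").hex())
    # The same message masked with the sample key from RFC 6455, section 5.7.
    print(encode_frame(b"Hello", mask_key=bytes.fromhex("37fa213d")).hex())
```

A real client would pick a fresh random mask key for every frame; the fixed key here is only so the output can be compared against the RFC's example.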

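Compared to HTTP requests that typically carry hundreds of bytes of headers, WebSocket frames add very little overhead. The protocol also defines Ping and Pong control frames for checking connection health. An endpoint that receives a Ping must answer with a Pong, and browsers do this automatically, so developers do not have to invent a custom heartbeat message format. The server or library still decides when to send Pings.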
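As a rough illustration of the Ping/Pong exchange, the sketch below parses a Ping control frame and builds the Pong a server would send back, echoing the same payload. The function names `parse_control_frame` and `pong_for` are hypothetical; the opcodes (0x9 for Ping, 0xA for Pong) and the 125-byte limit on control-frame payloads come from RFC 6455.

```python
from typing import Tuple

OPCODE_PING = 0x9
OPCODE_PONG = 0xA


def parse_control_frame(frame: bytes) -> Tuple[int, bytes]:
    """Parse a control frame into (opcode, unmasked payload)."""
    opcode = frame[0] & 0x0F
    masked = bool(frame[1] & 0x80)
    length = frame[1] & 0x7F
    if length > 125:
        raise ValueError("control frames carry at most 125 payload bytes")
    offset = 2
    if masked:
        # Frames from a client carry a 4-byte key right after the header.
        key = frame[offset:offset + 4]
        offset += 4
        body = frame[offset:offset + length]
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(body))
    else:
        payload = frame[offset:offset + length]
    return opcode, payload


def pong_for(ping_frame: bytes) -> bytes:
    """Build the unmasked Pong a server would send in reply to a Ping."""
    opcode, payload = parse_control_frame(ping_frame)
    if opcode != OPCODE_PING:
        raise ValueError("not a Ping frame")
    # A Pong echoes the Ping's payload; FIN is always set on control frames.
    return bytes([0x80 | OPCODE_PONG, len(payload)]) + payload


if __name__ == "__main__":
    ping = bytes.fromhex("890548656c6c6f")  # unmasked Ping carrying "Hello"
    print(pong_for(ping).hex())
```

In practice a WebSocket library handles this exchange for you; the point of the sketch is that a heartbeat costs only a few bytes in each direction.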


Full picture

sequenceDiagram
    participant C as Browser (Client)
    participant S as WebSocket Server

    Note over C,S: 1. Handshake (HTTP Upgrade)
    C->>S: GET /stream (Connection: Upgrade, Upgrade: websocket)
    S-->>C: 101 Switching Protocols

    Note over C,S: 2. Full-Duplex Real-Time Data (WebSocket Frames)
    C->>S: Frame (Text: "Hi") - [Only 6 bytes header, incl. 4-byte mask key]
    S-->>C: Frame (Text: "Hello there")
    Note over S: Event occurs on Server
    S-->>C: Frame (Binary: New graph data)

    Note over C,S: 3. Health Check (Heartbeat)
    S->>C: Frame (Ping)
    C-->>S: Frame (Pong)

    Note over C,S: 4. Teardown
    C->>S: Frame (Close)
    S-->>C: Frame (Close)
    Note over C,S: Underlying TCP Connection closed

Takeaway

WebSocket gets around HTTP's request-response model by taking over the underlying TCP connection after an initial HTTP upgrade handshake. By replacing bulky HTTP headers with frame headers of a few bytes, it delivers full-duplex communication with minimal latency. However, WebSocket connections are stateful: the server must keep every active connection open and track it in memory. This makes scaling WebSocket architectures more complex than scaling stateless HTTP services.


This post was assisted by Amy 🌸 - AI Assistant. Content has been reviewed by the author.