Latency
Latency is the time taken for a request to travel from a user to a system and back, measured as the delay between initiating an action and receiving a response. It significantly impacts the user experience, especially in real-time and interactive systems. Understanding the types and sources of latency is crucial for designing responsive and efficient systems.
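To make the definition concrete, here is a minimal measurement sketch (Python, standard library only) that treats latency as the delay between sending a request and receiving the full response; the URL is a placeholder.

```python
import time
import urllib.request

def measure_latency(url: str) -> float:
    """Return round-trip latency in milliseconds for one GET request."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as response:
        response.read()  # include the time to receive the whole body
    return (time.perf_counter() - start) * 1000

# Individual measurements are noisy, so average several samples.
samples = [measure_latency("https://example.com") for _ in range(5)]
print(f"avg latency: {sum(samples) / len(samples):.1f} ms")
```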
Network Latency vs. Application Latency
- Network Latency:
  - The time it takes for data to travel across the network between client and server.
  - Influenced by factors like:
    - Physical distance between the client and server.
    - Network congestion and available bandwidth.
    - Routing and packet-processing delays.
  - Example: A user in Asia accessing a server in the US experiences higher network latency because of the geographic distance.
- Application Latency:
  - Delays caused by processing a request within the application itself.
  - Influenced by factors like:
    - Server-side computation.
    - Database queries and I/O operations.
    - Inefficient code or resource bottlenecks.
  - Example: A slow database query increases the time taken to render a page.
Sources of Latency in Distributed Systems
- Network Communication:
  - Remote Procedure Calls (RPCs), REST API calls, and database queries over the network all contribute network-related delays.
- Data Serialization and Deserialization:
  - Converting data formats (e.g., JSON, XML) for transmission adds processing delays on both ends.
- Disk I/O:
  - Reading and writing data to storage introduces latency, especially in systems not optimized for it.
- Database Queries:
  - Complex queries, high contention, or locking in databases can delay responses.
- Content Delivery:
  - Delivering large files (e.g., videos, images) over the internet increases latency, especially when the server is far from the user.
- Concurrency Bottlenecks:
  - Handling many requests simultaneously can cause delays as they contend for shared resources.
- Load Balancers and Proxies:
  - While essential for distributing traffic, these add their own processing delay to the request-response cycle.
Techniques for Reducing Latency
- Caching:
  - Store frequently accessed data closer to the user or at the application layer.
  - Types of caching:
    - Client-side caching: Cache data in the browser or app.
    - Server-side caching: Use in-memory caches (e.g., Redis, Memcached).
    - CDN caching: Cache static assets like images, CSS, and JavaScript at edge locations.
  - Example: Amazon caches product details to reduce database queries; see the cache-aside sketch below.
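As a concrete illustration of server-side caching, here is a minimal cache-aside sketch, assuming a local Redis instance and the redis-py client; fetch_product_from_db is a hypothetical stand-in for a slow database query.

```python
import json
import redis  # assumes the redis-py client and a Redis server on localhost

cache = redis.Redis(host="localhost", port=6379)

def fetch_product_from_db(product_id: str) -> dict:
    # Hypothetical stand-in for a slow database query.
    return {"id": product_id, "name": "example", "price": 9.99}

def get_product(product_id: str) -> dict:
    key = f"product:{product_id}"
    cached = cache.get(key)                       # 1. check the cache first
    if cached is not None:
        return json.loads(cached)                 # cache hit: skip the database
    product = fetch_product_from_db(product_id)   # 2. cache miss: query the DB
    cache.setex(key, 300, json.dumps(product))    # 3. store with a 5-minute TTL
    return product
```

The TTL bounds how stale a cached entry can get, which is the usual trade-off of cache-aside designs.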
- Content Delivery Network (CDN):
  - Distribute static and dynamic content across geographically dispersed servers to minimize the physical distance to users.
  - Example: YouTube uses CDNs to deliver videos with low latency to users worldwide.
- Efficient Data Formats:
  - Use compact binary formats (e.g., Protocol Buffers, Avro) instead of verbose text formats like JSON or XML to cut serialization and transmission time (see the comparison below).
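Schema-based formats like Protocol Buffers and Avro need generated classes or schema files, so this sketch uses Python's built-in struct module to show the same idea: a fixed binary layout agreed in advance is far smaller than self-describing JSON.

```python
import json
import struct

reading = {"sensor_id": 42, "temperature": 21.5, "humidity": 0.63}

# Verbose, self-describing text format: field names travel with every message.
as_json = json.dumps(reading).encode("utf-8")

# Compact fixed-layout binary: one unsigned int plus two 32-bit floats.
# The layout is agreed out of band, which is what Protobuf/Avro schemas formalize.
as_binary = struct.pack("<Iff", reading["sensor_id"],
                        reading["temperature"], reading["humidity"])

print(len(as_json), "bytes as JSON")   # 56 bytes
print(len(as_binary), "bytes packed")  # 12 bytes
```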
- Database Optimization:
  - Use indexing, query optimization, and replication to reduce query times.
  - Example: E-commerce platforms index frequently searched product categories for faster retrieval (see the indexing sketch below).
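A minimal indexing sketch using Python's built-in sqlite3; the schema and row count are illustrative. EXPLAIN QUERY PLAN confirms the query uses the index rather than scanning the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT, name TEXT)")
conn.executemany("INSERT INTO products (category, name) VALUES (?, ?)",
                 [("books", f"book-{i}") for i in range(100_000)])

# Without an index, filtering by category scans every row; with one, it becomes
# a B-tree lookup. EXPLAIN QUERY PLAN shows which strategy SQLite picks.
conn.execute("CREATE INDEX idx_products_category ON products (category)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM products WHERE category = ?", ("books",)
).fetchall()
print(plan)  # detail mentions idx_products_category rather than a full table scan
```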
- Asynchronous Processing:
  - Offload non-critical tasks (e.g., logging, notifications) to background jobs so the user gets a response sooner.
  - Example: Processing a payment synchronously while sending the confirmation email in the background, as sketched below.
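A minimal sketch of the payment example using a thread pool; charge_card and send_confirmation_email are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

background = ThreadPoolExecutor(max_workers=4)

def charge_card(order_id: str) -> None:
    # Hypothetical payment call: on the critical path, must finish first.
    pass

def send_confirmation_email(order_id: str) -> None:
    # Hypothetical slow, non-critical task (SMTP call, push notification, ...).
    print(f"confirmation sent for {order_id}")

def handle_payment(order_id: str) -> str:
    charge_card(order_id)                                 # critical work, done inline
    background.submit(send_confirmation_email, order_id)  # deferred off the critical path
    return "payment accepted"                             # respond without waiting

print(handle_payment("order-42"))
background.shutdown(wait=True)  # in a real service the pool (or a queue) stays alive
```

In production a durable queue (e.g., Celery, SQS) is usually preferred over an in-process pool, since queued jobs survive restarts.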
- Load Balancing:
  - Distribute traffic across multiple servers to prevent any single server from becoming overloaded (a minimal round-robin sketch follows).
  - Example: Cloudflare’s global load balancers optimize response times for high-traffic websites.
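Real load balancers weigh health checks, latency, and capacity, but the core idea can be shown with a simple round-robin picker:

```python
import itertools

class RoundRobinBalancer:
    """Minimal round-robin balancer: spread requests evenly across servers."""

    def __init__(self, servers: list[str]) -> None:
        self._servers = itertools.cycle(servers)

    def next_server(self) -> str:
        return next(self._servers)

lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([lb.next_server() for _ in range(6)])
# ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1', '10.0.0.2', '10.0.0.3']
```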
- Reduce Network Hops:
  - Minimize the number of intermediate nodes (proxies, gateways) in the network path.
  - Example: Direct server-to-server communication reduces latency in microservices architectures.
- Edge Computing:
  - Process data closer to the user to minimize latency for real-time applications.
  - Example: IoT devices using edge nodes for real-time analytics.
- Prefetching:
  - Load resources the user is likely to need before they are requested (see the sketch below).
  - Example: Google prefetches search results while users type queries.
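A minimal prefetching sketch: while the user reads the current page, the next page is fetched speculatively in the background. load_page is a hypothetical slow fetch.

```python
import time
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=2)

def load_page(page: int) -> list[str]:
    time.sleep(0.2)  # hypothetical slow fetch (network or disk)
    return [f"item-{page}-{i}" for i in range(5)]

current = load_page(0)                # user requests page 0
prefetch = pool.submit(load_page, 1)  # speculatively start fetching page 1
print(current)                        # ... user reads page 0 in the meantime ...
print(prefetch.result())              # usually already finished: no perceived wait
pool.shutdown()
```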
- Keep-alive Connections:
  - Reuse TCP connections instead of establishing a new one for every request.
  - Example: Persistent connections (the default since HTTP/1.1) and HTTP/2 multiplexing reduce latency for web applications (see the sketch below).
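A sketch of connection reuse, assuming the requests library is installed; api.example.com is a placeholder host. A Session keeps the underlying TCP (and TLS) connection open across requests to the same host, skipping repeated handshakes.

```python
import requests  # assumes the requests library is installed

# Without a Session, each call may open a fresh TCP/TLS connection.
# The Session's connection pool reuses the existing one instead.
with requests.Session() as session:
    for path in ("/users/1", "/users/2", "/users/3"):
        response = session.get(f"https://api.example.com{path}")
        print(response.status_code)
```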
Balancing Latency and Other Trade-offs
- Latency vs. Consistency:
  - In distributed systems, reducing latency may require relaxing strict consistency (e.g., eventual consistency in databases).
  - Example: DynamoDB defaults to eventually consistent reads for lower latency, letting callers opt into strongly consistent reads when freshness matters (see the sketch below).
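A sketch of choosing the consistency mode per read, assuming boto3 with configured AWS credentials; the table name and key are placeholders.

```python
import boto3  # assumes the AWS SDK for Python (boto3) and configured credentials

table = boto3.resource("dynamodb").Table("Orders")  # "Orders" is a placeholder table

# Default read: eventually consistent, lower latency, half the read cost.
fast = table.get_item(Key={"order_id": "123"})

# Strongly consistent read: reflects all prior writes, at higher latency and cost.
fresh = table.get_item(Key={"order_id": "123"}, ConsistentRead=True)
```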
- Latency vs. Cost:
  - Optimizations like CDNs and edge computing increase infrastructure costs.
  - Example: Small startups might tolerate higher latency to minimize expenses.