Hi Guys,

In the “About Me” section of my blog, I mentioned that “The target audience is IT professionals who are just starting out.” However, this particular post is not aimed at that audience. Instead, it is geared more towards Architects and Leads. That said, I encourage everyone else to stick around, as this post will still offer valuable insights in one way or another.

The question:

When it comes to scaling, Horizontal Scaling is often the preferred choice due to its flexibility and ability to distribute the load across multiple servers. Vertical scaling, by comparison, is less common because of its limitations in scalability and potential cost implications.

But what happens when horizontal scaling isn’t an option? How can we handle sudden traffic spikes effectively? Is simply increasing RAM and CPU size sufficient for vertical scaling?

While increasing RAM and CPU is a critical part of vertical scaling, it’s not the complete solution. So, here’s how I would approach vertical scaling.

1. Assess Current Infrastructure Bottlenecks

First, conduct a comprehensive analysis of the current infrastructure to identify which components are under strain during the traffic spike. In my experience so far, the most common bottlenecks are:

  • CPU: High CPU utilization can slow down transaction processing.
  • Memory (RAM): Insufficient memory can lead to slow database queries and application performance issues.
  • Disk I/O: Disk-intensive operations like database writes may be a bottleneck, especially for transactional applications.
  • Network Bandwidth: If network bandwidth is saturated, it can cause slow page loads or timeouts.

Understanding these bottlenecks will guide which parts of the infrastructure need scaling.
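
As a rough illustration of that analysis, the sketch below ranks resources by their 95th-percentile utilization during a spike. The sample values and the names p95 and rank_bottlenecks are made up for this example; in practice the numbers would come from whatever monitoring tool you already use.

```python
from statistics import quantiles


def p95(samples):
    """95th percentile of a list of utilization samples (0-100)."""
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    return quantiles(samples, n=20, method="inclusive")[-1]


def rank_bottlenecks(metrics):
    """Return (resource, p95) pairs, most saturated resource first."""
    ranked = [(name, round(p95(values), 1)) for name, values in metrics.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical utilization samples (percent) taken during a traffic spike.
spike = {
    "cpu": [55, 62, 71, 90, 96, 93],
    "memory": [60, 61, 63, 64, 66, 65],
    "disk_io": [30, 35, 80, 85, 40, 38],
    "network": [20, 22, 25, 31, 28, 24],
}

ranking = rank_bottlenecks(spike)
for resource, value in ranking:
    print(f"{resource}: p95 = {value}%")
```

Using a high percentile rather than the average keeps short bursts of saturation from being smoothed away.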

2. Optimize the Application and Database

Vertical scaling can be costly and limited by hardware, so optimizing the application and database performance before scaling the hardware is essential:

  • Application Layer:
    • Code Optimization: Ensure the code is optimized for speed and resource efficiency (e.g., avoid unnecessary loops, refactor slow queries, reduce API calls).
    • Caching: Implement caching mechanisms (e.g., in-memory caching like Redis or Memcached) to reduce the load on servers by serving repeat requests faster.
    • Load Distribution: Utilize a load balancer to ensure traffic is distributed efficiently across available resources.
  • Database Layer:
    • Query Optimization: Optimize database queries, add appropriate indexes, and avoid full table scans.
    • Connection Pooling: Use connection pooling to reduce overhead from database connections.
    • Vertical Database Partitioning: If possible, split wide tables by column into separate tables or schemas so that queries read only the data they need.
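
To make the query optimization point concrete, here is a small sketch using SQLite from the Python standard library. The orders table, the idx_orders_customer index, and the query_plan helper are invented for illustration; the same check can be done with the EXPLAIN facility of whichever database you run.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the description


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

sql = "SELECT id, total FROM orders WHERE customer_id = ?"

before = query_plan(conn, sql, (42,))  # no index yet: full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn, sql, (42,))   # now the plan searches via the index

print("before:", before)
print("after: ", after)
```

Comparing the plan before and after is a cheap way to confirm that an index is actually used, before spending money on bigger hardware.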

3. Increase Server Resources

Once the application is optimized, the next step is to scale the underlying infrastructure vertically. (Yes, this is only the third step.)

  • CPU (Processor Power):
    • If the application is CPU-bound (e.g., complex computations or high concurrency), upgrading to a more powerful CPU or adding more cores can help. Multi-core processors allow for parallel processing of requests.
  • Memory (RAM):
    • Increasing memory can improve performance, especially for database-intensive applications, as more data can be loaded into memory. More RAM reduces reliance on disk I/O, which is slower.
  • Disk/Storage (SSD vs HDD):
    • If disk I/O is the bottleneck, switching from HDD to SSD (or adding more SSD storage) can significantly boost performance by speeding up data access times.
    • Consider RAID configurations to improve both read/write speeds and redundancy.
  • Network Capacity:
    • Ensure that network bandwidth is sufficient to handle the increased traffic. Upgrading network interfaces and ensuring high-speed connections can help avoid slow page loads or timeouts.
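
For a first estimate of how much CPU to add, a back-of-the-envelope calculation like the one below can help. It assumes the workload scales linearly with the number of cores, which is optimistic for most real applications, so treat the result as a lower bound. The function name cores_needed and the sample numbers are hypothetical.

```python
import math


def cores_needed(current_cores, observed_pct, target_pct=70):
    """Estimate the core count that brings CPU utilization down to target_pct.

    Assumes load spreads evenly across cores (linear scaling), which real
    workloads rarely achieve, so the result is a lower bound.
    """
    if not 0 < target_pct <= 100:
        raise ValueError("target_pct must be between 0 and 100")
    return math.ceil(current_cores * observed_pct / target_pct)


# Hypothetical server: 8 cores running at 95% during the spike.
print(cores_needed(8, 95))
```

The same ratio can be applied to memory or disk throughput, with the same caveat about linear scaling.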

4. Implement Auto-Scaling Mechanisms (within vertical limits)

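To handle traffic fluctuations dynamically, implement auto-scaling mechanisms within the vertical scaling limits. For example, cloud platforms like AWS, Azure, and Google Cloud let you resize an instance to a type with more (or less) CPU and memory, and the resize can be scripted to follow real-time demand. Keep in mind that resizing a virtual machine typically involves a restart, so plan for a brief interruption.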
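
The decision logic behind such a mechanism can be sketched as follows. The size ladder and thresholds are hypothetical and not tied to any cloud provider's API; the actual resize call depends on the platform you use.

```python
# Hypothetical ladder of instance sizes, smallest to largest.
SIZES = ["small", "medium", "large", "xlarge"]


def next_size(current, cpu_pct, scale_up_at=70, scale_down_at=30):
    """Pick the instance size to move to, staying within the ladder."""
    i = SIZES.index(current)
    if cpu_pct > scale_up_at and i < len(SIZES) - 1:
        return SIZES[i + 1]  # step up one size
    if cpu_pct < scale_down_at and i > 0:
        return SIZES[i - 1]  # step down one size
    return current           # in range, or already at the limit


print(next_size("medium", 85))
print(next_size("xlarge", 95))
```

Note that the top of the ladder is a hard ceiling: once the largest size is reached, only optimization can buy more capacity.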

5. Load Distribution and Failover Mechanisms

Even when horizontal scaling isn’t an option, it’s important to have failover mechanisms in place to ensure service continuity. These might include:

  • Load Balancer with Session Persistence: Even with vertical scaling, having a robust load balancer to handle failover across virtual instances (if on a cloud platform) ensures that user sessions remain active during instance restarts or failures.
  • Database Replication: If your database is a bottleneck, setting up replication for read/write splitting (a primary-replica, or Master-Slave, architecture) can offload some of the read queries to the replica servers.
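
A minimal sketch of read/write splitting at the application level is shown below. The ReadWriteRouter class and the server names are invented for this example, and the SELECT check is deliberately naive: a real router also has to deal with transactions, locking reads, and replication lag.

```python
import itertools


class ReadWriteRouter:
    """Send plain SELECT statements to replicas and everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Rotate through replicas in round-robin order; None if there are none.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


# Hypothetical server names.
router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
print(router.route("SELECT * FROM orders WHERE id = 1"))
print(router.route("UPDATE orders SET total = 10 WHERE id = 1"))
```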

6. Improve Performance through Compression and Minification

To reduce the load on servers, use the following:

  • Compression (e.g., Gzip, Brotli) for assets like HTML, CSS, and JavaScript to reduce data transfer sizes.
  • Minification of CSS and JavaScript files to reduce the load time for users.
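
To get a feel for what compression saves, the snippet below gzips a sample stylesheet with the Python standard library. The CSS content is made up and highly repetitive, so real assets will usually compress less dramatically. In production the web server or CDN normally handles this, not application code.

```python
import gzip

# Hypothetical stylesheet; real assets are less repetitive than this.
css = ("body { margin: 0; padding: 0; }\n" * 200).encode("utf-8")

compressed = gzip.compress(css)
ratio = len(compressed) / len(css)

print(f"original:   {len(css)} bytes")
print(f"compressed: {len(compressed)} bytes ({ratio:.1%} of original)")

# Compression is lossless: the browser gets back exactly the same bytes.
assert gzip.decompress(compressed) == css
```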

7. Monitoring and Proactive Scaling

Continuous monitoring is essential to ensure that the infrastructure remains responsive under increased load. I would set up automated alerts that track CPU usage, memory consumption, disk I/O, and network traffic in real time.

Proactive vertical scaling involves ensuring that system resources are increased ahead of anticipated spikes, such as during holiday seasons, to prevent system crashes or slowdowns.
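
One way to keep such alerts from firing on every brief blip is to alert only when a metric stays above its threshold for several consecutive samples. The sketch below shows the idea; the SustainedAlert class, the 70% threshold, and the window of three samples are assumptions for illustration, not a recommendation for any specific monitoring product.

```python
from collections import deque


class SustainedAlert:
    """Fire only when every sample in the last `window` readings exceeds the threshold."""

    def __init__(self, threshold_pct, window):
        self.threshold_pct = threshold_pct
        self.samples = deque(maxlen=window)  # keeps only the newest readings

    def observe(self, value_pct):
        """Record a reading and report whether the alert should fire."""
        self.samples.append(value_pct)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(v > self.threshold_pct for v in self.samples)


# Hypothetical CPU readings (percent), one per monitoring interval.
cpu_alert = SustainedAlert(threshold_pct=70, window=3)
for reading in [80, 90, 60, 75, 85, 95]:
    if cpu_alert.observe(reading):
        print(f"ALERT: CPU above 70% for 3 consecutive samples (latest {reading}%)")
```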

Ideal Vertical Scaling Goals:

  • CPU Utilization: Targeting < 70% sustained CPU usage to prevent throttling.
  • Memory Usage: Ensuring there’s sufficient headroom for peak usage, with memory usage not exceeding 80%.
  • Disk I/O: Disk utilization should be optimized with SSDs, aiming for < 70% disk write capacity.
  • Network Bandwidth: Ensure that the bandwidth can handle peak traffic, avoiding bottlenecks on critical services (such as payments and checkout).

Notice that the points under “1. Assess Current Infrastructure Bottlenecks” are the same as the Ideal Vertical Scaling Goals. If you spotted this, leave a comment.

To address sudden or future spikes effectively, a comprehensive approach is needed that includes optimizing application performance, leveraging efficient caching mechanisms, enhancing storage capabilities, and ensuring network capacity can handle the load. It’s about making every component of the system work smarter, not just bigger.
