moeyy

A salted fish with lofty ambitions.

Implement load balancing and automatic scaling using fly.io

Preface#


fly.io is a modern application delivery platform that provides load balancing and automatic scaling capabilities to ensure the reliability and performance of applications.

According to the fly.io load balancing documentation, fly.io can scale automatically based on request volume or TCP connection count. Recently, a friend's API was attacked and his server crashed. Since his APIs are consumed outside the browser, in applications and scripts, browser-based defense strategies are hard to apply, so I recommended that he migrate to fly.io.

Configuration#

First, package the program into a Docker image and deploy it on fly.io. I used a 2-CPU / 512 MB (2H512M) spec and created 11 instances.
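The deployment can be sketched with flyctl roughly as follows (the commands assume a fly.toml already exists in the working directory; the exact VM preset name is an assumption):

```shell
# Build the Docker image and deploy it (reads fly.toml in the current directory)
fly deploy

# Scale to the spec described above: 2 shared CPUs, 512 MB RAM, 11 machines
fly scale vm shared-cpu-2x
fly scale memory 512
fly scale count 11
```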

Then, configure the maximum connection count per instance in fly.toml under the concurrency section. Here is a sample configuration:

# fly.toml app configuration file generated for cheat-show-backend on 2023-09-15T18:53:53-05:00
#
# See https://fly.io/docs/reference/configuration/ for information about how to use this file.
#

app = "XXXXXXXXXXXX"
primary_region = "lax"
swap_size_mb = 1024

[http_service]
  internal_port = 8080
  force_https = false
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 0
  processes = ["app"]
  # concurrency limits belong under http_service when using [http_service]
  [http_service.concurrency]
    type = "connections"
    hard_limit = 10000
    soft_limit = 1200

[build]
dockerfile = "Dockerfile"

soft_limit is the soft limit: fly.io uses this connection count to decide when an instance is busy. When a single instance exceeds it, additional instances are started, which is what produces the automatic scaling. Load testing with wrk showed that each instance of our deployed service can handle up to about 2k concurrent connections, so I set the limit to 1200 to leave enough headroom for new instances to spin up. Below this value fly.io keeps only one machine running; above it, additional machines are started in proportion to the connection count.

hard_limit is the hard limit: once a single instance reaches this value, further requests receive a 503 error.
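Under this model, and assuming fly.io adds one machine per soft_limit worth of connections (my reading of the docs, not a guaranteed algorithm), the expected machine count is a ceiling division:

```shell
# Hypothetical illustration: machines needed for a given connection count,
# assuming one machine per soft_limit worth of connections.
soft_limit=1200
for conns in 800 1200 2400 5000; do
  machines=$(( (conns + soft_limit - 1) / soft_limit ))   # ceiling division
  echo "$conns connections -> $machines machine(s)"
done
# 800 connections -> 1 machine(s)
# 1200 connections -> 1 machine(s)
# 2400 connections -> 2 machine(s)
# 5000 connections -> 5 machine(s)
```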

Load Testing#

Default State#

(screenshot)

Massive Increase in Connection Count (wrk with 5000 threads)#

(screenshot)

Under normal load only one instance runs. Billing is per minute, at about $3.8 per instance per month. When the connection count drops, the app automatically scales back down to one instance, and stopped instances incur no charges.
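The spike above was generated with wrk; an invocation along these lines reproduces it (the URL is a placeholder, and the thread count is capped well below the connection count, as wrk requires):

```shell
# Open 5000 concurrent connections for 60 seconds against the deployed app
wrk -t16 -c5000 -d60s http://your-app.fly.dev/
```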

Some Pitfalls#

fly.io's load balancing routes by the client's location. If I have one instance in New York and one in Los Angeles, and I want Los Angeles to be the default with New York as a backup, I need a Los Angeles server to act as a reverse proxy in front of fly.io. Clients hitting fly.io directly are routed to the nearest region, so even a single connection from the east coast would start the New York instance. I therefore deployed 3 nginx reverse-proxy instances in fly.io's Los Angeles region, using the 1-CPU / 256 MB (1h256m) spec, for load balancing. After optimization, each instance can handle over 10k concurrent connections.

Update: the issue with automatic scaling within a single region appears to have been fixed. Previously, all machines in the same region would be started by default for load balancing, and scaling only worked across regions. This pitfall no longer exists; scaling now works within a single region as well.

(screenshot)

Optimized Dockerfile

FROM nginx
RUN sed -i 's/worker_processes auto;/worker_processes 8;/' /etc/nginx/nginx.conf
RUN sed -i 's/worker_connections [0-9]*;/worker_connections 9999999;/' /etc/nginx/nginx.conf
COPY nginx.conf /etc/nginx/conf.d/nginx.conf

nginx.conf

server {
  listen 8080 default_server;
  listen [::]:8080 default_server;
  server_name         api.test.com;
  keepalive_timeout   75s;
  keepalive_requests  100;

  location / {
      proxy_pass            http://ip:80;
      proxy_http_version    1.1;
      proxy_set_header      Host $host;
      # Upgrade alone is not enough for WebSockets; nginx must also forward
      # the Connection header for the upgrade handshake to work.
      proxy_set_header      Upgrade $http_upgrade;
      proxy_set_header      Connection $http_connection;
  }
}
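The image and config can be sanity-checked locally before deploying (the image tag is an assumption, and the `ip` placeholder in proxy_pass must be replaced with a resolvable backend address first, or `nginx -t` will fail to resolve the upstream):

```shell
# Build the proxy image and ask nginx to validate the configuration
docker build -t fly-nginx-proxy .
docker run --rm fly-nginx-proxy nginx -t
```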

This allows for low-cost automatic scaling of services.

Cost Issue#

The 3 instances running nginx fall within fly.io's free allowance. The backend usually runs as a single instance at about $3.8 per month, and instances that are not started incur no charges.
