Server Management · Intermediate

Server Update Management

Simha Infobiz
March 21, 2025
5 min read

The Equifax breach, one of the largest in history, traced back to a single unpatched vulnerability (Apache Struts) on a public-facing web server. Patching is non-negotiable for security, yet it remains one of the most dreaded tasks for sysadmins. "If I run yum update, will my database restart? Will PHP break? Will the server even come back up?" This fear leads to "Patch Paralysis," leaving high-priority CVEs exposed for months.

The Strategy: Automation with Constraints

Manual patching is error-prone and does not scale. You cannot SSH into 50 servers one by one and run updates by hand. You need intelligent automation.

  1. Unattended-Upgrades (Debian/Ubuntu): Configure this immediately. Set it to automatically install security updates only. Leave feature updates (which might break configs) for manual review, but let critical security fixes flow freely. A minimal config sketch follows this list.
  2. KernelCare / Canonical Livepatch: In high-availability environments, rebooting for a kernel update is costly. These services patch the running kernel in memory, on the fly, so you stay protected against kernel exploits like "Dirty COW" without a reboot and without downtime.
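
As an illustration, this is roughly what the Unattended-Upgrades setup looks like on a stock Debian/Ubuntu install. The file paths are the distribution defaults and only the security-origin portion is shown; treat it as a starting sketch, not a drop-in config.

    // /etc/apt/apt.conf.d/20auto-upgrades -- turn the daily timers on
    APT::Periodic::Update-Package-Lists "1";
    APT::Periodic::Unattended-Upgrade "1";

    // /etc/apt/apt.conf.d/50unattended-upgrades -- allow the security origin only
    Unattended-Upgrade::Allowed-Origins {
            "${distro_id}:${distro_codename}-security";
    };
    // Never reboot on its own; kernel reboots belong in a planned patch window
    Unattended-Upgrade::Automatic-Reboot "false";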

The Staging Buffer

Rule #1: Never patch production first. Create a "Patch Train":

  1. Dev/Staging: Updates apply automatically every night. If your staging environment is broken in the morning, you know the latest patch is toxic.
  2. The Canary: Identify one production server (or a small percentage of traffic). Apply the patch there first and monitor for 24 hours.
  3. The Fleet: Only after the Canary survives does the rest of the fleet update. Tools like Ansible or SaltStack can orchestrate this rollout precisely.
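
Ansible, for instance, can encode the Canary-then-fleet order directly with its serial keyword. The playbook below is a sketch: the inventory group name (web) is an assumption, it targets Debian/Ubuntu hosts, and the 24-hour soak between batches would normally be handled by running the play against the Canary first (e.g. with --limit) rather than in a single pass.

    # patch_train.yml -- canary first, then the rest of the fleet
    - hosts: web                    # assumed inventory group
      become: true
      serial:
        - 1                         # the Canary: one host first
        - "100%"                    # then everyone else
      max_fail_percentage: 0        # abort the rollout if any batch fails
      tasks:
        - name: Apply pending updates
          ansible.builtin.apt:
            update_cache: true
            upgrade: dist

        - name: Check whether a reboot is required
          ansible.builtin.stat:
            path: /var/run/reboot-required
          register: reboot_flag

        - name: Reboot only if the update asks for it
          ansible.builtin.reboot:
            reboot_timeout: 600
          when: reboot_flag.stat.exists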

The Modern Way: Immutable Infrastructure

The "Cloud Native" approach solves patching by removing the concept entirely. Never update a running server. Instead of patching Server A:

  1. Build a new machine image (AMI/Snapshot) with the latest OS and application code baked in.
  2. Spin up Server B using this new image.
  3. Run automated health checks.
  4. Update the Load Balancer to point to Server B.
  5. Terminate Server A.

This ensures zero "configuration drift." You know exactly what is running because it was built from code in a clean environment, not a server that has been manually tweaked for 3 years.
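
In practice the swap is a handful of API calls. The fragment below sketches it with the AWS CLI against a load balancer target group; every ID is a placeholder, and a production setup would more likely lean on an Auto Scaling Group instance refresh than raw calls like these.

    #!/usr/bin/env bash
    # Sketch only: the AMI, instance and target-group identifiers are placeholders.
    set -euo pipefail

    NEW_AMI="ami-0123456789abcdef0"      # image with the patched OS + app baked in
    TG_ARN="arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/web/abc"
    OLD_INSTANCE="i-0aaaaaaaaaaaaaaaa"   # Server A

    # Spin up Server B from the new image and wait for it to boot
    NEW_INSTANCE=$(aws ec2 run-instances --image-id "$NEW_AMI" \
      --instance-type t3.medium \
      --query 'Instances[0].InstanceId' --output text)
    aws ec2 wait instance-running --instance-ids "$NEW_INSTANCE"

    # Put Server B behind the load balancer and wait for its health checks
    aws elbv2 register-targets --target-group-arn "$TG_ARN" --targets Id="$NEW_INSTANCE"
    aws elbv2 wait target-in-service --target-group-arn "$TG_ARN" --targets Id="$NEW_INSTANCE"

    # Drain Server A out of the target group, then terminate it
    aws elbv2 deregister-targets --target-group-arn "$TG_ARN" --targets Id="$OLD_INSTANCE"
    aws ec2 terminate-instances --instance-ids "$OLD_INSTANCE"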

Managing Downtime: Grouping and Draining

If you must update existing servers in place (treating them as "pets" rather than disposable "cattle"):

  • Database Clusters: Patch the Replicas first. Promote a Replica to Primary. Then patch the old Primary.
  • Web Clusters: Use a Load Balancer. Drain connections from Node 1 (wait for active requests to finish). Patch and reboot Node 1. Wait for health checks to pass. Re-enable Node 1. Move to Node 2. This "Rolling Update" strategy ensures end-users never see a 503 Service Unavailable error.
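
If the load balancer is HAProxy, that drain-patch-reenable loop is scriptable through its admin socket. The sketch below assumes a backend named web, an admin socket at /run/haproxy/admin.sock, nodes reachable over SSH, and a /healthz endpoint; all of those are placeholders to adapt.

    #!/usr/bin/env bash
    # Rolling drain -> patch -> re-enable loop; names and paths are assumptions.
    set -euo pipefail

    SOCK=/run/haproxy/admin.sock
    BACKEND=web

    for NODE in node1 node2 node3; do
      # Drain: HAProxy stops sending new connections, in-flight requests finish
      echo "set server $BACKEND/$NODE state drain" | socat stdio "unix-connect:$SOCK"
      sleep 60                              # crude wait; tune to your traffic

      # Patch and reboot the node (the SSH session may drop during the reboot)
      ssh "$NODE" 'sudo apt-get update -q && sudo apt-get -y upgrade'
      ssh "$NODE" 'sudo systemctl reboot' || true

      # Wait for the node's health check before sending traffic back to it
      until curl -fsS "http://$NODE/healthz" >/dev/null 2>&1; do sleep 5; done
      echo "set server $BACKEND/$NODE state ready" | socat stdio "unix-connect:$SOCK"
    done
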
Patching · Security · Automation