fix: Harden CI against Nginx misconfiguration that caused prod 502/404
TaxBaik CI/CD / build-and-deploy (push) Failing after 3m5s

Today's incident: CI reported successful deploys while the real site
returned 502 (root) and then 404 (/taxbaik/) to users. The root cause was
three compounding Nginx issues, none of which the previous CI checks could
see, because they only ever curled 127.0.0.1:5001 directly, bypassing Nginx:

1. Two Nginx config files existed. sites-available/default (documented,
   but NOT symlinked into sites-enabled/) was edited repeatedly with zero
   effect. The file Nginx actually loaded was the undocumented
   sites-available/taxbaik-domains.conf (symlinked into sites-enabled/).
2. That real file hardcoded the Green-Blue app port (5003) directly in
   both `location /` and `location /taxbaik`, instead of the persistent
   TaxBaik.Proxy on 5001. When the active port flipped to 5004, Nginx kept
   pointing at the dead 5003 -> 502.
3. Fixing the port to 5001 with a trailing slash on proxy_pass triggered
   Nginx's URI replacement: the matched `/taxbaik` prefix was replaced by
   `/`, so a request for /taxbaik/ reached the backend as "//", which
   404'd. Confirmed via `curl http://backend//` -> 404.
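Item 3 follows from the rule Nginx documents for `proxy_pass` under a prefix `location`. If there is no URI after host:port, the request URI is passed through unchanged. If there is any URI, even a lone `/`, it replaces the part of the request that matched the location prefix. Below is a minimal shell sketch of that rule. `upstream_uri` is a hypothetical helper, and it ignores regex locations, variables in `proxy_pass`, and Nginx's own URI normalization:

```shell
#!/usr/bin/env bash
# upstream_uri LOCATION PROXY_URI REQUEST_URI  (hypothetical helper)
# PROXY_URI is whatever proxy_pass has after host:port ("" when nothing).
# Models only the prefix-location case of Nginx's documented rule.
upstream_uri() {
  local location=$1 proxy_uri=$2 request=$3
  if [ -z "$proxy_uri" ]; then
    # No URI part: the request URI is forwarded as-is.
    printf '%s\n' "$request"
  else
    # URI part present: it replaces the portion matching the location.
    printf '%s\n' "${proxy_uri}${request#"$location"}"
  fi
}

# Incident config: location /taxbaik + proxy_pass with a trailing slash.
upstream_uri /taxbaik / /taxbaik/              # -> //
# Fix: no trailing slash, so the path reaches the backend untouched.
upstream_uri /taxbaik "" /taxbaik/             # -> /taxbaik/
upstream_uri /taxbaik "" /taxbaik/admin/login  # -> /taxbaik/admin/login
# Under location / the same trailing slash happens to be harmless.
upstream_uri / / /taxbaik/                     # -> /taxbaik/
```

The last line explains why the trailing slash only broke the `/taxbaik` block: replacing `/` with `/` leaves the path unchanged.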

Changes:
- deploy.yml: replace the old blind `grep sites-available/default` check
  (checked the wrong, unloaded file) with a hard-failing check that (a)
  resolves the actual file via sites-enabled/ symlinks, (b) fails the
  deploy if either location block hardcodes 5003/5004 instead of 5001,
  (c) fails if /taxbaik's proxy_pass carries a stray trailing slash.
- deploy.yml: add an external, post-deploy check that curls the real
  public domain (www.taxbaik.com root, /taxbaik/, /taxbaik/admin/login)
  through Cloudflare + Nginx, with retries. This check alone would have
  caught the whole incident on the very first broken deploy, instead of
  waiting for live user reports.
- deploy_gb.sh: drop the stale comment and log line implying Nginx needs
  updating per deploy. It never should, since Nginx always points at the
  persistent 5001 proxy, which reads taxbaik_port itself.
- CLAUDE.md: document the real config file, the 5001-only invariant, the
  proxy_pass trailing-slash gotcha, and the Host-header/SNI trick for
  testing domain-based server blocks locally; record the incident in the
  CI troubleshooting harness section.
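The hard-failing config check in the first deploy.yml bullet could look roughly like the shell sketch below. It is not the actual deploy.yml step. The function names and the `/etc/nginx/sites-enabled` default are assumptions. So are the line-based heuristics, which assume one `proxy_pass` per location and no `upstream {}` indirection. The sketch also relies on `readlink -f`, which typical Linux runners provide.

```shell
#!/usr/bin/env bash
# Sketch of a pre-deploy Nginx sanity check. Function names are hypothetical.

# check_nginx_conf FILE: fail if FILE proxies straight to a Green-Blue slot
# (5003/5004) or if the proxy_pass under `location /taxbaik` has a URI part.
check_nginx_conf() {
  local conf=$1 status=0
  # (b) Any proxy_pass aimed at 5003/5004 bypasses TaxBaik.Proxy on 5001.
  if grep -En 'proxy_pass[[:space:]]+https?://[^;]*:500[34]([^0-9]|$)' "$conf"; then
    echo "FAIL: $conf hardcodes 5003/5004; expected 127.0.0.1:5001" >&2
    status=1
  fi
  # (c) Heuristic: the first proxy_pass after `location /taxbaik` must end
  # right after host:port, e.g. `proxy_pass http://127.0.0.1:5001;`.
  if ! awk '
    $0 ~ "^[ \t]*location[ \t]+/taxbaik" { in_tb = 1 }
    in_tb && $0 ~ "proxy_pass" {
      if ($0 ~ "proxy_pass[ \t]+https?://[^/; \t]+/") bad = 1
      in_tb = 0
    }
    END { exit bad + 0 }' "$conf"; then
    echo "FAIL: $conf: /taxbaik proxy_pass has a trailing slash (URI part)" >&2
    status=1
  fi
  return "$status"
}

# check_sites_enabled [DIR]: (a) check only the files Nginx actually loads,
# i.e. the targets of the entries in sites-enabled/, not sites-available/.
check_sites_enabled() {
  local dir=${1:-/etc/nginx/sites-enabled} rc=0 link conf
  for link in "$dir"/*; do
    [ -e "$link" ] || continue        # skip dangling links / empty directory
    conf=$(readlink -f "$link")       # resolve the symlink to its real file
    echo "checking $conf (loaded via $link)"
    check_nginx_conf "$conf" || rc=1
  done
  return "$rc"
}
```

Starting from sites-enabled/ is the key design choice. A file that exists only in sites-available/, like the stale default, is never inspected, so editing it can no longer make the check pass.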
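A similarly hedged sketch of the external post-deploy check and the Host-header/SNI trick follows. Several details are assumptions:

- the helper names;
- the retry defaults (5 tries, 3 s apart) and the 10 s timeout;
- the `https://` scheme;
- treating only a final 2xx, after redirects, as success.

curl's `--resolve` is one standard way to send the real Host header and SNI while connecting to a chosen address. It is not necessarily the exact trick recorded in CLAUDE.md.

```shell
#!/usr/bin/env bash
# probe URL [RESOLVE]: hypothetical helper. Requests URL until it returns 2xx
# (after following redirects) or PROBE_ATTEMPTS tries run out.
# RESOLVE (optional) is passed to curl --resolve as host:port:addr, so the
# request keeps the real Host header and TLS SNI but connects to addr.
probe() {
  local url=$1 resolve=${2:-} code
  local attempts=${PROBE_ATTEMPTS:-5} delay=${PROBE_DELAY:-3} i=1
  while [ "$i" -le "$attempts" ]; do
    if [ -n "$resolve" ]; then
      code=$(curl -sL -o /dev/null -w '%{http_code}' --max-time 10 \
        --resolve "$resolve" "$url") || true
    else
      code=$(curl -sL -o /dev/null -w '%{http_code}' --max-time 10 "$url") || true
    fi
    case $code in
      2??) echo "OK   $code $url"; return 0 ;;
    esac
    # curl reports 000 when it could not connect or TLS failed.
    echo "try $i/$attempts: got '$code' from $url" >&2
    i=$((i + 1))
    if [ "$i" -le "$attempts" ]; then sleep "$delay"; fi
  done
  echo "FAIL $url" >&2
  return 1
}

# check_public_site (hypothetical): the three public URLs from the commit
# message, fetched through Cloudflare + Nginx exactly as a user would.
check_public_site() {
  local rc=0 path
  for path in / /taxbaik/ /taxbaik/admin/login; do
    probe "https://www.taxbaik.com$path" || rc=1
  done
  return "$rc"
}
```

In CI, `check_public_site || exit 1` would run as the last deploy step. On the server itself, `probe https://www.taxbaik.com/taxbaik/ www.taxbaik.com:443:127.0.0.1` exercises the local domain-based server block without going through Cloudflare. If the origin serves a certificate that curl does not trust, that local call fails TLS verification and reports `000`.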

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-03 18:51:19 +09:00
parent aaa867ce02
commit 9ae701ff93
3 changed files with 114 additions and 10 deletions
+5 -3
@@ -103,11 +103,13 @@ if [ "$SUCCESS" = "false" ]; then
 exit 1
 fi
 
-# 6. Switch Traffic (Nginx update handled by CI post-deploy script)
+# 6. Switch Traffic
+# Nginx never needs per-deploy changes: it always proxies to the persistent
+# TaxBaik.Proxy on 127.0.0.1:5001, which reads this same PORT_FILE and
+# forwards to whichever port is currently active. See CLAUDE.md section 6.
 echo "=== Switching Traffic to Port $TARGET_PORT ==="
 echo "$TARGET_PORT" > "$PORT_FILE"
-echo "✓ Traffic routed to $TARGET_PORT"
-echo "⚠️ Note: Nginx will be updated by CI post-deploy script (requires root)"
+echo "✓ Traffic routed to $TARGET_PORT (via TaxBaik.Proxy on 5001)"
 
 # 7. Terminate Old App
 echo "=== Stopping Old App on Port $ACTIVE_PORT ==="