fix: Harden CI against Nginx misconfiguration that caused prod 502/404
TaxBaik CI/CD / build-and-deploy (push) Failing after 3m5s
Today's incident: CI reported successful deploys while the real site
returned 502 (root) then 404 (/taxbaik/) to users. Root cause was three
compounding Nginx issues, none of which the previous CI checks could see
because they only ever curled 127.0.0.1:5001 directly, bypassing Nginx:
1. Two Nginx config files existed. sites-available/default (documented,
but NOT symlinked into sites-enabled/) was being edited repeatedly with
zero effect. The file actually loaded was
sites-available/taxbaik-domains.conf (-> sites-enabled/), undocumented.
2. That real file hardcoded the Green-Blue app port (5003) directly in
both `location /` and `location /taxbaik`, instead of the persistent
TaxBaik.Proxy on 5001. When the active port flipped to 5004, Nginx kept
pointing at the dead 5003 -> 502.
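3. Fixing the port to 5001 with a trailing slash on proxy_pass triggered
Nginx URI rewriting: when proxy_pass carries a URI part, Nginx replaces
the matched location prefix with it, so /taxbaik/ under
`location /taxbaik` reached the backend as a double slash ("//"),
which 404'd. Confirmed via `curl http://backend//` -> 404.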
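
The rewrite in item 3 can be reproduced without Nginx. The sketch below is a simplified model of the documented prefix-location behavior only; `rewrite_uri` is a hypothetical helper, and it assumes no regex locations, no rewrite directives, and no URI normalization.

```shell
#!/usr/bin/env bash
# Simplified model of how Nginx builds the upstream URI for a prefix location.
# rewrite_uri LOCATION PROXY_PASS_URI REQUEST_URI
#   PROXY_PASS_URI is the path part after host:port in proxy_pass ("" if none).
rewrite_uri() {
  local location="$1" proxy_uri="$2" request="$3"
  if [ -z "$proxy_uri" ]; then
    # proxy_pass without a URI part: the request URI is forwarded unchanged.
    printf '%s\n' "$request"
  else
    # proxy_pass with a URI part: the matched location prefix is replaced by it.
    printf '%s\n' "${proxy_uri}${request#"$location"}"
  fi
}

rewrite_uri /taxbaik /  /taxbaik/   # models: proxy_pass http://127.0.0.1:5001/;
rewrite_uri /taxbaik "" /taxbaik/   # models: proxy_pass http://127.0.0.1:5001;
```

The first call prints `//` (the broken case), the second prints `/taxbaik/`.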
Changes:
- deploy.yml: replace the old blind `grep sites-available/default` check
(checked the wrong, unloaded file) with a hard-failing check that (a)
resolves the actual file via sites-enabled/ symlinks, (b) fails the
deploy if either location block hardcodes 5003/5004 instead of 5001,
(c) fails if /taxbaik's proxy_pass carries a stray trailing slash.
- deploy.yml: add an external, post-deploy check that curls the real
public domain (www.taxbaik.com root, /taxbaik/, /taxbaik/admin/login)
through Cloudflare + Nginx, with retries — this is what would have
caught the whole incident on the very first broken deploy instead of
requiring live user reports.
- deploy_gb.sh: drop the stale comment implying Nginx needs updating
per-deploy; it never should, since Nginx always points at the
persistent 5001 proxy which reads taxbaik_port itself.
- CLAUDE.md: document the real config file, the 5001-only invariant, the
proxy_pass trailing-slash gotcha, and the Host-header/SNI trick for
testing domain-based server blocks locally; record the incident in the
CI troubleshooting harness section.
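
A rough sketch of what the hard-failing config check could look like follows. It is not the code in deploy.yml: the function names, the regexes, and the awk-based block tracking are assumptions made for illustration, and the trailing-slash rule assumes the `location /taxbaik` block contains no nested braces.

```shell
#!/usr/bin/env bash
# Illustrative Nginx config lint; names and patterns are assumptions.

# lint_nginx_conf FILE: return non-zero if FILE violates either rule.
lint_nginx_conf() {
  local conf="$1" status=0

  # Rule 1: no proxy_pass may target a Green-Blue app port directly.
  if grep -Eq 'proxy_pass[[:space:]]+[^;]*:(5003|5004)([/;[:space:]]|$)' "$conf"; then
    echo "FAIL: $conf hardcodes 5003/5004; proxy to 127.0.0.1:5001 instead" >&2
    status=1
  fi

  # Rule 2: inside `location /taxbaik { ... }`, proxy_pass must not end in "/".
  if ! awk '
    /location[ \t]+\/taxbaik[ \t]*[{]/ { inblk = 1 }
    inblk && /proxy_pass[ \t]+[^;]*\/[ \t]*;/ { bad = 1 }
    inblk && /[}]/ { inblk = 0 }
    END { exit (bad ? 1 : 0) }
  ' "$conf"; then
    echo "FAIL: $conf: /taxbaik proxy_pass has a trailing slash" >&2
    status=1
  fi
  return "$status"
}

# lint_enabled_sites DIR: lint the real file behind every symlink in DIR,
# so an edited-but-unloaded file in sites-available/ is never checked by mistake.
lint_enabled_sites() {
  local dir="$1" link status=0
  for link in "$dir"/*; do
    [ -e "$link" ] || continue
    lint_nginx_conf "$(readlink -f "$link")" || status=1
  done
  return "$status"
}
```

A CI step would then run something like `lint_enabled_sites /etc/nginx/sites-enabled || exit 1` before declaring the deploy healthy.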
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
+5
-3
@@ -103,11 +103,13 @@ if [ "$SUCCESS" = "false" ]; then
     exit 1
 fi
 
-# 6. Switch Traffic (Nginx update handled by CI post-deploy script)
+# 6. Switch Traffic
+# Nginx never needs per-deploy changes: it always proxies to the persistent
+# TaxBaik.Proxy on 127.0.0.1:5001, which reads this same PORT_FILE and
+# forwards to whichever port is currently active. See CLAUDE.md section 6.
 echo "=== Switching Traffic to Port $TARGET_PORT ==="
 echo "$TARGET_PORT" > "$PORT_FILE"
-echo "✓ Traffic routed to $TARGET_PORT"
-echo "⚠️  Note: Nginx will be updated by CI post-deploy script (requires root)"
+echo "✓ Traffic routed to $TARGET_PORT (via TaxBaik.Proxy on 5001)"
 
 # 7. Terminate Old App
 echo "=== Stopping Old App on Port $ACTIVE_PORT ==="
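
The external post-deploy check described in the commit message is not part of this hunk. A minimal sketch of such a probe might look like the following; the retry count, the delay, the `https://` scheme, and the expectation that all three paths return HTTP 200 are assumptions, and `retry`, `expect_200`, and `check_public_site` are hypothetical names.

```shell
#!/usr/bin/env bash
# Illustrative external probe with retries; not the code in deploy.yml.

# retry ATTEMPTS DELAY_SECONDS CMD...: run CMD until it succeeds or attempts run out.
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    # Sleep between attempts, but not after the last one.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
  done
  return 1
}

# expect_200 URL: succeed only if the URL answers with HTTP 200.
expect_200() {
  local code
  code="$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "$1")" || return 1
  [ "$code" = "200" ]
}

# check_public_site: probe the public paths named in the commit message,
# going through Cloudflare and Nginx rather than 127.0.0.1:5001.
check_public_site() {
  local path
  for path in / /taxbaik/ /taxbaik/admin/login; do
    if ! retry 5 10 expect_200 "https://www.taxbaik.com$path"; then
      echo "FAIL: https://www.taxbaik.com$path did not return 200" >&2
      return 1
    fi
  done
}
```

Calling `check_public_site || exit 1` as the last deploy step makes a 502 or 404 at the public domain fail the pipeline even when the local port check passes.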