taxbaik/deploy_gb.sh
kjh2064 9ae701ff93
TaxBaik CI/CD / build-and-deploy (push) Failing after 3m5s
fix: Harden CI against Nginx misconfiguration that caused prod 502/404
Today's incident: CI reported successful deploys while the real site
returned 502 (root) then 404 (/taxbaik/) to users. Root cause was three
compounding Nginx issues, none of which the previous CI checks could see
because they only ever curled 127.0.0.1:5001 directly, bypassing Nginx:

1. Two Nginx config files existed. sites-available/default (documented,
   but NOT symlinked into sites-enabled/) was being edited repeatedly with
   zero effect. The file actually loaded was
   sites-available/taxbaik-domains.conf (-> sites-enabled/), undocumented.
2. That real file hardcoded the Green/Blue app port (5003) directly in
   both `location /` and `location /taxbaik`, instead of the persistent
   TaxBaik.Proxy on 5001. When the active port flipped to 5004, Nginx kept
   pointing at the dead 5003 -> 502.
3. Fixing the port to 5001 with a trailing slash on proxy_pass triggered
   Nginx URI rewriting, sending a double slash ("//") to the backend,
   which 404'd. Confirmed via `curl http://backend//` -> 404.
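
The three issues above can be checked mechanically. What follows is a minimal sketch, not the check that deploy.yml actually runs: the function names are hypothetical, it assumes enabled configs are symlinks under a sites-enabled/ directory, and it flags any proxy_pass with a trailing slash, which is stricter than a /taxbaik-only check.

```shell
# Hypothetical helpers (names are assumptions, not taken from deploy.yml).

# Print the real path of every config reachable from the enabled directory,
# so the lint runs on the file Nginx loads rather than a look-alike.
resolve_enabled_configs() {
    local enabled_dir="$1" link
    for link in "$enabled_dir"/*; do
        [ -e "$link" ] || continue   # skip dangling links and empty globs
        readlink -f "$link"
    done
}

# Return non-zero if a config proxies straight to a Green/Blue app port,
# or if any proxy_pass target ends with a trailing slash.
lint_nginx_config() {
    local conf="$1" status=0
    if grep -Eq 'proxy_pass[[:space:]]+https?://[^;]*:(5003|5004)([^0-9]|$)' "$conf"; then
        echo "FAIL: $conf hardcodes an app port (5003/5004) in proxy_pass" >&2
        status=1
    fi
    if grep -Eq 'proxy_pass[[:space:]]+https?://[^/;]+/[[:space:]]*;' "$conf"; then
        echo "FAIL: $conf has a proxy_pass with a trailing slash" >&2
        status=1
    fi
    return "$status"
}
```

A CI step could loop over `resolve_enabled_configs /etc/nginx/sites-enabled` (the path is an assumption) and abort the deploy on the first file for which `lint_nginx_config` returns non-zero.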

Changes:
- deploy.yml: replace the old blind `grep sites-available/default` check
  (checked the wrong, unloaded file) with a hard-failing check that (a)
  resolves the actual file via sites-enabled/ symlinks, (b) fails the
  deploy if either location block hardcodes 5003/5004 instead of 5001,
  (c) fails if /taxbaik's proxy_pass carries a stray trailing slash.
- deploy.yml: add an external, post-deploy check that curls the real
  public domain (www.taxbaik.com root, /taxbaik/, /taxbaik/admin/login)
  through Cloudflare + Nginx, with retries. This check would have caught
  the whole incident on the very first broken deploy instead of
  requiring live user reports.
- deploy_gb.sh: drop the stale comment implying Nginx needs updating
  per-deploy; it never should, since Nginx always points at the
  persistent 5001 proxy which reads taxbaik_port itself.
- CLAUDE.md: document the real config file, the 5001-only invariant, the
  proxy_pass trailing-slash gotcha, and the Host-header/SNI trick for
  testing domain-based server blocks locally; record the incident in the
  CI troubleshooting harness section.
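
The external post-deploy check described above could look roughly like the sketch below. The function name, retry count, delay, and timeout are all assumptions for illustration; the actual step in deploy.yml may differ.

```shell
# Hypothetical helper: poll a URL until it returns the expected HTTP status.
# Arguments: url [expected_status=200] [attempts=5] [delay_seconds=3]
check_url() {
    local url="$1" expected="${2:-200}" attempts="${3:-5}" delay="${4:-3}"
    local i status
    for i in $(seq 1 "$attempts"); do
        # No -f: we want the status code itself, even for 4xx/5xx responses.
        # On a connection failure curl prints 000 via -w and exits non-zero.
        status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url" || true)
        if [ "$status" = "$expected" ]; then
            echo "OK   $url ($status, attempt $i/$attempts)"
            return 0
        fi
        echo "WAIT $url (got $status, attempt $i/$attempts)" >&2
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
    done
    echo "FAIL $url never returned $expected" >&2
    return 1
}
```

Calling it once per public path, for example `check_url https://www.taxbaik.com/taxbaik/` (the https scheme is assumed here), exercises Cloudflare and Nginx rather than only the app port, so a bad proxy_pass surfaces as a failed deploy.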

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-03 18:51:19 +09:00

132 lines
3.9 KiB
Bash

#!/bin/bash
set -e
DEPLOY_HOME="/home/kjh2064"
PORT_FILE="$DEPLOY_HOME/taxbaik_port"
TIMESTAMP=$(TZ=Asia/Seoul date +%Y%m%d_%H%M%S)
echo "===== 🚀 TaxBaik Green/Blue Deployment Script ====="
if [ "${TAXBAIK_DEPLOY_FROM_CI:-}" != "1" ]; then
    echo "❌ This deployment script may only be run from CI." >&2
    exit 1
fi
# 1. Determine active port
ACTIVE_PORT=5003
if [ -f "$PORT_FILE" ]; then
    # Read the port directly; no need to pipe through cat.
    ACTIVE_PORT=$(tr -d '[:space:]' < "$PORT_FILE")
fi
# 2. Determine target port
# The target is whichever of 5003/5004 is not currently active.
if [ "$ACTIVE_PORT" -eq 5003 ]; then
    TARGET_PORT=5004
else
    TARGET_PORT=5003
fi
echo "Active Port: $ACTIVE_PORT"
echo "Target Port: $TARGET_PORT"
# 3. New deploy dir is passed as first argument
DEPLOY_DIR="${1:-}"
if [ -z "$DEPLOY_DIR" ]; then
    echo "Error: Deployment directory argument required" >&2
    exit 1
fi
echo "Deploy Directory: $DEPLOY_DIR"
if [ ! -s "$DEPLOY_DIR/appsettings.Production.json" ]; then
    echo "❌ Missing production settings: $DEPLOY_DIR/appsettings.Production.json" >&2
    exit 1
fi
if [ ! -s "$DEPLOY_DIR/proxy/TaxBaik.Proxy.dll" ]; then
    echo "❌ Missing proxy artifact: $DEPLOY_DIR/proxy/TaxBaik.Proxy.dll" >&2
    exit 1
fi
# 0. Ensure the local TCP proxy exists and is running.
# Nginx and external traffic always enter through 127.0.0.1:5001.
if ! ss -tln | grep -q ':5001 '; then
    echo "=== Starting proxy on 127.0.0.1:5001 ==="
    cd "$DEPLOY_DIR/proxy"
    nohup /usr/bin/dotnet TaxBaik.Proxy.dll > "$DEPLOY_HOME/taxbaik_proxy.log" 2>&1 &
    sleep 2
fi
if ! ss -tln | grep -q ':5001 '; then
    echo "❌ Proxy on 127.0.0.1:5001 is not running. Abort deploy." >&2
    exit 1
fi
# 4. Start the new app on the target port
echo "=== Starting New App on Port $TARGET_PORT ==="
cd "$DEPLOY_DIR"
export ASPNETCORE_ENVIRONMENT=Production
export ASPNETCORE_URLS="http://127.0.0.1:$TARGET_PORT"
export ConnectionStrings__Default="Host=localhost;Database=taxbaikdb;Username=taxbaik;Password=taxbaik123"
export ApiClient__BaseUrl="http://127.0.0.1:$TARGET_PORT/taxbaik/api/"
export DOTNET_PRINT_TELEMETRY_MESSAGE=false
# Run dotnet process
nohup /usr/bin/dotnet TaxBaik.Web.dll > "web_${TARGET_PORT}.log" 2>&1 &
NEW_PID=$!
sleep 2
# Verify process is running
if ! ps -p "$NEW_PID" > /dev/null; then
    echo "❌ Failed to start dotnet process on port $TARGET_PORT" >&2
    exit 1
fi
# 5. Health Check Loop
echo "=== Health Checking Port $TARGET_PORT ==="
ATTEMPTS=20
SUCCESS=false
for i in $(seq 1 "$ATTEMPTS"); do
    # No -f here: with -f, curl still prints the code via -w but exits non-zero
    # on 4xx/5xx, so a fallback `echo "000"` would produce values like "503000".
    # On a connection failure curl itself prints 000.
    STATUS=$(curl -s -o /dev/null -w '%{http_code}' "http://127.0.0.1:${TARGET_PORT}/taxbaik/healthz" 2>/dev/null || true)
    if [ "$STATUS" = "200" ]; then
        echo "✓ Health check passed on port $TARGET_PORT (Attempt $i/$ATTEMPTS)"
        SUCCESS=true
        break
    fi
    echo " Waiting for health check... ($i/$ATTEMPTS, Status: $STATUS)"
    sleep 2
done
if [ "$SUCCESS" = "false" ]; then
    echo "❌ Health check failed. Rolling back..." >&2
    kill -9 "$NEW_PID" || true
    exit 1
fi
# 6. Switch Traffic
# Nginx never needs per-deploy changes: it always proxies to the persistent
# TaxBaik.Proxy on 127.0.0.1:5001, which reads this same PORT_FILE and
# forwards to whichever port is currently active. See CLAUDE.md section 6.
echo "=== Switching Traffic to Port $TARGET_PORT ==="
echo "$TARGET_PORT" > "$PORT_FILE"
echo "✓ Traffic routed to $TARGET_PORT (via TaxBaik.Proxy on 5001)"
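# 7. Terminate Old App
echo "=== Stopping Old App on Port $ACTIVE_PORT ==="
# Find PID listening on ACTIVE_PORT
OLD_PID=$(ss -tlnp | grep ":$ACTIVE_PORT " | grep -oP 'pid=\K\d+' | head -n1)
if [ -n "$OLD_PID" ]; then
    echo "Killing old process PID: $OLD_PID"
    # kill -15 succeeds as soon as the signal is sent, so wait for the process
    # to exit and escalate to SIGKILL only if it is still alive after 10s.
    kill -15 "$OLD_PID" 2>/dev/null || true
    for _ in $(seq 1 10); do
        kill -0 "$OLD_PID" 2>/dev/null || break
        sleep 1
    done
    if kill -0 "$OLD_PID" 2>/dev/null; then
        kill -9 "$OLD_PID" 2>/dev/null || true
    fi
    echo "✓ Old process terminated"
else
    echo "No old process found on port $ACTIVE_PORT"
fi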
# 8. Cleanup old deployment directories (Keep last 5)
echo "=== Cleaning Up Old Deployments ==="
ls -1dt "$DEPLOY_HOME"/deployments/taxbaik_* 2>/dev/null | tail -n +6 | xargs -r rm -rf -- 2>/dev/null || true
echo "✓ Cleanup completed"
echo "===== ✅ Green/Blue Deployment Completed Successfully ====="
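
Step 6 switches traffic by overwriting the port file in place. Because `>` truncates the file before writing, a reader could briefly see it empty. One possible hardening, sketched here under the assumption that the proxy re-reads the file by path, is to write a temporary file in the same directory and rename it over the original. The function name and the file mode are hypothetical.

```shell
# Hypothetical helper: replace the port file atomically via rename.
# Arguments: port_file new_port
switch_port() {
    local file="$1" port="$2" tmp
    # Create the temp file next to the target so mv is a same-filesystem rename.
    tmp=$(mktemp "${file}.XXXXXX")
    printf '%s\n' "$port" > "$tmp"
    chmod 644 "$tmp"        # assumption: the file should stay world-readable
    mv -f "$tmp" "$file"    # readers see either the old or the new content
}
```

In the script this would stand in for the plain redirect, for example `switch_port "$PORT_FILE" "$TARGET_PORT"`.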