9ae701ff93
TaxBaik CI/CD / build-and-deploy (push) Failing after 3m5s
Today's incident: CI reported successful deploys while the real site
returned 502 (root) then 404 (/taxbaik/) to users. Root cause was three
compounding Nginx issues, none of which the previous CI checks could see
because they only ever curled 127.0.0.1:5001 directly, bypassing Nginx:
1. Two Nginx config files existed. sites-available/default (documented,
but NOT symlinked into sites-enabled/) was being edited repeatedly with
zero effect. The file actually loaded was
sites-available/taxbaik-domains.conf (-> sites-enabled/), undocumented.
2. That real file hardcoded the Green/Blue app port (5003) directly in
both `location /` and `location /taxbaik`, instead of the persistent
TaxBaik.Proxy on 5001. When the active port flipped to 5004, Nginx kept
pointing at the dead 5003 -> 502.
3. The first fix pointed proxy_pass at 5001 but added a trailing slash,
which triggers Nginx URI rewriting: the matched location prefix was
replaced, so a double slash ("//") reached the backend, which 404'd.
Confirmed via `curl http://backend//` -> 404.
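The three issues above can all be detected from the files Nginx actually loads. A minimal sketch of such a check, assuming a hypothetical helper named `check_nginx_conf` and upstreams written as `127.0.0.1:<port>`; it is not the check that was added to deploy.yml, and for simplicity it flags a bare trailing slash in any location block and does not skip commented-out lines:

```shell
#!/bin/bash
# Hypothetical helper (not from the repository): check_nginx_conf DIR
# DIR is a sites-enabled-style directory of symlinks.
check_nginx_conf() {
    local enabled_dir="$1" status=0 link real
    for link in "$enabled_dir"/*; do
        [ -e "$link" ] || continue
        # Resolve the symlink so the file Nginx really loads is inspected,
        # not a similarly named file that is never enabled.
        real=$(readlink -f "$link")
        # A proxy_pass to a Green/Blue app port bypasses the 5001 proxy.
        if grep -Eq 'proxy_pass[[:space:]]+https?://127\.0\.0\.1:500[34]' "$real"; then
            echo "FAIL: $real hardcodes an app port (5003/5004)" >&2
            status=1
        fi
        # A bare trailing slash after the port makes Nginx rewrite the URI.
        if grep -Eq 'proxy_pass[[:space:]]+https?://127\.0\.0\.1:5001/[[:space:]]*;' "$real"; then
            echo "FAIL: $real has a trailing slash on proxy_pass" >&2
            status=1
        fi
    done
    return "$status"
}
```

Calling `check_nginx_conf /etc/nginx/sites-enabled` in a deploy step would return non-zero on either problem, which is enough to fail the job.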
Changes:
- deploy.yml: replace the old blind `grep sites-available/default` check
(checked the wrong, unloaded file) with a hard-failing check that (a)
resolves the actual file via sites-enabled/ symlinks, (b) fails the
deploy if either location block hardcodes 5003/5004 instead of 5001,
(c) fails if /taxbaik's proxy_pass carries a stray trailing slash.
- deploy.yml: add an external, post-deploy check that curls the real
public domain (www.taxbaik.com root, /taxbaik/, /taxbaik/admin/login)
through Cloudflare + Nginx, with retries. This check would have caught
the whole incident on the very first broken deploy instead of
requiring live user reports.
- deploy_gb.sh: drop the stale comment implying Nginx needs updating
per-deploy; it never should, since Nginx always points at the
persistent 5001 proxy which reads taxbaik_port itself.
- CLAUDE.md: document the real config file, the 5001-only invariant, the
proxy_pass trailing-slash gotcha, and the Host-header/SNI trick for
testing domain-based server blocks locally; record the incident in the
CI troubleshooting harness section.
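The external check described above can be sketched as a small retry loop. This is an illustration under assumptions, not the code in deploy.yml: the helper name `check_url`, the retry count, the delay, the `https://` scheme, and the requirement of an exact 200 are all hypothetical choices.

```shell
#!/bin/bash
# Hypothetical helper (not from the repository): check_url URL [ATTEMPTS] [DELAY]
# Succeeds as soon as URL answers 200, fails after ATTEMPTS tries.
check_url() {
    local url="$1" attempts="${2:-5}" delay="${3:-3}" i status
    for i in $(seq 1 "$attempts"); do
        # -w prints the HTTP status ("000" if the connection failed);
        # "|| true" keeps a curl error from aborting under set -e.
        status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url" || true)
        if [ "$status" = "200" ]; then
            echo "OK   $url"
            return 0
        fi
        echo "WAIT $url (attempt $i/$attempts, status ${status:-000})" >&2
        sleep "$delay"
    done
    echo "FAIL $url" >&2
    return 1
}

# Example use in a post-deploy step (scheme and paths as assumed above):
#   for path in / /taxbaik/ /taxbaik/admin/login; do
#       check_url "https://www.taxbaik.com${path}" || exit 1
#   done
```

Because the request goes to the public hostname, it exercises Cloudflare and Nginx as well as the app, which a curl against 127.0.0.1:5001 cannot do.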
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
132 lines
3.9 KiB
Bash
#!/bin/bash
set -e

DEPLOY_HOME="/home/kjh2064"
PORT_FILE="$DEPLOY_HOME/taxbaik_port"
TIMESTAMP=$(TZ=Asia/Seoul date +%Y%m%d_%H%M%S)

echo "===== 🚀 TaxBaik Green/Blue Deployment Script ====="

if [ "${TAXBAIK_DEPLOY_FROM_CI:-}" != "1" ]; then
    echo "❌ This deployment script may only be run from CI." >&2
    exit 1
fi

# 1. Determine active port (defaults to 5003 if the port file is missing or empty)
ACTIVE_PORT=5003
if [ -f "$PORT_FILE" ]; then
    ACTIVE_PORT=$(tr -d '[:space:]' < "$PORT_FILE")
    ACTIVE_PORT=${ACTIVE_PORT:-5003}
fi

# 2. Determine target port (flip between 5003 and 5004)
if [ "$ACTIVE_PORT" -eq 5003 ]; then
    TARGET_PORT=5004
else
    TARGET_PORT=5003
fi

echo "Active Port: $ACTIVE_PORT"
echo "Target Port: $TARGET_PORT"

# 3. New deploy dir is passed as first argument
DEPLOY_DIR="$1"
if [ -z "$DEPLOY_DIR" ]; then
    echo "Error: Deployment directory argument required" >&2
    exit 1
fi

echo "Deploy Directory: $DEPLOY_DIR"

if [ ! -s "$DEPLOY_DIR/appsettings.Production.json" ]; then
    echo "❌ Missing production settings: $DEPLOY_DIR/appsettings.Production.json" >&2
    exit 1
fi

if [ ! -s "$DEPLOY_DIR/proxy/TaxBaik.Proxy.dll" ]; then
    echo "❌ Missing proxy artifact: $DEPLOY_DIR/proxy/TaxBaik.Proxy.dll" >&2
    exit 1
fi

# 0. Ensure the local TCP proxy exists and is running.
# Nginx and external traffic always enter through 127.0.0.1:5001.
if ! ss -tln | grep -q ':5001 '; then
    echo "=== Starting proxy on 127.0.0.1:5001 ==="
    cd "$DEPLOY_DIR/proxy"
    nohup /usr/bin/dotnet TaxBaik.Proxy.dll > "$DEPLOY_HOME/taxbaik_proxy.log" 2>&1 &
    sleep 2
fi

if ! ss -tln | grep -q ':5001 '; then
    echo "❌ Proxy on 127.0.0.1:5001 is not running. Abort deploy." >&2
    exit 1
fi

# 4. Start the new app on the target port
echo "=== Starting New App on Port $TARGET_PORT ==="
cd "$DEPLOY_DIR"
export ASPNETCORE_ENVIRONMENT=Production
export ASPNETCORE_URLS="http://127.0.0.1:$TARGET_PORT"
export ConnectionStrings__Default="Host=localhost;Database=taxbaikdb;Username=taxbaik;Password=taxbaik123"
export ApiClient__BaseUrl="http://127.0.0.1:$TARGET_PORT/taxbaik/api/"
export DOTNET_PRINT_TELEMETRY_MESSAGE=false

# Run dotnet process in the background and remember its PID
nohup /usr/bin/dotnet TaxBaik.Web.dll > "web_${TARGET_PORT}.log" 2>&1 &
NEW_PID=$!
sleep 2

# Verify process is running
if ! ps -p "$NEW_PID" > /dev/null; then
    echo "❌ Failed to start dotnet process on port $TARGET_PORT" >&2
    exit 1
fi

# 5. Health Check Loop
echo "=== Health Checking Port $TARGET_PORT ==="
ATTEMPTS=20
SUCCESS=false
for i in $(seq 1 $ATTEMPTS); do
    # -w prints the HTTP status, or 000 when the connection fails;
    # "|| true" keeps a curl error from aborting the script under set -e.
    STATUS=$(curl -s -o /dev/null -w '%{http_code}' "http://127.0.0.1:${TARGET_PORT}/taxbaik/healthz" 2>/dev/null || true)
    if [ "$STATUS" = "200" ]; then
        echo "✓ Health check passed on port $TARGET_PORT (Attempt $i/$ATTEMPTS)"
        SUCCESS=true
        break
    fi
    echo "  Waiting for health check... ($i/$ATTEMPTS, Status: $STATUS)"
    sleep 2
done

if [ "$SUCCESS" = "false" ]; then
    echo "❌ Health check failed. Rolling back..." >&2
    kill -9 "$NEW_PID" || true
    exit 1
fi

# 6. Switch Traffic
# Nginx never needs per-deploy changes: it always proxies to the persistent
# TaxBaik.Proxy on 127.0.0.1:5001, which reads this same PORT_FILE and
# forwards to whichever port is currently active. See CLAUDE.md section 6.
echo "=== Switching Traffic to Port $TARGET_PORT ==="
echo "$TARGET_PORT" > "$PORT_FILE"
echo "✓ Traffic routed to $TARGET_PORT (via TaxBaik.Proxy on 5001)"

# 7. Terminate Old App
echo "=== Stopping Old App on Port $ACTIVE_PORT ==="
# Find PID listening on ACTIVE_PORT
OLD_PID=$(ss -tlnp | grep ":$ACTIVE_PORT " | grep -oP 'pid=\K\d+' | head -n1)
if [ -n "$OLD_PID" ]; then
    echo "Killing old process PID: $OLD_PID"
    kill -15 "$OLD_PID" || kill -9 "$OLD_PID"
    echo "✓ Old process terminated"
else
    echo "No old process found on port $ACTIVE_PORT"
fi

# 8. Cleanup old deployment directories (Keep last 5)
echo "=== Cleaning Up Old Deployments ==="
ls -1dt "$DEPLOY_HOME"/deployments/taxbaik_* 2>/dev/null | tail -n +6 | xargs rm -rf 2>/dev/null || true
echo "✓ Cleanup completed"

echo "===== ✅ Green/Blue Deployment Completed Successfully ====="
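
Step 7 of the script sends SIGTERM and falls back to SIGKILL only if the signal itself cannot be delivered, so it does not wait for the old app to exit. A minimal sketch of a variant that waits before escalating, assuming a hypothetical helper `stop_gracefully` and a 10-second default timeout, neither of which is part of the script:

```shell
#!/bin/bash
# Hypothetical helper (not from the repository): stop_gracefully PID [TIMEOUT_SECONDS]
# Sends SIGTERM, waits up to TIMEOUT_SECONDS for the process to exit,
# then sends SIGKILL if it is still alive.
stop_gracefully() {
    local pid="$1" timeout="${2:-10}" waited=0
    # If SIGTERM cannot be delivered, the process is already gone.
    kill -15 "$pid" 2>/dev/null || return 0
    # kill -0 only tests whether the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            kill -9 "$pid" 2>/dev/null || true
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In the script this would be called as `stop_gracefully "$OLD_PID"` in place of the `kill -15 ... || kill -9 ...` line, so the "Old process terminated" message is printed only after the wait.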