WBS-7.3 F12/F13: confirm role separation of the two distribution_risk formulas (KEEP_BOTH)

Discovered that the "THIN_ADAPTER: delegated to Python" comment on GAS calcDistributionRiskRow_ was wrong. GAS (DISTRIBUTION_RISK_SCORE_V1, a score-based BUY-blocking gate) and Python calc_distribution_detector_per_ticker (DISTRIBUTION_SELL_DETECTOR_V1, a 6-signal count that improves the precision of PRE_DISTRIBUTION_EARLY_WARNING) were already registered in the spec as independent formulas with distinct formula_ids. Only the ledger premise that "GAS duplicates Python" was false; the code had been correctly separated all along.

Per the user decision (keep both, separate roles):
- Corrected the wrong comment in the GAS source (gdf_03_portfolio_gates.gs) and regenerated the bundle
- Added mutual related_formula references to both formula_registry files (to prevent future confusion)
- governance/gas_logic_migration_ledger_v1.yaml: changed migration_action from DELETE_DISTRIBUTION_RISK_GAS to KEEP_BOTH_SEPARATE_ROLES, status DONE
2026-06-22 02:29:50 +09:00
parent 2af3681fb9
commit 6d4ee39e04
10 changed files with 485 additions and 21 deletions
@@ -692,14 +692,20 @@ python tools/build_qualitative_sell_inputs_v1.py --batch --workbook GatherTradin

 ---

-#### WBS-7.3 GAS→Python formula migration re-review (2026-06-21)
+#### WBS-7.3 GAS→Python formula migration re-review (2026-06-21~22)

 | Item | Details |
 |------|------|
-| **Task** | Re-verified all 15 findings in `governance/gas_logic_migration_ledger_v1.yaml` from the original text |
-| **Current state** | 2 DONE (F01/F09; the ledger was merely stale, both were already registered), 1 KEEP_IN_GAS, **12 TODO retained, deliberately deferred** |
-| **Files** | `governance/gas_logic_migration_ledger_v1.yaml` |
-| **Status** | Partially complete: only items that could be handled safely were closed; the rest are deferred with documented rationale |
+| **Task** | Re-verified all 15 findings in `governance/gas_logic_migration_ledger_v1.yaml` from the original text + built 1 real parity test |
+| **Current state** | **5 DONE** (F01/F09 ledger corrections, **F11 actually ported + parity test PASS**, **F12/F13 closed as KEEP_BOTH_SEPARATE_ROLES by user decision**), 1 KEEP_IN_GAS, 9 TODO retained (parity infrastructure required first) |
+| **Files** | `governance/gas_logic_migration_ledger_v1.yaml`, `formulas/stop_loss_gate_v1.py` (new), `tests/parity/test_classify_order_type_parity_v1.py` (new) |
+| **Status** | In progress: safe items closed and the parity methodology demonstrated; the rest are deferred with documented rationale |
+
+**2026-06-22 key finding and resolution (F12/F13)**: Above GAS `calcDistributionRiskRow_` (gdf_03:2069) there was a comment reading "THIN_ADAPTER: delegated to Python" that pointed at `src/quant_engine/inject_computed_harness.py:calc_distribution_detector_per_ticker`, so I actually read that Python function. GAS and Python implement different algorithms (GAS: a score built from 10 additive conditions on flow, volume, and candle shape; Python: a count of 6 signals such as RSI14 and OBV slope), and further investigation showed that **the two were already registered in the spec under distinct formula_ids**: GAS=`DISTRIBUTION_RISK_SCORE_V1` (spec/13b_harness_formulas.yaml:365, the BUY/STAGED_BUY/ADD_ON blocking gate) and Python=`DISTRIBUTION_SELL_DETECTOR_V1` (spec/13_formula_registry.yaml:2758, a 6-signal detector that improves the precision of the 2-signal PRE_DISTRIBUTION_EARLY_WARNING). The ledger premise that "GAS duplicates Python" was false, and the sole cause of the confusion was the wrong comment in GAS. **Closed per the user decision (keep both, separate roles)**: corrected the GAS comment (`src/gas_adapter_parts/gdf_03_portfolio_gates.gs:2070`) + regenerated the bundle (`tools/build_gas_bundle_v1.py`) + added mutual `related_formula` references to both formula_registry files + changed the ledger `migration_action` to `KEEP_BOTH_SEPARATE_ROLES`.
+
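+**2026-06-22 parity-test methodology demonstrated (F11, classifyOrderType_)**: After confirming that GAS `classifyOrderType_` (gdf_03:1360, the target of the "critical path" warning) is a genuinely pure function (no Sheet/Range access), I ported it to `formulas/stop_loss_gate_v1.py:classify_order_type()`. **Rather than trusting a hand port**, I wrote `tests/parity/test_classify_order_type_parity_v1.py`: on every test run it extracts the original GAS source exactly, using brace matching rather than a regex, **executes it directly in Node**, and compares it against the Python port on 12 cases (including the edge case where stopBreach must take precedence over a BUY signal). If the GAS original changes later, this test catches the drift immediately. This is a reproducible methodology that can be applied to the remaining 9 TODO items (F02~F06/F07/F10/F15 among them).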
+
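+**2026-06-22 addendum: extending Python/SQLite collection of raw data_feed columns (user question)**: To answer the question "shouldn't Python collect this and serve it from SQLite instead of GAS?", I extended the Naver path in `kis_data_collection_v1.py`. Of the 190 columns in `data_feed`, I confirmed that the **raw-data columns** (Close/Open/High/Low/PrevClose/AvgVolume_5D/MA20/MA60/Ret5D~60D/ATR20/Frg_5D·Inst_5D/Frg_20D·Inst_20D/Flow_Rows/Flow_OK) can be derived from the existing Naver daily price and investor-flow fetches, and implemented them. The remaining ~150 columns of `data_feed` (SS001/AC/RW/Sell_*/Final_Action, etc.), however, are not raw data but **decision logic computed by GAS**, so they are out of scope for this work; moving them belongs to the same GAS→Python migration track as F12/F13 and the remaining 9 items above.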

 **Facts found during re-verification**:
 ```
@@ -728,11 +734,12 @@ F02~F06/F07/F10/F11/F15(MIGRATE_* 신규 포트, 12건 중 9건) → 의도적
 Verification: python -c "import yaml; from collections import Counter; \
  d=yaml.safe_load(open('governance/gas_logic_migration_ledger_v1.yaml', encoding='utf-8')); \
  print(Counter(f['status'] for f in d['findings']))"
-Result: Counter({'TODO': 12, 'DONE': 2, 'KEEP_IN_GAS': 1})
+Result: Counter({'TODO': 9, 'DONE': 5, 'KEEP_IN_GAS': 1})  # DONE includes F12/F13, closed as KEEP_BOTH_SEPARATE_ROLES
 python tools/validate_specs.py → PASS (this migration status is unrelated to the current CI gate:
  PASS/FAIL from tools/validate_gas_thin_adapter_v1.py does not consult this ledger and is
  confirmed to be decided from a separate audit JSON and spec/39_gas_thin_adapter_policy.yaml)
-The remaining 12 items are handed off to a dedicated parity-test sprint (separate WBS); not attempted in this session.
+Regression: python -m pytest tests/unit tests/integration tests/parity -q → 100 passed
+The remaining 9 items will follow using the same parity methodology as F11; F12/F13 are closed per the user decision (KEEP_BOTH_SEPARATE_ROLES).
 ```

 ---
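Reviewer sketch (not part of this commit): the F11 note says the parity test pulls the GAS function out by brace matching rather than a regex. The actual extractor in `tests/parity/` is not shown in this diff, so the following is only a minimal illustration of that idea. `extract_js_function`, `SAMPLE`, and `demoGate_` are hypothetical names, and the sketch assumes a plain `function name(...) {` declaration whose parameter list and comments contain no braces.

```python
def extract_js_function(source, name):
    """Return the text of `function <name>(...) {...}` by counting brace depth.

    Assumption: braces appear only in code or inside quoted strings;
    comments, regex literals, and template substitutions are not handled.
    """
    start = source.find("function " + name + "(")
    if start < 0:
        raise ValueError("function %s not found" % name)
    i = source.index("{", start)   # opening brace of the function body
    depth = 0
    in_string = None               # quote character while inside a string literal
    while i < len(source):
        ch = source[i]
        if in_string:
            if ch == "\\":         # skip the escaped character
                i += 2
                continue
            if ch == in_string:
                in_string = None
        elif ch in "'\"`":
            in_string = ch
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return source[start:i + 1]
        i += 1
    raise ValueError("unbalanced braces in %s" % name)


# Made-up GAS-like source; demoGate_ is not a function from this repository.
SAMPLE = '''
function helper_(x) { return x + 1; }

function demoGate_(row) {
  if (row.stopBreach) { return "SELL"; }
  var label = "}{";
  return row.buySignal ? "BUY" : "HOLD";
}

function after_() { return 0; }
'''

extracted = extract_js_function(SAMPLE, "demoGate_")
print(extracted)
```

The extracted text could then be handed to Node (for example through `subprocess.run(["node", "-e", script])`) and its outputs compared with the Python port. Depth counting keeps the nested `if` block and the quoted `"}{"` inside the function, which a naive pattern stopping at the first `}` would not.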
@@ -1042,7 +1049,7 @@ LLM이 런타임에 이런 stale spec을 사실로 읽으면 할루시네이션
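 | 6-remaining short-interest balance ratio | 🟢 Low | High | KRX policy | Blocked (confirmed) | Awaiting USER_ACTION |
 | 7.1 Calibration: switch to empirical | 🔴 Critical | High | 30+ samples | Tooling done; promotion is DATA_GATED | 0/191 CALIBRATED (automatic tool tally + duplicate-id bug fix) |
 | 7.2 T+5 metric consistency unification | 🔴 Critical | Low | None | Done | **100%** ✅ (2026-06-21) |
-| 7.3 GAS→Python migration | 🟠 High | Medium | parity tests | Partially done + 12 deliberately deferred | 2/15 DONE, 12 TODO (rationale recorded), 1 KEEP_IN_GAS |
+| 7.3 GAS→Python migration | 🟠 High | Medium | parity tests | In progress (parity methodology demonstrated) | 5/15 DONE (F11 parity-verified, F12/13 closed with role separation), 9 TODO, 1 KEEP_IN_GAS |
 | 7.4 Deprecated cleanup | 🟠 High | Low | None | Done | **100%** ✅ (2026-06-21, 17 aliases removed) |
 | 7.5 Proportional temporary fallback | 🟡 Medium | Medium | None | Done (OVERHANG only) | **100%** ✅ (2026-06-21, remaining 2 items split off as policy decisions) |
 | 7.6 Measured slippage correction | 🟡 Medium | Low | 5+ fills | Scaffolding done; comparison is DATA_GATED | **100%** ✅ (capture tool; comparison awaits samples) |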
@@ -1,8 +1,8 @@
 // =========================================================================
 // GENERATED BUNDLE - DO NOT EDIT THIS FILE MANUALLY
-// Generated At: 2026-06-21 20:47:17 KST
+// Generated At: 2026-06-22 02:21:03 KST
 // Source Files: src/gas_adapter_parts/gdf_01_price_metrics.gs, src/gas_adapter_parts/gdf_02_harness_assembly.gs, src/gas_adapter_parts/gdf_03_portfolio_gates.gs, src/gas_adapter_parts/gdf_04_execution_quality.gs, src/gas_adapter_parts/gdf_05_alpha_engines.gs, src/gas_adapter_parts/gdf_06_rebalance.gs
-// Source Hash: 10444a5154d1b600dba5a60e163eca359527552810b5d1dea7361afe2e609b97
+// Source Hash: c050e37c26b87f72eb5b325726163b0cd8570e3823bf058f5464d37cc8200e31
 // =========================================================================

 // --- Source: src/gas_adapter_parts/gdf_01_price_metrics.gs ---
@@ -6780,7 +6780,15 @@ function findOrderBlueprintRow_(orders, ticker) {
 }

 function calcDistributionRiskRow_(h, df, kospiRet5d, sectorFlowData) {
-  // THIN_ADAPTER: [risk_score] delegated to Python — src/quant_engine/inject_computed_harness.py:calc_distribution_detector_per_ticker
+  // [Corrected 2026-06-22] The previous comment ("THIN_ADAPTER: delegated to Python",
+  // pointing at inject_computed_harness.py:calc_distribution_detector_per_ticker) was wrong.
+  // This function (formula_id=DISTRIBUTION_RISK_SCORE_V1, spec/13b_harness_formulas.yaml:365,
+  // the absolute BUY/STAGED_BUY/ADD_ON blocking gate) and Python calc_distribution_detector_per_ticker
+  // (formula_id=DISTRIBUTION_SELL_DETECTOR_V1, spec/13_formula_registry.yaml:2758,
+  // a 6-signal detector that improves the precision of the 2-signal PRE_DISTRIBUTION_EARLY_WARNING)
+  // are independent formulas with different inputs, outputs, and purposes; neither is a GAS duplicate of the other.
+  // Both are kept, with separate roles (governance/gas_logic_migration_ledger_v1.yaml F12/F13,
+  // user decision 2026-06-22). Do not delete this function.
  var close = df.close || h.close || 0;
  var ma20 = df.ma20 || 0;
  var high = df.high || close;
@@ -26,6 +26,16 @@ unclassified_findings: 0
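 #   In particular, F11 (stop_loss_gate) is an item the ledger itself flags as "critical path"
 #   (must match the validate_stop_loss_policy_v1 spec). A dedicated follow-up sprint must
 #   build the parity test first before starting it.
+#
+# WBS-7.3 follow-up (2026-06-22):
+# - F11 (stop_loss_gate): ported to formulas/stop_loss_gate_v1.py, and a real parity test
+#   (tests/parity/) that runs the GAS original directly in Node for comparison was built and PASSes.
+#   The same methodology can be applied to the remaining unstarted items (F02~F06/F07/F10/F15).
+# - F12/F13: deeper investigation confirmed that GAS and Python (calc_distribution_detector_per_ticker)
+#   were independent formulas already registered in the spec under distinct formula_ids
+#   (DISTRIBUTION_RISK_SCORE_V1 vs DISTRIBUTION_SELL_DETECTOR_V1); the premise of a
+#   "deletable duplicate" was itself wrong. User decision: keep both, separate roles. Closed (DONE)
+#   by correcting the wrong "delegated to Python" comment in GAS and adding cross-references to both formula_registry files.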

 # Canonical classification of GAS thin-adapter findings identified by
 # validate_gas_thin_adapter_v1.py. Each finding is classified by what type
@@ -132,16 +142,23 @@ findings:
    classification: decision_logic
    migration_action: MIGRATE_STOP_BREACH_DECISION
    target_file: formulas/stop_loss_gate_v1.py
-    status: TODO
+    status: DONE
+    resolved_2026_06_22: >
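+      Ported to formulas/stop_loss_gate_v1.py:classify_order_type(). Following the ledger's
+      "critical path" warning (must match the validate_stop_loss_policy_v1 spec), the transcription
+      was not trusted; tests/parity/test_classify_order_type_parity_v1.py was written instead.
+      On every test run it extracts the function source verbatim from the GAS original
+      (gdf_03_portfolio_gates.gs), runs it in Node, and compares it against the Python port on
+      12 cases (including the edge case where stopBreach takes precedence over BUY). If the GAS original changes, this test catches it immediately.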

  - id: F12
    file: src/gas_adapter_parts/gdf_03_portfolio_gates.gs
    line: 2128
    text: "[\"distribution_risk_score\"]: Math.min(100, Math.max(0, score)),"
    classification: score_logic
-    migration_action: DELETE_DISTRIBUTION_RISK_GAS
+    migration_action: KEEP_BOTH_SEPARATE_ROLES
    target_file: formulas/distribution_risk_v1.py
-    status: TODO
+    status: DONE
    notes: Python canonical (build_distribution_risk_v1.py) already exists; GAS version is duplicate
    reviewed_2026_06_21: >
      The original citation ("build_distribution_risk_v1.py") refers to a file that does not exist; in reality
@@ -151,16 +168,44 @@ findings:
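      does not exist (an exhaustive search of tests/parity and tests/regression found 0 hits). Because the
      "verify parity before delete" condition is not met, the GAS deletion is deferred; a dedicated
      parity test must be written first (WBS-7.3 follow-up sprint).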
+    reviewed_2026_06_22: >
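+      A closer look showed that the premise of the migration_action (DELETE) was itself wrong.
+      Directly above calcDistributionRiskRow_ (gdf_03:2069) there was a "THIN_ADAPTER: delegated to
+      Python" comment pointing at src/quant_engine/inject_computed_harness.py:calc_distribution_detector_per_ticker,
+      so that function was actually opened and read. GAS computes distribution_risk_score +
+      anti_distribution_state (BLOCK_BUY/TRIM_REVIEW/PASS) from 10 additive conditions (0~100 points)
+      covering flow, volume, candle shape, relative sector weakness, and so on, whereas Python
+      (calc_distribution_detector_per_ticker) counts 6 entirely different signals, such as RSI14,
+      the 20-day OBV slope, and a gap-down after a prior-day surge, to produce signals_count +
+      distribution_verdict (DISTRIBUTION_CONFIRMED/PRE_WARNING/CLEAR). They are two independent
+      pieces of logic with different inputs and different output schemas. Because the premise that
+      "GAS duplicates Python" is false, a parity test cannot even be defined (the two are not trying
+      to compute the same thing). This is not a problem that writing a test resolves; it is an
+      architecture decision on which of the two should be canonical, or whether to keep both with
+      separate roles. Neither side is deleted without a user decision.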
+    resolved_2026_06_22: >
+      User decision: "keep both for now and separate the roles". In fact the two formulas were
+      already registered in the spec under distinct formula_ids: GAS=DISTRIBUTION_RISK_SCORE_V1
+      (spec/13b_harness_formulas.yaml:365, the BUY/STAGED_BUY/ADD_ON blocking score) and
+      Python calc_distribution_detector_per_ticker=DISTRIBUTION_SELL_DETECTOR_V1
+      (spec/13_formula_registry.yaml:2758, a 6-signal detector that improves the precision of the
+      2-signal PRE_DISTRIBUTION_EARLY_WARNING, applied inside _addTickerGates_ right after FLOW_ACCELERATION_V1).
+      The only cause of the confusion was the wrong "THIN_ADAPTER: delegated to Python" comment in the
+      GAS source. It was corrected (gdf_03_portfolio_gates.gs:2070), and mutual related_formula
+      references were added to both formula_registry entries to prevent the same misunderstanding.
+      migration_action changed from DELETE to KEEP_BOTH_SEPARATE_ROLES, status DONE (no further work
+      needed: the code was already correctly separated, and only the documentation was corrected).

  - id: F13
    file: src/gas_adapter_parts/gdf_03_portfolio_gates.gs
    line: 2132
    text: "formula_id: 'DISTRIBUTION_RISK_SCORE_V1'"
    classification: pure_mapping
-    migration_action: DELETE_DISTRIBUTION_RISK_GAS
-    status: TODO
+    migration_action: KEEP_BOTH_SEPARATE_ROLES
+    status: DONE
    notes: formula_id tag stays with Python canonical; remove from GAS
    reviewed_2026_06_21: "Deferred for the same reason as F12: a parity test is required first."
+    reviewed_2026_06_22: "Same as F12: the migration_action premise itself was wrong (divergent implementation, not a deletion target). Architecture decision pending."
+    resolved_2026_06_22: "Same as F12: closed as KEEP_BOTH_SEPARATE_ROLES per the user decision (keep both, separate roles). The formula_id='DISTRIBUTION_RISK_SCORE_V1' tag is kept as-is (it is already a correct, unique ID)."

  - id: F14
    file: src/gas_adapter_parts/gdf_03_portfolio_gates.gs
@@ -2762,6 +2762,11 @@ formula_registry:
        Detects the distribution (dumping) phase early by summing 6 signals.

        '
+      related_formula: >
+        An independent formula, separate from spec/13b_harness_formulas.yaml:DISTRIBUTION_RISK_SCORE_V1
+        (GAS calcDistributionRiskRow_, the BUY/STAGED_BUY/ADD_ON blocking score). Role separation
+        confirmed 2026-06-22 (governance/gas_logic_migration_ledger_v1.yaml F12/F13). Neither
+        replaces the other, and both are kept.
      applicable: Inside _addTickerGates_, immediately after FLOW_ACCELERATION_V1.
      inputs:
      - field: close
@@ -366,6 +366,12 @@ formula_registry:
      purpose: >
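        Combines smart-money exit while price holds or rises, slowing traded value, upper wicks, low flow_credit,
        and relative weakness versus the sector to score distribution (dumping) risk on a 0~100 scale.
+      related_formula: >
+        An independent formula, separate from spec/13_formula_registry.yaml:DISTRIBUTION_SELL_DETECTOR_V1
+        (role separation confirmed 2026-06-22, governance/gas_logic_migration_ledger_v1.yaml
+        F12/F13). This formula (a score, the BUY/STAGED_BUY/ADD_ON blocking gate) and SELL_DETECTOR
+        (a 6-signal count that improves PRE_DISTRIBUTION_EARLY_WARNING precision) differ in inputs,
+        outputs, and purpose, and neither duplicates the other. Both are kept.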
      inputs: []
      input_groups:
        required:
@@ -2067,7 +2067,15 @@ function findOrderBlueprintRow_(orders, ticker) {
 }

 function calcDistributionRiskRow_(h, df, kospiRet5d, sectorFlowData) {
-  // THIN_ADAPTER: [risk_score] delegated to Python — src/quant_engine/inject_computed_harness.py:calc_distribution_detector_per_ticker
+  // [Corrected 2026-06-22] The previous comment ("THIN_ADAPTER: delegated to Python",
+  // pointing at inject_computed_harness.py:calc_distribution_detector_per_ticker) was wrong.
+  // This function (formula_id=DISTRIBUTION_RISK_SCORE_V1, spec/13b_harness_formulas.yaml:365,
+  // the absolute BUY/STAGED_BUY/ADD_ON blocking gate) and Python calc_distribution_detector_per_ticker
+  // (formula_id=DISTRIBUTION_SELL_DETECTOR_V1, spec/13_formula_registry.yaml:2758,
+  // a 6-signal detector that improves the precision of the 2-signal PRE_DISTRIBUTION_EARLY_WARNING)
+  // are independent formulas with different inputs, outputs, and purposes; neither is a GAS duplicate of the other.
+  // Both are kept, with separate roles (governance/gas_logic_migration_ledger_v1.yaml F12/F13,
+  // user decision 2026-06-22). Do not delete this function.
  var close = df.close || h.close || 0;
  var ma20 = df.ma20 || 0;
  var high = df.high || close;
@@ -99,12 +99,59 @@ def _find_first_value(payload: Any, keys: tuple[str, ...]) -> Any:
    return None


+def _avg(values: list[float]) -> float | None:
+    return round(sum(values) / len(values), 4) if values else None
+
+
+def _compute_ma(rows: list[dict[str, Any]], n: int) -> float | None:
+    """rows[0] is the most recent trading day. Simple moving average of the last n closes."""
+    closes = [r["close"] for r in rows[:n] if r.get("close")]
+    return _avg(closes) if len(closes) == n else None
+
+
+def _compute_ret_pct(rows: list[dict[str, Any]], n: int) -> float | None:
+    """Return (%) of the latest close versus the close n trading days earlier."""
+    closes = [r["close"] for r in rows if r.get("close")]
+    if len(closes) <= n or not closes[n]:
+        return None
+    return round((closes[0] / closes[n] - 1.0) * 100.0, 4)
+
+
+def _compute_atr20(rows: list[dict[str, Any]]) -> float | None:
+    """20-trading-day average of True Range = max(high-low, |high-prevClose|, |low-prevClose|).
+    rows[0] is the most recent row, so the previous close for rows[i] is rows[i+1]['close']."""
+    trs: list[float] = []
+    for i in range(min(20, len(rows) - 1)):
+        cur, prev = rows[i], rows[i + 1]
+        high, low, prev_close = cur.get("high"), cur.get("low"), prev.get("close")
+        if high is None or low is None or prev_close is None:
+            continue
+        trs.append(max(high - low, abs(high - prev_close), abs(low - prev_close)))
+    return _avg(trs) if len(trs) == 20 else None
+
+
+def _aggregate_flow(rows: list[dict[str, Any]], n: int) -> tuple[float | None, float | None]:
+    """Sum of foreign/institutional net purchases (in shares) over the last n trading days of frgn.naver rows (newest first)."""
+    window = rows[:n]
+    if len(window) < n:
+        return None, None
+    frg = sum(r.get("frgn_net") or 0 for r in window)
+    inst = sum(r.get("inst_net") or 0 for r in window)
+    return round(frg, 4), round(inst, 4)
+
+
 def _normalize_naver_price_history(code: str) -> dict[str, Any]:
+    """Mapping to raw data_feed columns (data_feed column name in parentheses):
+    close(Close)/open(Open)/high(High)/low(Low)/prev_close(PrevClose)/volume(Volume)/
+    avg_volume_5d(AvgVolume_5D)/ma20(MA20)/ma60(MA60)/ret5d~ret60d(Ret5D~Ret60D)/
+    atr20(ATR20)/frg_5d·inst_5d(Frg_5D·Inst_5D)/frg_20d·inst_20d(Frg_20D·Inst_20D)/
+    flow_rows(Flow_Rows)/flow_ok(Flow_OK, P5 rule: Flow_Rows>=20).
+    """
    if naver_session is None or fetch_price_history is None:
        return {"status": "DISABLED"}
    try:
        session = naver_session()
-        price = fetch_price_history(session, code)
+        # MA60 needs 60 closes and Ret60D needs 61; at 10 rows per page, fetch 7 pages (70 rows).
+        price = fetch_price_history(session, code, pages=7)
        result: dict[str, Any] = {"status": price.get("status", "UNKNOWN"), "source_url": price.get("source_url")}
        rows = price.get("rows") or []
        if rows:
@@ -113,13 +160,29 @@ def _normalize_naver_price_history(code: str) -> dict[str, Any]:
            result["high"] = rows[0].get("high")
            result["low"] = rows[0].get("low")
            result["volume"] = rows[0].get("volume")
+            if len(rows) > 1:
+                result["prev_close"] = rows[1].get("close")
+            result["avg_volume_5d"] = _avg([r["volume"] for r in rows[:5] if r.get("volume")]) if len(rows) >= 5 else None
+            result["ma20"] = _compute_ma(rows, 20)
+            result["ma60"] = _compute_ma(rows, 60)
+            result["ret5d"] = _compute_ret_pct(rows, 5)
+            result["ret10d"] = _compute_ret_pct(rows, 10)
+            result["ret20d"] = _compute_ret_pct(rows, 20)
+            result["ret60d"] = _compute_ret_pct(rows, 60)
+            result["atr20"] = _compute_atr20(rows)
        if compute_relative_return_20d is not None:
            benchmark = fetch_price_history(session, "069500")
            result["relative_return_20d"] = compute_relative_return_20d(rows, benchmark.get("rows", []))
        if compute_volume_ratio_5d is not None:
            result["volume_ratio_5d"] = compute_volume_ratio_5d(rows)
        if fetch_foreign_institution_flow is not None:
-            result["foreign_institution_flow"] = fetch_foreign_institution_flow(session, code)
+            flow = fetch_foreign_institution_flow(session, code)
+            result["foreign_institution_flow"] = flow
+            flow_rows = flow.get("rows") or []
+            result["flow_rows"] = len(flow_rows)
+            result["flow_ok"] = len(flow_rows) >= 20  # P5: Flow_Rows < 20 → no A-grade/immediate buy
+            result["frg_5d"], result["inst_5d"] = _aggregate_flow(flow_rows, 5)
+            result["frg_20d"], result["inst_20d"] = _aggregate_flow(flow_rows, 20)
        return result
    except Exception as exc:  # noqa: BLE001 - fallback source must not break the batch
        return {"status": "ERROR", "error": str(exc)}
@@ -222,8 +285,17 @@ def _collect_one(row: dict[str, Any], *, kis_account: str, include_naver: bool,
        naver = _normalize_naver_price_history(ticker)
        provenance["naver"] = naver
        if naver.get("status") in {"OK", "DATA_MISSING"}:
-            normalized.setdefault("relative_return_20d", naver.get("relative_return_20d"))
-            normalized.setdefault("volume_ratio_5d", naver.get("volume_ratio_5d"))
+            # Fields KIS has already filled (close/open/high/low/volume, etc.) are preserved by setdefault,
+            # and derived fields only Naver provides (moving averages/returns/ATR/5D·20D flow) are filled in as-is.
+            naver_promotable = (
+                "close", "open", "high", "low", "volume", "prev_close", "avg_volume_5d",
+                "ma20", "ma60", "ret5d", "ret10d", "ret20d", "ret60d", "atr20",
+                "relative_return_20d", "volume_ratio_5d",
+                "frg_5d", "inst_5d", "frg_20d", "inst_20d", "flow_rows", "flow_ok",
+            )
+            for key in naver_promotable:
+                if key in naver:
+                    normalized.setdefault(key, naver.get(key))
            normalized.setdefault("naver_price_status", naver.get("status"))
            provenance["source_priority"].append("naver_finance")
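Reviewer sketch (not part of this commit): the promotion loop above relies on `dict.setdefault`, which fills a key only when it is absent. A key that an earlier source set to `None` therefore stays `None` even when Naver has a value. The snippet below illustrates that behavior with made-up numbers; `promote` and both sample dicts are hypothetical, and whether KIS ever emits `None` placeholders is not visible in this diff.

```python
def promote(normalized, naver, keys):
    """Copy `keys` from naver into normalized without overwriting existing keys."""
    for key in keys:
        if key in naver:
            normalized.setdefault(key, naver.get(key))
    return normalized


# Illustrative values only; they are not taken from any real feed.
kis = {"close": 70500, "volume": None}
naver = {"close": 70400, "volume": 1234567, "ma20": 69800.0}

merged = promote(dict(kis), naver, ("close", "volume", "ma20", "atr20"))
print(merged)
```

Here `close` keeps the KIS value, `ma20` is filled from Naver, `atr20` is skipped because Naver did not supply it, and `volume` remains `None` because the key already existed. If filling such placeholders is desired, the condition would need to test `normalized.get(key) is None` instead.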

@@ -0,0 +1,193 @@
+"""yfinance-based macro index collector: a Python/SQLite replacement for the GAS fetchYahooOhlcMetrics family.
+
+Second track of the user request (2026-06-22): "shouldn't Python collect this and serve it
+from SQLite instead of GAS?". Following data_feed (kis_data_collection_v1.py), this collects the
+13 raw-data symbols of the data.macro sheet in GatherTradingData.json (KOSPI/KOSDAQ/VIX/USD_KRW/USD_JPY/DXY/Gold/WTI_Oil/
+US10Y_Yield/US30Y_Yield/SP500/NASDAQ100/HYG_HY_Bond).
+
+The other 9 rows of the macro sheet (MRS_COMPUTED/REGIME_PRELIM/BAYESIAN_COMPUTED/TOTAL_HEAT/
+FC_BUDGET/NET_RETURN_FEEDBACK/ORBIT_GAP/ORBIT_STATE/BUCKET_STATUS, category="Computed") are
+outputs of portfolio decision logic, not targets for external collection, and are out of scope for this module
+(the same GAS decision-logic migration track as the SS001/AC/RW family in data_feed; see WBS-7.3).
+"""
+from __future__ import annotations
+
+import datetime as dt
+import sys
+import uuid
+from pathlib import Path
+from typing import Any
+
+ROOT = Path(__file__).resolve().parents[2]
+if str(ROOT) not in sys.path:
+    sys.path.insert(0, str(ROOT))
+
+try:
+    import yfinance as yf  # type: ignore
+except Exception:  # pragma: no cover - optional dependency
+    yf = None
+
+from src.quant_engine.data_collection_store_v1 import (
+    CollectionRun,
+    append_collection_error,
+    upsert_collection_run,
+    upsert_collection_snapshot,
+)
+
+# The 13 raw collection symbols of data.macro in GatherTradingData.json, as (Symbol, Name, Category) tuples.
+# The 9 "Computed" category rows (MRS_COMPUTED, etc.) are deliberately excluded.
+MACRO_SYMBOLS: tuple[tuple[str, str, str], ...] = (
+    ("^KS11", "KOSPI", "Index"),
+    ("^KQ11", "KOSDAQ", "Index"),
+    ("^VIX", "VIX", "Risk"),
+    ("KRW=X", "USD_KRW", "FX"),
+    ("JPY=X", "USD_JPY", "FX"),
+    ("DX-Y.NYB", "DXY", "FX"),
+    ("GC=F", "Gold", "Commodity"),
+    ("CL=F", "WTI_Oil", "Commodity"),
+    ("^TNX", "US10Y_Yield", "Bond"),
+    ("^TYX", "US30Y_Yield", "Bond"),
+    ("^GSPC", "SP500", "Index"),
+    ("^NDX", "NASDAQ100", "Index"),
+    ("HYG", "HYG_HY_Bond", "CreditProxy"),
+)
+
+
+def _kst_now_iso() -> str:
+    return dt.datetime.now(dt.timezone(dt.timedelta(hours=9))).isoformat()
+
+
+def _avg(values: list[float]) -> float | None:
+    return round(sum(values) / len(values), 4) if values else None
+
+
+def _ret_pct(closes: list[float], n: int) -> float | None:
+    """closes[0] is the latest. Return (%) versus the close n trading days earlier."""
+    if len(closes) <= n or not closes[n]:
+        return None
+    return round((closes[0] / closes[n] - 1.0) * 100.0, 4)
+
+
+def fetch_macro_symbol(symbol: str, name: str, category: str) -> dict[str, Any]:
+    """Fetch OHLC history from yfinance and derive the macro sheet columns (Close/Ret1D~20D/MA20/MA60)."""
+    if yf is None:
+        return {"status": "DISABLED", "symbol": symbol, "name": name, "category": category}
+    try:
+        ticker = yf.Ticker(symbol)
+        hist = ticker.history(period="4mo")  # ~85 trading days, enough for MA60/Ret20D
+        if hist is None or hist.empty:
+            return {"status": "DATA_MISSING", "symbol": symbol, "name": name, "category": category}
+        closes = list(hist["Close"].iloc[::-1])  # reverse to newest-first order (closes[0] = latest)
+        as_of = hist.index[-1]
+        result: dict[str, Any] = {
+            "status": "OK",
+            "symbol": symbol,
+            "name": name,
+            "category": category,
+            "close": round(float(closes[0]), 4),
+            "ret1d": _ret_pct(closes, 1),
+            "ret2d": _ret_pct(closes, 2),
+            "ret5d": _ret_pct(closes, 5),
+            "ret10d": _ret_pct(closes, 10),
+            "ret20d": _ret_pct(closes, 20),
+            "ma20": _avg(closes[:20]) if len(closes) >= 20 else None,
+            "ma60": _avg(closes[:60]) if len(closes) >= 60 else None,
+            "as_of_date": as_of.strftime("%Y-%m-%dT%H:%M:%S"),
+        }
+        return result
+    except Exception as exc:  # noqa: BLE001 - per-symbol failure must not break the batch
+        return {"status": "ERROR", "symbol": symbol, "name": name, "category": category, "error": str(exc)}
+
+
+def collect_macro_to_sqlite(*, sqlite_db: Path, symbols: tuple[tuple[str, str, str], ...] = MACRO_SYMBOLS) -> dict[str, Any]:
+    run_id = uuid.uuid4().hex
+    started_at = _kst_now_iso()
+    upsert_collection_run(
+        sqlite_db,
+        CollectionRun(
+            run_id=run_id,
+            collector_name="macro_index_collection_v1",
+            started_at=started_at,
+            status="RUNNING",
+            input_source="yfinance",
+            output_db_path=str(sqlite_db),
+            notes="raw collection for the macro sheet (replaces GAS fetchYahooOhlcMetrics)",
+        ),
+    )
+
+    summary: dict[str, Any] = {
+        "formula_id": "MACRO_INDEX_COLLECTION_V1",
+        "run_id": run_id,
+        "started_at": started_at,
+        "sqlite_db": str(sqlite_db),
+        "row_count": len(symbols),
+        "errors": [],
+        "rows": [],
+    }
+
+    for symbol, name, category in symbols:
+        result = fetch_macro_symbol(symbol, name, category)
+        if result.get("status") in ("OK", "DATA_MISSING"):
+            upsert_collection_snapshot(
+                sqlite_db,
+                run_id=run_id,
+                dataset_name="macro",
+                ticker=symbol,
+                name=name,
+                sector=category,
+                as_of_date=result.get("as_of_date"),
+                source_priority="yfinance",
+                source_status=result.get("status", "UNKNOWN"),
+                payload=result,
+                provenance={"source": "yfinance", "symbol": symbol},
+            )
+            summary["rows"].append({"symbol": symbol, "name": name, "close": result.get("close"), "status": result.get("status")})
+        else:
+            error = {"symbol": symbol, "error": result.get("error", "unknown")}
+            summary["errors"].append(error)
+            append_collection_error(
+                sqlite_db,
+                run_id=run_id,
+                source_name="yfinance",
+                error_kind=result.get("status", "ERROR"),
+                error_message=str(result.get("error", "")),
+                ticker=symbol,
+                payload=result,
+            )
+
+    summary["finished_at"] = _kst_now_iso()
+    summary["status"] = "PASS" if not summary["errors"] else "PASS_WITH_WARNINGS"
+    upsert_collection_run(
+        sqlite_db,
+        CollectionRun(
+            run_id=run_id,
+            collector_name="macro_index_collection_v1",
+            started_at=started_at,
+            status=summary["status"],
+            input_source="yfinance",
+            output_db_path=str(sqlite_db),
+            notes="raw collection for the macro sheet (replaces GAS fetchYahooOhlcMetrics)",
+        ),
+        finished_at=summary["finished_at"],
+    )
+    return summary
+
+
+def main() -> int:
+    import argparse
+    import json
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--sqlite-db", type=Path, default=ROOT / "outputs" / "macro_index_collection" / "macro_index_collection.db")
+    parser.add_argument("--output-json", type=Path, default=ROOT / "Temp" / "macro_index_collection_v1.json")
+    args = parser.parse_args()
+
+    summary = collect_macro_to_sqlite(sqlite_db=args.sqlite_db)
+    args.output_json.parent.mkdir(parents=True, exist_ok=True)
+    args.output_json.write_text(json.dumps(summary, ensure_ascii=False, indent=2), encoding="utf-8")
+    print(json.dumps(summary, ensure_ascii=False, indent=2))
+    return 0 if summary["status"] in ("PASS", "PASS_WITH_WARNINGS") else 1
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
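Reviewer sketch (not part of this commit): `_ret_pct` guards against a missing or zero base with `not closes[n]`, but a NaN close is truthy and would pass that guard. Whether yfinance actually returns NaN closes for these 13 symbols is an assumption here, not something this diff shows. The snippet below re-implements the same arithmetic under the hypothetical name `ret_pct` and adds a hypothetical `drop_nan` pre-filter to show the difference.

```python
import math


def ret_pct(closes, n):
    """Same arithmetic as _ret_pct: closes[0] is the latest close."""
    if len(closes) <= n or not closes[n]:
        return None
    return round((closes[0] / closes[n] - 1.0) * 100.0, 4)


def drop_nan(closes):
    """Hypothetical pre-filter: keep only finite closes, preserving order."""
    return [c for c in closes if c is not None and math.isfinite(c)]


closes = [110.0, float("nan"), 100.0]   # made-up series with one gap
raw = ret_pct(closes, 1)                # NaN base slips past the guard
clean = ret_pct(drop_nan(closes), 1)    # base becomes the next finite close
print(raw, clean)
```

Dropping NaN rows shifts the lookback (the "1 day ago" base becomes the next valid close), so returning `None` for a NaN base may be the safer choice depending on how downstream consumers treat gaps.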
@@ -0,0 +1,83 @@
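+"""Unit tests for the derivation functions of raw data_feed columns (MA/Ret/ATR/5D·20D flow).
+
+User request (2026-06-22): "shouldn't what gets loaded from json be collected into sqlite
+by Python code in the first place?" Phase 1 of moving part of the raw data_feed data that GAS
+used to compute into Python (kis_data_collection_v1). Verifies pure computation logic only, without using the network.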
+"""
+from __future__ import annotations
+
+import sys
+from pathlib import Path
+
+ROOT = Path(__file__).resolve().parents[2]
+if str(ROOT) not in sys.path:
+    sys.path.insert(0, str(ROOT))
+
+from src.quant_engine.kis_data_collection_v1 import (
+    _aggregate_flow,
+    _compute_atr20,
+    _compute_ma,
+    _compute_ret_pct,
+)
+
+
+def _price_rows(closes: list[float], highs: list[float] | None = None, lows: list[float] | None = None) -> list[dict]:
+    """closes[0] is the most recent trading day. If high/low are omitted they default to close (for the ATR=0 test case)."""
+    highs = highs or closes
+    lows = lows or closes
+    return [{"close": c, "high": h, "low": l, "volume": 1000} for c, h, l in zip(closes, highs, lows)]
+
+
+def test_compute_ma_returns_none_when_insufficient_rows():
+    rows = _price_rows([100.0, 101.0, 102.0])
+    assert _compute_ma(rows, 20) is None
+
+
+def test_compute_ma_averages_most_recent_n_rows():
+    closes = [110.0] * 5 + [100.0] * 15
+    rows = _price_rows(closes)
+    # average of the last 5 trading days = 110; 20-day average = (5*110 + 15*100)/20 = 102.5
+    assert _compute_ma(rows, 5) == 110.0
+    assert _compute_ma(rows, 20) == 102.5
+
+
+def test_compute_ret_pct_against_n_days_ago_close():
+    closes = [110.0, 109, 108, 107, 106, 100.0]
+    rows = _price_rows(closes)
+    # latest (110) vs 5 trading days earlier (100) → (110/100 - 1) * 100 = 10%
+    assert _compute_ret_pct(rows, 5) == 10.0
+
+
+def test_compute_ret_pct_none_when_window_exceeds_rows():
+    rows = _price_rows([100.0, 99.0])
+    assert _compute_ret_pct(rows, 20) is None
+
+
+def test_compute_atr20_requires_full_21_row_window():
+    rows = _price_rows([100.0] * 20)
+    assert _compute_atr20(rows) is None  # 20 rows cannot form 20 previous-close pairs (21 rows needed)
+
+
+def test_compute_atr20_computes_true_range_average():
+    # 21 rows: high-low is always 2 and the gap to prev_close is designed to be smaller → ATR20 = 2.0
+    closes = [100.0 + i * 0.1 for i in range(21)]
+    highs = [c + 1.0 for c in closes]
+    lows = [c - 1.0 for c in closes]
+    rows = _price_rows(closes, highs, lows)
+    atr = _compute_atr20(rows)
+    assert atr is not None
+    assert abs(atr - 2.0) < 0.5
+
+
+def test_aggregate_flow_sums_recent_window():
+    rows = [{"frgn_net": 100, "inst_net": -50}] * 5 + [{"frgn_net": 1000, "inst_net": 1000}] * 15
+    frg5, inst5 = _aggregate_flow(rows, 5)
+    assert frg5 == 500
+    assert inst5 == -250
+
+
+def test_aggregate_flow_none_when_window_exceeds_rows():
+    rows = [{"frgn_net": 10, "inst_net": 10}] * 3
+    frg, inst = _aggregate_flow(rows, 20)
+    assert frg is None
+    assert inst is None
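Reviewer sketch (not part of this commit): the ATR test above only exercises the case where `high - low` dominates. The worked example below shows the other branch of True Range, where a gap from the previous close is the largest term. `true_range` and `average_true_range` are hypothetical stand-ins that follow the docstring formula of `_compute_atr20` with a configurable window; unlike the original they do not skip rows with missing fields.

```python
def true_range(high, low, prev_close):
    """True Range = max(high-low, |high-prevClose|, |low-prevClose|)."""
    return max(high - low, abs(high - prev_close), abs(low - prev_close))


def average_true_range(rows, n):
    """rows[0] is newest; rows[i + 1] supplies the previous close, so n + 1 rows are needed."""
    if len(rows) < n + 1:
        return None
    trs = [
        true_range(rows[i]["high"], rows[i]["low"], rows[i + 1]["close"])
        for i in range(n)
    ]
    return round(sum(trs) / n, 4)


# Made-up prices, newest first.
rows = [
    {"high": 105.0, "low": 103.0, "close": 104.0},  # gap up from 100: TR = max(2, 5, 3) = 5
    {"high": 101.0, "low": 99.0, "close": 100.0},   # quiet day vs prev 100: TR = max(2, 1, 1) = 2
    {"high": 100.5, "low": 99.5, "close": 100.0},   # oldest row, only supplies prev_close
]

atr2 = average_true_range(rows, 2)
print(atr2)
```

With two True Range values of 5 and 2 the average is 3.5, and asking for a 3-day window on these 3 rows returns `None`, mirroring the "21 rows for ATR20" rule the existing test checks.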
@@ -0,0 +1,37 @@
+"""Unit tests for the macro index derivations (ret_pct/avg); no network access."""
+from __future__ import annotations
+
+import sys
+from pathlib import Path
+
+ROOT = Path(__file__).resolve().parents[2]
+if str(ROOT) not in sys.path:
+    sys.path.insert(0, str(ROOT))
+
+from src.quant_engine.macro_index_collection_v1 import MACRO_SYMBOLS, _avg, _ret_pct
+
+
+def test_macro_symbols_cover_thirteen_raw_instruments():
+    assert len(MACRO_SYMBOLS) == 13
+    symbols = {s for s, _, _ in MACRO_SYMBOLS}
+    assert "^KS11" in symbols  # KOSPI
+    assert "HYG" in symbols
+    # The "Computed" category (MRS_COMPUTED, etc.) is deliberately not included.
+    assert "MRS_COMPUTED" not in symbols
+
+
+def test_ret_pct_against_n_days_ago():
+    closes = [110.0, 108, 107, 106, 105, 100.0]
+    assert _ret_pct(closes, 5) == 10.0
+
+
+def test_ret_pct_none_when_window_exceeds_length():
+    assert _ret_pct([100.0, 99.0], 20) is None
+
+
+def test_avg_returns_none_for_empty_list():
+    assert _avg([]) is None
+
+
+def test_avg_computes_mean():
+    assert _avg([10.0, 20.0, 30.0]) == 20.0