initiated policy improvement

govin08 · govin08 · commit bc757a146ba1 · 2025-10-13T23:28:37.000+09:00
diff --git a/_posts/2025-09-18-policy_evaluation.md b/_posts/2025-09-18-policy_evaluation.md
@@ -2,7 +2,7 @@
 layout: single
 title: "(Sutton, Section 4.1) Policy Evaluation"
 categories: machine-learning
-tags: [reinforcement learing, Bellman operator, contraction principle, operator norm]
+tags: [reinforcement learing, dynamic programming, Bellman operator, contraction principle, operator norm]
 use_math: true
 published: true
 author_profile: false
diff --git a/_posts/2025-10-13-policy_improvement.md b/_posts/2025-10-13-policy_improvement.md
@@ -0,0 +1,23 @@
+---
+layout: single
+title: "(Sutton, Section 4.2) Policy Improvement"
+categories: machine-learning
+tags: [reinforcement learing, dynamic programming, policy improvement theorem, policy iteration]
+use_math: true
+published: true
+author_profile: false
+toc: true
+---
+
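+I want to finish the dynamic programming posts before any more time passes.
+I would like to study other topics and write about them too, but I don't want to start on anything else while DP is still unfinished.
+So I am writing this post out of a sense of obligation, more or less.
+For a while now I have wanted to study PCA and PLS, and not long ago I was even thinking of getting into game theory or control theory; today a need to learn MPC and LQR came up as well.
+So let's get through DP quickly and clear it off the list.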
+
+And in fact, I think I am ready to write this.
+After writing [the previous post](https://govin08.github.io/machine-learning/policy_evaluation/), I looked over Section 4.2 from time to time and had come to understand it reasonably well.
+
+
+## 4.2 Policy Improvement
+