CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks
Authors:
Hao Wang, Licheng Pan, Zhichao Chen, Chunyuan Zhen...
Journal:
-
Year:
2026
Category: