登录 注册

Identifying Latent Actions and Dynamics from Offline 数据 (Data) via Demonstrator Diversity
Identifying Latent Actions and Dynamics from Offline Data via Demonstrator Diversity

🔗 访问原文
🔗 Access Paper

📝 摘要
Abstract

Can latent actions and environment dynamics be recovered from offline trajectories when actions are never observed? We study this question in a setting where trajectories are action-free but tagged with demonstrator identity. We assume that each demonstrator follows a distinct policy, while the environment dynamics are shared across demonstrators and identity affects the next observation only through the chosen action. Under these assumptions, the conditional next-observation distribution $p(o_{t+1}\mid o_t,e)$ is a mixture of latent action-conditioned transition kernels with demonstrator-specific mixing weights. We show that this induces, for each state, a column-stochastic nonnegative matrix factorization of the observable conditional distribution. Using sufficiently scattered policy diversity and rank conditions, we prove that the latent transitions and demonstrator policies are identifiable up to permutation of the latent action labels. We extend the result to continuous observation spaces via a Gram-determinant minimum-volume criterion, and show that continuity of the transition map over a connected state space upgrades local permutation ambiguities to a single global permutation. A small amount of labeled action data then suffices to fix this final ambiguity. These results establish demonstrator diversity as a principled source of identifiability for learning latent actions and dynamics from offline RL data.

📊 文章统计
Article Statistics

基础数据
Basic Stats

247 浏览
Views
0 下载
Downloads
26 引用
Citations

引用趋势
Citation Trend

阅读国家分布
Country Distribution

阅读机构分布
Institution Distribution

月度浏览趋势
Monthly Views

相关关键词
Related Keywords

影响因子分析
Impact Analysis

6.70 综合评分
Overall Score
引用影响力
Citation Impact
浏览热度
View Popularity
下载频次
Download Frequency

📄 相关文章
Related Articles