FutureSim: Replaying World Events to Evaluate Adaptive Agents

期刊
Journal 暂无期刊信息

年份
Year 2026

分类
Category

国家
Country -

🔗 访问原文
🔗 Access Paper

📝 摘要
Abstract

AI agents are being increasingly deployed in dynamic, open-ended environments that require adapting to new information as it arrives. To efficiently measure this capability for realistic use-cases, we propose building grounded simulations that replay real-world events in the order they occurred. We build FutureSim, where agents forecast world events beyond their knowledge cutoff while interacting with a chronological replay of the world: real news articles arriving and questions resolving over the simulated period. We evaluate frontier agents in their native harness, testing their ability to predict world events over a three-month period from January to March 2026. FutureSim reveals a clear separation in their capabilities, with the best agent's accuracy being 25%, and many having worse Brier skill score than making no prediction at all. Through careful ablations, we show how FutureSim offers a realistic setting to study emerging research directions like long-horizon test-time adaptation, search, memory, and reasoning about uncertainty. Overall, we hope our benchmark design paves the way to measure AI progress on open-ended adaptation spanning long time-horizons in the real world.

📊 文章统计
Article Statistics

基础数据
Basic Stats

52 浏览
Views

0 下载
Downloads

13 引用
Citations

引用趋势
Citation Trend

阅读国家分布
Country Distribution

阅读机构分布
Institution Distribution

月度浏览趋势
Monthly Views

影响因子分析
Impact Analysis

4.70 综合评分
Overall Score

引用影响力
Citation Impact

浏览热度
View Popularity

下载频次
Download Frequency

FutureSim: Replaying World Events to Evaluate Adaptive Agents

📝 摘要
Abstract

📊 文章统计
Article Statistics

基础数据
Basic Stats

引用趋势
Citation Trend

阅读国家分布
Country Distribution

阅读机构分布
Institution Distribution

月度浏览趋势
Monthly Views

相关关键词
Related Keywords

影响因子分析
Impact Analysis

📄 相关文章
Related Articles

FutureSim: Replaying World Events to Evaluate Adaptive Agents

📝 摘要Abstract

📊 文章统计Article Statistics

基础数据Basic Stats

引用趋势Citation Trend

阅读国家分布Country Distribution

阅读机构分布Institution Distribution

月度浏览趋势Monthly Views

相关关键词Related Keywords

影响因子分析Impact Analysis

📄 相关文章Related Articles

📝 摘要
Abstract

📊 文章统计
Article Statistics

基础数据
Basic Stats

引用趋势
Citation Trend

阅读国家分布
Country Distribution

阅读机构分布
Institution Distribution

月度浏览趋势
Monthly Views

相关关键词
Related Keywords

影响因子分析
Impact Analysis

📄 相关文章
Related Articles