When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels
👁 189 📚 29
EMO: Pretraining Mixture of Experts for Emergent Modularity
👁 204 📚 23
Implicit Representations of Grammaticality in Language Models
👁 118 📚 3
Safety and accuracy follow different scaling laws in clinical large language models
👁 88 📚 28
FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents
👁 213 📚 20
When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models
👁 148 📚 26
On the Proper Treatment of Units in Surprisal Theory
👁 162 📚 15
Exploration Hacking: Can LLMs Learn to Resist RL Training?
👁 106 📚 7
Select to Think: Unlocking SLM Potential with Local Sufficiency
👁 75 📚 9
DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios
👁 184 📚 23
Sentiment and Emotion Classification of Indonesian E-Commerce Reviews via Multi-Task BiLSTM and Auto...
👁 115 📚 24
How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Task...
👁 125 📚 3
When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs
👁 211 📚 18
MathDuels: Evaluating LLMs as Problem Posers and Solvers
👁 118 📚 25
Evaluation of Automatic Speech Recognition Using Generative Large Language Models
👁 61 📚 12
SpeechParaling-Bench: A Comprehensive Benchmark for Paralinguistic-Aware Speech Generation
👁 74 📚 20
Discovering a Shared Logical Subspace: Steering LLM Logical Reasoning via Alignment of Natural-Langu...
👁 134 📚 4
Sessa: Selective State Space Attention
👁 171 📚 29
Learning to Reason with Insight for Informal Theorem Proving
👁 194 📚 30
CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas
👁 139 📚 7