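ICML26 Memarena Agent Memory Benchmark
Academic | haibin | 12 hours ago

One-sentence summary: This paper exposes the shortcomings of current LLM agents' memory when they handle long-horizon, complex interactive tasks, and provides a more challenging evaluation platform to drive progress in the field.

Characteristics of existing benchmarks

Large language model (LLM) agents have two complementary core capabilities: the ability to memorize task-relevant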