🤖 AI Daily News · LLM Translation

Global AI technology updates · Updated daily · Intelligent translation

📅 Wednesday, April 15, 2026
📱 Reddit ML: 10 📱 Reddit AI: 10 💻 GitHub: 5 📚 arXiv: 10 💬 HN: 1 🤗 HF: 0 Total: 36 items

📱 Reddit r/MachineLearning

10 items
1. What is the AC guidance for ICML? (Or: ICML qq thread) [D]
Reddit by /u/WhiteBear2018 🕐 2026-04-14
2. 20M+ Indian legal documents with citation graphs and vector embeddings – potential uses for legal NLP? [D]
Reddit by /u/zriyansh 🕐 2026-04-14
3. "I don't know!": Teaching neural networks to abstain with the HALO-Loss. [R]
Reddit by /u/4rtemi5 🕐 2026-04-14
4. I scaled a pure Spiking Neural Network (SNN) to 1.088B parameters from scratch. Ran out of budget, but here is what I found [R]
Reddit by /u/zemondza 🕐 2026-04-13
5. Thinking Deeper, Not Longer: Depth-Recurrent Transformers for Compositional Generalization [R]
Reddit by /u/marojejian 🕐 2026-04-13
6. [N] AMA Announcement: Max Welling (VAEs, GNNs, AI4Science & CuspAI)
Reddit by /u/Benlus 🕐 2026-04-13
7. TurboOCR: 270–1200 img/s OCR with Paddle + TensorRT (C++/CUDA, FP16) [P]
Reddit by /u/Civil-Image5411 🕐 2026-04-13
8. Which conference/journal do you believe currently has the most fair and accurate review process? [D]
Reddit by /u/kostaspap90 🕐 2026-04-13
9. [D] Self-Promotion Thread
Reddit by /u/AutoModerator 🕐 2026-04-02
10. [D] Monthly Who's Hiring and Who wants to be Hired?
Reddit by /u/AutoModerator 🕐 2026-03-31

📱 Reddit r/artificial

10 items
1. If AI gets smart enough to pass as human every time, does being human even matter anymore?
Reddit by /u/iLiveForTruth 🕐 2026-04-14
2. Nvidia unveils Ising AI models for quantum error correction and calibration
Reddit by /u/tekz 🕐 2026-04-14
3. AI may be making us think and write more alike, How many products does Microsoft have named 'Copilot'? and many other links from Hacker News
Reddit by /u/alexeestec 🕐 2026-04-14
4. "A serious threat to privacy": Meta issued warning by 75 orgs over planned facial recognition in smart glasses
Reddit by /u/Tiny-Independent273 🕐 2026-04-14
5. openclaw ai agent vs just using chatgpt
Reddit by /u/sychophantt 🕐 2026-04-14
6. MYTHOS SI Discovers New Vulnerability Class in FFmpeg Through Recursive Observation (Not Pattern Matching)
Reddit by /u/MarsR0ver_ 🕐 2026-04-14
7. Why don't LLMs track time in their conversations?
Reddit by /u/PolyViews 🕐 2026-04-14
8. I built a 24/7 YouTube stream where AI writes a new song every few minutes about what time it is
Reddit by /u/mmp7700 🕐 2026-04-13
9. Claude is on the same path as ChatGPT. I measured it.
Reddit by /u/TheArchitectAutopsy 🕐 2026-04-13
10. NYC hospitals will stop sharing patients' private health data with Palantir
Reddit by /u/Goldenmentis 🕐 2026-04-13

💻 GitHub Trending AI

5 repos
State-of-the-art ML for PyTorch, TensorFlow, JAX
GitHub ⭐ 120k+
Building applications with LLMs through composability
GitHub ⭐ 85k+
Data framework for LLM applications
GitHub ⭐ 30k+
Enabling Next-Gen LLM Applications via Multi-Agent Conversation
GitHub ⭐ 28k+
High-throughput LLM serving
GitHub ⭐ 25k+

📚 arXiv CS.AI

10 papers
Stable operation of autonomous off-grid photovoltaic systems depends on solar forecasting algorithms that respect atmospheric thermodynamics. Contemporary deep learning models consistently exhibit severe anomalies, chiefly pronounced temporal phase lag during cloud transients and physically impossible nocturnal generation. To resolve this divergence between data-driven modeling and deterministic celestial mechanics, this research introduces the Thermodynamic Liquid Manifold Network. The proposed methodology projects 15 meteorological and geometric variables into a Koopman-linearized Riemannian manifold to systematically map complex climatic dynamics. The architecture integrates a Spectral Calibration unit and a multiplicative Thermodynamic Alpha-Gate. This system synthesizes real-time atmospheric opacity with theoretical clear-sky boundary models, structurally enforcing strict celestial geometry compliance. This completely neutralizes phantom nocturnal generation while maintaining zero-lag synchronization during rapid weather shifts. Validated against a rigorous five-year testing horizon in a severe semi-arid climate, the framework achieves an RMSE of 18.31 Wh/m2 and a Pearson correlation of 0.988. The model strictly maintains a zero-magnitude nocturnal error across all 1826 testing days and exhibits a sub-30-minute phase response during high-frequency transients. Comprising exactly 63,458 trainable parameters, this ultra-lightweight design establishes a robust, thermodynamically consistent standard for edge-deployable microgrid controllers.
arXiv 👥 Mohammed Ezzaldin Babiker Abdullah
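The abstract's multiplicative alpha-gate can be pictured as clamping the network's raw forecast to an opacity-attenuated clear-sky ceiling that is exactly zero when the sun is below the horizon. The sketch below is a minimal illustration of that idea, not the paper's actual model: the clear-sky bound, the `ghi_max` constant, and all function names are hypothetical stand-ins.

```python
import math

def clear_sky_bound(hour_angle_deg, declination_deg, latitude_deg, ghi_max=1000.0):
    """Toy clear-sky upper bound (W/m^2) from solar elevation.

    A stand-in for the paper's theoretical clear-sky boundary model;
    the real model and its constants are not given in the abstract.
    """
    lat, dec, ha = (math.radians(x) for x in (latitude_deg, declination_deg, hour_angle_deg))
    sin_elev = (math.sin(lat) * math.sin(dec)
                + math.cos(lat) * math.cos(dec) * math.cos(ha))
    return ghi_max * max(0.0, sin_elev)  # exactly zero below the horizon

def alpha_gate(raw_forecast, opacity, bound):
    """Multiplicative gate: attenuate the clear-sky bound by atmospheric
    opacity (0 = fully overcast, 1 = clear) and clamp the output to it."""
    ceiling = bound * max(0.0, min(1.0, opacity))
    return min(max(0.0, raw_forecast), ceiling)

# Night (large hour angle, sun below horizon): any output is forced to zero,
# which is how the gate structurally rules out phantom nocturnal generation.
night_bound = clear_sky_bound(hour_angle_deg=150, declination_deg=10, latitude_deg=35)
print(alpha_gate(raw_forecast=80.0, opacity=0.9, bound=night_bound))  # 0.0
```

The point of the multiplicative (rather than additive) formulation is that the physical constraint holds by construction, regardless of what the learned component predicts.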
To identify safety violations, auditors typically search over large sets of agent traces. This search is difficult because failures are typically rare, complex, and sometimes even adversarially hidden and only detectable when multiple traces are analyzed together. These challenges arise in diverse settings such as misuse campaigns, covert sabotage, reward hacking, and prompt injection. Existing approaches struggle here for several reasons. Per-trace judges miss failures that only become visible across traces, naive agentic auditing does not scale to large trace collections, and fixed monitors are brittle to unanticipated behaviors. We introduce Meerkat, which combines clustering with agentic search to uncover violations specified in natural language. Through structured search and adaptive investigation of promising regions, Meerkat finds sparse failures without relying on seed scenarios, fixed workflows, or exhaustive enumeration. Across misuse, misalignment, and task gaming settings, Meerkat significantly improves detection of safety violations over baseline monitors, discovers widespread developer cheating on a top agent benchmark, and finds nearly 4x more examples of reward hacking on CyBench than previous audits.
arXiv 👥 Adam Stein, Davis Brown, Hamed Hassani et al.
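The cluster-then-investigate idea can be sketched very roughly: embed traces, group them, and spend the audit budget on small, unusual clusters first, since sparse failures tend to hide there. This is a toy stand-in for Meerkat's pipeline (the real system clusters learned trace embeddings and runs an LLM auditor over promising clusters); the bucketing scheme and all names here are illustrative assumptions.

```python
from collections import defaultdict

def cluster_and_prioritize(traces, embed, n_buckets=4):
    """Bucket traces by a 1-D embedding, then visit rare buckets first:
    a crude proxy for adaptive investigation of promising regions."""
    buckets = defaultdict(list)
    for t in traces:
        buckets[int(embed(t) * n_buckets) % n_buckets].append(t)
    # Small clusters = rare behavior, so they are audited before common ones.
    return sorted(buckets.values(), key=len)

# Eight routine traces plus one suspicious outlier.
traces = ["ok"] * 8 + ["rm -rf /tmp/logs"]
order = cluster_and_prioritize(traces, embed=lambda t: len(t) / 20)
print(order[0])  # the rare cluster surfaces first
```

A per-trace judge scoring each element independently would give the outlier no special priority; ranking whole clusters is what makes cross-trace patterns visible.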
We have witnessed remarkable advances in LLM reasoning capabilities with the advent of DeepSeek-R1. However, much of this progress has been fueled by the abundance of internet question-answer (QA) pairs, a major bottleneck going forward, since such data is limited in scale and concentrated mainly in domains like mathematics. In contrast, other sciences such as physics lack large-scale QA datasets to effectively train reasoning-capable models. In this work, we show that physics simulators can serve as a powerful alternative source of supervision for training LLMs for physical reasoning. We generate random scenes in physics engines, create synthetic question-answer pairs from simulated interactions, and train LLMs using reinforcement learning on this synthetic data. Our models exhibit zero-shot sim-to-real transfer to real-world physics benchmarks: for example, training solely on synthetic simulated data improves performance on IPhO (International Physics Olympiad) problems by 5-10 percentage points across model sizes. These results demonstrate that physics simulators can act as scalable data generators, enabling LLMs to acquire deep physical reasoning skills beyond the limitations of internet-scale QA data. Code available at: https://sim2reason.github.io/.
arXiv 👥 Mihir Prabhudesai, Aryan Satpathy, Yangmin Li et al.
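The "simulator as data generator" recipe is: sample a random scene, run the physics, and phrase the outcome as a QA pair whose answer is known exactly, so an RL reward can check it. A minimal sketch using an analytic projectile in place of a full physics engine (the paper's engines and scene format are not specified in the abstract; all names here are hypothetical):

```python
import math
import random

def simulate_projectile(v0, angle_deg, g=9.81):
    """Analytic flat-ground projectile range: a toy stand-in for the
    random scenes the paper samples in full physics engines."""
    theta = math.radians(angle_deg)
    return v0 ** 2 * math.sin(2 * theta) / g

def make_qa_pair(rng):
    """Sample a random scene, then phrase the simulated outcome as a QA
    pair; during RL training the reward would be whether the model's
    answer matches the simulator's ground truth."""
    v0 = rng.uniform(5, 30)
    angle = rng.uniform(15, 75)
    question = (f"A projectile is launched at {v0:.1f} m/s and {angle:.0f} degrees "
                f"on flat ground. How far does it travel, in meters?")
    return question, round(simulate_projectile(v0, angle), 2)

rng = random.Random(0)
question, answer = make_qa_pair(rng)
print(question)
print(answer)
```

Because the simulator supplies the answer, this pipeline sidesteps the internet-QA bottleneck: data volume is limited only by compute, not by scraped text.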
Accurate delineation of the Clinical Target Volume (CTV) is essential for radiotherapy planning, yet remains time-consuming and difficult to evaluate, especially for complex treatments such as Total Marrow and Lymph Node Irradiation (TMLI). While deep-learning-based auto-segmentation can reduce workload, safe clinical deployment requires reliable cues indicating where models may be wrong. In this work, we propose a budget-aware uncertainty-driven quality assurance (QA) framework built on nnU-Net, combining uncertainty quantification and post-hoc calibration to produce voxel-wise uncertainty maps (based on predictive entropy) that can guide targeted manual review. We compare temperature scaling (TS), deep ensembles (DE), checkpoint ensembles (CE), and test-time augmentation (TTA), evaluated both individually and in combination on TMLI as a representative use case. Reliability is assessed through ROI-masked calibration metrics and uncertainty-error alignment under realistic revision constraints, summarized as AUC over the top 0-5% most uncertain voxels. Across configurations, segmentation accuracy remains stable, whereas TS substantially improves calibration. Uncertainty-error alignment improves most with calibrated checkpoint-based inference, leading to uncertainty maps that more consistently highlight regions requiring manual edits. Overall, integrating calibration with efficient ensembling seems a promising strategy to implement a budget-aware QA workflow for radiotherapy segmentation.
arXiv 👥 Ricardo Coimbra Brioso, Lorenzo Mondo, Damiano Dei et al.
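The two ingredients named in the abstract, temperature scaling and predictive-entropy uncertainty maps, combine into a simple budget-aware flagging rule: calibrate the per-voxel distribution, compute its entropy, and send only the top few percent of voxels to manual review. A minimal sketch under assumed values (the temperature and budget below are illustrative; in practice T is fitted on a validation set):

```python
import math

def entropy(probs):
    """Predictive entropy of one voxel's class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def temperature_softmax(logits, T):
    """Post-hoc temperature scaling: divide logits by a scalar T before
    the softmax (T > 1 softens overconfident predictions)."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def flag_for_review(voxel_logits, T=1.5, top_frac=0.05):
    """Budget-aware QA: rank voxels by predictive entropy of the
    calibrated distribution and flag the top fraction for manual review
    (the 0-5% budget mirrors the abstract's evaluation range)."""
    ent = [entropy(temperature_softmax(z, T)) for z in voxel_logits]
    budget = max(1, int(len(ent) * top_frac))
    return sorted(range(len(ent)), key=lambda i: -ent[i])[:budget]

# 20 confident voxels plus one ambiguous voxel (index 20): only it is flagged.
voxels = [[8.0, 0.0, 0.0]] * 20 + [[1.0, 0.9, 0.8]]
print(flag_for_review(voxels))  # [20]
```

Calibration matters here because the ranking, and hence which voxels consume the review budget, is only trustworthy if the probabilities reflect actual error rates.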
Recently, large language models (LLMs) have become capable of generating highly fluent textual content. While they offer significant convenience to humans, they also introduce various risks, like phishing and academic dishonesty. Numerous research efforts have been dedicated to developing algorithms for detecting AI-generated text and constructing relevant datasets. However, in the field of Chinese corpora, challenges remain, including limited model diversity and data homogeneity. To address these issues, we propose C-ReD: a comprehensive Chinese Real-Prompt AI-generated Detection benchmark. Experiments demonstrate that C-ReD not only enables reliable in-domain detection but also supports strong generalization to unseen LLMs and external Chinese datasets, addressing critical gaps in model diversity, domain coverage, and prompt realism that have limited prior Chinese detection benchmarks. We release our resources at https://github.com/HeraldofLight/C-ReD.
arXiv 👥 Chenxi Qing, Junxi Wu, Zheng Liu et al.
Reasoning has become a central capability in large language models. Recent research has shown that reasoning performance can be improved by looping an LLM's layers in the latent dimension, resulting in looped reasoning language models. Despite promising results, few works have investigated how their internal dynamics differ from those of standard feedforward models. In this paper, we conduct a mechanistic analysis of the latent states in looped language models, focusing in particular on how the stages of reasoning observed in feedforward models compare to those observed in looped ones. To this end, we analyze cyclic recurrence and show that for many of the studied models each layer in the cycle converges to a distinct fixed point; consequently, the recurrent block follows a consistent cyclic trajectory in the latent space. We provide evidence that as these fixed points are reached, attention-head behavior stabilizes, leading to constant behavior across recurrences. Empirically, we discover that recurrent blocks learn stages of reasoning that closely mirror those of feedforward models, repeating these stages in depth with each iteration. We study how recurrent block size, input injection, and normalization influence the emergence and stability of these cyclic fixed points. We believe these findings help translate mechanistic insights into practical guidance for architectural design.
arXiv 👥 Hugh Blayney, Álvaro Arroyo, Johan Obando-Ceron et al.
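The fixed-point claim can be illustrated with the simplest possible "recurrent block": a contractive map, iterated until successive latent states stop moving. This is only a 1-D analogy for looping a transformer block in latent space, under the assumption (made here, not in the paper) that the block acts as a contraction; all names and constants are illustrative.

```python
def recurrent_block(h, w=0.5, b=1.0):
    """Toy 1-D recurrent block: a contraction h -> w*h + b with |w| < 1,
    standing in for one pass of a looped transformer block."""
    return w * h + b

def iterate_to_fixed_point(h0, steps=50, tol=1e-9):
    """Loop the block and report when successive latents stop moving,
    mirroring the paper's observation that the latent state converges
    to a fixed point as recurrences accumulate."""
    h = h0
    for step in range(steps):
        nxt = recurrent_block(h)
        if abs(nxt - h) < tol:
            return nxt, step
        h = nxt
    return h, steps

fixed, steps_taken = iterate_to_fixed_point(0.0)
print(fixed)  # ~2.0, the unique fixed point of h = 0.5*h + 1
```

Once the iterate is at the fixed point, every further recurrence is an identity in effect, which is the 1-D analogue of attention-head behavior stabilizing across loops.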
Tool-augmented large language model (LLM) agents have demonstrated impressive capabilities in automating complex, multi-step real-world tasks, yet remain vulnerable to indirect prompt injection. Adversaries exploit this weakness by embedding malicious instructions within tool-returned content, which agents directly incorporate into their conversation history as trusted observations. This vulnerability manifests across three primary attack channels: web and local content injection, MCP server injection, and skill file injection. To address these vulnerabilities, we introduce ClawGuard, a novel runtime security framework that enforces a user-confirmed rule set at every tool-call boundary, transforming unreliable alignment-dependent defense into a deterministic, auditable mechanism that intercepts adversarial tool calls before any real-world effect is produced. By automatically deriving task-specific access constraints from the user's stated objective prior to any external tool invocation, ClawGuard blocks all three injection pathways without model modification or infrastructure change. Experiments across five state-of-the-art language models on AgentDojo, SkillInject, and MCPSafeBench demonstrate that ClawGuard achieves robust protection against indirect prompt injection without compromising agent utility. This work establishes deterministic tool-call boundary enforcement as an effective defense mechanism for secure agentic AI systems, requiring neither safety-specific fine-tuning nor architectural modification. Code is publicly available at https://github.com/Claw-Guard/ClawGuard.
arXiv 👥 Wei Zhao, Zhe Li, Peixin Zhang et al.
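The core mechanism, checking every tool call against a fixed, user-confirmed rule set before execution, can be sketched as below. This is written in the spirit of the abstract only: the rule format, tool names, and `check_tool_call` are hypothetical and not taken from the actual ClawGuard repository.

```python
# Hypothetical task-specific allowlist, derived (in the real system,
# automatically) from the user's stated objective before any tool runs.
ALLOWED = {
    "web.fetch": {"docs.python.org"},
    "fs.read": {"/workspace"},
}

def check_tool_call(tool, target):
    """Return True only if the call matches a user-confirmed rule.

    Every call is checked BEFORE execution, so an instruction injected
    via tool output cannot widen the agent's reach: the decision is
    deterministic and independent of anything the model has read."""
    scopes = ALLOWED.get(tool, set())
    return any(target == s or target.startswith(s) for s in scopes)

# A legitimate call passes; calls an injected instruction might trigger
# (a new domain, an unlisted tool) are intercepted at the boundary.
print(check_tool_call("web.fetch", "docs.python.org"))   # True
print(check_tool_call("web.fetch", "evil.example.com"))  # False
print(check_tool_call("shell.exec", "curl attacker.sh")) # False
```

The design point is that the check runs outside the model: even a fully compromised conversation history cannot alter the rule set, only propose calls that the gate then rejects.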
Modeling open-play soccer tactics is a formidable challenge due to the stochastic, multi-agent nature of the game. Existing computational approaches typically produce single, deterministic trajectory forecasts or focus on highly structured set-pieces, fundamentally failing to capture the inherent variance and branching possibilities of real-world match evolution. Here, we introduce GenTac, a diffusion-based generative framework that conceptualizes soccer tactics as a stochastic process over continuous multi-player trajectories and discrete semantic events. By learning the underlying distribution of player movements from historical tracking data, GenTac samples diverse, plausible, long-horizon future trajectories. The framework supports rich contextual conditioning, including opponent behavior, specific team or league playing styles, and strategic objectives, while grounding continuous spatial dynamics into a 15-class tactical event space. Finally, we demonstrate that GenTac can be successfully trained to generalize to other dynamic team sports, including basketball, American football, and ice hockey.
arXiv 👥 Jiayuan Rao, Tianlin Gui, Haoning Wu et al.
GUI agents drive applications through their visual interfaces instead of programmatic APIs, interacting with arbitrary software via taps, swipes, and keystrokes, reaching a long tail of applications that CLI-based agents cannot. Yet progress in this area is bottlenecked less by modeling capacity than by the absence of a coherent full-stack infrastructure: online RL training suffers from environment instability and closed pipelines, evaluation protocols drift silently across works, and trained agents rarely reach real users on real devices. We present ClawGUI, an open-source framework addressing these three gaps within a single harness. ClawGUI-RL provides the first open-source GUI agent RL infrastructure with validated support for both parallel virtual environments and real physical devices, integrating GiGPO with a Process Reward Model for dense step-level supervision. ClawGUI-Eval enforces a fully standardized evaluation pipeline across 6 benchmarks and 11+ models, achieving 95.8% reproduction against official baselines. ClawGUI-Agent brings trained agents to Android, HarmonyOS, and iOS through 12+ chat platforms with hybrid CLI-GUI control and persistent personalized memory. Trained end to end within this pipeline, ClawGUI-2B achieves a 17.1% Success Rate on MobileWorld GUI-only, outperforming the same-scale MAI-UI-2B baseline by 6.0%.
arXiv 👥 Fei Tang, Zhiqiong Lu, Boxuan Zhang et al.
Contemporary large language models (LLMs) have demonstrated remarkable reasoning capabilities, especially in specialized domains like mathematics and physics. However, their ability to generalize these reasoning skills to more general and broader contexts, often referred to as general reasoning, remains underexplored. Unlike domain-specific reasoning, general reasoning relies less on expert knowledge but still presents formidable reasoning challenges, such as complex constraints, nested logical branches, and semantic distractions. To address this gap, we introduce General365, a benchmark specifically designed to evaluate general reasoning in LLMs. By restricting background knowledge to a K-12 level, General365 explicitly decouples reasoning from specialized expertise. The benchmark contains 365 seed problems and 1,095 variant problems across eight categories, ensuring both high difficulty and diversity. Evaluation across 26 leading LLMs reveals that even the best-performing model achieves only 62.8% accuracy, in stark contrast to the near-perfect performance of LLMs on mathematics and physics benchmarks. These results indicate that the reasoning abilities of current LLMs are heavily domain-dependent, leaving substantial room for improvement in broader applications. We envision General365 as a catalyst for advancing LLM reasoning beyond domain-specific tasks toward robust, general-purpose real-world scenarios. Code, dataset, and leaderboard: https://general365.github.io
arXiv 👥 Junlin Liu, Shengnan An, Shuang Zhou et al.

💬 Hacker News

1 item

🤗 Hugging Face

0 papers
No data available.