Tag: llm


Combining RLVR and Human Demonstrations for Better LMs

A paper on combining RLVR with human demonstrations to train style, structure, and diversity beyond verifiable rewards.

hardware

Community · Jul 3, 2026

How To Compare Code Models Beyond Benchmark Scores

Code model evaluation should weigh real task success, retries, latency, and token cost, not benchmark scores alone.

agi

Source · Jul 3, 2026

Conditional Co-Ablation Reveals Hidden Backup Transformer Circuits

How CoAx exposes backup circuits that single ablation can miss due to self-repair in transformers.

llm

Source · Jul 3, 2026

Context Governance for Verifiable AI Agent Knowledge

How ContextNest frames context governance with a verifiable knowledge vault layer for auditable AI agents beyond retrieval quality.

hardware

Source · Jul 3, 2026

DiscoLoop Tests Internal Multi-Hop Reasoning In One Pass

DiscoLoop explores multi-hop reasoning inside a single forward pass without relying on long external CoT tokens.

hardware

Source · Jul 3, 2026

Can Multimodal AI Improve Rail Crossing Safety Assessment

Examines whether combining rail crossing images with accident records improves safety assessment and what validation matters.

hardware

Source · Jul 3, 2026

OCB Tests Native Office Understanding Beyond PDF QA

OCB evaluates native Office file understanding, revealing document AI limits beyond PDF-based QA.

hardware

Community · Jul 3, 2026

Why Foundational Learning Still Matters in the AI Era

AI can boost productivity but also amplify errors, making foundational learning essential for problem framing, verification, and judgment.

k-ai-pulse

Roundup · Jul 2, 2026

AI Resource Roundup (24h) - 2026-07-02

A curated link roundup from recently collected official updates and tech news.

agi

Source · Jul 2, 2026

Bug Reproduction Tests as Signals for Code Agents

How code agents can use bug reproduction tests as diagnostic signals during patch generation, not just post-hoc checks.

llm

Source · Jul 2, 2026

DART-VLN Improves Discrete VLN Without Retraining at Test Time

DART-VLN targets stale memory reads and local backtracking in discrete VLN using training-free test-time control.

hardware

Source · Jul 2, 2026

Dynamic 3D Reconstruction from Monocular Video with Generative Priors

A method for building dynamic 3D Gaussians from monocular video and correcting reconstruction gaps with a conditional video model.

llm

Source · Jul 2, 2026

Interpreting RAG Retrieval With Sparse Autoencoder Features

Explores using sparse autoencoders to disentangle dense RAG embeddings for interpretable retrieval analysis and steering.

agi

Source · Jul 2, 2026

Latent Space Control for Trustworthy LLM Behavior

From steering vectors to model calibrators, this paper frames latent-space intervention as a path to better LLM control and trust.

agi

Community · Jul 2, 2026

Which Jobs Are Safer: Office or Skilled Trades?

Official data on AI and automation exposure compares office jobs and skilled trades by task structure and employment outlook.

hardware

Source · Jul 2, 2026

On-Device AI Security Across App, Model, and OS

A look at the main security risks in mobile on-device AI, focusing on attack surfaces across apps, models, and OS.

hardware

Community · Jul 2, 2026

Public AI Infrastructure: Distributed Access or Concentrated Scale

Examines distributed vs. concentrated public AI compute strategies and what they mean for sovereign AI capacity.

hardware

Community · Jul 2, 2026

How Safety Holds Up Across Long AI Conversations

Examines whether AI safety remains consistent in long conversations and highlights gaps in session-level evaluation.

hardware

Source · Jul 2, 2026

Scaling Thermodynamic AI With Backprop and Gibbs Sampling

How Ising-based thermodynamic computing may scale training, with focus on sampling costs and hardware limits.

hardware

Source · Jul 2, 2026

Staleness and Learning Rates in Asynchronous RLHF

Examines how stale rollouts and learning rates affect stability in asynchronous RLHF, with practical signals like staleness and ESS.

agi

Community · Jul 1, 2026

AI Employment Narrative Shifts From Loss to Redesign

Examines whether AI eliminates jobs or redesigns tasks, and why this shift matters for hiring, reskilling, and productivity.

k-ai-pulse

Roundup · Jul 1, 2026

AI Resource Roundup (24h) - 2026-07-01

A curated link roundup from recently collected official updates and tech news.

llm

Source · Jul 1, 2026

Design Axes for Agentic Orchestration in Enterprises

A practical guide to balancing agent autonomy, traceability, and control in enterprise orchestration design.

hardware

Source · Jul 1, 2026

Emergent Misalignment Depends on More Than Training Data

EM may depend on optimizers and batch settings, making finetuning recipes part of safety evaluation, not just data.

Aionda
