Tag: llm

999 articles · Page 8 / 42

When AI Can Automate Psychology Experiments Reliably

How trustworthy is AI-run psychology automation? Focus on theory coding, data quality control, and replication limits.

Community · Jun 25, 2026

Can 3D Layout Plus AI Improve Animation Stability

Examines whether fixing 3D layout and pose before AI stylization improves animation stability, despite flicker and edit costs.

k-ai-pulse

Roundup · Jun 25, 2026

AI Resource Roundup (24h) - 2026-06-25

A curated link roundup from recently collected official updates and tech news.

hardware

Source · Jun 25, 2026

Autodata Reframes Synthetic Data as Agentic System Design

Autodata treats synthetic data as an agentic system, raising key questions on validation, leakage, and repeatability.

hardware

Source · Jun 25, 2026

Automating Benchmarks for Neural Relational Reasoning Generalization

Why automated LLM-built benchmarks for relational reasoning need difficulty control, reliable answers, and bias checks.

agi

Community · Jun 25, 2026

Balancing AI Benefits and Existential Risks Economically

Why AI's growth benefits and existential risks should be compared within one economic framework, not separate debates.

llm

Community · Jun 25, 2026

Beyond RAG for Domain-Specific LLM Decision Tasks

RAGBench and LegalBench show why enterprise LLM evaluation must separate retrieval quality from domain-specific judgment.

llm

Source · Jun 25, 2026

Evaluating VLM Visual Search Beyond Accuracy and Tokens

A framework for evaluating VLM visual search with classic human tasks, using token length and search cost beyond accuracy.

llm

Source · Jun 25, 2026

FlowR2A Reframes Planning as Reward-Conditioned Action Generation

FlowR2A reframes autonomous driving planning from scoring actions to learning reward-conditioned action distributions.

hardware

Source · Jun 25, 2026

Grounded LLM Workflows for Inherited Disease Diagnosis Ranking

DeepBD highlights grounded LLM workflows for inherited disease diagnosis, emphasizing traceable evidence and recall gains.

llm

Source · Jun 25, 2026

GUI Agents Must Stop at Sensitive Screens

Why GUI agents should hand control to users on sensitive screens, beyond task success alone.

llm

Source · Jun 25, 2026

How LLMs Fail Plausibly on Research Math Problems

A look at four plausible LLM failure modes in research-level math and why verification design matters beyond accuracy.

hardware

Community · Jun 25, 2026

What INT8 ConvRot Actually Proves in Local Generation

Separates verified evidence from community impressions on INT8 ConvRot for local image and video generation workflows.

agi

Source · Jun 25, 2026

Lossy Memory Can Mislead Models With Confidence

Why lossy memory can be more dangerous than no memory, and what it means for long-term memory design in LLM agents.

agi

Source · Jun 25, 2026

Managing Release Loops in Continual LLM Evolution

A survey reframes continual learning for industrial LLMs as a closed-loop update and release operations problem.

hardware

Source · Jun 25, 2026

Modeling LLM Verifier Loops With Convergence Guarantees

A framework modeling LLM-verifier loops as a four-stage absorbing Markov chain to analyze convergence and failure points.

agi

Source · Jun 25, 2026

Multi-Agent LLMs Trace Financial Literacy Through Game Logs

A study on stealth assessment of financial literacy using game logs, multi-agent LLMs, and BKT, with focus on label quality.

llm

Source · Jun 25, 2026

OncoSynth Preserves Treatment Effects In Oncology Synthetic Data

OncoSynth models causal chains in oncology synthetic data to reduce treatment effect estimation bias beyond predictive metrics.

hardware

Source · Jun 25, 2026

Rethinking Agent Safety Beyond Model Internal Guardrails

Why agent safety must shift from internal prompts and filters to external runtime permission enforcement.

agi

Source · Jun 25, 2026

Stabilizing Black-Box AI With Randomized Ensemble Calls

A 2026 arXiv paper proposes randomized repeated calls to stabilize black-box AI, with tradeoffs in cost and sigma range.

hardware

Source · Jun 25, 2026

Uncertainty-Aware RL for De Novo Molecular Design

Why treating molecular property scores as deterministic rewards can mislead RL, and how uncertainty-aware design may help.

k-ai-pulse

Roundup · Jun 24, 2026

AI Resource Roundup (24h) - 2026-06-24

A curated link roundup from recently collected official updates and tech news.

hardware

Source · Jun 24, 2026

CineCap And The Challenge Of Cinematic Video Captioning

CineCap targets cinematic video captioning, focusing on camera motion, shot size, angle, and structured scene reasoning.

hardware

Source · Jun 24, 2026

Compositional 3D Generation With Multi-View Consistency Challenges

A look at collision handling, view consistency, and editability in compositional 3D scene generation.

Aionda
