Tag: explainer

232 articles · Page 6 / 10

Overview of PCN: iterative inference, fixed-point convergence (dv≈0), links to backprop equivalence/approximation, and compute bottlenecks.

agi

Source · Mar 9, 2026

Prompt Group Training For Robust Text Guided Segmentation

Summarizes prompt group-aware training that aligns predictions across equivalent prompts, reducing variance and improving average zero-shot Dice.

hardware

Community · Mar 8, 2026

Beyond Benchmark Scores: Reproducible, Multi-Metric Model Evaluation

Why tiny benchmark gaps mislead: evaluation settings, reproducible logs, and multi-metric, roadmap-driven model selection.

agi

Community · Mar 8, 2026

Combining LLM Agents With Science Models for Reliable Loops

A practical pattern: LLMs handle planning and interpretation, while science models provide constraint-based scoring and stopping gates.

hardware

Community · Mar 8, 2026

When 4-Bit Quantization Beats FP16 Perplexity

Explains why 4-bit quantized models can show lower PPL than FP16 and outlines a reproducible evaluation protocol.

llm

Community · Mar 8, 2026

Why AI Talk Runs Long While Drinking

How acute alcohol use can weaken response inhibition and make talk about AI run too long, plus simple rules to keep rapport in social settings.

hardware

Community · Mar 8, 2026

Why Custom Instructions Personas Drift Under Hierarchy

Model Spec’s chain of command can override custom instructions, causing persona and reasoning drift. Design priorities, exceptions, and fallbacks to improve reproducibility.

llm

Community · Mar 7, 2026

CAPTCHA Security Friction Tradeoffs In Real User Flows

Real-user data shows CAPTCHA time varies by context, while ML and relay attacks raise friction without guaranteed security gains.

hardware

Source · Mar 7, 2026

Evaluating Zero-Shot MLLMs for Reliable Video Anomaly Alerts

Assesses zero-shot MLLMs for video anomaly detection, focusing on false alarms/misses, prompt specificity, 1–3s clips, and PR/F1 evaluation.

hardware

Source · Mar 7, 2026

Gating Robot Autonomy Using Deep Perception Uncertainty Signals

SPIRIT uses deep perception uncertainty to gate shared autonomy, switching between semi-autonomous manipulation and haptic teleoperation.

hardware

Community · Mar 7, 2026

Designing Language for LLM Expectations and Verification

How to reduce anthropomorphism, overconfidence, and hallucinations by structuring work as claim-evidence-verification checklists.

hardware

Source · Mar 7, 2026

LegalBench And Auditable Argumentation For Legal LLMs

How LegalBench evaluates legal LLM reasoning beyond accuracy, emphasizing justification and auditability through structured argumentation and governance.

hardware

Source · Mar 7, 2026

Logi-PAR Adds Differentiable Rules to Clinical Activity Recognition

Logi-PAR (arXiv:2603.05184v1) integrates neural-guided differentiable rules into clinical PAR, enabling rule traces and counterfactual interventions.

hardware

Source · Mar 7, 2026

Memory Admission Control for Reliable LLM Agents

A practical look at memory admission control for LLM agents, reducing long-term memory pollution while improving auditability and metrics.

agi

Source · Mar 7, 2026

Multimodal Clinical Reasoning Needs Controlled Evaluation, Not Scores

In multimodal clinical reasoning, reported gains don’t guarantee safety; prioritize controlled evaluation, grounding, and auditable failure modes.

hardware

Community · Mar 7, 2026

Why PDF-to-Excel Rankings Flip Across Input Methods

PDF-to-Excel results vary by upload limits and text vs visual parsing. Use structure metrics and fixed schemas for fair evaluation.

hardware

Community · Mar 7, 2026

Tradeoffs Between Web Search and Reasoning Modes

How web search and reasoning modes trade off accuracy, reproducibility, and latency, plus a simple test procedure to verify results yourself.

llm

Community · Mar 6, 2026

Designing Long-Form LLM Workflows Beyond Large Context Windows

For long policy reports, context and upload limits push chunked workflows that separate evidence retrieval from drafting, improving traceability and quality.

agi

Source · Mar 6, 2026