From Black-Box Grading to Rubric-Based Explainable Scoring
A look at rubric- and concept-based grading that makes open-ended scoring more reviewable, editable, and accountable.
CyberJurors evaluates agent systems on multi-round, multimodal evidence handling and platform rule adaptation in e-commerce disputes.
Why multimodal AI still struggles with charts and scientific figures, and how to verify image-based conclusions in practice.
Examines human-AI collaboration for replicability prediction, balancing speed and consistency against bias, accountability, and privacy risks.
MOV-Bench highlights evaluation gaps in multi-hop audio-visual reasoning and shows consistent gains from agentic search.
Examines whether offline RL can cut online RL costs in code generation post-training without sacrificing practical quality.
How under-specified applied ML papers can become executable benchmarks through agentic workflows and slot-based reporting.
AI vertical integration is less about chips than controlling the training stack, latency, throughput, utilization, and recovery.
A curated link roundup from recently collected official updates and tech news.
How policy-as-code layers can govern generalist LLM agents by controlling tool use, approvals, and data exposure.
A look at structuring table QA with guided cell navigation and staged inference to improve accuracy and verify evidence paths.
MOCHA treats agent skills as multi-field artifacts and argues they must be optimized with platform constraints in mind.
Examines how offloading and preemption affect multi-model LLM serving under GPU memory limits and model-specific costs.
A curated link roundup from recently collected official updates and tech news.
COBALT proposes smartphone and cloud teleoperation to reduce data collection bottlenecks in robot imitation learning.
In handwritten math grading, process understanding matters more than OCR, requiring rubric-based review and human checks.
Multi-image prompts can bypass single-image filters, exposing structural safety gaps in multimodal LLM defenses.
A study on claim verification that proposes ternary decisions and explainable argumentation under incomplete or conflicting evidence.
A curated link roundup from recently collected official updates and tech news.
A curated link roundup from recently collected official updates and tech news.
How prompt-guided image compression for VLMs shifts focus from human visual quality to preserving clues needed for tasks.
A case study in wrapping Florence-2 with ROS 2 topics, services, and actions for local inference and reproducible integration.
A look at when entity resolution needs full GNN extensions and when task-specific minimal graph structure is enough.
A curated link roundup from recently collected official updates and tech news.