Policy Layers for Governing Generalist LLM Agents Safely
How policy-as-code layers can govern generalist LLM agents by controlling tool use, approvals, and data exposure.
A look at structuring table QA with guided cell navigation and staged inference to improve accuracy and verify evidence paths.
A curated link roundup from recently collected official updates and tech news.
MOCHA treats agent skills as multi-field artifacts and argues they must be optimized with platform constraints in mind.
Examines how offloading and preemption affect multi-model LLM serving under GPU memory limits and model-specific costs.
COBALT proposes smartphone and cloud teleoperation to reduce data collection bottlenecks in robot imitation learning.
In handwritten math grading, process understanding matters more than OCR, which calls for rubric-based review and human checks.
Multi-image prompts can bypass single-image filters, exposing structural safety gaps in multimodal LLM defenses.
A study on claim verification that proposes ternary decisions and explainable argumentation under incomplete or conflicting evidence.
How prompt-guided image compression for VLMs shifts focus from human visual quality to preserving clues needed for tasks.
A case of wrapping Florence-2 with ROS 2 topics, services, and actions for local inference and reproducible integration.
A look at when entity resolution needs full GNN extensions and when task-specific minimal graph structure is enough.
How serverless gossip learning and carbon-aware orchestration address unreliable connectivity in maritime AI systems.
AI-generated code quality varies by task and prompt, so security, maintainability, and risk checks matter more than speed alone.
A look at distributed MADRL for large-scale scheduling, focusing on scalability, adaptability, and design tradeoffs.
A look at research evaluating harmful manipulation through human-AI multi-turn interaction beyond static benchmarks.
Anthropic’s 1,250 AI-led interviews show how user research is shaping feature priorities and safety design.
Why mathematics must address AI through values, practice, teaching, technology, and ethics to protect autonomy.
A unified view of probabilistic trustworthy AI: performance bottlenecks may lie in memory and random data movement, not just compute.
A neuroimaging benchmark comparing vision-enabled LLMs on MRI and CT, focusing on clinical reasoning, errors, and safety tradeoffs.