Tag: safety

12 articles available

Designing Guardrails for Agentic LLM Execution

llm

Community | Jul 6, 2026

As agentic LLMs move from answering to acting, permissions, approvals, and safety design matter more than benchmarks.

What Defines Success In Home Cooking Humanoids

agi

Community | Jul 6, 2026

Home cooking humanoids should be judged by task success, time, safety, and cost, not human-like appearance.

Open-Weight LLM Safety Beyond Release-Time Alignment

agi

Source | Jul 4, 2026

Open-weight LLM safety should be judged not only at release, but by how easily fine-tuning can weaken safeguards later.

Reframing Shielded RL as Design-Time Structure Analysis

hardware

Source | Jun 12, 2026

A concise look at shielded RL reinterpreted as a design-time tool for structural safety analysis, not runtime blocking.

Why Custom Instructions Personas Drift Under Hierarchy

hardware

Community | Mar 8, 2026

Model Spec’s chain of command can override custom instructions, causing persona and reasoning drift. Design priorities, exceptions, and fallbacks to improve reproducibility.

Gating Robot Autonomy Using Deep Perception Uncertainty Signals

hardware

Source | Mar 7, 2026

SPIRIT uses deep perception uncertainty to gate shared autonomy, switching between semi-autonomous manipulation and haptic teleoperation.

How Conversational AI Design Shapes Intimacy And Trust

agi

Community | Mar 4, 2026

Examines how warmth, memory, and consistency in conversational AI affect intimacy, trust, and safety evaluation criteria.

Why Tiny Prompt Changes Can Break Robot Safety

agi

Community | Mar 1, 2026

How small prompt shifts can amplify into risky robot actions, and why alignment alone can’t guarantee physical safety.

Validating Failure Modes in Vision-Agent Robotics Systems

hardware

Community | Mar 1, 2026

In high-risk deployments, prioritize uncertainty, false positives/negatives, and closed-loop failure propagation over single-model scores.

Tracing Output Drift With Snapshots, Seeds, And Safety

hardware

Community | Feb 25, 2026

Even with the same model alias, outputs can shift due to snapshot routing, safety behaviors, and sampling settings. Use logs and regression tests to isolate causes.

Designing Boundaries for Relationship Tests in AI Chats

hardware

Community | Feb 16, 2026

How to handle relationship-test prompts in AI chats: set refusal boundaries with Safe Complete, document branching rules, and validate via evaluation.

California And EU Launch Investigations Into xAI Grok Safety

llm

News | Jan 27, 2026

Explores legal investigations into xAI's Grok and the shift toward mandatory AI safety standards in California and the EU.