SGLang Team Launches RadixArk With Four Hundred Million Valuation

TL;DR

The SGLang team from UC Berkeley transitioned to RadixArk with a $400 million valuation from Accel.
RadixAttention technology achieved up to 6.4x higher throughput in RAG and structured generation tasks.
The framework recorded 3.1x higher throughput and 3.7x faster TTFT than vLLM for 70B models.

From Lab to Market: The Emergence of RadixArk

In the AI industry, inference efficiency is linked to cost reduction. SGLang gained recognition as an open-source framework for processing complex prompts. The project is transitioning into a corporate entity named RadixArk to accelerate commercialization.

RadixArk’s core technology is a shared prefix caching mechanism called 'RadixAttention.' LLMs reuse previous data instead of recalculating during long contexts or repetitive instructions. This process is similar to bookmarking library pages for immediate access.

Research shows SGLang demonstrated up to 6.4x higher throughput than existing engines in RAG tasks. Structured generation tasks also showed similar throughput improvements. Static graph compilation based on CUDA graphs led to an additional 2.8x performance improvement. This offers potential server cost reduction for companies using AI agents or complex workflows.

Performance Validation through Comparison with vLLM

Performance differences were observed against vLLM, a common choice for LLM inference engines. Benchmark data indicates SGLang recorded 3.1x higher throughput for 70B-scale models compared to vLLM. The 'Time to First Token' (TTFT) was approximately 3.7x faster in low-latency environments.

These figures support RadixArk’s $400 million valuation. The technology showed a competitive advantage in enterprise environments with high traffic. Accel’s investment likely followed an assessment of SGLang’s ability to resolve industrial bottlenecks.

Analysis: An Efficiency-Centric Strategy

The spin-out of RadixArk reflects two major trends in the AI industry.

First is the economic value of 'structured generation.' Modern AI services often require data extraction in JSON format or specific standards. SGLang specializes in processing these structured commands for the enterprise AI market.

Second is the speed of commercialization based on open source. Academic projects are increasingly attracting investment and spinning out into independent entities. Capital reacts to the scale of problems a technology aims to solve.

However, vLLM maintains a strong community and continues to release performance updates. RadixArk faces the challenge of building value for customers while remaining open-source.

Practical Application: Considerations for Adoption

Enterprises building LLM infrastructure can consider SGLang as a candidate for production.

RAG Workflow Optimization: Developers can examine RadixAttention’s caching efficiency when using long documents.
Agent System Construction: SGLang’s static graph compilation can reduce response times when reusing system prompts.
Benchmarking: For models of 70B or more, users can compare cost per token and throughput.

FAQ

Q: How does RadixAttention differ from standard KV caching? A: Standard caching focuses on single sessions. RadixAttention shares common prefixes across multiple users or requests. This can reduce memory usage and increase speed.

Q: What is the background behind the $400 million valuation? A: Inference costs are a major expenditure for AI companies. Performance improvements can lead to server infrastructure cost savings for large-scale operators.

Q: Is it difficult for existing vLLM users to switch to SGLang? A: SGLang is designed for compatibility. Maximizing performance may require optimizing specific interfaces and static graph compilation settings.

Conclusion

The spin-out of RadixArk suggests competition in the inference market has entered an efficiency-driven phase. The 6.4x throughput improvement may impact the profit structure of AI businesses. Observers are watching if RadixArk can become an enterprise standard while collaborating with open-source communities.

참고 자료

🛡️ Source
🏛️ SGLang: Efficient Execution of Structured Language Model Programs

Aionda