Tavus Introduces Raven-1, Bringing Multimodal Perception to Real-Time Conversational AI

New system fuses audio and visual signals to understand emotion, intent, and context in natural language conversations.

Published on Feb. 17, 2026

Tavus, a San Francisco-based AI research company, has launched Raven-1, a multimodal perception system that enables AI to understand emotion, intent, and context the way humans do. Raven-1 captures and interprets audio and visual signals together, allowing AI systems to comprehend not just what users say, but how they say it and what that combination actually means.

Why it matters

Traditional emotion detection systems flatten nuance into rigid categories, assume emotional consistency across an entire utterance, and treat audio and visual signals independently. Raven-1 addresses these limitations by capturing the full picture of tone, expression, posture, and any incongruence between words and signals, which is crucial for high-stakes use cases like healthcare, therapy, coaching, and interviews.

The details

Raven-1 is a multimodal perception system built for real-time conversation in the Tavus Conversational Video Interface (CVI). It produces interpretable natural language descriptions of emotional state and intent at sentence-level granularity, integrating tone, prosody, facial expression, posture, and gaze into unified real-time context. The system maintains context that is never more than a few hundred milliseconds stale, excelling on short, ambiguous, emotionally loaded inputs where traditional systems fail.

  • Raven-1 was launched into general availability on February 16, 2026.

The players

Tavus

A San Francisco-based AI research company pioneering "human computing," an era of computing built around adaptive, emotionally intelligent AI humans.

Raven-1

A multimodal perception system developed by Tavus that enables AI to understand emotion, intent, and context the way humans do.


What’s next

Raven-1 is now generally available across all Tavus conversations and APIs, and developers can access the perception layer through Tavus APIs for custom tool calls and programmatic logic.
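Tavus does not publish the perception payload schema in this announcement, so the following is only a sketch of how an application might consume sentence-level perception output via programmatic logic. The field names (`description`, `incongruence`) and the handler itself are assumptions for illustration, not the actual Tavus API; consult the Tavus API documentation for the real schema.

```python
# Hypothetical handler for a sentence-level perception event delivered to an
# application (for example, via a webhook or custom tool call). All field
# names here are illustrative assumptions, not the documented Tavus schema.

def handle_perception_event(event: dict) -> str:
    """Pick a response strategy from one perception event."""
    description = event.get("description", "")      # natural-language summary
    incongruent = event.get("incongruence", False)  # words vs. signals mismatch
    if incongruent:
        # Words and nonverbal signals disagree; probe gently before acting.
        return "clarify"
    if "frustrat" in description.lower():
        return "de-escalate"
    return "continue"

events = [
    {"description": "Speaker sounds calm and engaged.", "incongruence": False},
    {"description": "Says 'I'm fine' but tone and expression read as frustrated.",
     "incongruence": True},
]
print([handle_perception_event(e) for e in events])  # → ['continue', 'clarify']
```

The point of the sketch is the branching: because Raven-1 emits interpretable natural-language descriptions rather than fixed emotion labels, application logic can key off the description and the incongruence signal rather than a closed category set.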

The takeaway

Raven-1 represents a significant step for conversational AI: by fusing audio and visual signals, it lets systems read emotion, intent, and context in a more human-like way, a capability that matters most in high-stakes applications like healthcare and coaching.