What we're exploring
Applied AI research with a practical bent. We care about what works in production — not just what looks good in a paper.
Multi-Model Deliberation
How do we get better answers by running the same query through multiple independent models and reconciling their outputs? Legion is our production testbed for this research.
Agent Governance & Safety
What does it mean for an AI agent to be "safe" in production? We're building the governance framework — permission models, audit trails, approval gates, and rollback mechanisms.
Citation-Verified RAG
Retrieval-augmented generation is powerful but notoriously unreliable. We're building verification layers that trace every claim back to a source document and attach a confidence score to each match.
Domain-Specific LLM Evaluation
Standard benchmarks don't tell you whether a model understands insurance regulation or medical coding. We're building evaluation suites that measure accuracy on real domain tasks.
Collaborate with us
We're open to research partnerships, academic collaborations, and industry working groups. If our research aligns with your work, reach out.
Get in touch