Built on the stack that powers frontier AI.

Onyx AI Labs is a member of NVIDIA Inception, a program designed to nurture startups revolutionizing industries with AI. Through Inception, we have direct access to NVIDIA's latest GPU hardware, AI frameworks, inference microservices, and technical resources — the same infrastructure trusted by the world's most demanding AI workloads.

Access & Infrastructure

What the Inception program unlocks

NVIDIA Inception is an acceleration platform for AI and advanced computing startups. Members receive hardware grants, cloud credits, technical training, and go-to-market support, which replaces months of infrastructure procurement with direct access.

DGX Hardware Access

Priority access to NVIDIA DGX systems, the reference architecture for large-scale AI training and inference. We train and serve models on the same class of GPU systems used to train frontier models.

NIM Microservices

Prebuilt inference containers for NVIDIA-optimized models. Drop-in deployment of LLMs, embedding models, and vision models with TensorRT-LLM acceleration.

Technical Resources

Direct engineering support from NVIDIA solution architects, early access to new frameworks and SDKs, and expert training on the full NVIDIA AI platform.

Go-to-Market Support

Co-marketing opportunities, event presence, case study development, and access to NVIDIA's enterprise customer network.

Technology Stack

The NVIDIA technologies we build on

NVIDIA NeMo

Model Training & Customization

Framework for building, training, and fine-tuning large language models. Powers our Cortex regulatory intelligence pipeline — purpose-trained models for compliance domains.

NIM Microservices

Optimized Inference

Production-ready inference containers with TensorRT-LLM acceleration. Delivers 5-10x throughput improvement over standard model serving for our multi-model Legion platform.

TensorRT-LLM

Inference Optimization

GPU-accelerated inference engine. Reduces latency and increases throughput for all our deployed language models — critical for real-time regulatory compliance checks.

Triton Inference Server

Model Orchestration

Multi-framework model server supporting concurrent execution of deep learning models. Enables our multi-model architecture to query and ensemble across model types.

NV-EmbedQA

Semantic Search & Retrieval

GPU-accelerated embedding model for question-answering retrieval. Powers Cortex's citation-backed retrieval over 73K+ regulatory obligations with cryptographic audit receipts.

Megatron-Core

Distributed Training

PyTorch-based library for large-scale transformer model training. Used in Forge for custom LLM training on domain-specific corpora across multiple GPUs.

Milvus + GPU Indexing

Vector Database

GPU-accelerated vector search for semantic retrieval at scale. Indexes regulatory frameworks, case law, and compliance documents across 20+ jurisdictions.

DGX Hardware

Compute Infrastructure

Purpose-built AI infrastructure for training and inference. Our models are trained and evaluated on the same reference architecture used by frontier AI labs.

CUDA · cuDNN · NCCL · RAPIDS · NeMo Guardrails · NeMo Evaluator · NVIDIA AI Enterprise · Base Command Platform

Products

Onyx products running on the NVIDIA stack

Every product we build leverages NVIDIA infrastructure. Here's how.

Cortex

Regulatory Intelligence API

Purpose-trained models for regulatory compliance. NeMo for fine-tuning, Triton for multi-model orchestration, NV-EmbedQA for citation-backed retrieval across 73K+ obligations. Cryptographic audit receipts on every answer.

NeMo · Triton · NV-EmbedQA · TensorRT-LLM

Forge

Vertical LLM Factory

Takes domain expertise and regulatory corpus as input, outputs production-ready model packages. Megatron-Core for distributed training, NIM for optimized deployment containers, TensorRT-LLM for inference acceleration.

NIM · Megatron-Core · DGX · TensorRT-LLM

Foundry

Agentic Dev Workspace

Multi-model collaborative development environment. Triton orchestrates concurrent model access, NeMo Guardrails enforces safety policies, NIM delivers optimized inference for interactive agent workflows.

Triton · NeMo Guardrails · NIM

Legion

Multi-LLM Deliberation Platform

Five-model structured deliberation. Triton manages concurrent inference across Claude, GPT, Gemini, Grok, and DeepSeek. TensorRT-LLM optimizes throughput for real-time multi-round synthesis.

Triton · TensorRT-LLM · DGX

About NVIDIA

NVIDIA pioneered accelerated computing. Founded in 1993, the company invented the GPU in 1999 and has since evolved into a full-stack computing company spanning chips, systems, software, and AI frameworks. NVIDIA's platform underpins much of modern AI: many of the most widely used large language models are trained on NVIDIA GPUs.

The NVIDIA Inception program supports over 16,000 AI startups worldwide with technical resources, hardware access, and go-to-market support. Members span healthcare, robotics, autonomous vehicles, financial services, and enterprise AI.

NVIDIA Inception Program

Building on the NVIDIA stack?

We'd love to compare notes. Whether you're exploring NIM, training with NeMo, or deploying on DGX — let's talk infrastructure.

Get in touch
