Six stages. Raw documents in. Shipping, signed, auditable AI product out. Built by an NVIDIA Inception member to push the full NVIDIA stack as far as it will go.
Not a training tool. Not a CLI with a Gradio URL. A factory — double-click and it works.
LLaMA-Factory, Unsloth Studio, and every other competitor assume you already have clean data, an ML engineer, and a GitHub account. Forge ships the whole factory, and it runs on the full NVIDIA stack, not just the GPU. Every stage uses NVIDIA-native infrastructure: not as a plugin, but as the engine.
| Capability | Forge | LLaMA-Factory | Unsloth |
|---|---|---|---|
| Raw doc ingest (PDF/DOCX) | ✓ | ✗ | ✗ |
| Data curation + synthetic gen | ✓ | ✗ | ✗ |
| GPU auto-selection | ✓ | ✗ | ✗ |
| NeMo training engine | ✓ | ✗ | ✗ |
| Eval gates (block bad models) | ✓ | ✗ | ✗ |
| Air-gap Docker packaging | ✓ | ✗ | ✗ |
| Signed governance receipts | ✓ | ✗ | ✗ |
| Desktop app (double-click) | ✓ | ✗ | ✗ |
LLaMA-Factory and Unsloth are excellent training tools. Forge ships the entire factory.
Every stage is integrated, not bolted together. Data flows through the pipeline automatically. No context-switching between tools, no shell scripts, no YAML archaeology.
PDF, DOCX, HTML, and web sources → chunked, embedded, stored in LanceDB. Powered by NVIDIA NeMo Retriever extractors — captures tables, diagrams, and page elements that generic loaders lose.
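The chunk-and-embed step can be sketched like this. This is an illustration only: `chunk_text` and its window sizes are assumptions, not Forge's actual NeMo Retriever extraction code.

```python
def chunk_text(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split extracted document text into overlapping windows.

    The overlap preserves context across chunk boundaries, so the
    embedding model never loses a sentence to a hard cut.
    """
    if size <= overlap:
        raise ValueError("chunk size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    # Each chunk would then be embedded and written to a LanceDB table.
    return chunks
```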
Curate, score, deduplicate, and generate synthetic data. Built-in human review queue for flagged samples. Produces a clean, scored training dataset before a single GPU cycle is spent.
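The shape of that stage, filter, score, keep, can be sketched in a few lines. The function name, the exact-hash dedup, and the toy length-based score are all invented for illustration; real curation uses fuzzy dedup and learned quality scorers.

```python
import hashlib

def dedupe_and_score(samples: list[str], min_len: int = 20) -> list[tuple[str, float]]:
    """Drop exact duplicates and too-short samples, then attach a crude score."""
    seen, kept = set(), []
    for s in samples:
        key = hashlib.sha256(s.strip().lower().encode()).hexdigest()
        if key in seen or len(s) < min_len:
            continue  # duplicate or too short: route to the review queue, not training
        seen.add(key)
        score = min(1.0, len(s) / 200)  # toy proxy: longer samples score higher
        kept.append((s, score))
    return kept
```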
GPU profiler detects your hardware and sets memory budgets automatically. Recommends the optimal base architecture for your task, data volume, and hardware — no spreadsheets required.
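A memory-budget check of this kind boils down to bytes-per-parameter arithmetic. The figures below are ballpark heuristics (4-bit weights plus adapter and optimizer overhead for QLoRA, 16-bit for LoRA), not Forge's real profiler:

```python
def fits_in_vram(params_b: float, vram_gb: float, mode: str = "qlora") -> bool:
    """Rough check: can a model with params_b billion parameters train on this GPU?"""
    bytes_per_param = {"qlora": 0.8, "lora": 2.5, "sft": 16.0}[mode]
    needed_gb = params_b * bytes_per_param  # billions of params * bytes each ~= GB
    return needed_gb * 1.2 <= vram_gb      # keep ~20% headroom for activations
```

Under these assumptions a 7B model trains with QLoRA on an 8 GB card (7 × 0.8 × 1.2 ≈ 6.7 GB), while full SFT of the same model does not come close.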
NeMo is the primary engine — SFT, LoRA, QLoRA. Engine registry automatically selects Unsloth or HuggingFace PEFT when they're the right fit. One UI. Multiple backends. You don't choose the plumbing.
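An engine registry of this shape is just a routing function over the job's method and hardware. The rules below are invented for illustration; Forge's actual selection logic is not public.

```python
def pick_engine(method: str, vram_gb: float) -> str:
    """Toy registry: route a training job to a backend."""
    if method == "sft":
        return "nemo"              # full fine-tunes stay on the primary engine
    if method == "qlora" and vram_gb < 16:
        return "unsloth"           # memory-tight adapter runs
    return "nemo" if vram_gb >= 24 else "hf_peft"
```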
Seven-dimension benchmarks that block bad models from shipping. Not just metrics — gates. A checkpoint either passes or it doesn't. Produces a qualified checkpoint, not just a checkpoint.
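A gate in this sense is a hard boolean, not a weighted average. The dimension names and thresholds below are placeholders, not Forge's published benchmark suite:

```python
THRESHOLDS = {  # illustrative gate thresholds, one per eval dimension
    "accuracy": 0.85, "faithfulness": 0.90, "toxicity_free": 0.99,
}

def gate(scores: dict[str, float]) -> bool:
    """A checkpoint ships only if every dimension clears its bar.

    There is no averaging: one failing dimension blocks the release,
    no matter how strong the others are.
    """
    return all(scores.get(dim, 0.0) >= bar for dim, bar in THRESHOLDS.items())
```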
Air-gappable Docker Compose, cloud NIM, or Terraform deployment. Every artifact is Ed25519-signed with a governance receipt. Ships as a Windows MSI installer; macOS support is coming.
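A governance receipt binds artifact metadata to a content digest. In Forge that digest is then signed with an Ed25519 key; this stdlib-only sketch stops at the digest so it runs without a crypto dependency, and the field names are assumptions:

```python
import hashlib

def make_receipt(artifact: bytes, meta: dict) -> dict:
    """Build a receipt tying metadata to the artifact's SHA-256 digest."""
    return {"sha256": hashlib.sha256(artifact).hexdigest(), **meta}

def verify_receipt(artifact: bytes, receipt: dict) -> bool:
    """Recompute the digest and compare: any tampering flips this to False."""
    return hashlib.sha256(artifact).hexdigest() == receipt["sha256"]
```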
Cortex, our regulatory intelligence API, was built end-to-end with Onyx Forge. It is the product that Forge built: every stage, every artifact, every governance receipt.
Regulatory Intelligence API
Cortex ingests 20+ regulatory frameworks, trains on supervised compliance examples, passes the seven-dimension eval gates, and ships as a signed Docker package with Ed25519 governance receipts. Every stage ran through Forge.
You're past calling an LLM API. You want to run your own models on your own hardware with the same infrastructure NVIDIA uses for enterprise deployments. Forge gives you the full NVIDIA toolchain without the integration work.
You have a domain, documents, and a product to ship. You don't have time to assemble an ingest pipeline, a training workflow, an eval framework, and a packaging system. Forge is the whole stack, double-click to start.
You need more than a fine-tuned checkpoint. You need signed artifacts, eval reports, governance receipts, and packaging your security team can review. Forge ships production-grade outputs — not prototypes.
Air-gappable Docker Compose. Ed25519 signed artifacts. No cloud dependency at runtime. Designed for environments where the model, the data, and the audit trail must stay inside your perimeter.
Forge is in limited early access. We're onboarding teams building serious AI products on NVIDIA hardware.
Pricing available on request. No commitments required.
We'll be in touch within one business day.