VMware Cloud Foundation is now an AI‑native platform, and VMware Private AI Foundation with NVIDIA is the way enterprises turn their existing VCF investment into a governed, scalable backbone for Private AI.

Why Private AI on VCF matters
Enterprises want the flexibility of modern AI, but with strong privacy, governance, and predictable cost. By integrating Private AI capabilities directly into VMware Cloud Foundation 9.0, Broadcom turns VCF into a platform that can securely host AI models, data, and applications side by side with existing workloads, instead of treating AI as a separate island.
Platform layers at a glance
At a high level, the stack spans GPU-enabled VCF infrastructure, a Harbor-backed Model Store for governance, Model Runtime for serving, Data Services Manager with vector search for grounding, and an agent layer on top. These layers allow platform teams to keep using familiar operational tools while exposing AI and data services through standardized, self-service interfaces.
Privacy, security, and model lifecycle
A core design goal is treating models as first‑class corporate assets that need governance just like containers or images. Model Store uses Harbor as a registry for models and NVIDIA inference containers, allowing security teams to scan, approve, and control access via RBAC and dedicated projects. The platform supports air‑gapped deployments by using curated, on‑premises repositories for AI images and libraries, which is crucial for organizations with strict data sovereignty or regulatory requirements.
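As a concrete illustration, the sketch below lists the model repositories in a hypothetical "approved-models" Harbor project through Harbor's v2.0 REST API. The hostname, project name, and robot-account credentials are placeholders, not part of the product.

```python
import requests

# Hypothetical Harbor instance and project names, for illustration only.
HARBOR = "https://harbor.corp.example"
PROJECT = "approved-models"

# List repositories in the project via Harbor's v2.0 REST API; each
# repository holds one governed model artifact or inference container.
resp = requests.get(
    f"{HARBOR}/api/v2.0/projects/{PROJECT}/repositories",
    auth=("svc-model-reader", "********"),  # read-only robot account
    timeout=30,
)
resp.raise_for_status()

for repo in resp.json():
    print(repo["name"], "- artifacts:", repo.get("artifact_count"))
```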
GPUs as first‑class citizens
VCF 9.0 takes GPUs from “special snowflakes” to managed, schedulable resources. The platform supports NVIDIA vGPU, allowing administrators to carve a physical GPU into multiple vGPU profiles, time‑slice capacity, or reserve larger profiles for demanding models while retaining features like vMotion. New hardware support includes accelerated platforms such as NVIDIA Blackwell B200 and RTX 6000‑class GPUs, enabling both high‑end training and large‑scale inference on‑premises.
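To make the vGPU idea concrete, here is a minimal pyVmomi sketch of attaching a vGPU profile to a VM. It assumes an authenticated vSphere session and a powered-off VM managed object, and the profile string is a placeholder that must match what the host's NVIDIA vGPU manager actually exposes.

```python
from pyVmomi import vim

def add_vgpu_profile(vm, profile: str = "nvidia_a100-20c"):
    # The vGPU profile name is a placeholder; valid names depend on the
    # GPU model and the NVIDIA vGPU manager installed on the ESXi host.
    backing = vim.vm.device.VirtualPCIPassthrough.VmiopBackingInfo(vgpu=profile)
    device = vim.vm.device.VirtualPCIPassthrough(backing=backing)
    change = vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
        device=device,
    )
    # Reconfigure the (powered-off) VM; returns a vSphere task object.
    return vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[change]))
```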
GPU consumption and visibility
These capabilities help avoid GPU hoarding and fragmentation, ensuring expensive accelerators serve the highest‑value workloads.
Model Store and secure onboarding
Models typically arrive as large data files rather than VMs or containers, and Private AI treats them accordingly. Organizations can pull open models into a controlled onboarding workflow: evaluate in a dedicated environment, scan for vulnerabilities or policy issues, and then promote into Harbor projects as “approved” or “unapproved” models. Once in Model Store, these assets can be reused by multiple teams without repeated manual downloads or ad‑hoc security checks, significantly reducing operational friction.
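The promotion step can be pictured as an artifact copy between Harbor projects, sketched below with Harbor's artifact-copy endpoint. The project names, repository, tag, and credentials are invented for the example.

```python
import requests

HARBOR = "https://harbor.corp.example"          # placeholder hostname
SOURCE = "staging-models/example-llm:1.0"       # evaluated and scanned
DEST_PROJECT, DEST_REPO = "approved-models", "example-llm"

# Copy the scanned artifact from the staging project into the approved
# project using Harbor's artifact copy API (201 Created on success).
resp = requests.post(
    f"{HARBOR}/api/v2.0/projects/{DEST_PROJECT}/repositories/{DEST_REPO}/artifacts",
    params={"from": SOURCE},
    auth=("svc-model-promoter", "********"),    # robot account with push rights
    timeout=60,
)
resp.raise_for_status()
```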
Model Runtime: models as a service
Model Runtime turns stored models into production‑ready endpoints, deployed as infrastructure‑as‑code or via self‑service UI. It supports both completions models (LLMs) and embedding models, wiring them to appropriate inference engines and GPUs under the hood. A key design choice is OpenAI‑compatible APIs: existing applications that target the OpenAI API can often be redirected to on‑prem endpoints, letting teams move workloads from public cloud to private infrastructure with minimal code change and lower total cost of ownership.
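In practice the switch can be as small as changing the client's base URL, as in this sketch using the OpenAI Python SDK. The endpoint URL, token, and model name are placeholders for whatever the platform team publishes.

```python
from openai import OpenAI

# Point a standard OpenAI client at an on-prem Model Runtime endpoint.
# base_url, api_key, and the model name are illustrative placeholders.
client = OpenAI(
    base_url="https://models.corp.example/v1",
    api_key="internal-token",
)

reply = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Summarize our expense policy."}],
)
print(reply.choices[0].message.content)
```

Because only the base URL and credentials change, the same application code can target public or private endpoints.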
Model services and APIs
This abstraction lets platform teams upgrade or switch models while keeping a stable interface for consuming applications.
Data services, vector DB, and RAG
Private AI is not only about models; it is about connecting those models to enterprise data. Data Services Manager provides database‑as‑a‑service capabilities and can expose PostgreSQL with pgvector, effectively acting as a vector database for embeddings used in retrieval‑augmented generation (RAG) workflows. A Data Indexing and Retrieval service then crawls multiple sources—object storage, collaboration tools, file repositories—chunks content, generates embeddings, and populates knowledge bases that AI agents can query.
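A rough sketch of what this looks like at the database level, assuming a DSM-provisioned PostgreSQL with the pgvector extension enabled; the connection string and embedding dimension are invented for the example.

```python
import psycopg2

# Hypothetical connection string to a DSM-provisioned PostgreSQL.
conn = psycopg2.connect("host=dsm-pg.corp.example dbname=rag user=rag")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("""
    CREATE TABLE IF NOT EXISTS chunks (
        id bigserial PRIMARY KEY,
        content text,
        embedding vector(1024)  -- dimension depends on the embedding model
    )
""")
conn.commit()

# Nearest-neighbor retrieval: '<=>' is pgvector's cosine-distance operator.
query_embedding = [0.01] * 1024  # stand-in for a real query embedding
vector_literal = "[" + ",".join(map(str, query_embedding)) + "]"
cur.execute(
    "SELECT content FROM chunks ORDER BY embedding <=> %s::vector LIMIT 5",
    (vector_literal,),
)
for (content,) in cur.fetchall():
    print(content[:80])
```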
Agents, knowledge bases, and MCP
On top of models and vector stores, the platform introduces an AI agent builder that attaches agents to specific knowledge bases with fine‑grained access controls. This allows different business units to run agents over distinct slices of the corporate corpus without bleeding data across domains. Planned support for Model Context Protocol (MCP) will let agents interoperate with a large ecosystem of MCP servers—covering systems like Slack, GitHub, ServiceNow, and databases—so agents can not only query data but also trigger actions within enterprise systems.
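Conceptually, such an agent is a retrieval step scoped to a single knowledge base followed by a completion call. The sketch below is illustrative only: search_knowledge_base() is a hypothetical stand-in for the platform's retrieval service, and the endpoint and model names are placeholders.

```python
from openai import OpenAI

client = OpenAI(base_url="https://models.corp.example/v1", api_key="internal-token")

def search_knowledge_base(base: str, query: str, top_k: int = 5) -> list[str]:
    # Hypothetical stand-in for the Data Indexing and Retrieval service;
    # access controls would restrict which knowledge bases are visible.
    return ["(retrieved chunk placeholder)"]

def answer(question: str, knowledge_base: str) -> str:
    # Ground the completion in chunks retrieved from the permitted base.
    context = "\n\n".join(search_knowledge_base(knowledge_base, question))
    reply = client.chat.completions.create(
        model="meta-llama-3.1-8b-instruct",
        messages=[
            {"role": "system",
             "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return reply.choices[0].message.content
```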
Operational visibility and AI‑assisted operations
All of this only works if infrastructure teams can see how GPUs and AI services are being used. VCF 9.0 introduces improved visibility into vGPU profiles, consumption per cluster, and detailed GPU telemetry, enabling capacity planning and chargeback/showback. An intelligent assistant, powered by the same Private AI capabilities, surfaces documentation and contextual guidance inside the management UI, making it easier to operate the platform at scale.
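For a flavor of the raw data a showback report aggregates, the sketch below polls nvidia-smi for per-GPU utilization. Inside VCF this surfaces through the platform's own monitoring, so the script is purely illustrative.

```python
import csv
import subprocess

# Query per-GPU utilization and memory via nvidia-smi's CSV output.
result = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=index,name,utilization.gpu,memory.used,memory.total",
     "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
)

for row in csv.reader(result.stdout.strip().splitlines()):
    index, name, util, used, total = [c.strip() for c in row]
    print(f"GPU {index} ({name}): {util}% util, {used}/{total} MiB")
```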

