VMware Cloud Foundation is now an AI‑native platform, and VMware Private AI Foundation with NVIDIA is the way enterprises turn their existing VCF investment into a governed, scalable backbone for Private AI.

Why Private AI on VCF matters
Enterprises want the flexibility of modern AI, but with strong privacy, governance, and predictable cost. By integrating Private AI capabilities directly into VMware Cloud Foundation 9.0, Broadcom turns VCF into a platform that can securely host AI models, data, and applications side by side with existing workloads, instead of treating AI as a separate island.
Platform layers at a glance
At a high level, the stack spans GPU-enabled VCF infrastructure, a Harbor-backed Model Store for governance, Model Runtime for serving, Data Services Manager with vector search for grounding, and an agent layer on top. These layers allow platform teams to keep using familiar operational tools while exposing AI and data services through standardized, self-service interfaces.
Privacy, security, and model lifecycle
A core design goal is treating models as first‑class corporate assets that need governance just like containers or images. Model Store uses Harbor as a registry for models and NVIDIA inference containers, allowing security teams to scan, approve, and control access via RBAC and dedicated projects. The platform supports air‑gapped deployments by using curated, on‑premises repositories for AI images and libraries, which is crucial for organizations with strict data sovereignty or regulatory requirements.
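As a concrete illustration, the sketch below lists the model repositories in a hypothetical "approved-models" Harbor project through Harbor's v2.0 REST API. The hostname, project name, and robot-account credentials are placeholders, not part of the product.

```python
import requests

# Hypothetical Harbor instance and project names, for illustration only.
HARBOR = "https://harbor.corp.example"
PROJECT = "approved-models"

# List repositories in the project via Harbor's v2.0 REST API; each
# repository holds one governed model artifact or inference container.
resp = requests.get(
    f"{HARBOR}/api/v2.0/projects/{PROJECT}/repositories",
    auth=("svc-model-reader", "********"),  # read-only robot account
    timeout=30,
)
resp.raise_for_status()

for repo in resp.json():
    print(repo["name"], "- artifacts:", repo.get("artifact_count"))
```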
GPUs as first‑class citizens
VCF 9.0 takes GPUs from “special snowflakes” to managed, schedulable resources. The platform supports NVIDIA vGPU, allowing administrators to carve a physical GPU into multiple vGPU profiles, time‑slice capacity, or reserve larger profiles for demanding models while retaining features like vMotion. New hardware support includes accelerated platforms such as NVIDIA Blackwell B200 and RTX 6000‑class GPUs, enabling both high‑end training and large‑scale inference on‑premises.
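To make the vGPU idea concrete, here is a minimal pyVmomi sketch of attaching a vGPU profile to a VM. It assumes an authenticated vSphere session and a powered-off VM managed object, and the profile string is a placeholder that must match what the host's NVIDIA vGPU manager actually exposes.

```python
from pyVmomi import vim

def add_vgpu_profile(vm, profile: str = "nvidia_a100-20c"):
    # The vGPU profile name is a placeholder; valid names depend on the
    # GPU model and the NVIDIA vGPU manager installed on the ESXi host.
    backing = vim.vm.device.VirtualPCIPassthrough.VmiopBackingInfo(vgpu=profile)
    device = vim.vm.device.VirtualPCIPassthrough(backing=backing)
    change = vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
        device=device,
    )
    # Reconfigure the (powered-off) VM; returns a vSphere task object.
    return vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[change]))
```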
GPU consumption and visibility
These capabilities help avoid GPU hoarding and fragmentation, ensuring expensive accelerators serve the highest‑value workloads.
Model Store and secure onboarding
Models typically arrive as large data files rather than VMs or containers, and Private AI treats them accordingly. Organizations can pull open models into a controlled onboarding workflow: evaluate in a dedicated environment, scan for vulnerabilities or policy issues, and then promote into Harbor projects as “approved” or “unapproved” models. Once in Model Store, these assets can be reused by multiple teams without repeated manual downloads or ad‑hoc security checks, significantly reducing operational friction.
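The promotion step can be pictured as an artifact copy between Harbor projects, sketched below with Harbor's artifact-copy endpoint. The project names, repository, tag, and credentials are invented for the example.

```python
import requests

HARBOR = "https://harbor.corp.example"          # placeholder hostname
SOURCE = "staging-models/example-llm:1.0"       # evaluated and scanned
DEST_PROJECT, DEST_REPO = "approved-models", "example-llm"

# Copy the scanned artifact from the staging project into the approved
# project using Harbor's artifact copy API (201 Created on success).
resp = requests.post(
    f"{HARBOR}/api/v2.0/projects/{DEST_PROJECT}/repositories/{DEST_REPO}/artifacts",
    params={"from": SOURCE},
    auth=("svc-model-promoter", "********"),    # robot account with push rights
    timeout=60,
)
resp.raise_for_status()
```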
Model Runtime: models as a service
Model Runtime turns stored models into production‑ready endpoints, deployed as infrastructure‑as‑code or via self‑service UI. It supports both completions models (LLMs) and embedding models, wiring them to appropriate inference engines and GPUs under the hood. A key design choice is OpenAI‑compatible APIs: existing applications that target the OpenAI API can often be redirected to on‑prem endpoints, letting teams move workloads from public cloud to private infrastructure with minimal code change and lower total cost of ownership.
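In practice the switch can be as small as changing the client's base URL, as in this sketch using the OpenAI Python SDK. The endpoint URL, token, and model name are placeholders for whatever the platform team publishes.

```python
from openai import OpenAI

# Point a standard OpenAI client at an on-prem Model Runtime endpoint.
# base_url, api_key, and the model name are illustrative placeholders.
client = OpenAI(
    base_url="https://models.corp.example/v1",
    api_key="internal-token",
)

reply = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Summarize our expense policy."}],
)
print(reply.choices[0].message.content)
```

Because only the base URL and credentials change, the same application code can target public or private endpoints.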
Model services and APIs
This abstraction lets platform teams upgrade or switch models while keeping a stable interface for consuming applications.
Data services, vector DB, and RAG
Private AI is not only about models; it is about connecting those models to enterprise data. Data Services Manager provides database‑as‑a‑service capabilities and can expose PostgreSQL with pgvector, effectively acting as a vector database for embeddings used in retrieval‑augmented generation (RAG) workflows. A Data Indexing and Retrieval service then crawls multiple sources—object storage, collaboration tools, file repositories—chunks content, generates embeddings, and populates knowledge bases that AI agents can query.
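A rough sketch of what this looks like at the database level, assuming a DSM-provisioned PostgreSQL with the pgvector extension enabled; the connection string and embedding dimension are invented for the example.

```python
import psycopg2

# Hypothetical connection string to a DSM-provisioned PostgreSQL.
conn = psycopg2.connect("host=dsm-pg.corp.example dbname=rag user=rag")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("""
    CREATE TABLE IF NOT EXISTS chunks (
        id bigserial PRIMARY KEY,
        content text,
        embedding vector(1024)  -- dimension depends on the embedding model
    )
""")
conn.commit()

# Nearest-neighbor retrieval: '<=>' is pgvector's cosine-distance operator.
query_embedding = [0.01] * 1024  # stand-in for a real query embedding
vector_literal = "[" + ",".join(map(str, query_embedding)) + "]"
cur.execute(
    "SELECT content FROM chunks ORDER BY embedding <=> %s::vector LIMIT 5",
    (vector_literal,),
)
for (content,) in cur.fetchall():
    print(content[:80])
```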
Agents, knowledge bases, and MCP
On top of models and vector stores, the platform introduces an AI agent builder that attaches agents to specific knowledge bases with fine‑grained access controls. This allows different business units to run agents over distinct slices of the corporate corpus without bleeding data across domains. Planned support for Model Context Protocol (MCP) will let agents interoperate with a large ecosystem of MCP servers—covering systems like Slack, GitHub, ServiceNow, and databases—so agents can not only query data but also trigger actions within enterprise systems.
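Conceptually, such an agent is a retrieval step scoped to a single knowledge base followed by a completion call. The sketch below is illustrative only: search_knowledge_base() is a hypothetical stand-in for the platform's retrieval service, and the endpoint and model names are placeholders.

```python
from openai import OpenAI

client = OpenAI(base_url="https://models.corp.example/v1", api_key="internal-token")

def search_knowledge_base(base: str, query: str, top_k: int = 5) -> list[str]:
    # Hypothetical stand-in for the Data Indexing and Retrieval service;
    # access controls would restrict which knowledge bases are visible.
    return ["(retrieved chunk placeholder)"]

def answer(question: str, knowledge_base: str) -> str:
    # Ground the completion in chunks retrieved from the permitted base.
    context = "\n\n".join(search_knowledge_base(knowledge_base, question))
    reply = client.chat.completions.create(
        model="meta-llama-3.1-8b-instruct",
        messages=[
            {"role": "system",
             "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return reply.choices[0].message.content
```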
Operational visibility and AI‑assisted operations
All of this only works if infrastructure teams can see how GPUs and AI services are being used. VCF 9.0 introduces improved visibility into vGPU profiles, consumption per cluster, and detailed GPU telemetry, enabling capacity planning and chargeback/showback. An intelligent assistant, powered by the same Private AI capabilities, surfaces documentation and contextual guidance inside the management UI, making it easier to operate the platform at scale.
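For a flavor of the raw data a showback report aggregates, the sketch below polls nvidia-smi for per-GPU utilization. Inside VCF this surfaces through the platform's own monitoring, so the script is purely illustrative.

```python
import csv
import subprocess

# Query per-GPU utilization and memory via nvidia-smi's CSV output.
result = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=index,name,utilization.gpu,memory.used,memory.total",
     "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
)

for row in csv.reader(result.stdout.strip().splitlines()):
    index, name, util, used, total = [c.strip() for c in row]
    print(f"GPU {index} ({name}): {util}% util, {used}/{total} MiB")
```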

