One of our clients found their VMware bill jumping from €80,000 to €350,000 per year after the Broadcom acquisition. They had 12 ESXi hosts, over 500 virtual machines, and three sites. They needed a real alternative, without stopping the business. This is what we did.
The context: when VMware is no longer an option
The client is an industrial company with presence at three sites in Catalonia. Their virtualization environment had been running on VMware vSphere with vSAN for seven years: 12 ESXi hosts, over 500 production VMs, VLAN networks segmented by department, and replication between sites. Everything worked. Until the renewal bill arrived.
With Broadcom's licensing model change, the cost went from €80,000 to €350,000 per year — a 337% increase. It was not a mistake: it was the new price. The client called us with one clear question: "Can we leave VMware without stopping the business?"
Starting environment: 12 ESXi 7.0 hosts | 3 sites | 512 VMs | vSAN | 47 VLANs | Cross-site replication | 99.9% SLA
Phase 1: Full audit (weeks 1-2)
Before touching anything, we needed to know exactly what we had. We performed an exhaustive inventory of the entire VMware environment using RVTools, custom PowerCLI scripts, and manual dependency documentation.
# Automated inventory with PowerCLI
Get-VM | Select-Object Name, PowerState, NumCpu, MemoryGB, `
@{N='DiskGB';E={(Get-HardDisk -VM $_ | Measure-Object -Sum CapacityGB).Sum}}, `
@{N='VLAN';E={(Get-NetworkAdapter -VM $_).NetworkName}}, `
Guest, VMHost | Export-Csv -Path vm_inventory.csv -NoTypeInformation
# Result: 512 VMs, 14.3 TB of disk, 47 unique VLANs
The audit revealed surprises. We found 47 machines running Windows Server 2012 R2 that nobody remembered existed, some running old file services that still had active users. We also discovered 23 orphaned snapshots consuming 2.1 TB of unnecessary space.
What we documented:
- ✓ 512 VMs with CPU, RAM, disk, network, and OS details
- ✓ 47 VLANs with service dependency maps
- ✓ Backup policies and cross-site replication
- ✓ Templates, orphaned snapshots, and obsolete VMs
- ✓ Criticality classification: 83 critical, 156 important, 273 standard
Phase 2: New environment design (weeks 2-3)
We designed a 12-node Proxmox VE cluster (reusing the same hardware) with Ceph as distributed storage. The key was to exactly replicate the existing network topology so applications would not notice any change.
Ceph Storage
Triple replication, 3 pools: NVMe for critical VMs (SQL, Exchange), SSD for general production, HDD for archive and backups. 4+2 erasure coding for cold pool.
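The capacity trade-off between the replicated flash pools and the 4+2 erasure-coded cold pool is easy to quantify. A minimal sketch, assuming an illustrative 60 TB of raw capacity (not the client's actual figure):

```shell
#!/bin/sh
# Usable capacity for the two pool layouts described above.
# RAW_TB is an illustrative assumption, not the client's real capacity.
RAW_TB=60
# Triple replication: every object stored 3x -> usable = raw / 3
awk -v raw="$RAW_TB" 'BEGIN { printf "replica3 usable: %.1f TB\n", raw / 3 }'
# Erasure coding 4+2: 4 data + 2 parity chunks -> usable = raw * 4/6
awk -v raw="$RAW_TB" 'BEGIN { printf "ec4+2 usable:    %.1f TB\n", raw * 4 / 6 }'
```

Erasure coding doubles the usable space at the same durability budget, which is why it went on the cold pool: the extra CPU and latency cost of reassembling chunks matters far less for archive data than for SQL or Exchange.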
Network
Linux bridges + VLANs replicating exactly the 47 VMware networks. LACP bonding for trunks. Dedicated 25 Gbps Ceph network separated from VM traffic.
High Availability
HA groups per rack with affinity rules. Fencing via IPMI/iDRAC (STONITH). Corosync with redundant link between sites. Quorum configured to tolerate an entire site failure.
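The claim that the cluster tolerates a full site failure follows from the default majority rule used by corosync votequorum, floor(n/2) + 1. A quick sanity check for the 12-node, 3-site layout:

```shell
#!/bin/sh
# Quorum arithmetic for 12 nodes spread evenly across 3 sites.
# Default corosync votequorum majority: floor(nodes / 2) + 1.
NODES=12
PER_SITE=4
QUORUM=$(( NODES / 2 + 1 ))
SURVIVORS=$(( NODES - PER_SITE ))
echo "quorum threshold: $QUORUM votes"
if [ "$SURVIVORS" -ge "$QUORUM" ]; then
  echo "full site loss tolerated: $SURVIVORS/$NODES nodes remain quorate"
fi
```

Losing one site leaves 8 of 12 votes, above the 7-vote threshold, so the cluster stays quorate. Losing two sites does not, which is the expected behavior for a majority-based quorum.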
Backup & DR
Proxmox Backup Server with deduplication. Daily incremental backups, weekly full. Ceph RBD mirroring replication between sites for DR.
Phase 3: Pilot with 30 VMs (weeks 3-4)
We did not migrate everything at once. We selected 30 non-critical VMs — development environments, test servers, and internal tools — to validate the entire process. The VMDK to QCOW2 conversion was the technical core:
# VMDK -> QCOW2 conversion with metadata preallocation
qemu-img convert -f vmdk -O qcow2 -o preallocation=metadata \
vm-disk.vmdk vm-disk.qcow2
# Import into Proxmox (straight into the Ceph pool)
qm importdisk 100 vm-disk.qcow2 ceph-ssd --format raw
# For large VMs, convert directly to raw on Ceph (faster)
qemu-img convert -f vmdk -O raw vm-disk.vmdk rbd:ceph-ssd/vm-100-disk-0
# Integrity verification
qemu-img check vm-disk.qcow2
qemu-img compare vm-disk.vmdk vm-disk.qcow2
The pilot revealed two important issues that we resolved before the mass migration:
Issue 1: Disk drivers
Windows VMs with LSI Logic controller would not boot with VirtIO directly. Solution: install VirtIO drivers before migrating, while still on VMware. We created a procedure with a script that mounted the VirtIO ISO and installed drivers automatically.
Issue 2: Custom MTU networks
Some storage VLANs used jumbo frames (MTU 9000), while the default Proxmox bridge uses MTU 1500. We configured each bridge with the correct MTU in /etc/network/interfaces on every node.
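A jumbo-frame bridge stanza looks like this in /etc/network/interfaces (vmbr1 and bond0.100 are illustrative names, not the client's actual interfaces):

```
auto vmbr1
iface vmbr1 inet manual
    bridge-ports bond0.100
    bridge-stp off
    bridge-fd 0
    mtu 9000
```

The MTU must also be raised on the underlying bond and physical NICs, and on every switch port in the path; otherwise jumbo frames are silently fragmented or dropped even though the bridge reports MTU 9000.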
Phase 4: Mass migration (weeks 4-8)
With the pilot validated, we entered migration mode. Every Saturday, in 4-hour maintenance windows (06:00 to 10:00), we migrated batches of 50 to 80 VMs. We automated the entire process with bash scripts:
#!/bin/bash
# migrate_batch.sh - Batch migration VMDK -> Proxmox/Ceph
BATCH_FILE="$1" # CSV: vmname,vmid,pool,node
LOG="/var/log/migration/$(date +%Y%m%d).log"
while IFS=',' read -r vmname vmid pool node; do
echo "[$(date)] Migrating $vmname -> VMID $vmid on $node" | tee -a "$LOG"
# 1. Power off the VM and export its VMDK from ESXi over SSH
ssh esxi "vim-cmd vmsvc/power.off \
\$(vim-cmd vmsvc/getallvms | grep $vmname | awk '{print \$1}')"
scp esxi:/vmfs/volumes/datastore1/$vmname/$vmname.vmdk /tmp/migration/
# 2. Convert and import directly into the Ceph pool
qemu-img convert -p -f vmdk -O raw \
/tmp/migration/$vmname.vmdk rbd:$pool/vm-${vmid}-disk-0
# 3. Create the VM config in Proxmox
qm create $vmid --name "$vmname" --memory 4096 --cores 2 \
--net0 virtio,bridge=vmbr0 --ostype l26 \
--scsi0 $pool:vm-${vmid}-disk-0 --scsihw virtio-scsi-single \
--boot order=scsi0
# 4. Verification: boot the VM and ping the guest agent
qm start $vmid
sleep 30
qm agent $vmid ping && echo "[OK] $vmname operational" | tee -a "$LOG"
# 5. Cleanup
rm /tmp/migration/$vmname.vmdk
done < "$BATCH_FILE"
For critical VMs — SQL Server, Exchange, and the ERP — we could not afford any maintenance window. We used a hot replication strategy:
- Continuous VMDK → Ceph RBD replication with block-level rsync
- Final delta sync during micro-cutover (< 2 minutes)
- DNS and ARP switch for cutover
- Automatic post-migration validation with health checks
Result: zero data loss. Maximum downtime per critical VM: 97 seconds.
Phase 5: Optimization and decommission (weeks 8-10)
With all VMs migrated, we spent two weeks optimizing the Ceph cluster performance and completing the transition:
# Ceph OSD tuning: BlueStore cache and memory target
ceph config set osd bluestore_cache_size_hdd 1073741824
ceph config set osd bluestore_cache_size_ssd 3221225472
ceph config set osd osd_memory_target 4294967296
# PG balancing
ceph balancer mode upmap
ceph balancer on
# Throttle recovery so it does not impact production
ceph config set osd osd_recovery_max_active 1
ceph config set osd osd_recovery_sleep_hdd 0.1
ceph config set osd osd_max_backfills 1
# Health verification
ceph health detail
ceph osd pool stats
In parallel, we trained the client's IT team with 3-day hands-on sessions covering VM management, Ceph, backups, HA, and troubleshooting. We delivered complete documentation of the entire environment and a runbook for incidents. Finally, we decommissioned the ESXi hosts: format, license reclamation, and vCenter retirement.
Results: the numbers speak
| Metric | VMware vSphere | Proxmox + Ceph |
|---|---|---|
| Annual license cost | 350.000 € | 0 € * |
| Support cost (optional) | Included in license | ~12.000 € |
| Migration + optimization cost | - | ~86.000 € |
| Total cost year 1 | 350.000 € | 98.000 € |
| IOPS (4K random read) | ~85.000 | ~98.000 |
| Usable storage capacity | 14.3 TB | 20.1 TB |
| Availability (4 months) | 99.95% | 99.99% |
* Proxmox VE is open source (AGPLv3). Support cost is for optional enterprise support subscription.
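Using the figures from the table above, the savings work out as follows:

```shell
#!/bin/sh
# Year-1 and steady-state savings derived from the cost table (EUR).
VMWARE_ANNUAL=350000
SUPPORT=12000
MIGRATION=86000
YEAR1_SAVING=$(( VMWARE_ANNUAL - SUPPORT - MIGRATION ))
ONGOING_SAVING=$(( VMWARE_ANNUAL - SUPPORT ))
echo "year-1 saving:   ${YEAR1_SAVING} EUR"
echo "ongoing saving:  ${ONGOING_SAVING} EUR/year"
```

The one-off migration cost is recovered within the first year, and from year two onward the annual saving rises to €338,000, since only the optional support subscription remains.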
Lessons learned: 5 tips for your migration
01 Start with the easy VMs
Test and development VMs are the ideal testing ground. They allow your team to build confidence with the process and detect issues without impact. We found the VirtIO driver issue thanks to migrating a Windows test VM first.
02 Install VirtIO drivers BEFORE migrating
For Windows VMs, mount the VirtIO ISO inside VMware and install disk, network, and ballooning drivers. After that, migration to Proxmox is transparent. If you do it afterwards, you will need to boot in safe mode or with a temporary IDE controller.
03 Do not underestimate networking
80% of post-migration issues we encountered were network-related: incorrect MTU, misconfigured VLANs, bridges without proper tags. Spend time documenting and replicating the EXACT network topology. Test connectivity across all VLANs before migrating the first VM.
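One cheap pre-flight check is to scan the interfaces configuration for bridges left at the default MTU. A minimal sketch (it generates a sample file here; in practice you would point the awk one-liner at /etc/network/interfaces on each node):

```shell
#!/bin/sh
# Flag bridges still at the default MTU 1500 before the first migration.
# The sample file below is illustrative; run against the real config.
cat > /tmp/interfaces.sample <<'EOF'
iface vmbr0 inet manual
    mtu 1500
iface vmbr1 inet manual
    mtu 9000
EOF
awk '/^iface/ { br = $2 } $1 == "mtu" && $2 == 1500 { print br " still at default MTU 1500" }' \
  /tmp/interfaces.sample
```

Running the same check on every node takes seconds and would have caught our jumbo-frame issue before the pilot instead of during it.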
04 Ceph needs tuning
Ceph with default configuration works, but does not perform at its best. Adjusting BlueStore cache, PG count per pool, recovery parameters, and journal sizes made a measurable difference. In our case, we gained 23% more IOPS just from tuning.
05 Document EVERYTHING
Every decision, every configuration change, every problem and its solution. The runbook we created is 120 pages and has enabled the client's IT team to resolve incidents autonomously. Documentation is the difference between a successful migration and a slow-motion disaster.
Complete timeline
Weeks 1-2
Audit: inventory of 512 VMs, network maps, criticality classification
Weeks 2-3
Design: 12-node Proxmox cluster, Ceph 3 pools, networking, HA, backup
Weeks 3-4
Pilot: 30 non-critical VMs, VirtIO and MTU issue detection
Weeks 4-8
Mass migration: 50-80 VM batches every Saturday, critical VMs with hot replication
Weeks 8-10
Ceph optimization, IT team training, documentation, ESXi decommission
Your turn
If you are facing a VMware renewal with outrageous prices, or simply want to explore open alternatives with professional support, we can help. We have been deploying and managing Proxmox environments with Ceph in production for companies of all sizes for years. Every project is different, but our accumulated experience allows us to plan migrations with guarantees.
Free assessment of your VMware infrastructure
We analyze your environment, estimate migration costs and timelines, and present a detailed plan. No strings attached.