Cluster Backups with Velero and CloudCasa

3 min read
kubernetes · velero · backups · disaster-recovery · cloudcasa

Running production workloads without backups is asking for trouble. Today I set up Velero for Kubernetes-native backups with CloudCasa for managed orchestration.

Why Velero?

Velero (formerly Heptio Ark) is the de facto standard for Kubernetes backups:

  • Cluster state backups - All resources, not just data
  • Volume snapshots - PersistentVolume data
  • Scheduled backups - Cron-based automation
  • Disaster recovery - Restore to same or different cluster
  • Migration - Move workloads between clusters

Installing Velero

First, install the Velero CLI:

wget https://github.com/vmware-tanzu/velero/releases/download/v1.13.0/velero-v1.13.0-linux-amd64.tar.gz
tar -xvf velero-v1.13.0-linux-amd64.tar.gz
sudo mv velero-v1.13.0-linux-amd64/velero /usr/local/bin/
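
A quick sanity check that the CLI landed on the PATH (client-only, since nothing is installed in the cluster yet):

velero version --client-only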

I'm using Backblaze B2 as my backup destination (S3-compatible, cheap):

velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.9.0 \
  --bucket k8s-backups \
  --secret-file ./credentials-velero \
  --backup-location-config \
    region=us-west-004,s3ForcePathStyle=true,s3Url=https://s3.us-west-004.backblazeb2.com \
  --use-node-agent \
  --default-volumes-to-fs-backup

The credentials file:

[default]
aws_access_key_id=<backblaze-key-id>
aws_secret_access_key=<backblaze-app-key>
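
Once the install finishes, it's worth confirming that the server components came up and that the backup location is reachable before trusting it with anything:

kubectl get pods -n velero
velero backup-location get    # should show the location as Available once the B2 bucket is reachable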

Backup Configuration

I created scheduled backups for each critical namespace:

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-default-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"  # 2 AM daily
  template:
    includedNamespaces:
      - default
    includeClusterResources: true
    storageLocation: default
    volumeSnapshotLocations:
      - default
    ttl: 720h  # Keep for 30 days
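
Applying the manifest and confirming Velero picked it up (the filename is just what I saved it as locally):

kubectl apply -f daily-default-backup.yaml
velero schedule get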

Similar schedules for other namespaces:

# monitoring namespace - weekly (Prometheus data is reproducible)
# argocd namespace - daily (GitOps state)
# microservices namespace - daily (Kafka, Redis, MongoDB)
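
The same thing can be done through the CLI instead of YAML; a rough sketch for the other namespaces (cron expressions and TTLs here are illustrative, not copied from my manifests):

velero schedule create weekly-monitoring-backup \
  --schedule="0 2 * * 0" --include-namespaces monitoring --ttl 2160h   # weekly, keep 90 days
velero schedule create daily-argocd-backup \
  --schedule="0 2 * * *" --include-namespaces argocd --ttl 720h
velero schedule create daily-microservices-backup \
  --schedule="0 2 * * *" --include-namespaces microservices --ttl 720h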

CloudCasa Integration

While Velero handles the heavy lifting, CloudCasa provides a management layer:

  • Dashboard - Visual backup status across clusters
  • Alerting - Notifications on backup failures
  • Policy management - Centralized backup policies
  • Cross-cluster visibility - If I add more clusters later

Install the CloudCasa agent:

kubectl apply -f https://cloudcasa.io/download/kubeagent.yaml

Register the cluster in the CloudCasa dashboard and configure backup policies through their UI.

Testing Restores

Backups are worthless if you can't restore. I tested by:

  1. Creating a test namespace with a deployment and PVC
  2. Backing it up with Velero
  3. Deleting everything in the namespace
  4. Restoring from backup

# Create backup
velero backup create test-backup --include-namespaces test-ns

# Simulate disaster
kubectl delete namespace test-ns

# Restore
velero restore create --from-backup test-backup
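
Velero names the restore after the backup plus a timestamp; the easiest way to follow it and confirm nothing was skipped:

velero restore get
velero restore describe <restore-name> --details   # name comes from the previous command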

The restore brought back:

  • ✅ Namespace
  • ✅ Deployments
  • ✅ Services
  • ✅ ConfigMaps and Secrets
  • ✅ PersistentVolumeClaims
  • ✅ PersistentVolume data (via file system backup)

Kopia for File-Level Backups

Velero uses Kopia under the hood for file-system backups. I see Kopia maintenance jobs running:

$ kubectl get pods -n velero | grep kopia
argocd-default-kopia-maintain-job-xxx     Completed
default-default-kopia-maintain-job-xxx    Completed
monitoring-default-kopia-maintain-job-xxx Completed

These handle deduplication and garbage collection of backup data.
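
Velero tracks one Kopia repository per namespace and storage location through the BackupRepository resource; checking those is a quick way to see whether maintenance is actually succeeding (assuming Velero 1.10+, where the CRD carries this name):

kubectl get backuprepositories -n velero
kubectl describe backuprepository <name> -n velero   # status includes the last maintenance time and any errors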

Database Considerations

For databases (MongoDB, PostgreSQL, Redis), I have two approaches:

  1. Velero PVC snapshots - Captures the entire volume
  2. Application-level backups - mongodump, pg_dump scheduled separately

Velero PVC snapshots might capture an inconsistent state if the database is writing during the backup. For critical data, I run application-level dumps to a separate PVC, then Velero backs that up.
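
As a sketch of that second approach for MongoDB: the command a nightly CronJob container could run, assuming a backup PVC mounted at /backups and the connection string provided via an environment variable (names are illustrative):

# Consistent logical dump, written to the PVC that Velero then backs up
mongodump --uri="$MONGODB_URI" \
  --archive=/backups/mongo-$(date +%F).archive.gz --gzip

# Keep only the last 7 local dumps; older history lives in the Velero backups
ls -1t /backups/mongo-*.archive.gz | tail -n +8 | xargs -r rm --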

Current Backup Schedule

Namespace        Frequency        Retention
default          Daily, 2 AM      30 days
argocd           Daily, 2 AM      30 days
monitoring       Weekly, Sunday   90 days
microservices    Daily, 2 AM      30 days
vertex-platform  Daily, 3 AM      30 days

Total storage used in Backblaze: ~15 GB (with deduplication)

Monitoring Backup Health

I added Prometheus alerts for backup failures:

groups:
  - name: velero
    rules:
      - alert: VeleroBackupFailed
        expr: increase(velero_backup_failure_total[1h]) > 0  # alert on new failures, not the all-time counter
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Velero backup failed"
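
Before loading the rule, a quick syntax check with promtool (assuming the group above is saved as velero-rules.yaml):

promtool check rules velero-rules.yaml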

CloudCasa also sends email alerts if backups miss their schedule.

Lessons Learned

  1. Test restores regularly - A backup you haven't tested isn't a backup
  2. Consider application consistency - Volume snapshots might catch databases mid-write
  3. Monitor backup storage - Costs can grow if retention is too long
  4. Document restore procedures - When disaster strikes, you won't remember the commands

Documenting the evolution of my homelab infrastructure.