Part 3 - Production MongoDB on Kubernetes: StatefulSets, Backups, and Cloud-Native Patterns

Introduction

Kubernetes has become the standard platform for running containerized workloads. But stateful applications like databases present unique challenges: they need persistent storage, stable network identities, and careful orchestration during updates and failures.

This is Part 3 (the final installment) of our MongoDB operations series. We’ve covered operational fundamentals (Part 1) and high availability with security (Part 2). Now we’re bringing it all together for production deployment on Kubernetes.

Running databases on Kubernetes isn’t just about writing YAML files. It’s about understanding StatefulSets, managing persistent storage, automating backups, and building systems that survive node failures and cluster updates. Let’s dive into production-ready MongoDB on Kubernetes.

Why Kubernetes for MongoDB?

Before we get into implementation details, let’s address the fundamental question: should you run MongoDB on Kubernetes?

Benefits:

  • Unified platform: Manage databases alongside applications
  • Resource efficiency: Better bin-packing and resource utilization
  • Automation: Operators handle routine tasks (upgrades, scaling, backups)
  • Portability: Run same configuration across clouds and on-premises
  • Self-healing: Automatic Pod restart on failures
  • Declarative config: Version-controlled infrastructure as code

Challenges:

  • Operational complexity: Kubernetes adds another layer to troubleshoot
  • Storage performance: Network-attached storage slower than local disks
  • Noisy neighbors: Resource contention from other workloads
  • Upgrade coordination: Both MongoDB and Kubernetes need careful upgrades

The decision depends on your team’s Kubernetes expertise and operational requirements. If you’re already running applications on Kubernetes and have strong Kubernetes operations, running MongoDB there makes sense. If you’re new to Kubernetes, managed services (MongoDB Atlas, AWS DocumentDB) might be better starting points.

StatefulSets: The Foundation

Kubernetes Deployments work great for stateless applications. But databases need stable network identities and persistent storage. That’s what StatefulSets provide.

StatefulSet Key Features

Stable network identities: Each Pod gets a predictable hostname (mongodb-0, mongodb-1, mongodb-2) that persists across rescheduling. This is critical for replica sets where members refer to each other by hostname.

Ordered deployment and scaling: Pods start in order (0, then 1, then 2). MongoDB replica set initialization depends on this: we initialize the replica set after all members are running.

Persistent storage: Each Pod gets its own PersistentVolumeClaim that follows the Pod across rescheduling. Your data survives node failures.

Production StatefulSet Configuration

Here’s a production-ready StatefulSet for a 3-node MongoDB replica set:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb
spec:
  serviceName: "mongodb-service"
  replicas: 3
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      terminationGracePeriodSeconds: 30
      containers:
      - name: mongodb
        image: mongo:7.0
        command:
        - mongod
        - "--replSet"
        - "rs0"
        - "--bind_ip_all"
        ports:
        - containerPort: 27017
          name: mongodb
        env:
        - name: MONGO_INITDB_ROOT_USERNAME
          valueFrom:
            secretKeyRef:
              name: mongodb-secret
              key: username
        - name: MONGO_INITDB_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mongodb-secret
              key: password
        volumeMounts:
        - name: mongodb-data
          mountPath: /data/db
        resources:
          requests:
            cpu: "1000m"
            memory: "2Gi"
          limits:
            cpu: "2000m"
            memory: "4Gi"
        livenessProbe:
          exec:
            command:
            - mongosh
            - --eval
            - "db.adminCommand('ping')"
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command:
            - mongosh
            - --eval
            - "db.adminCommand({ replSetGetStatus: 1 })"
          initialDelaySeconds: 30
          periodSeconds: 10
  volumeClaimTemplates:
  - metadata:
      name: mongodb-data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "fast-ssd"
      resources:
        requests:
          storage: 100Gi

Key configuration decisions:

  • Resource limits: Set based on your workload. MongoDB uses available RAM for caching, so memory limits are critical.
  • Storage size: Start with 2-3x your current data size to allow for growth and indexes.
  • StorageClass: Use SSD-backed storage (gp3 on AWS, pd-ssd on GCP) for production.
  • Health checks: Liveness probe restarts unhealthy Pods; readiness probe controls traffic routing.
  • Entrypoint override: Setting command to mongod bypasses the mongo image’s entrypoint, so the MONGO_INITDB_* variables won’t create the root user here. Create users after initializing the replica set, or drop the command override and pass mongod flags via args instead.

Initializing the Replica Set

After deploying the StatefulSet, initialize the replica set:

# Connect to the first Pod
kubectl exec -it mongodb-0 -- mongosh -u admin -p password

# Initialize replica set
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb-0.mongodb-service:27017" },
    { _id: 1, host: "mongodb-1.mongodb-service:27017" },
    { _id: 2, host: "mongodb-2.mongodb-service:27017" }
  ]
})

# Verify status
rs.status()

The hostname format is {pod-name}.{service-name}:{port}. StatefulSets use a headless service (with clusterIP: None) for Pod DNS.
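You can verify those per-Pod DNS records from inside the cluster with a throwaway Pod (the busybox image tag here is arbitrary):

```shell
# Resolve one replica's stable DNS name from a temporary Pod
kubectl run --rm -it dns-check --image=busybox:1.36 --restart=Never -- \
  nslookup mongodb-0.mongodb-service

# A headless service also returns one A record per Pod for the service name itself
kubectl run --rm -it dns-check2 --image=busybox:1.36 --restart=Never -- \
  nslookup mongodb-service
```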

Headless Service for Discovery

StatefulSets require a headless service for network identity:

apiVersion: v1
kind: Service
metadata:
  name: mongodb-service
spec:
  clusterIP: None
  selector:
    app: mongodb
  ports:
  - port: 27017
    targetPort: 27017

This creates DNS entries for each Pod: mongodb-0.mongodb-service.default.svc.cluster.local.

Applications connect via the replica set connection string:

mongodb://mongodb-0.mongodb-service:27017,mongodb-1.mongodb-service:27017,mongodb-2.mongodb-service:27017/?replicaSet=rs0

MongoDB Operators: Automated Operations

Managing StatefulSets manually works, but it’s tedious. MongoDB operators automate common tasks: replica set configuration, automated upgrades, backup scheduling, and self-healing.

MongoDB Community Operator

The MongoDB Community Kubernetes Operator provides a Kubernetes-native way to manage MongoDB.

Install via Helm:

helm repo add mongodb https://mongodb.github.io/helm-charts
helm install community-operator mongodb/community-operator

Deploy a replica set:

apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: mongodb-replica-set
spec:
  members: 3
  type: ReplicaSet
  version: "7.0.0"
  security:
    authentication:
      modes: ["SCRAM"]
  users:
    - name: admin
      db: admin
      passwordSecretRef:
        name: mongodb-admin-password
      roles:
        - name: clusterAdmin
          db: admin
        - name: userAdminAnyDatabase
          db: admin
      scramCredentialsSecretName: admin-scram
  statefulSet:
    spec:
      volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 100Gi
            storageClassName: fast-ssd

The operator handles:

  • Replica set initialization and configuration
  • Automatic user creation with proper authentication
  • TLS certificate management
  • Rolling upgrades without downtime
  • Self-healing when Pods fail

Benefits of operators:

  1. Declarative management: Describe desired state, operator maintains it
  2. Best practices built-in: Operators encode MongoDB expertise
  3. Automated upgrades: Change version number, operator handles rolling upgrade
  4. Reduced operational burden: Less manual intervention required

When to use operators vs. manual StatefulSets:

  • Operators: Production systems, teams familiar with Kubernetes patterns, need automated operations
  • Manual StatefulSets: Learning/development, custom configurations not supported by operators, compliance requirements for manual control

MongoDB Enterprise Operator

For Enterprise features (Ops Manager integration, authentication with LDAP, automated backups to S3), use the MongoDB Enterprise Operator.

It provides everything Community Operator does plus:

  • Integration with MongoDB Ops Manager/Cloud Manager
  • Automated backup scheduling
  • LDAP authentication
  • Encryption key management
  • Advanced monitoring integration

Storage: The Critical Component

Storage is the most important aspect of running databases on Kubernetes. Poor storage choices lead to data loss and terrible performance.

StorageClass Configuration

Define a StorageClass optimized for MongoDB:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mongodb-storage
provisioner: ebs.csi.aws.com  # AWS EBS CSI driver example
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
  csi.storage.k8s.io/fstype: ext4
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer

Key parameters:

  • provisioner: Cloud provider’s storage provisioner (AWS EBS, GCP Persistent Disk, Azure Disk)
  • type: Storage type (gp3 for AWS, pd-ssd for GCP)
  • allowVolumeExpansion: Enables resizing volumes without recreating Pods
  • volumeBindingMode: WaitForFirstConsumer ensures volumes are provisioned in the same zone as Pods
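With allowVolumeExpansion enabled, you can grow a replica’s volume in place. Note that volumeClaimTemplates are immutable, so each existing PVC is patched individually; claim names follow the {template-name}-{pod-name} pattern:

```shell
# Grow mongodb-0's data volume from 100Gi to 200Gi
kubectl patch pvc mongodb-data-mongodb-0 \
  -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'

# Watch until the new capacity is reported on the claim
kubectl get pvc mongodb-data-mongodb-0 -w
```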

Storage Performance Considerations

MongoDB is I/O intensive. Storage performance directly impacts database performance.

IOPS requirements:

  • Development: 100-500 IOPS sufficient
  • Production: 3,000+ IOPS recommended
  • High-performance: 10,000+ IOPS for write-heavy workloads

Cloud provider options:

  • AWS: Use gp3 (general purpose) or io2 (provisioned IOPS)
  • GCP: Use pd-ssd or pd-extreme
  • Azure: Premium SSD or Ultra Disk

Local storage for maximum performance:

For the absolute best performance, use local SSDs. But this comes with caveats: data doesn’t survive node replacement, and you need robust backup strategies.

volumes:
- name: mongodb-data
  hostPath:
    path: /mnt/local-ssd/mongodb
    type: DirectoryOrCreate

Only use local storage if you understand the trade-offs and have automated backup/restore.

Volume Snapshots for Backups

Kubernetes VolumeSnapshot resources enable point-in-time backups:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: mongodb-snapshot-20250119
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: mongodb-data-mongodb-0

Volume snapshots are:

  • Fast: Snapshot in seconds regardless of data size
  • Space-efficient: Copy-on-write, don’t duplicate unchanged data
  • Cloud-native: Integrated with cloud provider snapshot features

But they’re not a replacement for logical backups (mongodump). Use both: volume snapshots for fast recovery, mongodump for portability.
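To restore from a snapshot, create a new PVC with the snapshot as its dataSource and point a recovery Pod (or a rebuilt StatefulSet) at it. A sketch, assuming the snapshot above and a CSI driver with snapshot support:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-data-restored
spec:
  storageClassName: fast-ssd
  dataSource:
    name: mongodb-snapshot-20250119
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 100Gi
```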

Automated Backups with CronJobs

Backups are non-negotiable for production databases. Kubernetes CronJobs automate backup scheduling.

Daily Backup CronJob

apiVersion: batch/v1
kind: CronJob
metadata:
  name: mongodb-backup
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  successfulJobsHistoryLimit: 7
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mongodump
            image: mongo:7.0
            command: ["/bin/bash"]
            args:
            - -c
            - |
              BACKUP_DATE=$(date +%Y%m%d-%H%M%S)
              echo "Starting backup: ${BACKUP_DATE}"

              mongodump \
                --uri="${MONGODB_URI}" \
                --gzip \
                --out=/backup/${BACKUP_DATE}

              if [ $? -eq 0 ]; then
                echo "Backup completed successfully"

                # Cleanup old backups (keep last 30 days)
                find /backup -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +
              else
                echo "Backup failed"
                exit 1
              fi
            env:
            - name: MONGODB_URI
              valueFrom:
                secretKeyRef:
                  name: mongodb-backup-secret
                  key: connection-string
            volumeMounts:
            - name: backup-storage
              mountPath: /backup
            resources:
              requests:
                memory: "512Mi"
                cpu: "500m"
              limits:
                memory: "2Gi"
                cpu: "1000m"
          volumes:
          - name: backup-storage
            persistentVolumeClaim:
              claimName: mongodb-backup-pvc
          restartPolicy: OnFailure

Configuration highlights:

  • Schedule: Cron syntax (daily at 2 AM here)
  • History limits: Keep recent job logs for troubleshooting
  • Compression: --gzip reduces storage by 5-10x
  • Cleanup: Automated deletion of old backups
  • Resource limits: Prevent backup jobs from consuming all cluster resources

Backup to S3

For offsite backups, upload to S3:

# Add to CronJob args
aws s3 cp /backup/${BACKUP_DATE} \
  s3://my-mongodb-backups/backups/ \
  --recursive

This requires AWS credentials — from a Secret or, better, IAM Roles for Service Accounts (IRSA) on EKS — and a job image that includes the AWS CLI; the stock mongo image doesn’t ship it.
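With IRSA, the backup Job gets S3 access without long-lived keys: annotate a ServiceAccount with an IAM role and reference it from the Job template. A sketch — the role ARN, account ID, and ServiceAccount name are placeholders:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mongodb-backup
  annotations:
    # Hypothetical role granting s3:PutObject on the backup bucket
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/mongodb-backup-role
```

Then set serviceAccountName: mongodb-backup in the CronJob’s Pod template so the backup Pod assumes the role.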

Testing Restore Procedures

Backups are worthless if you can’t restore. Test quarterly:

  1. Create a restore Job from a recent backup
  2. Restore to a temporary namespace
  3. Verify data integrity
  4. Document recovery time
  5. Clean up test environment

Restore Job template:

apiVersion: batch/v1
kind: Job
metadata:
  name: mongodb-restore-test
spec:
  template:
    spec:
      containers:
      - name: mongorestore
        image: mongo:7.0
        command: ["/bin/bash"]
        args:
        - -c
        - |
          mongorestore \
            --uri="${MONGODB_URI}" \
            --gzip \
            --drop \
            /backup/20250119-020000
        env:
        - name: MONGODB_URI
          valueFrom:
            secretKeyRef:
              name: mongodb-restore-secret
              key: connection-string
        volumeMounts:
        - name: backup-storage
          mountPath: /backup
      volumes:
      - name: backup-storage
        persistentVolumeClaim:
          claimName: mongodb-backup-pvc
      restartPolicy: OnFailure

Production Best Practices

Production MongoDB on Kubernetes requires attention to reliability, security, and performance.

High Availability Configuration

Pod anti-affinity prevents co-location of replica set members:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - mongodb
      topologyKey: kubernetes.io/hostname

This ensures each replica runs on a different node. Node failure affects only one replica.

Pod Disruption Budgets prevent voluntary disruptions (like node drains) from breaking quorum:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: mongodb-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: mongodb

With 3 replicas and minAvailable: 2, Kubernetes won’t drain a node if it would reduce available replicas below 2.
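Both protections are easy to verify once the Pods are running:

```shell
# Anti-affinity: each replica should list a different NODE
kubectl get pods -l app=mongodb -o wide

# Disruption budget: with 3 healthy replicas and minAvailable: 2,
# ALLOWED DISRUPTIONS should be 1
kubectl get pdb mongodb-pdb
```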

Spread across availability zones:

Use topology spread constraints to distribute Pods across zones:

topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: mongodb

This tolerates zone failures in multi-zone clusters.

Security Hardening

Run as non-root user:

securityContext:
  runAsUser: 999
  runAsGroup: 999
  fsGroup: 999
  runAsNonRoot: true

Use Secrets for credentials:

apiVersion: v1
kind: Secret
metadata:
  name: mongodb-secret
type: Opaque
stringData:
  username: admin
  password: secureRandomPassword123
  connection-string: mongodb://admin:secureRandomPassword123@mongodb-service:27017

Never commit secrets to version control. Use sealed secrets, external secrets operators, or HashiCorp Vault.

Enable TLS:

command:
- mongod
- --tlsMode=requireTLS
- --tlsCertificateKeyFile=/etc/mongodb/tls/mongodb.pem
- --tlsCAFile=/etc/mongodb/tls/ca.pem
volumeMounts:
- name: tls-certs
  mountPath: /etc/mongodb/tls
  readOnly: true

TLS prevents eavesdropping and man-in-the-middle attacks.
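The snippet above mounts a tls-certs volume that still needs a definition in the Pod spec. A sketch, assuming the certificate and CA bundle live in a Secret named mongodb-tls (cert-manager is a common way to issue and renew it):

```yaml
volumes:
- name: tls-certs
  secret:
    secretName: mongodb-tls  # assumed Secret holding mongodb.pem and ca.pem
```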

Network policies restrict access to MongoDB:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mongodb-network-policy
spec:
  podSelector:
    matchLabels:
      app: mongodb
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: backend-api
    ports:
    - protocol: TCP
      port: 27017

Only Pods with app: backend-api label can connect to MongoDB.

Resource Management

Set appropriate resource requests and limits:

resources:
  requests:
    memory: "2Gi"
    cpu: "1000m"
  limits:
    memory: "4Gi"
    cpu: "2000m"

Requests guarantee resources; limits prevent runaway processes from consuming all node resources.

Configure WiredTiger cache size:

MongoDB defaults the WiredTiger cache to the larger of 50% of (RAM − 1GB) or 256MB. Inside a container, mongod can detect the node’s total memory rather than the container’s limit, so set the cache explicitly. With a 4GB limit:

command:
- mongod
- --wiredTigerCacheSizeGB=1.5

Cache = (4GB - 1GB) × 0.5 = 1.5GB.
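The same arithmetic as a quick shell helper for any memory limit — a convenience sketch, not part of any official tooling:

```shell
# Compute the flag value for a given container memory limit in GB
LIMIT_GB=4
CACHE_GB=$(awk "BEGIN { print ($LIMIT_GB - 1) * 0.5 }")
echo "--wiredTigerCacheSizeGB=$CACHE_GB"   # → --wiredTigerCacheSizeGB=1.5
```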

Monitoring and Observability

Prometheus MongoDB Exporter exposes metrics:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongodb-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongodb-exporter
  template:
    metadata:
      labels:
        app: mongodb-exporter
    spec:
      containers:
      - name: mongodb-exporter
        image: percona/mongodb_exporter:0.40
        args:
        - --mongodb.uri=$(MONGODB_URI)
        - --collect-all
        ports:
        - containerPort: 9216
          name: metrics
        env:
        - name: MONGODB_URI
          valueFrom:
            secretKeyRef:
              name: mongodb-exporter-secret
              key: uri

Configure Prometheus to scrape the exporter, then create Grafana dashboards for visualization.
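A minimal scrape configuration, assuming a Service named mongodb-exporter exposes the metrics port (names are placeholders to adapt):

```yaml
scrape_configs:
- job_name: mongodb
  static_configs:
  - targets: ["mongodb-exporter:9216"]
```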

Key metrics to monitor:

  • Replica set member health
  • Replication lag
  • Connection count
  • Cache hit ratio
  • Disk utilization
  • Query performance

Set up alerts for:

  • Replica set member down
  • Replication lag > 30 seconds
  • Backup job failures
  • Disk usage > 80%
  • High connection count (> 80% of max)
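As a starting point, here is a Prometheus alerting rule for replication lag. Metric names vary between exporter versions, so verify them against your exporter’s /metrics output:

```yaml
groups:
- name: mongodb
  rules:
  - alert: MongoDBReplicationLagHigh
    expr: mongodb_mongod_replset_member_replication_lag > 30  # assumed metric name
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Replication lag above 30s on {{ $labels.instance }}"
```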

Troubleshooting Common Issues

Pod stuck in Pending:

  • Check PVC status: kubectl get pvc
  • Check storage quota
  • Verify StorageClass exists

Replica set won’t initialize:

  • Check Pod logs: kubectl logs mongodb-0
  • Verify network connectivity between Pods
  • Ensure headless service exists

Performance degradation:

  • Check resource limits aren’t being hit
  • Monitor disk I/O with node metrics
  • Check for noisy neighbor Pods
  • Review query patterns with slow query log

Backup job failing:

  • Check Secret exists with correct credentials
  • Verify PVC has sufficient space
  • Check backup Pod logs
  • Test manual mongodump command

Conclusion: Production-Ready MongoDB

Running MongoDB on Kubernetes combines the flexibility of container orchestration with the reliability requirements of stateful applications. It’s not trivial, but with proper configuration, it’s entirely viable for production.

Key takeaways from this series:

Part 1 taught us operations fundamentals: monitoring metrics that matter, tuning performance characteristics, and optimizing indexes strategically.

Part 2 covered reliability and security: building replica sets for high availability, choosing good shard keys, implementing authentication and encryption, and managing users with RBAC.

Part 3 (this article) brought it together on Kubernetes: StatefulSets for stable identities and storage, operators for automation, automated backups, and production hardening.

Production Readiness Checklist

  • ✅ Three-node replica set with anti-affinity
  • ✅ SSD-backed persistent storage
  • ✅ Resource requests and limits configured
  • ✅ Health checks (liveness and readiness probes)
  • ✅ Pod Disruption Budget to prevent quorum loss
  • ✅ Secrets management for credentials
  • ✅ TLS enabled for connections
  • ✅ Network policies restricting access
  • ✅ Automated daily backups with CronJob
  • ✅ Tested restore procedures
  • ✅ Monitoring with Prometheus + Grafana
  • ✅ Alerts configured for critical conditions
  • ✅ Documentation for operations team

MongoDB on Kubernetes is a powerful combination when done right. Follow these patterns, test thoroughly, and you’ll build reliable database infrastructure that scales with your business.
