Part 1 - MongoDB Operations & Performance: Monitoring, Tuning, and Index Optimization

Introduction

Running MongoDB in production is more than just spinning up a database server. As your application scales, you’ll face challenges that require deep operational knowledge: slow queries grinding user experiences to a halt, connection pool exhaustion during traffic spikes, and mysterious memory pressure that seems to appear out of nowhere.

This is the first article in a three-part series on production MongoDB operations. Here, we’ll focus on the daily operational tasks that keep MongoDB running smoothly: monitoring the right metrics, tuning performance characteristics, and optimizing indexes for your workload.

Whether you’re a database administrator managing dozens of clusters, an SRE responsible for system reliability, or a backend engineer trying to understand why your queries are slow, this guide will give you practical, battle-tested approaches to MongoDB operations.

What This Series Covers

Part 1 (this article) focuses on operations and performance: monitoring strategies, performance tuning, and index optimization.

Part 2 will cover high availability and security: replica sets, sharding, security hardening, and user management.

Part 3 will explore production deployment on Kubernetes: StatefulSets, operators, backup automation, and cloud-native patterns.

By the end of this series, you’ll have a comprehensive playbook for running MongoDB reliably at scale.

Monitoring and Alerting

You can’t fix what you can’t see. Effective MongoDB monitoring gives you visibility into your database’s health, helps you identify issues before they become outages, and provides the data you need to optimize performance.

The Metrics That Matter

When monitoring MongoDB, it’s tempting to track everything. But alert fatigue is real. Focus on metrics that indicate actual problems or predict future issues.

Server-Level Metrics:

These metrics show the health of your MongoDB server itself:

  • CPU utilization: Sustained high CPU (>80%) indicates inefficient queries or insufficient capacity
  • Memory usage: MongoDB uses available RAM for caching; watch for memory pressure and page faults
  • Disk I/O: High disk I/O often indicates your working set exceeds RAM
  • Network throughput: Helps identify replication lag or application communication issues
  • Available disk space: Running out of disk will cause MongoDB to refuse writes

Database Metrics:

These metrics show how MongoDB is handling your workload:

  • Operations per second (opcounters): Track inserts, queries, updates, deletes, and commands
  • Active connections: Monitor against your connection limit to prevent exhaustion
  • Queued operations: Read and write queue depth indicates MongoDB can’t keep up with requests
  • Lock percentage: High lock contention slows everything down
  • Page faults: Major page faults mean MongoDB is reading from disk instead of cache
  • Replication lag: For replica sets, how far behind secondaries are from the primary
  • Cache hit ratio: WiredTiger cache hits vs. misses
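
Most of these can be read straight out of serverStatus() in the shell - a quick mongosh sketch using documented field paths:

// Operation counters since server start
db.serverStatus().opcounters

// Read and write queue depth
db.serverStatus().globalLock.currentQueue

// Replication lag per secondary (run on a replica set member)
rs.printSecondaryReplicationInfo()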

Query Performance:

These metrics help you identify problematic queries:

  • Slow query log entries: Queries taking longer than your threshold (typically 100ms)
  • Query execution time: How long queries take on average
  • Index usage: Are your queries using indexes or doing full collection scans?
  • Collection scan operations: Full scans are expensive and usually indicate missing indexes

Monitoring with MongoDB Shell

The MongoDB shell provides several commands for real-time monitoring. These are invaluable for troubleshooting live issues.

Server status gives you a comprehensive snapshot:

db.serverStatus()

This returns a massive document with everything from connection counts to WiredTiger cache statistics. You’ll typically query specific subdocuments rather than dumping the entire thing.
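
For example, to pull just the subdocuments you care about:

// Memory usage (resident and virtual, in MB)
db.serverStatus().mem

// Network traffic and request counts
db.serverStatus().network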

Check what’s currently happening:

// View all current operations
db.currentOp()

// View only active queries (most useful for troubleshooting)
db.currentOp({ "active": true })

// Kill a runaway query
db.killOp(12345)  // Replace with actual operation ID

When your database suddenly slows down, currentOp() often reveals the culprit: a poorly-written query scanning millions of documents, or an administrative command blocking other operations.
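
A filter that often pays off during an incident is narrowing to operations that have been running for more than a few seconds:

// Active operations running longer than 10 seconds
db.currentOp({ active: true, secs_running: { $gt: 10 } })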

Database and collection statistics:

// Database-level stats
db.stats()

// Collection-level stats (size, document count, indexes)
db.collection.stats()

// Connection information
db.serverStatus().connections

The profiler captures slow queries for analysis:

// Enable profiling for queries slower than 100ms
db.setProfilingLevel(1, { slowms: 100 })

// View recent slow queries
db.system.profile.find().sort({ ts: -1 }).limit(10).pretty()

// Check profiler status
db.getProfilingStatus()

The profiler is essential for identifying performance problems. When users report “the app is slow,” the profiler often points directly to the offending query.
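
The profiler output lives in a regular collection, so you can slice it however you need. For example, to find the slowest recorded operations against one collection (the "mydb.orders" namespace here is just an illustration):

// Ten slowest operations for a specific collection
db.system.profile.find({ ns: "mydb.orders" }).sort({ millis: -1 }).limit(10)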

Command-Line Monitoring Tools

MongoDB ships with two command-line tools that provide real-time monitoring without querying the database directly.

mongostat shows server statistics every second:

mongostat --uri="mongodb://user:pass@host:27017"

# Sample every 5 seconds instead
mongostat --uri="mongodb://user:pass@host:27017" 5

# Monitor all members of a replica set
mongostat --uri="mongodb://user:pass@host:27017/?replicaSet=rs0" --discover

The output shows inserts, queries, updates, deletes, connection count, and more. It’s like top for MongoDB - perfect for watching behavior in real-time.

mongotop tracks which collections are busiest:

mongotop --uri="mongodb://user:pass@host:27017"

# Update every 10 seconds
mongotop --uri="mongodb://user:pass@host:27017" 10

This shows how much time MongoDB spends reading from and writing to each collection. If you’re wondering where your I/O is going, mongotop answers that question.

Production Monitoring with Prometheus and Grafana

For production environments, you need persistent metrics, dashboards, and alerting. Prometheus and Grafana have become the standard open-source stack.

The MongoDB Exporter exposes MongoDB metrics in Prometheus format. Deploy it as a sidecar container or separate service, configure Prometheus to scrape it, and you’ll have historical metrics for trending and alerting.
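
As a sketch, assuming the widely used Percona mongodb_exporter on its default port 9216, the Prometheus scrape configuration might look like this (adjust the target for your deployment):

scrape_configs:
  - job_name: "mongodb"
    static_configs:
      - targets: ["mongodb-exporter:9216"]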

Pre-built Grafana dashboards show you everything: operation rates, cache hit ratios, replication lag, connection counts, and more. You can set up alerts to page you when things go wrong: replication lag exceeding 30 seconds, connection pool exhaustion, or disk space running low.

Create a monitoring user with appropriate permissions:

db.createUser({
  user: "monitoring",
  pwd: "secure_password",
  roles: [
    { role: "read", db: "admin" },
    { role: "clusterMonitor", db: "admin" },
    { role: "read", db: "local" },
    { role: "read", db: "config" }
  ]
})

Alert Thresholds

Good alerts wake you up for real problems, not false positives. Here are reasonable thresholds based on production experience:

  • High CPU: >80% sustained for 5+ minutes (spikes are normal, sustained load isn’t)
  • High Memory: >85% of available RAM (MongoDB uses all available memory for caching)
  • Replication Lag: >30 seconds (indicates secondary can’t keep up)
  • Connection Limit: >80% of max connections (you’re about to hit the limit)
  • Disk Space: <20% free space (running out of disk causes writes to fail)
  • Page Faults: >100 per second (working set doesn’t fit in RAM)
  • Queue Depth: >50 queued operations (MongoDB can’t keep up with request volume)
  • Slow Queries: >100ms execution time (tune this based on your SLAs)

These thresholds aren’t universal - adjust based on your workload and SLAs. The goal is actionable alerts, not noise.

Performance Tuning

MongoDB’s default configuration works well for many workloads, but production environments often need tuning. Performance optimization involves understanding your workload, identifying bottlenecks, and adjusting configuration accordingly.

WiredTiger Cache: MongoDB’s Memory Layer

MongoDB’s WiredTiger storage engine uses an internal cache to keep frequently-accessed data in memory. This cache is the single most important factor in MongoDB performance.

By default, WiredTiger allocates the larger of 50% of (RAM minus 1GB) or 256MB. On a 16GB server, that's about 7.5GB for the cache.

Check current cache usage:

db.serverStatus().wiredTiger.cache

Look at these fields:

  • bytes currently in the cache: How much cache is used
  • maximum bytes configured: Your cache size limit
  • pages read into cache: Disk reads (higher = more cache misses)
  • pages written from cache: Disk writes
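
A quick way to see how full the cache is, using the two size fields above (a simple mongosh sketch):

// Cache fill ratio (1.0 = completely full)
const cache = db.serverStatus().wiredTiger.cache
cache["bytes currently in the cache"] / cache["maximum bytes configured"]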

If your cache is consistently maxed out and you’re seeing high page faults, your working set exceeds available memory. Options include:

  1. Increase RAM (most effective)
  2. Increase cache size (only if you have free RAM)
  3. Optimize indexes (reduce working set size)
  4. Archive old data (reduce active dataset)

Adjust cache size in mongod.conf:

storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 8

Don’t allocate all RAM to the cache - leave headroom for the OS, file system cache, and connection overhead. A good rule: use the default calculation unless you have specific reasons to deviate.

Query Performance Analysis

Slow queries are the most common MongoDB performance problem. The explain() method shows you exactly how MongoDB executes a query.

Analyze a query plan:

db.orders.find({
  status: "pending",
  created_at: { $gte: ISODate("2024-01-01") }
}).explain("executionStats")

Focus on these execution stages:

  • COLLSCAN: Full collection scan - reads every document (BAD)
  • IXSCAN: Index scan - uses an index (GOOD)
  • FETCH: Retrieves full documents from collection
  • SORT: In-memory sort - expensive without proper index

If you see COLLSCAN, you probably need an index. If you see SORT without a supporting index, you're sorting in memory, which is capped at 100MB by default (32MB before MongoDB 4.4) - exceed that and the query fails unless it's allowed to spill to disk.
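
The executionStats output also shows how efficient a query is. Compare documents examined to documents returned - a large gap means MongoDB is doing wasted work:

// Key fields to compare in explain("executionStats")
const stats = db.orders.find({ status: "pending" }).explain("executionStats").executionStats

stats.nReturned            // Documents returned
stats.totalKeysExamined    // Index entries scanned
stats.totalDocsExamined    // Documents scanned (ideally close to nReturned)
stats.executionTimeMillis  // Total execution time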

Find queries not using indexes:

// Enable slow query logging
db.setProfilingLevel(1, { slowms: 100 })

// Find queries doing full collection scans
// (planSummary for index scans includes the key pattern, e.g. "IXSCAN { email: 1 }",
// so matching the exact string "COLLSCAN" is more reliable than $ne: "IXSCAN")
db.system.profile.find({
  "planSummary": "COLLSCAN"
}).sort({ ts: -1 }).limit(10)

The profiler combined with explain() gives you everything needed to identify and fix slow queries.

Connection Pooling

Every MongoDB connection consumes server resources. Connection pooling reuses connections across requests, dramatically reducing overhead.

Configure connection pool in your application:

// Node.js example
const client = new MongoClient(uri, {
  maxPoolSize: 100,        // Maximum connections
  minPoolSize: 10,         // Minimum connections
  maxIdleTimeMS: 30000,    // Close idle connections after 30s
  waitQueueTimeoutMS: 5000 // Wait up to 5s for available connection
});

Monitor connection pool usage:

db.serverStatus().connections
// Returns:
// - current: Active connections
// - available: Available slots
// - totalCreated: Connections created since server start

Connection pool exhaustion is a common problem during traffic spikes. Symptoms include slow response times and timeout errors. Solutions:

  1. Increase maxPoolSize (if your server can handle more connections)
  2. Reduce connection idle time (free up connections faster)
  3. Fix connection leaks (ensure connections are properly closed)
  4. Scale horizontally (add read replicas for read-heavy workloads)
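
Connection leaks (item 3 above) usually come from opening a new client per request. A minimal sketch of the correct reuse pattern (Node.js with the official mongodb driver; the database and collection names are illustrative):

// Create ONE client at application startup and share it everywhere
import { MongoClient } from "mongodb";

const client = new MongoClient("mongodb://host:27017", { maxPoolSize: 100 });
await client.connect();

async function getOrders(customerId) {
  // Operations draw from the shared pool - never open/close a client per request
  return client.db("shop").collection("orders")
    .find({ customer_id: customerId })
    .toArray();
}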

Read Preferences and Write Concerns

For replica sets, you can tune the balance between consistency, durability, and performance.

Read preferences determine which replica set members handle reads:

  • primary: All reads from primary (default, strongest consistency)
  • primaryPreferred: Primary if available, otherwise secondary
  • secondary: Read from secondaries only (reduces primary load)
  • secondaryPreferred: Secondary if available, otherwise primary
  • nearest: Lowest network latency (good for geo-distributed apps)

Set the read preference per query:

// Route analytics queries to secondaries
db.collection.find().readPref("secondary")

Write concerns balance durability and performance:

// High durability (slower, waits for majority of nodes)
db.collection.insertOne(doc, {
  writeConcern: { w: "majority", j: true }
})

// Fast writes (less durable, only waits for primary)
db.collection.insertOne(doc, {
  writeConcern: { w: 1, j: false }
})

Most applications should use { w: "majority", j: true } for critical data and { w: 1 } for data that can be reconstructed (like cache entries or temporary data).
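
Since MongoDB 4.4 you can also set a cluster-wide default rather than repeating the write concern on every call:

// Set the default write concern for the deployment (requires MongoDB 4.4+)
db.adminCommand({
  setDefaultRWConcern: 1,
  defaultWriteConcern: { w: "majority", j: true }
})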

Aggregation Pipeline Optimization

MongoDB’s aggregation pipeline is powerful but can be slow if not optimized. The key is reducing the number of documents processed as early as possible.

Best practices:

  1. Use $match early: Filter documents before processing
  2. Use $project early: Drop unnecessary fields immediately
  3. Leverage indexes: Ensure $match and $sort can use indexes
  4. Use $limit: Cap result size when appropriate
  5. Avoid $lookup: Denormalize data instead when possible (joins are expensive)

Here's what these practices look like together:

// Well-optimized pipeline
db.orders.aggregate([
  // Filter first (can use index)
  { $match: {
    status: "completed",
    date: { $gte: ISODate("2024-01-01") }
  }},

  // Drop unnecessary fields early
  { $project: {
    customer_id: 1,
    total: 1,
    date: 1
  }},

  // Sort (can use index if one exists)
  { $sort: { total: -1 } },

  // Limit results
  { $limit: 100 }
])

// Check if indexes are used
db.orders.aggregate([...], { explain: true })

The order matters enormously. Moving $match to the beginning can turn a 30-second query into a 100ms query.

Index Management

Indexes are MongoDB’s most powerful performance tool. They’re also easy to misuse. Good index strategy requires understanding your queries, your data distribution, and the trade-offs indexes impose.

Index Types and When to Use Them

Single field indexes are the simplest and most common:

// Index on a single field
db.users.createIndex({ email: 1 })  // Ascending
db.posts.createIndex({ created_at: -1 })  // Descending

For a single-field index, direction rarely matters - MongoDB can walk the index in either direction, so either index above supports both ascending and descending sorts. Direction becomes critical in compound indexes, where the mix of directions must match your sort pattern.

Compound indexes support queries on multiple fields:

// Supports queries on status, or status+date, or status+date+priority
db.orders.createIndex({
  status: 1,
  date: -1,
  priority: 1
})

Compound index field order is critical (more on this in the ESR rule section).

Text indexes enable full-text search:

db.articles.createIndex({
  title: "text",
  content: "text"
})

// Query with text search
db.articles.find({ $text: { $search: "mongodb performance" }})
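
Results can be ranked by relevance using the textScore metadata:

// Sort text search results by relevance score
db.articles.find(
  { $text: { $search: "mongodb performance" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })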

Geospatial indexes support location queries:

db.places.createIndex({ location: "2dsphere" })

// Find places within 5km
db.places.find({
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [-73.97, 40.77] },
      $maxDistance: 5000
    }
  }
})

Unique indexes enforce uniqueness constraints:

db.users.createIndex({ email: 1 }, { unique: true })

Partial indexes index only documents matching a filter:

// Only index pending and processing orders
db.orders.createIndex(
  { status: 1, date: 1 },
  {
    partialFilterExpression: {
      status: { $in: ["pending", "processing"] }
    }
  }
)

Partial indexes save space and improve write performance by only indexing the subset of documents you actually query.

TTL indexes automatically delete old documents:

// Delete sessions after 1 hour
db.sessions.createIndex(
  { created_at: 1 },
  { expireAfterSeconds: 3600 }
)

Perfect for temporary data like sessions, logs, or cache entries.

The ESR Rule: Equality, Sort, Range

When creating compound indexes, field order dramatically affects performance. The ESR rule provides a simple guideline:

  1. Equality fields first (exact matches)
  2. Sort fields second
  3. Range fields last ($gt, $lt, $gte, $lte, $ne; $in acts like an equality on its own, but like a range when combined with a sort)

Example: For this query:

db.orders.find({
  status: "completed",           // Equality
  date: { $gte: ISODate("2024-01-01") }  // Range
}).sort({ priority: -1 })        // Sort

Optimal index:

db.orders.createIndex({
  status: 1,    // Equality (narrows documents first)
  priority: -1, // Sort (MongoDB can walk index in sorted order)
  date: -1      // Range (last because it spans multiple values)
})

Why does order matter? MongoDB scans the index from left to right. Equality fields narrow the search space maximally. Sort fields let MongoDB return sorted results without an in-memory sort. Range fields come last because they span multiple index entries.
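
You can confirm the index is doing its job with explain(): the winning plan should contain an IXSCAN stage and no separate SORT stage, meaning results come back already in index order:

// Verify: winning plan should show IXSCAN and no in-memory SORT
db.orders.find({
  status: "completed",
  date: { $gte: ISODate("2024-01-01") }
}).sort({ priority: -1 }).explain("executionStats")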

Index Selectivity and Cardinality

Not all fields make good indexes. Cardinality (number of unique values) and selectivity (how much an index narrows results) determine index effectiveness.

High cardinality = good index candidate:

  • user_id: Millions of unique values
  • email: Unique per user
  • transaction_id: Unique per transaction

Low cardinality = poor index candidate:

  • status: 3-4 values (pending, active, completed, cancelled)
  • is_active: 2 values (true/false)
  • country: ~200 values

Low cardinality fields can still be useful in compound indexes, especially as the first field if your queries always filter on them. But a standalone index on is_active provides minimal benefit.
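
For example, a boolean rarely deserves its own index, but it can anchor a compound index or define a partial one (field names here are illustrative):

// Compound: equality on the boolean first, then a selective field
db.users.createIndex({ is_active: 1, last_login: -1 })

// Or index only the documents you actually query
db.users.createIndex(
  { last_login: -1 },
  { partialFilterExpression: { is_active: true } }
)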

Monitoring and Maintaining Indexes

List all indexes:

db.collection.getIndexes()

Check index usage statistics:

// See how often each index is used
db.collection.aggregate([{ $indexStats: {} }])

// Find rarely used indexes (note: access counters reset on server restart)
db.collection.aggregate([
  { $indexStats: {} },
  { $match: { "accesses.ops": { $lt: 10 } } }
])

Unused indexes waste space and slow down writes (every write updates all indexes). Drop them.

Drop an unused index:

db.collection.dropIndex("index_name")

Check index sizes:

// Index sizes for a collection
db.collection.stats().indexSizes

// Total index size for entire database
db.stats().indexSize

If your indexes exceed available RAM, MongoDB can’t keep them all in memory. This causes index scans to hit disk, dramatically slowing queries. Solutions:

  1. Drop unused indexes
  2. Use partial indexes (index only what you need)
  3. Shard the collection (distribute indexes across servers)
  4. Add more RAM (always the best solution if possible)

Index builds and locking:

// On MongoDB 4.2+, just call createIndex() - builds no longer block reads/writes
db.collection.createIndex({ field: 1 })

Note: Before MongoDB 4.2, you had to pass { background: true } to avoid blocking all operations on the database for the duration of the build (at the cost of a slower build). Since 4.2, all index builds use an optimized process that takes exclusive locks only briefly at the start and end of the build, and the background option is deprecated and ignored.

Conclusion and Next Steps

Effective MongoDB operations require continuous monitoring, performance analysis, and index optimization. The techniques covered here - monitoring the right metrics, tuning cache and connections, analyzing query performance, and building smart indexes - form the foundation of reliable MongoDB operations.

Key takeaways:

  • Monitor what matters: Focus on actionable metrics (replication lag, slow queries, connection exhaustion)
  • Understand your queries: Use explain() and the profiler to identify bottlenecks
  • Index strategically: Follow the ESR rule, monitor index usage, drop unused indexes
  • Cache is king: Ensure your working set fits in WiredTiger cache
  • Tune for your workload: Balance consistency and performance with read preferences and write concerns

In Part 2, we’ll dive into high availability and security: configuring replica sets for failover, implementing sharding for horizontal scale, hardening security, and managing users with role-based access control.

Until then, start monitoring your MongoDB instances, analyze your slow queries, and review your indexes. The data will tell you where to focus your optimization efforts.

Kevin Duane

Cloud architect and developer sharing practical solutions.