Optimizing MongoDB Indexes

Good indexes are an important part of running a well-performing application on MongoDB. MongoDB performs best when it can keep your indexes in RAM. Reducing the size of your indexes also leads to faster queries and the ability to manage more data with less RAM.

These are a few tips to reduce the size of your MongoDB indexes:

1) Determining Indexes Sizes

The first thing you should do is to understand the size of your indexes. You want to know the sizes before you make changes to confirm that the changes have actually reduced the size. Ideally, you are graphing your indexes over time with your monitoring tools.

Using the mongo shell, you can run db.stats() to get database-level index stats:

> db.stats()
{
	"db" : "examples1",
	"collections" : 6,
	"objects" : 403787,
	"avgObjSize" : 121.9966467469235,
	"dataSize" : 49260660,
	"storageSize" : 66695168,
	"numExtents" : 20,
	"indexes" : 9,
	"indexSize" : 48524560,
	"fileSize" : 520093696,
	"nsSizeMB" : 16,
	"ok" : 1
}
  • indexes - The number of indexes in the examples1 DB
  • indexSize - The total size of the indexes in the examples1 DB

Since each collection has indexes, you can run db.collection.stats() to see them:

> db.address.stats()
{
	"ns" : "examples1.address",
	"count" : 3,
	"size" : 276,
	"avgObjSize" : 92,
	"storageSize" : 8192,
	"numExtents" : 1,
	"nindexes" : 2,
	"lastExtentSize" : 8192,
	"paddingFactor" : 1,
	"flags" : 1,
	"totalIndexSize" : 16352,
	"indexSizes" : {
		"_id_" : 8176,
		"_types_1" : 8176
	},
	"ok" : 1
}
  • totalIndexSize - The size of all indexes in the collection
  • indexSizes - A dictionary of index name and size

NOTE: all sizes returned by these commands are in bytes.
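Since the shell reports raw byte counts, a small helper can convert them into the K/M figures used in the output below. This is just a sketch (the function name is mine, not part of MongoDB or mongodb-tools):

```python
def human_size(num_bytes):
    """Render a byte count (as returned by db.stats()) in human-readable units."""
    for unit in ("B", "K", "M", "G", "T"):
        if num_bytes < 1024.0:
            return "%.2f%s" % (num_bytes, unit)
        num_bytes /= 1024.0
    return "%.2fP" % num_bytes

print(human_size(16352))     # totalIndexSize above -> 15.97K
print(human_size(48524560))  # indexSize above -> 46.28M
```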

These commands are useful, but they are tedious to run manually. To make reporting on index stats easier, I wrote a utility, index-stats.py, which can be found in the mongodb-tools project on GitHub.

(virtualenv) mongodb-tools$ ./index-stats.py
Checking DB: examples2.system.indexes
Checking DB: examples2.things
Checking DB: examples1.system.indexes
Checking DB: examples1.address
Checking DB: examples1.typeless_address
Checking DB: examples1.user
Checking DB: examples1.typeless_user

Index Overview
|         Collection         |         Index          | % Size | Index Size |
| examples1.address          | _id_                   |   0.0% |      7.98K |
| examples1.address          | _types_1               |   0.0% |      7.98K |
| examples1.typeless_address | _id_                   |   0.0% |      7.98K |
| examples1.typeless_user    | _id_                   |  10.1% |      6.21M |
| examples1.typeless_user    | address_id_1           |  10.1% |      6.21M |
| examples1.typeless_user    | typeless_address_ref_1 |   5.9% |      3.62M |
| examples1.user             | _id_                   |  10.1% |      6.21M |
| examples1.user             | _types_1               |   6.9% |      4.24M |
| examples1.user             | _types_1_address_id_1  |  12.2% |      7.51M |
| examples1.user             | _types_1_address_ref_1 |  26.2% |     16.09M |
| examples2.things           | _id_                   |  10.1% |      6.21M |
| examples2.things           | _types_1               |   8.4% |      5.13M |

Top 5 Largest Indexes
|        Collection       |         Index          | % Size | Index Size |
| examples1.user          | _types_1_address_ref_1 |  26.2% |     16.09M |
| examples1.user          | _types_1_address_id_1  |  12.2% |      7.51M |
| examples1.typeless_user | _id_                   |  10.1% |      6.21M |
| examples2.things        | _types_1               |   8.4% |      5.13M |
| examples1.user          | _types_1               |   6.9% |      4.24M |

Total Documents: 600016
Total Data Size: 74.77M
Total Index Size: 61.43M
RAM Headroom: 2.84G
Available RAM Headroom: 1.04G

The output shows the total index size, each index size, and their sizes relative to each other. In addition, the Top 5 Largest indexes are reported across all your collections. This makes it easy to determine your largest indexes and the ones where reducing size will provide the most benefit.

  • RAM Headroom is your physical memory - index size. A positive value means you have RAM available for indexes to fit in memory.
  • Available RAM Headroom is free memory - index size. Since other processes consume memory on this system, I don’t have the total RAM Headroom available.
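The headroom arithmetic is simple enough to sketch (the function name is mine; all values are in bytes):

```python
def ram_headroom(total_ram, free_ram, index_size):
    """RAM headroom figures as described above (all values in bytes)."""
    return {
        "ram_headroom": total_ram - index_size,           # physical RAM - index size
        "available_ram_headroom": free_ram - index_size,  # free RAM - index size
    }

# Example: 4G of RAM, 2G free, 1G of indexes.
h = ram_headroom(4 * 1024**3, 2 * 1024**3, 1 * 1024**3)
# h["ram_headroom"] is 3G; h["available_ram_headroom"] is 1G
```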

The RAM Headroom stat idea comes from the MongoDB monitoring service I use, ServerDensity.

From this output, I would focus on the examples1.user collection and the _types_1_address_ref_1 and _types_1_address_id_1 indexes first.

2) Remove Redundant Indexes

If you have been releasing code changes over a period of time, you'll likely end up with redundant indexes. MongoDB can use the prefix of a compound index to answer queries, so a separate index on just that prefix is unnecessary. In the previous output,

| examples1.user          | _types_1               |   6.9% |      4.24M |

is redundant with

| examples1.user          | _types_1_address_ref_1 |  26.2% |     16.09M |
| examples1.user          | _types_1_address_id_1  |  12.2% |      7.51M |

because _types_1 is a prefix of both compound indexes. Dropping it would save 4.2M of total index size and leave one less index to update when user documents change.
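The prefix rule is easy to check programmatically. Here is a minimal sketch of the idea (my own helper, not the tool's actual code), using ordered (field, direction) key specs:

```python
def is_prefix_of(short_key, long_key):
    """True if index key `short_key` is a strict prefix of compound key `long_key`."""
    return len(short_key) < len(long_key) and long_key[:len(short_key)] == short_key

# Key specs as ordered lists of (field, direction) pairs.
indexes = {
    "_types_1": [("_types", 1)],
    "_types_1_address_ref_1": [("_types", 1), ("address_ref", 1)],
    "_types_1_address_id_1": [("_types", 1), ("address_id", 1)],
}

redundant = [
    (a, b)
    for a, ka in indexes.items()
    for b, kb in indexes.items()
    if is_prefix_of(ka, kb)
]
# _types_1 is flagged against both compound indexes
```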

To make it easier to find these indexes, you can run redundant-indexes.py from mongodb-tools:

(virtualenv)mongodb-tools$ ./redundant-indexes.py
Checking DB: examples2
Checking DB: examples1
Index examples1.user[_types_1] may be redundant with examples1.user[_types_1_address_ref_1]
Index examples1.user[_types_1] may be redundant with examples1.user[_types_1_address_id_1]
Checking DB: local

3) Compact Command

If you are running MongoDB 2.0+, you can run the compact command to defragment your collections and rebuild their indexes. The compact command locks the database, so be careful about where you run it. If you are running a replica set, the easiest approach is to run compact on your secondaries one at a time, fail over the primary to a new secondary, and then run compact on the old primary.
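With pymongo, compact is issued as a database command against each member in turn. A sketch of just the command document (the collection name is illustrative):

```python
def build_compact_command(collection_name):
    """Command document for MongoDB's compact command."""
    return {"compact": collection_name}

# With a pymongo connection to a secondary, you would run something like:
#   db.command(build_compact_command("user"))
```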

4) MongoDB 2.0 Index Improvements

If you are not running MongoDB 2.0 or later, upgrading and rebuilding your indexes should provide about a 25% savings.

See Index Performance Enhancements

5) Check Index Criteria

Another thing to check is your index criteria. You want the indexed values to be small and as selective as possible. Indexing values that do not help MongoDB find your data faster slows queries down and increases the index size. If you are using a mapping framework in your application, and it supports defining indexes in code, you should check what it is actually indexing. For example, MongoEngine for Python uses a "_types" field to identify subclasses in the same collection. This can add a lot of space and may not add to the selectivity of your indexes.

In my test data, my largest index is:

| examples1.user             | _types_1_address_ref_1 |  26.2% |     16.09M |

Looking at the data for it:

> db.user.findOne()
{
	"_id" : ObjectId("4f2ef95c89a40a11c5000002"),
	"_types" : [
		"User"
	],
	"address_id" : ObjectId("4f2ef95c89a40a11c5000000"),
	"address_ref" : {
		"$ref" : "address",
		"$id" : ObjectId("4f2ef95c89a40a11c5000000")
	},
	"_cls" : "User"
}

You can see that _types is an array containing "User", the class name. Since I don't have any subclasses of User in my code, indexing this value does not improve the index's selectivity. Another way of thinking about this: every entry in the index has "User" as a prefix, which adds a few extra bytes per value without making the index any more selective.
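As a rough back-of-the-envelope check (the function and document count are illustrative, and this ignores BSON and B-tree overhead), a constant prefix costs at least its string length once per index entry:

```python
def constant_prefix_overhead(doc_count, prefix):
    """Lower bound on bytes spent storing a constant string prefix in every index entry."""
    return doc_count * len(prefix)

# 200,000 documents, each index entry prefixed with "User":
print(constant_prefix_overhead(200000, "User"))  # 800000 bytes of pure overhead
```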

Removing it in the code with:

class User(Document):
    meta = {'index_types': False}

Changes the index to:

| examples1.user             | address_ref_1          |  16.8% |     12.39M |

About a 23% savings.

Digging in further, address_ref is a ReferenceProperty to an Address object. The data above shows that it is stored as a dictionary containing both the id of the referenced document and the collection it points to. If we change this ReferenceProperty to an ObjectIdProperty, which is what address_id is, we get additional savings:

| examples1.user             | address_id_1           |   9.5% |      6.21M |
| examples1.user             | address_ref_1          |  20.9% |     13.70M |

About a 53% savings. This changes the index value from a serialized dictionary to just an ObjectId, which MongoDB can store and compare much more efficiently. Changing the property type does require code changes, though, and you also lose the automatic dereferencing provided by ReferenceProperty. Still, it can produce significant savings.

In total, we’ve reduced the original index by 61% by adjusting some index criteria and making some small code changes.
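The savings percentages above are straightforward to verify:

```python
def savings_pct(before, after):
    """Percent reduction going from `before` to `after` (same units)."""
    return (before - after) / float(before) * 100

print(round(savings_pct(16.09, 12.39)))  # ~23%: dropping _types from the index
print(round(savings_pct(16.09, 6.21)))   # ~61%: the overall reduction
```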

6) Delete/Move Old Data

In most applications, some data is accessed more frequently than other data. If you have old data that won't be accessed by your users, you may be able to purge it, move it to another un-indexed collection, or archive it somewhere outside of the DB. Ideally, your database contains and indexes only the working set of your data.
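A sketch of the partitioning step, assuming documents carry a hypothetical last_access field (pure Python for illustration; in practice you would query MongoDB with an $lt filter on that field):

```python
def partition_by_age(docs, cutoff):
    """Split docs into (active, archivable) by a 'last_access' value."""
    active = [d for d in docs if d["last_access"] >= cutoff]
    archivable = [d for d in docs if d["last_access"] < cutoff]
    return active, archivable

docs = [{"_id": 1, "last_access": 5}, {"_id": 2, "last_access": 50}]
active, old = partition_by_age(docs, cutoff=10)
# _id 2 stays active; _id 1 can be purged, moved, or archived
```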

There are some other good optimization ideas that can be found here:

How do you tune your indexes?
