LANNN – MongoDB on AWS – mongo on single EC2 instance


How do you set up MongoDB on a single EC2 instance? This article is a guideline for adding mongo to the LANNN stack – explaining how to install, configure and safely run MongoDB on small production servers. Continue reading to find out about the pitfalls and the important pros and cons.

Installing and testing MongoDB

This guide takes you through the mongo installation on the AWS Linux AMI (CentOS). If you are running Ubuntu or another distro, check out the official mongo installation manual and come back here once the server is up and running.

CentOS

Add repository file

Copy the settings below and paste them into your .repo file

Now you can install mongo

Ubuntu

First import the 10gen key as shown in the official documentation

Next add repository file

And paste the line below into it

Update repository

And now you can install mongo

 

Once all is done, check the installed version with mongod --version

Single EC2 configuration

On CentOS the database files are located by default under dbpath=/var/lib/mongo. To change that, create a new folder for the stored data and move all mongo files there (remember to stop the service first if it is already running)
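The relocation step can be sketched as a small shell function. This is only a sketch under assumptions: copy_data_dir is a hypothetical helper name, and it copies instead of moving, so the old directory stays in place as a fallback until the new location is confirmed to work.

```shell
# Copy every file from the old MongoDB data directory into a new one.
# Both paths are arguments; nothing here is tied to a real server.
copy_data_dir() {
  src=$1
  dst=$2
  [ -d "$src" ] || { echo "missing source: $src" >&2; return 1; }
  mkdir -p "$dst" || return 1
  # "src/." copies the directory contents, including hidden files,
  # and -a preserves ownership and permissions where possible
  cp -a "$src/." "$dst/"
}
```

On a real server you would stop the service first (service mongod stop), run something like copy_data_dir /var/lib/mongo /data/mongo as root, and make sure the service user owns the new folder. /data/mongo is just an example target, not a path from the original setup.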

Now you can either start MongoDB with additional parameters

or change the settings by editing /etc/mongod.conf and then start the service with service mongod start, which is equivalent to:

To make sure the mongod service starts automatically after a reboot, execute

CentOS

Ubuntu

 

Reducing disk space

By default mongo creates 3 files with a total size of 3 GB (1 GB per file). If you are running a small EC2 instance just to test an idea or build a quick prototype, and mongo runs alongside other services, this allocation shrinks the available space quite dramatically. To change that, use the smallfiles option in your mongod.conf file.

The smallfiles option reduces the initial size of data files and limits them to 512 MB, and it reduces each journal file from 1 GB to 128 MB, which can significantly reduce the space taken from the EC2 drive.
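To put numbers on that, here is a quick back-of-the-envelope calculation. It assumes that the three 1 GB files mentioned earlier are the preallocated journal files; journal_total_mb is a hypothetical helper name.

```shell
# Space reserved by three preallocated journal files, in MB.
# $1 is the size of a single journal file in MB.
journal_total_mb() {
  echo $(( 3 * $1 ))
}

journal_total_mb 1024   # default journal file size: prints 3072
journal_total_mb 128    # with smallfiles: prints 384
```

Under that assumption, smallfiles frees roughly 2.6 GB, which matters a lot on a small instance drive.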

Setting noprealloc to true disables the preallocation of data files. It speeds up startup, but it may hurt performance during normal operations.

Important: disabling file preallocation might put your application on hold for as long as a minute while additional space is allocated. Try not to use it unless you do not care about the performance penalty.

To change the default settings, edit /etc/mongod.conf and add the desired options.
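If you prefer to script the change instead of editing by hand, a minimal sketch could look like this. It assumes the ini-style "key = value" config format used by MongoDB 2.x; set_conf_option is a hypothetical helper, and it does not escape special characters in values, so keep it to simple options.

```shell
# Set "key = value" in an ini-style MongoDB config file.
# Replaces an existing active line for the key, otherwise appends one.
set_conf_option() {
  file=$1 key=$2 value=$3
  if grep -Eq "^[[:space:]]*$key[[:space:]]*=" "$file"; then
    # -i.bak edits in place and keeps the previous version as file.bak
    sed -i.bak -E "s|^[[:space:]]*$key[[:space:]]*=.*|$key = $value|" "$file"
  else
    printf '%s = %s\n' "$key" "$value" >> "$file"
  fi
}
```

Run as root against your config file, set_conf_option followed by the file path, smallfiles and true would switch the option on, and the same call works for noprealloc or dbpath.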

When you are done, stop the service, clean up the files and restart it.

Single EC2 “gotchas”

Running MongoDB on a single (EC2) instance is not recommended. It is, however, possible. If you are planning a mini production server, read the “gotchas” below in full to understand what you are getting into.

Safe writes

To keep writes and updates fast and asynchronous, MongoDB by default does not return an operation status (success or error) for them. Clients typically use getLastError in combination with a write / update operation to confirm that the operation has completed.

Additional options can be passed as shown below for different types of result.
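As a rough sketch, the helper below builds a one-liner for the legacy mongo shell: a write followed by getLastError on the same connection. gle_script and the demo collection are made-up names for illustration; j and fsync are real getLastError options.

```shell
# Build the JavaScript for the legacy `mongo` shell: one insert followed by
# getLastError on the same connection. $1 is the extra option, e.g. "j: true".
gle_script() {
  printf 'db.demo.insert({ping: 1}); printjson(db.runCommand({getLastError: 1, %s}));' "$1"
}

# Usage (needs a running mongod and the legacy `mongo` shell):
#   mongo test --eval "$(gle_script 'j: true')"       # wait for the journal commit
#   mongo test --eval "$(gle_script 'fsync: true')"   # wait for the flush to data files
```

Both calls have to happen in the same --eval, because getLastError reports on the last operation of the current connection only.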

Using getLastError on your client makes the write / update operation more “synchronous”, but sometimes that confirmation is required. The combination slightly reduces performance, so it should be used only when needed.

This is achievable with most mongo drivers; e.g. in Mongoose it can be configured for new schemas by passing the safe parameter to the constructor function, as shown below.

More information about safe writes and the fsync option can be found on the MongoDB website.

Journaling ON (is your only hope)

MongoDB comes with journaling, which reduces the chance of losing data. When the database shuts down unexpectedly, the journal is replayed before the new mongod process starts up. That way the database is kept in a consistent state. Since version 2.0 journaling is enabled by default on 64-bit builds, and it causes a performance hit of about 5%. It is absolutely critical to make sure journaling is enabled if you are running mongo on a single instance.

No replica set no failover

A replica set is a cluster of MongoDB instances that replicate amongst one another and ensure automated failover. The cluster operates on master-slave replication and on voting (elections) for a new master server during failover. In some setups the cluster is extended with an arbiter that exists solely to vote in elections (it does not replicate data).

Important: With a single EC2 instance none of those mechanisms are available. In case of an unexpected server shutdown, data integrity will be restored from the journal files, but the server/service won’t be available until that operation completes.

Early sharding

Sharding is a way of splitting mongo data across multiple machines – MongoDB supports automatic sharding. It is done to increase performance, so if the system is expanding, most likely it will have to be “sharded”. Sharding is an expensive operation, which is why it should be done no later than when your server reaches 80% – 85% capacity. Following best practice, it is recommended to start with a sharded setup on smaller machines with the option to scale up the instance. It is much faster than migrating hundreds of thousands of chunks.

Note: If a shard key needs to be updated (in a setup with shards), the only way to do it is to remove the document and reinsert it. The key cannot be updated otherwise.

Here is a nice article about some more gotchas you should know, written by one of the MongoDB Masters: Russell Smith.

Memory is important but not critical

It’s certainly possible to run MongoDB on a machine with a small amount of free RAM.

Mongo automatically uses ALL (really?!!) free memory on the machine as its cache. This can be confirmed with system resource monitors such as free -m. However, its usage is dynamic. If another process suddenly needs any amount of RAM, mongo will release cached memory.

From a technical point of view, the operating system’s virtual memory subsystem manages mongo’s memory. That way mongo can use as much free memory as is available, swapping to disk if and when needed.

Note: It is highly recommended to provision enough memory to fit the whole working data set in RAM, as that is the only way to achieve maximum performance. This becomes even more important for MongoDB running on a single EC2 instance without additional shards.

Monitoring

Utilities

MongoDB comes with a few useful utilities providing performance and activity statistics:

  • mongotop - reports the read and write activity of a MongoDB instance. It reports per collection and can be used to check activity against expectations; available from the CLI (command line interface),
  • mongostat - captures and returns counters of database operations. It reports operations per type (insert, query, update, delete, etc.), helping to understand the server load distribution; available from the CLI,
  • http://localhost:28017 - REST interface displaying diagnostic and monitoring information. It can be enabled by setting the rest option to true and is accessed on a port 1000 higher than the database port (28017 by default).
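The port rule from the last item is easy to get wrong when mongod runs on a non-default port, so here is a tiny sketch of it. http_port is a hypothetical helper; 27017 is the default database port.

```shell
# The HTTP status page listens 1000 ports above the database port.
# $1 is the database port; it defaults to 27017.
http_port() {
  echo $(( ${1:-27017} + 1000 ))
}

http_port          # prints 28017
http_port 27018    # prints 28018
```

With the rest option enabled, something like curl "http://localhost:$(http_port)/" should then return the status page.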

Statistics

MongoDB comes with commands that return statistics about the state of the instance. Their output can be used in scripts and programs to develop custom alerts, or to modify application behavior in response to the instance activity.

  • db.serverStatus() - instantly generates a document with a general overview of the database state (disk usage, memory use, connections, journaling, index accesses). It does not impact MongoDB performance,
  • rs.status() - generates a document with the state and configuration of the replica set. It is useful for confirming the replica configuration and checking the connections between the host and the replica set members,
  • db.stats() - returns a document describing the storage used and the data contained in the database, along with object, collection and index counters. Useful for checking and tracking the state and storage of a specific database,
  • db.printCollectionStats() - provides statistics with the count of objects in the collection, the size of the collection, the amount of disk space used by the collection, and information about the indexes.
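As an illustration of a custom alert built on those statistics, the sketch below checks connection usage. check_connections and the 80% threshold are assumptions for the example; the two input numbers correspond to the connections.current and connections.available fields of db.serverStatus().

```shell
# Print a warning when used connections reach a percentage threshold.
# Arguments: current connections, available connections, threshold in percent.
check_connections() {
  current=$1 available=$2 threshold=$3
  total=$(( current + available ))
  [ "$total" -gt 0 ] || { echo "no connection data" >&2; return 2; }
  used_pct=$(( current * 100 / total ))
  if [ "$used_pct" -ge "$threshold" ]; then
    echo "WARNING: ${used_pct}% of connections in use"
    return 1
  fi
  echo "OK: ${used_pct}% of connections in use"
}
```

The inputs can be fetched with the legacy shell, for example mongo --quiet --eval 'print(db.serverStatus().connections.current)', and the non-zero exit code makes the function easy to hook into cron or a monitoring agent.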

More information can be found on the official mongo website in the monitoring tools section. You should probably take a look at the administration interfaces as well.

Utilizing AWS EBS (Elastic Block Storage)

What is EBS?

“Amazon Elastic Block Store (EBS) provides block level storage volumes for use with Amazon EC2 instances. Amazon EBS volumes are network-attached, and persist independently from the life of an instance. Amazon EBS provides highly available, highly reliable, predictable storage volumes that can be attached to a running Amazon EC2 instance and exposed as a device within the instance. Amazon EBS is particularly suited for applications that require a database, file system, or access to raw block level storage.”

Source: http://aws.amazon.com/ebs/

Why and how to use EBS

EBS provides persistent data storage, relieving you of the responsibility of making constant backups. It can be attached to any EC2 instance and mounted into your file system to be used like any other part of it. There is no doubt about its usefulness, but it is really important to note that not all EC2 instances are EBS optimized. If you are thinking about building your stack on small or medium EC2 instances, using EBS for software that requires fast access to your drive would be a mistake. You should still use it, as it is a really cost-effective way to store data, but keep in mind that this whole section has been written for those who are using Large (or more powerful) EC2 instances.

System configuration with EBS

Important: Read the previous paragraph to make sure you know what you are doing!

You can create and attach an EBS volume using the AWS console. Once that is done, start by creating all the required folders, which we will use with the mounted EBS volume to store data files, logs and libs.

Find the mongo database location:

Next, stop the mongod service and move the mongo data folder to the ebs/data folder. IMPORTANT: Remember to replace /var/lib/mongo with your own path.

Let’s move the log files to the EBS drive as well.

And finally we can recreate all the directories and point everything back to where it belongs.

And to make sure they are recreated after a system restart, let’s add them to /etc/fstab

And add to the bottom of the file
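The exact lines depend on how you pointed the directories back. As a sketch, assuming the folders were re-attached with bind mounts and the EBS volume is mounted at /ebs (both are assumptions, as is the fstab_bind_entry helper name), an entry can be generated like this:

```shell
# Print an /etc/fstab entry that bind-mounts a directory from the EBS volume
# over the path MongoDB expects. Fields: source, target, type, options, dump, pass.
fstab_bind_entry() {
  printf '%s %s none bind 0 0\n' "$1" "$2"
}

fstab_bind_entry /ebs/data /var/lib/mongo
# prints: /ebs/data /var/lib/mongo none bind 0 0
```

The output can be appended with sudo tee -a /etc/fstab and checked with sudo mount -a before rebooting. The EBS volume itself needs its own entry earlier in the file so it is mounted before the bind mounts.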

Finally, let’s make sure dbpath points to the right place. Open the config file with your favourite editor (you should try vim if you haven’t yet):

Find line starting with dbpath and replace it with:

Now mongod can be restarted

Lastly we can test if everything works fine by restarting the server

This entry is part 8 of 8 in the series AWS - Nginx Node.js MongoDB