A “namespace” is the concatenation of the database name and
the collection name with a period
character in between.

Collections are containers for documents that share one or more
indexes. Databases are groups of collections stored on disk using a
single set of data files.

In the namespace acme.users, acme is the database
name and users is the collection name. Period characters can
occur in collection names, so acme.user.history is also a
valid namespace, with acme as the database name, and
user.history as the collection name.
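Since only the first period separates the database name from the collection name, a driver might split a namespace like this (a plain-JavaScript sketch; the function name is ours):

```javascript
// Split a namespace on the FIRST period only: everything before it
// is the database name, everything after it is the collection name.
function splitNamespace(ns) {
  const dot = ns.indexOf(".");
  return { db: ns.slice(0, dot), collection: ns.slice(dot + 1) };
}

splitNamespace("acme.users");        // { db: "acme", collection: "users" }
splitNamespace("acme.user.history"); // { db: "acme", collection: "user.history" }
```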

While data models like this appear to support nested collections, the
collection namespace is flat, and there is no difference from the
perspective of MongoDB between acme, acme.users, and
acme.records.

MongoDB flushes writes to disk on a regular interval. In the default
configuration, MongoDB writes data to the main data files on disk
every 60 seconds and commits the journal roughly every 100
milliseconds. These values are configurable with the
commitIntervalMs and syncPeriodSecs settings.

These values represent the maximum amount of time between the
completion of a write operation and the point when the write is
durable in the journal, if enabled, and when MongoDB flushes data to
the disk. In many cases MongoDB and the operating system flush data to
disk more frequently, so the above values represent a
theoretical maximum.
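For reference, the two intervals above map to the following mongod configuration options (a sketch in the YAML configuration-file format; both values shown are the defaults):

```yaml
storage:
  syncPeriodSecs: 60        # flush data to the data files every 60 seconds
  journal:
    enabled: true
    commitIntervalMs: 100   # commit the journal roughly every 100 ms
```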

However, by default, MongoDB uses a “lazy” strategy to write to
disk. This is advantageous: if the database receives a thousand
increments to an object within one second, MongoDB only needs to
flush this data to disk once. In addition to the aforementioned
configuration options, you can also use fsync and
Write Concern Reference to modify this strategy.

MongoDB does not have support for traditional locking or complex
transactions with rollback. MongoDB aims to be lightweight, fast, and
predictable in its performance. This is similar to the MySQL MyISAM
autocommit model. By keeping transaction support extremely simple,
MongoDB can provide greater performance especially for
partitioned or replicated
systems with a number of database server processes.

MongoDB does have support for atomic operations within a single
document. Given the possibilities provided by nested documents, this
feature provides support for a large number of use-cases.
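As an illustration, all modifiers in a single update document apply to one document atomically. The sketch below (plain JavaScript; the field names are hypothetical) only builds such an update document; against a live server you would pass it to update():

```javascript
// One update document: MongoDB applies $inc and $push to the matched
// document as a single atomic operation -- no reader ever sees the
// decremented quantity without the corresponding reservation.
const update = {
  $inc:  { quantity: -1 },
  $push: { reservations: { user: "alice", at: new Date() } },
};
```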

If you see a very large number of connection and re-connection messages
in your MongoDB log, then clients are frequently connecting and
disconnecting to the MongoDB server. This is normal behavior for
applications that do not use request pooling, such as CGI. Consider
using FastCGI, an Apache Module, or some other kind of persistent
application server to decrease the connection overhead.

If these connections do not impact your performance you can use the
run-time quiet option or the command-line option
--quiet to suppress these messages from the
log.

Each MongoDB document contains a certain amount of overhead. This
overhead is normally insignificant but becomes significant if all
documents are just a few bytes, as might be the case if the documents
in your collection only have one or two fields.

Consider the following suggestions and strategies for optimizing
storage utilization for these collections:

Use the _id field explicitly.

MongoDB clients automatically add an _id field to each document
and generate a unique 12-byte ObjectId for the _id
field. Furthermore, MongoDB always indexes the _id field. For
smaller documents this may account for a significant amount of
space.

To optimize storage use, users can specify a value for the _id field
explicitly when inserting documents into the collection. This
strategy allows applications to store a value in the _id field
that would have occupied space in another portion of the document.

You can store any value in the _id field, but because this value
serves as a primary key for documents in the collection, it must
uniquely identify them. If the field’s value is not unique, then it
cannot serve as a primary key as there would be collisions in the
collection.
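For example (a sketch; the choice of key is hypothetical), an application that would otherwise store a unique username in its own field can store it in _id instead, saving both the 12-byte ObjectId and the extra field name:

```javascript
// Instead of { _id: ObjectId(...), username: "alice", score: 3.9 },
// use the unique value itself as the primary key:
const doc = { _id: "alice", score: 3.9 };
```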

Use shorter field names.

MongoDB stores all field names in every document. For most
documents, this represents a small fraction of the space used by a
document; however, for small documents the field names may represent
a proportionally large amount of space. Consider a collection of
documents that resemble the following:

{ last_name : "Smith", best_score : 3.9 }

If you shorten the field named last_name to lname and the
field named best_score to score, as follows, you could save 9
bytes per document.

{ lname : "Smith", score : 3.9 }

Shortening field names reduces expressiveness and does not provide
considerable benefit for larger documents or where document
overhead is not of significant concern. Shorter field names do not
reduce the size of indexes, because indexes have a predefined
structure.

In general it is not necessary to use short field names.
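The 9-byte figure above is simply the difference in field-name lengths, since BSON stores each field name inline in every document; a quick check:

```javascript
// Bytes saved per document = total shortening of the field names.
const saved = ("last_name".length - "lname".length) +
              ("best_score".length - "score".length);
console.log(saved); // 9
```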

Embed documents.

In some cases you may want to embed documents in other documents
and save on the per-document overhead.

For documents in a MongoDB collection, you should always use
GridFS for storing files larger than 16 MB.

In some situations, storing large files may be more efficient in a
MongoDB database than on a system-level filesystem.

If your filesystem limits the number of files in a directory, you can
use GridFS to store as many files as needed.

GridFS is also useful when you want to keep your files and metadata
automatically synced and deployed across a number of systems and
facilities. When using geographically distributed replica sets,
MongoDB can distribute files and their metadata automatically to a
number of mongod instances and facilities.

When you want to access information from portions of large
files without having to load whole files into memory, you can use
GridFS to recall sections of files on demand.

Do not use GridFS if you need to update the content of the entire file
atomically. As an alternative you can store multiple versions of each
file and specify the current version of the file in the metadata. You
can update the metadata field that indicates “latest” status in an
atomic update after uploading the new version of the file, and later
remove previous versions if needed.

Furthermore, if your files are all smaller than the 16 MB
BSONDocumentSize limit, consider storing the file manually
within a single document. You may use the BinData data type to store
the binary data. See your driver’s
documentation for details on using BinData.

Here, my_query will have a value such as { name : "Joe" }. If
my_query contained special characters, for example ,, :,
and {, the query simply wouldn’t match any documents. For
example, users cannot hijack a query and convert it to a delete.

You must exercise care in these cases to prevent users from
submitting malicious JavaScript.

Fortunately, you can express most queries in MongoDB without
JavaScript and for queries that require JavaScript, you can mix
JavaScript and non-JavaScript in a single query. Place all the
user-supplied fields directly in a BSON field and pass
JavaScript code to the $where field.
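For instance, a query can keep the user-supplied value in an ordinary BSON field while the JavaScript in $where stays a fixed string (a sketch; the field names are illustrative):

```javascript
// The user controls only the *value* of name; the JavaScript in
// $where is a constant the user cannot alter.
function buildQuery(userName) {
  return { name: userName, $where: "this.credits > this.debits" };
}
```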

If you need to pass user-supplied values in a $where clause,
you may escape these values with the CodeWScope mechanism. When you
set user-submitted values as variables in the scope document, you can
avoid evaluating them on the database server.

Field names in MongoDB’s query language have semantic meaning. The
dollar sign (i.e., $) is a reserved character used to represent
operators (e.g., $inc). Thus,
you should ensure that your application’s users cannot inject operators
into their inputs.

In some cases, you may wish to build a BSON object with a
user-provided key. In these situations, you will need to substitute
the reserved $ and . characters in the key. Any character is
sufficient, but consider using the Unicode full width equivalents:
U+FF04 (i.e. “＄”) and U+FF0E (i.e. “．”).
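A minimal escaping helper along these lines (a sketch; the function name is ours):

```javascript
// Replace the reserved characters in a user-provided key with their
// Unicode full width equivalents: $ -> U+FF04, . -> U+FF0E.
function escapeKey(key) {
  return key.replace(/\$/g, "\uFF04").replace(/\./g, "\uFF0E");
}

escapeKey("$where");    // "＄where"
escapeKey("user.name"); // "user．name"
```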

Consider the following example:

BSONObj my_object = BSON( a_key << a_name );

The user may have supplied a $ value in a_key. At the same
time, my_object might be { $where : "things" }. Consider the
following cases:

Insert. Inserting this into the database does no harm. The
insert process does not evaluate the object as a query.

Update. The update() operation permits $ operators
in the update argument but does not support the
$where operator. Still, some users
may be able to inject operators that can manipulate a single
document only. Therefore your application should escape keys, as
mentioned above, if reserved characters are possible.

Query. Generally this is not a problem for queries that
resemble { x : user_obj }: dollar signs are not top level and
have no effect. Theoretically it may be possible for the user to
build a query themselves. But checking the user-submitted content for
$ characters in key names may help protect against this kind
of injection.
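A simple recursive check for operator-like key names might look like this (a sketch; rejecting rather than escaping such input is one possible policy):

```javascript
// Return true if any key at any depth starts with "$".
function hasOperatorKeys(value) {
  if (value === null || typeof value !== "object") return false;
  return Object.keys(value).some(
    (k) => k.startsWith("$") || hasOperatorKeys(value[k])
  );
}

hasOperatorKeys({ x: { name: "Joe" } });           // false
hasOperatorKeys({ x: { $where: "sleep(1000)" } }); // true
```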

MongoDB implements a readers-writer lock. This means that
at any one time, only one client may be writing or any number
of clients may be reading, but that reading and writing cannot
occur simultaneously.

In standalone instances and replica sets, the lock’s scope
applies to a single mongod instance or primary
instance. In a sharded cluster, locks apply to each individual shard,
not to the whole cluster.

The comparison treats a non-existent field as it would an empty BSON
Object. As such, a sort on the a field in documents { } and
{ a : null } would treat the documents as equivalent in sort order.

With arrays, a less-than comparison or an ascending sort compares the
smallest element of arrays, and a greater-than comparison or a
descending sort compares the largest element of the arrays. As such,
when comparing a field whose value is a single-element array (e.g. [1]) with non-array fields (e.g. 2), the comparison is between
1 and 2. A comparison of an empty array (e.g. []) treats
the empty array as less than null or a missing field.
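The ascending-sort rule can be sketched as a key function over numeric values (plain JavaScript; -Infinity stands in for the empty array's "less than anything" behavior, an approximation of BSON ordering):

```javascript
// For an ascending sort, an array compares by its smallest element;
// an empty array sorts before everything else.
function ascSortKey(v) {
  if (!Array.isArray(v)) return v;
  return v.length === 0 ? -Infinity : Math.min(...v);
}

[[3], [1, 5], 2, []].sort((a, b) => ascSortKey(a) - ascSortKey(b));
// -> [ [], [1, 5], 2, [3] ]
```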

The { cancelDate : null } query matches documents that either
contain the cancelDate field whose value is null or that
do not contain the cancelDate field. If the queried index is
sparse, however, then the query will only match
null values, not missing fields.
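If you need to distinguish the two cases, the standard operators can express each one separately (a sketch of the query documents; $type 10 is the BSON null type):

```javascript
// Matches only documents where cancelDate exists and is null.
const nullValueOnly = { cancelDate: { $type: 10 } };

// Matches only documents that lack the cancelDate field entirely.
const missingOnly = { cancelDate: { $exists: false } };
```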

Changed in version 2.6: If using the sparse index results in an incomplete result, MongoDB will not
use the index unless a hint() explicitly specifies the
index. See Sparse Indexes for more information.

As an alternative, if your collection has a field or fields that are
never modified, you can use a unique index on this field or these
fields to achieve a similar result as the snapshot().
Query with hint() to explicitly force the query to use
that index.

As a cursor returns documents, other
operations may interleave with the query. If some of these
operations are updates that cause the
document to move (in the case of a table scan, caused by document
growth) or that change the indexed field on the index used by the
query, then the cursor may return the same document more than
once.

In general, embedding is suitable for one-to-many relationships
when the “many” objects always appear with or are viewed in the
context of their parents.

You should also consider embedding for performance reasons if you have
a collection with a large number of small documents. Nevertheless, if
small, separate documents represent the natural model for the data,
then you should maintain that model.

If, however, you can group these small documents by some logical
relationship and you frequently retrieve the documents by this
grouping, you might consider “rolling-up” the small documents into
larger documents that contain an array of embedded documents. Keep in mind
that if you often only need to retrieve a subset of the documents
within the group, then “rolling-up” the documents may not provide
better performance.

“Rolling up” these small documents into logical groupings means that queries to
retrieve a group of documents involve sequential reads and fewer random disk
accesses.

Additionally, “rolling up” documents and moving common fields to the
larger document benefit the index on these fields. There would be fewer
copies of the common fields and there would be fewer associated key
entries in the corresponding index. See Index Concepts for more
information on indexes.
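A roll-up along these lines can be sketched in plain JavaScript (the field names are hypothetical): many small per-message documents become one document per conversation that holds an embedded array:

```javascript
// Group small documents by a logical key into one larger document
// containing an array of embedded documents.
function rollUp(messages) {
  const groups = new Map();
  for (const m of messages) {
    if (!groups.has(m.conversationId)) {
      groups.set(m.conversationId, { _id: m.conversationId, messages: [] });
    }
    groups.get(m.conversationId).messages.push({ from: m.from, text: m.text });
  }
  return [...groups.values()];
}
```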

Begin by reading the documents in the Data Models
section. These documents contain a high level introduction to data
modeling considerations in addition to practical examples of data
models targeted at particular issues.


An update can cause a document to move on disk if the document grows in
size. To minimize document movements, MongoDB uses
padding.

You should not have to pad manually because by default, MongoDB uses
Power of 2 Sized Allocations to add padding automatically. The Power of 2 Sized Allocations
ensures that MongoDB allocates document space in sizes that are powers
of 2, which helps ensure that MongoDB can efficiently reuse free space
created by document deletion or relocation as well as reduce the
occurrences of reallocations in many cases.

However, if you must pad a document manually, you can add a
temporary field to the document and then $unset the field,
as in the following example.

Warning

Do not manually pad documents in a capped
collection. Applying manual padding to a document in a capped
collection can break replication. Also, the padding is not
preserved if you re-sync the MongoDB instance.

var myTempPadding = [ "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                      "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                      "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                      "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" ];

db.myCollection.insert( { _id: 5, paddingField: myTempPadding } );

db.myCollection.update( { _id: 5 },
                        { $unset: { paddingField: "" } } )

db.myCollection.update( { _id: 5 },
                        { $set: { realField: "Some text that I might have needed padding for" } } )