
I'm looking to reshape the Documents in one of my collections, and have found two ways to do it, but need guidance. For simplicity, say I have a collection, "myColl", and I need to reshape Documents that look like this:

{
x:"foo",
y:"bar"
}

To:

{
nest: {
x: "foo",
y: "bar"
}
}

This can be accomplished by using the aggregation framework to reshape the documents, and then rewrite the entire collection. When run against a test collection of about 150K records, the following takes roughly 5 seconds:
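The snippet itself is not reproduced here. A minimal sketch of what such an aggregation might look like, assuming a $project stage builds the nested sub-document and a $out stage writes the result back over "myColl" (the stage choice and the variable name reshapePipeline are assumptions, not taken from the question):

```javascript
// Sketch only: the stage choice ($project + $out) is an assumption.
var reshapePipeline = [
  // Build the new shape: copy x and y under a "nest" sub-document.
  // _id is carried through by default.
  { $project: { nest: { x: "$x", y: "$y" } } },
  // Replace myColl with the pipeline output once the aggregation finishes.
  { $out: "myColl" }
];

// In the mongo shell this would be run as:
//   db.myColl.aggregate(reshapePipeline);
```

With $out the server writes the results to a collection and swaps it in under the target name, which is presumably what the "new collection" remark refers to.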

I'm leaning towards the aggregation approach for performance reasons; however, someone mentioned here that it is creating a "new" collection with somewhat of a negative connotation, but it's not entirely apparent as to why. Are there causes of concern that I should be aware of other than the Type safety mentioned in that comment?

Also, if the cursor approach is better, how might I speed up its execution? Setting the "w" param of WriteConcern to 0 doesn't do anything in my test because everything is hosted on the same box, so skipping the acknowledgement doesn't save me any time. It is also orthogonal to the fact that the aggregation executes orders of magnitude faster.
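For reference, a cursor-based reshape could be sketched roughly as follows. The helper name reshapeWithCursor, the batch size, and the use of bulkWrite are all assumptions for illustration; sending the updates in batches is one common way to cut the number of round trips, though it may not close the gap with the aggregation, since every document still travels to the client and back.

```javascript
// Sketch only: reshapeWithCursor and batchSize are hypothetical names.
function reshapeWithCursor(collection, batchSize) {
  var ops = [];
  var written = 0;
  // Only visit documents that have not been reshaped yet.
  collection.find({ nest: { $exists: false } }).forEach(function (doc) {
    ops.push({
      updateOne: {
        filter: { _id: doc._id },
        // Move x and y under "nest" and drop the top-level fields.
        update: {
          $set: { nest: { x: doc.x, y: doc.y } },
          $unset: { x: "", y: "" }
        }
      }
    });
    // Flush a full batch in one request instead of one write per document.
    if (ops.length === batchSize) {
      collection.bulkWrite(ops, { ordered: false });
      written += ops.length;
      ops = [];
    }
  });
  // Flush whatever is left over.
  if (ops.length > 0) {
    collection.bulkWrite(ops, { ordered: false });
    written += ops.length;
  }
  return written;
}

// In the mongo shell this might be called as:
//   reshapeWithCursor(db.myColl, 1000);
```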

1 Answer

Let's start with aggregation. As per the MongoDB documentation, aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result.
The aggregation pipeline can use indexes to improve its performance during some of its stages. In addition, the aggregation pipeline has an internal optimization phase.

The most basic pipeline stages provide filters that operate like queries and document transformations that modify the form of the output document.

The pipeline provides efficient data aggregation using native operations within MongoDB, and is the preferred method for data aggregation in MongoDB.

MongoDB also provides map-reduce operations to perform aggregation. In general, map-reduce operations have two phases: a map stage that processes each document and emits one or more objects for each input document, and a reduce phase that combines the output of the map operation. Optionally, map-reduce can have a finalize stage to make final modifications to the result. Like other aggregation operations, map-reduce can specify a query condition to select the input documents as well as sort and limit the results.

Note: Starting in MongoDB 2.4, certain mongo shell functions and properties are inaccessible in map-reduce operations. MongoDB 2.4 also provides support for multiple JavaScript operations to run at the same time. Before MongoDB 2.4, JavaScript code executed in a single thread, raising concurrency issues for map-reduce.

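MongoDB also provides single-purpose aggregation operations such as db.collection.count() and db.collection.distinct(). All of these operations aggregate documents from a single collection. While these operations provide simple access to common aggregation processes, they lack the flexibility and capabilities of the aggregation pipeline and map-reduce. For example: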

> db.orders.distinct("cust_id")
[ "A123", "B212" ]

Cursor

As per the MongoDB documentation (Iterate a Cursor in the mongo Shell), the db.collection.find() method returns a cursor. To access the documents, you need to iterate the cursor. However, in the mongo shell, if the returned cursor is not assigned to a variable using the var keyword, then the cursor is automatically iterated up to 20 times to print up to the first 20 documents in the results.

The following examples describe ways to manually iterate the cursor to access the documents or to use the iterator index.

Manually Iterate the Cursor

var myCursor = db.orders.find( { cust_id: "A123" } );
myCursor

You can use the cursor method forEach() to iterate the cursor and access the documents, as in the following example:
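A minimal sketch of such a loop. The helper collectCustIds is hypothetical; the orders collection and cust_id field are borrowed from the distinct example above:

```javascript
// Sketch only: collectCustIds is a hypothetical helper.
// It works on anything that exposes forEach(), such as a shell cursor.
function collectCustIds(cursor) {
  var ids = [];
  cursor.forEach(function (doc) {
    ids.push(doc.cust_id);
  });
  return ids;
}

// In the mongo shell:
//   collectCustIds(db.orders.find({ cust_id: "A123" }));
// or, to simply print each document:
//   db.orders.find({ cust_id: "A123" }).forEach(printjson);
```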

The toArray() method loads into RAM all documents returned by the cursor; the toArray() method exhausts the cursor.

Cursor Behaviors

Closure of inactive cursors: by default, the server will automatically close the cursor after 10 minutes of inactivity, or if the client has exhausted the cursor. To override this behavior in the mongo shell, you can use the cursor.noCursorTimeout() method:

var myCursor = db.orders.find().noCursorTimeout();

After setting the noCursorTimeout option, you must either close the cursor manually with cursor.close() or exhaust the cursor’s results.

In summary, aggregation operations group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result.
The pipeline provides efficient data aggregation using native operations within MongoDB, and is the preferred method for data aggregation in MongoDB.

A cursor, by contrast, must be iterated to access the documents: in the mongo shell, if the returned cursor is not assigned to a variable using the var keyword, it is automatically iterated up to 20 times to print up to the first 20 documents in the results.
The toArray() method loads into RAM all documents returned by the cursor and exhausts the cursor.

Thanks for the obvious time that you put into your answer. Although I think you did a good job describing aggregation and cursors and how to work with them, I was unable to find the answer to my question in your post. I'm trying to find some intuition on why aggregation is an order of magnitude faster in my example, and whether that is to be expected or is circumstantial. Ultimately, I'd like to develop intuition on what's the most efficient (quick) way to reshape documents in a large collection.
– Tung Oct 31 '18 at 18:04