Let's share the knowledge

MongoDB

Securing application data is critical for any client and business, and the same principle applies to Sitecore applications.

One of the most important components of Sitecore is MongoDB, which is where all Experience Database (xDB) data is stored. MongoDB entered Sitecore’s ecosystem with Sitecore 7.5, and it is very important to make sure that the data stored in xDB is secured and that only authorized users have access to it.

Recently, we heard about thousands of MongoDB databases being hacked. So what is the reason behind it? It’s simple: none of those databases were configured securely, so anyone could access them.

It would be great if the MongoDB installation itself came with an option to secure the data at install time, as we do for SQL.

We can always go back and secure the data by setting up users, roles and permissions, but it is better to do it in the first place.

We also have to make sure that the connection string used for MongoDB is protected with credentials, so that only authorized users can access the database.
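As a minimal sketch of what a credential-protected connection string looks like, the JavaScript below assembles a MongoDB connection URI with percent-encoded credentials. The account name, password, host, port and database name are hypothetical placeholders, not values from a real Sitecore installation.

```javascript
// Build a MongoDB connection URI that carries credentials.
// Every value passed in below is a hypothetical placeholder.
function buildMongoUri({ user, password, host, port, database }) {
  // Percent-encode the user name and password so that characters such as
  // '@', ':' or '/' do not break the URI.
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  return `mongodb://${credentials}@${host}:${port}/${database}`;
}

const analyticsUri = buildMongoUri({
  user: "xdb_user",
  password: "p@ss:word/1",
  host: "localhost",
  port: 27017,
  database: "analytics",
});

console.log(analyticsUri);
```

Encoding the credentials matters because an unescaped "@" or ":" in a password would otherwise be read as part of the URI structure.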

In this blog post, I would like to cover the steps we can follow to make our Sitecore application more secure.

If we try to access MongoDB without passing valid credentials, we get this error in the log (please see the screenshot for reference):

Once we pass valid credentials, this error goes away.

It is always good practice to enable MongoDB authentication in the local environment as well. This builds the habit and helps us uncover any issues well in advance.

I hope this helps in understanding how we can secure and authenticate MongoDB, and how to create users and permissions for it.

There is a great article in the MongoDB documentation about setting up authentication, creating users and defining roles. Please consider reviewing it as well; it is a great source of information.

In part 1 of my MongoDB blog series, we discussed how to install MongoDB, and we used the “C” drive as the default installation location.

As a best practice, and considering a scalable environment, we shouldn’t use the “C” drive to store any application data. It should be reserved for system files, so that if we need to upgrade the OS or repair the existing installation, our application data is still safe.

We have done this for one of our implementations, and it is always better to do it in an early phase of development.

We can always go back and change the default data directory location for MongoDB in Sitecore, using the mongo config file we created when installing MongoDB. (Please refer to part 1 of the MongoDB blog series for more details.)

This is how it was before:

Let’s use another drive to store the data and logs for MongoDB; in this case it is “G”.

From the above screenshot we can see that the data and log folders now point to the “G” drive, and the rest of the configuration is unchanged.

If you installed MongoDB as a service, the service still points to the “C” drive, but your data and logs will be stored on the “G” drive.

If you want to copy existing data from “C” to “G” (in this example), make sure to stop the service, copy the data from “C” to “G”, and then restart the MongoDB service.

This is good from a maintenance perspective, and it is easier to manage afterwards.

I hope this helps someone, who is looking for something similar.

Thanks, and please let me know if you have any questions or feedback; I am happy to discuss more.

In the previous blog post we went through an introduction to MongoDB with Sitecore, its features and its installation. In this post we will go over the scaling options available in MongoDB, followed by an introduction to contacts and out-of-the-box queries.

Scaling:

There are three types of scaling:

Standalone environment

Vertical Scaling and

Horizontal Scaling

Standalone environment:

A standalone environment is an all-in-one configuration, where we install all xDB components on the same computer, which includes:

Content management server

Content delivery server

Database server

Reporting server

Collection server.

This is not an optimal production setup. It mostly resembles a development environment, where all components sit on the same workstation, and we can call this setup a “not scalable environment”.

Vertical Scaling:

Vertical scaling means adding more resources to a single node in the system, which typically involves adding or upgrading hardware on a single machine.

When we start moving towards a vertical setup, we tend to have a separate server for each component, i.e. separate servers for:

Database

Content management

Content delivery and

Reporting server

If we see that a specific component requires a hardware upgrade, we can scale up just that component without touching any other server, and in this way we can scale the complete Sitecore system.

Horizontal Scaling:

Vertical scaling lets us scale each component of the system, but what if we have just one content delivery server and, because of some server issue, we lose all the data on that server?

In this case, even if we have scaled up the content delivery server by upgrading its disk size, RAM and other components as required, that cannot help us if something goes wrong with that specific server, and the result is data loss.

In this scenario, we can resolve the issue by deploying multiple servers for the same components, which includes:

Multiple content management servers

Multiple content delivery servers

Multiple MongoDB(Analytics) servers

Separate session state server.

This type of setup protects against a single server going down. From a MongoDB perspective, we achieve this by adding multiple analytics servers, configured as a replica set.

By means of replication we achieve the following:

Availability

MongoDB provides high data availability with replica sets.

A replica set consists of two or more copies of the same data.

In a replica set, we set up an environment that defines a primary server, which is used to read and write the analytics information. All data from the primary (replicaset-1 in this example) is copied to replicaset-2 and replicaset-3, so the servers are kept in sync.

If something goes wrong with the replicaset-1 server, MongoDB internally makes either replicaset-2 or replicaset-3 the new primary for reading and writing, and this is how data availability is maintained.
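The three-server setup described above can be sketched as the configuration document that the mongo shell helper rs.initiate() accepts. This is only an illustration: the set name "rs0" and the port are assumptions, and the host names simply reuse the replicaset-1, replicaset-2 and replicaset-3 labels from the text.

```javascript
// Build a replica set configuration document of the shape accepted by
// rs.initiate() in the mongo shell. Set name and hosts are hypothetical.
function buildReplicaSetConfig(setName, hosts) {
  return {
    _id: setName,
    // Each member needs a unique numeric _id and a "host:port" address.
    members: hosts.map((host, index) => ({ _id: index, host })),
  };
}

const replicaSetConfig = buildReplicaSetConfig("rs0", [
  "replicaset-1:27017",
  "replicaset-2:27017",
  "replicaset-3:27017",
]);

console.log(JSON.stringify(replicaSetConfig, null, 2));
// In the mongo shell, connected to one of the members, the document would
// be passed as: rs.initiate(replicaSetConfig)
```

The configuration does not name a primary; the members elect one among themselves, which is what allows another server to take over when the current primary fails.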

Introduction to Contacts:

In xDB a contact is an individual visitor.

This visitor may be anonymous or may have been authenticated.

A contact is a combination of facets.

Contact Includes:

Identifiers

Personal Information

Email

Phone Number

Addresses

Identifying Contacts:

Contact identification is the process of connecting the current session, device and contact session to an identifier. This is implemented using the Identify() method which is part of the Sitecore Analytics tracker namespace.

Sitecore.Analytics.Tracker.Current.Session.Identify(identifier)

A contact is always identified by an identifier. An identifier is a string value that uniquely identifies a contact in relation to the website, and this value is always provided by the contact.

Identifiers can be one of the following:

User login

User id from third party system and/or

Email address

Here is a sample snippet which shows how we can validate the user in MongoDB:

MongoDB Queries:

Let’s look at two sample queries that fetch data from the out-of-the-box collections.

Consider a case where we have millions of records in the “Contacts” collection and want to get a specific contact record. We can add a filter on the first name, using the “Personal.FirstName” facet field.
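As a rough sketch of the first-name filter described above, the JavaScript below builds the filter document and notes in a comment how it would be used in the mongo shell. The exact field casing ("Personal.FirstName") and the sample value are assumptions for illustration.

```javascript
// Build the filter document for looking up contacts by first name.
// "Personal.FirstName" uses dot notation to reach the FirstName field
// inside the Personal facet; the exact field casing is an assumption.
function contactsByFirstName(firstName) {
  return { "Personal.FirstName": firstName };
}

const firstNameFilter = contactsByFirstName("Ankit"); // sample value only
console.log(JSON.stringify(firstNameFilter));
// In the mongo shell the same filter would be used as:
//   db.getCollection('Contacts').find(firstNameFilter)
```

With millions of records in the collection, a filter like this generally benefits from an index on the filtered field.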

Another example, if we want to find an identifier based on specific Id, we can use this query:

db.getCollection('Identifiers').find({"_id":"ANKIT"})

In the same way, we can also create custom collections and add documents to them using the Mongo Shell.

The beauty of this is that when we add a document to a collection that doesn’t exist yet, MongoDB creates the collection automatically. Documents in the same collection can also have different structures, which makes it more flexible.

Sitecore introduced MongoDB into its ecosystem to solve the problem of scaling analytics. Let’s try to understand MongoDB from a Sitecore perspective, and see how it is useful and where exactly it sits in the Sitecore system. This will be a series of three posts, starting with an introduction to MongoDB, its features and advantages, and setting it up in your environment.

In the second post we will talk about contacts and some of the out-of-the-box queries, and try to understand MongoDB collections. In the final post we will see how we can create custom facets to extend MongoDB functionality.

Introduction to MongoDB

Sitecore 7.5 introduced MongoDB as the main datastore for the Sitecore Experience Database (xDB). Sitecore xDB allows organizations to collect all of their customer interactions from all channels to create a comprehensive, single view of the customer that allows marketers to better optimize the customer experience in real-time.

Following are some of the features of MongoDB:

Open source

NoSQL

Document oriented database.

Primarily used for collecting data and information about visitors (for analytics)

Visitors and their interactions are written to MongoDB in JSON format, which is then processed by an aggregation pipeline into a format that is used for reporting.

There are several advantages and benefits with MongoDB, some of them are listed below:

Scalability

Standalone environment

Vertical Scaling and

Horizontal Scaling

Performance

Flexibility

Unstructured data and Schemas

Scaling is one of the critical features in Sitecore, and we will discuss the different scaling options in the next post of this series, where we will talk more about horizontal scaling and how MongoDB uses it to ensure the availability of data.