Going to production with Sitecore on Azure PaaS

Masi Malmi
20.3.2018

Almost one year has passed since I started my journey with Sitecore and Azure, and a lot has happened since. I’ve had the privilege to work on several Sitecore on Azure PaaS projects and therefore to widen my knowledge and expertise, as well as to work out some best practices along the way. As an example, I want to share with you how to build CI/CD pipelines in VSTS using modern DevOps techniques. What does it really take to run Sitecore on Azure PaaS?

VSTS pointers

Before jumping into the actual topic I want to cover a couple of pointers about VSTS which are prerequisites when looking into utilizing VSTS fully for Sitecore CI/CD pipelines.

Build and Release pipelines

Depending on your setup, the easiest way to get started is to purchase one Hosted pipeline for the project. It costs $40/month and covers all your needs. If you host your App Services in an App Service Environment (ASE), you’ll need to use private agents instead, and that’s a whole other story. It’s also worth mentioning that only VSTS admins have the privilege to acquire these, so in most cases you’ll have to ask the owner of the VSTS instance to handle this for you.

Service Endpoint

Before you can run any CI/CD pipelines from VSTS to your Azure subscription, you need a Service Endpoint between the VSTS instance and the Azure subscription(s) used in the project. Creating these also requires VSTS admin privileges.

Custom VSTS tasks

If you’re using custom VSTS tasks in your CI/CD pipelines, you need to have the Administrator of all pools permission in the VSTS instance. This permission needs to be specifically granted to your account.

Session handling with Redis

There’s a long story behind this, but it all started with the very first load tests I ran against the to-be production website running on Azure PaaS. Although at that point there were still many features to be finished, I took note of some strange session-related errors that appeared when the website was under heavy load. Back then I figured they were just normal timeouts related to expired sessions or similar. I couldn’t really tell where the problem was, as it originated from Sitecore’s internals.

Then in December Sitecore published a knowledge base article addressing the very same issue. Now I realized other Sitecore partners were experiencing similar challenges on their Azure PaaS environments with the default Redis session handler. The article was quite extensive and included several suggestions on how to optimize the session handling to avoid the problems.

After applying all the feasible changes to our setup, the issue still persisted. I needed to find the root cause or we couldn’t go live with the website.

After a series of load tests and a very lengthy discussion with Sitecore support, we found out what the root cause behind the session handling issues was. Big thanks go to Valeriy Leman for helping us out!

It turned out that Azure’s Standard pricing tier WebApps are not suitable for a high-traffic website. The CPUs start to peak at about 150 to 200 concurrent users (per S3 instance), and once this happens Sitecore’s session handling begins to fail randomly. Scaling out the instance count does not bring a substantial benefit because the CPUs just aren’t powerful enough. Scaling up the Redis cache pricing tier did not have a big impact either. Sitecore’s official recommendations seem to have very little to do with actual production use; personally I would use them only for test and demo purposes.
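To put the 150 to 200 concurrent users per S3 instance in perspective, here is a rough back-of-the-envelope sketch. The target of 3000 concurrent users is a hypothetical figure chosen for illustration, and the calculation assumes load spreads evenly and scales linearly with instance count, which our tests suggest is optimistic. Treat the result as a lower bound, not a sizing recommendation.

```python
import math


def min_instances(concurrent_users: int, users_per_instance: int) -> int:
    """Optimistic lower bound on instance count.

    Assumes perfectly even load distribution and linear scaling,
    which real Standard tier instances did not deliver in our tests.
    """
    if concurrent_users < 0 or users_per_instance <= 0:
        raise ValueError("users must be >= 0 and capacity must be > 0")
    return math.ceil(concurrent_users / users_per_instance)


# Hypothetical target load, not a figure from the project.
TARGET_USERS = 3000

# Observed range where S3 CPUs started to peak (from the load tests above).
for per_instance in (150, 200):
    needed = min_instances(TARGET_USERS, per_instance)
    print(f"{per_instance} users/instance -> at least {needed} S3 instances")
```

Even under these generous assumptions the instance count grows quickly, which is one way to see why a more powerful tier was the more practical route for us.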

So we needed more powerful WebApps. But switching to App Service Environment v2 (ASE) would not have made much sense so close to go-live. Luckily there’s another option available these days: the Premium V2 pricing tier. Once we switched our production environment App Services to this pricing tier, the session handling issues were gone.

There’s one thing that the Sitecore knowledge base article does not cover though which is how to raise the threshold values for WORKER and IOCP threads in StackExchange.Redis. The default values are not sufficient for the website to be able to serve several thousand concurrent users. In Sitecore 8.2 the correct way to fix this is to

2. Configure the pipeline for the CD role and tune the thread pool values for your needs:

Be prepared for failovers – and beyond

WebApps are a very nice concept. But when it comes to unexpected failures in the application, there may be a need to route the traffic someplace else while fixing the immediate issue. In multi-tenant WebApps you don’t really have control over the load balancing because it happens transparently in the background. This is where Azure Traffic Manager (ATM) comes into play.

ATM has turned out to be a very useful service. It works on the DNS level and gives you the much-needed control to define secondary endpoints to which traffic is redirected in case the primary endpoint is failing.
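The failover idea can be sketched as a small model of priority-based routing: answer with the healthy endpoint that has the highest priority (lowest number). This is only an illustration of the concept, not the Traffic Manager API; the `Endpoint` class, the `resolve` function and the endpoint names are all hypothetical, and what the real service does when every endpoint is unhealthy is not modelled here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Endpoint:
    name: str       # hypothetical endpoint name
    priority: int   # lower number = preferred endpoint
    healthy: bool   # result of the latest health probe


def resolve(endpoints: Sequence[Endpoint]) -> Optional[Endpoint]:
    """Return the healthy endpoint with the lowest priority number.

    Returns None when no endpoint is healthy (a simplification).
    """
    healthy = [e for e in endpoints if e.healthy]
    return min(healthy, key=lambda e: e.priority, default=None)


# Primary is failing, so traffic should go to the secondary endpoint.
endpoints = [
    Endpoint("primary-webapp", priority=1, healthy=False),
    Endpoint("secondary-webapp", priority=2, healthy=True),
]
target = resolve(endpoints)
print(target.name if target else "no healthy endpoint")
```

Because the switch happens at the DNS level, clients only pick up the new target once their cached DNS answer expires, so the TTL you configure affects how quickly a failover takes effect.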

But it does not end here. It’s also very good to have if, for instance, you’re planning to upgrade to Sitecore 9 soon. There are so many infrastructure changes related to this exercise that I would much rather provision a new Sitecore 9 Azure PaaS environment, upgrade the existing solution to 9 and copy the data on top of that than even consider doing the upgrade on the existing Azure PaaS infrastructure. This is one of the main benefits of using Azure PaaS, so remember to take full advantage of it!

And for the record, it worked well when we had to replace the existing production environment with a new one. It was not possible to change the pricing tier of the existing environment from Standard to PremiumV2.

Azure Search and Sitecore indexing

This topic deserves some attention as it’s been one of the biggest pain points in our first Sitecore on Azure PaaS project. Sitecore uses the Azure Search (AS) service as the default indexing provider when run on Azure PaaS. Their documentation does mention some of the limitations related to this service, but it does not end there. When using this service in your Sitecore project, be aware of the following facts, which apply to Sitecore 8.2:

Exclude all custom templates from being included in the Sitecore OOB indexes

Azure Search service has a limit of 1000 fields per index and this level can be reached quite quickly if you don’t do anything about it

Azure PaaS does not support indexing media files

Azure Search is similar to Lucene in syntax but has many limitations compared to it

Changes to the limits in the commit policies are most likely needed, to avoid issues with AS size limits when Sitecore updates the indexes

Indexing strategy: remove indexing responsibilities from the CD role

In our project we moved them to CM

For the production environment: make sure to scale the Azure Search service replica count to three (3) for the 99.9% SLA
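To see why excluding custom templates matters for the 1000-field limit, here is a minimal sketch that counts the distinct field names an index would end up with. The template names and fields are made-up sample data, and the assumption that every template field becomes exactly one index field is a simplification of what Sitecore actually does.

```python
from typing import Mapping, Set

# Per-index field limit quoted in the list above.
FIELD_LIMIT = 1000


def count_index_fields(templates: Mapping[str, Set[str]], excluded: Set[str]) -> int:
    """Count distinct field names contributed by non-excluded templates."""
    fields: Set[str] = set()
    for template_name, field_names in templates.items():
        if template_name not in excluded:
            fields.update(field_names)
    return len(fields)


def headroom(field_count: int, limit: int = FIELD_LIMIT) -> int:
    """Fields still available before the index hits the limit."""
    return limit - field_count


# Hypothetical sample data, for illustration only.
templates = {
    "Standard template": {"title", "created", "updated"},
    "Custom Article": {"title", "body", "author"},
    "Custom Product": {"title", "sku", "price"},
}

all_in = count_index_fields(templates, excluded=set())
custom_out = count_index_fields(
    templates, excluded={"Custom Article", "Custom Product"}
)
print(f"all templates: {all_in} fields, headroom {headroom(all_in)}")
print(f"custom excluded: {custom_out} fields, headroom {headroom(custom_out)}")
```

In a real solution with hundreds of custom templates the same counting shows how quickly the out-of-the-box indexes approach the limit unless custom templates are kept out of them.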

Conclusion

Sitecore works quite well in Azure PaaS already. As long as you actively monitor your PaaS services and scale them as needed, you’re good to go. The number of active CD instances depends on your specific requirements, so make sure to run some load tests to find your baseline. As a rule of thumb, use either a P3V2 or an I3 (ASE) App Service plan for your production website and adjust your Redis cache tier to meet the concurrent user requirements. Azure also offers services for preventing DDoS attacks these days, so there’s no need for custom solutions there either.

We’re currently running Sitecore 8.2 update 4 and update 5 for our customers in Azure PaaS. Biggest challenges faced so far have had to do with Sitecore’s indexing and Azure Search in general, Sitecore’s event queues and session handling.

Looking ahead to Sitecore 9, there’s a major overhaul coming to the infra under the hood, but also some relief, as MongoDB can be replaced with Azure SQL for xDB, bringing Sitecore back to a full MS stack. A Sitecore 9 xDB environment requires over 10 Azure SQL Databases, so it would make sense to start utilizing Elastic Pools for those.

This has been only the beginning and there’s more to come in the coming months. I hope you enjoyed reading my latest blog post!