SharePoint Solutions in the Cloud era for the IT Pro

SharePoint infrastructure – Best Practices

This time I have decided to share our best practices material for SharePoint. Yes, I know, some of you SharePoint-savvy readers would question its value, as SharePoint is SOOO broad and infrastructure is just one pillar. TRUE! However, it's the foundation, and as such you have to invest there first before building your service out of web applications, sites, web parts, workflows, dashboards and… you get the idea :-)

So here's a summary of the attached presentation, which James Baldwin and I presented back at EMC World 2011 in Vegas.

I've also added a lot of links and references so you can find the technical material you need for planning and deployment activities.

Please feel free to comment; I would love to get your feedback on what works (and what doesn't) in your environment.

Servers and Virtualization

Virtualizing SharePoint servers has the same benefits as virtualizing any other application or database server in your datacenter, but there's actually more to it than “just” that:

SharePoint is a perfect candidate for horizontal scaling, meaning scaling out server roles as more resources are required. You can actually get some performance benefits when Web servers (front/back end) are broken into multiple instances by scaling out rather than up, using the same hardware resources. In many other cases I find that this works for the application and SQL Server roles too.

Don't get intimidated by the hypervisor overhead (<10%); again, you can always distribute content databases across multiple SQL instances and partition the index across multiple crawl/index servers. All server roles in SharePoint 2010 can scale out!
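For a rough sense of what that overhead costs, here is a back-of-the-envelope sketch. It treats the <10% figure as a flat 10% CPU tax (a simplifying assumption; real overhead depends on the hypervisor, workload and configuration), assumes a hypothetical 32-core host, and sizes scaled-out VMs without vCPU overcommit. The function names are illustrative only.

```python
def effective_cores(physical_cores: int, overhead: float = 0.10) -> float:
    """Core-equivalents left for guest VMs after a flat hypervisor overhead.

    The 10% default is the upper bound quoted in the text, applied as a flat
    CPU tax; that is a simplifying assumption, not a measured value.
    """
    if not 0.0 <= overhead < 1.0:
        raise ValueError("overhead must be in [0, 1)")
    return physical_cores * (1.0 - overhead)


def vms_per_host(physical_cores: int, vcpus_per_vm: int, overhead: float = 0.10) -> int:
    """Number of equally sized VMs that fit without overcommitting vCPUs."""
    if vcpus_per_vm <= 0:
        raise ValueError("vcpus_per_vm must be positive")
    return int(effective_cores(physical_cores, overhead) // vcpus_per_vm)


# Hypothetical 32-core host scaled out into 4-vCPU web front ends.
print(effective_cores(32))  # 28.8
print(vms_per_host(32, 4))  # 7
```

Even at the worst-case 10%, this hypothetical host keeps roughly 29 cores' worth of CPU for scaled-out SharePoint VMs.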

Some would recommend leaving the index and/or SQL servers physical. We have proved many times that ALL server roles can be virtualized without any problem, as long as you balance processing (CPU) pressure across multiple virtual machines (I wouldn't be as worried about I/O throughput). Microsoft's general recommendation to dedicate at least 8 physical cores to a medium-sized farm is very generic, and I don't see it as a showstopper. If you take that guidance literally, I would suggest waiting for the next revision of vSphere (very soon) and Hyper-V (later this year or next year), which would support up to 32 vCPUs.

Plan for USER LOAD peaks, not for system-generated peaks. From what we have observed in lab tests and actual customer data, SharePoint's regular timer jobs are responsible for most of the I/O and CPU peaks.
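One way to act on this advice is to separate monitoring samples taken while timer jobs were running from the rest, and size against the user-driven samples only. The sketch below assumes you can already correlate each sample with timer-job activity from your own monitoring; the `Sample` type, the nearest-rank 95th percentile and the numbers in the example are all hypothetical, not measured data.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Sample:
    minute: int       # minutes since the start of the monitoring window
    iops: float       # observed IOPS during that minute
    timer_job: bool   # True if a SharePoint timer job was running at the time


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, pct in (0, 100]."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def user_load_peak(samples: list[Sample], pct: float = 95.0) -> float:
    """Sizing target computed only from samples outside timer-job windows."""
    return percentile([s.iops for s in samples if not s.timer_job], pct)


# Hypothetical samples: the two spikes coincide with timer jobs.
samples = [
    Sample(0, 400, False),
    Sample(1, 450, False),
    Sample(2, 2600, True),
    Sample(3, 500, False),
    Sample(4, 2400, True),
    Sample(5, 420, False),
]
print(percentile([s.iops for s in samples], 95))  # 2600
print(user_load_peak(samples))                    # 500
```

The design choice is simply to keep timer-job samples out of the sizing percentile; you would still want to confirm those jobs are scheduled away from user peak hours.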

Virtual SharePoint farm - Reference Architecture

The configuration above supported more than 20,000 users at 10% concurrency using a cluster of only three Dell R910 ESX hosts.
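The arithmetic behind those figures, as a quick sketch: 10% of 20,000 users is 2,000 concurrent users, or roughly 667 per host if the load spreads evenly across the three hosts. Even distribution is an assumption that real farms only approximate, and the "one host down" line is an illustrative extra, not a tested scenario from the reference architecture.

```python
import math


def concurrent_users(total_users: int, concurrency: float) -> int:
    """Users active at the same time, given a concurrency ratio (0.10 = 10%)."""
    return round(total_users * concurrency)


def users_per_host(concurrent: int, hosts: int) -> int:
    """Concurrent users each host carries if load is spread evenly (rounded up)."""
    return math.ceil(concurrent / hosts)


active = concurrent_users(20_000, 0.10)  # figures from the reference architecture
print(active)                     # 2000
print(users_per_host(active, 3))  # 667
print(users_per_host(active, 2))  # 1000, e.g. with one host out for maintenance
```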

Storage Planning

Of course, in most cases you have to plan for performance and not capacity, and definitely so for SharePoint farms in production environments. Before getting into details, here's a good picture of where SharePoint data is located:

To point the finger at the “hottest” areas in terms of I/O, I would rank them in the following order of importance (IOPS and latency):

Search databases (Crawl and Property) data and log files

Query Server/s – Query component/s

tempdb data and log files

Database log files

From our lab tests, this is what we found in terms of I/O sizes and R/W ratios. As you can see, this is not your typical OLTP or OLAP profile. SharePoint workloads vary, but here's some data you might find useful when planning storage resources:

Microsoft suggests really high I/O requirements for the search components of the farm. While that is definitely true, I would question whether the suggested IOPS requirements apply to most environments. Here's what we have observed vs. TechNet recommendations:

TempDB – Yes (the same blocks are reused on disk, and tempdb performance directly affects SharePoint request performance, since tempdb is used in every SharePoint request)

Content databases/BLOB store – Maybe (depending on how diverse the workload is, e.g. whether some site collections tend to be busier than others)

CX/VNX Thin Provisioning and LUN Compression

BLOB Storage (RBS)

We work with Metalogix StoragePoint for BLOB externalization. Currently all EMC storage solutions are supported with StoragePoint, as it has connectors for file, block and object storage. Some of the possible BLOB stores:

Symmetrix VMAX – Block

VNX – Block and/or File (NFS/CIFS)

Atmos/VE – Object (REST)

Centera – Centera API

Isilon – File (CIFS/NFS)

Data Domain – File (CIFS/NFS)
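If your storage design already standardizes on a particular access method, the list above can be treated as a simple lookup to shortlist candidate BLOB stores. This sketch just encodes that list; it is not StoragePoint's own configuration model, and the dictionary and function names are hypothetical.

```python
# Access methods per BLOB store, taken from the list above.
BLOB_STORES: dict[str, set[str]] = {
    "Symmetrix VMAX": {"Block"},
    "VNX": {"Block", "NFS", "CIFS"},
    "Atmos/VE": {"REST"},
    "Centera": {"Centera API"},
    "Isilon": {"CIFS", "NFS"},
    "Data Domain": {"CIFS", "NFS"},
}


def stores_supporting(access: str) -> list[str]:
    """Alphabetical list of BLOB stores reachable through the given access method."""
    return sorted(name for name, methods in BLOB_STORES.items() if access in methods)


print(stores_supporting("NFS"))    # ['Data Domain', 'Isilon', 'VNX']
print(stores_supporting("Block"))  # ['Symmetrix VMAX', 'VNX']
```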

While each could make sense for our customers depending on their general storage design and requirements, please consider the following guidelines:

While Microsoft offers some guidance on SharePoint availability, covered in Plan for availability (SharePoint Server 2010), there's a lot more involved in achieving true and complete SharePoint availability across multiple sites. Storage-based replication can accelerate the failover process and can scale as your farm's storage needs grow. The most basic availability methods are SQL Server log shipping and/or database mirroring, but while effective for smaller configurations, they still lack complete farm protection: the only components that can be continuously protected with database mirroring/log shipping are SQL databases, and not all of them! What about the index? The WFEs? The application servers?

SharePoint DR involves a lot, but it can be significantly simplified by virtualizing all server roles, which provides end-to-end mobility of SharePoint farm services without worrying about the BLOB file system, individual databases, index partitions etc. When choosing storage-based replication, the first thing to consider is placing all SharePoint volumes in a single consistency group. This is of great value, as it can guarantee end-to-end (I like that term :-) ) consistency at any point in time.
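To make the consistency-group rule concrete, here is a minimal sketch that flags any SharePoint volume replicated outside the main group. The volume and group names are hypothetical, and a real replication plan would come from your replication tooling rather than a hand-written dictionary. In this example the BLOB store sits in its own group, so after a failover the content databases and the externalized BLOBs they point to could be recovered from different points in time.

```python
from collections import defaultdict

# Hypothetical replication plan: SharePoint volume -> consistency group.
plan: dict[str, str] = {
    "sp-config-db": "CG_SharePoint",
    "sp-content-db-01": "CG_SharePoint",
    "sp-search-db": "CG_SharePoint",
    "sp-index": "CG_SharePoint",
    "sp-server-vms": "CG_SharePoint",
    "sp-blob-store": "CG_Files",  # misplaced on purpose for the example
}


def group_volumes(plan: dict[str, str]) -> dict[str, list[str]]:
    """Invert the plan into consistency group -> sorted list of volumes."""
    groups: dict[str, list[str]] = defaultdict(list)
    for volume, group in plan.items():
        groups[group].append(volume)
    return {group: sorted(vols) for group, vols in groups.items()}


def is_single_group(plan: dict[str, str]) -> bool:
    """True only if every SharePoint volume shares one consistency group."""
    return len(set(plan.values())) == 1


groups = group_volumes(plan)
print(is_single_group(plan))  # False
print(groups["CG_Files"])     # ['sp-blob-store']
```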

Here's a table to help you understand what is related to what, and how to go about the consistency grouping available with almost all EMC replication solutions (SRDF, RecoverPoint, MirrorView etc.):

While this is a great solution, that type of DR still involves manual failover, restart and configuration. That's why I always recommend virtualizing all server roles: it enables you to leverage automation solutions for virtual infrastructure, namely vCenter Site Recovery Manager (SRM) or multi-site Hyper-V clustering enabled by EMC Cluster Enabler (SRDF, RecoverPoint, MirrorView). If VPLEX is deployed, you won't even need Cluster Enabler; you can just rely on Windows Server Failover Clustering (VMs) to achieve that.

We have successfully tested and published several reference architectures, available on emc.com: