Architecture, Design and Strategy – LessthanDot: A Technical Community for IT Professionals
(Atom feed, retrieved 2017-08-07)

damber – http://blogs.lessthandot.com/?p=8651 – published 2017-05-30

So, you’re a developer? Or an architect? Or maybe even a technical delivery manager?

Great

I need you!

I’m currently doing some research for an upcoming book, and for a new software product, and who better to ask than those that are building and delivering software solutions each and every day?

So I’d like you to share the wisdom of your experience, by filling in a survey that will take you about 10–15 minutes to complete, about your experiences with designing, building, running and managing software. Start-ups and Enterprises both welcome.

There is a prize draw for an Amazon Echo, which you can opt into, as a thank-you for taking the time.

So, if you design, build or manage software on a regular basis, I would like to hear from you…

Eli Weinstock-Herman (tarwn) – http://blogs.lessthandot.com/?p=4228 – published 2015-11-18

One of the benefits of Microsoft SQL Azure over an on-premises or VM installation is built-in resiliency. In a typical on-premises/VM installation your database lives on a single server, with all the single points of failure that brings to mind. SQL Azure, on the other hand, always has 3 or more replicas assigned to each database. This allows it to weather issues like network glitches and commodity hardware failures with no administration and little to no downtime.

There are interactive simulations below! Skip ahead if you just want to play with them; read through if you want more of the details.

Finding good, detailed articles about this has been difficult. Here are a couple I found:

What really interested me was the database communications. How do reads flow into the database when there’s 3 of them? How do writes occur when one of my database replicas is down? How does a replica catch back up when it is available again?

I learn well from reading, but had to reread the articles a few times over the years before the information really stuck. So this post is an attempt to approach the topic from another direction, with active simulations of how this communications works in SQL Azure.

Note: When the two disagree, I’ll rely on the slightly less out-of-date top article. When my practical (aka, support tickets) experience disagrees with both, I’ll point it out.

Note 2: I suspect the simulations below will make this a mobile-unfriendly post, sorry.

Key Details of SQL Azure

Before we start answering the questions above, let’s extract some details from those dense articles to set the stage.

There are actually several layers of systems involved in SQL Azure; this post is going to focus just on the database operations. I’ll point out when the “fabric” is involved, but it won’t be part of the simulations. That being said, here are the key details for the database:

There are a minimum of 3 database replicas at all times

All incoming traffic goes to the Primary replica (elected by the “fabric”)

Replicas exist on different physical servers (created/managed by the “fabric”)

Database Writes require a quorum of 2 of the 3 replicas acknowledging the write in order to COMMIT

Database Reads return directly from the Primary replica

There is support for both transactional and full restores

Each “data node” in the network includes the SQL Server processes involved in the items above, as well as services for failure detection, re-establishing nodes after failure, throttling, and so on. I won’t be diving into those today; this is all about the database replica.

Some Warnings:

As far as I know, Azure SQL does not use HTTP codes internally. I used them in the simulations below because I thought they would be more recognizable than me making something up.

SQL Server is not limited to key/value or single-statement operations; this is a simplification I made so I could focus on the mechanics of the communications instead of diving into the MSSQL storage engine.

How do writes work?

Writes in SQL Azure come through a TDS gateway that, transparent to us, passes our queries to the Primary replica. The replica determines what the change will be from our operation, assigns a Change Sequence Number (CSN) to it, then replicates it to the secondary replicas. The Primary replica only commits the changes after it has received at least one acknowledgement back from the secondaries, ensuring the data is now on at least two replicas (the Primary and one secondary).

Press the “Run” button below to start sending writes from the “gateway” into the replicas.

What you’re seeing is a simulation of the writes I described above. Each replica has a set of data that has been stored and a short transaction log, and indicates whether it is the “PRIMARY” or “secondary” in its title bar.

The “gateway” in the top left sends each write to the PRIMARY replica. The PRIMARY replica calculates the storage change of the write, assigns it a CSN, and sends it to the two secondary replicas. These secondaries apply the change locally and send back an acknowledgement, at which point the PRIMARY commits the change (more on this in a moment). Once the PRIMARY commits the change, it returns a success response back to the person that sent that particular INSERT or UPDATE statement.
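The write path above can be sketched in code. This is a minimal model of my own (not Azure’s actual implementation): the primary assigns a CSN, replicates the change, and commits only after at least one secondary acknowledges, giving the 2-of-3 quorum.

```python
# A minimal sketch (assumptions mine, not Azure's real implementation) of the
# quorum-write flow described above: the primary assigns a CSN, replicates the
# change, and commits only after at least one secondary acknowledges it.

class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}     # committed key/value pairs
        self.log = []      # (csn, key, value) entries applied locally
        self.online = True

    def apply(self, csn, key, value):
        """Apply a replicated change locally and acknowledge it."""
        if not self.online:
            return False   # an offline replica never acknowledges
        self.log.append((csn, key, value))
        self.data[key] = value
        return True

class Primary(Replica):
    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_csn = 1

    def write(self, key, value):
        """Assign a CSN, replicate, and commit only with a 2-of-3 quorum."""
        csn = self.next_csn
        acks = sum(s.apply(csn, key, value) for s in self.secondaries)
        if acks >= 1:      # primary + one secondary = quorum of 2
            self.next_csn += 1
            self.log.append((csn, key, value))
            self.data[key] = value
            return "committed"
        return "aborted"   # no acknowledgements: no quorum, no commit

b, c = Replica("B"), Replica("C")
primary = Primary("A", [b, c])
print(primary.write("k1", "v1"))  # both secondaries ack: committed
c.online = False
print(primary.write("k2", "v2"))  # one ack is still a quorum: committed
b.online = False
print(primary.write("k3", "v3"))  # no acks: aborted
```

Note that, as in the simulation, a secondary can apply a change that the primary then aborts for lack of quorum; that mismatch is the same bug discussed in the “Wrong(ish)” section below.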

Keep in mind, this is a simulation. The model for the COMMIT above is based on what I found in the articles above, but is probably not quite right (and I would love it if someone has more definitive information about this so I could improve it).

How do reads work?

Reads are easy. Since the TDS gateway directs all queries to the Primary replica and it always has the most up to date data, it can respond with the values it has locally without seeking a quorum from the other replicas.

Press the “Run” button to send some quick writes and then watch how reads work.

As “Read” messages come in from the gateway, the PRIMARY replica looks the value up locally and returns it directly.

In the real SQL Azure replicas, this means that the PRIMARY replica has more work to do than the secondaries. This is where the “fabric” behind the scenes becomes critical, as it is responsible for trying to maintain a good balance of primaries (read and write load) and secondaries (writes) across each server. When a new replica is created or a new PRIMARY is elected from the existing replicas, the “fabric” has to adjust things behind the scenes to balance out the work.

Weathering Outages

The point of the 3 node replica setup is to get high levels of resiliency from shared commodity hardware. If an outage is short enough, a transaction log update from whichever replica has the latest log can catch a restoring replica up to date. If the log has been exhausted, a full update can catch up a replica. Eventually, if the server or replica is unavailable long enough, the fabric will provision a new replica to replace it (not implemented in the simulation).

To help show both short-outage cases, the simulated replicas only keep their last 4 transactions. This way, a replica missing only a couple of transactions will restore from the transaction log, but a replica offline for more than 4 transactions will require a full restore.

Press the “Run” button to watch a shorter and longer outage while writing.

This is running a scripted loop of operations to show both restore cases. The script presses the turbo button during write transactions so we can skip ahead to the restore operations. When a replica’s border turns red, this means it has gone offline.

1) We prime the network with a couple writes, take replica “B” offline, send a couple more writes, then bring replica “B” back online. This results in a restore from transaction log.
2) After a couple more writes, we take replica “B” offline again, wait for 5 more writes to occur, then bring replica “B” back online. This results in a full restore.

When a replica comes online, it sends a restore request to the other replicas and identifies the latest CSN it applied. If the other replicas have that CSN in their log, they send back the log and the restoring replica can use the latest of those two logs to catch up. If neither of the replicas can send back a log, then the restoring replica asks for a full restore. This isn’t heavily detailed in the documentation, so this is another place that matches the documentation but may not quite match reality.
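The catch-up decision can be sketched as follows. The details are extrapolated from the description above (the real protocol is not documented in this depth): the restoring replica reports its latest CSN, and a peer either sends the log entries after that CSN or signals that a full restore is required.

```python
# A sketch of the catch-up decision above. The restoring replica reports the
# latest CSN it applied; a peer either returns the log entries after that CSN
# or, if the CSN has aged out of its retained log (only the last 4 transactions
# are kept, as in the simulation), signals that a full restore is required.

def restore_plan(restoring_csn, peer_log_csns):
    """peer_log_csns: the CSNs a peer still retains, oldest first.
    Returns the CSNs to replay, or None when a full restore is needed."""
    if restoring_csn in peer_log_csns:
        idx = peer_log_csns.index(restoring_csn)
        return peer_log_csns[idx + 1:]  # replay everything after our last CSN
    return None  # our last CSN is no longer retained by this peer

peer_log = [5, 6, 7, 8]           # a peer retaining its last 4 transactions
print(restore_plan(6, peer_log))  # short outage: replay CSNs 7 and 8
print(restore_plan(2, peer_log))  # long outage: None, full restore required
```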

When the PRIMARY replica goes down, the documentation outlines monitoring that causes the “fabric” to elect a new PRIMARY replica. From my own experience, one or more types of failures are actually detected on a 5-10 minute poll, so there will be a short outage (the remainder of the 5-10 minute poll loop) before the failure is noticed and the “fabric” elects a new PRIMARY from the remaining secondaries.

For longer replica outages, not included in this simulation, the “fabric” will provision a new replica from a full restore and add it to the cluster as a new secondary, replacing the bad node.

Now that we have Writes, Reads, and an IT Person stumbling over power cords, it’s time to put it all together and play a little.

Putting it all together

Here we have a functioning network and 3 buttons. One button starts a stream of random reads and writes, the next unleashes our stumbling IT person to wander aimlessly around and stumble over power cords, and the third allows you to toggle between slower and faster message travel times.

Start sending random writes, reads, and outages!

One thing you may notice is that the outages are no longer confined to a single replica, now even the primary can go down.

This final simulation also adds tracking of expected versus actual responses, stale data, outage stats, and SLAs, as well as the ability to add additional replicas to the cluster.

Where the Simulation Is Wrong(ish)

There are a few things that either did not match reality or for which I couldn’t find good enough information. Feedback would be awesome for these. There are also a few places where I simplified concepts that were outside the scope of talking about the communications and restore processes; if their absence is a problem, let me know and I’ll try to extend the models.

Things I simplified:

Monitoring: I didn’t model the fabric or neighbor-based monitoring, instead servers will magically come back online every time and monitoring is performed by the generic “network” simulation.

Writes/Commits: I simplified this to single insert commits

HTTP Error Messages: I used HTTP status codes in messages because I don’t know the internal communications and it seemed good/simple enough

Things I got wrong:

Commits: While I tried to match the explained process, it is not wholly accurate and there is definitely a bug when a commit comes in with an Online Primary, a restoring Secondary, and an Offline secondary. It will be queued up for commit on the secondary but aborted on the primary due to lack of quorum, leading to a secondary that has bad data (and possibly both secondaries, if the other comes online before the next write occurs).

The Full Restore logic – this was an extrapolated guess from the documentation

SQL Transactions and multi-step operations – these aren’t fully implemented, but didn’t seem to add much value from the perspective of showing how the distributed logic works

See anything else? Let me know, and I’ll improve the models.

Ted Krueger (onpnt) – http://blogs.lessthandot.com/?p=2434 – published 2014-06-04

Availability Groups were introduced in SQL Server 2012 and have quickly moved to the forefront of high availability for the SQL Server Database Engine Services. In prior versions of SQL Server, true high availability was not a complete solution packaged with the native installation and feature set. While mirroring was introduced in SQL Server 2005 and provided a much-needed advance towards achieving highly available data services, mirroring still required much customization to effectively provide a true high availability solution.

It’s important to define high availability as it pertains to database-level services: database high availability means retaining data services for users, applications and services within a defined allowable interruption. Put simply, are data services available to any connection within a given tolerance? Availability has traditionally been measured in “nines” of uptime. “My database server achieved four nines availability last year!” This achievement essentially states that database services were unavailable for an estimated 0.01% of the total operating hours in a year, or about 52 minutes. Remember, this achievement is based on service level agreements that can have vast variations. While even Failover Cluster Instances (FCIs) of SQL Server implement a level of high availability, there is still the single point of failure that is of highest importance to the data services: the database. Focusing on where points of failure may occur begins to build a visual representation of how availability targets can be achieved.
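The arithmetic behind the “nines” can be computed directly; four nines (99.99% uptime) allows roughly 52.6 minutes of downtime per year, matching the figure above:

```python
# The downtime budget per year implied by "N nines" of availability.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

def downtime_minutes(nines):
    availability = 1 - 10 ** (-nines)  # e.g. 4 nines -> 0.9999
    return MINUTES_PER_YEAR * (1 - availability)

for n in (2, 3, 4, 5):
    print(f"{n} nines allows {downtime_minutes(n):9.2f} minutes of downtime/year")
```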

Points of Failure

Hardware

Operating System

Data Corruption

Data Loss

Network

These five possible points of failure are specific to almost all data services. As discussed, Failover Cluster Instances are a protection against hardware and operating system failures, mirroring is protection against data loss and corruption events, and geo-clustering can be utilized to prevent network failure events. However, even combining all of these technologies and features, data connectivity remains a concern: customization is needed if a single point of connection is not available to an application or user.

As stated already, pre-SQL Server 2012, even a four nines achievement was difficult. This was due to the customization needed to handle a mirror failover event. Given that SQL Server mirrors from instance name to instance name, and there is no single entry point for applications, some manual or programmatic intervention was needed to handle the fact that the data server name could change. While connection strings built in certain frameworks allowed for a failover partner attribute, not all connectivity types and providers supported it. This forced coding changes, monitoring needs and much effort to manage a complete, seamless failover of one designated primary database server.

Availability Groups in SQL Server 2012 combat these issues directly by joining clustering and mirroring technologies into a group whose members coexist and rely on each other to maintain availability. Applying a cluster design while including single instances acting on their own begins to remove failure points in all areas of availability.

With Availability Groups and Windows Server Failover Clustering, we can look at the same failure points differently, asking how they allow for availability to be retained at any point of failure.

Hardware

Prevention by implementing the same level of protection as an FCI, but with WSFC

Operating System

Prevention by implementing the same level of protection as an FCI, but with WSFC

Data Corruption

Prevention with mirrored databases between each instance in the group

Data Loss

Prevention with mirrored databases standing on unique disk subsystems

Network

Allowance of multi-subnet clustering for added protection of network loss

The following step-by-step illustration has been composed for the general setup and configuration of Availability Groups in SQL Server 2012. It is essential to become familiar with Availability Groups and determine how they fit specific high availability needs and how data services are being used. The specific setup and configuration of the Availability Group SQLAG has been used to document the needed setup and configuration. Use this document for setup and configuration of new Availability Groups, or as the model for the Availability Group SQLAG already in place on the network.

Availability Groups are being utilized for three primary objectives:

Achieve High Availability

Offload read-only queries (or requests)

Achieve a Disaster and Recovery solution

In order to meet all three objectives, a 4-Node Availability Group has been designed, based on a node and share majority solution.

In the Availability Group solution pictured above, the following failover scenarios can be achieved:

Note: Failover to any node is supported, and cluster services are maintained provided the file share resource is online. With the file share online, the cluster and listener resources can survive the loss of up to 2 nodes. Losing the file share lowers the maximum tolerable node loss to one node. If the file share is lost and 2 nodes are lost, the cluster cannot be maintained.
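The quorum math behind this note can be sketched directly: the 4 nodes plus the file share witness give 5 votes, and the cluster survives only while a strict majority (3 of 5) of votes remains online.

```python
# The node-and-file-share-majority math behind the note above: 4 nodes plus
# the file share witness give 5 votes, and the cluster survives only while a
# strict majority (3 of 5) of votes remains online.
NODES, TOTAL_VOTES = 4, 5

def cluster_survives(nodes_lost, file_share_online):
    votes = (NODES - nodes_lost) + (1 if file_share_online else 0)
    return votes > TOTAL_VOTES // 2   # strict majority required

print(cluster_survives(2, True))    # share online, 2 nodes lost: True
print(cluster_survives(1, False))   # share lost, 1 node lost:   True
print(cluster_survives(2, False))   # share lost, 2 nodes lost:  False
```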

Windows Server Failover Clustering and Availability Group Information

Setup of the Availability Groups in the structure covers one or more databases. The naming conventions utilized are listed below along with each setting that has been configured in this setup per node.

Windows Server Failover Cluster (WSFC)

Lab Server Build – Be sure to search for new hotfixes

Windows Server 2008 R2 Enterprise Service Pack 1

Physical Memory – 4.00 GB

64-bit Operating System

Processor – Intel Xeon X5660 @ 2.80GHz

1 CPU allocation (Logical count = 1)

Hotfix patches applied

KB2494036

A hotfix is available to let you configure a cluster node that does not have quorum votes in Windows Server 2008 and in Windows Server 2008 R2

KB2687741

A hotfix that improves the performance of the “AlwaysOn Availability Group” feature in SQL Server 2012 is available for Windows Server 2008 R2

Server names

NODE1

NODE2

NODE3

NODE4

Cluster Name

SQLCLSTR

Quorum configuration for WSFC

Node and File Share Majority

Network Configuration

Cluster Network 1

Cluster Access IP – 10.2.4.71

Subnet 10.2.4.0/24

NODE1

10.2.4.31

NODE2

10.2.4.34

NODE3

10.2.4.50

NODE4

10.2.4.55

SQL Server Availability Groups

SQL Server Instances

Microsoft SQL Server Enterprise (64-bit)

Build 11.0.2100.0

Named Instances

NODE1\SHAREPOINT2013

NODE2\SHAREPOINT2013

NODE3\SHAREPOINT2013

NODE4\SHAREPOINT2013

Availability Group (AG)

AG Name

SQLAG

Timeout configuration for all replicas – 10 seconds

Endpoint port utilized – 5022

AG Listener Name

SQLAGLISTENER

10.2.4.72

AG Replicas

NODE1

Primary

Synchronous Role

Automatic Failover

NODE2

Secondary

Synchronous Role

Automatic Failover

NODE3

Secondary

Synchronous Role

Read Only

Read-intent only

Manual Failover

NODE4

Secondary

Asynchronous Role

Manual Failover

WSFC Setup and Configuration

Cluster Setup

Four servers are acting as nodes in the Windows Server Failover Cluster (WSFC). The servers are all located on the subnet 10.2.4.x. Each server has the full Failover Clustering feature installed.

Configuring the WSFC cluster first requires each server to have the Failover Clustering feature installed. To perform this installation, follow these steps.

Connect to each server or open Server Manager for remote access to each server. In Server Manager, go to the Features node and click the Add Features link.

In the Select Features wizard, check “Failover Clustering” and click Install.

Once the installation is completed, a restart of the server is not required but is highly recommended.

Perform these steps for every server that will be enlisted in the WSFC.

Configure the Cluster

The following steps are used to configure the cluster containing the four logical servers that have the failover clustering features installed.

On a local computer with Failover Manager installed, or one of the servers that will be in the cluster, click Start > Administrative Tools > Failover Cluster Manager.

(The Failover Cluster Manager can also be opened by using the MMC snap-in.)

(If the welcome screen is shown, click Next.)

Enter each server’s name that will be part of the cluster. Click Add after typing in the server name, or utilize the Browse button to search the domain for registered server names.

Click Next to review the Validation Warning options.

In the Validation Warning screen, if validation tests have not been executed yet, select Yes and click Next to run the validation tests. This will review the servers for all configurations and options as they pertain to the server being set up correctly for participation in a cluster.

Select No to bypass the validation warning review

In the Access Point for Administering the Cluster, enter the name of the cluster and the network information needed. If multiple subnets will be utilized, add both in this screen.

Click Next and confirm the settings are accurate. If any changes are required, click Back to make adjustments.

Click Next to create the cluster.

Once completed, click Finish.

Configuring the Quorum

To configure the quorum settings for the new cluster, right click the cluster name in Cluster Manager. Scroll to More Actions and select Configure Cluster Quorum Settings. Click Next to the Before You Begin screen if it is shown.

In the Select Quorum Configuration screen, choose which type of quorum should be utilized. If a disk or file share resource has not been added, the only selection available will be Node Majority. Note: to create a share resource, the share must be located outside of the servers in the cluster. For this configuration, we want a file share as a fifth vote to retain a healthy quorum in the event of node losses.

Click Next to confirm settings and click Next again to configure the settings.

SQL Server 2012 Availability Group Setup and Configuration

Configure the SQL Server Availability Group

In order to configure an Availability Group, the following steps are required in this sequence.

Each server that will take part as a replica must be in a WSFC.

A network name should be created to act as the listener (the name by which applications and users will connect to SQL Server and the databases in the Availability Group). A listener can be created from the wizard if the listener is not created beforehand.

Security should be set up for the administrator that will be executing the AG setup. This account must be in the sysadmin server role on all SQL Server instances that are acting as replicas and have access to read the domain Active Directory services. If a listener is not prepared before creating the AG, the account is required to be a domain admin.

Each server that will act as a replica must have AlwaysOn features enabled in SQL Server.

The database(s) that will be in the Availability Group should be restored or created on the replica that will act as the primary. These database(s) should be online (not in a restoring state) and set to the Full recovery model.

Each database on the primary requires a full backup to be executed.

A share should be created to retain the full backup of the primary replica’s database(s). This share can be located on any of the replica servers or an external share resource.

Once the above security and required resources are available, continue with the steps below to create the AG.

Remote into each server that will act as a replica in the Availability Group and configure SQL Server for AlwaysOn features to be enabled. (Warning: This setting requires SQL Server to be restarted.)

Open SQL Server Configuration Manager.

Select SQL Server Services in the tree view. Right click the SQL Server instance to be configured and select Properties.

In the SQL Server Properties window, select the AlwaysOn High Availability tab. Check Enable AlwaysOn Availability Groups. The Windows failover cluster name should default to the cluster that the server is part of. If this does not auto-populate, the server is not in a WSFC and is required to be added before this step is performed.

Click OK and restart the SQL Server services.

(Perform the above steps for each SQL Server that will be in the Availability Group.)

Connect to SQL Server via SQL Server Management Studio (SSMS) on the server that will act as the primary replica in the Availability Group. Restore or create the database(s) that are required to be in the Availability Group. Perform a full backup of each database. (Note: ensure each database is in the Full recovery model.)

Expand the AlwaysOn High Availability node. Right click the Availability Groups node and select New Availability Group Wizard

Click Next if shown the Introduction window.

Enter the name chosen for the Availability Groups and click Next.

Ensure each database that will take part in the Availability Group passes validations. If a database does not pass the validation process that is required to be added to an Availability Group, the list to the right of each database will show what is required. For example: if a full backup was not performed, the link will read, “Full backup is required”.

After performing the required prerequisites, click Refresh to re-validate the databases. Once the validation process states, “Meets prerequisites”, check each database that is required and click Next.

In the Specify Replicas screen, click the Add Replica… button to add the replicas that will be part of the Availability Group.

If changes are required due to network needs or port configurations, make them in this window. For example: if port 5022 is used for other communications, adjust to port 5023 or another free port.

Click the Backup preferences tab.

Check the Secondary only option for backup preferences and check the checkbox for replica NODE3.

Click the Listener tab. If the listener was created prior to configuring the Availability Group, leave “Do not create an Availability Group listener now” selected. If a listener should be created, check “Create an Availability Group listener” and enter the required information. Note: domain admin privileges are required to create the domain name if choosing to create the listener at this time.

Click Next to save the replica settings.

In the Select Initial Data Synchronization window, enter a location for the full backup and log backup to be stored to initialize the other replicas in the Availability Group. This location can be a drive or share on any of the replicas or an external share. In the example below, replica 04 is used with the admin share to the T Drive.

Click Next to run the validation process from the settings saved so far. The only warning should be if the listener was chosen not to be created at the time of the setup.

If a warning or error has been found, click the link to review the message and act accordingly.

Click Next to review the choices that will be executed upon clicking Finish to create the Availability Group and initialize the replicas.

Note: it is recommended to click the Script button at this time and save the script generated by the wizard to a secure location for future recovery.

Once Finish is clicked, the Availability Group and all replicas will be initialized and created. This process can take some time depending on the size of each database in the Availability Group.

The process will execute in this order

Endpoints will be configured/created

Extended Events for monitoring an AG will be created

The Availability Group will be created

The Availability Group will be brought online

Secondary replicas will be added to the Availability Group

The WSFC quorum will be validated for votes of each replica server

A full backup of the database in the Availability Group will be taken

Steps repeated in this order for each replica

A backup will be restored to the secondary replica

The tail-end log backup will be taken from the primary

The tail-end log backup will be restored to the secondary replica

Database will be joined to the Availability Group

Given the even number of node votes in this wizard-driven setup, we will need to configure the voting weight of the nodes at a later point. Because of the even number, a warning is shown at the “Validating WSFC quorum vote configuration” step.

Configure the Listener

If the listener was not created in the initial wizard setup, perform these steps to assign the listener to the Availability Group.

Open SSMS 2012

Execute an ALTER AVAILABILITY GROUP … ADD LISTENER statement to assign the listener name and IP to the Availability Group

Note: Assign the IP as needed in this step. This is the IP that will resolve for any connection made to the Availability Group and all the databases it contains.

Configure Read Routing for Read-Intent Connections

Multiple replicas can be configured for read-intent only connections. This configuration is used to route connections that specify in the connection string the “read-intent” setting. For example: the following connection string is set for read-intent and will be routed to a read-only secondary replica.
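The original example connection string is not reproduced here; the following sketch assembles an equivalent read-intent connection string (the database name is an assumption; ApplicationIntent=ReadOnly is the documented attribute that triggers read routing):

```python
# A sketch of a read-intent connection string for the listener (the database
# name is an assumption; ApplicationIntent=ReadOnly is the documented
# attribute that triggers read routing).
listener = "SQLAGLISTENER"  # AG listener name from the configuration above
database = "AGDatabase"     # hypothetical database enrolled in the AG

conn_str = (
    f"Server=tcp:{listener},1433;"
    f"Database={database};"
    "Integrated Security=SSPI;"
    "MultiSubnetFailover=True;"    # recommended for multi-subnet listeners
    "ApplicationIntent=ReadOnly;"  # request routing to a readable secondary
)
print(conn_str)
```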

To configure a replica for read-intent only connections, primary and secondary roles must be set. Each replica in an Availability Group has both a primary and a secondary role. By default, the primary role for all replicas is All Connections. This allows every secondary or primary to be utilized in the event of a disaster as a primary connection with full read/write capabilities. For read-only with read-intent configurations, two or more replicas are configured with a primary role of all connections and a secondary role of read routing for read-intent purposes.

The following diagram illustrates the 2-node read routing configuration.

Note: Each replica in a read-intent configuration requires the secondary role to be configured. In this configuration, the primary replica also has a secondary role of read-intent. In the case of an automatic failover to the secondary node, read-intent connections would then be routed to the former primary, which now acts as a secondary.
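The routing decision can be sketched as follows. This is a simplification (not the actual listener implementation), and the per-replica flags are assumptions mirroring the 4-node configuration described earlier: read-intent connections go to a secondary whose secondary role permits read-intent access, while all other connections land on the primary.

```python
# A simplified sketch (not the real listener implementation) of read routing:
# read-intent connections go to a secondary whose secondary role permits
# read-intent access; all other connections land on the primary. The replica
# flags below mirror the 4-node configuration described earlier.
replicas = [
    {"name": "NODE1", "role": "primary",   "read_intent": False},
    {"name": "NODE2", "role": "secondary", "read_intent": False},
    {"name": "NODE3", "role": "secondary", "read_intent": True},
    {"name": "NODE4", "role": "secondary", "read_intent": False},
]

def route(replicas, readonly_intent):
    if readonly_intent:
        targets = [r for r in replicas
                   if r["role"] == "secondary" and r["read_intent"]]
        if targets:
            return targets[0]["name"]  # first entry in the routing list
    return next(r["name"] for r in replicas if r["role"] == "primary")

print(route(replicas, readonly_intent=True))   # routed to NODE3
print(route(replicas, readonly_intent=False))  # routed to NODE1
```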

Connecting to SQL Server and the Availability Group

Every Availability Group utilizes the listener for connectivity between an external source and the actual database(s) enrolled in the Availability Group. It is possible, for administrative purposes only, to connect directly to each replica’s SQL Server instance. These direct connections should be used for instance-level configurations such as MAXDOP, memory and so on. Configuring or making changes to the Availability Group should also be performed by connecting directly to the primary replica. For example: the primary in this configuration is NODE1, and this is the instance to connect to for configuration of the Availability Group.

For database level changes such as security, connect to the database through the listener name.

For application or service connections, no changes are needed beyond the normal connection strings utilized for connecting to named, default or clustered instances of SQL Server. However, if a read-intent connection is desired, the ApplicationIntent=ReadOnly attribute is required in the connection string for read routing to take place.

Modifying the Availability Group

Add a database to an existing Availability Group

To add or remove a database from a pre-existing Availability Group, connect to the SQL Server instance acting as the primary replica via SSMS.

Expand the AlwaysOn High Availability node and expand the Availability Group that you wish to change.

Right click Availability Databases and click Add Database.

The Add Database to Availability Group wizard will be shown. This wizard is the same as the initial Availability Group setup. Each database must follow the same requirements of being in full recovery model and having a full backup taken.

Follow the screens and fill in each required piece of information that is specific to the database(s) you wish to add to the Availability Group.

When the wizard has completed, the database will have a full backup taken and restored to all replicas, and a tail-end log backup will be taken, before it is added to the Availability Group. During this time the other databases and the Availability Group will be available to all connections. There are no requirements beyond these steps to add a database to the Availability Group.

To remove a database, right click the database in the listing and select Remove Database from Availability Group. Note: this will remove the database from the Availability Group only. It will not delete the physical database on any replica. The primary copy will persist, allowing direct connections, and all secondary copies will be left in a RESTORING (NORECOVERY) state.
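The equivalent T-SQL, run on the primary replica (AG1 and SalesDB are hypothetical names):

```sql
-- Removes SalesDB from the group; the database files remain on all replicas
ALTER AVAILABILITY GROUP [AG1] REMOVE DATABASE [SalesDB];
```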

Modifying Availability Group Settings

All modifications to the Availability Group should be performed on the primary replica. To modify settings for a specific Availability Group, connect to the primary replica via SSMS 2012, expand the AlwaysOn High Availability list and expand the Availability Groups node.

Right click the Availability Group requiring modifications and click Properties.

All changes should be performed in the Availability Group properties window.

Changes that can be performed

1) Add or remove replicas

2) Enable read-only or read-intent replicas

3) Switch between synchronous and asynchronous committing between replicas

4) Modify endpoint utilization (Note: endpoints cannot be altered in this window. To modify an endpoint, go to Server Objects > Endpoints in SSMS.)

5) Backup Preferences

6) Primary and Secondary roles

7) Permissions

Failover Testing and Monitoring

Initiate a Failover

To simulate and test a failover scenario, connect to the primary or listener via SSMS 2012. Expand the AlwaysOn High Availability list and the Availability Groups node. Right click the Availability Group to be tested and select Failover.

The failover wizard will be displayed.

Select the replica to failover to and complete the wizard.
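A planned failover can also be scripted in T-SQL, run against the secondary replica that should become the new primary (AG1 is a hypothetical group name):

```sql
-- On a synchronous secondary: planned failover, no data loss
ALTER AVAILABILITY GROUP [AG1] FAILOVER;

-- Or, against an asynchronous replica in a disaster scenario,
-- accepting possible data loss:
ALTER AVAILABILITY GROUP [AG1] FORCE_FAILOVER_ALLOW_DATA_LOSS;
```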

Monitoring Availability Groups

When Availability Groups are enabled on SQL Server 2012, Extended Events are implemented for monitoring the state and health of the availability replicas and their sessions. To review the information being collected, connect to the listener through SSMS, right click AlwaysOn High Availability, and click Show Dashboard.

A list of Availability Groups available to monitor will be shown.

Warning: The dashboard works on a refresh rate. The queries pulling information may have an impact on performance if too many dashboards are active. I do not recommend leaving any one dashboard open for a long period of time.

Click the Availability Group to review the health and status.

The overall health of the Availability Group, databases and replicas will be shown.

Review the events being captured by selecting each one to show the details.

To review the quorum configuration and votes per replica in the cluster, click the View Cluster Quorum Information link.

To make modifications to the quorum or votes (NodeWeight), use the Failover Cluster Manager or PowerShell. For example, the following PowerShell statement configures NODE4, shown above, to have no vote in the health of the cluster. This is a common configuration for a disaster recovery replica node in the cluster and Availability Group setup.

PowerShell

Import-Module FailoverClusters
(Get-ClusterNode NODE4).NodeWeight = 0

What’s next?

Following this example of setting up a lab to test Availability Groups, we'll show how to script the entire solution with PoSh and T-SQL, so that the setup, configuration, and placing additional databases on the group are not as long-winded a process.

Eli Weinstock-Herman (tarwn) | http://blogs.lessthandot.com/index.php/2012/12/scalability-is-easy-to-get/ | 2012-12-05

Scalability is easy, provided you don't need it to work.

Probably the number one failure of system scaling is when people dive right in and start building. No baselines, limited measurements, no analysis, just a hypothesis and a whole lot of late nights tweaking the system. With extra complexity comes extra costs, from the initial development through more expensive maintenance. Scale poorly and not only do we take on those extra complexity costs, but also the more obvious additional costs of the actual implementation (new servers, more resources, etc).

The Somewhat Contrived Example

Recently I’ve been working on a system to simulate parallelizing workloads, specifically workloads that depend on external resources with rate or load thresholds. Let’s use it to look at a somewhat contrived example.

Note: For this post, the simulated “service” has a 100 request/minute limit and throttles individual clients for 15 second windows. Individual operations consist of 50ms of local processing and a single service request that has 50ms of latency and 100 ms for processing and response time. Similar results can be achieved with more realistic batch sizes and rates, the smaller values just allow me to more quickly produce samples for the blog post.
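As a rough illustration of how such a throttling service behaves, here is a toy model in Python. This is my own sketch, not the author's simulator: the class name and the rolling-budget logic are assumptions, and timestamps are passed in explicitly so behavior can be explored without waiting on a wall clock.

```python
class ThrottledService:
    """Toy model of a rate-limited service: a rolling one-minute request
    budget, with offending clients throttled for a fixed window."""

    def __init__(self, limit_per_minute=100, throttle_window=15.0):
        self.limit = limit_per_minute
        self.window = throttle_window
        self.accepted = []         # timestamps of accepted requests
        self.throttled_until = {}  # client -> time their throttle window ends

    def request(self, client, now):
        if now < self.throttled_until.get(client, 0.0):
            return "throttled"
        # keep only requests within the rolling one-minute budget
        self.accepted = [t for t in self.accepted if now - t < 60.0]
        if len(self.accepted) >= self.limit:
            self.throttled_until[client] = now + self.window
            return "throttled"
        self.accepted.append(now)
        return "ok"
```

Once the budget is exhausted, every request during the throttle window fails, which is exactly the behavior the parallel clients below run into.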

So the backstory is that I have a batch process that is running more and more slowly as we take on larger and more frequent workloads.

I start by testing the batch process locally so I can see how slow it is before I make changes.

1 Client, 60 Requests, 100rpm

Locally it runs pretty quickly, but I’m betting that parallelizing the process will give me a significant increase in speed.

1 Client vs 30 Clients, 60 Requests, 100rpm

Look at that performance: better than a 6x improvement. My job here is done.

Except when I try to push this to production, I start getting a lot of errors.

30 Clients, 300 Requests, 100rpm – 67% Failure Rate

As it turns out, the external API my process saves the data to has a rate limit. When I exceed the allowable rate, I’m throttled for a short period of time. Any requests I make during that throttle period are returned with errors indicating I’m throttled.

Hmm. Luckily there are a number of common patterns available to retry these types of failures. I’ll add an exponential back-off retry pattern so that when I get throttled my service will retry failed requests at slower rates until the service un-throttles me. While I’ve found plenty of code examples online, none of them seem to have recommendations, so I’ll just use one of the sample settings they provide.
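The back-off idea can be sketched in a few lines of Python. This is a generic illustration, not the simulator used for the measurements; the exception type and delay settings are my own assumptions:

```python
import random
import time

class ThrottledError(Exception):
    """Raised by an operation when the remote service throttles us."""

def call_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=2.0):
    """Retry `operation` with exponential back-off plus jitter.

    The delay doubles on each failed attempt (capped at `max_delay`), and
    random jitter is added so parallel clients don't retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the failure
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay + random.uniform(0, delay))
```

Note that the retry settings are exactly the "sample settings" problem described above: max attempts, base delay, and cap all interact with the service's throttle window, so values that work for one client count can fail for another.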

30 Clients, 300 Requests, 100rpm, Retry Policy #1 – 7% Failure Rate

Hmm, better. My failure rate has gone way down. What if I tweak the values?

30 Clients, 300 Requests, 100rpm, Retry Policy #2 – 29% Failure Rate

Oh, that was bad, I obviously was on the right track before. What if I just extend the retry amount a bit to try and knock out the last bit of errors.

30 Clients, 300 Requests, 100rpm, Retry Policy #1B – 0% Failure Rate

Ok, perfect. Now I have a system that is more than 6 times faster than the original, can be easily extended by throwing more workers at it, and is actually in a better position to handle occasional slow downs from my 3rd-party service.

Success!

Where I Went Wrong

Ok, so maybe not. Over the course of my little story I went wrong in a number of places. Even though this was a contrived example, I’ve watched very similar scenarios play out in a number of different organizations with real systems.

The Bottleneck

The first and most critical problem was that I didn’t actually locate the bottleneck in my process, I simply tried to do more of the same. The Theory of Constraints tells us that we can improve the rate of a process by identifying and exploiting the constraints.

In this system, the constraint looked like it was the sequential execution of the tasks, but in reality the constraint was the time it took to call the 3rd-party API. Had we identified that bottleneck before starting, we could have approached the problem differently.

Process – Alternative Design

Rather than the parallel complexity, we can modify how the tasks are executed to try and take advantage of knowing where our bottleneck is. If the API allowed us to submit several requests in a batch, this redesign would net us several orders of magnitude improvement. Another option would be to run the results of the local processing into a queue and submit requests from there at a slow trickle, using only a percentage of our API limit so as not to disrupt any other real-time operations or batch processing the system supports. Another option we could take advantage of is not starting any of our expensive 3rd-party communications until we know that the entire job can actually be processed successfully through our local process.
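The queue-and-trickle option can be sketched like this (again my own illustration, not the author's design; the class name and the 50% utilization default are assumptions):

```python
import collections
import time

class TrickleSubmitter:
    """Queue work locally, then submit to the rate-limited service at a
    fixed fraction of its limit, leaving headroom for other consumers."""

    def __init__(self, limit_per_minute, utilization=0.5):
        # seconds between sends so we use only `utilization` of the limit
        self.interval = 60.0 / (limit_per_minute * utilization)
        self.queue = collections.deque()
        self._next_send = 0.0

    def enqueue(self, request):
        self.queue.append(request)

    def drain(self, send, now=time.monotonic):
        """Send everything queued, pacing requests by self.interval."""
        sent = 0
        while self.queue:
            wait = self._next_send - now()
            if wait > 0:
                time.sleep(wait)
            send(self.queue.popleft())
            self._next_send = now() + self.interval
            sent += 1
        return sent
```

Because the submitter never exceeds its allotted slice of the rate limit, it never triggers throttling at all, and the retry-policy tuning problem disappears.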

Identifying the constraint unlocks the ability to turn the problem on its head and achieve higher improvements, typically by orders of magnitude.

The Math Error

I concluded the scenario above by assuming I had found a good solution that also had a lot of headroom. Unfortunately what I actually did was find the ceiling. I have tuned the retry policy to 30 parallel systems, increasing that number could easily destabilize it further and cause more errors or delays. The headroom is, in fact, an illusion.

The Evolving Assumption

Somewhere along the way I found 30 clients to be a great improvement and didn’t test other options. Then I made some assumptions about a retry policy. Then I tweaked that retry policy until the error rate disappeared. My assumptions made sense at the time, so I never questioned where they were leading me.

When I found a winning combination for my retry rate, I didn’t realize I was missing other, better options:

30 Clients, 300 Requests, 100rpm, Retry Policy #1B vs 4A-C

What’s worse, is that trail of assumptions along the way was never re-validated. I concluded with a 6.5x improvement, but is that still accurate?

1-50 Clients, 300 Requests, 100rpm, Retry Policy #1B

When I run the same settings on a range of 1 to 50 clients, we can see that I lost that original 6x improvement along the way. All I have managed to do is add complexity and some very explicit costs for those additional systems.

Note: What happened to retry policy #3? Well originally 1B was actually named 3, then I decided to update the name halfway through the post but was too lazy to update all of the completed images. Oh well.

Scaling the Wrong Thing

This post used fairly small numbers, had I applied a larger workload, higher throughput rates, different throttling windows, the whole problem would have turned out differently.

When we set out to make a system scale, we need to identify the real scenarios we are trying to scale for and the bottlenecks that stand in our way. Blind performance tuning can look like an improvement, but is really just a poor short-term investment that often entrenches the current performance problems even more deeply. There are a lot of questions that should be asked about the intended result, the responsiveness of the system, other operations it has to support while under load, potential overlap of that load, the type of load, etc. The patterns for one system may have relevance for another, but could just as easily be completely incorrect.

How hard it is to scale a system is going to depend on a lot of factors. Getting it wrong just happens to be the easiest option.

Alex Ullrich | http://blogs.lessthandot.com/index.php/2012/06/getting-flexible-with-ndepend-4-and-cqlinq/ | 2012-06-04

At my last job we had a non-functional attribute that another team used to decorate service methods that they consumed. The other team was working on an alternative client to our WCF services, and because they weren't on the same release schedule, they needed to be able to target multiple versions of our services within a single version of their application. Because of this requirement, they maintained a wrapper around our services that handled some of the differences between versions. The main use for this attribute was to foster communication between the teams, so that if we changed a decorated method we would let them know. As I'm sure anyone on this other team would tell you, we weren't always that good about communicating these changes.

In an effort to make communication between teams easier we used a CQL query like this to report changes to these methods as part of our automated builds:

SELECT METHODS FROM NAMESPACES "Services"
WHERE HasAttribute "OPTIONAL:Services.KnownExternalClientsAttribute"
AND CodeWasChanged

This was nice, but it only got us part of the way there. This would alert us to signature changes or changes to the content of the method, but not necessarily changes to the message contracts passed in to the method. In Pseudo-CQL the query I had in mind looks something like this:

This didn’t work however (CQL doesn’t really have support for subqueries), and I couldn’t really find anything in the language that would allow us to achieve what we wanted. NDepend 4 introduces a new linq-based replacement called CQLinq that offers a lot more flexibility, so I figured I would see if I could write the query that we needed using it. It ended up being much easier than I thought – CQLinq gives us access to most (if not all) of the standard LINQ operators, and the same functions for querying code using attributes and history that we had with CQL. This is the query I came up with:

Once we have the query we can mark it as critical, so we will have a failing build after the changes are made. Only the first build after making the changes should fail, but that would be enough to trigger an investigation that would result in communicating the changes to the other team.

CQL has always been my favorite feature of NDepend, so it's no surprise that CQLinq is my favorite feature in this new release. The LINQ-based syntax feels much more natural to me when writing queries against a codebase than the SQL-like syntax of CQL, and still gives us all the same visualization goodies to foster quick understanding of the query results. I'm really excited to dig in a little more and see what else I can do with it.

Eli Weinstock-Herman (tarwn) | http://blogs.lessthandot.com/index.php/2011/04/adding-user-emulation-to-an/ | 2011-04-27

One of the tricks I picked up from my last job (and our forum software, now that I think of it) is the idea of user emulation. I could log into the application, search for a user, and, at the push of a button, temporarily become that user. The only differences between emulating them and actually logging in as them were a black bar that indicated who I really was (with a link to stop emulating), audit records that continued to reflect my own user id, and not needing to keep track of 30 different sample accounts and passwords.

As I said, it’s a neat trick.

Advantages

Implemented consistently, this stops being just a trick and becomes a very powerful tool. Developers, QA, even customer service and support see benefits from being able to quickly emulate end users.

Debugging

As developers, the largest benefit for us is the ability to debug our systems from a variety of viewpoints. Rather than going through the trouble of creating and managing sample users for every impacted role, we can use existing user records with real data behind them. This not only reduces the time to start debugging and removes the ongoing maintenance of test accounts, but may actually force out a few extra bugs that we wouldn't catch with a new, vanilla user account.

Support

Duplicating a bug or answering a question becomes a lot easier when we can emulate the person on the other end of the bug report or phone. We can emulate the person in our production environment and in our development environment and verify that both environments break in the same way (or that development doesn’t break after we fix it). We also get firsthand clues, which can knock hours off the bug-hunting process.

Customer Service

Just as developers and support benefit on the technical side, in some cases Customer Service representatives (or selected members of the business) can use emulation to provide business or first level user support. When an end user has a complex question about an order or report, the service representative no longer needs to imagine their way through the issue, but instead can emulate that user, walk through the process, and see exactly what their user is seeing. This can be even more critical in systems where users see only a subset of the functionality or data available to service representatives.

Implementation

So emulation is a useful tool as well as a neat trick. Like many things, it is generally easier to bake this in from the beginning than to add it after the fact. If the user information is accessed in a consistent fashion in existing software, it is possible to squeeze in emulation logic and clean up the few places people cut corners and accessed users outside the normal context. If the user information is loaded and accessed at will throughout the application, adding emulation will be much harder (though there is some additional benefit in that it forces you to clean up your architecture a bit).

Public Class SessionContext
    Private Property EmulatedUser As User
    Private Property LoggedInUser As User

    ' The user the rest of the application sees: the emulated
    ' user when emulating, otherwise the logged-in user
    Public ReadOnly Property User As User
        Get
            If Me.EmulatedUser IsNot Nothing Then
                Return Me.EmulatedUser
            Else
                Return Me.LoggedInUser
            End If
        End Get
    End Property

    Public ReadOnly Property IsEmulating As Boolean
        Get
            Return Me.EmulatedUser IsNot Nothing
        End Get
    End Property

    ''' <summary>
    ''' Property used to access the logged-in user's id for auditing,
    ''' even while emulating another user
    ''' </summary>
    Public ReadOnly Property UserIdForAuditing As Integer
        Get
            If Me.LoggedInUser IsNot Nothing Then Return Me.LoggedInUser.UserID
            Return 0
        End Get
    End Property

    Public Sub LogInUser(ByVal newUser As User)
        Me.LoggedInUser = newUser
        Me.EmulatedUser = Nothing
        ' plus other login logic stuff
    End Sub

    Public Sub LogOutUser()
        Me.LoggedInUser = Nothing
        Me.EmulatedUser = Nothing
    End Sub

    Public Sub StartEmulating(ByVal selectedUser As User)
        Me.EmulatedUser = selectedUser
    End Sub

    Public Sub StopEmulating()
        Me.EmulatedUser = Nothing
    End Sub
End Class

Altogether not that complex a code construct, although I’m sure it will grow more so over the lifetime of the application.

As long as we consistently access user information through the exposed User property in the session and use the exposed UserIdForAuditing property for auditing purposes, most of the work is done for us. The only other pieces we need are a button on the UI to start emulating and some logic to handle the dangers below.

Dangers

There are two main dangers to watch for. The first danger is making sure emulation doesn’t grant users the ability to promote themselves. Either emulation needs to be reserved for administrative users, or logic needs to be added to make certain levels or roles unavailable for emulation (or assignment during emulation).

The other main danger is that you now have a much greater chance (probably a guarantee) that the same user will be "logged in" in two locations at once. Most applications handle this just fine, but many cannot. Examples of problematic behavior are enforcing single-location sign-ons, security code that assumes multiple locations mean an account has been compromised, and storing session data keyed only to a user id instead of a unique session.

Eli Weinstock-Herman (tarwn) | http://blogs.lessthandot.com/index.php/2010/10/why-and-how-i-model/ | 2010-10-15

Over my years in (and before) IT, I've seen long projects, failed projects, confused projects, wildly successful projects, and even fun projects that ended far differently than we expected. The consistent take-away for me is that I am a big-picture type of person, and that understanding that big, abstract picture cuts out a lot of wasted time sprinting down the wrong paths.

Creating a model forces me to refine a concept down to its simplest elements, forces me to face the unknowns that my mind has so casually been skipping over. When done well, a model communicates a clear idea and replaces not only the thousand words required to explain it, but the 9,000 I would have wasted getting there.

I model to think through processes, question my assumptions, and provide guidance towards a solution. While it probably looks like something I threw together in about ten minutes, there are actually a lot of processes going on behind the scenes.

Purpose – What are We Drawing?

No Goal? Here’s a Diagram.

As with all things, a diagram should have a goal. A model that isn’t trying to communicate an idea is filler for a report no one is going to read anyway. A goal should be concise and limited to a single subject or perspective:

The data flow from the end customer to our master data system

An order-to-cash business process

The functional architecture of a software application

A graphic representation of our current state

Mess around with too many factors and at the end of the day a mess is all you’ll have:

The physical network topology combined with the disaster recovery plan and data flows between the systems

The application architecture with defined user work flows and user experience elements

Or to translate: gobbledygook.

Constraint – Less is More

A goal provides me with my first constraint, and constraints are good. Defining constraints will keep my model simpler and consistent, which means the end message will be clearer. At the same time, a well-defined set of constraints will encourage creativity, providing a better end product.

Often my constraints will include things like not allowing connections to cross, only using a very simple set of shapes, restricting myself to only a few shades of color, or setting time limits. I’ll define the perspective I want to use with my goal, whether it will be a topological map, a flow, or just a set of connected shapes.

This keeps me focused instead of playing with the entire palette of colors, shapes, and page sizes available in my favorite software tools.

Content – Work on a Temporary Surface

Even with constraints and a goal, I still don’t know exactly where I will end up or what I will learn along the way, so I start on the whiteboard. With a whiteboard I can start diagramming out the pieces I know, add in new items or resolve question marks as I run into them, and easily combine and rearrange my thoughts. Some of my constraints will be ineffective at this stage, but natural constraints (like the number of markers I have and the board size) will replace them in helping my creativity and thought processes.

This stage is also where I figure out my wording. Because it’s so easy to see the big picture (heh) on my whiteboard, I also get a good feel for when words are too specific, not specific enough, or possibly just not quite the right word for what I am trying to communicate. Instead of focusing on getting all the boxes lined up, I can focus on using clear and consistent terminology that will help support the final model rather than detract from it.

Medium – Where is it Going?

Content needs Context

As I make the transition from whiteboard to diagramming software, the last piece of the equation is to consider the medium I am going to use to communicate the model. A standalone diagram may put higher priority on further simplicity of shapes and colors, where a presentation model may put lower priority on text and higher priority on subtleties for deeper conversation.

Will there need to be a legend? Is font-size 8 going to be a waste of time or readable font? If I use subtle shades of color will it all print the same color or show up in gloriously rendered imagery on a 12 foot display? Will adding a cartoon get a chuckle in a presentation or a frown in an executive review? Can I include a picture of my cat?

The context the diagram will be communicating in will determine the last set of constraints.

Terminology – The Wrong Word Invalidates the Model

General wording may have been roughed in on the whiteboard and some really good words may have been chosen, but these now need to be examined in light of the future medium as well as the audience. In many cases, using a word out of context can distract my audience or even negate the model’s message entirely.

That being said, endless anxiety over perfection is nowhere near as good as a cool beer at the end of a work day, so we need to strike a balance between working through the night and achieving good enough. So I’ll be careful, in general, when using customer terminology and try to be pragmatic in my search for the perfect name for the third box from the left.

Composition – Additional Layers of Meaning

The last stage of the model, having transferred it from whiteboard to software and applied corrections to terminology, is to add some depth that supports the initial concept.

Colors, re-arranging layout to alter proximity, fonts, and even line thicknesses are all tools I use to add subtle depth to a model. If I am planning on presenting the model, I can start the discussion on the general message of the model and dive into these subtleties as the discussion progresses. A thicker line between two systems can communicate greater bandwidth or a more secure transport layer. A common shading of colors between multiple objects communicates a relationship or similarity. As with the stages before, I try to use constraint. Applying the whole palette and a different shape for each object may seem fun, but it’s going to communicate confusion (and possibly a desire for medication).

While working with the composition, I will also create temporary versions to play with drastically different layouts or shapes. This gives me a fresh look at a concept that has undoubtedly been on my whiteboard for days, giving me an opportunity to catch last minute holes or simply provide alternative layout options.

Sounds Like a Lot of Work…

There are different levels of work involved in modeling. In some cases even finding the time to stop doing and try to draw an idea may seem like a waste.

Not looking ahead? Diagram for that too…

How do I judge when it is worth spending the time?

5 Minutes: If it only takes five minutes to draw a fast diagram of what I am intending to do, then that 5 minutes didn’t cost much and I can move forward with confidence.

30 Minutes: If it takes 30 minutes, I’ve erased and redrawn half of it, and the person I am explaining it to is still arguing with me, then it’s time to draw a model.

3 Hours: If it takes 3 hours, we end with more questions than we started with, half the questions have the potential for refocusing the project, and we’re still trying to figure out what to call this thing…yeah, it’s definitely time to get a handle on what we’re spending time on.

Jumping right into any project when we can’t draw a high level summary means we’re spending time and resources on something we can’t adequately define. It doesn’t matter how fast we’re moving if we’re spending that time running in random directions and ignoring cross-traffic.

Alex Ullrich | http://blogs.lessthandot.com/index.php/2010/09/cql-from-visual-studio-with-ndepend-3/ | 2010-09-06

For the last few months I've had the pleasure of working with NDepend version 3. Most of my development at home is on Linux these days, so I haven't used it as much as I'd like, but I have been using it to poke around in various codebases and see what the new Visual Studio integration is all about. The last version integrated with Visual Studio, technically speaking, but it didn't seem nearly as thorough as what I've seen in version 3. I suspect the improved extensibility model in VS 2010 has a lot to do with this, but I can't confirm (I haven't tried it with 2008 either).

My favorite feature of NDepend has always been CQL, the SQL-like query language that allows you to query your codebase using a variety of common metrics. This is the same as it ever was (the integration with VS is even quite similar) but with the more thorough integration it seems much more useful. I like how easy it is now to keep an eye on my CQL constraints when I rebuild.

My favorite CQL feature is the ability to set up CQL constraints "from now". This is really cool for older projects, where it's unrealistic to think that your team will be able to fix everything right away. What you can do with this feature is ensure that all new or modified code measures up to your team's standards. You may not be able to clean up all those 1,000-line methods right away, but you can ensure that newer methods fit a more reasonable size limit (like 975). This is one of the most useful features of the application, IMO. The way it works is by allowing you to establish a baseline. By comparing the code's current state to this baseline, future analyses can determine which methods/types/etc. are new or changed, and apply the constraint to only those. Sometimes I feel this would be useful just to concisely see which methods have been changed (version control logs aren't the friendliest things to read, especially spread throughout a large codebase). Below is a screenshot of this feature in action.

The first three queries listed are built in to NDepend. I added the fourth, just to have a listing of new/changed methods ready. The CQL for this query is simply

Text

SELECT METHODS WHERE WasAdded OR CodeWasChanged

Not a bad way to keep an eye on what is getting changed in the codebase. To get in this state I added three new methods to the codebase I was looking at (in a place that I could remove them easily since they are not only low quality but useless as well). Two had 7 parameters, putting them in violation of the constraint for basic quality principles. I didn’t add any tests, so all three were in violation of the test coverage constraint. And finally they all showed up in the list of new methods. It’s worth noting the yellow circle at the bottom right as well – the yellow means that warnings were encountered when running the CQL portion of the analysis. Green would be good, and red would mean I have some bad queries that can’t be run.

Double clicking a row in the CQL Explorer will take you to the CQL Editor – from here you can view the results of the query, and the CQL it contains. From there you can easily navigate to the method definition in your source code by double-clicking.

One of the things I really like here is the comments in the built-in queries. They contain numerous links to metric definitions on the NDepend website, and sometimes even links to blog entries where the lead developer, Patrick Smacchia has explained features in greater detail. I really like this form of documentation, it makes it easier to keep up to date and also minimizes what needs to be stored on the user’s computer.

What I was happiest to find in the VS integration is the ability to superimpose CQL results onto the metrics view. The metrics view consists of a grid where each block represents a unit of code (type, method, etc…) and they are sized according to their value for the metric in question. When running CQL queries, the units of code matching your criteria are highlighted, giving a great visualization of how much code exhibits the properties you are looking for.

The query shown (about types having too many efferent couplings) is the currently selected one, and I moused over the QueryParser class to highlight it in pink and show the metrics summary on the right. I find that having this built right into Visual Studio really helps me figure out where to focus my refactoring energy.

The CQL rules validation phase is fast. The performance challenge was to make this happen almost instantly, to avoid slowing down the developer's machine. For a large application of 100K lines of code, the code gets re-analyzed and 200 CQL rules checked within 3 seconds of the (re)compilation of one or several .NET assemblies. This fast performance was made possible by the development of a new technology of incremental code analysis: with incremental code analysis, only modified code gets re-analyzed. I can attest that this was extremely challenging and complex development!

From what I’ve seen, the effort’s been a success. I’ve mostly been using it to look at different versions of Lucene.net, as I’ve got some work to do to get some of my code to build in VS2010. For this size codebase (~23k lines) the analysis is completed very quickly, even if I disable the incremental analysis. The CQL validation portion completes almost instantly, and memory usage doesn’t seem to get out of hand even when keeping VS open for days. I’d imagine if your computer can handle running Visual Studio to begin with, it won’t have too much trouble with the NDepend integration. I could see some of the VS add-ins that don’t play well with others causing issues, especially with a very large codebase, so I hope to go back and test with a larger codebase and some other add-ins installed eventually.

Most of the other NDepend goodies are available in VS now as well (Dependency Graphs, Test Coverage Analysis, Class Browser, etc…) but I won’t get into all that here. I really see CQL as the app’s killer feature, and that is what I spend the most time thinking about. There is a good overview of the app’s capabilities here if you’d like to read more.

]]>1Eli Weinstock-Herman (tarwn)http://blogs.lessthandot.com/index.php/2010/07/model-view-presenter-looking-at-passive/2010-07-15T09:44:12Z2010-07-15T09:44:12Z

Model-View-Presenter is an architecture pattern that defines a structure for behavior and logic at the UI level. M-V-P separates the logic of the presentation, such as interacting with back-end services and the business layer, from the mechanics of displaying buttons and interface components.

I often build small projects to help understand and grow my skills as a developer, architect, and all-around technologist (as may be apparent from the wide range of topics I post on). Today I worked with a combination of Visio and Visual Studio to build a sample project to play with the Passive View concept and to grow my own understanding of it. This post will cover the Visio side of my learning curve.

Note: I know some people were waiting for another Virtual Lab entry this week, and here I am writing about Architecture instead. Don’t worry, the virtual lab series will continue, I just felt like doing a write-up while I was playing this past weekend.

Passive View

Passive View is a subset of the Model-View-Presenter pattern. In Passive View, the interface is responsible for handling interface-specific logic, such as figuring out how to put a value in a textbox or react to events from button clicks, but all actions and logic outside of the raw UI are sent to the Presenter to execute or manage. The Presenter is responsible for calling business methods in the Business model and updating the data that is available in the View.
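To make that division concrete, here is a minimal sketch of the three roles (in TypeScript rather than the C# of the actual project, purely for brevity; every name here is illustrative, not one of the project's real types):

```typescript
// The View exposes only simple properties -- no logic beyond display.
interface ISearchView {
  searchText: string;
  results: string[];
  hasResults: boolean;
}

// The Model is a black box of business operations behind a facade.
interface IModel {
  findProducts(term: string): string[];
}

// The Presenter owns the decisions: it reads from the View,
// calls the Model, and pushes the outcome back into the View.
class SearchPresenter {
  constructor(private view: ISearchView, private model: IModel) {}

  search(): void {
    const found = this.model.findProducts(this.view.searchText);
    this.view.results = found;
    this.view.hasResults = found.length > 0;
  }
}
```

The View never decides anything; swap in a different `ISearchView` implementation (a web form, a test stub) and the Presenter's logic is unchanged.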

Basic Model-View-Presenter Diagram

From the outside in, the architecture for Passive View looks something like this:

UI – The User Interface reflects what is going on beneath it by implementing one or more View interfaces

Presenter – The Presenter receives interactions from the UI or Model and updates the Views it is attached to

Model – The model is a facade or black box in our diagram, behind which is a business logic layer and data layer

In a flat architecture we would collect data from the interface, perhaps do some business and data validation, and then save it directly to a database using stored procedures or inline SQL. Layering improves on this in several ways.

Defining a data access layer (or a data model such as Entity Framework) allows our application to operate on cohesive, well-defined objects that are meaningful to the application and are stored and retrieved consistently. Defining a business logic layer allows us to centralize the business rules that operate on our application's entities, keeping them consistent with the business and internally consistent within the application, and minimizing the risk that comes with changing the business flow. Finally, separating the mechanics of populating inputs and responding to button presses from the information being communicated to the end user (and the conceptual responses to their input) allows the system to interact with the user consistently across any number of interfaces into the same application.

The definition of each level increases our ability to automate testing and supports greater Separation of Concerns.

Implementing a Sample Project

My learning exercise has been the creation of an ASP.Net search page that allows an end user (customer) to search for finished products from the AdventureWorks sample database. The architecture and design decisions were done as an exercise in Visio using simple shapes and layouts.

My example application has several non-functional requirements:

Non-Functional – Use a simple model stack that can be easily replaced with a Service-Oriented one at a later time

Non-Functional – Build with the idea that we will later create a Silverlight or WPF front-end

Non-Functional – Make pretty pictures for article

My unwritten, final requirement was to finish the whole thing in half a day, though luckily I didn’t define whether I intended that to mean 4 hours or 12.

Initial Architecture

To start I created a diagram of the application architecture:

More Extensive Model-View-Presenter Diagram

The purple layer is my presentation layer, which reflects the View. The blue layer is my Presenter layer, which contains the logic for interacting between the end user and interface as well as a definition, or contract, of the information available in the View. The green layer is the Model (or sits behind the Model, depending on your viewpoint) and exposes business functions and data entities for the Presenter to interact with.

Class Layout

Once the high level diagram was completed, I could approach the task of creating some base classes and interfaces to use in implementing the project.

Presenter.IView – Generic View Interface that all Presenters can interact with and all screens implement

Presenter.BasePresenter – Generic Presenter class that all Presenters will inherit from

To keep the project to a single morning but still allow the ability to come back and build a more architecturally sound solution, I implemented the Model in a very basic fashion: it is referenced locally by the Presenter project and makes direct calls to SQL Server using ADO and parameterized, inline SQL. This buys me the benefits of having a well-defined Model (via the interface) but allows me to concentrate my time and effort on the learning part of the project (i.e., the M-V-P interaction and structure). Defining the model interface also leaves me open to come back and replace it with better separated code, or with a model that acts as a facade to a service stack instead of local DLLs.

Model.BasicModel.Model – Basic implementation of a model that will interact with AdventureWorks on SQL Server

Model.Entities.Product – A Product Entity that can be communicated between an IModel instance and Presenter

Presenter.IProductSearchView – A view of the data involved in a product search

ProductSearch.aspx – A web page that implements the IProductSearchView and interacts with the ProductSearchPresenter

My final Visio diagram ended up looking like this:

Diagram of Example Application

In this case the left side represents basic components (base classes and interfaces) that are used to define common structure or contracts on the right side.

The Code

For the purposes of the example project, my view has properties for Search Text, a Search Count (number of results), Results (a generic list of the Product entity), and a boolean indicating whether there are results to display. My Web Form implements these properties, tying them to elements on the screen.

As the presenter populates properties in the view, the information is automatically reflected on the page. The actual logic of how the business functions are called and how those properties are populated is neatly packaged up in the Presenter and View interface, and very little logic lives in the actual web form.

To create a unit test, we define a simple view that implements the view interface, execute the presenter logic, and verify the properties are populated the way we would expect when the same presenter calls are made from the interface.
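A self-contained sketch of what such a test can look like, with a plain in-memory object standing in for the web form and a canned function standing in for the model (again in TypeScript for brevity; all names and products are illustrative, not the project's actual types):

```typescript
// Minimal view contract and presenter under test (illustrative shapes).
interface IProductSearchView {
  searchText: string;
  results: string[];
  searchCount: number;
  hasResults: boolean;
}

class ProductSearchPresenter {
  constructor(private view: IProductSearchView,
              private search: (term: string) => string[]) {}

  doSearch(): void {
    const found = this.search(this.view.searchText);
    this.view.results = found;
    this.view.searchCount = found.length;
    this.view.hasResults = found.length > 0;
  }
}

// The "simple view" is just an in-memory object -- no web page required.
const stubView: IProductSearchView = {
  searchText: "helmet", results: [], searchCount: 0, hasResults: false,
};

// A canned lookup stands in for the real AdventureWorks-backed model.
const presenter = new ProductSearchPresenter(stubView, () => ["Sport Helmet"]);
presenter.doSearch();

console.log(stubView.searchCount);  // 1
console.log(stubView.hasResults);   // true
```

Because the presenter only ever talks to the view interface, the same assertions hold whether the view is this stub or the real ProductSearch.aspx page.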

Extending the Architecture Further

Extending the application to display product search in a different manner would only require a new front-end that also implements the Product Search View. A Silverlight front-end would only require creating the basic project, implementing the product search View, and wiring the new interface controls to the view properties. To replace the direct model reference with a service reference, we could create a service facade that implements the IModel interface and connects to a local or remote WCF service behind the scenes to handle the real model logic. And finally, instead of counting on our QA department to test all of the application interactions, we can create unit tests directly against the Presenters and Views to ensure that all of the interactions below the top surface of the application happen consistently and to our expectations.

Your Turn

Getting this much of the architecture working is a good first step. I took a number of shortcuts on the BasicModel class in my example, but I now have a functional Model-View-Presenter application to play with. Hopefully there was enough information in the article to interest you in trying this out on your own. I urge you to read the articles linked at the top of the post (or several more in my Model-View-Presenter bookmarks) and come up with your own diagrams and sample project. Even doing a small project will force you to run into questions and considerations you wouldn't have had by simply reading about it, not to mention the unrelated tidbits you will pick up along the way (for instance, I also learned about ObservableCollections today).

]]>11SQLDenishttp://blogs.lessthandot.com/index.php/2010/07/msdn-giveaway-winners/2010-07-09T15:23:42Z2010-07-09T15:23:42Z

The winners of the MSDN Ultimate subscriptions are Emtucifor and Shawson. Originally we were going to select a single winner based on comments, since we only had one subscription; then Ted Krueger donated one of his subscriptions. There was only one comment that really stood out, and it was Emtucifor's. The comment is below:

I have always dreamed of owning my own software company. A few years back I started doing some database development on the side, but then I got married, had a son, and began having some health challenges which together halted what I’d been doing.

But there is a special opportunity coming to me this Saturday to get back into the swing of things: my wife and son will be leaving the country for six weeks. I had already been planning to devote myself to developing one of many application ideas, but now:

1) I would use MSDN Ultimate Subscription to build a secure server/file transfer/fetching/archiving/processing/reconciling/user worklist managing/cross-platform system (uncannily and quite coincidentally, exactly what is desperately needed at a company I know). It will help other people’s lives in several ways: filling a need that many companies are bound to have and for a low price (as initially I will need to build market presence more than profit); increasing the number of small software development businesses out there, proving again that it can be done and providing inspiration for the masses; getting a family’s Dad home so he can spend more time with them. If it helps, I’ll blog about the development and growth of my application and business.

2) The functionality I would use that is unique to VS Ultimate is Architecture and Modeling. I believe in “starting the way you mean to finish.” So even though I’ll be a one-man shop at first, that means test-driven design, agile development, automated regression testing, version control, good backups, and all the infrastructure needed to hire employees when the time comes without having to change much. I’ll be eager to add the layer diagramming abilities of VS Ultimate to this mix.

3) Blogs and technical community activity:

http://blogs.lessthandot.com/index.php/All/?disp=authdir&author=71
http://stackoverflow.com/users/57611/emtucifor
http://tek-tips.com/userinfo.cfm?member=emtucifor
http://squaredthoughts.blogspot.com/ (a bit out of date but still representative of my work)
not to mention http://forum.lessthandot.com/memberlist.php?mode=viewprofile&u=98

Not only did we think the comment was good, apparently someone else did as well: he actually used parts of it as his own in Andy Leonard's giveaway here: http://sqlblog.com/blogs/andy_leonard/archive/2010/07/03/a-visual-studio-2010-msdn-seeding-card-giveaway-contest.aspx

We decided to raffle off the other subscription by vote, so I started putting the polls together. During that time, David Taylor left a great comment:

1) Why do you need this – Because I am a poor, seriously underpaid SQL Server 2008 DBA/Developer who can’t even afford VS2010 Pro!
What are you going to build with this – I am going to build practice apps to learn from, as I am relatively new to development
will it help other people’s lives? It is my hope that training myself will get me into a job in which I am helping other people.

2) What specific functionality that is only part of Ultimate are you going to use? – Architecture Explorer

3) You need to have a technical blog and provide the URL to that blog, if you are an active member of the technical community (stackoverflow, msdn forums etc etc) then also include those links. – http://dyfhid.wordpress.com. Also, I am the Volunteer Coordinator for PASS's Application Development Virtual Chapter, located at http://appdev.sqlpass.org

Now we had a dilemma: do we yank the poll and give the award to David, or not? We proceeded with the poll. Wouldn't you know it, David won Andy's contest, so I emailed him and we agreed that he wouldn't be eligible here.

Here are the results of the final poll: http://forum.lessthandot.com/viewtopic.php?f=121&t=11550

Congratulations to both winners; hopefully they will let us know in a couple of months whether this has indeed made their developer lives easier.