Querying Distributed Partitioned Views

Editor's Note: This is the second article in a three-part series about distributed partitioned views in SQL Server 2000.

Distributed partitioned views are the core component of Microsoft's scale-out strategy. (For more information, see "Scalability Wars.") In "Distributed Partitioned Views," August 2000, we introduced SQL Server 2000's distributed partitioned view technology and demonstrated how to set up and modify the views. In this article, we demonstrate how to query distributed partitioned views. We show you the execution plans for various queries so you can see how SQL Server distributes queries across the nodes.

Next, let's start the querying demonstration by executing simple select statements that retrieve all the rows from the Customers view.

SELECT * FROM Customers

Now look at the execution plan that Figure 1 shows. Notice that these statements query the local table CustomersAF. SQL Server issues a remote query against each of the other tables: Node2.testdb.dbo .CustomersGP and Node3.testdb.dbo.CustomersQZ. SQL Server concatenates the result of the queries and returns the result to the user.

Next, let's perform a query that requests rows only from the local table.

SELECT * FROM Customers WHERE customerid = 'ALFKI'

Now look at the execution plan for this query, which Figure 2 shows. The Startup Filters (the yellow filters in the plan) are the key building blocks that the optimizer uses to determine whether, at runtime, SQL Server needs to go to a particular node to get rows. If you hold your mouse pointer over the local Clustered Index Scan or over either of the two remote queries, you'll notice that the graphical execution plan incorrectly shows that SQL Server executed each of the queries once. But look at the textual output, which Figure 3 shows, of the statistics profile for the same query. This output (Executes = 0) shows that SQL Server executed neither of the remote queries. This fact significantly affects query performance. SQL Server accessed only the servers that contained rows that the query needed. Note that SQL Server won't always use the startup filter to determine at runtime whether it needs to access a remote node. SQL Server auto-parameterized the query above. In other words, SQL Server placed the query's execution plan in cache with a parameter for the customerid column that the example used in our search criteria. SQL Server reuses auto-parameterized plans when it runs a similar query with a different value in the search criteria, thus avoiding the expense of generating a new plan. For example, if you run the query above but replace the customer ID with another one, SQL Server will reuse the auto-parameterized plan.

SELECT * FROM Customers WHERE customerid = 'OLDWO'

The execution plan looks the same as the last one, of course, because it's the same plan. But if you look at the output of the statistics profile, you'll notice that SQL Server executed the remote query against Node3 once, whereas SQL Server didn't execute the queries against the local server (Node1) or the remote server (Node2). SQL Server is conservative in determining which queries to auto-parameterize. If even a slight chance exists that two queries that are similar but have different values in the search criteria will require different plans, SQL Server won't auto-parameterize the plans. For example, consider the following query and its execution plan, which Figure 4 shows:

Notice that the execution plan contains no startup filters. The plan shows that SQL Server accesses only the local table. This query differs from the previous two in that SQL Server didn't auto-parameterize this query. SQL Server determined that it would perform partition elimination (in this case, eliminating the partitions on Node2 and Node3) at compile time and not at runtime, so you don't see the partition elimination in the execution plan. Nevertheless, SQL Server performed the elimination. The following query, which is similar to the previous one, won't be able to reuse an auto-parameterized plan because such a plan doesn't exist. Figure 5 shows the execution plan for this query:

In this plan, SQL Server also performs partition elimination at compile time, so you see only one remote query, which SQL Server issues against Node3. (For more information about auto-parameterization and plan caching, see Kalen Delaney, Inside SQL Server, "SQL Server 7.0 Plan Caching," September 1999.)

Now let's see some more magic by examining how SQL Server handles GROUP BY queries in distributed partitioned views. Consider the following GROUP BY query and its execution plan, which Figure 6 shows:

SELECT customerid, COUNT(*) AS count_custs FROM Orders GROUP BY customerid

Notice that SQL Server calculates aggregates in the local node and issues a GROUP BY query against each remote node. Later, SQL Server concatenates all the results and aggregates each node's aggregates to form the final result set. Splitting the aggregation in this example is called "local-global aggregation." This method is much more efficient than bringing all the individual rows from all the nodes and aggregating them locally. By executing the GROUP BY query on the source servers, SQL Server transfers much less data across the network. In addition, this method distributes the processing load to exploit the power of all the servers. Note that you might get a different plan for this query. This particular plan was generated when we configured Node1 to use a maximum of 64MB of memory. The optimizer came up with a plan that pushes the other nodes' resources to the limit. Similarly, if you partition two tables on the same column with the same partitioning criteria, SQL Server also locally performs any join queries that you run against the views. Or, you can use a method that returns all the rows to the requesting server and performs the join at that server. This option lets you split the join processing to the participating servers and exploits each server's resources for performing local joins. Now let's compare a join query against two views that are partitioned on the same partitioning criteria with a join query against two views that are partitioned on different partitioning criteria. To make this comparison, partition the Orders table by the orderid column instead of the customerid column, as Listing 1 shows.

Now run the following join query against the views Customers and Orders, which are partitioned with the same partitioning criteria:

Compare this query's execution plan, which Figure 7 shows, to the following query's execution plan, which Figure 8 shows. The following query runs against the views Customers and Orders2, which use different partitioning criteria:

Notice that in the first query, each node processed the join locally, and later SQL Server concatenated the results in the local server. In the second query, SQL Server brought the unchanged results of the Customer's partitions and the Orders2 partitions to the local server. Then SQL Server performed the join on the local server without using the remote servers' resources.

More Distributed Partitioned Views to Come

Now that you're familiar with the basic steps to create, modify, and query distributed partitioned views, you need to know how to handle the exceptions. In an upcoming issue, we'll offer suggestions for achieving similar functionality when your tables or views don't meet the criteria for distributed partitioned views that we described here.

Discuss this Article 2

Pederb (not verified)

on Nov 26, 2004

Hi
I have a question regarding quering partitioned views in Sqlserver2000. If I query a partitioned view, but don't know the values in the where clause, i.e. select * from viewA where intVal in(select intVal from tbl1). Will this result in an optimized query that searches only the relevant tables?

I have SQL2K enterprise on W2K advance.
I have tested a partitioned view. Only when the check constraints are added to the member tables during tables are created, this view works. But for the tables I added check constraints after the creation, the partitioned views don't work. (it is not SQL7)
Another thing is when I don't have top in select SQL server access all member tables when only one member table is needed.
Is it a bug or any configuration to be changed?

From the Blogs

The quest for the Golden Record to achieve a single, accurate and complete version of a customer record is worth the pursuit to attain survivorship. Record matching and consolidation are only the beginning. Melissa Data takes a new approach. Learn how to apply intelligent rules based on reference data to make smarter and better decisions for data cleansing....More

On SQL Servers where Availability Groups (or Mirroring) isn’t in play, I typically recommend keeping a combination of on-box backups along with copying said backups off-box as well. Obviously, keeping databases AND backups on the SAME server is the metaphorical equivalent of putting all of your eggs in one basket – and therefore something you should avoid like the plague....More

One of the biggest strengths of AlwaysOn Availability Groups is that they allow DBAs to address both high availability and disaster recovery concerns from a single set of tooling or interfaces. But, this doesn’t mean that you won’t still need backups....More