Monday, July 14. 2008

Programming Design Patterns define recommended approaches of solving common application problems. Within design patterns is a subset of design patterns called Idioms.
Idioms you can think of as a strategy for expressing recurring constructs or if you will sub-problems and often take advantage of the special features of a language.
They tend to be specific to a programming language and can not be reused
in other languages they were not specifically designed for. To demonstrate the differences lets compare two design patterns we commonly use.

Macro Design Pattern

Design Pattern:

For core pieces of your application try to create a buffer between those components you can change and those you can't.

When developing applications, we often use industry standard abstraction layers and add our own abstraction layers on top of them. For example when building .NET
applications, our pages never inherit directly from System.Web.UI.Page, but instead from an intermediary class we have control over.

The mind set behind this is that an industry standard abstraction layer is one that has taken into consideration all sorts of optimization concerns, been thoroughly tested by mass numbers of users. It in essense
has a lot of good black box stuff we don't want to reinvent. Since it is
one used by the industry, you have little control over its direction and it may not completely handle your particular use-case. The easiest way to customize it for your specific needs is to put your own abstraction wrapper around it and make it a rule never to call
the inner abstraction. So we start off just putting a wrapper around the parts we plan to use that do nothing but wrap.
We may then decide later on we need the wrapper to do more. The buffer also
allows us to rip out the inner layer and replace with a more efficient one.

Case in point - Lets say you are trying to bullet-proof your application against SQL Injection attacks,
if all SQL goes thru your abstraction layer - you can put a validation check
around it that verifies the SQL conforms to the unique policies of your application before it gets passed to the sub abstraction layer.
You can also do all sorts of interesting things like create your own dialect of SQL or translate Transact-SQL dialect to PSQL
that then gets fed to your final database abstraction layer.

Micro Design Pattern: AKA the idiom

The way in which different languages choose to navigate and roll data constructs is different. Procedural languages use while loops, for loops and so forth. This is such a common
thing to do for Relational Databases that SQL has constructs INNER JOIN, LEFT JOIN, CROSS JOIN, WHERE, GROUP BY etc. So something that seems so seemingly procedural in nature
is completely abstracted away in SQL idioms.

A person used to summarizing data with :

orders.sort('salesperson,customerid')
for each order in orders
if order.salesperson != cursalesperson then
if cursalesperson > '' then
tallystats[] = array(curtotal, custcount)
end if
curtotal = orders.total
cursalesperson = order.salesperson
curcustomer = order.customerid
custcount = 1
else
curtotal += orders.total
if curcustomer != order.customerid then
custcount += 1
curcustomer = order.customerid
end if
end if
loop
if cursalesperson > '' then
tallystats[] = array(curtotal, custcount)
end if

May be puzzled to see the SQL idiom equivalent of

SELECT salesperson, SUM(total) As salestotal,
COUNT(DISTINCT customerid) As custcount
FROM orders
GROUP BY salesperson
ORDER BY salesperson

As you demand solutions for more complicated versions of these questions, you quickly realize the benefit of these SQL idioms.
At the end of the day we choose our languages because of the ecosystem built around them
(e.g. what platforms they can run on, tools to build with, libraries to use)
and fitness of the idioms it provides to solve our problems.

More SQL Idioms - the FOR 1..n Loop Idiom Equivalent

This one is by far one of our favorites. I'll call this the Dummy Number trick.

Basic Use Case:
You need a for loop that gives you a sequence of dates, numbers etc. SQL does not support FOR loops. To achieve this feat
you use a table - lets call this the Dummy Number table that has nothing but a sequence of numbers say between 1 and 200000. PostgreSQL has the handy generate_series() function
which creates this dummy number table for you on the fly.

Below are a couple of concrete examples:

Creating repeating labels - say business cards
Lets say you need to create a set of business cards, but you don't want to waste paper - so you want your cards to start at position 11, where you had left off and you want to print
10 labels for the employee 'Joey'. The below will generate blanks for the first 10 slots and repeating Joey fields for the next 10.

SELECT CASE WHEN n > 10 THEN e.first_name ELSE '' END As first_name,
CASE WHEN n > 10 THEN e.last_name ELSE '' END As last_name,
CASE WHEN n > 10 THEN e.title ELSE '' END As title,
CASE WHEN n > 10 THEN e.phone ELSE '' END As phone
FROM employees e CROSS JOIN generate_series(1,20) n
WHERE e.first_name = 'Joey'
ORDER BY n;

Create a report that gives you sales for each day of year 2007 even for days that have no sales

SELECT pdays.doy, SUM(o.order_total) As sales_total
FROM (SELECT CAST('2007-01-01' As date) + n As doy
FROM generate_series(0,364) n ) pdays
INNER JOIN orders o ON o.sale_date = pdays.doy;
GROUP BY pdays.doy
ORDER BY pdays.doy

PostGIS - this trick comes in very handy in geometry manipulationCreate a new data set with all lines in your table cut up such that no line is greater than 100 units.
Assumes no line segment is longer than 10000x100 units long.

PostgreSQL Array Idioms

PostgreSQL has idioms that are fairly unique to it. Its extensive support of arrays as first class data-types introduces a lot of interesting use-cases.
Below is another example using PostGIS for nearest neighbor search.

For each building - list the 3 closest police precincts that are within 5000 units (our units are in feet so ~ 1 mile away)

SELECT m.building_name, m.nn_precincts[1] As nn_precinct1,
m.nn_precincts[2] As nn_precinct2,
m.nn_precincts[3] As nn_precinct3
FROM
(SELECT b.building_name, b.the_geom,
ARRAY(SELECT p.precint_name
FROM police_precints p
WHERE ST_DWithin(b.the_geom, p.the_geom, 5000)
ORDER BY ST_Distance(b.the_geom, p.the_geom)
LIMIT 3) As nn_precincts
FROM buildings b ) m
ORDER BY m.builidng_name

E-Mail addresses will not be displayed and will only be used for E-Mail notifications.

To prevent automated Bots from commentspamming, please enter the string you see in the image below in the appropriate input box. Your comment will only be submitted if the strings match. Please ensure that your browser supports and accepts cookies, or your comment cannot be verified correctly.Enter the string from the spam-prevention image above: