Monthly Archives: December 2014

At this point, we had established how this metadata could be the condensation formed from the vapor of our thoughts, especially when it came to the overall design of the model and functionality within the architecture. By molding both the tables and the files through a series of iterations, we were routinely applying a system that seemed to work methodically and successfully. By this point in the design, we were optimistic enough to start discussions about another important aspect of our distributed architecture: communication. Namely, how were the various processes (which were executing on heterogeneous systems) supposed to communicate with one another? Almost a decade ago (when these design sessions occurred), web services were not as popular as they are today, and so they were not given much consideration. (Due to the volume of data that was expected to pass between machines, web services were not a favored option anyway.) Since there were no cheap options available and since we were limited by a minuscule budget, we decided to create our own mechanism for communication. And how would we do that? To answer that general question, we again gathered around the runes of our metadata and peered closely in order to decipher the answer to our inquiry.

Since XML had already been used as a markup language both to showcase the metadata and to help specify some of the project’s needed functionality, a few of us proposed that we nominate it as a candidate for a homegrown protocol. At the time, XML and JSON (along with other contemporary standards like REST) had not yet become the normal options for communication, but we pushed for such an adoption. Some senior managers had concerns, but for the most part, they were intrigued by the idea. We just needed a thorough example to make the argument rock solid, and that example would have to incorporate all of the other changes made to our metadata within the past few weeks. For example, we had added another important concept to our security metadata, known as ‘fields’: now a user could lock a specific field of a specific product so that no other user could alter it. After a day or two of consideration and playing with various options, we created the first iteration of our communication protocol:
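
To give a flavor of it here, a minimal sketch of such a request and its response might have looked like the following (the element and attribute names are only illustrative, loosely modeled on the field/attribute metadata described above):

<!-- illustrative sketch only: element and attribute names are hypothetical -->
<request type="update_field" user="jsmith" session="A1B2C3">
   <product id="9780000000001" flag="Alpha">
      <field name="fieldPrice" lock="true">
         <attribute name="Value">24.99</attribute>
         <attribute name="Type">retail</attribute>
         <attribute name="Currency">USD</attribute>
      </field>
   </product>
</request>

<response type="update_field" status="success">
   <message>Field 'fieldPrice' updated and locked by user 'jsmith'.</message>
</response>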

Of course, we knew the final implementation of our communication layer would need to incorporate other complexities (like an intelligent XML parser and a homegrown TCP/IP networking layer), but those components were part of the actual software implementation. Here, we simply wanted to demonstrate how we could utilize the metadata as a basis for our communication protocol, and we had done so. In the months and years to follow, we would be pleasantly surprised to learn that this choice would pay dividends. In the seven years since the delivery of our product data system, this protocol and its supporting libraries have been flexible enough to never require any code alteration. By relying on the metadata, any required changes have always been made to the numerous tables and files of our metadata, but the code itself has never needed to change. In truth, we would learn how this pleasant surprise would become applicable to many other subsystems based on our metadata-driven design. Would you believe me if I said that the changes needed for our product data system (and its distributed architecture) have increased by 50% each year, but the actual code changes to our system have dropped by 20% each year? Even though MDD does require some level of commitment, there are quite a few rewards that can be returned from such an investment.

So, as mentioned previously, we had just gone through our first iteration of inserting and then altering the metadata in our tables, and along with forming an initial resolution for our permissions requirement, we had also achieved our first milestone with this new method of design. Now, you might be saying, “But that only exemplifies a data model. Big deal…what about designing the actual functionality?” Of course, we need to design that as well, and as you will see, there’s no reason that we can’t apply the same method here. However, in designing the flow of the functionality, we will need to design the execution blocks (i.e., a high-level view of the functions).

After achieving some success with the permissions issue and subsequent others, the various stakeholders reconvened to address the subject of functionality in our architecture. In this particular case, we needed to review established business rules that would validate product data before saving/persisting it, and these rules were cryptically embedded in some COBOL on our legacy systems. After I had spent days deciphering most of it (and hating every moment), we sat down to scrutinize and then approve my translation, sans a Rosetta stone. Initially, I was going to present these business rules as simple pseudocode, but since I’m a big fan of code reuse, why not build upon our previous meetings and incorporate the metadata from those newly created tables? So, using XML, I created a markup language of my own that utilized our attributes and fields, still white-hot from the forge:

<!-- illustrative reconstruction: the opening tags and attribute names below are assumed from the description that follows -->
<if product_flag="Alpha">
   <validate fields="all" field="fieldTitle" attribute="Title" check="max_length">
      <failure_message>Alpha Product has a missing field (or the title’s length is larger than allowed). More than likely, a Price attribute is probably missing.</failure_message>
      <success_message>Alpha Title has all fields!</success_message>
   </validate>
</if>

In this example, for any product flagged as an ‘Alpha’ product, we would ensure that certain fields had all of their respective attributes populated. In addition, it would look at the attribute ‘Title’ (of the field ‘fieldTitle’) and ensure that its length wasn’t longer than the maximum allowed. On success, it would return the text from ‘success_message’, and in the case of failure, it would return the text from ‘failure_message’.

I had taken the incremental step of creating metadata in files that referenced our metadata in tables, and even though it existed in something other than SQL columns, it was yet another milestone for our new design method. Before the meeting, I was somewhat apprehensive about the reactions of some of our stakeholders, but it quickly became clear that my concerns were unfounded. Nearly all attendees immediately grasped the rules being presented, since they were simply an additional layer to the language established with the metadata in previous sessions. Amid the enthusiasm, some people began to ponder other possibilities, like placing the business rules in a set of tables…but, after I explained the difficulty with such an implementation, that idea was gently placed to the side. (That’s not unexpected, since excited people holding a hammer will see a nail everywhere.) Even though other suggestions were made and discarded, I was enjoying the palpable enthusiasm in the room. We had found a cadence with this method, and we were making progress. A few others and I again considered how we could develop an architecture around this metadata, with properties in database tables and flow in markup files. Even though each pondering of the possibility brought a fearful jolt, the idea was slowly appearing to be less and less crazy.

Now that I’ve spent some time describing the various benefits of using metadata-driven design, I think that it’s probably time to finally expound on the process via a working example. After all, it doesn’t seem like a concrete method until it has been put into practice. So, as I described in a previous post, my group here at Barnes & Noble needed a new product data system and all of its inherent constituents (databases, applications, etc.). Before we started the project, however, we needed to define its scope and its functionality, and in this case, domain-driven design assisted in the creation of a manifest for our requirements. However, since a system like this had never before been created with such a focus on foresight and since we were all new to architecture design, we didn’t even know how to start this phase of the design process. It became frustrating when we tried to simply converse about a particular point of the project, and documentation with images and text didn’t seem to help the situation much.

For example, all of the stakeholders agreed that the system should know how to handle each data point via a number of dimensions: retrieving/persisting, validating, applying permissions, auditing, error handling, etc. In order to discuss how we would create a design that accounted for all of these specifications, we needed to put something on the whiteboard that we could all scrutinize together. Since the managers and directors were not as familiar with programming, the use of UML, pseudocode, and prototyped classes did not resonate with them. However, to some degree, it did seem that everyone was familiar with databases and SQL. So, a few of us decided to draw some hypothetical tables on the whiteboard and populate them with data, which would give shape to the items listed in the manifest. Suddenly, the room came alive as the whiteboard became a sounding board, and people began molding this table data that described our requirements. After that initial meeting, we created the tables from the whiteboard in a separate database, and everyone obtained access to them. We then went through multiple iterations of people altering the schema’s tables, conversing through emails about those changes, and then having another rendezvous to talk about them on the whiteboard.

Soon, this simple iteration began to produce results. Initially, the information about the data points’ dimensions (which we began referring to as metadata) was organized into appropriate metadata tables:
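
As a rough sketch of that first pass (the table and column names here are only illustrative, not our actual schema), the metadata might have looked something like this:

FIELD_METADATA
FIELD_ID   FIELD_NAME    DATA_TYPE   IS_VALIDATED   IS_AUDITED   MAX_LENGTH
1001       fieldTitle    TEXT        Y              Y            250
1002       fieldPrice    DECIMAL     Y              Y            (null)

FIELD_PERMISSIONS
PERMISSION_ID   FIELD_ID   USER_ROLE         CAN_READ   CAN_WRITE
5001            1001       catalog_editor    Y          Y
5002            1002       pricing_analyst   Y          Y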

Discussions quickly came to a consensus that, in some cases, permissions should not be applicable to only one data point. For example, a product’s price was actually a set of data points: its numerical value, its type (retail, cost, etc.), its currency, etc. In that case, a user authorized to change price data should be able to change all of those data points, and the system should be intelligent enough to group the permissions for such a set of related data points. So, in an effort to create our own version of an access control list, we took the metadata about permissions and altered it to reflect this new enhancement to our system:
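
Again as an illustrative sketch (these names are hypothetical), one way to picture the change is a grouping table that ties related fields together, with permissions now granted against the group rather than against any individual field:

FIELD_GROUP
GROUP_ID   GROUP_NAME
9001       priceData

FIELD_GROUP_MEMBER
GROUP_ID   FIELD_ID   FIELD_NAME
9001       2001       fieldPriceValue
9001       2002       fieldPriceType
9001       2003       fieldPriceCurrency

GROUP_PERMISSIONS
PERMISSION_ID   GROUP_ID   USER_ROLE         CAN_READ   CAN_WRITE
7001            9001       pricing_analyst   Y          Y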

It’s somewhat difficult to overstate its psychological impact, but this emergent productivity from creating a few simple tables excited us as we successfully began to build an architecture through small increments. Through only a few simple iterations of this metadata, we had quickly understood our permissions problem and drafted a decent approach to resolving the issue. At this point, we were only thinking of it as a design tool, not as a potential foundation layer for our architecture. However, as we repeatedly reshaped this metadata to better fit our desired solution, someone eventually posed the question, “I wonder if we could build something that just ran off of these tables?” To which a few of us nodded in cautious, pensive optimism…but that’s a story for another day.

Despite all the things taught in school, analysis seems to be the last (or, at best, the penultimate) lesson of programming courses in college. As an interviewer and as a senior programmer, I’ve seen many occurrences of common mistakes among junior programmers, and usually, the failure stems from a lack of analytical skills. You have to examine multiple factors in context in order to create an optimal solution, and even though some of that comes with experience, it’s also a matter of practice. Such was the case a few weeks ago, when I was mentoring one of our junior programmers. His code was caught in a perpetual loop, and while reviewing it alongside him, I became acquainted with his particular implementation for processing records in a large database table.

In his program, this junior programmer had a code block that would repeatedly 1.) execute a query to return a maximum of 500 unprocessed records at a time, 2.) place that data into data structures, 3.) perform some functions on behalf of each record, and then 4.) mark the respective records as having been successfully processed:
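
In rough form (the helper methods, types, SQL constants, and variable names below are my own stand-ins, not his actual code), the pattern looked something like this:

// A condensed sketch of the pattern: every pass through the outer loop
// re-executes the expensive query, reads up to 500 unprocessed rows,
// processes them, and then marks them as processed.
bool bRecordsRemain = true;
while (bRecordsRemain)
{
    List<ProductRecord> oRecords = new List<ProductRecord>();

    // Re-runs the full query against the large table on every iteration
    OracleCommand oSelectCommand = new OracleCommand(UNPROCESSED_RECORDS_SQL, oConnection);
    using (OracleDataReader oReader = oSelectCommand.ExecuteReader())
    {
        while (oRecords.Count < 500 && oReader.Read())
            oRecords.Add(PopulateRecord(oReader));    // hypothetical helper
    }

    if (oRecords.Count == 0)
    {
        bRecordsRemain = false;
        continue;
    }

    // Writes go through the same connection as the reads
    OracleCommand oUpdateCommand = new OracleCommand(MARK_PROCESSED_SQL, oConnection);
    foreach (ProductRecord oTmpRecord in oRecords)
    {
        ProcessRecord(oTmpRecord);                    // hypothetical helper
        MarkRecordAsProcessed(oTmpRecord, oUpdateCommand);
        // If a record is never successfully marked here, the next query returns it
        // again, which was the source of both the infinite loop and the repeated cost.
    }
}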

In the matter of the infinite loop, the code had a case where a particular record was never marked as successfully processed, so the same record was handled over and over. Hence, the program never reached a point of completion. After pointing the problem out to him, he looked relieved, but I told him that he shouldn’t relax just yet. When he looked at me in puzzlement, I asked him about the persistent performance issues in this program. “Are they still there?” I asked. He nodded, and in response, I told him that we had just discovered the culprit.

Since the number of available records could vary, the program could be attempting to handle just 500 records, but in other cases, it could be tens of thousands. Because the table had a considerable number of records, the execution of its complex query could take a few seconds before the result set was returned and before its data could be read into data structures. In the case of 500 records or fewer, the time spent was marginal…but in the case of tens of thousands, the repetition of these steps compounded into a significant expense. When I explained that fact to him, his face revealed a moment of epiphany. “I see!” he exclaimed. “But what should I do?”

I showed him a simple solution: use a read-only query (i.e., one that takes no database locks) and use separate database connections for reads and writes. Using a C# OracleDataReader with the read-only query, he could iterate through the whole result set over the ‘read’ connection and then process the records through the ‘write’ connection:

// 'oRecordReader' is the OracleDataReader built from the read-only query, and
// 'oMarkRecordCommand' is an OracleCommand created with a separate Oracle connection for writes
while (oRecordReader.Read())
{
    ProductRecord oTmpRecord = PopulateRecord(oRecordReader);   // stand-in helper names
    ProcessRecord(oTmpRecord);
    MarkRecordAsProcessed(oTmpRecord, oMarkRecordCommand);
}

(In this case, there was no concern about rows being altered by another program during the execution of this code, since the same program had created/updated the rows in a previous step.) Afterwards, the junior programmer never experienced another issue with infinite loops, and his program performed much better with a single execution and iteration of the table’s query. Even though it’s nothing extraordinary, it’s another example of how a slight adjustment to the code can have a profound effect on the overall behavior of your program. It’s only too bad that our youth aren’t taught the appropriate lessons that should come along with the high premium of college tuition.

So, assuming that you’ve read all the previous posts about metadata-driven design, you might ask how one goes about putting MDD into practice. I’m glad that you asked! Of course, there’s a good deal to write about such a subject, but at first, I’ll start with something more familiar to programmers. As I’m sure any reader here knows, there are a number of methodologies in place to build the implementation of a given software design, and one of the most popular and effective is Agile. It’s a supremely flexible way to handle the various, dynamic obstacles thrown at software development, and you’d be remiss if you’ve never even considered whether it applies to your own environment. However, despite its popularity with software implementation, Agile is rarely ever considered for one important aspect of software: design. As noted by Andrew Binstock in a recent piece for Dr. Dobb’s Journal, the authors and proponents of Agile have never written about its application to design. Why is that? There are probably a number of reasons, but personally, I think that it has been a conundrum due to an important question: to what are we applying the Agile method when it comes to the subject of design? We need a medium to mold our ideas…and, in this case, I think that the potential answer is metadata.

In the case of software, the goal is to build the implementation; in the case of software design, the goal is to construct an architecture. In the case of software, the medium which represents and constitutes the software is the code (or, in its early stage, UML and/or pseudocode); we can refactor the code as the requirements change in real time. In the case of software design, I nominate metadata as the medium which represents and constitutes our ideas; we can refactor the metadata as the overall requirements of the project change in real time. (Here, I’m using metadata in a general sense, meaning both the data that describes an application’s schema and the configuration data that powers its processing.) By having a palpable representation of the architecture, we can apply some of the same methods that create a refined code base in order to craft a sleek design. Does that sound a little crazy? Maybe a little bit.

However, I don’t think of it as that crazy of a suggestion. In our group here at Barnes & Noble, we use this type of method whenever we initiate a project of any significant scope, and during the design process, we apply practices similar to those found in Agile. Instead of pair programming, we pair a stakeholder with a designer/analyst, and together they draft designs of the application’s metadata. We then have iterations of reviews that look to refactor these preliminary designs, and each person has a certain degree of accountability for the molding of this metadata. Like actual software development, sometimes the information needed for the architecture can only be obtained through a slow trickle, but with our own version of continuous integration, we abstract the essentials of new requirements and then attempt to fit them into the metadata which is already in place. (Surprisingly, we have found numerous instances where one entry of metadata can be multidimensional, providing functional direction to several application layers at once.) Of course, there are other examples, but it’s fair to say that Agile can have a role in software design. In addition, by using metadata as our medium, perhaps we can even apply some of the tools from lean software development; since we’re dealing with something quantifiable, maybe we can analyze our own design process in order to improve it at a later date. So, do I still sound crazy? Hopefully, at the very least, I sound a little less so.