Stay “So Fresh, So Clean” in Legacy Application Development

The title may not fit until the end of this post, but bear with me. I work in a large legacy .Net application and I felt it pulling me into the depths of its ASP.NET 1.1’ness and decided to write this post to remind myself that I can keep my skills moving forward and still keep the legacy maintained.

I had a new project where I had to dump the configuration data for customers into a file with a human readable format. The purpose of this project is to allow someone to compare the configuration data from environment to environment to verify the configuration is maintained and correct as it makes its way to production. The comparison is done outside of the project and the system with a third party diff tool. The configuration data is held in database tables so a simple dump of the data from the table to the file is what I set out to do.

There is already a project in the application that handles this type of scenario, but the code is “So Dirty, So Complex” and a nightmare to maintain. It’s also rigid, full of hard dependencies, and so unreliable that no one uses the tool. Hence the reason I was tasked to work on the use case. Since this is such a simple use case I wanted to create new code that will provide a solution that is easier to maintain and extend and get rid of a very small piece of the stench in this legacy app.

There are three basic parts of my solution:

Data Retriever – code to retrieve the data, in this instance, from a database

Serializer – code to serialize the Data Retriever into the Output

Outputter – code to write the Serializer to the output format, in this instance a file

I am using ADO.net and DbDataReader for database agnostic data streaming. The Serializer is currently dependent on DbDataReader, but it would be simple at this point to introduce a Mapper to pre-process the data stream and pass a DTO instead of reader to Serializer. I didn’t do this in the first iteration because it didn’t provide enough value for the time it would have taken to figure out a way to abstract DTOs in a way that made the solution flexible. The Outputter is basic System.IO and there is no interface for it at this point. We could provide an Outputter interface then we could output to other formats say to another database table or posting to a service.

In the Serializer, I decided on JSON as the human readable format, because it is a standard format and easier to read than XML, IMHO. Also, its a chance to bring new concepts into this legacy application that has no exposure, that I know of, to JSON. I tested serialization solutions side by side, a custom JSON serializer I coded by hand and JSON.net. My test was to just through the same data set at both solutions in a test harness and record and compare timing. I was mindful of using some semblance of scientific method, but my test environment runs on my local dev box that has a lot going on, so the results are not gospel and can vary depending on what’s running in the background.

After running my tests and analyzing the results and my limited research I chose to use the custom serializer. Although JSON.net is an awesome framework, the custom was a better fit for this iteration and here are my observations and reasoning why I went this direction:

The custom was more than an order of magnitude faster in my unscientific test. With JSON.net there is an additional step to create an IEnumerable<dynamic> to facilitate serialization so we probably went from O(1) to O(2), but I’m not sure without seeing the JSON.net internals. There may also be optimization in my JSON.net usage that could make it faster, but I had to do this project quick and simple. Without going into the gory details here’s the results of average time over 100 iterations of serializing the test data:

Custom Time: 00:00:00.0004153

JSON.net Time: 00:00:00.2529317This difference in time remained near the same in multiple runs of the test.

JSON.net output is not formatted. It’s all one line and defeats the human readable aspect of the project. This is probably configurable somehow, but I didn’t research.

I don’t need to deserialize and don’t have to worry about the complexity of deserialization. If I did, I would probably go with JSON.Net.

I am not sure if we are authorized to use JSON.Net in production.

We are serializing flat data from one table or view (no table join object mapping) and we don’t have to worry about the complexities of multi-level hierarchies or I’d choose JSON.net.

In the end even though I tied myself to a specific implementation I built in extendability through abstractions. We can later swap serializers and also build new dumps based on other tables pretty easily. I could see possibly adding a feature to import a dump file for comparison in the system instead of having to use an external tool. This could also be the basis for moving data from system to system in a way that will be much simpler than the previous project. Taking the time to look at multiple solutions presented me with areas that I should think about and prepare for extension without going overboard with abstractions.

The real point is to try more than one thing when trying to find a solution to a problem. Compare solutions to find reasons to use or not use solutions. Don’t pick the first thing that comes your way or comes to mind. Spend a little time learning something new and experimenting or you will rarely learn anything new on your own and will stay a slave of Google (no disrespect as I lean heavily on Google search). This is especially important for engineers dealing with enterprise legacy applications. Don’t let yourself get outdated like a 10 year old broken legacy application. Stay “So Fresh, So Clean”.