Archive for the ‘Software Development’ Category

Revolutions are endemic to tech culture. A new group comes along and wonders why the last generation built something so complex, and they set out to tear down the old institutions. After a bit, they begin to realize why all of the old institutions were so complex, and they start implementing the features once again.

We’re seeing this in the NoSQL world, as some of the projects start adding back things that look like transactions, schemas, and standards. This is the nature of progress. We tear things down only to build them back again. NoSQL is finished with the first phase of the revolution and now it’s time for the second one.

I’ve read an interesting article about NoSQL databases at InfoWorld. The article explains how the NoSQL revolution emerged and what the current situation looks like from the perspective of solution development. I think it touches on important issues: alongside their advantages, NoSQL databases have inherent weaknesses, and every solution architect or developer must be aware of them.

You may know that System.Security.Cryptography for the Compact Framework lacks many cryptography algorithms compared to the desktop .NET Framework (2005 and later). In one project we needed SHA-512 hashing on Windows CE, and we found the /cfAes library, which provides almost all of the crypto functionality of the .NET Framework. We are grateful to the author for sharing the class library.

The following table compares the versions of the .NET Framework with respect to their support for different cryptography algorithms (X means supported, 0 means partially supported).

Every time I buy a new PC, either desktop or notebook, its hard disk capacity is larger than the previous one even though the total price of the PC is about the same. The same may apply to other components of the PC, such as main memory capacity and CPU power, but hard disk capacity is something very different.

Matthew Komorowski has collected hard drive capacity/price data and created the graph below:

Source: http://www.mkomo.com/cost-per-gigabyte

Komorowski also summarizes the capacity/cost trend as follows:

Over the last 30 years, space per unit cost has doubled roughly every 14 months (increasing by an order of magnitude every 48 months)
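The two figures in the quote can be related with a little arithmetic. The sketch below is only an illustration (the function name is my own, not something from Komorowski's page): if capacity per unit cost doubles every d months, it grows by a factor f in d · log2(f) months.

```python
import math

def months_to_multiply(factor: float, doubling_months: float) -> float:
    """Months needed to grow by `factor`, given a fixed doubling period."""
    return doubling_months * math.log2(factor)

# With a 14-month doubling period, a tenfold increase takes
# 14 * log2(10) months, i.e. roughly 46.5 months.
tenfold = months_to_multiply(10, 14)
print(f"{tenfold:.1f} months")
```

The result is close to the 48 months given in the quote, so the two numbers are consistent as rounded estimates.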

As a software development professional, I am not a fanatic of any one platform or tool. The choice always depends on many factors and constraints. The most important thing, I think, is not the tools you use to solve the customer's problem. Customers usually do not know about them at all. What customers usually expect is good, effective and timely solutions.

Deasciification is the process of converting text written with only ASCII letters to its correct form using the corresponding letters of the Turkish alphabet (or of any language that contains non-ASCII letters). For example, the text “Cok yogun bir calisma ve emegin urunu” still conveys its meaning, because human intelligence is able to resolve the ambiguities (if any) and understand text like this. The text, however, should be written as “Çok yoğun bir çalışma ve emeğin ürünü” (in Turkish). This is what a deasciifier is supposed to do.
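To make the idea concrete, here is a minimal sketch of a dictionary-based deasciifier. It is not how any of the tools reviewed in this post work; it only assumes a tiny, hypothetical word list (the words of the example sentence) and restores a word when its ASCII form is found in that list. Real deasciifiers need a much larger lexicon or a statistical model to handle ambiguous words.

```python
# Map each Turkish-specific letter to its ASCII look-alike.
TO_ASCII = str.maketrans("çğıöşüÇĞİÖŞÜ", "cgiosuCGIOSU")

def asciify(text: str) -> str:
    """Replace Turkish-specific letters with their ASCII counterparts."""
    return text.translate(TO_ASCII)

# Hypothetical toy lexicon; a real tool would use a full Turkish word list.
LEXICON = ["çok", "yoğun", "bir", "çalışma", "ve", "emeğin", "ürünü"]

# Lookup table from the ASCII form of each word to its correct form.
RESTORE = {asciify(word): word for word in LEXICON}

def deasciify(text: str) -> str:
    """Restore Turkish letters word by word; unknown words are left as-is."""
    out = []
    for token in text.split():
        restored = RESTORE.get(token.lower(), token)
        # Keep an initial capital. Note that str.upper() is not aware of
        # Turkish casing rules (i/İ, ı/I), so this is only an approximation.
        if token[:1].isupper():
            restored = restored[:1].upper() + restored[1:]
        out.append(restored)
    return " ".join(out)

print(deasciify("Cok yogun bir calisma ve emegin urunu"))
```

The lookup works only when each ASCII form maps to a single Turkish word; resolving the ambiguous cases is where the real difficulty of the problem lies.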

Well, why do we need deasciification? We may not have Turkish letters on the keyboard (or the OS we are using may lack a Turkish keyboard layout) and still need to end up with text in correct Turkish form. It is also possible that we are accustomed to typing only with ASCII letters for some reason.

In addition, we may need to analyze a large collection of Turkish documents, and this collection can be contaminated with text written in ASCII only, which will degrade the quality of our analysis. In that case, the only option is to use deasciification. This is the most important reason for me, as I often perform text mining on Turkish document collections and always need deasciification.
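Before deasciifying a whole collection, it can help to find out which documents are likely affected. The following is a rough heuristic sketch of my own, not a method taken from any of the tools below: it flags a document when it is long enough and contains almost no Turkish-specific letters. The function names and the two thresholds are assumptions chosen for illustration and would need tuning on real data.

```python
TURKISH_LETTERS = set("çğıöşüÇĞİÖŞÜ")

def turkish_letter_ratio(text: str) -> float:
    """Share of alphabetic characters that are Turkish-specific letters."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in TURKISH_LETTERS for ch in letters) / len(letters)

def looks_asciified(text: str, min_letters: int = 20,
                    threshold: float = 0.01) -> bool:
    """Flag text that is long enough yet has (almost) no Turkish letters.

    min_letters and threshold are illustrative values, not tuned ones.
    """
    n_letters = sum(ch.isalpha() for ch in text)
    return n_letters >= min_letters and turkish_letter_ratio(text) < threshold

docs = [
    "Cok yogun bir calisma ve emegin urunu",
    "Çok yoğun bir çalışma ve emeğin ürünü",
]
for doc in docs:
    print(looks_asciified(doc), doc)
```

Short texts are deliberately not flagged, since a few Turkish words may contain no special letters at all.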

In this post, I’ll briefly review a few deasciification tools developed in several languages.

The first deasciifier is the one that is part of the Zemberek project. Written completely in Java, Zemberek is an open-source, general-purpose Natural Language Processing library and toolset designed for Turkic languages, especially Turkish. A web-based demo of Zemberek is available at http://zemberek-web.appspot.com/. I usually use Zemberek's deasciifier in my text mining research when I work with Turkish text datasets.

The next deasciifier was developed by Gökhan Tür at Sabancı University. More information and a demo are available at http://www.hlst.sabanciuniv.edu/TL/deascii.html. This system is currently not open-source and is not available for download.