This blog covers the practical techniques, trials and tribulations associated with the transformation of IT systems from legacy technologies to systems using SOA and modern open systems. It also includes the occasional interlude with rants about technology in general.

Tuesday, April 29, 2014

One of my favorite treats as a young child in Mexico City
was a candy called “Suertes” (“Luckies”). It consisted of a cardboard roll containing
little round candies known as “chochitos” and a small plastic toy (the toy was
the lucky surprise: usually a cheap top, a miniature car, or a soldier figurine).
It was a cheap treat—think of a
third-world version of Kinder Eggs. Less
third-world was the way these “Suertes” were packaged. I now know that each roll was formed with a recycled
IBM punch card further wrapped in rice paper to prevent the diminutive round
chochitos from falling through the used card’s EBCDIC-encoded perforations[1].

Since the cards were essentially eighty-column data encoders,
I came to this conclusion: Data is fungible; it can even be used to wrap
candies!

While the world’s population has more than doubled since the
punch card days, data storage capacity has grown exponentially over the
same period. In fact, storage capacity is poised to
outstrip the maximum information content humanity is able to generate.
According to a research study published in Science Express[2],
2002 was the year when digital storage capacity exceeded analogue
capacity. By 2007, 94% of all stored data was
digital. While the world is estimated
to have reached 2.75 Zettabytes of total data storage in 2012[3],
we are expected to hit the 40-Zettabyte mark by 2020, which comes to about 5.2
Terabytes of data for every human being alive.
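The per-person figure is simple arithmetic; here is a quick sanity check (the roughly 7.7 billion world population by 2020 is my own assumption, not a figure from the cited study):

```python
# Sanity check: 40 Zettabytes spread across the world's population.
# Assumes ~7.7 billion people by 2020 (an assumption, not a cited figure).

ZETTABYTE = 10**21  # bytes, decimal (SI) units
TERABYTE = 10**12

total_storage = 40 * ZETTABYTE   # projected global data by 2020
population = 7.7e9               # assumed world population

per_person_tb = total_storage / population / TERABYTE
print(f"{per_person_tb:.1f} TB per person")  # → 5.2 TB per person
```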

Not only has digital storage become a dirt-cheap commodity,
but advances in compression and search algorithms have turned storage into a
dynamically accessible asset—a true source of information. The emergence of the
Cloud also allows further storage optimization. (I would be surprised to learn
Amazon is storing a copy of your online books in your cloud space versus simply
maintaining an index pointing to a single master copy of each book in their
catalogue.)

The ability to store huge amounts of data in a digital form
speaks to the phenomenon of “Datification”. True,
most of what we are now placing in digitized form are pictures and videos, and
studies show that less than 1% of all this data has been analyzed. But even
as more than half a billion pictures are
being posted to social media sites every day, new machine learning techniques
to help us analyze this type of graphic content are being developed. There is
no doubt that we are truly in the midst of the Digital Era. Or rather,
the Era of Big Data . . .

Big Data has been defined as having the following
attributes: Volume (obviously!), Velocity (dealing with the need to get data on
demand, even on a streaming basis), Variety (encompassing non-structured
data), and Veracity (making sure the data is trusted). The field of Data
Science is forming around the exploitation of big data, particularly in
ways that take advantage of the emergent properties derived from the four-V
attributes. The emergent result is
that data is now viewed as a product in its own right.

One of the most exciting ways in which the Data Science/Big Data phenomenon
delivers value is in the unexpected data correlations that can appear and be
exploited for surprising business purposes.
You are probably familiar with how Google is able to track flu epidemics
based on search patterns, and how companies are finding ways to market to
various demographics based on ancillary consumption data (Wal-Mart noticed that, prior to a hurricane, sales of
Pop-Tarts increased along with sales of flashlights).

But while all this is fine from a theoretical and anecdotal
perspective, as a CIO, CTO, or IT executive for a small or medium-sized company you
would do well to ask: what does all this hype have to do with my company’s bottom
line?

In my last article I recommended evaluating potential big-data applications
for your business. Even if you do not know precisely how all this big data transformation
will impact you, there are steps you can proactively take now. Just as in the story of the Wizard of Oz,
this is a case where the journey is part of the destination. You should pave
the yellow brick road that will take you there:

Revisit the state of Data Governance in your organization. Obviously
you should maintain the traditional SQL-related roles, but transforming towards
big data requires a fresh look at storage engineering, data integrity, and data
security, as well as training for and recruiting emerging skills such as
data science.

Establish a “Datification” strategy for your business. Have you
ever seen those reality shows about Hoarders? That’s it. You must become a
fanatical data hoarder. This is not the time to dismiss any of the data you
capture as too insignificant or expensive to store. Part of the strategy is the
creation and documentation of a taxonomy of your data, to better organize and
understand potential data interrelations.
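A data taxonomy can start as lightweight as a nested catalog that you flatten into dataset names; every category and attribute below is an invented example, not a prescription:

```python
# A minimal data taxonomy sketch; all categories here are invented examples.
taxonomy = {
    "customer": {
        "profile": ["name", "email", "preferences"],
        "transactions": ["orders", "payments", "refunds"],
    },
    "operations": {
        "inventory": ["stock_levels", "suppliers"],
        "logistics": ["shipments", "returns"],
    },
}

def list_datasets(tree, path=()):
    """Flatten the taxonomy into dotted dataset names for cataloguing."""
    for key, value in tree.items():
        if isinstance(value, dict):
            yield from list_datasets(value, path + (key,))
        else:
            for leaf in value:
                yield ".".join(path + (key, leaf))

for name in list_datasets(taxonomy):
    print(name)  # e.g. customer.profile.email
```

Even a toy catalog like this gives you a shared vocabulary for spotting the data interrelations mentioned above.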

Re-focus on data quality and integrity. Review your data cleansing and
deduplication processes. Adapt them to meet the higher volumes presented by
Datification. The ideal time to ensure
the data you capture is as clean as possible is at the point of data
acquisition. The old adage of Garbage-In/Garbage-Out still applies with big data,
except now the motto is Big Garbage In/Big Garbage Out.
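Cleansing and deduplicating at the point of acquisition can be sketched in a few lines. The field names (email, phone) and the hash-based keying are my illustrative assumptions:

```python
import hashlib

def record_key(record):
    """Build a stable dedup key from normalized identifying fields.
    The field names (email, phone) are illustrative assumptions."""
    email = record.get("email", "").strip().lower()
    phone = "".join(ch for ch in record.get("phone", "") if ch.isdigit())
    return hashlib.sha256(f"{email}|{phone}".encode()).hexdigest()

def deduplicate(records):
    """Keep the first occurrence of each logical record."""
    seen, unique = set(), []
    for rec in records:
        key = record_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

incoming = [
    {"email": "Ann@Example.com", "phone": "(555) 123-4567"},
    {"email": "ann@example.com ", "phone": "555-123-4567"},  # same person, messy
]
print(len(deduplicate(incoming)))  # → 1
```

Normalizing the fields before hashing is what catches the two messy variants above as the same record.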

Normalize the data. Just because the data is in digital form does
not mean you can use it as-is. Big-data
practitioners estimate that about 80% of their work goes into preparing the
data into a form that can be exploited.
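That preparation work typically looks like the normalization sketched below; every field name and format here is an illustrative assumption, not a standard:

```python
from datetime import datetime

def parse_date(text):
    """Try the date formats we assume the sources use; flag the rest."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # flag for manual review rather than guessing

def normalize(raw):
    """Normalize one raw record into a consistent shape.
    Field names and formats are illustrative assumptions."""
    return {
        "name": raw.get("name", "").strip().title(),
        "date": parse_date(raw.get("date", "")),
        "amount": float(str(raw.get("amount", "0")).replace("$", "").replace(",", "")),
    }

print(normalize({"name": "  jane doe ", "date": "04/29/2014", "amount": "$1,250.00"}))
# → {'name': 'Jane Doe', 'date': '2014-04-29', 'amount': 1250.0}
```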

Review and adapt the data security strategy. Design your data
security strategy from the get-go. I
recommend you visit two of my previous blogs discussing the subject of security: “The
Systems Management Stack”, and “Security
& Continuance”. Bottom line,
your security strategy should be part of the core data strategy.

Move to the Cloud, even if the cloud is internal. Too much time is
being spent deciding whether or not to “Move to the Cloud”. Most businesses I have come across are wary of
placing strategic data assets in a public cloud. You should separate the debate over whether
to move to a public cloud from the need to ensure the data can
be in a “cloud” form. You cannot have Datification without Cloudification. This
means that you should be virtualizing access to and storage of your data to
the nth degree. You should decouple
all access to the data from its physical location via appropriate service-level
interfaces. The decision whether to use a
local private cloud, a network private cloud, a public cloud, or any other
variation (Platform as a Service, Infrastructure as a Service, etc.) is the topic for another blog article. Be
aware that if you try to create and manage your own cloud you will need to secure
the appropriate internal engineering resources. This is not an inexpensive
proposition. Also, you and your cloud consultant will need to define a Storage
Area Network strategy that allows placement of heterogeneous data with large,
scalable capacity. Following this route will also require you to define NoSQL
data replication, data sharding, and backup strategies. The time to start this
process is now.
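At its core, a sharding strategy is a deterministic key-to-shard mapping. A minimal hash-based sketch follows; the shard count and key scheme are my assumptions, and production systems often prefer consistent hashing to ease rebalancing:

```python
import hashlib

NUM_SHARDS = 8  # assumed; in practice sized with your cloud consultant

def shard_for(key: str) -> int:
    """Map a record key deterministically to a shard.
    Note: simple modulo hashing forces a large reshuffle when NUM_SHARDS
    changes; consistent hashing avoids that, at the cost of complexity."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# The same key always routes to the same shard, so reads find their writes.
assert shard_for("customer:1001") == shard_for("customer:1001")
print(shard_for("customer:1001"), shard_for("customer:1002"))
```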

Conduct a census of useful externally available data. A key premise
of big data is the view of data as a product in its own right. Not only are you
positioning your company’s data as a capitalizable asset that could potentially
be made available to others as a revenue-generating option, but you will also
be in a position to access and exploit data assets made available by others. At a
minimum, you should conduct a census of potential data set sources openly
available from public entities and governments and define a strategy of how you
can better exploit these assets.

Obviously you will have to face the task of justifying the
needed investment to your CEO and financial controllers. Projects related to data virtualization
intrinsically improve availability, and other projects dealing with security
(PCI or otherwise) should all be justifiable purely on a best-practice, business-continuance
basis. You will need to tap
into traditional operational budgets to better fund them. Also, this is one of those cases where you
will need to find obvious functional features that you can jointly sponsor with
your business partners (these are the proverbial “low-hanging” fruits). If there is not enough money (when is there?),
you don’t have to do everything at once. You can begin with the data elements your
taxonomy has identified as most essential.

Furthermore, there is an increasing realization that big data
can actually be accounted for as a company asset. After all, the company valuations
of Facebook and Twitter are primarily based on the strength of their data sets.
For example, it is currently estimated that each member is worth about $100 to
Facebook. Customer acquisition costs in the social media space
are usually estimated to be in the range of $5 to $15, so properly structured,
consumable data sets can be used as part of your financial justification.

That’s it. This endeavor should keep you busy for a while. At the end of the road you will have proven
you had courage and a heart all along; plus you’ll get a Big Data diploma too!

[1] Of
course as a child, I did not know the punch cards were being repurposed to hold
the candy and so I always wondered why someone would “design” perforated cards
to hold the chochitos!

[3]
Optimally compressed. One Zettabyte equals one thousand Exabytes. One Exabyte
equals one billion Gigabytes or one million Terabytes. The actual digitized
speech of all words ever spoken by human beings could be stored in 42
Zettabytes (16 kHz, 16-bit audio). What follows after Zettabytes, in case you
are wondering, is: Yottabyte, Xenottabyte, Shilentnobyte, and Domegemegrottebyte,
which in addition to having 18 Scrabble-busting letters in its name, represents
10^33 bytes.

IT Transformation with SOA

About Me

Israel has been recognized by Computerworld as one of their Premier 100 IT honorees. Israel is a business and technology leader who has contributed the technology vision as key strategist and designer behind the enterprise technology roadmaps of large hospitality and travel companies. Israel has also developed and deployed various mission-critical systems and, in the process, has been instrumental in creating and building effective and skilled development organizations.