The pace in the Dojo is designed to result in continually creating working code. Every morning the team holds itself accountable for its coding progress during stand-up. It's no fun to give a status of "I made no progress yesterday". Checking corporate email and glancing at your phone get in the way of this progress. Any time spent in reflection and discussion on what was learned is documented and attached to the Pivotal Tracker backlog.

My personal blog is a scratchpad where I reflect on what I have learned and share it in the hope that it helps others apply that knowledge. As a result of the pairing work style, I went a whole month without updating the blog. The main reason is that I didn't have the time I usually spend putting together my thoughts; it was all dedicated to producing code.

Cutting My Typing in Half Was More Productive

During the pairing process, I yielded control of the keyboard while my pairing partner typed. In theory, 50% or more of my time was spent watching my partner's keystrokes. It's a bit of a paradox that typing less results in more productivity. During the month of January, my pairing partners and I downloaded more software and coded more prototypes than I had done by myself in the previous quarter.

Rotation Is Cool

After my first few days in the Dojo I arrived at work one day and was moved to a completely different project that I knew nothing about. I had been in the middle of debugging some problems we had been seeing in open source software that we had downloaded the day before. All of a sudden it became someone else's problem, and I was able to wipe my mind clean and focus on learning something new. This rotation approach resulted in my teammates and me developing fluency in every project the Dojo was working on. Having so many minds applied to each problem produced a diversity of coding viewpoints that is not possible when working alone.

December 12, 2016

This week I will be attending Data West, the San Diego Supercomputer Center's first-ever winter forum on data. The conference is being organized in large part by Dr. Jim Short. December marks the beginning of the third year of working with Dr. Short on the specific topic of the value of data. If I were to write a summary statement for each year, it would be as follows.

Year 1 - Data Valuation Problems

Dr. Short discovers, through broad industry surveys and targeted executive interviews, extraordinarily large data valuation use cases (> 1 billion dollars) that suffer from a lack of formalized data valuation business processes and underlying IT support. These significant use cases include data valuation during bankruptcy, M&A, cyber-insurance, monetization, and data sale. As the research progressed, a significant set of problem statements began to emerge, including...

...how much to pay for a data set

...how to identify data sets that can be monetized

...creating new data products and services

...understanding ROI on investment in the creation of analytic models

...how to correlate the value of data with data management (e.g. protection levels based on value)

During that first year, I found the following statement from Dr. Short to be most useful:

Surveys conducted and our own interviews / surveys show that currently in over two thirds of companies surveyed there is no systematic method for accurately measuring data value over time. And our research is showing that “architecting for value” is a critical future requirement in IT and business strategy planning and investment (to successfully compete and meet strategy and performance goals).

A common theme emerged in year 1: companies must begin to augment product and service revenues with data products and services but don't know how or where to start. Year 2 would attempt to fix that problem.

To describe all 5 steps in action, we stepped through each insight as part of an industry use case that highlights the overall research results.

Year 3 - Data Valuation Implementations - and Further Research!

Within my own company we have already implemented many of these recommendations. In fact, Dell EMC produced one of its first data services during Year 2: the MyService360 product. In addition, Dell EMC shipped our first product (the Analytic Insights Module) that contains a catalog supporting data valuation.

While the research has already impacted the industry, it is still possible to continue in a variety of different research directions, as highlighted by the Data West agenda.

If you are in the San Diego area this week, there is still time to register. If you have specific questions or areas of research that you would like to explore, please comment or reach out to me on Twitter via the link below.

In this diagram we see Step 1: assign value to a business decision. At this point the business does not know which data sets could potentially contribute to the decision, nor does it have any idea of the potential economic value of the data sets and analytic models within the organization. This lack of understanding about the economic value of its data and models is why the business struggles to leverage data as a capital asset.

Step 1 concludes that the business decision being made could, for example, result in $45 million in additional revenue for the company. We've highlighted in Step 1 that the CIO needs to be involved in this conversation and begin to update the IT infrastructure, with an initial focus on ingesting data into a data lake before tracking business usage. The diagram below highlights:

the identification of assets (e.g. sensor data) that are relevant to the business decision.

In Step 3 we conduct the data science activity that will ultimately lead to the desired business outcome. The three data sets identified below become part of an analytic workspace, models are run, and the resulting data sets are published back into the lake along with the analytic models that created them, creating a lineage graph that accompanies the catalog.
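To make Step 3 a bit more concrete, here is a minimal sketch of how publishing derived data sets and models could record lineage alongside a catalog. All class, function, and asset names are hypothetical illustrations, not the actual AIM or DAC interfaces.

```python
# Hypothetical sketch: publishing a derived data set and its model into a
# catalog while recording lineage edges back to the source data sets.

class Catalog:
    def __init__(self):
        self.entries = {}   # asset name -> metadata dict
        self.lineage = []   # (source_asset, derived_asset, model) edges

    def register(self, name, **metadata):
        self.entries[name] = metadata

    def publish_derived(self, derived, model, sources, **metadata):
        """Publish a new data set and the model that produced it,
        linking both back to the source data sets."""
        self.register(derived, produced_by=model, **metadata)
        self.register(model, kind="analytic_model")
        for source in sources:
            self.lineage.append((source, derived, model))

catalog = Catalog()
for raw in ["sensor_data", "service_logs", "install_base"]:
    catalog.register(raw, kind="raw", ingested=True)

catalog.publish_derived(
    derived="churn_risk_scores",
    model="churn_model_v1",
    sources=["sensor_data", "service_logs", "install_base"],
    kind="derived",
)
```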

The fourth and final step is to annotate the data sets and models with statements of value. Given that the CIO and/or a representative have been involved in the dialogue from the beginning, they understand which algorithms should be used to assign value. For example, the simplest algorithm would assign value as follows:

Divide the actual/potential business value ($45m) by the number of data sets

Divide the actual/potential business value ($45m) by the number of analytic models

Using these algorithms results in the following value assignment.
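As a concrete illustration, a minimal sketch of that simplest algorithm might look like the following; the asset names are hypothetical, and the $45M figure is the example used above.

```python
# Spread the business value of the decision evenly across the contributing
# data sets and, separately, across the analytic models.

business_value = 45_000_000          # value attributed to the business decision
data_sets = ["sensor_data", "service_logs", "install_base"]
models = ["churn_model_v1"]

value_per_data_set = business_value / len(data_sets)   # $15M each
value_per_model = business_value / len(models)          # $45M

annotations = {name: value_per_data_set for name in data_sets}
annotations.update({name: value_per_model for name in models})
print(annotations)
```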

Using this approach, the infrastructure enables statements of value to be associated with data sets and analytic models. The business can continue to iterate on this model by applying data sets and models to different business decisions and continuing to annotate. Over time, as more and more data sets and models are annotated with value, the business develops the following capabilities:

They have a much better sense of how much to pay when purchasing a data set

They begin to understand the high-value data sets and models that could potentially be monetized

They can focus on data quality initiatives for high-value data sets that could be offered as a service or product

They can assess their investment in data science teams and processes and determine the effective ROI

This concludes the series of blog posts advising the CIO community on how to implement a data valuation framework.

For those CIOs or IT architects who are interested in continuing the discussion in a face-to-face setting, I recommend registering for and attending the upcoming Data West winter forum in San Diego on December 13-14, 2016.

October 28, 2016

This post describes the final insight in a series of five data valuation insights shared at Evanta's Global CIO Executive Summit at the Skytop Lodge in Pennsylvania last week. The insights have their roots in a research project that EMC launched with the San Diego Supercomputer Center nearly two years ago. Over the course of the past two years Dr. Jim Short and I, along with various research partners, have spent a good deal of time interviewing CxOs, surveying the industry, and interacting with other academic partners. The insights generated during these two years were shared during an Evanta keynote entitled Data's Economic Value in the Age of Digital Business. These blog posts represent our intention to socialize the insights more broadly. Before diving into the fifth and final post, I'd like to review the first four insights.

First Key Insight: CIOs Need to Insert Themselves Into the Data Value Conversation

Second Key Insight: Data Workflow and Ingest are the IT Touch Points for Measuring Data's Value

As part of the research into data value, Dell EMC's product teams had some brainstorming sessions to enumerate possible locations to run valuation algorithms. We determined that there were 5 candidates worthy of discussion (listed in the picture below), with 2 of them qualifying as the most likely to succeed (content ingest and content workflow).

This third insight represents conclusions reached based on our own implementation of a framework that treats data and analytic models as capital assets. Our Chief Data Governance Officer partnered with the IT organization to build a data catalog that combined business and technical metadata. As the teams focused on the full lifecycle of data and analytic models, they used a metadata enrichment approach to increase the overall value of the data. This focus on combining business and technical metadata allowed the company to release its first data service: MyService360.

Once a framework has been put in place to combine business metadata with technical metadata, all of the ingredients are in place to run algorithms that calculate value based on the business context. Once the calculated value (CV) has been established, whether it be for data, analytic models, or both, a mechanism must be in place to permanently record the CV against the asset. This requires that all assets exist in a catalog, and that each catalog entry can be annotated with the CV. The diagram below highlights a product that has these capabilities (the Analytic Insights Module or AIM).

Fifth Key Insight: Build Valuation Business Processes on Top of IT Valuation Services

The fifth and final step is for the IT organization to expose the annotation capability to the new corporate business processes that have been created for the purpose of treating data and models as capital assets with value. These new business processes can be thought of as putting into practice the algorithms that the CIO helped to create in Step 1. The annotation can be surfaced in two ways:

Manual APIs that associate value via a function call.

Automated approaches that are implemented as a running process that sits alongside the catalog and calculates value.

The diagram below depicts these two methods as they could be integrated with an Analytic Insights Module.
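For illustration only, the sketch below contrasts the two styles in Python; the function names and catalog shape are assumptions, not the AIM interface.

```python
# Two ways to surface value annotation: a manual API that a business process
# calls directly, and an automated process that sits alongside the catalog.

import time

def annotate_value(catalog, asset_id, calculated_value):
    """Manual style: a business process calls this function directly."""
    catalog.setdefault(asset_id, {})["CV"] = calculated_value

def valuation_daemon(catalog, valuation_fn, interval_seconds=3600):
    """Automated style: periodically recompute value for every asset.
    Simplified sketch; a real process would handle shutdown and errors."""
    while True:
        for asset_id, entry in catalog.items():
            entry["CV"] = valuation_fn(entry)
        time.sleep(interval_seconds)

catalog = {"sensor_data": {"kind": "raw"}}
annotate_value(catalog, "sensor_data", 15_000_000)
print(catalog)
```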

This fifth insight completes a vision of an end-to-end data valuation system.

In an upcoming post I will step through a real-world use case that highlights this system in action.

In this post I'd like to discuss the fourth insight regarding annotation.

Fourth Key Insight: Annotate Data & Models with Valuation Metadata

The message here is straightforward:

after the CIO has inserted him/herself into a corporate discussion on data value, and .....

after the CIO has decided on the valuation algorithms that are appropriate for the business, and ....

after the CIO has created the touch points for combining business and technical metadata, then.....

.... it is time to provide the capability to annotate the value of data and models based on their relevance to the business.

The ability to annotate data based on its value sounds simple enough: call an API that associates a specific piece of metadata (e.g. BVI = 8, EVI = $100,000) with a data set or an analytic model.

However, the tricky part is to have this annotation function as part of an overall system that tracks data as a capital asset from cradle (e.g. ingest) to grave (e.g. deletion).
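As a rough sketch of what such an annotation call could look like when the catalog also tracks each asset's lifecycle events from ingest to deletion, consider the following; the BVI/EVI keys and the event-log shape are assumptions for illustration, not a documented interface.

```python
# Attach valuation metadata to a cataloged asset and record the annotation
# as one more event in the asset's cradle-to-grave history.

from datetime import datetime, timezone

catalog = {
    "sensor_data": {
        "events": [{"event": "ingested", "at": "2016-09-01T00:00:00Z"}],
        "metadata": {"schema": "sensor_v2", "owner": "field_ops"},
    }
}

def annotate(catalog, asset_id, **valuation):
    """Associate valuation metadata (e.g. BVI=8, EVI=100_000) with an asset."""
    entry = catalog[asset_id]
    entry["metadata"].update(valuation)
    entry["events"].append({
        "event": "valued",
        "at": datetime.now(timezone.utc).isoformat(),
        "values": valuation,
    })

annotate(catalog, "sensor_data", BVI=8, EVI=100_000)
```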

For this reason Dell EMC recently introduced a new module that can be embedded within our systems: the Analytic Insights Module (AIM). The diagram below highlights the creation of a catalog that can be used for valuation purposes.

At the heart of this diagram is the Data and Analytic Catalog (DAC). This component can serve as a catalog for every data set and every analytic model that is available for use within the system. Note that Attivio is used to scan an existing data lake and create the initial catalog by filling it with technical metadata.

I mentioned in CIO Insight #2 that ingest and data usage are the key IT touch points for valuation. The diagram above highlights that additional business/technical metadata can be created during ingest (by using technology from Zaloni). In addition, after data scientists perform their work in their own business context, they will "publish" their work results (analytic models and data sets) back into the data lake. This results in more business/technical metadata being added to the DAC.

The ability to associate metadata with data assets and models is the fourth piece of the puzzle for a full, system-wide implementation of data valuation. The remaining question pertains to when the actual association of value gets stored into the DAC.

I will outline the answer to this question in Insight 5 of 5 on implementing a system for data valuation.

"Data scientists are often exploring data in the context of a business problem. This offers the opportunity to join business context together with data. The joining together of business and technical metadata maps nicely to CIO Insight #1: joining the discussion about data's value to the business."

Our research into data's value caused us to look internally at our own corporate data science initiatives to understand how our internal analytics projects were being tied to business value, and how that value was then associated back to the data sets and models. We discovered the work being driven by EMC's Chief Data Governance Officer, Barbara Latulippe. Barbara had been working with EMC's IT organization to build a new style of data governance framework that supported the valuation of data. This diagram is superimposed on an image of Barbara presenting her work during the Chief Data Officer and Information Quality Symposium at MIT this past July.

We discovered that an architecture had been created that accomplished the following:

Attivio has the ability to look down onto a repository (e.g. a Data Lake) and discover the technical metadata (e.g. schemas) that are available in the lake.

Collibra has the ability to track the business context in which the data is being used, e.g. as it is being ingested into the data lake, or as it is being used in data science activities.

The Spring layer has the ability to glue together the technical and business metadata into one "catalog".
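A highly simplified sketch of that "glue" idea is shown below: technical metadata discovered by scanning the lake is merged with business metadata captured during ingest and data science work into one catalog entry. The field names are illustrative and do not reflect the actual Attivio, Collibra, or Spring schemas.

```python
# Merge technical and business metadata for each asset into a single
# catalog entry; hypothetical field names for illustration only.

technical = {
    "service_logs": {"schema": ["ticket_id", "product", "resolution_time"],
                     "format": "parquet", "rows": 12_500_000},
}

business = {
    "service_logs": {"lines_of_business": ["Global Services"],
                     "used_by_projects": ["MyService360"],
                     "steward": "data_governance"},
}

def build_catalog(technical, business):
    catalog = {}
    for asset in set(technical) | set(business):
        catalog[asset] = {**technical.get(asset, {}), **business.get(asset, {})}
    return catalog

print(build_catalog(technical, business))
```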

The joining together of business and technical metadata allowed employees to "pipeline" data from raw ingest, to data innovation and consumption, to business insight, and all the way to the creation of one of EMC's first data services: MyService360. The diagram below highlights that as the governance (g) of metadata enrichment increased (the X-axis), the value (v) of the data increased as well (the Y-axis).

The ability of this framework to deliver a new data service was a turning point for our data value research. We began the research by hearing from our customers that they were struggling to create new data products and services. Internally we discovered that a team had successfully traveled down the path of data service creation. The key learning was the creation of a framework that combined business and technical metadata together.

The next phase of our research was to determine whether or not this key learning could be packaged up and delivered to the market at large.

The answer is yes, and in Insight 4 of 5 I will highlight new capabilities that the CIO community can leverage.

October 25, 2016

This week I began a set of summary blog posts from Evanta's Global CIO Executive Summit at the Skytop Lodge in Pennsylvania. During one of the sessions I presented a keynote focused on Data's Economic Value in the Age of Digital Business. In focusing on data value I followed the three themes of the summit: Innovation (summarizing several years of innovation in the area of data value), Execution (status update on some of the internal execution on those ideas), and Results (the industry results for calculating the value of data). I presented five key insights about data value, the first of which was recommending that CIOs insert themselves into the conversation about data value. In today's post I'd like to describe the second key insight: considering the IT touch points for data valuation.

Second Key Insight: Data Workflow and Ingest are the IT Touch Points for Measuring Data's Value

This second insight identifies exactly where within an IT architecture the data valuation algorithms should be located. Before we highlight these areas, let's review the algorithms themselves.

In my last post I introduced the exploration of the problem space: traditional product companies were struggling to increase the percentage of revenue coming from data products and services (as opposed to their traditional products and services). Dr. Jim Short conducted an extensive survey in this area and identified a sample set of problems being encountered in the area of data valuation. Corporations were struggling to...

accurately price data assets for sale/purchase

identify which data assets within an organization are the "most monetizable"

turn those assets into new products and services

understand which specific analytic models within an organization have resulted in business results

In spite of these and many other problems, the good news for CIOs is that industry luminaries such as Bill Schmarzo have made significant progress outlining a framework for having the conversation about data's value. Below is an image of Bill sharing his insights with Wikibon's Peter Burris, alongside two articles (Bill's approaches for calculating data value and analytic value) that he contributed to CIO.com in order to publicly document his approach (the full video of Bill's interview with Peter can be found here).

Bill's recommendations for calculating value could certainly be automated once the CIO has gained a full understanding of how data maps to specific business decisions. This would allow, for example, a valuation algorithm to be codified in a similar fashion to the algorithms described by Doug Laney as part of his Infonomics research.

This equation calculates the Business Value of Information by multiplying together several variables that represent data characteristics. Chief among them is the data's relevance: a sum over the lines of business the data set is mapped to, weighted by how relevant (on a scale of 0 to 1) the data set is to each particular line of business.
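To show the general shape described above (and only that; this is a sketch of the idea, not Laney's exact published formula), a small calculation might produce a relative score like this:

```python
# Relevance is summed across the lines of business a data set is mapped to,
# then multiplied by other 0-1 data characteristics (e.g. validity).

def business_value_of_information(relevance_by_lob, characteristics):
    """relevance_by_lob: {line_of_business: relevance between 0 and 1}
    characteristics: other 0-1 factors (e.g. validity, completeness)."""
    value = sum(relevance_by_lob.values())
    for factor in characteristics.values():
        value *= factor
    return value

bvi = business_value_of_information(
    relevance_by_lob={"sales": 0.9, "support": 0.6, "marketing": 0.2},
    characteristics={"validity": 0.95, "completeness": 0.8},
)
print(bvi)   # a relative score, useful for comparing data sets
```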

So when considering where these algorithms should run, CIOs have a variety of choices (as depicted below).

If we use the example of calculating business relevance, we see five potential options:

Calculate the value of content at rest (e.g. after it has been stored in a data lake). Earlier this year I highlighted potential methods of implementing this approach, which parses content and maps it against relevant lines of business. Two of the shortcomings of this approach are (a) too much data to parse, and (b) many CIOs do not want to disrupt the production system.

A second option that may be more attractive is to perform this valuation within the context of the data protection ecosystem. This not only allows valuation algorithms to execute outside the context of a production system, but these algorithms also have a rich set of protection metadata that can inform them (e.g. application metadata, user metadata, backup schedules, etc.). This approach has the shortcoming, however, that the valuation algorithms may be evaluating older (potentially stale) content.

A third option is to perform valuation upon ingest. This option is often preferred because options #1 and #2 can be more difficult given the vast amount of legacy data that would need to be valued. CIOs can use frameworks such as Apache Storm for real-time and in-memory valuation.

A fourth option is to perform valuation via a tight integration with application deployment frameworks. In particular, if a devops team can correlate the frequency of continuous delivery to data sets, these data sets become more valuable as newer versions of applications generate and store new forms and types of data.

The final option is the lowest-hanging fruit: track the usage of data in the context of data scientists who are performing analytic workflows (which are ultimately intended for making business decisions that produce value).

After considering all of these options it became clear that option #5 was the lowest-hanging fruit. Data scientists are often exploring data in the context of a business problem. This offers the opportunity to join business context together with data. The joining together of business and technical metadata maps nicely to CIO Insight #1: joining the discussion about data's value to the business.
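A hedged sketch of what option #5 implies in practice: record which data sets a data scientist touches while working on a named business problem, so that business context accumulates against each data set. The workspace API below is hypothetical.

```python
# Capture business context alongside data set usage during analytic workflows.

from collections import defaultdict

usage_by_data_set = defaultdict(list)

class Workspace:
    def __init__(self, business_problem):
        self.business_problem = business_problem

    def load(self, data_set):
        # A real system would read from the lake; here we only capture the
        # business context associated with each data set that gets used.
        usage_by_data_set[data_set].append(self.business_problem)
        return f"<contents of {data_set}>"

ws = Workspace("reduce churn in service contracts")
ws.load("service_logs")
ws.load("install_base")

print(dict(usage_by_data_set))
```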

Our research also concluded that option #3 (valuation upon ingest) is highly desirable, and we therefore recommend that the CIO advise the infrastructure team to explore IT touch points for ingest as well as monitoring workflow for value.

The overall recommendation, therefore, is to focus on content workflow first. The implementation of workflow valuation frameworks is something that we have already done internally. Exactly how we perform this valuation is the 3rd CIO Insight, and I will focus on the specifics in Insight 3 of 5.

October 24, 2016

Last week I spoke at Evanta's Global CIO Executive Summit at the Skytop Lodge in Pennsylvania. My keynote focused on Data's Economic Value in the Age of Digital Business. One of the themes of the conference was "Innovate - Execute - Results". During the session we discussed (a) several years of innovation in the area of data value, (b) a status update on some of the internal execution on those ideas, and (c) the industry results for calculating the value of data. In this post I will begin to socialize the five key insights that were shared during the session.

First Key Insight: CIOs Need to Insert Themselves Into the Data Value Conversation

Consider the following statement from a joint 2014 Big Data Report conducted with Capgemini:

"Among our respondents, 63% consider that the monetization of data could eventually become as valuable to their organizations as their existing products and services".

Effective data monetization requires data valuation. As the industry struggled with data valuation in 2014, EMC commissioned San Diego Supercomputer Center's Jim Short to survey the state of the industry. Jim documented the use cases and came to the following conclusions:

Jim's research highlighted a number of data valuation struggles that are occurring in the industry. There is a lack of capability to....

accurately price data assets for sale/purchase

identify which data assets within an organization are the "most monetizable"

turn those assets into new products and services

understand which specific analytic models within an organization have resulted in business results

While there are no established business processes in place for data valuation, we did find that conversations on data value have started. These conversations, however, are typically not being held with the CIO. It is imperative that the CIO inserts him/herself into these conversations because IT support will be required for effective valuation.

There are numerous examples of industry experts driving valuation conversations.

Bill Schmarzo has laid out a framework for two different valuation discussions that he is having with Chief Data Officers. In each post, Bill details a step-by-step approach.

Similarly, Doug Laney has been developing his Infonomics approach to data valuation and has generated a set of equations that CIOs can use as a baseline method for conducting a discussion on data value. During my presentation I listed one such equation on the slide below.

CIOs can take the research by Jim, combine it with learnings from industry experts like Bill and Doug, and use it to drive initial conversations about the value of data. Building a fluency in the discussion about data's value is the first and most critical step. Each organization will have different algorithms for calculating data's value. Once the CIO drives agreement on which algorithms are most important within their organization, they can move on to tackle the next most important insight.

Where should these algorithms run?

There are two key touch points within an IT architecture for implementing valuation. I will share these touch points in Insight 2 of 5.

October 17, 2016

Earlier this month David Goulden wrote about pushing information technology beyond yesterday's function. One of the key points that he made was that traditional IT systems function as a system of record. Much of the record keeping in a system of record is a function of company size (how many employees work for the company), product portfolio (how many products are being sold), and sales engagements (how many customers are involved in potential or actual sales orders). Therefore the amount of data managed in a system of record grows fairly predictably, and business intelligence algorithms explore this data with relative ease.

The article proceeds to point out that those corporations that have built a system of record are being continually and increasingly disrupted by new companies that have built a system of engagement. A system of engagement interacts with external endpoints that generate orders of magnitude more data. This type of system is characterized, for example, by the following:

100% reliance on cloud-ready infrastructure

Continuous application delivery onto that infrastructure

Data privacy built into the solution

Evolutionary and agile delivery of security

Data storage that scales with the number of external people and things

Natural and seamless user experience and interaction with people and things

David's main point is that the transition from a system of record to a system of engagement is a spectrum; some companies are further down the path to a system of engagement than others. The survey conducted by Dell Technologies is summarized below.

When surveyed companies were asked to name their top priority IT investments over the next few years, the following answers surfaced:

Converged Infrastructure

Ultra-high performance compute

Analytics, big data, and data processing

Internet of Things technologies

Next generation mobile applications

The area that has interested me most over the last few years has been analytics, big data, and data processing (#3 on the list above), particularly the critical importance of understanding the value of data. In my opinion, building a successful system of engagement means that corporate data assets are actually measured, managed, and monitored as capital assets. Later this week I will be attending the Evanta Global CIO Executive Summit. During my keynote (Data's Economic Value in the Age of Digital Business) I will summarize several years' worth of data value research and recommend a best-practice approach for building a system of engagement that treats data as a capital asset.

October 11, 2016

Over the last few weeks I've been laying out a technical vision for Dell Technologies. The vision is intended to make clear an industry direction for the technology divisions that make up Dell Technologies: Dell Client, Dell EMC, SecureWorks, VMware, and Pivotal.

This visualization indicates that both Dell and EMC (pre-merger) contribute to each layer. For example, in a previous post I used the picture below to show both companies' contributions to Endpoint Technologies.

In this post I'd like to focus on the specific endpoint technology contributions made by Dell. I worked closely with Dell technologists Gaurav Chawla and Lee Zaretsky to understand as much as possible about the existing breadth of the Dell Client portfolio and potential future directions. In particular I asked Lee to propose a view into the Dell Client portfolio in the context of the left-to-right color scheme being used to describe the vision (blue is second platform optimization and green is third platform innovation). The product groupings shown above indicate the following:

These types of products have long represented the sweet spot of Dell's innovation efforts and will be a continued focus for Dell Technologies going forward. While some of these products are targeted towards the consumer market (e.g. XPS, Alienware), in this post I will emphasize the commercial market, where a high degree of manageability, security, and robustness is required. Each device can be deployed in a variety of business verticals: industrial, automotive, health, and oil and gas (to name a few). These endpoint technologies can be deployed in any use case where visualizing and generating data (be it graphical or textual) is required. While it is true that these client devices "grew up" in the client/server era (the blue shading in the diagram), they will continue to drive emerging use cases (green shading) for newer markets such as augmented reality and IoT. The data that they generate and display, in conjunction with edge computing and upstream data centers, makes these endpoints a critical part of the go-forward vision of a Digital Business Platform that spans all the way from the downstream endpoints (IoT&P) to the upstream cloud infrastructure.

In summary, these devices allow any commercial business to interact seamlessly with their employees (known as the "P" - people - in IoT&P).

I/O expansion products are wired or wireless docks that support a distributed computing model in a variety of differing environments.

The gateway solutions are not just interfaces that pass data upstream to cloud infrastructure for processing. The gateways can also implement rules based on some level of rudimentary machine intelligence, and they can make very low-latency local decisions at the gateway and (downstream) between controls and sensors. Certain data will be sent to upstream fog computing or cloud computing infrastructure for big data analytics, decision making, and policy definition/enforcement. Consider the following use cases where both gateways and I/O expansion products would be relevant (a simple sketch of the local-versus-upstream split follows the list):

Building Security

Environmental control systems

Camera/sensor systems and video surveillance

Predictive maintenance in industrial environments

Etc.
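As promised above, here is a minimal sketch (an assumption for illustration, not a Dell gateway API) of that local-versus-upstream split: a local rule acts on a sensor reading immediately, while only selected data is forwarded upstream for heavier analytics.

```python
# A gateway rule: act locally with low latency, forward selectively upstream.

def handle_reading(reading, actuator, upstream):
    # Local, low-latency decision made at the gateway itself.
    if reading["temperature_c"] > 80:
        actuator("open_cooling_valve")
    # Forward only anomalous readings upstream for big data analytics.
    if reading["temperature_c"] > 75:
        upstream(reading)

handle_reading(
    {"sensor": "press_12", "temperature_c": 83},
    actuator=lambda cmd: print("actuator:", cmd),
    upstream=lambda data: print("sent upstream:", data),
)
```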

The products listed above give full, bi-directional access to the Internet of People and Things. As I described in a previous post, a strong interplay between downstream (client/endpoint technologies) and upstream (enterprise fog and cloud computing infrastructure) interactions can be combined (along with the cloud services layer in between) to form a next-generation Digital Business Platform that can programmatically (and automatically) influence business outcomes by interacting with both people and things.

In order for Dell Technologies' endpoint products to run in this type of environment, each device must go through a rigorous design process. This process will result in devices that are best-in-class for qualities such as display capability, power characteristics, and Wi-Fi functionality. Additionally, these devices will satisfy a set of anchor usages while also enabling Dell Technologies to pursue new markets. I will address many of these issues in an additional drill-down post.


Disclaimer

The opinions expressed here are my personal opinions. Content published here is not read or approved in advance by Dell Technologies and does not necessarily reflect the views and opinions of Dell Technologies, nor does it constitute any official communication of Dell Technologies.