One size does not fit all
If you are a data architect or a data modeler, the first order of business when you design a system or model a business process is to understand the data flow. That exercise leads naturally to understanding data volume. Where such models have already been built, data architects and modelers should also periodically check the ‘current applicability’ of their model(s), because data tends to grow and change constantly.

Is this important, and if so, why? Yes, and here is why.

A data model is not set in stone
The assumption that once a business process has been modeled, the model will stand the test of time as the business develops is wrong. Even when business expansion and process changes have not touched the model, and nothing has changed except volume, data volume alone can dictate revisiting the model, and in some cases it will dictate changes as well. A fully functional data model can bring business processes and systems to a grinding halt if it does not account for changes in data volume and ignores volumetrics.
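To make the volumetrics point concrete, here is a minimal sketch of the kind of back-of-the-envelope projection a modeler might run when revisiting a model. It assumes simple compound monthly growth, which is an illustrative simplification; the function names, the growth rate and the capacity figure are all hypothetical and not taken from any particular system.

```python
def projected_rows(current_rows: int, monthly_growth_rate: float, months: int) -> int:
    """Project a table's row count, assuming compound monthly growth."""
    return round(current_rows * (1 + monthly_growth_rate) ** months)


def months_until_capacity(current_rows: int, monthly_growth_rate: float,
                          capacity_rows: int) -> int:
    """Count whole months until the projected row count reaches capacity_rows.

    capacity_rows is a hypothetical threshold: the volume the current
    model and hardware were designed to handle comfortably.
    """
    if monthly_growth_rate <= 0:
        raise ValueError("monthly_growth_rate must be positive")
    months = 0
    rows = float(current_rows)
    while rows < capacity_rows:
        rows *= 1 + monthly_growth_rate
        months += 1
    return months


if __name__ == "__main__":
    # Illustrative numbers only: 50 million rows today, growing 8% a month.
    print(projected_rows(50_000_000, 0.08, 24))
    print(months_until_capacity(50_000_000, 0.08, 1_000_000_000))
```

Even a rough estimate like this tells you when the model is due for a review, well before the system starts to struggle.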

Current state of data
Companies big and small rely more and more on collecting data about every aspect of their business: more about their customers, more about their buying patterns. The E-commerce and Mobile revolutions have made it possible for businesses to get a microscopic view of a transaction along with the profile of the buying or interested customer. Mixed with the influence of Social Networks, it is a data deluge.

More than ever, businesses are collecting information of all kinds – tags and identifiers, signals and readings collected from machine parts, location coordinates from mobile devices, transactional data, customer demographics, selling medium, buying patterns, geographical trends, effective promotions, cross-sell influences etc. That brings in a lot of data. It also puts a lot of stress on poorly designed systems and data models.

Advancements in Data Modeling
A data model reflects the business and its processes, and is constructed to efficiently manage the data that is collected. Efficiency is measured by whether the business is able to get actionable intelligence out of it. Data architects are exploring, innovating and introducing new ways to model information systems. As the business grows, the data model evolves so that it can absorb and manage the change and still serve the business analysts and data researchers of the organization. No longer is a modeler confined to a single model or structure, ER or otherwise. Advancements in DBMS (Database Management Systems) also make it possible to harness the power of the underlying hardware, whether client-servers built out of commodity hardware parts or appliances built for a specific purpose.

Solutions after modeling
When data volume grows into terabytes and petabytes, modeling demands newer approaches and solutions. The legacy models solved, and still solve, real problems, but those problem statements were different. Expectations from such systems were different and mostly limited (in size), and so expansion using such solutions is limited too. Newer expectations are a different problem to solve, and hence they demand newer solutions.

And this is the case for Big Data solutions to deal with volume (one of the three V’s). We will see velocity and variety in detail and revisit volume to explore how Big Data solutions deal with them all.

I get to interview candidates regularly, for my team or group or for others. Broadly, the openings fall into two categories – Technical or Business. The experience level of these candidates typically ranges from mid-level to senior.

Preamble

The starting point is to clearly understand the job for which we are seeking candidates. No two jobs are the same, so I don’t take the interviews for these openings lightly. A good understanding of what we are looking for (and what we are not looking for) provides clarity on how the interview should be structured.

Equally, this leads to a better job description, which helps candidates understand what’s out there and see whether it matches their interests and expectations.

All too often, I get resumes which are either irrelevant to the opening or such a stretch that it is hard to even have a discussion with the candidate. This happens mostly because the recruiter or the search firm doesn’t have a good idea of the requirement, or because they are stretched and under pressure to push a few resumes. Either way, it doesn’t help us, so it is better to avoid such situations.

Preparing to interview

Assuming that all of this is taken care of and one or more resumes have been short-listed for an interview, the next decision point is whether to have a discussion over the phone or a face-to-face interview. I find these two to be different beasts to tackle.

A telephonic interview is quick, convenient and saves time. A face-to-face interview does take time and can easily run over, but it is a must: such an interaction naturally tells stories which are simply not possible in a telephonic discussion.

Either way, I make sure that I get the resume ahead of time and go through it, to understand more about the candidate.

While a resume is a blob of text that could be crafted in many ways, it still gives you the “context” of the candidate and their background. This is important to me for structuring the questions or discussion points.

I don’t write down questions to ask, but I always prepare the topics I want to discuss. I then let the discussion lead the way toward spontaneous questions or follow-ups. This also helps me refine my questions over time, instead of just asking the same question to all the candidates.

Whether it’s a telephonic or a face-to-face interview, in the lead-up I always make sure the schedule is intact and gives me the time needed to have a good discussion.

If it’s a telephonic interview, I make sure that I have a good phone with a stable connection and clear voice quality, and a backup in case things go wrong. Usually I call from a land-line and keep a mobile phone as a backup. (I make sure I put the mobile phone on silent so that I am not distracted by new mails, phone calls or text messages.)

If it’s a face-to-face interview, I always make sure I know the venue where I’d be meeting the candidate, and the protocol: whether I’d bring the candidate in from the lobby or join the candidate at the venue. As for the room, I check its proximity to rest rooms and break rooms.

Most importantly, I need to understand my position in the list of interviewers. Am I going first or in the middle, or is this just a formal discussion where the decision (to hire or not hire) is already made? This, again, helps me structure my questions within the available time.

Next, I will write about how I conduct an interview and what I look for.

Let us continue to look at Big Data beyond the hype, by understanding two things – the types of data that we deal with and the need for a change in the approach to dealing with them.

Types of Data
We spend more of our time than ever on our devices – Mobile Phones and Tablets, as well as traditional forms such as Laptops and Desktops. Then you have other forms of electronic devices such as Nest, Internet Televisions etc. which are data driven and getting smarter by the day to give us an “integrated experience”. All of them are inter-connected, consuming tons of information on the Internet, and many different types of it. We use our phones with apps (or applications) to listen to songs, to catch up on movie trailers, to turn on our cars’ air conditioners before we get in and even to control heaters at home based on consumption and usage patterns.

Data from these devices can be characterized broadly along three dimensions – Volume, Velocity and Variety, or the 3V’s.

Volume refers to the large amount of information that is collected and that we have to deal with – in Gigabytes, Terabytes or Petabytes.

Velocity refers to the rate at which such information arrives – streaming data.

Variety refers to the different types of information – from comments on Social Networks to readings from room heaters to Blogs.

Approach
One could think about these and ask “Have we not been dealing with them already, what is new now?”

It is akin to the question “I have been using mobile phones for a long time, what is special and different about the iPhone?”

When you have to deal with data which has the above characteristics, the approaches that you have used so far, on a small or restricted scale, are no longer applicable or need to be revisited. As volume grows and as a variety of data flows in, approaches have to change as well.

Just as one would seek an iPhone for its rich user experience and big app ecosystem, powered by technological innovation, a CIO or Data Manager would seek solutions that are driven by innovations in managing such data and in deriving value from it. And that is the case for Big Data solutions. We will explore more on the 3V’s in the next part.

It has been an interesting last few weeks, discussing and brainstorming on Big Data. Broadly, these conversations were around the following:

Should I invest in Big Data for my enterprise today?

Is this just hype that will pass?

What are the challenges?

Hype
Let us be honest and agree that there is hype. Forrester and Gartner both agree with that. Big Data is a buzzword today. It is also a fact that Big Data has matured amongst us so rapidly that it is hard to ignore. The hype does not mean that it is without merit.

Before we decide to keep it or throw it away, or get carried away by the “buzzwords” or “hype”, let us look at a few common situations through the eyes of different personas.

If you are a CIO

Look at the amount of information that flows in and out of your enterprise

Is the data management program efficient enough?

Look at your analytical systems and processes

Are they constrained by the existing systems’ capabilities?

Look at your analytical reports

Are you getting ‘actionable intelligence’?

Listen to the “voice of your business”

Is your business analyst in need of more, to take your business to the next level?

Look at the types of data (Structured, Semi-Structured, Quasi-Structured, Unstructured) you manage

Can your systems and applications handle them all?

Look at your HA (High Availability) and Fault Tolerance thresholds

How economical is the existing setup?

Look at your existing setup

What could be virtualized?

Look at the ROI on your systems and processes

Take a strategic view of what could be introduced (anew), changed or tweaked in your ecosystem to bring in efficiency and add value.

If you head a Services Company

Listen to your Customers (Clients)

What is their business problem?

Look at your current engagements

Can you identify opportunities to expand the conversation based on their business problems?

I came across a very interesting question today in my MBA semester exam. The paper was on Quality Management, a quintessential practice in Data Engineering. The question was:

“What is the role of leadership in Quality Management?”

This question has lingered in my mind ever since, but more in relation to Data Management. What is the role of leadership in Data Management?

We are in the middle of a Data Revolution, where Data Engineering leverages all the technological advances made in Distributed Computing, Complex Modeling and Storage. Big Data did not happen by accident. It reflects the maturity of a field trying to address complex situations and requirements.

So what does it take to be a leader?

Do you lead to participate in this revolution?
OR
Do you lead the revolution itself?

Further to my introduction, let us look at a typical Data Warehouse/Business Intelligence development process. This basic understanding is needed to decide whether any model for remote-development and/or distributed development could fit in.

Data Operations
Data Operations is an inter-layer between different groups within your company. It stores and analyzes information from every line of your business – Product, Sales, Inventory, Marketing, Finance etc.

It acts as an integrated unit with all business groups. It interacts directly with stakeholders to understand business flows, logic and purpose. It also works closely with them in helping them consume data, in other words ‘helping find value for business using data’.

Data Team
The data team typically has business analysts who act as a channel between business groups and the technical (data) teams. It has a project and program manager who manages projects and deliverables. It would also have one or more data architects who are the custodians of the entire data architecture and infrastructure. And it could have data engineers and reporting engineers, who develop programs to process data and handle reporting, respectively.

Stakeholders
On the business side, you have stakeholders interacting directly with the data team members. They could be Business Owners, Domain Specialists, Functional Business Analysts or facilitators to the Executive Management of the company.

As you can see, the data team and its operation crisscross most if not all of the different functions of your business.

Development Model
The development model of a data team starts with business requirements, compiled and driven by the business as a BRD (Business Requirement Document) or PRD (Product Requirement Document).

These are analyzed by the data team to prepare Functional and Technical Specifications. A project plan is then drawn up, which includes details of the development plan. These delivery plans are typically iterative and have shorter response/release cycles, so that businesses get incremental value.

With this basic understanding of the different teams and their purpose, let us look more closely at the Data Warehouse Development Life Cycle.