Disclaimer:

These are my personal views and are meant for informational purposes only. Please verify the information with professional help or official references before acting on anything in this blog.

But I might have gotten lucky, since I got into this for the right reasons. I was looking for a role that had a little bit of both tech and business, and a few years back Business Intelligence and Data Analysis seemed like a great place to start. So I did that for a while. Then the industry evolved, the analytics maturity of the companies I worked for evolved with it, and so I worked on building predictive models and became what they now call a “data scientist”.

It doesn’t mean that data science is the right role for everyone.

One of my friends feels that it’s not that “technical” and doesn’t like this role. He is more than happy in a data engineer role, where he gets to build stuff and dive deeper into technologies.

Another friend doesn’t like that you don’t own business/product outcomes and prefers a product manager role (he has worked as a data analyst for a while now and is working on transitioning away).

So, just based on the empirical data that I have, data science might not be an ideal path for everyone.

If you create a bunch of reports and help answer what happened, then try to help business users with why it happened. [Example: Instead of just sending website traffic info, add why the traffic spikes and dips are happening]

If you are building a bunch of models that answer why questions, then try to help build predictive models next. [Example: You have been working on a model that helped you answer why customers churned. Now build upon that and predict which customers will churn next]

If you do analytics and data science well and are already answering the what, why, and what’s next questions, then you’re killing it! Next, figure out how you can help business owners take action, or make it easier than ever before to act on your data/recommendations.

Other questions are directly or indirectly answered if you do this:

You will have to pick the right tool for the job

You will have to keep learning continuously (by taking online courses and/or watching YouTube)

Don’t just be a data analyst; be a thought partner to business owners and, if possible, transition into a role that helps you own business outcomes.

There are a lot of ways to apply a CLV (customer lifetime value) model. But I hadn’t seen a single document that summarizes all of them, until I saw this: http://srepho.github.io/CLV/CLV

If you are building a CLV model, one of the first things you might want to figure out is whether you have a contractual or a non-contractual model, and then which methodology would work best for you. Here are 8 methods that were summarized in the link that I shared with you:

Missing Data:

Some of the attributes of a field are missing: for example, the postal code in an address field

Non-standardized:

Check if all the values are standardized: Google, Google Inc & Alphabet might need to be standardized and categorized as Alphabet

Different Date formats used in the same field (MM/DD/YYYY and DD/MM/YYYY)

Incomplete:

Total size of data (# of rows/columns): Sometimes you may not have all the rows that you were expecting (e.g. 100k rows, one for each of your 100k customers), and if that’s the case then it tells us that we don’t have the complete dataset at hand

Erroneous:

Outlier: If someone’s age is 250 then that’s an outlier, but it is also an error somewhere in the data pipeline that needs to be fixed; outliers can be detected by creating a quick data visualization

Data type mismatch: If a text value appears in a field where the other entries are integers, that’s also an error

Duplicates:

Duplicates can be introduced into the data, e.g. the same rows repeated in the dataset, so the data needs to be de-duplicated
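The checks above can be sketched in a few lines of plain Python. This is only a minimal illustration, not a complete data quality tool: the sample rows, the field names (`company`, `postal_code`, `age`), the `CANONICAL_COMPANY` mapping, and the `max_age` threshold are all assumptions made up for the example.

```python
# Hypothetical sample records; field names and values are invented for illustration.
rows = [
    {"customer_id": 1, "company": "Google", "postal_code": "94043", "age": 34},
    {"customer_id": 2, "company": "Google Inc", "postal_code": "", "age": 250},
    {"customer_id": 3, "company": "Alphabet", "postal_code": "10011", "age": "forty"},
    {"customer_id": 3, "company": "Alphabet", "postal_code": "10011", "age": "forty"},
]

# Assumed mapping from lower-cased name variants to one canonical name.
CANONICAL_COMPANY = {
    "google": "Alphabet",
    "google inc": "Alphabet",
    "alphabet": "Alphabet",
}


def check_quality(rows, expected_rows, max_age=120):
    """Return simple data quality findings, keyed by check name."""
    report = {}

    # Missing data: row indices with an empty or absent postal code.
    report["missing_postal_code"] = [
        i for i, r in enumerate(rows) if not r.get("postal_code")
    ]

    # Non-standardized: raw company names that differ from their canonical form.
    report["non_standard_company"] = sorted(
        {
            r["company"]
            for r in rows
            if CANONICAL_COMPANY.get(r["company"].lower(), r["company"]) != r["company"]
        }
    )

    # Incomplete: fewer rows than we expected to receive.
    report["incomplete"] = len(rows) < expected_rows

    # Data type mismatch: ages that are not integers.
    report["type_mismatch_age"] = [
        i for i, r in enumerate(rows) if not isinstance(r["age"], int)
    ]

    # Erroneous outliers: integer ages outside a plausible range.
    report["outlier_age"] = [
        i
        for i, r in enumerate(rows)
        if isinstance(r["age"], int) and not 0 <= r["age"] <= max_age
    ]

    # Duplicates: row indices that exactly repeat an earlier row.
    seen = set()
    duplicates = []
    for i, r in enumerate(rows):
        key = tuple(sorted(r.items()))
        if key in seen:
            duplicates.append(i)
        seen.add(key)
    report["duplicates"] = duplicates

    return report


report = check_quality(rows, expected_rows=5)
for name, finding in report.items():
    print(name, finding)
```

Each check returns row indices (or raw values) rather than fixing anything, so you can decide case by case whether the fix belongs in the dataset or upstream in the pipeline.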

Data cleaning takes up a lot of time in a data science process. That’s not necessarily a bad thing, and time spent on cleaning data is worthwhile in most cases. Still, I was looking for a framework that might help me make this process a little bit faster. As part of my research, I found the Journal of Statistical Software paper written by Hadley Wickham, which has a really good framework to “tidy” data, and tidying is part of the data cleaning process.

The author does a great job of defining tidy data:

1. Each variable forms a column.

2. Each observation forms a row.

3. Each type of observational unit forms a table.

And then applying it to the 5 most common problems with messy datasets:

1. Column headers are values, not variable names.

2. Multiple variables are stored in one column.

3. Variables are stored in both rows and columns.

4. Multiple types of observational units are stored in the same table.

5. A single observational unit is stored in multiple tables.
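To make the first problem concrete (column headers are values, not variable names), here is a minimal sketch in plain Python. It is not the paper's code: the `melt` helper, the `wide` table, and the column names (`region`, `year`, `sales`) are hypothetical and exist only to show the reshaping idea.

```python
def melt(rows, id_field, var_name, value_name):
    """Turn every non-id column into its own row (one observation per row)."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue
            tidy.append({id_field: row[id_field], var_name: column, value_name: value})
    return tidy


# Messy: the headers "2019" and "2020" are values of a "year" variable.
wide = [
    {"region": "East", "2019": 10, "2020": 12},
    {"region": "West", "2019": 7, "2020": 9},
]

# Tidy: each variable is a column and each observation is a row.
tidy = melt(wide, id_field="region", var_name="year", value_name="sales")
for record in tidy:
    print(record)
```

After reshaping, region, year, and sales are each a column, which matches the first two rules of the tidy data definition.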

It depends on your target industry & where they are in their life-cycle.

It has four stages: Startup, Growth, Maturity, Decline.

Generalization is great in the earlier stages. If you are targeting jobs at startups, generalize. You should know enough about a lot of things.

T-shaped professionals are great for the Growth stage. They specialize in something but still know enough about a lot of things. E.g. a Sr Growth/Marketing Analyst: knows enough about analytics and data science to be dangerous, but specializes in marketing.

Specialization is great for mature industries. Specialists know a lot about a few things. E.g. statisticians in the insurance industry, who have made careers out of building risk models.