Interview with Sandhya Gopalan, Data Scientist at DXC Technology

Sandhya Gopalan holds a postgraduate degree in Computer Applications. Right from her college days, Sandhya was fascinated by pattern mining, text mining and image recognition. She and her college friends used to develop image recognition and text recognition components for fun. Sandhya started her career as a Software Developer working on mainframes. Six years into the job, she started working on small-scale projects in Data Science. A year later, she moved full time into Data Science. Her fascination with and thirst for Data Science helped Sandhya Gopalan switch from Software Development to Data Science, and the heavy programming and database manipulation knowledge involved in software development helped a lot.

How did you get into Data Analytics? What interested you in learning Data Analytics?

Sandhya Gopalan: While working as a software developer, I started exploring the data of my customer at the time, which happened to be in the retail financial industry. With very basic statistics, powerful insights were unearthed from customer transaction data. This got attention, and we formed a small team and started working on exploratory and predictive analytics projects for the same customer. That was the beginning. The power of data and its ability to help anyone from an individual to a government, when used wisely, always kindles my interest and curiosity. The scope of analytics for social good is especially awe-inspiring: reducing crimes against women and children, detecting human trafficking, aiding agriculture, identifying cancer.

What was the first data set you remember working with? What did you do with it?

Sandhya Gopalan: My first data set was the transaction data of a retail financial client's customers. I made a dashboard to showcase the usage and spending habits of those customers.

Free Data Analytics Webinar

Date: 24th Jan, 2019 (Thursday)
Time: 3 PM to 4 PM (IST/GMT +5:30)

Was there a specific “aha” moment when you realized the power of data?

Sandhya Gopalan: The ‘aha’ moment came when fraudulent activities were uncovered from transactional data. With a simple analytics model in place, we were able to identify and save hundreds and thousands of dollars from fraudulent activities.

What is your typical day-in-a-life in your current job? Where do you spend most of your time?

Sandhya Gopalan: Working on new use cases, requirements and projects takes up most of the time. In a data science project, 50-60% of the time is spent on data cleaning, manipulation, exploration and feature engineering. The rest of the time goes into experimenting with various machine learning algorithms, building models, testing and automation. The important part after all these steps is presenting results to the business and storytelling.

How do you stay updated on the latest trends in Data Analytics? Which are the Data Analytics resources (i.e. blogs/websites/apps) you visit regularly?

Sandhya Gopalan: I read KDnuggets and fast.ai. I follow the machine learning subreddit. It is long, but there are a lot of gems in it. I am also a fan of openreview.net, where we can find many machine learning papers and discussions of those papers.

Share the names of 3 people that you follow in the field of Data Science.

Sandhya Gopalan: Andrew Ng, Daphne Koller & Rachel Thomas.

Team, Skills and Tools

Which are your favourite Data Analytics tools that you use in your job, and what other tools are widely used in your team?

Sandhya Gopalan: We at Analytics Datalabs, DXC Technology widely use Python, Scala and R for exploration and modeling. Python is very versatile and efficient, with libraries that cover everything from regression to deep learning. For visualizations, various tools like Tableau, Spotfire and Power BI are used. Depending on the size of the data, a solution is built on a desktop or a cloud platform. The language and platform are decided based on the requirement, use case and infrastructure.

What are the different roles and skills within your data team?

Sandhya Gopalan: Everyone has an advanced understanding of statistics and modeling. On top of this, each of us specializes in one or more areas such as pattern recognition, forecasting, image recognition, Natural Language Processing, speech recognition, Hadoop components or advanced visualizations.

Can you describe some examples of the kinds of problems your team is solving this year?

Sandhya Gopalan: We solve problems across a variety of domains like Health care, Automobile, Finance, Insurance, Energy, Logistics, IT and many more. A few of the problems we have solved or are solving: enabling road safety for connected cars, smart pricing for automobile spare parts, forecasting power for an energy supplier, AI-powered information extraction for insurance, and predicting anomalies in credit card transactions.

How do you measure the performance of your team?

Sandhya Gopalan: All our projects are in direct collaboration with our clients, built with them and for them. Hence customer feedback is a strong testament to our performance. What matters is not only what they say, but how our work has helped their business. At a broad level, impact on the business remains the most critical factor.

Advice to Aspiring Data Scientists

According to you, what are the top skills, both technical and soft, that Data Analysts and Data Scientists need?

Sandhya Gopalan: A good grasp of the mathematics behind machine learning, such as Linear Algebra, Calculus and Probability. To be a machine learning practitioner, it is not necessary to code the various machine learning algorithms yourself. We have amazing libraries available. But it is essential to understand the mechanics of an algorithm, its input parameters and how to interpret its output. Machine learning is neither a magical potion nor a black box. A lack of understanding will lead to an inefficient analytics model that leaves things to chance. It makes a tangible difference to explore and implement research papers available for similar problems or use cases. Presenting the insights and results of a data science project or solution to the business with an intuitive explanation is non-negotiable. Storytelling is an important skill and, like any skill, requires a lot of practice. Above all, continuous learning and curiosity to explore help in the long run. In addition, the ability to write programs is an important skill as well.

How much focus should aspiring data practitioners put on working with messy, noisy data? What are the other areas that they must build their expertise in?

Sandhya Gopalan: Real-world data is indeed noisy and messy. Data that can be consumed in raw form is very rare. It is like spotting a unicorn. Hence developing skills in cleaning messy data and manipulating data to handle missing values is essential. 40-50% of the effort in a data science project is spent on data manipulation and feature engineering. Learning to follow coding and programming standards to create readable and reusable components is as important in data science as it is in software development.
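As a small illustration of handling missing values, here is a minimal sketch in plain Python. It assumes records stored as dictionaries and uses median imputation, which is only one of several possible strategies and is not something the interview prescribes. The function name `impute_missing` and the sample records are hypothetical.

```python
from statistics import median


def impute_missing(rows, column):
    """Return new rows where missing values in `column` are replaced
    by the median of the observed values (hypothetical helper)."""
    observed = [row[column] for row in rows if row.get(column) is not None]
    if not observed:
        raise ValueError(f"no observed values in column {column!r}")
    fill = median(observed)
    # Build new dictionaries so the raw data stays untouched.
    return [
        {**row, column: fill if row.get(column) is None else row[column]}
        for row in rows
    ]


# Made-up sample records, for illustration only.
transactions = [
    {"customer": "A", "amount": 120.0},
    {"customer": "B", "amount": None},
    {"customer": "C", "amount": 80.0},
    {"customer": "D", "amount": 100.0},
]

cleaned = impute_missing(transactions, "amount")
print(cleaned[1])
```

Returning new rows instead of editing them in place keeps the raw data available for comparison, which helps when a cleaning decision has to be justified later.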

What is your advice for newbies, Data Science students or practitioners who are looking at building a career in Data Analytics industry?

Sandhya Gopalan:

Learn mathematics behind machine learning

Spend more time in data manipulation and exploration and learn various techniques around it

Never take data for granted and always justify actions using statistical evidence

Understand and practice implementing different machine learning algorithms. Keep working on newer projects and showcase them on GitHub, blogs, etc.

Explore and present results with graphs and charts, using tools such as Tableau and Spotfire as well as programming languages and libraries like Python, R and D3.js.

Keeping the solution simple is more important than using fancy algorithms. For example, if a solution with similar accuracy can be reached with either linear regression or a Recurrent Neural Network (RNN), choose linear regression over the RNN.

Continuously learn by keeping up with new trends

Learn to build end-to-end data science solutions rather than standalone ones

Resilience and curiosity can make anyone an amazing data scientist

Last but not least, use common sense. No matter which features come out as statistically important, unless they have an intuitive business meaning, the models will be looked at with suspicion.
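The advice above about preferring linear regression over an RNN when accuracy is similar can be sketched as a simple selection rule. This is only one possible reading of that advice: the function `pick_simplest`, the complexity ranks, the tolerance and the accuracy figures are all hypothetical placeholders, not results from any real project.

```python
def pick_simplest(candidates, tolerance=0.01):
    """Pick the simplest model whose accuracy is within `tolerance`
    of the best candidate.

    candidates: list of (name, complexity_rank, accuracy) tuples,
    where a lower complexity_rank means a simpler model.
    """
    best = max(accuracy for _, _, accuracy in candidates)
    good_enough = [c for c in candidates if best - c[2] <= tolerance]
    return min(good_enough, key=lambda c: c[1])[0]


# Placeholder scores for illustration; not measured results.
candidates = [
    ("linear_regression", 1, 0.912),
    ("rnn", 3, 0.915),
]

choice = pick_simplest(candidates)
print(choice)
```

With a tighter tolerance the more complex model can still win, so the tolerance is where the business decides how much accuracy it is willing to trade for simplicity.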

What are the changing trends that you foresee in the field of Data Science and what do you recommend the current crop of data analysts do to keep pace?

Sandhya Gopalan: There is huge progress on every front of data science, from the consumption of data to its presentation. There is equal progress in the hardware used in data science. One has to pick the area that interests them and stay updated on its trending topics. Currently, a lot of research is happening in deep learning on the algorithm, hardware and software fronts. Following your favorite researchers and reading papers on topics that interest you will help you keep pace.

Would you like to share a few words about the work we are doing at Digital Vidya in developing Data Analytics talent for the industry?

Sandhya Gopalan:

Digital Vidya is playing an important role in creating talented data analytics professionals. The course content for data analytics is wholesome and hands-on. They are also doing excellent work by publishing interviews with experienced data analytics professionals to help beginners.

[VP, Digital Vidya] Shweta is responsible for product development and delivery. She enjoys simplifying concepts and theory and teaching coding in a way that aspiring data scientists can understand, and sees this as the core of learning. She has 19+ years of technology experience.