Archive for November, 2016

All of us are born with special talents. It’s just a matter of time until we discover them and start believing in ourselves.

Some people struggle when they start coding in R. Sometimes a lot more can be done than one can ever think! Some people have never coded, not even a “Hello World” program, in their entire lives. Below are several non-coding tools available for data analysis.

List of Non-Programming Tools

1. Excel / Spreadsheet

Anyone transitioning into data science, or anyone who has already spent years in it, knows that Excel remains an indispensable part of the analytics industry. Even today, most of the problems faced in analytics projects are solved using this software. It supports all the important features, such as summarizing, visualizing, and wrangling data, which are powerful enough to inspect data from all possible angles. No matter how many tools a person knows, Excel must feature in their armory. Microsoft Excel is paid software, but other spreadsheet tools such as OpenOffice and Google Docs are certainly worth a try!

2. Trifacta

Trifacta’s Wrangler tool is challenging the traditional methods of data cleaning and manipulation. Whereas Excel has limitations on data size, this tool has no such boundaries, and everyone can securely work on big data sets. It has incredible features such as chart recommendations, inbuilt algorithms, and analysis insights, with which anyone can generate reports in no time. It’s an intelligent tool focused on solving business problems faster, thereby allowing us to be more productive at data-related exercises.

3. Rapid Miner

This tool emerged as a leader in the 2016 Gartner Magic Quadrant for Advanced Analytics. It’s more than a data cleaning tool: it extends its expertise to building machine learning models. It comprises all the ML algorithms that are used frequently. Not just a GUI, it also extends support to people using Python & R for model building. In short, it’s a complete tool for any business that needs to perform all tasks from data loading to model deployment.

4. Rattle GUI

If anyone has tried using R but couldn’t get the knack of what’s going on, Rattle should be their first choice. This GUI is built on R and is launched by typing install.packages("rattle"), followed by library(rattle) and then rattle(), in R. Therefore, R must be installed in order to use Rattle. It’s also more than just a data mining tool. Rattle supports various ML algorithms such as Tree, SVM, Boosting, Neural Net, Survival, Linear models etc.

5. Qlikview

Qlikview is one of the most popular tools in the business intelligence industry around the world. This tool derives business insights and presents them in an awesome manner. With its visualization capabilities, it gives a tremendous amount of control while working on data. It has an inbuilt recommendation engine that suggests, from time to time, the best visualization methods while working on data sets.

6. Weka

An advantage of using Weka is that it is easy to learn. It is a machine learning tool, and its interface is intuitive enough to get the job done quickly. It provides options for data pre-processing, classification, regression, clustering, association rules and visualization. Most of the steps in model building can be carried out using Weka. It’s built on Java.

7. KNIME

Similar to RapidMiner, KNIME offers an open source analytics platform for analyzing data, which can later be deployed and scaled using other supporting KNIME products. This tool has an abundance of features for data blending, visualization, and advanced machine learning algorithms. One can also build models using this tool.

8. Orange

As cool as it sounds, this tool is designed to produce interactive data visualizations and perform data mining tasks. There are enough YouTube tutorials to learn this tool. It has an extensive library of data mining tasks, which includes all classification, regression, and clustering methods. In addition, the versatile visualizations formed during data analysis allow one to understand the data more closely.

9. Tableau Public

Tableau is data visualization software. We can say that Tableau and Qlikview are the most powerful sharks in the business intelligence ocean. The comparison of superiority is never-ending. It’s fast visualization software that allows exploring data, down to every observation, using various possible charts. Its intelligent algorithms figure out by themselves the type of data, the best method available, and so on. For understanding data in real time, Tableau can get the job done. In a way, Tableau imparts a colorful life to data and allows sharing work with others.

10. Data Wrapper

It’s lightning-fast visualization software. When someone is assigned BI work and has no clue what to do, this software is an option worth considering. Its visualization bucket comprises line charts, bar charts, column charts, pie charts, stacked bar charts and maps. So it’s basic software and can’t be compared with giants like Tableau and Qlikview. This tool is browser-enabled and doesn’t require any software installation.

11. Data Science Studio (DSS)

It is a powerful tool designed to connect technology, business and data. It is available in two segments: Coding & Non-Coding. It’s a complete package for any organization that aims to develop, build, deploy and scale models on a network. DSS is also powerful enough to create smart data applications to solve real-world problems. It comprises features that facilitate team integration on projects. Among all features, the most interesting is that work can be reproduced in DSS, as every action in the system is versioned through an integrated Git repository.

12. OpenRefine

It started as Google Refine, but it looks like Google dropped the project for unclear reasons. However, the tool is still available, renamed OpenRefine. Among the generous list of open source tools, OpenRefine specializes in cleaning and transforming data and shaping it for predictive modeling purposes. As an interesting fact, during model building, 80% of an analyst’s time is spent on data cleaning. Not so pleasant, but it’s a fact. Using OpenRefine, analysts can not only save time but also put it to use for productive work.
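To make the idea of data cleaning concrete, here is a minimal Python sketch of grouping near-duplicate spellings by a normalized key. It is loosely modeled on the “fingerprint” key-collision clustering described in OpenRefine’s documentation, but it is a simplified approximation and not the tool’s actual code. The function names and the sample values are made up for illustration.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Normalize a string so that near-duplicate spellings share one key."""
    cleaned = value.strip().lower()
    # Drop ASCII punctuation, then sort and de-duplicate the remaining words.
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group raw values by key; keep only keys with more than one spelling."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return {k: sorted(vs) for k, vs in groups.items() if len(vs) > 1}


# Hypothetical messy column of company names.
names = ["Acme Inc.", "acme inc", " ACME, Inc ", "Globex"]
print(cluster(names))
```

Each group returned by cluster() is a set of spellings that an analyst would probably want to merge into one value. In a GUI tool the same step is a few clicks.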

13. Talend

Decision making these days is largely driven by data. Managers & professionals no longer make gut-based decisions. They require a tool that can help them quickly. Talend can help them explore data and support their decision making. More precisely, it’s a data collaboration tool capable of cleaning, transforming and visualizing data. Moreover, it offers an interesting automation feature where a person can save a previous task and redo it on a new data set. This feature is unique and hasn’t been found in many tools. It also performs auto discovery and provides smart suggestions to the user for enhanced data analysis.

14. Data Preparator

This tool is built on Java to assist in data exploration, cleaning and analysis. It includes various inbuilt packages for discretization, numeration, scaling, attribute selection, missing values, outliers, statistics, visualization, balancing, sampling, row selection, and several other tasks. Its GUI is intuitive and simple to understand. Once someone starts working on it, it won’t take much time to figure out how it works. A unique advantage of this tool is that the data set used for analysis doesn’t get stored in computer memory. This means it’s possible to work on large data sets without any speed or memory troubles.
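The idea of analyzing data without loading it all into memory can be sketched in a few lines of Python. This is only an illustration of row-by-row (streaming) processing, not a description of how Data Preparator is implemented. The function name, column name and sample data are hypothetical.

```python
import csv
import io


def streaming_mean(lines, column):
    """Average a numeric column one row at a time, never holding the full table."""
    total = 0.0
    count = 0
    for row in csv.DictReader(lines):
        value = (row.get(column) or "").strip()
        if not value:  # skip missing values
            continue
        total += float(value)
        count += 1
    return total / count if count else None


# Small in-memory sample standing in for a large file on disk.
sample = io.StringIO("id,amount\n1,10\n2,\n3,20\n")
print(streaming_mean(sample, "amount"))
```

Because only a running total and a count are kept, memory use stays flat no matter how many rows the file has. An open file handle can be passed in place of the sample.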

15. DataCracker

It’s data analysis software that specializes in survey data. Many companies run surveys but struggle to analyze the results statistically. Survey data is never clean. It contains a lot of missing and inappropriate values. This tool reduces the agony and improves the experience of working on messy data. It is designed to load data from all major internet survey programs such as SurveyMonkey, SurveyGizmo etc.

16. Data Applied

This powerful interactive tool is designed to build, share and design data analysis reports. Creating visualizations on large data sets can sometimes be troublesome, but this tool is robust in visualizing large amounts of data using tree maps. Like all the other tools above, it has features for data transformation, statistical analysis, anomaly detection etc.

17. Tanagra Project

This tool has an old-fashioned UI, but this free data mining software is designed to build machine learning models. The Tanagra project started as free software for academic and research purposes. Being an open source project, it provides enough room for users to devise their own algorithms and contribute them.

18. H2o

H2o is one of the most popular software products in the analytics industry today. In a few years, this organization has succeeded in evangelizing the analytics community around the world. With this open source software, they bring a lightning-fast analytics experience, which is further extended through APIs for programming languages. It supports not just data analysis but also building advanced machine learning models in no time.

Bonus Additions:

In addition to the awesome tools above, below are some more tools that might be interesting to look at. These tools aren’t free, but they are available for trial:

Data Kleenr

Data Ladder

Data Cleaner

WinPure

End Notes

Once a person starts working with these tools, they will understand that knowing programming isn’t much of an advantage for predictive modeling. They can accomplish the same thing with these open source tools. Therefore, if anyone has been disappointed at their lack of coding skills, now is the time to channel their enthusiasm into these tools.

The only limitation with some of these tools is a lack of community support. Except for a few, several of them don’t have a community to turn to for help and suggestions. Still, they’re worth a try!

PS: All of the above is a personal perspective based on information provided by Analytics Vidhya.

The Analytics & Big Data sector has been growing consistently over the last five years despite an increasingly volatile and uncertain global outlook. Despite this outlook, the Analytics and Big Data market is expected to keep growing within the overall IT market. Here we assess the global scenario, the reasons for the growth, individual returns in terms of salary, and the impact Analytics and Big Data have on today’s economy.

Impact on businesses

Analytics & Big Data have revolutionized the way business is done around the world. Companies of all sizes, from small firms to the Fortune 500, rely on data and analytics to make critical business decisions. From understanding consumer behavior to predicting market trends, even right down to product features, many moves are driven by analytics and data in companies across the world.

In today’s global world, Big Data and Analytics are used in Entertainment, Education, Transportation, Government, Defense, Retail, Health care and Finance.

For example, Amazon is one of the leading consumer companies in the world using analytics and Big Data to configure its products, services and delivery. Amazon uses analytics to suggest products on each customer’s homepage based on the customer’s previous purchase history and browsing habits. It analyses the customer mindset based on the sites frequently visited and the products purchased from other sites.
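As a rough illustration of suggesting products from purchase history, the Python sketch below counts which items are bought together and recommends the most frequent companions of what a customer already owns. This is a generic co-purchase heuristic chosen for illustration; it is not Amazon’s actual method, and the function names and order data are invented.

```python
from collections import Counter
from itertools import combinations


def build_co_purchases(orders):
    """Count how often each pair of items appears in the same order."""
    pairs = Counter()
    for order in orders:
        for a, b in combinations(sorted(set(order)), 2):
            pairs[(a, b)] += 1
            pairs[(b, a)] += 1
    return pairs


def recommend(history, pairs, top_n=2):
    """Score unseen items by how often they were bought with items in the history."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a in history and b not in history:
            scores[b] += n
    return [item for item, _ in scores.most_common(top_n)]


# Hypothetical past orders from all customers.
orders = [["book", "lamp"], ["book", "lamp", "pen"], ["book", "pen"], ["lamp", "desk"]]
pairs = build_co_purchases(orders)
print(recommend({"book"}, pairs))
```

Real recommendation systems combine many more signals, such as browsing behavior, but the principle of learning from past purchases is the same.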

Twitter uses analytics to fill your news feed with updates from the people you interact with the most. Flipkart and Snapdeal use Predictive Analytics. The Postal Service invests in gathering and analyzing data to improve last-mile delivery operations.

Hiring trends

Earlier, people used to specialize in just one tool or domain, but not anymore. Just having a Master’s degree or an MBA on a resume no longer impresses hiring companies. Companies today are investing in employees who know how to use the entire tool set of Analytics & Big Data. This year, people who know R and Python command a premium. Companies are looking for Data Scientists. They want people who have both business knowledge and an understanding of analytics. To get the job, a person would need to know market analysis and have business knowledge of similar industries.

R and Python are the front-runners in the analytics race. The more diverse and business-oriented a person’s skill set, the more they earn.

Strategists and Analysts skilled in both Big Data and Data Science are being snapped up at the highest salaries. Cash-strapped startups spend money on their star analysts but when it comes to tools, they prefer to use open-source ones like R.

Analysts can expect a steep increase in their salaries once they cross the 5-year mark.

Big Data analysts are on a better earning footing. Big Data professionals earn more than data scientists, but those who combine both skill sets and know how to work with both, Data Scientist + Big Data Analyst, get a larger payout.

Earnings and Skill Requirements by Company Size

Startups and mid-size companies need people who know R, and they are willing to pay top dollar for it. R is in great demand across the board. But a person who wants to join a large company would need to add SAS to their skills. This is because larger companies can afford to pay for proprietary software like SAS, which smaller companies may not have. The biggest jump in salaries is seen after the 5-year mark, when analysts can expect a raise of up to 70%.

Guidelines for Anyone Considering a Career in Big Data Analytics

Self-assessment

A data scientist is a person with an analytical mindset. Data analysts have an inquisitive mind and enjoy quizzes and solving complex puzzles. They also spend time analyzing numbers and inspecting huge volumes of financial data to see if they can perceive any meaningful patterns or spot any discrepancies.

After self-assessment, one can decide whether to go to college for a degree in data science or whether an intensive certification course would be more beneficial. The person can do some research to figure out which universities or institutes offer the courses or programs that suit their profile.

Familiarizing oneself with the data analytics landscape

Analytics comprises various techniques and tools that can be applied in different combinations across fields as diverse as business and healthcare management. A person wishing to pursue data analytics must analyse the intended work ‘domain’ in order to decide which kind of courses to take. Keep in mind that some software skills are in greater demand in some professional sectors than in others.

In-Demand Data Analytics Skills

While data analytics skills are in great demand these days, some skills are more in demand than others. A person’s hands-on experience with different kinds of software will help command better salaries than expertise in just one.

Big Data

Popular Big Data-specific skills include statistics, programming, and mathematical modelling. A combined knowledge of R and Python can equip a person with these skills.

R

Sometimes referred to as ‘a hyperactive version of Excel’, R is used by organizations as varied as Facebook, Google and leading news agencies. It’s used to sift through large data sets, which it can then easily ‘manipulate’ using modelling techniques and powerful data visualization tools.

Python

Python is a versatile, open-source programming language. It is fairly easy to learn and pick up. It can be used to create web apps and also to perform analytics. Python is one of the most popular programming languages in the world. It was created by Guido van Rossum and first released in 1991.

Hadoop

Like Big Data, Hadoop is increasingly being referenced in job advertisements. Thanks to its large capacity, Hadoop can process Big Data at scale. Hadoop’s growing demand and appeal show no sign of decreasing in the coming years.

Following are the industries driving the growing demand for data science skills. This growing trend is expected to keep increasing in the coming years.

SAP Fiori is a platform for porting applications to mobile devices. SAP Fiori is based on SAP’s technology platform, NetWeaver. The SAP Fiori platform was announced on May 15, 2013.

SAP Fiori enables applications to be used on desktops, smartphones & tablets. SAP Fiori supports HTML5. The initial release had only a few applications, and the number gradually increased to cover the SAP Business Suite.

SAP Fiori is a collection of apps that offer a simple, easy-to-use experience for frequently used SAP software functions and work harmoniously across desktops, tablets, and smartphones.

SAP Fiori improves end-user productivity by simplifying and displaying day-to-day tasks on any device.

Fiori is more than just a new user interface. It is a set of cross-device applications that, among other things, allow users to start a process on their desktop and continue it on a tablet or smartphone.

SAP is developing its Fiori apps on its latest user interface framework, SAPUI5.

SAP lists three types of Fiori apps:

Transactional apps – These let end users carry out transactions on mobile devices and desktops. For example, there is an app for creating a leave of absence request and another for approving timesheets.

Fact sheets – These apps display information about the main business objects in SAP. For example, there is a fact-sheet app for viewing a central purchase order; it allows end users to look into related entities, such as purchase contacts and the terms & conditions under a contract.

Analytical apps – These apps allow users to display the main performance measures and other important information about the business.

So far, SAP has released two waves of Fiori apps of 25 apps each, with additional waves under way.