Exploratory Data Analysis

Histogram plotting: the input is a list of the distributions we want to plot, and we can specify the bins. Each sample can also be weighted differently; it doesn't have to count as 1. The hist function returns values as well as drawing the plot: how many items fall in each bin.
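To make the binning and weighting concrete, here is a minimal pure-Python sketch of what a histogram function computes. The function name weighted_histogram and the sample numbers are made up for illustration. matplotlib's own hist accepts bins and weights arguments and returns the per-bin counts, the bin edges, and the drawn patches; this sketch only reproduces the counts.

```python
from bisect import bisect_right


def weighted_histogram(samples, bin_edges, weights=None):
    """Return the total weight that falls in each bin.

    bin_edges must be sorted. Bin i covers [edges[i], edges[i + 1]),
    except the last bin, which also includes its right edge.
    """
    if weights is None:
        weights = [1.0] * len(samples)  # default: every sample counts once
    counts = [0.0] * (len(bin_edges) - 1)
    for value, weight in zip(samples, weights):
        if value < bin_edges[0] or value > bin_edges[-1]:
            continue  # outside the binned range, so it is ignored
        # bisect_right finds the first edge greater than value
        index = min(bisect_right(bin_edges, value) - 1, len(counts) - 1)
        counts[index] += weight
    return counts


samples = [1, 2, 2, 3, 7, 8, 9]  # illustrative data
edges = [0, 5, 10]               # two bins: 0 to 5 and 5 to 10

plain_counts = weighted_histogram(samples, edges)
half_counts = weighted_histogram(samples, edges, weights=[0.5] * len(samples))
print(plain_counts, half_counts)
```

With a weight of 0.5 per sample, every bin total is half of the plain count.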

It is also important to do feature extraction and dimensionality reduction before feeding data into a machine learning algorithm: they simplify the data and reduce computational cost. Algorithms will run faster, use less memory and, in some cases, even perform better.

Anomaly detection (outlier detection): handle or remove outliers and abnormalities in the data so the model generalizes better and represents the data more accurately.
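One common way to flag outliers is the interquartile range (IQR) rule. The notes above do not name a method, so treat this as one possible sketch rather than the recommended approach. The 1.5 multiplier is the usual convention and the readings are made-up data.

```python
import statistics


def iqr_outliers(values, multiplier=1.5):
    """Split values into (kept, outliers) using the IQR rule."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr
    kept = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return kept, outliers


readings = [10, 12, 11, 13, 12, 11, 10, 95]  # illustrative data; 95 looks abnormal
cleaned, outliers = iqr_outliers(readings)
print(cleaned, outliers)
```

Whether to drop, cap, or keep the flagged points depends on the data; the rule only marks candidates.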

Machine Learning

Machine learning is emerging as a popular field of data science. It has predictive power and draws on applied statistics and pattern recognition.

Machine learning is taking data mining to the next level.

Major machine learning tasks include classification, regression and clustering.

Questions that Business Analysts and Decision Makers are Interested In

Who are the best customers? That is, who are the customers with the best Customer Lifetime Value?

Causal relationship:

Results of recent experiments (More prevalent in Startup Culture)

Hypothesis testing: is one segment actually different from another?

Is the result significant, or is it random chance?

Please note that determining a causal relationship requires controlled studies that control for extraneous variables. In many industries, such as biotech, statistical significance is a must: a prerequisite for next-step analysis or further business investment.
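The question "is the result significant or is it random chance" can be sketched with a permutation test on two segments. This is one possible test among many, not the one used in any particular experiment. The segment values, the function name, and the 0.05 threshold are assumptions for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign segment labels at random
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


# Hypothetical order values for two customer segments
segment_a = [20, 22, 19, 24, 25, 23, 21, 22]
segment_b = [30, 29, 31, 33, 28, 32, 30, 31]

p_value = permutation_p_value(segment_a, segment_b)
print("difference looks significant" if p_value < 0.05 else "could be random chance")
```

A small p-value says the difference is unlikely under random labeling. It does not by itself establish a causal relationship.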

Sunday, August 5, 2018

Did you know that having a daily routine improves efficiency and productivity? Mark Zuckerberg of Facebook famously wears the same grey shirt and hoodie on a daily basis to simplify wardrobe choices and save minutes each day.

Automate everything: use API connectors such as Zapier, IFTTT and Do Button to connect applications such as Gmail, Shopify and Trello without coding.

Join an online initiative to go complaint free for one month and induce positivity in your life https://gonoco.com/

Easily distracted? Having trouble finishing meaningful tasks? Try the 30/30 timer rule: switch tasks every 30 minutes. There are iOS apps that time you and chime when it is time to switch and move on. http://3030.binaryhammer.com/

Social Network, Social Marketing and Growth Lifehacks

Use Tweepi to flush Twitter followers that are inactive or don't follow you back

Personal Finance Productivity

Use a stock, mutual fund screener to find stocks and funds that match your investment goals for your 401K plan

Developer Productivity:

Pair programming for productivity: AirPair and Pivotal Labs, a premium development consulting agency for startups and new technology companies, talk about pair programming for developer productivity http://www.airpair.com/pair-programming/

Always look for shortcuts and ways to do things faster. Some developers even use fast note-taking apps like Notational Velocity and combine them with hot keys to shave fractions of a second off their daily routine.

Code a mobile app without learning iOS or Android development: Ionic Framework http://ionicframework.com/

Startup Productivity

Use prototypes and wireframes as visual aids to communicate product visions and designs clearly.

Did you know that having a 3D printed prototype generates 3x more feedback for architecture and physical product designers than just having a concept drawing?

Did you know that famous universities like Stanford teach students to print or draw iOS UIs and designs on paper and walk users through imaginary steps to get design feedback before they code?

Looking for great business ideas? Use a startup name or domain generator to get inspired!

Udacity launches an AI for Trading nanodegree with WorldQuant, which is also its hiring partner. If you are ready to do artificial intelligence for fintech, this may be your nanodegree! What's the ultimate dream? Probably to join a quantitative hedge fund, eventually. It is said that a little less than 30% of all US trades are done by computers. Specifically, you want Python for finance and historical data skills.
- https://blog.udacity.com/2018/08/introducing-the-artificial-intelligence-for-trading-nanodegree-program.html

Author Adam Fisher launches Valley of Genius, as told by the hackers, founders, and freaks who made it. If you like HBO's Silicon Valley, you will probably like these stories of Silicon Valley unicorns and innovators.

Great Escape! Medium is running an August author challenge: tell Medium why and how you quit your job! https://medium.com/s/greatescape/tell-us-about-the-best-time-you-quit-your-bad-job-aaaf6d5b4e20 Your story may be featured. See this challenge post by Medium's editor.
- https://medium.com/s/greatescape

What does it feel like to be Steve Jobs's daughter? Her memoir is now available for readers. See this article in Vanity Fair.
- https://www.vanityfair.com/news/2018/08/lisa-brennan-jobs-small-fry-steve-jobs-daughter

YouTube machine learning and artificial intelligence celebrity Siraj wants to start his own School of AI. He wants it to be a "nonprofit". Strange but true. He's now recruiting Deans to head cities.

BIDW (business intelligence and data warehousing) may employ a more stable, heavy-duty and less flexible architecture, schema and data store than startups in Silicon Valley do. This may be a sacrifice made for the security and stability that many Fortune 500 companies rely on.

Structured Query Language (SQL)

Despite the popularity of many new data stores and technologies such as Hadoop, Spark and Pandas, many companies still require Business Analysts to be fluent in SQL. Never forget SQL.

Graphical User Interface (GUI)

A GUI helps business users query and drill into data without the help of the development department. The schema and database are still designed and implemented by dev.

Online Analytical Processing (OLAP)

Provides a GUI-based query platform for business users to do data exploration with minimal help from the dev department.

Analysts and decision makers can quickly and efficiently do data analysis and ad hoc reporting without too much help from a data scientist or database administrator.

The schema, reports, and drilling depth may need to be pre-planned, designed and tested before being released to business users.

This is also a large-scale system, suitable for companies such as Macy's, Gap and Walmart, which have millions of new sales records per hour.

OLAP is for data exploration by large businesses.

Data Warehousing

Data Warehousing is a serious challenge for large companies with many transactional records and product offerings across many departments.

Many DW providers can also provide integrated data mining and business intelligence services built on top of proprietary DW hardware (including the server stack) and software.

Best Practice

Sales teams on the road often need faster, better data on mobile devices to seal a deal. Don't be surprised if they get mad when numbers are off! They bring home the dough.

SELECT, INSERT, UPDATE with SQL

The Equivalent of HelloWorld of SQL

SELECT *

FROM table_name

Select all columns and rows from a table. In real-life practice we may want to avoid SELECT *, because it can request and display a lot of unnecessary data and use up precious computing resources, especially in large systems and at companies with large databases.

A Basic Select Statement

SELECT ProductID, Name

FROM Product

WHERE Price > 2.00

A Fancier Select Statement

SELECT * FROM CUSTOMERS WHERE AGE > 25 AND SEX = 'F' AND REGION='CA'

The * means all columns. The WHERE clause limits the rows: only customers who match all three conditions are returned, each with all of its columns.
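To try the fancier select without a database server, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The CUSTOMERS columns follow the query above; the NAME column and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE CUSTOMERS (NAME TEXT, AGE INTEGER, SEX TEXT, REGION TEXT)")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)",
    [
        ("Ana", 31, "F", "CA"),   # matches all three conditions
        ("Beth", 22, "F", "CA"),  # too young
        ("Carl", 40, "M", "CA"),  # wrong sex
        ("Dina", 35, "F", "NY"),  # wrong region
    ],
)

# * returns every column; WHERE decides which rows come back
matching_rows = conn.execute(
    "SELECT * FROM CUSTOMERS WHERE AGE > 25 AND SEX = 'F' AND REGION = 'CA'"
).fetchall()
conn.close()

print(matching_rows)
```

Only one of the four rows passes the filter, and it comes back with all four columns.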

An Insert Statement
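A minimal sketch of INSERT and UPDATE, again run through Python's built-in sqlite3 module. The Product table reuses the columns from the basic select statement; the products and prices are made-up values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT, Price REAL)")

# INSERT adds new rows
conn.execute("INSERT INTO Product (ProductID, Name, Price) VALUES (1, 'Pencil', 0.50)")
conn.execute("INSERT INTO Product (ProductID, Name, Price) VALUES (2, 'Notebook', 3.25)")

# UPDATE changes existing rows; the WHERE clause picks which ones
conn.execute("UPDATE Product SET Price = 2.50 WHERE ProductID = 1")

products = conn.execute(
    "SELECT ProductID, Name, Price FROM Product ORDER BY ProductID"
).fetchall()
conn.close()

print(products)
```

Leaving the WHERE clause off an UPDATE changes every row in the table, so it is worth double checking before running one.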

Useful SQL interview skills

Be able to compose advanced SQL queries, including aggregation, slicing and dicing.

Advanced SQL Query Select Count and Group By

It's easy to use SQL to display all the data columns and rows, but that's not practical: the business user has no use for the entire database, and it is not memory efficient.

How do we view aggregate data? Use Group By, and don't forget to use Count() too, or the result is again not meaningful.

SELECT COUNT(CUSTOMER_ID), STATE

FROM CUSTOMERS

GROUP BY STATE

ORDER BY COUNT(CUSTOMER_ID) DESC;

Group By aggregates rows that share a value. In this case we aggregate the data in the Customers table by state. What state-wide information are we trying to get? We count the number of customers in each state, as measured by customer_id. Once the data is aggregated, we order the results by count(customer_id) in descending order, from the largest count to the smallest.
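The same Count and Group By query can be run end to end with Python's built-in sqlite3 module. This is a sketch with made-up customer rows; only the CUSTOMER_ID and STATE columns from the query are modeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (CUSTOMER_ID INTEGER PRIMARY KEY, STATE TEXT)")
conn.executemany(
    "INSERT INTO CUSTOMERS (CUSTOMER_ID, STATE) VALUES (?, ?)",
    [(1, "CA"), (2, "NY"), (3, "CA"), (4, "TX"), (5, "CA"), (6, "NY")],
)

# One output row per state, largest customer count first
customers_per_state = conn.execute(
    """
    SELECT COUNT(CUSTOMER_ID), STATE
    FROM CUSTOMERS
    GROUP BY STATE
    ORDER BY COUNT(CUSTOMER_ID) DESC
    """
).fetchall()
conn.close()

print(customers_per_state)
```

Six customer rows collapse into three summary rows, one per state.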

Compare a Select all statement, which just returns all the data rows, to a Select Count() and Group By statement, which aggregates the rows by a column such as state or country.

SQL is great for the following queries:

SQL segmentation example, analyze by location: select location, count(*) from sales group by location

Spark and the new way to run SQL queries on structured, distributed data

Firebase real time database and JSON

JSON objects

NoSQL databases like MongoDB

SQL Security

Cross Site Scripting and SQL Injection

If special characters are allowed in input boxes and forms on a website, hackers may inject code that runs SQL queries against your database and retrieves data illegally. Many websites, such as Yelp, do not allow special characters. Some websites escape the user input before processing it on the server, so special characters are treated as plain string data rather than as code, which reduces the security risk.
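Here is a small sketch of the injection risk, and of parameterized queries as one widely used defense, using Python's built-in sqlite3 module. The users table, its rows, and the malicious input are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "alice@example.com"), ("bob", "bob@example.com")],
)

# Text a hacker might type into a "name" form field
malicious_input = "nobody' OR '1'='1"

# Unsafe: the input is pasted into the SQL text, so its quotes become code
unsafe_sql = "SELECT email FROM users WHERE name = '" + malicious_input + "'"
leaked_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the ? placeholder passes the input as data, never as SQL
safe_rows = conn.execute(
    "SELECT email FROM users WHERE name = ?", (malicious_input,)
).fetchall()
conn.close()

print(len(leaked_rows), len(safe_rows))
```

The concatenated query returns every email in the table. The parameterized query returns nothing, because no user has that literal name.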

Thursday, August 2, 2018

The goal is for mobile developers to load images onto mobile applications when limited memory is available.

Android drawable images such as @drawable/my_img can be set as the source of an ImageView. The reference does not need the image file extension. "Drawable" refers to the fact that the image can be drawn on the screen. Android manages all drawables in the res/drawable directory.
https://developer.android.com/guide/topics/resources/drawable-resource

Drawables mainly support bitmap formats, including .jpg, .png and .gif. The basic unit of these images is the pixel.

Density-independent pixels (dp, or dip) allow an ImageView to scale and resize across screen sizes and pixel densities, across the wide variety of Android devices. Specifying a button's size in dp instead of px makes sure the button is still reasonably sized and clickable on high-resolution, high-density screens (a high number of dots or pixels per inch).
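The dp-to-pixel relationship can be sketched in a few lines. Android defines 1 dp as 1 px on a 160 dpi (mdpi) screen, so px = dp * dpi / 160. The snippet is plain Python for illustration rather than Android code, and the 48 dp button size is just an example value.

```python
BASELINE_DPI = 160  # 1 dp equals 1 px at this density (mdpi)

# Nominal densities of the common Android density buckets
DENSITY_BUCKETS = {"mdpi": 160, "hdpi": 240, "xhdpi": 320, "xxhdpi": 480}


def dp_to_px(dp, dpi):
    """Convert density-independent pixels to physical pixels."""
    return round(dp * dpi / BASELINE_DPI)


# The same 48 dp button needs more physical pixels on denser screens
button_px = {bucket: dp_to_px(48, dpi) for bucket, dpi in DENSITY_BUCKETS.items()}
print(button_px)
```

The button keeps roughly the same physical size on every screen because its pixel count grows with the density.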

Best practice to keep memory use small is to include different image sizes for the different screen densities, so each device loads an image close to the size it needs. Android handles the selection automatically and loads the drawable from the folder that matches the device's density: mdpi, hdpi, xhdpi, xxhdpi.

Developers also use ImageMagick to compress photos and Android Drawable Importer to convert images to drawables https://plugins.jetbrains.com/plugin/7658-android-drawable-importer

Bash can improve developer productivity. It is available on the Mac through the Terminal. Developers can use bash to write build scripts, use curl to fetch and process websites, interact with the file system, modify files, and pipe or redirect output into files.

An SVM can use kernel functions to make data linearly separable, which gives non-linear, intricate decision boundaries. For a linear SVM the decision boundary is a straight line. To check, apply a linear SVM: if it has 0% training error, your data is linearly separable.

The C parameter of an SVM controls the trade-off between a smooth decision boundary and classifying training points correctly: you either get a smoother boundary or get more training points classified correctly, and the latter may not generalize well. The effect of C is especially obvious with the RBF kernel. A large C means more training points are classified correctly, so a larger C gives more intricate boundaries.

Gamma Parameter
Gamma defines how far the influence of a single training example reaches. If gamma has a low value, each point has a far reach; if gamma has a high value, each point has a close reach. A high gamma makes the decision boundary pay close attention to the points that are close and ignore those that are far, which can mean a very wiggly decision boundary.

With a high gamma, a point close to the frontier can carry a lot of weight and pull the frontier close to itself. With a low gamma, more points have influence on the frontier, so the frontier ends up smoother.
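The reach of a training point can be made concrete with the RBF kernel itself, K(x, x') = exp(-gamma * distance^2). This is a minimal sketch in plain Python, not a full SVM. The gamma values 0.1 and 10 and the distances are picked only for illustration.

```python
import math


def rbf_kernel(distance, gamma):
    """RBF kernel value for two points that are `distance` apart."""
    return math.exp(-gamma * distance ** 2)


distances = [0.5, 1.0, 2.0]

# Low gamma: influence fades slowly, so far-away points still matter
low_gamma_influence = [rbf_kernel(d, gamma=0.1) for d in distances]

# High gamma: influence drops to almost nothing a short distance away
high_gamma_influence = [rbf_kernel(d, gamma=10.0) for d in distances]

for d, low, high in zip(distances, low_gamma_influence, high_gamma_influence):
    print(f"distance {d}: low gamma {low:.3f}, high gamma {high:.2e}")
```

At a distance of 2.0 the low-gamma point still has noticeable influence, while the high-gamma point has essentially none. That is why a high gamma lets nearby points bend the boundary on their own.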