See the code above. The check if (intent.resolveActivity(getPackageManager()) == null) is the exception handling: if no activity can resolve the intent, we skip launching it.

Android Intent Exception Handling

Run an intent only if it is found in the system and can be resolved.

Android Intent is useful when you want to launch existing Android functionality such as the camera, sending an email, and more.

Specially formatted data can be written as a Uniform Resource Identifier (URI): data structured in a specific way so that it can be processed by the receiving end. For example, tel:415-xxx-xxxx (where each x is a digit between 0-9) is treated as a phone number on iPhones. When the user taps the URI, a phone call is made. Similarly on Android, tapping a URI-linked email address will open the Gmail app.

Commonly Used Android Methods and Attributes

findViewById returns a View

TextView.setText: change the text of a TextView

ImageView.setImageResource: change the displayed image by passing a drawable resource ID

onClick attribute: an XML attribute that attaches an event listener to the Android view

Common Android Errors

Android incompatible type error: usually fixed by casting or converting, e.g. when trying to store the result of an int-returning function in a String variable

Android Developer Tools

You can also click the view source link next to each class name to access the source code on Android's googlesource repository:
https://android.googlesource.com/platform/frameworks/base/+/refs/heads/master/core/java/android/widget/TextView.java
https://android.googlesource.com/platform/frameworks/base/+/refs/heads/master/core/java/android/widget/ImageView.java

TF-IDF models how important keywords are within a document, and also in the context of a collection of documents and texts known as a corpus. TF-IDF is a key algorithm used in information retrieval. The importance factor is proportional to the frequency of the keyword's appearance in the document (which can be normalized by the length of the document), and then comes the inverse part: it is offset by how frequently the word appears in other documents in the corpus. This way, words that naturally appear more frequently across the collection, such as "economics" and "economy" in a collection of Economist magazine articles, are discounted.

Note these may be frequently appearing words that are not stop words such as "the", "a", "and", "however". Best-practice preprocessing of text data may already include removing stop words and lowercasing text using .lower().

A bit of a tangent: sometimes lowercasing removes nuances in the meaning of words, such as Anthropology. Capitalized, Anthropology could mean the college subject or the brand, but lowercase anthropology could just mean the study of human societies, cultures, and development. Even if we use stemming, we may never know that the author of a social media post is actually referring to a proper noun and a brand, such as Anthropology or Fossil.

tf-idf is a popular term weighting scheme. Think Google Search, SEO, ranking of search results, NYTimes article text summarization. One can definitely develop fancier algorithms on top of this elegant and powerful concept.

Term frequency (TF): it's intuitive. The more often a word appears in a document, the more likely it's part of the document's main topic. Caveat 1: keyword spamming. Caveat 2: what if document_1 is much longer than document_2? You can normalize the term frequency by document length.
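The length-normalization caveat can be sketched in a few lines of plain Python (a hypothetical helper, not from any library):

```python
from collections import Counter

def term_frequency(doc, normalize=True):
    """Count each word; optionally normalize by document length."""
    words = doc.lower().split()
    counts = Counter(words)
    if normalize:
        return {w: c / len(words) for w, c in counts.items()}
    return dict(counts)

short_doc = "cats like cats"
long_doc = "cats like cats " + "dogs " * 97  # 100 words in total

# Raw counts are identical for "cats" in both documents (2 each),
# but normalized TF shows "cats" matters far more in the short one.
print(term_frequency(short_doc)["cats"])  # 0.666...
print(term_frequency(long_doc)["cats"])   # 0.02
```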

Inverse document frequency (IDF): stop words like "the", "and", and "a" appear very frequently in English texts, so regardless of whether they are useful in determining the actual meaning of the document, they will score high in term frequency. Remember our Economist magazine? The word "Economist" may appear in the margin of every page spread. It doesn't help us distinguish article_1 from article_2. We may have to discount it.

How to calculate TF-IDF by hand?

See the worked example on Wikipedia: https://en.wikipedia.org/wiki/Tf%E2%80%93idf

Note the very interesting case where "the" appears in every document, so inverse document frequency = log(number of docs in the corpus divided by number of docs containing the word "the") = log(2/2) = log(1) = 0! So this stop word does not matter at all in our text analysis task.
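The hand calculation can be reproduced in plain Python; the two-document corpus below is a made-up example chosen so that "the" appears in every document, giving log(2/2) = 0:

```python
import math

corpus = [
    "the sky is blue",
    "the sun is bright",
]

def tf(term, doc):
    """Term frequency, normalized by document length."""
    words = doc.split()
    return words.count(term) / len(words)

def idf(term, corpus):
    """log(number of docs / number of docs containing the term)."""
    n_containing = sum(1 for doc in corpus if term in doc.split())
    return math.log(len(corpus) / n_containing)

def tf_idf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)

print(idf("the", corpus))                 # log(2/2) = 0.0
print(tf_idf("sky", corpus[0], corpus))   # (1/4) * log(2/1) ~ 0.173
```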

Natural Language Processing (NLP) in general and with sklearn:

Tokenization: breaking sentences into words, often followed by taking the count of the words. See sklearn's CountVectorizer().
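As a rough sketch of what CountVectorizer does under the hood (a simplified stand-in written in plain Python, not sklearn's actual implementation):

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and extract words of 2+ characters, similar to
    CountVectorizer's default token pattern."""
    return re.findall(r"(?u)\b\w\w+\b", text.lower())

def count_vectorize(docs):
    """Return (sorted vocabulary, per-document count rows)."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in tokenize(doc):
            row[index[tok]] += 1
        matrix.append(row)
    return vocab, matrix

vocab, counts = count_vectorize(["The cat sat.", "The cat sat on the cat."])
print(vocab)   # ['cat', 'on', 'sat', 'the']
print(counts)  # [[1, 0, 1, 1], [2, 1, 1, 2]]
```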

Here's a nice tutorial series on how to tokenize, stem, and remove stop words using NLTK, a popular Python natural language processing library.
https://www2.cs.duke.edu/courses/spring14/compsci290/assignments/lab02.html

It also shows how to marry tokenization and stemming with sklearn's term frequency-inverse document frequency TfidfVectorizer.

If you read about image processing and machine learning using Convolutional Neural Networks, you have probably heard of Max Pooling. This blog has amazing visualizations, see link below. Their visualization for Max Pooling is really helpful and easy to understand.
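Max Pooling itself is simple enough to sketch in plain Python: slide a window over the feature map and keep only the largest value in each window. Here's a hypothetical 2x2, stride-2 max pool over a small made-up feature map:

```python
def max_pool_2x2(matrix):
    """2x2 max pooling with stride 2: keep the largest value
    in each non-overlapping 2x2 window."""
    pooled = []
    for i in range(0, len(matrix), 2):
        row = []
        for j in range(0, len(matrix[0]), 2):
            window = [matrix[i][j], matrix[i][j + 1],
                      matrix[i + 1][j], matrix[i + 1][j + 1]]
            row.append(max(window))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]
print(max_pool_2x2(feature_map))  # [[6, 5], [7, 9]]
```

Note how the 4x4 map shrinks to 2x2 while the strongest activation in each region survives.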

You can try to decode features. It is important to guess the feature types to stay on the right track for feature processing and for selecting the right model. Processing anonymized data is a true, advanced test of any data scientist's modeling skills. Intuition and domain knowledge may not apply well in this situation. It's a genuinely fun challenge.

A great explanation from YouTube: an explanation of Heaps and Priority Queues for beginners, coders, learners, and bootcamp graduates. A heap is semi-ordered: each parent comes before its direct children and grandchildren, but siblings are unordered among themselves.
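Python's standard-library heapq module gives you exactly this semi-ordered structure; a quick sketch of the heap property on a list of made-up numbers:

```python
import heapq

nums = [9, 4, 7, 1, 5, 3]
heapq.heapify(nums)  # rearrange in place into a min-heap

# Semi-ordered: nums[0] is the smallest, and each parent at index i
# is <= its children at 2*i+1 and 2*i+2, but siblings are unordered.
n = len(nums)
assert all(nums[i] <= nums[2 * i + 1] for i in range(n) if 2 * i + 1 < n)
assert all(nums[i] <= nums[2 * i + 2] for i in range(n) if 2 * i + 2 < n)

print(heapq.heappop(nums))  # 1, the minimum
print(heapq.heappop(nums))  # 3, the next smallest
```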

Sunday, March 18, 2018

Did you know that Elon Musk is involved with OpenAI? In fact, OpenAI is managed by Silicon Valley titans such as Elon Musk, Sam Altman and Jessica Livingston of Y Combinator, Reid Hoffman, founder of LinkedIn, Peter Thiel, and Greg Brockman, previously CTO of Stripe! OpenAI claims to do AI research and publish at key AI industry conferences. It also boasts sponsors such as Infosys, Microsoft, AWS, Asana, Atlassian, Buffer, Cloudflare, GitHub, Cisco, and Nvidia, many of which are Y Combinator prodigies. Here's its press page explaining its long-term and short-term goals: https://openai.com/press/

Entropy: how spread out the distribution is, i.e., how many bits it takes to transmit information about it. A uniform random distribution (see our 1-minute post on the uniform distribution) has a lot of entropy; a concentrated, predictable distribution has very little. E.g., if I tell you the price starts at 0 and increases by 5% every day, that's very few summary stats describing a big distribution. Versus a truly random series, which I can't describe compactly.
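The contrast can be made concrete with Shannon entropy in plain Python (the example distributions are made up):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [0.25] * 4   # four equally likely outcomes: maximally spread out
certain = [1.0]        # one certain outcome: fully predictable

print(entropy(uniform))  # 2.0 bits -- need 2 bits per outcome to transmit
print(entropy(certain))  # 0.0 bits -- nothing to transmit at all
```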