Hello, I'm working on measuring the relatedness of words. The problem is the following: given two groups of words, g1 and g2, I need a confidence score that says whether or not there is a relationship between them. Does anyone know a way to do that, or where I can start? I was thinking of word2vec.

@cuent I'm very new to the topic. I think you need to encode the relationship first. Let's say you have set up a relationship matrix, as in social network analysis. Then you can calculate the co-occurrence of the words. Another way might be to use existing corpora and then estimate the likelihood of a relationship. I believe Google n-grams can do that for you. Confidence intervals are only there to guide your intuition; they are not set in stone. Hope it helps a bit. 😬
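A minimal sketch of the co-occurrence idea, assuming a small in-memory corpus where each document is a plain string. The function name `group_pmi`, the toy documents, and the choice of pointwise mutual information (PMI) as the score are illustrative assumptions, not something proposed in the chat. Pairs that never co-occur are skipped, which is a simplification that biases the average upward.

```python
import math
from collections import Counter
from itertools import product


def group_pmi(docs, g1, g2):
    """Average document-level PMI over all pairs (w1, w2), w1 in g1, w2 in g2.

    Returns -inf if no pair ever appears together in a document.
    """
    n_docs = len(docs)
    doc_sets = [set(d.lower().split()) for d in docs]
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(w for s in doc_sets for w in s)

    scores = []
    for w1, w2 in product(g1, g2):
        joint = sum(1 for s in doc_sets if w1 in s and w2 in s)
        if joint == 0:
            continue  # PMI is undefined here; skipped (a simplification)
        p_joint = joint / n_docs
        p1 = doc_freq[w1] / n_docs
        p2 = doc_freq[w2] / n_docs
        scores.append(math.log2(p_joint / (p1 * p2)))
    return sum(scores) / len(scores) if scores else float("-inf")


# Hypothetical toy corpus, for illustration only.
docs = ["cat dog pet", "cat dog", "car road", "car engine road", "dog pet"]
related = group_pmi(docs, ["cat"], ["dog"])    # positive: they co-occur often
unrelated = group_pmi(docs, ["cat"], ["car"])  # -inf: they never co-occur
```

A positive score means the words appear together more often than chance would predict in this corpus; with a different corpus the score can change a lot.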

If I have a big-ass dataset that includes a categorical feature and I want to do some maths-type stuff on each of the categories, would it be best to count the number of times each category appears in that one feature, or to create new binary features, one for each category?
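The two options can be compared side by side. This is a minimal sketch using only the standard library; the `colors` column is a hypothetical example, and which form is "best" depends on what the later maths needs (per-category totals versus per-row inputs to a model).

```python
from collections import Counter

# Hypothetical categorical feature, one value per row.
colors = ["red", "blue", "red", "green", "blue", "red"]

# Option 1: count how often each category appears (one number per category).
counts = Counter(colors)

# Option 2: one new binary feature per category (one row of 0/1 per record).
categories = sorted(set(colors))
one_hot = [[int(value == cat) for cat in categories] for value in colors]
```

Counting collapses the rows into a summary, while the binary features keep one row per record, so the second form is the one that can be fed to a model alongside other features.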

People

Amazon and the one other main attribute of a DS: able to make "metric linkages"

I recently attended a talk by the DS manager of Amazon Berlin. It was very interesting: he mentioned a couple of key things that no one had mentioned before as key skills for a data scientist. He called one of them "metric linkage". He said that a data scientist should be able to create them. But what is that?

Here is an example, as a challenge to you: As you know, Amazon also sells perishable products. To satisfy quality standards, they have hired a group of QA employees whose main role is to verify the quality of those products before they are sent to buyers. However, the company is aiming to automate almost everything. One of the targets is to find a system that could also automate the QA of perishable products. The example was strawberries. Now:

If you were asked to implement a system to evaluate the quality of the fruit, what would you do?

HINTS: The QA activity is very manual and based on experience. It involves evaluating not only visual cues but also tactile and other organoleptic attributes of the fruit.

Considering current advances, what do you think is the single best simple attribute to measure, one that you can track reliably? Select only one.

After selecting the main attribute, how would you design a metric that also gives you a proxy for the other main sensory attributes of quality?

You will be using machine learning.

@koustuvsinha How are you!!!!???

@cuent word2vec is good but as far as I know you cannot apply any algebraic operation between the distances you are going to get.

@cuent I haven't actually worked on getting a value that represents the distance between two groups of words, but I guess GloVe might be a better option, as it is more statistically based? I can try to look into that for you.
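One common way to get a single value for two groups of words, whichever embedding is used, is to average the vectors in each group and take the cosine similarity of the two averages. The sketch below assumes the embeddings are already available as a plain dict from word to list of floats; the 3-dimensional `toy_vectors` are made up for illustration and do not come from word2vec or GloVe, and `group_similarity` is a hypothetical helper name.

```python
import math


def mean_vector(words, vectors):
    """Component-wise mean of the vectors of the words that are in the vocabulary."""
    found = [vectors[w] for w in words if w in vectors]
    if not found:
        raise ValueError("none of the words are in the vocabulary")
    dim = len(found[0])
    return [sum(v[i] for v in found) / len(found) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity in [-1, 1]; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_similarity(g1, g2, vectors):
    """Cosine similarity between the mean vectors of two word groups."""
    return cosine(mean_vector(g1, vectors), mean_vector(g2, vectors))


# Made-up vectors, for illustration only (not real embeddings).
toy_vectors = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.0, 0.8, 0.3],
}
within = group_similarity(["cat"], ["dog"], toy_vectors)
across = group_similarity(["cat", "dog"], ["car", "truck"], toy_vectors)
```

The result is a similarity score rather than a calibrated confidence, and it depends entirely on the corpus the vectors were trained on.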

@koustuvsinha no change since you left us here abandoned man!

:)

@koustuvsinha Shall I contact you tomorrow about LDA?

I did something already, just a bit, not much... But I don't think LDA is the best choice, as far as I remember... IMO... Tomorrow?

Again, I haven't had to measure such a distance myself yet, but from what I have been discussing with other people, I think there are those who would refuse to claim that such a measurement should be taken seriously. Why? The position of the groups will rely A LOT on your corpus, and words are actually NOMINAL variables, so in theory they lack ordering. In theory, you can always regroup them according to the meaning you want to give them.