I have a question about using the spatial pooler for words.
I want to get SDRs of words, but the input size differs from word to word.
For example, the EMNIST dataset has many samples of the letters A, B, C…, each of size 28x28.
When I put them together to make a word, each word ends up with a different size,
but the spatial pooler needs the same input size every time.
How can I get the SDR of a word from the spatial pooler?
Hoping for your help.
Thank you

The closest comparison with an artificial neural network would be that the SDR matrix
is like the output side of a deep neural network. But SDRs do not use backpropagation
to set output bits high.

Deep learning programmers pick an output pixel for a given letter and chase it all the way back to the front, knocking up the weights that under-vote and knocking down the weights that vote all
the time.

Most of the work here is for temporal audio detection and similar data structures.
The closest machine learning NN algorithm is the audio-detecting LSTM neural
network.

The machine learning community is still in the Stone Age when it comes to unsupervised learning.
K-means is where they are at.

A real SDR has sub-feature activations, such as edges, fingers, and hands, all the way up
to bigger patterns and temporal patterns that form complete loops.

Deep networks skip all the sub-features during training. The whole NN is trained for a given
letter, from front to back and then back to front.
They are not trained for sub-features first, such as lines, corners, and curves of various types.
And then they do not use those sub-features to train for bigger features.
I like both ways; each catches what the other missed, in an unsupervised manner.
Hinton said it was too much of a fuss.
But then look at capsule networks?

Since deep neural networks are trained all at once, there is no guarantee that the sub-features
will localize in the first layers. A self-learned eye detector could be spread out over
all layers. A troubleshooting nightmare.

Hi.
In the spatial pooler paper, Numenta used MNIST to make SDRs and then recognize digits, so I thought that I could use EMNIST to make SDRs.
As an example, if we have a list of numbers and try to recognize that list, I think this also belongs to the temporal side.
If I am wrong, can you correct me?

I like a pinpoint focus that scans over, or chases, edges. This will cause a data
stream that depends on how fast the edge is scanned and on the decision of
which edge to scan when an edge line branches into two or more edge lines. It could
cause a memory overload somewhere.

Sounds like @Rodi’s problem was related to recognizing that there is a C, A, and T (versus how to combine them). This is because every word would have a different physical size (for example an image with the word “CAT” has a different width than one for the word “TACO”).

One naive way you could potentially solve this is to give each possible letter position a dedicated number of minicolumns. Each of these “slots” would need to be trained to recognize each possible letter. The benefit is that every word would get a resulting SDR from the SP process.
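
To make the slot idea concrete, here is a minimal sketch. It assumes each letter already arrives as its own 28x28 binary image and that unused slots are simply left blank. `MAX_SLOTS` and `word_to_fixed_input` are made-up names for illustration, and the letters are random stand-ins rather than real EMNIST samples.

```python
import numpy as np

LETTER_SHAPE = (28, 28)   # EMNIST-style letter image
MAX_SLOTS = 8             # hypothetical cap on word length for this sketch

def word_to_fixed_input(letter_images, max_slots=MAX_SLOTS):
    """Place each letter image in its own slot and leave the rest blank."""
    if len(letter_images) > max_slots:
        raise ValueError("word is longer than the number of slots")
    slots = np.zeros((max_slots,) + LETTER_SHAPE, dtype=np.uint8)
    for i, img in enumerate(letter_images):
        slots[i] = img
    # Flatten so every word yields an input vector of the same length.
    return slots.reshape(-1)

rng = np.random.default_rng(0)

def fake_letter():
    # Random binary image standing in for a real letter sample.
    return (rng.random(LETTER_SHAPE) > 0.8).astype(np.uint8)

cat = word_to_fixed_input([fake_letter() for _ in range(3)])
taco = word_to_fixed_input([fake_letter() for _ in range(4)])
print(cat.shape, taco.shape)
```

Both words come out as vectors of length `MAX_SLOTS * 28 * 28`, so a single spatial pooler could take either one as input.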

There are a couple of drawbacks. One is that it assumes the letters are all roughly the same width. This of course limits the number of fonts that could be learned. Another problem is that it wouldn’t be the most efficient solution. Some of the minicolumns would be used much more frequently than others. If you needed to support every word in the English language, you’d need 45 slots. The 45th slot would only be used to recognize the letter “s” in a single word (pneumonoultramicroscopicsilicovolcanoconiosis).

I would solve this problem a bit differently. I think implementing a simplified form of saccades would be a better approach. @sebjwallace suggested an approach that could be adapted to work here (see this post). The lower-resolution views could be used to determine where the breaks between letters are located, then higher-resolution views could be cropped before sending to the spatial pooler. Then you’d need to pool all the letters together into one representation. One possible way to do that might be the variation of the temporal memory algorithm suggested for pooling variations of a face, described in this paper.
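
Here is a rough sketch of just the "find the breaks on a low-resolution view, then crop at full resolution" step. It is not a saccade model: I am swapping in a plain column projection on a downsampled image, and it assumes a binary word image with at least a couple of blank columns between letters. The function names and the `factor` parameter are hypothetical.

```python
import numpy as np

def find_letter_spans(word_img, factor=2):
    """Return (start, end) column ranges of letters, found on a low-res view."""
    h, w = word_img.shape
    w_trim = w - (w % factor)
    # Low-resolution view: a coarse column counts as "ink" if any pixel in it is set.
    coarse = word_img[:, :w_trim].reshape(h, w_trim // factor, factor).max(axis=(0, 2))
    spans, start = [], None
    for x, ink in enumerate(coarse):
        if ink and start is None:
            start = x                                    # a letter begins
        elif not ink and start is not None:
            spans.append((start * factor, x * factor))   # a letter ends at a gap
            start = None
    if start is not None:
        spans.append((start * factor, w_trim))
    return spans

def crop_letters(word_img, spans):
    """Cut the full-resolution image at the spans found on the coarse view."""
    return [word_img[:, a:b] for a, b in spans]

# Synthetic example: two filled rectangles standing in for letters.
img = np.zeros((28, 40), dtype=np.uint8)
img[5:20, 4:12] = 1
img[5:20, 20:30] = 1
spans = find_letter_spans(img)
letters = crop_letters(img, spans)
print(spans, [l.shape for l in letters])
```

The crops still have different widths, so each one would need to be resized or padded to 28x28 before going to the spatial pooler.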

Hi.
As I see it, if I see C, it means there is an SDR of it in my brain (taken from the SP).
And if I see CAT, it means my eyes collect the SDRs of C, A, and T and then connect them together.
For example, to recognize CAN_, CANO, and CANDY, I think the space is important. It is invisible to physical vision, but I think the brain still encodes this space into an SDR, so I can simply put them together. This is just my own thinking, so I don't know whether it is right or wrong. Please correct me if I am wrong.
Thanks so much.

Hi.
If I don't think about the meaning of the word and only think about chunking letters into a word (spatial), can I put the SDRs of the letters together and then go through a second SP to get the SDR of the word?
I mean that in this case I use a stack of 2 SPs to get the SDR of the word.
What do you think?

Will your word consist of multiple images (one per letter) or one image with all letters?

Also, would you want the SDR for “CAT” to be the same as the one for “TAC”? If not, your setup will need a temporal memory step (or alternately an SMI “input layer” with an allocentric location signal for each letter).

Hi.
First, I get the SDR of each letter (one image) from the SP. Then I put the SDRs together according to location to make a word, and then I run the SP again to get the SDR of the word. So I think the SDR of CAT is different from the SDR of TAC.
Does that make sense?

I’m not sure I follow what you mean by this step. Do you take the minicolumns from the SP of each letter, choose cells in those minicolumns using a location SDR, and then send a union of the resulting letter/location SDRs to a second SP? Seems like that would work.

Rodi:

So I think the SDR of CAT is different from the SDR of TAC

Yes it would be if you did something like above, using the letter location.

Yes.
For example, to make the SDR of CAT: my data is EMNIST, which is like MNIST but adds letters.
The size of each letter is 28x28.
Step 1:
I make the SDR of each letter with the first SP, with an input size of 28x28 and an output SDR of size 16x16.
Step 2:
I put the SDRs of C, A, and T together, taking location into account. The size here is 3x16x16.
Step 3:
Taking the result of step 2 as input, I go to the second SP to get the SDR of the word. The size of the word SDR is 16x16 too.
In step 3, I see that 2% of the minicolumns are active in both the input and the output, no different.
From this, I get an SDR of CAT that is different from the SDR of TAC, because I take location into account.
Can you correct it?
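
The three steps above can be sketched roughly as follows. This is only a toy: `TinySP` is a made-up stand-in with fixed random connections and k-winners selection, with no learning or boosting, so it is not Numenta's spatial pooler. The letter images are random binary vectors, not real EMNIST samples.

```python
import numpy as np

class TinySP:
    """Toy stand-in for a spatial pooler: random connections plus k-winners."""

    def __init__(self, n_inputs, n_columns=256, sparsity=0.02, seed=0):
        rng = np.random.default_rng(seed)
        # Each column is connected to a random ~50% of the input bits.
        self.connected = (rng.random((n_columns, n_inputs)) < 0.5).astype(np.int32)
        self.k = max(1, int(round(n_columns * sparsity)))  # 5 of 256, about 2%

    def compute(self, x):
        overlap = self.connected @ x.astype(np.int32)
        # Keep the k columns with the highest overlap (ties broken by index).
        winners = np.argsort(overlap, kind="stable")[-self.k:]
        sdr = np.zeros(self.connected.shape[0], dtype=np.uint8)
        sdr[winners] = 1
        return sdr

rng = np.random.default_rng(42)
# Random stand-ins for flattened 28x28 letter images.
letters = {ch: (rng.random(28 * 28) < 0.2).astype(np.uint8) for ch in "CAT"}

letter_sp = TinySP(n_inputs=28 * 28, seed=1)   # step 1: 28x28 -> 16x16 (256 columns)
word_sp = TinySP(n_inputs=3 * 256, seed=2)     # step 3: 3x16x16 -> 16x16

def word_sdr(word):
    # Step 2: concatenate letter SDRs in reading order, so position matters.
    stacked = np.concatenate([letter_sp.compute(letters[ch]) for ch in word])
    return word_sp.compute(stacked)

cat, tac = word_sdr("CAT"), word_sdr("TAC")
print("active fraction:", cat.sum() / cat.size)
print("CAT/TAC shared active columns:", int((cat & tac).sum()))
```

Because the letter SDRs are concatenated by position, swapping C and T changes which input bits are active, which is why the two word SDRs can differ even though they contain the same letters.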

By location SDR, I meant a matrix of cells which encode letter position in a word. This could be a collection of grid cell modules, or for this case just the output of a simple scalar encoder. It would provide distal input to the layer which has minicolumn activations from SP of individual letters.
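
For the simple scalar encoder case, a minimal sketch might look like this. The function name and the parameter values (`n_positions`, `n_bits`, `width`) are arbitrary choices for illustration, and it only produces the position code; wiring it in as distal input is not shown.

```python
import numpy as np

def encode_position(pos, n_positions=8, n_bits=64, width=9):
    """Simple scalar encoder: a contiguous block of `width` active bits
    whose start slides linearly with the letter position."""
    if not 0 <= pos < n_positions:
        raise ValueError("position out of range")
    start = round(pos * (n_bits - width) / (n_positions - 1))
    sdr = np.zeros(n_bits, dtype=np.uint8)
    sdr[start:start + width] = 1
    return sdr

first = encode_position(0)
second = encode_position(1)
last = encode_position(7)
print(int((first & second).sum()), int((first & last).sum()))
```

With these values, neighboring positions share a bit or two and distant positions share none. Whether you want neighboring positions to overlap at all is a design choice; a narrower `width` would make every position code distinct.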

Oh I see that makes total sense, context of the letter within the word. It seems that would really help with doing stuff like next-letter or word prediction.

It seems that this approach of getting word SDRs by concatenating letter SDRs will yield SDRs whose semantic overlap is totally based on shared letter sub-sequences and not meaning in the language. For example, the words ‘contentious’ and ‘pretentious’ overlap in letter sequences but not really in spoken meaning.
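
A quick way to see this, using one-hot letter codes per position as a crude stand-in for real letter SDRs (an assumption for illustration, not how the SP output would actually look):

```python
import string
import numpy as np

def letters_to_bits(word, max_len=12):
    """Concatenate one-hot letter codes, one 26-bit block per letter position."""
    bits = np.zeros((max_len, 26), dtype=np.uint8)
    for i, ch in enumerate(word.lower()):
        bits[i, string.ascii_lowercase.index(ch)] = 1
    return bits.reshape(-1)

def overlap(a, b):
    """Number of active bits two codes share."""
    return int((a & b).sum())

contentious = letters_to_bits("contentious")
pretentious = letters_to_bits("pretentious")
quarrelsome = letters_to_bits("quarrelsome")

print(overlap(contentious, pretentious))   # shares the "tentious" positions
print(overlap(contentious, quarrelsome))   # close in meaning, few shared letters
```

In this toy encoding the overlap is just the count of positions holding the same letter, so ‘contentious’ lands much closer to ‘pretentious’ than to the near-synonym ‘quarrelsome’.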