Saturday, August 14, 2010

Retrieving similar or related facts in the knowledge base

Searching for information inside a database is a very common task in computer science. Very efficient algorithms have been developed for a fast searching such as based on tree structures, on hash tables, binary method. Many softwares are designed to manage large database very efficiently (dBase, FoxPro, Access, MySql...), of course, they use fast searching algorithms mentioned above.

Unfortunatly and very logically [1] fast search algorithms are based on a strict matching, so they are not applicable to find the closest match. Approximate search methods require to compare each elements in the database with the searched element and to retain the element with the smallest difference. The comparison is performed by a function that measures the similarity or the difference between the elements compared [2]. This process is expensive [3]. The current processors have one or hundred of cores [4], so the comparison is repeated as many time that there are elements to compare in database divided by the number of cores.

To speed the fuzzy matching, the simple idea could be a giant memory-coupled processors array (MCPA) strutured like a neural network. Each element of the database is stored in one cell of the MCPA. The element to search is distributed over all the MCPA. The processor in each cell compares its element stored with the element to seach and returns a value indicating the difference. The element can be a string (corresponding to one word or one sequence of words or one sentence), or a matrix of dots (small portion of an image), or a limited time series (small portion of sound). The MCPA will contain the whole text, the whole image, the whole speech or song. The advantage is just a few cycles will be required to search the element and to find its closest match.

An example:

If we are able to design a MCPA of 1 million cells of 64 bits and with a clock at 500 Mhz, then it will be large enough to store the whole bible ( about 780,000 words ) where each cell will contain one word represented by a 64 bits value [5]. It will be possible to found any misspelled word in just a few nanoseconds, I will say 10 ns.

Unfortunately, there are several problems, the first one is how to load quickly 1 million cells? The memory bandwidth is the main bottle neck; if the databus has 64 lines, it will take 1 million cycles to load the whole chip. 1 million cycles at 500 MHz represent 2 ms, so the loading 200,000 times longer than the searching! If we use the both edges of the clock signal and use a databus of 128 lines, it will take 50,000 more times to load the data than to compare them. The library of Congress has 29,000,000 volumes. How long time that will take to load these books? Perhaps, something between 4 to 8 hours! The solution is to use thousands of MCPA chips and to fill up each chip with several books.

The second problem is the complexity of the chip in term of number of transistors. If each cell is a tiny 64 bits risc processor requiring 70,000 transistors (estimated) plus a 10,000 transistors for the tiny EEPROM and SRAM memories, the whole chip will contain at least 80 billions of transistors. It is much more than the current processors (Itanium 9300 contains 2 billion transistors). 80 billions of transistors in a chip is considerable but should be possible in a close future, probably around 2020.

The third problem is the price. This kind of chip will cost thousands of dollars the first years. Is it acceptable to spend so much just for a fast fuzzy pattern or string matching? In fact, there is probably no other possibility; with the current silicon technology, it's difficult to increase the speed of the processor fare above 3 GHz and so we have only one solution: create a giant array of processors in one chip. Because the transfert speed of data between the memory and a large multicores processor is a limitation, the solution is to integrate the memory and the processor in a cell memory-coupled processor.

The complexity of each cell memory-coupled processor is due to the separation of the storage of the information (the memory) and the processing of the information (the cpu). This structure requires many lines of communication between the cpu and the memory and some extra logic to manage the flux. The ideal solution will be to intregrate the memory and the logic in a single component. The crossbar latch [6] constitued of memristors [7] seems an interesting solution to combine the storage and the processing of the information.

[1]: Fast searching algorithm tries to minimize the number of data compared and ideally to none; just find the exact match in the first shoot, it's what the perfect hash table is expected to allow. In fact, it's exactly the opposite of a fuzzy search algorithm.
[2]: Edit distance of 2 strings such as Levenshtein distance, Hausdorff distance (smallest euclidian distance between points).
[3]: expensive in number of cycles, in energy and time. The algorithms that compute the distances require often O(m.n) cycles.
[4]: Tile Gx Processor ( TILEPro64 ) contains up to 100 identical 64 bits cores connected in a square matrix. The graphic processor GeForce GT200 can run 30,720 threads. The Fermi architecture http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf is fare more flexible than the G80 / GT200 architecture. An application for the classifier TexLexAn will discussed in a next post.
[5]: 64 bits (~ 1.84x10^19) is large enough to prevent the risk of collision (2 different words that a hash function will convert will the same value).
[6]:http://www.hpl.hp.com/news/2005/jan-mar/crossbar.html
[7]:http://www.hp.com/hpinfo/newsroom/press/2010/100408xa.html