GSoC'17: WEEK -2 Updates [7th June — 14th June]


Two weeks into the coding phase, I had the task of improving the search algorithm being used for OpenMRS AddOns.

How did I go about it?

My first step was to create a draft of the required changes.

I then read up on Elasticsearch in depth, because just a little knowledge of it wasn’t going to help me create an optimised solution. I took a few days to study how Elasticsearch works, focusing especially on the functions I was concerned with, until I understood how they behave. This went a long way in helping me out. In the meantime, I had also asked the community for feedback on possible improvements to this feature.

The required changes were:

The following points describe each scenario and the action taken in that scenario:

If the query matches the uid exactly, that search result is given the highest weight!

A query that matches the title of a module exactly is given the highest weight.

A query that matches a tag exactly is given an equally high weight.

Ex: Query=“Form-Entry” and tag=“Form-Entry”, then that module gets the top rank.

A query that is a substring of the title is given a medium weight.

Ex: “ref” is a substring of “reference application” (the current algorithm is not implemented like this, and hence such a module gets pushed down).

A query that matches the title with fuzziness=1 (which allows one spelling mistake) is given a low weight.

A query that matches the description as a substring is given a low weight.

The reason is that many modules might contain the query in their description, but only one will have it in its name, and that module is given the highest weight. Example: the term “Reference Application” appears in the description of most ref app modules, but it matches exactly only with the Reference Application module. Moreover, when query=“ref”, modules with “ref” in their title should rank higher than the ones with the term only in their description.

A query that matches the description using fuzziness is given a very low weight.

Apart from the above:

Modules that are deprecated or inactive are given the least weight.
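To make the tiers above more concrete, here is a rough sketch of how they could be expressed as a single Elasticsearch query body, built as a plain Python dict so it needs no client library. This is not the actual AddOns query: the field names (uid, name, name.raw, tags, description, status), the status values and every boost number are assumptions chosen only to illustrate the relative ordering. The substring rules are approximated with match_phrase_prefix, which matches a word prefix rather than an arbitrary substring.

```python
import json

# Illustrative boosts only: the list above gives relative tiers, not numbers.
BOOST = {
    "exact": 10.0,        # uid, title or tag equals the query
    "title_prefix": 4.0,  # query starts a word in the title (e.g. "ref")
    "title_fuzzy": 1.0,   # title match allowing one edit
    "desc_prefix": 1.0,   # query starts a word in the description
    "desc_fuzzy": 0.5,    # description match allowing one edit
}


def build_search_body(query: str) -> dict:
    """Return a query body whose clauses mirror the weighting tiers."""
    should = [
        # Exact matches; assumes uid, name.raw and tags are keyword fields.
        {"term": {"uid": {"value": query, "boost": BOOST["exact"]}}},
        {"term": {"name.raw": {"value": query, "boost": BOOST["exact"]}}},
        {"term": {"tags": {"value": query, "boost": BOOST["exact"]}}},
        # Medium weight: prefix match on the analysed title.
        {"match_phrase_prefix": {
            "name": {"query": query, "boost": BOOST["title_prefix"]}}},
        # Low weight: title with fuzziness=1 (one spelling mistake).
        {"match": {
            "name": {"query": query, "fuzziness": 1,
                     "boost": BOOST["title_fuzzy"]}}},
        # Low weight: prefix match on the description.
        {"match_phrase_prefix": {
            "description": {"query": query, "boost": BOOST["desc_prefix"]}}},
        # Very low weight: description with fuzziness.
        {"match": {
            "description": {"query": query, "fuzziness": 1,
                            "boost": BOOST["desc_fuzzy"]}}},
    ]
    return {
        "query": {
            "boosting": {
                "positive": {
                    "bool": {"should": should, "minimum_should_match": 1}
                },
                # Demote (not exclude) deprecated or inactive modules.
                # The "status" field and its values are hypothetical.
                "negative": {"terms": {"status": ["DEPRECATED", "INACTIVE"]}},
                "negative_boost": 0.1,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("ref"), indent=2))
```

The boosting query keeps deprecated modules in the results but multiplies their score by negative_boost, which is one possible reading of “least weight”.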

So, we have been working on this for a while now. The main issue was ensuring that all the features work well together.

For example: if we give a high rank to a name match, then sometimes certain tag matches would be mistaken for a name match, and hence the module with those tags would not show up first.

The solution? Trial and error! That, together with a complete understanding of what each function does, is the only solution. The same is also mentioned in the Elasticsearch documentation!
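One way to make this kind of trial and error repeatable is to write down the rankings we expect and re-check them after every change to the weights. The sketch below does this with a tiny in-memory scorer instead of a real Elasticsearch index. The sample modules, the weights and the helper names (Module, score, rank, failed_expectations) are all made up for illustration, and fuzzy matching is left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class Module:
    uid: str
    name: str
    description: str = ""
    tags: tuple = ()
    deprecated: bool = False


# Hypothetical weights: these are the numbers you would tune by trial and error.
WEIGHTS = {"uid": 20, "name_exact": 20, "tag_exact": 20,
           "name_sub": 8, "desc_sub": 2}
DEPRECATED_FACTOR = 0.1  # deprecated modules keep only 10% of their score


def score(module, query, weights=WEIGHTS):
    """Add up the weight of every rule the query satisfies for this module."""
    q = query.lower()
    total = 0.0
    if q == module.uid.lower():
        total += weights["uid"]
    if q == module.name.lower():
        total += weights["name_exact"]
    if q in (tag.lower() for tag in module.tags):
        total += weights["tag_exact"]
    if q in module.name.lower():
        total += weights["name_sub"]
    if q in module.description.lower():
        total += weights["desc_sub"]
    if module.deprecated:
        total *= DEPRECATED_FACTOR
    return total


def rank(modules, query, weights=WEIGHTS):
    """Return the names of matching modules, best score first."""
    scored = [(score(m, query, weights), m.name) for m in modules]
    ordered = sorted(scored, key=lambda pair: -pair[0])
    return [name for s, name in ordered if s > 0]


# Made-up sample data, loosely based on the examples in this post.
MODULES = [
    Module("refapp", "Reference Application",
           "Bundles the reference application modules"),
    Module("htmlformentry", "HTML Form Entry",
           "Build forms; used by the Reference Application",
           tags=("Form-Entry",)),
    Module("legacyforms", "Form-Entry", "Old form tooling", deprecated=True),
]

# Query -> module we expect at the top of the results.
EXPECTED_TOP = {
    "ref": "Reference Application",
    "reference application": "Reference Application",
    "Form-Entry": "HTML Form Entry",  # active tag match beats deprecated name
}


def failed_expectations(modules=MODULES, weights=WEIGHTS):
    """List every query whose top result is not the expected module."""
    failures = []
    for query, expected in EXPECTED_TOP.items():
        results = rank(modules, query, weights)
        if not results or results[0] != expected:
            failures.append((query, expected, results))
    return failures


if __name__ == "__main__":
    print(failed_expectations() or "all ranking expectations hold")
```

Each time a weight changes, rerunning failed_expectations shows right away whether one rule has started to override another.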

So, after a few trials, we have come up with a proper algorithm which does the job.