Implementing Technology Search on RemoteBase

Since the launch, many users have requested a freely typed technology filter on RemoteBase. I recently built it.

RemoteBase used to have a fixed number of technology filters for
remote companies. But everyone uses all kinds of different technologies. To get a customized
list of companies that best fit our skills and tastes, we need more flexible filters.

With this notion in mind, I recently implemented a freely typed
technology search on RemoteBase. Let me share how I implemented the feature, and how it works under
the hood.

The original version

In the beginning, there was not much data to filter, so the filters were limited in number.
The screenshot below shows an early version with only 8 choices for technology filters.

An original version of RemoteBase

At that time, having a limited number of filters made sense, because adding new filters for all available data
could actually diminish the usefulness of the whole filtering functionality.

For instance, if only one or two companies in the database used .NET, adding a .NET filter would not have
made the filters more useful: combined with other filters, it could have narrowed the selection so much
that no results were returned to the user.

But as more companies were listed on RemoteBase, the data started to
mature, and it made sense to expand the filters.

First iteration

The first iteration

I came up with the above implementation after working for a day or two. The input had autocompletion built in,
so that the suggestions updated as the user typed the query.

For hours, I tried to reinvent the wheel by creating an autocomplete component from scratch.
But it turned out there were too many edge cases to account for. So I ended up forking a React autocompletion
component from GitHub and customizing it.

There were a couple of problems I had to tackle:

Normalizing the data

There was no relationship between companies and technologies in the database. Instead, each company's
technologies were hard-coded as an array of strings. Autocompletion is hard to implement efficiently
with this setup, because there is no single source of truth for all suggestions.

So I wrote a script to loop through all companies and establish a relationship between technologies and companies.
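As a rough illustration of this kind of normalization (not the original script), the sketch below uses plain in-memory dictionaries instead of a real database. The sample companies, the field names, and the `normalize` helper are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample data: each company embeds its technologies as raw strings.
companies = [
    {"id": 1, "name": "Acme", "technologies": ["Node.js", "React"]},
    {"id": 2, "name": "Globex", "technologies": ["Python", "React"]},
]

def normalize(companies):
    """Build a technology list plus a company id -> technology ids mapping."""
    technologies = {}                     # technology name -> technology record
    company_tech_ids = defaultdict(list)  # company id -> list of technology ids

    for company in companies:
        for name in company["technologies"]:
            # Create the technology record the first time a name is seen.
            if name not in technologies:
                technologies[name] = {"id": len(technologies) + 1, "name": name}
            company_tech_ids[company["id"]].append(technologies[name]["id"])

    return list(technologies.values()), dict(company_tech_ids)

technologies, company_tech_ids = normalize(companies)
```

After this step, the list of technologies can serve as the one place that suggestions are read from, while each company only stores references to it.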

Eliminating duplicate data

Since the technologies were denormalized at first, there were duplicates when I
normalized them. For instance, when a user typed nod, four different results would come up:
node.js, Node.js, NodeJS, and nodejs. But they all refer to the same technology.

So I wrote a script to loop through all companies and their technologies, find similar ones,
and replace the duplicates with a single spelling. While at it, I also applied the
same code to collaboration_methods and communication_methods, and migrated those fields too.
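One way such a pass could work is sketched below; it is an assumption about the approach, not the script that was actually run. The `canonical_key` and `build_replacements` helpers are hypothetical: spellings are grouped by a regex-stripped key, and every variant is mapped to the most frequent spelling in its group.

```python
import re
from collections import Counter, defaultdict

def canonical_key(name):
    """Lowercase and drop everything except letters, digits, '+' and '#'."""
    return re.sub(r"[^a-z0-9+#]", "", name.lower())

def build_replacements(names):
    """Map every spelling to the most common spelling sharing its key."""
    groups = defaultdict(Counter)
    for name in names:
        groups[canonical_key(name)][name] += 1

    replacements = {}
    for variants in groups.values():
        canonical = variants.most_common(1)[0][0]
        for variant in variants:
            replacements[variant] = canonical
    return replacements

# Hypothetical input: every technology string found across all companies.
names = ["node.js", "Node.js", "NodeJS", "nodejs", "Node.js", "React"]
replacements = build_replacements(names)
```

Keeping '+' and '#' in the key is a deliberate choice in this sketch, so that names such as C, C++, and C# do not collapse into one group.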

This script did not catch every possible duplicate, because it simply relied on regex to find similar items.
So I skimmed through the technologies, manually identified possible duplicates, and wrote another script to replace them.
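A manual cleanup like this can be as small as a hand-written alias map applied to each company. The sketch below is only an illustration: the alias entries, the sample company, and the `apply_aliases` helper are hypothetical.

```python
# Hypothetical alias map, filled in by hand after reviewing the technology list.
ALIASES = {
    "Postgres": "PostgreSQL",
    "JS": "JavaScript",
}

def apply_aliases(values, aliases):
    """Replace known aliases and drop repeats while preserving order."""
    seen = set()
    result = []
    for value in values:
        canonical = aliases.get(value, value)
        if canonical not in seen:
            seen.add(canonical)
            result.append(canonical)
    return result

company = {"name": "Acme", "technologies": ["JS", "JavaScript", "Postgres"]}
company["technologies"] = apply_aliases(company["technologies"], ALIASES)
```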

Now the technologies, communication_methods, and collaboration_methods were mostly duplicate-free, and
the free search was finally useful.

I loved using Python when writing these migration scripts. I found Node.js kind of hard to reason about with
this kind of task. I also tried to write Ruby, but working with MongoDB was kind of awkward in Ruby syntax.

Async fetching

In this first iteration, the autocomplete feature added some time to the initial page load because 100+ sources
were loaded along with the app. When a user typed a query, the autocompletion happened on the client side
using the preloaded sources.

It would have been better if there were an API endpoint that responded with suggestions, so that the app
did not have to load all the sources in the initial rendering.

Second iteration

Dealing with the challenges above, I shipped the second version of technology search.

The second iteration

This time, I made an API endpoint that responds with matching suggestions based on the user input. The
suggestions are loaded asynchronously as the user types a query. I did not measure how much loading time was
saved due to this improvement, but it kind of feels cleaner.

At the moment, the autosuggestion is based on regex matching. Here is the actual code I wrote:
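For readers who want a feel for the approach, here is a minimal sketch of regex-based suggestion matching. It is not the original code: the `suggest` function, the case-insensitive substring match, and the result limit are all assumptions made for illustration.

```python
import re

def suggest(query, technologies, limit=10):
    """Return technology names that contain the query, ignoring case."""
    query = query.strip()
    if not query:
        return []
    # re.escape keeps characters such as '.' or '+' from being read as regex syntax.
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    matches = [name for name in technologies if pattern.search(name)]
    return matches[:limit]

# Hypothetical list of normalized technology names.
technologies = ["Node.js", "React", "Django", "Redis", "C++"]
```

With this sample list, `suggest("nod", technologies)` would match Node.js only, and a query such as "c++" is treated literally rather than as a regex quantifier.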

Also, the autosuggestion input uses react-autosuggest by
@moroshko. It was very extensible out of the box. I think this library is an example of the
level of generalization/abstraction that open source libraries should live up to.

What’s next?

At the moment, users can only select a single technology to filter companies by. But would it be useful
if we could select multiple? I personally think that such a feature is overkill for now. But it may be useful in the future.

I could also improve the autosuggestion algorithm. Currently, the technology filter is not truly an autosuggestion,
because it uses simple regex matching. For instance, when a user types ‘node.js’, RemoteBase probably should
also suggest JavaScript. Maybe I should calculate the ‘relatedness’ of keywords and return the highest matches.
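One possible way to score relatedness, sketched here purely as an idea, is to look at how often two technologies appear at the same companies. The example below ranks technologies by the Jaccard similarity of the sets of companies that use them; the `related` function and the sample data are hypothetical.

```python
from collections import defaultdict

def related(technology, companies, limit=3):
    """Rank other technologies by Jaccard similarity of the companies using them."""
    users = defaultdict(set)  # technology -> ids of companies that use it
    for company in companies:
        for name in company["technologies"]:
            users[name].add(company["id"])

    target = users.get(technology, set())
    scores = []
    for name, ids in users.items():
        if name == technology:
            continue
        overlap = len(target & ids)
        if overlap:
            scores.append((overlap / len(target | ids), name))
    # Highest score first; ties are broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scores[:limit]]

# Hypothetical sample data.
companies = [
    {"id": 1, "technologies": ["Node.js", "JavaScript", "React"]},
    {"id": 2, "technologies": ["Node.js", "JavaScript"]},
    {"id": 3, "technologies": ["Python", "Django"]},
    {"id": 4, "technologies": ["JavaScript", "React"]},
]
```

With this sample data, JavaScript shares the most companies with Node.js and would rank first, followed by React.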

All in all, it feels good to ship a fun and much-needed feature. I hope free technology search will make RemoteBase
more useful to all developers out there.