With innovate++ as our company's tagline, we were delighted to work on a project whose main focus is innovation. The client's vision was to bring their entire offline process online through this portal. Here is the biggest challenge we faced on this project.

Data Aggregation

The main challenge was the amount of data we needed to aggregate from various sources. The aim of the aggregation was to spare scouts the work of navigating to each source to find data, and instead give them a dedicated search system where they could find most of the results they needed. This was not an easy task. We had to address the following scenarios:

Sites with different markup structure

Sites with frequently changing markup

Sites with data toggled using javascript

Sites with different ways to navigate across the site
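To make the first two scenarios concrete, here is a minimal sketch of why differing markup matters: the same piece of data sits under a different tag and class on each site, so every site needs its own extraction rule. The site names, tags, and class names below are hypothetical, and the sketch uses only Python's standard library. It is an illustration under those assumptions, not a description of the system we built.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple


class TextByClass(HTMLParser):
    """Collects the text of every element matching a (tag, class) pair."""

    def __init__(self, tag: str, css_class: str) -> None:
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self._depth = 0               # > 0 while inside a matching element
        self._buf: List[str] = []
        self.matches: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Track nested elements of the same tag so we close at the right one.
            if tag == self.tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._depth = 1

    def handle_endtag(self, tag):
        if self._depth and tag == self.tag:
            self._depth -= 1
            if self._depth == 0:
                self.matches.append("".join(self._buf).strip())
                self._buf = []

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


# Hypothetical per-site rules: where the title lives on each site.
SITE_RULES: Dict[str, Tuple[str, str]] = {
    "site-a.example": ("h1", "title"),
    "site-b.example": ("span", "paper-name"),
}


def extract_titles(site: str, html: str) -> List[str]:
    """Apply the rule registered for `site` to a page's HTML."""
    tag, css_class = SITE_RULES[site]
    parser = TextByClass(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.matches
```

Keeping the rules in a plain table like `SITE_RULES` means that when a site changes its markup, only that site's entry needs updating, not the parsing code.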

Our solution

After a lot of research, we developed a system that scraped data from all the major research-oriented sites. The major challenge was making these scrapers fail-safe, which we achieved only over time. We initially built a mechanism to store the links that failed during scraping. We then gradually built an interactive tool that would re-scrape all the failed links. Later, we added ways to scrape an individual link as well, which made our scraper user-friendly as well as robust.
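The failed-link mechanism described above can be sketched roughly as follows. This is a simplified, in-memory illustration: the names `FailedLinkStore`, `scrape_one`, `scrape_all`, and `retry_failed` are hypothetical, and `fetch` stands in for whatever function actually downloads and parses a page. A real system would presumably persist the failed links rather than hold them in memory.

```python
from typing import Callable, Dict, List, Optional


class FailedLinkStore:
    """Remembers links whose scrape raised an error so they can be retried."""

    def __init__(self) -> None:
        self._failed: Dict[str, str] = {}   # url -> last error message

    def record(self, url: str, error: Exception) -> None:
        self._failed[url] = repr(error)

    def resolve(self, url: str) -> None:
        self._failed.pop(url, None)

    def pending(self) -> List[str]:
        return list(self._failed)


def scrape_one(url: str, fetch: Callable[[str], str],
               store: FailedLinkStore) -> Optional[str]:
    """Scrape a single link; on error, record the link instead of raising."""
    try:
        data = fetch(url)
    except Exception as exc:  # any failure just marks the link for retry
        store.record(url, exc)
        return None
    store.resolve(url)        # a success clears any earlier failure
    return data


def scrape_all(urls: List[str], fetch: Callable[[str], str],
               store: FailedLinkStore) -> Dict[str, str]:
    """Scrape every link, continuing past failures."""
    results: Dict[str, str] = {}
    for url in urls:
        data = scrape_one(url, fetch, store)
        if data is not None:
            results[url] = data
    return results


def retry_failed(fetch: Callable[[str], str],
                 store: FailedLinkStore) -> Dict[str, str]:
    """Re-scrape only the links that failed earlier."""
    return scrape_all(store.pending(), fetch, store)
```

With this shape, one bad page never stops a full run, `retry_failed` covers the "scrape all the failed links" step, and `scrape_one` covers scraping an individual link on demand.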