Miscellaneous

Octoparse is a new, modern visual web data extraction tool. It gives users a point-&-click UI for building extraction patterns, which its scrapers then apply to structured websites. Both experienced and inexperienced users find Octoparse easy to use for bulk extraction of website data; most scraping tasks require no coding at all!

Recently I needed to make a series of HTTP POST requests with different parameters, which sent me looking for existing tools. The features I wanted were capturing a POST request, editing it and resending it. What I found most useful was the Firefox browser with its developer tools: quick and convenient for this purpose. The other methods I tried either covered only part of the workflow (resending without editing) or required installing a lot of extra software.
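If the browser is not an option, the same capture-edit-resend loop can also be scripted. The following is a minimal TypeScript sketch, not the tooling described above, assuming Node 18+ (which provides global `fetch` and `URLSearchParams`) and an endpoint that accepts form-encoded bodies. The helper names, URL and parameter names are hypothetical.

```typescript
// Minimal sketch: replay one captured POST request with several parameter variations.
// Assumes Node 18+ (global fetch/URLSearchParams) and a form-encoded endpoint.
// buildBody/resendWithVariations and all URLs/field names are hypothetical.

type Params = Record<string, string>;

// Merge the captured base parameters with one variation and form-encode the result.
// Keys in `overrides` replace the base values; everything else is kept as captured.
function buildBody(base: Params, overrides: Params): string {
  return new URLSearchParams({ ...base, ...overrides }).toString();
}

// Send the base request once per variation, one after another, and collect the statuses.
async function resendWithVariations(
  url: string,
  base: Params,
  variations: Params[],
): Promise<number[]> {
  const statuses: number[] = [];
  for (const overrides of variations) {
    const response = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: buildBody(base, overrides),
    });
    statuses.push(response.status);
  }
  return statuses;
}

// Hypothetical usage (performs real network requests):
// resendWithVariations("https://example.com/search", { q: "hotels" }, [{ page: "1" }, { page: "2" }])
//   .then((statuses) => console.log(statuses));
```

Sending the variations sequentially rather than in parallel keeps the load on the target server predictable, which is usually the polite choice when replaying requests.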

Scrapinghub is a developer-focused web scraping platform that provides tools and services for extracting structured information from online sources. It has four major tools: Scrapy Cloud, Portia, Crawlera, and Splash. We decided to try the service; in this post we review its main functionality and share our experience with Scrapinghub.

Recently I received a question about how to sum the total duration of the YouTube videos in a search result. I wrote a simple JS iterator that reads the hours, minutes and seconds of each video from the page's HTML and adds them up.
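The original iterator is not reproduced here. As an illustration only, a TypeScript sketch of the same idea parses each duration label (`m:ss` or `h:mm:ss`) into seconds and adds them up. The function names are hypothetical, and the selector in the usage comment is a placeholder, since YouTube's markup changes often.

```typescript
// Illustrative sketch: sum video durations given as "m:ss" or "h:mm:ss" labels.
// Function names are hypothetical; they are not taken from the original post.

// Parse a label such as "4:05" or "1:02:03" into seconds.
// Labels that are not durations (e.g. "LIVE") count as 0.
function parseDuration(label: string): number {
  const parts = label.trim().split(":").map(Number);
  if (parts.length > 3 || parts.some((n) => isNaN(n))) {
    return 0;
  }
  // Each step shifts the running total one base-60 place (hours -> minutes -> seconds).
  return parts.reduce((total, n) => total * 60 + n, 0);
}

// Format a number of seconds as h:mm:ss.
function formatDuration(totalSeconds: number): string {
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  const s = totalSeconds % 60;
  const pad = (n: number) => (n < 10 ? "0" : "") + n;
  return `${h}:${pad(m)}:${pad(s)}`;
}

// Sum all labels and return both the raw seconds and a readable total.
function sumDurations(labels: string[]): { seconds: number; formatted: string } {
  const seconds = labels.reduce((total, label) => total + parseDuration(label), 0);
  return { seconds, formatted: formatDuration(seconds) };
}

// Browser console usage (VIDEO_DURATION_SELECTOR is a placeholder to fill in):
// const labels = Array.from(document.querySelectorAll("VIDEO_DURATION_SELECTOR"), (el) => el.textContent ?? "");
// console.log(sumDurations(labels).formatted);
```

For example, `sumDurations(["1:02:03", "4:05"])` gives 3968 seconds, formatted as `1:06:08`.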

Data Scraping Studio (DSS) is a new, free, multi-threaded studio for data extraction. It consists of two parts: (1) a Google Chrome extension with a point-&-click interface for setting up a web scraping agent, and (2) a desktop app for running scraping agents.

We have already written several posts on CloudScrape, a web scraping startup based in Copenhagen, Denmark. The service has now relaunched under a new name, Dexi.io, with a new look and new features for data extraction and business intelligence.

Pipes for aggregation and post-processing

Dexi.io has relaunched and rebranded from its early-stage name, CloudScrape, and has also released a new product, Pipes, which adds intelligent data transformation to complement its point-&-click data extraction service. In a nutshell, Pipes is a data integration and post-processing engine inside Dexi.io that can aggregate and sanitize extracted data, and a lot more. We’ll share more about it in upcoming posts.

Driving Innovation is the Key to Success

Stefan Avivson, CEO of Dexi.io, explains: “Although Robotic Process Automation (RPA) is not a new concept, service providers have been using so called Robotic Processing for a decade now, but the amount of available data and the technology to process it has evolved tremendously over the past two years. There is basically no real limitation to the use of Big Data and there is no real effort in convincing people that RPA is the future. It’s more a question of knowing how! Utilizing the resources of our innovation team for our clients and partners has without doubt been one of the key drivers to our success!”

We look forward to seeing where this cutting-edge cloud scraping service goes next.

Is it possible to scrape an HTML page with JavaScript from inside of a web browser?

To be perfectly honest, I wasn’t sure, so I decided to try it out.

Full disclaimer: I didn’t actually succeed. However, it was a great learning experience for me, and I think you guys could benefit from seeing what I did and where I went wrong. Who knows, maybe you can take what I’ve done and figure it out for yourself!

I want to share a handy utility for quickly summing the values of multiple DOM elements. Why? Well, suppose you’re on a page like this and you want to add up the total number of hotels across all the countries.
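A minimal TypeScript sketch of such a utility, assuming each element's text contains one number written with comma thousands separators (for example "1,234 hotels"). The function names and the `.hotel-count` selector are hypothetical, not taken from the original post.

```typescript
// Minimal sketch: sum the numbers displayed in many page elements.
// Accepts anything array-like with a textContent property (such as the NodeList
// returned by document.querySelectorAll), so it can also be run without a browser.
// Function names and the selector below are hypothetical.

interface TextNodeLike {
  textContent: string | null;
}

// Extract the first number from a text such as "1,234 hotels".
// Commas are stripped as thousands separators (an assumption about the page's locale).
function extractNumber(text: string | null): number {
  const match = (text ?? "").replace(/,/g, "").match(/-?\d+(\.\d+)?/);
  return match ? parseFloat(match[0]) : 0;
}

// Add up the numbers found in every element; elements without a number count as 0.
function sumElementValues(nodes: ArrayLike<TextNodeLike>): number {
  let total = 0;
  for (let i = 0; i < nodes.length; i++) {
    total += extractNumber(nodes[i].textContent);
  }
  return total;
}

// Browser console usage (".hotel-count" is a hypothetical selector):
// console.log(sumElementValues(document.querySelectorAll(".hotel-count")));
```

Taking an array-like of objects with `textContent`, rather than calling `document` directly, keeps the parsing logic testable outside the browser.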

As web scraping becomes easier to use, more and more people are able to leverage the world’s web resources. As this trend grows, structured data from the web empowers businesses and turns a wave of new business ideas into reality. Now there is a new technology on the market, called “self-contained agents”, that might just turn this wave into a tsunami!