The deathbycaptcha.com service, one of the oldest and most consistent services in the CAPTCHA-solving market, has recently added new Node.js API instructions and examples for solving reCAPTCHA v2 challenges. Click the link to check the API details! more…

Question: “How do I set up a daily automatic scraping of www.pollen.com data into a Google sheet?” (link)

Answer: Originally I doubted whether SVG elements in HTML were scrapable. After some trial and error I realized that SVG elements are indeed scrapable; one can get their XPath and child nodes. Yet they are scrapable by importXML() only when they are part of static HTML. more…

You have an idea for a web project. You (or your team) have already thought over the concept and the strategy for becoming successful in the field. Now it’s time to ask the main question: how should this awesome idea be brought to life? The great variety of solutions complicates the decision-making process: classic Java? Modern MEAN? Easy PHP & CMS?

Nowadays, it’s hard to imagine our life without search engines. “If you don’t know something, google it!” is one of the most popular maxims in our life. But how many people use Google in an optimal way? A lot of developers use Google search commands to get the answers they need as fast as possible.
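As a rough illustration of what such commands look like, here is a minimal Python sketch that assembles a query from a few widely used Google operators (`site:`, `filetype:`, and a quoted exact phrase) and turns it into a search URL. The helper names `build_google_query` and `build_google_url` are hypothetical and exist only for this example; the sketch builds the URL and does not send any request.

```python
from urllib.parse import urlencode


def build_google_query(terms, site=None, filetype=None, exact_phrase=None):
    """Combine plain search terms with optional Google search operators.

    Hypothetical helper for illustration only.
    """
    parts = []
    if exact_phrase:
        # Double quotes ask for an exact phrase match.
        parts.append(f'"{exact_phrase}"')
    if terms:
        parts.append(terms)
    if site:
        # site: restricts results to a single domain.
        parts.append(f"site:{site}")
    if filetype:
        # filetype: restricts results to a given file extension.
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def build_google_url(query):
    """Return a Google search URL with the query string URL-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = build_google_query("open word file", site="stackoverflow.com")
    print(query)
    print(build_google_url(query))
```

Keeping the query builder separate from the URL builder means the same query string can also be pasted straight into the search box.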

Even this is not enough today! Large and small companies need terabytes of data to make their business profitable. It’s necessary to automate the search process and make it reliable to satisfy the user with fresh news, updates or posts. In today’s article we will consider a very helpful tool – Real-Time Crawler (RTC) for the collection of fresh data. Let’s start! more…

Throughout its years of working in the data industry, the Octoparse team has never slowed its pace in making data more accessible and readily available to everyone. This is rooted in our belief that in the era of big data, anyone should have the capability to collect data and harness its power. more…

Recently I was looking for a way to open an MS Word file on the fly for processing with the python-docx library. Through trial and error I got the code to work. I use the web2py framework as a wrapper for the POST request. more…

Agreed, it’s hard to overestimate the importance of information – “Master of information, master of situation”. Nowadays, we have everything we need to become a “master of situation”. We have all the needed tools like spiders and parsers that can scrape various data from websites. Today we will consider scraping Amazon with a web spider equipped with proxy services. more…

Today we are going to discuss a large and engaging topic, the REST API, and build our own web application based on the most popular Java framework, Spring. To start with, we will explain the two main concepts of this article: REST API and Spring. Note that these two concepts are quite complex and, unfortunately, we can’t fully describe them, but in the article you will find links that will help you at the points where you might get stuck. more…

Why should you learn web scraping, and who is doing web scraping out there? We are going to address this question by looking into the different industries and jobs that require web scraping skills. To do this, we’ve compiled and analyzed data extracted from job sites, including Indeed, Glassdoor and LinkedIn. Our findings follow. more…

For some of our readers from Russia, it’s a new challenge to get to www.linkedin.com, which has been officially blocked in Russia.

On 4 August 2016, a Moscow court ruled that LinkedIn must be blocked in Russia because it stores the user data of Russian citizens outside of the country, in violation of the new data retention law. The law requires all companies doing business in the country to store their users’ data locally.