Introduction to Web Scraping with Java

Web scraping, or crawling, is the practice of fetching data from a third-party website by downloading and parsing its HTML to extract the data you want.

Since not every website offers a clean API, or an API at all, web scraping can be the only way to extract information from a website.
Many companies use it for competitor price monitoring, news aggregation, mass email collection…

Almost everything can be extracted from HTML; the only information that is “difficult” to extract is what sits inside images or other media.

In this post, we will look at basic techniques for fetching and parsing data in Java.
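
To make the fetch-and-parse idea concrete, here is a minimal sketch using only the JDK (Java 11 or later). It is not necessarily the approach used later in this post: the class name `ScrapeSketch`, the helper methods `fetch` and `extractTitle`, and the default URL are all made up for illustration, and the regular expression is a deliberately naive stand-in for a real HTML parser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class: downloads a page and extracts its <title>.
class ScrapeSketch {

    // Naive pattern for the first <title> element (case-insensitive, may span lines).
    // Assumption: good enough for a demo; regexes are fragile on real-world HTML.
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Step 1: download the raw HTML of a page as a string.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    // Step 2: parse the HTML and pull out the piece of data we want.
    static Optional<String> extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        // The default URL is a placeholder; pass another one as the first argument.
        String url = args.length > 0 ? args[0] : "https://example.com/";
        String html = fetch(url);
        System.out.println(extractTitle(html).orElse("(no title found)"));
    }
}
```

Keeping the download step and the extraction step in separate methods means the parsing logic can be exercised on a plain string without any network access.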