Monday, 24 October 2016

We get a lot of requests to scrape data from Yelp. They come in daily, sometimes several times a day. At the same time, we have not seen a good business case for a commercial Yelp scraping project.

We have decided to release a simple example Yelp robot that anyone can run in Chrome on their own computer, tune to their own requirements, and use to collect some data. With this robot you can save business contact information such as addresses, postal codes, telephone numbers, and website addresses. The robot is placed in our Demo space on the Web Robots portal for anyone to use: just sign up, find the robot, and run it.

This robot is placed in our Demo space, so it is accessible to anyone. Anyone can modify and run it, and anyone can download the collected data. The robot’s code may be edited by someone else, but you can always restore it from the sample code below. Yelp limits the number of search results, so do not expect to scrape more results than you would normally see in a search.

In case you want to create your own version of this robot, here is its full code:

// starting URL above must be the first page of search results.
// Example: http://www.yelp.com/search?find_desc=Restaurants&find_loc=Arlington,+VA,+USA

Thursday, 13 October 2016

Web scraping is the only way to get data from a website when the website doesn’t provide an API to access its data. Web scraping involves the following steps:

1. Make a request to the web page.
2. Parse/extract the data that you want to scrape from the website.
3. Store the data for final output (Excel, CSV, MySQL database, etc.).
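As a rough illustration, the three steps above can be sketched in a few lines of PHP. This sketch uses PHP's built-in DOMDocument class rather than Simple HTML DOM, so that it runs without any extra files, and a hard-coded HTML string stands in for the downloaded page. The sample markup, the variable names, and the CSV columns are assumptions made for the demo.

```php
<?php
// Step 1 (request): a real run would fetch the page, for example with
// file_get_contents($url). A hard-coded page stands in for it here so the
// sketch runs offline.
$pageHtml = '<html><body>'
    . '<a href="/about">About</a>'
    . '<a href="https://example.com/contact">Contact</a>'
    . '</body></html>';

// Step 2 (parse/extract): load the HTML and collect every link.
libxml_use_internal_errors(true); // do not print warnings for invalid HTML
$doc = new DOMDocument();
$doc->loadHTML($pageHtml);
libxml_clear_errors();

$rows = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    $rows[] = [trim($link->textContent), $link->getAttribute('href')];
}

// Step 3 (store): write the rows as CSV. php://memory keeps the demo
// self-contained; a file path such as "links.csv" would work the same way.
$out = fopen('php://memory', 'w+');
// Delimiter, enclosure and escape are passed explicitly because newer PHP
// versions warn when the escape default is relied on.
fputcsv($out, ['text', 'href'], ',', '"', '\\');
foreach ($rows as $row) {
    fputcsv($out, $row, ',', '"', '\\');
}
rewind($out);
$csv = stream_get_contents($out);
fclose($out);

echo $csv;
```

Swapping the hard-coded string for a real download and the memory stream for a file or a database insert gives the same request, parse, store flow against a live page.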

Web scraping can be implemented in any language, such as PHP, Java, .NET, or Python, that allows you to make a web request and get the web page content (HTML text) into a variable. In this article I will show you how to use the Simple HTML DOM PHP library to do web scraping using PHP.

PHP Simple HTML DOM Parser

Simple HTML DOM is a PHP library for parsing data from web pages. In short, you can use this library to do web scraping with PHP and even store the data in a MySQL database. Simple HTML DOM has the following features:

- The parser library is written in PHP 5+ and requires PHP 5+ to run.
- The parser supports invalid HTML.
- It lets you select HTML tags the jQuery way.
- It supports XPath and CSS path based extraction.
- It offers both an object-oriented way and a procedural way to write code.

Scrape All Links

<?php
include "simple_html_dom.php";

// Create the parser object
$html = new simple_html_dom();

// Load a specific URL
$html->load_file("http://www.google.com");

// Find all links and print the href of each one
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}

?>

Scrape images

<?php
include "simple_html_dom.php";

// Create the parser object
$html = new simple_html_dom();

// Load a specific URL
$html->load_file("http://www.google.com");

// Find all images and print the src of each one
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}

?>

This is just a little idea of how you can do web scraping using PHP. Keep in mind that XPath can make your job simple and fast. You can find all available methods on the Simple HTML DOM documentation page.
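To give a feel for what an XPath query looks like, here is a minimal sketch. It uses PHP's built-in DOMXPath class, not the Simple HTML DOM API, and the sample markup and the id "content" are made up for the example.

```php
<?php
// Sample markup standing in for a downloaded page (an assumption for the demo).
$pageHtml = '<html><body>'
    . '<div id="menu"><a href="/home">Home</a><a href="/blog">Blog</a></div>'
    . '<div id="content"><img src="/img/logo.png" alt="Logo">'
    . '<a href="/post/1">First post</a></div>'
    . '</body></html>';

libxml_use_internal_errors(true); // do not print warnings for invalid HTML
$doc = new DOMDocument();
$doc->loadHTML($pageHtml);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Only the links inside the element with id="content".
$contentLinks = [];
foreach ($xpath->query('//div[@id="content"]//a/@href') as $attr) {
    $contentLinks[] = $attr->nodeValue;
}

// The src of every image that has an alt attribute.
$imageSources = [];
foreach ($xpath->query('//img[@alt]/@src') as $attr) {
    $imageSources[] = $attr->nodeValue;
}

print_r($contentLinks);
print_r($imageSources);
```

A single path expression narrows the search to one part of the page, which is handy when the same tag appears in menus, footers, and the main content.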