The above popup message is displayed for two reasons: we recently changed our .NET dependency from 3.5 to 4.5, thereby considerably reducing the installation package size, and, more importantly, the code signing agency of our digital certificate has been changed from GlobalSign to Comodo. So the above warning may appear until the new WebHarvy installer gains enough reputation with Microsoft, which may take a few weeks. In case you have any questions or require assistance, please do not hesitate to contact our support.

True multi-level Category Scraping

WebHarvy now supports automatically navigating a website's category/subcategory lists to extract data from the final listing pages. Know More

Support for multiple input keywords

Any number of input text fields can be populated with lists of strings/keywords during configuration. WebHarvy will automatically apply all combinations of the provided keywords during the mining phase. Know More.

Capture window with new options

Run JavaScript on Page

Runs the specified JavaScript code on the page – Know More. This option can be used to load elements on a page which cannot be loaded using the default navigation options (link follow, click) provided by WebHarvy.
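
As a rough illustration of the kind of code this option is meant for, the sketch below clicks a "load more" style button so that extra content appears before extraction. The `.load-more` selector is purely hypothetical; a real site will use its own markup.

```javascript
// Example snippet for the 'Run JavaScript on Page' option (a sketch,
// not WebHarvy's own code). It clicks a "load more" button, if present,
// so that additional listings are loaded before data extraction.
// The selector '.load-more' is an assumption - replace it with the
// actual selector used by your target website.
function clickIfPresent(doc, selector) {
  const el = doc.querySelector(selector);
  if (el) {
    el.click();   // triggers the site's own loading logic
    return true;  // element found and clicked
  }
  return false;   // nothing to click on this page
}

// In the browser you would simply run:
//   clickIfPresent(document, '.load-more');
```

Passing the document in as a parameter keeps the helper easy to test outside a browser; in the snippet you paste into WebHarvy you would call it with the page's real `document`.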

Input strings to text input fields

Strings to be input to text fields can now be made part of the configuration. Know More. Earlier, such parameters were automatically taken from the configuration's PostData. But with some websites the PostData does not contain the submitted input strings, and this option helps to correctly load the page displaying data during the mining phase.

Extract data from Popups

Know More. This helps extract data by clicking each listing link/button and reading the data from a popup window, or from a view on the same page that the click populates. It differs from the ‘Follow this link’ option because here the data is loaded on the same page (no page navigation), and from the ‘Click’ option because after clicking each link, data has to be extracted from the page before clicking the next link.

Option to smoothly scroll page during mining to load all contents (lazy loading)

Smoothly scrolls to the end of the page to load elements (for example, lazily loaded images) which appear only when they are made visible by scrolling down. Know More.
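
The idea behind smooth scrolling for lazy loading can be sketched as follows: instead of jumping straight to the bottom, the page is scrolled in small steps so that each batch of lazily loaded content has a chance to appear. The helper below only computes the scroll positions; the 400px step size is an assumption, not WebHarvy's actual value.

```javascript
// Sketch of the smooth-scroll idea behind lazy-loading support:
// compute a series of intermediate scroll offsets down the page.
// The step size (400px) is an assumed default for illustration.
function scrollOffsets(pageHeight, step = 400) {
  const offsets = [];
  for (let y = step; y < pageHeight; y += step) {
    offsets.push(y);
  }
  offsets.push(pageHeight); // always finish at the very bottom
  return offsets;
}

// In a browser you would walk these offsets with a pause between steps
// so lazily loaded content has time to appear:
//   for (const y of scrollOffsets(document.body.scrollHeight)) {
//     window.scrollTo(0, y);
//   }
```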

Select drop-down/list-box options

Select drop-down/list-box/combo-box options during configuration and mining. Like the option above, this allows navigation to result pages when the normal configuration is unable to make these selections and load the result page. Know More.
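
Where even this option cannot trigger a site's own change handling, a small snippet run via the 'Run JavaScript on Page' option described earlier is one possible workaround. The sketch below sets a drop-down's value and dispatches a `change` event, since many sites only react to that event; the selector and value in the usage comment are hypothetical.

```javascript
// Hypothetical workaround sketch: select a drop-down option from
// JavaScript. Setting .value alone is often not enough - sites usually
// listen for a 'change' event, so one is dispatched as well.
function selectOption(selectEl, value) {
  selectEl.value = value; // pick the option
  if (typeof Event !== 'undefined' && selectEl.dispatchEvent) {
    selectEl.dispatchEvent(new Event('change')); // let the site react
  }
  return selectEl.value;
}

// Browser usage (selector and value are assumptions):
//   selectOption(document.querySelector('#sort-order'), 'price');
```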

WebHarvy crashes after installing the latest Windows update for Adobe Flash
(January 1, 2016 – https://sysnucleus-blog.com/2016/01/01/webharvy-windows-810-crash-due-to-adobe-flash-security-update/)

Microsoft released a new security update for Adobe Flash Player for Internet Explorer (IE) a few days back (Dec 29, 2015). This update has caused many software titles (including Skype – see Skype Crash) to crash. See http://borncity.com/win/2015/12/30/windows-10-flash-update-kb3132372-issues/ for a list of other software titles affected by this update.

Solution?

Meanwhile, we will see whether we can update WebHarvy to work around this issue. We are also hoping that Microsoft will release another security update which solves this problem, since many software titles, including their own Skype, seem to be affected.

In these types of pages, the pagination links are provided in sets. For example, the first 5 pages will have direct links at the bottom of the page to load each of them. To load pages 6 to 10, an additional link should be clicked. Each of pages 6 to 10 will then have direct links at the page end to load any of them, as well as a link to load the next set of 5 pages.
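
The set-based navigation described above can be sketched as a small decision rule: after finishing a page, either a direct link to the next page is already visible, or the 'next set' link must be clicked first. The function below models that logic with 5 links per set (the link names are illustrative only).

```javascript
// Sketch of the set-based pagination logic described above.
// nextAction says what to click after finishing page `page`:
// either the direct link to the next page (if it is within the
// currently visible set of links) or the 'next set' link which
// reveals the following batch of page links.
function nextAction(page, setSize = 5) {
  const lastInSet = Math.ceil(page / setSize) * setSize;
  return page < lastInSet
    ? { click: 'page-link', page: page + 1 } // direct link is visible
    : { click: 'next-set', page: page + 1 }; // must reveal next set first
}
```

For example, after page 3 the direct link to page 4 is visible, but after page 5 the 'next set' link must be clicked to reach page 6.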

The latest version of WebHarvy Visual Web Scraper can be downloaded from https://www.webharvy.com/download.html. Try it, and in case you need any assistance, please do not hesitate to contact our support team.

WebHarvy version 3.4 released!
(June 10, 2015 – https://sysnucleus-blog.com/2015/06/10/webharvy-version-3-4-released/)

We’ve just released a new WebHarvy update. The following are the changes in this version.

Major:

Support for pagination where a link/button has to be clicked to load the next set of pages. More Info

Web Scraping from Cloud – WebHarvy on Amazon EC2
(November 17, 2014 – https://sysnucleus-blog.com/2014/11/17/web-scraping-from-cloud/)

WebHarvy requires the Windows operating system to run. So, in case you do not have access to a Windows PC, or if you do not want to run WebHarvy on your local PC, you have the option to run WebHarvy from the cloud. The Amazon Web Services (AWS) Elastic Compute Cloud (EC2) platform makes this possible. See the following link.

Once you connect to the Windows instance via Remote Desktop, you can download and install WebHarvy on it. You will have to make sure that .NET 3.5 is installed on the Windows instance so that WebHarvy can run properly. Please contact us in case you need any assistance.

Scraping hidden details using WebHarvy
(July 15, 2014 – https://sysnucleus-blog.com/2014/07/15/scraping-hidden-details-using-webharvy/)

WebHarvy allows you to scrape hidden fields on websites which are displayed only when you click a link or button. The ‘Click’ option in the Capture window can be used to display such ‘click to display’ fields. The following video shows the process.

The video below shows how contact details from Craigslist listing pages can be extracted using this feature.

WebHarvy also allows you to scrape data from the HTML of the page. For example, the following video shows how geo location (latitude, longitude) can be extracted from Yellow Pages listings (map details) from their HTML – this data is not visible in the browser.

Scraping images: various methods: WebHarvy
(July 15, 2014 – https://sysnucleus-blog.com/2014/07/15/scraping-images-various-methods-webharvy/)

WebHarvy lets you scrape images from websites with ease (in addition to text). During configuration, you can directly click on an image to capture it. The resulting Capture window will have a ‘Capture Image’ button; clicking it lets you either download the image file or capture its URL. Know More.

Images can also be downloaded from their URLs, obtained by applying a Regular Expression to the HTML content. This method is shown in the following demonstration video.

During configuration, after clicking on an item, the ‘Capture HTML’ option under ‘More Options’ of Capture window allows the HTML of the item to be captured and displayed in the preview area. After this, Regular Expressions can be applied (More Options > Apply Regular Expression) to select data from a portion of the HTML code displayed.

The following video shows how this feature can be applied to scrape URLs from HTML.
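
As a rough illustration of this regex-on-captured-HTML approach, the sketch below pulls an image URL out of an HTML fragment. Both the sample HTML and the pattern are assumptions for demonstration; real listing markup will differ, and the pattern would need adjusting accordingly.

```javascript
// Illustration of the 'Capture HTML' + 'Apply Regular Expression' flow
// (a sketch, not WebHarvy's internal implementation): a regular
// expression extracts the image URL from a captured HTML fragment.
function extractImageUrl(html) {
  const match = html.match(/<img[^>]+src="([^"]+)"/i);
  return match ? match[1] : null; // first capture group holds the URL
}

// Hypothetical captured HTML fragment for demonstration:
const sampleHtml =
  '<div class="listing"><img src="https://example.com/photo.jpg" alt="item"></div>';
// extractImageUrl(sampleHtml) yields "https://example.com/photo.jpg"
```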