Scrape Dynamic Content

Nakul_Sargur

I am a newbie. I want to know how to extract/scrape dynamic content from web pages. I am able to get the static content using the Web-Harvest API. Thanks in advance for helping.

Regards,
Nakul Sargur

JMRKER

Why?

Nakul_Sargur

I am doing a trial assignment on web scraping. I'm a fresher. Using the Web-Harvest (Java) API I am able to extract static content, but some of the data is enclosed inside JavaScript functions and HTML elements. I need some guidance. Thanks in advance for helping.

criterion9

If JavaScript is filling in the content, you'll need to parse through it to find out where the data is actually coming from (e.g. a JavaScript variable, an AJAX call, etc.).
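If the values are assigned to a JavaScript variable in an inline `<script>` block, they are already in the raw page source, and you can pull them out of the script text without running any JavaScript. Below is a minimal sketch that assumes a hypothetical script assigning plain string literals with `var name = "...";`. The variable names `productName` and `productId` are made up for illustration. If the data instead arrives through an AJAX call, you would look up that request's URL in the script and fetch it directly.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JsVarExtractor {
    // Matches simple assignments like: var name = "value";  or  var name = 'value';
    // Assumption: values are plain string literals without escaped quotes inside.
    private static final Pattern STRING_VAR = Pattern.compile(
            "\\bvar\\s+(\\w+)\\s*=\\s*(['\"])(.*?)\\2\\s*;");

    /** Returns variable name -> string value for every simple string assignment found. */
    static Map<String, String> extract(String scriptText) {
        Map<String, String> vars = new LinkedHashMap<>();
        Matcher m = STRING_VAR.matcher(scriptText);
        while (m.find()) {
            vars.put(m.group(1), m.group(3));
        }
        return vars;
    }

    public static void main(String[] args) {
        // Hypothetical inline script copied from a page's raw source.
        String script = "var productName = \"yyyyyy\";\nvar productId = '3435534';";
        System.out.println(extract(script)); // {productName=yyyyyy, productId=3435534}
    }
}
```

A regex is enough only while the assignments stay this simple. Escaped quotes, objects, or computed values would need a real JavaScript parser.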

I am able to extract the contents "Hai" and "Hello", but I am unable to extract "yyyyyy" and "3435534" because those values are inside an HTML tag. Currently I am using the Web-Harvest API to extract content from the website. This API returns the result after stripping out the HTML elements, so I cannot extract HTML attribute values.
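When a tool returns only the text left after stripping tags, attribute values are lost. Selecting the attribute nodes themselves with XPath avoids this. The sketch below uses only the JDK's `javax.xml` APIs. It assumes the markup is already well-formed XML (XHTML, or HTML run through a cleaner first). Real-world HTML usually is not well-formed, so this is an illustration rather than a drop-in solution. The `onclick="viewdetails(...)"` markup is hypothetical, since the original example page isn't shown here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AttributeExtractor {
    /**
     * Evaluates an XPath expression that selects attribute nodes (e.g. "//a/@onclick")
     * and returns their values. Assumes the input is well-formed XML.
     */
    static List<String> attributeValues(String xml, String xpathExpr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(xpathExpr, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getNodeValue()); // an attribute node's value
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical markup: "Hai" is visible text, the data sits in an attribute.
        String page = "<div><span>Hai</span>"
                + "<a href=\"#\" onclick=\"viewdetails('yyyyyy','3435534')\">View</a></div>";
        System.out.println(attributeValues(page, "//a/@onclick"));
        // [viewdetails('yyyyyy','3435534')]
    }
}
```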


What does the viewdetails function look like? Is it using AJAX?

Nakul_Sargur

I think there is no AJAX call. When you click the View link on the page, the viewdetails function displays the parameter values (data) in a small new window on the same page. The viewdetails function receives its parameter values when the site is initially loaded. I want to scrape those parameter values from the site.

criterion9

Nakul Sargur wrote:

I think there is no AJAX call. When you click the View link on the page, the viewdetails function displays the parameter values (data) in a small new window on the same page. The viewdetails function receives its parameter values when the site is initially loaded. I want to scrape those parameter values from the site.

Instead of guessing what might happen, can you just post the function? Otherwise I can only guess at an answer or blindly suggest all sorts of things that may not help at all.

I am able to extract the contents "Hai" and "Hello", but I am unable to extract "yyyyyy" and "3435534" because those values are inside an HTML tag. Currently I am using the Web-Harvest API to extract content from the website. This API returns the result after stripping out the HTML elements, so I cannot extract HTML attribute values.

If you can see the actual values in the page source, and the details always contain the same number of values (as it seems to me), why don't you just try a regex?
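Here is a minimal sketch of the regex approach. It assumes the raw page source contains calls shaped like `viewdetails('yyyyyy','3435534')`, with single-quoted arguments and no `)` inside them. That shape is a hypothetical reconstruction because the actual function was never posted. The code collects the arguments of every call and skips the function definition itself, whose parameters are unquoted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ViewDetailsScraper {
    // Finds each call like viewdetails(...) and captures the text between the parentheses.
    private static final Pattern CALL = Pattern.compile("viewdetails\\s*\\(([^)]*)\\)");
    // Finds each single-quoted argument inside the captured text.
    private static final Pattern ARG = Pattern.compile("'([^']*)'");

    /** Returns one list of argument strings per viewdetails(...) call found. */
    static List<List<String>> extractCalls(String pageSource) {
        List<List<String>> calls = new ArrayList<>();
        Matcher call = CALL.matcher(pageSource);
        while (call.find()) {
            List<String> args = new ArrayList<>();
            Matcher arg = ARG.matcher(call.group(1));
            while (arg.find()) {
                args.add(arg.group(1));
            }
            // Skip matches with no quoted arguments, e.g. "function viewdetails(a, b)".
            if (!args.isEmpty()) {
                calls.add(args);
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        // Hypothetical page source with two "View" links.
        String source = "<a onclick=\"viewdetails('yyyyyy','3435534')\">View</a>\n"
                + "<a onclick=\"viewdetails('zzzzzz','1234567')\">View</a>";
        System.out.println(extractCalls(source));
        // [[yyyyyy, 3435534], [zzzzzz, 1234567]]
    }
}
```

Because the values are passed in when the page first loads, the raw HTML returned by a plain HTTP request should already contain them. No JavaScript needs to run.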

mukeshpatel

Hi,

This thread is quite old, but I am posting my answer for new readers.

Anyone who wants to extract data from the web can use one of the web data extraction tools (free or paid) available on the internet.

These tools extract data from HTML pages (I'm not sure about dynamic content). Here is one example.

If you run an online store and want to compare your product prices with those of another online store, you can use this kind of tool: you just run it, add the URL you want to scrape, and it gives you the data in a structured format.

So these tools are very useful for business intelligence.
If readers of this thread have any questions, feel free to ask.