Discussion on PHP Web Grabber

WiseLoop supports this item

111 comments found.

I'm having problems with proxies. I tried about 30 different HTTP proxies and none of them worked. Please give an example with a working proxy.

I tried it in my project and in your demo. In the first case it has no effect, the IP stays the same. In the second case the page starts to load but never stops (actually it stops after 5-10 minutes of waiting and throws this error: http://snag.gy/0jSNy.jpg).

Hello and thank you for buying,
A cache file name is derived from the grabbed URL and the grabbing parameters submitted, so the same file is reused whenever the same URL and parameters are grabbed again. Of course, it is quite possible that a given URL/parameter combination is never used twice, and its cache file will just stay there. To control this, the developer can decide, based on his own judgement, to call wlWgUtils::clearCache(), which empties the entire cache directory. If more specific cache-handling logic is needed, the developer can use wlWgUtils::getCacheDir() in combination with his own algorithm.
Cheers!
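One possible shape for such custom cache handling is age-based expiry. This is only a sketch: wlWgUtils::getCacheDir() is the real API mentioned above, but the 24-hour threshold, the helper name, and the assumption that each cache entry is a plain file are this example's own.

```php
<?php
// Sketch of age-based cache expiry. In a project using the grabber,
// the directory would come from wlWgUtils::getCacheDir(); here it is
// a parameter so the helper stays self-contained. The one-day default
// threshold is an assumption.
function clearOldCacheFiles(string $cacheDir, int $maxAge = 86400): int
{
    $removed = 0;
    foreach (glob(rtrim($cacheDir, '/') . '/*') as $file) {
        // delete only plain files older than the threshold
        if (is_file($file) && (time() - filemtime($file)) > $maxAge) {
            unlink($file);
            $removed++;
        }
    }
    return $removed; // number of cache entries deleted
}

// Hypothetical usage with the grabber's own cache directory:
// clearOldCacheFiles(wlWgUtils::getCacheDir());
```

Running this periodically (e.g. from a cron job) keeps the cache directory from growing without bound while still reusing recent entries.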

First of all, I want to say a BIG THANK YOU for adding the post curl class. I made myself learn more about PHP tonight (took one energy drink and 5 hours) so I could finally use this software. I’ve always known it was super powerful but I had to learn about multi-dimensional arrays and PHP classes. Your updated documentation, demos and FAQ helped tremendously.

I normally build scrapers with Python or cURL, but this script allowed me to avoid Python completely and made my existing cURL grabbers far more powerful.

As far as inserting into databases: I almost got it! I'm currently grabbing from ‘ ’ fields and I am at this point:
$databasefield1 = $result[0][0];
$databasefield2 = $result[0][1];
...
$databasefield5 = $result[0][4];
// one database row ends here.
I need to keep grabbing in sets of 5 until I reach the end of the array. Only problem is that I never know how big the array will be!

Hello and thank you for buying and for your very kind feedback,
As far as I understood your question, the answer is pretty simple: the count() function. Please have a look here: http://php.net/manual/en/function.count.php
Hope it helps.
Cheers!
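The loop the answer hints at could be sketched like this. The sample values and the $fieldsPerRow name are invented for illustration; $result[0] mirrors the flat array of grabbed values from the question.

```php
<?php
// Walk a flat result array in sets of 5, one database row per set.
// The sample data here stands in for the grabber's output.
$result = [['a1', 'a2', 'a3', 'a4', 'a5', 'b1', 'b2', 'b3', 'b4', 'b5']];

$fieldsPerRow = 5;
$rows  = [];
$total = count($result[0]); // count() tells us how big the array is

for ($i = 0; $i < $total; $i += $fieldsPerRow) {
    // array_slice takes one row's worth of fields at a time
    $rows[] = array_slice($result[0], $i, $fieldsPerRow);
}
// $rows now holds two 5-element rows, ready to insert one by one.
```

As a shorthand, PHP's built-in array_chunk($result[0], 5) produces the same grouping in a single call.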

Hello and thank you for getting in touch,
Yes you can; the grabbing engine is “transparent” enough and gives you access to the grabbed information, so you can do whatever you want with it: display it (the default) or even store it in a database. You must realize that the database saving mechanism is not included within our product, as the grabber deals only with grabbing; that small task should be carried out by the developer using the grabber engine.
Cheers!
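That "small task" might look roughly like the following. Everything here is an assumption for illustration: the table name, column names, SQLite DSN, and sample rows are invented; only the idea of feeding grabbed rows into prepared statements is the point.

```php
<?php
// Hypothetical sketch: inserting grabbed rows with PDO.
// An in-memory SQLite database keeps the example self-contained;
// a real project would use its own DSN and schema.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE ads (title TEXT, description TEXT)');

// Pretend these came out of the grabber's result array.
$grabbedRows = [
    ['Bike for sale', 'Barely used'],
    ['Old sofa',      'Free to a good home'],
];

$stmt = $pdo->prepare('INSERT INTO ads (title, description) VALUES (?, ?)');
foreach ($grabbedRows as $row) {
    $stmt->execute($row); // prepared statements keep the inserts safe
}

$count = (int) $pdo->query('SELECT COUNT(*) FROM ads')->fetchColumn();
```

Preparing the statement once and executing it per row is the usual pattern when looping over many grabbed records.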

Hello and thank you for getting in touch,
This is probably because Yahoo changed the way its search results are displayed. The Yahoo search is there just for demo purposes; it is not the actual product. Thank you for letting us know, we will fix that soon.
Cheers!

Hello there, I like your crawler and I think it may be what I am looking for. It would be really helpful if you could answer the following questions.

1. Let's say there are 2 sites. Site A is not mine, but Site B is.

2. Can this crawler harvest the images and description data from Site A? Let's say they have buy-and-sell ads. Can that data be harvested from Site A, brought back to Site B and inserted there?

3. Can it keep the URLs of the Site A ads it harvested, so it can check those ads later, and if any ad is removed on Site A, also remove it from Site B (which is mine) to keep the data fresh?

I also have a developer (who is currently working on other parts of my project). Can he further build on this bot, adding or removing features to suit our needs?

Hello and thank you for getting in touch.
2. No. It grabs only HTML content, not media files.
3. Yes, but you have to do some additional work.
Regarding modifying the code of the component, you should check that with Envato, as they are responsible for the licensing.
Cheers!
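The "additional work" for point 3 could be sketched along these lines. This is not part of the grabber: the function name, the injected checker, and the cURL snippet are all assumptions about one possible approach.

```php
<?php
// Sketch: re-check saved ad URLs and drop the ones that have
// disappeared from Site A. The existence checker is injected as a
// callable so the pruning logic stays testable without a network.
function pruneRemovedAds(array $savedAds, callable $stillExists): array
{
    $fresh = [];
    foreach ($savedAds as $url => $ad) {
        if ($stillExists($url)) {
            $fresh[$url] = $ad;   // the ad still resolves, keep it
        }
        // else: the ad vanished on Site A, so omit it from Site B
    }
    return $fresh;
}

// A real checker might be a cURL HEAD request that treats a
// non-200 status as "removed" (untested sketch):
// $stillExists = function (string $url): bool {
//     $ch = curl_init($url);
//     curl_setopt($ch, CURLOPT_NOBODY, true);
//     curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
//     curl_exec($ch);
//     $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
//     curl_close($ch);
//     return $code === 200;
// };
```

The same pruning pass could instead issue DELETE statements against Site B's database; keeping the check and the deletion separate makes it easy to dry-run first.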