I'm thinking of trying Beautiful Soup, a Python package for HTML scraping. Are there any other HTML scraping packages I should be looking at? Python is not a requirement; I'm actually interested in ...

My project needs to include the Google cache age as a key piece of information. I tried searching for sources of the Google cache age, that is, the number of days since Google last re-indexed the page ...

I'm working on an app that scrapes data from a website, and I was wondering how I should go about getting the data. Specifically, I need data contained in a number of div tags which use a specific CSS ...

What is the current state of libraries for scraping websites with Haskell?
I'm trying to make myself do more of my quick one-off tasks in Haskell, in order to help increase my comfort level with the ...

I'm planning a web service for my own internal use that takes one argument, a URL, and returns HTML representing the resolved DOM from that URL. By resolved I mean that the web service will firstly ...

I'm looking for an example of requesting a webpage, waiting for the JavaScript to render (JavaScript modifies the DOM), and then grabbing the HTML of the page.
This should be a simple example with an ...

I'm not able to find any good Java-based web scraping API. The site I need to scrape does not provide an API either; I want to iterate over all web pages using some pageID and extract the HTML ...

Just wondering if anyone knows of a web-scraping library that takes advantage of Scala's succinct syntax. So far, I've found Chafe, but it seems poorly documented and maintained. I'm wondering if ...

I have a list of authors.
I wish to automatically retrieve/calculate the (ideally yearly) citation index (h-index, m-quotient, g-index, HCP indicator or ...) for each author.
Author Year Index
first ...

I'm currently trying to scrape Google Keyword Tools with CasperJS and PhantomJS (both excellent tools, thanks n1k0 and Ariya), but I can't get it to work.
Here is my current process:
Log in with my ...

I am in the process of writing a collection of freely downloadable R scripts for http://asdfree.com/ to help people analyze the complex sample survey data hosted by the UK Data Service. In addition to ...

I would like to crawl a popular site (say Quora) that doesn't have an API, get some specific information, and dump it into a file, say a .csv, .txt, or .html file, formatted nicely :)
E.g. return ...

I am trying to scrape links from a page that generates content dynamically as the user scrolls down to the bottom (infinite scrolling). I have tried doing different things with PhantomJS but am not able ...

I know the topic of web scraping has been discussed before (example), and I understand it's a bit of a grey area depending on a lot of factors (e.g. website's terms of use).
What I'd like to ask is: ...

I am trying to scrape PDF tables which span across multiple pages. I tried many things, but the best seems to be pdftotext -layout as advised here. The problem is that the resultant text file is not ...

What I need to do is browse to a webpage, log in, then browse to another webpage on that site that requires you to be logged in, so it needs to save cookies. After that, I need to click an element on ...

Hi, I want to create a desktop app (probably C#) that scrapes or manipulates a form on a 3rd party web page. Basically, I enter my data in the form in the desktop app, it goes away to the 3rd party website ...

I am (was) a Python developer who is building a GUI web scraping application. Recently I've decided to migrate to the .NET Framework and write the same application in C# (this decision wasn't mine).
In ...

I am looking for a good paid/free web scraping library with .NET support which has decent support for JavaScript processing and offers very good performance.
It should have its own browser engine and ...

I've been using the Scrapy web-scraping framework pretty extensively, but recently I've discovered that there is another framework/system called pyspider, which, according to its GitHub page, is fresh, ...

I know there are certain web pages PhantomJS/CasperJS can't open, and I was wondering if this one was one of them: https://maizepages.umich.edu. CasperJS gives an error: PhantomJS failed to open page ...

I want to use R to scrape this page (http://www.fifa.com/worldcup/archive/germany2006/results/matches/match=97410001/report.html) and others, to get the goal scorers and times.
So far, this is what ...