I typically use other tools to extract the scrape and then LC to analyze it, but you can use "put url" to fetch the data from a URL directly.
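In LiveCode that's a one-liner (`put URL "http://example.com/" into tData`). For comparison, here is a hedged sketch of the same fetch in Python's standard library; it uses a `data:` URL purely so the example is self-contained and needs no network:

```python
# Sketch of the `put URL` idea: fetch a URL and hand back its body as text.
from urllib.request import urlopen

def fetch(url):
    """Return the body of `url` decoded as text (like LiveCode's `put URL`)."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)

# A data: URL keeps the demo offline; a real scrape would pass http(s) URLs.
print(fetch("data:text/plain;charset=utf-8,hello%20scraper"))
```

In a real scrape you would loop `fetch` over the page URLs you care about and parse each body as it comes in.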
I haven't tried it yet, because I only noticed this last night, but the source for the browser widget ships in LiveCode 8, right in the application bundle, so my long-delayed dream of using LC to scrape directly might be closer, once I see what the source is doing...

There are many, many tools. As I mentioned, I have used "put url" in LC, but for a big scrape (think hundreds of thousands of records), the one I like best is a Chrome extension called... "Web Scraper", from Martins Balodis. It takes a little fiddling, but once you get it set up, it works great, even on huge sites, and it lets you set the delay between pages so that you don't anger the operator by saturating their bandwidth. Martins has both a paid and a free version: the paid version runs from one of his servers, the free version from your machine. When you're done, you end up with a CSV file. You can have multiple scrapes going in different tabs at the same time. Note that if you are trying to do a big scrape on a single URL, breaking it into sections can be tricky, but if not, then you're grinning.
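The core workflow the extension automates (walk a list of pages, wait politely between requests, dump the rows to CSV) can be sketched in a few lines of Python. Everything here is invented for illustration, not Web Scraper's actual API: the column names, the stub fetch and parse functions, the page names.

```python
# Hedged sketch: rate-limited multi-page scrape collected into CSV text.
import csv
import io
import time

def scrape_to_csv(page_urls, fetch_page, parse_rows, delay=2.0):
    """Fetch each page, wait `delay` seconds between requests, return CSV text."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["name", "value"])   # hypothetical column headers
    for i, url in enumerate(page_urls):
        if i:                            # no delay needed before the first page
            time.sleep(delay)            # be kind to the site operator
        for row in parse_rows(fetch_page(url)):
            writer.writerow(row)
    return out.getvalue()

# Usage with stub functions (no network), just to show the shape:
demo = scrape_to_csv(
    ["page1", "page2"],
    fetch_page=lambda url: url + ",42",
    parse_rows=lambda text: [text.split(",")],
    delay=0,
)
print(demo)
```

The delay parameter is the point: a couple of seconds between requests is what keeps a hundreds-of-thousands-of-records scrape from hammering the operator's server.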

I have also paid him to write a custom scrape configuration, in a case where things were more complicated than I was able to figure out. The price was cheap, I thought, and after I saw it, I gained new insight into how to use the tool, so now I'm generally able to write my own scrape configurations without much difficulty, even for the most complex sites.

By the way, when I say the source for the browser widget, I mean the LCB source, not C++ or Objective-C, for those who are wondering. The reason NOT to use the "put url" technique is for cases where there is a JS framework that has to be executed in the browser as well. In many of those cases, the data is not retrieved when you retrieve the page source; the meat, i.e. the data, has to be pulled separately by the browser. You can either read through the JS to figure out how to write the code to get what you want, or use a browser to get the net result and pull the data from that.
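Here is a minimal illustration of that failure mode, with an invented page and an invented JSON endpoint (nothing here reflects a real site): the served HTML is just a script bootstrap, and the records only exist in the payload the browser's JS would fetch separately.

```python
# Hedged sketch: JS-rendered page whose raw source contains no data.
import json

# What `put URL` would give you: an empty app shell plus a script tag.
RAW_HTML = '<html><body><div id="app"></div><script src="/bundle.js"></script></body></html>'

# What the page's JS fetches after load (a hypothetical API response).
API_RESPONSE = '{"records": [{"name": "widget", "price": 9.99}]}'

def has_data(page_source):
    """Crude check: does the served HTML itself contain the records?"""
    return "records" in page_source

if has_data(RAW_HTML):
    records = []  # would parse the HTML directly here
else:
    # The meat has to be pulled separately, the way the browser's JS would:
    records = json.loads(API_RESPONSE)["records"]

print(records[0]["name"])
```

Reading through the bundle's JS to find that endpoint is the first option described above; driving a real browser and scraping its rendered DOM is the second.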

Mikey wrote: The reason NOT to use the "put url" technique is for cases where there is a JS framework that has to be executed in the browser as well. In many of those cases, the data is not retrieved when you retrieve the page source; the meat, i.e. the data, has to be pulled separately by the browser. You can either read through the JS to figure out how to write the code to get what you want, or use a browser to get the net result and pull the data from that.

Given the growing need for JS blockers like NoScript, the prudent business owner should consider content requiring JS to be a bug.

Richard Gaskin
Community volunteer, LiveCode Community Liaison
LiveCode development, training, and consulting services: Fourth World Systems: http://FourthWorld.com
LiveCode User Group on Facebook: http://FaceBook.com/groups/LiveCodeUsers/