Saturday, October 21, 2006

Those of you who've been reading my blog for some time know that one of the techniques I utilize quite often in completing projects is directly querying web pages using the "MSXML.ServerXMLHTTP" object.

This works great for many applications. By simply returning the source HTML code for the web page we've requested...

SourceHTMLText = objHTTP.responseText

...and then using regular expressions to identify specify strings to parse out, you can return specific pieces of text or numeric values from the page.

Say you want to create a spreadsheet containing all of the pertinent data for a specific stock, say Google, from Yahoo Finance. You can simply set the URL you wish to parse to "http://finance.yahoo.com/q?s=goog". Then return the source HTML and parse out the current price, daily change, volume, high, low, etc., returning each of these values to specific cells in the spreadsheet.

Extending this, say you have an entire invenstment portfolio in a spreadsheet and you want to run a script that runs each hour (or day or week or every ten minuntes) and updates each stocks' data. You can simply run multiple queries, making the URL you request a dynamic variable, changing out the ticker symbol in the URL's query string.

But what if you encounter a website that doesn't use URLs with query strings? What if it instead does its database queries "in the background" and then returns the results on a more generic URL?

Example: A website, say http://www.some-fake-gas-price-info-website.com, has a search function that allows visitors to enter a zip code and returns all gas stations in that zip code and their respective fuel prices. Instead of using URLs with query strings that we could access directly -- say, http://www.s-f-g-p-i-w.com?zipcode=12345 -- it returns results on a URL like this: http://www.s-f-g-p-i-w.com/zipcodesearchresults.asp. What do we do? How did we get here?

Take a look at the source HTML for the page on which the search form is located, and look for the code relating specifically to the seach form (where you type in the zip code and click the "search" or "submit" button), which could look something like this:

(Note that I aded extra spaces after the ""s above to avoid errors when posting this article on this Blogger blog.)

The file specified in the "form action" tag -- "zipcodesearch.asp" -- is the specific page of this website that your zip code is "posted" to in order to generate the http://www.s-f-g-p-i-w.com/zipcodesearchresults.asp page. So what can we do with it?

We can use VBA to post a zip code value to this URL, which will generate the results page, then parse it. It might look something like this:

Now you can simply return the source HTML for the resulting page, and parse out the strings containing the data you're after, again using regular expressions.

Easy right?

There are countless ways this sort of technique can be used. Compiling stock price data, weather data, product pricing data, even replicating a portion of a website's underlying database, regardless of what sort of data it contains. I've used this technique to pull down data relating to thousands of U.S. patents, for example. If you know of a website that contains data you're interested in, this technique can likely be used to automatically pull that data down into Excel for you.

Questions? Comments? Concerns? Trying it out but getting stuck? Use my Meembo Me chat feature and shoot me a quick question. And remember, you can always just ask me to do it as a $50 Project.

Hi,I want to get Hotels rate from online booking website using excel web query,i am facing some issues , the results data (tables) are not fixed , i can't link it to my main table ,every time i refresh it change the data location .also ,how can i store this dynamic data so i can generate statistics reports.

Do you know where I might find more information about (1) how to become good at making web queries in Excel and (2) parameter lists for various websites, please?

This site http://www.gummy-stuff.org/Yahoo-data.htm shows a parameter list ("special tags") for Yahoo and I have tried but failed to find similar lists for MSN and other websites. Any ideas would be gratefully received.

Have you thought about writing an e-book on web queries? There is a surprising lack of information about how to do more than just establish a basic connection. I think the world might quickly become an even better place if advanced web query stuff was more widely known about.