Yahoo's site explorer is a great tool for folks keen on linkage data. Here is a quick rundown on its Web interface and the API. [12/06/2005: Y!SE was updated and greatly improved]

On September/30/2005 the Yahoo! Site Explorer (BETA) launched. It's a handy tool that shows a site owner all indexed pages per domain, and it offers subdomain filters. Inbound links are counted per page and per site, and the tool links to the standard submission forms.

The inbound link counts seem far more accurate than the estimates returned by linkdomain: and link: searches. Unfortunately there is no simple way to exclude internal inbound links, so checking only third-party inbounds requires a painful procedure:
1. Export each result page to a TSV file (a tab-delimited format readable by Excel and other applications).
2. Exports go per SERP with a maximum of 50 URLs each, so delete the two header lines in every file and append the files one by one to produce a single sheet.
3. Sort the worksheet by the second column to get a list ordered by URL.
4. Delete all URLs from your own site; what remains is the list of third-party inbounds.
5. Wait for the fix of the "exported data of all result pages are equal" bug (each export contains the first 50 results, regardless of which result page's export link is clicked). [Update December/06/2005: Yahoo now provides a filter to exclude internal links (per domain and sub-domain).]
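Steps 2 through 4 above are easy to script instead of doing by hand in Excel. The sketch below merges several exported TSV files, drops the two header lines per file, sorts by the URL column, and strips the site's own URLs. Note the assumptions: the function name is mine, and the column layout (title in column one, URL in column two) is inferred from the sorting step above, not documented here.

```python
import csv
import io
from urllib.parse import urlparse

def merge_exports(tsv_exports, own_domain):
    """Merge Y!SE TSV exports into one third-party inbound list.

    tsv_exports: list of strings, each the raw content of one export file.
    own_domain: e.g. "example.com"; subdomains are treated as internal too.
    Assumed column layout (hypothetical): title, URL, ...
    """
    rows = []
    for data in tsv_exports:
        body = "\n".join(data.splitlines()[2:])   # skip the two header lines
        rows.extend(csv.reader(io.StringIO(body), delimiter="\t"))
    rows.sort(key=lambda r: r[1])                 # order by the URL column
    # keep only rows whose host is not the own domain or a subdomain of it
    return [r for r in rows
            if not urlparse(r[1]).hostname.endswith(own_domain)]
```

With the bug noted in step 5, every export holds the same first 50 results, so deduplication may also be needed before the filter pays off.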

The result pages are assorted lists of all URLs known to Yahoo. The ordering does not reflect the site's logical structure (defined by internal linkage), and not even the physical directory structure seems to factor into the sort order. It looks like the first results are ordered by popularity, followed by an unordered list. The listings mix fully indexed pages with merely known URLs; the latter can be identified by their missing cached link.

Desired improvements:
1. A filter "with/without internal links".
2. An export function outputting the data of all result pages to one single file.
3. A filter "with/without" for known but not indexed URLs.
4. Optional structural ordering on the result pages.
5. Operators like filetype: and -site:domain.com.
6. Removal of the 1,000 results limit.
7. Periodic revisiting of submitted URL lists, à la Google Sitemaps.
8. [Added December/06/2005] Filtering of AdSense scraper sites like ODP and Wikipedia clones.

Overall, the site explorer is a great tool and an appreciated improvement. The most interesting part of the new toy is its API, which allows querying for up to 1,000 results (page data or link data) in batches of 50 to 100 results, returned in a simple XML format (max. 5,000 queries per IP address per day).
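Pulling the full 1,000 results thus takes a short series of paginated calls. The sketch below only builds the request URLs for such a run; the endpoint path and parameter names (appid, query, results, start) are my recollection of Yahoo's V1 Search Web Services conventions, not confirmed by this post, and the service has long since been retired, so treat it as illustration only.

```python
from urllib.parse import urlencode

# Assumed endpoint for link data; pageData would follow the same pattern.
API_ENDPOINT = "http://search.yahooapis.com/SiteExplorerService/V1/inlinkData"

def batch_queries(appid, query, total=1000, batch_size=100):
    """Build the paginated request URLs needed to pull `total` results
    in `batch_size` chunks (a single call is capped at 100 results,
    the whole result set at 1,000). `start` is assumed 1-based."""
    urls = []
    for start in range(1, total + 1, batch_size):
        params = {"appid": appid, "query": query,
                  "results": batch_size, "start": start}
        urls.append(API_ENDPOINT + "?" + urlencode(params))
    return urls
```

Ten such requests exhaust the 1,000-result window, which leaves plenty of headroom under the 5,000-queries-per-IP daily cap for checking many domains in one run.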

Good news for site and mass submission addicts: as of December/06/2005 Yahoo accepts RSS/Atom feeds and HTML pages in addition to the already supported plain URL lists in text files, which unfortunately were dropped after the single fetch triggered by a manual submission.