Archive for October, 2013

Linking has long been thought one of the cornerstones of the web, and thereby a key part of XML and related syntaxes. It’s also been frustratingly difficult to get right. XLink in particular once showed great promise, but when it came down to concrete syntax, didn’t get very far. My thinking at the time is still well reflected in what is, to my knowledge, the only fiction ever published on XML.com: A Hyperlink Offering. That story ends on a hopeful note, and a decade out, I’m still hoping.

For what it purports to do, SkunkLink still seems like a good solution to me. It’s easy to explain. The notion of encoding the author’s intent, then letting devices work out the details, possibly with the aid of stylesheets and other such tools, is the right way to tackle this kind of problem. Smaller specifications like SkunkLink would be a welcome breath of fresh air.

But a bigger question lurks behind the scenes: that of requirements. Does the world need a vocabulary-independent linking mechanism? The empirical answer is clearly ‘no’ since existing approaches have not gained anything like widespread use, and only a few voices in the wilderness even see this as a problem. In fact, HTML5 has gone in quite the opposite direction, rejecting the notion of even a vocabulary-independent syntax, to say nothing of higher layers like intent. I have to admit this mystifies me.

That said, it seems like the attribute name ‘href’ has done pretty well in representing intended hyperlinks. The name ‘src’ not quite as well. I still consider it best practice to use these names instead of making something else up.
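To illustrate why sticking to these names pays off, here is a minimal sketch in Python of a generic tool that finds links in any XML vocabulary purely by attribute name. The function name `find_links`, the sample document, and the example.org URLs are made up for illustration, and the approach assumes un-namespaced attributes (a namespaced attribute such as `xlink:href` would need extra handling).

```python
import xml.etree.ElementTree as ET

# Attribute names treated as links, regardless of the element they appear on.
LINK_ATTRS = ("href", "src")

def find_links(xml_text):
    """Return (element tag, attribute name, target) for every link-like attribute."""
    root = ET.fromstring(xml_text)
    links = []
    for elem in root.iter():  # walks the root and all descendants in document order
        for name in LINK_ATTRS:
            if name in elem.attrib:
                links.append((elem.tag, name, elem.attrib[name]))
    return links

# Hypothetical vocabulary: none of these element names come from a real schema.
SAMPLE = """<doc>
  <cite href="http://example.org/a">A</cite>
  <picture src="http://example.org/b.png"/>
  <note>No link here</note>
</doc>"""

links = find_links(SAMPLE)
for tag, attr, target in links:
    print(tag, attr, target)
```

A tool like this knows nothing about `cite` or `picture`, yet it still recovers both links. A vocabulary that invented its own attribute names would be invisible to it.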

If you’ve come here because of something you noticed in your HTTP access logs, read on.

Who is doing this? This is a personal project of Micah Dubinko. It is completely separate from anything related to any employer.

What is ASLbot? In the immediate future, ASLbot is no more than a personal research project. It consists of a web crawler, like Google, with an emphasis on sites centered around American Sign Language, and in particular reference materials relating to particular signs. At the moment, there is no publicly available search site, but I would like to set that up as time allows. My long-term goal is to promote ASL as an effective means of communication while at the same time making it easier to research and learn about.

Will this affect my site? No. I have the crawl settings turned down very low, so that crawling has no discernible impact on a site’s performance. I also crawl very infrequently, as ASL dictionaries don’t tend to change terribly often. Once a search site is operating, you may notice an increase in traffic as more people are able to find and visit your site.

What do you intend to do with the crawled data? First off, this is a technology experiment. I’ve noticed that Google/Bing/Yahoo do only an “OK” job on queries like “asl sign for awesome” and think a dedicated site can do better. Once the basics are up, I’d like to do a lot more, but this will necessarily take a long time, as this is not my full-time work. For example, I would like to (possibly with manual input, especially from native signers) categorize signs by handshape, position, and movement in a manner similar to William Stokoe’s groundbreaking research on ASL linguistics. Keep in mind that this, if it happens at all, is far in the future. Imagine someone searching for “M handshape shoulder” and getting a list of hits that link to existing ASL dictionaries.

Do you plan to charge money to access the site? Never.

Do you automatically download videos? No. Only web pages.

How do I make it stop? Think of it this way: Does your site appear in Google? If so, people will be searching and finding particular signs anyway, but without the aid of an ASL-positive web tool. But if you really want to, put an entry for “ASLbot” in your robots.txt file, which this crawler fully honors.
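If you want to check an opt-out rule before deploying it, the following is a minimal sketch using the robots.txt parser in Python’s standard library. The robots.txt content and the example.org URL are hypothetical, and the sketch assumes the user-agent token is spelled “ASLbot” as above. It shows how a standards-following parser reads the rule, not the crawler’s own code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks ASLbot from the whole site, leaves other crawlers alone.
ROBOTS_TXT = """\
User-agent: ASLbot
Disallow: /

User-agent: *
Disallow:
"""

def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical page URL on a site that serves the file above.
PAGE = "https://example.org/signs/awesome"

aslbot_ok = is_allowed(ROBOTS_TXT, "ASLbot", PAGE)
other_ok = is_allowed(ROBOTS_TXT, "SomeOtherBot", PAGE)
print("ASLbot allowed:", aslbot_ok)
print("Other crawler allowed:", other_ok)
```

To block only part of a site, replace `Disallow: /` with a narrower path prefix and rerun the check against URLs on both sides of that prefix.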

This is awesome, how do I help? Or, I still have questions: Feel free to email me using the contact information listed on this site, or ( <my first name> @ <this domain.info> )