05.20.08

This is a theme that’s been bouncing around my subconscious for months, and something I’ve blogged about in the past.

But really, syndication is only part of the problem. Syndication normalizes data, and makes it readily accessible to 3rd parties — but it doesn’t push data where you want it. It’s a pull-focused technology.

For push, we need some sort of alerting capability.

Recently, I’ve been in the habit of checking delegate counters for the 2008 Presidential Election Primary Races. I check them daily, watching for updates to pledged delegates, super-delegates, etc.

Checking for updates doesn’t take a significant amount of time, but it’s yet another activity that can interrupt my workflow. Leveraging automation would be a much better way to do this.

Recently, my company added an Alerts capability to our AlchemyGrid beta service. You can create an alert based on anything — any sort of web content (syndicated or not). Alerts can travel over many communication media (email, AIM, SMS, Twitter, etc.), and they support lots of customization options (how often to check for updates, what counts as a “unique update”, etc.).
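I can’t show AlchemyGrid’s internals here, but the “unique update” idea can be sketched in a few lines: fingerprint each fetch of the monitored content, and only fire an alert when the fingerprint changes. The names below (`fingerprint`, `is_unique_update`) are illustrative, not our actual API.

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """Hash the fetched page body so two polls can be compared cheaply."""
    return hashlib.sha256(content).hexdigest()

def is_unique_update(previous_fp, content):
    """Return (changed, new_fp). `changed` is True only when the content
    differs from the last version seen -- one plausible 'unique update' rule."""
    fp = fingerprint(content)
    changed = previous_fp is not None and fp != previous_fp
    return changed, fp
```

A polling loop would call `is_unique_update` on each fetch and hand off to the email/AIM/SMS backend only when `changed` comes back true.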

I used this new service to create an Alert that monitors delegate counts for the Democratic Presidential candidates (the GOP has already chosen its candidate). Any updates to delegate counts are automatically posted to the Twitter account “demdelegate08”. You can see the Twitter feed (and follow it if you wish) here:

A few implementation notes, for the geeks out there: We’re using a custom-engineered AIM, Email, and SMS backend for our Alerts implementation. We’re interacting with external services directly at the Protocol/API level, not piggy-backing off 3rd-party gateways or using other unreliable modes of communication.

1. AlchemyGrid’s Term Extraction facility supports multiple languages (English, German, French, Italian, Spanish, and Russian!). This was an important requirement for us, to enable contextual content generation for non-English websites/blogs. There are significant differences between languages in terms of punctuation rules, word stemming, and other details. Hats off to our Term Extraction developers; you’ve done a great job ensuring good initial language coverage.

2. Our Term Extraction facility is entirely statistical in its basis, not using a hard-coded lexicon, etc. This enables it to extract contextually-relevant topic keywords even when they’re (a) new topics, (b) rarely used common nouns/people-names, or (c) misspelled.
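As a toy illustration of what lexicon-free, statistics-driven extraction means (far simpler than any production algorithm), even pure frequency counting over Latin-script tokens will surface novel, rare, or misspelled terms, because nothing has to appear in a predefined word list:

```python
import re
from collections import Counter

def extract_terms(text, top_n=5):
    """Rank candidate terms purely by corpus statistics (raw frequency here),
    with no hard-coded lexicon, so new or misspelled topics still surface."""
    # Match Latin letters plus accented ranges, to cope with e.g. French/German
    tokens = re.findall(r"[A-Za-z\u00c0-\u024f]+", text.lower())
    counts = Counter(t for t in tokens if len(t) > 3)  # crude content-word filter
    return [term for term, _ in counts.most_common(top_n)]
```

A real extractor would weight frequencies against a background corpus and handle multi-word terms, but the statistical principle is the same.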

We’ve just integrated Term Extraction into our Grid service, so there may be a few minor kinks to work out in the coming weeks — but overall we’re happy with the initial results. Contextual capability vastly expands the utility of AlchemyGrid widgets, as their content can now be automatically customized to relate to your content. This applies to *any* input-enabled widget in the grid (ALL widgets are contextual). Here’s another contextual example (a related Amazon book):

We’ll be enabling the other supported languages in coming weeks, as well as rolling out some additional enhancements to our text-processing algorithms (for the geeks in the audience: enhancements to our sentence boundary detector, inline punctuation processor, etc.).

02.14.08

I use the Internet constantly during the course of my everyday life — looking up telephone numbers, reading restaurant reviews, etc.

One task I’m frequently engaged in is Content Monitoring; that is, checking (and re-checking) websites of interest for updates and new information.

Now wait a sec — wasn’t syndication (RSS, ATOM, etc.) supposed to do this for me? Sure, if a website actually exposes data feeds. If they don’t, you’re mostly out of luck.

Alas, there are many websites out there with no form of syndicated access. This is just plain irritating.

Luckily, tools are starting to appear that can eliminate this irritant. My company released a new service earlier this week that makes great strides toward solving this problem.

This new service performs Automated Content Monitoring: a way of programmatically monitoring information sources that currently lack syndication features.

I’m a big fan of leveraging automated techniques to optimize my daily workflow — many of my previous blog posts have focused on this topic. Leveraging algorithms to improve efficiency, access to information, and integration of data is a central theme of the Implicit Web, both a personal interest of mine and a business interest of my company. Automated Content Monitoring fits perfectly within this arena.

I’m currently using our new Automated Content Monitoring service to track a variety of information sources: new events at my preferred concert venues, special deals offered by local radio stations, etc. Monitoring each of these information sources automatically frees up my time for more useful activities, and gives notification of website updates far sooner than if I were performing these tasks manually.
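For pages without feeds, the essence of this kind of monitoring is a snapshot diff: compare the text you see now against the text you saw last time, and report what’s new. A minimal sketch (the function name is my own, not the service’s API):

```python
def new_lines(previous_text, current_text):
    """Return non-empty lines present in the current snapshot but absent from
    the previous one -- a crude 'what changed' diff for feed-less pages."""
    seen = set(previous_text.splitlines())
    return [line for line in current_text.splitlines() if line and line not in seen]
```

Run on a concert venue’s events page, the returned lines are exactly the newly listed shows, ready to be pushed out as a notification.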

11.04.07

Tomorrow (Nov. 5th) is the beginning of the Defrag Conference, here in Denver, Colorado.

Bottom line: Defrag is a conference about “augmentation”: how we turn loads of information into layers of knowledge, and how we reach the “aha” moment of the brainstorm. As such, it encompasses many technologies we’re all familiar with (wikis, blogs, search) and many new, developing technologies (context, relevance, next-level discovery) — and tries to see them all through a new prism.

10.11.07

If you haven’t heard already, a new conference on Implicit Web topics will be held Nov. 5-6, in Denver, Colorado:

Defrag is the first conference focused solely on the internet-based tools that transform loads of information into layers of knowledge, and accelerate the “aha” moment. Defrag is about the space that lives in between knowledge management, “social” networking, collaboration and business intelligence. Defrag is not a version number. Rather it’s a gathering place for the growing community of implementers, users, builders and thinkers that are working on the next wave of software innovation.

This conference is being organized by Eric Norlin, who has been blogging on implicit web topics for a while now. I’ve started getting to know Eric via e-mail and look forward to meeting him and other Implicit Web folks in person at this conference.

“While the original vision of the semantic web is grandiose and inspiring in practice it has been difficult to achieve because of the engineering, scientific and business challenges. The lack of specific and simple consumer focus makes it mostly an academic exercise.”

The post touches upon some of the practical issues keeping semantic technology out of the hands of end-users, and potential ways around these roadblocks. Summaries are given for three top-down “mechanisms” that may provide workarounds to some issues:

Leveraging Existing Information

Using Simple / Vertical Semantics

Creating Pragmatic / Consumer-Centric Apps.

I couldn’t agree more with the underlying principle of this post: top-down approaches are necessary in order to expose end-users to semantic search & discovery (at least in the near-term).

However, this isn’t to say that there isn’t value in bottom-up semantic web technologies like RDF, OWL, etc. On the contrary, these technologies can provide extremely high quality data, such as categorization information. In the past year, there’s been significant growth in the amount of bottom-up data that’s available. This includes things like the RDF conversion of Wikipedia structured data (DBpedia), the US Census, and other sources. Indeed, the “W3C Linking Open Data” project is working on interlinking these various bottom-up sources, further increasing their value for semantic web applications. What’s the point of all this data collection/linking? “It’s all about enabling connections and network effects.”
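For readers who haven’t touched these bottom-up sources: DBpedia exposes its Wikipedia-derived data through a public SPARQL endpoint, with category links expressed via the `dcterms:subject` predicate. A minimal query-building sketch (the resource URI is an arbitrary example; real use would send the URL over HTTP and parse the JSON results):

```python
import urllib.parse

# Ask DBpedia for the Wikipedia categories attached to a resource.
DENVER_CATEGORIES = """
SELECT ?category WHERE {
  <http://dbpedia.org/resource/Denver> <http://purl.org/dc/terms/subject> ?category .
}
"""

def sparql_url(endpoint, query):
    """Build the GET URL for a SPARQL query, requesting JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return endpoint + "?" + params
```

The same pattern works against any of the interlinked Open Data endpoints, which is precisely the “network effects” point: one query language, many linked sources.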

My personal feeling is that neither a bottom-up nor a top-down approach will attain complete “success” in facilitating the semantic web. Top-down approaches are good enough for some applications, but sometimes generate dirty results (incorrect categorizations, etc.). Bottom-up approaches can generate some incredible results when operating within a limited domain, but can’t deal with messy data. What’s needed is a bridge between the two modes: leveraging top-down approaches for an initial, dirty classification, and incorporating cleaner bottom-up sources when they’re available.

So how do we bridge the gap? Here’s what I’m betting on: Process-oriented, or agent-based mashups. These sit between the top-down and bottom-up stacks, filtering/merging/sorting/classifying information. More on this soon.

09.17.07

We’re letting a select number of individuals take a “first look” at our AlchemyPoint platform in the form of a Technology Preview. We’re looking for user feedback to help us expand and improve the AlchemyPoint system before its final release.

Access to the Technology Preview is being provided in a staged fashion with priority given to users who apply first, so sign up today!

08.23.07

“Imagine for a moment that you can take any piece of online content that you care about – a news feed, an image, a box score, multimedia, a stream of updates from your friends – and easily pin it wherever you want.”

[...snip...]

“This isn’t some far off vision. It’s the near-term future. It’s the coming era of the Cut and Paste Web.”

It’s exciting to see discussion on this topic, as this is something my company has been working towards for some time now. Our AlchemyPoint mashup platform enables the visual cutting and pasting of web content, even dynamic content (like search results). “Clipped” content can be inserted anywhere — into your home page or blog, Google results pages, CNN articles, etc.

Below are several screencasts that illustrate cut-and-paste clipping of web content:

Grabbing content from a page via the mouse, and storing it in a “Clipboard” for later reuse.

Inserting content into a new page, selecting from the available “Clipboard” of previously grabbed content.

Using this methodology one can clip any arbitrary piece of web content (images, articles, headlines, blog posts, etc.) and insert it into any other web page. It’s worth noting that this process occurs almost entirely using the mouse; the only keyboard interaction required involves typing out a name to identify the clipped content.

On a technical level, cutting and pasting web content is difficult; one cannot simply grab and re-insert raw HTML fragments into web pages. There are a number of hurdles to overcome in order to perform these types of manipulations reliably. A few items that must be considered include: relative URL links, CSS content, Javascript, name/class/id conflicts between a web page and any pasted content, character set differences, how remote servers deal with Referrer headers, etc. We’ve had a good time working out solutions to these issues and others not mentioned above.
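As one concrete illustration of just the relative-URL hurdle: a clipped fragment’s `href`/`src` attributes must be rewritten against the page they came from, or the links and images break when pasted elsewhere. The regex-based sketch below is illustrative only; our actual implementation handles far more cases (unquoted attributes, CSS `url()` references, base tags, and so on).

```python
import re
from urllib.parse import urljoin

def absolutize_urls(html_fragment, base_url):
    """Rewrite relative href/src attributes against the page the fragment was
    clipped from, so it still resolves when inserted into another page."""
    def fix(match):
        attr, url = match.group(1), match.group(2)
        return '%s="%s"' % (attr, urljoin(base_url, url))
    return re.sub(r'(href|src)="([^"]+)"', fix, html_fragment)
```

Already-absolute URLs pass through `urljoin` unchanged, so the rewrite is safe to apply to a whole fragment.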

For those interested in playing around with cutting-and-pasting web content, we’re going to be opening up invitations to our AlchemyPoint Technology Preview in the next few weeks. This preview supports the ability to perform all sorts of web manipulations, cut-and-paste of web content being just one example.

06.18.07

One of the central themes surrounding the Implicit Web is the power of the electronic footprint.

Wherever we go, we leave footprints. In the real world, these are quickly washed away by erosion and other natural forces. The electronic world, however, is far, far different: footprints often never disappear. Every move we make online, every bit of information we post, every web link we click, can be recorded.

The Implicit Web is about leveraging this automatically-recorded data to achieve new and useful goals.

One area that’s particularly exciting to me is the utility provided by merging implicit data collection/analysis and automatic information retrieval.

The folks at Lijit have done some pretty interesting work in this arena, with their “Re-Search” capability. Using some Javascript magic, Lijit Re-Search detects when you visit a web blog via an Internet search. For example, if I visit my blog via a Google search for “implicit web,” Re-Search activates. My original Google query is then utilized to look up additional related content via the Lijit Network of trusted information sources. Any discovered content is then automatically shown on-screen.
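The core trick is recoverable from the referrer URL alone: Google (and most engines) pass the user’s search terms in the `q` query parameter. Lijit does this client-side in Javascript; here’s the same extraction sketched in Python:

```python
import urllib.parse

def referrer_query(referrer_url):
    """Pull the user's original search terms out of a search-engine referrer
    URL. Google and many other engines carry them in the 'q' parameter."""
    parsed = urllib.parse.urlparse(referrer_url)
    params = urllib.parse.parse_qs(parsed.query)
    terms = params.get("q")
    return terms[0] if terms else None
```

Once you have the original query, "re-searching" it against a trusted network of sources is just another search call.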

Neat stuff! I love the idea of “re-searching” automatically, leveraging an Internet user’s original search query.

A few days ago I decided to mess around with this “re-search” idea and ended up with something that I’ve been calling “pre-search.”

Pre-search is the concept of preemptive search, or retrieving information before a user asks for it (or even knows to ask). This idea can be of particular use with blogs and other topical information sources.

I created two basic pre-search mashups for Feedburner-enabled blogs, using the Feedburner FeedFlare API:
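FeedFlare plumbing aside, the heart of pre-search is tiny: turn the blog post’s own title into a ready-made related-content query, so the search has effectively been run before the reader asks. A hedged sketch (the function name and the choice of search URL are illustrative, not the FeedFlare API):

```python
import urllib.parse

def presearch_link(post_title, search_base="http://www.google.com/search"):
    """Turn a post's title into a pre-built 'related content' search link --
    retrieval that happens before the reader thinks to ask for it."""
    query = urllib.parse.urlencode({"q": post_title})
    return search_base + "?" + query
```

A FeedFlare unit would then simply render this link beneath each syndicated post.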

06.15.07

Implicit, automatic, passive, or whatever you call it, this form of content analysis is starting to be recognized as a powerful tool in both a business and consumer/prosumer context. Companies like Adaptive Blue are using automatic content analysis techniques to personalize and improve the consumer shopping experience, while others like TapeFailure are leveraging this technology to enable more powerful web analytics.

Content analysis takes clickstream processing one step further, providing a much deeper level of insight into user activity on the Web. By peering into web page content (in the case of Adaptive Blue) or user behavioral data (as with TapeFailure), all sorts of new & interesting capabilities can be provided to both end-users and businesses. One capability that I’ll focus on in this post is automatic republishing.

Automatic republishing is the process of taking some bit of consumed/created information (a web page, mouse click, etc.) and leveraging it in a different context.

Let me give an example:

I read Slashdot headlines. Yes, I know. Slashdot is old-hat, Digg is better. Yadda-yadda. That’s beside the point of this example.

Note that I said “I read Slashdot headlines.” This doesn’t include user comments. There’s simply too much junk. Even high-ranked posts are often not worthy of reading or truly relevant. But alas, there is some good stuff in there — if you have the time to search it out. I don’t.

So this is a great example of a situation where passive/implicit content analysis can be extremely useful. Over the course of building and testing my company’s AlchemyPoint mashup platform, I decided to play with this particular example to see what could be done.

What I was particularly interested in addressing related to the “Slashdot comments problem” was the ability to extract useful related web links from the available heap of user comments. Better yet, I wanted to be able to automatically bookmark these links for later review (or consumption in an RSS reader), generating appropriate category tags without any user help.

What I ended up with was a passive content analysis mashup that didn’t modify my web browsing experience in any way, but rather just operated in the background, detecting interactions with the Slashdot web site.

When it sees me reading a Slashdot article, it scans through the story’s user comments looking for those that meet my particular search criteria. In this case, it is configured to detect any user comment that has been rated 4+ and labeled “Informative” or “Interesting.”

Upon finding user comments that match the search criteria, the mashup then searches the comment text for any URL links to other web sites. It then passively loads these linked pages in the background, extracting both the web page title and any category tags that were found. If the original Slashdot article was given category tags, these also are collected.

The mashup then uses the del.icio.us API to post these discovered links to the web, “republishing them” for future consumption.

Using an RSS reader (Bloglines), I subscribe to the del.icio.us feed generated by this mashup. This results in a filtered view of interesting/related web links appearing in the newsreader shortly after I click on a Slashdot story via my web browser.

This is a fairly basic example of content analysis in a user context, but does prove to be interesting because the entire process (filtering user comments, harvesting links from comment text, crawling any discovered links, extracting title/tag information from linked pages, and posting to del.icio.us) happens automatically, with no user intervention.
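The non-network parts of that pipeline fit in a short sketch. The `posts/add` endpoint shown is del.icio.us’s published v1 API; the comment-tuple shape and function names are my own illustration, and real use also needs HTTP authentication and polite crawling of the discovered links.

```python
import re
from urllib.parse import urlencode

def harvest_links(comments, min_score=4, wanted=("Informative", "Interesting")):
    """From (score, label, text) comment tuples, keep highly-rated comments
    with the desired labels and pull out any URLs they contain."""
    links = []
    for score, label, text in comments:
        if score >= min_score and label in wanted:
            links.extend(re.findall(r"https?://\S+", text))
    return links

def delicious_add_url(url, title, tags):
    """Build the del.icio.us v1 'posts/add' request that republishes a link.
    (The real call is sent with HTTP auth; omitted here.)"""
    params = urlencode({"url": url, "description": title, "tags": " ".join(tags)})
    return "https://api.del.icio.us/v1/posts/add?" + params
```

Everything between these two steps (fetching each link, scraping its title and tags) is ordinary crawling, which is why the whole chain can run unattended.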

I think we will see this type of automatic prefiltering/republishing become increasingly prevalent as developers and Internet companies continue to embrace “implicit web” data-gathering techniques.