Cuil was downloading our robots.txt about 1000 times for every actual page they downloaded. Due diligence is great, and, yes, you never know when we might change the robots.txt file and block search engines from something, …but… Well, we hardly see Twiceler at all anymore - but at least they got us to make the fetching of robots.txt as minimally taxing on our resources as we possibly could [which is pretty minimal].

Bing has indexed about 500 pages, last we checked, out of several million in our sitemap. The SEO experts say Bing focuses on and ranks by the age of a website - that 11-year-old sites beat out measly old 10-year-old sites. Which just means that Bing will be irrelevant with respect to Mattters.com [which is new] for the remainder of my lifetime [and I’m old, but not THAT old]. I understand the 80-20 rule, and letting Google do the heavy work and index EVERYTHING while Bing goes for the cream, but this particular approach is flawed [certainly in our case! :-)].

As for Yahoo: Slurp has a love affair with 404s [some kind of 404 fixation issues, numerology experts please step up], which kind of sucks because each one has to go all the way to our backend. On top of that, their site registration is broken. But no worries - they are about to merge with Bing, with some of the best WebMaster information-tracking control panels around…. [ :-), that is irony AND sarcasm - just FYI].

Meanwhile, AOL seems to be actually working albeit on a reduced scale, as do numerous small fry. And, of course, Google. One of the goals of Mattters is to replace search engines for various specific tasks related to people following and enjoying their interests as opposed to, oh, I don’t know… Writing a paper for school or something. I think we have a chance here.

Alexa Problems
http://gui.net/blog/2010/08/19/alexa-problems/
Thu, 19 Aug 2010 21:22:25 +0000, by Michael

These problems seem to have been universal - though I have found nothing written up about them. Hmmm….

It all started about 2 weeks ago when the Alexa dailies started swinging wildly in the middle of the week for many of our comparable websites [not really our competitors, just those we keep an eye on], and then finally us as well. Usually the chart dips on a holiday or weekend for most websites - rarely on a weekday.

Then they stopped updating altogether. The charts were frozen and the ranks were frozen. OK. They are working on a problem.

Days go by. What was it, 5? 6? Then they updated the charts for a few of the first days that had no data. And then they stopped updating the chart again for a few MORE days. Finally, about 3 or 4 days ago, they caught up to their usual 3-days behind daily chart updates and then, about 2 days ago, the ranks started updating again.

Now, today, they have removed yet another country. First it was Hong Kong. Now New Zealand. Hong Kong was our number 2 country at the time and New Zealand was our number 1 country yesterday. Funny how that works. Either they are consolidating smaller countries into larger ones - or there is a bug that they just fixed that caused New Zealand, for example, to score way too high sometimes [but our Australia channels are some of our most popular].

Don’t know. But it looks like Alexa might be Baaaack now. For all those that decry its accuracy - we have found that the more sophisticated laymen, or unsophisticated experts, rely on it heavily and use it to judge a website and whether they take the website [i.e. our website] seriously or not. So it is not so much an SEO thing as a marketing tool.

Alexa, Compete, Quantcast - Idiosyncrasies Oh My
http://gui.net/blog/2010/07/30/alexa-compete-quantcast-idiosyncrasies-oh-my/
Fri, 30 Jul 2010 19:58:41 +0000, by Michael

This post was in large part prompted by Alexa making a change that made the Mattters chart dip a bit - but I’ve been meaning to write this for a long time - as sort of a personal (public) cheat sheet. The change Alexa made, BTW, was to drop Hong Kong (perhaps to fold it into China?) from their international rankings from which they calculate the global Alexa Ranking. Hong Kong was Mattters’ 2nd most popular country (the US is 1st).

We’ll start with Alexa. As you all know, it is based on totaling up the number of visits to a website from visitors who have the Alexa toolbar installed, i.e. SEO and SEO-savvy visitors, and from these visits it extrapolates the total number of visits to that website.

The Alexa rank is a comparative rank - which means that since everybody lost Hong Kong, the rank should on average remain unchanged. Except in reality those that had the most traffic from Hong Kong suffered, and those that had the least must have gotten a little spike in their charts.

This might make more sense if you consider a case, for example, where your traffic may go down quite a bit on weekends. If, on average, everybody else’s traffic went down MORE, your Alexa Rank will then actually go up on weekends.

Quantcast is said to be a completely accurate measure of visits - at least I have never been able to find anyone who questions this (and I have LOOKED!). But in reality it favors… well what it does is it discounts visits from Google [perhaps to the point where it just out and out ignores them]. It does like visits from Yahoo quite a bit - so it is not discriminating against all search engines, just Google. At least this is what I have observed experientially, comparing it against all other sources of data at our disposal. [Also, not sure about Bing (more on bizarro search engine behavior in a different post) or other - from our perspective at this point - minor player search engines].

Compete has one of the lowest numbers for number of visits of any reporting tool we use - but it APPEARS to be largely consistent in its under-counting - can’t really tell though except by comparing it to what other reporting tools say about relative rankings of websites. Funny thing about Compete - the free reporting tools they offer, I mean - is that they update the numbers only once each month and around the middle of the month - so around the 15th I imagine everyone [us anyway!] starts going over there to see if the new numbers are in. This month they didn’t get updated until the 25th or so. Ugh.

Oh, the link to the Compete reports is (took a while to find this - back in the day):

NGINX Configuration Files and ReWrites
http://gui.net/blog/2010/07/04/nginx-configuration-files-and-rewrites/
Sun, 04 Jul 2010 18:16:35 +0000, by Michael

Some of this post will just be random notes, some will be more descriptive of just what madness occurs inside one of these files. All of it is for version 0.7.63.

1.

ancient_browser "msie 6.0";

Whether quoted, like on the nginx site, capitalized or not, or unquoted as it is here:

(linked to from the main site), the instruction: if ($ancient_browser) is true for the google bot. If you want your site crawled this ain’t so good. In fact, horrible.
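A possible way around this, as a sketch only: the browser module documentation describes a special value, unlisted, for modern_browser, which treats any browser that matches neither list as modern instead of ancient. That would explain why a crawler trips the test. Whether 0.7.63 behaves the same way is an assumption, and the old-browser page below is a made-up name.

```nginx
# Sketch: browsers that match neither list are treated as modern,
# so crawlers such as Googlebot should no longer set $ancient_browser.
modern_browser unlisted;
ancient_browser "msie 6.0";

location / {
    if ($ancient_browser) {
        # hypothetical page shown to old browsers only
        rewrite ^ /old-browser.html break;
    }
}
```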

2.

As mentioned before, try_files appears to do nothing at all. Instead one can use if (-f filename). This generates an error for each try in the error.log.
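For illustration, the file test usually ends up looking something like the following. The static root and the backend address are invented for the example, and this is just one common shape for it, not necessarily the only one.

```nginx
location / {
    root /var/www/static;              # hypothetical static root

    # If no file with this name exists on disk, hand the request to the backend.
    if (!-f $request_filename) {
        proxy_pass http://10.0.0.2;    # hypothetical backend address
        break;
    }
}
```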

3.

When performing a break at the end of a rewrite, or in an if (…) { break; }, what this does is send the process back to the top of the configuration file again, this time with the new modified $filename_uri ($request_uri is not changed by rewrites, though you can assign it a new value, if I remember correctly).

4.

Our backend is an apache on a different ip. We needed to use:

resolver xx.xx.xx.xx;

and the x ip worked whether we used the ip of the backend, or ips found in /etc/resolv.conf. Nginx did seem to need this instruction regardless.

5.

For speed (and in our case to make things work) we added several lines like:

which are called quite frequently and the ‘=’ comparison is reported by nginx to be the fastest and first comparisons made when a url is received.
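As an illustration only, exact-match locations look something like this. The paths, the static root and the backend address are all made up for the example.

```nginx
# Exact-match locations for URLs that are requested very often.
location = /robots.txt {
    root /var/www/static;          # served from disk, never touches the backend
}

location = /favicon.ico {
    root /var/www/static;
}

# Everything else goes to the backend.
location / {
    proxy_pass http://10.0.0.2;    # hypothetical backend address
}
```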

6.

The real problem with Nginx programming is the inability to AND more than 2 things together. Even 2 can only be done by specifying a Location for the first part of the AND and, inside that Location, using an ‘if’ statement for the second.

As a stupid example:

location ^~ /images {
    if ($request_uri ~ \.gif) {
        …
    }
}

The only way to AND 3 or more things together is to create ‘faux’ locations, and redirect 302 to the ‘faux’ location in which the 3rd AND, using an if statement, can occur. More contrived examples…
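A rough sketch of the faux-location trick, assuming a made-up /faux-gif/ prefix, a made-up third condition (the user agent) and a made-up image directory:

```nginx
# Conditions 1 and 2: URL prefix AND .gif extension.
location ^~ /images {
    if ($request_uri ~ \.gif) {
        # 'redirect' makes nginx answer with a 302 to the faux location
        rewrite ^/images/(.*)$ /faux-gif/$1 redirect;
    }
}

# Condition 3 is tested inside the faux location.
location ^~ /faux-gif/ {
    if ($http_user_agent ~ MSIE) {
        return 403;                # hypothetical action when all three hold
    }
    alias /var/www/images/;        # otherwise serve the real file
}
```

The cost is an extra round trip to the client for every request that passes the first two conditions.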

In Apache, one just has to add as many RewriteCond lines as one cares to. Very easy. Nginx’s overly constrained environment is kind of fun, in a masochistic, how can I actually do this, who cares how many lines of code and contortions it takes… can it even be DONE? kind of way.

Finally figured out that the issue here is that there is no abstraction of the concept of ‘variable’ to include both variables and aliases. So everything depends on context and which command is being used - which creates redundant functionality and unpredictability.

Same for location etc.

This lack of ability to use and apply abstract concepts and constructs leads to a ton of ’special cases’ and ’special purpose’ one-off code. Very little is generalizable. And this seems to be OK and the way many people [I would call them junior] think - for example the recommended code to remove a leading www. from urls happens to remove ALL subdomains as well. I.e. special purpose but not narrowly defined. Oh boy.
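A narrower version, sketched with a hypothetical domain, is to give www its own server block so that nothing else can match it:

```nginx
# Only www.example.com is redirected; other subdomains
# (blog.example.com and so on) are left alone.
server {
    listen 80;
    server_name www.example.com;
    rewrite ^(.*)$ http://example.com$1 permanent;
}

server {
    listen 80;
    server_name example.com;
    # ... normal site configuration ...
}
```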

No versions associated with the documentation, so you never know that ‘try_files’ does not work, for example, until you try it. Wasting some time.

Oh and what happened to plugins? This compiling in modules at compile time seems a bit … unwieldy. And no, it can’t be for speed… so… yes, generic plugin APIs are HARD [it is the ONLY thing that Eclipse got right (the code, the SWT, GED, it all sucks) and boy did they get it RIGHT. So it is not only hard, it can make a product very successful as well].

The pluses here, though, are pretty good error reporting [way better than Varnish so far] even though the error log appears to be missing a lot of debug information [I must be doing something wrong, where ‘notice’ and ‘debug’ have the same output]. I also presume that the reliability at its primary job of serving up data to clients is pretty high, based on its reputation.
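One possible explanation for the identical output, offered as a guess about this setup: nginx only writes debug-level messages when the binary was built with the --with-debug configure option, and nginx -V shows the options a binary was built with. The log path below is hypothetical.

```nginx
# Debug-level logging; has no extra effect unless nginx was
# configured with --with-debug at build time.
error_log logs/error.log debug;
```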

Anyway, based on the popularity of all these newish web servers Mattters has changed the Apache channel to the Web Server Channel and we will add more links and support for Nginx, Varnish, Lighttpd, etc. over time.

A Tweet classification scheme
http://gui.net/blog/2010/06/06/a-tweet-classification-scheme/
Sun, 06 Jun 2010 20:32:04 +0000, by Michael

When writing software to look at the content of posts on the internet, on blogs and on Twitter, one can just use keyword and keyword context metrics, or one can be a little smarter.

Using somewhat heuristically based algorithms, one can classify blog posts and tweets, and in particular, tweeps, into several common categories for the purpose of assigning a relevance and usefulness level to a poster of information, in particular to other people outside their inner circle.

1. Chit chat: communicating with friends

2. Promotion: most of the posts are links to a particular product or blog

3. News disseminator: many links to stories found in many places on the net

4. Observer: observations and comments about what is happening

Some people have posts/tweets that fall into several of these categories, and often they will in some large part mimic the others in their inner circle.

One can’t help but observe that the number of followers / readers (aka success) typically rises with the number of the category on this list (i.e. observers / pundits get most of the traffic and interest on average).

This success function is easy to see with, say, Justin Bieber’s tweeps, where just a few tweeps post observations and comments - and they are also the most popular tweeps.

The Tweeps for Wine are more balanced, many chatting a little, but mostly observing and disseminating news. The High-end Audio tweeps, however, are mostly promotional, with just a little news dissemination thrown in.

The ontology of the software universe
http://gui.net/blog/2010/05/17/the-ontology-of-the-software-universe/
Mon, 17 May 2010 16:56:42 +0000, by Michael

Organizing the discussions in the online software universe into some kind of hierarchy is a difficult and yet well-understood and frequently undertaken task.

and in general the sites themselves are implicitly organized (separated) into this scheme.

Software is one of the most prevalent topics on the internet; lots of data points here :-)

Website Traffic Analytics and the Stock Market
http://gui.net/blog/2010/05/10/website-traffic-analytics-and-the-stock-market/
Tue, 11 May 2010 04:17:58 +0000, by Michael

Being one of the few to get caught up in the late 90’s stock market bubble (ha ha ha … Not), I learned a lot about reading stock charts.

I am forgetting some of the terminology after 10 years (yay!) but can still read a chart: Bollinger bands, daily, weekly and monthly trendlines, gaps and filling gaps, retracements… All of this knowledge can be applied to your website statistics.

Not sure if this helps with SEO necessarily. More likely it tells you when to make a major push on your online marketing campaign because, for example, your highs and lows are converging to a single value and your traffic is going to gap down or up real-soon-now. And you would most likely want it to go UP!

And, finally, an idea for all you website/twitter analytics start-ups… try using some of the well-known stock chart visualization techniques (Yahoo Finance is a really great site for examples). A great differentiator… and useful too :-)

To support smart phones or not to…
http://gui.net/blog/2010/05/05/to-support-smart-phones-or-not-to/
Thu, 06 May 2010 04:05:16 +0000, by Michael

As many have noted, first it was Microsoft IE that was an obstacle in the write once, run everywhere dream of every hard working programmer programming their heart out.

And we thought THAT was bad. Ha!

Now we have Apple, with its (yes, it is theirs, who else would want it?) horrifically antique Objective-C giving us yet another hurdle with the iPhone SDK. OK, so write twice run everywhere, right? We just have to delay every release, reduce overall functionality, and realize that Objective-C is for baby apps, and not try anything that is real and substantial with it if we want our sanity to survive until 2112 (when we all die anyway, right? :-))

Then we have Google’s Android and Android development and Java (better than Objective-C, by a million miles of gray hair, yes indeed, but still growing a little moldy on the vine) but at least they allow cross compiling (unlike Apple).

So the write once, write twice, uh write four times, run a lot of places paradigm is kaput.

Small developers just do not have the resources to develop for four platforms. So they will have to pick one. Let’s see…. the internet? Kind of crowded with 100M’s of competitors. The smart phone? Like the CD ROM apps before it, all the baby apps have been done to death - the next gen smart phone apps will have to have a substantial back end that provides some kind of interesting functionality (i.e. not Foursquare).

So that leaves the iPad. So all the small developers have to develop for it because they at least have a chance [and it is cool and fun, too, which doesn’t hurt :-)]. And large developers have the resources to throw money at everything. But how about the little guy who has already started developing for platform X - what do they do? They can’t afford to add support for yet another platform. They bite their fingernails.

iPad development anxiety
http://gui.net/blog/2010/05/01/ipad-development-anxiety/
Sun, 02 May 2010 04:50:23 +0000, by Michael

We want to but we don’t want to.

It’d be fun. But the 1.0 versions of Apple products look so anemic in the rear-view mirror.