Archive for the ‘videos’ Category

We haven’t been doing much regular blogging lately, but we’re hoping this will change in the coming weeks.

In the meantime, we’ve done some housekeeping on our website, so if you haven’t visited recently we’d encourage you to do so. We’ve updated many pages with new content, but here are two sections in particular that we’d steer you toward:

Examples Section. This is a long overdue section that puts together some quick examples of how Kirix Strata™ can be applied to common data problems. The section is still a work in progress with more videos still to be produced. However, we expect what we have now will prove useful to new and old Strata users alike. Check it out.

Video Tutorials and Archive. We’ve done a bunch of different videos and screencasts over the past year or so, but they’ve been posted all over our website. This new section wrangles all of the videos together in one place for posterity. The feature tutorials, in particular, are worth viewing as they help give a more comprehensive look at how to use specific features in Strata. Take a look.

So, in a nod to the Matrix, where one cannot be told what it is, but one must see for oneself, we’ve tried to make some high quality video documentation available. Stay tuned for more to come. Enjoy!

Benford’s law is one of those things your high school math teacher would break out on a slow, rainy day when the students’ attention span was even lower than usual.

He’d start out by asking the class to look at a list of numbers and predict how often each digit from 1 to 9 would appear as the leading digit. The students would make some guesses and eventually come to the consensus that the digits would be about equally likely, roughly 11% each.

Then, the teacher would just sit back, smile, and gently shake his head at his simple-minded pupils. He would then go on to explain Benford’s law, which would blow everyone’s mind — at least through lunchtime.

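Benford’s law says that in many real-life lists of numbers the leading digits are not evenly distributed: small digits lead far more often than large ones. This counter-intuitive result applies to a wide variety of figures, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants, and processes described by power laws (which are very common in nature).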

Boiling it down, this means that for almost any naturally occurring data set, the digit 1 will appear first about 30% of the time. Here, “naturally occurring” can mean check amounts, stock prices or website statistics. Data that is not naturally occurring would be pre-assigned numbers like postal codes or UPC numbers.
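For the curious, the 30% figure comes from the standard statement of Benford’s law: the expected share of numbers with leading digit d is log10(1 + 1/d). Here is a minimal Python sketch of that formula; the function name is just an illustration.

```python
import math

def benford_expected(digit: int) -> float:
    """Expected share of numbers whose leading digit is `digit` (1-9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

# Print the expected share for each possible leading digit.
for d in range(1, 10):
    print(d, f"{benford_expected(d):.1%}")
```

The nine shares add up to 100%, and the share for 1 is a little over 30%, compared with the 11% the students guessed.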

Besides being fun to play with, Benford’s is used in the accounting profession to detect fraud. Because data like tax returns and check registers follow Benford’s, auditors can use it as a high-level check of a data set. If there are anomalies, they may be worth investigating more closely as potential fraud.
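To give a feel for what such a high-level check might look like, here is a rough Python sketch that compares the observed leading digits in a list of amounts against the Benford expectation. It is an illustration only, not an audit procedure; the function names and the sample amounts are made up.

```python
import math
from collections import Counter

def leading_digit(value: float) -> int:
    """Return the first non-zero digit of a non-zero number."""
    # Scientific notation puts the leading digit first, e.g. '4.25...e+02'.
    return int(f"{abs(value):.15e}"[0])

def benford_deviation(amounts):
    """Map each digit 1-9 to (observed share - expected Benford share)."""
    nonzero = [a for a in amounts if a != 0]
    if not nonzero:
        raise ValueError("need at least one non-zero amount")
    counts = Counter(leading_digit(a) for a in nonzero)
    total = len(nonzero)
    return {
        d: counts.get(d, 0) / total - math.log10(1 + 1 / d)
        for d in range(1, 10)
    }

# Made-up sample amounts, far too few for a real test.
sample_amounts = [120, 1500, 13.5, 0.19, 250, 3100, 47]
for digit, gap in benford_deviation(sample_amounts).items():
    print(digit, f"{gap:+.1%}")
```

Large positive or negative gaps on a reasonably sized data set are the kind of anomaly that might merit a closer look.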

If you’re interested in further information about fraud detection using Benford’s, definitely give these two articles by Malcolm W. Browne and Mark J. Nigrini a read.

We’re happy to announce that we’ve teamed up with the good folks from Lokad to create a Kirix Strata™ forecasting plug-in, which you can use with your own time-series data.

Lokad is a company that has created some slick forecasting software and, thankfully, offers it as a web service via their API (you can also upload data directly to their site). Here’s a link where you can find lots of good information on their technology. Bottom line, they offer some great business forecasting tools at a cost-effective price. Their API was a piece of cake to work with and so we were able to quickly put a GUI on it and create the Strata Lokad forecasting extension.

Obviously, there’s quite a bit of forecasting that goes on day to day within companies. When you veer toward the largest companies, you’ll find departments dedicated to forecasting with automated processes built into their ERP systems. With smaller companies, forecasting is likely performed by someone without the word “forecast” in their job title. For instance, a warehouse manager may need to forecast inventory to make solid replenishment orders. Proper forecasting prevents the costly mistake of either overbuying (spoilage, locked-up cost of capital) or underbuying (lost sales).

However, the sweet spot for the Strata Lokad extension is ad hoc forecasting; it’s for people who have various, changing data sets and need their forecasts on-the-fly. Business consultants who provide forecasts for their clients would fall in this category. In addition, this extension can benefit sales analysts who don’t have adequate forecasting from their OLAP systems or financial analysts interested in different cash flow forecasts.

The great thing about forecasting algorithms is that they apply to a wide range of circumstances. So, if you’ve got some historical data to throw at a situation, you can get back some good results.

P.S. We’re pleased to note that this is the first extension we’ve made public that takes advantage of Strata’s web scripting capabilities, bringing a web API to the privacy and comfort of your own desktop. Got another web API you’d like to see work with Strata? Let us know.

We’re pleased to announce that Kirix Strata™ is now officially out of beta! A lot of hard work and late nights of coding have been put into this li’l tool and we hope it makes people’s lives a little bit easier when it comes to data analysis.

Thanks to everyone who has been involved in the beta process — we’ll get those free licenses rolled out to you within the week.

Also, you’ll notice a big ol’ website redesign too. We want to give a special shout out to Jeff, Benni, Peter and David for the bang up job they did with the design and implementation of the website. And, here’s our new overview video, check it out:

(NOTE: See screencast video below for a quick look at some of the new features!)

Hope everyone had a lovely holiday season!

We’re happy to report that our developers provided lots of shiny new toys in our Strata stocking over this past month, including further work on Data Links, the inclusion of a “Quick Filter” mechanism and the introduction of our new report writer. Please feel free to download Strata Beta 7 and let us know what you think!

Here’s more information on what’s new in this latest version:

Data Links

The ability to bookmark data files is coming into its own. We’ve got things working pretty well on CSV and RSS files at the moment, with some more work still to do on HTML tables. Here’s a general synopsis:

Open a CSV or RSS table from the web.

Perform your own analysis, using calculated fields or marks.

Save the data URL as a simple bookmark.

Click the Refresh icon or open up the bookmark in the future. Your data (and your calculations) will refresh based upon the new or updated data on the server.
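The idea behind the steps above is that the calculations are stored separately from the data, so they can be reapplied whenever the data is fetched again. Here is a small Python sketch of that idea. It is not how Strata implements Data Links; the function name, the column names and the sample CSV text are all hypothetical, and in practice the text would come from the bookmarked URL.

```python
import csv
import io

def apply_calculated_fields(csv_text, calculated_fields):
    """Parse CSV text and add each calculated field to every row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        for name, formula in calculated_fields.items():
            row[name] = formula(row)
    return rows

# Hypothetical calculated field, defined once and reused on every refresh.
calculated = {"total": lambda row: float(row["price"]) * int(row["qty"])}

# Stand-ins for two fetches of the same URL at different times.
first_fetch = "item,price,qty\nwidget,2.50,4\n"
later_fetch = "item,price,qty\nwidget,2.50,4\ngadget,10.00,3\n"

for fetch in (first_fetch, later_fetch):
    print(apply_calculated_fields(fetch, calculated))
```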

We’ve been finding this quite useful internally, particularly in relation to analyzing our web log data. Check out the screencast below for further info.

Report Writer

With Beta 7, we are also introducing our new report writer.

You can create your report in a design view (similar to a template) and then toggle to a layout view for a preview of what you’ll see when you print. As a bonus, the layout view enables you to manipulate and format your data directly, instead of being bound to a “print preview” mode.

Another cool thing is that, besides creating reports from data in your project, you can also create reports directly from external data, such as local CSVs or MySQL tables. (First go to File > Create Connection, then you can select it as your source data in the report writer). Check out the screencast below for a quick demo of the report writer in action.

Please note that there are a few known bugs with Report Writer in Beta 7. These include:

When using groups, the first group does not display properly.

The layout view can be extremely slow when using large files. Now that we’ve got some big features in, optimizations will soon follow.

Items in the Report Header in the design view do not display properly on the top of the page.

A few days ago, the always datariffic folks at Juice Analytics posted an article about MacGyver-ing call volume data and pushing it into an online mapping application called Mapeteria. Basically, they were doing some ad hoc data visualization composed of public web data, private phone call data and a web service that provided the visualization (which in turn used the Google Maps API).

Huh… local data, web data and web APIs? Sounds like a perfect application for a data browser (well, it would’ve been perfect if the web service accepted a POST command, but I digress). A data browser enables you to easily access web data, combine it with local data, perform any required data clean up and then push/pull data from the web — without ever leaving the tool.

It also would’ve saved Juice a bit of time, particularly with grabbing area codes and prepping that file. Let’s look at the four steps they went through and we’ll see how Kirix Strata™ might improve the experience:

1. Pull out the area codes.

The data had phone number values like “12345678901” as well as “2345678901”, so they used the following formula to pull out the area codes using Excel:

=VALUE(IF(LEFT(E7,1)="1",MID(E7,2,3),MID(E7,1,3)))

Strata would use a similar formula:

iif(left(tel,1)="1",substr(tel,2,3),substr(tel,1,3))

The main time savings here (particularly with large files) is that the calculated field populates automatically for every record in Strata, instead of needing to paste formulas. OK… not terribly exciting thus far.
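For readers who don’t use Excel or Strata, here is the same logic as a minimal Python sketch. It assumes, as the formulas above do, that each value is a string of 10 or 11 digits; the function name is just for illustration.

```python
def area_code(tel: str) -> str:
    """Return the 3-digit area code from a 10- or 11-digit phone number."""
    # Skip the leading country code "1" when present, as in the formulas above.
    return tel[1:4] if tel.startswith("1") else tel[0:3]

print(area_code("12345678901"))
print(area_code("2345678901"))
```

Checking only the first character is safe here because North American area codes never start with 1.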

2. Convert area codes into states

This is a multi-part step:

a) Locate a table from the web that has area code data associated with a state ID (while fending off parasitic scammers).

b) Clean up the table as necessary.

c) Do a lookup from the phone call data that adds in the state where the call originated from.

Strata can really cut down the amount of time spent on this step. Because of the website used, the folks at Juice surely had to create their lookup table manually. I went to Delicious, searched for “area codes” and found this very useful website, which had all the data in a nice HTML table. With Strata, I simply right-clicked and selected “Import Data” and immediately had the table I needed for the lookup.

Finally, I created a relationship between my two tables and dragged the state codes (e.g., CA, IL, NY, etc.) into the phone call data.
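Conceptually, the relationship is just a lookup from area code to state. Here is a small Python sketch of that lookup; the field names and phone numbers are made up, and only three area codes are included for brevity.

```python
def add_state(calls, area_code_to_state):
    """Return copies of the call records with a `state` field added."""
    return [
        # Unknown area codes get None rather than raising an error.
        {**call, "state": area_code_to_state.get(call["area_code"])}
        for call in calls
    ]

# A tiny stand-in for the imported lookup table.
lookup = {"212": "NY", "312": "IL", "415": "CA"}

# Hypothetical call records with the area code already extracted.
calls = [
    {"tel": "13125550100", "area_code": "312"},
    {"tel": "4155550101", "area_code": "415"},
    {"tel": "9995550102", "area_code": "999"},
]

for record in add_state(calls, lookup):
    print(record)
```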

3. Create a summary data set

This was done using a pivot table in Excel. Strata doesn’t have classic pivot tables in its feature set at this point, but it does have a nice li’l grouping utility. So, once I knew what CSV format was required for the Mapeteria web service, I grouped the data accordingly.
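The grouping itself boils down to counting calls per state and writing the result out as CSV. Here is a minimal Python sketch of that step. The column names are placeholders, since the exact format Mapeteria expects isn’t described here.

```python
import csv
import io
from collections import Counter

def summarize_by_state(states):
    """Count calls per state and return the summary as CSV text."""
    counts = Counter(states)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["state", "calls"])  # placeholder column names
    for state, n in sorted(counts.items()):
        writer.writerow([state, n])
    return buffer.getvalue()

print(summarize_by_state(["CA", "IL", "CA", "NY", "CA"]))
```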

4. Create a colorized map of the U.S.

This is the “almost perfect” part I referred to above.

Though Mapeteria is a very cool visualization service using Google Maps, it needs to fetch a CSV file embedded in a URL from elsewhere on the web. If the service was able to accept data via a POST command (or something like an “Upload Data” button), Strata would have been able to just take the table we created and push it to the web service, no csv transformation required (in fact, we’ve got some stuff cooking in our labs that would make this as easy as copy and paste). And, if we were just able to push the data out like this, we would have immediately gotten the map without ever leaving our data browser.

But, like Zach at Juice, I had to save the file in a CSV format and then upload it to a server before I was able to get my map. Here’s a screencast of the entire process… once I found the area code data on the web, it took less than 5 minutes to get my map.

If anyone wants to try this process out for themselves, please feel free to download Strata and give it a try. This data browser is in beta and completely free to use; we’re also giving away free full licenses to anyone who provides feedback during the beta period. Oh, and here is the sample phone call volume data I used for this exercise:

This is a pretty simple example of how Strata can be used for ad hoc data access and manipulation with data from the web (or, as one can imagine, within a corporate intranet) and make this kind of analysis very efficient. Throw in some web services, web APIs or very large files into the mix, and you’ve got the chance to do some fairly interesting things.

This afternoon I was doing some analysis on our web logs and thought it may make for a good screencast and blog post. We currently use a combination of AWstats and Google Analytics for our web stats but are increasingly using Kirix Strata™ to dig deeper into the raw web logs for the more customized things that aren’t readily available otherwise.

Also, honestly, it is kind of fun to plow through almost a million records on your own. Hmmm, maybe I should get out more.

The topic of the screencast below is the search terms people enter to find things in our phpBB3 support forums. These terms are embedded in the “request” field of the apache logs and I couldn’t find a way to get them without digging into the logs themselves (NOTE: I wouldn’t doubt that there is some way to do this via a mod to phpBB or a filter in Google Analytics… but since I couldn’t find anything via a quick Google search, using Strata just ended up being a lot faster).

An example of a search string we’re dealing with is:

GET /forums/search.php?keywords=proxy HTTP/1.1

So the trick was to parse the search keywords out of the field and then group them together to see what people were searching for… and in turn give us the chance to improve our support area by targeting some of these search terms and expanding our documentation accordingly.
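For anyone who wants to try the same trick outside of Strata, here is a rough Python sketch of the parse-and-group step using only the standard library. The function name and the sample request strings are made up, and this is not the set of Strata functions used in the screencast.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

def search_keywords(request: str):
    """Return the `keywords` value from a forum search request, or None."""
    parts = request.split()
    if len(parts) < 2:
        return None
    url = urlsplit(parts[1])
    if not url.path.endswith("/search.php"):
        return None
    values = parse_qs(url.query).get("keywords")
    # Lowercase so that "Proxy" and "proxy" are grouped together.
    return values[0].lower() if values else None

# Made-up sample values for the "request" field.
requests = [
    "GET /forums/search.php?keywords=proxy HTTP/1.1",
    "GET /forums/search.php?keywords=Proxy HTTP/1.1",
    "GET /forums/search.php?keywords=data+links HTTP/1.1",
    "GET /index.html HTTP/1.1",
]

counts = Counter(k for k in map(search_keywords, requests) if k)
print(counts.most_common())
```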

TECHNICAL NOTE:

I downloaded the Apache logs from the server and, due to the file size, decided to import them into Strata rather than open the file and work with it directly. To import your logs, go to Import, select text-delimited files, and then import as space delimited with quotation marks as the text qualifier. Update: You can now use a handy little log parsing extension to pull in your web log files without having to mess around with a straight text import.
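The same import settings (space delimited, quotation marks as the text qualifier) can be sketched in Python with the standard csv module. The sample line below is invented, and field positions depend on your Apache log format, so treat the index used for the request field as an assumption.

```python
import csv
import io

def parse_log_lines(text):
    """Split log lines on spaces, keeping quoted fields together."""
    reader = csv.reader(io.StringIO(text), delimiter=" ", quotechar='"')
    return [row for row in reader if row]

# Invented sample line in a common Apache log layout.
sample = (
    '127.0.0.1 - - [10/Oct/2007:13:55:36 -0700] '
    '"GET /forums/search.php?keywords=proxy HTTP/1.1" 200 2326\n'
)

rows = parse_log_lines(sample)
print(rows[0][5])  # the quoted "request" field in this layout
```

Note that the bracketed timestamp contains a space, so it lands in two fields. Quoted fields that contain backslash-escaped quotes may also need extra handling.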

TECHNICAL NOTE 2:

For posterity, here are the functions that were used in this screencast:

Well, it took a lot more blood, sweat and tears than we expected, but we’re really excited to announce our first public beta release of Kirix Strata™, the data browser.

And what, pray tell, is a “data browser”?

Well, Strata is a specialty browser that lets you access and manipulate data from pretty much anywhere on the web. For instance, Strata will let you grab HTML tables or RSS Feeds or even open up CSV files directly from a URL (wow, that’s a lot of acronyms).

Then when you’ve got the data in a table, you can do all sorts of ad hoc analysis. You can create calculations or sort and filter or create queries and reports — similar to the kinds of things you might do with a desktop database or a spreadsheet. In addition to web data, you can still work with data from your desktop or in a database system like Oracle or MySQL Enterprise.

And for those more technically-inclined, Strata also includes an implementation of ECMAScript — so anyone familiar with Javascript should feel right at home. The nice thing about the scripting is that it also includes bindings for SQL and HTTP — which can make for a lot of fun when connecting to Web APIs, creating “desktop mashups” or building extensions. And to boot, it runs on both Windows and Linux (at this moment, only Ubuntu is supported officially).

We also just want to give a quick shout out to the excellent folks at wxWidgets (we use their GUI library) and Mozilla (Strata incorporates the Gecko engine) — without which, Strata would only be a mere twinkle in our eye.

So, without further ado, check out the Kirix Strata introduction video:

About

Data and the Web is a blog by Kirix about accessing and working with data, wherever it is located. We have a particular fondness for data usability, ad hoc analysis, mashups, web APIs and, of course, playing around with our data browser.