Thoughts on Open Source, Analytics

Monthly Archives: June 2005

There have been some posts about this recently, and there’s an official FAQ on OTN. Most Oracle employees are quite excited that JDeveloper is now “free” and expect tons of developers to jump on board and put a lot of momentum behind the tool. The sleight of hand that occurred (in the same breath, I might add) is that Oracle has completely decimated the biggest selling point of JDeveloper: a free runtime license for ADF and a choice of application server and database!

Yesterday you could pay Oracle the modest 1k for JDeveloper seats and you got “productivity with choice.” You were paying for features for productive development, and this included ADF, since that is how JDeveloper delivers most of its “productivity.”

Today you get JDeveloper for free and you get “productivity with hooks.” The “productivity” of JDeveloper (heavily based on ADF) is now runtime licensed. In essence, the FREE JDeveloper license is worth far less than the 1k developer seat since ADF now must be licensed in production.

I don’t use JDeveloper for projects; I’ve only fired it up occasionally to measure progress and to check in with my old hat (I used to be a Java developer). So I suppose I’m not really qualified to rant, but I find it disturbing that Oracle feels free to change its licensing at will after organizations have chosen the product with its licensing in mind. I understand you can actually “choose” to deploy ADF to other J2EE servers and databases, but now that there’s a runtime license for ADF you might as well run Oracle AS/DB. I see the business wisdom in this new “loss leader,” but it’s crap, I say. What happened to paying for what you want and getting fully functional products?

Perhaps this will be more salient to bayon blog readers: What if Oracle were to license the OWB runtime engine, out of the blue? You can have the OWB seat, but now you have to pay per CPU to “run” the code. If someone had chosen OWB over Informatica because of licensing issues, what a blow for their investment!

Perhaps those at Oracle can change my mind and help me understand why JDeveloper customers are not actually getting the shaft in the guise of “a free lunch.” I also don’t get why no one else has pointed this out yet. Am I way off base here? Send an email through to me to tell me if I’m bonkers.

It was one year ago today that the bayon blog was born, with a post about simple beginnings. It apologized in advance for misspellings and hasty or inaccurate assumptions, which I think was well placed; my high school English teacher would cringe if he were to read my posts. The community has been quite accepting of my inattention to these details, and I thank them for allowing me to be a little sloppy in this area.

There have been 72 entries in 365 days, which means I post something use(less/ful) roughly every 5 days.

Mostly I post about Oracle. Here is the breakdown of entries by category (Oracle = 40, Open Source = 12, Tech Industry = 10, Professional = 10, General BI = 6, Grid/Distributed = 2). Notice the sum surpasses the total because posts can be assigned to multiple categories; an issue familiar to dimensional modelers and quite challenging to boot.
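For the curious, the arithmetic behind those two claims can be checked in a few lines. This is only a sketch using the counts quoted above; the variable names are mine, and the "extra assignments" figure assumes every post carries at least one category.

```python
# Category counts as quoted in the post; a single post may carry several categories.
category_counts = {
    "Oracle": 40,
    "Open Source": 12,
    "Tech Industry": 10,
    "Professional": 10,
    "General BI": 6,
    "Grid/Distributed": 2,
}
total_posts = 72
days = 365

# Average gap between posts, in days.
days_per_post = days / total_posts

# Summing per-category counts counts a multi-category post once per category.
category_assignments = sum(category_counts.values())

# Assignments beyond one per post (assumes each post has at least one category).
extra_assignments = category_assignments - total_posts

print(f"One post every {days_per_post:.1f} days")
print(f"{category_assignments} category assignments across {total_posts} posts")
print(f"{extra_assignments} assignments beyond one per post")
```

This is the same double counting a dimensional modeler hits with a multi-valued dimension: adding up the category rows overstates the number of posts unless the overlap is accounted for.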

Traffic is steadily increasing… 10k page views per month is starting to breach respectable traffic! Which is thanks to…

I’ve used RealVNC for quite some time, and find that it is a quick and easy method for occasional remote access. Did I mention it’s free?
Since I typically only need it for Windows machines I had only used the Windows VNC Server version. With Linux, “ssh -l user -X myhostname.company.com” would usually suffice. The Windows version polls to check for updates to the window, or screen, or underneath the mouse, etc.

I’ve been duly impressed by the headless VNC server that comes with my White Box Linux. I was expecting a similar experience, with delays, pixelation, and screen refresh issues, but I’ve experienced NONE of that. When I full-screen my VNC client I notice little difference from being at the console. Anyhow, I just thought I’d mention that the headless Linux VNC is much, much better than the Windows polling VNC. Happy network computing!

Business Intelligence software, databases, and their supporting hardware are expensive. I mean really, really expensive (hundreds of thousands to millions of dollars). Many people working in the Business Intelligence/Data Warehousing fields have seen their “operational application” colleagues adopting open source solutions (Linux, JBoss, Eclipse, Apache, etc.) but have seen little attention paid to the software required to build and deliver Business Intelligence. That is beginning to change.

I’ve blogged about this before, specifically my experiences with downloading and testing Mondrian, an open source ROLAP server written in Java. It appears that projects suitable for BI are gaining momentum and maturity in the Open Source (OS) world. I’ve felt for some time that the open source community had not embraced BI in quite the same way it has other applications of technology. BI is, in earnest, a technology stack to make bigger companies bigger and smart companies smarter. While these precepts aren’t in opposition to open source ideals, they aren’t what typically motivates communities of developers to band together to make software for free (i.e., change the world, provide a framework used by 10,000 websites, etc.).

The state of open source BI was relatively slim not too long ago. There were a variety of possible tool sets one could use for ETL (Clover, Enhydra Octopus), some initial OLAP components (Mondrian, JPivot), some portal frameworks for dashboards (JetSpeed, JBoss Portal), and some databases with maturity for DW situations with smaller volumes (MySQL, Postgres). Things have been heating up this past year, and we should review what’s going on in the Open Source BI realm. The lede is buried, so make sure you check out Pentaho at the bottom.

CA’s Open Source release of Ingres
Albeit under a funny OSI-approved license (there are many provisions which will scare away the OS purists, and make others at least think twice about including it in their products or services), Ingres is officially open source and free. Ingres has some pretty significant “enterprise” features including replication, partitioning, and “in the works” Linux clustering (a la RAC). This is great news because Ingres is a rather mature database and is better suited for large DW volumes than MySQL and Postgres. It is noticeably (and perhaps critically) lacking the vibrant community required to increase uptake. At this point it feels like CA is still the only one “interested” in Ingres. This might change, but I believe the funny CATOSL has hindered acceptance from open source communities.

Netezza/DATAllegro are using open source
These two providers of DW appliances are using open source databases as part of their solution. It’s a mixed technology stack, which means that unless you purchase the appliances you will benefit from none of the work that these two companies have put into their implementations. One uses Postgres, the other uses Ingres. There must be quite a bit of technology surrounding it to make it actually work for corporate DW environments. Netezza is actually doing rather well I believe, and some of the bigger vendors are starting to “see them on the radar” as a player in the space.

GreenPlum (aka Metapa) takes another shot
When Metapa wasn’t getting traction marketing its inexpensive proprietary clustered DB implementation, it figured it needed something more. Open Source is powerful enough that even a few years into the hype it still attracts attention. They relaunched themselves as an Open Source solution and are sponsoring the BizGres project (a few extensions to Postgres that are useful for BI environments), along with allowing the single-instance version of their product to be used for free. I don’t think they’ll get the OS community embrace they desire, because people are discerning these days: the only interesting work GreenPlum is doing is related to their MPP and shared-nothing clustering technology, which is very much NOT open source. They are only opening their kimono an inch, not even to a halfway mark.

Mondrian/JPivot releases
These two projects underwent new releases this year that gave the most visible part of an open source DW/BI system its legs. While not comparable to commercial OLAP interfaces, they are certainly suited for ISVs and developers to embed in their applications. These are great components for including in a project, and if your report consumers don’t really care to write their own reports (a la graphical report builder) and just want to pivot and page, this could be an excellent, inexpensive solution.

BIRT and JasperReports are actually pretty good
Two commercially backed (one by Actuate, the other by JasperSoft) projects that are building the basis for business quality reports. Don’t turn off your Crystal installation yet because these both have a way to go, but they’re improving at a steady pace.

Pentaho Nation
This is truly the most exciting thing I’ve found in the Open Source BI space, and they’ve just begun their work so I’m running on faith at this point. Industry veterans who are passionate about BI and open source have pooled their minds and money (they’ve made $$ from previous entrepreneurial activities) to build a pure, 100% open source distribution for BI. They are collecting various open source projects, building their own components and releasing the whole thing as open source. A partial list of the projects they are planning to include (no official distro yet): Mondrian OLAP server, JPivot, Firebird RDBMS, Enhydra ETL, Shark and JaWE, JBoss, Hibernate, JBoss Portal, Weka Data Mining, Eclipse, BIRT, JOSSO, Mozilla Rhino.
The company will follow in Red Hat’s footsteps and make money on support, training, and consulting. Their plans are ambitious, but they are focused on assembling and configuring all these disparate projects into a comprehensive platform that will be at least comparable to the “big boys” at Hyperion, Cognos, Microstrategy, etc.
They are engaging the community, clearly understand the need in the space, and are committed to the ideals of getting paid for solutions instead of software. They are certainly strong in the presentation, dashboard, BPM/workflow, and OLAP end of the spectrum but don’t appear to be including much in the ETL/DW end (there is some, but it appears to be for data movement and loading as opposed to building a DW). I’m not sure if it’s strategic or not, but it might make sense. Most people adopting an open source BI platform for their reporting users will feel comfortable rolling their own ETL/DW for the backroom. It should also be noted that they haven’t made any releases yet, so what we’re seeing is all conceptual for now, but they’ll be rolling something out sometime in 2005. It appears as if the founders have a track record of “doing what they say they’ll do.”

What does this all mean?
There are three things that will happen as the Open Source and BI worlds start dating.

Hardly anything for your current BI project and technologies. It is still emerging and is just now being utilized by early adopters.

Cost pressure on the “big boys” will occur as the maturity of these components provides at least comparable options. The small number of vendors and their constantly increasing prices will show up as an area to be trimmed (ironically enough, probably in a financial report produced by the software in question). I don’t believe the impact will be significant, but there will be a small impact over the next 3-5 years. It will also affect prices of BI OEM and the inclusion of BI capabilities in vertical applications (more BI in existing products).

Increased adoption of BI at small and mid-sized businesses that can now afford to enter the BI space. Previously inhibited by the exorbitant software costs, businesses can now spend a few thousand dollars to start their foray into BI.

Netflix has been using the old Amazon trick for a while now. You know, people who liked book X also like this rubber spatula. Infamous and lucrative…

Netflix has taken this to a whole new level. They’ve mined the suggestion data, cross-referenced it with my friends’ data, and provided me recommendations based on what my friends liked and what I like. As if that wasn’t hip enough, they’ve turned this mined information into an interactive experience.

Two things that keep people engaged in your site: relevant content (mined) and interactive media (participation). Netflix hit this one out of the park, in my opinion!
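To make the idea concrete, here is a toy sketch of friend-based recommendations. It is not how Netflix does it (I have no idea what they run); the ratings, names, titles, and the `recommend` function are all made up for illustration, and the scoring is simply "how many of my friends rated this highly."

```python
from collections import defaultdict

# Hypothetical ratings on a 1-5 scale; every name and title here is invented.
ratings = {
    "me":    {"Film A": 5, "Film B": 4},
    "alice": {"Film A": 5, "Film C": 5, "Film D": 2},
    "bob":   {"Film B": 4, "Film C": 4, "Film E": 5},
}
friends = {"me": ["alice", "bob"]}


def recommend(user, ratings, friends, min_rating=4):
    """Rank titles the user has not rated by how many friends rated them highly."""
    seen = set(ratings.get(user, {}))
    scores = defaultdict(int)
    for friend in friends.get(user, []):
        for title, rating in ratings.get(friend, {}).items():
            if title not in seen and rating >= min_rating:
                scores[title] += 1
    # Most-endorsed titles first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda title: (-scores[title], title))


print(recommend("me", ratings, friends))
```

A real system would weight friends by how closely their tastes match mine rather than counting votes, but even this crude version shows why the friends list makes the mined data feel personal.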

As I look around my office I see a few rack servers, desktops, and a couple of laptops… Mostly Dell, some off-brand, and my Gateway 200ARC notebook. I am 100% certain that the notebook will be the last piece of revenue I ever contribute to Gateway.

I am, apparently, not the only one who feels this way, as Gateway has been losing money (and revenue) the past three years. Based on my latest experience with them this is not only likely, but desirable. Gateway’s product support and customer support were the worst I’ve received from ANY software or hardware vendor, ever (I’ve worked in IT since 1995, so that might tell you something). No company that performs as poorly as Gateway did with me should thrive in economies of healthy competition. If Gateway makes it, Adam Smith will roll over in his grave and demonstrate the “Doh” made famous by one Homer Simpson.

If readers aren’t interested in descriptions of poor reasoning, bad business sense, baffling company policies, and rude customer service agents, then stop now! If you are curious how a well-known brand can make a series of mistakes that will cost them a customer (and perhaps many more, if warnings from this blog are heeded), read on.

It’s quite simple really; the 200ARC was manufactured with a defect that allows the hard drive to become disconnected. This means that after a business trip, one needs to whip out a screwdriver and reseat the drive. No biggie, unless it happens when it’s sitting on your desk running XP. This happened to me last Friday and corrupted my XP operating system.

After four calls to Gateway support (all with relatively helpful CSRs who did well executing a series of troubleshooting procedures) it was clear what had happened. Due to the defect, the drive was partially corrupted but no physical damage was done. So my data was safe, which should be a good thing… unless you are talking to Gateway.

Since no physical damage occurred, the solution is to use the recovery disk. Most reading this are tech savvy, so you know what this does: completely wipes out your hard drive. Complete loss of data (which for me meant rolling back to a full backup from 2 weeks prior). That was not acceptable, as I knew (from a quick Linux CD boot) that “My Documents” was intact and not corrupted.

So, I need a way to get going on the recovery while still being able to recover the data files that have changed in the past two weeks. I need an identical drive that I can use to reinstall, and I’ll recover the files from the other drive over the next couple of days.

Nope… No replacement drive. Their faulty product caused this, and I am covered by their highest-tier, full-coverage, accidental, super-duper warranty program (which cost an extra $200 USD at purchase). I can’t get Montie (badge CA358) to send me a drive. How about a loaner? Just send me a drive and I’ll send you mine (which you assure me is just fine) in 30 days? Nope, no way… Data corruption is not Gateway’s problem, even if it was caused by our product (that was an actual quote).

So here I am with a major pain in the butt situation, and a $50 hard drive can mitigate a huge customer concern and some knucklehead thinks it’s more important to value a corporate guideline than a customer relationship.

That’s what sealed the deal: Gateway is clearly unaware their business is about more than parts, UPCs, and customer support procedures. Dell gets it. The newest Dell commercials have nothing to do with the amount of RAM in their computers; they show a guy calling Dell support to make sure Dell will value him as a customer and help him do the things that matter to him (email Aunt Maude and browse oprah.com).

I can’t stress this enough to those reading: please heed a warning from a knowledgeable purchaser of technology (notebooks, desktops and servers). Gateway doesn’t get it, so buy only if you really want the hardware and that’s it!

There is a relatively happy ending, and I love the fact that it is these sweet words: Linux saved the day! 🙂 After a simple download and ISO burn of Trinity Linux, I was able to “scp” all of my files to a server and then proceed through the process of wiping out my hard drive per Gateway procedures.

My parting advice is to download this small bootable CD and burn it right now. Its simple memory-only boot, file system access, and network support are the convergence of exactly what one needs for recovery situations.
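If you only need the files that changed since your last backup (two weeks, in my case), a short script can list them before you copy anything. This is a sketch, not what I actually ran: it assumes Python is available wherever the old drive is mounted, and the mount point below is a hypothetical path.

```python
import os
import time


def files_changed_since(root, days):
    """Yield paths under root whose modification time falls within the last `days` days."""
    cutoff = time.time() - days * 24 * 60 * 60
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) >= cutoff:
                    yield path
            except OSError:
                # Unreadable entries are skipped; on a damaged disk they deserve a closer look.
                continue


if __name__ == "__main__":
    # "/mnt/windows/My Documents" is a hypothetical mount point for the old drive.
    for path in files_changed_since("/mnt/windows/My Documents", days=14):
        print(path)
```

The resulting list can then be handed to whatever copy tool you trust, scp included.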