Why bother downloading data to Excel when you have pivot tables, conditional formatting, and a huge variety of graphs with drilling/action capabilities all in OBIEE, a platform where analysis can be shared by passing a single URL instead of emailing huge XLS files?

Sometimes, however, there is a good reason to export to Excel, like when preparing a presentation on top of OBIEE data/analysis. The following are the possible ways of achieving OBIEE/Excel integration:

Dashboards and Analyses can be exported to Excel with a single click.

A BI Publisher version of a dashboard can be created and used by default when exporting.

Excel can be linked via Smart View to a single Analysis: data and visualisations can be downloaded and refreshed on request with configurable parameters.

Excel can directly query the BI Server Subject Areas via Smart View.

Excel versions of Dashboards and Analyses can be delivered by email via Agents.

As you may already know by now, it’s OTN Appreciation Day! The idea was thought up by Oracle ACE Director Tim Hall to give thanks to the Oracle Technology Network that we all love and use on a daily basis (see #ThanksOTN on Twitter).

The focus for these blog posts is not only to give thanks to OTN, but also to generate some interesting conversation around different features of various Oracle products. One of my favorite features of Oracle Data Integrator 12c is its flexibility. This isn’t necessarily a single feature of the product, but more of an overall assessment of ODI as a whole. Let me break down a few of the features that make ODI flexible and easy to adapt. Note: I actually mentioned a couple of these in my previous blog post, “Oracle Data Integrator 12c: Getting Started - What is ODI?”, but why not point them out again?

Knowledge Modules

The main goal of Oracle Data Integrator is to develop mappings that use one or more source datastores to load one or more target datastores. These mappings can have many different components that join, filter, aggregate or transform the data before it reaches its final destination. Logically, this is completed in ODI by dragging lines between boxes and configuring properties for each component.

The physical implementation of a mapping is set to use one or more Knowledge Modules to generate the code or objects required for specific loading and integration types. These KMs are code templates that use the ODI Substitution API to fill in the mapping metadata based on the logical implementation. The great part is that these KMs can be easily customized and modified to fit your data integration needs! Your environment doesn’t have to conform to the tool; you get to make the tool conform to your environment.

"But wait!", says the current ODI developers. “We can’t modify Component Knowledge Modules.” Yes, that’s a true statement. The current ODI 12c version has two types of KMs: Template and Component. The former have been in the product since inception and are easily customizable while the latter are more of a black box. But, as announced at Oracle OpenWorld last month, a brand new, highly extensible Knowledge Module framework, and the ability to edit all KMs, is coming soon!

Procedures

If Knowledge Module customization doesn’t give you the flexibility necessary to perform specific actions in your data integration project, we also have ODI Procedures. These objects are used to execute a specific set of code for any sort of task. With so many different technologies accessible via ODI, you can write nearly any bit of code to execute. Procedures are often used for exception handling, file processing, and all types of scripting via Jython or Groovy. These objects just add to the ultimate flexibility of Oracle Data Integrator.
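
Since Procedures can run Jython (a Python implementation on the JVM), here is a minimal sketch of the sort of file-handling task you might drop into one. The paths are illustrative assumptions:

import os
import shutil

# Archive processed flat files - the kind of housekeeping step often
# implemented as an ODI Procedure using Jython
src_dir = '/data/incoming'
archive_dir = '/data/archive'

for name in os.listdir(src_dir):
    if name.endswith('.csv'):
        shutil.move(os.path.join(src_dir, name), os.path.join(archive_dir, name))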

Java API

Last, but definitely not least, we have the ODI SDK Java API. Using the SDK, we can perform nearly every action that can be completed via ODI Studio. That’s a huge amount of flexibility! Now if a batch creation or modification of objects is necessary, I can write a few lines of Groovy code, execute the script, and it’s all completed in seconds rather than days of manual work. Take a look at a couple of examples I’ve posted in the past, adding columns to a set of ODI Datastores and creating Interfaces (in ODI 11g) based on a SQL query. Any chance I get, I try to use the ODI SDK to make my development life just that much easier.

There you have it, my favorite feature of Oracle Data Integrator 12c - flexibility. I hope you all have a great OTN Appreciation Day and thanks for reading! And of course, #ThanksOTN!

I've just attended my first user group in Poland, the very-excellent POUG. This was the first international version of the conference, having been run previously just within Poland. Based on these two days, I would say it was a resounding success! There was a great atmosphere there, really engaged and enthusiastic. The size of the event, friendliness, and fun made it a very welcoming one, and reminded me a lot of my first ever conference that I attended (back in 2010!), the Rittman Mead BI Forum.

I travelled out to Warsaw on the Thursday, and attended the speakers' dinner. It's always nice to see familiar faces and meet new ones - as well as enjoy some jolly good food. The next morning I went for a run at the nearby park, enjoying the sunrise over the water.

The conference had two tracks, focussing primarily on the database but with some BI content too, as well as a talk about property graphs. The first session I attended was also one of my favourites. It was "DBA, Heal Thyself: Five Diseases of IT Organizations and How to Cure Them", presented with great humour and energy by Jim Czuprynski. Some of the uncomfortable truths about mistakes made in the field of IT were laid bare, with no prisoners taken! Whilst it was based on the database, much of it was applicable to OBIEE. Things like reliance on bespoke scripts, avoidance of new features, and general ignorance in the field were covered. You can find an article on the topic from Jim here.

After Jim's session was Hans Viehmann talking about Property Graphs. I wrote an article that was published on OTN just last week about this, and it was great to get the chance to meet Hans and hear him speak.

I skipped the next session to do a bit of slide polishing and geek chat with Christian Berg, before getting lunch - which was really good:

The day finished with the POUG After Party, which was at a bar in the center of Warsaw. Good beer, good food, good music - and plenty of geek talk! I really have to take my hat off to the organisers of POUG, they did a great job.

The second day of POUG brought more good presentations. I got to see Neil Chandler speak, about execution plans and how they can vary - and how to ensure they don't unless you want them to. It was really interesting, and took me back a few years to when I last looked in-depth at this kind of stuff. After Neil was Jim again, talking about analytic functions. Most of these I knew about, but one that was new to me (and I'll definitely be using) was the PERCENT syntax for FETCH FIRST - very neat.

The audience at POUG seemed to be predominantly DBAs and database developers, and this kind of talk is just great for spreading awareness of new functionality that is going to make people's jobs easier, and their code run faster. The final talk of the morning was from Martin Widlake, presenting a great discussion about efficient and maintainable bulk processing with SQL and PL/SQL. With his very accessible and engaging presentation style, Martin's talk was an extremely pragmatic and valuable one. Everyone loves a deep-dive geekout on system internals (don't they??), but arguably the most value to the widest section of the audience comes in learning, or being reminded of, how to code and design systems well.

Even without lots of BI content, I found the conference very useful. Whilst Oracle CBO internals may not be my day to day work, many of the topics discussed in a database context can easily be transplanted to the BI world. Performance is performance. Diagnostic approaches are tool-agnostic. As well as the presentations, the opportunity to exchange ideas and war-stories with other experts in the industry (over a beer, usually…) is the kind of thing you just don't get from reading the manuals or a bunch of PDFs.

So that was POUG, and all too soon it was time to return home. Bravo to the organisers of POUG, and here's hoping they run the conference again next year!

There is something about sharing stories with others about the technology you love that gets the blood flowing. I have written blog posts and articles, presented sessions at conferences, and recorded podcasts and tech tips about Oracle Data Integrator over the years. And it’s been a blast! But one thing I find is that I’m always drawn to sharing the difficult or overly technical aspects of ODI, often writing about integration with GoldenGate or strange quirks in the product that I’ve had to overcome for my clients. In a sense, that’s why you’re reading, right? The Rittman Mead blog has always been about solving the difficult problems with Oracle technologies and sharing that knowledge in a simple and easy to understand format. But what about a true “getting started” type of article?

I think it’s time for a refresher on Oracle Data Integrator. What is ODI? How has it evolved over the years and where is it going? And, of course, how do you get started with Oracle Data Integrator? I plan to share what I love about ODI and hope to get some folks interested in this great product as well. If you want to learn even more, I’d love to have you join me or one of my colleagues at one of our ODI 12c bootcamps. We’ll dig deep into the details and answer all of your questions about data integration. But until then, let’s get started with a little about where Oracle Data Integrator came from.

History of ODI

Oracle Data Integrator was the product of an acquisition that Oracle made in late 2006. Back then, Oracle already had a data integration tool for developing ETL (Extract, Transform, and Load) mappings, Oracle Warehouse Builder (OWB). At the time, OWB was delivered with the Oracle Database license, which made it the go-to data warehouse development tool for Oracle developers. In October of 2006, Oracle announced that they had acquired a French data integration company called Sunopsis who focused their product on E-LT (Extract, Load, Transform) rather than the traditional ETL.

What did that mean for OWB? The “standard” (and free with the Oracle DB) ETL development platform for Oracle was now in competition with a new product, Oracle Data Integrator. ODI was a separate license cost and resided in the Fusion Middleware stack, outside of the database. As you might imagine, OWB’s days were numbered. With the possibility to sell additional software licenses and the rise of Fusion Middleware and Fusion Applications (of which ODI was a big part), eventually Warehouse Builder was to be merged with Data Integrator.

The data integration product from Sunopsis became Oracle Data Integrator 10g after the acquisition. In August 2010, Oracle made their first updates to ODI with the release of Oracle Data Integrator 11g. This version moved ODI into the framework utilized by other products such as Oracle SQL Developer and JDeveloper, and introduced new features such as the JEE Agent and the ODI Console. While the 11g version was a step up from ODI 10g, it was still not widely regarded as a typical ETL development tool. The concept of Interfaces being a single unit of work for ETL versus the usual flow-based mapping approach found in most ETL tools, including OWB, led to a slower adoption rate. It took several years, but finally Oracle Data Integrator went flow-based with the release of ODI 12c, integrating some of the best features from Oracle Warehouse Builder into the current ODI product. In late 2013, the initial version of ODI 12c was made available to the public. Besides the switch to flow-based mappings, ODI 12c has also included integration with big data sources and targets, lifecycle management capabilities, many performance enhancements, and a migration utility for those moving from OWB.

Now that we’re caught up with that brief history lesson, let’s look at how ODI is able to differentiate itself from competitors.

What Makes ODI Different?

E-LT
I alluded to Oracle Data Integrator’s use of E-LT (Extract, Load, and Transform) earlier, but what does that mean - and why are the letters in the acronym ETL out of order? The main difference is in the architecture.

Oracle Data Integrator is built to push down the transformation work to the source or target datasource. This means there is no need for a middle-tier ETL engine to perform transformations, as many of the traditional ETL tools employ. In fact, in most implementations the ODI agent, which performs orchestration of the ETL processes, simply sends the code to the target server to be executed. This architecture allows ODI to use the power of the target datasource to execute the transformations. Why waste the processing power of your Oracle Database when it’s built for this type of SQL execution?

Knowledge Modules
The concept behind Knowledge Modules (KMs) is quite simple. In a nutshell, KMs are generic code templates that are applied to an ODI mapping and use a substitution language to input metadata from within the mapping to produce executable code at runtime. If you want to change the physical implementation of your mapping, say switching from an insert-only “append” integration method to an incremental update approach, you can simply switch the Knowledge Module applied to that particular mapping. KMs can even be customized or created from scratch to suit your specific data integration needs, adding to the overall flexibility of the tool.
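
As a loose analogy, and definitely not actual ODI substitution syntax, think of a KM as a parameterised template into which the mapping's metadata is poured at code-generation time. The table names below are made up for illustration:

# A rough Python analogy of what a Knowledge Module does: a generic
# template plus mapping metadata yields executable code. Real KMs use
# the ODI Substitution API (odiRef calls) and read the metadata from
# the mapping itself rather than from a hand-built dict.
template = "INSERT INTO {target} ({columns}) SELECT {columns} FROM {staging}"

mapping_metadata = {
    'target': 'W_CUSTOMER_D',          # illustrative target table
    'staging': 'C$_0W_CUSTOMER_D',     # illustrative C$ staging table
    'columns': 'CUSTOMER_ID, CUSTOMER_NAME',
}

print(template.format(**mapping_metadata))
# INSERT INTO W_CUSTOMER_D (CUSTOMER_ID, CUSTOMER_NAME)
#   SELECT CUSTOMER_ID, CUSTOMER_NAME FROM C$_0W_CUSTOMER_D

Switching the KM applied to a mapping is then analogous to swapping the template while keeping the same metadata.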

Customization and Flexibility
The Knowledge Modules are just one aspect of Oracle Data Integrator that can be customized. With KMs, you can change how your mappings are physically implemented, allowing for the ultimate flexibility. But it’s not just mappings: an object called a Procedure in ODI will allow any bit of code or command line call to be made using nearly any type of technology: Groovy, Jython, OS commands, Oracle SQL, MySQL, SQL Server…the list goes on and on. The great part about customizing Oracle Data Integrator is that you can make it adapt to your data warehouse - and not the other way around. Too often, companies must adopt a “standard” because it is built into the software they use rather than because it is a good business practice. Thankfully, that’s not the case with ODI.

Beyond customization within the Oracle Data Integrator objects, you can also access the application backend via the ODI SDK Java API. With this level of access, you can perform almost every task that can be completed within ODI Studio. Imagine you need to create 400 source-to-staging mappings, all one-for-one column mappings. With the ODI SDK, and fewer than 50 lines of code, you can create them all in about 5 seconds! The power of the SDK is generally found when there is a need for a batch creation or modification of objects. But it’s not only those cases where the SDK shines: you can also perform actions such as automating code deployment, or even create a development quickstart for your standard mappings, all with custom code building ODI objects.

As you might have noticed, I’m a big fan of Oracle Data Integrator 12c. This is just the first of many in the “Oracle Data Integrator 12c: Getting Started” series. Up next, we’ll really look at how to go about getting started with ODI 12c. As always, please feel free to send me an email or message on Twitter - or comment below - if you have any questions. And if I missed your favorite feature of Oracle Data Integrator, please share in the comments!

This series of blog posts goes into great detail on the specifics of testing, and diagnosing, OBIEE performance. Topics such as how to define an "OBIEE report" are covered in depth, along with a discussion of the tools available to carry out performance testing of OBIEE and monitor the results. There's also a whole post on optimisation (frequently called 'tuning') and a hefty rant against some of the common anti-patterns that we see adopted when it comes to troubleshooting poor performance.

BI Server caching can be brilliant for providing the "icing on the cake" of a well-built system, giving that extra little performance boost. Where BI Server caching is a bad idea is when it is used to "paper over the cracks" of a fundamentally poorly-built OBIEE system. If the foundations are rotten, gold-plating the building on top is not going to do anything to improve things in the long run. If your OBIEE performance is bad, you need to get to the bottom of why and then rectify it. Using caching to mask it only "brushes the problem under the carpet", storing up the same problem or worse for the future as your system usage increases.

With that caveat made, here are two articles that explain how to best use BI Server caching:

This article has been published on Oracle Technical Network (OTN), and examines in detail what happens when there's sub-optimal design in the implementation of an OBIEE system. Disks fill up, IO bandwidth is exhausted, the server is overworked - and performance suffers.

The Rittman Mead Performance Analytics Service is the result of our deep understanding of the OBIEE stack, performance optimisation theory, and numerous client engagements. We understand why performance is so crucial to the success of an OBIEE project. We know how OBIEE works, how to diagnose performance issues -- and how to fix them!

Our Performance Analytics Service offers a free baseline Performance Analytics Report for your current OBIEE system, using data to demonstrate where your performance issues lie and prioritising the areas in which to focus investigation. Building on this baseline report, we also offer a detailed Performance Analytics Report. This is prepared by one of our expert OBIEE performance consultants based on deep analysis and understanding of your specific OBIEE implementation and performance pain-points. Not only do we diagnose the specific issues, but we demonstrate the relative effort involved to rectify each one and the expected performance benefit that each will yield.

To provide powerful monitoring and diagnostics for the performance of your system we built the OBIEE Performance Analytics Dashboards. These give a full-stack view of the metrics that can impact performance of your OBIEE reports. This gives you the information needed to accurately diagnose issues when they arise, as well as perform historical analysis. Have a look at these videos to see exactly how the dashboards can help, and get in touch with us to request a demo!

Oracle has published white papers available through My Oracle Support that detail further options available within the OBIEE stack for tuning performance. In a blog article I do somewhat grumble about these papers and how they can be misused ("oooh here's a knob that I can fiddle with") if taken without proper context and diagnostics. In the right hands, however, they are useful reference for some of the more advanced configuration settings and their impact on performance.

See also page 13 of this document for good advice on working with exports to Excel from OBIEE: Doc ID 1558070.1

General Performance Resources

OBIEE Performance is nothing particularly fancy. The problems are the same as any other stack; just different components. The standard principles apply, and that is why I'd strongly recommend reading these articles as they give a fantastic background to the world of performance diagnostics and optimisation.

The new version of OBIEE 12c, 12.2.1.1 to be exact, is out, so let’s talk about it. It’s my intent that after reading this, you can expect some degree of comfort in regards to possibly doing this thing yourself (should you find yourself in just such a circumstance), but if not, feel free to drop us a line or give us a ring. It should be noted that Oracle documentation explicitly indicates that you’re going to need to upgrade to OBIEE version 12.2.1.0, which is to say you’re going to have to bring your 11g instance up to 12c before you can proceed with another upgrade.

A colleague here at RM and I recently sat down to give the upgrade process (click there for the Oracle doc) a go on one of our hosted Windows servers, and here’s the cut and dry of it. The examples throughout will be referencing both Linux and Windows, so choose how you’d like. Now, if you’ve gone through the 12c install process before, you’ll be plenty familiar with roughly 80% of the steps involved in the upgrade. Just to get this out of the way: no, it’s not a patch (in the sense that you’re actually going through the OBIEE patching process using OPatch). In fact, the process almost exactly mirrors a basic 12c install, with the addition of a few steps that I will make darn sure we cover in their entirety below. Speaking of which, I’m not going to do a play-by-play of the whole thing, but simply highlight those steps that are wholly unfamiliar. To provide some context, let’s go through the bullet points of what we’ll actually be doing during the upgrade.

First, we’ll make sure we have an appropriate, supported version of Java installed on the server (8_77 is the minimum version) and that it corresponds to the JAVA_HOME you’ve got set up.

Next, we’ll be running the install for the WebLogic server into a NEW oracle home. That’s right, you heard me. A. new. oracle. home.

After that, we’ll be running a readiness check to make sure our OBIEE bits won’t run into any trouble during the actual upgrade process. This checks all OBIEE components, including those schemas you installed during the initial install process. Make sure to have your application database admin credentials on hand (we’ll talk about what you need below in more detail). The end of this step will actually have us upgrade all those pieces the readiness checker deems worthy of an upgrade.

Before we begin, however, let’s check off a few boxes on the ‘must have’ list.

Download all the files here, and make sure you get the right versions for whatever kind of server your version of OBIEE hangs out in. The Java version available will be 8_101, which will work out just fine even though the minimum needed is 8_77.

Get those database credentials! If you don’t know, drop everything and figure out how you’re going to access the application database within which the OBIEE 12c schemas were installed. You’ll need the user name/pass for the SYS user (or user with SYS privileges), and the database connection string as well, including the service name, host, and port.

Make sure you have enough disk space wherever you’re installing the upgrade. The downloads for the upgrade aren’t small. You should have at least 150GB, on a DEV box, say. You don’t want to have to manage allocating additional space at a time like this, especially if it involves putting in a ticket with IT (wink wink)!
Speaking of which, you’ll also need the server credentials for whichever user 12c was installed under. Note that you probably don’t need root if it was a Linux machine; however, there have been some instances where I’ve needed to have these handy, as there were some file permission issues that required root credentials and were causing errors during an install. You’ll also need the weblogic/OBIEE admin user (if you changed the name for some reason).

Lastly, make sure you’re at least a tad bit familiar with the paths to both the Oracle home and the domain home.

SETTING UP JAVA

After downloading the version of Java you need, go ahead and update it via the .rpm or .exe, etc. Make sure to update any environment variables you have set up, updating both the JAVA_HOME variable AND the PATH to reference the new Java location. As stated above, at the time of this blog, the version we used, and that is currently available, is 8_101. During the upgrade process, we got a warning (see below) about our version not being 8_77. If this happens to you, just click Next. Everything will be alright, promise.

A NEW HOME FOR ORACLE

Did you click the link to the upgrade doc yet? If not, do so now, as things are about to get kind of crazy. Follow along as we walk through the next steps if you’d like. First, stop services and disable the SSL like it tells you to. Then, start OBIEE services back up and then run the infrastructure jar (java -jar fmw_12.2.1.1.0_infrastructure.jar) for the WebLogic server install. Again, I’m not going to go pic by pic here, as you can assume most everything resembles the initial 12c install process, and this part is no different. The one piece of this puzzle we need to focus on is establishing a new oracle home. After skipping those auto updates, move on to step 3 where we are, in fact, going to designate a new oracle home. You’ll see that, after completing the WebLogic install, we’ll have a bunch of updated feature sets, in addition to some new directories in our 12.2.1.1 oracle home. For example, if your original home is something like:

/u01/app/oracle/fmw

change it to:

/u01/app/oracle/fmw2

when it asks you to enter a new one.

Breeze through the rest of the steps here, and remember to save that response file!

UPDATING OBIEE

Unzip both of the fmw_12.2.1.1.0_bi_linux64_Disk#_#of2.zip files, making sure that your OBIEE install files are in the same directory. For Windows, this will be the executable from the first zip file, and the zip file from the second part of disk 1. Execute the binary (on Linux) or the .exe, going through the usual motions, and then in step 3, enter the NEW oracle home for 12.2.1.1. In the example above, it would be:

/u01/app/oracle/fmw2

for Linux, and likewise for Windows.

Again, there isn’t too much to note or trap you here beyond just making sure that you take special care to enter not your original oracle home, but the one you created in the previous section. Proceed through the next steps as usual and remember: save your response file!

UPDATING THE 12C SCHEMAS - USING THE READINESS CHECKER AND UPDATE ASSISTANT

Run the readiness checker from:

NEW_ORACLE_HOME/oracle_common/upgrade/bin/ua -readiness

This next series of steps will take you through all the schemas currently deployed on your application database and confirm that they won’t explode once you take them through the upgrade process. In step 2 of 6, make sure that you’re entering the port for EM/Console (9500 by default). Remember when I said you’re going to need the DB credentials you used to install 12c in the first place? Well here’s where we’re going to use them. The readiness checker will guide you through a bunch of screens that essentially confirms the credentials for each schema installed, and then presents a report detailing which of these will actually get upgraded. That is to say, there are some that won’t be. I really like this new utility as an extra vote of confidence for a process that can admittedly be oftentimes troublesome.

Once you’ve validated that those schemas are ready for update, go ahead and stop OBI12c services using the EXISTING oracle home.

Pro tip: they’ve made it super easy to do this now; just add the binaries directory in OBIEE’s bitools folder (ORACLE_HOME/user_projects/domains/bi/bitools/bin) to the PATH in your bash profile. After logging this entry in your profile, you can simply type start.sh or stop.sh to bring everything up or down, not to mention take advantage of the myriad other scripts that are in there. Don't type those paths out every time.

I digress… After the services come down, run the upgrade assistant from within the NEW oracle home, as below:

Citing the previous example:

NEW_ORACLE_HOME/oracle_common/upgrade/bin/ua

After bringing up the install dialogue box, move on to step 2, and select the All Schemas Used by a Domain option (as in the example above), unless of course you’d like to hand select which ones you’d like to upgrade. I suppose if you were thinking about scrapping one you had previously installed, then this would be a good option for you. Make sure the domain directory you specify is from your existing/old 12c instance, as below:

Move through the next series of steps, which are more or less self-explanatory (no tricks here, promise), once again validating connection credentials until you get to step 12. As always, save the response file, select Upgrade, and then watch the magic happen... hopefully. Congratulations, you’ve just updated your schemas!

WHO INVITED A WIZARD TO THE PARTY? - RECONFIGURING THE BI DOMAIN AND UPDATING THE BI CONFIGURATION

Like I said before, I won’t be covering every single step of this process (i.e., the map viewer portion), which means you’ll still have to consult the…oracle, on some of this stuff. That being said, don’t gloss over backing up the map viewer file... you have to do it. This is simply an attempt to help make the upgrade process a little easier to swallow and hopefully make some of the more confusing steps a bit clearer. Moving on. Guess what? It’s time to run another series of dialogue boxes. Beats the heck out of scripting this stuff though, I guess. Open up the RECONFIGURATION WIZARD!!!!! as instructed in the documentation, from the location within your NEW oracle home. The first step will prompt us for the location of the domain we want to upgrade. We want to upgrade our existing 12c domain (the old one). So type that in/browse for it. Right now.

Validate your java version and location in step 3 and then click your way through the next few screens, ensuring that you’ve at least given your stamp of approval on any pre-filled or manually filled entries in each dialogue box. Leave step 7 alone and click Next to get to the screen where we’re actually going to be starting the reconfiguration process. Click through and exit the RECONFIGURATION WIZARD!!!!!

Don’t forget to restore the map viewer config file at this point, and then launch the configuration assistant again, this time selecting the All Configurations Used By a Domain option in step 2. Make sure you’ve entered the location of the existing 12c domain in this step as well, and NOT the one created under the new oracle home.

Click through the next steps, again, paying close attention to all prompts and the location for the map viewer xml file. Verify in step 7 that the directory locations referenced for both domain and oracle map viewer are for the existing locations and NOT those created by the install of the update.

WRAPPING UP AND NOTES

You can now re-enable SSL (as below) and then start OBIEE services.

DOMAIN_HOME/bitools/bin/ssl.sh internalssl true

Note: if you have a tnsnames.ora or ldap.ora, place copies under NEW_ORACLE_HOME/network/admin.

You can ignore the new oracle home created at this time, as, in my opinion, we’re going to have to do something similar for any subsequent 12c updates. What did you think of the upgrade process, and did you run into any issues? Thanks so much for reading, and as always, if you find any inconsistencies or errors, please let us hear about them!

I have been working with BI Applications since OBIA switched to ODI in version 11.1.1.7.0. I have installed and worked with all of the 11.x versions using several different sources. This most recent version of OBIA may only be a patch-set release, but it has some new features that make it very compelling: automatic ETL error handling, Health Check, ETL Diagnostics, and built-in email notifications.

If you've been using an earlier version of OBIA 11.x (or 7.x for that matter), now may be the time to upgrade. In an Oracle Advisor's Webcast, Rajesh Lakra, BIApps Global Product Support Lead Engineer, said this version will be the upgrade path to OBIA 12.x, which is slated for release sometime in 2017. Also, it may just be the circumstances I've been exposed to, but this latest version seems more performant as well. Since I've not had an opportunity to do benchmark time testing against the same source data, I cannot verify this yet. However, I am setting up some environments to do just that. Keep an eye out for a future blog post for performance comparisons.

Check if there are any other running load plans

Check Previous Load Plan Runs is a package that only has the following procedure:
Check State of Previous Load Plan Runs
1) Checks the status of previous load plan runs. Throws an exception if any other load plan is still in a Running state.
2) Checks the blackout schedule period from the w_etl_control_schedule table. If there is a blackout schedule, the load plan fails and exits execution.

I’ve not found any documentation about this package or procedure. However, the procedure itself has three steps.

Check if Prev Load Plans are Running

a. SELECT COUNT(1) FROM SNP_LPI_RUN WHERE STATUS IN ('R','W') AND I_LP_INST!=(?)

b. >>>>>" + ctLPRunning + " Load plan(s) found to be running/waiting apart from the current one. ABORTING this Load Plan as running multiple Load Plans in Parallel can cause DATA CORRUPTION <<<<<<

Check Blackout schedule

a. select -1 from w_etl_control_schedule cs where sysdate between cs.control_start_dt and cs.control_end_dt and cs.control_code = 'NO_ETL'

b. >>>>> Blackout schedule coming up as seen in w_etl_control_schedule table, hence no Load plan(s) will be executed. ABORTING this Load Plan as there will be a down time to the repository due to the Blackout schedule <<<<<<

Check Source Upgrade

a. SELECT 1 FROM W_ETL_SRC_VERSION_HISTORY WHERE SRC_PLV_NEW IS NOT NULL AND SRC_UPGRADE_STATUS IN ('PENDING','ERROR','STARTED')

I believe this is a good feature to add, because Oracle’s OBIA documentation has always recommended not running more than one load plan at a time. Now, if another load plan is already running, the new load plan will stop automatically with a message warning you that running multiple load plans in parallel can cause data corruption.

I think it is interesting to see this Blackout schedule and Source Upgrade. I’ve done some looking in the documentation and on support.oracle.com but didn’t come up with any clues to what exactly these are for. It seems to me like the Blackout schedule is a calendar of sorts for when the repository will be down for maintenance. As for the Source Upgrade, that is more of a mystery to me.

Health Check is a preliminary ETL phase in which a diagnostic report is generated to identify problematic source data that might cause ETL failure or data loss or corruption in the data warehouse. The report is downloaded from Configuration Manager and includes any problematic data, a description of detected issues, and actions to resolve health check failures.

Referring again to Oracle's OBIA ETL Guide, the Manage Load Plans page is where you can download the Health Check. Highlight the relevant load plan that has run and/or failed and click on Show Data Problems.

Select the desired DIAGNOSTIC HEALTHCHECK and download the zip file to your computer.

This file gets generated through a number of different steps starting with the Execute Healthcheck package and followed by the Create Report and Summarize Healthcheck package.

Execute Healthcheck is a package that only has the following procedure:
Run Diagnostics
This procedure has 12 steps. The two that are the most interesting are: Define Execute Diagnostic Function and Run Diagnostics

The bulk of the code for the Diagnostics are in the step Define Execute Diagnostic Function. The code is Jython and it is really too long to paste here, but I highly recommend digging into how it works.

Then the step Run Diagnostics actually kicks off the process as shown here.

Once the diagnostics are run, there is a step in the load plan called Create Report and Summarize Healthcheck. This package has two procedures.

The first procedure, Create Diagnostic Report, does exactly what you might guess: it does a lot of formatting and puts the diagnostic report into an HTML page.
The second procedure, Summarize Healthcheck Result, takes the information, inserts it into a table, then organizes it for an email to be sent out.

During the September 21st Oracle Advisor Webcast on BI Applications 11.1.1.10.2 New Release Overview, Rajesh Lakra, BIApps Global Product Support Lead Engineer, said the Health Check is most useful during a domain load. It checks to ensure source domain data is valid, collectively letting you know if these domains will cause issues further down in the ETL process.

In Part 2 of OBIA 11.1.1.10.2 New Features, I will show you what the HTML files look like, how to set up the email process, and how the automatic error handling now works within many of the knowledge modules. Tune in next time for more exciting parts of the new features in this version of BI Applications!

In my last post, I described the basics around connecting to social media data sources using Python. This is in aid of collating information for the Lightyear Foundation, a charity specialising in science education in the UK and Ghana. In this blog, I will demonstrate how to use those integrations to record data into a PostgreSQL database. I've chosen this database so as to keep everything free and open source.

One of the downsides of some of the publicly available APIs is that they only store a certain amount of history. This makes analysing long-term trends a bit more difficult. By recording the data periodically, in an ETL-like fashion, we can circumvent this. Given the vast amount of data available for social media, it would not be appropriate to store all of it in a relational database. Instead, we can store summary statistics for long-term trend analysis.

All of the code used in this blog is on this public Git repository. You are welcome to download this and poke around as you follow the blog. Note that there is configuration in config.ini and a keys/google-api.json file required in order to get these working yourself.

Database Integration

The Python module psycopg2 was used to execute transactions against the PostgreSQL database. The script lyf/psql.sql contains a host of functions I wrote to perform useful database and ETL actions; a sketch of the underlying psycopg2 pattern follows the list below.

psql.truncate : Truncates a table.

psql.insert : Inserts a single row into a table.

psql.upsert : Updates a row in a table if a matching key is found, else inserts the row.

psql.update : Updates values in a table based on matching keys.

psql.delete : Deletes rows from a table based on a WHERE clause.

psql.lookup : Updates columns based on a lookup to another table. Useful for assigning foreign keys from natural keys.

psql.load_csv : Loads data from a CSV file into a table, mapping header names in the file to columns in the table.

psql.query : Executes a SQL statement and returns the data.
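
To give a flavour of what these wrappers do under the hood, here is a minimal upsert sketch using psycopg2 directly. The connection details, table, and column names are illustrative assumptions, and the actual signatures of the lyf/psql functions may differ:

import psycopg2

# Connection details would come from config.ini in the real scripts
conn = psycopg2.connect(host='localhost', dbname='lyf', user='lyf', password='secret')
cur = conn.cursor()

# Upsert: update the row if the natural key already exists, else insert it.
# PostgreSQL 9.5+ supports this natively via ON CONFLICT, which requires a
# unique constraint on the key column(s).
cur.execute("""
    INSERT INTO d_ga_geo (country, continent)
    VALUES (%s, %s)
    ON CONFLICT (country) DO UPDATE SET continent = EXCLUDED.continent
""", ('Ghana', 'Africa'))

conn.commit()
cur.close()
conn.close()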

Google Analytics

For Google Analytics (GA), many dimensions and metrics are available, but I have chosen a useful subset to put into a star schema.

Dimensions

For efficiency, I wrote a method to dynamically load dimension tables from a TSV file.

The table column refers to the table name in the PostgreSQL schema; these tables are created manually before executing the script. ga_dims is a comma-separated list of GA dimension codes. Note that a GA API query allows a maximum of 7 dimensions. columns is another list of equal length describing the mapping to the columns in the database table. Finally, keys is a comma-separated list of which columns are unique natural keys for the database table. This is so the script can perform updates as well as inserts.

This input file is parsed by load_ga_dims.py, which executes a Google API query for each dimension and upserts the records into the database. Essentially, the ETL for this process is defined quite simply by the TSV file and can be changed with relative ease. It doesn't have the scope, flexibility or scale of ODI or Informatica, but as a quick, free option it will suffice.
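
To illustrate the mechanics, here is a rough sketch of the kind of loop load_ga_dims.py performs, using the Google Analytics Core Reporting API (v3) Python client. The view (profile) ID, TSV filename, and overall structure are assumptions; the real script also pulls its configuration from config.ini:

import csv
import httplib2
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

# Authenticate with the service account key referenced earlier
scopes = ['https://www.googleapis.com/auth/analytics.readonly']
credentials = ServiceAccountCredentials.from_json_keyfile_name('keys/google-api.json', scopes)
service = build('analytics', 'v3', http=credentials.authorize(httplib2.Http()))

# Each TSV row: target table, GA dimension codes, DB columns, natural keys
# (header-row handling omitted for brevity)
with open('ga_dims.tsv') as f:
    for table, ga_dims, columns, keys in csv.reader(f, delimiter='\t'):
        results = service.data().ga().get(
            ids='ga:12345678',       # GA view (profile) ID - a placeholder
            start_date='2015-01-01',
            end_date='today',
            metrics='ga:sessions',   # the API requires at least one metric
            dimensions=ga_dims).execute()
        for row in results.get('rows', []):
            # row[:-1] holds the dimension values; upsert them into `table`
            # using something like the psql.upsert wrapper described above
            pass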

Fact

The fact table is much simpler to load, as we just need to execute a single GA query which specifies only the dimension natural keys and the metrics. The script load_ga_fact.py will go and fetch the dimension warehouse key (always an integer) as well as any metrics specified in the file. I've opted for:

Sessions

Bounces

Bounce Rate

Average Session Duration

Session Duration

Page Views

Time on Page

Finally, there is an integer date ID and some denormalised attributes: Longitude and Latitude. With Google, it's easy to retrieve a "full load", as the API allows querying of history, as well as incremental loads filtered for today's date. Below is an example query returning the total session time and number of sessions by continent for Jan 2016:
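
In API terms, the query looks roughly like the following sketch, reusing the authenticated service client from the dimension load (the view ID is again a placeholder; ga:sessionDuration and ga:sessions are standard GA metric names):

# Total session duration and number of sessions by continent, Jan 2016
results = service.data().ga().get(
    ids='ga:12345678',
    start_date='2016-01-01',
    end_date='2016-01-31',
    metrics='ga:sessionDuration,ga:sessions',
    dimensions='ga:continent').execute()

for row in results.get('rows', []):
    print(row)  # e.g. ['Europe', '12345.0', '678']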

The extract I wrote for Facebook was much simpler, using date as the sole dimension. If required, dimension tables could be created for posts and videos or other similar data, but in this case we just want to look at trends over time. The measures selected are:

Total Likes

Total Posts

Total Videos

Total Video Likes

New Posts

New Videos

Impressions

Reach

Engaged Users

New Likes

New Unlikes

Video Views

Page Views

Post Likes

There's no way of extracting history data from Facebook's API, so we need to load data regularly using f_facebook_daily.py. Below is a time series query of likes and reach for a few days:

The Twitter API is not as rich as the Google or Facebook APIs and is limited to only 15 queries an hour. For my ETL scripts this shouldn't matter, but once again no history is available. The script for this load is f_twitter_daily.py. The measures we've got are:

For YouTube, the data for videos and views over time is stored in a flattened fact table. Given the small number of attributes for videos and the lack of interconnectivity with the other information, I just denormalised them onto the fact table. For measures we have:

Total Views

Views

Total Likes

Likes

Total Dislikes

Dislikes

Once again this is loaded incrementally, storing records for each video for each day. The incremental loading script is f_youtube_daily.py. The below query shows how many views the videos got by month:

The MailChimp API allows extraction of information about subscriber lists and campaigns; a sketch of a raw API call follows the measure list below. f_mc_lists_daily.py will load in today's fact data as well as updating the subscriber list dimension. The measures include:

Members

Unsubscribed

Cleaned

Total Members

Total Unsubscribed

Total Cleaned

Total Campaigns

Open Rate

Average Subscribe Rate
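
As a flavour of the extraction itself, here is a rough sketch of pulling list statistics from the MailChimp API (v3) with the requests library. The API key is a placeholder, and the real script's structure may differ:

import requests

# MailChimp API v3 uses basic auth with any username plus your API key.
# The datacentre suffix on the key (e.g. 'us13') forms the hostname.
api_key = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-us13'  # placeholder
dc = api_key.split('-')[-1]

resp = requests.get('https://{0}.api.mailchimp.com/3.0/lists'.format(dc),
                    auth=('anystring', api_key))

for lst in resp.json()['lists']:
    stats = lst['stats']
    print(lst['name'], stats['member_count'],
          stats['unsubscribe_count'], stats['cleaned_count'])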

Below is an example query showing a snapshot of the subscriber list data as of 9th September 2016.

Now that we've got all these different facts, we can write BI-style summary queries to compare all of the data, or provide regular reports. Below is a query across multiple fact tables, conforming on the date dimension, which shows summary statistics for each fact per month.
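
A sketch of the shape such a query takes (all table and column names here are assumptions; yours will depend on how the dimensional model was built):

import psycopg2

conn = psycopg2.connect(host='localhost', dbname='lyf', user='lyf', password='secret')
cur = conn.cursor()

# Conform the facts on the shared date dimension and summarise per month
cur.execute("""
    SELECT to_char(d.date_actual, 'YYYY-MM')    AS month,
           SUM(ga.sessions)                     AS ga_sessions,
           SUM(fb.new_likes)                    AS fb_new_likes,
           SUM(yt.views)                        AS yt_views
    FROM   d_date d
    LEFT   JOIN f_ga_daily       ga ON ga.date_id = d.date_id
    LEFT   JOIN f_facebook_daily fb ON fb.date_id = d.date_id
    LEFT   JOIN f_youtube_daily  yt ON yt.date_id = d.date_id
    GROUP  BY 1
    ORDER  BY 1
""")

for row in cur.fetchall():
    print(row)

cur.close()
conn.close()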

So there we have it: an open-source ETL package for a handful of social media sites that will track history. The final tables can be used as a source for any reporting tool you want, and the data can all be tied together by date. This is particularly useful for showing summary trends over time. However, I don't store even a fraction of the amount of data that we can get in real time from the APIs. The next blog entry will show how to use open source technologies to visualise this data, as well as the real-time information.

The big show is upon us and once again Rittman Mead will be actively participating at Oracle OpenWorld 2016. Oracle's premier conference runs September 18–22 at the Moscone Center in beautiful San Francisco, California. I'm looking forward to another great conference, as well as attending my first ACE Director briefing, where Oracle ACE Directors get to hear future plans on the direction of Oracle products and can provide feedback to product teams (and even ask questions of Thomas Kurian), on the Thursday/Friday prior to the event.

I'm fortunate enough to have been selected by ODTUG to present a user group forum session on Sunday, September 18th, the first official day of Oracle OpenWorld 2016. I'll be sharing an updated version of my GoldenGate and Kafka integration talk, Oracle GoldenGate and Apache Kafka: A Deep Dive into Real-Time Data Streaming (Session UGF6161), previously presented at Collaborate and KScope16. The focus will be on how to use the new Oracle GoldenGate for Big Data release 12.2 to replicate database transactions to Apache Kafka. The topic of this session, real-time data streaming, provides a great solution for the challenging task of ingesting data from multiple different sources, with different structures, and making that data readily available for others in the organization to use. By combining GoldenGate, Oracle’s premier data replication technology, and Apache Kafka, the latest open-source streaming and messaging system for big data, we can implement a fast, durable, and scalable solution. Stop by Moscone South 301 on Sunday at 10:30 a.m. to check it out!

On Tuesday morning, Mark Rittman will join Oracle Data Integrator Product Manager Julien Testut and Oracle A-Team Director Christophe Dupupet to discuss Oracle Data Integration and Big Data in their session titled Oracle Data Integration Platform: A Cornerstone for Big Data (Session CON6624). They'll take a look at how the Oracle Data Integration suite of products (Oracle Data Integrator, Oracle GoldenGate, Oracle Metadata Management, and Oracle Big Data Preparation Cloud Service) can help to avoid the complexity often found in big data initiatives and harness the power of the big data technologies. Join them in Moscone West 2022 on Tuesday, September 20th, at 11 a.m. to learn more!

With many successful Oracle Data Integration and Big Data engagements completed throughout the world, the experts at Rittman Mead continue to lead in the Oracle Big Data space. If you'd like to get together to discuss any of the above topics we're presenting, or anything big data, data integration or BI related, feel free to reach out via email (info@rittmanmead.com), Twitter (@mRainey) or drop a note in the comments below. Hope to see you there!

Oracle GoldenGate for Big Data (OGG BD) supports sending transactions as messages to Kafka topics, both through the native Oracle handler as well as a connector into Confluent's Kafka Connect. In some research that I was doing with it I found an interesting problem that I am going to demonstrate here and show the troubleshooting tools that may be useful to others encountering similar issues.

The source for the data is Swingbench running against Oracle 12c database (pluggable instance). OGG has been configured as follows:

So from this we've got the table (ORCL.SOE.CUSTOMERS), log offset (6636), and from the stack trace even a hint at what the issue may be (something to do with the Schema, and a column called SYS_NC00017$).

Now let's see if we can find out more. A colleague of mine pointed me towards Logdump, which is well documented and also covered by Oracle's A-Team blog here.

Launch logdump from the OGG BD folder, ideally using rlwrap so that you can scroll and search through command history:

But ploughing through the file a transaction at a time is no fun, so let's zero in on the problem record. We can either just jump straight to the transaction offset that we got from the error log using POSITION (or POS) followed by NEXT:

or we can also use the FILTER command, but we'll come back to that in a moment. First let's have a look at the record in question that's causing the Kafka handler to abend. It's shown in full above.

The table name matches - ORCL.SOE.CUSTOMERS, and we can see that the operation was an INSERT along with the values for sixteen columns. Now, since we know that the error thrown by the Kafka handler was something to do with schema and columns, let's take a step back. The record we're looking at is the actual data record, but in the trail file there will also be metadata about the table itself, which will have been read by the handler. We can look for all records in the trail file relating to this table using the FILTER command (preceded by a POS 0 to move the read back to the beginning of the file):

I spy with my little eye ... SYS_NC00017$, which was named in the debug log that we saw above. Also note:

Columns: 18

So the OGG metadata for the table shows it with eighteen columns, including two SYS_[...]. If you look at the data shown in the record at position 6636 above you'll see that there are only sixteen columns of data. Let's now check out the schema for the table in question in Oracle.

So we have a function-based index, which in the background appears to implement itself via two hidden columns. My guess is that the Kafka handler code is taking the metadata definition record of 18 columns too literally, and expecting to find a value for it in the transaction record when it reads it and falls over when it can't. Similar behaviour happens with the Kafka Connect OGG connector when it tries to process this particular record:

I'm in the fortunate position of being in a sandbox environment in which I can modify the source schema to suit my needs - so I just dropped the function-based index. In reality this evidently would not be a good approach on the assumption that the index was there for a good reason!

DROP INDEX "SOE"."CUST_FUNC_LOWER_NAME_IX";

Having run this, we still have the question of how to get the replicat working. To do this we could go the whole hog and drop and recreate the extracts; or, we can get the replicat to skip the section of the trail file containing the records that we can't process. Assuming you've run the above DROP and then written more data to the table, there'll be a second metadata record in the OGG trail file. We can use the FILTER command to find this:

Here's the new table metadata, for sixteen columns only and minus the SYS_[...] columns. Its position as shown in the record above is RBA 14702330. To get the commit sequence number (CSN), which we can use to restart the replicat, we need to enable the display of OGG-generated data in the records (ref):

GGSTOKEN ON
GGSTOKEN DETAIL

The Metadata record itself doesn't have a CSN, so disable the filtering

Per Handling Other Database Properties, virtual columns can be handled by using the TABLEFETCHCOLS configuration on the extract to read the virtual values, and the MAP parameter of the replicat to map them to actual columns on the target. Unfortunately, the system-generated column name isn't accepted by OGG in the FETCHCOLS syntax:

Another tack to try, given that in our case we simply want to make sure the virtual columns don't get picked up at all, is to ignore the column altogether. Unfortunately, from my experimentation with COLSEXCEPT it appears that OGG excludes specified columns from the record data, but not from the initial metadata (which is what causes the above problems in the first place). Even if this had worked, COLSEXCEPT doesn't like the system-generated column name either, abending the Extract process with:

Oracle GoldenGate is a fantastic way to stream changes from many different RDBMS to a variety of targets, including Kafka. The potential that this offers in terms of data integration and pipelines is great. This post has hopefully shed a little bit of light on how to go about troubleshooting issues that can occur when using this set of tools. Do let me know in the comments below if you have better suggestions for how to deal with the virtual columns created as a result of the function-based index!

We don’t often get a chance to share our customer success stories, for various reasons such as required anonymity of the client or certain other legalities. But when we do, it sure feels great! Recently, Eurocontrol, a longtime Rittman Mead client in Brussels, Belgium, performed an upgrade of Oracle Data Integrator from 11g to 12c. With the help of Rittman Mead, and specifically data integration expert Jérôme Françoisse, Eurocontrol was able to perform the upgrade successfully without any downtime. The upgrade process, and outcome, caught the attention of the Oracle Data Integration product management team, prompting the case study titled “Eurocontrol Improves ETL Development Time-to-Market After Upgrading to Oracle Data Integrator 12c” to be written and shared with the world.

Eurocontrol is the European organization for the safety of air navigation, handling management of the air traffic network, route charging, and working with other organizations to build a Single European Sky. Due to the critical air traffic demand data flowing through their ETL system, the upgrade had to be completed quickly, efficiently, and with minimal downtime. With Rittman Mead’s Oracle Data Integrator expertise, there was no issue exceeding these expectations. Beyond performing the upgrade, Jérôme and team also modified Groovy scripts, custom Knowledge Modules and other objects to work with the updated ODI SDK, and helped perform a minor cleanup of the mapping objects.

Eurocontrol can now take full advantage of the latest and greatest features that Oracle Data Integrator 12c has to offer. From flow-based mappings and component KMs to deployment specifications and reusable mappings, ODI 12c is being used in full-force. Even more impressive, Eurocontrol has not had any unplanned ODI agent downtime since the upgrade!

If your company is planning an upgrade of Oracle Data Integrator 11g to the latest version ODI 12c, as Eurocontrol did, go ahead and give Rittman Mead a shout at info@rittmanmead.com and let’s see how we can help you along the way.

It's easy to talk about a technology using only jargon. It's much harder to talk about a technology without using jargon. I have seen many meetings between business and IT break down because of this communication barrier. I find it more discouraging when I see this communication breakdown happen between advanced IT staff and new IT staff. For those of us in any technological field, it's easy to forget how long it took to learn all of the ins and outs, the terminology and jargon.

During a recent project, I had another consultant shadowing me to get experience with OBIA. (Hi, Julia!) I was 'lettering' a lot so I decided it was time to diagram my jargon. My scribbles on a whiteboard gave me the idea that it might be helpful to do a bit of connecting the dots between OBIA and data warehousing jargon and naming conventions used in OBIA.

BI Applications Load Plan phases:
SDE - Source Dependent Extract

SDE is the first phase in the ETL process; it loads source data into the staging area. SDE tasks are source-database specific. SDE mappings that run in the load plan will load staging tables. These tables end with _DS and _FS, among others.

SIL - Source Independent Load

SIL is the second phase in the ETL process; it takes the staged data from the staging tables and transforms and loads it into the target tables. SILOS mappings that run in the load plan will load dimension and fact tables. These tables end with _D and _F, among others.

PLP - Post Load Process

This third and final phase in the ETL process occurs after the target tables have been loaded and is commonly used for loading aggregate fact tables. PLP mappings that run in the load plan will load aggregate tables ending with _A. Aggregate tables are often fact table data that has been summed up by a common dimension. For example, a common report might look at finance data by the month. Using the aggregate tables by fiscal period would help improve reporting response time.

Since SDE tasks are source-database specific, the SDE mappings' names also include an acronym for the source system. Below are the supported source systems, the acronyms used in the names, and an example of each.

Oracle E-Business Suite - ORA

SDE_ORA_DomainGeneral_Currency

Oracle Siebel - SBL

SDE_SBL_DOMAINGENERAL_CURRENCY

JD Edwards Enterprise One - JDEE

SDE_JDE_DomainGeneral_Currency

PeopleSoft - PSFT

SDE_PSFT_DomainGeneral_Currency_FINSCM

Oracle Fusion Applications - FUSION

SDE_FUSION_DomainGeneral_Currency

Taleo - TLO

SDE_TLO_DomainGeneral_Country

Oracle Service Cloud - RNCX

SDE_RNCX_DomainGeneral

Universal - Universal

SDE_Universal_DomainGeneral

This wraps up our quick "Back-to-Beginnings" refresher on naming conventions and the jargon used in relation to ETL and mappings. Let me know in the comments below if there are other topics you would like me to cover in my "Back-to-Beginnings" series. As always, be sure to check out our available training, which now includes remote training options, and our On Demand Training Beta Program. For my next post I'll be covering two new features in OBIA 11.1.1.10.2, Health Check and ETL Diagnostics, which are the missing pieces you didn't know you've been waiting for.

There's also kind of a fourth option, which is JDBC, as explained by Fiston in this excellent post. It's OBIEE 12c only and not fully documented/supported.

In this article we're going to dig into the third option (native ODBC) and look at how it can be used, and also how to troubleshoot it (and the OBIEE stack in general) on Linux.

ODBC (Open Database Connectivity) is a documented API designed to enable applications to work with databases without having to support each one natively. A comparable protocol is JDBC, which is also widely used (but as yet not fully supported/documented within OBIEE). To use native ODBC drivers with OBIEE on *nix, you install the driver on your OBIEE server and then configure OBIEE to use it. Mark Rittman wrote an example of how to do this with the original Hive ODBC drivers here (before they were formally bundled with OBIEE), and the manual shows how to use it with a native driver for Teradata.

My interest in this is the Apache Drill tool, which enables querying with SQL against a multitude of datasources, including things like JSON files, Hive tables, RDBMS, and more. MapR have released an ODBC driver for it, and I thought it'd be neat to get Apache Drill to work with OBIEE. It evidently does work for OBIEE running on Windows, but I wanted to be able to use it on my standard environment, Linux. This blog post is a warts-and-all exploration of the process I went through to get it to work, since I thought it might be of interest to see some of the forensic methods available when trying to get things to work. For just the headlines, see Using Apache Drill with OBIEE 12c.

First Things First - Setting Up Apache Drill

Drill can be deployed in distributed configuration (with all the parallel processing goodness which that brings), but also run as a single instance locally. For the sake of simplicity that's what I'm going to do here. It's rather easy to do:

You need to make sure you've got a recent JDK available, and if you're running it on SampleApp, watch out for this odd problem that I had which was related to classpaths and manifested itself with the error java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.JavaType.isReferenceType()Z.
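As a sketch, it amounts to something like this (version as used throughout this post; the mirror URL is illustrative):

wget https://archive.apache.org/dist/drill/drill-1.7.0/apache-drill-1.7.0.tar.gz
sudo tar -xf apache-drill-1.7.0.tar.gz -C /opt
cd /opt/apache-drill-1.7.0
bin/sqlline -u jdbc:drill:zk=local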

Per the system requirements, we need to make sure that we're using one of the supported ODBC Driver Managers, so we'll install iODBC to start with (the other option being unixODBC):

sudo yum install -y libiodbc

Now follow the configuration instructions. To start with we'll do this in isolation of OBIEE to check that it works, and then bring it into OBIEE's world (for example, odbc.ini already exists in OBIEE).

Here I've set the log level to Trace, so that we can see what's going on in depth—in practice this would generate huge amounts of unnecessary log data so set it to a lower value (e.g. 0) for actual use.
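With the driver configured, we can test it with iodbctest, which ships with iODBC. This is a sketch - the DSN name assumes the same one used for OBIEE later in this post:

export MAPRDRILLINI=~/.mapr.drillodbc.ini
export ODBCINI=~/.odbc.ini
iodbctest "DSN=DrillDSN"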

From here you can enter a command (don't include the ; suffix, as this is added automatically):

SQL> SELECT version FROM sys.version
version
--------
1.7.0
result set 1 returned 1 rows.

If you get Unable to locate SQLGetPrivateProfileString function then check that your LD_LIBRARY_PATH includes the location of the Driver Manager (libiodbc.so.2), and that the exact library specified by ODBCInstLib exists—in my installation it was called libiodbcinst.so.2 rather than libiodbc.so as shown in the docs. Also check that you've set MAPRDRILLINI environment variable.

libiodbc installs into /usr/lib64 which is one of the default paths checked for library files, hence not including it explicitly in the LD_LIBRARY_PATH environment variable.

If you just get Have a nice day. with no error but no SQL> prompt, check your ODBC DSN configuration. I hit this issue when inadvertently omitting the AuthenticationType parameter.

If you head over to /tmp/odbc.log/ you should see a file called driver.log with a bunch of trace data in it. Here's the data from the above session, excluding TRACE logs:

Environment variables are defined per-component in a .properties file, which for the BI Server (OBIS / nqsserver) is BI_DOMAIN/config/fmwconfig/bienv/obis/obis.properties, so on SampleApp /app/oracle/biee/user_projects/domains/bi/config/fmwconfig/bienv/OBIS/obis.properties. To this file (which in 12.2.1 is empty by default) we add:
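# A sketch - exact values depend on where the MapR driver is installed
LD_LIBRARY_PATH=/opt/mapr/drillodbc/lib/64:/usr/lib64
MAPRDRILLINI=/home/oracle/.mapr.drillodbc.ini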

Add the Drill DSN to odbc.ini which for OBIEE already exists and is populated with other ODBC configurations. You'll find the file in BI_DOMAIN/config/fmwconfig/bienv/core, which on SampleApp is /app/oracle/biee/user_projects/domains/bi/config/fmwconfig/bienv/core/odbc.ini.

We can check that the BI Server has picked up our new environment variables by using the /proc pseudo file-system (you can also see an environment variable dump as part of the obis1.out logfile during startup):
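# environ is NUL-delimited, so translate to newlines before grepping
cat /proc/$(pgrep nqsserver | head -1)/environ | tr '\0' '\n' | grep -E 'LD_LIBRARY_PATH|MAPRDRILLINI'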

Let's pick through the error, because it's a common one and easily misinterpreted.

Odbc driver returned an error (SQLExecDirectW).

When you run an analysis/dashboard, Presentation Services (OBIPS/sawserver) connects to the BI Server (OBIS/nqsserver) to send the request as Logical SQL, and it connects to the BI Server using ODBC. Therefore any kind of issue running the request will always show as an "ODBC error".

Learning: nothing, other than that Presentation Services hit an error.

So it looks like the BI Server crashed. In the folder above where obis1.out is kept, there's a corresponding crash report (note that the process ID matches the log message above), nqsserver_7059_crashreport.txt. Some bits of interest from it:

So now let's do some digging. Using strace (which I've written about previously) we can see every system call that OBIEE makes, letting us audit the files that it reads and libraries that it loads. You can invoke strace at process startup or, easier, attach it to a running one. Since the BI Server also creates child processes, and also may crash (as here), we'll use a loop to make sure we've always got a strace running against it until we opt to cancel it:
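# Re-attach strace whenever the BI Server (or strace itself) exits,
# writing a timestamped trace file to the home folder each time
while [ 1 ]; do
    strace -f -tt -o ~/nqs_trace_$(date +%Y%m%dT%H%M%S).out -p $(pgrep nqsserver | head -1)
    sleep 1
done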

The output is written to the home folder, in files named nqs_trace_ followed by the timestamp at which the trace started. To cancel it, hit Ctrl-C several times to break out of strace and the while loop. If you leave this running for a long period of time you'll likely fill the filesystem and possibly make the server unusable…

Using your text editor of choice, you can search through the strace output to try and pick apart what's happening. Even if you don't find the cause of the error, you can at least rule other causes out this way. For example, whether a configuration file that you think is being used actually is, or where a library is actually being read from.

First up in searching the trace for odbc is the BI Server's own ODBC gateway library being opened and read:

Next the BI Server looks for the ODBC trace library - and this is interesting, because it can't find it. One of the very useful things with strace is being able to see all the paths that a program tries to read for a given file. Here you see a selection of the paths it's trying, and the failure (-1 ENOENT (No such file or directory)) each time:

The locate utility is handy for tracking down where libraries actually live. You can install it with sudo yum install -y mlocate, and if you've recently changed files (e.g. installed a new package), refresh its database with sudo updatedb before running locate.

This is one interesting point of note, then: the ODBC trace library isn't loading properly. Let's note this for now, and carry on through the trace file. Next up it opens and reads the odbc.ini file:

Where do these paths come from? LD_LIBRARY_PATH! And we can confirm that by comparing the above sequence of folders with the value of LD_LIBRARY_PATH that we saw above, split out onto lines here with a bit of sed magic:
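echo $LD_LIBRARY_PATH | sed 's/:/\n/g'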

So from this we can see that the ODBC driver is being loaded from the path we expected, and the configuration files that we expected are being read too. So no smoking gun - but a couple of possible problems ruled out.

Reproducing the Crash on Demand

With this kind of problem it's often a case of working through lots of different configuration settings to obtain further diagnostic evidence - which requires reproducing the problem each time. Instead of manually running the query through Answers each time, I moved this part of the diagnostics to the command line with nqcmd, copying the logical SQL to a file, drill-test.lsql, so it could be replayed on demand:
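# Credentials are placeholders; AnalyticsWeb is the BI Server's default DSN
$BI_DOMAIN/bitools/bin/nqcmd.sh -d AnalyticsWeb -u weblogic -p Admin123 -s drill-test.lsql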

One of the routes of investigation is the Driver Manager, for two reasons. The first is that on Windows ODBC is native - the driver manager is built in. As mentioned, @cfiston has got this to work, but on Windows. So platform differences aside (he says casually), this could be something. The second reason is the difference in the driver manager recorded in the ODBC log between a successful request (via iodbctest) and a failing one (via the BI Server).

Before we dig into this further let's take a look at the driver manager libraries that we've got on the system. DataDirect and unixODBC are both libodbcinst.so whilst iODBC is libiodbc.so. We'll use a bit of bash magic to find all the files, and list their sizes and checksums so we can spot which are the same:
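# List size and checksum of every candidate driver manager library
find / \( -name 'libodbcinst*' -o -name 'libiodbc*' \) 2>/dev/null -exec ls -l {} \; -exec md5sum {} \;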

What's interesting from this is that the libodbcinst.so installed within OBIEE (bifoundation/odbc/lib/) is the same size as the DataDirect 5.3 one, but a different checksum.

OBIEE uses ODBC for its own internal connectivity between Presentation Services (OBIPS) and BI Server (OBIS). We saw in the strace output above that odbctrac.so wasn't successfully loaded, so let's fix that by giving TraceDll the library's full path in /app/oracle/biee/user_projects/domains/bi/config/fmwconfig/bienv/core/odbc.ini - something like this, with SampleApp paths assumed:
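[ODBC]
Trace=1
TraceDll=/app/oracle/biee/bi/bifoundation/odbc/lib/odbctrac.so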

So back to driver managers. The DataDirect manager is not listed in the system requirements - perhaps it's not supported, perhaps it's just not been tested. Let's see if our iodbctest works when we force it to use the DataDirect driver manager. We'll do this by setting the ODBCInstLib parameter in ~/.mapr.drillodbc.ini, along with the accompanying LD_LIBRARY_PATH. The latter is necessary for dependent libraries used by the driver manager. We can use strace to verify the paths being picked up.

So ODBCInstLib is used, but the calling application also plays a role. We can see from strace that both applications are loading libodbcinst.so, but they still record different Driver Managers in the driver.log.

Of note here is that only nqcmd depends on a driver manager (libodbcinst.so) - so at a guess the other two tools interact with the ODBC drivers directly? Although the strace above did show each one loading the library, so ... ?

At the moment, libodbcinst.so is a soft link to the .2.0.0 unixODBC version.

So after all that - an incorrect configuration of the MapR ODBC driver was the culprit.

Conclusion

Hindsight is a wonderful thing - at the end of an article like this it's easy enough to wonder if strace and other tools were really necessary to get to the bottom of the issue. However, for me, using these tools is a great way of ruling out what the problem isn't.

The trick is to judge when to jump into the deep end of system call tracing, and when to take a step back and RTFM closely. If you're doing something that is supported and documented, you probably won't need to reach for strace. But once you edge away from what is supported or documented, these can be great tools to help explore how applications are running, and how they can be made to work with other components (such as 3rd-party ODBC drivers).

Outside of my job at Rittman Mead, I'm fortunate enough to be involved with a project called the Lightyear Foundation. We are a charitable organisation which aims to promote the techniques and philosophies of science, working with children and teachers in the UK and Ghana. In particular, we try to exemplify the fact that fundamental science principles can be demonstrated without the need for formal labs and expensive equipment. We believe that increasing the accessibility of science will increase the prevalence of broadly applicable scientific skills such as critical thinking, the communication of information, and a healthy perception of failure.

As a charity, we have need for marketing and donations and are thus required to understand our donor base, just as any business might try to understand their customers. Which brings us neatly to this blog. It is common for businesses to analyse the efficacy of marketing campaigns with regards to revenue, which in the modern world can range from TV to Twitter. The collection and analysis of this data has then formed a multi-million dollar industry in and of itself. Unfortunately as a charity, we don't have multi-millions to spend on this analysis. Fortunately, we do have a search engine, a bit of ingenuity, and some elbow grease. Also, Rittman Mead was kind enough to donate a server to tinker with.

Looking for a Wheel

Before re-inventing the wheel, it seemed like a good idea to see if any wheels existed and indeed what they looked like. As you might expect, there are a great deal of analytics tools aimed at social media. The problem is that most of them require some sort of payment, and the vast majority of the free ones only allow analysis of a specific source. For example, Facebook provide an extensive insights platform for your page data, but this does not access anything from Twitter.

A notable exception is Cyfe, an absolutely fantastic platform that is free to use if you don't require historic analysis (queries are limited to the last 30 days). This is a very impressive application, and I will be revisiting this in a later blog. However, historic data is essential for holistic analysis rather than simply reporting, so I'm going to take a look at how difficult it would be to build something to allow that.

Exploring the Data

With any analytical project, exploring, collecting, and processing the data is the most important and time consuming part. To start with, a few data sources were identified as being useful for the investigation:

Website

Facebook

Twitter

YouTube

MailChimp

This would give us information about our reach across the most important social media platforms as well as the website statistics and e-mail subscriptions. One notable omission from this blog is the donation system itself which is currently being changed. Each of the sources has an API for accessing various levels of information, with differing limitations. The first step was to try out each of the APIs and see what kind of data was retrievable.

Website Data

The easiest way to collect website data is to use Google Analytics, a free and excellent service that can be used to monitor and analyse traffic to our WordPress site. It works by embedding a small amount of JavaScript in each page, which then executes Google's tracking function, sending a variety of information about the session.

The above image is an excerpt of the kind of data provided by Google Analytics, showing the number of sessions activated on the website over a period of a few months. The platform has an incredibly detailed and vast store of data, including (among others):

Session

Clicks

Geographic

User Information (although this is often missing or incomplete)

Referrals

Network

Browser

Naturally it also has an API, allowing online access in real time or offline access secured by a key on the server. For my initial testing, I set up an API key and wrote a Python script to query the service.

The authentication is performed using OAuth2 and is well documented. There is also a web interface for the API complete with handy auto-complete fields for each of the parameters.

While the platform is very fast and detailed, there is one limitation worth noting: you can only choose up to 8 dimensions at a time for a query. For most use cases this is perfectly acceptable.

Facebook

Facebook provides their Graph API for querying data in this fashion as well as an explorer tool for testing. Authorisation can be granted and stored in the form of an API key. There is a permissions system of which the most important features are Manage Pages and Read Insights which give access to Facebook page information and reach data respectively. Some insights data is publicly available but appears to be limited to viewing a page's fanbase by country. Also, they store your data for two years, purging the rest. This means if you wish to store longer historic trends, you will need to siphon off the desired data and store it manually (we'll get to this later). Result sets are paginated, but feature a useful key-based system which allows easy retrieval of subsequent pages. As with Google, the performance of querying is very impressive, returning detailed data about 500 posts made over the last two years in under 30 seconds.

Once again, it was simple to produce the API HTTP requests using Python's requests library, which is able to automatically parse the returned JSON payload into a useful Python object.

Twitter

Twitter is a little trickier to work with. They have a public API which can be used to interface with Twitter quite extensively, but it is not necessarily appropriate for analysis. Specifically, I was looking for metrics about reach and the number of impressions, similar to those obtained from Facebook's Insights API. These are available from Twitter's website, but only for a given time frame, as seen below.

They do provide an analytical API for their Ads Service, but this does not appear to be publicly or freely available. Additionally, Gnip provides the official commercial outlet for analytical Twitter data.

Nevertheless, it was quite easy to get a Python integration set up for the public API by storing the consumer keys and access tokens for our Twitter account. Then I used Tweepy as a wrapper for Twitter's REST API. At the very least, follower and tweet count information is accessible, which can give some indication of popularity. Also, we'll be able to use this API to automate some tweet management if we need to at a later date.

YouTube

YouTube data is expectedly accessible using the same Google framework as before. This allows us to access publicly available YouTube data with only a small amount of additional set up. YouTube Analytics also has a very rich data platform similar to Google Analytics which can be seen below.

However, while it has an API, it cannot be used with a service account, the server-to-server setup I used earlier. So for now, we'll have to settle with collecting simple statistics (views and likes per video), but if we want the richer data set (watch time, demographics, audience retention) we will need to set up a live OAuth2 connection.

MailChimp

Last but not least, MailChimp also provides some analytics capability from within the site itself.

In addition, there is also a well documented API for looking at your subscription lists and campaigns. This can be simply queried using the requests library and supplying an API key.

Once again, as well as the ability to perform maintenance tasks, a decent amount of information is available:

Click Rates

Subscription Numbers

Open Rates

Geographical Information

Recipients

Devices

Revenue

Monthly History

Summary

That wraps up this first exploratory part, just dipping my toes into the various data streams out there. It's clear that there's a lot of data available, but there is still some complexity in tying it together for meaningful analysis. The next part of this blog series will look at collecting some of this data on a single platform so it can be queried as a whole.

If you'd like to have a go at using the code from this blog, it's all on GitHub. Bear in mind that you will need to fill in a config.ini file from the sample, and configure it for authentication with your own social media accounts.

If you'd like to know more about how Rittman Mead can help your organisation make the most of this kind of data, please get in touch via the website.

Apache Drill enables querying with SQL against a multitude of datasources including things like JSON files, Parquet and Avro, Hive tables, RDBMS and more. MapR have released an ODBC driver for it, and I thought it'd be neat to get it to work with OBIEE. It evidently does work for OBIEE running on Windows, but I wanted to be able to use it on my standard environment, Linux.

To get OBIEE to work with Apache Drill we'll use the third option - native ODBC drivers. I'm doing this on SampleApp v511.

First Things First - Setting up Apache Drill

Drill can be deployed in distributed configuration (with all the parallel processing goodness which that brings), but also run as a single instance locally. For the sake of simplicity that's what I'm going to do here. It's rather easy to do:

You need to make sure you've got a recent JDK available, and if you're running it on the BigDataLite VM, watch out for this odd problem that I had, which was related to classpaths and manifested itself with the error java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.JavaType.isReferenceType()Z.

Next, create the MapR Drill ODBC driver configuration file. A sample one is provided, which you can copy from the default installation path of /opt/mapr/drillodbc/Setup/mapr.drillodbc.ini, or create new. I put it in the default path (~/.mapr.drillodbc.ini).

The most important setting is DriverManagerEncoding. If you leave this at the default of UTF-32, OBIEE will crash (SIGSEGV) when you try to query the data in Apache Drill. You can read all about my trials and tribulations trying to figure this out in a separate blog post, coming soon.

I've set LogLevel to 2 and LogPath to a valid path, so that there are some log files to check if things go wrong.

Set ODBCInstLib to libodbcinst.so, which matches the built-in DataDirect ODBC Driver Manager library file.
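Putting that together, the relevant section of the file ends up looking something like this - a sketch, with everything else left at its defaults, and UTF-16 being the value that worked for me against the DataDirect driver manager:

[Driver]
DriverManagerEncoding=UTF-16
ODBCInstLib=libodbcinst.so
LogLevel=2
LogPath=/tmp/odbc.log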

Add the necessary environment variables to BI Server. This is done per-component in a .properties file, which for the BI Server (OBIS / nqsserver) is BI_DOMAIN/config/fmwconfig/bienv/obis/obis.properties, so on SampleApp /app/oracle/biee/user_projects/domains/bi/config/fmwconfig/bienv/OBIS/obis.properties. To this file (which in 12.2.1 is empty by default) we add:

Add the Drill DSN to odbc.ini which for OBIEE already exists and is populated with other ODBC configurations. You'll find the file in BI_DOMAIN/config/fmwconfig/bienv/core, which on SampleApp is /app/oracle/biee/user_projects/domains/bi/config/fmwconfig/bienv/core/odbc.ini.
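A minimal DSN stanza might look like the following - the driver filename and path depend on your MapR driver version, and 31010 is Drill's default user port:

[DrillDSN]
Driver=/opt/mapr/drillodbc/lib/64/libmaprdrillodbc64.so
ConnectionType=Direct
HOST=localhost
PORT=31010
AuthenticationType=No Authentication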

On a Windows machine I installed the MapR Drill ODBC driver too and created a DSN of the same name as in my odbc.ini file above. In the Administration Tool I set up a new Database (type: ODBC Basic) and associated connection pool (ODBC 2.0) pointing to DrillDSN.

Now to try it out! In Answers I build a Direct Database Request:
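-- Any valid Drill SQL works here; employee.json is Drill's bundled sample data
SELECT full_name, gender FROM cp.`employee.json` LIMIT 10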

and run it

Nice. We can query data held in HDFS too, again with a Direct Database Request:

Exploring Drill data with OBIEE

So the above DDRs prove the connectivity works. But as any fule kno, DDR is at best a 'tactical' solution and at worst a complete hack and maintenance nightmare. Let's use the force, Luke - or at least, the RPD. The first obvious thing to do is Import Metadata from the connection pool that we've defined. But doing this, no objects are shown:

That's because 'tables' in Drill are not quite as clearly defined as in a standard RDBMS. A table could be a single file, multiple files matching a pattern, or even literally a table if connecting Drill to an RDBMS. So to expose a set of data through Drill, we define a view. This is where Drill Explorer comes in as it gives a simple GUI over the available files

from where you can use the SQL tab and Create As option to create a view

Having done this, launch the Import Metadata dialog again (right-click the Connection Pool and select Import Metadata), and make sure you tick Views under the Metadata types to view. Now you'll see the object. Unfortunately, it just has a single column - *. I've not figured out yet how - if it's possible at all - to get a view to explode out all the columns in the underlying select clause. Import the view:

You'll get an error about the * column name, but the table and schema still get brought across.

Now the slightly tedious bit - define each physical column, and define the physical and logical model, done very simplistically here:

Looking at the query being run, OBIEE is using double quotation marks (") to quote identifiers, but Drill requires backtick (`) instead. Heading over to DB Features can fix this:

And refreshing the report gives:

Conclusion

This is the very basics necessary to get up and running with OBIEE and Apache Drill. It would be good to see if there's an optimal, least-friction, way for getting tables in Drill exposed to OBIEE without needing to enter each physical column.

One of the many powerful features of Drill is being able to access nested and array JSON values, which I've discussed in my Introduction to Apache Drill post. The above examples just use root-level attributes, and could easily be expanded out to process some of the nested fields (such as hours in the business data above). For the time being this would be done with DDR, or a Drill view wrapped around it imported into the Physical layer of the RPD.

Apache Drill is an engine that can connect to many different data sources and provide a SQL interface to them. It's not just a wanna-be SQL interface that trips over at anything complex - it's a hugely functional one, including support for many built-in functions as well as windowing functions. Whilst it can connect to standard data sources that you'd be able to query with SQL anyway, like Oracle or MySQL, it can also work with flat files such as CSV or JSON, as well as Avro and Parquet formats. It's this capability to run SQL against files that first piqued my interest in Apache Drill. I've been spending a lot of time looking at Big Data architectures and tools, including Big Data Discovery. As part of this, and experimenting with data pipeline options, one of the gaps that I've found is the functionality to dig through files in their raw state, before they've been brought into something like Hive, which would enable their exploration through BDD and other tools.

In this article I'll walk through getting started with Apache Drill, and show some of the types of queries that I think are a great example of how useful it can be.

Getting Started

It's very simple to get going with Apache Drill - just download and unpack it, and run. Whilst it can run distributed across machines for performance, it can also run standalone on a laptop.

To launch it:

cd /opt/apache-drill-1.7.0/
bin/sqlline -u jdbc:drill:zk=local

If you get No current connection or com.fasterxml.jackson.databind.JavaType.isReferenceType()Z then you have a conflicting JAR problem (e.g. I encountered this on Oracle's BigDataLite VM), and should launch it with a clean environment
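One way of doing that is with env -i, keeping just enough of the environment to find Java - a sketch:

env -i PATH=$PATH HOME=$HOME bin/sqlline -u jdbc:drill:zk=local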

So far, so SQL, so relational - so familiar, really. Where Apache Drill starts to deviate from the obvious is its use of storage handlers. In the above query cp is the 'database' that we're running our query against, but this is in fact a "classpath" (hence "cp") storage handler that's defined by default. Within a 'database' there are 'schemas' which are sub-configurations of the storage handler. We'll have a look at viewing and defining these later on. For now, it's useful to know that you can also list out the available databases with show databases - on a vanilla install it returns something like this:
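0: jdbc:drill:zk=local> show databases;
+---------------------+
|     SCHEMA_NAME     |
+---------------------+
| INFORMATION_SCHEMA  |
| cp.default          |
| dfs.default         |
| dfs.root            |
| dfs.tmp             |
| sys                 |
+---------------------+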

Note that the databases command is a synonym for schemas; it's the <database>.<schema> that's returned for both. In Apache Drill the backtick is used to enclose identifiers (such as schema names, column names, and so on), and it's quite particular about it. For example, default is a reserved word and hence must be quoted, so you can use

0: jdbc:drill:zk=local> use cp.`default`;

but not

0: jdbc:drill:zk=local> use `cp`.default;

Querying JSON data

On the Apache Drill website there are some useful tutorials, including one using data provided by Yelp. This was the dataset that originally got me looking at Drill, since I was using it as an input to Big Data Discovery (BDD) but struggling on two counts. First up was how best to define a suitable Hive table over it in order to ingest it to BDD. Following on from this was trying to understand what value there might be in the data, which would drive how long to spend perfecting the way in which I exposed the data in Hive. The examples below show the kind of complications that complex JSON can introduce when queried in a tabular fashion.

First up, querying a JSON file, with the schema inferred automagically - for example, the Yelp business data (the path being wherever you've put the file):
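0: jdbc:drill:zk=local> SELECT name, city, stars FROM dfs.`/home/oracle/yelp/yelp_academic_dataset_business.json` LIMIT 3;

Pretty cool.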

So Apache Drill enables you to run SQL queries against data in a multitude of formats and locations, which is rather useful in itself. But even better than that, it lets you federate these sources in a single query. Here's an example of joining between data in HDFS and Oracle:

An error? That's not supposed to happen. I've got a JSON file, right? It turns out the JSON file is one complete JSON object per line. Except that it's not on the last record. Note the record count given in the error above - 2819:

[oracle@bigdatalite ~]$ wc -l geo_tweets.json
2818 geo_tweets.json

So the file only has 2818 complete lines. Hmmm. Let's take a look at that last record, using a head/tail bash combo:
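head -n 2819 geo_tweets.json | tail -n 1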

Within the twitter data there's root-level fields, such as text, as well as nested ones such as information about the tweeter in the user field. As we saw above you reference nested fields using dot notation. Now's a good time to point out a couple of common mistakes that you may encounter. The first is not quoting reserved words, and is the first thing to check for if you get an error such as Encountered ".":
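The user field, for example, is a reserved word, so referencing the nested tweeter details needs quoting with backticks (and a table alias for the nested path):

0: jdbc:drill:zk=local> SELECT t.`user`.`name` FROM dfs.`/home/oracle/geo_tweets.json` t LIMIT 1;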

To filter out records that may not have array values (such as hashtags, which not every tweet has), and without which the query may fail, use IS NOT NULL against an attribute of the first index of the array:
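0: jdbc:drill:zk=local> SELECT t.`user`.`name`, t.entities.hashtags[0].text FROM dfs.`/home/oracle/geo_tweets.json` t WHERE t.entities.hashtags[0].text IS NOT NULL LIMIT 5;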

JSON files are relatively easy to interpret because they have a semi-defined schema within them, including column names. CSV (and character delimited files in general), on the other hand, are a bit more of a 'wild west' when it comes to reliably inferring column names. You can configure Apache Drill to ignore the first line of a CSV file (on the assumption that it's a header) if you want to, or to take them as column names. If you don't do this and query a CSV file that looks like this:
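name,country
John,UK
Maria,ES

then - taking this made-up file as the example - the header row comes back as data, and each row surfaces as a single array column called columns, addressed by index:

0: jdbc:drill:zk=local> SELECT columns[0], columns[1] FROM dfs.`/tmp/people.csv`;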

From the Drill web interface you can view existing storage plugins, or define new ones. To create a new one, enter its name (for example, hdfs, but could be fred for all that it matters - it's just a label) under New Storage Plugin on the Storage page, and click on Create. Paste the necessary JSON definition in the Configuration box, and then click Create. If you don't want to use the GUI there's also a REST API.

Storage plugin configuration is stored either within Zookeeper (when running Drill distributed), or locally in the sys.store.provider.local.path path when running standalone. By default this is under /tmp which gets cleared down at server reboot. To persist custom storage configurations amend the sys.store.provider.local.path in drill-override.conf, for example:
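# drill-override.conf - the storage path here is illustrative
drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "localhost:2181",
  sys.store.provider.local.path: "/home/oracle/drill-storage"
}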

As well as the connection parameter itself for HDFS, the important bit in this configuration is the formats section. This tells Drill how to interpet files that it finds, without the end-user having to explicitly declare their type.

For the filesystem-based plugin dfs (which can include local files, HDFS, even Amazon S3), you can browse the available "tables" with show files - for example:
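0: jdbc:drill:zk=local> show files in dfs.`/user/oracle`;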

Drill relies on the format clause of the storage plugin configurations in order to determine how to interpret files based on their extensions. You won't always have that luxury of extensions being available, or being defined. If you try and query such data, you'll not get far. In this example I'm querying data on HDFS that's in JSON format but without the .json suffix:

Here the problem was with the external table that Oracle was querying (ORA-29913: error in executing ODCIEXTTABLEOPEN). It's actually an Oracle external table over a Hive table, which obviously Drill could be querying directly - but hey, we're just sandboxing here...

Query Execution

Just as Oracle has its Cost Based Optimiser (CBO), which helps it determine how to execute a query most efficiently, Apache Drill has an execution engine that determines how to actually execute the query you give it. This also includes how to split it up over multiple nodes ("drillbits") if available, as well as optimisations such as partition pruning in certain cases. You can read more about how the query execution works here, and view the explain plan for a query using explain plan - for example:
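0: jdbc:drill:zk=local> explain plan for SELECT full_name FROM cp.`employee.json` LIMIT 1;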

You can also use the Drill web interface to see information about how a query executed:

Drill Explorer

The MapR Drill ODBC driver comes with a tool called Drill Explorer. This is a GUI that enables you to explore the data by navigating the databases (==storage plugins) and folders/files within, previewing the data and even creating views on it.

Apache Drill is a powerful tool for using familiar querying language (SQL) against different data sources. On a small scale, simply being able to slice and dice through structured files like JSON is a massive win. On a larger scale, it will be interesting to experiment with how Apache Drill compares when querying larger volumes of data across a cluster of machines, maybe compared to a tool such as Impala.

For more information about Apache Drill see how to access Drill from within OBIEE, as well as bonus geeky blog coming soon explaining the debug tools I used to try and figure out why it wouldn't initially work...

Oracle has recently released version 12.2.1.1.0 for OBIEE 12c, which has a variety of exciting new features for Data Visualization, BI Publisher, Dashboards in OBIEE, and ODBC connections. It can be performed as an in-place upgrade (performed by the Upgrade Assistant) or as a fresh install. The upgrade process is explained at length here. There is also an example Linux in-place upgrade from 12.2.1.0 to 12.2.1.1 which can be found here. Let's take an in-depth look at some of the new features.

Data Visualization (aka Visual Analyzer)

A variety of features have been added to Data Visualization, which brings it closer to its standalone counterpart, Data Visualization Desktop.

Visualize Data from Oracle Applications

One of the most powerful exploratory features of Data Visualization Desktop is the ability to use your OBIEE analyses as direct datasources. This allows users to "revisualize" and explore the data contained within one or more analyses in different ways. Oracle has added this same feature to Data Visualization. To access it from the DV Home Page, click on Data Sources > Create New Data Source.

You are then presented with three choices: From a File, From Oracle Applications or From Database (more on this later).

Selecting From Oracle Applications opens a new window where you enter your connection information.

Once you enter the connection information, you are able to view all of the folders you have access to within the OBIEE catalog. Selecting an analysis and clicking OK brings you to another screen where you can designate columns as measures or attributes and change the specified aggregation for your measure columns.

Clicking Add Data Source adds the analysis to the Data Sources page.

Visualize Data from Databases

Another feature added to DV is the ability to connect directly to a database. It allows you to add tables directly from a database schema or write a SQL statement to select the data you want.

To connect to a database as a data source, select Data Sources from the Home Page then Create New Data Source > From Database. This brings up a connection window to enter your connection details.

From here, you can also select from a large selection of database types other than the default Oracle Type. Clicking on Oracle brings up the list which matches the selections available in the latest release of Data Visualization Desktop.

Choosing to add tables from a database schema allows you to add data sources, one table at a time.

In some situations, you might want to create a single data source from multiple tables in a database. You can achieve this by clicking on the SQL tab after making your database connection and writing a custom SQL statement.

Clicking OK brings you to a results screen with all of the columns previously defined in your SQL statement, and also gives you the ability to rename your data source and change the aggregation type for your measure columns.

Modifying Uploaded Data Sources

Also known as "Data Wrangling", Oracle has added the ability to manipulate a dataset depending on the column data type. By invoking a logical SQL function, Data Visualization does the work for you and can create a new column or edit an existing one. To show an example of this, I created a new VA Project and uploaded a sample Excel file.

In the top left menu bar, there is an option called Stage. Clicking on it opens the datasource and shows all of the columns present and the first 100 records.

Note: If you have more than one file, clicking on the name of the file in the top left will reveal a dropdown menu where you can choose between them.

To edit or add a column, navigate to the right of the column you wish to edit and click on the options icon.

This brings up the options menu, where you can select from a variety of different options for editing or adding a column, depending on the datatype.

Using the Concatenate function, I was able to create a new column that showed the full name of each customer combined with data in the Age Group column separated by a colon.

If you choose to create a new column, a Create New Data Element window opens, allowing you to input logical SQL functions similar to the edit column formula feature in OBIEE Analytics.

New and Enhanced Visualization Types

DV now includes new visualization types - donut charts, text boxes, sunburst, combo, scatter (cat.) and stacked scatter (cat.) - which brings it in line with the offerings of Data Visualization Desktop.

Also included (assuming you have installed and configured Oracle's R distribution) is the ability to add clusters, outliers, reference lines, trend lines and forecasts. There are two ways to add these to your visualizations. The first is to select the Analytics option in the horizontal menu bar on the far right side and then choose the desired function. Double-clicking on a function automatically adds it to your visualization.

Another option is to add them directly from Menu > Properties > Analytics within your visualization. You can add analytic functions and, depending on the function, there are a variety of different options to change how the function is displayed.

You can also add URLs or links to insights within Tiles, Text Boxes and Image visualizations and, if you use Chrome for Windows or Android, there is a dictation option within properties that you can use to add descriptions.

Customize Color Schemes

One of the more frustrating absences from DV up to this point was the ability to customize colors in your visualizations. Oracle has not only added this feature but given you the ability to customize and save these customizations, making color conformance for a group of users or across an entire organization a breeze.

Let's start by looking at an example of applying color to an entire project. To access the project color properties, click on Canvas Settings > Project Properties. By clicking on the color selection in the Color Series section, you can choose from one of four default selections or create your own custom palette.

Notice how each measure is assigned a color and that color is continuous across the entire project.

You can also manage colors from inside of an individual visualization. To do this, click on Menu > Color > Manage Color Assignments. From here you can assign colors to each measure individually and it is then displayed across the entire project.

Another option is to change a color by right clicking on an individual data point.

In this example, I right clicked on Technology and changed its color which then changed it for every visualization in the project.

Data Blending

Oracle added the ability to specify which data source takes precedence over the other when blending two data sources together. This can be very useful when you have a project that includes data from two sources, and there are matching values in one data source that are not in the second. Let's look at an example.

I have two data sources. One contains population information and one contains country name and they are matched on the country codes.

Notice in the example that there are countries that have no population facts, and population facts that have no countries.

So that we can see only the countries which have population information, click on one of the columns in the report, right click and select Properties > Data Sets. Here you are presented with an option for each source: All Rows or Matching Rows.

Changing the source that contains country information to Matching Rows will keep only the countries that have corresponding population information.

Share Reports as Read-Only

It is now possible to share a report with other users as read-only. They will still be able to interact with and edit filters, but won't be able to see the authoring content controls. They can also switch back and forth into presentation mode.

Other Features

Two other features introduced for DV in 12.2.1.1 are the ability to upload data files up to 50MB in size and the option to hide the filter panel from view, increasing canvas space for your projects.

BI Publisher

Oracle has also updated BI Publisher 12c to version 12.2.1.1. You can see the official document here.

Deliver documents to Oracle Document Cloud Service

You are now able to deliver BI Publisher reports to the Oracle Documents Cloud service via a delivery channel in BI Publisher. You can deliver reports to cloud storage from both on-premise or a cloud deployment of BI Publisher.

You can set up this feature under Administration > Delivery > Document Cloud Services within BI Publisher 12.2.1.1.

Here, under the Document Cloud Services tab, you can add the connection details to your cloud server.

Use WebCenter Content as a Data Source

You can now read text data files from WebCenter Content. You can create a Content Server data source under Administration > Delivery > Content Server.

After you create your Content Server as a data source, you can create Content Server as a dataset in a BI Publisher data model and retrieve text data files stored in WebCenter Content by Document ID.

Attach PDF to Your Invoices

You can now send PDF attachments along with invoices in BI Publisher 12.2.1.1, attaching the PDFs while bursting. The process is explained in more detail in the BI Publisher 12.2.1.1 documents here.

Integrate Using RESTful APIs

BI Publisher 12.2.1.1 introduces a set of REST APIs that allow you to view reports in an application. It connects to the BI Publisher Server through a URL and uses JSON objects to contain data. You can read more about RESTful APIs in BI Publisher in the BI Publisher 12.2.1.1 new features guide here.

Deliver Email Body in HTML Format along with Report as Attachment

You can use standard HTML4 formatting tags to create the email body, include a logo or images, add hyperlinks and more. Reports will be generated as attachments to the email.

Manage Custom Fonts

In BI Publisher 12.2.1.1, you can manage custom fonts from the Administration page, and once uploaded, these custom fonts can be displayed in reports.

Manage PGP Keys for FTP Delivery Encryption

In BI Publisher 12.2.1.1, Administrators can upload and manage PGP Keys from the Administration page. FTP delivery encryption is now a self-service feature. More details can be found in the BI Publisher 12.2.1.1 documents.

Dynamic Memory Guard

You can now separately configure limits for online and offline reports. The limits can be made dynamic by using variables, system-defined functions and operators, and the dynamic memory guard will adjust when the data changes.

You can access the Memory Guard in the Administration page under Runtime Configuration > Properties.

File Data Encryption

You can now enable File Data Encryption on the Administration page, and your data files will be encrypted at the time of uploading them to the server.

OBIEE Dashboard Subpages

A new feature for OBIEE 12.2.1.1 is the ability to add subpages within dashboards, allowing dashboard designers to create multiple subpages for each dashboard page. To add a subpage, navigate to the desired dashboard page, click the "Add Dashboard Page" icon, and select "Add Subpage."

The Add Subpage window will then be displayed where you can name your subpage and add a description.

Once you click OK, the subpage will be added to the dashboard page and adding content to the subpage is the exact same process as adding content to a dashboard page.

RPD - Enable Data Driven Fragment Selection

A new feature for RPD developers is the ability to improve the performance of fragmented logical table sources by using the Enable Data Driven Fragment Selection feature. More information on how to enable this feature can be found in the 12.2.1.1 documents here.

Additional Data Sources Supported

New data sources are supported in 12.2.1.1 using DataDirect 7.1.5. These include Amazon Redshift, Oracle Service Cloud, Greenplum, Salesforce, and Teradata. Connections to Apache Spark and MongoDB are also available using DataDirect 8.0.

Conclusion

All in all, the 12.2.1.1 update for OBIEE 12c adds some very powerful features. Data Visualization is now a much more complete product, and its new features and native connectors make it a powerful addition to the new era of data visualization. For Oracle Cloud users, the integration with BI Publisher and Data Visualization makes the "report and share" process easier than ever.

In my previous post I looked at the latest release of Oracle Stream Analytics (OSA) and saw how it provides a graphical interface to "Fast Data". Users can analyse streaming data as it arrives, based on conditions and rules. They can also transform the stream data, publishing it back out as a stream in its own right. In this article we'll see how OSA can be used with Kafka.

Kafka is one of the foremost streaming technologies nowadays, for very good reasons. It is highly scalable and flexible, supporting multiple concurrent consumers. Oracle Streaming Analytics supports Kafka as both a source and target. To set up an inbound stream from Kafka, first we define the Connection:

Once the Connection is defined, we can create a Stream for a given Kafka topic:
If you get an error at this point of Unable to deploy OEP application then check the OSA log - it could be a connectivity issue to Zookeeper.

Assuming that the Stream is saved with no errors, you can then create an Exploration based on the stream and all being well, the live tweets are soon shown. Unlike the example at the top of this article, these tweets are coming in via Kafka, rather than the built-in OSA Twitter Stream. This is partly to demonstrate the Kafka capabilities, but also because the built-in OSA Twitter Stream only includes a subset of the available twitter data fields.
Avro? Nope.

Data in Kafka can be serialised in many formats, including Avro - which OSA doesn’t seem to like. No error is thrown to the GUI but the exploration remains blank.

Looking in the OSA log file there’s a whole lot of errors recorded similar to this:

line 1:0 no viable alternative at character '?'
line 1:1 no viable alternative at character '?'
line 1:2 no viable alternative at character '?'
line 1:3 no viable alternative at character '?'
line 1:4 no viable alternative at character '?'
line 1:5 no viable alternative at character '?'
line 1:6 no viable alternative at character '?'
line 1:7 no viable alternative at character '?'
line 1:8 no viable alternative at character ''

JSON? Kinda.
One of the challenges that I found working with OSA was defining the “Shape” (data model) of the inbound stream data. JSON is a format used widely as a technology-agnostic data interchange format, including for the twitter data that I was working with. You can see a sample record here. One of the powerful features of JSON is its ability to nest objects in a record, as well as create arrays of them. You can read more about this detail in a recent article I wrote here. Unfortunately it seems that OSA does not support flattening out JSON, meaning that only elements in the root of the model are accessible. For twitter, that means we can see the text, and who it was in reply to, but not the user who tweeted it, since the latter is a nested element (along with many other fields, including hashtags which are also an array):

So what to do if the inbound streaming data is in nested-JSON format? It seems to me the only option is to pre-process it to flatten it. There are a variety of tools that could be used here - in the first instance I’d generally reach for Logstash, it being the one I’m most familiar with. To get an idea of the schema of a JSON record you can use jsonschema.net. Funnily enough when I was researching this blog post I came across the exact same problem on a forum posted by … me! Early last year I was working with the same dataset, and had the same issue with embedded arrays. The way to do it in Logstash is with a bit of Ruby code to flatten the arrays, and a standard mutate to bring nested objects up to the root level. Sample code:
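filter {
  # promote selected nested fields to the root - field names here are
  # illustrative; see the gist (linked below) for the full working version
  mutate {
    add_field => {
      "user_name"      => "%{[user][name]}"
      "user_followers" => "%{[user][followers_count]}"
    }
  }
  # flatten the hashtags array into a comma-separated string
  ruby {
    code => "event['hashtags_list'] = event['[entities][hashtags]'].map { |h| h['text'] }.join(',') rescue nil"
  }
}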

You can find the full Logstash code on gist here. With this logstash code running I set up a new OSA Stream pointing to the new Kafka topic that Logstash was writing, and added the flattened fields to the Shape:
We can then see in the Exploration the fields that we wanted to get at - user name, hashtags, and so on:
Other Shape Gotchas

One of the fields in Twitter data is ‘source’ - which unfortunately is a reserved identifier in the CQL language that OSA uses behind the scenes.

It’s not clear how to define a shape in which the source data field is named after a reserved identifier.

Further Exploration of Twitter Streams with OSA

Using the flattened Twitter stream coming via Kafka that I demonstrated above, let’s now look at more OSA functionality.

Depending on the source of your data stream, and your purpose for analysing it, you may well want to filter out certain content. This can be done from the Exploration screen:

The Business Rules section of the Exploration enables you to define rules about the data and set field values based on them. These can be static values, or expressions based on data in the stream. There doesn't seem to be a way to add arbitrary fields via this, so I amended the Stream Shape to include a 'spare' field that I then populated:
Kafka Stream Transformation with OSA

Here we’ll see how OSA can be used to ingest one Kafka topic, apply a transformation, and stream it to another Kafka topic.

The OSA exploration screen offers a basic aggregation (‘summary’) function, here showing the number of tweets per language:

Using the Windows icon to the right of the Sources box the time window can be defined, along with the refresh frequency:

This means that the count of tweets per language will be calculated looking at the data for the past 30 seconds, and this will be evaluated every five seconds. More complex functionality, such as pivoting on the group-by column (so as to be able to chart out the number of tweets per language as separate metrics), doesn't seem to be present in this release; arguably this is moving into proper analytics territory, such as would be found in Oracle's Big Data Discovery.

Taking the summarised stream (count of tweets, by language) I first Publish the exploration, making it available for use as the input to a subsequent exploration. Then from the Catalog page select a Pattern, which I’m going to use to build a stream showing the most common languages in the past five seconds. With the Top N pattern you specify the event stream (in this case, the summarised stream that I built above), and the metric by which to order the events which here is the count of tweets per language.

For completeness, I’m going to stream the output of this pattern exploration back to a Kafka topic

Note that I’ve defined a new Shape here based on the columns in the pattern. In the pattern itself I renamed the COUNT column to a clearer one (tweetcount5_sec). Renaming it wasn’t strictly necessary since it’s possible to define the field/shape mapping when you configure the Target:

For the target to take effect, I publish the pattern exploration, and then using kafka-console-consumer I can see the topic being populated in realtime by OSA:
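# Topic name here is whatever you configured in the OSA Target
kafka-console-consumer.sh --zookeeper localhost:2181 --topic tweets-by-lang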

Being able to apply transformations to streams in realtime like this and stream the results is pretty useful. There are some limitations to the capabilities of OSA through the front end GUI. For example, support for nested json, and integration with the Kafka Schema Registry to automatically derive Shapes for inbound topics would both be great. Lower-level, the option to specify the consumer group id, as well as the start point for consumption (beginning of topic, or streaming at the end) are both things that would probably be necessary sooner or later using OSA for full-blown development.

OSA and Spatial

One of the Patterns that OSA provides is a Spatial one, which can be used to analyse source data that includes geo-location data. This could be to simply plot the occurrence of the data point (as we'll see shortly) on a map. It can also be used in a more sophisticated manner, to track a given entity's movements on a map. An example of this could be a fleet of trucks reporting their position back at regular intervals. Areas on a map can be defined and conditions triggered as the entity enters or leaves the area. For now though, we'll keep it simple. Using the flattened Twitter stream from Kafka that I produced from Logstash above, I'm going to plot Tweets in realtime on a map, along with a very simplistic tagging of the broad area in which they came from.

In my source Kafka topic I have two fields, latitude and longitude. I expose these as part of the 'flattening' of the JSON in this logstash script, since by default they're nested within the coordinates field and as an array too. When defining the Stream's Shape make sure you define the datatype correctly (Double) - OSA is not very forgiving of stupidity and I spent a frustrating time trying to work out why "-80.1422195" was coming through as zero - obviously defined as an Integer this was never going to work!

Not entirely necessary, but useful for debug purposes, I setup an exploration based on the flattened Twitter stream, with a filter to only include tweets that had geo-location data in them. This way I knew what tweets I should expect to be seeing in the next step. One of the things that I have found with OSA is that it has a tendency to fail silently; instead of throwing errors you'll just not get any data. By setting up the filter exploration I could at least debug things a bit more easily.

After this I created a new object, a Map. A Map object defines a set of named areas, which could be sourced from a database table, or drawn manually - which is what I did here by setting the Map Type to 'None (Create Manually)'. One thing to note about the maps is that they're sourced online (openstreetmap.org) so you'll need an internet connection to do this. Once the Map is open, click the Polygon Tool icon and click-drag a shape around the area that you want to "geo-fence". Each area is given a name, and this is what is used in the streaming data to label the event's geographical area.

Having got our source data stream with geo-data in, and a Map on which to plot it and analyse the location of each event, we now use the Spatial General pattern to create an Exploration. The topology looks like this:

The fields in the Spatial General pattern are all pretty obvious. Object key is the field to use to track the same entity across multiple events, if you want to use the enter/exit/stay statuses. For tweets we just use 'Enter', but for people or vehicles, for example, you might get multiple status reports and want to track them on a map. For example, when a person is near a point of interest that you're tracking, or a vehicle has remained in a set area for too long.

If you let the Exploration now run, depending on the rate of event ingest, you'll sooner or later see points appearing on the map and event details underneath. The "status" column is populated (blank if the event is outside of the defined geo-fences), as is the "Place", based on the geo-fence names that you defined.

Summary

I can see OSA being used in two ways. The first is as an ‘endpoint’ for streams, with users taking actions based on the data - some example use cases are listed here. The second is for prototyping transformations and analyses on streams prior to productionising them. The visual interface and the immediacy of feedback on transformations applied mean that users can quickly understand what further processing they may want to apply to the stream, using actual streaming data to inform this.

This latter concept - that of prototyping - is similar to that which we see with another of Oracle’s products, Big Data Discovery. With BDD users can analyse data in the organisation’s data reservoir, as well as apply transformations to it (read more). Just as BDD doesn’t replace OBIEE or Visual Analyzer but enables users to understand how they do want to model the data in these tools, OSA wouldn’t replace “production grade” integration done by Oracle Data Integrator. What it would do is allow users to get a clearer idea of the transformations they would want performed in it.

OSA's user interface is easy to use and intuitive, and this is definitely a tool that you would put in front of technically minded business users. There are limitations to what can be achieved technically through the web GUI alone and something like Oracle Data Integrator (ODI) would still be a more appropriate fit for complex streaming work. At Oracle Open World last year it was announced (slides) that a beta would be starting for ODI using Spark Streaming for ETL and stream processing, so it'll be interesting to see this when it comes out.

Oracle Stream Analytics (OSA) is a graphical tool that provides “Business Insight into Fast Data”. In layman's terms, that translates into an intuitive web-based interface for exploring, analysing, and manipulating streaming data sources in realtime. These sources can include REST and JMS queues, as well as Kafka. The inclusion of Kafka opens OSA up to integration with many new-build data pipelines that use it as a backbone technology.

Previously known as Oracle Stream Explorer, it is part of the SOA component of Fusion Middleware (just as OBIEE and ODI are part of FMW too). In a recent blog it was positioned as “[…] part of Oracle Data Integration And Governance Platform”. Its Big Data credentials include support for Kafka as both source and target, as well as the option to execute across multiple nodes using Spark, for scaling performance and capacity.
I’ve been exploring OSA from the comfort of my own Mac, courtesy of Docker and a Docker image for OSA created by Guido Schmutz. The benefits of Docker are many and covered elsewhere, but what I loved about it in this instance was that I didn’t have to download a VM that was tens of GB. Nor did I have to spend time learning how to install OSA from scratch, which, whilst interesting, wasn’t a priority compared to just trying the tool out and seeing what it could do. [Update] It turns out that installation is a piece of cake, and the download is less than 1GB … but in general the principle still stands - Docker is a great way to get up and running quickly with something.
In this article we’ll take OSA for a spin, looking at some of the functionality and terminology, and then real examples of use with live Twitter data.
To start with, we sign in to Oracle Stream Analytics:
From here, click on the Catalog link, which lists all of the available resources. Some of these resource types include:

Streams - definitions of sources of data such as Kafka, JMS, and a dummy data generator (event generator)

Connections - servers etc. from which Streams are defined

Explorations - front-end for seeing contents of Streams in realtime, as well as applying light transformations

Targets - destination for transformed streams

Viewing Realtime Twitter Data with OSA
The first example I’ll show is the canonical big data/streaming example everywhere – Twitter. Twitter is even built into OSA as a Stream source. If you go to https://dev.twitter.com you can get yourself a set of credentials enabling you to query the live Twitter firehose for given hashtags or users.
With my twitter dev credentials, I create a new Connection in OSA:
Now we have an entry in the Catalog, for the Twitter connection:
from which we can create a Stream, using the connection and a set of hashtags or users for whom we want to stream tweets:
The Shape is basically the schema or data model that is applied to the stream. There is one built-in for Twitter, which we’ll use here:
When you click Save, if you get an error Unable to deploy OEP application then check the OSA log file for errors such as unable to reach Twitter, or invalid credentials.
Assuming the Stream is created successfully you are then prompted to create an Exploration from where you can see the Stream in realtime:
Explorations can have multiple stream sources, and be used to transform the contents, which we’ll see later. For now, after clicking Create, we get our Exploration window, which shows the contents of the stream in realtime:
At the bottom of the screen there’s the option to plot one or more charts showing any numeric values in the stream, as can be seen in the animation above.
I’ll leave this example here for now, but finish by using the Publish option from the Actions menu, which makes it available as a source for subsequent analyses.
Adding Lookup Data to Streams
Let's look now at some more of the options available for transforming and 'wrangling' streaming data with OSA. Here I’m going to show how two streams can be joined together (but not crossed) based on a common field, and the resulting stream used as the input for a subsequent process. The data is simulated, using a CSV file (read by OSA on a loop) and OSA's Event Generator.
From the Catalog page I create a new Stream, using Event Generator as the Type:
On the second page of the setup I define how frequently I want the dummy events to be generated, and the specification for the dummy data:
The last bit of setup for the stream is to define the Shape, which is the schema of data that I’d like generated:
The Exploration for this stream shows the dummy data:
The second stream is going to be sourced from a very simple key/value CSV file:

attr_id,attr_value
1,never
2,gonna
3,give
4,you
5,up

The stream type is CSV, and I can configure how often OSA reads from it, as well as telling OSA to loop back to the beginning when it reaches the end of the file, thus simulating a proper stream. The ‘shape’ is picked up automatically from the file, based on the first row (headers) and then the inferred data types:
The Exploration for the stream shows the five values repeatedly streamed through (since I ticked the box to ‘loop’ the CSV file in the stream):
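The inference is of the kind you'd expect: column names from the header row, and datatypes guessed from the values. As a toy illustration of the behaviour in Python (my approximation of the idea, not OSA's actual logic):

import csv

# Toy illustration: guess a shape from a CSV header row and first data row.
def infer_shape(path):
    with open(path) as f:
        reader = csv.reader(f)
        headers = next(reader)
        sample = next(reader)

    def guess(value):
        for cast, name in ((int, 'Integer'), (float, 'Double')):
            try:
                cast(value)
                return name
            except ValueError:
                pass
        return 'String'

    return {h: guess(v) for h, v in zip(headers, sample)}

# infer_shape('lookup.csv') -> {'attr_id': 'Integer', 'attr_value': 'String'}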
Back on the Catalog page I’m going to create a new Exploration, but this time based on a Pattern. Patterns are pre-built templates for stream manipulation and processing. Here we’ll use the pattern for a “left outer join” between streams.
The Pattern has a set of pre-defined fields that need to be supplied, including the stream names and the common field with which to join them. Note also that I’ve increased the Window Range. This is necessary so that a greater range of CSV stream events is available for the lookup. If the Range is left at the default of 1 second then only events from both streams occurring in the same second that match on attr_id will be joined. Unless both streams happen to be in sync on the same attr_id from the outset, this isn’t going to happen often, and certainly wouldn’t in a real-life data stream.
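To make the window semantics concrete, here's a toy Python sketch of a windowed left outer join (purely illustrative - the names and structures are mine, not OSA's):

import time
from collections import deque

WINDOW_SECONDS = 30        # the increased Window Range
lookup_window = deque()    # holds (timestamp, attr_id, attr_value) tuples

def on_lookup_event(attr_id, attr_value):
    lookup_window.append((time.time(), attr_id, attr_value))

def on_main_event(event):
    now = time.time()
    # expire lookup events that have fallen out of the window
    while lookup_window and now - lookup_window[0][0] > WINDOW_SECONDS:
        lookup_window.popleft()
    # left outer join on attr_id: every main event is emitted,
    # with attr_value set to None when there is no match in the window
    match = next((v for t, k, v in lookup_window
                  if k == event['attr_id']), None)
    return {**event, 'attr_value': match}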
So now we have the two joined streams:
Within an Exploration it is possible to do light transformation work. By right-clicking on a column you can rename or remove it, which I’ve done here for the duplicated attr_id (duplicated because it appears in both streams), as well as renaming attr_value:
Daisy-Chaining, Targets, and Topology

Once an Exploration is Published it can be used as the Source for subsequent Explorations, enabling you to map out a pipeline based on multiple source streams and transformations. Here we're taking the exploration created just above that joined the two streams together, and using the output as the source for a new Exploration:

Since the Exploration is based on a previous one, the same stream data is available, but with the joins and transformations already applied.

From here another transformation could be applied, such as conditionally replacing the value of one column based on that of another.

Whilst OSA enables rapid analysis and transformation of inbound streams, it also lets you stream the transformed results out of OSA to a Target, as we saw in the Kafka example above. As well as Kafka, other technologies are supported as targets, including REST endpoints and simple CSV files.
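In the Kafka case, the net effect of a Target is simply that each transformed event is written to an output topic - something like this minimal kafka-python sketch (topic and server names are illustrative):

import json
from kafka import KafkaProducer

# Minimal sketch of writing one transformed event to an output topic.
producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda e: json.dumps(e).encode('utf-8'),
)
producer.send('osa_output', {'attr_id': 3, 'attr_value': 'give'})
producer.flush()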

With a Target configured, as well as an Exploration based on the output of another, the Topology view comes in handy for visualising the flow of data. You can access this from the Topology icon on an Exploration page, or from the dropdown menu against a given object on the Catalog page.

In the next post I will look at how Oracle Stream Analytics can be used to analyse, enrich, and publish data to and from Kafka. Stay tuned!

Since the release of OBIEE 12c in 2015, we have received countless inquiries about how we would be offering our training. Our customers are familiar with our ability to provide on-site private training for a team and we are well known for hosting training classes in our offices in the UK and the US. But what most people aren’t aware of is that we now offer OBI remote training.

Our public training schedule offers a variety of courses monthly, some of which are offered exclusively as remote classes. And for any one of our public classes that is hosted in our U.S. offices, we also offer a limited number of seats to remote attendees. What does this mean for you? This means you have options!

One of our goals here at Rittman Mead is to provide unhindered access to the great wealth of information our team has accumulated through their extensive real-world experience. Now we've translated this goal into more accessible training. We understand budgets can be tight and travel may not always be an option for you or your team, but we don’t want that to be the reason you can’t attend our training.

In mid-2015 we started testing our ability to deliver remote training. Our main concern as we began testing was whether we’d be able to deliver the same value to our customers in a digital classroom that we've traditionally been able to deliver in a physical classroom. Our fear was that when you lost the face-to-face interaction between the instructor and students, you would also lose some of the rhythm and chemistry of the training, and, consequently, our students would feel less engaged. Other more technical concerns were on our minds, ranging from sound and video quality to connectivity. Much to our surprise and satisfaction, however, our concerns quickly dissolved as, time after time, we were able to deliver the training without issue.

So after plenty of testing, we are pleased to offer remote training as a regular option in our training schedule.

We are aware that remote training (or online training) has been around for some time—we are not claiming to be innovators in the ways of online learning—but we feel that the platform for online learning has finally reached a level that is in line with the quality we demand for our training.

In fact, we have consistently received high marks from customers who have attended our remote training, solidifying our confidence that it does in fact live up to our standards. We invite you to check out our training options. Whether it be on-site training (public or private) or remote training, rest assured that you will be receiving expert-level training from Rittman Mead’s best.

For a full list of our scheduled training courses, see our US or UK calendars.