
Category Archives: Big Data Discovery


Oracle Big Data Discovery (BDD) is a nice visual analytics tool that provides powerful capabilities for turning raw data into business insight within minutes. Knowledge of Hadoop or big data is not required. In this blog, I am going to show how to create a simple BDD project and do some interesting analysis within 5 minutes.

Next, go to the BDD login page. I did a fresh install of BDD on our X3 Full Rack Big Data Appliance (BDA) a few days ago, so I log in at https://enkbda1node05:7004/bdd/web/home. By default, BDD is installed on the 5th node of the BDA and listens on port 7003 for HTTP and port 7004 for HTTPS.

After logging in, click Add Data Set.

Click the Browse button to locate the CSV file just downloaded from the USGS website, then click Next.

You will see the Preview screen. For columns you don't care about (in this case, the net column), just uncheck the column header and the column data will not show up in the analysis. I unchecked the columns net, id, updated, depthError, horizontalError, magError, magNst, type, status and locationSource, then clicked Next. The nice thing about BDD is that it inspects the data and sets the header information, delimiter and other settings automatically.

In the Data Set screen, input the data set name, description, and Hive table name. Then click Create.

After the data is loaded and indexed, the Explore screen shows up. You can see there are 9100 records in the dataset and 18 attributes (or columns) are indexed. Click Add to Project at the top right to create a new project for the data set.

Enter the project name and description, then click Add.

You can add one or more attributes to the scratchpad. For example, I want to add the mag column to the scratchpad.
After it is added, the scratchpad shows more detail about the target attribute.

Select the latitude column for Latitude and the longitude column for Longitude. Give the new attribute the name location, then click Add to Script, then click Commit to Project.

You will see that the new attribute location is added to the data panel and committed to the project. When finished, click Discover at the top.

Drag MapComponent from the right into the main panel. A nice map showing the earthquake locations appears automatically.

I want to see whether anything happened in CA. Click the Search button and enter the information for LA.

A nice view of earthquake activity shows up on the screen.

OK, next I want to filter by the mag column to see only the quakes that are a little bigger. I chose mag between 1.8 and 5.2.

You can see that the majority of the quakes are smaller than 1.8, with only 832 results out of 5070 selected.
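
As a cross-check outside BDD, the same range filter can be approximated on the raw CSV file. This is only a sketch under a few assumptions: that mag is the fifth column (as in the USGS CSV feed), that both bounds are inclusive, and that the file has a single header line. The function name count_mag_between is made up for illustration.

```shell
# Count CSV rows whose magnitude lies in [lo, hi].
# Assumption: mag is column 5 and the first line is a header.
count_mag_between() {
  lo="$1"
  hi="$2"
  awk -F',' -v lo="$lo" -v hi="$hi" '
    # Skip the header and rows with an empty mag field;
    # adding 0 forces a numeric comparison.
    NR > 1 && $5 != "" && $5 + 0 >= lo + 0 && $5 + 0 <= hi + 0 { n++ }
    END { print n + 0 }
  '
}

# Usage (the file name is hypothetical):
#   count_mag_between 1.8 5.2 < all_month.csv
```

Splitting on commas is safe here only because the first five columns contain no quoted commas; a quoted field such as place comes later in the row.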

It seems BDD has an issue recognizing my transformed location column as a geocode column. So I modified the CSV file, added a new column called geocode built from the latitude and longitude values, and created a new dataset. This way, BDD recognizes the new column as geocode-enabled. I added this attribute to the scratchpad.
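
Adding the geocode column to the CSV can be scripted rather than done by hand. The sketch below is one possible way, not necessarily how I edited the file: it assumes latitude and longitude are the second and third columns (as in the USGS CSV feed) and that the geocode value is the latitude and longitude separated by a single space. Both the separator and the function name add_geocode are assumptions.

```shell
# Append a "geocode" column built from latitude and longitude.
# Assumptions: latitude is column 2, longitude is column 3,
# and the geocode value is "<latitude> <longitude>".
add_geocode() {
  awk -F',' '
    NR == 1 { print $0 ",geocode"; next }   # extend the header line
            { print $0 "," $2 " " $3 }      # append the new value to each row
  '
}

# Usage (file names are hypothetical):
#   add_geocode < all_month.csv > all_month_geocode.csv
```

Because the new value is appended to the whole line, quoted commas in later columns such as place do not disturb it.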

Go to the Discover page and zoom in on the thematic map to the US area.

Zoom in further to look at California.

The graph tells me that the central California area (Mono County) has the most earthquake activity with 732 records, followed by two Southern California counties, Riverside County (569) and Imperial County (428). The data covers only 30 days and cannot tell whether this is normal or not; it would need to be compared with a long history. I will leave earthquake forecasting to the experts and will not comment on the earthquakes themselves. Anyway, this blog just demonstrates how easily we can do data discovery within a few minutes using Oracle BDD. It looks like an impressive tool, and I believe it has strong potential in the big data world.

Many applications require disabling the firewall on Linux. The most commonly used commands are as follows:

Stop the ipchains service.
# service ipchains stop
Stop the iptables service.
# service iptables stop
Keep the ipchains service from starting after reboot.
# chkconfig ipchains off
Keep the iptables service from starting after reboot.
# chkconfig iptables off

Another popular step is to set SELINUX=disabled in the /etc/selinux/config file to turn off SELinux's extra security restrictions.
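
The SELinux change can be made with a one-line edit. This is a minimal sketch, assuming GNU sed and root privileges; the function name disable_selinux_in is made up for illustration, and it takes the config path as an argument so it can be tried on a copy first.

```shell
# Set SELINUX=disabled in an SELinux config file.
# Assumption: GNU sed (for -i); defaults to /etc/selinux/config.
disable_selinux_in() {
  cfg="${1:-/etc/selinux/config}"
  # Only the SELINUX= line is rewritten; SELINUXTYPE= is left alone.
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}

# Usage (run as root):
#   disable_selinux_in /etc/selinux/config
# The file is only read at boot, so the change applies after a reboot.
# To check the current mode:           getenforce
# To go permissive until next reboot:  setenforce 0
```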

The above usually works fine for me when turning off the firewall. Recently I ran into a situation that made me add an extra check for firewall issues. A consultant tried to install Oracle Big Data Discovery on a Red Hat Linux VM and connect it to an Oracle Big Data Appliance (BDA) X6-2 Starter Rack. He used an approach similar to the above to turn off the firewall and Linux security between this Red Hat VM and the BDA, but still ran into a weird issue: when the BDD application on the BDA nodes sent a request to a web service on this Red Hat VM, the result never came back.

I tried ping and ssh. Both worked, which shows there is connectivity between the two. It still looked like a firewall issue, so I checked with the network infrastructure team. They have firewall rules between the two, but the rules are not enabled yet.
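
Working ping and ssh only prove that ICMP and port 22 get through; they say nothing about the web service port itself. A quick port-level check can narrow this down. The sketch below assumes bash and the coreutils timeout command are available; the function name port_open is made up, and the host and port in the usage line are placeholders.

```shell
# Return success if a TCP connection to host:port can be opened
# within the given number of seconds (default 3).
# Assumption: bash (for /dev/tcp) and coreutils timeout are installed.
port_open() {
  host="$1"
  port="$2"
  secs="${3:-3}"
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage (placeholders, not the real host or port):
#   port_open redhat-vm.example.com 8080 && echo open || echo "blocked or closed"
```

A connection that is refused fails immediately, while one that is silently dropped by a firewall typically runs into the timeout, which matches a request that never comes back.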

I noticed the OS is Red Hat Linux 7.1. Could there be some new firewall feature in 7.1? After some investigation, yes, there is. On Red Hat 7, the firewall runs as the firewalld daemon. So let me find out what it does.
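
A first look at firewalld might go like this. It is a minimal sketch, assuming the standard firewalld and systemd tools that ship with Red Hat 7; the function name firewalld_state is hypothetical, and the stop and disable commands are left as comments so nothing changes by accident.

```shell
# Report whether firewalld is running, if its client tool is installed.
firewalld_state() {
  if command -v firewall-cmd >/dev/null 2>&1; then
    # Reports the daemon state; a non-zero exit status means it is not running.
    firewall-cmd --state 2>&1 || true
  else
    echo "firewall-cmd not installed"
  fi
}

firewalld_state

# To stop firewalld now and keep it off after reboot (run as root):
#   systemctl stop firewalld
#   systemctl disable firewalld
# To see the service status in more detail:
#   systemctl status firewalld
```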