Category Archive

We’re only one month into 2018 and the industry is already bursting with dazzling initiatives like the chatbots mentioned in our previous article. But not all the brilliant ideas come from Apple or Google, and today we want to shine the spotlight on a local community event that has caught our attention: next week, four of the most prominent communities in …

Continuing our previous post NoSQL vs Relational: Which database to use, we will explain the different types of NoSQL databases according to their target usage. Why should the target usage make a difference? Because when data is saved, not all of it holds the same structure. For example, it’s not the same if we work with …
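As a quick sketch of why structure matters, the same record can be modeled as a nested document (a natural fit for document stores) or flattened into opaque keys (a fit for key-value stores). The record and key scheme below are purely illustrative, not tied to any particular database:

```python
# Hypothetical "user" record shaped two ways.

# Document model: nested structure stored as a single unit,
# the natural shape for a document store.
user_document = {
    "id": "u42",
    "name": "Ana",
    "addresses": [
        {"city": "Madrid", "zip": "28001"},
        {"city": "Valencia", "zip": "46002"},
    ],
}

# Key-value model: the same data flattened into plain keys,
# the natural shape for a key-value store.
user_kv = {
    "user:u42:id": "u42",
    "user:u42:name": "Ana",
    "user:u42:addresses:0:city": "Madrid",
    "user:u42:addresses:0:zip": "28001",
    "user:u42:addresses:1:city": "Valencia",
    "user:u42:addresses:1:zip": "46002",
}

def flatten(prefix, doc):
    """Recursively flatten a nested document into key-value pairs."""
    items = {}
    if isinstance(doc, dict):
        for key, value in doc.items():
            items.update(flatten(f"{prefix}:{key}", value))
    elif isinstance(doc, list):
        for index, value in enumerate(doc):
            items.update(flatten(f"{prefix}:{index}", value))
    else:
        items[prefix] = doc
    return items
```

Flattening the document form yields exactly the key-value form, which is why the choice of database family usually follows from the shape of the data rather than the other way around.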

People in the Big Data field are used to a changing landscape. The year 2017 was no exception, bringing many changes to the way we process, understand and interact with data. These are some of the highlights of what happened: Data streaming If you’ve worked with any relational database system (DB2, Oracle, Postgres) or even with traditional Hadoop/Hive, …

In this second part of the Kerberos series, we’re going to review how to configure our system to get a properly secured environment. The post provides some important configuration tips, but it isn’t a typical command-by-command guide. Also, if you are starting with the basics, in our previous post Kerberos & Hadoop: Securing Big Data (part I) we …

As announced in a previous post, we’re now going to introduce you to Apache NiFi, the latest trend in ingestion tools: a project from the Apache Software Foundation that lets you manage data flows through a cool graphical interface. If we haven’t caught your attention yet, wait until you hear this: the NSA created it! Nifi – the UPS of data …

When I began to use Hadoop with Kerberos, I felt as if I were lost in the middle of the ocean. I found a lot of information about Kerberos itself, but it was very difficult to find anything about how to use it with Hadoop, why to use it, and how to configure it to work with Hadoop. This trilogy of posts is going to …

Today we are going to talk about Morphlines, an open-source framework developed by Cloudera that provides a new way to do ETL on Hadoop. What are these morphlines? Morphlines are simple configuration files that define how to transform data on the fly. Each morphline is a file describing the steps a data flow has to pass through in order to …
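To give a flavour of what such a file looks like, here is a minimal hypothetical morphline in HOCON syntax that reads raw lines and parses them with a grok pattern. The id, fields and pattern are made up for illustration:

```
morphlines : [
  {
    id : exampleMorphline
    importCommands : ["org.kitesdk.**"]
    commands : [
      # Read the raw input stream line by line into the "message" field
      { readLine { charset : UTF-8 } }
      # Parse each line with a grok expression (hypothetical log format)
      {
        grok {
          dictionaryFiles : [grok-dictionaries]
          expressions : {
            message : """%{IP:client_ip} %{WORD:method} %{URIPATH:path}"""
          }
        }
      }
      # Log the transformed record for debugging
      { logDebug { format : "record: {}", args : ["@{}"] } }
    ]
  }
]
```

Each command consumes the record emitted by the previous one, which is what makes the "steps a data flow passes through" picture so literal.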

Information collection has changed a lot in recent years. Everybody wants to save more data and let users consume that information in real time, in an easy way. This means that performance, scalability and availability are three key factors for database implementations. For this reason, NoSQL databases have made their appearance. What’s a NoSQL database? A NoSQL database (“non SQL”, …

To close this series of articles on Ingestion & Searching, we are going to look at the Flume architecture for high availability and review some benchmark tests. Flume Architecture To achieve high availability, we have two Flume characteristics to play with: 1. File Channel vs Memory Channel This is a trade-off between guaranteed (100%) delivery and fast ingestion. With the file channel the …
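As an illustrative sketch of that trade-off (the agent and channel names below are made up), the choice between the two channel types comes down to a few lines in the Flume agent's properties file:

```
# Durable file channel: events survive an agent crash (100% delivery, slower)
agent.channels.reliable-ch.type = file
agent.channels.reliable-ch.checkpointDir = /var/lib/flume/checkpoint
agent.channels.reliable-ch.dataDirs = /var/lib/flume/data

# In-memory channel: faster ingestion, but queued events are lost if the agent dies
agent.channels.fast-ch.type = memory
agent.channels.fast-ch.capacity = 10000
agent.channels.fast-ch.transactionCapacity = 1000
```

With the file channel, events are written to disk and checkpointed before being acknowledged, which is what buys the delivery guarantee at the cost of throughput.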