Blog about DWH, BI, Big Data and other tech

Excel is still commonly used to store static data, which often needs to be analyzed as part of Power BI reports. There are some guidelines on how to connect Excel to the Power BI web app. But what if we need to have it in Power BI Desktop?

Also, Power BI is a great tool that allows users to mix different data sources in a single report. So, our goal is to configure refresh for a workbook created in Power BI Desktop with mixed data sources: an Excel file stored in OneDrive and a traditional RDBMS source.

How to capture output from Hive queries in Oozie is an essential question if you’re going to implement any ETL-like solution using Hive. The most commonly used approach is a shell action; however, it requires the Hive CLI to be installed on each node, and it doesn’t work for remote clusters. Here I want to share a more generic approach using a custom Java action. Continue reading →

In the previous post I described Flume installation and configuration. I will use the same EC2 node in this article, but everything I discuss here will work for any other Apache Flume installation. Continue reading →