What do you like best?

I especially like how easy it is to trigger commands, not only from the Qubole console but also using the REST API to trigger actions from our custom scripts. Also, being able to switch between technologies such as Hive, Presto and Spark with a simple dropdown is great. Finally, I like the capability of switching between environments to sandbox changes.

What do you dislike?

Notebooks are sometimes unstable and we get weird exceptions that are solved by waiting for a little while and retrying. Also, the displayed tables are not automatically refreshed.

Qubole lacks some good videos to explain how to configure the interpreters and add custom artifacts; I had a hard time finding that out on external forums

Recommendations to others considering the product

More documentation on the REST APIs would make them easier to use.

What business problems are you solving with the product? What benefits have you realized?

Running ad hoc queries on our data lakes and increasing or decreasing the size of our cluster on demand. We also connect our scheduler to this platform to submit jobs on a regular basis. Whenever we want to take a look at something in particular, we review the logs and find the job ID to dig into the details. The most important benefit we've experienced is lower cloud provider costs, since Qubole expands and shrinks the clusters as needed, and it is extremely easy to use.

What do you like best?

Qubole provides an easily accessible history of all queries with results. I can share a Query ID, along with the query itself and its results, with someone else on my team or across the company. This is really the killer feature. Also, clusters are started, scaled and shut down on demand. Qubole support is also excellent. They help to resolve all problems, simple and complex ones. Qubole has also developed a lot of features, like Hive-JDBC-Storage-Handler for accessing other JDBC data sources from Hive, and many others.

What do you dislike?

It is impossible to run Hive from a shell script. It is impossible to execute a simple lightweight shell command without starting Map-Reduce. When using the GUI, I need to log in again every few minutes, which is annoying. I'd prefer my sessions not to expire during my working day. Cost views are heavy, and it seems a query incurs cost even when it is stuck waiting for resources and not running at all.

Recommendations to others considering the product

You can access a lot of tools for data processing, such as Presto, Hive, RDBMS, etc., using the same API and GUI. Qubole also supports Airflow clusters.

What business problems are you solving with the product? What benefits have you realized?

Running a Data Warehouse with different tools through the same GUI and API. Development is easy because of the rich unified API.

What do you like best?

Qubole simplifies the operational side of running Spark clusters and jobs, both scheduled and ad-hoc. It allows researchers and data engineers to focus on business logic instead of platform maintenance and tuning.

What do you dislike?

Once you have a hammer, every problem starts to look like a nail. Sometimes the overhead of running a Spark job dominates the actual data processing work to be done. See the blog post "Don't use Hadoop when your data isn't that big" and think about whether you might achieve faster turnaround without reaching for Spark. Initial exploratory work, especially, can go faster if you just run everything from a fast laptop without any distributed processing.

Recommendations to others considering the product

It's a solid choice if you want some of the most popular Big Data tools and don't want to spend time maintaining them yourself.

What business problems are you solving with the product? What benefits have you realized?

We're performing content and user classification, content NLP, and user activity analytics with Qubole. We've been able to standardize around Spark and Qubole for batch jobs so that there's a common reference framework used by everyone on the team. We also don't need to devote time to maintaining the cluster and can focus on business logic.

What do you like best?

Easy to set up, use, and maintain. A complete end-to-end Big Data (Lake) platform that takes less than an hour to set up. A no-brainer for any company that wants a platform for experienced users, or even for beginners to start learning and advancing their knowledge. Proven at my previous company: within 6 months, we had more than 10 Data Engineers with great technical expertise in Spark, Hadoop/Hive, and Airflow.

What do you dislike?

Initially, the inconsistency of product releases, which could cause unexpected errors because of bugs. Over time this has improved significantly.

Recommendations to others considering the product

A no-brainer for companies that are looking for a cloud-native Big Data platform. Simple to set up, with a great framework and environment for new users to learn Hadoop, Spark, Airflow, and big data in general (including Python, Data Science, and ML). Cost optimization, fantastic autoscaling capabilities, and fantastic Presto performance for analytics.

What do you like best?

I do not have to worry about any installation or setup process. I just select the config I need and I can start working on Spark or Presto. There are some configurations that must be done (connection to S3, keys, IAM roles and so on); however, they pale in comparison to running an installation from scratch.

Scaling out or scaling up is also made simple by just choosing a few things from the cluster config options.

Built-in solutions like Scheduler help with scheduling and automating a few jobs, and when things get too large, you can use Airflow on Qubole.

What do you dislike?

Sometimes there are things that work on Zeppelin or the underlying technology in general but fail because of issues with Qubole's value-added product (one example is the connection between Spark and Redshift). When these things are raised with the support team, they get addressed, but the resolution time varies.

Recommendations to others considering the product

It is very convenient, but it has some issues. The support team is quite active and they address major problems immediately (if a problem is not fixed, a root cause analysis is provided along with an estimate for the fix).

What do you like best?

With Qubole, our non-engineering colleagues are no longer required to talk to engineers in order to get immediate insights out of our data lake. This saves time for both sides, such that everyone can focus on what they do best.

What do you dislike?

I think Qubole's UI could be a little friendlier when it comes to displaying query execution progress. It's easy for me to parse as an engineer, but some non-technical colleagues find it difficult to determine if a query is on track to complete or not.

Recommendations to others considering the product

Be sure to have clear guidance regarding creation of tables. It's too easy to end up with a big pile of tables that lack consistent ownership. This leads to confusion on the part of users, as well as operators who cannot tell which tables are safe to delete, or when maintenance can be performed on underlying data. In addition, keeping this information in a centralized location provides more opportunities for tighter access control at the underlying store level.

What business problems are you solving with the product? What benefits have you realized?

The most important one is the ability of product managers to quickly gain insights into patterns hidden in the data in order to deliver a more impactful product to our customers. This results in less rework by engineers later on. Everybody wins!

What do you like best?

I like the simplicity of the Analyze interface as well as the built-in scheduling capabilities. The functionality to search through old queries, both saved and unsaved, is powerful and intuitive. Additionally, I like the ability to easily manage multiple clusters for managing workloads of varying size. Moreover, the available cluster UI makes it simpler to investigate failed queries.

What do you dislike?

I'd like to be able to set default values for schema so that I don't need to specify schema every time if I am predominantly referencing just one schema. I also find the log process updates annoying. While it is helpful to see specific log details, I get annoyed scrolling through a list of % completes. I feel that a status bar would better use the screen real estate and make it a bit simpler to read through the logs.

What business problems are you solving with the product? What benefits have you realized?

We are trying to make large, log-level data sources available to cross-functional teams to run both automated and ad hoc analytics to meet their varied needs.

What do you like best?

Autoscaling and the way Qubole uses AWS spot instances minimize compute cost for customers. Our team has experienced close to a 30% reduction in cost compared to our previous cluster without Qubole.

The amount of time it takes to onboard a new process onto Qubole is very short, and we can have many clusters, each designed for its own purpose.

Qubole also supports notebooks backed by Spark clusters, which is very handy for developers who want to iterate quickly on ideas.

Qubole support is really good. They respond to tickets in a timely manner, and we always have someone helping us out via Slack or email on time-critical projects.

What do you dislike?

Qubole sometimes has a lot of bugs, and a lot of its features are not well documented. You have to engage support to get those things figured out.

Personally, I'm not very impressed with the UI of the Qubole query editor; most of the time it doesn't show results in a well-formatted manner.

Recommendations to others considering the product

You will save money if you are currently using a public cloud like AWS.

What business problems are you solving with the product? What benefits have you realized?

The ability to query big data sets for ad hoc analysis, generate reports, and process data to generate ML models. We use Presto clusters for near-real-time queries, which mostly support our backend; the performance of these clusters is really amazing.

What do you like best?

I really like that tables are easily accessible and that I can quickly switch between different tables and see their content. I can also see what must be included in a query.

What do you dislike?

I don't like that I can open a table's content only once while staying on the same page. Sometimes I close the table content and then need to open it again, but I can't do that without refreshing the whole page.

Recommendations to others considering the product

It would be amazing if this product were easier to use and had better error descriptions.

What business problems are you solving with the product? What benefits have you realized?

We are using Qubole to access the huge set of data we operate on, since we can't put everything in our UI.

What do you like best?

I like the features within the Qubole program, such as templates and the scheduler. I use the templates regularly, which makes it very easy when I need to run specific queries on a regular basis.

What do you dislike?

I don't know if this is a bit optimistic or whether it is even possible, but maybe there could be more help with troubleshooting queries. Also, query run times can vary.

What business problems are you solving with the product? What benefits have you realized?

Working in analysis, I have been able to use Qubole to analyse our client's website more granularly. For example, we have been able to segment their website using Qubole queries to create our custom pixels and analyse these segments specifically. This can provide the more in-depth analysis that the client is always looking for.

What do you like best?

The interface is amazing. Running queries and handling data looks really easy. At the same time, there is no visualization tool that can help us make dashboards for the queries. The tool looks complete, since we can schedule queries and also work on large queries smoothly. The scheduler and cron jobs are the features my team and I use the most.

What do you dislike?

There is no information available about the resource limits on the system for a particular query. Often a query takes hours to run and fails in the end because it hits a limit on the amount of resources. There is no visualization for the data. Data is visible only in tabular format, and no dashboard is available to track data from queries.

Recommendations to others considering the product

It is one of the easiest places for analysts to run and manage queries. Despite not knowing much about the scheduler and cron jobs or how they work in the background, we are able to run large-scale queries to produce the best possible insights.

One thing that bothers us is after fetching data from Qubole, we have to upload it to some dashboard management tools.

Out of 10, I would grade it 9 with respect to ease of use.

It can improve on the dashboard front, I believe.

What business problems are you solving with the product? What benefits have you realized?

Analyzing large-scale data and deriving meaningful insights from it. We run, schedule and manage queries on it, and use it to run hundreds of queries per day.

What do you like best?

I really like the ease of use of the platform. Several SQL languages are available within the platform, but I typically find the Presto SQL is sufficient to meet most of my needs and is easy to use for anyone with any prior SQL knowledge.

What do you dislike?

There are some shortcuts that other database management software companies offer that are not available in Qubole. For instance, you cannot set a default schema, so the schema must be explicitly specified for every table in a query. Additionally, when using the 'Explore' tab to manually review data in a table, you cannot set a limit on the number of rows you'd like to see. Simple dimensional tables that have over ~50 rows won't display entirely because the software by default returns only 50 rows or so.

What business problems are you solving with the product? What benefits have you realized?

We are using Qubole to manage data from various external sources in a data lake. My role then frequently queries the data lake to develop standard reporting and analytics for internal teams.

What do you like best?

It's much more complete than other products. At the same time, queries or jobs sometimes fail because of bugs on Qubole's end, which are addressed very late or take a long time to fix.

What do you dislike?

We often need to talk to the executive while handling issues. Many times the answer is that this will be fixed in the next release, but that is far in the future.

Recommendations to others considering the product

Customers often need a feature added or a problem fixed as soon as possible. It can't wait for the next release, especially when it affects the production environment. It would be great if such requests were handled quickly and without hassle.

What business problems are you solving with the product? What benefits have you realized?

Our Hadoop cluster runs on Qubole. It is easy to manage resources: we just need to increase or decrease the node count. There is no option to remove a specific node, but it's still very easy to manage and feels complete, as it includes a scheduler, cron jobs, etc.

What do you like best?

Notebooks are great for manipulating S3 data in all kinds of ways. I develop Spark Scala code, test it, then create a scheduler job to run it. I can query files for specific rows, or for counts of specific data, in Parquet or Avro.

What do you dislike?

Notebooks are not hooked up to git because the security setup is not simple and our company has not set it up.

What do you like best?

Qubole has a very intuitive user interface for data scientists to run queries as well as Spark jobs. It also provides built-in Notebooks that help data scientists experiment and obtain results at a fast pace without worrying about the infrastructure setup.

What do you like best?

What do you dislike?

Recommendations to others considering the product

Reach out to support often. The team is responsive and hands-on!

What business problems are you solving with the product? What benefits have you realized?

We are a team with mixed technical experience. We use Qubole to query data across various clusters. Qubole is user friendly and allows the analysts to run queries in languages like Hive and track progress and failures which may be reported to engineering teams.

What do you like best?

The fact that it can autoscale based on needs is very important when it comes to running extremely large data processing jobs and reducing cost when capacity is not needed. Most of the latest big data tools are available in one place to pick and choose from.

What do you dislike?

Would be nice to have some scheduling mechanism to run automated jobs

What do you like best?

Tracking of jobs, history, etc.; selection of clusters and environments.

What do you dislike?

The user interface has many issues. For example, you cannot highlight, copy and paste text from the navigator on the left. Table lists and S3 lists sometimes stop expanding and the page has to be reloaded. Attempting to rename a task is unpredictable: sometimes it works, mostly it does not.

Recommendations to others considering the product

Qubole is excellent for simplified access to various data sources. There's really no other option out there that I would recommend. Notebooks is a must-have for speed of execution; if you can wait a few minutes, stick with Analyze.

What do you like best?

Qubole is a simple entry into big data analytics.

What do you dislike?

I don't like the speed of updates to Airflow clusters.

Recommendations to others considering the product

What about serverless? Qubole currently requires someone to manage and configure the different clusters. It seems that most cloud service providers are moving to a serverless approach (e.g. BigQuery, AWS Glue).

* We monitor all Qubole reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. Validated reviews require the user to submit a screenshot of the product containing their user ID, in order to verify a user is an actual user of the product.