Solr Bad Health when using Sentry in Quickstart VM

I have been experimenting with CDH 5.8 in the Quickstart VM as part of some initial evaluation work. I wanted to explore the Sentry-related features in the VM, so I set up Kerberos locally in the VM and added the Sentry service to the VM cluster. I skipped the recommended TLS encryption setup since I don't require hardened security configurations for my functional testing. After enabling the Sentry service for Solr, I immediately began seeing problems with Solr's health tests.

I was able to complete my functional review of the Sentry-based security features in CDH by simply skipping Solr's security features. I then wanted to play with HDFS Transparent Encryption, so I enabled TLS Level 3 encryption. After I completed my HDFS encryption evaluation, I went back to the Solr issue to see whether it was somehow related to TLS encryption not being enabled. When I enabled the Sentry service for Solr again, it still had bad health.

I'm now curious to debug the Solr problem further, so I'm starting this thread.

Here's a summary of the current behaviour with TLS level 3, Local Kerberos, and Sentry service installed:

When I set the "Sentry Service" config value for Solr to "none" and restart the service, Solr starts successfully and remains in good health.

If I change that setting to "Sentry" for Solr and restart the service, Solr starts up initially in good health, but once all the health tests become enabled, it immediately goes Red for Bad Health, with the "Solr Server API Liveness" and "Web Server Status" tests failing. The Solr parent process is still running and in good health, but the Solr server itself has died. When I look at /var/log/solr/solr-cmf-solr-SOLR_SERVER-quickstart.cloudera.log.out, there are some log entries that are not observed in the non-Sentry behaviour above. The suspicious log fragments follow below:

At first I thought this might be a red herring, but since the stack trace originates in SecureCollectionsHandler, I now think it could be the root cause after all.
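For anyone wanting to pull the same kind of fragments out of their own log, here is a minimal Python sketch. It only does a plain substring search for the two names mentioned in this post (LISTSNAPSHOTS and SecureCollectionsHandler) and prints each hit with a couple of preceding lines for context. The function name `find_matches` and the amount of context are my own choices for illustration, not anything Cloudera or Solr provides, and the log path is the one from my VM, so adjust it for yours.

```python
import os
from collections import deque

# Log path as it appears on my Quickstart VM; adjust for your host.
LOG_PATH = "/var/log/solr/solr-cmf-solr-SOLR_SERVER-quickstart.cloudera.log.out"

# Strings taken from this post; add others you consider suspicious.
NEEDLES = ("LISTSNAPSHOTS", "SecureCollectionsHandler")


def find_matches(lines, needles=NEEDLES, before=2):
    """Yield (line_number, context, text) for each line containing a needle.

    `context` holds up to `before` lines that preceded the match.
    `find_matches` is a hypothetical helper written for this sketch.
    """
    recent = deque(maxlen=before)
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if any(needle in text for needle in needles):
            yield number, list(recent), text
        recent.append(text)


def main():
    if not os.path.exists(LOG_PATH):
        print("log file not found: " + LOG_PATH)
        return
    # errors="replace" keeps the scan going if the log has odd bytes.
    with open(LOG_PATH, errors="replace") as handle:
        for number, context, text in find_matches(handle):
            for earlier in context:
                print("    " + earlier)
            print("{:>6}: {}".format(number, text))
            print()


if __name__ == "__main__":
    main()
```

Because the file is streamed line by line, this should be fine even on a large log. It does not follow multi-line stack traces past the matching line, so treat the output as a pointer to where to look rather than the full trace.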

Steps to rule out obvious causes:

1) I thought it might be related to some indices I had created as part of working through the tutorial. To rule this out, I disabled Sentry and started up Solr, then removed all existing indices and dependent dashboards via the Hue UI. After enabling Sentry for Solr and restarting, the behaviour was unchanged.

2) I searched around for anything to do with the specific error or snapshots in general. The only thing that I found was that there is an HDFS snapshot feature. As a long shot, I thought it might be a bug where Solr was never tested with Sentry enabled WITHOUT there being HDFS snapshots present. I enabled HDFS snapshots and took a few while Solr was running with Sentry disabled. I then re-enabled Sentry for Solr and restarted, only to find the same behaviour with the LISTSNAPSHOTS error.

3) Memory issues. I've had problems when turning on TLS in the Quickstart VM where there was strange behaviour in Cloudera Manager, with service roles dying without obvious cause. I ultimately tracked those down to heap sizes configured well below the minimum recommendations for manager service roles. In those instances the behaviour was fixed by increasing the memory of the affected services to the minimum recommendation. I decided to try this with Solr even though it was already allocated the minimum recommended levels, so I doubled the heap and total memory allocations for Solr and cold-restarted the service. The bad health behaviour and the LISTSNAPSHOTS error continued.

Any ideas?!

Finally, my curiosity got the best of me, so I decided to post here to see if anyone has ideas on how to resolve this, or at least debug it further.

I would like to be able to explore Solr from a security point of view in the QuickStart VM if possible!

Re: Solr Bad Health when using Sentry in Quickstart VM

I never got any help from Cloudera on this, nor from the community here. I gave up trying to evaluate secure Solr in the QuickStart VM.

The only update I have along these lines is that when I installed CDH 5.9 on a single-node PoC cluster using Installation Path A and then secured it with Kerberos provided by Windows AD, this error did not occur. Even so, it's still unknown whether the problem we are experiencing is related to: A) being in the QuickStart VM vs. a from-scratch installation, B) being on 5.8 vs. 5.9, or C) being secured with standalone MIT Kerberos vs. AD-provided Kerberos.
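To make the ambiguity concrete: the two setups I have actually observed differ in all three factors at once, so neither one isolates a cause. A rough Python sketch of the bookkeeping is below. The factor labels and the helper names `single_factor_changes` and `untested` are just mine for illustration, and I am assuming each factor is a simple either/or, which ignores anything else that differed between the two environments.

```python
from itertools import product

# The three confounded factors from this thread, each treated as either/or.
FACTORS = {
    "install": ("QuickStart VM", "from-scratch (Path A)"),
    "cdh": ("5.8", "5.9"),
    "kerberos": ("MIT", "AD"),
}

# The only two combinations actually observed so far.
FAILING = ("QuickStart VM", "5.8", "MIT")
WORKING = ("from-scratch (Path A)", "5.9", "AD")


def single_factor_changes(base, target):
    """Return configs that differ from `base` in exactly one factor,
    taking the changed value from `target` (hypothetical helper)."""
    variants = []
    for index in range(len(base)):
        if base[index] != target[index]:
            variant = list(base)
            variant[index] = target[index]
            variants.append(tuple(variant))
    return variants


def untested(factors, observed):
    """Return every combination of factor values not in `observed`."""
    return [combo for combo in product(*factors.values()) if combo not in observed]


if __name__ == "__main__":
    print("Not yet tried: {} of 8 combinations".format(len(untested(FACTORS, {FAILING, WORKING}))))
    print("Changing one factor at a time from the failing setup:")
    for combo in single_factor_changes(FAILING, WORKING):
        print("   ", dict(zip(FACTORS, combo)))
```

Testing the three one-factor variations of the failing setup would be the cheapest way to tell A, B and C apart, assuming each of those combinations can actually be built.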