Issue 1: Disk full
Log in to the vCenter Server Appliance through SSH and enable the Bash shell with the following commands:

shell.set --enabled true
shell

Now that you are in the shell, run df and you will find the disk is full on a couple of the /var/log partitions. After a few du -sh * commands I identified the culprit: a 3.6GB audit.log file under /var/log/audit.
After deleting that and a few other large log files, I thought I had resolved the issue.
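For reference, that kind of hunt boils down to a couple of commands (the paths are from my appliance; yours may differ):

```shell
# see which partitions are out of space
df -h

# then walk down the tree to find the biggest offenders
# (the 3.6GB audit.log lived under /var/log/audit in my case)
du -sh /var/log/* 2>/dev/null | sort -h | tail -5
```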

Issue 2: root user password
On reboot the system was still failing. Checking the audit.log file again, I realized there was some authentication issue; based on this VMware KB it looked like I needed to change the password for the root user.

After some more mucking around and looking into /var/log/vmware/eam/eam.log, it complained about empty fields, and when I looked at /etc/vmware-eam/eam.properties it was indeed empty.
Thanks to some kind soul on the VMware forums I found a template file and punched in my details.
Unfortunately there was an error on line 34: the property should not start with eameam, just eam.

If you are behind a proxy and want to proxy the Docker registry, or have multiple machines pulling the same images over and over (CI/CD/ML/DL, etc.) and just want to cache them locally, the following is a good choice.

Create a folder named docker-registry-local-cache and create a docker-compose.yml file as follows, customizing it with your environment variables.
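A minimal sketch of such a compose file, using the stock registry:2 image as a pull-through cache (the port, data path, and commented proxy values are assumptions to adjust for your environment):

```yaml
version: "2"
services:
  registry:
    image: registry:2
    restart: always
    ports:
      - "5000:5000"
    environment:
      # turn the registry into a pull-through cache of Docker Hub
      REGISTRY_PROXY_REMOTEURL: https://registry-1.docker.io
      # if you are behind a corporate proxy, set these too (example values)
      # http_proxy: http://proxy.example.com:3128
      # https_proxy: http://proxy.example.com:3128
    volumes:
      - ./registry-data:/var/lib/registry
```

Bring it up with docker-compose up -d from inside the folder.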

Run curl http://10.0.0.7:5000/v2/_catalog
and it should output something similar to this: {"repositories":[]}

Next, configure your Docker client to use this mirror. See this previous post on how to do that.
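For reference, on Linux the client-side change boils down to pointing registry-mirrors at the cache in /etc/docker/daemon.json and restarting the Docker daemon (10.0.0.7:5000 is my registry host; use your own):

```json
{
  "registry-mirrors": ["http://10.0.0.7:5000"],
  "insecure-registries": ["10.0.0.7:5000"]
}
```

Since the mirror here is plain HTTP, it also has to be listed under insecure-registries.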

Once the client side is configured, you can pull an image from Docker Hub via your local mirror. For example, run docker pull ubuntu:17.10
like you normally would. Then run curl http://10.0.0.7:5000/v2/_catalog
again to see the following: {"repositories":["library/ubuntu"]}

This should significantly improve the speed of any subsequent pulls from the local clients. Hope someone finds this useful.

Since it has been a while, I decided to upgrade my ML box to CUDA 9.0. Man, that was fun: lots of googling, multiple visits to the Ubuntu and NVIDIA forums, and reading up on several blog posts and Stack Overflow articles. Almost at the end of a long day, I am running CUDA 9.0, cuDNN 7, and GPU-enabled TensorFlow 1.5 with Keras 2.1.x models.

The short version: almost 80% of the problems were from lingering packages and changes made to the machine during the last install. So the key is to make sure you roll back and remove the old packages cleanly before proceeding. The final step is actually very simple. Good job, NVIDIA!
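As a sketch of that cleanup on Ubuntu (the package-name patterns are assumptions from my machine; the purge lines are commented out because they are destructive, so double-check the list first):

```shell
# see what lingers from the previous CUDA/driver install
dpkg -l | grep -Ei 'nvidia|cuda' || true

# if anything shows up, purge it before starting over:
#   sudo apt-get --purge remove "cuda*" "nvidia*"
#   sudo apt-get autoremove
```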

This option seems to have been significantly improved: it automatically installed the correct NVIDIA driver (390.30) via the cuda-drivers package and the BLAS package (cuda-cublas-9-0) without any mucking around from the user. It does take a while, though.

Once it is complete, go ahead and reboot the machine; once it is back up, you should have the nvidia module loaded.
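A quick way to check (on a box with the driver actually installed):

```shell
# confirm the kernel module is loaded
lsmod | grep nvidia

# and that the driver can talk to the card (should report driver 390.30)
nvidia-smi
```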

If you have come this far, installing Theano or TensorFlow is pretty trivial these days thanks to the Anaconda Python distribution. In my case I use the Miniconda installer and then install the required packages and dependencies.
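As a sketch of that last step (the environment name and the exact version pins are my choices; TensorFlow 1.5 is the release built against CUDA 9.0 and cuDNN 7):

```shell
# assumes Miniconda is already installed and on the PATH
conda create -n ml python=3.6 -y
source activate ml
pip install "tensorflow-gpu==1.5.*" "keras==2.1.*"

# quick sanity check that TensorFlow sees the GPU
python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
```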

I finally upgraded from my previous GTX 980 Ti to a GTX 1070 last week. Unfortunately, that meant revisiting some of my previous issues with Ubuntu and various incompatibilities among the graphics drivers and CUDA components. In any case, I decided that this time I will document some of this stuff more cleanly so I can refer to it later.