Our lovechild, aptly named LightlyCloudy, was as you know spawned by Quick and Dirty, and YOU are hereby invited to the baby shower. As the proud godfather I'm back as promised; I have been doing some node.js programming and completed the first production release of LightlyCloudy. This is what users are met with after they log in, quite soothing on the eyes.

I have been heavily criticized for the fact that it looks too much like the IBM or BSOD color scheme. I have carefully listened to the criticism, and subsequently chosen to ignore it completely. I have shown the source code to some colleagues and asked them to help me sexify the GUI. One guy willingly said "no problem, I will do that before breakfast, what frameworks do you use?" and I replied "none...." and then he said "...oh, so it's really old skool", and then I didn't hear from him again. Typical.

To summarize, the lessons learned during this process were:

Asynchronous programming, and in particular asynchronous debugging, is quite interesting; just like this blog entry suggests, it's like learning to program all over again.

The SmartOS commands vmadm, imgadm and nictagadm do not strictly adhere to standard UNIX conventions regarding stdout, stderr and exit codes, which makes debugging a hassle when combined with the asynchronous nature of node.js.

Sometimes it's just easier and faster to make a shell script than to mess around with asynchronous node.js code. For this reason we made the nictagvms script, which resides in the /usr/local/bin directory in the lightlycloudy zone but is actually executed from the global zone. This way all code can be contained within the lightlycloudy zone.

Dependencies are still not good. I started off using the SSH2 framework, but due to a buffer error in the framework I decided to drop it and use the built-in function for executing shell commands. So there are no frameworks whatsoever, it's pure uncut node.js straight from the source.

Speaking of source, you can download it from GitHub. There is a README file included which contains the very simple installation procedure.

Now we just need a project mascot. I was thinking something soft, cuddly and nice smelling like Ashley; click on the picture to see the full picture.

So now I'm working on version 2 of the application. I will try to incorporate support for multiple hypervisors and prerequisite checks for creating and updating virtual machines.

I'm still looking for contributors, so don't be reluctant, it's my party and YOU are invited.

Yes, yes, FiFo is still down and customers are crying their eyes out. I have listed the various alternatives to FiFo in my last blog, so now, which one of these options would satisfy all our needs?

None.... so my solution is: write a new GUI ourselves, I mean how difficult can it be? My argument is that since we only need basic functionality to start with, all we have to do is write some kind of webified wrapper for imgadm and vmadm, which seems easy peasy.

As soon as I had finished my thought, the young office punk started to argue that it was a complete waste of time. It is much better to contribute to the existing FiFo project in unison with all the other contributors and reap all the inherent benefits of a real open source project, he argued. But I was not entirely convinced; the FiFo project has already grown out of proportion, and I was confident that no one else would subscribe to my minimalistic approach.

In the above picture the resident office punk is in the process of convincing me (it's me on the floor) about all the happy joys and benefits of Open Source and that sharing is caring. After being overwhelmed by loving arguments, I agreed to publish the source code on GitHub if I ever got around to programming the darn thing.

In an effort to keep things simple and not fall into the same pitfalls as FiFo, it is necessary to keep ambitions to an absolute minimum, or even better, not have any ambitions at all. In any case the ambitions, or lack of same, amount to:

Main features

Simple Graphical User Interface written in node.js running in a single SmartOS zone. Management is done via SSH into the Global Zone. All machine and user information is stored in JSON files. User sessions are tracked with a single cookie.

User Access Control

Logon

Logoff

Virtual Machine administration

List

Create

Delete

Start/Stop/Reboot

Update

Snapshot create/delete/rollback

Get VNC info

Automatic IP delegation

As you can probably see from the above, it is very lightweight; hence the name LightlyCloudy for this project was born. There is only one external dependency, SSH2, and it's used for SSH'ing into the Global Zone and executing the imgadm and vmadm commands. You will also notice the complete lack of a backend database; there is simply no justification for using a database and adding unnecessary complexity with the amount of users we plan to support.

Since we have no immediate plans to save the entire world within the next few weeks, there are some features that will have to wait for version 2.0 of LightlyCloudy. They are, just to list a few: support for multiple hypervisors, virtual network administration, prerequisite checks, and performance considerations.

Now with a clear plan in mind I started to write all the code by myself. I'm still looking for voluntary contributors, but haven't found anyone yet, so I'm considering adopting the convincing methods I've learned earlier.

Anyway, stay tuned for next episode, where I will share the code and experiences gained with anyone interested.

In my last blog we saw how to manage the private cloud, and now the unthinkable happened: suddenly our management system called FiFo stopped working all by itself... When clicking on the tabs it does not display any content, and below is a screenshot of what the JS console reports:

Fortunately the underlying SmartOS and all the virtual machines kept on working as if nothing had happened at all. But now our users are unhappy because they can't manage their virtual machines through a GUI; I think the below picture tells more than a thousand words :-(

That's me on the right, trying to comfort one of our dearest customers...

Obviously we sought help in the FiFo Google Groups, and the guys there are pretty fast to respond and offer suggestions. We started to debug as per their suggestions and this checklist, but as we soon discovered, it was an insurmountable task due to the complexity of FiFo; even the log files did not give any hints as to the root cause of the problem.

So we abandoned the attempts to recover FiFo and decided to go for a new install. Unfortunately the FiFo database would be overwritten by a new install, so this was not a viable option either, as we have a lot of user information in the database. Subsequently the new install attempt was also abandoned.

As a last resort we followed the upgrade guide from A to Z. The upgrade was successful, but FiFo refused to restart after the upgrade, and our customers remain unhappy. What to do? Obviously FiFo is, for the time being, not the ideal path for us to pursue; we have to look into alternatives.

So what are the alternatives? Well, if we look at the market there are OpenNebula, OpenStack, CloudStack and obviously SmartDataCenter. SmartDataCenter is not really an option, since it costs a lot of money, and we can't justify such expenses at the moment. As for the first three alternatives, there is no support for the SmartOS hypervisor and there are no current plans to support SmartOS.

OpenNebula, OpenStack and CloudStack all support the AWS EC2 API, which means that we could write our own web service that runs on SmartOS and is EC2 compatible. Again the task is complex, and then there is all the subsequent maintenance to secure continued compatibility.

We could also contribute to the FiFo project - but again, it's complex and FiFo tries to be omnipotent just like the other four alternatives, and we don't have time to wait for a workable product when our customers demand it now and not tomorrow!

Stay tuned for the next exciting part in this series, to see how we solved the problem.

In my last post I covered the process of setting up a private cloud. OK, so now we have the darn thing up and running, and the big question remains: how do we sell it to our internal and external customers?

As with all things you must give the people what they want and they will come, and only thereafter can you sell them what they don't yet know they need. So, in Anno Domini 2014 the customers crave virtualisation, because then they can continue using their familiar legacy environments.

Fortunately SmartOS has a KVM feature that supports virtualisation of most types of OS'es, which is all fine, but how can you provide that all-important self-service to your customers? Most of our end users are command line impaired, so what we really need is a fancy GUI that will enable customers to easily create and manage their virtual machines on a wholesale basis.

There are currently only two such cloud orchestration products available. The first is Joyent's proprietary solution, SmartDataCenter, which is a complete enterprise solution, and then there is Project FiFo, which is also enterprise (like) and at that also open source.

The choice was fairly simple for us: since we for the time being can't justify the purchase of a copy of Joyent's SmartDataCenter, we are left with only Project FiFo. We downloaded version 0.4.1 - "Hopping Husky"; the installation was simple and straightforward, and we had a usable GUI up and running in absolutely no time.

FiFo wants to do everything you would ever need when you have a private cloud to manage.... except serving virtual machine images. What FiFo does instead is connect to e.g. datasets.at and datasets.joyent.com, but none of those servers offer Windows images, for obvious license reasons. Since our marketing strategy is to proliferate our private cloud among Windows developers too, we need to create our own image server.

If you want to make your own image server there are two alternatives. The first solution is suggested by Joyent in Creating a Poor Man's Image Server, but FiFo is only compatible with the new API, so this solution is not workable. The second alternative is Setting up a local Dataset API repository for SmartOS (dsapi); the setup is fairly simple and straightforward.

The only minor complaint about this image server is that it uses memory ad libitum; you need to allocate memory that is twice the size of your image file. In our case we have an 8 GB Windows Server 2012 image, so we need to permanently allocate 16 GB of RAM, which is kind of a waste of memory.

Now we have our private cloud completely up and running, and everyone is happy. Above is a picture of me (left) with one of our numerous happy users... to be continued.

SmartOS seems to be the logical choice if you don’t want to go down the path trodden by Oracle, Microsoft, Citrix, VMware and their rather bloated cloud solutions. There are other good reasons to choose SmartOS, but it is better if you read some of the many other blogs covering this issue. So, SmartOS it is!

Since I have access to our own friendly datacenter, I thought it would be quickest to boot SmartOS on some of their Cisco UCS B200 M2 servers. But here we encountered the first problem: apparently the available network drivers are not SmartOS compatible, and the attempt to make SmartOS run was aborted by the hosting techies. Here is the error they showed me:

Instead of wasting more time I decided to procure some hardware that is verified to work with SmartOS, from the "Works for Me" Hardware Configurations page. The decision fell on the following configuration: X9DR7-LN4F from Supermicro, 2 x Intel E5-2620 2GHz 6-core, 256MB DDR3 1600MHz, 6 x 1TB SAS2 disks.

Here is a picture of my first server in an old Chieftec chassis, stuffed like a Christmas turkey. Note the USB stick at the bottom; I followed the instructions on Creating a SmartOS Bootable USB Key, and then SmartOS booted without problems from the USB key.

Being happy with the chosen configuration I ordered a second server with the same configuration, fully assembled and burn-in tested. The costs amount to about €4,300 for this fine server (Anno Domini 2013). So now we have two similar servers enabling us to test in a real private cloud environment.

The next step is to rack mount both servers in our top-secure-redundant datacenter.

I recently started experimenting with SmartOS from Joyent, you know, the Solaris-based virtualisation OS. Being a long-time Windows user I'm easily impressed by the various features offered by other *UX-like OS'es, and this time it is the hotplug disks and the zpool command that have my interest. In Windows it is always a pain to expand disk capacity on a live installation, so behold my amazement when I discovered how simply it is done under ZFS ....

And the ZFS pool 'zones' consists of only one disk, 'c0d1'. The next thing to do is to plug in a few more disks; however, until my new blade servers have been provisioned I'm unfortunately stuck with my VMware, so it is not possible to simulate hotplug without restarting the VM. Anyway, I created a 5G disk (c0d0) on the PCI bus and a 7G disk (c2t1d0) on the SCSI bus, and rebooted the entire thing in order to run the format command.

Recently I was approached by a colleague who asked me a couple of questions about ThreadPool. Firstly, what is the maximum number of concurrent threads when using ThreadPool? And secondly, he was experiencing some undesirable uneven processor load on a multi-core system; what could be the reason? All in all some interesting questions.

The Windows operating system has no hardcoded fixed upper thread limit; the number of threads is ultimately limited by the size of available memory and other basic resources. Generally speaking, on a 64-bit system the default stack reserve for a single thread is 256K and typically 24K is committed. I suggest reading Pushing the Limits of Windows: Processes and Threads by Mark Russinovich for a more in-depth discussion about memory limits.

But according to Microsoft documentation, the ThreadPool itself has a built-in upper limit of 32,767 threads on 64-bit platforms as of .NET version 4.0 (however only 1,023 threads on 32-bit platforms). OK, that was pretty easy, but unfortunately, here in the year 2013, it is only possible to exhaust the ThreadPool and still have a responsive system if your threads aren’t doing any real work. In real-life applications you will hit other constraints before you manage to exhaust the ThreadPool.

Besides the memory limitations, the other obvious limitation is the number of physical CPU threads. These threads also have to serve the ThreadPool along with the operating system; needless to say, CPU time slices can quickly become a limited resource. The best practice for economizing with CPU threads is to suspend ThreadPool thread execution whenever possible, especially by using asynchronous methods; following this simple design rule will give you better thread mileage even if you are not performing CPU-intensive tasks.

Another major constraint is when your application relies on third-party services. This is very difficult to diagnose because it is not always obvious when you have reached a limit, and the lack of immediate performance indicators can make the diagnostics an exercise in guesswork. I found that implementing “thread wait reporting” is very helpful, i.e. when a thread reaches a certain wait threshold it reports the offending task, e.g. a WebRequest, and from then on it is a simple matter of addition to identify the bottleneck. Remember that your multimega-threaded high-performance application is only as high-performance as the bottleneck.
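The original implementation was of course .NET, but the wait-reporting idea is language-agnostic; sketched here in node.js (the language used elsewhere on this blog), with made-up names and threshold, it amounts to wrapping the slow task and logging whichever one exceeded the limit:

```javascript
// Wrap an async task; if the wait exceeds the threshold, report which
// task was the offender. Names and threshold are purely illustrative.
function reportSlow(name, thresholdMs, task, cb) {
  const start = Date.now();
  task(function (err, result) {
    const waited = Date.now() - start;
    if (waited > thresholdMs) {
      console.log('SLOW: ' + name + ' waited ' + waited + ' ms');
    }
    cb(err, result);
  });
}

// usage: a fake "web request" that takes 50 ms against a 10 ms threshold
reportSlow('fakeWebRequest', 10, function (done) {
  setTimeout(function () { done(null, 'ok'); }, 50);
}, function (err, result) {
  console.log('result: ' + result);
});
```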

So, to the question of the maximum number of concurrent threads, I would say ”it depends”, but theoretically it is 32,767 concurrent threads.

Regarding the matter of undesirable uneven processor load: the ThreadPool was specifically designed to remove all the headaches of manually managing threads; it has been developed over many years and is doing a very good job at scheduling tasks for execution. Therefore I must admit that I was really surprised to hear about such behavior, so I made a simple application that fires up 4000 threads that each calculate a factorial recursively. As you can see from the below Task Manager screenshot, ThreadPool does a really awesome job of managing the threads and spreading the load evenly.

So the only explanation that I can come up with is that something in the application has altered the process’s ProcessorAffinity, which forces spawned threads to run on a specific subset of processors. As you can see from the below Task Manager screenshot, I have set my application to run on only 2 processors, so I think this might be the most plausible explanation.

A final note: ThreadPool may not be suitable for all purposes, but it’s my favorite choice when it comes to quickly firing up several thousand identical threads without having to waste time on thread management considerations.

The above will round sampletime to the nearest 10-minute interval, which is desired in some cases, but if you e.g. use aggregate functions, your result will only be correct each n’th minute. Therefore I have created this little query, which does not round and is excellent for graphing time-sensitive data.
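The query itself is SQL, but the arithmetic it relies on can be shown compactly: instead of rounding each sample to the nearest clock interval, count fixed-length intervals backwards from @starttime and assign each sample to its bucket. A sketch (the 20-minute interval length matches the example output described below; function names are my own):

```javascript
// Which interval, counted backwards from startTime, does a sample fall in?
// Bucket 0 covers (start - interval, start], bucket 1 the interval before, etc.
function bucketIndex(sampleTime, startTime, intervalMs) {
  return Math.floor((startTime - sampleTime) / intervalMs);
}

const MIN = 60 * 1000;
const start = Date.parse('2013-01-01T12:13:00Z');
// A sample at 12:00 falls in the first interval (12:13 back to 11:53):
console.log(bucketIndex(Date.parse('2013-01-01T12:00:00Z'), start, 20 * MIN)); // 0
// A sample at 11:40 falls in the second interval (11:53 back to 11:33):
console.log(bucketIndex(Date.parse('2013-01-01T11:40:00Z'), start, 20 * MIN)); // 1
```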

Below is the output it generates. If @starttime is e.g. 12:13, then the first row will show the average value for QueryDC in the interval between 12:13 and 11:53, and the second row in the interval between 11:53 and 11:33, etc.

I noticed on Tom Kyte’s blog that the Oracle Database Enterprise Edition (EE) option Total Recall has ceased to exist as a separate option at a list price of $5,800 per processor.

The functionality has been added to the EE option Oracle Advanced Compression, at $11,500 per processor, under the name “Flashback Data Archive (Total Recall)”.

This is not the first time Oracle has changed the content of, added or removed EE options, but what is inconvenient is that there is no official place to follow the changes that Oracle makes over time to a product.

The individual Oracle manuals have part numbers, and the “Oracle Database Licensing Information 11g Release 2 (11.2)“ guide is at present called “E10594-26”, where the “-26” stands for version/revision. This would indicate that the license guide has changed 26 times since it was released, but this is not true, since I can remember that the first version of this manual on OTN was called “E10594-04”.

But back to the problem: there is no official way to track changes in the Oracle manuals like there is when upgrading to an Oracle database patch set, where it is possible to download the “11.2.0.3 Patch Set – List of Bug Fixes by Problem type [ID 13483003.1]” that describes generic changes in the patch set.

Since a change in the manual could be a minor change (a spelling error) or, like in this case, a change of a database option, you probably should reference a part number if it is in a legal context, and by the way download the manual(s), since Oracle doesn’t have a place where you can find old versions of the manuals (I know it is possible to Google a part number and find a manual, but it does not work for all versions).

I think that Oracle should add a change list to the individual manuals that describes the changes made at each revision/version, and a place where we could find the revision/version.