Back from vacation … wow, it always seems to take longer to recover from vacation than the last time, but had a great trip! I saw some amazing places and did some fun things. We got to visit the place they filmed Jurassic Park, but did not see any of the dinosaurs they left behind. Guess they were sleeping.

Right after vacation, I went to a conference on healthcare technology. During one session, a consultant exclaimed – “We didn’t define requirements for moving to the cloud, because we weren’t going to submit an RFP.” First, I was impressed by the consultant’s score of the holy grail – consulting work without an RFP. Then, I was amazed by the comment. I was shocked by the number of other comments suggesting that companies did not think requirements were needed for cloud work.

Let me ask a question … when you go to buy a car, do you just tell the salesperson “any car would be fine”? Yes, I am equating a car dealership with a cloud provider, but think about it…the dealership has different car types (patterns if you will), different performance levels and different capacities. So again, when you are planning to buy a new car, don’t you at least have an idea of what you want to buy? Things like how fast you want the car to go, how many people you want to place in the car and the reliability features of the car.

So, the next question – is buying a car more important than your company’s systems in the cloud?

Got your attention? I hope so, because requirements are more important for cloud implementations than they are for on-premises implementations! Remember, you are relying on someone else to run your business. Shouldn’t you have input on the way it is done?

I am constantly amazed that clients entering the cloud think they don’t have a choice about what they get. A main focus of our cloud migration process is the definition of requirements for the systems while in the cloud.

Why define requirements for the implementation, migration and steady state support for cloud implementations? The same reason you choose the type of car you want … in your mind, you are thinking you want this type of car…

Unfortunately, if you have not communicated correctly with the sales person … you may end up with …

Seriously though, the requirements support multiple phases of the migration to the cloud. Keeping with the car analogy, the following table presents some of the areas where requirements impact the implementation of a cloud solution.

Requirements-driven results | Car analogy
Supports the appropriate selection of cloud providers (not just for RFP purposes) | What type of car are you going to purchase and where will you purchase it?
Defines success criteria of the implementation including patterns and process |
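One way to make the provider-selection row concrete is to write the requirements down as data and check each candidate against them. The sketch below is illustrative only: the requirement fields (availability, storage, recovery time), the provider names and all of the numbers are made up for the example and do not describe any real provider.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Requirements:
    """Hypothetical requirement fields, chosen for illustration only."""
    min_availability_pct: float   # e.g. 99.9 means "three nines"
    min_storage_tb: int           # capacity the systems need
    max_recovery_hours: int       # longest outage the business tolerates


@dataclass(frozen=True)
class ProviderOffer:
    """What a (made-up) provider commits to in its service offering."""
    name: str
    availability_pct: float
    storage_tb: int
    recovery_hours: int


def unmet_requirements(req: Requirements, offer: ProviderOffer) -> List[str]:
    """Return the names of the requirements this offer fails to meet."""
    gaps = []
    if offer.availability_pct < req.min_availability_pct:
        gaps.append("availability")
    if offer.storage_tb < req.min_storage_tb:
        gaps.append("storage")
    if offer.recovery_hours > req.max_recovery_hours:
        gaps.append("recovery time")
    return gaps


req = Requirements(min_availability_pct=99.9, min_storage_tb=50, max_recovery_hours=4)
offers = [
    ProviderOffer("provider-a", 99.95, 80, 2),
    ProviderOffer("provider-b", 99.5, 200, 8),
]
for offer in offers:
    print(offer.name, unmet_requirements(req, offer) or "meets all requirements")
```

Without the requirements written down, there is nothing to compare the two offers against, which is the "any car would be fine" situation.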

Cloud is easy, right? Someone builds a data center, runs cables, slaps some hardware and software together and *poof*, you are running a cloud. If a company wants to join, they just sign some documents and push their applications into the deep blue and everyone is happy!

Of course, I am being sarcastic. When we engage a company about a failing cloud initiative or a new cloud initiative, we generally hear the same thoughts: cloud is easy, and anyone can put one together. Even the companies and teams that have nothing to show after 2+ years wonder why they have not delivered.

Over the past 6 years, I have been focused on deploying private cloud implementations for multiple clients. Although most have been on engineered or converged systems, several have used physical and / or virtual systems across many technologies. These engagements include either “bail-outs” of failed implementations from other big-named firms or a complete end-to-end deployment. For the failed engagements, we generally start with a system health check to identify the challenges. We had assumed that cloud providers understand the process for on-boarding a client, and that assumption turned out to be false.

Over these engagements, we have implemented a methodology that has been very successful in the preparation and rapid delivery of cloud implementations. Over the next set of blog posts, I will discuss the methodology and some of the common errors that we have seen.

“We generally start everything with a health check – because everyone is ready for the cloud … right?”

Technology is fun and cloud technology is more fun. The issue that we find very often is that people understand cloud technology, but they don’t understand the cloud. Whether it is a public cloud or a private cloud, most people we talk with understand the benefits of implementing a cloud. The excitement over the benefits often overwhelms corporate leaders with promises of cost savings and rapid deployments. But, when we ask them what they want – generally, we get a blank stare. In fact, if you read my last blog, we get a lot of blank stares at the beginning. In reality, we find the client is ill-prepared for the migration to the cloud, either in expectation or in level of effort. The blank stare marks the moment when reality becomes apparent.

The blank stares will never cease, as I believe the disconnect is in the expectation, not the comprehension. Regardless of the topic, technology is supposed to be simple. Bottom line, cloud is simple if you are prepared. It is just like riding a bike! But just like riding a bike in today’s world, you have to know what you want and how you are going to use the bike. Then you have to be prepared for the hills, the elements, the virtual riding– so really, is cloud simple?

This is a picture of my bike, in my house “torture chamber”. I can guarantee that it is more complicated than just getting on and pedaling!

So, over the next few weeks we will talk about being prepared for the cloud. Whether you are going public cloud or private cloud, the preparation is nearly the same. If you are wondering “why do I need to prepare my environment for a cloud?”, please read on. It will change the way you deploy and save money.

Here is what I will discuss over the next few weeks …

“Are those requirements? There’s no requirements in the cloud!”

Does a reference architecture really help?

Patterns, not just for sewing anymore!

Service Catalog and Christmas List – hope eternal

Automation, Supply Chain and physics eternal

Technology is fun, and the cloud is fun. As an infrastructure engineer, database architect and application architect, I spend more time debating with myself than other people. They are good debates and I generally stop the discussion before they get violent. I hope we can have some fun as we go through this cloud journey and I welcome comments and thoughts!

Over the past year, I have been involved in numerous Disaster Recovery (DR) engagements – including reviews and implementations. When I start, my first question is NOT “What is your RTO/RPO?” (although that is important).

My first question is “What is your goal?” It is amazing the blank stares I get from this question. Obviously, the thoughts are probably “Idiot, my goal is disaster recovery!” But, I then explain that there are many different types of disasters and many types of disaster recovery options. The most entertaining response I get when I ask this question is “Well, someone told us we needed a disaster recovery solution”. Of course, this is normally in Florida during hurricane season …

Anyway, I thought for a fun refresher, I would throw out the discussion of WHY we do a Disaster Recovery implementation, NOT HOW do we do a DR implementation. Of course, determining how to do a DR implementation is easy once we determine why we want to do a DR solution.

Seriously, most teams jump into the technology solution before considering the requirements or the goals of the solution. So let’s go and have some fun!

Goals of Disaster Recovery

Most people know me as an advocate of flexibility and a DR implementation is no different. The company DR goals should be defined by the type and scope of the disaster. Some types of disasters are obvious, including natural and man-made disasters resulting in a “smoking hole”. However, other failures may also require a disaster declaration or at least utilize the environment for recovering from an incident. The following graphic provides a focus on the goals of a disaster recovery solution.

Most DR implementations take an enterprise focus, maintaining a secondary site for the production site. Most of the time, the disaster recovery site is a “lights out” location that is rarely tested. This implementation only satisfies a compliance person or a government compliance check box, which is important. However, how does this support the business objective or the customer’s experience?

The disaster recovery goal should focus on two questions: a) how does this support our customer experience, and b) how does this allow us to drive business?

After all, the resources for disaster recovery are expensive. Also, the management of data centers and environments requires focus and support of tools and personnel. Why not make these environments work for the company as well?

The following graphic represents a goal oriented focus on the disaster recovery solution. As the goals become more focused and flexible, the beneficiary transitions from internal operations to client experience and business focus.

Expanding the goals of the disaster recovery implementation for flexibility removes the self-serving goal of checking a box. I have reviewed DR solutions which force customer applications to fail over to disaster recovery due to a business application failure. Or worse, I have evaluated a client environment that separates customers into their own environments for isolation; however, the DR plan requires all customers to transition to DR in the event of a single customer outage.

Isolating applications into groups supporting distinct applications or customer installments is great. Providing a “Site Switching” strategy for each application group is excellent and improves the customer experience and confidence! The isolation of applications, databases and incidents provides an effective solution for disaster recovery. Moving a mountain during a disaster gets noticed, but causing a customer a multiple-day outage due to a deleted table gets noticed more.
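The difference between switching one application group and switching everything can be sketched in a few lines. This is a minimal illustration, not a DR tool: the group names, application names and the `groups_to_switch` function are all hypothetical.

```python
from typing import Dict, List

# Hypothetical application groups; each group can switch sites on its own.
APP_GROUPS: Dict[str, List[str]] = {
    "customer-a": ["a-web", "a-db"],
    "customer-b": ["b-web", "b-db"],
    "business": ["erp", "payroll"],
}


def groups_to_switch(failed_app: str, groups: Dict[str, List[str]],
                     enterprise_disaster: bool = False) -> List[str]:
    """Return the groups that must move to the DR site.

    An enterprise-wide disaster moves everything; otherwise only the group
    that owns the failed application is switched.
    """
    if enterprise_disaster:
        return sorted(groups)
    return sorted(name for name, apps in groups.items() if failed_app in apps)


# A single customer database failure only moves that customer's group.
print(groups_to_switch("a-db", APP_GROUPS))
# A "smoking hole" moves every group.
print(groups_to_switch("a-db", APP_GROUPS, enterprise_disaster=True))
```

A plan that always behaves like the second call, even for a single deleted table, is the pattern described above that hurts every customer for one customer's incident.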

As a manager of operations for a workforce staffing company, I recall an incident with PeopleSoft. A very capable person was performing an upgrade of a PeopleSoft application and mistakenly deleted the Vendor table. Our ability to quickly rebuild this table in our DR location and place it back in production not only saved our butts, but saved the company the 2 – 3 days of embarrassment it would otherwise have taken to rebuild the table.

Disaster Recovery Testing Frequency

Testing a disaster recovery cut-over requires coordination and resources. I am often asked, “How often should our company perform a DR test?” The question is simple but, most of the time, misguided.

A disaster is not convenient or forgiving. In operations at the time of a disaster, everything is on fire and everyone is yelling. Only experience and muscle memory provide the difference between minutes and days.

Testing DR is not about proving the ability to move applications from one point to another; that is done daily. The POINT of testing DR is to give the individuals experience with the process.

In one of my favorite movies – The Last Samurai – Tom Cruise’s character challenges a soldier to shoot him while being attacked. Of course the soldier panics and fails the challenge and the movie continues. But, the point is clear – things change in times of stress!

The following graphic provides a guideline for when to test the switchover for the efficiency of the team.

As the testing flows down the graphic, the team’s efficiency also improves. As we test the common Site Switching on a monthly or quarterly basis, the experience of the supporting teams increases as well. Therefore, the testing of the Enterprise cutover would encompass smaller flexible site transitions, which are tested frequently.

Can your team perform a disaster recovery test? Can your team DEFINE a disaster for your operations? If not, you have some homework!

Better question, when was the last time you checked your spare tire? When was the last time you changed your tire? Your tire will not go flat when it is convenient. It will only go flat when it is raining and you are on the freeway.

My last story … when I was a director of operations, I managed a team of DBAs and system administrators. I had a triage DBA role that rotated between application work and triage, and during triage that person handled the weekly issues and requests. In the middle of this stressful period, I would write the name of a database, a date and a time on a piece of paper and put it on their desk. They would have to recover the database to that point in time while they were doing triage. The process was to keep them sharp, not to test their abilities.
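The piece of paper can be approximated with a few lines of code. The sketch below only picks the drill target; it does not perform any recovery. The database names, the seven-day window and the `pick_recovery_drill` function are assumptions for illustration (in practice the window would be bounded by the backup retention period).

```python
import random
from datetime import datetime, timedelta
from typing import List, Tuple


def pick_recovery_drill(databases: List[str], now: datetime,
                        window_days: int, rng: random.Random) -> Tuple[str, datetime]:
    """Pick a database and a point in time within the last `window_days`."""
    database = rng.choice(databases)
    minutes_back = rng.randint(1, window_days * 24 * 60)
    return database, now - timedelta(minutes=minutes_back)


# Hypothetical database names and a fixed "now" for the example.
databases = ["hr_prod", "finance_prod", "crm_prod"]
now = datetime(2014, 1, 15, 9, 0)
database, target_time = pick_recovery_drill(databases, now, 7, random.Random())
print(f"Recover {database} to {target_time:%Y-%m-%d %H:%M}")
```

The value is in the surprise, so the selection is deliberately random rather than scheduled.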

I rarely get nostalgic and think of the “good ole times”, but perhaps this is one of those times. We at Enkitec still joke about the constant use of the term “Best Practices” as the attempts of vendors to sell more product. While we began to hear the term more and more, it always seemed to be a fun discussion when we talked to engineers and friends from Oracle about installing an engineered system. These discussions typically ended with our providing our experience and execution to our friends for consideration. However, lately the ballad of “Best Practices” has left the engineering discussion and moved to consultants.

The latest incident occurred a few weeks ago at a customer location. The vendor’s pre-sales consultants engaged us, and the customer, in a four-hour discussion about the installation of four engineered systems. This discussion focused on the physical installation; however, it did not cover the application requirements, system requirements or the customer’s infrastructure abilities. After the four-hour meeting, these “engineers” left and generated a 13-page document citing “Best Practices” recommendations. Missing from this document were things like customer and application requirements, physical data points and application observations. During the discussion, these consultants could not provide solution benefits or experiences; they simply stated “Best Practices” as the answer to each question. I left the meeting desperately wanting those four hours back.

Now, as I ponder this, I lament … “Are the days of actually talking with a customer and defining the best solution for the customer’s situation … gone?”

The jokes around the Enkitec office circled around the laziness of installers, but I am starting to believe the use of “Best Practices” is more of a strategy than plain laziness.

Are “Best Practices” necessary?

I may be alone here, but I believe they are necessary. As a performance engineer for a vendor, I participated in many TPC and AIM benchmarks. Those benchmarks provided a decent baseline for performance in a controlled environment. I think the same is true for best practices. The concept for best practices identifies a baseline of a perfect system or application in a perfect configuration installed by an engineer that did not have anything else to do. As we all know, this is rarely the case within a customer’s infrastructure and application environment. However, the customer can evaluate the solution and the comments at the end of the best practices documentation. These comments provide the pros and cons of each solution. So yes, best practices are necessary, but they are not an excuse. Full disclosure is required.

Is it Lazy?

As I mentioned earlier, we used to joke about the use of best practices. We thought, at the time, that the individuals citing best practices were simply using someone else’s work as a reference. Unless the consultant could provide the full disclosure associated with the best practices comment, we generally knew two things: 1) the consultant had probably never installed the solution and 2) the consultant had probably never experienced the solution in the wild. They could stand behind the work of someone else and claim “Best Practices” without having to provide an adequate defense.

So, an inexperienced consultant could provide a “solution” without 1) providing a defense, 2) collecting or gathering data and 3) performing physical analysis of data. Then, to beat everything else, they also relinquish any responsibility for the “solution”. In the past, yes, we would call that lazy. But now, I think it has become a strategy of the consulting firm.

Strategy to level the playing field?

As I sat in my Georgia Tech MBA class on Global Product strategy today, I started wondering if it was not a smart strategy. I began to think, how can a vendor that does not have a history of experienced consulting compete against experienced consulting firms? Utilizing consultants with less than 5 years of operational experience to deliver a sound solution is challenging. However, if you provide the consultant an “equalizer”, such as “Best Practices” in every document, then it is easier to sell the “solution” as a sound solution for the customer. Therefore, the vendor no longer requires a solid staff of experienced engineers; it simply needs to define a generic solution and socialize the solution as a Best Practices solution.

Are Vendor Best Practices real?

The solution is real in most cases and, in most cases, a good idea. I believe someone sat in a lab and performed the processes defined in a best practices document. I am also sure that the solution, if performed correctly, provides the benefits as indicated. But, are they really best practices? After all, one primary characteristic of the “Best Practices” definition is the term – Widely Accepted – which means that the solution is widely accepted by the community. However, most vendors publish best practices at the same time a solution is published, which challenges the “widely accepted” requirement of defining a best practice. So, as we implement systems and solutions for customers, we should be wary of the term “Best Practices” when it comes from the specific vendors – the practices may be accepted by the vendor, but not widely accepted by the community.

Why be wary?

With respect to citing “Best Practices” as the only way to go, is this a bad thing? As indicated above, there is a tendency to call new technology a type of best practice, although it is not widely used or accepted by the community. Also, the mantra of best practices validates consultants that may not ultimately understand the technology or the use of the technology. The inexperienced consultant will cite “Best Practices” as the reason for implementing a solution, regardless of the benefit or detriment to the client. Therefore, as with anything else, we have to do our homework to make sure a solution is widely accepted and is in the best interest of the customer.

It’s all about the customer

Why do we care about the socialization of best practices? Because, in the end, we end up having to rescue customers from the latest “Best Practice”. Most customers don’t have the luxury of a test lab and some don’t have the skilled resources dedicated to the solution. Most customer resources play the role of utility player, knowing how to support an assortment of products at a high level.

How do we approach a customer to know when to implement the best practices stated by these vendors? As consultants, we should do as we have always done:

Listen and understand: Listen to the customer and understand what their team can implement and support. Just because a best practice is written down does not mean that it will fit in a customer’s environment. As consultants, we are expected to provide the best solution for the customer’s environment.

Understand the technology: Don’t recommend a product because it is the latest technology. Recommend the product because it supports the customer’s requirements and provides flexibility. Sometimes a best practice uniquely leverages a vendor’s product, which limits the flexibility for growth or integration with other products.

Read the fine print: Most best practices come with multiple implementation options – just as most technologies. Although rarely stated by consultants, probably because they don’t understand, these options come with benefits or deficiencies. Some of these indicate the solution is complicated to implement or support. The issue may include a costly implementation due to licensing.

As I step away from my computer, I will maintain the traditions of most experienced consultants. I will continue to evaluate technology in terms of how it helps customers. Technology is a tool for us to use to meet our requirements and provide us benefit. Too often, technology sold to a customer becomes an entrapment into a vendor or solution and becomes a cage.

I guess, as I look at it, the above 3 bullets become the “Best Practices” for consultants. Remember, success is gauged by a successful customer implementation, not a technology implementation. There have been many successful technology implementations that served little purpose.

I had the opportunity to install one of the first Exadata X4-2 frames prior to Oracle’s announcement, which occurred on December 11, 2013. The Exadata X4 improves on the already popular Exadata brand, while proving the scalability and flexibility of the Exadata platform.

The installation also introduces new versions of software. The new software includes software for the configuration tool, installation and storage cell. This blog post addresses the changes associated with the configuration tool and the installation process. I am sure that we will be posting more blogs with respect to the storage cell software changes as we continue to test in our lab.

Configuration Tool

The configuration tool has been completely revamped. The new tool supports the old platforms, Windows and Linux, and also adds support for the Mac OS platform. As a Mac user, I am very happy with this addition, which makes it easy to prepare customer installation documents.

Apart from the fact that the configuration tool now supports Linux, Windows and MacOS, there are multiple changes within the tool and the output of the tool. I will post another blog supporting the complete changes to the new configuration tool.

With respect to the X4 implementation, the changes to the output of the configuration tool represent a “no-nonsense” approach to the new tool. The following bullet points outline the new output.

<cluster>-checkip.sh – Reads the “xml” file and performs the checkip process that was a pre-requisite of the old installation process. This file was added with the December version of the installation files and was not originally planned.

<cluster>-InstallationTemplate.html – represents a new layout of the installation template from before. The new layout includes a new table identifying most of the required information in a much smaller file. Although useful, the new layout leaves some detail out. I believe Oracle is still adjusting this information.

<cluster>-preconf_rack_0.csv – Represents the “preconf.csv” file from before, which is used during the “apply config” procedure. This file supports the definition of the IP addresses for all the Exadata machines.
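Since each output file name is derived from the cluster name, a quick checklist of expected files can be generated before an installation. This is a small sketch covering only the three files listed above; the `expected_config_outputs` function and the cluster name `exa01` are made up for illustration, and the tool may produce other files that are not described here.

```python
from typing import Dict


def expected_config_outputs(cluster: str) -> Dict[str, str]:
    """Map each output described above to its file name for `cluster`."""
    return {
        "checkip script": f"{cluster}-checkip.sh",
        "installation template": f"{cluster}-InstallationTemplate.html",
        "preconf file": f"{cluster}-preconf_rack_0.csv",
    }


# "exa01" is a made-up cluster name used only for illustration.
for purpose, filename in expected_config_outputs("exa01").items():
    print(f"{purpose}: {filename}")
```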

First Look

So, as expected, the visual inspection of the new Exadata X4 does not reveal anything different from the Exadata X3 frame. The quarter rack X4 looks the same as the quarter rack X3. However, the new half and full rack X4 will be different than the standard X3 rack of the same configuration. With the X4 standard configurations for the half and full rack, the Infiniband Spine switch is not included as before. However, upon detailed inspection of each component, the changes are visible.

Compute Node

The compute node details represent the changes to the configuration of the compute nodes. These changes include larger local disks, new processor class and core count as well as changes in the memory configuration.

The storage processor output reveals the new frame type; the machine identifier is removed for customer anonymity.

Internally, the review of the processors reveals forty-eight entries like the one listed below.

The forty-eight entries represent two processors with twelve cores for a total of twenty-four cores. Each core is dual threaded, which provides the forty-eight count.
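The arithmetic behind the forty-eight entries can be written out as a small check. The numbers come from the paragraph above; the `logical_cpu_count` function name is just an illustrative helper.

```python
def logical_cpu_count(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Logical CPUs the operating system reports for one compute node."""
    return sockets * cores_per_socket * threads_per_core


# Values from the text above: 2 processors, 12 cores each, 2 threads per core.
total_cores = 2 * 12
logical_cpus = logical_cpu_count(sockets=2, cores_per_socket=12, threads_per_core=2)
print(total_cores, logical_cpus)  # prints "24 48"
```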

At the memory level, the Exadata X4 provides a minimum of 256GB of RAM, which is expandable to 512GB of RAM. The display below represents the customer’s minimum configuration of 256GB of RAM.

The rest of the compute node configuration remains consistent with the X3 implementation.

Storage Cell

The storage cell configuration includes the same number of processors, but additional memory and different disk configuration options. Also, the X4 includes the new storage server software, as indicated in the below diagram of the imageinfo command.

As indicated in the following diagram, Oracle increased the amount of memory for the storage cells from 64GB RAM to 96GB RAM.

The capacity of the storage cell components increases as well. These components include the flash memory and the physical disks. The Exadata X4 storage cell disk options are either a 1.2 TB 10,000 RPM high performance drive or a 4 TB 7,200 RPM high capacity drive. The associated diagram represents the customer’s choice of the High Capacity selection with the 4TB drive.

The flash component includes four F80 PCIe cards, each with four 200GB flash modules, as presented in the below capture.

The following diagram represents the “list cell physicaldisk” presenting the 12 physical disks and the four F80 flash cards with four independent flash modules each. The total amount of flash by cell is now 3.2 TB of flash cache.
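The per-cell totals follow directly from the component counts above. The sketch below works them out, assuming decimal units (1 TB = 1000 GB) and raw capacity before any redundancy or formatting overhead; the constant and function names are illustrative only.

```python
FLASH_CARDS_PER_CELL = 4        # F80 PCIe cards, from the text above
MODULES_PER_CARD = 4            # flash modules on each card
MODULE_SIZE_GB = 200            # size of one flash module
DISKS_PER_CELL = 12             # physical disks per storage cell
HIGH_CAPACITY_DISK_TB = 4       # high capacity drive option


def flash_per_cell_tb() -> float:
    """Raw flash per storage cell in TB (decimal, 1 TB = 1000 GB)."""
    total_gb = FLASH_CARDS_PER_CELL * MODULES_PER_CARD * MODULE_SIZE_GB
    return total_gb / 1000


def raw_disk_per_cell_tb(disk_size_tb: float = HIGH_CAPACITY_DISK_TB) -> float:
    """Raw (pre-redundancy) disk capacity per storage cell in TB."""
    return DISKS_PER_CELL * disk_size_tb


print("flash per cell (TB):", flash_per_cell_tb())
print("raw high capacity disk per cell (TB):", raw_disk_per_cell_tb())
print("raw high performance disk per cell (TB):", round(raw_disk_per_cell_tb(1.2), 1))
```

Sixteen flash modules of 200GB each give the 3.2 TB figure quoted above; usable disk capacity will be lower than the raw number once redundancy is applied.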

Configuration Process

The physical implementation process contains the same steps for the hardware configuration. However, the expansion of the local drives and a modification to the “reclaimdisks.sh” represent a change in the duration of the pre-configuration.

In past configurations, the reclaim disk process would run in the background and complete in about one hour. At the end of the disk reclaim process, the nodes would reboot.

However, the new reclaim disk process forces the reboot of each associated node and then executes the disk rebuild before network services are available. The only way to monitor the reclaim disk (or access the system) is through the console. The new reclaim disk process takes approximately three hours, as indicated in the below capture of the console display.

As indicated above, the new size of the local drives (600GB) contributes to the new duration of the disk reclamation process.

At the end of the disk reclamation process, the compute nodes reboot and the Exadata frame is ready for the IP assignment through the “applyconfig.sh” process. At the completion of the “applyconfig.sh” process, the configuration moves from the hardware procedure to the software configuration process and the “OneCommand” initiation.

OneCommand Procedure

With the Exadata X4 implementation, the software configuration includes a new “OneCommand” process. This process includes fewer steps than the previous Exadata frame process, but these steps include a consolidation of the old steps.

The following diagram represents the new set of steps for the installation process.

The following sections outline a few notes about the above steps.

The first thing that becomes evident is the “missing” /opt/oracle.SupportTools/onecommand directory. In previous versions of delivered Exadata frames, the “onecommand” directory would contain a version of the onecommand scripts. Generally, we would replace this directory with a copy of the latest onecommand scripts downloaded from MOS.

The new implementation implies a direct correlation between the latest MOS version and the new configuration script. This correlation also challenges the old “image” process that some installers utilize, as the image may change with each patch update.

Location for Implementation Files

After the new onecommand directory is populated, the preparation step consists of loading the “required” files for the installation. These required files include the Oracle distribution files, the latest patch files and a version of OPatch placed in a staging directory. With the new configuration process, the new staging directory is now /opt/oracle.SupportTools/onecommand/WorkDir.

The second step in the configuration process validates that these files are placed in the correct location.
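Before running that validation step, it can save time to check the staging directory yourself. The following is a rough sketch, not part of OneCommand: the `missing_staged_files` helper is hypothetical, and the file names are placeholders because the real list depends on the release being installed.

```python
from pathlib import Path
from typing import Iterable, List

# Staging directory named in the text above.
WORK_DIR = Path("/opt/oracle.SupportTools/onecommand/WorkDir")


def missing_staged_files(work_dir: Path, expected: Iterable[str]) -> List[str]:
    """Return the expected file names that are not present in `work_dir`."""
    return [name for name in expected if not (work_dir / name).is_file()]


# Placeholder names only; substitute the files required for your release.
expected_files = ["grid_distribution.zip", "database_distribution.zip", "opatch.zip"]
missing = missing_staged_files(WORK_DIR, expected_files)
print("missing:", missing)
```

An empty list means every expected file is in place; anything else is a file to copy before starting the install.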

The other change, with respect to the installation process, consists of the execution of the “OneCommand” process. The new configuration process requires the execution of the install script, the identification of the configuration script and the step. The following command executes the “list step” process from the install command.

# cd /opt/oracle.SupportTools/onecommand

# ./install.sh -cf <cluster>.xml -l

A specific step is executed with the following command.

# ./install.sh -cf <cluster>.xml -s <step #>

The log files supporting each step are now located in the following location:

/opt/oracle.SupportTools/onecommand/log
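For anyone scripting the installation, the two command forms above can be assembled by a small helper. This sketch only builds the argument list and does not execute anything. It assumes the options are plain hyphenated flags (-cf, -l, -s); the `install_command` function and the `exa01.xml` file name are made up for illustration.

```python
from typing import List, Optional

ONECOMMAND_DIR = "/opt/oracle.SupportTools/onecommand"   # from the text above


def install_command(config_file: str, step: Optional[int] = None) -> List[str]:
    """Build the install.sh argument list.

    With no step, the command lists the available steps (-l);
    with a step number, it runs that single step (-s).
    """
    command = ["./install.sh", "-cf", config_file]
    if step is None:
        command.append("-l")
    else:
        command.extend(["-s", str(step)])
    return command


# "exa01.xml" is a made-up configuration file name.
print("run from:", ONECOMMAND_DIR)
print(" ".join(install_command("exa01.xml")))
print(" ".join(install_command("exa01.xml", step=2)))
```

Keeping the command as a list makes it straightforward to hand to a process runner later and to log exactly which step was requested.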

With future blogs, I will review the configuration process for the new Exadata environment.