I have seen this setup documented in a few places, but not for Ubuntu, so here goes.

I have used this many times to verify or diagnose Device Mapper Multipath (DM-MPIO), since it is rather easy to fail a path by switching off one of the network interfaces. Nowadays, I use two KVM virtual machines with two NICs each.

These steps have been tested on Ubuntu 12.04 (Precise) and Ubuntu 14.04 (Trusty). The DM-MPIO section is mostly a cut and paste of the Ubuntu Server Guide.

The virtual machine that will act as the iSCSI target provider is called PreciseS-iscsitarget. The VM that will connect to the target is called PreciseS-iscsi. Each one is configured with two network interfaces (NICs) that get their IP addresses from DHCP. Here is an example of the network configuration file:
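As a rough sketch of what such a file can look like on these releases, the Python snippet below generates /etc/network/interfaces stanzas for two DHCP interfaces. The interface names eth0 and eth1 and the helper name dhcp_interfaces are assumptions made for illustration; they are not taken from the VMs described above.

```python
def dhcp_interfaces(names):
    """Return /etc/network/interfaces content with one DHCP stanza per NIC."""
    # The loopback stanza is normally present in this file.
    stanzas = ["auto lo\niface lo inet loopback"]
    for name in names:
        # "auto" brings the interface up at boot; "inet dhcp" requests a lease.
        stanzas.append(f"auto {name}\niface {name} inet dhcp")
    return "\n\n".join(stanzas) + "\n"


if __name__ == "__main__":
    # eth0 and eth1 are assumed names; adjust to what the VM actually reports.
    print(dhcp_interfaces(["eth0", "eth1"]), end="")
```

Keeping both interfaces on DHCP keeps the two paths symmetrical, which is convenient when one of them is later switched off to fail a path.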

We can see in the dmesg output that the new device /dev/sda has been discovered. Format the new disk and create a file system on it. Then verify that everything is correct by mounting and unmounting the new file system.
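One way to check for the new device without reading the whole kernel log by eye is to search it for the "Attached SCSI disk" message. The following is a minimal sketch: the helper name attached_disks is hypothetical, and the sample line is only illustrative, not output captured from the VMs in this post.

```python
import re

# Matches kernel lines such as "sd 2:0:0:0: [sda] Attached SCSI disk".
ATTACHED = re.compile(r"\[(sd[a-z]+)\] Attached SCSI disk")


def attached_disks(dmesg_text):
    """Return the /dev paths of SCSI disks reported as attached."""
    return ["/dev/" + m.group(1) for m in ATTACHED.finditer(dmesg_text)]


if __name__ == "__main__":
    # Illustrative sample line; on a real system, pass in the dmesg output.
    sample = "[  120.500000] sd 2:0:0:0: [sda] Attached SCSI disk\n"
    print(attached_disks(sample))
```

On the initiator VM, the text could come from something like subprocess.check_output(["dmesg"], text=True), assuming the user is allowed to read the kernel log.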

All that remains is to add an entry to the /etc/fstab file so that the file system we created is mounted automatically at boot. Note the _netdev option: it tells the system to wait for the network before mounting, and without it the iSCSI device will not be mounted.
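To make the shape of that entry concrete, here is a small sketch that builds an fstab line and always includes _netdev. The device /dev/sda1, the mount point /mnt, the ext4 type, and the helper name fstab_entry are all assumptions for illustration; substitute whatever was actually created in the previous step.

```python
def fstab_entry(device, mountpoint, fstype, options=("defaults",)):
    """Build an fstab line for a network-backed device, forcing _netdev."""
    opts = list(options)
    if "_netdev" not in opts:
        # _netdev marks the file system as requiring the network.
        opts.append("_netdev")
    # Fields: device, mount point, type, options, dump, fsck pass.
    # dump and fsck pass are set to 0 here, which is an assumption.
    return f"{device} {mountpoint} {fstype} {','.join(opts)} 0 0"


if __name__ == "__main__":
    # Placeholder values, not taken from the original setup.
    print(fstab_entry("/dev/sda1", "/mnt", "ext4"))
```

Building the line in code is optional; the point is only that _netdev belongs in the fourth (options) field.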

Recently, I realized that there is a major difference in how customer support is done on Ubuntu.

As you know, Canonical provides official customer support for Ubuntu on both server and desktop. This is the work I do: provide customers with the best level of support on the Ubuntu distribution. This is also what I was doing in my previous job, but for the Red Hat Enterprise Linux and SuSE Linux Enterprise Server distributions.

The major difference that I recently realized is that, unlike my previous work with RHEL and SLES, the result of my work is now available to the whole Ubuntu community, not just to the customers that pay for our support.

Here is an example. Recently, one of our customers identified a bug with vm-builder in a very specific case. The work that I did on this bug resulted in a patch that I submitted to the developers, who accepted its inclusion in the code. In my previous life, this fix would have been made available only to customers paying a subscription to the vendors, through their official update or service pack services.

With Ubuntu, through Launchpad and the regular community activity, this fix will become available to the whole community through the standard -updates channel of our public archives.

This is true for the vast majority of the fixes that are provided to our customers. As a matter of fact, the public archives are almost the only channel that we have to provide fixes to our customers, which makes them available to the whole Ubuntu community at the same time. This is a different way of working, and something that makes me a bit prouder of the work I’m doing.

A while back, Mark talked about some of the professions in the computer industry. Unfortunately, I can’t seem to track down those articles. Instead of waiting for him to talk about the one I am part of, I thought of doing it myself.

On a train back from Paris, where I spent the day doing an intensive troubleshooting session for one of our customers, I am reminiscing about the work I achieved today and the good decision I made, which helped us identify a potential cause for the suspend/resume issue that we were investigating.

First of all, this time I was not the one with the knowledge: Colin King was. I was there to assist and help in identifying why the laptop was not resuming from suspend. Since the customer plans to deploy 2000 of this specific model, it is better to identify and fix this kind of issue beforehand.

So we went in, him somewhere in England, me in Paris, both in a constant IRC chat, shooting ideas back and forth. Well, mostly him sending ideas and me shooting back the results and observations, things that I was seeing, important or not, trivial or what I thought was important. And in those situations where you are part of a chain and not necessarily the one with the most knowledge of the issue being investigated, it is very important to avoid judging the importance of the information that you provide.

Too often, the dreaded « oh, I thought that this wasn’t important so I didn’t bother to mention it » spell just brings tons of bad magic on the investigation efforts. I learned about it the hard way, being too often at the other end of the chain, trying to make sense of an investigation with the help of a more junior support colleague or a customer who, too many times, has more important things to do.

This time, one of the most important pieces of information came about this way:

(12:08:29) caribou: it’s about lunch time, I may step out to grab a sandwich in the meantime
(12:12:08) cking: sure
(12:12:16) caribou: fyi, it doesn’t completely powers off
(12:12:30) cking: AH
(12:12:45) cking: this means it’s definitely a problem with the power management on the southbridge
(12:12:51) caribou: after the « shutdown », the screen stays on with [28. ....] Power down.
(12:13:19) cking: if it can’t S5 then we can do a S3 either, that’s the same power management register on the southbridge

By simply indicating that, after invoking the ‘shutdown’ command, the laptop did not power off but stayed powered on in some limbo, I brought to Colin’s attention a situation that seemed trivial to me but was very important to him. I did not know that the suspend/resume functionality and the power-off functionality use the same mechanism, the same power management register, and that if one did not work, the other couldn’t work either!

From that point on, we were able to target the whole power management subsystem. After a few more tests on the operating system side, cking became convinced that it was a hardware issue. One easy way to find out is to swap the operating system and see if the problem persists. So there I went, installing Windows 7 on this laptop, then installing the graphics driver to get the suspend functionality working, to finally be able to test the same functionality on Windows 7.

One other important thing highlighted by cking was to make sure that Windows 7 was also using the ACPI functionalities and not the old APM, which would have rendered our test useless. A few minutes with the support engineer’s best friend, the web search engine, turned up many articles on Microsoft’s website talking about ACPI and Win7. That was sufficient to confirm that it was also using ACPI.

So I went ahead and triggered the « Suspend » functionality on Win7. The laptop’s screen went blank, but the power button kept blinking with a few more LEDs turned on, which was the main symptom on Ubuntu as well. This, and the fact that shutting down Windows 7 left the laptop in the same kind of half-powered-down limbo, finally convinced us that we were facing a hardware problem and not some bug in our operating system. The customer still has to find out whether this is systematic on that model of Sandy Bridge laptop or only happens on this single unit. But now they know where to look, especially since their backup plan was to deploy those laptops on Windows 7 instead of Ubuntu.

I wanted to write about today’s session because I think that it shows the kind of investigative work that is expected from a support engineer. Most of the time, we are not there to fix bugs, but to clearly identify them and to find ways to reproduce them as reliably as possible. Our job is to provide the data that will be necessary to identify the flaw and fix it. Sometimes, the first action is to find workarounds so the user can continue his work while we look for a more definitive solution. Some of us will even go further and suggest fixes to the developers, or at least help them in fixing those bugs.

A support engineer needs to be investigative, to keep an open mind, to avoid the mistake of being convinced that he already knows the answer before starting to look at the data. He must be able to ask for help, to be humble about his knowledge and respect the knowledge of others. And often, it is best to take a step back and look at a problem from different angles. Sometimes the solution is easier to find from a few steps back.

I have been doing this job for almost fifteen years and I still get a kick out of identifying bugs, helping users with their issues, and trying to make their computing experience as easy as it can be. I know that many of my colleagues feel the same about it. I also think that a support engineer is not just some sysadmin who doesn’t have the skills required to become a developer. I’ve seen a few too many devs digging in the source code, looking for answers to a problem, when simply opening the log files and searching for known issues was enough to identify and fix the problem.

And don’t get me wrong, I admire the skills of the developers and I’m still hoping to get more and more involved in work similar to theirs, but I also have great respect for all those support engineers who go after the next problem and find a fix for it.