tech answer guy

Wednesday, May 25, 2016

In prep to play the new DOOM by installing a Windows SSD in my video editing machine, I ran a BIOS update on my Asus P9X79. After the update, the Intel onboard C600 chipset RAID controller dropped two of the four members of my RAID 1+0 set to "Non-Member" disks. After researching the problem, it seemed that many people have had this issue with Intel RAID.

Apparently, Intel RAID gets upset when the BIOS gets updated. At a high level, here is what you do to resolve it:

A. Reset all member disks to Non-RAID

B. Re-create the RAID in the exact same order that it was created

- Don't Delete the RAID set, as that has caused data loss for some people

C. Usually, the partition table gets blown away when you recreate the RAID. You then have to go find it again by running a program like TestDisk to find and rewrite any missing partitions

------------------

The Path I Took

Of course, when I ran TestDisk, I compounded the problem by choosing the wrong partition. I basically got confused, as TestDisk allows you to select from a couple of choices:

1) Intel/PC partition

2) EFI GPT partition map

Now, my BIOS is UEFI and the drives all had GPT partition maps. I thought that was the logical choice and I selected it. It showed me two main drives:

- MS Data partition, size 200MB

- MS Data partition, size 2.7TB, the size of my RAID set

First, I wrote a partition table containing the 200MB partition. This was the wrong thing to do. I got confused because my drives were GPT and because most of the people on the threads I had been reading were Windows users. It was only after spending Saturday and Sunday reading that I found a couple (out of the hundreds) of posts from people running Linux that actually had the correct instructions to fix my issue.

For my Linux Fedora 22 system, I actually needed to select "1) Intel/PC partition". Before I did this, I decided to:

1) Pull off any data that I could using PhotoRec, a program that carves out files block-by-block, written by the same guy who wrote TestDisk, Christophe Grenier. I wrote that data to my 3TB SATA drive. This took about ten hours.

- I ran through this exercise, but of course, the files that PhotoRec outputs are labeled F000001.txt, F000002.sh, F000003.gz and so on. The names and directory structures are totally lost. The contents of the files, however, are intact, and I was able to find the video editing scripts that I had spent 100s of hours developing. That was good, but it would be an enormous pain-in-the-A to dig through them and rename them all.

2) After pulling off the data, I was going to blow away the 200MB MS Partition, as creating this was a mistake. I did this with some trepidation, obviously, as I didn't want to do any more harm.
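As an aside on step 1: a rough way to triage PhotoRec's anonymous output is to sort the files into folders by the extension PhotoRec guessed from their content. This is only a sketch; the recup_dir path and sample files below are stand-ins for illustration, not my actual recovery set.

```shell
# Triage PhotoRec output by guessed extension. recup_dir and the two
# sample files are made up here so the sketch is self-contained.
mkdir -p /tmp/recup_dir /tmp/sorted
printf '#!/bin/bash\necho hi\n' > /tmp/recup_dir/f0000001.sh
printf 'plain notes\n'          > /tmp/recup_dir/f0000002.txt
for f in /tmp/recup_dir/*; do
    ext="${f##*.}"                    # extension PhotoRec assigned from content
    mkdir -p "/tmp/sorted/$ext"
    cp -p "$f" "/tmp/sorted/$ext/"    # -p preserves timestamps
done
ls /tmp/sorted
```

At least the shell scripts all end up in one folder that way, even if the names are still gone.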

Once those two things were done, I went back into TestDisk and selected "Intel/PC partition." This time, I saw a Linux partition of the correct size, 2.7TB. TestDisk gives you the option of viewing the directory structure. When I did this I got a foreboding message:

“Can’t open filesystem. Filesystem seems damaged.”

Ugh. I pressed on and wrote the partition table to disk. Mounting the filesystem, I got a new error:

mount: wrong fs type, bad option, bad superblock on /dev/md126p1

Ugh. Doing some research, I found this article on fixing bad superblocks, which entails finding a good one from the many that are sprinkled throughout an ext4 filesystem:

mke2fs -n (a dry run, so it doesn't actually create a filesystem) showed me the backup superblocks available. I prayed they weren't corrupt. Using e2fsck, I chose one of the superblocks, 32768, from the output of mke2fs:

sudo e2fsck -f -b 32768 /dev/md126p1

When the program ran, it found a few errors. In case there were many, I CTRL-C'd out of e2fsck and restarted it with the "-y" switch to let it run unattended, accepting all fixes with a "yes." I let it run. When I came back to the terminal, I found the screen full of output, with thousands of sector IDs scrolling by. I wasn't sure if this was bad or good..but I let it finish. Afterward, I mounted the drive.
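For the record, the sequence looks roughly like this. It's replayed here on a small loopback image rather than the real /dev/md126p1, so it's safe to run anywhere; a 256MB image with 4K blocks puts the first backup superblock at 32768, the same number I used on the array.

```shell
# Safe replay of the superblock recovery on a loopback image.
dd if=/dev/zero of=/tmp/demo.img bs=1M count=0 seek=256 2>/dev/null  # sparse 256MB file
mke2fs -F -q -b 4096 -t ext4 /tmp/demo.img                # build a small ext4 filesystem
mke2fs -n -F -b 4096 /tmp/demo.img | grep -i superblock   # -n: dry run, lists backup superblocks
e2fsck -f -y -b 32768 -B 4096 /tmp/demo.img || true       # check via the backup; exit 1 just means "fixed"
```

On the real array, the target is the RAID partition device (e.g. /dev/md126p1) instead of the image file.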

The mount came back without error. Good. The moment of truth was when I ran an "ls" and saw my files and directories! However, I wasn't out of the woods yet. The proof would be if I was able to access them. I played some music files, opened a few videos, started up a VM..they all worked! YAHOO!! I didn't lose my stuff!! Even though I have pretty much everything backed up, I did not have my scripts backed up and I've spent 100s of hours on those. Thank God I got them back!!

So..hooray for me!

:)

Now onto playing DOOM, the reason why I got into that mess in the first place!!

Friday, April 12, 2013

I spent a good few weeks researching and reading up on my latest monster video editing box. Though it was expensive, I decided to go for the i7 3930K chip, a pricey $539 on sale from NewEgg. Listed below are the parts that I assembled for the box. The mainboard and the chipset were my primary concerns, but the other parts were chosen because they conformed to my build specs and they were on sale or discounted in one way or another during one of NewEgg's Build sales.

Because I don't build too often, it took about a day to assemble the main board and the components in the above picture. It took another 1/2 day to port my four drive RAID 1+0 set from my old video editing workstation to this one. Of course, I had to create a new RAID set on the new box. I will no longer be using the trusty 3Ware 9650SE RAID card, as the new mobo has both Intel and Marvell RAID chipsets on board. The Marvell only supports two drives, so I used the Intel to connect up my four 1.5TB Western Digital Green drives.

2) backup content using fsarchiver to a system drive with enough space to hold the 1.4TB of content

3) as my livelihood depends on my content, test the fsarchiver restore process by restoring the archive to a third drive, a 3TB Seagate partitioned with GPT and formatted using ext4

4) when the restore is validated as good, copy the files from the restore over to my new RAID 1+0 set.

I love the Corsair 400R case. With the nice cable tunnels, I was able to keep the mainboard area pretty free of cables, though it doesn't look like it from the below pic. You can see the pipes to the Corsair H100i in the center of the photo.

I ran most of the cables through the tunnels. You can see how many cables were hidden using these tunnels:

I'm not a great cabler and I just wanted most of the cables out of the way. Given that there are five 3.5" drives, one 2.5" SSD and a DVD player in the box, I think I did alright.

The real test was firing the box up for the first time. I was rather shocked when it did come up, as my main worry was that the memory wouldn't be compatible. But it was. Also, I had invested a great deal of time before the build in reading the 176-page manual and watching a bunch of build videos listed at the bottom of this post. So the base build with a single SSD powered up successfully. Hooray!

The next hurdle was installing a basic operating system, Fedora 18 64-bit, on the single SSD in the machine. Again, surprisingly, this worked without a hitch. I had spent a good deal of time reading the FedoraProject's list of UEFI bugs and suspected I'd hit problems, but was very happy that I didn't encounter any.

The final hurdle was migrating my four drives from my old workstation to run a RAID 1+0 configuration on the new box using the board's Intel RAID chipset. Once I moved the drives over, I configured the BIOS on the Asus P9X79 to run RAID, created the RAID set and rebooted the box. I then saw three main screens on bootup:

1) The Marvell RAID BIOS bootup screen:

2) The Intel RAID BIOS bootup screen:

3) The American Megatrends BIOS screen:

The American Megatrends BIOS and ASUS's UEFI BIOS screens are completely configurable and nicely laid out. I won't be using many of the options, but it is just a pleasure to have a system that is so well stocked yet boots up quickly. I'd say it takes about 30 seconds to get from cold start to my initial Fedora 18 grub2 prompt.

I haven't overclocked the mobo yet, but according to what I've been reading, and given that I have good chip cooling via the H100i, I should be able to push the i7 from 3.2GHz to at least 4.6GHz.

Wattage Used for Typical Tasks
Here is a chart of the tasks running in Fedora 18 x86-64 and the wattage used

Gnome System Monitor graphic
On the top row, the CPU History chart, you can see that my H264 video encode took about 40% of the CPU. You can see all twelve CPUs being used. On the bottom row, the Network History, you can see my upload to Vimeo. The upload of my 530MB file took about 150s at a peak upload rate of 40Mbps. That 40Mbps upload is courtesy of FIOS.

Next up: installing Windows 7 Professional so that I can do some baseline performance measurements.

ciao!
TAG

UPDATE 2013/11/23
I've been so busy at work for the last six months that I haven't had time to use the new box. Turns out that I made a mistake in copying over my files to my new drive..I hadn't preserved the time/date stamps of my files, normally accomplished with a "cp -Rp .."! This means I can no longer tell when I edited a script, took a picture, etc, etc. Bollocks! So I had to grab the timestamps from my original drive and apply them via "touch". The procedure looked something like this:
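The original listing is lost, so here is a hedged reconstruction of the idea: walk the old drive, and for each file emit a touch line into fixDate.sh that copies the old mtime onto the matching file on the new drive. /tmp/old and /tmp/new stand in for the real mount points.

```shell
# Reconstruction (not the original script): generate one touch(1) line per file,
# using the timestamps still intact on the old drive, then run the result.
mkdir -p /tmp/old /tmp/new
printf 'clip data\n' > /tmp/old/clip.txt
touch -d '2013-01-15 10:00:00' /tmp/old/clip.txt   # pretend this is the original mtime
cp /tmp/old/clip.txt /tmp/new/clip.txt             # plain cp: the old mtime is lost
( cd /tmp/old && find . -type f | while read -r f; do
    printf 'touch -r "/tmp/old/%s" "/tmp/new/%s"\n' "${f#./}" "${f#./}"
  done ) > /tmp/fixDate.sh
sh /tmp/fixDate.sh                                 # reapply the original mtimes
```

touch -r copies the timestamp from a reference file, which avoids any date-format wrangling.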

Then I plopped a #!/bin/bash at the top of the shell script, ran chmod a+x fixDate.sh, and off I went. The script worked, thankfully! Unfortunately, I wasted about two hours dealing with this today. Super drag!

A second problem I see is that when I use "mv" to move files to my new RAID set, the mv command gives me:

mv: setting attribute ‘security.selinux’ for ‘security.selinux’: Operation not permitted
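One workaround worth trying (an assumption on my part, not something verified on the RAID set): copy without extended attributes, then delete the source, instead of mv. cp only carries xattrs when asked (--preserve=xattr), so the SELinux label that mv trips over is simply never copied.

```shell
# Sketch of the cp-then-rm workaround; paths are placeholders.
mkdir -p /tmp/src /tmp/raid
printf 'take one\n' > /tmp/src/take.txt
cp --preserve=mode,timestamps /tmp/src/take.txt /tmp/raid/ && rm /tmp/src/take.txt
ls /tmp/raid
```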

Sunday, March 17, 2013

OK. I've been in three weeks of hardware hell, mainly because I wanted to get all my machines (a MacBook Pro, my main Linux video editing workstation and an older Windows Vista digital audio workstation (DAW)) properly backed up. I detailed my strategy for this in my last post. This post is more of a rant than anything else, so please excuse the lack of any real mentorship on problem solving, except maybe "Google is Your Friend."

Issue #1: Drobo runs out of space

The Drobo has been a fine unit for me. But as time goes on, you acquire more media and your available space runs out. You'd think it would be a simple matter of buying a new disk, putting it in the Drobo and letting the BeyondRAID rebuild its array. Well, the first drive I bought, a Western Digital Green 1TB, died after the first rebuild. A drive had never failed out-of-the-box on me before, so I didn't truly believe it was dead.

With my non-belief firmly in place, I tried to use the drive in different capacities. So as a test, I formatted the disk using my Thermaltake BlacX connected to my Mac. I was able to copy files over to it (though I didn't copy gigs and gigs worth as a true test). But when I put the unit back in the Drobo, the Drobo gave an immediate "red" light for that drive bay, indicating the drive was bad. I switched drives in the Drobo unit around, because I thought it could have been a faulty drive bay.

Then, I had the bright idea to move the data off the 2TB system drive of my main Linux machine to the new Western Digital, put the 2TB in the Drobo and use the new 1TB (which I still thought was a good, error-free drive) as my Linux system drive. Since the system drive was a logical volume, this required some fancy footwork: a week of work to figure out how to shrink a logical volume so that the used space on the 2TB drive (which was less than a terabyte) would fit on the 1TB.
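The general shrink order I eventually pieced together is below, echoed to a file rather than executed, since resizing is destructive if the sizes are wrong. Volume names and sizes here are placeholders, not my actual ones.

```shell
# Shrink an ext4 filesystem living on an LVM logical volume.
# Key rule: shrink the filesystem FIRST, then the LV, and keep
# the LV larger than the filesystem.
{
  echo "umount /dev/vg0/lv_root"
  echo "e2fsck -f /dev/vg0/lv_root"        # filesystem must be clean first
  echo "resize2fs /dev/vg0/lv_root 900G"   # shrink the filesystem first...
  echo "lvreduce -L 931G /dev/vg0/lv_root" # ...then the LV, with headroom
} > /tmp/lvm_shrink_steps.txt
cat /tmp/lvm_shrink_steps.txt
```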

I learned a lot from that experience, to be detailed in a later post. Suffice it to say that in the end, the 1TB was truly dead and I ended up getting a new 1TB (a Western Digital Black) from BestBuy and that solved my Drobo storage issue. Kudos to BestBuy, as they were able to give me the Black at the same price as the Green for my trouble.

Issue #2: Mac Time Machine "the identity of this backup disk has changed" (Sparsebundle Problem)

This was an odd one. After installing the new disk in the Drobo, Time Machine started showing the error "the identity of this backup disk has changed". From the below post:

I executed the "chflags" command listed. This ran for about four hours. Afterward, I tried to execute the "hdiutil" command listed, but the Mac said it had already been run. So, to test the result of the chflags command, I shut down and restarted the Drobo. When Time Machine started backing up, it no longer gave me the error. Hooray. Another one down.

Issue #3: Windows Vista DAW crashes

So after a week spent on #1 and #2, I was ready to start work on a new musical project with some friends. Firing up my old Dell 400SC running Windows Vista (OK, OK..I know I need to upgrade to Win7, but I've got a recording session coming up soon and didn't want to change OS's yet), I was presented with this error:

c:\windows\system32\config\system corrupt

Oh, wonderful. So I popped in the Vista Ultimate DVD and selected "Repair". After it ran, the system rebooted and I was pleasantly surprised to find that this fixed the problem and that I was able to get back into the system.

Getting back into the system, I reasoned that if the drive was going bad, I'd better make a backup. So I ponied up $40 for Drobo's PC Backup product, the ugly step-brother of the seamless Drobo integration with Mac Time Machine. Assuming the PC product worked the same way the Mac product did, I selected the defaults. Well, the defaults do NOT back up the entire drive, only your user data. My bad for not reading the fine print, but I believe a Drobo product should be consistent between systems and the default should be to back up your entire drive, system data included, as long as you have the space on your Drobo. But that's just me.

The missing data would be crucial for what happened next.

Issue #4: Windows Vista DAW crashes again

After taking a two day hiatus from my backup shenanigans, I fired up the DAW again. And guess what..a new error appears:

Oh great. Going back to my ritual, I loaded the Vista Ultimate DVD and selected "Repair". However, after the reboot, no go..still the same "missing or corrupt" error. I tried the repair a number of times, as the Vista repair process would show slightly different screens every time it booted and recognized the system. This gave me false hope that the DVD was actually repairing something. Also, the frustrating part of this process was that, for whatever reason, the DVD would take 10 minutes to load on my Dell. I'm not sure what the problem was there. So I chewed up a few hours doing this multiple times.

Finally, after reading some Google posts by people with the same issue, I decided to run "chkdsk /r" from the command line, rather than relying on the non-informative Windows Vista screen to run some unknown fix command. I had to specifically boot into the System Recovery Options screen as shown in the below post:

Once I was there, I selected "Command Prompt" and typed in good ol' "chkdsk /r", the "repair" option to chkdsk. This time, I was rewarded with an actual status screen: "bad clusters found". Windows was marking the clusters as bad and moving the files located on those clusters to good sectors on the disk. (Sectors and clusters primer here: http://t.co/DLFjrXAp5C). This process took about three hours, unlike the half-hearted effort that Windows Vista had attempted. I wonder why Vista did not default to doing a real "chkdsk /r". That doesn't help anyone who has a failing disk. Bad default!

After the bad cluster identification and repair, I was really glad to see Vista boot up properly! But since there were so many bad clusters, I had to make a full backup or clone of that drive but quick! For this, I popped in an unused 500GB SATA I had lying around, then repartitioned and formatted it. It had been a second Vista system disk at one point, so I knew the drive's main partition was marked as bootable. So I was good to go there. I then dragged all the files from my C: onto the new E: (my DVD being the D:). However, on bootup, Vista showed an error:

"System volume on disk is corrupt"

I suspected this was a problem with the NTFS boot files on the 500GB drive, as they still referenced the partition map of the old, failing 256GB drive. Luckily, when I ran Vista repair, Vista was able to fix this issue and the system started properly.

Issue #5: Windows Vista continually keeps "preparing your desktop"

After the system came up, I made sure all my applications (Reaper, Drobo PC Backup, etc) were working properly. Unfortunately, they were not, as Vista continually kept giving me the message "Preparing Your Desktop" when I logged into my profile. I tried a number of suggestions from Google, but they did not work. I didn't have any critical data in the old profile, so I figured I'd bite the bullet and create a new profile. After doing this, the message disappeared and I was able to save my desktop settings and application preferences properly.

In Sum

Wow. So this has been three weeks of hell. I "think" I am back to steady state with my systems. I was able to reset Drobo PC Backup to a full system backup of my Vista DAW to the Drobo. The Drobo is backing up the Mac just fine and CrashPlan is encrypting my main Linux box backup to the Cloud.

Saturday, March 09, 2013

I've spent a number of years trying various backup methods for my Linux box. I think I finally have a pretty good one down. The main idea is to setup my system in order to make backup and restore easier. This setup involves two components:
- a data source: my documents, videos, audio files and pictures are stored locally on a redundant hardware RAID5 set
- the separation of system and data partitions on different physical devices (hard drives)

This backup strategy has been working well, though not without hiccups along the way. It protects my important data by providing redundancy at multiple levels. At a high level, here is how this is implemented:
- disk redundancy via RAID5 set
- local redundancy via network addressable storage
- global redundancy via CrashPlan if my house is destroyed

More specifically:
- my Linux system drive, Fedora 17, is one physical SATA drive
- my data drive is a hardware RAID5 unit using a 3Ware 9650SE with four physical SATA drives
- when I install a new OS, I use symbolic links in my user's home directory to point to my data, explained below

The bottom line is that no single backup method should be your entire backup strategy. If you only have two of these methods implemented, you're better off than most people.

System Setup
Symbolic links are the key to separating the system partition from your data partitions. This separation matters because it makes it much easier (from a Linux perspective) to upgrade or try different versions of Linux on your system drive, while your data drive stays essentially untouched and less prone to upgrade or experimentation tragedies. Welcome to Linux!

With the symbolic links in place, my home directory transparently points at my data drive.
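A minimal sketch of the layout; the real mount point and user name differ, so treat the paths as placeholders.

```shell
# Home-directory entries are symlinks into the data drive's mount point.
mkdir -p /tmp/data/Videos /tmp/home/sodo
ln -sfn /tmp/data/Videos /tmp/home/sodo/Videos   # home entry points at the data drive
readlink /tmp/home/sodo/Videos
```

After an OS reinstall, recreating a handful of these links reconnects the fresh system to all the data.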

The Archive-Backup Process
I am going to use the terms "archive" and "backup" synonymously. The overview is that I'll show how I back up my data partition using fsarchiver and then copy those backups to both my local network and cloud storage solutions. fsarchiver is a bit different from regular backup systems in that it archives entire filesystems only; it is not a file-based backup method. So when you save or restore, you specify a whole filesystem, with only a limited ability to exclude (but not include) directories or files via the "exclude" switch (as of 5/2016).

Below, I show the archive process for the data partition. Feel free to extrapolate the information herein to do the same for your system partition.

2) verify available space on the destination filesystem
The source partition is using 1.4TB on vg_ogre-lv_root and I have 1.8TB available on the destination for the backup, lv_home. So..good to go.

3) if there is enough space on the target filesystem, prepare to run fsarchiver and unmount the data partition.
In order to keep the filesystem from being updated during the archive process, fsarchiver asks you to unmount the filesystem being archived before making the backup. Like so:

[sodo@computer ~]$ sudo umount /mnt/

The nice thing about the split system-data setup is that it is unnecessary to load a Live CD in order to back up the data partition. Normally, one has to boot with a LiveCD in order to back up the system partition.

4) Once the filesystem is unmounted, run fsarchiver. As one of the destinations for the backup is the cloud, use the -c option to encrypt with a password:

[sodo@computer ~]$ sudo fsarchiver -j8 -c [password] -o savefs ~/f17backup/backup_lv_root.fsa /dev/mapper/vg_ogre-lv_root
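For completeness, here is a hedged sketch of the matching restore side (step 3 of the build-post checklist): id=0 addresses the first filesystem in the archive, and /dev/sdX1 is a placeholder for the scratch restore target, not a real device. The block is guarded, and the restore itself is only echoed, so nothing destructive runs.

```shell
# Inspect, then (for real: drop the echo) restore the archive to a test device.
ARCHIVE="$HOME/f17backup/backup_lv_root.fsa"
if command -v fsarchiver >/dev/null 2>&1 && [ -f "$ARCHIVE" ]; then
    fsarchiver archinfo "$ARCHIVE"                           # list filesystems in the archive
    echo "fsarchiver restfs $ARCHIVE id=0,dest=/dev/sdX1"    # restore first fs to the scratch disk
else
    echo "archive or fsarchiver not present" > /tmp/restore_note.txt
fi
```

Restoring to a third drive and spot-checking files is what validates the backup before trusting it.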

Over the past few years, I've expanded the drives within the Drobo about three times now: from four 500GB drives, to four 1TB drives plus one 500GB, to my configuration as it is today, five 1TB drives. The drive upgrades were easy, though time consuming. I removed each of the older drives one at a time and replaced them with the larger drives. Each time a drive was upgraded, the Drobo would automatically and non-destructively rebuild its storage protection. The Drobo's storage protection is called BeyondRAID, Drobo's own custom algorithm on top of RAID.

The integration with the Mac is seamless; however, the Windows/CIFS file share can be a bit wonky, as the share has a tendency to become unavailable for whatever reason. The resolution is to shut down and restart the Drobo, and that seems to fix the problem.

Cloud-Based Storage
A last layer of protection above and beyond the local and network copy of my data is to copy the encrypted archive to a cloud-based solution. The purpose is to protect my data in case of a natural disaster that destroys all my local storage media. With the increasing amount of natural and man-made disasters happening these days, I've recently invested in a data protection plan with www.crashplan.com. I got an unlimited plan to store my 1.2TB of data to the cloud. Most of that data is audio, video and image files.

After tweaking the CrashPlan app to pump more data through my local and wide area network (from 1280KB/s and 2560KB/s, respectively, to about 6400KB/s for both), it took about two weeks to upload this amount of data to CrashPlan's cloud!

In sum, if you have a lot of data, all these procedures take time: backing up, making the local network copy and then the cloud copy. If you're using Linux, you probably have the stomach for all this. In the end, though, you'll have a backup solution that is pretty solid and relatively easy to implement, unlike custom scripted solutions.