The best laid plans of mice and men…

Entries Tagged as 'Ubuntu'

I’ve been experimenting with Linux as a server for several months now; and I have to say for the price it’s a clear winner over Microsoft Windows Server 2008.

Other than desktop search, Linux has been a clear winner across the board. Network file sharing, application services, etc. all seem to work, and work well. Plus, with the Webmin GUI for managing the server, it’s extremely easy; easier, in fact, than figuring out where to go to do the task at hand in Windows Server 2008.

With my success using Linux as a server, I have decided (once again) to investigate Linux as a desktop replacement for Windows… after all, how much does one normally do with a desktop?

I experimented briefly with Ubuntu on a laptop when I was cloning the drive in it, but I didn’t put it through exhaustive paces (I was quite impressed that Ubuntu auto-magically installed drivers for all the hardware in the notebook; though that feat was no better than Windows 7).

I need to go over my requirements a few more times before I start the test, but what I believe is important is:

Office (OpenOffice will work the same as it has on Windows)

Financial Management (I guess I’ll have to move over to MoneyDance; it’s not free, but it’s fairly well thought out)

Media Playback (VLC runs on Linux just like Windows, plus there are a number of media players I’ll take a look at)

DVD ripping (my last attempt to do that on Linux wasn’t very successful)

Video transcoding (I think HandBrake is broken on the current version of Ubuntu — so that might take a little work)

I’ll also evaluate it for ease of use and customization…

The evaluation will be done on an Intel DG45ID motherboard (G45 chipset) with an Intel Core 2 E7200, 4GB of DDR2, multiple SATA2 hard drives, and a SATA DVD-RW. I’ll test with both an NVIDIA 9500 and the integrated Intel GMA X4500HD graphics, running both the 32-bit and 64-bit builds of Ubuntu 10.04 LTS.

I’ve been playing with Ubuntu here of late, and looking at the characteristics of RAID arrays.

What got me started on this was formatting an ext4 file system on a four-drive RAID5 array created using an LSI 150-4 [hardware RAID] controller: it took longer than I thought it should. Most readers probably won’t be interested in whether or not to use the LSI 150 controller they have in their spare parts bin to create a RAID array on Linux, but the numbers below are interesting for deciding what type of array to create.

These numbers are obtained from the disk benchmark in Disk Utility; this is only a read test (write performance is going to be quite a bit different, but unfortunately the write test in Disk Utility is destructive, and I’m not willing to lose my file system contents at this moment; but I am looking for other good benchmarking tools).

Controller  Array type      Drives  Avg access time  Min read rate  Max read rate  Avg read rate
ICH8        Single          1       17.4 ms          14.2 MB/s      23.4 MB/s      20.7 MB/s
ICH8        RAID1 (mirror)  2       16.2 ms          20.8 MB/s      42.9 MB/s      33.4 MB/s
ICH8        RAID5           4       18.3 ms          17.9 MB/s      221.2 MB/s     119.1 MB/s
SiL3132     RAID5           4       18.4 ms          17.8 MB/s      223.6 MB/s     118.8 MB/s
LSI150-4    RAID5           4       25.2 ms          12.5 MB/s      36.6 MB/s      23.3 MB/s

All the drives used are similar class drives: Seagate Momentus 120GB 5400.6 (ST9120315AS) for the single drive and RAID1 (mirror) tests, and Seagate Momentus 500GB 5400.6 (ST9500325AS) for all the RAID5 tests. Additionally, all drives report that they are performing well within acceptable operating parameters.
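For anyone who wants a second, non-destructive data point, a plain sequential read with dd is one option. The sketch below is only an assumption about how such a check might look (the function name seq_read_test is made up, and it is not what Disk Utility does internally); it only reads from the target, so the file system contents are never touched.

```shell
# seq_read_test: hypothetical helper, not part of any package.
# Reads the first N megabytes of a file or block device and discards them,
# then prints dd's own summary line (bytes copied, elapsed time, and rate).
seq_read_test() {
    target=$1
    megabytes=${2:-256}   # default to 256 MB if no size is given
    # dd writes its statistics to stderr, so fold stderr into stdout
    dd if="$target" of=/dev/null bs=1048576 count="$megabytes" 2>&1 | tail -n 1
}

# Example (reading a raw array device normally requires root):
#   seq_read_test /dev/md1 1024
```

Keep in mind that a second run over the same data may be served from the page cache, so the numbers are only meaningful on a cold read or with a read size well beyond the amount of RAM.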

A while ago I published a post on Desktop Search on Linux (specifically Ubuntu). I was far from happy with my conclusions and I felt I needed to re-evaluate all the options to see which would really perform the most accurate search against my information.

My test method was to take a handful of search terms which I knew existed in various types of documents, and check the results (I actually used Microsoft Windows Search 4.0 to prepare a complete list of documents that matched each query, since I knew it worked as expected).

I was able to install, configure, and launch each of the applications. Actually none of them were really that difficult to install and configure; but all of them required searching through documentation and third party sites — I’d say poor documentation is just something you have to get used to.

Beagle, Google, Tracker, Pinot, and Recoll all failed to find all the documents of interest… none of them properly indexed the email files, and most of them failed to handle plain text files; that didn’t leave a very high bar to pick a winner.

Queries on Strigi actually provided every hit that the same query provided on Windows Search… though I have to say Windows Search was easier to set up and use.

I tried the Nepomuk (KDE) interface for Strigi, though it just didn’t seem to work as well as strigiclient did… and certainly strigiclient was pretty much at the top of the list of butt-ugly, user-hostile, unintuitive applications I’d ever seen.

After all of the time I’ve spent on desktop search for Linux I’ve decided all of the search solutions are jokes. None of them are well thought out, none of them are well executed, and most of them outright don’t work.

Like most Linux projects, more energy needs to be focused on working out a framework for search than everyone going off half-cocked and creating a new search paradigm.

The right model is…

A single multi-threaded indexer running in the background indexing files according to a system wide policy aggregated with user policies (settable by each user on directories they own) along with the access privileges.

A search API that takes the user/group and query to provide results for items that the user has (read) access to.

The indexer should be designed to use plug-in modules to handle particular file types (mapped both by file extension, and by file content).

The indexer should also be designed to use plug-in modules for walking a file system and receiving file system change events (that allows the framework to adapt as the Linux kernel changes, and would support remote indexing as well).

Additionally, the index/search should be designed with distributed queries in mind (often you want to search many servers, desktops, and web locations simultaneously).

Then it becomes a simple matter for developers to write new/better indexer plug-ins; and better search interfaces.
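To make the model a little more concrete, here is a minimal sketch of two of the pieces: picking an indexer plug-in for a file, and filtering results down to what the calling user can read. Everything in it is an assumption for illustration (the function names and plug-in names are invented, and no existing search tool is claimed to work this way).

```shell
# pick_plugin: hypothetical mapping from file name to an indexer plug-in name.
# A real framework would also inspect file content; here anything without a
# recognized extension is handed to a made-up "content-sniff" fallback.
pick_plugin() {
    case "$1" in
        *.txt|*.log)  echo "text" ;;
        *.eml|*.mbox) echo "email" ;;
        *.odt|*.ods)  echo "opendocument" ;;
        *.pdf)        echo "pdf" ;;
        *)            echo "content-sniff" ;;
    esac
}

# filter_readable: hypothetical stand-in for the access check in the search API.
# Reads candidate paths on stdin (one per line) and prints only those the
# current user has read access to.
filter_readable() {
    while IFS= read -r path; do
        if [ -r "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
}

# Example:
#   pick_plugin "/home/me/notes.txt"
#   printf '%s\n' /etc/hostname /no/such/file | filter_readable
```

The point of the split is that the plug-in table and the access filter can change independently of whatever front end issues the query.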

I’ve pointed out in a number of recent posts that you can effectively use Linux as a server platform in your business; however, it seems that if search is a requirement you might want to consider ponying up the money for Microsoft Windows Server 2008 and enjoy seamless search (that works) between your Windows Vista / Windows 7 Desktops and Windows Server.

Macbuntu isn’t a sanctioned distribution of Ubuntu like Kubuntu, Xubuntu, etc; rather it’s a set of scripts that turns an Ubuntu desktop into something that resembles a Mac running OS-X… but it’s still very much Ubuntu running gdm (GNOME).

I don’t recommend installing Macbuntu on a production machine, or even a real machine, until you’ve taken it for a spin around the block. For the most part it’s eye candy; but that said, it does make a Mac user feel a little more comfortable at an Ubuntu workstation, and there’s certainly nothing wrong with the desktop paradigm (remember, the way GNOME, KDE, XFCE, Enlightenment, Windows, OS-X, etc. work is largely arbitrary; each is just a development effort intended to make routine user operations intuitive and simple, but no two people are the same, and not everyone finds the “solution” to a particular use case optimal).

What I recommend you do is create a virtual machine with your favorite virtualization software; if you don’t have virtualization software, consider VirtualBox. It’s still free (until Larry Ellison decides to pull the plug on it), and it’s very straightforward for even novices to use.

Install Ubuntu 10.10 Desktop (32-bit is fine for the test) in it, and just take all the defaults — it’s easy, and no reason to fine tune a virtual machine that’s really just a proof-of-concept.

After that, install the virtual guest additions and do a complete update…

Once you’re done with all that, just open a command prompt and type each of the following (without elevated privileges).

Once you’ve followed the on-screen instructions and answered everything to install all the themes, icons, wallpapers, widgets, and tools (you’ll have to modify Firefox and Thunderbird a little more manually — browser windows are opened for you, but you have to install the plug-ins yourself), you reboot and you’re presented with what looks very much like OS-X (you actually get to see some of the eye candy as it’s installed).

Log in… and you see even more Mac-isms… play play play and you begin to get a feel of how Apple created the slick, unified OS-X experience on top of BSD.

Now if you’re a purist you’re going to push your lower lip out and say this isn’t anything like OS-X… well, maybe it doesn’t carry Steve Jobs’s DNA fingerprint, but for many users I think you’ll hear them exclaim that this is a significant step forward for making Linux more Mac-ish.

There are a couple of different efforts to create a Mac-like experience under Linux; Macbuntu is centered on making Ubuntu more like OS-X, and as far as I can see it’s probably one of the cleanest and simplest ways to play with an OS-X theme on top of Linux…

If you find you like it, then go ahead and install on a real machine (the eye candy will be much more pleasing with a manly video card and gpu accelerated effects), and you can uninstall it if you like — but with something this invasive I’d strongly encourage you to follow my advice and try before you buy (so to speak — it’s free, but time and effort count for a great deal).

I’ll make a post on installing Macbuntu for tomorrow so that it’s a better reference.

There are two distinct features where Windows Server 2008 outshines Linux, and both center on compression.

For a very long time Microsoft has supported transparent compression as a part of NTFS; you can designate at the file or directory level which parts of the file system are compressed by the operating system (applications need do nothing to use compressed files). This feature was probably originally intended to reduce the disk footprint of seldom-used files; however, with the explosive growth in computing power, compressed files can often be read and decompressed much faster from a disk than an uncompressed file can be read. Of course, if you’re modifying, say, a byte or two in the middle of a compressed file over and over, it might not be a good idea to mark it as compressed; but if you’re basically reading the file sequentially then compression may dramatically increase the overall performance of the system.

The reason for this increase is easy to understand: many files can be compressed ten to one (or better), which means each disk read effectively delivers ten times the information, and decompressing that stream of data puts no appreciable burden on a modern, multi-core, single-instruction/multiple-data capable processor.
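The arithmetic behind that claim can be sketched in a few lines. The function name and the numbers are illustrative assumptions only (they are not measurements); the model simply multiplies the raw disk rate by the compression ratio and caps the result at whatever rate the processor can decompress.

```shell
# effective_read_mbs: hypothetical back-of-the-envelope model.
#   $1 = raw disk read rate in MB/s
#   $2 = compression ratio (10 means ten to one)
#   $3 = rate at which the CPU can produce decompressed data, in MB/s
# Prints the effective rate of uncompressed data delivered to the application.
effective_read_mbs() {
    disk_mbs=$1
    ratio=$2
    cpu_mbs=$3
    raw=$(( disk_mbs * ratio ))
    if [ "$raw" -gt "$cpu_mbs" ]; then
        echo "$cpu_mbs"     # the processor becomes the bottleneck
    else
        echo "$raw"         # the disk is still the bottleneck
    fi
}

effective_read_mbs 20 10 1000    # slow disk, fast CPU: prints 200
effective_read_mbs 100 10 500    # faster disk, slower CPU: prints 500
```

In the first case the disk is still the limit even after a tenfold gain, which is the situation the paragraph above describes.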

Recently, with SMBv2, Microsoft has expanded the file sharing protocol to be able to transport a compressed data stream, or even a differential data stream (Remote Differential Compression, or RDC), rather than necessarily having to send every byte of the file. This often greatly enhances the effective data rate, since once again a modern, multi-core, single-instruction/multiple-data capable processor can compress (and decompress) a data stream at a much higher rate than most any network fabric can transmit the data (the exception would be 10G). In cases of highly constrained networks, or networks with extremely high error rates, the increase in effective throughput could be staggering.

Unfortunately, Linux lags behind in both areas.

Ext4 does not include transparent compression; and currently no implementation of SMBv2 is available for Linux servers (or clients).

While there’s no question whatsoever that the initial cost of a high performance server is less if Linux is chosen as the operating system, the “hidden” costs of lacking compression may make the total cost of ownership harder to determine.

Supporting transparent compression in a file system is merely a design criterion for a new file system (say Ext5 or Ext4.1); however, supporting SMBv2 will be much more difficult since (unlike SMBv1) it is a closed/proprietary file sharing protocol.

Microsoft has really shown the power of desktop search in Vista and Windows 7; their newest Desktop Search Engine works, and works well… so in my quest to migrate over to Linux I wanted to have the ability to have both a server style as well as a desktop style search.

So the quest began… and it was as short a quest as marching on the top of a butte.

I started by reviewing what I could find on the major contenders (just do an Internet search, and you’ll only find about half a dozen reasonable articles comparing the various desktop search solutions for Linux)… which were few enough it didn’t take very long (alphabetical):

I immediately passed on Google Desktop Search; I have no desire for Google to have more access to information about me; and I’ve tried it before in virtual machines and didn’t think very much of it.

Beagle

I first tried Beagle; it sounded like the most promising of all the search engines, and Novell was one of the developers behind it so I figured it would be a stable baseline.

It was easy to install and configure (the package manager did most of the work), and I could use either the search application or the web search, though I had to enable the web interface using beagle-config:

beagle-config Networking WebInterface true

And then I could just go to port 4000 (either locally or remotely).

I immediately did a test search; nothing came back. Wow, how disappointing — several hundred documents in my home folder should have matched. I waited and tried again — still nothing.

While I liked what I saw, a search engine that couldn’t return reasonable results to a simple query (at all) was just not going to work for me… and since Beagle isn’t actively developed any longer, I’m not going to hold out for them to fix a “minor” issue like this.

Tracker

My next choice to experiment with was Tracker; you couldn’t ask for an easier desktop search to experiment with on Ubuntu — it seems to be the “default”.

One thing that’s important to mention — you’ll have to enable the indexer (per-user), it’s disabled by default. Just use the configuration tool (you might need to install an additional package):

tracker-preferences

Same test, but instantly I got about a dozen documents returned, and additional documents started to appear every few seconds. I could live with this; after all I figured it would take a little while to totally index my home directory (I had rsync’d a copy of all my documents, emails, pictures, etc from my Windows 2008 server to test with, so there was a great deal of information for the indexer to handle).

The big problem with Tracker was there was no web interface that I could find (yes, I’m sure I could write my own web interface; but then again, I could just write my own search engine).

Strigi

On to Strigi: straightforward to install, and easy to use… but it didn’t seem to give me the results I’d gotten quickly with Tracker (though better than Beagle), and it seemed to be limited to only ten results (WTF?).

I honestly didn’t even look for a web interface for Strigi; it was too much of a disappointment (in fact, I think I’d rather have put more time into Beagle to figure out why I wasn’t getting search results than work with Strigi).

Recoll

My last test was with Recoll; it looked promising from all that I read, but everyone seemed to indicate it was difficult to install and that you needed to build it from source.

Well, there’s an Ubuntu package for Recoll — so it’s just as easy to install; it just was a waste of effort to install.

I launched the recoll application, and typed a query in — no results came back, but numerous errors were printed in my terminal window. I checked the preferences, and made a couple minor changes — ran the search query again — got a segmentation fault, and called it a done deal.

It looked to me from the size of the database files that Recoll had indexed quite a bit of my folder; why it wouldn’t give me any search results (and seg faulted) was beyond me — but it certainly was something I’d seen before with Linux based desktop search.

Of the search engines I tried, only Tracker worked reasonably well, and it has no web interface, nor does it participate in a Windows search query (SMB2 feature which directs the server to perform the search when querying against a remote file share).

I’ve been vocal in my past that Linux fails as a Desktop because of the lack of a cohesive experience; but it appears that Desktop Search (or search in general) is a failing of Linux as both a Desktop and a Server — and clearly a reason why choosing Windows Server 2008 is the only reasonable choice for businesses.

The only upside to this evaluation was that it took less time to do than to read about or write up!

There are a number of reasons why you might want to use a dynamic black list of IP addresses to prevent your computer from connecting to, or being connected to by, users on the Internet who might not have your best interests at heart…

Below are three different dynamic IP filtering solutions for various operating systems; each of them is open source, has an easy to use GUI, and uses the same filter list formats (and will download those lists from a URL or load them from a file).

You can read a great deal more about each program and the concepts of IP blocking on the web pages associated with each.

A disk mirror, or RAID1 is a fault tolerant disk configuration where every block of one drive is mirrored on a second drive; this provides the ability to lose one drive (or have damaged sectors on one drive) and still retain data integrity.

RAID1 will have lower write performance than a single drive, but will likely have slightly better read performance. Other types of RAID configurations will have different characteristics; but RAID1 is simple to configure and maintain (and conceptually it’s easy for most anyone to understand the mechanics), and it is the topic of this article.

Remember, all these commands will need to be executed with elevated privileges (as super-user), so they’ll have to be prefixed with ‘sudo’.

First step, select two disks, preferably identical (or at least as close to the same size as possible), that don’t have any data on them (or at least don’t have any important data on them). You can use Disk Utility (GUI), gparted (GUI), cfdisk (CLI), or fdisk (CLI) to confirm that each disk has no data and to change (or create) the partition type to “Linux raid autodetect” (type “fd”). Also note the devices that correspond to each drive; they will be needed when building the array.

Check to make sure that mdadm is installed; if not you can use the GUI package manager to download and install it; or simply type:

apt-get install mdadm

For this example, we’re going to say the drives were /dev/sde and /dev/sdf.

Create the mirror by executing:

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sde1 missing

mdadm --manage --add /dev/md0 /dev/sdf1

Now you have a mirrored drive, /dev/md0.

At this point you could set up an LVM volume, but we’re going to keep it simple (and for most users, there’s no real advantage to using LVM).

Now you can use Disk Utility to create a partition (I’d recommend a GPT style partition) and format a file system (I’d recommend ext4).

You will want to decide on the mount point.

You will probably have to add an entry to /etc/fstab and /etc/mdadm/mdadm.conf if you want the volume mounted automatically at boot (I’d recommend using the UUID rather than the device names).
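As a rough illustration of the UUID recommendation, the sketch below prints an /etc/fstab line in the usual six-field layout. The helper name, the UUID, and the mount point are all made up for the example; on a real system the UUID would come from a tool such as blkid, and the array definition for /etc/mdadm/mdadm.conf can be printed with mdadm --detail --scan.

```shell
# fstab_line: hypothetical helper that prints one /etc/fstab entry.
#   $1 = file system UUID, $2 = mount point, $3 = file system type (default ext4)
# Fields: device, mount point, type, options, dump flag, fsck pass order.
fstab_line() {
    uuid=$1
    mount_point=$2
    fs_type=${3:-ext4}
    printf 'UUID=%s %s %s defaults 0 2\n' "$uuid" "$mount_point" "$fs_type"
}

# Example with a made-up UUID and mount point:
fstab_line 0a1b2c3d-1111-2222-3333-444455556666 /mnt/mirror

# On a real system (as root), the inputs would come from something like:
#   blkid /dev/md0              # shows the UUID of the file system on the array
#   mdadm --detail --scan       # prints ARRAY lines suitable for mdadm.conf
```

Review the printed line before appending it to /etc/fstab; a bad entry there can keep the system from booting cleanly.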

You should probably make sure that you have SMART monitoring installed on your system so that you can monitor the status (and predictive failure) of drives. To get information on the mirror you can use the Disk Utility (GUI) or just type:

cat /proc/mdstat
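If you would rather have a script watch that output than read it by eye, something like the following could work. It is a sketch that assumes the usual /proc/mdstat layout, where each array’s status line ends in a map such as [UU] and a missing or failed member shows up as an underscore; the function name md_degraded is invented for the example.

```shell
# md_degraded: hypothetical helper. Reads /proc/mdstat-style text on stdin and
# prints the name of every array whose member map contains "_" (a missing or
# failed member).
md_degraded() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                       # remember the current array
        $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name }   # map like [U_] means degraded
    '
}

# On a live system:
#   md_degraded < /proc/mdstat
```

A mirror created with one member given as "missing" would be reported by this check until the second drive has been added.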

There are many resources on setting up mirrors on Linux; for starters you can simply look at the man pages on the mdadm command.

NOTE: This procedure was developed and tested using Ubuntu 10.04 LTS x64 Desktop.

A RAID5 array is a fault tolerant disk configuration which uses a distributed parity block; this provides the ability to lose one drive (or have damaged sectors on one drive) and still retain data integrity.

RAID5 will likely have slightly lower write performance than a single drive, but will likely have significantly better read performance. Other types of RAID configurations will have different characteristics. RAID5 requires a minimum of three drives, and may have as many drives as desired; however, at some point RAID6 with multiple parity blocks should be considered because of the potential for an additional drive failure during a rebuild.
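Because one drive’s worth of space goes to parity, the usable capacity of a RAID5 array is the drive size multiplied by one less than the number of drives. A small sketch of that arithmetic (the function name is made up, and it assumes all drives are the same size):

```shell
# raid5_usable_gb: hypothetical helper.
#   $1 = number of drives, $2 = size of each drive in GB
# Prints the usable capacity in GB, or fails if there are fewer than three drives.
raid5_usable_gb() {
    drives=$1
    size_gb=$2
    if [ "$drives" -lt 3 ]; then
        echo "RAID5 needs at least three drives" >&2
        return 1
    fi
    echo $(( (drives - 1) * size_gb ))
}

raid5_usable_gb 4 500    # four 500GB drives: prints 1500
```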

The following instructions will illustrate the creation of a RAID5 array with four SATA drives.

Remember, all these commands will need to be executed with elevated privileges (as super-user), so they’ll have to be prefixed with ‘sudo’.

First step, select four disks, preferably identical (or at least as close to the same size as possible), that don’t have any data on them (or at least don’t have any important data on them). You can use Disk Utility (GUI), gparted (GUI), cfdisk (CLI), or fdisk (CLI) to confirm that each disk has no data and to change (or create) the partition type to “Linux raid autodetect” (type “fd”). Also note the devices that correspond to each drive; they will be needed when building the array.

Check to make sure that mdadm is installed; if not you can use the GUI package manager to download and install it; or simply type:

apt-get install mdadm

For this example, we’re going to say the drives were /dev/sde, /dev/sdf, /dev/sdg, and /dev/sdh.

Create the RAID5 by executing:

mdadm --create /dev/md1 --level=5 --raid-devices=4 /dev/sd{e,f,g,h}1

Now you have a RAID5 fault tolerant drive sub-system, /dev/md1 (the defaults for chunk size, etc are reasonable for general use).

At this point you could set up an LVM volume, but we’re going to keep it simple (and for most users, there’s no real advantage to using LVM).

Now you can use Disk Utility to create a partition (I’d recommend a GPT style partition) and format a file system (I’d recommend ext4).

You will want to decide on the mount point.

You will probably have to add an entry to /etc/fstab and /etc/mdadm/mdadm.conf if you want the volume mounted automatically at boot (I’d recommend using the UUID rather than the device names).

You should probably make sure that you have SMART monitoring installed on your system so that you can monitor the status (and predictive failure) of drives. To get information on the RAID5 container you can use the Disk Utility (GUI) or just type:

cat /proc/mdstat

There are many resources on setting up RAID5 sub-systems on Linux; for starters you can simply look at the man pages on the mdadm command.

NOTE: This procedure was developed and tested using Ubuntu 10.04 LTS x64 Desktop.