Tuesday, October 26, 2010

One of our (presumably) faithful followers wrote in asking how to track a user's login and logout times. Here at CLKF, we don't have a problem with keeping an eye on employees, since we don't have any (yet). But once we get them, we will ensure they begin work promptly at 5am and work until at least 9pm. How, you ask? Lashings. The same goes for poor performance. If morale drops, the beatings will continue until morale improves. Simple. And that is the way that Hal and Ed brought me onboard.

For those of you who don't live in a utopia similar to ours, you may need to track login and logout times. We can do this a few different ways depending on your environment. Windows keeps track of logon and logoff events in the event log, but how they are tracked depends on the version of Windows. The divide is between the newer versions, beginning with Vista, and the older versions.

Let's first look at Windows Vista, 7, and 2008. While only Windows 7 and 2008R2 come with PowerShell installed by default, these newer versions have a new logging system. With the new system, the logon event ID is 4624 and the (user-initiated) logoff event ID is 4647. With this new logging system (and PowerShell v2), we can use the Get-WinEvent cmdlet. However, all versions of Windows and PowerShell support Get-EventLog.

The Get-WinEvent cmdlet abstracts the log objects. This abstraction allows for searching multiple logs at once, but it comes at a price: it makes searching for specific data a little quirky. Here is what I mean.

This abstracted command doesn't allow us to filter using the -Id parameter, so we have to use a hashtable. The hashtable allows us to filter on any property of the underlying log. All we need to do is pick the properties and values for filtering, like this: @{name1=value1; name2=value2}. It isn't that bad, but there is no tab completion for the properties used in the hashtable.
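To give a sense of the shape of such a filter, here is a sketch (the log name and event IDs are illustrative, not necessarily the exact ones from the original command):

```powershell
# Sketch: filter the Security log on logon/logoff event IDs via a hashtable
PS C:\> Get-WinEvent -FilterHashtable @{LogName='Security'; Id=4624,4647}
```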

Now we have a list of all the logon and logoff events, but we have to find events specific to our user. The username is in the message, so we need to filter for events related to the user in question.

The Where-Object cmdlet (alias ?) is used to filter for events related to our user. We take the Message property and search it, with Select-String, for the username of that pesky Ed Skoudis. This gives us the full object for each event, so we have all the associated data. This data can be exported to CSV, XML, or other formats.
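A sketch of what that filtering might look like (the event IDs and the output file name are just for illustration):

```powershell
# Sketch: keep only events whose Message mentions our user, then export
PS C:\> Get-WinEvent -FilterHashtable @{LogName='Security'; Id=4624,4647} |
    Where-Object { $_.Message | Select-String "skodo" } |
    Export-Csv ed-logons.csv
```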

Now on to those older versions that don't support Get-WinEvent; for these we have to use the Get-EventLog cmdlet. The nice thing about Get-EventLog is that it is supported on all versions of Windows and both versions of PowerShell.
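Something along these lines would do it on the older systems (528 and 538 were the classic pre-Vista logon/logoff event IDs; treat the exact command as a sketch):

```powershell
# Sketch: old-style Security log filtering with Get-EventLog
PS C:\> Get-EventLog -LogName Security |
    Where-Object { $_.EventID -eq 528 -or $_.EventID -eq 538 }
```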

Hmmm, suspiciously, user "skodo" was the only person on the box the last time it rebooted. Ed, you've got some 'splaining to do! Actually, the person I really want to visit with the old "clue by four" is whoever left root logged in on the console. Bad form.

The last command operates on the system default wtmp file, usually /var/log/wtmp. On some Unix machines, this log grows without bound for the entire lifetime of the system. On others, wtmp gets rotated weekly like other logs. If you want to look at an older wtmp file, use "last -f <filename>". This is also helpful when you're doing forensics on a Unix system image and want to examine its wtmp file.

Notice in the above output that host names are displayed when reverse DNS information is available. The Linux version of the last command has a "-i" flag to always show IP addresses. Unfortunately, this option isn't widely available on older, proprietary Unix flavors. One option that is common to most Unix-like OSes that I've tried is the "-a" flag that moves the host information to the last column of output so that it doesn't get truncated.

The last command also lets you be selective. For example, you can ask for the most recent 5 logins:

Tuesday, October 19, 2010

My coworker has a directory which contains other sub-directories. These sub-directories contain files, but might also contain other sub-directories. My buddy would like to find, for each subdir of the current working directory, the most recent file. He would like to ignore any directories contained in sub-dirs.

Why Joel's "friend" wants to do this is a question perhaps best left unanswered. Ours is not to question why here at Command Line Kung Fu; we're merely here as a public resource for your oddest command-line challenges.

I pondered Joel's problem for a while trying to figure out how best to get started. I thought about using find to locate all of the regular files from the current directory downwards, but that won't give them to me in date-sorted order. I thought about looping over each directory and parsing the output of "ls -t", but filtering out the sub-sub-directories (if they happened to be the most recent object) could get complicated.

Then I thought that maybe I was over-thinking the problem and I should just try the naive approach:

This actually looks surprisingly promising. The regular files from all of the sub-directories are listed first in date-sorted order, and then we get listings of the sub-sub-directories separated by blank lines. So all I have to do here is pull out the first listed file from each sub-directory and stop when I hit the first blank line.

After getting shown up by Davide Brini's brilliant solution to last week's challenge, I had awk on the brain, so I stole a page or two from Davide's playbook:

The first part of the awk code, "/^$/ {exit}", simply terminates the program when we hit a blank line. We want to do this first, so we terminate the program immediately and don't produce any output when we hit the blank line.

The interesting stuff is happening in the rest of the awk expression. I'm splitting my input on slashes ("-F/"), so $1 in this case will be set to the name of the sub-directory. I'm keeping an array called "seen", which is indexed by sub-directory name. If there is not yet an array entry for a particular sub-dir, then this must be the first (and therefore most recent) file listed for that subdirectory. So we output that line ("print"), and update the "seen" array to indicate that we've output the most recent file for the sub-directory. I won't say it's obvious, but it's very compact and clean.
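Here's a sketch of that pipeline run against a throwaway directory tree (the dir1/dir2 names and timestamps are invented for illustration; this demo has no sub-sub-directories, so the blank-line exit never fires, but it's there for the general case):

```shell
cd "$(mktemp -d)"                       # scratch area for the demo
mkdir dir1 dir2
touch -t 202001010000 dir1/old.txt      # oldest
touch -t 202001030000 dir1/new.txt      # newest overall
touch -t 202001020000 dir2/only.txt

# Stop at the first blank line; print the first (newest) entry seen
# for each leading path component.
ls -t */* | awk -F/ '/^$/ {exit} !seen[$1] {print; seen[$1]=1}'
# prints dir1/new.txt, then dir2/only.txt
```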

Honestly, this works fine on my test directory, but I was a little worried that the initial shell glob, "ls -t */*", would choke and die on a large directory structure. However, I tested my solution in directories like /usr, /usr/include, and /proc and couldn't make it fail.

Notice that lib64 is a symlink to lib. The glob happily follows the symlink and reports that the file libgdiplus.so is the most recent file in each of these two "directories" when really we're talking about the same file in the same directory twice.

The "ls -F" command puts a special character after each file name. Directories are tagged with a trailing "/", which I match with egrep to get a list of only the sub-directories of the current directory. On Linux I could have just used "find . -maxdepth 1 -type d", but "-maxdepth" is not supported on all versions of find. In any event, I'm taking my list of directories and iterating over it with a for loop.

Now "ls -t <somedir> | head -1" is a useful idiom for getting the name of the most recently modified object in a directory. But that object might be a sub-directory and we don't want that in this case. So I'm once again using the "-F" option with ls to indicate the type of file and then using egrep to filter out both directories ("/") and symlinks ("@") before I pipe things into head. This will give me the name of the most recently modified file (or device or named pipe or whatever-- you could filter those out too if you wanted), but I need to prepend the directory name.

The problem is that in some cases the output of my ls pipeline can be null-- like when the directory is empty or contains only sub-directories and/or symlinks. So I'm using the funny echo construct you see in the loop above rather than something like "echo -n $d; ls -tF ..." just so that I can guarantee that we get a terminating newline. However in the cases where the output of the ls pipeline is null I'll just have a sub-directory name and a trailing slash but no actual file name. So I added one more egrep after the loop to filter out these unnecessary lines of output.
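Assembled from the description above, the loop might look like this (run here against a scratch tree with invented names, including an empty sub-directory to show why the final egrep is needed):

```shell
cd "$(mktemp -d)"
mkdir dir1 dir2 empty
touch -t 202001010000 dir1/old.txt
touch -t 202001030000 dir1/new.txt
touch -t 202001020000 dir2/only.txt

# For each sub-directory, emit "<dir>/<newest non-dir, non-symlink>";
# the final egrep drops the bare "dir/" lines from empty directories.
for d in $(ls -F | egrep '/$'); do
    echo "$d$(ls -tF "$d" | egrep -v '(/|@)$' | head -1)"
done | egrep -v '/$'
# prints dir1/new.txt, then dir2/only.txt
```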

In fact you'll notice that the output of my loop solution is considerably different from the output of the awk solution. Adding a "| xargs ls -l" to each expression makes the issues clearer:

Do you see that I've added an extra pattern match in the awk code-- "!/@$/"-- to filter out the symlinks? Of course now I've got the trailing "*" markers for executable files. A little sed will clean that up:

I also threw in a sort command to make it easier to compare the output of the new awk expression with the for loop output. The only real discrepancy now is that our glob in the awk version is still following the symlink and giving us output about the lib64 directory when it probably shouldn't. So you could say that the for loop version is more correct, but the awk solution "works" in most cases and is easier to type.

Two solutions from me this week. Why am I thinking that coming up with just one is going to be rough for Tim this time around?

The next portion is where the magic happens. We use the ForEach-Object cmdlet (alias %) to iterate through each subdirectory under our working directory. The iterator is the current pipeline object ($_). In our case, the current pipeline object will first contain dir1, and then dir2.

Inside our ForEach-Object script block, we get a listing of the contents in our subdirectory, and filter out objects that are not containers (directories). The objects (files) are then sorted by the LastWriteTime in reverse order. Finally, we select the first object. Remember, we are working in base zero, so the first object has an index of 0.

Tuesday, October 12, 2010

In response to our "blegging" in last week's Episode, we got a bunch of good ideas from our readers for future Episodes. But don't let that stop you from sending those cards, letters, and emails for topics you'd like to see us cover in the blog!

This week's Episode comes from long-time friend of the blog, Jeff Haemer. Not only did he send us a problem, he also sent the solution-- at least for the Unix side of the house. So, easy week for me as I sit back and explain Jeff's problem and solution.

Jeff's situation is that he's got a bunch of software build directories tagged with a software revision number and a date:

The problem is that Jeff wants to clean up his build area by removing all but the last two date-stamped directories for each of the different software versions.

There are really two pieces to solving this problem and Jeff's solution is a nice little bit of "divide and conquer". The first problem is figuring out the different software version numbers that are present in each directory:

$ ls | cut -d- -f1 | uniq
1.2.00.00_devel
2.0.00.00_devel

Here we're just taking the directory listing, using cut to chop off the date stamps after the "-" and then uniq-ifying the list to get just one instance of each version number. Normally you would call sort before uniq, but in this case the ls command is sorting the directory listing for us.

The next problem is, for each version number, figure out the directories we need to remove-- i.e., everything but the two most recently date-stamped directories. The naive approach might be to start with a directory listing like this:

The directories we want to delete are everything except for the last two directories. You could try some tricks using head piped into tail, but that gets complicated pretty quickly. An easier approach is to just invert the problem:

Notice that the correct syntax for tail is "-n +3"-- "begin output at the third line of input and print everything from there on". If you were thinking "-n +2", well let's just say you were probably in good company.
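A sketch of the inversion with invented directory names (the "-r" reverse-sort is my assumption about how the listing was ordered, so that skipping the first two lines skips the two newest):

```shell
cd "$(mktemp -d)"
mkdir 1.2.00.00_devel-20101001 1.2.00.00_devel-20101002 \
      1.2.00.00_devel-20101003 1.2.00.00_devel-20101004

# Reverse-sort so the newest come first, then skip the first two:
# what's left is everything we want to delete.
ls -r | tail -n +3
```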

So now we know how to extract the various software versions, and how to get the names of all but the two most recent directories. The final solution is just a matter of putting those two ideas together:

In the for loop itself, I'm using our expression to obtain the directory version numbers inside of "$(...)", which is essentially the same thing as using backticks. However, the "$(...)" construct is preferable for reasons which we'll see in a moment. Then for each version number I'm using the expression we developed to output the names of the directories we want to remove.

Great! We're now outputting the names of all the directories we want to remove; next we want to actually remove them (note that it's always best to do this sort of confirmation before a dangerous operation like rm). There are a lot of different ways we could go here. I chose xargs:
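One plausible shape for the whole thing, demonstrated against a scratch build area with invented version/date names (the "-r" reverse listing is my assumption):

```shell
cd "$(mktemp -d)"
mkdir 1.2.00.00_devel-20101001 1.2.00.00_devel-20101002 \
      1.2.00.00_devel-20101003 1.2.00.00_devel-20101004 \
      2.0.00.00_devel-20101001 2.0.00.00_devel-20101002 \
      2.0.00.00_devel-20101003

# For each version prefix, list all but the two newest date-stamped
# directories and hand them to rm via xargs.
for v in $(ls | cut -d- -f1 | uniq); do
    ls -rd "$v"-* | tail -n +3
done | xargs rm -r
```

After this runs, only the two most recent directories for each version remain.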

Constructs like this are why you want to use "$(...)" instead of backticks. If you tried doing the above command line with backticks, you'd get a syntax error because the shell doesn't parse "nested" backticks the way you want. On the other hand, "$(...)" nests quite nicely.

The only problem with the second solution is that if the number of directories we need to remove is large, you could theoretically overwhelm the limit for the length of a single command line. Using xargs protects you from that problem.

Anyway, thanks for the interesting problem/solution, Jeff! It looks like Tim's got his gold parachute pants on and he's ready to rock...

Tim busts the funky lyrics

I did not wear gold parachute pants in the 80s, at least there is no proof of it. And Hal, sorry to correct you on your 80s fashion, but Hammer pants are WAY different from parachute pants, and besides, I wore the silver ones.

Let's fast-forward 20 years and PowerShell this moth-ah. Similar to Hal's approach, we'll divide and conqu-ah.

In this example, the Group-Object cmdlet uses a script block to define how the groups are created. The groupings are created by taking the Name property of the current object ($_.Name), splitting it using the underscore as a delimiter, and then using the first item (actually the zeroth item, remember base 0) in the resulting array. This gives us groups of directories, where each group is based on the software version.

So now we have two groups. But what does the group contain? Remember, in PowerShell everything is an object. So the groups are just collections of the objects. As such, the items in the groups can be treated the same way as a directory, since the items are the directories.

We can now use the ForEach-Object cmdlet to iterate through each item in each group.

The Current Pipeline Object, represented as $_, contains a group. So we need to take that group, and remove the first two directories from the collection. We are then left with just the directories we want to delete. The Select-Object cmdlet is used to expand the group back into directory objects. That output is piped into another Select-Object cmdlet which removes (skips) the first two items, and leaves us with the directories to be deleted.

Now we have the directories we want to delete, so we can pipe the whole thing into Remove-Item. But before we do, let's make sure we have the correct directories, so we use the -WhatIf switch.
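Put together, the pipeline might look roughly like this sketch (the Sort-Object -Descending is my addition so that "skip the first two" lines up with "keep the two newest"; -WhatIf keeps it from actually deleting anything):

```powershell
# Sketch: group by version, skip the two newest per group, remove the rest
PS C:\> Get-ChildItem | Group-Object { $_.Name.Split("_")[0] } |
    ForEach-Object { $_ | Select-Object -ExpandProperty Group |
        Sort-Object Name -Descending | Select-Object -Skip 2 } |
    Remove-Item -WhatIf
```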

Let me explain that awk for the two or three of you out there who may be having problems decoding it:

The "-F-" tells awk to split its input on hyphen ("-") instead of white space. So for each line of input from the ls command, $1 will be the version string and $2 will be the date stamp.

The awk code uses two variables: "c" is a line count, and "v" is the current version string we're working on.

The first section of code, "$1!=v{c=0; v=$1}", checks to see if the version string in the current line of input is different from the last version string we saw ("$1!=v"). If so, then the code block gets executed and the line counter variable is reset to zero and v is set to the new version string ("{c=0; v=$1}").

The next bit of code, "{c++}", is executed on every line of input and just increments the line counter.

The last expression, "c>2", means match the case where the line counter is greater than 2-- in other words when we're on the third or higher line of ls output for a particular version string (remember c gets reset every time the version string changes). Because there's no code block after the logical expression, "{print}" is assumed and the line gets output.

So the net result is that the awk expression outputs the directories we want to remove, and we just pipe that output into xargs like we did with the output of the for loop in the original solution.
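As a concrete sketch (directory names invented; the reverse listing via "ls -r" is my assumption about how the input was ordered, so that lines 3+ per version are the oldest ones):

```shell
cd "$(mktemp -d)"
mkdir 1.2.00.00_devel-20101001 1.2.00.00_devel-20101002 \
      1.2.00.00_devel-20101003 1.2.00.00_devel-20101004 \
      2.0.00.00_devel-20101001 2.0.00.00_devel-20101002 \
      2.0.00.00_devel-20101003

# Newest first; print every line after the second for each version prefix.
ls -r | awk -F- '$1!=v {c=0; v=$1} {c++} c>2'
```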

Tuesday, October 5, 2010

[A PLEA FROM THE BLOGGERS: Hey, if you think it's easy coming up with over 100 ideas for Command-Line Kung Fu Episodes... we really, really want to hear from you! Send us your ideas for things you'd like to see covered in future Episodes! Because we're lazy and we need you to program the blog for us... because we're running out of ideas here...]

Hal's got a lot on his mind

Lately I've been thinking a lot about directories.

Loyal reader Josh Olson wrote in with an idea for an Episode that involved making a copy of one directory hierarchy under a different directory. The idea is that you just copy the directory structure-- but not any of the files-- from the original directory.

Josh's solution involved a "for" loop, but I can go that one better with a little xargs action:

In the above example we're taking the output from find and piping it into a subshell that first moves over to our target directory and then invokes mkdir via xargs to create the parallel directory structure. In many ways this is similar to the traditional "tar cf - . | (cd dest && tar xf -)" idiom that I mentioned back in Episode #73, except that here I'm just making directories, not copying the directory contents.
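A sketch of that command against throwaway source and destination trees (paths are invented for the demo):

```shell
src=$(mktemp -d); dst=$(mktemp -d)       # hypothetical source and target
mkdir -p "$src/a/b" "$src/c"
touch "$src/a/file.txt"                  # a file that should NOT be copied

# Replicate just the directory skeleton of $src under $dst.
cd "$src"
find . -type d -print0 | (cd "$dst" && xargs -0 mkdir -p)
```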

Of course, my solution here involves "find ... -print0" piped into "xargs -0", which is great if you live in an environment that supports these options for null-terminated I/O. But this is not portable across all Unix-like operating systems. If you want portability, you need to go "old school":

$ find * -depth -type d | cpio -pd /path/to/dest/dir
0 blocks

Yes, that's right. I'm busting out the old cpio command to amaze and terrify you. In this case I'm using "-p" ("pass mode"), which reads in a list of files and/or directories from the standard input and recreates the same file/directory structure under the specified destination directory. The "-d" option tells cpio to create directories as necessary. As a bonus, cpio preserves the original file ownerships and permissions during the copy process.

And this can actually be a problem-- for example when you're copying a directory which has read-only permissions for you. That's why I've added the "-depth" option to the find command, so that cpio is told about the "deepest" objects in the tree first. With "-d", it will make the directory structure to hold these objects and copy them into place. However, cpio will not set the restrictive permissions on the parent directory until it reads the parent directory name in the input.

See what happens when I try the above command without the "-depth" option:

Without "-depth", the directory "read-only" appears first in the output before its sub-directories. So the cpio command makes this directory first and sets it to be mode 555, just like the original directory. But then when cpio goes to make the subdirectories that come next in the find output, it doesn't have the write permissions it needs and it fails. So the moral of the story here is always use "find -depth ..." when piping your input into cpio.

I'm curious to see what Tim's got for us this week. It's possible that this is one of those cases where Windows actually makes this task easier to do than in the Unix shell...

Tim's been on the road

Finally, an episode that is really easy in Windows. We can do this with XCopy, which stands for EXTREEEEEEME COPY! Actually, it stands for eXtended Copy, but for all intents and purposes it is EXTREEEEEEME (because it is really easy)!

C:\> xcopy originaldir mynewdir /T /E /I

The /T option copies the directory structure without copying any files, but it does not include empty directories or subdirectories. To copy the empty directories we need to add the /E option. And the /I option, well, that is a weird option...

If we don't use the /I option xcopy isn't sure if our destination is a file or directory, so we get this prompt: