The Ramblings of BSDZ

I have developed in a variety of languages for a number of years. Mostly this is for whoever employs me at the time; however, a substantial amount of my development is for my own models and systems, and I will try to share some of these here.

Saturday, 19 June 2010

I recently decided I wanted to synchronize some of my music with my Nexus One. I quickly discovered that the Nexus One didn't support WMA audio files. I also discovered there weren't any out-of-the-box solutions for syncing audio files and playlists with the flexibility I required.

So I decided to write my own between two World Cup matches. I felt the script might be useful to other people, and since I couldn't find a more suitable place to put it, I am dumping it here.

All my music files are currently in WMA format. This will change in the future, but until then the script converts the files into MP3 using FFmpeg. If you need a different conversion, adjust the function "convert_file".
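To give a rough idea of what such a conversion step involves, here is a minimal sketch. It is not the script's own "convert_file"; the names build_ffmpeg_command and convert_to_mp3, and the 192k bitrate, are assumptions made for illustration. It also assumes an FFmpeg build on the PATH that includes the libmp3lame encoder.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(source, target, bitrate="192k"):
    """Return the FFmpeg argument list that transcodes source into an MP3 file."""
    return [
        "ffmpeg",
        "-y",                       # overwrite the target if it already exists
        "-i", str(source),          # input file (WMA in my case)
        "-codec:a", "libmp3lame",   # encode the audio with the LAME MP3 encoder
        "-b:a", bitrate,            # audio bitrate (assumed value)
        str(target),
    ]


def convert_to_mp3(source, target_dir, bitrate="192k"):
    """Hypothetical stand-in for a conversion step: returns the new MP3 path."""
    source = Path(source)
    target = Path(target_dir) / (source.stem + ".mp3")
    # check=True raises CalledProcessError if FFmpeg reports a failure.
    subprocess.run(build_ffmpeg_command(source, target, bitrate), check=True)
    return target
```

Keeping the command construction separate from the call that runs it makes it easy to swap in a different encoder or container by changing one list.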

Copy and paste the code below into a file called syncplayer.py. You may need to edit some of the settings at the top of the file. If you have Python files correctly associated on your PC, you should be able to double-click the file and the sync will run whilst displaying a log on the console.

Enjoy!

""" SyncPlayer.py - A Python script for synchronizing music files between Foobar2000 and an MP3 device.

Copyright (C) 2010 Blair Sutton

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see . """

Friday, 5 September 2008

Recently, I have grown fond of Powershell. As someone who is also responsible for a little administration from time to time, I found it quickly caught my eye as a language that could solve many mundane problems quickly and succinctly. Having originally come from a UNIX background, I could see the Powershell Development Team had taken the best features from Korn Shell and Perl, then combined them with the .NET framework to provide a very powerful tool.

However, it's not all a bed of roses and this will become clear as you read on.

I have recently been analysing various files containing financial tick data. Typically there are around two million lines in a file, and each contains a comma-delimited string with the date, time, price and amount traded of a particular stock. For this analysis I needed to extract a single column from this file and save it in a new file, which is then used as input for GNU Octave.
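For reference, the task itself is small. Here is a minimal Python sketch of it, assuming the column order given above (date, time, price, amount), so that the price is the field at index 2. The function names are hypothetical and are not taken from any of the code in this post.

```python
import csv


def extract_column(lines, index):
    """Yield the field at position index from each comma-delimited line."""
    for row in csv.reader(lines):
        if row:  # skip blank lines
            yield row[index]


def extract_column_to_file(source_path, target_path, index):
    """Stream one column from source_path into target_path, one value per line."""
    with open(source_path, newline="") as src, open(target_path, "w") as dst:
        for value in extract_column(src, index):
            dst.write(value + "\n")
```

Because both functions work line by line, memory use stays flat however many lines the input file holds.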

The Problem

This task is typical and can easily be done with a tool such as Perl or Awk. However, since I have been trying to use Powershell more recently, I felt obliged to see how well it could do the job.

To start, I created a sample data file, testdata.csv, by using the following: -

So perhaps this is a deficiency of Powershell; one they might improve with V2. Fortunately, in this case we are dealing with a CSV file so we can improve performance using Import-Csv. Here is another attempt: -

A considerable improvement! Then I began wondering whether it would be possible to include Perl within the Powershell pipeline. Unfortunately, this is not a simple case of placing Perl after the pipe character '|', since Powershell will not connect to the STDIN and STDOUT of a normal process. One needs to open a stream to the STDIN of a Perl process and feed it Powershell's pipeline '$_' object. Furthermore, the STDOUT of the Perl process needs to be collected and sent back through to Powershell as a pipeline object.

This Powershell function starts by spawning a Perl process and redirecting its STDIN and STDOUT streams. During the processing stage, data is flushed into Perl's STDIN and finally all data from the STDOUT stream is sent back down Powershell's pipeline via the echo command. Note the following will not work: -

PS> Get-Content .\testdata.csv | Perl-Filter > testdata.out
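The same spawn, feed and collect pattern can be sketched in Python for comparison. This is not the Powershell function described above, only an illustration of the idea; filter_through, PERL_PRICE_FILTER and UPPERCASE_CHILD are hypothetical names, and the Perl one-liner assumes the price is the third comma-separated field.

```python
import subprocess
import sys


def filter_through(command, lines):
    """Feed lines to the child's STDIN and return its STDOUT as a list of lines."""
    result = subprocess.run(
        command,
        input="\n".join(lines) + "\n",  # everything is written to the child's STDIN
        capture_output=True,            # STDOUT is collected in memory until exit
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


# Hypothetical Perl filter: print the third comma-separated field of each line.
PERL_PRICE_FILTER = ["perl", "-F,", "-lane", "print $F[2]"]

# Stand-in child that needs no Perl install: upper-cases every input line.
UPPERCASE_CHILD = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())",
]
```

Calling filter_through(PERL_PRICE_FILTER, lines) should return the price column, provided Perl is on the PATH. Note that this sketch holds the child's entire output in memory before returning anything.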

One problem with this function is that all of Perl's output is kept in memory until the process ends. It would be nice to read data from the Perl process during the processing stage and send it down Powershell's pipeline as it becomes ready. Sadly, due to a known problem with .NET's StreamReader implementation, a Read or Peek will block if the Stream has not had any data sent through it. The only workaround I know of is to start a separate thread to manage the Stream, and this is where Powershell V1 has its limitations.

Another problem is that the function can simply hang: once the STDOUT pipe buffer is full, Perl is blocked from writing any more output. This buffer is usually set to around 8KB.
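To show what the separate-thread workaround looks like in general terms, here is a minimal Python sketch. It is an analogue of the idea, not Powershell or .NET code; stream_through, _drain and UPPERCASE_CHILD are hypothetical names. A reader thread keeps emptying the child's STDOUT into a queue, so the child never stalls on a full pipe buffer and output can be passed on while input is still being fed. Error handling for a child that exits early is left out.

```python
import queue
import subprocess
import sys
import threading

_DONE = object()  # sentinel: the child has closed its STDOUT

# Stand-in child that needs no Perl install: upper-cases every input line.
UPPERCASE_CHILD = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())",
]


def _drain(stream, out_queue):
    """Reader thread: push every line from the child's STDOUT onto the queue."""
    for line in stream:
        out_queue.put(line.rstrip("\n"))
    out_queue.put(_DONE)


def stream_through(command, lines):
    """Feed lines to the child and yield its output lines as they become ready."""
    proc = subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    out_queue = queue.Queue()
    reader = threading.Thread(
        target=_drain, args=(proc.stdout, out_queue), daemon=True
    )
    reader.start()

    finished = False
    for line in lines:
        proc.stdin.write(line + "\n")
        # Pass on whatever the reader thread has collected so far, without blocking.
        while not finished and not out_queue.empty():
            item = out_queue.get()
            if item is _DONE:
                finished = True
            else:
                yield item

    proc.stdin.close()  # EOF tells the child that no more input is coming
    while not finished:  # now it is safe to block for the remaining output
        item = out_queue.get()
        if item is _DONE:
            finished = True
        else:
            yield item
    reader.join()
    proc.wait()
```

The queue here is unbounded, so this trades the hang for memory use if the consumer is slow; a bounded queue would push back on the child instead.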

A Powershell Cmdlet

So, how does one allow a pipeline to access another process's STDIN and STDOUT streams? Well, the answer appears to be that one has to write a Cmdlet using C# or VB.NET.

The general layout is similar to that of the function above. One must implement three main methods, BeginProcessing, ProcessRecord and EndProcessing, which correspond to the BEGIN, PROCESS and END blocks above. Building a new Powershell Cmdlet is made very simple by David Aiken's Visual Studio Template.

I chose to follow a similar structure to the Powershell function above, spawning my process in the begin block but also starting a special thread that monitors the STDOUT of this process. The thread looks for a line delimiter sequence in the stream and, as these are discovered, records are broken off and pushed into an ObjectQueue. Here is an excerpt from the thread: -

This is only a small improvement on the original .NET string method above.

Conclusion

There are many advantages to using Powershell for administration tasks. However, when processing more than 100,000 objects through its pipeline, your task's performance will take a big hit. This is perhaps where it is worth falling back on more tried and tested tools such as Perl or Python and using traditional techniques. Powershell's object layer is very powerful; however, it would be nice if it could detect whether it was processing an object stream or a text stream and behave accordingly.

I felt having access to a non-Powershell process as a Cmdlet might be useful, so I have posted my code here on Google if you would like to play with it. Please note it has not been tested thoroughly and is bound to have many bugs.