Archive for August, 2005

One of the things I've been thinking about writing about is interesting
and common math problems. I'll start with some easy ones that I've had
to do at work (no trade secrets, just common math in the trade).
Notes:
- I'll use the prefix "0x" to denote that a number is written in hex
notation, e.g., 0x10 is 16 and 0x1F is 31.
- There are 8 bits in a byte (I may not get to that until later problems
though).
Largest packet aligned buffer:
MPEG-2 transport streams have both 188- and 204-byte packets. In order
to transfer a high-speed transport stream, large buffers are needed.
Large buffers reduce the number of interrupts per second and make for
effective DMA transfers. To effectively process an MPEG-2 transport
stream, buffers should be a multiple of the packet size. Splitting and
joining buffers is not only "a pain" to program, but also requires
otherwise unnecessary processing power.
With a maximum buffer size of 0x20000 (131072) bytes, and a chosen
packet size of 188, what is the largest packet aligned buffer size
allowed?
The answer is simply 131072/188*188 when the division is integer
division (i.e. the decimal part is discarded). To do this with a
calculator that only does regular division, one simply needs to
remember the integer part of the division. In this case
131072/188 = 697.19..., so I just clear the calculator, type 697*188,
and get 131036 (0x1FFDC).
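The same trick is a one-liner in shell, since $(( )) arithmetic is
integer division by definition:

```shell
# Align 0x20000 (131072) down to a multiple of the 188-byte packet size.
# Shell arithmetic truncates, so the "remember the integer part" step
# happens automatically.
max=131072
echo $(( max / 188 * 188 ))   # 131036 (0x1FFDC)
```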
...
With a maximum buffer size of 0x20000 (131072) bytes, and a chosen
packet size of 204, what is the largest packet aligned buffer size
allowed?
The answer is 131072/204*204 where the division is integer division. The
final number is then 130968 (0x1FF98).
What is the minimum buffer size that is divisible by both 188 and 204?
To solve this, find the least common multiple: take out the duplicated
common factors, and multiply the rest together. The factors of 188 are
2, 2, and 47. The factors of 204 are 2, 2, 3, and 17. The shared 2*2 is
counted only once, so the answer is 2*2*47*3*17=9588 (0x2574), or
equivalently 188*204/2/2=9588 (0x2574).
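The factoring step can be automated with Euclid's algorithm, using the
identity lcm(a, b) = a*b / gcd(a, b). A minimal sketch:

```shell
# Least common multiple via Euclid's GCD: lcm(a, b) = a*b / gcd(a, b).
gcd() {
  a=$1 b=$2
  while [ "$b" -ne 0 ]; do
    t=$(( a % b )); a=$b; b=$t
  done
  echo "$a"
}
g=$(gcd 188 204)            # 4, i.e. the shared 2*2
echo $(( 188 * 204 / g ))   # 9588
```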
With a maximum buffer size of 0x20000 (131072) bytes, and a packet size
that can only be 188 or 204, what is the largest packet aligned buffer
allowed?
Using the knowledge from above that 9588 bytes is the smallest multiple
of both 188 and 204, it's simply 131072/9588*9588 (again where the
division is integer division). So the answer is 124644 (0x1E6E4).
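Putting the two results together, the whole computation (including the
hex conversion a basic calculator can't do) fits in two lines of shell:

```shell
# Largest buffer <= 0x20000 bytes aligned to both packet sizes: align
# down to the LCM of 188 and 204, which is 9588.
buf=$(( 131072 / 9588 * 9588 ))
echo "$buf"                 # 124644
printf '0x%X\n' "$buf"      # 0x1E6E4
```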
Other numbers of interest:
- 196 = 188 + 8 (64 bits is 8 bytes) for a 188-byte packet with a
64-bit timestamp.
- 212 = 204 + 8 for a 204-byte packet with a 64-bit timestamp.
- 512 bytes is a common write size divisor for hard drives.
- 192 = 188 + 4 (32 bits is 4 bytes) for the MPEG stride (MPEG-2
transport stride?) format (HDV).
- 4 bytes (32 bits) seems to be a preferred DMA alignment (the PCI bus
is always at least 32 bits wide).
- 8 bytes (64 bits) might be preferred for certain DMA engines.
I'll probably write myself up a quick reference. When programming, you
can have the program calculate the right numbers given any buffer size,
packet size or other factor.
Originally from: http://www.boxheap.net/ddaniels/notes/20050820.txt

So today I got a new USB key. It's a Kingston DataTraveler. I decided
that I should set it up so that I can use it with the laptop I use on a
regular basis (a very old Toshiba Satellite 310CDS that's rattling). My
first step was to consider the file system format. I was surprised to
find that I couldn't format the device as NTFS. I left it formatted at
the default FAT32 and decided to get on to other things.
At home tonight I spent some time downloading the drivers for the
device and attempted to install them. The driver installer is an
InstallShield-created one that's been WinZipped into a self-extracting
file (sometimes called an SFX). Three layers of compression and
installer junk managed to make under 49 KiB of driver files take over
1 MB. The second frustrating thing I ran into is that the installer is
designed to detect the operating system, and refuses to install if it
doesn't think it'll work. Well, I guess before that I had read a FAQ
from Kingston saying that Windows 95 doesn't support USB drives at all.
Before I get ahead of myself again, I'll go back and say that I tried to
do some research on what Windows 95 supports in the way of USB drives. I
didn't manage to find much, but I did find the usual indications that
earlier versions of Win95 didn't have any USB support, or that it wasn't
working. I already knew that USB support worked on my computer.
I guess I should have seen things coming ahead of time. In December I
had looked at trying to get pictures off a Fujifilm FinePix digital
camera. It too had an annoying installer, and claimed not to work with
Win95. It further had a bunch of software bundled with its installation
that I still haven't bothered to figure out. Luckily, the installer for
the driver itself wasn't hard to find, and I managed to get the device
driver and some of the software installed.
Despite getting things installed for the FinePix camera, the software
complained about a missing DLL function, and the driver didn't seem to
be working. I decided that with my many licences of Windows, I should
try to upgrade certain DLLs with versions from newer versions of
Microsoft Windows. My results were of course that some of the important
DLLs could not be replaced.
That got me thinking again about getting open source replacements for
certain components. I looked for a while, and decided that without a
better understanding, I might end up accidentally installing a DLL that
needs a Linux shared library (.so) or something. My following of the
Wine Weekly News (WWN) on http://www.winehq.com and reading the ReactOS
developers/kernel mailing list indicated that some DLLs from these
projects were definitely dependent on components that I'm not ready to
replace.
So more recently (getting back to the USB key), I did another search on
the subject of replacing Microsoft Windows 95 DLLs with open source
compatible versions. I'm also now considering replacing the kernel and
other core files. I did find that WWN shows they've been building PE
versions of their DLLs for Win32, but it's not clear which can replace
the DLLs in Windows 95. I get the impression that files from ReactOS
might be a better replacement than Wine's, as they'll have less Linux,
BSD, and Solaris related stuff in them, and be created with binary
compatibility in mind for even more core pieces (e.g. no required
wineserver).
To date I've had no luck with either the USB key or digital camera
under Windows 95. I've decided that in order to start replacing Win95
on this notebook, I first need a better understanding of the
dependencies and compatibilities of different components. To do this
I'd like to get or create a list of files, a graph (tree?) of the
dependencies between files, and a fresh compatibility status of the
files from whatever source I choose. Unfortunately ReactOS's
compatibility page doesn't jump out at me in searches (I remember
seeing it once or twice). I also believe both ReactOS and Wine don't
list their compatibility in relation to Windows 95, but to whatever the
latest version of the component is.
So the process I'll probably want to take will start with listing the
operating system files on the computer I'm targeting. Then I'll
probably use something like Dependency Walker (depends.exe, from
Sysinternals?) to figure out the dependencies of each file (as best I
can). Then I'll look at the compatibility status on the web. Last, I
may have to look at the exports from both files. Since no one else
seems to have published
this information, I'll probably write up my findings as I go. I might
even make it easier to install Open Source replacement components for
other versions of Windows by performing the same process using fresh
installs of other versions.
It's getting late now and I'm getting tired. I was planning to also
write about how to use unshield and WinZip to extract files from
annoying installers. I also felt the need several times to explain why
I wanted open source replacement files and didn't upgrade Windows
(remember, I do have licences for newer versions). I guess I can
quickly say that I like having free access to the source of what I'm
using, so
that I or just about any other programmer can enhance/fix it. I also
don't want to install Windows 98 or later on this laptop because it may
take more system resources or not run, and, well, I'd rather maximize the
use of my Windows 95 licences before using other ones. I've tried
ReactOS and Wine, and I know they're still not 100% replacements for
Windows (although extremely close nowadays). I also believe that other
people share my viewpoints and/or situations.
Maybe later this week I'll write more on the topic of replacing Windows
components or Windows device drivers (WDM, NDIS, INF, the wonderful
dpinst.exe and more), but for now it's time for me to get some sleep...
Originally from: http://www.boxheap.net/ddaniels/notes/20050817.txt

I was planning on writing about the problems I faced at work looking up
open source software for SMPTE 125M conversion. I kept finding SMPTE
timecode stuff (for MIDI), and other usages of the acronym SMPTE
without reference to which standard was being used. The ones related to
SMPTE 125M are SMPTE 292M (HD-SDI), SMPTE 259M (transport of SDI and
SDTI), SMPTE 305M (sometimes called SMPTE 305.2M, which is SDTI), and
the document on ancillary data. Actually, SDTI really is quite
different from SDI except that it goes over 259M.
Anyway, tonight I think I'll write a bit about linking and Google. Yes,
part of the reason that I'm writing these notes is to increase the
ranking that I'll get for topics that I'd like employers to see. The
bigger way that I plan to get a good ranking is something I
accidentally found before. I've put a one-line signature in my e-mails
to mailing lists with my resume's URL. I was hoping I could find
someone on the mailing list that might be interested, or might refer me
to someone, but instead I found that the HTML mailing list archives
looked to be increasing the rank of my resume. I guess this is a neat
trick that can work on Google, and maybe on other search engines that
look at what's linking to a page to give it a score.
When I'm finally happy with the testing scripts that I'm working on for
my tarball enhancements, I'll post the results to various mailing lists
that are development forums for projects with large tarballs (e.g. the
lkml, some kind of GIMP mailing list, maybe some OpenOffice.org AKA OOo
mailing lists...). I've got my resume's URL in the scripts themselves,
but I also plan to put my resume URL tagline in my messages.
One of my problems with my tarball enhancement postings is that I'll
want a permanent place under my own domain name where I can host the
scripts, but I'm getting free hosting from a friend (thanks Dean). I
don't want to generate a lot of hits on my friend's server, as he
likely has better uses for his bandwidth, and his ISP may not
appreciate it. To prevent such a load on the link to his server (and
his server), I plan to keep the scripts only on the mailing lists
(archived in their archives) until interest drops down a bit. I figure
a few weeks would do, but I'll probably wait a few months.
I'm really quite keen to get my scripts out the door, but I feel
they're not yet ready to stand up to the kind of criticism that one
gets on the Linux Kernel Mailing List (lkml). I've got a script to do
the actual tarball creation, and one to show the difference between a
normally generated tarball and the one my script makes, but I don't
have something showing the amount of time that it takes. Measuring the
sorting isn't easy, as it's a series of piped commands. My shell
scripting really isn't put to enough use for me to be able to quickly
work around such a problem. I've checked a few HOWTOs like the bash
one, and I've asked in the bash scripting IRC channel, but I couldn't
find an answer. I decided to put the commands into a separate script
and time that whole script.
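For the record, a whole pipeline can also be timed in place by handing
it to `time` as a single command; the tree and tarball names below are
just stand-ins for the real tarball scripts:

```shell
# Time an entire pipeline as one unit. `time sh -c '...'` works in any
# POSIX shell (bash's `time` keyword can also time a bare pipeline
# directly). The demo tree here stands in for a real source tree.
mkdir -p demo-tree
echo "hello tarball" > demo-tree/file.txt
time sh -c 'tar -cf - demo-tree | gzip -9 > demo.tar.gz'
```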
The other problem I've run into is testing. My home computer was taking
a beating compressing and untarring, etc. I decided to use my
SourceForge compile farm shell to do the testing, but it's a pain to
put files onto those machines. It took me a while before I figured out
I had to download the files to my computer, and then upload them to the
compile farm's central server via sftp or scp. That's something I can
do, but it really compounds another problem I'm having. It takes me a
while to make progress on my free-time coding projects, so new target
files keep coming out for me to test. I want to be able to post on the
lkml the results of recompressing the latest 2.6 and 2.4 kernels. I
keep optimistically downloading the latest kernels and then having real
life interrupt things long enough for me to need a new version to
continue.
I'll stop doing that for a while though, until I've actually got a
draft sitting in my postponed box: an e-mail to the lkml with the
scripts already finished and attached, or actually inline, I think.
That's another problem. The lkml only accepts certain posts, and Linus
usually only accepts things that are in a certain format (plain text,
inline, IIRC). That put me on a tangent of looking up the mailing list
rules, and reading the Linux Weekly News. It'll likely do the same once
I get close enough again.
So with all my knowledge, reading, and interest in digging deep into
open source stories that I see written/posted, I've thought about
trying to get paid to write. These notes are a bad example of my
ability to write, but a good example of what I enjoy writing about. I
was once solicited by a genuine publisher to write a book on intrusion
detection, but I kind of "flubbed" my response. I said that I'd be
interested in contributing, but I didn't think I'd have time to write a
whole book. I kind of regret doing that, but I think it was the right
thing to say (just look at my bad record finding time to do coding).
I'm hoping however that a paying gig would actually let me take some
time away from real life to actually get things done (and I'm sure it
would).
Of course I've got to strike a balance to keep my home life happy and
healthy (family, friends, and my own condition). I've offered to write
a piece on the history of the BSDs to Linux Weekly News, but they
didn't seem interested. They do post BSD articles, and I was pitching
that I could write one that would show the parallels between AT&T vs.
the Regents of Berkeley (BSD) and the current SCO vs. IBM, etc. It's
interesting how history repeats itself. For good references I'd suggest
reading the FreeBSD mailing list archives (a Google search found some
good stuff).
Later I might publish the research that I used as part of my pitch for
my BSD history-repeats-itself story. I'm also probably going to
consider writing about why I don't want to publish my unrealized ideas.
I'll also probably talk about:
- Why I don't write about office politics
- Why I don't write much about my personal private home life (well,
maybe I made that clear <g>)
- My music ideas
- My thoughts and research into a self-powered home (well, actually
getting power from alternative sources like sun, wind, water...)
- Thoughts on using "image stacking" for amateur (and hopefully
professional) astronomy (I'll talk about this because other people have
already implemented some of this)
- Some ideas for how people can generate data that's easier to compress
(e.g. typing in lower case when there's the option, removing obvious
redundant information, using the same words...)
- Perhaps my ideas on natural language processing
...
I may eventually post my project ideas from the last fourteen years that
I've been writing on paper.
Consider sending me money! My resume is at
http://www.boxheap.net/ddaniels/resume.html
Oh, and I'll probably write about resume creation and open source tools
to do it (hey, maybe lwn.net would be interested in buying that
article).
Originally from: http://www.boxheap.net/ddaniels/notes/20050811.txt

There don't appear to be any adopted standards for MPEG over IP. IP
over MPEG looks more interesting. Just packetize an IP stream into a
packetized elementary stream (PES) and multiplex it into a valid MPEG-2
transport stream. MPEG-2 typically gets transferred over DVB-ASI,
DVB-C, DVB-S, DVB-T, and other protocols (even "ATSC" AKA SMPTE 310M).
So how do you packetize IP packets to go into an MPEG stream? Well,
that depends on the source. I'd like to think that any IP source "worth
its salt" is from a live network. Thus a network feed would need to be
input into the packetizer, multiplexed, and put out over a different
type of device. I've heard of some people making a network device
driver for DVB-ASI cards, but at least one engineer I talked to said
there's probably a better way. He suggested keeping the regular
characteristics of the ASI device, and doing the packetizing in
application space. I managed to convince him, however, that the
conveniences of creating a network device which can be bridged would be
far better. He stuck with the separate device driver idea, and
suggested one driver could use the other.
So then the question is, how do you create a network device driver
that's just a packetizer, multiplexer, and forwarder? No doubt there
are some good examples out there, and NDIS should make it easier. I
still worry that doing more than elementary processing in a driver
might cause some strange system behavior. I guess I should also say
there's probably an even easier way to do things in Linux and FreeBSD
variants, but I'm mostly focused on the Microsoft world, as Marketing
tells me that's what's wanted.
On the opposite end you need a depacketizer, or something to
demultiplex the stream and put IP back out onto the network. I've seen
this done in software, and that might make more sense on this side of
the transfer. The engineer I mention above, however, suggested that the
unidirectional nature of MPEG-2 transport streams would pose another
problem: associating one direction of traffic with the other.
I'm not quite sure how other people bind one transfer direction to
another, but I remember several satellite companies offering service
that beamed high-speed broadband internet access to customers and
accepted data back from them via telephone modem. So schemes that put
the two directions over separate devices have been around for a while.
I just hope that modern network stacks are smart enough to remember
that it's allowed.
I remember someone telling me that the ARPA network was an experiment
designed with the goal that it be able to stay up even if one link in
the network went down. It failed, or at least that's the punchline. The
modern Internet can't reroute if there's a failure in a router. There
was a fire in a telecom building in Toronto, and connections from
Manitoba Telephone Services (MTS) to Shaw in Winnipeg went down. I've
also seen where an outage in Shaw's network caused places to become
inaccessible, but if you had a proxy on a CA network accessible
address, you could access the rest of the internet. Those are just two
local examples that I know about. The CA network thing is political
(I'm told they're not allowed to carry commercial data due to their
funding grants). For what it's worth, I've also seen my fair share of
misconfigured routers; the more obvious cases were with major telecom
companies.
So back to MPEG transport streams. I know companies like Norsat have
been selling "solutions" to do these things for years, so I think
there's a market. Identifying the market potential is difficult,
because it's not something most broadcasters, stations, and local
distributors are looking for. It's also not something that's even
remotely accessible to consumers.
A similar issue that I've thought about for even more years is using
multiple links between computers to increase throughput. I know lots of
other people have looked at bridging and bonding, but I wanted to look
at it at an even more insane level: serial ports. Actually, I wanted to
look at parallel ports, modems, ethernet, etc. I suppose it is possible
to bond all these links together, but it certainly isn't common enough
that it's as easy as listing the links (at least as far as I know).
So why bother with all this legacy stuff? Why not build a new network
card that can communicate at the full bus speed? Well, actually, we're
pretty close to that now. From my own experience I've calculated that
modern HD-SDI cards must be close to maxing out the bus throughput.
I've also learned that multiple cards on the same bus can't allow a
faster network connection, as of course there's only the one bus. Of
course I've seen computers with multiple buses, but it's hard to know
if they're truly independent, or if they're more likely bridged. Even a
bridged network of buses can allow each bus to operate almost
independently; if the parent bus is faster than its children combined,
there may be an advantage to using multiple cards.
Even further, it's important to note that most modern buses have
bottlenecks. When was the last time you looked up the DMA latency of
the motherboard you wanted to buy? I'll wager never. I've thought about
how useful this could be to consumers, and whether there was a way I
could get the company that I'm working for to publish regular results
of DMA throughput and latency. We could then get free motherboards. The
idea likely wouldn't work though, as that's not really what the
business does.
Other crazy ideas I've had include using every processor in the system
to do computations, including those in the IDE/ATA hard drives (they
have RAM too!). Alas, of course, for most of them it would be very
convoluted to figure out a way to use them.
The more recent idea I've had (since Mark Nelson's "random" binary file
challenge was posted) was to figure out a list of common instructions
and library calls for which I could get more output than would be
required to issue the request for the data (e.g. more bits from
register results than from the instruction cost...). This idea has some
potential, but my current "hurdle" is finding the time to get and go
through a list of CPU instructions. Getting a list of available
function calls is also a challenge, although maybe a nice program to do
it already exists (just get the exports from all DLLs, etc.?).
So as you might be beginning to see, one of my primary interests is
data compression. I used to be interested in pure pattern finding, and
making the smallest possible representation of common data, but then I
started working in multimedia. It became obvious very fast that the
speed of compression actually is important (not just to those who can't
wait). If things don't compress fast enough you can get overruns, data
loss, and ultimately data corruption (even if that just means missing
bits/bytes/frames...).
One of my past pet projects was zlib compression of SDI (see
StreamBed's deflate option). I've found that I can gzip at 270,000,000
bps (that's 270 Mbps in SI notation; the small b means bits, of
course). The problem is that it either needs a fast processor or a
simple pattern (like colour bars). It may even need both. I haven't had
the time to check. Unfortunately, without the inflate option in
StreamBed, customers aren't too interested yet, and without customer
interest my boss isn't too interested yet either.
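For anyone wanting a rough version of that speed measurement, here's a
sketch using /dev/zero as a stand-in for a simple pattern like colour
bars (real SDI captures would behave differently):

```shell
# Rough gzip throughput check on highly repetitive input. 64 MiB is
# 536,870,912 bits; dividing by the reported elapsed seconds gives bps,
# comparable to a figure like 270 Mbps for simple patterns.
time sh -c 'dd if=/dev/zero bs=65536 count=1024 2>/dev/null |
            gzip -1 > /dev/null'
```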
I later plan to talk about:

- compression of MPEG transport streams (they're already MPEG
compressed, but the tables are text, and there's that predictable 0x47
once per packet, i.e. every 188 or 204 bytes).
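That once-per-packet 0x47 is also handy as a sanity check on captured
data. A sketch in shell, using a fabricated stream (a real capture file
would be checked the same way):

```shell
# A valid 188-byte-packet transport stream has the 0x47 sync byte at
# every multiple of 188. Fabricate a 4-packet stream (sync byte plus
# 187 zero bytes per packet), then read back each packet's first byte.
for i in 1 2 3 4; do
  printf '\107'                        # 0x47 in octal
  dd if=/dev/zero bs=187 count=1 2>/dev/null
done > stream.ts

for off in 0 1 2 3; do
  dd if=stream.ts bs=188 skip=$off count=1 2>/dev/null | od -An -tx1 -N1
done
```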