In article <1peuciINNicd at s-crim1.dl.ac.uk> mbpcr at s-crim1.dl.ac.uk (A. Parsons) writes:
>>This sounds very impressive performance, and I don't want to sound facetious
>but HOW much would a network of 96 Un*x workstations cost? And more to the
>point - how much floor space would they take up!
It depends. No doubt there are many workstations hiding in people's offices
with lots of spare cycles available, so distributed processing of this
kind sounds like a sensible use of them. On the other hand......
>>Also - my understanding is that when it comes to symmetric (or even asymmetric)
>multiprocessing that performance starts to degrade after a certain number of
>processors are added due to interprocessor communication and synchronisation.
Yes. I wrote a full Smith-Waterman for the transputer surface at the
University of Kent while I was a student, and performance did tail off, but
it depends very much on how the communications are dealt with and how much
data you shunt around. When I did it I was rather silly, creating full
alignments on the individual processors and shipping those back to the
farming processor. Obviously it is now possible to just do the comparison,
pass the score and coordinates back to the farmer for ranking, and then
reconstruct only as many alignments as are required, but I was young and
naive way back in those days :-) Even so, given that scheme I ran on 86 T800s and still
achieved over 90% processor utilisation, so for a pipeline that long the
overhead was about 10%. Now I could run a pipeline well into the hundreds
with negligible tailing off of efficiency. But transputers are designed
for that sort of parallelism. Writing code on workstations to handle all
the communications etc. is much harder than writing a true occam parallel
program as I did, so hats off to those who do such things. Me, I stick to
true parallel machines since they are easier to program.
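To make the score-and-coordinates scheme above concrete, here is a minimal
sketch in Python (not the original occam, which is not shown here).
Everything named in it is an assumption for illustration: the functions
sw_score, worker and farm, the linear gap penalty, and the match, mismatch
and gap values. Each worker returns only a small tuple (score, name, end
coordinates) and the farmer ranks them. The full alignment would be rebuilt
afterwards only for the top few hits.

```python
from functools import partial
from heapq import nlargest


def sw_score(query, target, match=2, mismatch=-1, gap=-2):
    """Score-only local alignment (Smith-Waterman with a linear gap penalty).

    Returns (best_score, query_end, target_end) using 1-based end
    coordinates. Only two rows of the matrix are kept and no traceback
    is stored, so the result shipped back to the farmer stays tiny.
    The scoring values are illustrative assumptions.
    """
    prev = [0] * (len(target) + 1)
    best = (0, 0, 0)
    for i, q in enumerate(query, start=1):
        curr = [0] * (len(target) + 1)
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr[j] = score
            if score > best[0]:
                best = (score, i, j)
        prev = curr
    return best


def worker(query, entry):
    """One unit of farmed work: compare the query with one database entry."""
    name, target = entry
    score, query_end, target_end = sw_score(query, target)
    return score, name, query_end, target_end


def farm(query, database, top_n=3, map_fn=map):
    """Farmer: hand out entries, collect small tuples, rank by score.

    map_fn is the hook for parallelism; the built-in map runs serially.
    """
    results = map_fn(partial(worker, query), database.items())
    return nlargest(top_n, results)


if __name__ == "__main__":
    db = {"a": "ACGT", "b": "TTTT", "c": "GGACGG"}
    for hit in farm("ACGT", db, top_n=2):
        print(hit)
```

Passing something like multiprocessing.Pool.map as map_fn should spread the
comparisons over local processes, since worker is a top-level function and
the results are plain tuples. The built-in map keeps the sketch serial.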
>>I suppose what I am saying is - if more is better - and parallelism is the
>preferred paradigm - then massively parallel is surely the ultimate solution??
Hmmmm, parallelism is the only foreseeable way ahead with present technology,
since the limits on single-processor performance will force the
move if we are to stick to silicon-based processors. As for massively
parallel (i.e. SIMD) versus MIMD, well, it's a matter of personal preference.
SIMD machines are a doddle to program (mind you, so are MIMD machines, so
don't let that put anyone off) but the style of programming differs greatly
between the two. I would avoid saying ultimate about anything, since
something can only be ultimate if the world ends immediately after the
statement. Anything else is just plain hype.
>>I heard Donald Lindberg give a lecture at the Royal Society in London on Monday
>and this was very much the thinking behind his talk so presumably this is the
>preferred route that the NCBI/NLM are going to take?
>>The question I originally asked also had the caveat (which no one to date has
>commented on) "How much longer can we do WITHOUT data parallel solutions for
>searching the masses of data being generated by the HGMP?"
One thing that looks like being rather fun is the ability, given very fast
machines, to go beyond the Smith-Waterman algorithm and do more work to
improve the sensitivity of the program. Of course, you need to get the
Smith-Waterman fast enough before you can claim to have time to spare to
deal with such things. I don't see that there is much point in mucking
about with heuristic algorithms. They may be sensitive, but given the
advances in hardware they will be pretty much pointless when it takes a
matter of seconds to do the full SW (or more) algorithm on parallel machines.
>>As this thread is starting to die down I will summarise all responses soon.
I look forward to reading them.
--
Shane Sturrock, Biocomputing Research Unit, Darwin Building, Mayfield Road,
University of Edinburgh, Scotland, Commonwealth of Independent Kingdoms. :-)
Civilisation is a Haggis Supper with salt and sauce and a bottle of Irn Bru.