takes several hours to decode a few minutes of audio. of course, there is an optimized version for x86 whose performance is close to that of the PS3. however, my points here are two: first, the importance of optimizations; second, i'm quite sure the ARM CPUs in the ODROID-X will need at least hours, if not days, to handle multi-channel DST decoding, which would show their real performance for multimedia applications. so, as a suggestion: build the reference DST decoder and run it on the ODROID-X, optimize it as much as you like for the ODROID-X hardware, and see the results.

And you are so sure because you tried to implement DST on ARM yourself, or because you have a reference proving you're not just making unsubstantiated claims? :rolleyes:

08-25-2012, 02:54 PM

const

Quote:

Originally Posted by ldesnogu

And you are so sure because you tried to implement DST on ARM yourself, or because you have a reference proving you're not just making unsubstantiated claims? :rolleyes:

i've tried it on a high-end x86 CPU, as well as on the PS3, and i already gave the results:

* PS3: significantly faster than real-time, i.e. a few minutes of multichannel DST data are decoded in less than those few minutes; of course that's with its SPE-optimized decoder.

* high-end x86 CPU (reference decoder): a few minutes of multichannel DST data are decoded in several hours, yes, several hours. so the PS3 is hundreds if not thousands of times faster

so, unless you claim that there are ARMs with more computational power than a high-end x86 CPU, ARM performance with the reference decoder will be worse than x86. in fact i could test it, since i have several Samsung and Texas Instruments ARMs (probably among the fastest those 2 companies make), but i see it as pointless. once again, optimizing the reference decoder for x86 CPUs and multithreading it gives a big performance bump and brings a high-end x86 CPU closer to the PS3, but then there is the price: such an x86 system costs several times as much as a PS3. so, the bottom line is that ARM will be 3rd for sure, and a real test would only show by how big a margin it trails the 2nd; i believe it would be very big.
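
To put rough numbers on the comparison above, here is a minimal sketch of the real-time-factor arithmetic. The durations are illustrative assumptions only (the thread says "a few minutes" and "several hours", not exact figures), and `realtime_factor` and `speedup` are hypothetical helper names.

```python
def realtime_factor(audio_seconds: float, decode_seconds: float) -> float:
    """Decode time divided by audio duration; below 1.0 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return decode_seconds / audio_seconds


def speedup(slow_decode_seconds: float, fast_decode_seconds: float) -> float:
    """How many times faster the fast decoder is on the same input."""
    if fast_decode_seconds <= 0:
        raise ValueError("fast_decode_seconds must be positive")
    return slow_decode_seconds / fast_decode_seconds


# Assumed example figures, NOT measurements from the thread:
AUDIO_S = 4 * 60        # a 4-minute multichannel DST track
REFERENCE_S = 3 * 3600  # reference decoder: 3 hours
OPTIMIZED_S = 2 * 60    # SPE-optimized decoder: 2 minutes

if __name__ == "__main__":
    print(realtime_factor(AUDIO_S, REFERENCE_S))  # 45.0, i.e. 45x slower than real time
    print(realtime_factor(AUDIO_S, OPTIMIZED_S))  # 0.5, i.e. twice as fast as real time
    print(speedup(REFERENCE_S, OPTIMIZED_S))      # 90.0
```

With these assumed figures the gap is roughly two orders of magnitude; a ratio in the hundreds or thousands would require a longer reference run or a faster optimized one, which only actual timings can settle.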

08-25-2012, 06:28 PM

elg2001

const, the fact that you keep saying "high-end x86 cpu" instead of the actual model number makes me doubt your claims. furthermore, your comparison of ARM and Cell is pretty off-base because you're comparing the Cell running the code it's best at to an ARM CPU which is optimized for power efficiency; not a fair comparison at all. show overall benchmarks across media, content creation, 3d rendering, power efficiency, etc. and you'll have a better idea.

I have a PS3 and it has the worst freakin I/O speeds I've ever seen. Even with an SSD. It is tremendously deficient at mundane tasks like reading data from a hard drive. Overall, it's a shitty processor in my eyes, but for gaming - assuming you write a game that can be in RAM most of the time - it's absolutely amazing. God of War 3 is better looking than any PC game I've ever seen, including ones running on quad SLI etc. Crysis 2, Unreal Engine 3, Crysis 1, Metro 2033, etc. look like shit in comparison, and that's running on hardware that's AT LEAST 10x faster GPU-wise. Sony Santa Monica is the studio behind it, and they're an amazing example of what can be done with super-optimized C code.

08-25-2012, 09:17 PM

ldesnogu

The issue is that reference code is usually extremely badly optimized, as it serves to show how an algorithm works. I have first-hand experience with the reference code for AES; it's used in one of the EEMBC benchmarks and was giving very bad performance on ARM CPUs due to the use of integer divisions and modulos that were not needed. Once I took care of the inefficiencies the code was 20 times faster.

I don't know the ref DST decoder but I guess it's not been written with speed in mind. So using it as the basis to know whether a CPU can decode DST looks like a very bad idea. Reference code should never be used as is, unless one has an agenda :)
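
As a generic illustration of the kind of change described above (the actual AES reference code and the fixes applied to it are not shown in this thread), here is a minimal sketch of replacing a modulo and a division by a power of two with a mask and a shift. The function names are hypothetical. It is written in Python for readability; the speed benefit applies to compiled code on cores where integer division is slow or done in software, not to Python itself.

```python
def wrap_index_mod(i: int, size: int) -> int:
    """Straightforward version: wrap an index with the modulo operator."""
    return i % size


def wrap_index_mask(i: int, size: int) -> int:
    """Same result without a division; valid only when size is a power of two."""
    if size <= 0 or (size & (size - 1)) != 0:
        raise ValueError("size must be a power of two")
    return i & (size - 1)


def div_pow2(i: int, shift: int) -> int:
    """Divide a non-negative integer by 2**shift using a right shift."""
    return i >> shift


if __name__ == "__main__":
    # Check that the cheap forms agree with the straightforward ones.
    for i in range(64):
        assert wrap_index_mod(i, 4) == wrap_index_mask(i, 4)
        assert i // 8 == div_pow2(i, 3)
    print("mask and shift versions match")
```

The rewrite only works when the divisor is a known power of two, which is why it is a manual fix: reference code often computes the divisor at run time, so the compiler cannot make the substitution on its own.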

Overall, I'm expecting ARM computers to really come into their own when they hit 8+ A15 cores with at least quad-channel memory controllers (since individual memory controllers tend to be stuck at a 32-bit width). ARM is a great architecture; I can't wait until it can actually challenge x86 performance-wise. I hear that MIPS is an even more efficient architecture (but I don't know for sure). The Chinese are putting a lot of weight behind MIPS and Alpha for their "new" CPU architectures, so I would expect a lot of progress in those fields. The Chinese will probably be able to give US microprocessor design companies a run for their money in about 10 years (once they get fab issues worked out and confidence in their products, mainly). I welcome our new Chinese overlords. :cool:

Guys, no offense, you say you doubt my claims, but at the same time you don't distinguish between "DTS", which is "Digital Theater Systems", and "DST", which is "Direct Stream Transfer". so what should i think about you, your level of expertise and your claims? let me put it briefly again: when the ODROID-X can give the same real-life gaming and multimedia experience as the PS3, then i will believe that benchmark and the nonsense PS3 performance numbers in it. i myself have a small 4-node PS3 cluster at the university and would be very happy to replace it with ODROID-X boards, but ARM won't reach the computational power the PS3 offers via its SPE cores any time soon. also, you accept and talk about how important optimizations are, and at the same time you treat a benchmark made with generic, non-optimized code for the PS3 as a good benchmark showing ARM is faster. that's why, as the reciprocal of that way of benchmarking, i suggested testing "Direct Stream Transfer" decoding on the ODROID-X with the reference code, and with optimized code too if you wish. i'm sure it can't reach PS3 performance. and every PS3 has basically the same CPU, in case you're about to ask which PS3 CPU i used to measure the performance.

08-27-2012, 10:52 AM

notzed

it's not that old ...

... that everybody should have forgotten by now. Then again, it is phoronix.

PPE is a single core with dual-threading. And the PPE just isn't very fast, but it wasn't designed to be. It's only there to route data to/from the SPUs and to handle I/O. The SPUs are so fast, and have some very useful multi-cpu features that there's little point in doing any heavy lifting on a PPE.

PS3 Linux has 6 SPEs available for free use by applications, but they need to be coded specially for it (OpenCL should work rather well with appropriate code, although Sony killed Linux before that was around). The hypervisor only adds some overhead to I/O and the disk is slow to start with - but it doesn't affect computational performance.

A benchmark of Linux is one thing, but it isn't doing much to compare the capabilities of the hardware. The 6 SPUs available to Linux can easily run code an order of magnitude faster than the PPE (each SPU is faster than 1 PPU to start with, even with scalar code) and, with some effort, more like two orders. This is because they have isolated memory (== all isolated, dedicated cache), a ton of registers, SIMD, and some fast multi-core synchronisation hardware. They're also most of the chip, so not using them is just a silly comparison.

One can get some pretty nice performance out of ARM + NEON (but not using a C compiler either!), but it's just not in the same league. And when the whole system picture is taken into account (RAM speed, the EIB, the atomic unit, etc.), it's another few leagues away.

But it's all pretty academic, since the PS3 was a dead platform the day Sony intentionally killed it to save money.

08-27-2012, 11:30 AM

ldesnogu

Quote:

Originally Posted by const

Guys, no offense, you say you doubt my claims, but at the same time you don't distinguish between "DTS", which is "Digital Theater Systems", and "DST", which is "Direct Stream Transfer". so what should i think about you, your level of expertise and your claims?

Silly me :o

Quote:

let me put it briefly again: when the ODROID-X can give the same real-life gaming and multimedia experience as the PS3, then i will believe that benchmark and the nonsense PS3 performance numbers in it. i myself have a small 4-node PS3 cluster at the university and would be very happy to replace it with ODROID-X boards, but ARM won't reach the computational power the PS3 offers via its SPE cores any time soon. also, you accept and talk about how important optimizations are, and at the same time you treat a benchmark made with generic, non-optimized code for the PS3 as a good benchmark showing ARM is faster. that's why, as the reciprocal of that way of benchmarking, i suggested testing "Direct Stream Transfer" decoding on the ODROID-X with the reference code, and with optimized code too if you wish. i'm sure it can't reach PS3 performance. and every PS3 has basically the same CPU, in case you're about to ask which PS3 CPU i used to measure the performance.

Yes, the PS3's Cell is probably faster if properly used. I'm not saying ARM is faster, and I agree the original benchmarks are stupid. The problem is that Cell development cost is extremely high.

OTOH your original claim of DST decoding taking days on ODROID-X is stupid too and obviously not based on first-hand experience, and that is what I was pointing out.

If you want, we can all use the reference decoder, not touch anything in it, and see how it performs. Too bad it won't use the SPEs: its performance will be extremely bad and you won't have proven anything.

08-27-2012, 11:42 AM

elg2001

Quote:

Originally Posted by notzed

... that everybody should have forgotten by now. Then again, it is phoronix.

PPE is a single core with dual-threading. And the PPE just isn't very fast, but it wasn't designed to be. It's only there to route data to/from the SPUs and to handle I/O. The SPUs are so fast, and have some very useful multi-cpu features that there's little point in doing any heavy lifting on a PPE.

PS3 Linux has 6 SPEs available for free use by applications, but they need to be coded specially for it (OpenCL should work rather well with appropriate code, although Sony killed Linux before that was around). The hypervisor only adds some overhead to I/O and the disk is slow to start with - but it doesn't affect computational performance.

A benchmark of Linux is one thing, but it isn't doing much to compare the capabilities of the hardware. The 6 SPUs available to Linux can easily run code an order of magnitude faster than the PPE (each SPU is faster than 1 PPU to start with, even with scalar code) and, with some effort, more like two orders. This is because they have isolated memory (== all isolated, dedicated cache), a ton of registers, SIMD, and some fast multi-core synchronisation hardware. They're also most of the chip, so not using them is just a silly comparison.

One can get some pretty nice performance out of ARM + NEON (but not using a C compiler either!), but it's just not in the same league. And when the whole system picture is taken into account (RAM speed, the EIB, the atomic unit, etc.), it's another few leagues away.

But it's all pretty academic, since the PS3 was a dead platform the day Sony intentionally killed it to save money.

I'd argue it was a dead platform when they announced it was "599 US dollars". I was a huge PS3 hater at first due to it being mostly a shitshow, but I tried to keep an open mind. My opinion changed drastically when I saw the E3 conference that announced PlayStation Home and LittleBigPlanet (both of which ended up being horrible). They have some absolutely phenomenal developers (God of War 3 is the most impressive piece of software on any platform that I've seen to date), but Linux should have remained a supported platform if only to facilitate algorithm and open engine development, thus more devs, thus more games, thus more money. Their bean counters aren't very smart if you ask me.

I have no idea how Sony ended up spending $3 billion on developing one console.

Does anyone know why Sony chose QNX as the base platform for the PS3 instead of Linux? I hear it's a really good realtime OS, but I'm sure Linux is too (plus most of their platform work would be done for them for free).