Not necessarily; it just means it's using a code path in GL4 that benefits performance.

For a simple example: if you emulate tessellation in shaders when using GL3 and use the actual dedicated tessellation hardware when GL4 is present, then just switching to GL4 lets you do the same thing hardware accelerated [you can't do that without GL4 because the silicon doesn't exist in previous hardware, not because it's GL4]. Another example: GL4-class hardware can support fp64 [at least some of it, not sure if all], so your shader compiler can pass fp64 data in 1 cycle instead of the 2 fp32 values [1 per cycle, theoretically] needed on GL3-class hardware, which gives a nice speedup in your shaders, etc. [<-- it's more complex than this, but you get the idea]
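To make the "same feature, different code path" idea concrete, here is a minimal sketch of how an engine might pick a tessellation path from the GL version. The names `RenderPath`, `choose_tessellation_path` and `describe` are made up for illustration; in a real program the version would come from the driver (for example via `glGetIntegerv(GL_MAJOR_VERSION, ...)`), here it is just a parameter.

```cpp
#include <string>

// Hypothetical names, for illustration only.
enum class RenderPath { ShaderEmulatedTessellation, HardwareTessellation };

// Pick a tessellation path from the context's major GL version.
// Assumption: tessellation hardware is only usable through a GL 4.x context.
RenderPath choose_tessellation_path(int gl_major_version) {
    return gl_major_version >= 4 ? RenderPath::HardwareTessellation
                                 : RenderPath::ShaderEmulatedTessellation;
}

// Human-readable label for logging which path was selected.
std::string describe(RenderPath path) {
    return path == RenderPath::HardwareTessellation
               ? "hardware tessellation (GL4 path)"
               : "tessellation emulated in shaders (GL3 path)";
}
```

The rest of the renderer only sees `RenderPath`, so the GL4 speedup comes from the hardware behind the chosen path, not from the version number itself.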

My hope is that Michael is doing it to annoy you, so that you move to another forum. I hear that the anandtech forums are full of like-minded individuals. You could go there. Let me help you. Here's a link. See you around.

F

And I hope he will also do Fedora vs Windows vs OS X, and not just Ubuntu, because it's very annoying that he does it only on Ubuntu.

And how do you connect that to AnandTech? I just said that if you create benchmarks, at least do them right all the way.

1.) No, in the case of DX it's quite different [most drivers], and in the OpenGL case you have many variants with specific custom extensions [AGL, WGL, GLX, among others], and in the case of WGL, depending on the driver, it's emulated over DX. So it takes some analysis depending on the source and destination of the port [this should not happen, but it is like that for many reasons].

I can't think of a driver offhand that implements OGL through DX [certainly not NVIDIA, AMD, or Intel].

2.) No, data [API calls, but whatever] from a game is not OS independent at all [maybe textures], nor is it tool independent, nor is it independent of the OS graphics stack. You somehow assume OpenGL is a language or some sort of proto-IR language, but in the real world it's a library that is amazingly flexible and is used in conjunction with many languages [mostly C++/ASM (x86/ARM)], and the driver needs to be very smart to know what it can and can't do [or emulate] on each OS. Even if it's true that every OS can manage the hardware, they do it extremely differently [1:1 translation, yeah, in movies <-- ID4 comes to mind]; sometimes that helps, sometimes it forces you to rethink a million lines of code.

Again, let's simplify: Assuming an application coded in straight C/C++ [or any other easily portable language], only the OS native libraries are going to be significantly changed from one host to another. The issue comes when no direct functional replacement is available, in which case you have to either create a suitable replacement algorithm, or significantly alter code. In that case, a driver written against one OS may be significantly degraded in features/performance on another OS, even if the code base is shared. But aside from those sections of code, there should be VERY little that needs to be functionally changed within a program to make it work on any given OS.

Additionally, you somehow assume every GL API call is a direct GPU ASM function. No: OpenGL is hardware agnostic [which makes it more complex, though], so you can use CPUs/GPUs/preprocessors/clusters/etc., and many of those are not possible on Windows while on Linux they are perfectly standard [not in the OSS drivers for now, though]. You also wrongly assume that every OS API call does the same thing, is called the same, and performs the same, none of which is true [it's like saying an F-35 and a helicopter should be the same because both fly] [there are many good sites that explain this in depth and in an easy way, google it]. Even an algorithm that is efficient on Windows can be terribly slow on Linux/Mac compared to a modified algorithm using the native tech of that OS [many, many examples of this | Google is your friend], and this mostly drives you to rewrite half of your GLSL interpreter to try to find a midpoint between OSes.

Again: Let's simplify. I want to add two numbers together. One OS has an API call titled 'AddTwoNumbers(in int, in int, out int)', and another has an API call titled 'Add(in int, in int, out int)'. Now, the first may be implemented by the host OS as vector addition. The other may handle this via bit shifts. I, as the application developer, don't care. All I require is some function that takes two integers, adds them, and returns the result. I don't care HOW this is accomplished.

If there is some GL function, I don't care how that function is implemented on Windows or Linux; as long as the input and output parameters are the same, the OS implementation doesn't matter [unless it doesn't work, but that's its own issue].

Again: Implementation is separate. As a developer, I don't care. I'm giving you inputs, and I expect a certain output. How you do that is up to you.
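The 'AddTwoNumbers' / 'Add' argument can be sketched roughly as below. Everything here is hypothetical: `os_a::AddTwoNumbers` and `os_b::Add` stand in for the two imaginary OS calls from the post (neither exists in any real OS), `USE_OS_B` is a made-up build flag, and `add` is the thin portable wrapper the application would actually call.

```cpp
// Stand-ins for the two imaginary OS APIs from the post; not real OS calls.
namespace os_a {
// "OS A" adds directly.
void AddTwoNumbers(int a, int b, int& out) { out = a + b; }
}  // namespace os_a

namespace os_b {
// "OS B" adds with bitwise operations and shifts (carry propagation).
void Add(int a, int b, int& out) {
    unsigned x = static_cast<unsigned>(a);
    unsigned y = static_cast<unsigned>(b);
    while (y != 0) {
        unsigned carry = (x & y) << 1;  // bits that overflow into the next position
        x ^= y;                         // sum without the carries
        y = carry;                      // repeat until no carry is left
    }
    out = static_cast<int>(x);
}
}  // namespace os_b

// Portable wrapper: the application only ever sees this signature.
// USE_OS_B is a hypothetical build flag selecting the backend.
int add(int a, int b) {
    int result = 0;
#if defined(USE_OS_B)
    os_b::Add(a, b, result);
#else
    os_a::AddTwoNumbers(a, b, result);
#endif
    return result;
}
```

From the caller's side both backends behave the same; whether they also perform the same is the part the two posters disagree on.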

And just to name a few more factors that make this ridiculous, and that would force you to rethink most of that code to meet a performance expectation: filesystem, CPU scheduler, vectorization, I/O subsystem, latency, memory handling, interrupt handling, and OS flexibility [Windows pretty much allows any dirty hack you can think of, where Linux aborts compilation or SIGSEGVs your ass out], and many more.

Again, ALL THIS IS INVISIBLE TO ME. I DON'T CARE!

3.) Please explain this suspend-thread thing to me [why you think it's so important], because you have like 6 posts getting IANAL about it, and after 10 years of developing threaded apps [C++] for Linux I have never found a technical reason to suspend threads in efficient code [I always design my apps to be thread safe/small portions/atomic types/etc.], and in my Windows time I don't remember using them either. So I would like an example or something to get your point here.

1.) Well, not through the developer-exposed API but more at the subsystem level, but you don't care [I can't find the article, but in the Vista days there was discussion of using the DX infrastructure to provide WGL; not sure about these days].
2.) Well, maybe and maybe not; it's very dependent. What you say is 80% true [I have actually ported Win32 code to Linux and it's never that dreamy], for example:

3.) Well, it's a matter of taste. As a developer I care a freaking lot about how the underlying code works, but to me performance and security are religion, and my clients accept slightly longer delivery terms, so I can get really picky and even throw away code that works but doesn't make me happy. Back on topic: with complex functions it's not that dreamy either. But OK, you don't care if it takes 5 seconds or 1 second as long as it returns what you want. Not my style, but OK.

4.) If you post in a thread that shared code is the Valhalla of drivers, and cool, and saves kittens, I expect you to understand that in practice this is horrible to achieve, and that things like unreadability, type-aliasing hell, and horrible wrappers are the norm, because it can't be done another way [maybe with exokernels, but that is science fiction for now]. Since you posted in a driver thread, I tried to make you realize that drivers are 60% hellish low-level kernel-dependent code [I/O, memory, vectors, cache, security, etc.], 20% GLSL compiler, 10% OpenGL, and 10% application profiling/multi-GPU/other goodies. At this level your analogy of little C++ functions doesn't work, since the paradigm in every subsystem is radically different between kernels [API, names, types, parameters, compiler extensions, arguments, etc.]. For 99% of the code there are no functions that do kernel function X on both, not even remotely, and they are absolutely incompatible with each other, so you need wrappers/aliasing/complex high-level interpreters/etc. to be able to manage the GPU from both OSes, and this is per architecture.

Sure, NVIDIA and AMD did this years ago when drivers were much simpler, but even today AMD and NVIDIA devs admit that the complexity and unreadability of the code are insane and that performance tuning is hellish work. But it's cheaper to maintain this mess than to rewrite a driver from scratch, and in the case of NVIDIA they can stay as closed as they feel is right.

5.) If you don't care about [or understand] the advantages and disadvantages of shared drivers versus native drivers, and just come to state [or bash] that shared drivers are awesome [maybe because your NVIDIA is faster with the blob for now??], and state that Intel should do the same because it's easy, but you don't care how because you are a developer, what type of answer were you expecting?? I mean, for real??

6.) That is some serious ugliness of threaded code, but of course it's the Win32 MFC threading API [bleeding pain]. I can tell you for sure this has never happened to me on Linux using POSIX threads, nor using POSIX threads with prefetch vector code, nor using POSIX threads with hand-optimized CPU affinity code and cache prefetch vectors, EVER, nor using OpenMP 2/3, nor using the Qt threading model. So maybe this is or was a Windows-only issue, or maybe GCC takes care of it silently [I'd have to read the optimization passes in g++ to be sure]. Either way, if you have a piece of POSIX code that exposes this behaviour without suspend, you should report it to the glibc Bugzilla, since they can be more helpful or provide an alternative approach.

Windows 7 did perform the best overall, but I wouldn't say it destroyed Linux. Overall I saw a pretty healthy balance between all 3 OSes, if you choose the highest-performing Ubuntu tests. Each OS had its strengths, which I personally found weird considering how many of these games are based on the same engine.

Thanks for the sanity check. I was wondering if an RDF was about
Windows didn't win every test, and average margin of victory wouldn't put it in the "destroyed" range, IMHO.

It's not quite that bad anymore; for one, all variants of CHAR are functionally obsolete on Windows. TCHAR is the preferred character type, since it compiles to either ANSI or Unicode depending on a compile-time flag, solving THAT old problem. But yeah, if you use Windows types (which implies using the Win API), then there's more stuff to replace during a port.
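For anyone who hasn't seen how the TCHAR trick works, here is a rough portable imitation. `my_tchar`, `MY_TEXT`, `my_tstring` and `greeting` are made-up stand-ins so the sketch builds anywhere; the real TCHAR and its text macros come from the Windows headers and are not used here.

```cpp
#include <string>

// Illustrative stand-ins for TCHAR and its text macro; the names are invented.
// The character type is chosen at compile time by whether UNICODE is defined.
#ifdef UNICODE
using my_tchar = wchar_t;
#define MY_TEXT(s) L##s   // wide string literal
#else
using my_tchar = char;
#define MY_TEXT(s) s      // narrow string literal
#endif

// A string type that follows the same compile-time choice.
using my_tstring = std::basic_string<my_tchar>;

// The same source line compiles to a narrow or a wide string.
my_tstring greeting() { return MY_TEXT("hello"); }
```

The source stays the same in both builds, which is also why code written this way still has to be touched during a port: the type and macro only exist on the platform that defines them.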

3.) Well, it's a matter of taste. As a developer I care a freaking lot about how the underlying code works, but to me performance and security are religion, and my clients accept slightly longer delivery terms, so I can get really picky and even throw away code that works but doesn't make me happy. Back on topic: with complex functions it's not that dreamy either. But OK, you don't care if it takes 5 seconds or 1 second as long as it returns what you want. Not my style, but OK.

Barring some really strange under-the-hood issue, you aren't going to see significant execution time differences based on how the OS operates, assuming your code is written in such a way as to ensure no race conditions, proper thread management, and the like. About 95% of the time, when I DO see performance drops in a port, it's due to some fundamental flaw in the design that happened to have a much lower impact on a Windows OS. [That's why having a good compiler debug environment is a must these days.]

4.) If you post in a thread that shared code is the Valhalla of drivers, and cool, and saves kittens, I expect you to understand that in practice this is horrible to achieve, and that things like unreadability, type-aliasing hell, and horrible wrappers are the norm, because it can't be done another way [maybe with exokernels, but that is science fiction for now]. Since you posted in a driver thread, I tried to make you realize that drivers are 60% hellish low-level kernel-dependent code [I/O, memory, vectors, cache, security, etc.], 20% GLSL compiler, 10% OpenGL, and 10% application profiling/multi-GPU/other goodies. At this level your analogy of little C++ functions doesn't work, since the paradigm in every subsystem is radically different between kernels [API, names, types, parameters, compiler extensions, arguments, etc.]. For 99% of the code there are no functions that do kernel function X on both, not even remotely, and they are absolutely incompatible with each other, so you need wrappers/aliasing/complex high-level interpreters/etc. to be able to manage the GPU from both OSes, and this is per architecture.

Stop, take a deep breath, and READ. I DON'T CARE HOW THE KERNEL IMPLEMENTS SOME FUNCTION ON SOME OS.

I need to talk to some piece of HW. I invoke some function from the Kernel driver, which executes the function, and returns the result.

That's it. The fact that the kernel is 60% ASM, 35% other low-level programming, and so on, I don't care. That's implementation. I only care if performance significantly degrades WITHIN the driver itself.

Sure, NVIDIA and AMD did this years ago when drivers were much simpler, but even today AMD and NVIDIA devs admit that the complexity and unreadability of the code are insane and that performance tuning is hellish work. But it's cheaper to maintain this mess than to rewrite a driver from scratch, and in the case of NVIDIA they can stay as closed as they feel is right.

And it's their right. They also have what's probably the highest-performance driver on Linux, so I can't help but laugh when people complain about the driver not being open source.

5.) If you don't care about [or understand] the advantages and disadvantages of shared drivers versus native drivers, and just come to state [or bash] that shared drivers are awesome [maybe because your NVIDIA is faster with the blob for now??], and state that Intel should do the same because it's easy, but you don't care how because you are a developer, what type of answer were you expecting?? I mean, for real??

OK, so the driver is slightly larger because it contains code for more than one OS. That's about the only downside to shared drivers. Sometimes, a slight performance hit is worth it for ease of development (especially if such ease of development leads to a better architecture, leading to increased performance).

6.) That is some serious ugliness of threaded code, but of course it's the Win32 MFC threading API [bleeding pain]. I can tell you for sure this has never happened to me on Linux using POSIX threads, nor using POSIX threads with prefetch vector code, nor using POSIX threads with hand-optimized CPU affinity code and cache prefetch vectors, EVER, nor using OpenMP 2/3, nor using the Qt threading model. So maybe this is or was a Windows-only issue, or maybe GCC takes care of it silently [I'd have to read the optimization passes in g++ to be sure]. Either way, if you have a piece of POSIX code that exposes this behaviour without suspend, you should report it to the glibc Bugzilla, since they can be more helpful or provide an alternative approach.

Hardcoded CPU affinity? You are joking, right? That's a big no-no. Call me when some "smart" developer decides to do the same thing with his high-priority driver, in which case I hope you're using a low-latency scheduler, otherwise your thread might be waiting a bit to actually run. CPU affinity is the realm of the OS scheduler, and any attempts to be smarter than the OS are almost certainly bound to fail.

As for MFC threads, _BeginThread (and variants) are essentially just wrappers around the CreateThread API, just with the MFC message maps already handled. MFC, headaches aside, is a VERY powerful API. Though those headaches are the primary reason a lot of us have moved to C#. (Yes, inefficient as hell, but great for non-performance demanding applications.)

And I'm not the first to recommend a way to start a pthread in a suspended state, but there seems to be a reluctance to expand the standard in this way. (I've yet to hear a TECHNICAL argument against it though). Instead, you have to have the first instruction of every thread you create have it suspend, which slowly eats performance (thread starts, then suspends, eating up its timeslice).
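The workaround described above (the thread starts, then immediately parks itself until released) looks roughly like this with standard C++ threads. `GatedThread` is a hypothetical helper written for this post, not part of pthreads or the standard library, and it emulates a suspended start with a condition variable rather than a true suspended-creation flag.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper: the thread is created right away, but its task
// does not run until resume() is called.
class GatedThread {
public:
    explicit GatedThread(std::function<void()> task)
        : worker_([this, task = std::move(task)] {
              // First thing the new thread does: block on the gate.
              std::unique_lock<std::mutex> lock(mutex_);
              released_.wait(lock, [this] { return go_; });
              lock.unlock();
              task();
          }) {}

    // Open the gate so the task can run.
    void resume() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            go_ = true;
        }
        released_.notify_one();
    }

    void join() { worker_.join(); }

    // Sketch-level cleanup: a thread that was never resumed is released
    // and joined here, so its task runs during destruction.
    ~GatedThread() {
        if (worker_.joinable()) {
            resume();
            worker_.join();
        }
    }

private:
    // Declared before worker_ so they exist before the thread starts.
    std::mutex mutex_;
    std::condition_variable released_;
    bool go_ = false;
    std::thread worker_;
};
```

Note that the thread still gets scheduled once just to go to sleep on the gate, which is exactly the startup cost the post is complaining about.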

Hardcoded CPU affinity? You are joking, right? That's a big no-no. Call me when some "smart" developer decides to do the same thing with his high-priority driver, in which case I hope you're using a low-latency scheduler, otherwise your thread might be waiting a bit to actually run. CPU affinity is the realm of the OS scheduler, and any attempts to be smarter than the OS are almost certainly bound to fail.

Well, CPU schedulers are designed to be fair, not almighty, and handling CPU affinity manually [and playing a bit with nice] can reduce cache misses or boost very heavy SIMD loops [either in C/C++ or in ASM]. It's very surgical, but it's not uncommon. For this to work you need atomic algorithms and atomic types, though; if you're used to threads that depend on other threads, this can get really error prone.
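For reference, manual affinity on Linux can look something like the sketch below. It assumes Linux with glibc (`sched_getaffinity`, `sched_setaffinity` and the `CPU_*` macros from `<sched.h>`); the function name `pin_current_thread_to_first_allowed_cpu` is made up, and picking the first allowed CPU is only for demonstration, since a real HPC job would choose cores based on its cache and NUMA layout.

```cpp
#include <sched.h>  // sched_getaffinity, sched_setaffinity, CPU_* macros (Linux/glibc)

// Pin the calling thread to the first CPU it is currently allowed to run on.
// Returns the chosen CPU index, or -1 on failure.
// Hypothetical helper for illustration; pid 0 means "the calling thread".
int pin_current_thread_to_first_allowed_cpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) {
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t only;
            CPU_ZERO(&only);
            CPU_SET(cpu, &only);  // mask containing just this one CPU
            return sched_setaffinity(0, sizeof(only), &only) == 0 ? cpu : -1;
        }
    }
    return -1;
}
```

Starting from the current mask instead of hardcoding CPU 0 keeps the sketch working inside containers or cpusets where some cores are off limits.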

I personally don't consider it good practice to share data between threads or to keep threads waiting on other threads, so I'm very careful never to do that. In cases where atomicity is not possible, I try to use an IPC system with a thread watchdog to do damage control if the IPC system fails for some reason [network issues (in the case of clusters or HPC), disk failures, corrupted data, etc.].

Another thing is that I try to keep my threads as small as possible, and as branchless/loopless as possible.

With these 2 mantras it's easy to handle threads, even when you need to hand-code some ASM magic inside C/C++ [contrary to what you think, compilers (PGI, ICC, GCC, Clang, VSC) don't always generate pristine, 100% perfect ASM, so sometimes some ASM tricks can gain you tangible performance, or a performance gain over time <--- x264 does this a lot]. Of course you don't need to do this for pop's inventory software at the store, but in HPC it's very standard regardless of the OS, and it's quite common in games too [hand-written CPU ASM and especially GPU ASM].