"One difference in Haswell’s decoding path is the uop queue, which receives uops from the decoders or uop cache and also functions as a loop cache. In Sandy Bridge, the 28 entry uop queue was replicated for each thread. However, in Ivy Bridge the uop queue was combined into a single 56 entry structure that is statically partitioned when two threads are active. The distinction is that when a single thread is executing on Ivy Bridge or Haswell, the entire 56 entry uop buffer is available for loop caching and queuing, making better use of the available resources."

This reminded me of the "disable core parking" thread and some of the mixed results I got with it enabled and disabled. Anyway, since the "disable core parking" thread is locked and I doubt the mods want a continuation, that's all I have.

It would take you... 2233 continuous hours or 93 days, 1 hour, and 20 minutes of gameplay to complete your Steam library. In this time you could... Speed run Super Mario Bros (NES) 26,800 times.

I was thinking in regards to Sandy Bridge having the uop queue laid out as 28-28, and with core parking enabled half the buffer goes unused. On Ivy Bridge/Haswell, a single thread can use the whole 56 entry uop queue. With core parking enabled, it should keep the buffer from being partitioned into the hardwired 28-28 split that Sandy Bridge has.
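To make the numbers from the quoted passage concrete, here is a rough toy model of how many uop queue entries one thread gets on each design. It only encodes what the quote says (28 entries replicated per thread on Sandy Bridge, one 56 entry structure split statically when two threads are active on Ivy Bridge/Haswell). The names `UOP_QUEUE` and `entries_per_thread` are made up for illustration, and this says nothing about what core parking actually does.

```python
# Toy model of the uop queue sizes described in the quoted passage.
# Names are hypothetical; this is not a hardware simulation.
UOP_QUEUE = {
    # Sandy Bridge: a 28 entry queue replicated for each thread.
    "sandy_bridge": {"per_thread": 28, "shared": False},
    # Ivy Bridge / Haswell: one 56 entry queue, statically
    # partitioned only when two threads are active.
    "ivy_bridge": {"total": 56, "shared": True},
    "haswell": {"total": 56, "shared": True},
}


def entries_per_thread(arch: str, active_threads: int) -> int:
    """Return the uop queue entries available to one thread on a core."""
    if active_threads not in (1, 2):
        raise ValueError("a core runs one or two hardware threads")
    queue = UOP_QUEUE[arch]
    if not queue["shared"]:
        # Replicated queues: each thread keeps its own fixed share.
        return queue["per_thread"]
    # Shared queue: the whole structure for one thread, half each for two.
    return queue["total"] // active_threads


if __name__ == "__main__":
    for arch in UOP_QUEUE:
        for threads in (1, 2):
            print(arch, threads, entries_per_thread(arch, threads))
```

In this model the only case where the designs differ is one active thread per core, which is the single-thread situation the quote is describing.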

Edit: From Page 3 Figure 2:


I was thinking in regards to Sandy Bridge having the uop queue laid out as 28-28, and with core parking enabled half the buffer goes unused.

I'm not sure why that would be the case. Core-parking deals with cores, and what you are talking about is intra-core. I don't see how it would interact with core-parking unless logical cores were being "parked," which would mean that only hyper-threading Sandy Bridges would show any effect.

And, of course, if the logical cores are being parked, that would be an obvious performance difference by itself, right?

biffzinker wrote:

On Ivy Bridge/Haswell, a single thread can use the whole 56 entry uop queue. With core parking enabled, it should keep the buffer from being partitioned into the hardwired 28-28 split that Sandy Bridge has.

Are you saying that core parking makes Ivy Bridge/Haswell behave like Sandy Bridge? Because I don't think that's right, at all.

I'm just saying that on Ivy Bridge, core parking would allow the whole buffer to be assigned to one thread while the other thread is parked, unless a demanding workload is run.

Again, as I said, if that were true then you would only see an effect on hyperthreading chips.

And the feature is called "core-parking" but you are now talking about "thread parking." I don't know what that means, but if you actually mean the logical core can be parked, then, as I said, that would obviously have performance implications that transcend the structure of the decode queue. Namely, you aren't using the logical core at all.