Gradual Degradation of 3DMark Fire Strike Produces Unreliable Results

For those who are not already aware of the issue, the folks at Futuremark seem to be struggling to keep the latest 3DMark benchmark consistent, Fire Strike in particular. Sometime around the release of Time Spy, things started getting screwy with Fire Strike, and now it seems that every Fire Strike GUI version update progressively lowers benchmark scores, specifically in the physics portion of the benchmark.

Kudos to @Papusan for noticing this months ago and asking me to have a look at it. He has been going back and forth with Futuremark about the problem, and it seems they are either ignoring him or do not view it as a high-priority issue. Or maybe they feel they don't need to fix it because most people running Fire Strike are not observant enough to notice, care, or ask questions.

Some people might say you cannot compare results across benchmark software versions, but that argument doesn't hold water here. There is a leaderboard and searchable database of results that basically every benching enthusiast and PC reviewer relies on, and if there is not a very high degree of consistency between GUI versions, the results in the database will become irrelevant, as will the leaderboard. The search filter has no field for GUI version, so we can expect the results from the database and leaderboard to become increasingly misleading, inaccurate and unreliable over time. This is certainly not desirable for what is supposedly the current de facto standard in PC benchmarks.

You will notice from the examples posted below that with each new version of Fire Strike the scores get lower and lower. These examples are consecutive runs on the same day, on the same machine, with identical CPU and GPU settings. The only thing that changes is the Fire Strike version, and the benchmark results degrade with newer versions. We need Futuremark to understand and correct this.

If you agree this is a problem and want it to be fixed, please complain to Futuremark and let them know they need to put the brakes on and not do anything else with 3DMark until they have this mess under control. Gimmicky features are one thing, but inconsistent benchmark results make 3DMark unreliable.

If you would like to do your own testing to validate the issue before contacting Futuremark, older versions of 3DMark are available for download from the TechPowerUp.com web site.

In case you're not good at simple math, here is a visual aid to show what the fuss is about.

Update 12/13/2016:

We would like to acknowledge that a representative of Futuremark has responded promptly to this article and provided an email address for those interested in communicating with them about the issue. We appreciate the accountability and responsiveness.

Update 12/15/2016:

We are sincerely grateful for Futuremark's responsiveness. I provided additional test results to Mr. Kokko to corroborate the findings of @Papusan, and Futuremark has released an update that is expected to resolve the issue. See the message from James below for more details.

14 hours ago, Futuremark_James said:

Hello. James from Futuremark here again.

We've confirmed that there was an issue with the GUI, and we're in the process of rolling out an update (3DMark v2.2.3509) that should fix the scoring discrepancy.

With this update, overall scores increase slightly by up to 0.3%. Scores from the Physics and CPU parts of benchmark tests may improve by up to 2.5%. These changes bring the scores from 3DMark v2.2.3509 back in line with results from earlier versions that did not have the GUI issue.

For context, it is normal for 3DMark scores to vary by up to 3% between runs since there are some factors in a modern, multitasking operating system that cannot be completely controlled. So again, all credit to @Papusan for noticing the problem and bringing it to us.

To get the update, just open 3DMark and you should get a notification with the option to install it. The Steam version and Steam demo have also been updated.
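For readers who want to judge whether a gap between two GUI versions is more than the run-to-run variation James describes, here is a minimal Python sketch. It assumes you have recorded several Physics scores per version by hand. The function name `compare_versions` and the sample scores are hypothetical and made up purely for illustration; they are not measurements from this thread. The sketch reports the percent change in the mean score and a Welch-style t-statistic, which is a common way to compare two sets of noisy measurements and not something Futuremark has said it uses.

```python
from math import sqrt
from statistics import mean, stdev


def compare_versions(old_runs, new_runs):
    """Return (percent change of the mean, Welch t-statistic) for two score lists."""
    m_old, m_new = mean(old_runs), mean(new_runs)
    pct_change = (m_new - m_old) / m_old * 100.0
    # Standard error of the difference between the two means (unequal variances).
    std_err = sqrt(stdev(old_runs) ** 2 / len(old_runs)
                   + stdev(new_runs) ** 2 / len(new_runs))
    t_stat = (m_new - m_old) / std_err
    return pct_change, t_stat


# Hypothetical Physics scores, invented for illustration only.
old_runs = [15000, 15050, 14950, 15020, 14980]
new_runs = [14650, 14700, 14600, 14680, 14620]

pct, t = compare_versions(old_runs, new_runs)
print(f"mean change: {pct:.2f}%, t-statistic: {t:.1f}")
```

The point of averaging several runs is that a shift smaller than the single-run variation can still show up clearly once the noise is averaged out. A large t-statistic in magnitude suggests the gap is unlikely to be run-to-run noise alone.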

On 12/13/2016 at 0:19 PM, Futuremark_James said:

Hi. James from Futuremark here.

We've been looking into this today, and I'd like to share what we've found.

The Fire Strike workload has not changed at all since 2013. This means that Fire Strike scores should not have changed across app versions either.

We've confirmed that running 3DMark from the command line gives consistent scores across all versions. Unfortunately, it does look like there is an issue when running recent versions from the GUI. We see the same ~2.5% difference in Physics test scores across GUI versions that @Papusan reported to us.

We believe we have found the bug in the GUI, but we need to run some more tests to be sure.

@Mr. Fox, the differences that you are seeing in your results are much larger, and it is not clear why. We would be grateful if you could contact us at info@futuremark.com so we can go through some troubleshooting steps with you.

Thank you, @Papusan, for bringing this to us. I am sorry that we have been slow to respond. I understand how frustrating that is.


LOL, just click the link above the video. It shows what the video does and it's instant gratification. The only reason I made the video was for YouTube viewers and for increased exposure of the issue that warrants attention.

The lowest score is the "latest and greatest" version of Fire Strike. The benchmarks were run only minutes apart on the same machine with identical CPU and GPU settings with no changes whatsoever except for the version of Fire Strike.


Hey, I watched all X painful minutes of the video... seems like about a 10+% drop in the Physics score. I can see this being an issue, but also an advantage if you post your results ASAP, because people down the road will be less likely to beat you, haha.


Did they alter their physics calculation algorithm or is this a case of bloat? I'd be curious to see what Futuremark says about this.

@Papusan has been communicating with them for a while now, and since that started every version has gotten progressively worse, so I thought it was time to bubble this up for attention before it gets too far out of hand. I have not had any communication with them. It's important that they pay attention to this to preserve their reputation. If nobody can trust the results to be accurate, and it becomes necessary to research the version number to draw comparisons that are not dramatically skewed, it could really hurt them. I don't want that to happen, so the article here will hopefully be a wake-up call for them. The degradation of results is already significant over multiple software revisions, and that will also make it more difficult for professional reviewers to get a clear and reliable comparison of old versus new tech.


I have sent a new feedback email to Futuremark, attn. Jarno Kokko, for the 10th time... Futuremark told me by email a long time ago that "They have reproduced in-house and investigation is ongoing". I have sent them a lot of results for their investigation. Nothing happens, as you can see in the pictures and links!!!

And when they finally push out the new <FIXED> 3DM version after 3 months, the 3DM benchmark software is in even worse condition...

Like the last time... The new 3DM suite UI 2.2.3488 64 version came out on 9th Dec. = Fiasco!! Then they needed to push out an even newer one because of the trouble with the first one. One day later, aka 10th Dec., the newest messed-up version came out: <UI 2.2.3491 64>.

The same mess happened the last two times as well (I think in July and Aug). Futuremark has BIG problems with their 3DM Suite!!!

See results. Both older UI versions, 2.0.2067_64 and 2.0.2809_64, give 15002 in Physics with a 6700K @ 4.8GHz and both of the two latest drivers from Nvidia!! Newer UI versions of the 3DM Suite give up to 400 points lower Physics in Fire Strike. All tested with the same Nvidia drivers, stock graphics and 4.8GHz on the processor.
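To put those numbers in percentage terms, here is a small Python sketch that uses only the figures quoted in the post above (15002 Physics on the older UI versions and up to 400 points lower on the newer ones). The helper name `percent_drop` is just an illustrative choice.

```python
def percent_drop(baseline, newer):
    """Percentage by which `newer` is lower than `baseline`."""
    return (baseline - newer) / baseline * 100.0


baseline_physics = 15002                 # older UI versions, as reported above
worst_case = baseline_physics - 400      # "up to 400 points lower"

print(f"{percent_drop(baseline_physics, worst_case):.2f}%")  # prints 2.67%
```

That worst case is in the same ballpark as the ~2.5% Physics difference Futuremark later said they reproduced.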


I'll post here again when we have more info to share.

Thank you for responding. Much appreciated!

I will reach out at the email address provided so you can ask questions privately. The exaggerated example with the wide difference is not incremental. Because I chose GUI versions that were further apart in time, those two versions in particular, the variance was much greater than it would be when comparing, for example, the latest GUI version to the one just before it. Of course, the concern is that over time it would become more difficult to compare scores from the database or leaderboard due to the gradual but growing decrease in performance in the physics test and the combined test.

@Futuremark_James - here is a less dramatic example from two versions released close to one another. The variance is probably closer to what you have seen comparing the current release to the last one.

Thanks for taking care of this problem. I reported it in mid-August. Now it's December!! I really hope it will finally be fixed now. Thanks again.

2 hours ago, Mr. Fox said:

Glad to help. I think everyone that knows about the problem will want them to correct it.

Your images are broken. Maybe posting them on imgur or postimage.org and using the direct links to insert them here would help.

Sorry, Fox. I posted from my small phone, so I think the picture got screwed up.


I compared a run with the latest Fire Strike version to an older run I had done when I got my 13 R2. While the driver versions are different, I should have seen an increase in performance nonetheless. I made sure to minimize background tasks as much as I needed as well.
- Game7a1

Hi James and welcome to Tech|Inferno! Thanks for the response and I think the enthusiast community will be waiting to hear back on the findings Futuremark has on this discrepancy and what will be done to resolve it.

Thank you so much for taking time to respond. We really appreciate it.

Your 16% variance is a great example that is similar to the one I posted. This demonstrates how much disparity there is among Fire Strike submissions over the course of the revisions released since the beginning of the year. While the changes between consecutive revisions seem like they are within a small margin of error at first blush, the cumulative effect is not acceptable if the results stored in their database are to be considered useful data and their leaderboard is to be respected.
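To illustrate how small per-revision changes can add up, here is a rough Python sketch. It assumes, purely for illustration, that every revision loses the same fixed percentage relative to the one before it; nothing in this thread establishes actual per-revision numbers, and the function name `cumulative_drop` and the inputs are hypothetical.

```python
def cumulative_drop(per_revision_pct, revisions):
    """Total percent drop after `revisions` updates that each lose `per_revision_pct`."""
    remaining = (1.0 - per_revision_pct / 100.0) ** revisions
    return (1.0 - remaining) * 100.0


# Hypothetical inputs: a 2.5% loss per revision, compounded over 1 to 7 revisions.
for n in range(1, 8):
    print(f"after {n} revision(s): {cumulative_drop(2.5, n):.2f}% lower")
```

Under that assumption, a loss that looks like noise in any single comparison grows into a double-digit gap after a handful of releases, which is why comparing only consecutive versions can hide the trend.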

Even a small change in score of around 1-3% in the subtests between the different GUI versions should be easy to find with normal testing. The change in power draw between the GUI versions should also be easy to spot, and it should have set off alarm bells.

Edited December 13, 2016 by Papusan


Yes, I agree @Papusan. It should have been detectable. It depends on how rigorous the testing was and whether they connected the dots that every version got worse and worse to where the cumulative effect of a drop with each GUI revision amounts to a lot over time.

I really hate the bloated new UI. The older version that wasn't so busy was much better. Maybe in the process of fixing this problem they can return to the older/sleeker UI. The big circle at the top with the score and the excessive amount of wasted screen space that gets hogged up by junk is unnecessary and unattractive in my personal opinion. Maybe that was to make the kiddos happy or something.

You are 100% right about the new 3DM GUI. The picture below is from the old 3DM Fire Strike GUI. As you can see in the picture, you could actually see the maximum CPU power draw in the benchmark test if you checked the box. New is not always better!! Just look at the Windows X failure. The new OS... Windows 10 looks like an OS designed for children around 5 years old, with all the pastel-colored tiles. It is intended more for handheld tablets and phones. The <new> Windows is no longer a nice OS for desktops.

Thank you so much, James. I am looking forward to testing the new release and confirming the fix. I trust @Papusan is equally appreciative. Thanks as well to Jarno Kokko (Futuremark) for his assistance.

Thanks for the help, James. And please pass on thanks from us bench enthusiasts to Mr. Kokko at Futuremark as well. For us, number crunching is a pleasure, as you probably know, so it's important that the bench tests work as intended.

And we hope Futuremark might consider making benchmark tests that put more emphasis on processor power than subtests like Fire Strike in the 3DMark suite do today. More like the old 3DM Vantage and 3DM11.

I am looking forward to testing the new fixed 3DM suite and confirming the fix. But this will take time because my internet speed sucks.


I asked the team about the CPU power monitoring. It seems there was a concern that it didn't work reliably, or at all, with some hardware. We're going to look into it again and see whether that is still the case.

Comparison links for the example runs discussed in the article:

http://www.3dmark.com/compare/fs/11047304/fs/11047179/fs/11047154

Here is a similar example from @Papusan: http://www.3dmark.com/compare/fs/11036017/fs/11035883
