My name is Leena Narayanaswamy and I am a graduate student working on my thesis on the SPEC CPU2006 benchmarks. I want to install the benchmarks on Windows. I have installed the suite, but I can't work out the right path for my compiler in the shrc.bat file. I don't want to use the pre-compiled binaries; I want to compile the benchmarks myself and get the binaries. My computer has an Intel Core 2 Duo (x86-64) and runs Windows 7, and I have installed the Intel C++ compiler (Intel Composer XE 2013).

Can you please tell me what changes I have to make in the shrc.bat file? I am simply trying to follow the installation steps given on the website, and I am stuck at step 6, which is editing shrc.bat. Please reply.

11-11-2011 09:42 AM

marfig

Indeed! And it comes with a very important optimization for floating-point multiplication (AVX1.1 handles addition only) that can improve the performance of some of the most commonly used calculations in modern computing: polynomials, matrix multiplication and, necessarily, dot products.
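To illustrate why those particular calculations benefit, here is a minimal C sketch (not code from any of the posts): a dot product and a polynomial evaluated with Horner's rule both reduce to long chains of `a * b + c` steps, which is exactly the operation a fused multiply-add instruction performs in one go. Matrix multiplication is in turn built out of dot products. The function names are hypothetical, and whether the compiler actually emits FMA instructions depends on the target CPU and the compiler settings.

```c
#include <stddef.h>

/* Dot product: every iteration is one multiply followed by one add
   (acc = x[i] * y[i] + acc), the pattern a fused multiply-add can
   perform as a single instruction on hardware that supports it. */
double dot(const double *x, const double *y, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc = x[i] * y[i] + acc;
    return acc;
}

/* Horner's rule for p(x) = c[0] + c[1]*x + ... + c[n-1]*x^(n-1).
   Coefficients are stored lowest degree first; each step is again
   one multiply and one add (r = r * x + c[i]). */
double horner(const double *c, size_t n, double x)
{
    double r = 0.0;
    for (size_t i = n; i-- > 0; )
        r = r * x + c[i];
    return r;
}
```

For example, with c = {1, 2, 3} (that is, 1 + 2x + 3x²) and x = 2, `horner` returns 17.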

11-11-2011 09:28 AM

Kougar

Slightly off topic, but since you mention AVX... AVX 2.0 is already due at the start of 2013 in Haswell, apparently.

11-11-2011 07:17 AM

marfig

Thanks

I figured this way anyone will have a better idea of what CPU2006 is and does, and can better judge the results that will soon start to appear. Rob, on the other hand, gets a time-saving link he can use in the CPU2006 section of the reviews if he so wishes.

There's a whole lot more that can be said about CPU2006, of course. The documentation of this suite is a sure indicator of that.

For instance, there's the question of whether or not to use processor instruction extensions in the tests (SSE4.2, the newly introduced AVX, etc.). We ended up deciding not to use the processor-specific optimization flags that enable them, because we figured including them would not reflect the kind of real-world coverage demanded of a benchmark suite for systems reviews, as is the case with Techgage's CPU and computer reviews. So the configuration files we ended up producing reflect the typical compilation flags of any mainstream application.

Commercial software of the type we use every day does not usually ship compiled with processor extension flags. An application compiled to use AVX unconditionally, for instance, will crash on any processor that lacks this extension (that's anything released before Intel Sandy Bridge and AMD Bulldozer). So only very specialized software, developed in-house for systems known to the developer, usually carries these optimizations.
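For contrast, software that wants to use AVX without crashing on older processors has to check for the extension at run time and fall back to plain code otherwise. The C sketch below shows the idea under a few assumptions: it relies on GCC or Clang on x86 (`__builtin_cpu_init` and `__builtin_cpu_supports` are compiler builtins, not standard C), and `scale`, `scale_avx` and `scale_generic` are hypothetical names. The "AVX" path here just reuses the generic loop so the sketch compiles anywhere; in a real program it would live in a separate file built with AVX enabled.

```c
#include <stddef.h>

/* Portable fallback: plain C that runs on any x86 processor. */
static void scale_generic(float *v, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        v[i] *= k;
}

/* Hypothetical AVX path. In a real program this would be compiled
   separately with AVX enabled (e.g. GCC's -mavx), so AVX instructions
   only ever execute after the run-time check has passed. Here it just
   reuses the generic loop so the sketch builds on any compiler. */
static void scale_avx(float *v, size_t n, float k)
{
    scale_generic(v, n, k);
}

/* Returns 1 if the CPU reports AVX support, 0 otherwise. */
int cpu_has_avx(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") != 0;
#else
    return 0; /* unknown compiler or architecture: assume no AVX */
#endif
}

/* Dispatcher: picks the AVX path only when it is safe to run it. */
void scale(float *v, size_t n, float k)
{
    if (cpu_has_avx())
        scale_avx(v, n, k);
    else
        scale_generic(v, n, k);
}
```

Because only `scale_avx` would contain AVX instructions, and it is only called after the check succeeds, such a program still runs on pre-Sandy Bridge processors.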

Still, we have prepared an AVX configuration file (and Rob's still waiting for me to give him an SSE4.2 one). It's likely we will occasionally need to discuss processors in this context, either in review articles or, more likely, in separate articles. Processor extensions can be big contributors to CPU performance in certain specialized processing areas. I'm particularly excited(!) about AVX, for instance. So in case there is a need for that type of article, Techgage has it covered too.

11-11-2011 01:16 AM

Kougar

Awesome post Marfig! Thank you for posting it.

11-09-2011 04:47 PM

marfig

Quote:

Originally Posted by Rob Williams

There were high times, low times, and times when I wanted to jump out the window and run into traffic. But in the end, we got an exhaustive and accurate benchmark out of the deal. I regret not deciding to jump on this well over a month ago... I just had no idea what we were in for :S

Indeed, lol. And even yesterday we were still picking glass from the floor. This thing consumes us in more than one way. Uff!

Ok, here's the promised description of the CPU2006 benchmark suite, for future reference. I'll try to make this short and to the point.

----------------------------------------------

First and foremost, Rob already linked to it, but here it is again, the website of the suite, for anyone wishing to dig deeper.

What is CPU2006?
CPU2006 is a benchmark suite composed of 29 individual benchmarks that test a CPU's integer and floating-point performance, as well as its memory subsystem. Every individual benchmark is built from real-world algorithms (algorithms used in various types of applications, from video encoding to scientific applications or file compression) and was specifically developed to stress test the CPU.

The CPU2006 benchmarks are divided into two distinct groups:

CINT2006 measures compute-intensive integer performance and is composed of 12 tests written in C or C++.

CFP2006 measures compute-intensive floating-point performance and is composed of 17 tests written in C, C++ or Fortran.

Results are presented separately for the integer and floating-point groups.

What kind of metrics are there?
CPU2006 can perform two different types of benchmarks:

Speed Benchmarks: How fast the CPU performs a task.

Rate Benchmarks: How many tasks a CPU can perform in a given time.
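As a rough illustration of how the two kinds of results relate to the reference machine, here is a C sketch. The speed ratio is the reference time divided by the measured time; for the rate ratio we assume, based on our reading of the CPU2006 documentation, that the same ratio is multiplied by the number of copies of the benchmark run at the same time. The function names and numbers are made up for illustration.

```c
#include <assert.h>

/* Speed: a single copy of the benchmark; how many times faster than
   the reference machine it completes. */
double speed_ratio(double ref_seconds, double elapsed_seconds)
{
    assert(ref_seconds > 0.0 && elapsed_seconds > 0.0);
    return ref_seconds / elapsed_seconds;
}

/* Rate (throughput): several copies run at the same time, so the ratio
   is scaled by how many copies finished within the elapsed time. */
double rate_ratio(int copies, double ref_seconds, double elapsed_seconds)
{
    assert(copies > 0);
    return copies * speed_ratio(ref_seconds, elapsed_seconds);
}
```

For example, with a made-up reference time of 1000 s, a single copy finishing in 50 s gives a speed ratio of 20, while 4 concurrent copies finishing in 100 s give a rate ratio of 40: each copy is slower, but more total work gets done.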

Techgage will concern itself only with Speed Benchmarks and these are the results that will be shown in future CPU reviews.

How is the data reported?
Rob is still working on the best way to present this data, but he's also going to publish the results on the SPEC website. So we can discuss those, knowing that Rob will use them as the base for whatever cool graphs and presentation he cooks up for us in future CPU review articles.

As you can see at the top right, this is an integer benchmark results file, that is, a CINT2006 results file. A floating-point (CFP2006) results file has exactly the same layout; only the tests are different. One example (for the exact same system) can be found here.

The results at the top right are the overall scores, calculated as the geometric mean of the individual benchmark ratios. Individual benchmark results can be seen in the graph below and in the Results Table further down.

CPU2006 runs each test 3 times: Run #1, Run #2 and Run #3. The time reported for each benchmark is the median of those three Runs, so one unusually fast or slow Run can't skew the result.

Copies are instances of the same benchmark executed simultaneously. They only matter for Rate benchmarks, where the tester chooses how many Copies run at the same time; Speed benchmarks run a single Copy.

CPU2006 can also produce two Sets of results: Base and Peak. Each of these sets performs its own 3 Runs, and their results are calculated separately. What distinguishes them is how the benchmarks are compiled.

Base: Base results are mandatory for every reported result and follow the stricter compilation rules. For instance, all benchmarks written in the same language must be compiled with the same flags, in the same order.

Peak: Peak results are optional and allow more aggressive tuning. For instance, each benchmark may be compiled with its own set of optimization flags.

Since Techgage only runs Speed benchmarks, it will run a single Copy, and there are no plans at this time to include Peak in Techgage benchmarks.

Results are published in Seconds and as a Ratio.

Seconds: The time taken by the median Run of each benchmark.

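Ratio: CPU2006 uses a reference machine (a 296 MHz UltraSPARC II) as the basis for the ratio calculation. The ratio is the reference machine's run time divided by the tested machine's run time, so a ratio of 20 means the tested machine finished the benchmark 20 times faster than the reference machine.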
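A minimal C sketch of how a single benchmark's ratio could be derived, assuming (as SPEC's documentation describes for reportable runs) that the reported time is the median of the three Runs. The function names are hypothetical and the reference and run times used below are invented, not real CPU2006 values.

```c
#include <assert.h>

/* Median of three run times: the middle value once the fastest and
   the slowest run are set aside. Three compare-and-swap steps sort
   a <= b <= c. */
double median3(double a, double b, double c)
{
    double t;
    if (a > b) { t = a; a = b; b = t; }
    if (b > c) { t = b; b = c; c = t; }   /* c now holds the maximum */
    if (a > b) { t = a; a = b; b = t; }   /* a <= b <= c */
    return b;
}

/* Ratio for one benchmark: reference time divided by the median run time. */
double benchmark_ratio(double ref_seconds, double run1, double run2, double run3)
{
    double median = median3(run1, run2, run3);
    assert(median > 0.0);
    return ref_seconds / median;
}
```

With a made-up reference time of 1000 s and Runs of 52 s, 50 s and 48 s, the median is 50 s and the ratio is 20.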

It's this Ratio that serves as the basis for CPU2006 reporting, as you can see from the reports linked above.

How is the benchmark suite executed?
SPEC has a very stringent set of procedures and rules that every tester must follow in order for the benchmarks to be sanctioned and validated by SPEC (and allowed to be posted on its website).

The machine being benchmarked must have a compiler suite installed. On Linux this is usually GCC, while on Windows it is usually the Intel compiler. Other compilers can be used, but they must support the C99 features the SPEC source code relies on (which on Windows rules out Microsoft's Visual C++ compiler, for instance).
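The exact C99 features the SPEC sources rely on aren't listed here, so the following is only a generic C sketch of the kind of code such a compiler must accept. It refuses to build unless the compiler reports C99 support through the standard `__STDC_VERSION__` macro, then uses a few C99-only constructs: the `<stdint.h>` header, the `restrict` keyword, declarations inside `for`, and a variable-length array. Microsoft's C compilers of that era did not support several of these. The function names are hypothetical.

```c
/* Stop the build if the compiler does not claim C99 (or later) support. */
#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L
#error "a C99 (or later) compiler is required"
#endif

#include <stdint.h>   /* header introduced by C99: fixed-width integers */

/* Sums the squares of n values; 'restrict' is a C99 keyword. */
int64_t sum_squares(int n, const int32_t *restrict v)
{
    int64_t total = 0;
    for (int i = 0; i < n; i++)   /* declaration inside for(): C99 */
        total += (int64_t)v[i] * v[i];
    return total;
}

/* Sum of squares of 1..n, stored in a variable-length array whose
   size is only known at run time (C99). */
int64_t sum_first_squares(int n)
{
    int32_t vals[n > 0 ? n : 1];
    for (int i = 0; i < n; i++)
        vals[i] = i + 1;
    return sum_squares(n, vals);
}
```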

The tester can, however, use pre-compiled binaries of the individual tests. But this is far from ideal, because those binaries may have been compiled for a different CPU and may not fully reflect the CPU architecture being tested.

Techgage will always compile and never use pre-built binaries. Any exception will be clearly noted and the reasons detailed.

The tester must configure the compilation process in conformance with the SPEC run rules, a very complex set of rules, as you can see from the link. Fortunately, compiler configuration files already exist that make this process somewhat easier, and the tester is left with the task of tweaking them to their needs... following the rules at all times. Here you can find an example of a configuration file.

A utility called runspec executes the benchmark once everything is properly configured. This can take several hours. On the Intel Core i7-2600K Rob used, it takes around 13 hours to execute the whole benchmark (3 Runs of Base and Peak, 1 Copy each).

Once the benchmark is executed, result files are automatically generated and everything is checked against SPEC rules. If the benchmark conforms to these rules, the result files are flagged as valid and can be submitted to SPEC for inclusion on its results page. Otherwise they are flagged as Invalid.

Invalid results aren't bad results (the measurements themselves are still accurate). They are, however, results that don't conform to SPEC rules, which may well be intentional, depending on the needs and requirements of the tester. Such results simply cannot be submitted to SPEC, because they don't follow SPEC's standard benchmark procedure.

Does Techgage follow SPEC rules?
To the letter! All benchmarks done by Techgage in the context of its CPU or computer review articles will be submitted to SPEC. Any exceptions will be clearly mentioned in the article and the reasons explained.

Techgage will use Windows for CPU2006 benchmarking. The source code build tools will be the Intel Compiler v12 and Visual Studio 2008. The configuration files have all already been tested and fully conform to SPEC rules.

Visual Studio is required because the Intel Compiler uses Microsoft's C and C++ standard libraries. VS 2008 is used instead of 2010 because there's currently a bug in the Intel Compiler that causes a header conflict between math.h and Intel's own mathimf.h (source). Since SPEC does not allow changes to the source code of the individual benchmarks, we went with VS 2008.

11-07-2011 12:14 PM

Rob Williams

Thanks, Psi*!

Great recap, Mario! There were high times, low times, and times when I wanted to jump out the window and run into traffic. But in the end, we got an exhaustive and accurate benchmark out of the deal. I regret not deciding to jump on this well over a month ago... I just had no idea what we were in for :S

11-07-2011 08:46 AM

Psi*

Kudos guys, nice work

11-07-2011 08:04 AM

marfig

SPEC's CPU2006 benchmark suite was quite an unruly beast to tame. We spent two weeks working on it non-stop, from reading the documentation and understanding the whole SPEC protocol for performing and reporting benchmarks, to finally coming up with what we felt was a correct and acceptable configuration for Techgage's CPU reviewing purposes.

Along the way we hit many walls, head first, that left us bewildered, confused, and often frustrated, and that forced us to go back and rethink the way we were trying to work with it. This wasn't a case of progressively reaching a good configuration. More than once, in fact, we ran an entire benchmark for more than 10 hours, only to find at the end of it all that we were still not doing it right: not according to SPEC's rules, and not according to our interests.

I still remember telling Rob this would be easy. Ah, the naivety! CPU2006 is a complex benchmark suite meant to provide anyone who wishes to use it with a standardized process, from compiling the tests to running and reporting them. The complexity is in fact a consequence of this standardization, and it is what makes results comparable across different machine setups and compiler optimization flags. It's an ideal benchmark suite (the one ideal benchmark suite!) for CPU performance analysis and comparison. As such, it is extremely well suited to a hardware review website such as Techgage. But the cost is a complex, hard-to-learn suite that also demands knowledge of software development and building, particularly compiler usage.

However, it needs to be said that many of the obstacles we faced were entirely our own fault. There's a proper way to do these things. It involves reading the documentation from start to finish, understanding the key concepts, experimenting and planning ahead. We did none of those things. Or we did, but all lumped together in a 2-week race to have the benchmark suite ready for the upcoming Techgage reviews. Time was not on our side, so we had to make do as best we could. That inevitably meant not always reading the documentation properly, missing key information, or making decisions that soon revealed themselves as obvious mistakes we should have spotted.

What's worse, I didn't have a 64-bit machine. While I could compile for 64-bit runs, I couldn't run them. So I was flying a little blind once we had finally dealt with all the compilation errors and moved on to understanding why some tests wouldn't run or why the whole benchmark was marked as invalid. Rob had to take the brunt of it and, with that, all the frustration of running a 13-hour benchmark only to find at the end that it wasn't valid (several times!). All this while he was trying to build two machines in time for an upcoming review.

At some point it became depressing. So much so, in fact, that when we finally got our first 100% valid, totally foolproof, reportable benchmark, we didn't celebrate. We were so exhausted that success came like a drink of water when you've already passed out from thirst.

Let this serve as a lesson to anyone wishing to use CPU2006. This is a serious, professional and complex benchmark suite that companies like Intel, AMD, ASUS, Dell and Cisco use for internal purposes. SPEC's membership and associates page will give you a good idea of what we are talking about here. It's meant to be properly studied before it is implemented. Don't do what we did.

So, how do we know we have a good CPU2006 benchmark configuration?

Rob insisted from the very beginning that he wanted the benchmarks to be submitted to SPEC. For this to happen, benchmarks have to be flagged as valid by the CPU2006 tools. These tools perform an exhaustive analysis of our benchmark configuration and compiler flags, and only when these respect the CPU2006 specifications will the results be marked as valid. That is the validation we need. By conforming to SPEC rules and specifications, we know we have a good benchmark configuration that not only allows Rob to submit the results to SPEC, but also gives Techgage readers the confidence that they are looking at credible, accurate results that meet the very high standards SPEC imposes for proper CPU benchmarking.

In an upcoming post on this thread, I'll be discussing (more briefly, thank goodness!) what exactly CPU2006 is and why you should care.

11-07-2011 12:46 AM

Rob Williams

SPEC's CPU2006 Added to Our CPU Test Suite

Alongside Intel's launch of its Sandy Bridge-E processors next week, we'll be unveiling something of our own - an overhauled CPU test suite. Our last major update occurred in late 2008, so we had quite a bit to tweak, add, remove, or replace with the latest iteration. Over the course of the coming week, I'll be making a few posts in our news section like this one, explaining some of the new tests we're introducing, why they make for a great CPU benchmark, and of course, why they're relevant.

The first benchmark I wanted to talk about is also the most time-consuming and complicated: SPEC's CPU2006. Don't let the five-year-old name fool you; the benchmark received its latest update a mere two months ago. The goal here is to test the execution performance of a machine running code built by the compiler on that machine. In addition to stressing the CPU, CPU2006 also puts the memory sub-system and the compiler to work.