Keep in mind that Broadwell is a "Tick" in Intel's "Tick Tock" model. Generally speaking, "Tick" processors improve efficiency while "Tock" processors improve performance. As a result, I do not expect the MacBook Air and MacBook Pro scores to increase significantly.

MacBook Air

Single-core performance has increased 6% from Haswell to Broadwell, and multi-core performance for the i5 model has increased 7%. Multi-core performance for the i7 model, however, has increased a surprising 14%.

If you're thinking of buying the new MacBook Air, I would strongly recommend the i7 processor. It has 20% faster single-core performance and 25% faster multi-core performance for only a 15% increase in price.

MacBook Pro

Single-core performance has increased between 3% and 7% from Haswell to Broadwell, depending on the model. Multi-core performance has increased 3% to 6%. These sorts of increases are in line with what I would expect from a "Tick" processor.

I have no recommendations regarding the processor for the new MacBook Pro. The performance differences and the price differences between the processors are roughly equivalent.

Back in December we ported a few of our Geekbench workloads to Swift and compared their performance to the C++ implementations. With last week's announcement of a beta release of Xcode 6.3, we thought it would be a good time to revisit those results. In this post we find out whether the performance improvements in the Xcode 6.3 Beta provide any speedup for our Swift workloads.

The following table shows the performance of the Swift workloads compiled with Xcode versions 6.1.1 and 6.3 Beta. We use the same optimizer settings as we did in December and run the tests on the same machine. As before, the averages are taken over eight executions of each workload.

| Workload   | Version          | Minimum     | Maximum     | Average     |
|------------|------------------|-------------|-------------|-------------|
| Mandelbrot | Swift (6.3 Beta) | 2.07 GFlops | 2.49 GFlops | 2.32 GFlops |
|            | Swift (6.1.1)    | 2.15 GFlops | 2.43 GFlops | 2.26 GFlops |
|            | C++ (6.1.1)      | 2.25 GFlops | 2.38 GFlops | 2.33 GFlops |
| GEMM       | Swift (6.3 Beta) | 2.14 GFlops | 2.18 GFlops | 2.16 GFlops |
|            | Swift (6.1.1)    | 1.48 GFlops | 1.59 GFlops | 1.53 GFlops |
|            | C++ (6.1.1)      | 8.61 GFlops | 9.92 GFlops | 9.32 GFlops |
| FFT        | Swift (6.3 Beta) | 0.25 GFlops | 0.27 GFlops | 0.26 GFlops |
|            | Swift (6.1.1)    | 0.10 GFlops | 0.10 GFlops | 0.10 GFlops |
|            | C++ (6.1.1)      | 2.29 GFlops | 2.60 GFlops | 2.42 GFlops |

The improvements in the Xcode 6.3 Beta have provided a 1.4x speedup for GEMM and a 2.6x speedup for FFT over Xcode 6.1.1. Performance for the C++ workloads did not change, so we omit those numbers for the 6.3 Beta.

Our Swift FFT implementation got an additional speedup last week thanks to some performance patches from Joseph Lord (the code for the Swift workloads is available on GitHub). His optimizations include:

Eliminate virtual function dispatches by making the Workload classes final.

Allow the compiler to do more inlining by moving the Complex definition into the same file as the FFT code.

Work around slow behavior when accessing an array of structs by changing the output array in FFT from a Swift array to an UnsafeMutablePointer<Complex>.
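A minimal sketch of what those three changes look like together. The names FFTWorkload and Complex come from the workload, but the bodies here are illustrative (written in current Swift syntax), not Joseph's actual patch:

```swift
// Defining Complex in the same file as the FFT code lets the
// optimizer inline its members into the FFT hot loops.
struct Complex {
    var real = 0.0
    var imaginary = 0.0
}

// `final` eliminates virtual dispatch: method calls on this class
// can be resolved (and inlined) at compile time.
final class FFTWorkload {
    let count: Int
    // UnsafeMutablePointer sidesteps the bounds checking and
    // copy-on-write machinery of a Swift Array of structs.
    let output: UnsafeMutablePointer<Complex>

    init(count: Int) {
        self.count = count
        output = UnsafeMutablePointer<Complex>.allocate(capacity: count)
        output.initialize(repeating: Complex(), count: count)
    }

    deinit {
        output.deinitialize(count: count)
        output.deallocate()
    }

    // Illustrative hot-loop access: plain pointer arithmetic with
    // no bounds checks and no retain/release traffic.
    func sumOfReals() -> Double {
        var sum = 0.0
        for i in 0..<count {
            sum += output[i].real
        }
        return sum
    }
}
```

Note the trade-off: the UnsafeMutablePointer version gives up automatic memory management and bounds checking, so the class must now manage the buffer's lifetime itself.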

These changes provide a significant speedup for FFT of about 8.5x over our previous implementation:

| Workload   | Version                                | Minimum     | Maximum     | Average     |
|------------|----------------------------------------|-------------|-------------|-------------|
| Mandelbrot | Swift with Joseph's patches (6.3 Beta) | 2.32 GFlops | 2.45 GFlops | 2.40 GFlops |
|            | Swift (6.3 Beta)                       | 2.07 GFlops | 2.49 GFlops | 2.32 GFlops |
|            | C++ (6.1.1)                            | 2.25 GFlops | 2.38 GFlops | 2.33 GFlops |
| GEMM       | Swift with Joseph's patches (6.3 Beta) | 2.01 GFlops | 2.19 GFlops | 2.13 GFlops |
|            | Swift (6.3 Beta)                       | 2.14 GFlops | 2.18 GFlops | 2.16 GFlops |
|            | C++ (6.1.1)                            | 8.61 GFlops | 9.92 GFlops | 9.32 GFlops |
| FFT        | Swift with Joseph's patches (6.3 Beta) | 1.85 GFlops | 2.31 GFlops | 2.20 GFlops |
|            | Swift (6.3 Beta)                       | 0.25 GFlops | 0.27 GFlops | 0.26 GFlops |
|            | C++ (6.1.1)                            | 2.29 GFlops | 2.60 GFlops | 2.42 GFlops |

After the improvements in Xcode 6.3 and some careful optimizations, the performance of the FFT workload is now within 10% of the C++ implementation. The optimizations might look strange to someone who hasn't read up on Swift internals, but they are easy to apply and available to any Swift programmer. If you try these optimizations in your own code, benchmark the changes carefully; they might not provide any speedup for your algorithm, and they might even slow it down. Also keep in mind that Xcode 6.3 is still in beta, and these results could change before the final release.

Geekbench 3.3, the latest version of our popular cross-platform benchmark, is now available for download and includes the following changes:

Added a battery test for Android and iOS.

Added a brief summary to "Share Results" email on iOS.

Addressed 64-bit code generation issues on Android/AArch64.

Fixed a crash that occurred on Windows 10.

Fixed a crash that could occur on 32-core systems.

Reduced the memory footprint of the BlackScholes workload.

The biggest new feature in Geekbench 3.3 is the battery test, which is designed to measure the battery life of a device when running processor-intensive applications (such as games).

The test is designed to run a completely charged battery down to a complete discharge. While it's possible to run the test with a partially discharged battery (e.g., a battery with a 75% charge), the results will not be as accurate.

The recommended steps for running the test are as follows:

1. Plug in your device.

2. Launch Geekbench 3.

3. Launch the battery test.

4. Wait for your device to completely charge.

5. Unplug your device. The battery test will start automatically. The test can take several hours to complete, especially on newer devices with larger batteries.

6. Wait for your device to completely discharge and turn off.

7. Plug in your device and wait for it to turn on.

8. Launch Geekbench 3. The battery test result will display automatically.

The test result includes the battery test runtime, the battery test score, and the battery level at the beginning and at the end of the test.

Here's what the different numbers mean:

Battery Runtime is how long the battery test ran. If the test started with the battery completely charged and ended with it completely discharged, then the runtime is also the battery lifetime.

Battery Score is a combination of the runtime and the work completed during the battery test. If two phones have the same runtime but different scores, then the phone with the higher score completed more work. As with Geekbench scores, higher battery scores are better.

Battery Level is the battery level at the start and the end of the test.

We hope you find the new battery test useful. Please let us know if you have any questions, comments, or suggestions regarding the test (or the release).

With all the excitement around Apple's new Swift programming language we were curious whether Swift is suitable for compute-intensive code, or whether it's still necessary to "drop down" into a lower-level language like C or C++.

To find out we ported three Geekbench 3 workloads from C++ to Swift: Mandelbrot, FFT, and GEMM. These three workloads offer different performance characteristics:

Mandelbrot is compute bound.

GEMM is memory bound and sequentially accesses large arrays in small blocks.

FFT is memory bound and accesses large arrays non-sequentially.
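To illustrate why Mandelbrot is compute bound, here is a minimal Swift sketch of a standard escape-time inner loop (illustrative only, not the Geekbench implementation). The iteration z = z² + c runs entirely on a handful of floating-point values, with no array traffic:

```swift
// Escape-time iteration for one point c = (cr, ci).
// All the work is floating-point arithmetic on local values,
// which is why this kernel is compute bound, not memory bound.
func mandelbrotIterations(cr: Double, ci: Double, limit: Int) -> Int {
    var zr = 0.0
    var zi = 0.0
    var i = 0
    while i < limit && zr * zr + zi * zi <= 4.0 {
        let t = zr * zr - zi * zi + cr
        zi = 2.0 * zr * zi + ci
        zr = t
        i += 1
    }
    return i
}
```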

We built both the C++ and Swift workloads with Xcode 6.1. For the Swift workloads we used the -Ofast -Ounchecked optimization flags, enabled SSE4 vector extensions, and enabled loop unrolling. For the C++ workloads we used the -msse2 -O3 -ffast-math -fvectorize optimization flags. We ran each workload eight times and recorded the minimum, maximum, and average compute rates. All tests were performed on an "Early 2011" MacBook Pro with an Intel Core i7-2720QM processor.

| Workload   | Version | Minimum     | Maximum     | Average     |
|------------|---------|-------------|-------------|-------------|
| Mandelbrot | Swift   | 2.15 GFlops | 2.43 GFlops | 2.26 GFlops |
|            | C++     | 2.25 GFlops | 2.38 GFlops | 2.33 GFlops |
| GEMM       | Swift   | 1.48 GFlops | 1.59 GFlops | 1.53 GFlops |
|            | C++     | 8.61 GFlops | 9.92 GFlops | 9.32 GFlops |
| FFT        | Swift   | 0.10 GFlops | 0.10 GFlops | 0.10 GFlops |
|            | C++     | 2.29 GFlops | 2.60 GFlops | 2.42 GFlops |

The Swift implementation of Mandelbrot performs very well, effectively matching the performance of the C++ implementation. I was surprised by this result; I did not expect a language as new as Swift to match the performance of C++ for any of the workloads. The results for GEMM and FFT are not as encouraging: the C++ GEMM implementation is over 6x faster than the Swift implementation, while the C++ FFT implementation is over 24x faster. Let's examine these two workloads more closely.

GEMM

Running GEMM in Instruments (using the Time Profiler template) shows the inner loop dominating the profile samples, with 25% attributed to our Matrix.subscript.getter.

Suspecting that the getter was performing poorly, I tried caching the raw arrays and accessing them directly without using the subscript getter. This boosts performance slightly, giving us an average of about 1.55 GFlops. All that remains in the inner loop are the integer operations that compute the indexes, two array reads, one floating-point multiply, and one floating-point add.
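The shape of that inner loop can be sketched as a naive GEMM kernel over flat row-major arrays (an illustrative sketch, not Geekbench's implementation):

```swift
// Naive GEMM kernel: C += A * B for n x n row-major matrices stored
// in flat arrays. Each inner iteration performs exactly the work
// described above: integer index arithmetic, two array reads, one
// floating-point multiply, and one floating-point add.
func gemm(_ a: [Double], _ b: [Double], _ c: inout [Double], n: Int) {
    for i in 0..<n {
        for j in 0..<n {
            var sum = c[i * n + j]
            for k in 0..<n {
                sum += a[i * n + k] * b[k * n + j]
            }
            c[i * n + j] = sum
        }
    }
}
```

A loop this simple is exactly the kind of code an auto-vectorizer should handle, which is what makes the next observation interesting.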

In our C++ GEMM implementation we get a big performance boost from loop vectorization, so I wondered whether the Swift array implementation might somehow be preventing the LLVM optimizer from vectorizing the loop. Disabling vectorization in the C++ workload (via -fno-vectorize) reduced its average compute rate to just 2.05 GFlops, which makes loop vectorization a likely culprit.

FFT

Running FFT in Instruments (again using the Time Profiler template, but with the "flatten recursion" option enabled) shows that we spend a lot of time on reference counting operations.

This is surprising because the only reference type in our FFT workload is the FFTWorkload class: arrays are structs, and structs are value types in Swift. The FFT workload references the FFTWorkload instance through the self member and through calls to instance methods, so we begin our investigation there.

To isolate the effects of self references and instance method calls, I wrote a recursive function to compute Fibonacci numbers (a tremendously inefficient way to compute Fibonacci numbers, but useful for this investigation). I use a self access to count the number of nodes in the recursion, incrementing the nodes member in the recursive function.
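A hypothetical reconstruction of that instrumented function (the original listing isn't reproduced here). Every recursive call goes through self, both for the method dispatch and for the nodes increment, which is what pulls retain/release traffic into the hot path:

```swift
class FibonacciWorkload {
    var nodes = 0

    // Deliberately naive recursion: fibonacci(0) = 0, fibonacci(1) = 1.
    func fibonacci(_ n: Int) -> Int {
        nodes += 1  // implicit self access on every call
        if n < 2 {
            return n
        }
        return fibonacci(n - 1) + fibonacci(n - 2)
    }
}
```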

Rewriting the recursive function as a static method, which eliminates the self accesses, results in a 12x speedup over the first Fibonacci implementation. The Instruments time profile shows that the reference counting operations are now gone.

I don't mean to suggest that we should prefer static Swift methods whenever possible; use static methods when they make sense in your design. However, if you must implement a recursive algorithm in Swift and you find its performance unacceptably poor, then modifying your algorithm to use static methods is worth investigating.
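Under the same assumptions, the static-method variant can be sketched as follows. With no instance in play, there is nothing for the compiler to retain and release on each call; the node count moves to a static property, which is exactly the kind of design compromise the caveat above is about:

```swift
enum StaticFibonacci {
    static var nodes = 0

    // Same naive recursion, but as a static method: calls are
    // dispatched without a self reference, so no ARC traffic.
    static func fibonacci(_ n: Int) -> Int {
        nodes += 1
        if n < 2 {
            return n
        }
        return fibonacci(n - 1) + fibonacci(n - 2)
    }
}
```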

To quickly test this strategy on the FFT workload, I made all the instance variables global and changed the recursive methods to class methods. This gives about a 5x boost in performance, up to an average of 548.09 MFlops. That is still only about 20% of the C++ performance, but it is a significant improvement. In the time profiler we see that the samples are now more evenly distributed, with hotspots on memory accesses and floating-point operations, which is closer to what we might expect for FFT.

Final Thoughts

What can we conclude from these results? The Mandelbrot results indicate Swift's strong potential for compute-intensive code, while the GEMM and FFT results show the care that must be exercised. GEMM suggests that the Swift compiler cannot vectorize code that the C++ compiler can, leaving easy performance gains behind. FFT suggests that developers should reduce calls to instance methods, or favor an iterative approach over a recursive one.

Swift is still a young language with a new compiler, so we can expect significant improvements to both the compiler and the optimizer in the future. If you're considering writing performance-critical code in Swift today, it's certainly worth writing the code in Swift before dropping down to C++. It might just turn out to be fast enough.

The Core i5 Retina iMac is slightly faster than the other Core i5 iMacs, and is competitive with the Core i7 iMacs in single-core performance. However, the Core i7 iMacs are up to 20% faster in multi-core performance.

The Core i7 Retina iMac is significantly faster than all of the other iMacs (including the Core i5 Retina iMac), with at least 15% higher single-core performance and 10% higher multi-core performance.

These Geekbench results aren't surprising since all of the iMacs use Haswell processors; any performance increase is due to the increase in clock speed.

How does the Retina iMac perform compared to the Mac Pro?

The Core i5 Retina iMac is faster at single-core tasks but slower at multi-core tasks. The Core i7 Retina iMac is also faster at single-core tasks (25% faster than the fastest Mac Pro) and is also faster than the 4-core Mac Pro at multi-core tasks.

Apple announced a long-awaited update to the Mac mini lineup on Thursday. Along with 802.11ac Wi-Fi and PCI-based flash storage options the new models feature Intel's Haswell processors. While Apple hasn't identified which Haswell processors they're using in the new lineup, I believe these are the processors Apple is using based on the Mac mini specifications published by Apple:

From the table you can see that Apple has moved from dual- and quad-core processors in the "Late 2012" lineup to dual-core processors across the entire "Late 2014" lineup. How much will this change affect multi-core performance? Will the new Mac minis be slower than the old Mac minis?

Unfortunately, there are no Geekbench results for the new Mac minis in the Geekbench Browser to help us answer this question. Instead, I estimated the new Mac minis' scores using data from other systems with the same processors. I expect the estimated scores will be within 5% of the actual scores for the Mac minis.

Here are the estimated scores for the "Late 2014" Mac minis alongside the actual scores for the "Late 2012" Mac minis:

Single-core performance has increased slightly, from 2% to 8%, between the "Late 2012" and "Late 2014" models. This increase is in line with what we saw when other Mac models moved from Ivy Bridge to Haswell processors.

Unlike single-core performance, multi-core performance has decreased significantly. The "Good" model (which has a dual-core processor in both lineups) is down 7%. The other models (which have dual-core processors in the "Late 2014" lineup but quad-core processors in the "Late 2012" lineup) are down 70% to 80%.

So why did Apple switch to dual-core processors in the "Late 2014" lineup? The only technical reason I can think of is that the Haswell dual-core processors use one socket (that is, the physical interface between the processor and the logic board) while the Haswell quad-core processors use a different socket.

Apple would have to design and build two separate logic boards to accommodate both dual-core and quad-core processors. Other Macs use the same logic board across models, so I wouldn't expect Apple to make an exception for the Mac mini. Note that this wasn't an issue with the Sandy Bridge and Ivy Bridge processors, where both dual- and quad-core processors used the same socket.

Apple could have gone quad-core across the "Late 2014" lineup, but I suspect they wouldn't have been able to include a quad-core processor (let alone one with Iris Pro graphics) and still hit the $499 price point.

All things considered, if you're looking for great multi-core performance in a Mac mini (say, because you're using it as a server), I have a hard time recommending the new model; I would suggest tracking down a "Late 2012" Mac mini rather than buying a new "Late 2014" Mac mini. Otherwise, the improved Wi-Fi, graphics, and single-core performance make the "Late 2014" Mac mini worth considering.