Geekbench 3.3

Geekbench 3.3, the latest version of our popular cross-platform benchmark, is now available for download and includes the following changes:

  • Added a battery test for Android and iOS.
  • Added a brief summary to "Share Results" email on iOS.
  • Addressed 64-bit code generation issues on Android/AArch64.
  • Fixed a crash that occurred on Windows 10.
  • Fixed a crash that could occur on 32-core systems.
  • Reduced the memory footprint of the BlackScholes workload.

The biggest new feature in Geekbench 3.3 is the battery test. The new battery test is designed to measure the battery life of a device when running processor-intensive applications (such as games).

The test is meant to completely discharge a fully charged battery. While it's possible to run the test with a partially discharged battery (e.g., a battery with 75% charge), the test results will not be as accurate.

The recommended steps for running the test are as follows:

  • Plug in your device.
  • Launch Geekbench 3.
  • Launch the battery test.
  • Wait for your device to completely charge.
  • Unplug your device. The battery test will start automatically. The test can take several hours to complete, especially on newer devices with larger batteries.
  • Wait for your device to completely discharge and turn off.
  • Plug in your device and wait for it to turn on.
  • Launch Geekbench 3. The battery test result will display automatically.

The test result includes the battery test runtime, the battery test score, and the battery level at the beginning and at the end of the test.

Here's what the different numbers mean:

  • Battery Runtime is the battery test runtime. If the test started with the battery completely charged and ended with the battery completely discharged, then the test runtime is also the battery lifetime.

  • Battery Score is a combination of the runtime and the work completed during the battery test. If two phones have the same runtime but different scores, then the phone with the higher score completed more work. As with Geekbench scores, higher battery scores are better.

  • Battery Level is the battery level at the start and the end of the test.

We hope you find the new battery test useful. Please let us know if you have any questions, comments, or suggestions regarding the test (or the release).

Swift, C++ Performance

With all the excitement around Apple's new Swift programming language, we were curious whether Swift is suitable for compute-intensive code, or whether it's still necessary to "drop down" into a lower-level language like C or C++.

To find out, we ported three Geekbench 3 workloads from C++ to Swift: Mandelbrot, FFT, and GEMM. These three workloads offer different performance characteristics:

  • Mandelbrot is compute bound.
  • GEMM is memory bound and sequentially accesses large arrays in small blocks.
  • FFT is memory bound and irregularly accesses large arrays.

The source code for the Swift implementations is available on GitHub.

We built both the C++ and Swift workloads with Xcode 6.1. For the Swift workloads we used the -Ofast -Ounchecked optimization flags, enabled SSE4 vector extensions, and enabled loop unrolling. For the C++ workloads we used the -msse2 -O3 -ffast-math -fvectorize optimization flags. We ran each workload eight times and recorded the minimum, maximum, and average compute rates. All tests were performed on an "Early 2011" MacBook Pro with an Intel Core i7-2720QM processor.

Workload    Version  Minimum      Maximum      Average
Mandelbrot  Swift    2.15 GFlops  2.43 GFlops  2.26 GFlops
Mandelbrot  C++      2.25 GFlops  2.38 GFlops  2.33 GFlops
GEMM        Swift    1.48 GFlops  1.59 GFlops  1.53 GFlops
GEMM        C++      8.61 GFlops  9.92 GFlops  9.32 GFlops
FFT         Swift    0.10 GFlops  0.10 GFlops  0.10 GFlops
FFT         C++      2.29 GFlops  2.60 GFlops  2.42 GFlops

The Swift implementation of Mandelbrot performs very well, effectively matching the performance of the C++ implementation. I was surprised by this result; I did not expect a language as new as Swift to match the performance of C++ on any of the workloads. The results for GEMM and FFT are not as encouraging: the C++ GEMM implementation is over 6x faster than the Swift implementation, and the C++ FFT implementation is over 24x faster. Let's examine these two workloads more closely.
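The "6x" and "24x" figures are simply the C++ average divided by the Swift average from the table. A quick Swift check of that arithmetic follows; the `WorkloadAverages` struct is an illustrative name, not part of the Geekbench source.

```swift
// Average compute rates (GFlops) copied from the results table above.
struct WorkloadAverages {
  let name: String
  let swift: Double
  let cpp: Double

  // How many times faster the C++ version is than the Swift version.
  var speedup: Double { cpp / swift }
}

let averages = [
  WorkloadAverages(name: "Mandelbrot", swift: 2.26, cpp: 2.33),
  WorkloadAverages(name: "GEMM", swift: 1.53, cpp: 9.32),
  WorkloadAverages(name: "FFT", swift: 0.10, cpp: 2.42),
]

for w in averages {
  // Round to two decimal places without pulling in Foundation.
  let rounded = (w.speedup * 100).rounded() / 100
  print("\(w.name): C++ is \(rounded)x faster")
}
```

Mandelbrot comes out at roughly 1.03x, which is within the run-to-run spread shown by the minimum and maximum columns.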

GEMM

Running GEMM in Instruments (using the Time Profiler template) shows the inner loop dominating the profile, with 25% of the samples attributed to our Matrix.subscript.getter:

Instruments stack trace for GEMM

Suspecting that the getter was performing poorly, I tried caching the raw arrays and accessing them directly without using the subscript getter. This boosted performance only slightly, to an average of about 1.55 GFlops. All that remains in the inner loop are the integer operations that compute the indexes, two array reads, one floating-point multiply, and one floating-point add:

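// Inner loop over the current k-block, reading the cached raw arrays
// directly instead of going through Matrix.subscript.
for k0 in k..<kb {
  let a = AM[i0 * N + k0]
  let b = BM[j0 * N + k0]
  scratch += a * b
}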
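To give that inner loop some context, here is one way it might sit inside a blocked matrix multiply. This is a minimal, self-contained sketch, not the Geekbench implementation. It assumes square `n`×`n` matrices stored row-major in flat `[Double]` arrays, with the second operand stored transposed so that, as in the loop above, both reads walk consecutive elements. The function name `gemmBlocked` and the `blockSize` parameter are hypothetical.

```swift
// Hypothetical blocked GEMM: C = A * B, where `bt` holds B transposed
// (bt[j * n + k] == B[k][j]). All matrices are n x n, row-major, flat.
func gemmBlocked(_ a: [Double], _ bt: [Double], n: Int, blockSize: Int = 16) -> [Double] {
  var c = [Double](repeating: 0, count: n * n)
  // Walk the k dimension in blocks so each slice of A and Bt stays in cache.
  for k in stride(from: 0, to: n, by: blockSize) {
    let kb = min(k + blockSize, n)  // end of the current k-block
    for i0 in 0..<n {
      for j0 in 0..<n {
        var scratch = 0.0
        // Same shape as the loop above: two reads, one multiply, one add.
        for k0 in k..<kb {
          scratch += a[i0 * n + k0] * bt[j0 * n + k0]
        }
        c[i0 * n + j0] += scratch
      }
    }
  }
  return c
}

// 2x2 example: A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]].
let a: [Double] = [1, 2, 3, 4]
let bt: [Double] = [5, 7, 6, 8]  // B transposed
let c = gemmBlocked(a, bt, n: 2, blockSize: 1)
print(c)  // [19.0, 22.0, 43.0, 50.0]
```

Because `c` accumulates partial sums across k-blocks, the result matches an unblocked multiply; the block size only changes the memory-access pattern.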

In our C++ GEMM implementation, we get a big performance boost from loop vectorization, so I wondered whether the Swift array implementation might somehow be preventing the LLVM optimizer from vectorizing the loop. Disabling vectorization in the C++ workload (via -fno-vectorize) reduced its average compute rate to just 2.05 GFlops, much closer to the Swift result, so missing loop vectorization is a likely culprit.

FFT

Running FFT in Instruments (again using the Time Profiler template but with the "flatten recursion" option enabled) shows that we spend a lot of time on reference counting operations:

Instruments stack trace for FFT

This is surprising because the only reference type in our FFT workload is the FFTWorkload class: arrays are structs, and structs are value types in Swift. The FFT workload code references the FFTWorkload instance through self and through calls to instance methods, so we begin our investigation there.

To isolate the effects of self references and instance method calls I wrote a recursive function to compute Fibonacci numbers (this is a tremendously inefficient approach to computing Fibonacci numbers, but it is useful for this investigation). I use a self access to count the number of nodes in the recursion by incrementing the nodes member in the recursive function:

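// Instance method on a class with a stored `nodes` counter; every call
// touches self to count the nodes in the recursion.
func fibonacci(_ n: UInt) -> UInt {
  self.nodes += 1
  if n == 0 {
    return 0
  } else if n == 1 {
    return 1
  } else {
    return fibonacci(n - 1) + fibonacci(n - 2)
  }
}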

The time profile for this implementation shows the same reference counting overhead we observed in the FFT workload.

Instruments stack trace for Fibonacci

The source code view suggests that the self accesses are slow in this case:

Instruments source view for Fibonacci

Updating the recursion to remove references to self nearly doubles performance, but we still see the reference counting operations in the Instruments time profile. That leaves the instance method calls as the remaining suspect.
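The version without self references is not shown in the post. One plausible shape, sketched here as an assumption, threads the node count through the return value instead of a stored property, while keeping `fibonacci` an instance method. The wrapper class name `FibonacciWorkload` is hypothetical.

```swift
// Hypothetical wrapper class. The method is still an instance method,
// but it no longer reads or writes any stored property on self.
class FibonacciWorkload {
  // Returns the n-th Fibonacci number and the number of recursion nodes visited.
  func fibonacci(_ n: UInt) -> (f: UInt, nodes: UInt) {
    if n == 0 {
      return (0, 1)
    } else if n == 1 {
      return (1, 1)
    } else {
      let left = fibonacci(n - 1)
      let right = fibonacci(n - 2)
      return (left.f + right.f, left.nodes + right.nodes + 1)
    }
  }
}

let result = FibonacciWorkload().fibonacci(10)
print(result.f, result.nodes)  // 55 177
```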

Next we try making fibonacci a static method instead of an instance method. This is easy since we already removed the self reference: we only need to add the class keyword to the method declaration:

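// Static (class) method: no self access; the node count is returned
// alongside the Fibonacci number instead of being stored on an instance.
class func fibonacci(_ n: UInt) -> (f: UInt, nodes: UInt) {
  if n == 0 {
    return (0, 1)
  } else if n == 1 {
    return (1, 1)
  } else {
    let left = fibonacci(n - 1)
    let right = fibonacci(n - 2)
    return (left.f + right.f, left.nodes + right.nodes + 1)
  }
}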

This results in a 12x speedup over the first Fibonacci implementation. The Instruments time profile shows that the reference counting operations are now gone:

Instruments stack trace for static Fibonacci

I don't mean to suggest that we should prefer static Swift methods whenever possible; use static methods when they make sense in your design. However, if you must implement a recursive algorithm in Swift and you find the performance of your algorithm to be unacceptably poor, then modifying your algorithm to use static methods is worth some investigation.

To quickly test this strategy on the FFT workload, I made all the instance variables global and changed the recursive methods to class methods. This gives about a 5x performance boost, to an average of 548.09 MFlops. This is still only about 20% of the C++ performance, but it is a significant improvement. In the time profiler the samples are now more evenly distributed, with hotspots on memory accesses and floating-point operations. This is closer to what we might expect for FFT:

Instruments source view for static FFT

Final Thoughts

What can we conclude from these results? The Mandelbrot results indicate Swift's strong potential for compute-intensive code, while the GEMM and FFT results show the care that must be exercised. GEMM suggests that the Swift compiler does not vectorize loops that the C++ compiler can vectorize, leaving some easy performance gains behind. FFT suggests that developers should reduce calls to instance methods, or should favor an iterative approach over a recursive one.
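The iterative alternative can be sketched as an in-place radix-2 Cooley-Tukey FFT. This is a minimal illustration, not the Geekbench FFT workload. It assumes the input length is a power of two and keeps real and imaginary parts in separate `[Double]` arrays; the name `fftIterative` is hypothetical. Because it is a free function working on `inout` arrays, with explicit loops in place of recursion, the hot loops make no instance method calls.

```swift
import Foundation  // cos, sin

// In-place iterative radix-2 FFT (decimation in time).
// Assumes re.count == im.count and that the count is a power of two.
func fftIterative(_ re: inout [Double], _ im: inout [Double]) {
  let n = re.count
  precondition(n == im.count && n > 0 && n & (n - 1) == 0, "length must be a power of two")

  // 1. Reorder the input into bit-reversed index order.
  var j = 0
  for i in 1..<n {
    var bit = n >> 1
    while j & bit != 0 {
      j ^= bit
      bit >>= 1
    }
    j ^= bit
    if i < j {
      re.swapAt(i, j)
      im.swapAt(i, j)
    }
  }

  // 2. Merge butterflies of size 2, 4, 8, ..., n.
  var size = 2
  while size <= n {
    let angle = -2.0 * Double.pi / Double(size)
    let half = size / 2
    for start in stride(from: 0, to: n, by: size) {
      for k in 0..<half {
        // Twiddle factor w = exp(-2*pi*i*k/size).
        let wr = cos(angle * Double(k))
        let wi = sin(angle * Double(k))
        let a = start + k
        let b = a + half
        // t = w * x[b]
        let tr = wr * re[b] - wi * im[b]
        let ti = wr * im[b] + wi * re[b]
        re[b] = re[a] - tr
        im[b] = im[a] - ti
        re[a] += tr
        im[a] += ti
      }
    }
    size <<= 1
  }
}
```

The bit-reversal pass takes the place of the recursive even/odd splitting, and each pass of the outer `while` loop merges transforms of size `size / 2` into transforms of size `size`.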

Swift is still a young language with a new compiler, so we can expect significant improvements to both the compiler and the optimizer in the future. If you're considering writing performance-critical code in Swift today, it's certainly worth writing the code in Swift before dropping down to C++. It might just turn out to be fast enough.

Retina iMac 64-bit Performance

64-bit Geekbench 3 results for the Retina iMacs have appeared on the Geekbench Browser. Let's take a quick look at how they perform compared to the non-Retina iMacs.

Single-Core Performance

Multi-Core Performance

The Core i5 Retina iMac is slightly faster than the other Core i5 iMacs, and is competitive with the Core i7 iMacs in single-core performance. However, the Core i7 iMacs are up to 20% faster in multi-core performance.

The Core i7 Retina iMac is significantly faster than all of the other iMacs (including the Core i5 Retina iMac), with at least 15% higher single-core performance and 10% higher multi-core performance.

These Geekbench results aren't surprising since all of the iMacs use Haswell processors; any performance increase is due to the increase in clock speed.

How does the Retina iMac perform compared to the Mac Pro?

Single-Core Performance

Multi-Core Performance

The Core i5 Retina iMac is faster at single-core tasks but slower at multi-core tasks. The Core i7 Retina iMac is also faster at single-core tasks (25% faster than the fastest Mac Pro), and it outperforms the 4-core Mac Pro at multi-core tasks.

If you're considering replacing your Mac Pro with a Retina iMac, these results show it's not a bad idea, provided you don't regularly run heavily threaded applications.

Estimating Mac mini Performance

Apple announced a long-awaited update to the Mac mini lineup on Thursday. Along with 802.11ac Wi-Fi and PCI-based flash storage options, the new models feature Intel's Haswell processors. While Apple hasn't identified which Haswell processors they're using in the new lineup, based on the Mac mini specifications published by Apple I believe these are the processors Apple is using:

Processor      Cores  Frequency  Turbo Boost
Core i5-4260U  2      1.4 GHz    2.7 GHz
Core i5-4278U  2      2.6 GHz    3.1 GHz
Core i5-4308U  2      2.8 GHz    3.3 GHz
Core i7-4578U  2      3.0 GHz    3.5 GHz

For comparison, here are the Haswell processors from the "Late 2014" lineup alongside the Ivy Bridge processors from the equivalent model in the "Late 2012" lineup:

Configuration  Late 2014 Processor  Cores  Late 2012 Processor  Cores
Good           Core i5-4260U        2      Core i5-3210M        2
Better         Core i5-4278U        2      Core i7-3615QM       4
Best           Core i5-4308U        2      Core i7-3615QM       4
BTO            Core i7-4578U        2      Core i7-3720QM       4

From the table, you can see that Apple has moved from dual- and quad-core processors in the "Late 2012" lineup to dual-core processors across the entire "Late 2014" lineup. How much will this change affect multi-core performance? Will the new Mac minis be slower than the old Mac minis?

Unfortunately there are no Geekbench results for the new Mac minis in the Geekbench Browser to help us answer this question. Instead, I estimated the new Mac minis' scores by using data from other systems with the same processor. I expect the estimated scores will be within 5% of the actual scores for the Mac minis.

Here are the estimated scores for the "Late 2014" Mac minis alongside the actual scores for the "Late 2012" Mac minis:

Single-Core Performance

Single-core performance has increased slightly, by 2% to 8%, between the "Late 2012" and "Late 2014" models. This increase is in line with what we saw when other Mac models moved from Ivy Bridge to Haswell processors.

Multi-Core Performance

Unlike single-core performance, multi-core performance has decreased significantly. The "Good" model (which has a dual-core processor in both lineups) is down 7%. The other models (which have a dual-core processor in the "Late 2014" lineup but a quad-core processor in the "Late 2012" lineup) are down 70% to 80%.

So why did Apple switch to dual-core processors in the "Late 2014" lineup? The only technical reason I can think of is that the Haswell dual-core processors use one socket (that is, the physical interface between the processor and the logic board) while the Haswell quad-core processors use different sockets:

Processor       Cores  Graphics  Socket
Core i7-4578U   2      Iris      FCBGA1168
Core i7-4770HQ  4      Iris Pro  FCBGA1364
Core i7-4700MQ  4      HD 4600   FCPGA946

Apple would have to design and build two separate logic boards to accommodate both dual-core and quad-core processors. Other Macs use the same logic board across models, so I wouldn't expect Apple to make an exception for the Mac mini. Note that this wasn't an issue with the Sandy Bridge and Ivy Bridge processors, where both dual- and quad-core processors used the same socket.

Apple could have gone quad-core across the "Late 2014" lineup, but I suspect they wouldn't have been able to include a quad-core processor (let alone one with Iris Pro graphics) and still hit the $499 price point.

All things considered, if you're looking for great multi-core performance in a mini (say, if you're using your Mac mini as a server), I have a hard time recommending the new Mac mini; I would suggest tracking down a "Late 2012" Mac mini instead. Otherwise, the improved Wi-Fi, graphics, and single-core performance make the new "Late 2014" Mac mini worth considering.

Geekbench 3.2.2

Geekbench 3.2.2, the latest version of our popular cross-platform benchmark, is now available for download. Geekbench 3.2.2 features the following changes:

  • Added support for iOS 8, iPhone 6, and iPhone 6 Plus.
  • Added benchmark comparison charts on iOS.
  • Added support for High DPI mode on Windows.
  • Fixed code signing issues on OS X Mavericks and Yosemite.

Geekbench 3.2.2 is a free update for all Geekbench 3 users.

Geekbench 3.2

I'm excited to announce that Geekbench 3.2, the latest version of our popular cross-platform benchmark, is now available for download.

The most visible change in Geekbench 3.2 is the redesigned result view. The redesign both improves the legibility and increases the information density of the benchmark results, especially on mobile devices.

Geekbench 3.2 also adds support for 32-bit ARMv8 processors on Android. Geekbench has been recompiled to take advantage of the new instruction set, and the AES and SHA-1 workloads have been updated to use the new cryptography instructions. When Android devices with ARMv8 processors arrive in the fall, Geekbench 3.2 will be able to measure their full performance potential.

Geekbench 3.2 is a free upgrade for all Geekbench 3 users.

MacBook Pro Performance (July 2014)

On Tuesday Apple updated its MacBook Pro lineup. Geekbench 3 results for most of the new models have already appeared in the Geekbench Browser which lets us see how performance has improved across the lineup.

For the 15-inch MacBook Pro, processor speeds were increased by 200 MHz, leading to a 6% to 9% increase in performance:

Configuration  2013 Processor            2014 Processor
Good           Core i7-4750HQ @ 2.0 GHz  Core i7-4770HQ @ 2.2 GHz
Better         Core i7-4850HQ @ 2.3 GHz  Core i7-4870HQ @ 2.5 GHz
Best           Core i7-4960HQ @ 2.6 GHz  Core i7-4980HQ @ 2.8 GHz

For the 13-inch MacBook Pro, processor speeds were also increased by 200 MHz, leading to a 7% to 8% increase in performance (note that we do not yet have results for the new high-end model):

Configuration  2013 Processor           2014 Processor
Good           Core i5-4258U @ 2.4 GHz  Core i5-4278U @ 2.6 GHz
Better         Core i5-4288U @ 2.6 GHz  Core i5-4308U @ 2.8 GHz
Best           Core i7-4558U @ 2.8 GHz  Core i7-4578U @ 3.0 GHz

Overall performance improvements for the new MacBook Pros are modest and unsurprising. Both the 2013 and the 2014 models use Haswell processors, so all of the performance gains come from the increased clock speeds. We will have to wait for the new Broadwell processors (currently scheduled for mid-2015) to see more significant improvements in MacBook Pro performance.
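The clock-speed explanation can be checked with a little arithmetic: the same 200 MHz bump is a smaller relative increase on a faster base clock. This sketch computes the percentage increase in base frequency for each configuration in the two tables above. It uses base frequencies only and ignores Turbo Boost, which is a simplifying assumption.

```swift
// Base clock speeds (GHz) from the 2013 and 2014 tables above.
let fifteenInch: [(model: String, old: Double, new: Double)] = [
  ("Good", 2.0, 2.2), ("Better", 2.3, 2.5), ("Best", 2.6, 2.8),
]
let thirteenInch: [(model: String, old: Double, new: Double)] = [
  ("Good", 2.4, 2.6), ("Better", 2.6, 2.8), ("Best", 2.8, 3.0),
]

// Percentage increase in base clock speed: (new / old - 1) * 100.
func clockIncrease(old: Double, new: Double) -> Double {
  (new / old - 1) * 100
}

for (label, lineup) in [("15-inch", fifteenInch), ("13-inch", thirteenInch)] {
  for m in lineup {
    // Round to one decimal place.
    let pct = (clockIncrease(old: m.old, new: m.new) * 10).rounded() / 10
    print("\(label) \(m.model): +\(pct)%")
  }
}
```

Base clocks rise by roughly 7% to 10% across the two lineups, which is in line with the 6% to 9% and 7% to 8% performance gains reported above.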

iMac Performance (June 2014)

Apple announced a new lower-cost dual-core iMac today, and Geekbench 3 results for it are already appearing on the Geekbench Browser. Let's see how the new iMac performs compared to other iMacs.

When compared to the rest of the iMac lineup, the new iMac has reasonable single-core performance: it's almost identical to the entry-level quad-core iMac. Multi-core performance is significantly lower due to the lower number of cores (2 cores vs 4 cores).

One interesting thing about the new iMacs is that they use a low-voltage i5-4260U "Haswell" processor (the same processor used in the MacBook Air). Why would Apple use a low-voltage dual-core processor in a desktop machine? The answer might be graphics:

According to Intel, the HD 5000 is twice as fast as the HD 4600. Apple may have sacrificed multi-core performance for GPU performance. Given the increasing importance modern user interfaces place on GPU performance, this may turn out to be a smart decision that extends the useful lifespan of the new iMac.

MacBook Air Performance (May 2014)

Earlier this week Apple announced a minor refresh to its MacBook Air lineup. The only change (besides a $100 price cut) is that the base model now comes with a 1.4 GHz processor instead of a 1.3 GHz processor.

How does the new MacBook Air model perform compared to previous models? To find out, I've collected Geekbench 3 results for several models and charted the scores below:

The results for the 2013 and the 2014 models aren't surprising. Both models use Intel Haswell processors, so there are no major changes in processor technology. The 5-7% increase in performance is what I would expect from the 7% increase in processor frequency.

However, comparing the new models with the 2011 and 2012 models is much more interesting. For example, base model single-core performance has improved by almost 45% since 2011, and by almost 20% since 2012. Anyone considering an upgrade from these models will certainly notice (and appreciate!) an improvement this large.

Combatting Benchmark Boosting

Late last year Ars Technica noticed that some Samsung phones artificially boost performance when running Geekbench 3. This boost inflated Geekbench 3 scores by up to 20%. Since benchmarks are only meaningful when they're treated the same as any other application, we have been working on determining which devices "benchmark boost", and what we should do with results from these boosted devices. I'd like to share what we've discovered.

In order to determine which devices artificially boost performance when running Geekbench we added a "boost detector" to Geekbench 3. The detector embeds a report in each Geekbench 3 result uploaded to the Geekbench Browser. After analyzing thousands of reports we determined that the following Android devices artificially boost performance when running Geekbench 3:

  • Samsung Galaxy Note 10.1 (2014)
  • Samsung Galaxy Note 2
  • Samsung Galaxy Note 3
  • Samsung Galaxy S 3
  • Samsung Galaxy S 4
  • Sony Xperia Z
  • Sony Xperia Z Tablet
  • Sony Xperia Z Ultra
  • Sony Xperia Z1
  • Sony Xperia ZL

On both Samsung and Sony devices the boost appeared in Android 4.3. Earlier versions of Android (up to and including Android 4.2.2) did not boost. Anthony Schmieder and Daniel Malea, two Geekbench developers, worked with Ars Technica to find the code responsible for the boost on Samsung devices.

In order to combat benchmark boosting we have decided to exclude results from these devices running Android 4.3 from the Android benchmark chart. This way the results on the chart reflect the true performance, not the boosted performance, of each device. We have also added a list of excluded devices to the chart. We will continue to monitor the detector reports, and we will update this list if we discover other devices or Android versions that apply a benchmark boost.
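The exclusion rule amounts to a simple filter: drop a result only when the device is on the boost list and it ran Android 4.3. Below is a minimal Swift sketch of that rule. The `BenchmarkResult` type, its fields, and the version-string format are hypothetical; this is not how the Geekbench Browser is implemented.

```swift
// Hypothetical representation of an uploaded result.
struct BenchmarkResult {
  let device: String
  let androidVersion: String  // assumed format, e.g. "4.3" or "4.4.2"
}

// Devices found to boost benchmark performance on Android 4.3 (list above).
let boostingDevices: Set<String> = [
  "Samsung Galaxy Note 10.1 (2014)", "Samsung Galaxy Note 2", "Samsung Galaxy Note 3",
  "Samsung Galaxy S 3", "Samsung Galaxy S 4", "Sony Xperia Z", "Sony Xperia Z Tablet",
  "Sony Xperia Z Ultra", "Sony Xperia Z1", "Sony Xperia ZL",
]

// Keep a result unless it comes from a listed device running Android 4.3.x.
func includeInChart(_ r: BenchmarkResult) -> Bool {
  let isAndroid43 = r.androidVersion == "4.3" || r.androidVersion.hasPrefix("4.3.")
  return !(boostingDevices.contains(r.device) && isAndroid43)
}

// Example inputs (device/version pairs only; no scores involved).
let results = [
  BenchmarkResult(device: "Samsung Galaxy S 4", androidVersion: "4.3"),
  BenchmarkResult(device: "Samsung Galaxy S 4", androidVersion: "4.4.2"),
  BenchmarkResult(device: "Sony Xperia Z1", androidVersion: "4.2.2"),
]
let charted = results.filter(includeInChart)
print(charted.count)  // 2
```

Keying the rule on both device and OS version keeps results from unaffected versions (up to 4.2.2, and Samsung's 4.4 update) on the chart.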

There is one bit of good news that our detector uncovered: Samsung removed the benchmark boost from their Android 4.4 update. We hope that Sony follows Samsung's lead and removes their benchmark boost in their Android 4.4 update as well.