Geekbench 3.2

I'm excited to announce that Geekbench 3.2, the latest version of our popular cross-platform benchmark, is now available for download.

The most visible change in Geekbench 3.2 is the redesigned result view. The redesign both improves the legibility and increases the information density of the benchmark results, especially on mobile devices.

Geekbench 3.2 also adds support for 32-bit ARMv8 processors on Android. Geekbench has been recompiled to take advantage of the new instruction set, and the AES and SHA-1 workloads have been updated to use the new cryptography instructions. When Android devices with ARMv8 processors arrive in the fall, Geekbench 3.2 will be able to measure their full performance potential.

Geekbench 3.2 is a free upgrade for all Geekbench 3 users.

MacBook Pro Performance (July 2014)

On Tuesday, Apple updated its MacBook Pro lineup. Geekbench 3 results for most of the new models have already appeared in the Geekbench Browser, which lets us see how performance has improved across the lineup.

For the 15-inch MacBook Pro, processor speeds were increased by 200 MHz, leading to a 6% to 9% increase in performance:

Processor  2013                        2014
Good       Core i7-4750HQ @ 2.0 GHz    Core i7-4770HQ @ 2.2 GHz
Better     Core i7-4850HQ @ 2.3 GHz    Core i7-4870HQ @ 2.5 GHz
Best       Core i7-4960HQ @ 2.6 GHz    Core i7-4980HQ @ 2.8 GHz

For the 13-inch MacBook Pro, processor speeds were also increased by 200 MHz, leading to a 7% to 8% increase in performance (note that we do not yet have results for the new high-end model):

Processor  2013                       2014
Good       Core i5-4258U @ 2.4 GHz    Core i5-4278U @ 2.6 GHz
Better     Core i5-4288U @ 2.6 GHz    Core i5-4308U @ 2.8 GHz
Best       Core i7-4558U @ 2.8 GHz    Core i7-4578U @ 3.0 GHz

Overall performance improvements for the new MacBook Pros are modest and unsurprising. Both the 2013 and the 2014 models use Haswell processors, so all of the performance gains come from the increased clock speeds. We will have to wait for the new Broadwell processors (currently scheduled for mid-2015) to see more significant improvements in MacBook Pro performance.
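As a rough check on that reasoning, the sketch below computes the percentage increase in base clock speed for any pair of processors from the tables above. The function name `clock_gain_percent` is illustrative and not part of Geekbench, and the calculation looks only at base clocks; it ignores Turbo Boost and memory effects, so it is a sanity check rather than a score prediction.

```c
#include <assert.h>

/* Percentage increase in base clock speed from old_ghz to new_ghz.
 * Illustrative helper only; base clocks are taken from the tables. */
double clock_gain_percent(double old_ghz, double new_ghz) {
    assert(old_ghz > 0.0);
    return (new_ghz - old_ghz) / old_ghz * 100.0;
}
```

For the entry-level 15-inch model (2.0 GHz to 2.2 GHz) this gives about 10%, and for the high-end model (2.6 GHz to 2.8 GHz) about 7.7%, which is roughly in line with the measured 6% to 9% gains.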

iMac Performance (June 2014)

Apple announced a new lower-cost dual-core iMac today, and Geekbench 3 results for it are already appearing on the Geekbench Browser. Let's see how the new iMac performs compared to other iMacs.

When compared to the rest of the iMac lineup, the new iMac has reasonable single-core performance; it's almost identical to the entry-level quad-core iMac. Multi-core performance is significantly lower because the new iMac has two cores rather than four.

One interesting thing about the new iMac is that it uses a low-voltage i5-4260U "Haswell" processor (the same processor is used in the MacBook Air). Why would Apple use a low-voltage dual-core processor in a desktop machine? The answer might be graphics:

According to Intel, the HD 5000 is twice as fast as the HD 4600. Apple may have sacrificed multi-core performance for GPU performance. Given the increasing importance modern user interfaces place on GPU performance, this may turn out to be a smart decision that extends the useful lifespan of the new iMac.

MacBook Air Performance (May 2014)

Earlier this week Apple announced a minor refresh to its MacBook Air lineup. The only change (besides a $100 price cut) is that the base model now comes with a 1.4 GHz processor instead of a 1.3 GHz processor.

How does the new MacBook Air model perform compared to previous models? To find out, I've collected Geekbench 3 results for several models and charted the scores below:

The results for the 2013 and the 2014 models aren't surprising. Both models use Intel Haswell processors, so there are no major changes in processor technology. The 5-7% increase in performance is what I would expect from the 7% increase in processor frequency.

However, comparing the results for the 2011 and the 2012 models is much more interesting. For example, base model single-core performance has improved by almost 45% since 2011, and by almost 20% since 2012. Anyone considering an upgrade from these models will certainly notice (and appreciate!) an improvement this large.

Combatting Benchmark Boosting

Late last year Ars Technica noticed that some Samsung phones artificially boost performance when running Geekbench 3. This boost inflated Geekbench 3 scores by up to 20%. Since benchmarks are only meaningful when they're treated the same as any other application, we have been working on determining which devices "benchmark boost", and what we should do with results from these boosted devices. I'd like to share what we've discovered.

In order to determine which devices artificially boost performance when running Geekbench we added a "boost detector" to Geekbench 3. The detector embeds a report in each Geekbench 3 result uploaded to the Geekbench Browser. After analyzing thousands of reports we determined that the following Android devices artificially boost performance when running Geekbench 3:

  • Samsung Galaxy Note 10.1 (2014)
  • Samsung Galaxy Note 2
  • Samsung Galaxy Note 3
  • Samsung Galaxy S 3
  • Samsung Galaxy S 4
  • Sony Xperia Z
  • Sony Xperia Z Tablet
  • Sony Xperia Z Ultra
  • Sony Xperia Z1
  • Sony Xperia ZL

On both Samsung and Sony devices the boost appeared in Android 4.3. Earlier versions of Android (up to and including Android 4.2.2) did not boost. Anthony Schmieder and Daniel Malea, two Geekbench developers, worked with Ars Technica to find the code responsible for the boost on Samsung devices.

In order to combat benchmark boosting we have decided to exclude results from these devices running Android 4.3 from the Android benchmark chart. This way the results on the chart reflect the true performance, not the boosted performance, of each device. We have also added a list of excluded devices to the chart. We will continue to monitor the detector reports, and we will update this list if we discover other devices or Android versions that apply a benchmark boost.
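The exclusion rule described above can be sketched as a simple filter. This is a minimal illustration, not the Geekbench Browser's actual code: the `struct result` record, its field names, and the `is_excluded` function are all hypothetical, and the device names are simply the ones listed in this post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Devices this post lists as boosting Geekbench 3 under Android 4.3. */
static const char *const BOOSTING_DEVICES[] = {
    "Samsung Galaxy Note 10.1 (2014)",
    "Samsung Galaxy Note 2",
    "Samsung Galaxy Note 3",
    "Samsung Galaxy S 3",
    "Samsung Galaxy S 4",
    "Sony Xperia Z",
    "Sony Xperia Z Tablet",
    "Sony Xperia Z Ultra",
    "Sony Xperia Z1",
    "Sony Xperia ZL",
};

/* Hypothetical record for one uploaded result. */
struct result {
    const char *device;  /* device name as reported */
    int android_major;   /* e.g. 4 */
    int android_minor;   /* e.g. 3 */
};

/* Returns true if the result should be left off the chart:
 * a listed device running Android 4.3. */
bool is_excluded(const struct result *r) {
    assert(r != NULL && r->device != NULL);
    if (r->android_major != 4 || r->android_minor != 3) {
        return false;  /* only Android 4.3 results are excluded */
    }
    size_t count = sizeof BOOSTING_DEVICES / sizeof BOOSTING_DEVICES[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(r->device, BOOSTING_DEVICES[i]) == 0) {
            return true;
        }
    }
    return false;
}
```

Keying the rule on both the device and the Android version means that results from the same device on Android 4.2.2 or 4.4 would still count toward the chart.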

There is one bit of good news that our detector uncovered: Samsung removed the benchmark boost from their Android 4.4 update. We hope that Sony follows Samsung's lead and removes the benchmark boost from their Android 4.4 update as well.

Geekbench 3.1.5

Geekbench 3.1.5, the latest version of our popular cross-platform benchmark, is now available for download.

Geekbench 3.1.5 adds support for BlackBerry 10. Geekbench 3 for BlackBerry is available for download on BlackBerry World. You can also see how BlackBerry handsets compare using the new BlackBerry Benchmark Chart on the Geekbench Browser.

Geekbench 3.1.5 also features the following changes:

  • Added support for Android devices with MIPS processors.
  • Added Android CPU governor to system information.
  • Added L4 cache information to system information.
  • Fixed an issue where results uploaded to Dropbox had meaningless names.

Geekbench 3.1.5 is a free upgrade for all Geekbench 3 users.

Geekbench 3.1.4

Geekbench 3.1.4 is now available for download. Geekbench 3.1.4 features the following changes:

  • Added support for the Mac Pro (Late 2013).
  • Added the ability to export benchmark results to XML.
  • Fixed an issue that broke Dropbox integration on 64-bit iOS devices.

Geekbench 3.1.4 is a free upgrade for all Geekbench 3 users.

Developing a Cross-Platform Benchmark

Since this is my first post I’d like to begin by introducing myself. My name is Anthony Schmieder and I’m a software developer here at Primate Labs. I joined the team in the spring. I want to share some of the details of my first project at Primate Labs. A key goal for Geekbench 3 was to improve cross-platform comparability of the scores. To ensure that we did not lose sight of this goal as development ramped up, we needed an automated system to compare the scores across platforms and provide immediate feedback as we developed the workloads. I was able to quickly develop this system using Pulse, Python, and matplotlib.

First, we already used the Pulse continuous integration server to build Geekbench and run our test scripts after each commit to our development branch. Our Pulse build servers are three Mac minis with identical hardware. One runs OS X, one runs Linux, and one runs Windows. Since Geekbench is a processor benchmark and not a system benchmark, we wanted Geekbench scores to be similar across operating systems when run on identical hardware. There will always be some variation between operating systems, but our goal was to understand the sources of the variation and minimize those sources when possible. The build servers were a convenient place to test this, so I added a step to the Pulse builds that runs Geekbench on each server and exports the results to JSON using the --save option of the Geekbench command line tool.

Next, I wrote a Python script to read the Geekbench results for each platform, compare them, and generate a set of reports using matplotlib. The reports highlighted the differences in Geekbench scores across platforms and guided our investigations into performance and scoring issues. Of all the reports, the one we used most frequently compares the relative execution time of each workload across all platforms. A report for 32-bit single-core performance is shown below:
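The script itself was written in Python and is not shown here. The sketch below, in C to match the other listings in this post, illustrates one way to compute relative execution times for a single workload. It assumes times are normalized so that the fastest platform is 1.0, which is an assumption rather than a description of the actual report, and `relative_times` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Normalize per-platform execution times for one workload so that the
 * fastest platform is 1.0. Assumes n >= 1 and all times are positive.
 * The normalization choice is an assumption, not the real report's. */
void relative_times(const double *seconds, size_t n, double *relative) {
    assert(seconds != NULL && relative != NULL && n >= 1);
    double fastest = seconds[0];
    for (size_t i = 1; i < n; i++) {
        if (seconds[i] < fastest) {
            fastest = seconds[i];
        }
    }
    assert(fastest > 0.0);
    for (size_t i = 0; i < n; i++) {
        relative[i] = seconds[i] / fastest;
    }
}
```

With this normalization, a platform whose bar sits well above 1.0 for a given workload is a candidate for investigation.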

After each commit that affected workload code we examined this report, investigated any workloads that appeared problematic, analyzed the issues, fixed them if possible, committed those fixes, and repeated. For example, in the above report we see discrepancies in Black-Scholes and Stream Copy. For Black-Scholes, Visual C++ has optimized calls to sqrtf in the standard math library into sqrtps and sqrtss instructions. In the Stream Copy workload, the Clang and GCC optimizers have replaced the stream copy with a call to _memcpy.
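For illustration, a stream copy kernel typically looks like the loop below. This is a hypothetical reconstruction, not the Geekbench source. Because the loop does nothing but copy contiguous elements, an optimizer that recognizes the idiom is free to replace it with a call to the library memcpy, in which case the workload would largely be measuring the C library rather than the compiled loop.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n floats from src to dst, one element at a time.
 * Hypothetical kernel; optimizers may turn this into memcpy. */
void stream_copy(float *restrict dst, const float *restrict src, size_t n) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}
```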

Using this report we eliminated many other issues such as loop unrolling issues in the AES and Sobel workloads and vectorization issues in the GEMM workload. In AES, text is processed in chunks. Each chunk is transformed ten times and each transformation uses a different precomputed key. On systems with AES-NI instructions, this is implemented by the loop:

scratch = _mm_xor_si128(scratch, ((__m128i*)ctx->schedule)[0]);
for (int j = 1; j < 10; j++) {
  scratch = _mm_aesenc_si128(scratch, ((__m128i*)ctx->schedule)[j]);
}

This loop is unrolled by Clang and GCC. Visual C++, however, is conservative about unrolling loops that contain compiler intrinsics, so it did not unroll the loop. This led to a 25% performance penalty for MSVC. We worked around this issue by unrolling the loop by hand:

scratch = _mm_xor_si128(scratch, ((__m128i*)ctx->schedule)[0]);
scratch = _mm_aesenc_si128(scratch, ((__m128i*)ctx->schedule)[1]);
scratch = _mm_aesenc_si128(scratch, ((__m128i*)ctx->schedule)[2]);
// ... and likewise for schedule[3] through schedule[9]

The report helped us uncover another interesting loop unrolling issue in our Sobel workload. Using Clang and GCC we saw a slight performance increase by running the benchmark in 64-bit mode instead of 32-bit mode. In Visual C++ we saw a 50% drop in performance when running in 64-bit mode. With help from the Visual C++ team at Microsoft, we were able to track the issue to a difference between the 32-bit and 64-bit loop unrolling heuristics in Visual C++. Again, we worked around the issue by unrolling the loop by hand.

The last issue I’ll talk about was in our GEMM kernel. We had code similar to the following where A, B, and C all have type float**:

for (unsigned i = 0; i < N; i ++) {
  for (unsigned j = 0; j < N; j ++) {
    for (unsigned k = 0; k < N; k ++) {
      C[i][j] += A[i][k] * B[j][k];
    }
  }
}

Clang and GCC vectorized the inner loop over k, but Visual C++ refused. The issue was that VC++ could not vectorize through the second level of indirection on the float** arrays. We worked around this issue by hoisting the first level of indirection out of the inner loop, so that the loop indexes plain float* rows:

for (unsigned i = 0; i < N; i ++) {
  for (unsigned j = 0; j < N; j ++) {
    float *x = A[i];
    float *y = B[j];
    for (unsigned k = 0; k < N; k ++) {
      C[i][j] += x[k] * y[k];
    }
  }
}
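The two loop nests can be wrapped in functions and checked against each other. The sketch below does that; the function names `gemm_indirect` and `gemm_hoisted` and the explicit size parameter `n` are additions for illustration, not names from the Geekbench source. Note that both versions index B as B[j][k], so they compute C += A * B^T.

```c
#include <assert.h>

/* Original form: the inner loop dereferences A[i] and B[j] each time. */
void gemm_indirect(float **C, float **A, float **B, unsigned n) {
    assert(C != 0 && A != 0 && B != 0);
    for (unsigned i = 0; i < n; i++) {
        for (unsigned j = 0; j < n; j++) {
            for (unsigned k = 0; k < n; k++) {
                C[i][j] += A[i][k] * B[j][k];
            }
        }
    }
}

/* Workaround form: row pointers are loaded once, outside the inner
 * loop, so the loop body only indexes plain float* arrays. */
void gemm_hoisted(float **C, float **A, float **B, unsigned n) {
    assert(C != 0 && A != 0 && B != 0);
    for (unsigned i = 0; i < n; i++) {
        for (unsigned j = 0; j < n; j++) {
            float *x = A[i];
            float *y = B[j];
            for (unsigned k = 0; k < n; k++) {
                C[i][j] += x[k] * y[k];
            }
        }
    }
}
```

Both functions perform the same arithmetic in the same order, so they should produce identical results; only the form the compiler sees changes.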

While we were unable to eliminate all of the performance differences between platforms, the automated performance reports helped us verify that there were no systemic issues affecting Windows, Linux, or OS X performance. That, along with the large variety of workloads in Geekbench 3, resulted in overall scores that vary by less than 5% across platforms:

These reports have become an important part of our development process. They helped us to quickly locate performance issues in new workloads and avoid performance regressions in updated workloads. Automating the generation of these reports was key to their usefulness. Had checking for performance issues required a lot of manual effort, the temptation to postpone cross-platform performance verification would have been just as large. As correcting these issues often took several days and sometimes required offsite correspondence to verify compiler optimization issues, learning about the issues quickly was critical.

Geekbench 3.1.3

I'm pleased to announce that Geekbench 3.1.3, the latest version of our popular cross-platform benchmark, is now available for download. Geekbench 3.1.3 features the following changes:

  • Added support for the iMac (Late 2013) and the MacBook Pro (Late 2013).
  • Added support for the latest Android and iOS devices.
  • Stress test now works as expected and uses 100% of processor resources.
  • Fixed issues with processor information on Android devices with an Intel processor.
  • Fixed an issue where the Nexus 7 (2013) was misidentified as the Nexus 7 (2012).
  • Fixed an issue where standalone mode did not work on OS X.
  • Fixed an issue where 64-bit iOS devices were reported as 32-bit iOS devices.
  • Fixed an issue where processor frequency could be misreported on Linux.

Geekbench 3.1.3 is a free upgrade for all Geekbench 3 users.

iPad mini Benchmarks

After going on sale unexpectedly yesterday morning, the iPad mini with Retina display has started to appear on the Geekbench Browser. While the new iPad mini is already included on our iOS benchmark chart, I wanted to take a closer look at how the new iPad mini performs compared to other iPads, both past and current.

Both charts were generated using Geekbench 3 results from the Geekbench Browser. If you're not familiar with Geekbench 3, higher scores are better, where double the score means double the performance.

Geekbench 3 confirms that the new iPad mini processor runs at 1.3 GHz (my original guess of 1.4 GHz was wrong). This is the same speed as the iPhone 5s processor, but 100 MHz slower than the iPad Air processor. I'm not sure why the new iPad mini processor is slower but I suspect it has to do with the mini's smaller battery (less power) or the mini's smaller chassis (less cooling). It's also not clear if the new iPad mini will throttle performance when overheating in a way similar to the iPad Air or the iPhone 5s.

The new iPad mini is substantially faster than the original iPad mini, with a more than 5x increase in performance. Given that the new iPad mini has the same processor found in the iPhone 5s and the iPad Air, this isn't terribly surprising.

However, the new iPad mini is 7% slower than the Air in both single-core and multi-core tests. While the difference is significant, I don't think it's significant enough to warrant purchasing an Air instead of a mini for performance alone. Also, since I expect developers to still support the iPad 2 and the original iPad mini for some time, I don't expect much software will take advantage of the A7 processor to the point where that 7% difference will matter.
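The 7% gap is about what the clock speeds alone would suggest. The sketch below computes how much slower one clock speed is than another as a percentage; `percent_slower` is an illustrative name, and the calculation assumes performance scales linearly with clock speed on the same processor design, which is a simplification.

```c
#include <assert.h>

/* Percentage by which `slower` falls short of `faster`.
 * Assumes linear scaling with clock speed (a simplification). */
double percent_slower(double slower, double faster) {
    assert(faster > 0.0);
    return (1.0 - slower / faster) * 100.0;
}
```

For the new iPad mini at 1.3 GHz and the iPad Air at 1.4 GHz, `percent_slower(1.3, 1.4)` comes to roughly 7.1%, which matches the measured difference.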