iMac Performance (June 2014)

Apple announced a new lower-cost dual-core iMac today, and Geekbench 3 results for it are already appearing on the Geekbench Browser. Let's see how the new iMac performs compared to other iMacs.

When compared to the rest of the iMac lineup, the new iMac has reasonable single-core performance — it's almost identical to the entry-level quad-core iMac. Multi-core performance is significantly lower due to the smaller number of cores (2 cores vs 4 cores).

One interesting thing about the new iMac is that it uses a low-voltage i5-4260U "Haswell" processor (the same processor used in the MacBook Air). Why would Apple use a low-voltage dual-core processor in a desktop machine? The answer might be graphics:

According to Intel, the HD 5000 is twice as fast as the HD 4600. Apple may have sacrificed multi-core performance for GPU performance. Given the increasing importance modern user interfaces place on GPU performance, this may turn out to be a smart decision that extends the useful lifespan of the new iMac.

MacBook Air Performance (May 2014)

Earlier this week Apple announced a minor refresh to its MacBook Air lineup. The only change (besides a $100 price cut) is the base model now comes with a 1.4 GHz processor instead of a 1.3 GHz processor.

How does the new MacBook Air model perform compared to previous models? To find out, I've collected Geekbench 3 results for several models and charted the scores below:

The results for the 2013 and the 2014 models aren't surprising. Both models use Intel Haswell processors, so there are no major changes in processor technology. The 5-7% increase in performance is what I would expect from the 7% increase in processor frequency.

However, comparing the results for the 2011 and the 2012 models is much more interesting. For example, base model single-core performance has improved by almost 45% since 2011, and by almost 20% since 2012. Anyone considering an upgrade from these models will certainly notice (and appreciate!) an improvement this large.

Combatting Benchmark Boosting

Late last year Ars Technica noticed that some Samsung phones artificially boost performance when running Geekbench 3. This boost inflated Geekbench 3 scores by up to 20%. Since benchmarks are only meaningful when they're treated the same as any other application, we have been working on determining which devices "benchmark boost", and what we should do with results from these boosted devices. I'd like to share what we've discovered.

In order to determine which devices artificially boost performance when running Geekbench we added a "boost detector" to Geekbench 3. The detector embeds a report in each Geekbench 3 result uploaded to the Geekbench Browser. After analyzing thousands of reports we determined that the following Android devices artificially boost performance when running Geekbench 3:

  • Samsung Galaxy Note 10.1 (2014)
  • Samsung Galaxy Note 2
  • Samsung Galaxy Note 3
  • Samsung Galaxy S 3
  • Samsung Galaxy S 4
  • Sony Xperia Z
  • Sony Xperia Z Tablet
  • Sony Xperia Z Ultra
  • Sony Xperia Z1
  • Sony Xperia ZL

On both Samsung and Sony devices the boost appeared in Android 4.3. Earlier versions of Android (up to and including Android 4.2.2) did not boost. Anthony Schmieder and Daniel Malea, two Geekbench developers, worked with Ars Technica to find the code responsible for the boost on Samsung devices.

In order to combat benchmark boosting we have decided to exclude results from these devices running Android 4.3 from the Android benchmark chart. This way the results on the chart reflect the true performance, not the boosted performance, of each device. We have also added a list of excluded devices to the chart. We will continue to monitor the detector reports, and we will update this list if we discover other devices or Android versions that apply a benchmark boost.

There is one bit of good news that our detector uncovered — Samsung removed the benchmark boost from their Android 4.4 update. We hope Sony follows Samsung's lead and removes their benchmark boost from their Android 4.4 update as well.

Geekbench 3.1.5

Geekbench 3.1.5, the latest version of our popular cross-platform benchmark, is now available for download.

Geekbench 3.1.5 adds support for BlackBerry 10. Geekbench 3 for BlackBerry is available for download on BlackBerry World. You can also see how BlackBerry handsets compare using the new BlackBerry Benchmark Chart on the Geekbench Browser.

Geekbench 3.1.5 also features the following changes:

  • Added support for Android devices with MIPS processors.
  • Added Android CPU governor to system information.
  • Added L4 cache information to system information.
  • Fixed an issue where results uploaded to Dropbox had meaningless names.

Geekbench 3.1.5 is a free upgrade for all Geekbench 3 users.

Geekbench 3.1.4

Geekbench 3.1.4 is now available for download. Geekbench 3.1.4 features the following changes:

  • Added support for the Mac Pro (Late 2013).
  • Added the ability to export benchmark results to XML.
  • Fixed an issue that broke Dropbox integration on 64-bit iOS devices.

Geekbench 3.1.4 is a free upgrade for all Geekbench 3 users.

Developing a Cross-Platform Benchmark

Since this is my first post I’d like to begin by introducing myself. My name is Anthony Schmieder and I’m a software developer here at Primate Labs. I joined the team in the spring, and I want to share some of the details of my first project here.

A key goal for Geekbench 3 was to improve the cross-platform comparability of its scores. To ensure that we did not lose sight of this goal as development ramped up, we needed an automated system to compare scores across platforms and provide immediate feedback as we developed the workloads. I was able to build this system quickly using Pulse, Python, and matplotlib.

First, we already used the Pulse continuous integration server to build Geekbench and run our test scripts after each commit to our development branch. Our Pulse build servers are three Mac minis with identical hardware. One runs OS X, one runs Linux, and one runs Windows. Since Geekbench is a processor benchmark and not a system benchmark, we wanted Geekbench scores to be similar across operating systems when run on identical hardware. There will always be some variation between operating systems, but our goal was to understand the sources of the variation and minimize those sources when possible. The build servers were a convenient place to test this, so I added a step to the Pulse builds that runs Geekbench on each server and exports the results to JSON using the --save option of the Geekbench command line tool.

Next, I wrote a Python script to read the Geekbench results for each platform, compare them, and generate a set of reports using matplotlib. The reports highlighted the differences in Geekbench scores across platforms and guided our investigations into performance and scoring issues. Of all the reports, the one we used most frequently compares the relative execution time of each workload across all platforms. A report for 32-bit single-core performance is shown below:
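The heart of that report is a simple normalization: for each workload, divide every platform's execution time by the fastest platform's time, so the best platform sits at 1.0 and every other platform shows its relative slowdown. A minimal sketch of that comparison in C (the real script was Python with matplotlib; relative_times is a hypothetical name):

```c
#include <stddef.h>

/* Hypothetical sketch of the report's core comparison: express each
 * platform's execution time for one workload relative to the fastest
 * platform. A ratio of 1.10 means "10% slower than the best". */
void relative_times(const double *times, double *ratios, size_t n) {
    double best = times[0];
    for (size_t i = 1; i < n; i++)
        if (times[i] < best)
            best = times[i];
    for (size_t i = 0; i < n; i++)
        ratios[i] = times[i] / best;
}
```

The Python version did the same per-workload division and handed the ratios to matplotlib for charting.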

After each commit that affected workload code we examined this report, investigated any workloads that appeared problematic, analyzed the issues, fixed them if possible, committed those fixes, and repeated. For example, in the above report we see discrepancies in Black-Scholes and Stream Copy. For Black-Scholes, Visual C++ has optimized calls to sqrtf in the standard math library into sqrtps and sqrtss instructions. In the Stream Copy workload, the Clang and GCC optimizers have replaced the stream copy with a call to _memcpy.
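For reference, the Stream Copy pattern at issue looks something like the following sketch (not the actual Geekbench source). At -O2, Clang and GCC recognize the element-by-element loop as a copy idiom and replace it with a single memcpy call, even though the benchmark intends to measure the loop itself:

```c
#include <stddef.h>

/* Sketch of a stream-copy loop (not the actual Geekbench source).
 * Optimizing compilers recognize this idiom and substitute a memcpy
 * call, which is the transformation described above. */
void stream_copy(float *dst, const float *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```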

Using this report we eliminated many other issues such as loop unrolling issues in the AES and Sobel workloads and vectorization issues in the GEMM workload. In AES, text is processed in chunks. Each chunk is transformed ten times and each transformation uses a different precomputed key. On systems with AES-NI instructions, this is implemented by the loop:

scratch = _mm_xor_si128(scratch, ((__m128i*)ctx->schedule)[0]);
for (int j = 1; j < 10; j++) {
  scratch = _mm_aesenc_si128(scratch, ((__m128i*)ctx->schedule)[j]);
}

This loop is unrolled by Clang and GCC. Visual C++, however, is conservative about unrolling loops that contain compiler intrinsics, so it did not unroll the loop. This led to a 25% performance penalty for MSVC. We worked around this issue by unrolling the loop by hand:

scratch = _mm_xor_si128(scratch, ((__m128i*)ctx->schedule)[0]);
scratch = _mm_aesenc_si128(scratch, ((__m128i*)ctx->schedule)[1]);
scratch = _mm_aesenc_si128(scratch, ((__m128i*)ctx->schedule)[2]);
/* ... continuing through ctx->schedule[9] */

The report helped us uncover another interesting loop unrolling issue in our Sobel workload. Using Clang and GCC we saw a slight performance increase by running the benchmark in 64-bit mode instead of 32-bit mode. In Visual C++ we saw a 50% drop in performance when running in 64-bit mode. With help from the Visual C++ team at Microsoft, we were able to track the issue to a difference between the 32-bit and 64-bit loop unrolling heuristics in Visual C++. Again, we worked around the issue by unrolling the loop by hand.
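To give a feel for the kind of loop involved (a simplified sketch, not the Geekbench Sobel kernel): the horizontal Sobel response at one pixel is a 3x3 weighted sum, shown first as the nested loop a compiler would be expected to unroll, then unrolled by hand.

```c
/* Horizontal Sobel coefficients (illustrative sketch, not the
 * Geekbench source). */
const int gx[3][3] = { { -1, 0, 1 },
                       { -2, 0, 2 },
                       { -1, 0, 1 } };

/* Nested-loop form: we expected compilers to unroll this. */
int sobel_x_loop(unsigned char p[3][3]) {
    int sum = 0;
    for (int r = 0; r < 3; r++)
        for (int c = 0; c < 3; c++)
            sum += gx[r][c] * p[r][c];
    return sum;
}

/* Hand-unrolled equivalent: the zero-coefficient middle column
 * drops out entirely. */
int sobel_x_unrolled(unsigned char p[3][3]) {
    return -p[0][0] + p[0][2]
         - 2 * p[1][0] + 2 * p[1][2]
         - p[2][0] + p[2][2];
}
```

Hand-unrolling gives every compiler the same straight-line code, which sidesteps differences in their unrolling heuristics.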

The last issue I’ll talk about was in our GEMM kernel. We had code similar to the following where A, B, and C all have type float**:

for (unsigned i = 0; i < N; i ++) {
  for (unsigned j = 0; j < N; j ++) {
    for (unsigned k = 0; k < N; k ++) {
      C[i][j] += A[i][k] * B[j][k];
    }
  }
}

Clang and GCC vectorized the inner loop over k, but Visual C++ refused. The issue was that VC++ could not vectorize over the second level of indirection on the float** arrays. We worked around this issue by removing the first level of indirection:

for (unsigned i = 0; i < N; i ++) {
  for (unsigned j = 0; j < N; j ++) {
    float *x = A[i];
    float *y = B[j];
    for (unsigned k = 0; k < N; k ++) {
      C[i][j] += x[k] * y[k];
    }
  }
}

While we were unable to eliminate all of the performance differences between platforms, the automated performance reports helped us verify that there were no systemic issues affecting Windows, Linux, or OS X performance. That, along with the large variety of workloads in Geekbench 3, resulted in overall scores that vary by less than 5% across platforms:

These reports have become an important part of our development process. They helped us quickly locate performance issues in new workloads and avoid performance regressions in updated workloads. Automating the generation of these reports was key to their usefulness: had checking for performance issues required significant manual effort, the temptation to postpone cross-platform verification would have been large. Since correcting these issues often took several days, and sometimes required corresponding with compiler vendors to confirm optimization issues, learning about them quickly was critical.

Geekbench 3.1.3

I'm pleased to announce that Geekbench 3.1.3, the latest version of our popular cross-platform benchmark, is now available for download. Geekbench 3.1.3 features the following changes:

  • Added support for the iMac (Late 2013) and the MacBook Pro (Late 2013).
  • Added support for the latest Android and iOS devices.
  • Stress test now works as expected and uses 100% of processor resources.
  • Fixed issues with processor information on Android devices with an Intel processor.
  • Fixed an issue where the Nexus 7 (2013) was misidentified as the Nexus 7 (2012).
  • Fixed an issue where standalone mode did not work on OS X.
  • Fixed an issue where 64-bit iOS devices were reported as 32-bit iOS devices.
  • Fixed an issue where processor frequency could be misreported on Linux.

Geekbench 3.1.3 is a free upgrade for all Geekbench 3 users.

iPad mini Benchmarks

After going on sale unexpectedly yesterday morning, the iPad mini with Retina display has started to appear on the Geekbench Browser. While the new iPad mini is already included on our iOS benchmark chart, I wanted to take a closer look at how the new iPad mini performs compared to other iPads, both past and current.

Both charts were generated using Geekbench 3 results from the Geekbench Browser. If you're not familiar with Geekbench 3, higher scores are better, where double the score means double the performance.

Geekbench 3 confirms that the new iPad mini processor runs at 1.3 GHz (my original guess of 1.4 GHz was wrong). This is the same speed as the iPhone 5s processor, but 100 MHz slower than the iPad Air processor. I'm not sure why the new iPad mini processor is slower but I suspect it has to do with the mini's smaller battery (less power) or the mini's smaller chassis (less cooling). It's also not clear if the new iPad mini will throttle performance when overheating in a way similar to the iPad Air or the iPhone 5s.

The new iPad mini is substantially faster than the original iPad mini, with over a 5x increase in performance. Given that the new iPad mini has the same A7 processor found in the iPhone 5s and the iPad Air, this isn't terribly surprising.

However, the new iPad mini is 7% slower than the Air in both single-core and multi-core tests. While the difference is measurable, I don't think it's large enough to warrant purchasing an Air instead of a mini for performance alone. Also, since I expect developers to support the iPad 2 and the original iPad mini for some time, I don't expect much software to push the A7 processor hard enough for that 7% difference to matter.

Estimating Mac Pro Performance

Ever since the first updated Mac Pro result appeared on the Geekbench Browser back in June, everyone has been curious about how the upcoming Mac Pros will perform. Arguably the most important component when it comes to performance is the processor. While Apple hasn't announced which processors will be used in the upcoming Mac Pro, they have provided some details on the Mac Pro specification page. Using this information, and information from the Intel ARK processor database, here are the processors I expect to see in the upcoming Mac Pro:

Processor          Cores    Base       Turbo      Price
Xeon E5-1620 v2    4        3.7 GHz    3.9 GHz    $294
Xeon E5-1650 v2    6        3.5 GHz    3.9 GHz    $583
Xeon E5-1680 v2    8        3.0 GHz    3.9 GHz    $1723
Xeon E5-2697 v2    12       2.7 GHz    3.5 GHz    $2614

Even though some Geekbench 3 results have leaked for the upcoming Mac Pro, results are not available for all of the upcoming models. Luckily, since Geekbench 3 is a cross-platform benchmark, we can estimate the missing Mac Pro scores using results from Windows workstations that use the same processors as the Mac Pros. Here are my estimated Geekbench 3 scores for the upcoming Mac Pros:

These estimates suggest that single-core performance will be similar for the 4-, 6-, and 8-core models. Since all of the processors have the same Turbo Boost frequency, and since the processors run single-core tasks at the Turbo Boost frequency, this isn't surprising news. However, it is welcome news since users will not have to sacrifice single-core performance when choosing between the 4-core and the 6- or 8-core models.

These estimates also suggest that single-core performance will be 15% lower for the 12-core model. However, the 12-core model will have the best multi-core performance. I think the 12-core model will appeal to users with heavily-threaded applications that can take advantage of all 12 cores, while everyone else will be much happier with the superior single-core performance the other models offer.

How do the upcoming Mac Pros compare to the current Mac Pros?

The upcoming Mac Pro will have significantly better single-core performance than the current Mac Pro. For example, the upcoming 4-core model will be between 50% and 75% faster, and the upcoming 12-core model between 16% and 32% faster, than the equivalent current models.

Multi-core performance is also significantly better. The upcoming 4-core model will be between 58% and 78% faster than the current 4-core models, and the upcoming 12-core model will be between 17% and 47% faster than the current 12-core models. The 6-core and 8-core models are also quite speedy. The upcoming 6-core model will only be 10% slower than the current base 12-core model, and the 8-core model is faster than most of the current 12-core models.

Final Thoughts

I'm really excited about the upcoming Mac Pros as they're the first significant update in over three years. Even if you don't consider the new industrial design, the fact that Apple has moved from the outdated Nehalem and Westmere processors to the new Ivy Bridge processors should be exciting for all Pro users.

The only question left is how much will the 8-core and 12-core models cost? Given the price of the 8-core and the 12-core processors ($1723 and $2614, respectively) I expect both models will be quite expensive, even when compared to the $3999 6-core model.

iPad Air Benchmarks

Geekbench 3 results for the new iPad Air are starting to appear on the Geekbench Browser. I've charted the results for all iOS 7 capable iPads below. If you're not familiar with Geekbench 3, it's our cross-platform processor benchmark. Higher scores are better, where double the score means double the performance.

Some thoughts on the results:

  • The iPad Air's A7 processor is running at 1.4 GHz, 100 MHz faster than the iPhone 5s' A7 processor. It's not clear if the iPad Air processor runs at a higher speed thanks to a larger battery (providing more power), a larger chassis (providing better cooling), or some combination of the two. I expect the new iPad mini's A7 processor will run at 1.4 GHz as well.

  • The iPad Air is over 80% faster than the iPad (4th Generation), close to the 2x increase promised by Apple.

  • The iPad Air is over 5x faster than the iPad 2, yet is only $100 more expensive. I do not understand why Apple kept the iPad 2 around, especially at a $399 price point. What market are they targeting?

From a performance standpoint the iPad Air is a great upgrade to the iPad (4th Generation). With most recent Mac updates showing only modest performance improvements, it's exciting to see iOS devices do the opposite with substantial improvements between generations. I wonder how much longer Apple can keep this up?