Swift Performance in Xcode 6.3 Beta

Posted on 18 Feb 2015 by Anthony Schmieder

Back in December we ported a few of our Geekbench workloads to Swift and compared their performance to the C++ implementations. With last week's announcement of a beta release of Xcode 6.3, we thought it would be a good time to revisit those results. In this post we find out whether the performance improvements in Xcode 6.3 Beta speed up our Swift workloads.

The following table shows the performance of the Swift workloads compiled with Xcode versions 6.1.1 and 6.3 Beta. We use the same optimizer settings and the same machine as we did in December. As before, the averages are taken over eight executions of the workloads.

Workload	Version	Minimum	Maximum	Average
Mandelbrot	Swift (6.3 Beta)	2.07 GFlops	2.49 GFlops	2.32 GFlops
	Swift (6.1.1)	2.15 GFlops	2.43 GFlops	2.26 GFlops
	C++ (6.1.1)	2.25 GFlops	2.38 GFlops	2.33 GFlops
GEMM	Swift (6.3 Beta)	2.14 GFlops	2.18 GFlops	2.16 GFlops
	Swift (6.1.1)	1.48 GFlops	1.59 GFlops	1.53 GFlops
	C++ (6.1.1)	8.61 GFlops	9.92 GFlops	9.32 GFlops
FFT	Swift (6.3 Beta)	0.25 GFlops	0.27 GFlops	0.26 GFlops
	Swift (6.1.1)	0.10 GFlops	0.10 GFlops	0.10 GFlops
	C++ (6.1.1)	2.29 GFlops	2.60 GFlops	2.42 GFlops

Based on the average scores, the improvements in Xcode 6.3 Beta provide a 1.4x speedup for GEMM (1.53 to 2.16 GFlops) and a 2.6x speedup for FFT (0.10 to 0.26 GFlops) over Xcode 6.1.1. Mandelbrot is essentially unchanged. Performance for the C++ workloads did not change, so we omit those numbers for the 6.3 Beta.

Our Swift FFT implementation got an additional speedup last week thanks to some performance patches from Joseph Lord (the code for the Swift workloads is available on GitHub). His optimizations include:

Eliminate virtual function dispatch by making the Workload classes final.
Allow the compiler to do more inlining by moving the Complex definition into the same file as the FFT code.
Work around slow access to an array of structs by changing the output array in FFT from a Swift array to an UnsafeMutablePointer<Complex>.
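
The three patches can be illustrated with a small sketch. It is not taken from the Geekbench workloads or from Joseph's patches: the `Complex` struct is a stand-in definition, and `ScaleWorkload`, `run(factor:)`, and the scaling loop are hypothetical names chosen for illustration. It is also written in current Swift syntax, which differs in places from the Swift 1.2 that ships with Xcode 6.3 Beta (the pointer allocation calls in particular were spelled differently then).

```swift
// Illustrative sketch only; all names here are hypothetical.

// Stand-in complex type. Defining it in the same file as the code that uses
// it gives the optimizer a chance to inline the operator below.
struct Complex {
    var re: Double
    var im: Double

    static func * (a: Complex, b: Complex) -> Complex {
        return Complex(re: a.re * b.re - a.im * b.im,
                       im: a.re * b.im + a.im * b.re)
    }
}

// `final` means no subclass can override `run`, so the compiler is free to
// call it directly instead of going through dynamic dispatch.
final class ScaleWorkload {
    let input: [Complex]
    // The output lives in manually managed memory rather than a Swift array.
    let output: UnsafeMutablePointer<Complex>

    init(input: [Complex]) {
        self.input = input
        output = UnsafeMutablePointer<Complex>.allocate(capacity: input.count)
        output.initialize(repeating: Complex(re: 0, im: 0), count: input.count)
    }

    deinit {
        // Manually managed memory must be released by hand.
        output.deinitialize(count: input.count)
        output.deallocate()
    }

    // Multiplies every input element by `factor`, writing through the pointer.
    func run(factor: Complex) {
        for i in 0..<input.count {
            output[i] = input[i] * factor
        }
    }
}

let workload = ScaleWorkload(input: [Complex(re: 1, im: 0), Complex(re: 0, im: 1)])
workload.run(factor: Complex(re: 0, im: 2))
print(workload.output[0], workload.output[1])
```

The trade-off with the pointer is that bounds checking and automatic memory management are gone, so the class has to allocate, initialize, and free the buffer itself.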

These changes provide a significant speedup for FFT of about 8.5x over our previous implementation:

Workload	Version	Minimum	Maximum	Average
Mandelbrot	Swift with Joseph's patches (6.3 Beta)	2.32 GFlops	2.45 GFlops	2.40 GFlops
	Swift (6.3 Beta)	2.07 GFlops	2.49 GFlops	2.32 GFlops
	C++ (6.1.1)	2.25 GFlops	2.38 GFlops	2.33 GFlops
GEMM	Swift with Joseph's patches (6.3 Beta)	2.01 GFlops	2.19 GFlops	2.13 GFlops
	Swift (6.3 Beta)	2.14 GFlops	2.18 GFlops	2.16 GFlops
	C++ (6.1.1)	8.61 GFlops	9.92 GFlops	9.32 GFlops
FFT	Swift with Joseph's patches (6.3 Beta)	1.85 GFlops	2.31 GFlops	2.20 GFlops
	Swift (6.3 Beta)	0.25 GFlops	0.27 GFlops	0.26 GFlops
	C++ (6.1.1)	2.29 GFlops	2.60 GFlops	2.42 GFlops

After the improvements in Xcode 6.3 Beta and some careful optimizations, the performance of the FFT workload is now within 10% of the C++ implementation. The optimizations might look strange to someone who hasn't read up on Swift internals, but they are easy to apply and can be used by any Swift programmer. If you try these optimizations in your own code, benchmark the changes carefully. They might not provide any speedup at all for your algorithm, and they might even slow it down. Also keep in mind that Xcode 6.3 is still in beta, so its behavior could change before the final release.
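
One way to benchmark a change is to time the same code over several runs and look at the minimum, maximum, and average, as in the tables above. The following is a minimal sketch of such a harness in current Swift, not the harness Geekbench uses. The `measure` helper and the placeholder loop are hypothetical, and the run count of eight simply mirrors the eight executions used in this post.

```swift
import Dispatch

// Hypothetical helper: runs `body` `runs` times and returns the minimum,
// maximum, and average wall-clock time in seconds.
func measure(runs: Int, _ body: () -> Void) -> (min: Double, max: Double, average: Double) {
    precondition(runs > 0, "need at least one run")
    var samples: [Double] = []
    for _ in 0..<runs {
        let start = DispatchTime.now().uptimeNanoseconds
        body()
        let end = DispatchTime.now().uptimeNanoseconds
        samples.append(Double(end - start) / 1e9)
    }
    let total = samples.reduce(0, +)
    return (samples.min()!, samples.max()!, total / Double(runs))
}

// Placeholder workload; substitute the code you actually want to measure.
var sink = 0.0
let stats = measure(runs: 8) {
    var acc = 0.0
    for i in 0..<1_000_000 {
        acc += Double(i) * 0.5
    }
    sink = acc  // keep the result so the optimizer cannot discard the loop
}
print("min \(stats.min) s, max \(stats.max) s, average \(stats.average) s")
```

Run it in an optimized build, before and after each change, on the same machine. A wide gap between the minimum and maximum suggests the measurement is too noisy to support a conclusion.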

Anthony Schmieder is a software developer at Primate Labs.