Let’s test the Apple A12 chip

When Apple announced the iPhone Xs and iPhone Xs Max, our eyes were all focused on what was inside them: the Apple-designed A12 Bionic chip. We knew how fast and powerful its predecessor, the Apple A11, was, and we knew that the A12 could only go beyond what we had seen so far.

Today we are going to use our Asteroids Benchmarks to measure and analyze the graphics and compute performance of the Apple A12 chip, contained in the iPhone Xs, Xs Max and Xr.

In a press release, Apple set remarkable expectations about the A12 Bionic performance:

A12 Bionic features a six-core fusion architecture with two performance cores that are up to 15 percent faster, four efficiency cores that are up to 50 percent more efficient, a four-core GPU that is up to 50 percent faster, powerful Apple-designed Image Signal Processor (ISP), video encoder and more.

At Virtual Arts, we embrace mobile hardware getting better and better. In fact, we believe that both software and hardware need to improve to deliver amazing augmented and mixed reality experiences. The theoretical performance improvements are great, but we are, of course, interested in seeing how this chip performs in a real-world scenario.

In this blog post, we want to focus on graphics and compute performance, which are the major factors that contribute to extended reality performance. The main sub-systems stressed by these tests are the CPU, the GPU, the bus, the memory and, of course, the OS, in particular the graphics driver and the scheduler. In future blog posts, we will also analyze other aspects of the system.

iOS 12 – more performance improvements

Just a few months before the A12 Bionic chip was revealed, we were in San Jose, California, at WWDC 2018, where Apple announced in the keynote that iOS 12 was bringing big performance improvements compared to iOS 11. In fact, Apple’s senior vice president of Software Engineering, Craig Federighi, said:

“And so, for iOS 12, we are doubling down on performance from top to bottom making improvements to make your device faster and more responsive”

And talking about the CPU performance:

“CPUs traditionally respond to an increased demand for performance by slowly ramping up their clock speed. Well, now in iOS 12, we’re much smarter. When we detect that you need a performance lift, when you’re scrolling and launching an app, we ramp up processor performance instantly to its highest state, delivering high performance, and ramp it down just as fast to preserve battery life”

This, on top of the hardware improvements brought by the A12 Bionic chip, sounds very promising to us. Today, we are going to run our benchmarks and analyze the performance of the iPhone Xs, which features the new chip. Here are the technical specifications relevant to graphics and compute performance:

Our benchmarks

For our research, we are going to use Asteroids Benchmarks v1.1.1, which includes complex 3D scenes and animations that represent game-like and educational-like content. We are going to run all our benchmarks on the iPhone Xs and compare them to its predecessor, the iPhone X.

The Asteroids Benchmarks can be configured with different rendering resolutions and MSAA settings. We will run the benchmarks using all the configurations and check how the performance scales across them.

Asteroids Benchmarks v1.1.1 includes three benchmarks, two of which render real-world content, while one consists of a set of low-level technical tests:

Snow Forest

Focus: graphics rendering and animations
Workload: GPU and CPU
Use case: gaming

Orbital Flight

Focus: graphics rendering and transparent materials
Workload: GPU
Use case: education

Meteor Shower

Focus: technical benchmarks for CPU and GPU
Workload: CPU and GPU
Use case: low level performance testing and analysis

Asteroids Benchmarks and the Universe Engine on iOS

The Asteroids Benchmarks run on the Universe Engine, which is written and optimized specifically for iOS and Android devices: it features a multi-threaded job system, uses low-overhead APIs such as Metal and Vulkan, and is optimized for tile-based mobile GPUs.

Our iOS implementation of Universe Engine makes use of a great deal of Apple’s technology, including:

  • Metal for the rendering
  • ARKit for Augmented Reality (not used in this version of the benchmarks)
  • UIKit for view and event handling
  • Signposts for Xcode Instruments profiling
  • Grand Central Dispatch for the multithreading

Metal rendering

Universe Engine has been developed from scratch targeting Vulkan and Metal APIs, which allowed us to multi-thread every task that runs in the engine. Not having a legacy OpenGL ES implementation allowed us to structure our engine around the new low-overhead API concepts.

From a technical point of view, our Metal implementation uses parallel render command encoders, which are objects that split up a single render pass so it can be simultaneously encoded from multiple threads.
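
A minimal sketch of this pattern in Swift (the set-up and draw calls are placeholders, not Universe Engine code):

```swift
import Metal
import Dispatch

// Sketch: encode a single render pass from multiple threads with
// MTLParallelRenderCommandEncoder. Draw calls are placeholders.
func encodeFrame(commandBuffer: MTLCommandBuffer,
                 passDescriptor: MTLRenderPassDescriptor) {
    guard let parallelEncoder = commandBuffer
        .makeParallelRenderCommandEncoder(descriptor: passDescriptor) else { return }

    // Child encoders are created up front; Metal replays their commands
    // in creation order, whatever thread each one is encoded on.
    let encoders = (0..<4).compactMap { _ in
        parallelEncoder.makeRenderCommandEncoder()
    }

    // concurrentPerform blocks until all iterations are done, so every
    // child encoder has ended before the parallel encoder ends.
    DispatchQueue.concurrentPerform(iterations: encoders.count) { index in
        let encoder = encoders[index]
        // ... set pipeline state and draw this thread's slice of the scene ...
        encoder.endEncoding()
    }

    parallelEncoder.endEncoding()
}
```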

The standard Metal shaders are handwritten and optimized directly in the Metal Shading Language (based on C++14), and the custom shaders are compiled from their shader graphs directly to the native shading language as well. All shaders are precompiled at the application’s build stage.

All static textures are compressed to ASTC format, and the texture channels are reduced to the minimum necessary. Moreover, mipmaps are enabled for all textures, to improve the visual quality and reduce bandwidth utilization.
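
As an illustration, a texture like this could be loaded with MTKTextureLoader, which understands container formats (such as KTX) that carry ASTC payloads together with a prebuilt mip chain; the function and the storage-mode option below are assumptions, not engine code:

```swift
import MetalKit

// Sketch: load an ASTC-compressed texture whose mipmaps were generated
// offline, keeping it in GPU-only memory. Nothing is transcoded at runtime.
func loadCompressedTexture(device: MTLDevice, url: URL) throws -> MTLTexture {
    let loader = MTKTextureLoader(device: device)
    return try loader.newTexture(
        URL: url,
        options: [.textureStorageMode: MTLStorageMode.private.rawValue])
}
```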

Concurrent execution with Grand Central Dispatch

In Universe Engine, every task is packaged as a job that can be executed concurrently on the CPU. Conceptually, there is a thread pool that picks jobs up and executes them according to their priorities and dependencies.

On Apple devices, we use the native Dispatch framework, which is available on all Apple platforms. Grand Central Dispatch (GCD) provides FIFO queues to which we can submit tasks. Work submitted to dispatch queues is executed on a pool of threads fully managed by the system.

Apple says that no guarantees are made as to the thread or core on which each task executes, but we assume that GCD has been optimized by Apple’s engineers to do what is best for each platform. Indeed, after trying it with our workloads, we found that it performs very well and is a good extension to the original job manager that we use.
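
Here is a minimal sketch of how engine jobs can map onto GCD, assuming a concurrent queue for independent jobs and a DispatchGroup for dependencies (the helper functions are hypothetical, not Universe Engine’s job manager):

```swift
import Dispatch

// Hypothetical per-chunk work; stands in for animation/simulation jobs.
func animateSceneChunk(_ chunk: Int) { /* ... */ }
func submitFrame() { /* ... */ }

// A concurrent queue: GCD decides which threads and cores run the work.
let jobQueue = DispatchQueue(label: "engine.jobs",
                             qos: .userInteractive,
                             attributes: .concurrent)
let frameJobs = DispatchGroup()

// Independent jobs execute concurrently on the system-managed pool.
for chunk in 0..<8 {
    jobQueue.async(group: frameJobs) {
        animateSceneChunk(chunk)
    }
}

// A dependent job runs only once every job in the group has finished.
frameJobs.notify(queue: jobQueue) {
    submitFrame()
}
```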

Our Xcode Instruments integration

Galileo once said: “measure what is measurable, and make measurable what is not”. To analyze and optimize a system, first, you have to be able to measure it. Apple provides great tools for profiling and debugging on iOS devices. Instruments is the performance analysis tool that is shipped with Xcode. It can profile and plot metrics for the CPU, the GPU, the system, memory, and other sub-systems.

Like many timeline-based profiling tools, it is great at visualizing what the system (hardware and OS) is doing, but it typically lacks information about what the user’s application does. Without engine integration, it is impossible to know what the application was rendering at any point in time.

Universe Engine is fully instrumented to emit points of interest and annotations to Instruments and to other vendors’ profilers. This is extremely useful because it amplifies what the performance analysis tool can tell us. For example, this is what Instruments looks like with the Universe Engine points of interest:

Screenshot of Xcode Instruments with Universe Engine points of interest
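
Emitting such a point of interest takes only a few lines with the os_signpost API; the subsystem and interval names below are illustrative, not Universe Engine’s real ones:

```swift
import os.signpost

// Intervals emitted on the .pointsOfInterest category show up on the
// Points of Interest track in Instruments.
let poiLog = OSLog(subsystem: "com.example.engine",
                   category: .pointsOfInterest)

func renderFrame(_ frameIndex: UInt64) {
    let spid = OSSignpostID(log: poiLog)
    os_signpost(.begin, log: poiLog, name: "Frame",
                signpostID: spid, "frame %llu", frameIndex)

    // ... encode and submit the frame ...

    os_signpost(.end, log: poiLog, name: "Frame", signpostID: spid)
}
```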

Testing with the Snow Forest scene

Snow Forest is a mobile graphics and compute benchmark that measures the performance of a modern mobile phone rendering complex 3D graphics and effects. It features:

  • Content-based on a gaming demo developed by Virtual Arts.
  • Multiple character animations with skinning and blend shapes.
  • Particle effects.
  • Custom materials and custom shaders with PBR rendering.
  • Dynamic reflections and shadow mapping.
  • Multiple dynamic lights.
  • 38 different materials with 146 textures (of which 11 are 2k and 35 are 1k).

Here you can watch a screen capture of the Snow Forest benchmark.

Overview

We have run our benchmark on the iPhone Xs multiple times, allowing the device to cool down a bit between runs (we’ve noticed some thermal throttling; more about this below). All the data we measure is collected by our web service and is available on the results page of our website, and we have performed some analysis on it.

We run the benchmark in offscreen mode so that rendering is not limited by vsync.
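
As a hedged sketch of what this means in Metal: frames are rendered into an offscreen texture instead of a CAMetalDrawable, so nothing is presented and vsync never paces the GPU. The timing harness below is illustrative, not our actual one:

```swift
import Metal
import QuartzCore

// Render `frames` frames into an offscreen target and return the
// average frame rate. No drawable is presented, so vsync cannot cap it.
func measureOffscreen(device: MTLDevice, frames: Int,
                      passDescriptor: MTLRenderPassDescriptor) -> Double {
    let queue = device.makeCommandQueue()!
    let start = CACurrentMediaTime()
    var last: MTLCommandBuffer?
    for _ in 0..<frames {
        let commandBuffer = queue.makeCommandBuffer()!
        let encoder = commandBuffer
            .makeRenderCommandEncoder(descriptor: passDescriptor)!
        // ... encode the scene ...
        encoder.endEncoding()
        commandBuffer.commit()
        last = commandBuffer
    }
    last?.waitUntilCompleted()   // buffers complete in submission order
    return Double(frames) / (CACurrentMediaTime() - start)
}
```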

In this chart, we plot the frame rate that was obtained running the benchmark in offscreen mode at different resolutions, with varying multisampling (MSAA) levels:

Snow Forest benchmark - running offscreen at different resolutions and multisampling levels

Overall, the Apple A12 chip performs very well in this benchmark, which is expected from the “most powerful chip in a smartphone”, quoting Apple. It is important to say that, on smartphones, peak performance like this is not sustainable for long periods of time.

Effects of multisampling (MSAA) on performance

It’s interesting to see how cheap multisampling with 4 samples (MSAA 4x) is on this device. The effect on the frame rate is minimal. Here we can see the actual numbers:

Comparison between Apple A12 and Apple A11 Bionic at different rendering resolutions
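
Coming back to the multisampling result: a likely explanation for MSAA 4x being so cheap is that Apple GPUs are tile-based and can resolve the extra samples entirely in tile memory. Under that assumption, here is a sketch (not our engine’s code) of a 4x MSAA offscreen pass in Metal, where the multisampled attachment is memoryless and only the resolved pixels ever reach main memory:

```swift
import Metal

// Sketch: 4x MSAA render pass that resolves on-tile. The multisampled
// attachment is memoryless, so its samples never leave tile memory.
func makeMSAAPass(device: MTLDevice, width: Int, height: Int) -> MTLRenderPassDescriptor {
    let msaa = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .bgra8Unorm, width: width, height: height, mipmapped: false)
    msaa.textureType = .type2DMultisample
    msaa.sampleCount = 4
    msaa.usage = .renderTarget
    msaa.storageMode = .memoryless        // no backing allocation on Apple GPUs

    let resolved = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .bgra8Unorm, width: width, height: height, mipmapped: false)
    resolved.usage = [.renderTarget, .shaderRead]

    let pass = MTLRenderPassDescriptor()
    pass.colorAttachments[0].texture = device.makeTexture(descriptor: msaa)
    pass.colorAttachments[0].resolveTexture = device.makeTexture(descriptor: resolved)
    pass.colorAttachments[0].loadAction = .clear
    pass.colorAttachments[0].storeAction = .multisampleResolve  // resolve, then discard samples
    return pass
}
```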

The A12 Bionic runs really fast. But how fast compared to the previous generation?

Well, more than twice as fast.

This is great and makes us quite excited about the opportunities that the future is quickly bringing to extended reality. Where the iPhone X with the A11 chip was below 60 frames per second, the iPhone Xs can now beat vsync and run smoothly even at the highest resolutions:

Snow Forest benchmark - comparing iPhone Xs and iPhone X at different resolutions

Analyzing with Xcode Instruments

Let’s use the tools to validate our assumptions. What we want to check is that at 2688×1242 and above the benchmark is GPU-limited, while at lower resolutions it is CPU-limited. For this, we use Xcode Instruments and profile the benchmark at 2688×1242 and at 1334×750 (after letting the device cool down).

Screenshot of Instruments showing GPU activity at 1334x750
Screenshot of Instruments showing GPU activity at 2688x1242

The vertex and fragment charts represent tasks running on the GPU for processing the vertices (geometry) and the pixels of the frames that we are rendering.

When we pay close attention to the fragment processing, which is by far the most intensive task in almost every 3D game or app, we can see that it dominates the chart. However, there is a big difference in the case of the 1334×750 resolution: there are gaps.

The gaps represent GPU idle time. Normally they would be caused by vsync, and they are a good thing, because they let the hardware go idle and cool down for a bit. In this case, however, the benchmark renders to an offscreen render target and is not controlled by vsync. The gaps in the fragment processing therefore show that the GPU has to wait for the CPU to finish submitting a new frame: at this resolution, the benchmark is CPU-limited.

At 2688×1242, instead, there are no gaps: fragment (and vertex) processing completely consumes the available GPU time. In this case, the GPU is the limiting factor.

This can be clearly seen in the following zoomed-in picture:

It is understandable that the system stresses the GPU much more at higher resolutions. In fact, whilst the CPU work remains the same, because the same number of objects and effects need to be processed and animated, the GPU has to render far more pixels: 2688×1242 is about 3.34 million pixels against roughly 1 million at 1334×750, 3.3 times as many. Here’s a comparison of the two resolutions that we tested, to scale:

Thermal throttling

While running our tests, we noticed many inconsistencies in the results we were getting from different runs of the same benchmark configuration on the same device. Sometimes the frame rate was half of what we had gotten running the same test just a few minutes earlier.

Here’s a chart showing the results from different runs of the same benchmark at 2688×1242:

Different measurements obtained by running the same benchmark on the same device

As you can see in the plot, the frame rate varies from 137 fps down to just 68 fps. This is caused by thermal throttling, and its effect is more visible at low resolutions, where the benchmarks are CPU-limited. To verify this further, we ran the benchmark on a “cold” device that had been idle for more than 30 minutes, then ran it a number of times in a row and measured again once the device was “hot”. We performed the same operation on both the iPhone Xs and the iPhone X, to compare how well the A12 and the A11 chips cope with thermals:

Comparison of the frame rates obtained on a cold device vs hot device

The frame rate drops from 137 fps to 78 fps: the hot result is just a bit more than half of the cold one. So although the peak performance of the Apple A12 has improved significantly, we also see a larger impact of thermal throttling compared to the A11 in the iPhone X. This makes us interested in benchmarking sustained performance, which we will do in a future blog post.
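
As a side note, iOS exposes the device’s thermal state to applications, which is a convenient way to tell when the device has cooled down between runs. A minimal sketch using the standard Foundation API (the thresholds in the comments reflect our usage, not an Apple recommendation):

```swift
import Foundation

// Observe thermal state changes; .nominal is a reasonable "cold enough
// to start a benchmark run" signal.
let observer = NotificationCenter.default.addObserver(
    forName: ProcessInfo.thermalStateDidChangeNotification,
    object: nil, queue: .main) { _ in
    switch ProcessInfo.processInfo.thermalState {
    case .nominal:            print("cool: safe to start a run")
    case .fair:               print("warming up")
    case .serious, .critical: print("throttling likely: let it cool down")
    @unknown default:         break
    }
}
```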

Testing with the Orbital Flight scene

Orbital Flight is a mobile graphics-focused benchmark that measures the performance of a modern mobile phone rendering complex 3D graphics. It features:

  • Assets based on an educational demo developed by Virtual Arts
  • Metal rendering
  • 21 custom blended materials with complex shaders (some have more than 60 nodes in the graph)
  • Material layering and additive blending
  • Multiple dynamic lights (Sun and Moon both emit dynamic light)
  • 118 textures (of which 26 are 2k textures)
  • Animated material properties.

Here you can watch a screen capture of the Orbital Flight benchmark.

Overview

As we did before with Snow Forest, here are the results of running Orbital Flight at various resolutions and multisampling settings, in offscreen mode (so it is not vsynced):

Orbital Flight benchmark - running offscreen at different resolutions and multisampling levels

In this case we can see a larger impact of the resolution and MSAA settings, compared to the Snow Forest. This is because this benchmark is very heavy on 3D rendering, while the CPU does not have a lot of work to do. We also notice a greater impact of the number of samples for MSAA. We suspect this is related to the amount of transparency and small triangles that are drawn in this scene.

Overall, the A12 performs really well, missing 60 fps only at 4K resolution (3840×2160).

Comparison between Apple A12 and Apple A11 Bionic at different rendering resolutions

Comparing the results of Orbital Flight against those measured on the iPhone X, we see that the A12 is about 90% faster than the A11 at almost all resolutions. This is really good, and beyond what was said in Apple’s announcement (50% faster GPU).

Orbital Flight benchmark - comparing iPhone Xs and iPhone X at different resolutions

As mentioned earlier, we noticed a greater impact of MSAA with this benchmark: MSAA 4x can be up to 15% slower than MSAA 1x:

Meteor Shower

Meteor Shower is a suite of micro-benchmarks that focus on technical aspects and provide results specific to one area of graphics or compute in isolation. We often need to understand in detail the hardware characteristics and limitations of each device, and testing each subsystem in isolation gives valuable data. Nevertheless, because the micro-benchmarks run in isolation, they are not representative of real scenarios, apps and games: for that, we have the other content-based benchmarks.

Focus on CPU

CPU Parallel Test

The CPU parallel micro test in the Meteor Shower benchmark contains:

  • Large number of independently animated objects
  • Rigid bodies animation
  • Vertex skinning on CPU
  • Simple shading
  • Parallel job system for animation and simulation (full core utilization)

CPU Parallel micro test comparing frames per second

With Apple claiming in its press release that the efficiency cores are up to 50 percent more efficient and the performance cores up to 15 percent faster, we found in our tests that this mostly holds: we measured a 41% improvement in our CPU animation test, whose work can be scheduled across multiple CPU cores (we do not control which cores).
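
To give an idea of the kind of work this test parallelizes, here is a hypothetical sketch of CPU vertex skinning split across all cores with concurrentPerform; the data layout is illustrative, not the benchmark’s actual one:

```swift
import simd
import Dispatch
import Foundation

// Hypothetical vertex layout for CPU skinning.
struct SkinnedVertex {
    var position: SIMD3<Float>
    var boneIndices: SIMD4<Int32>
    var boneWeights: SIMD4<Float>
}

func skin(vertices: [SkinnedVertex],
          bones: [float4x4],
          into output: inout [SIMD3<Float>]) {
    guard !vertices.isEmpty else { return }
    let chunks = ProcessInfo.processInfo.activeProcessorCount
    let chunkSize = (vertices.count + chunks - 1) / chunks

    output.withUnsafeMutableBufferPointer { out in
        // Each iteration skins a disjoint slice of the vertex array,
        // so no synchronization is needed between worker threads.
        DispatchQueue.concurrentPerform(iterations: chunks) { chunk in
            let start = chunk * chunkSize
            let end = min(start + chunkSize, vertices.count)
            for i in start..<end {
                let v = vertices[i]
                var p = SIMD4<Float>(repeating: 0)
                for k in 0..<4 {
                    let m = bones[Int(v.boneIndices[k])]
                    p += v.boneWeights[k] * (m * SIMD4<Float>(v.position, 1))
                }
                out[i] = SIMD3<Float>(p.x, p.y, p.z)
            }
        }
    }
}
```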

Focus on GPU

Vertex shading

A vertex shading micro test with 1 million triangles in view.

Vertex Shading test comparing frames per second

Apple claims that the GPU in the A12 is 50% faster than the GPU in the A11; our results show a 35% performance increase in this Vertex Shading test, which exercises vertex processing alone.

Fragment shading

A fragment shading test with multiple layers of clouds using a complex fragment shader:

  • Heavy fragment shading
  • 6 layers of clouds with simulation in the fragment shader
  • Water shader with reflections and water simulation
  • Multiple 2k textures
  • Flow maps
  • Heavy on texturing and arithmetic

Fragment Shading test comparing frames per second

Going beyond the Vertex Shading result, the GPU in the A12 shows a 44% improvement in the Fragment Shading test, which largely validates Apple’s announcement.

Conclusions

In this blog post, we have tested and analyzed the performance of the Apple A12 Bionic chip which powers the iPhone Xs and Xs Max. Apple promised amazing performance, compared to the previous A11 chip, claiming 50% faster GPU and 15% to 50% faster CPU (depending on which cores).

We have run our benchmarks at different resolutions and multisampling levels, and the device scored extraordinarily well in all of them. Compared to the iPhone X (powered by the A11), we have seen the frame rate of the Snow Forest scene more than double, and the Orbital Flight scene run 90% faster at various resolutions.

However, these improvements hold only at peak performance, when the device is cold. As soon as the phone starts warming up, which happens after just 30 seconds of intense workloads, thermal throttling kicks in and the frame rate drops significantly: down 43% in our test.

Even with the thermal limitations, this is the most powerful mobile device we have seen so far, and we think it’s going to be a hard one to overtake. Moreover, we are looking forward to seeing what Apple is working on for 2019; we expect it to be even faster.

Make sure you subscribe to the Asteroids Benchmarks Newsletter to keep up to date on performance analysis of future mobile hardware.

Lorenzo Dal Col

Lorenzo is leading the development of engines and tools for high performance VR and AR graphics on low power devices at Virtual Arts, a tech startup in Cambridge, UK. He first used Arm technology when, in 2007, he created a voice-controlled robot at the university. He has experience in computer graphics, machine learning, image processing, and computer vision. Before Virtual Arts, he worked for Arm for 6 years on 3D graphics, developing performance analysis and debug tools for software running on Arm Mali GPUs.