In-Depth Graphics Technology - Extreme Post-Processing Effects on Mainstream (Mid-Range) Mobile Platforms

At the recently concluded annual Unite Beijing event, Arm China had the pleasure of exploring the latest graphics technology together with a wide range of developers and of demonstrating some fun XR technology on site. More than 300 enthusiastic attendees visited us. The technical session, together with the steady stream of visitors to the booth, not only drew more interest than expected but also left us with many valuable insights.

In recent years, China’s mobile game market has grown at an unprecedented rate, producing one success story after another. Most of them are inseparable from the Unity engine and the vibrant developer community behind it. The Unite conference is a real focal point: it brings together industry leaders from around the globe, where technology and business strike sparks off each other. Arm has been fortunate to work alongside Unity over many years and to help the mobile gaming industry thrive.

Today, measured by market size and number of users, mobile games have far surpassed every other form of gaming, including PC and console, and within this market, games with highly realistic graphics are gradually standing out on the strength of their quality.

In the Asia-Pacific region, about 90% of all mobile devices use the Arm architecture, and most of them are also equipped with Arm's Mali GPUs. Arm's excellent hardware infrastructure, debugging tools, and development techniques mean that beautiful graphics are no longer the exclusive preserve of high-end models. Developers can bring their creativity to almost all players without obstacles and gain a chance of success that was unimaginable in the past.

High-speed post-processing effects on mainstream (mid-range) mobile platforms

Look carefully at the four real-time rendered images above and the difference in visual quality is easy to see. The top two are the original images without the Bloom effect, where the character armor and the metal on the ground look unsatisfactory. The bottom two are the optimized images: the key elements now shine, and the game still runs stably at a high frame rate of 60 FPS on most mobile platforms.

These images come from Spellsouls, an AAA-class mobile title developed by Nordeus, an important strategic partner in Arm's game ecosystem. (At the time of writing the game was in internal testing and had no official Chinese name; some fans call it “the magic soul”.) Nordeus is an independent game studio based in Belgrade, the capital of Serbia, and Spellsouls is one of its representative works. At recent GDC and Unite conferences, the studio has repeatedly shared its experience with developers.

Nordeus and Arm maintain a close, in-depth collaboration on mobile game optimization. We also hope to foster more partnerships of this kind in China.

Custom Forward+ rendering path:

To run smoothly at 60 frames per second with the highest rendering quality on as many devices as possible, Nordeus used a Forward+ rendering path in Spellsouls to take full advantage of the tile-based architecture of mobile GPUs (Mali GPUs are typically tile-based). For well-known reasons, directly applying the mature and efficient deferred rendering techniques of PC and console platforms to mobile is still unrealistic. Traditional forward rendering has its own limitations, however: its computational cost rises rapidly with the number of light sources and the complexity of the scene, and it can struggle with as few as four dynamic light sources, or even fewer.
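The core idea of Forward+ can be sketched on the CPU: the screen is divided into tiles, each light is assigned to the tiles its screen-space bounds overlap, and each pixel is then shaded using only the lights listed for its own tile. The Python sketch below is illustrative only and is not Nordeus's implementation; the tile size, the `Light` class (a light already projected to a screen-space circle), and the function names are all assumptions made for this example.

```python
from dataclasses import dataclass

TILE_SIZE = 16  # assumed tile edge in pixels; real values depend on the GPU


@dataclass
class Light:
    # Hypothetical light already projected to screen space:
    # centre (x, y) and radius of influence, all in pixels.
    x: float
    y: float
    radius: float


def bin_lights(lights, width, height, tile_size=TILE_SIZE):
    """Map each (tile_x, tile_y) to the indices of the lights touching that tile."""
    tiles_x = (width + tile_size - 1) // tile_size
    tiles_y = (height + tile_size - 1) // tile_size
    tiles = {(tx, ty): [] for ty in range(tiles_y) for tx in range(tiles_x)}
    for index, light in enumerate(lights):
        # Conservative test: use the light's bounding square, clamped to the screen.
        min_tx = max(int((light.x - light.radius) // tile_size), 0)
        max_tx = min(int((light.x + light.radius) // tile_size), tiles_x - 1)
        min_ty = max(int((light.y - light.radius) // tile_size), 0)
        max_ty = min(int((light.y + light.radius) // tile_size), tiles_y - 1)
        for ty in range(min_ty, max_ty + 1):
            for tx in range(min_tx, max_tx + 1):
                tiles[(tx, ty)].append(index)
    return tiles


def lights_for_pixel(tiles, px, py, tile_size=TILE_SIZE):
    """Return the light list a pixel would iterate over when shading."""
    return tiles[(px // tile_size, py // tile_size)]
```

With this structure, the per-pixel shading cost depends on how many lights reach a given tile rather than on the total number of lights in the scene, which is why careful light ranges matter so much.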

The game's dark, realistic art style and its many highly reflective materials call for outstanding lighting to build atmosphere. By implementing a Forward+ rendering path and choosing appropriate light source properties, counts, and ranges, the developers solved this problem.

Arm plans to publish a series of in-depth articles in the near future on the implementation, characteristics, and practical techniques of Forward+ on mobile, illustrated with concrete examples.

A fine balance between visual quality and performance

Those who attended will remember that the talk was told in reverse, walking backwards through the game's entire optimization process. Here we will get straight to the point.

Throughout the performance optimization process, choosing the right texture resolution for terrain, light maps, and so on was a small and unglamorous measure, yet it yielded significant results and contributed to the overall FPS improvement. Even with PBR enabled, it left the developers enough rendering budget in the final stage for post-processing, which can be highly efficient but is never completely free.

It is worth pointing out that many developers mistakenly believe PBR is exclusive to high-end flagship devices. In fact, as long as it is implemented properly, it can also run on budget phones in the thousand-yuan price range.

Highly optimized Blur implementation

Spellsouls uses the highly efficient blur method published by Arm in 2015, which is up to 14 times faster than a standard Gaussian blur. In simple terms, the technique reduces the total number of pixel operations by shrinking the target image in advance, alternately performing fast horizontal and vertical pixel blending passes, and then scaling the result back up. It is a qualitative leap over the standard approach.
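The downsample, blur, and upsample sequence can be sketched as follows. This is a CPU-side Python illustration on a grayscale image stored as a list of rows, not the kernel Arm published: the 3-tap box filter, the nearest-neighbour upsample, and all function names are assumptions chosen to keep the example short.

```python
def downsample(img, factor):
    """Average each factor x factor block into one pixel (dimensions assumed divisible)."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / (factor * factor)
             for x in range(w)] for y in range(h)]


def blur_rows(img):
    """Horizontal 3-tap box blur with clamped edges."""
    w = len(img[0])
    return [[(row[max(x - 1, 0)] + row[x] + row[min(x + 1, w - 1)]) / 3.0
             for x in range(w)] for row in img]


def transpose(img):
    return [list(col) for col in zip(*img)]


def blur_columns(img):
    """Vertical pass, expressed as a horizontal pass on the transposed image."""
    return transpose(blur_rows(transpose(img)))


def upsample(img, factor):
    """Nearest-neighbour enlargement; a GPU would typically use bilinear filtering."""
    return [[img[y // factor][x // factor]
             for x in range(len(img[0]) * factor)]
            for y in range(len(img) * factor)]


def fast_blur(img, factor=2, passes=2):
    """Shrink, alternate horizontal and vertical passes, then enlarge again."""
    small = downsample(img, factor)
    for _ in range(passes):
        small = blur_columns(blur_rows(small))
    return upsample(small, factor)
```

With `factor=2` the blur passes touch only a quarter of the original pixels, and splitting the filter into separate horizontal and vertical passes keeps the per-pixel cost proportional to the filter width rather than its area.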

Once you have read up on it, feel free to adopt it in your own project. Please contact us with any questions or findings! (Nathan.)

Flexible use of pre-baking and billboards

// Vertex shader
float lightObjCameraAlignment = dot(objToCam, reftLightDir);
half alignmentFactor = clamp(lightObjCameraAlignment, 0.0, 1.0);

// Fragment shader
half bloom = rawGlossMap.a;
finalColor += finalColor * bloom * i.alignmentFactor * _BloomStrength;

In Spellsouls, the bloom information is generated in advance during production and stored in the alpha channel of the corresponding texture. At runtime, the blended pixels are drawn on a billboard that always faces the camera and floats between the character and the screen. Combined with the two simple and efficient shader snippets above, this produces a vivid highlight effect. The approach not only avoids the problem of the highlight being clipped to the character's outline but also keeps the whole process under 1 millisecond.
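To see how the two shader stages interact, the same arithmetic can be reproduced on the CPU. The Python sketch below mirrors the snippets above under stated assumptions: both direction vectors are taken to be normalized, `reftLightDir` is read as the reflected light direction, and the helper names are introduced here for illustration only.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def clamp(value, low, high):
    return max(low, min(value, high))


def alignment_factor(obj_to_cam, refl_light_dir):
    """Vertex-stage term: 1 when the reflected light points at the camera, 0 when it points away."""
    return clamp(dot(obj_to_cam, refl_light_dir), 0.0, 1.0)


def apply_bloom(final_color, bloom_mask, alignment, bloom_strength):
    """Fragment-stage term: finalColor += finalColor * bloom * alignment * strength."""
    scale = 1.0 + bloom_mask * alignment * bloom_strength
    return tuple(channel * scale for channel in final_color)
```

A pixel whose pre-baked mask is 0 is left unchanged, while a fully masked pixel facing the reflected light is brightened by a factor of `1 + _BloomStrength`, so the highlight fades smoothly as the camera moves out of alignment.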

Mobile game optimization ideas and related tools

Reflections on Efficiency and Benefits

In short, countless projects have shown that to get the greatest benefit, we should first decide where to focus optimization at a high level before diving into details. Avoid being penny wise and pound foolish: do not spend precious time and energy polishing minor details. See the left side of the figure above. In the Spellsouls project, the frame took about 16 milliseconds after basic rendering optimization; adding the very fast post-processing (<1 millisecond), the game finally held steady at 60 frames per second.
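The reasoning behind these numbers is simple budget arithmetic: at 60 frames per second each frame has 1000 / 60, or roughly 16.7, milliseconds available. The Python sketch below makes that check explicit; the function names and the stage timings in the usage note are hypothetical and are not measurements from Spellsouls.

```python
def frame_budget_ms(target_fps):
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / target_fps


def fits_budget(stage_times_ms, target_fps):
    """True if the summed stage times fit within one frame at the target rate."""
    return sum(stage_times_ms) <= frame_budget_ms(target_fps)
```

For example, `fits_budget([15.8, 0.8], 60)` holds because 16.6 ms is within the budget, whereas `fits_budget([16.0, 1.0], 60)` does not, which shows how little headroom is left for post-processing once the main rendering pass is near 16 ms.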

Arm's Developer Tools Suite

The figure above shows Arm's main developer tools together with the recommended order in which to use them. In principle, we first locate the performance bottleneck with DS-5 Streamline and then analyze the application with Mali Graphics Debugger. In the final detail-optimization phase, Mali Offline Compiler is used to improve shader efficiency.

DS-5 Streamline has a completely free Community Edition available for download; if you want to purchase the Professional Edition for deeper development work, please contact us directly. The Community Edition offers essentially the same GPU analysis as the Professional Edition (which has a 3-month free trial) but gives access to only a subset of the CPU's hardware counters. Mali Graphics Debugger (MGD) is itself a free tool; only advanced features such as Trace/Replay (recording and playback) require the DS-5 Professional Edition to unlock.

It is worth mentioning that MGD can now run directly on most devices without root, provided that the developer has the project source code and enables the corresponding debugging options in the engine (Unity or Unreal Engine) before building. To debug an application APK without its source code, a rooted device is still required. DS-5 Streamline used to require a rooted device and a recompiled kernel in order to do effective analysis. This restriction will be lifted on more devices in the near future, which will make developers' work much easier. Please contact us for details.
