Kraken’s incremental adoption of the New Architecture for fixing performance issues
Performance pain can make developers go the extra mile. In this blog post, you can learn from the performance issues we experienced at Kraken and how we embarked on a New Architecture adoption journey to solve those issues. Yes, there were speed bumps along the way. We learned from them, and we hope you can, too.
The New Architecture is going to be the default, starting from React Native 0.76 and Expo SDK 52, the very next releases of React Native and Expo. New in-development features will be implemented only for the New Architecture and some libraries are already dropping support for the old architecture. You should really start thinking about adopting it if you don’t want to miss out!
Kraken architecture overview
Kraken is one of the largest, most trusted and secure cryptocurrency platforms, with a vibrant community of over 13 million clients worldwide. We currently have three mobile apps in production – all written in React Native with a bunch of custom native libraries and components in Swift/Kotlin, and a backend in Rust.
While we don’t use the full Expo suite for historical reasons, we have started migrating to Expo modules in place of some community packages for maintenance and performance reasons. Performance is a key concern for us, especially in our Pro app, which is data-intensive and filled with interactive charts that are constantly updated via WebSockets. This puts a strain on performance, especially on low-end Android devices. So for a long time we had kept our eyes on the New Architecture's progress and hoped it would alleviate some of the issues we were facing.
At the end of this journey, we were able to improve the performance of our apps significantly in several areas:
App Render Complete: 1.3x faster
Home screen render: 2.5x faster
Trading flow screen render: 5.3x faster
And more…
Keep reading to learn about our journey and all the other performance benefits that came along with it.
Our New Architecture adoption plan
Our primary goal was to improve performance on Android. Our first course of action was to create a quick proof-of-concept using Fabric to be able to estimate the gains. Despite our fairly large codebase and multitude of dependencies, this was done pretty quickly by leveraging the legacy interop layer and stubbing out incompatible libraries/components. The result was a much snappier feeling app which was backed up with objective performance metrics.
Knowing the ecosystem was still in a migratory phase and expecting some rough edges, we decided to adopt the New Architecture in an incremental manner to reduce the engineering risk. This meant going platform by platform, app by app, and architecture feature by feature. Our simplified plan looked something like this:
Update third party component libraries and migrate internal components to support both the new and old renderer
Update third party native module libraries and migrate internal libraries to Native Turbo Modules
Enable bridgeless mode
Remove backwards compatibility once fully rolled out
New Architecture adoption speed bumps
On our incremental adoption journey we ran into a handful of speed bumps. This was to be expected. In this section we’ll call out each one in the hopes that it will help other teams navigate them a little more swiftly than we did.
Swift
Unlike Turbo Modules, Fabric components don’t officially have Swift support. This was a bummer because our codebase is in Swift and we didn’t want to go back to Objective-C. With some inspiration from the Lottie library (and help from a video from Coding With Nobody) we got it working. It’s worth noting that Expo Modules have native Swift support and an arguably nicer API. We’re also keeping an eye on the Nitro project from Marc Rousavy which might support Fabric components in the future.
Automatic batching
On some screens rendering felt slower, especially on very render-heavy screens such as the interactive graphs.
While we’re not completely sure of the root cause, we suspect that this was due to the automatic batching introduced in React 18, which is only supported on the New Architecture. Our theory was that while batching reduces CPU load, it also skips a few intermediate updates that had made the UI feel faster. Ultimately, the component was not built correctly, so a refactor and a migration to Reanimated for performance-sensitive interactions solved the issues.
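To make the batching theory concrete, here is a toy model in TypeScript. It is not React's scheduler, and `createStore`, `set`, `flush` and `simulate` are hypothetical names used only for illustration. Without batching, every update triggers its own render, so intermediate values reach the screen. With batching, updates queued in the same tick collapse into a single render of the final value.

```typescript
// Toy model of render batching. Not React's implementation: all names are
// hypothetical and exist only to illustrate the trade-off.
type Listener = (value: number) => void;

function createStore(initial: number, render: Listener, batched: boolean) {
  let value = initial;
  let dirty = false;

  return {
    // Record a new value; render immediately only when batching is off.
    set(next: number): void {
      value = next;
      if (batched) {
        dirty = true;
      } else {
        render(value);
      }
    },
    // End of the "tick": a batched store renders once with the latest value.
    flush(): void {
      if (batched && dirty) {
        dirty = false;
        render(value);
      }
    },
  };
}

// Feed the same three price updates through both modes.
function simulate(batched: boolean): number[] {
  const frames: number[] = [];
  const store = createStore(0, (v) => frames.push(v), batched);
  [101, 102, 103].forEach((price) => store.set(price));
  store.flush();
  return frames;
}

const unbatchedFrames = simulate(false); // every intermediate value is rendered
const batchedFrames = simulate(true); // only the final value is rendered
```

The batched run does less work, but a chart that relied on those intermediate frames for a sense of motion can feel slower, which is consistent with the suspicion described above.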
Bridgeless
Because Bridgeless mode is the most recent piece of the New Architecture puzzle, we wanted to adopt it last, even though it was comparatively the least disruptive change (thanks to a great interop mode). However, our plan didn’t work out: Expo SDK 51 doesn’t support Fabric without also using Bridgeless mode, and we wanted some fixes in React Native 0.74, so we had to adopt Bridgeless slightly sooner than planned.
Overall it was uncomplicated, with two exceptions: CodePush, which will be deprecated soon, and requestIdleCallback, which we rely on for some of our performance metrics. We’re currently in the process of migrating from CodePush to Expo updates, but in the meantime we’ve fixed its support through patch-package/yarn patch, and we’ve backported requestIdleCallback, which is supported from React Native 0.75.
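As an illustration of why an idle callback is handy for metrics, the sketch below buffers measurements and hands them off only when a scheduler says the app is idle. This is not our actual metrics code: `Metric`, `createIdleReporter`, and the injected `schedule` and `send` functions are hypothetical. The scheduler is passed in so the example runs anywhere, including runtimes without requestIdleCallback.

```typescript
// Sketch: buffer metrics and hand them off only when the scheduler says the
// app is idle. All names here are hypothetical.
type Metric = { name: string; durationMs: number };
type Schedule = (callback: () => void) => void;

function createIdleReporter(schedule: Schedule, send: (batch: Metric[]) => void) {
  let buffer: Metric[] = [];
  let scheduled = false;

  return {
    record(metric: Metric): void {
      buffer.push(metric);
      if (scheduled) return; // a flush is already pending
      scheduled = true;
      schedule(() => {
        scheduled = false;
        const batch = buffer;
        buffer = [];
        send(batch);
      });
    },
  };
}

// Demo with a manual scheduler so the flush point is explicit.
const pending: Array<() => void> = [];
const sentBatches: Metric[][] = [];
const reporter = createIdleReporter(
  (cb) => pending.push(cb),
  (batch) => sentBatches.push(batch),
);

reporter.record({ name: "home_render", durationMs: 120 });
reporter.record({ name: "trade_render", durationMs: 80 });
pending.forEach((cb) => cb()); // simulate the runtime becoming idle
```

In an app, `schedule` could wrap requestIdleCallback where that global is available, and fall back to a timer where it is not.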
Interop layers
The interop mode for Old Renderer components worked like magic for most Android components, but for iOS we found that it had layout issues on one of our internal native components. This was never our intended end-state regardless, and we solved it by simply migrating them to Fabric.
Proguard
Early on in our development we noticed that a branch that worked great in development insta-crashed in a production build with somewhat vague error messages. After some digging, we found that this was caused by Proguard removing certain third party classes and methods. It’s possible that it was caused by the lazy nature of Turbo Modules, which confused the Proguard optimizer into thinking that they were not used. Once we discovered the problem it was easy to simply exclude those symbols from being stripped.
Rollout
As previously mentioned, we wanted to adopt the New Architecture as incrementally as possible. Ideally we would have gone screen by screen, but while this is supported natively, it’s not currently supported by React Navigation, so we had to be careful when rolling out Fabric. However, thanks to the interop layers we were able to successfully roll out the New Architecture at a project level.
Maestro
While we have many component tests using React Testing Library, they unfortunately don’t give us any confidence in adopting the new renderer. Instead, we relied heavily on our automated end-to-end tests on Maestro Cloud. This is also where we run our performance suite to give us hard numbers before hitting production.
Internal testing
Normally we don’t rely on manual testing, but since these changes are more impactful and cannot easily be rolled back with a feature flag we distributed builds internally for people to test and verify that their flows were working as expected. This was especially useful for finding rendering regressions in niche screens that were initially missed due to lack of visual testing.
“Canary releases”
When we believed we had tested as much as we could with and without automation, we wanted to serve it to a small number of production users. We would traditionally use feature flags in LaunchDarkly for this, but since most of the pieces of the New Architecture are compile flags this was not an option. Instead we opted for a poor man’s version of canary releases via gradual rollouts on Play Store.
Our apps are released on a weekly cadence, and essentially once we deem a release stable and fully rolled out to production we serve a small percentage of users a version with the New Architecture enabled. Since gradual releases on Play Store can be halted, we could limit user impact in case of any serious bugs or crashes. Additionally, rolling forward is faster due to the generally faster review process.
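The go/no-go decision behind a staged rollout can be sketched as a small function. This is only an illustration: the names, the minimum session count and the tolerance are hypothetical, not our actual criteria, and the 0.999 baseline simply mirrors the crash-free figure reported later in this post.

```typescript
// Sketch of a go/no-go check for a staged rollout. The thresholds and names
// are hypothetical, chosen only for illustration.
type RolloutDecision = "halt" | "hold" | "expand";

interface CanaryStats {
  sessions: number; // sessions observed on the canary build
  crashedSessions: number; // sessions that ended in a crash
}

function crashFreeRate(stats: CanaryStats): number {
  return stats.sessions === 0 ? 1 : 1 - stats.crashedSessions / stats.sessions;
}

function decideRollout(
  canary: CanaryStats,
  baselineCrashFreeRate: number,
  minSessions = 1000,
  tolerance = 0.001,
): RolloutDecision {
  if (canary.sessions < minSessions) return "hold"; // not enough data yet
  // Halt when the canary is worse than the baseline by more than the tolerance.
  return crashFreeRate(canary) < baselineCrashFreeRate - tolerance ? "halt" : "expand";
}

const healthy = decideRollout({ sessions: 5000, crashedSessions: 4 }, 0.999);
const regressed = decideRollout({ sessions: 5000, crashedSessions: 50 }, 0.999);
const tooEarly = decideRollout({ sessions: 200, crashedSessions: 3 }, 0.999);
```

Here `healthy` evaluates to "expand", `regressed` to "halt" and `tooEarly` to "hold". In practice the halt itself is a manual or API-driven action in the Play Console; the function only captures the decision.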
Real client monitoring
Once the app was in our clients’ hands, we religiously monitored it for stability, performance and product/conversion metrics.
Stability through Sentry and Play Store
Performance through Sentry with our own custom metrics
Product metrics primarily through Mixpanel
New Architecture adoption results
Stability
In our first few builds we noticed a slight decrease in stability due to a crash in one of the third party libraries only present on the New Architecture and affecting a quite rare flow. Once we fixed this issue the stability was on par with old architecture at 99.9% crash free sessions.
Performance
Overall, our production data showed that render times got significantly faster, but with large variability between different screens. We also noticed that the biggest improvements were seen on the slowest devices – both in absolute and relative terms – which was a happy surprise.
Not everything got faster though: The native cold start got a little bit slower which was somewhat surprising given our migration to Turbo Modules. Since the app binary size increased with the New Architecture enabled, our current assumption is that this is caused by still-present parts of the old architecture. We expect this to get better in the future when the migration is fully completed and with initiatives like Nicola’s single merged dynamic library.
React Native 0.76 will ship with a single merged dynamic library called `libreactnative.so`: https://t.co/peZ08rvbtS
This comes with major space savings for users as well as performance wins
— Nicola (@cortinico) August 20, 2024
Overall, our most important and most holistic user-impacting metric, App Render Complete (which includes native boot, JS boot, networking and rendering), improved. In the table below, values are relative to the old architecture: above 1x means faster, below 1x means slower.
Measure                          P50    P95
App Render Complete              1x     1.3x
Home Screen Render               2x     2.5x
Trading Flow Screen Render       3.8x   5.3x
Native Cold Start                0.9x   0.7x
Navigation Total Blocking Time   1x     1.1x
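For readers who want to produce similar numbers, here is a minimal sketch of how P50/P95 speedup factors can be derived from raw timing samples. It assumes the factor is the old percentile divided by the new one and uses the nearest-rank percentile method; the post does not state the exact method used, and the sample data and function names are made up.

```typescript
// Sketch: compute P50/P95 from raw samples and express the change as a
// speedup factor (old / new). The sample data below is made up.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank method: the smallest value with at least p% of samples
  // at or below it.
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1] as number;
}

function speedup(oldMs: number[], newMs: number[], p: number): number {
  return percentile(oldMs, p) / percentile(newMs, p);
}

// Hypothetical render times in milliseconds, before and after the migration.
const oldRenderMs = [100, 120, 150, 160, 400];
const newRenderMs = [80, 90, 100, 110, 200];

const p50Speedup = speedup(oldRenderMs, newRenderMs, 50); // 150 / 100
const p95Speedup = speedup(oldRenderMs, newRenderMs, 95); // 400 / 200
```

With this made-up data the P95 factor is larger than the P50 factor, the same shape as in the table: the slowest sessions benefit the most.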
Next steps
With the New Architecture successfully in place we’re looking at how to further leverage the new capabilities gained, such as:
Use useDeferredValue for frequently updated, but less critical components such as price tickers
Fix instances of jumpy layouts by replacing onLayout with synchronous measure() calls
Expose existing Rust libraries from the backend to the apps via JSI bindings
Thanks
Nicola Corti and the React Native team at Meta for providing incredibly useful resources for adopting the New Architecture, and for being receptive to feedback and addressing it quickly.
Brent Vatne at Expo for driving the effort to migrate the ecosystem to the New Architecture and for answering in-depth questions.
The whole Software Mansion team for doing the mammoth task of migrating many of the core third party libraries such as reanimated, gesture handler, screens and svg.
These materials are for general information purposes only and are not investment advice or a recommendation or solicitation to buy, sell, stake, or hold any cryptoasset or to engage in any specific trading strategy. Kraken makes no representation or warranty of any kind, express or implied, as to the accuracy, completeness, timeliness, suitability or validity of any such information and will not be liable for any errors, omissions, or delays in this information or any losses, injuries, or damages arising from its display or use. Kraken does not and will not work to increase or decrease the price of any particular cryptoasset it makes available. Some crypto products and markets are unregulated, and you may not be protected by government compensation and/or regulatory protection schemes. The unpredictable nature of the cryptoasset markets can lead to loss of funds. Tax may be payable on any return and/or on any increase in the value of your cryptoassets and you should seek independent advice on your taxation position. Geographic restrictions may apply.
The post Kraken’s incremental adoption of the New Architecture for fixing performance issues appeared first on Kraken Blog.