
    Mozilla gives Firefox AI Runtime a big speed boost with native C++



    Mozilla engineers have accelerated the Firefox AI Runtime by replacing its WebAssembly-based backend with a native C++ ONNX Runtime implementation.

    This architectural shift has yielded performance gains of 2-10 times for on-device machine learning features, eliminating WASM warm-up overhead and using hardware-specific CPU instructions for faster model execution.

    Mozilla explains the WASM bottleneck it faced for Firefox

    The original architecture Mozilla used for Firefox AI features like Smart Tab Grouping and PDF.js alt-text was powered by Transformers.js, which uses onnxruntime-web—a WebAssembly (WASM) build of ONNX Runtime.

    While functional, this approach presented several performance challenges:

    • JS/WASM boundary overhead: A typical inference cycle involved multiple crossings between the JavaScript and WASM layers for pre-processing, model execution, and post-processing, introducing latency even with warm caches.
    • Generic SIMD limitations: The key hotspot, matrix multiplications, was implemented in WASM using generic SIMD. This could not compete with the performance of hardware-specific intrinsics like NEON on Apple Silicon or AVX-512 on modern Intel chips.
    • Warm-up tax: Each cold start of a feature incurred a JS/WASM warm-up penalty.
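To make the matmul bottleneck concrete, here is a minimal sketch of the kind of blocked matrix multiply that dominates transformer inference (illustrative only, not Mozilla's actual kernel). Compiled natively, the compiler can vectorise the inner loop with NEON or AVX-512; a WASM build is restricted to generic 128-bit SIMD, which is the gap described above.

```cpp
#include <algorithm>
#include <cstddef>

// C = A (m x k) * B (k x n), row-major. Tiling keeps working blocks of A, B
// and C hot in cache; the innermost loop is the part a native compiler can
// auto-vectorise with hardware-specific instructions.
void matmul_blocked(const float* A, const float* B, float* C,
                    std::size_t m, std::size_t k, std::size_t n) {
    constexpr std::size_t TILE = 64;  // tile size sized for L1 cache
    for (std::size_t i = 0; i < m * n; ++i) C[i] = 0.0f;
    for (std::size_t i0 = 0; i0 < m; i0 += TILE)
        for (std::size_t p0 = 0; p0 < k; p0 += TILE)
            for (std::size_t j0 = 0; j0 < n; j0 += TILE)
                for (std::size_t i = i0; i < std::min(i0 + TILE, m); ++i)
                    for (std::size_t p = p0; p < std::min(p0 + TILE, k); ++p) {
                        const float a = A[i * k + p];
                        for (std::size_t j = j0; j < std::min(j0 + TILE, n); ++j)
                            C[i * n + j] += a * B[p * n + j];
                    }
}
```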

    The team had previously seen success with native code in Firefox Translations, which uses WASM built-ins to call into C++. However, attempting to port the huge number of ONNX operators one-by-one was deemed “unmaintainable.”

    Native C++ integration strategy

    Mozilla opted for a full backend replacement for Firefox AI, made feasible by the “tiny surface” through which Transformers.js interacts with the ONNX Runtime.

    The migration involved three main steps:

    1. Vendor ONNX Runtime C++ directly into the Firefox tree.
    2. Expose the C++ library to JavaScript via a thin WebIDL layer.
    3. Wire Transformers.js to the new native backend.

    This approach ensured the change was completely transparent to the feature-level code, which still simply calls await pipeline(…).
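The "tiny surface" mentioned above can be pictured as a small backend interface: Transformers.js essentially only needs to create a session and run it, so re-implementing just that surface natively is enough to swap the whole backend. The sketch below is hypothetical (the type and method names are illustrative, not Mozilla's actual WebIDL interface).

```cpp
#include <cstdint>
#include <vector>

// Minimal tensor carrier for the sketch.
struct Tensor {
    std::vector<std::int64_t> shape;
    std::vector<float> data;
};

// Hypothetical backend surface: the WASM build and the native C++ build
// can both sit behind this, leaving feature-level code untouched.
class InferenceBackend {
public:
    virtual ~InferenceBackend() = default;
    // Load a serialized ONNX model into a session.
    virtual bool createSession(const std::vector<std::uint8_t>& model) = 0;
    // Run one inference pass: input tensors in, output tensors out.
    virtual std::vector<Tensor> run(const std::vector<Tensor>& inputs) = 0;
};
```

Because the contract is this narrow, swapping implementations is invisible to callers, which matches the article's point that feature code still just awaits `pipeline(…)`.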

    To avoid bloating the main repository and slowing down builds, the ONNX Runtime source was not added in-tree. Instead, a configuration flag allows a pre-compiled version of the library to be downloaded from Mozilla’s Taskcluster CI system.

    This required some upstream patches to ONNX Runtime to support building without exceptions and RTTI, and the build configuration was set to MinSizeRel with LTO to balance binary size and speed.

    Quantifiable Firefox AI Runtime performance gains

    Mozilla says the switch to native C++ yielded immediate benefits for Firefox AI features:

    • PDF.js Alt-Text: The image-to-text model saw its latency fall from 3.5 seconds to just 350 ms on the same hardware.
    • Smart Tab Grouping: For the topic model, cold start latency dropped from 1920.9 ms (WASM) to 532.2 ms (ONNX native). Warm inference time was reduced from 31.4 ms to 19.2 ms.
    Figure: Firefox AI Runtime cold-start latency and inference time, WASM vs native ONNX.

    This new backend is being gradually rolled out, starting with Smart Tab Grouping in Firefox 142.

    Mozilla’s future Firefox AI Runtime optimisation roadmap

    With the C++ API now directly accessible, Mozilla is planning several further optimisations:

    • Multi-threading DequantizeLinear: A patch has been developed to parallelise this frequently single-threaded operation across multiple cores, resulting in “an almost linear speedup.”
    • Optimising matrix transposition: Naive nested for-loops are being replaced with a “multi-threaded cache-aware tiled transposition scheme” that uses SIMD, speeding up the operation by a supra-linear factor.
    • Caching the compiled graph: For large models, compiling the model graph can take up to five seconds on every launch. Mozilla plans to cache the compiled graph to eliminate this start-up cost.
    • GPU acceleration: The next major step is to integrate GPU-accelerated ONNX backends. This is a huge undertaking, as it “demands additional sandboxing to safely and securely interact with the underlying hardware.”
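As an illustration of the DequantizeLinear optimisation in the roadmap above, here is a minimal sketch (my own, not Mozilla's patch): the operator maps each quantised byte back to float via y = scale * (x - zero_point), and every element is independent, so the range splits cleanly across cores, which is why a near-linear speedup is plausible.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Parallel DequantizeLinear sketch: split the element range into one
// contiguous chunk per thread; each chunk is embarrassingly parallel.
void dequantize_linear_parallel(const std::uint8_t* x, float* y, std::size_t n,
                                float scale, std::uint8_t zero_point,
                                unsigned num_threads = std::thread::hardware_concurrency()) {
    if (num_threads == 0) num_threads = 1;
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(begin + chunk, n);
        if (begin >= end) break;
        workers.emplace_back([=] {
            for (std::size_t i = begin; i < end; ++i)
                y[i] = scale * (static_cast<float>(x[i]) - zero_point);
        });
    }
    for (auto& w : workers) w.join();
}
```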

    What is most interesting about this migration is how the Mozilla team delivered such a performance improvement while migrating Firefox AI features gradually and in complete isolation from the feature code itself. This architectural success not only makes current ML-based features more responsive and accessible to a wider audience but also establishes a solid foundation for the even more ambitious optimisations planned for the future.

    (Photo by Rubaitul Azad)

    See also: State of Python 2025: Web development makes a comeback

