Introduction
Mojo has been making some noise in the Python and data science communities for its speed and efficiency, posting impressive performance benchmarks comparable to other compiled, low-level languages. Like many data-centric organisations, we’re already heavy users of Python, and Mojo is often billed as a replacement.
Modular, the company behind Mojo, recently open-sourced the Mojo standard library. This is an important milestone and signals Modular's commitment to opening up the language.
As Mojo is on the road to being open-source, we felt it was time to assess the language and determine whether it would benefit ComplyAdvantage.
Process
Within the Data Tribe, our use cases span data ingestion and scraping, data cleansing, model training and inference, and providing APIs for access. To assess Mojo, we selected three representative use cases:
- Integrating with existing services
- Building an API
- Running inference on a pre-existing trained model
The rationale for these use cases is to answer the question: Can we use Mojo for performance-critical portions of our pipeline?
Assessment
First Impressions
Installation of the Mojo compiler was straightforward: run a script to install a custom package manager, then use the package manager to install Mojo. Overall, it is a similar experience to installing other languages or tools. Unfortunately, the familiarity ended with the requirement to register an account with Modular. There was no information on what this data collection was being used for, and I was left with the unconfirmed suspicion that usage telemetry was being collected.
Syntax- and language-wise it was a mixed bag. Mojo advertises itself as a replacement for Python, and superficially the syntax is Python-like; however, the similarities quickly evaporate as you get deeper. For example, Mojo has two types of functions: def exists for Python compatibility, whereas fn enforces stricter rules, such as typed arguments. Classes don't exist; in their place are struct and trait, concepts much more familiar to Rust developers. While I'm sure it would be possible to get used to it, the differences from Python felt jarring.
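To illustrate the contrast, here is a minimal sketch using the Mojo syntax as it stood at the time of our assessment (the language was evolving quickly, so details such as the inout convention may have changed since):

```mojo
# def: dynamic and Python-style; argument types are optional.
def greet(name):
    return "Hello, " + name

# fn: strict; argument and return types must be declared.
fn add(a: Int, b: Int) -> Int:
    return a + b

# There are no classes; a struct is a statically bound type instead.
struct Point:
    var x: Int
    var y: Int

    fn __init__(inout self, x: Int, y: Int):
        self.x = x
        self.y = y
```

The fn/struct side of the language is where the Rust influence shows: the compiler knows every field and type at compile time, which is what enables Mojo's performance claims, but it is also what makes the code stop feeling like Python.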
Use-case 1: Integrating with existing services
If we can transform some of our internal Python services into multi-language applications, we can defer certain complexities to modules that can solve the problem faster or better. Mojo currently supports importing Python code and libraries and making calls through some helpers. The reverse is not true, though: as their documentation states, it is not currently possible to call Mojo code from Python.
Being unable to call Mojo code from Python leaves us with very few options, forcing us to rewrite entire services in Mojo and then integrate the critical Python sections, such as interaction with machine learning models. On investigation, the interaction between Mojo and our internal code hit another wall, as support for C-based modules like NumPy appears poor. For example, the only workaround currently available for working with NumPy is to install a second Python virtual environment using Conda.
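For reference, this is roughly what calling into Python from Mojo looked like during our assessment (a minimal sketch; at the time, importing NumPy this way only worked once the Conda-managed Python environment described above was in place):

```mojo
from python import Python

fn main() raises:
    # Import a Python module at runtime; values cross the boundary
    # as dynamically typed PythonObject wrappers.
    var np = Python.import_module("numpy")
    var a = np.arange(6)
    print(a.sum())
```

Note that the traffic is one-way: Mojo can reach into a Python interpreter, but there is no equivalent mechanism for a Python process to import and call a compiled Mojo module.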
Use-case 2: Building an API
Mojo can call Python code directly, so we naively expected to be able to use the existing Python FastAPI framework to build an API. This proved impossible. Further investigation turned up closed bug reports with no obvious solutions or workarounds, and a quick experiment determined that Mojo does not yet support Python code that raises exceptions, or that uses decorators or callables. It was easy to conclude that using an existing Python ASGI framework was not going to work.
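Our naive attempt amounted to something like the following sketch, which failed for us at the time because FastAPI leans heavily on exactly the decorator-, callable- and exception-based patterns Mojo could not yet handle:

```mojo
from python import Python

fn main() raises:
    # Hypothetical attempt: this did not work during our assessment,
    # since FastAPI's module code relies on decorators, callables
    # and exception-raising paths unsupported by Mojo's interop.
    var fastapi = Python.import_module("fastapi")
    var app = fastapi.FastAPI()
```

Since decorators and callables are pervasive in idiomatic Python web frameworks, this limitation effectively rules out the whole ASGI ecosystem, not just FastAPI.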
Moving into the Mojo world, there is the lightbug-http module. It took a bit of playing around to convince the compiler to use it, as package management and library support seemed incomplete, but in the end we had a working server and could access the HTTP URL, scheme, headers and body from code.
Unfortunately, this success was short-lived. Lightbug is unfinished at this time and does not include request routing or body parsing. We briefly investigated Mojo’s string manipulation in order to try to write a simple request-routing table. This was an underwhelming experience, driven mainly by differences from Python: the standard library felt incomplete, and modern array-access features like slices were missing from Mojo’s syntax.
Use-case 3: Running Inference
Modular has also created the Max engine, which is built on top of Mojo and promises significant improvements in Machine Learning inference performance.
At present, the Max engine supports only Linux and Windows (via WSL), with macOS compatibility expected soon. As macOS users, we tried to install it within an AMD64 Docker image. This was unsuccessful due to the Modular auth requirement, as the command could not complete the authentication process. Switching to a Linux server for testing and following the provided guide, the setup was straightforward, as was using its Python API for testing.
Quick tests with open-source NLP models like xlm-roberta-large-ner-hrl and papluca/xlm-roberta-base-language-detection demonstrated a substantial boost in CPU inference speed, improving performance by approximately 3 to 5 times. However, we also noticed a few signs of immaturity in the engine, and ecosystem support is lacking. For instance, the Python SDK's model class does not integrate with the high-level pipeline API, requiring users to handle all of the model-output post-processing themselves. In addition, while the model.execute function in Max seems analogous to the __call__ method in the popular Transformers library (meaning model.execute(*input) is equivalent to model(*input)), Max currently lacks anything similar to the generate method used by many NLP tasks such as translation, summarisation and text generation. At present, this lack of features negates any potential performance benefit.
Conclusions
At this time, we won’t be proceeding with further assessments of Mojo or Max. Overall, it is still a pre-production language and ML framework. The open-source community is in its infancy and mainly consists of early adopters and enthusiasts. Parity and interoperability with Python are lacking, package support is incomplete, and the gap between the headline claims and the missing features made the experience a disappointment. Maybe we’ll revisit in 2025 and see what’s changed!