https://github.com/matthewfl/redmagic
This is a newly published repository from an experiment that I started during the summer to attempt to optimize Python’s runtime without impacting compatibility with C models and existing libraries.
There currently exists frameworks such as PyPy and Truffle/Graal which will generate a JIT when you design your programming language inside of their framework and use their compiler to generate a custom binary which implements a JIT for a given language. One problem with these frameworks is that they require that an existing language is reimplemented essentially from scratch using a subset of Python or Java respectively. Additionally, once a programming language is reimplemented, any existing modules which interface with internal aspects of the interpreter (any python C module) will not be compatible and will have to be rewritten.
Redmagic is similar to PyPy and Truffle/Graal in that it tries to be a framework for creating a JIT however it is different in that it tries to work with an existing C or C++ interpreter requiring only a few annotations inside the code to identify loops at the user level of the program. (cPython example) It then generates traces of a user’s program by switching between letting the language’s interpreter run normally and on top of a virtual machine which records all instructions and branching directions. Unlike other JITs it does not reorganize how memory is laid out and even goes as far as simulating pushing and popping of C level stack frames. This means that at any point while running inside the virtual machine or inside the generated code, the system can resume the “normal” code simply by jumping to the corresponding instruction in the original interpreter. This does come with the downside that it inherits all memory layout and pointer dereferencing performance issues that were present in the original interpreter.
Results: After much work on Redmagic, I have observed very slight (<2%) performance improvements in limited cases when working with cPython. The issues of memory layouts (explored further in this post) seem to contribute significantly to Python’s performance issues and those are not addressed at all by the Redmagic implementation. Additionally, it is not clear that these would be easy to address from Redmagic itself given that it is looking at a program from the level of x86 assembly instructions vs higher level representations where data-structures can be easily recovered. I believe that there might be cases, possibly in math routines, where memory layouts have already been optimized and possibly allowing for more clever uses of branching could prove beneficial to performance.