Some hello-worlding on the GBA resulted in a simple test application. The goal was to draw a gradient triangle and an image, just for the kick of it.
However, at first I was stupid and used floating-point math, which was a very bad idea considering that the GBA has no FPU. So drawing a gradient triangle using a barycentric function took 11 seconds. Can't attach the video here, but this https://twitter.com/sleeps_darkly/status/1637619100116721665 should contain the original 11-second generation.
Converting to fixed-point math by using a scale_factor dropped the generation time to 3 seconds, and then multiplying by a precomputed reciprocal, to account for the missing hardware integer division, dropped it to 1 second. Maybe it's... possible to optimize further.
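
For illustration, here is a rough sketch of what the fixed-point plus reciprocal idea can look like in C. This is not the actual code from the test application: the names (Point, edge, triangle_inv_area, weights_fixed, lerp_channel), the 16-bit scale and the 5-bit color channel are all assumptions made for the example. The point is that the only division happens once per triangle, and every pixel then gets away with multiplications and a shift.

```c
#include <stdint.h>

/* Hypothetical 16.16-style scale; the original only mentions "a scale_factor". */
#define FP_SHIFT     16
#define SCALE_FACTOR (1 << FP_SHIFT)

typedef struct { int32_t x, y; } Point;

/* Edge function: twice the signed area of triangle (a, b, p). */
static int32_t edge(Point a, Point b, Point p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

/* The only division, done once per triangle: SCALE_FACTOR / (2 * area).
 * Returns 0 for a degenerate triangle. It also becomes 0 when the doubled
 * area exceeds SCALE_FACTOR; bigger triangles would need more fractional bits. */
static int32_t triangle_inv_area(Point a, Point b, Point c)
{
    int32_t area2 = edge(a, b, c);
    return area2 == 0 ? 0 : SCALE_FACTOR / area2;
}

/* Per pixel: barycentric weights scaled by SCALE_FACTOR, no division.
 * Returns 1 and fills w[] if p is inside the triangle, 0 otherwise. */
static int weights_fixed(Point a, Point b, Point c, Point p,
                         int32_t inv_area, int32_t w[3])
{
    int32_t e0 = edge(b, c, p);
    int32_t e1 = edge(c, a, p);
    int32_t e2 = edge(a, b, p);

    if (inv_area == 0)
        return 0;
    if (inv_area < 0) {            /* normalize the winding order */
        e0 = -e0; e1 = -e1; e2 = -e2;
        inv_area = -inv_area;
    }
    if (e0 < 0 || e1 < 0 || e2 < 0)
        return 0;                  /* outside the triangle */

    w[0] = e0 * inv_area;          /* multiply by the reciprocal */
    w[1] = e1 * inv_area;
    w[2] = e2 * inv_area;
    return 1;
}

/* Blend one color channel (assumed 0..31) from the three vertex values. */
static int32_t lerp_channel(const int32_t w[3],
                            int32_t c0, int32_t c1, int32_t c2)
{
    return (w[0] * c0 + w[1] * c1 + w[2] * c2) >> FP_SHIFT;
}
```

The reciprocal is truncated, so the weights may add up to slightly less than SCALE_FACTOR. For a gradient that is probably invisible, but it is a trade-off of this particular sketch.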
So it seems that to implement anything 3D, I'll possibly need to "cheat" with raycasting.