it's unconscionable that every midjourney result does not come with a "fingers" jog wheel for fine tuning the output. why the fuck would anyone settle for 4 per hand
I think this actually gets to the heart of the matter as to why "generative AI" is like it is.
As nic says: actually teaching something more context on the image would be more work, both in dev time, and classification time.
And the whole point of these things is to be less work, specifically less labor to pay for. The dream is to automate creative labor so they don't have to rely on us squishy human weirdoes with our "ideas" and "wanting to pay rent".
Doing it in any other manner than "make a black box and throw compute time at it" would defeat the entire goal of the thing.

