there's no exact science to it, but i think a lot of it relies on really good spotting & trusting the material you're scoring.
for those that haven't done it before: spotting is basically the process of watching or playing through something without music to get a proper idea of how it feels & where music would be appropriately deployed. during this process, you get a real sense for where everyone's instincts are for using music to enhance or telegraph.
it's usually less common in games, because unless you're making a super cinematic game (which i don't usually work on), things are generally split up into area or level bgm, combat themes, character themes, dialogue mood tracks, etc. a lot of the "spotting" comes on the implementation side, where you're choosing when things come in, go away, dynamically change, etc. it's a huge part of why having the composer involved in implementation is extremely important if you plan on doing effective things with the music & not just wallpapering everything with loops.
my personal inclination is that if a musical cue is matching the emotion and tone of a scene (or level, or what have you), then it shouldn't come in before the moment itself. some people would disagree with me, but it's sort of my base instinct for 'trusting' the material. think of the strongest emotional moments in films: does the music come in first, letting you know that something is going to be sad/scary/chaotic before it actually is? or does it let the moment happen, and then bolster the mood? this is why i think dynamic music in games is great, because it can be a response to a player's actions & still be strong scoring.
that's sort of a basic example though. there's a ton of moments where the action & the music hit at the same time and together make an extremely cool moment. but that takes coordination & again, trust that everyone's on the same page.
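to make the "respond, don't announce" idea a bit more concrete on the implementation side, here's a minimal sketch in python. everything in it (the `ReactiveScore` class, the beat names, the cue names) is made up for illustration & isn't tied to any real engine or audio middleware. the only point is that the cue changes in response to something that has already happened, never ahead of it.

```python
from dataclasses import dataclass, field


@dataclass
class ReactiveScore:
    """hypothetical music director: cues follow story beats, they never precede them."""

    current: str = "area_bgm"
    history: list = field(default_factory=list)

    # made-up mapping from "something just happened" to "what the music does about it"
    RESPONSES = {
        "ambush_sprung": "combat_theme",
        "combat_over": "area_bgm",
    }

    def on_beat(self, beat: str) -> str:
        """call this only AFTER the moment has actually played out on screen."""
        cue = self.RESPONSES.get(beat, self.current)  # unknown beats leave the music alone
        if cue != self.current:
            self.current = cue
            self.history.append(cue)
        return self.current


score = ReactiveScore()
score.on_beat("player_enters_cave")  # nothing has happened yet, so the music doesn't tip it off
score.on_beat("ambush_sprung")       # the ambush lands first, then the combat theme answers it
print(score.current)
```

the design choice here is that there's no "about to happen" event at all: the music system literally can't telegraph, because it only ever hears about beats after the fact.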
another really effective technique, especially in games, where music tends to play more often than in films, is simply taking the music away. the lack of that bed of sound immediately creates a tonal shift that lets the material take control without the music telling you what to think. oftentimes, removing the music lets you be even more emotional than if a track immediately came in that was sad, or shocking, or terrifying.
there's also the fun technique of using contrasting music to score a scene and completely change the mood. i think this is great because it's purposely playing against the audience's expectations for creative effect. there are also totally correct times to play into a trope or commonly understood shorthand in music.
anyway that's a short list of my thoughts!

