For example, if a lyric contains “that you”, it ends up like “thatchoo”. One example of this I can think of is in Karma by Taylor Swift (I know, I know, but it’s one of the most popular songs I listen to). In the line “Karma’s a relaxing thought/Aren’t you envious that for you it’s not?”, the “Aren’t you” sounds like “arentchoo”. It doesn’t happen every time, but it seems to happen unless you’re consciously making an effort not to make that sound. An example of it not happening is in Love Story, where she sings “That you were Romeo/You were throwing pebbles”, and it sounds the way it would if you were just talking to someone and said “that” and “you” separately.
I’m just wondering if this happens in other languages with different combinations of sounds? It probably happens with other sound combinations in English too, but this is the easiest example to think of.
In linguistics this is called a coarticulatory effect, and it’s caused by needing to move the articulators between two positions rapidly. As such, it can be thought of as a kind of “hardware” limitation of humans, as opposed to a “software” limitation of any single language. Whether it shows up in another language mainly depends on whether that language puts the same sounds in sequence.
The “ch” affricate (which is t͡ʃ in the IPA) is a mix of a voiceless alveolar stop component and a post-alveolar fricative component. Because “y” is palatal, you end up getting that post-alveolar fricative component through coarticulation.
Edit: here’s an explanation without the jargon:
“t” in English is produced by your tongue contacting the ridge behind your top teeth.
“y” in this context is produced with the tongue sitting near the palate (significantly behind the ridge used for “t”).
The English “ch” sound is actually a mix of two sounds: “t” and “sh”, in rapid succession.
The “sh” sound is produced between the places where “t” and “y” are produced.
So, if you have a “t” and a “y” in quick succession, your tongue has to move quickly between a couple different spots – and crucially, through the spot which produces “sh”:
“t” -> “sh” -> “y”
And because “t” + “sh” equals “ch”, you get the “ch” when producing this sequence. You can of course articulate things carefully and not produce it – but in common, quick speech, that’s why it shows up. Singing isn’t different from speech in this regard.
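If it helps to see the rule spelled out, here is a rough Python sketch of the explanation above. It is a toy model only: the sound labels ("dh", "a", "oo"), the three-position PLACE scale, and the function names between and coalesce are all made up for illustration, and real speech is gradient rather than an on/off switch.

```python
# Toy model only: positions are an assumed front-to-back ordering
# (0 = ridge behind the teeth, 1 = just behind it, 2 = palate).
PLACE = {"t": 0, "sh": 1, "y": 2}


def between(a, b):
    """Return the sounds whose position lies strictly between a and b."""
    lo, hi = sorted((PLACE[a], PLACE[b]))
    return [sound for sound, pos in PLACE.items() if lo < pos < hi]


def coalesce(phones, careful=False):
    """Merge each "t" + "y" pair into "ch", unless speech is careful."""
    if careful:
        return list(phones)
    out = []
    i = 0
    while i < len(phones):
        if phones[i] == "t" and i + 1 < len(phones) and phones[i + 1] == "y":
            out.append("ch")  # "t" + "sh" heard as one sound
            i += 2
        else:
            out.append(phones[i])
            i += 1
    return out


that_you = ["dh", "a", "t", "y", "oo"]  # rough spelling of "that you"
print(between("t", "y"))                 # ['sh']
print(coalesce(that_you))                # ['dh', 'a', 'ch', 'oo']
print(coalesce(that_you, careful=True))  # ['dh', 'a', 't', 'y', 'oo']
```

The careful flag stands in for deliberately articulating the two words separately, as in the Love Story line from the question.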
Explain this for the bimbo girlies now please!
Added an edit without the ling jargon