Assalamu alaikum.
This is what I believe; someone please confirm or reject it. There are two cases.
1) The easy one: the audio is split so that one earphone carries the music and the other carries the rest of the sound. That is pretty easy to handle.
2) This is a very tricky case.
- How would you define music versus normal sound? What counts as music is the real question.
My thinking is that you would filter the audio based on how music sounds (musical patterns).
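For case 1, here is a rough sketch of what I mean, not a finished tool. It assumes the audio has already been decoded into interleaved stereo samples (left, right, left, right, ...) and that you already know which channel holds the music. The function name `keep_one_channel` and the sample values are made up for illustration.

```python
from array import array


def keep_one_channel(interleaved, keep=0):
    """Copy one channel of interleaved stereo samples into both channels.

    interleaved: array of samples laid out as L, R, L, R, ...
    keep: 0 to keep the left channel, 1 to keep the right channel.
    Assumption: the unwanted music sits entirely in the other channel.
    """
    if keep not in (0, 1):
        raise ValueError("keep must be 0 (left) or 1 (right)")
    if len(interleaved) % 2 != 0:
        raise ValueError("expected an even number of interleaved samples")

    # Work on a copy so the caller's data is left untouched.
    out = array(interleaved.typecode, interleaved)
    # Overwrite the unwanted channel with the samples of the kept one.
    out[1 - keep::2] = interleaved[keep::2]
    return out


# Hypothetical 16-bit samples: left = speech, right = music.
samples = array("h", [100, -7, 200, -8, 300, -9])
print(keep_one_channel(samples, keep=0).tolist())
# [100, 100, 200, 200, 300, 300]
```

This only helps when the split is that clean. Case 2, where music and speech are mixed in the same channel, needs an actual way of recognising music, which this does not attempt.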
But honestly, the question is: what is it that you want to do? Like, what videos are we talking about?
I personally had a bigger dream of creating a hayaa program that filters out all the haram. It would work like Snapchat, where you can add a mustache or a cap, change your nose, etc. You know, if something fits a particular pattern, change it.
Try both

Well, music would be the background sound, wouldn't it? The kind of videos I am talking about are ones like those on YouTube. It would be awesome if you could do it outside YouTube too (on other websites), like finally being able to watch a movie or a cartoon without having to keep the volume really low or practically muted.