Question

So this is from the late 90s ... http://www.cs.princeton.edu/~prc/SingingSynth.html

Why hasn't this taken off? We can synthesize photorealistic images, but the synthesis of singing still seems to be at a very primitive stage.

What exactly is it that makes the synthesis of singing difficult?

http://www.interspeech2007.org/Technical/synthesis_of_singing_challenge.php <-- still seems primitive.

Solution

My feeling is that we fall into the uncanny valley more easily with sound than with images. Our brain accepts a badly formed image relatively well, but it does not accept a badly formed sound unless it sounds natural. Anything that is not imperfect in just the way a real voice is sounds creepy, and this is a very strong barrier to actual applications. Synthetic voice is good enough for announcements and telephone services, but we are a long way from totally synthetic singing.

On the other hand, real voices are modified every day, both live and in the studio. Without Autotune, all the "gangsta" and "Lady Gagas" out there would do a job more suited to their actual talent.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow