All of my Google research gives me results for text-to-speech. This is not what I am looking for. Instead, I am wondering if it is possible to create a "voice" (or recognizable speech) from scratch with no recordings. What is the difference between voice waves and noise waves?

One system I was looking at, from the 1970s, is called VOSIM. (I just wrote a Java version of the basic algorithm about a month ago; it only makes simple tones, not words.) Yes, it sounds very robotic and artificial, but I think it was (and is) possible to make recognizable words, especially with later iterations or with other such systems.
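For anyone curious what "the basic algorithm" looks like, here is a minimal sketch of a VOSIM period as I understand it: N sin² pulses of fixed width, each pulse's amplitude decaying by a constant factor, followed by a silent gap. This is my own illustration, not the code from my Java version; the parameter names and values are made up.

```java
public class Vosim {
    /** Generate one VOSIM period: n sin^2 pulses plus a silent gap. */
    static float[] period(double sampleRate, int n, double pulseWidthSec,
                          double decay, double gapSec, double amplitude) {
        int pulseLen = (int) Math.round(pulseWidthSec * sampleRate);
        int gapLen = (int) Math.round(gapSec * sampleRate);
        float[] out = new float[n * pulseLen + gapLen];
        double a = amplitude;
        int i = 0;
        for (int p = 0; p < n; p++) {
            for (int s = 0; s < pulseLen; s++) {
                double x = Math.sin(Math.PI * s / pulseLen); // half-sine shape
                out[i++] = (float) (a * x * x);              // squared -> sin^2 pulse
            }
            a *= decay; // each pulse is quieter than the last
        }
        // the remaining gapLen samples stay at 0.0f (the silent gap)
        return out;
    }

    public static void main(String[] args) {
        // e.g. 3 pulses of ~1.25 ms each at 44.1 kHz, decay 0.7, 5 ms gap
        float[] buf = period(44100, 3, 0.00125, 0.7, 0.005, 1.0);
        System.out.println("samples per period: " + buf.length);
    }
}
```

Repeating that period at different rates is what gives VOSIM its pitch; varying the pulse width and decay changes the timbre.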

"What is the difference between voice waves and noise waves?"

The closest that voices come to "noise" (if taken to mean random sound waves) is the production of sibilants and fricatives (some of the consonants, like "s" or "f"). A big part of what makes them different is that these sounds are run through a resonating system (the throat and mouth) that amplifies certain frequency areas. You can learn more about this by researching "vocal formants," which are also key to forming the particular timbres that identify vowels.
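A toy illustration of that resonating-system idea: run white noise through a two-pole resonator so that energy piles up around a chosen center frequency, roughly as the vocal tract shapes a hiss into an "s". The coefficient formulas below are the standard digital-resonator ones; the frequencies and bandwidth are illustrative values I picked, not anything from the post.

```java
import java.util.Random;

public class Formant {
    /** Two-pole resonator: boosts energy near centerHz, width bandwidthHz. */
    static double[] resonate(double[] in, double fs, double centerHz, double bandwidthHz) {
        double r = Math.exp(-Math.PI * bandwidthHz / fs);       // pole radius
        double a1 = 2 * r * Math.cos(2 * Math.PI * centerHz / fs);
        double a2 = -r * r;
        double gain = 1 - r;                                    // rough level compensation
        double[] out = new double[in.length];
        double y1 = 0, y2 = 0;
        for (int n = 0; n < in.length; n++) {
            double y = gain * in[n] + a1 * y1 + a2 * y2;
            out[n] = y;
            y2 = y1;
            y1 = y;
        }
        return out;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[] noise = new double[4410];                       // 0.1 s at 44.1 kHz
        for (int i = 0; i < noise.length; i++) noise[i] = rng.nextGaussian();
        // Center a "formant" near 5 kHz, roughly where an "s" has its energy.
        double[] s = resonate(noise, 44100, 5000, 1000);
        System.out.println("filtered " + s.length + " samples");
    }
}
```

Running a few of these resonators in parallel at different center frequencies is the core of classic formant synthesis.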

An artificial system has to control the timing of many components: not just the pitch of the spoken tones, but also the formant transitions and the amplitude transitions arising from plosives and stops. Getting synthesizers to do all this on the fly is quite a task--having samples is a big processing help and sounds better.
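To make the timing problem concrete, here is a sketch of one tiny piece of it: linearly interpolating a single formant's center frequency across a phoneme transition, frame by frame. A real synthesizer would run several formants plus amplitude envelopes in parallel; all the numbers here are made up for illustration.

```java
public class FormantGlide {
    /** Per-frame formant frequencies gliding from startHz to endHz. */
    static double[] glide(double startHz, double endHz, int frames) {
        double[] f = new double[frames];
        for (int i = 0; i < frames; i++) {
            double t = (frames == 1) ? 0 : (double) i / (frames - 1);
            f[i] = startHz + t * (endHz - startHz);
        }
        return f;
    }

    public static void main(String[] args) {
        // e.g. a second formant gliding from ~700 Hz toward ~1200 Hz over 5 frames
        double[] f2 = glide(700, 1200, 5);
        for (double hz : f2) System.out.printf("%.0f Hz%n", hz);
    }
}
```

Each frame's frequency would then drive a resonator for that slice of audio; stringing many such short transitions together, with the right timing, is what makes the output start to sound like speech rather than a siren.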

There is a good, free, downloadable book that covers the basic contours of speech recognition that you might want to check out: "The Scientist and Engineer's Guide to Digital Signal Processing" by Steven W. Smith. The relevant section is in chapter 22. It may be a little dated, but it seems to me to be holding up well (it anticipates neural-net solutions). "Recognition" may be the opposite of "simulation," but if you are trying to recognize something, you need to be able to create some sort of model. And a model that can be used to match a sound can also be used as a source for a sound, right? So many of the issues discussed in that chapter are pertinent and should help you get a better understanding.

Speech synthesis has existed for over 30 years. I had a computer that did it from scratch in the early eighties. Back then it did sound very robotic. As Dalv pointed out, the current research is far improved from those days, so most variants sound much more realistic.