Nice use case.
A UA could implement a Voice IME using the kind of speech navigation
shown in the video. That is, the user can speak a field label to focus that
field, then speak a field value to fill in the focused field. (This
behavior would be UA-dependent, though, and thus I believe beyond the scope
of this group's proposal.)
What is in scope for this proposal is how an <evsp:grammar> or <reco> tag can
be used to supply grammars and other information to the recognizer. In this
respect, I believe your markup is remarkably similar to my <reco> proposal.
For example, your snippet could also be written as:
  <label for="KIDNEY_FAT_id">Kidney Fat</label>
  <reco addGrammar="builtin:input?type=range&min=0&max=1&step=0.01">
    <input type="text" size="6" maxlength="4"
           name="KIDNEY_FAT" id="KIDNEY_FAT_id" value=""
           onchange="validateField(this, true);" />
  </reco>
Or, better yet, with HTML5 validation no <reco> or <evsp:grammar> tag would
be required at all, since the grammar is implied by the input's type, min,
max, and step attributes:
  <label for="KIDNEY_FAT_id">Kidney Fat</label>
  <input type="number" min="0" max="1" step="0.01"
         name="KIDNEY_FAT" id="KIDNEY_FAT_id" value="" />
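
To make "the grammar is implicit" a bit more concrete, here is a rough
sketch of how a UA might derive the set of acceptable recognition results
from min, max, and step alone. The function name implicitRangeGrammar and
its accepts/size methods are hypothetical and not part of any proposal or
browser API. The sketch also assumes step is written in plain decimal
notation and that min and max have no more decimal places than step.

```javascript
// Hypothetical helper, for illustration only: models the grammar a UA
// could infer from <input type="number" min=... max=... step=...>.
function implicitRangeGrammar(min, max, step) {
  // Work in integer units of the step's precision to avoid float drift.
  // Assumes step is in plain decimal notation, e.g. 0.01 (not 1e-2).
  const decimals = (String(step).split(".")[1] || "").length;
  const scale = 10 ** decimals;
  const lo = Math.round(min * scale);
  const hi = Math.round(max * scale);
  const inc = Math.round(step * scale);

  return {
    // True if a recognized value lies in [min, max] and on a step boundary.
    accepts(value) {
      if (typeof value === "string" && value.trim() === "") return false;
      const n = Number(value);
      if (!Number.isFinite(n)) return false;
      const scaled = n * scale;
      const rounded = Math.round(scaled);
      // Reject values with more precision than the step allows.
      if (Math.abs(scaled - rounded) > 1e-9) return false;
      return rounded >= lo && rounded <= hi && (rounded - lo) % inc === 0;
    },
    // Number of distinct values the implied grammar would have to cover.
    size() {
      return Math.floor((hi - lo) / inc) + 1;
    },
  };
}

// Usage with the Kidney Fat field above (min=0, max=1, step=0.01).
const kidneyFat = implicitRangeGrammar(0, 1, 0.01);
console.log(kidneyFat.accepts("0.25")); // true
console.log(kidneyFat.accepts("1.5"));  // false (above max)
console.log(kidneyFat.size());          // 101
```

A real recognizer would of course also have to map spoken forms ("point two
five", "a quarter") onto these values. That part is UA-dependent and is
not covered by this sketch.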