OK, this is a lot harder than you think, and the results will not be very good.

First you need a speech recognition shield. These are by and large crap: they give you many false triggers, and they need training to recognise your voice. Then you want the speech output, which is best done with a wav shield: https://www.adafruit.com/product/94

Then, because you have servos, there is a further problem: the sound output will prevent regular updates of your servo signal, so the servos will jitter. Solve this by having an external board generate your servo signal.
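
To give a rough idea of what the external board approach looks like in code, here is a minimal sketch. It assumes a PCA9685-style PWM board (my assumption, any board that generates the pulses on its own would do), running at 50 Hz with 12-bit resolution, and servos that take roughly 1000 to 2000 microsecond pulses. Check your servo's real range. The function names are made up for illustration. The point is that the Arduino only works out a number and sends it once; the board then repeats the pulse by itself, so the sound playback cannot disturb it.

```cpp
#include <cstdint>

// Assumed board settings: 50 Hz servo frame, 12-bit counter (4096 ticks per frame).
constexpr uint32_t kPwmFrequencyHz = 50;
constexpr uint32_t kTicksPerPeriod = 4096;

// Map an angle in degrees (clamped to 0..180) to a pulse width in microseconds.
// The 1000..2000 us defaults are an assumption; many servos differ.
constexpr uint32_t angleToPulseUs(uint32_t angleDeg,
                                  uint32_t minUs = 1000,
                                  uint32_t maxUs = 2000) {
    return minUs + (maxUs - minUs) * (angleDeg > 180 ? 180 : angleDeg) / 180;
}

// Convert a pulse width in microseconds to the board's tick count,
// rounded to the nearest tick: ticks = pulseUs * 4096 * 50 / 1,000,000.
constexpr uint16_t pulseUsToTicks(uint32_t pulseUs) {
    return static_cast<uint16_t>(
        (pulseUs * kTicksPerPeriod * kPwmFrequencyHz + 500000) / 1000000);
}
```

On the Arduino you would then hand the result to the board's driver, for example with the Adafruit PWM Servo Driver library something like pwm.setPWM(channel, 0, pulseUsToTicks(angleToPulseUs(90))), and only call it again when the angle changes.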