3.
spoken dialogue system
[diagram: user input → natural language understanding → user act → state tracker → dialogue state → policy manager → system act → natural language generation → system response → user; the diagram also includes a block labelled "data"]
example user input: “Hi, do you know a good Indian restaurant” → user act: inform(food=“Indian”)
example system act: request(price_range) → system response: “Sure. What price range are you thinking of?”
The central question: how to train the policy manager?
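The pipeline on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword-spotting NLU, the cuisine vocabulary, the response templates and the hand-written policy are all hypothetical stand-ins, and the function names (understand, track, policy, generate, respond) are not from the slides. The policy function is the component that the rest of the talk proposes to learn with RL instead of writing by hand.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    """Dialogue state: the slot values gathered so far."""
    slots: dict = field(default_factory=dict)


def understand(utterance):
    """Toy NLU (assumption): keyword spotting that returns a user act."""
    text = utterance.lower()
    for cuisine in ("indian", "italian", "chinese"):  # hypothetical vocabulary
        if cuisine in text:
            return ("inform", "food", cuisine.capitalize())
    return ("null", None, None)


def track(state, user_act):
    """State tracker: fold the user act into the dialogue state."""
    act_type, slot, value = user_act
    if act_type == "inform":
        state.slots[slot] = value
    return state


def policy(state):
    """Hand-written placeholder policy: request the first missing slot."""
    for slot in ("food", "price_range"):
        if slot not in state.slots:
            return ("request", slot)
    return ("offer", None)


def generate(system_act):
    """Template-based NLG (assumption): system act to system response."""
    templates = {
        ("request", "food"): "What kind of food would you like?",
        ("request", "price_range"): "Sure. What price range are you thinking of?",
        ("offer", None): "I found a restaurant that matches.",
    }
    return templates[system_act]


def respond(state, utterance):
    """Run one turn: user input -> user act -> state -> system act -> response."""
    user_act = understand(utterance)
    state = track(state, user_act)
    system_act = policy(state)
    return user_act, system_act, generate(system_act)
```

Calling respond(DialogueState(), "Hi, do you know a good Indian restaurant") walks through the same steps as the example on the slide: the user act is an inform on the food slot, and the system act is a request for the price range.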

30.
deep reinforcement learning
2015 Nature paper from DeepMind introduced an RL
method based on deep learning, called DQN
main result: with same network architecture, learned to
play large number of Atari 2600 games effectively
DQN characteristics
variant of Q-learning that uses deep neural networks to
approximate the Q function
uses experience replay to deal with non-i.i.d. samples
uses two networks (Q and a periodically synchronised target
network Q’) to mitigate non-stationarity of update targets
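The two mechanisms listed above, experience replay and a separate target network, can be illustrated with a minimal sketch. To keep it dependency-free, a linear function approximator (the hypothetical LinearQ class) stands in for the deep network, so this is not DQN itself; the names dqn_step and demo, the toy transitions, and all hyperparameters are assumptions made for illustration.

```python
import random
from collections import deque


class LinearQ:
    """Stand-in for a deep network (assumption): Q(s, a) = w[a] . s"""

    def __init__(self, n_features, n_actions):
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def values(self, s):
        """Return Q(s, a) for every action a."""
        return [sum(wi * si for wi, si in zip(row, s)) for row in self.w]

    def copy_from(self, other):
        """Synchronise this network with another one (used for Q')."""
        self.w = [row[:] for row in other.w]


def dqn_step(q, q_target, replay, batch_size, gamma, lr, rng):
    """One minibatch update on transitions sampled uniformly from the replay buffer."""
    batch = rng.sample(list(replay), min(batch_size, len(replay)))
    for s, a, r, s_next, done in batch:
        # The bootstrap target comes from the frozen network Q', not from Q.
        target = r if done else r + gamma * max(q_target.values(s_next))
        td_error = target - q.values(s)[a]
        # Gradient step on 0.5 * td_error**2 with respect to w[a].
        q.w[a] = [wi + lr * td_error * si for wi, si in zip(q.w[a], s)]


def demo():
    """Toy run: in state [1, 0], action 1 yields reward 1 and action 0 yields 0."""
    rng = random.Random(0)
    q, q_target = LinearQ(2, 2), LinearQ(2, 2)
    replay = deque(maxlen=1000)  # experience replay buffer
    replay.append(([1.0, 0.0], 1, 1.0, [0.0, 1.0], True))
    replay.append(([1.0, 0.0], 0, 0.0, [0.0, 1.0], True))
    for step in range(200):
        dqn_step(q, q_target, replay, batch_size=2, gamma=0.99, lr=0.1, rng=rng)
        if step % 20 == 0:
            q_target.copy_from(q)  # periodic target-network sync
    return q
```

Sampling uniformly from the buffer breaks up the correlation between consecutive transitions, and computing targets from Q’ keeps them fixed between synchronisations. In the toy run, the learned value of action 1 in state [1, 0] approaches the reward of 1.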

33.
deep RL for dialogue system
exact state is not observed, hence belief state is
used
belief-state spaces are typically discretized into
summary state spaces to make the task tractable
deep RL can be applied directly to the belief-state
space due to its strong generalization properties
with pre-training, a deep RL method can become
even more efficient
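The difference between the full belief state and a discretized summary state can be made concrete with a small sketch. Everything here is an assumption for illustration: the ontology, the belief values, the thresholds and the function names belief_vector and summary_state are hypothetical, and summary spaces used in practice are designed differently per system.

```python
def belief_vector(belief, ontology):
    """Flatten a belief state into a fixed-order vector, e.g. as network input."""
    return [
        belief.get(slot, {}).get(value, 0.0)
        for slot in sorted(ontology)
        for value in ontology[slot]
    ]


def summary_state(belief, thresholds=(0.5, 0.8)):
    """Discretize each slot to a confidence bucket for its top hypothesis.

    Bucket 0 = low, 1 = medium, 2 = high (thresholds are hypothetical).
    """
    summary = {}
    for slot, dist in belief.items():
        top = max(dist.values(), default=0.0)
        summary[slot] = sum(top >= t for t in thresholds)
    return summary


# Hypothetical ontology and belief state for a restaurant domain.
ontology = {"food": ["Indian", "Italian"], "price_range": ["cheap", "expensive"]}
belief = {
    "food": {"Indian": 0.9, "Italian": 0.1},
    "price_range": {"cheap": 0.4, "expensive": 0.3},
}
```

The summary state discards which value is likely and how the remaining probability mass is spread, which is the information a deep RL method can keep by working on the belief vector directly.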

34.
effect of pre-training
[figure: two panels, "without pre-training" and "with pre-training"; based on DSTC2 dataset]

35.
summary
RL is a data-driven approach towards learning
behaviour
RL does not require prior knowledge of a good policy
RL can be used for online learning
combining RL with deep learning means that RL
can be applied to much bigger problems
constructing a good policy for a modern dialogue
manager is a challenging task
deep RL is a strong candidate to address this
challenge