
I am trying to develop an RL agent using the DQN algorithm. During training, the agent interacts with a simulated environment, and each episode takes around 10 minutes to run. At that rate, training the agent for the roughly 1,000,000 episodes needed to reach convergence becomes computationally infeasible. Is anyone aware of a way to speed up the training process, for example using parallel threading or CUDA? Or is the cost inherent to the algorithm?

An episode here is one day long, with actions taken every 15 minutes. I am using FMIs for the simulation, and finishing one episode takes 10 minutes. I can reduce the episode time to a few seconds by using reduced-order models/equations, which shortens the wall-clock length of an episode. But my question is: are there ways to speed up the training process itself? Or are there algorithms better than DQN that do not require this much training?
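To illustrate the "parallel threading" idea from the question: below is a minimal sketch of collecting transitions from several simulator instances concurrently while a single learner consumes them. Everything here is hypothetical — `DummyEnv` is a stand-in for the FMI model (96 actions at 15-minute steps = one day), and the random action choice stands in for an epsilon-greedy policy. Python threads can overlap usefully here only if the simulator's heavy work happens in native code that releases the GIL (as an FMI co-simulation typically does); otherwise `multiprocessing` would be the analogous approach.

```python
import threading
import queue
import random

class DummyEnv:
    """Stand-in for the FMI simulator: one 'day' = 96 actions at 15-minute steps."""
    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        reward = self.rng.random()   # placeholder reward signal
        done = self.t >= 96          # 96 * 15 min = one day
        return float(self.t), reward, done

def rollout_worker(seed, transitions, n_episodes):
    """Run full episodes in this thread and push transitions for the learner."""
    env = DummyEnv(seed)
    for _ in range(n_episodes):
        state, done = env.reset(), False
        while not done:
            action = env.rng.randint(0, 3)  # stand-in for an epsilon-greedy policy
            next_state, reward, done = env.step(action)
            transitions.put((state, action, reward, next_state, done))
            state = next_state
    transitions.put(None)  # sentinel: this worker has finished

n_workers, episodes_per_worker = 4, 2
transitions = queue.Queue()
workers = [
    threading.Thread(target=rollout_worker, args=(i, transitions, episodes_per_worker))
    for i in range(n_workers)
]
for w in workers:
    w.start()

replay_buffer, finished = [], 0
while finished < n_workers:
    item = transitions.get()
    if item is None:
        finished += 1
    else:
        replay_buffer.append(item)  # a DQN learner would sample minibatches from here

for w in workers:
    w.join()
print(len(replay_buffer))  # 4 workers * 2 episodes * 96 steps = 768
```

Note this parallelizes data collection, not learning: the gradient updates still happen in one place, but the expensive simulation time is overlapped across workers.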


You suggest that you might want to run for 10,000,000 episodes, but you do not say where you got this estimate or why you think you need this many. If you really are looking for ways to reduce the amount of data, the more formal term for this is "sample efficiency". To suggest ideas for improving sample efficiency, someone answering would need to know which variants of DQN you are already using, some key facts about your environment (e.g. is it stochastic, what is the variance in return from a deterministic policy, the state and action space sizes), and ideally some indication of your current sample efficiency.
– Neil Slater, Sep 19 '19 at 9:45

Sorry for not making it clear; I am editing the question accordingly.
– cvg, Sep 19 '19 at 10:05

@NeilSlater Thank you. Is there a way I can speed up the training process with the same amount of data?
– cvg, Sep 19 '19 at 10:08

That's a little clearer, but using the phrases "speed up" and "reduce the time" still makes it look like you are asking about the clock time it takes to train using DQN, whilst what you actually seem to care about is the number of episodes you must simulate before you can obtain an approximately optimal policy. And that matters because your simulations are computationally expensive (it looks like orders of magnitude more compute is required for the simulation than for any neural network training on the produced data).
– Neil Slater, Sep 19 '19 at 10:10