Open question: why do the costs for the test data set (and the training data set) grow during learning?
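One way to probe this is to record the cost on both sets after every epoch and inspect the curves. Below is a minimal sketch, assuming a plain NumPy gradient-descent loop on synthetic data; the data, model, and hyperparameters are illustrative stand-ins, not the original setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D regression data (hypothetical stand-in for the real sets).
X_train = rng.uniform(-1, 1, size=80)
y_train = 3.0 * X_train + 0.5 + rng.normal(0, 0.1, size=80)
X_test = rng.uniform(-1, 1, size=20)
y_test = 3.0 * X_test + 0.5 + rng.normal(0, 0.1, size=20)

w, b = 0.0, 0.0
learning_rate = 0.5   # illustrative value
training_epochs = 1000

def cost(X, y, w, b):
    """Mean squared error of the linear model y_hat = w * x + b."""
    return np.mean((w * X + b - y) ** 2)

train_costs, test_costs = [], []
for epoch in range(training_epochs):
    y_hat = w * X_train + b
    # Gradients of the MSE cost with respect to w and b.
    grad_w = 2 * np.mean((y_hat - y_train) * X_train)
    grad_b = 2 * np.mean(y_hat - y_train)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
    # Log both costs every epoch so the trend over training is visible.
    train_costs.append(cost(X_train, y_train, w, b))
    test_costs.append(cost(X_test, y_test, w, b))

print("final train cost:", train_costs[-1])
print("final test cost:", test_costs[-1])
```

If both curves rise rather than fall, common suspects include a step size that is too large for the cost surface, though the question remains open for the setup discussed here.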
If we reduce training_epochs from 1000 to 100, the final costs come out smaller, but the learned model is worse: