I have been learning reinforcement learning for about two weeks. Although I haven't gone through all of Arthur Juliani's course yet, I was able to write a small Q-learning example.
This example uses a DNN as the Q-value table to solve a path-finding problem. Actually, the path looks more like a tree:
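The original code isn't shown here, but the Q-learning update that such a network approximates can be sketched in tabular form. The tree-shaped environment below (states, transitions, rewards, and the goal) is an illustrative assumption, not the original setup:

```python
import numpy as np

# Hypothetical tree-shaped path: from the root (state 0), action 0 leads to a
# dead end (state 1) and action 1 leads toward the goal via state 2.
# All transitions and rewards here are illustrative.
next_state = {0: {0: 1, 1: 2}, 1: {0: 3, 1: 3}, 2: {0: 3, 1: 3}}
reward     = {0: {0: 0.0, 1: 0.0}, 1: {0: -1.0, 1: -1.0}, 2: {0: 1.0, 1: 1.0}}
n_states, n_actions, terminal = 4, 2, 3

Q = np.zeros((n_states, n_actions))   # the Q-value table a DNN would approximate
alpha, gamma, eps = 0.5, 0.9, 0.1
rng = np.random.default_rng(0)

for episode in range(200):
    s = 0
    while s != terminal:
        # epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r = next_state[s][a], reward[s][a]
        # Q-learning update: move Q[s, a] toward r + gamma * max_a' Q[s2, a']
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
        s = s2

best_action_at_root = int(np.argmax(Q[0]))   # should pick the branch to the goal
```

After training, the greedy policy at the root prefers the branch that reaches the reward.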

Google has published its quantization method in this paper. It uses int8 for the feed-forward pass but float32 for back-propagation, since back-propagation needs more precision to accumulate gradients. A question came to me right after reading the paper: why were all the performance tests run on mobile-phone (ARM architecture) platforms? A model quantized with Google's method needs not only int8 addition and multiplication, but also bit-shift operations. The AVX instruction set on the Intel x86_64 architecture can accelerate MAC (Multiply-ACcumulate) operations, but cannot speed up bit shifts.
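To make the bit-shift concern concrete, here is a toy sketch of the paper's integer-only rescaling step: an int8 dot product accumulated in int32, then requantized with a fixed-point multiplier followed by a right shift. All scales and values below are illustrative assumptions:

```python
import numpy as np

def quantize(x, scale, zero_point):
    """Quantize a float array to int8 with the given scale and zero point."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

a_scale, b_scale, out_scale = 0.05, 0.1, 0.2      # illustrative scales
a = quantize(np.array([0.5, -1.0, 0.25]), a_scale, 0)
b = quantize(np.array([1.0, 0.5, -2.0]), b_scale, 0)

# int8 multiplies accumulated in a wider integer type
acc = np.sum(a.astype(np.int32) * b.astype(np.int32))

# The real multiplier M = a_scale * b_scale / out_scale is encoded as
# M0 * 2^-shift, so requantization is an integer multiply plus a bit shift.
M = a_scale * b_scale / out_scale
shift = 15
M0 = int(round(M * (1 << shift)))                  # fixed-point multiplier
out_q = (acc * M0 + (1 << (shift - 1))) >> shift   # rounding, then bit shift
out_real = out_q * out_scale                       # dequantize to check
```

The true dot product is 0.5 - 0.5 - 0.5 = -0.5, and the integer pipeline recovers it up to quantization error; the final `>> shift` is exactly the operation AVX MAC units do not help with.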

To verify my suspicion, I wrote a ResNet-50 (float32) model to classify the CIFAR-100 dataset. After running a few epochs, I evaluated the inference speed using my 'eval.py'. The result is:

Time:5.58819s

Then I followed these steps to add tf.contrib.quantize.create_training_graph() and tf.contrib.quantize.create_eval_graph() to my code. This time, the inference speed was:

Time:6.23221s

A bit disappointing. The quantized (int8) version of the model could not speed up inference on the x86 CPU. Maybe we need to find a more effective quantization algorithm.
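For reference, the kind of timing harness behind a result like "Time:5.58819s" can be sketched as below; `run_inference` is only a placeholder for the real sess.run call inside eval.py:

```python
import time

def run_inference():
    # Placeholder workload standing in for one forward pass of the model.
    return sum(i * i for i in range(10000))

def time_inference(fn, warmup=2, runs=10):
    """Average wall-clock time per call, with warm-up runs to skip one-time setup."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

avg = time_inference(run_inference)
print('Time:%.5fs' % avg)
```

Averaging over several runs (after warm-up) matters here, since the float32 vs int8 gap is well within run-to-run noise for a single measurement.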

4. Problem: When selecting 'Tools' -> 'Check Spelling…' in TeXstudio, it reports "No dictionary Available".
Answer: Download the English dictionary from https://extensions.openoffice.org/en/download/1471, change the suffix from 'oxt' to 'zip', and unzip it. In TeXstudio's 'Preferences', set the dictionary path to the unzipped directory. (ref)

This week, I was trying to train two deep-learning models. They were different from my previous training jobs: they were really hard to converge to a small loss.

The first model is for bird image classification. Previously we wrote a modified ResNet-50 model using MXNet and could reach 78% evaluation accuracy with it. But after we rewrote the same model using TensorFlow, it could only reach 50% evaluation accuracy, which seemed very weird. The first thing that came to my mind was that it was a regularization problem, so I randomly padded/cropped and rotated the training images:
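The exact augmentation code isn't shown above; a minimal NumPy sketch of a random pad-then-crop plus rotation policy (the padding size and the 90-degree rotation choice are assumptions) might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, pad=4):
    """Random pad-then-crop plus a random 90-degree rotation (illustrative policy)."""
    h, w, _ = image.shape
    # zero-pad the borders, then crop a random h x w window back out
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode='constant')
    top = int(rng.integers(0, 2 * pad + 1))
    left = int(rng.integers(0, 2 * pad + 1))
    image = padded[top:top + h, left:left + w]
    # random rotation by a multiple of 90 degrees (square images keep their shape)
    return np.rot90(image, k=int(rng.integers(0, 4)))

img = rng.random((32, 32, 3)).astype(np.float32)
aug = augment(img)
```

Each call sees a slightly shifted and rotated view of the same picture, which is what makes the augmented dataset effectively larger.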

With data augmentation, the evaluation accuracy rose to about 60%, but was still far from the MXNet result.
Then I changed the optimizer from AdamOptimizer to GradientDescentOptimizer, since my colleague told me that AdamOptimizer is so powerful that it tends to overfit. I also added 'weight_decay' to my ResNet-50 model. This time, the evaluation accuracy climbed to 76%. The effect of 'weight_decay' was significantly positive.
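Weight decay itself is a one-line change to the update rule: every step shrinks each weight slightly toward zero, which is what regularizes the model. A minimal sketch (the learning rate and decay coefficient below are illustrative, not the values used in the post):

```python
import numpy as np

def sgd_step(w, grad, lr=0.1, weight_decay=1e-4):
    """One SGD step with weight decay: the decay term pulls weights toward zero,
    equivalent to L2 regularization with coefficient weight_decay."""
    return w - lr * (grad + weight_decay * w)

w = np.array([1.0, -2.0, 0.5])
grad = np.zeros_like(w)          # with a zero gradient, only the decay acts
w_next = sgd_step(w, grad)       # every weight shrinks by a factor (1 - lr * wd)
```

With a zero gradient the update reduces to `w * (1 - lr * weight_decay)`, which makes the regularizing pull toward zero easy to see.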

The loss dropped rapidly from the hundreds to twelve, but then stayed at eleven for a very long time, as if it would stay there forever. So I began to adjust hyper-parameters. After testing several learning rates and optimizers, the results didn't change at all.
Eventually, I noticed that the loss doesn't stay there forever: it WILL decrease in the end. For some tasks, such as classification, the loss converges noticeably. But for others, such as object detection, the loss shrinks extremely slowly. Using AdamOptimizer with a small learning rate is a better choice for this type of task.

Yesterday I wrote a TensorFlow program to train a ResNet-50 model on the CIFAR-100 dataset. But when training began, the classification loss was abnormally large and didn't decrease at all:

loss[2.6032338e+25]
loss[2.5617402e+25]
loss[3.3851871e+25]
loss[3.092054e+25]
...

At first, I thought the dataset-processing code might be wrong. But after printing the data to the console, the loaded input data seemed all right. Then I printed all the tensor values right after model initialization, and those values seemed correct too.
With no other choice, I began to check the initializers in the TensorFlow code:
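The original initializer code isn't reproduced here, but the following illustrative experiment (not the original code) shows why an initializer alone can produce astronomically large losses: with a naive stddev-1 initializer, activations grow exponentially with depth, while a He-style initializer keeps them bounded. Layer count and width below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = 256

def forward_scale(stddev):
    """Mean absolute activation after 20 linear+ReLU layers with the given init stddev."""
    x = rng.standard_normal((64, fan_in))
    for _ in range(20):
        W = rng.standard_normal((fan_in, fan_in)) * stddev
        x = np.maximum(x @ W, 0.0)          # linear layer + ReLU
    return float(np.abs(x).mean())

naive = forward_scale(1.0)                   # stddev 1: activations explode
he = forward_scale(np.sqrt(2.0 / fan_in))    # He init: scale stays bounded
```

The naive run blows up by roughly a factor of sqrt(fan_in / 2) per layer, which after a few dozen layers is more than enough to produce losses on the order of 1e25.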

To run ResNet-50 on the CIFAR-100 dataset, I wrote a TensorFlow program. But when running it, the loss seemed stuck at about 4.5~4.6 forever:

step:199,loss:4.61291,accuracy:0
step:200,loss:4.60952,accuracy:0
step:201,loss:4.60763,accuracy:0
step:202,loss:4.62495,accuracy:0
step:203,loss:4.62312,accuracy:0
step:204,loss:4.60703,accuracy:0
step:205,loss:4.60947,accuracy:0
step:206,loss:4.59816,accuracy:0
step:207,loss:4.62643,accuracy:0
step:208,loss:4.59422,accuracy:0
...
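A loss pinned near 4.6 is itself a clue: for 100 classes, the cross-entropy of a uniform random prediction is -ln(1/100) = ln(100) ≈ 4.605, so the model was doing no better than chance. A quick check:

```python
import math

# Cross-entropy of a uniform prediction over 100 classes: -ln(1/100) = ln(100)
num_classes = 100
chance_loss = -math.log(1.0 / num_classes)
print(chance_loss)   # ~4.605, matching the stuck loss in the log above
```

Whenever a classification loss plateaus at ln(num_classes), it is worth suspecting that the inputs or labels carry no usable signal.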

After changing the model (from ResNet to a fully-connected net), the optimizer (from AdamOptimizer to AdagradOptimizer), and even the learning rate (from 1e-3 down to 1e-7), the phenomenon didn't change at all.
Finally, I checked the loss and the output vector step by step, and found that the problem was not in the model but in the dataset code:

Python

def next_batch(self, batch_size=64):
    images = []
    labels = []
    for i in range(self.pos, self.pos + batch_size):
        image = self.data['data'][self.pos]
        image = image.reshape(3, 32, 32)
        image = image.transpose(1, 2, 0)
        image = image.astype(np.float32) / 255.0
        images.append(image)
        label = self.data['fine_labels'][self.pos]
        labels.append(label)
    if (self.pos + batch_size) >= CIFAR100_TRAIN_SAMPLES:
        self.pos = 0
    else:
        self.pos = self.pos + batch_size
    return [images, labels]

Every batch of data had the same images and the same labels! That's why the model didn't converge. I should have used 'i' instead of 'self.pos' as the index to fetch the data and labels.
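For completeness, here is the corrected loop using 'i' as the index. The small synthetic Dataset wrapper below is only a stand-in so the snippet is self-contained; the real code loads the CIFAR-100 pickle:

```python
import numpy as np

CIFAR100_TRAIN_SAMPLES = 256   # small synthetic stand-in for the real 50000

class Dataset:
    def __init__(self):
        self.pos = 0
        # synthetic data shaped like the CIFAR-100 pickle: flat 3*32*32 images
        n = CIFAR100_TRAIN_SAMPLES
        self.data = {
            'data': (np.arange(n * 3072).reshape(n, 3072) % 255),
            'fine_labels': list(range(n)),
        }

    def next_batch(self, batch_size=64):
        images = []
        labels = []
        for i in range(self.pos, self.pos + batch_size):
            image = self.data['data'][i]           # index with i, not self.pos
            image = image.reshape(3, 32, 32)
            image = image.transpose(1, 2, 0)       # CHW -> HWC
            image = image.astype(np.float32) / 255.0
            images.append(image)
            labels.append(self.data['fine_labels'][i])   # same fix for the label
        if (self.pos + batch_size) >= CIFAR100_TRAIN_SAMPLES:
            self.pos = 0
        else:
            self.pos = self.pos + batch_size
        return [images, labels]
```

With the fix, consecutive calls return consecutive, distinct samples instead of `batch_size` copies of the sample at `self.pos`.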

So in the deep-learning field, problems come not only from models and hyper-parameters, but also from the dataset, or simply faulty code…