
Creating photorealistic images with neural networks and a Gameboy Camera

In 1998 Nintendo released the Gameboy Camera. With this camera, it was possible to take images at a resolution of 256×224 pixels (or 0.05734 megapixels). The screen resized your image to 160×144 pixels and showed it in 4 shades of gray/green. Despite these limitations, the images you took are recognizable to us humans. In this post, I show my adventures in creating photorealistic camera images using deep neural networks!

Inspiration

Recently, several new applications of convolutional neural networks have been discovered. Examples are super-resolution (upscaling an image while reconstructing detail), colorization (turning grayscale into RGB), and removing (JPEG) compression artifacts. Other examples are real-time style transfer and turning sketches into photorealistic face images, as shown by Yağmur Güçlütürk, Umut Güçlü, Rob van Lier, and Marcel van Gerven. This last example inspired me to take Gameboy Camera images of faces and turn them into photorealistic images.

Summary of my experiment

Back in 1998, the Gameboy Camera earned a Guinness World Record as the smallest digital camera. One accessory you could buy was the Gameboy Printer, a small printer for printing your images. When I was 10 years old we had one of these cameras at home and used it a lot. Although we did not have the printer, taking pictures, editing them, and playing minigames was a lot of fun. Unfortunately, I could not find my old camera (so no colorized young-Roland pictures), but I did buy a new one so I could test my application.

In the end, the results turned out very well. Although we trained on only a small part of the face, even pictures of whole heads seem to turn out nicely. The following images are ones the network has never seen before. The image in the center is the real input image; the image on the right is generated by our neural network. Note that even the skin color is accurate most of the time.

These are some images that were shot with the Gameboy Camera and uploaded by random people.

And… my face, shot with a Gameboy Camera.

In this blog post, I will guide you through the progress of this project. Some boring parts are hidden but can be expanded for the full story. With the code, you should be able to replicate the results. A Git repository can be found here.

Training data

Unfortunately, there is no training dataset that pairs Gameboy Camera images of faces with real pictures of the same person. To create a dataset, I made a function that takes an image as input and creates an image with 4 shades of black. The shades are based on the mean and standard deviation of the image, to make sure that we always use 4 colors. If you look at original Gameboy Camera images, you can see that they create gradients by alternating pixels to give the illusion of more colors. To imitate this, I simply added noise on top of my original images. Note that if you want to experiment, you can change the apply_effect_on_folder function to create images from sketches instead of Gameboy Camera images.

View code

In [1]:

%matplotlib inline
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipyd
import os
import random
import pickle
import cv2
from sklearn.decomposition import PCA
from libs import vgg16  # Download here! https://github.com/pkmital/CADL/tree/master/session-4/libs
from libs import gif
from libs import utils

IMAGE_PATH = "/home/roland/img_align_celeba_png"  # DOWNLOAD HERE! http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
TEST_IMAGE_PATH = '/home/roland/workspace/gameboy_camera/test_images'
PICTURE_DATASET = os.listdir(IMAGE_PATH)
PREPROCESSED_IMAGE_PATH = "/home/roland/img_align_celeba_effect"
PROCESSED_PICTURE_DATASET = os.listdir(PREPROCESSED_IMAGE_PATH)
IMAGE_WIDTH = 96
IMAGE_HEIGHT = 96
COLOR_CHANNEL_COUNT = 3
NORMALISE_INPUT = False


def load_random_picture_from_list(image_names, path):
    index_image = random.randint(0, len(image_names) - 1)
    name_picture = image_names[index_image]
    path_file = os.path.join(path, name_picture)
    image = plt.imread(path_file)
    return image


def rgb2gray(rgb):
    return np.dot(rgb[..., :3], [0.299, 0.587, 0.114])


def add_sketch_effect(image):
    img_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    img_gray_inv = 255 - img_gray
    img_blur = cv2.GaussianBlur(img_gray_inv, ksize=(5, 5), sigmaX=0, sigmaY=0)
    img_blend = dodgeV2(img_gray, img_blur)
    ret, img_blend = cv2.threshold(img_blend, 240, 255, cv2.THRESH_TRUNC)
    return img_blend


def add_gameboy_camera_effect(image):
    # plt.imread returns PNG images as floats in [0, 1]
    img_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    mean = np.mean(img_gray)
    stdev = np.std(img_gray)
    # The noise creates the alternating-pixel "gradients" of the real camera
    random_noise = np.random.random_sample(size=img_gray.shape) / 10
    img_gray += random_noise
    lowest = img_gray < (mean - stdev)
    second = img_gray < mean
    third = img_gray < (mean + stdev)
    highest = img_gray >= (mean + stdev)
    # The masks overlap, so apply them from brightest to darkest
    pallet = np.zeros(img_gray.shape, dtype=np.float32)
    pallet[highest] = 1.0
    pallet[third] = 0.66
    pallet[second] = 0.33
    pallet[lowest] = 0.0
    return pallet


def dodgeV2(image, mask):
    return cv2.divide(image, 255 - mask, scale=256)


def burnV2(image, mask):
    return 255 - cv2.divide(255 - image, 255 - mask, scale=256)


def resize_image_by_cropping(image, width, height):
    """Resizes image by cropping the relevant part out of the image"""
    original_height = len(image)
    original_width = len(image[0])
    start_h = (original_height - height) // 2
    start_w = (original_width - width) // 2
    return image[start_h:start_h + height, start_w:start_w + width]


def apply_effect_on_folder(name_input_folder, name_output_folder):
    picture_names = os.listdir(name_input_folder)
    i = 0
    for name_picture in picture_names:
        i += 1
        if i % 250 == 1:
            print(i)
            print(len(picture_names))
        path_file = os.path.join(IMAGE_PATH, name_picture)
        image = plt.imread(path_file)
        image = resize_image_by_cropping(image, IMAGE_WIDTH, IMAGE_HEIGHT)
        effect = add_gameboy_camera_effect(image)
        write_path_original = os.path.join(name_output_folder, name_picture + ".orig")
        write_path_effect = os.path.join(name_output_folder, name_picture + ".effect")
        np.save(write_path_original, image)
        np.save(write_path_effect, effect)


def load_names_images():
    names_images = [a[:6] for a in PROCESSED_PICTURE_DATASET]
    names_images = list(set(names_images))
    orig = [a + ".png.orig.npy" for a in names_images]
    effect = [a + ".png.effect.npy" for a in names_images]
    return list(zip(orig, effect))


def normalise_image(image, mean, stdev):
    normalised = (image - mean) / stdev
    return normalised


def normalise_numpy_images(images, mean, stdev):
    return np.array([normalise_image(image, mean, stdev) for image in images])


def denormalise_image(image, mean, stdev):
    # Inverse of normalise_image
    return image * stdev + mean


class PreprocessedImageLoader:

    def __init__(self, path, image_names, trainsplit_ratio=0.8):
        assert 0.0 < trainsplit_ratio < 1.0
        self.path = path
        split = int(trainsplit_ratio * len(image_names))
        self.trainimage_names = image_names[:split]
        self.testimage_names = image_names[split:]

    def get_random_images_from_set(self, count, names_images):
        Xs = []
        Ys = []
        random.shuffle(names_images)
        names_images_batch = names_images[:count]
        for name_image in names_images_batch:
            name_orig = os.path.join(self.path, name_image[0])
            name_effect = os.path.join(self.path, name_image[1])
            Xs.append(np.load(name_effect))
            Ys.append(np.load(name_orig))
        return np.array(Xs), np.array(Ys)

    def get_train_images(self, count):
        return self.get_random_images_from_set(count, self.trainimage_names)

    def get_test_images(self, count):
        return self.get_random_images_from_set(count, self.testimage_names)


# apply_effect_on_folder(IMAGE_PATH, PREPROCESSED_IMAGE_PATH)
names_images = load_names_images()
imageloader = PreprocessedImageLoader(PREPROCESSED_IMAGE_PATH, names_images)
source_x, test_y = imageloader.get_test_images(10)
fig = plt.figure()
plt.subplot(121)
plt.imshow(source_x[0], cmap='gray')
plt.subplot(122)
plt.imshow(test_y[0])
plt.show()

On the left is the image I generate from the image on the right; these pairs form the training and test data.

As you can see, the random noise on top of the image creates the “gradients” you see in real Gameboy Camera images, which give the illusion of more than 4 colors. Note that a downside of the crop function I programmed is that most of the background of the images is not really visible (even parts of the chin are hidden). This might cause problems later if we try to feed the network images with more background.

Data preprocessing

The preprocessing step of the project is normalising the input images. Hidden is the code that loads 30,000 training images and calculates the mean and standard deviation of the Gameboy images and the original images. Because it looks cool, here are the means of the input and of the output:
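The hidden code is not shown here, but the idea is simple; a minimal NumPy sketch (function names are mine, the real code uses the normalise_image/denormalise_image helpers defined earlier):

```python
import numpy as np

def dataset_mean_std(images):
    # images: array of shape (N, H, W, C); statistics over all pixels, per channel
    stacked = np.asarray(images, dtype=np.float64)
    return stacked.mean(axis=(0, 1, 2)), stacked.std(axis=(0, 1, 2))

def normalise(image, mean, stdev):
    # Shift to zero mean, scale to unit standard deviation
    return (image - mean) / stdev

def denormalise(image, mean, stdev):
    # Exact inverse of normalise, used to turn network output back into pixels
    return image * stdev + mean
```

Normalising this way keeps the network inputs in a well-behaved range; the same mean and standard deviation must be reused to denormalise the network's output.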

Helper functions

To ease the programming, I created several helper functions that create layers in my graph. There are two ways of “deconvolving” an image: using TensorFlow's conv2d_transpose operation, or resizing the image and applying a normal convolution afterwards.
During early experiments I used the conv2d_transpose function, but this resulted in images with strange patterns:
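These are the well-known checkerboard artifacts: with a stride that does not evenly divide the kernel size, a transposed convolution writes to some output pixels more often than others. A small sketch (with hypothetical sizes) that counts how many kernel taps touch each output position makes the uneven overlap visible:

```python
import numpy as np

def transpose_conv_overlap(out_len, kernel=3, stride=2):
    # For a 1-D transposed convolution, count how many kernel applications
    # contribute to each output position
    counts = np.zeros(out_len)
    for i in range((out_len - kernel) // stride + 1):
        counts[i * stride : i * stride + kernel] += 1
    return counts

# Alternating contribution counts -> checkerboard pattern in the output
print(transpose_conv_overlap(11))
```

With kernel 3 and stride 2 the counts alternate between 1 and 2, which is exactly the kind of regular pattern that shows up in the generated images. Resizing first and then convolving with stride 1 gives every output pixel the same number of contributions, which is why the second approach avoids the artifacts.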

Loss functions

In the sketch-to-photorealistic-image paper and the real-time style transfer paper, the authors use three different loss functions: a pixel loss, a content loss, and a smoothing loss. With the pixel loss you “teach” the network that the resulting colors are important. Unfortunately, using only this loss function gives very blurry images. With the content loss, you indicate that the generated image has to have the same “features” as the target image. The smoothing loss gives a small penalty to neighboring pixels with big differences. I implemented the same three loss functions for my project.
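The three losses can be sketched in plain NumPy (in the actual project they operate on TensorFlow tensors, and the content loss compares VGG feature maps rather than these placeholder arrays):

```python
import numpy as np

def pixel_loss(generated, target):
    # Mean squared error over raw pixel values; optimising only this gives blurry output
    return np.mean((generated - target) ** 2)

def content_loss(feat_generated, feat_target):
    # Same MSE, but over feature maps extracted by a pretrained network (e.g. a VGG layer),
    # so the generated image must match the target's higher-level "features"
    return np.mean((feat_generated - feat_target) ** 2)

def smoothing_loss(image):
    # Total variation: small penalty for differences between neighboring pixels
    dh = np.abs(image[1:, :] - image[:-1, :])
    dw = np.abs(image[:, 1:] - image[:, :-1])
    return np.sum(dh) + np.sum(dw)
```

The total training loss is a weighted sum of the three; the weights balance color accuracy, sharp features, and smoothness.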

The network

The network consists of three convolutional layers for scaling the image down, two residual layers for adding/removing information, and three deconvolution layers for scaling the image back to the size we want. There are some differences from the network described in the paper “Convolutional Sketch Inversion”. One thing I left out is the batch normalisation layer. Although it would be easy to add to my network, the network already trained fast enough without it. Another difference is that I use only two residual layers, mostly because of my limited computing power.
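The shape of an image as it flows through this architecture can be traced with a small sketch, assuming each of the three convolutions downsamples by a factor of 2 and each deconvolution upsamples by 2 (the exact strides may differ in the original implementation):

```python
def layer_output_size(size, kind, stride=2):
    # 'same' padding: a strided conv shrinks by ceil(size / stride),
    # a deconv grows by the stride, a residual block keeps the size
    if kind == "conv":
        return -(-size // stride)  # ceiling division
    if kind == "deconv":
        return size * stride
    return size  # residual

layers = ["conv", "conv", "conv", "res", "res", "deconv", "deconv", "deconv"]
size = 96  # input images are 96x96
for kind in layers:
    size = layer_output_size(size, kind)
    print(kind, size)
```

Under these assumptions the spatial size goes 96 → 48 → 24 → 12, stays at 12 through the residual blocks, and returns 12 → 24 → 48 → 96, so the output image matches the input resolution.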

Results

After explaining the image-generation method and the network, it is time to run it! It is really interesting to see the output of the network change over time. Below you can see the generated image for the first image of my test set at several moments during training. Every 100 training steps, the same test image is converted to a photorealistic image. By creating a GIF out of these images, you can see the progress of the network.

Interestingly, in the first iterations the skin color of the generated image is off, but after seeing 9600 people the network learned the right color. In the GIF it is interesting to look at the eyes and eyebrows, which keep getting sharper and sharper (due to the content loss). In the sped-up GIF you can see that many changes keep happening over time, indicating that I might have benefited from a lower learning rate.

Test data

To test my algorithm I tried to convert the following data using the trained network:

test data from the celebrity dataset

images from people I found using Google Images by typing in “gameboy camera”

I was impressed with how well these images turned out, given that they do not follow the same pattern as the training images. Even though the eyes are in a different spot and a larger area was cropped around the face, I think the network created pretty good images.

Images from the gameboy itself

When you try to display an empty animation, the Gameboy Camera shows one of several faces warning you that you have to create an animation first. I took two of these faces and tried colorizing them.

Images I took

A big problem in trying to create color images of my own face was getting the pictures off the Gameboy Camera. Buying the camera was easy, but finding a Gameboy Printer proved impossible. Although somebody made a cable to transfer the images to your PC through the link cable, that was also impossible to find. What was left was the crude method of taking pictures of the screen. A problem with this approach is that the lighting is always quite far off. As our network is trained on images with even lighting, this posed a bit of a problem. It was not easy to solve, so we have to make do with colorized images from noisy input.

In the end I’m quite happy with these results. The training images always have exactly 4 shades of black and very specific noise patterns. In the self-made images the light is totally off: what should be white is pretty dark, and what should be dark is way too light. Especially in the last image you can see a brightness gradient over the image. Yet our algorithm is able to create a pretty good-looking face!

Output last deconvolution

To see what the network “learned”, the activations in the last layer can be visualised.

The output of the principal components is both interesting and a bit obvious. The network learns to encode the skin, hair, and background of the input images (just as we saw before).
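The principal components were computed with sklearn's PCA (imported in the code above); the same idea can be sketched with plain NumPy, treating each pixel's last-layer activation vector as one sample:

```python
import numpy as np

def top_principal_components(activations, n_components=3):
    # activations: (n_pixels, n_channels) matrix of last-layer activations,
    # one row per spatial position
    centered = activations - activations.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal directions
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Per-pixel scores on the top components; reshape to (H, W) to view as images
    return centered @ vt[:n_components].T
```

Reshaping each score column back to the feature map's spatial dimensions gives one grayscale image per component, which is how pictures like the skin/hair/background maps above can be produced.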

Interesting observations/lessons learned

During this project I learned a lot of lessons. The lesson about the different deconvolution layers is one I already described above. Another interesting lesson is that I started out normalising the output of the neural network. This yielded nice results early in training (outputting only zeros is already a good guess), but later the network ran into a lot of problems. The output of a barely trained network can be seen below. Unfortunately, faces that were far from the norm (i.e. people with hair in front of their face, sunglasses, or people looking sideways) became blurry.

One question I asked myself was: how does this task compare to coloring sketch images? The details of the face are very blurry, but the outline of the facial features is still preserved. Because the areas between features are filled with 4 colors, the network has more grasp on what the resulting color should be compared to the line-sketch problem. One interesting thing is that this network gives people the right skin color most of the time.

If you have other ideas for styles to convert from, or other things you would like to try, let me know. I am always willing to answer your questions. If you enjoyed reading this, please leave a comment or share this post with others who might be interested.

Update: somebody wanted to try out my network, and sent me an image taken with his gameboy camera. This is the result:

The colorized result of an image sent to me taken with his Gameboy Camera