The following images show a comparison of three modes of visual mentation all using the restricted set of 1000 percepts. The top image is the “Watching” mode where percepts are located in the same location as they are sensed. The middle image is something like “Imagery” where the position of percepts is random but constrained by the distribution of percept positions in Watching, and therefore still tied to sensation. The bottom image is a first attempt at dreaming decoupled from sensory information. Percepts are positioned randomly, but constrained by the distribution of percepts as learned by an LSTM network. The position in time and space of each percept is wholly determined by the LSTM predictive model.

What keeps this from being ‘real’ dreaming (according to the Integrative Theory) is that the sequence of distributions generated by the predictive model are seeded by every time-step in Watching (keeping them from diverging significantly from Watching). In real dreaming, one single time-step will seed a feedback loop in the predictive model to generate a sequence that is expected to diverge significantly from Watching. I think these are working quite well; the generation of positions from distributions certainly softens a lot of structure in Watching, but holds onto some resemblance. There is some literature on the possibility of mental imagery and dreaming being hazier and less distinct than external perception. I’ve also included a video at the bottom that shows the whole reconstructed sequence from the LSTM model.

Following from my previous post I’ve been investigating reducing the number of clusters in order to scale the predictor problem (for Dreaming) down to something feasible. The two pairs of images below show the original reconstructions with 200,000 clusters and the corresponding reconstructions with 1000 clusters. For more context, see this post. I’ll try generating a short sequence and see how they look in context.

In working on Dreaming, I recalculated the K-Means segment clusters (percepts) with only 1000 means (there were 200,000 previously). The images below show the results. It seems that when it comes to collages, the most interesting segments are the outliers (and I expect probably the raw segments). The fact that so many segments get averaged in these clusters means they end up being very small and 1000 is just not enough to capture the width and height features (hence the two very wide and very tall percepts). Clearly, the colour palette is still preserved, but that is pretty much it. The areas of colour below are so small that these images end up being only 1024px or smaller wide. These SOMs are trained over only 10,000 iterations to get a sense of what all the percepts look like together.

As filtering by area lead to such interesting results, I went ahead and split up the percepts into three groups according to percept areas. The triptych below shows all 200,000 percepts, but separated into three separately trained and differently sized SOMs. I’ve also included details of the latter two SOMs. I thought this approach would lead to more cohesion within each map, but the redundancy between the second and third images leads me to believe that 200,000 is too many clusters. Since I need to reduce the number of clusters for the Dreaming part of Watching and Dreaming, I’ll put the collage project aside until I’ve determined a reasonable max number of clusters for LSTM prediction and then come back to it.

After looking at the previous results I think the issue is that there is simply too much diversity in all 200,000 components to make an image with any degree of consistency. I’ve managed to implement code to filter image components based on pixel area. The following images and details are composed of the top 5,000 and 10,000 largest components. Due to the large size of these components, these are full size (no scaling) and suitable to large scale printing. I think the first image with 5,000 components is the most compelling. I will now look at making collages from the remaining smaller components, or a subset thereof.

Following shows the results of training over the weekend. It seems with this many inputs (200,000) and the requirement for over-fitting (the number of neurons ~= the number of inputs) we need a lot of iterations. I think this is the most interesting so far, but I also had the idea to break the percepts into sets and make a different SOM for each set. This would make each one more unified (in terms of scale) and give them very different character.

The following image is the result of a 5,000,000 iteration training run. Note the comparative lack of holes where no percepts are present. The more I look at these images the more I think they would need to be shown not as a print, but as a light-box. I wonder what the maximum contrast of a light-box would be… On the plus side, the collages seem to work best at a lower resolution (4096px square below) due to the small size of the percepts (extracted from a 1080p HD source); this would mean much smaller (27″ @ 150ppi, 14″ @ 300ppi) and affordable light-boxes. I wonder how the collages using the 30,000,000 segments will compare since they will not have soft edges and higher brightness and saturation. It will be a while before I get to those since the code I’m using is quite slow to return segment positions (17hours for 200,000 percepts) and is not currently scalable to the 30,000,000 segments.

I have been working on getting large percepts to stick in the middle so they don’t push the outer edges too much. I attempted this by explicitly setting particular neurons in the middle of the SOM with features corresponding to the largest percepts. While this worked for a smaller number of training iterations (1000) it did not seem to make any difference over a large number of training iterations. The following images show the results where large percepts are scaled down to reduce the size variance. The lack of training leads to quite a few dead spots where no percepts are located. While quite dark, the black background works better for this content. I’ve included a visualization of the raw weights and a few details.

The image above shows some early results of organizing 200,000 percepts (the same vocabulary used in “Watching (Blade Runner)“) in a collage according to their similarity (according to width, height, and colour). I’ve included a few details below showing the fine structure of the composition. The image directly below shows a visualization of the SOM that determines the composition of the work. (more…)

I have not started work on making large collages from Blade Runner clusters and segments since the residency. I ended up writing some code for my public art commission (“As our gaze peers off into the distance, imagination takes over reality…“, 2016) that arranged segments using a SOM. I did not end up using that approach in the final work, so I’m now adapting it to make collages from Blade Runner clusters and then segments.

The following image shows the colour values of each of the 200,000 clusters, in no particular order:

I’ve stalled on the ‘Dreaming’ size of the project for now realizing that changes I made for ‘Watching’ significantly impact dreaming. With 200,000 percepts and each being able to be in multiple locations for every frame, the LSTM (prediction network) would have an input vector of 5.7 million elements (Including a 19+8 position histogram for each frame). Too big for me to even build a model (at least on my hardware). I took the opportunity to rethink what I should do and came to the conclusion that I’ll need to recompute segments to downscale the LSTM input vector to something feasible. This will be about a month of computation time, so I’ve spent some time working on other projects, such as: “Through the haze of a machine’s mind we may glimpse our collective imaginations (Blade Runner)”.

I’ve been doing a lot of reading and tutorials to get a sense of what I need to do for the “Dreaming” side of this project. I initially planned to use Tensorflow, but found it too low level and could not find enough examples, so I ended up using Keras. Performance using Keras should be very close, since I’m using tensorflow as the back-end. I created the following simple toy sequence to play with:

I’ve written some code that will be the interface between the data produced during “perception”, the future ML components, and the final rendering. One problem is that in perception, the clusters are rendered in the positions of the original segmented regions. This is not possible in dreaming, as the original samples are not accessible. My first approach is to calculate probabilities for the position of clusters for each frame in the ‘perceptual’ data, and then generate random positions using those probabilities to reconstruct the original image.

The results are surprisingly successful, and clearly more ephemeral and interpretive than perceptually generated images. One issue is that there are many clusters that appear in frames only once; this means that there is very little variation in their probability distributions. The image above is such a frame where the bins of the probability distribution (19×8) are visible in the image. I can break this grid by generating more samples than there were in the original data; in the images in this post the number of samples is doubled. Due to this increase of samples, they break up the grid a little bit (see below). Of course, for clusters that appear only once in perception, this means they are doubled in the output, which can be seen in the images below. The following images show the original frame, the ‘perceptual’ reconstruction and the new imagery reconstruction from probabilities. The top set of images has very little repetition of clusters, and the bottom set has a lot. (more…)

After spending some time tweaking the audio, I’ve finally processed the entire film both visually and aurally. At this point the first half the project “Watching and Dreaming (Blade Runner)” is complete and the next steps are the “Dreaming” part which involves using the same visual and audio vocabulary where a sequence is generated through feedback within a predictive model trained on the original sequence. Following is an image and an excerpt of the final sequence.

As previously discussed, I found there was too much variation in the process that is not manifest in the final still image. I’ve thus made a video loop that shows the gradual deconstruction of the photographic image.

Following is a selection of frames from the second third of the film. I’ve also finished rendering the final third, but I have not taken a close look at the results yet. Once I revisit the sound, I will have completed the “Watching” part of the project.

I have been slowly making progress on the project, but had my attention split due to the public art commission for the City of Vancouver. I managed to run k-means on the 30million segments of the entire film and generated 200,000 percepts (which took 10 days due to the slow USB2 external HDD I’m working from). The following images show a selection of frames from the first third of the film. I’m currently working on the second third. They show a quite nice balance between stability and noise with my fixes for bugs in the code written at the Banff Centre (which contained inappropriate scaling factors size features).

As our gaze peers off into the distance, imagination takes over reality…

The relation between the world as we see it and the world as we understand it to be is all wrapped up in abstraction, the act of emphasizing some details in order to dispense with others. My artistic interest in abstraction and computation is an interest in the world as it is versus the world as we read into and conceptualize it. As our gaze peers off into the distance, imagination takes over reality… is site-specific public artwork that uses a photographic image of the surrounding site as raw material. The bottom of the image shows this imperfect photographic illusion, a proxy of objective reality, that becomes increasingly abstracted towards the top. The abstraction is a proxy for the constructive powers of imagination and is the result of a machine learning process that rearranges colour values according to their similarity. The imaginary structure presented in this image is both an emergent composition and an exposition of the implicit structure of reality. The image is a representation of place (as a physical location at the site of installation) and non-place (as an ambiguous and emerging structure under constant renewal). The viewer’s experience of the tension between ambiguity and reality mirrors the mind’s constant attempt to bridge experience and truth.