I'm developing this kind of weird new way to control video with a joystick.

OK, so say you've got 90 minutes of footage, which has to have "similarities" all through it (remember, similarity is just as epic a word as singularity; we just use it like it isn't special). It sort of needs to be from a single camera placement, and the action sort of has to be "blendable".

Now I want you to go frame by frame and mark "what's happening" into, say, 9 different categories, which can also be active at the same time.

Like, for instance, "steer left", "steer right", "shoot gun", etc.

Now, with some simple system, it'll sort of flick through the movie whenever you want one or more of the states, using a spatial match. (Say you select categories with a joystick?)
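As a rough sketch of the marking and flicking I mean (the names and the bitmask scheme here are just my own illustration, not any real library):

```python
# Hypothetical sketch: each frame of the footage gets a bitmask of
# categories ("steer left", "steer right", ...), up to 9 bits.
STEER_LEFT, STEER_RIGHT, SHOOT_GUN = 1 << 0, 1 << 1, 1 << 2

def frame_diff(a, b):
    """Sum of absolute pixel differences between two flat frames."""
    return sum(abs(x - y) for x, y in zip(a, b))

def next_frame(frames, labels, current, wanted):
    """Flick to the frame whose labels include every wanted control
    state and whose pixels are spatially closest to the frame that is
    on screen right now."""
    candidates = [i for i, mask in enumerate(labels)
                  if mask & wanted == wanted]
    if not candidates:
        return current  # no frame satisfies the joystick state; hold
    return min(candidates,
               key=lambda i: frame_diff(frames[i], frames[current]))
```

For example, with three tiny two-pixel frames labelled `[STEER_LEFT, STEER_RIGHT, STEER_RIGHT]`, asking for `STEER_RIGHT` from frame 0 picks frame 1, since it is the closer pixel match of the two candidates.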

Now, to get it perfect, you use a system abbreviated HTM (hierarchical temporal memory), and it makes the following things possible right out of the naive implementation.

You can have multiple frames playing at the same time in different parts of the screen.

And you have a balance between how "flicky" the changes are and how interactive it is: the more flicky, the more interactive; the less flicky, the less interactive.

The more you "pool" (destroy/shrink data between regions), the more it'll find similarities in differences.

A mud-basher game with a sprint car around a tube track would be good, for example. (Note there shouldn't be any moving objects, like no other cars in the race; there are limitations. But a crowd would be OK.)

You could just pick the most common frame on the screen and give that one the audio responsibility.
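In code that's about as simple as it sounds (a hypothetical sketch, assuming each screen segment knows which source frame it's currently showing):

```python
from collections import Counter

def audio_frame(segment_frames):
    """segment_frames: the source-frame index currently shown in each
    quadtree segment. The most common source frame across the screen
    wins audio responsibility."""
    return Counter(segment_frames).most_common(1)[0][0]
```

So if most of the screen is showing frame 3 and a couple of segments are showing something else, frame 3's audio plays.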

What advantage would such a system have over animating like a normal video game does? Other than perhaps allowing for really realistic graphics -- given it's actually a spliced-together video of real events -- isn't this really just limiting what can be included and the possible inputs? The realism might also be damaged if you weren't extremely careful about maintaining exact positioning, angles, lighting conditions, etc. for each alternative, or if your blending between alternatives wasn't absolutely perfect.

It also seems like you could potentially end up with a rather enormous file-size if you elected to provide footage for a non-trivial number of events, and given you need to record live action for every option you want to include it would probably be very time consuming to create your content.

Interesting idea, but I just don't see how it would actually be useful -- if I'm understanding correctly it seems like a non-cost-effective way of producing lower-quality content that has more limitations.

You can have multiple frames playing at the same time in different parts of the screen.

We can do that pretty easily with traditional animation.

And you have a balance between how "flicky" the changes are and how interactive it is: the more flicky, the more interactive; the less flicky, the less interactive.

We can do that with traditional animation if we want to as well, and it's fairly trivial -- you would just need to restrict inputs or clamp to specific discrete values.

The more you "pool" (destroy/shrink data between regions), the more it'll find similarities in differences.

I'm not sure if I understand this one, could you try to explain it more clearly?

I'm developing this kind of weird new way to control video with a joystick.

Is this purely conceptual at the moment, or are you actually working on an implementation? It might be easier to understand if you could show a video demonstrating the technique in action or something...

Just imagine drawing a stick man battle and being able to hook controls up to it to play it... that's what it could do.

But that's not quite it, is it? If I understand correctly you can't just draw a stick man battle from start to finish and then have it become playable, you would have to draw every possible permutation of different decision/outcome allowed in the game, unless I'm missing something?

You would therefore have an absolutely HUGE overhead in both time and effort needed to create your content, as another major disadvantage to go along with the huge file size needed to store the data. To create a completely non-interactive stick man battle lasting one minute you would have to create one minute's worth of animation -- to add just one decision in the middle of the battle you need to create one minute and thirty seconds' worth of animation -- that's an absolutely MASSIVE increase.

...but I'm still not seeing any actual advantages? What advantages do you think this technique could offer? A working prototype might help to show what you're thinking of, but even theoretically I'm just not seeing any upsides to the idea based on your present explanation.

I imagine you just keep drawing the fight, with all sorts of things happening, without really catering for anything especially, except a bit of variance in the combat, and that should be enough. I'm sure there would be some number of frames that would then be enough for the system to work.

I would only put controls on one of the characters, with maybe left, right, jump, duck and strike; I wouldn't complicate the controls more than that... the HTM system would then be able to "parallel playback" the recording and access any part of the movie at any part of the screen, all at the same time.

It works with bitmap matching: the screen is split into a quadtree, and it accesses the recorded data in each segment, finding the closest pixel match (with some error allowance) that rings the control state true. So there is a balance between keeping the playback solid and splitting off quadtree segments to obey the correct control states instead, granting interactivity while attempting to keep the playback as solid as possible.

It has an error fader; you adjust it to the happy medium where there is enough interactivity and it's still making sense most of the time.
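As a rough sketch of that balance (not the actual code, just an illustration under my own assumptions; `allowance` plays the role of the error fader, and `labels` marks which recorded frames ring the wanted control state true -- it assumes at least one frame does):

```python
def region_error(rec, cur, x, y, size):
    """Pixel error between a recorded frame and the current frame over
    one square region of the screen."""
    return sum(abs(rec[y + j][x + i] - cur[y + j][x + i])
               for j in range(size) for i in range(size))

def match_region(frames, labels, cur, x, y, size, allowance):
    """Assign each region the recorded frame that best matches it.

    Returns a list of (x, y, size, frame_index). If the best frame whose
    label rings true fits within the error allowance, the region stays
    whole (solid playback); otherwise it splits into four quadrants so
    each can obey the control state on its own (interactivity)."""
    best = min((i for i in range(len(frames)) if labels[i]),
               key=lambda i: region_error(frames[i], cur, x, y, size))
    if region_error(frames[best], cur, x, y, size) <= allowance or size == 1:
        return [(x, y, size, best)]
    half = size // 2
    out = []
    for dx in (0, half):
        for dy in (0, half):
            out += match_region(frames, labels, cur,
                                x + dx, y + dy, half, allowance)
    return out
```

Turning the `allowance` up keeps regions whole longer (solid but less responsive); turning it down splits the quadtree more aggressively (responsive but flickier).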

How do you evaluate the targeted action, then? Say you want to jump: what data do you base your search on to find the frames with the "jump" animations?

Honestly, this system sounds very impractical. If I understand it correctly, you are essentially mapping every possible action the user can take. I think jbadams has made it pretty clear why it's rather impractical. However, as he said, if you could supply a demo of it in action, it would be easier to understand what you are going for and how you are making it work.

Well... I've finished writing it; it's about 1000 lines, but it's not working yet.

That picture is of this bouncing smiley face game I'm going to try and record first.

The numbers on the right are the spatial and temporal states stored. The list is a collapsing quadtree; the final node is the two 0s at the bottom, which hasn't filled yet, because the later levels fill at a slower frequency than the first layers.

As you can see, it's 1-bit monochrome and only 64x64 pixels big... (like robotic eyes generally are; they're 1-bit and low resolution because they have to work so hard even at low resolution already). Later on I might be able to extend this to 4 levels of brightness and more resolution for working on real video.

(At the very surface level of the quadtree, I then simply convert to full 32-bit colour.)

So the game just plays through under its own code, I record it and mark control states, and then, if this idea works, I should still be able to control it on playback.

But it's doing funny noisy things so far; I've got some problems to fix. I'll be back when I get it done to tell you guys whether it works or not. (It's highly experimental.)

When it's in playback mode I get funny motions on a still output, so there's something really wrong with my code, but I'm positive the idea is fine.

I've just worked out how I can do this! (I'm going to have a terrible time explaining how this darn-fangled thing works, but trust me, it's dead simple.)

Just say I keep the high-frequency layer (of the quadtree) rolling through.

Each layer of the quadtree is 2x2 pixels and 4 frames' worth of video.

Then, at the layer after, I randomize (mutate) the spatial state to be part of any 4-frame section (temporal state); it can be any of the 4 frames.

Then I pass this to the next layer, and now I need to find spatial states that have any of these temporal states as an input into them (using the proximal link).

This way I can nullify permutations that weren't a part of the original recording, and stop it from splitting off into noise!

Because there are always fewer states the further up the quadtree you go, it pretty much makes sense that we're going to knock a lot of the noisy options out.

I keep going until I basically hit 2 possibilities, then I randomize between them, and I get infinite playback out of the linear recording.

Past this is the breaking point, and some video might not break down that nicely (you're better off with a still camera position, I'm pretty sure), but it's guaranteed to keep things interesting and a fresh play every time.

Actually, going for 2 possibilities every frame might be a little too much; maybe every 32nd frame would already produce interesting enough results, not sure.

If I was adding controls, I would keep finding possibilities, and stop feeding forward as soon as I didn't get the control option I needed. Doing that would probably garble the screen, because that would theoretically be touch-sensitive per frame (which would be a little too good to be true).

So it gathers as far as it can while keeping to the constraints.

It's a lot like a chess algorithm: you start with many possibilities and you slowly knock them off one at a time until you get the desired result.
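In stripped-down code (ignoring the quadtree and just tracking which temporal states followed which in the record; all the names here are just mine for illustration), the knocking-off might look like:

```python
import random

def build_transitions(record, n=4):
    """Collect which n-frame temporal states were actually observed
    following each state in the linear record. Any permutation that
    never occurred is simply absent, i.e. nullified."""
    windows = [tuple(record[i:i + n]) for i in range(len(record) - n + 1)]
    allowed = {}
    for a, b in zip(windows, windows[1:]):
        allowed.setdefault(a, set()).add(b)
    # Loop the record so playback never dead-ends at the tail.
    allowed.setdefault(windows[-1], set()).add(windows[0])
    return allowed

def infinite_playback(record, steps, n=4, seed=0):
    """Random walk over observed states only: infinite playback out of
    a linear record, randomizing wherever more than one possibility
    survives the pruning."""
    rng = random.Random(seed)
    allowed = build_transitions(record, n)
    state = tuple(record[:n])
    out = list(state)
    for _ in range(steps):
        state = rng.choice(sorted(allowed[state]))
        out.append(state[-1])
    return out
```

Because the walk can only ever step to a state that appeared in the original record, it can't wander off into noise; wherever the record offers 2 or more continuations, the randomizer picks one and the playback stays fresh.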