
An anonymous reader writes "I'm interested in converting 2D video to Stereoscopic 3D video — the Red/Cyan Anaglyph type in particular (to ensure compatibility with cardboard Anaglyph glasses). Here are my questions: Which software(s) or algorithms can currently do this, and do it well? Also, are there any 3D TVs on the market that have a high quality 2D-to-3D realtime conversion function in them? And finally, if I were to try and roll my own 2D-to-3D conversion algorithm, where should I start? Which books, websites, blogs or papers should I look at?" I'd never even thought about this as a possibility; now I see there are some tutorials available; if you've done it, though, what sort of results did you get? And any tips for those using Linux?

Exactly. People with the proper equipment and money fail at this regularly.

I think it is important to arrive at the correct mindset. This has never stopped people from snapping pix at weddings and sporting events and tourist traps, even if their pix look like garbage compared to a pro photo on a postcard or whatever.

If you want to do it for fun, heck yes, go for it. Go go go. You don't need help; just try it.

If you think you'll turn out something that means anything to anyone else in the world, you'll probably be disappointed. Insert the stereotypical groans when someone wants to show you old-fashioned slides of their vacation. Although that old tech is getting kind of retro-cool now.

Hollywood has released big-budget movies costing tens of millions of dollars to produce, using 2D-to-3D conversion, and the results have been terrible. Hollywood may suck at coming up with original storylines and good plots, but their skills at technical effects are unequaled anywhere on earth; if they can't do it, no one else can either. The whole thing is a bad idea. If you really want 3D imagery, you need a 3D camera.

I may be nitpicking, but ugh it's important... Cameras are only 2D. For good stereoscopic output, you need a stereoscopic camera. The recent "3D movies" are really called stereoscopic in the industry, "3D" is strictly a marketing term. 2D to 3D conversion, which adds a Z-buffer (the third dimension), still outputs a stereoscopic image in the end (which would be 3D stereoscopic, being generated from a 3D model).

I'd second this. I hope that 3D is just a gimmick that falls out of style rather quickly.

Most certainly not. Converting 2D footage to 3D is a horrendous endeavor and should be stopped - or at least left alone. But well-done stereoscopic footage has an added value IMO.

Now, they've improved the resolution and the stereoscopic aspect. What I'd like to see improved next is the framerate. I find 24fps vastly insufficient to convey the feeling that "you're there", and whenever there's a big tracking shot, I find it choppy at best.

You're in luck... Peter Jackson is pushing 48fps over 24 in the cinemas, stating that enough digital projectors are capable of it. He's shooting The Hobbit at 48fps, and shooting it in 3D from the get-go. I'm more interested in the content of the movie, but I'm expecting it'll be one of the best, if not THE best, attempts at 3D so far (Jackson Explains "Hobbit" 48FPS Shooting [goo.gl]).

He's trying to encourage future film productions to step up to 48, too.

Did you know they storyboarded The Hobbit in 3D? They had two storyboard artists sitting side by side, one drawing in cyan, the other in red, and they both had glasses available, so the resulting storyboards were in 3D. Really kind of an amazing co-op effort on the artists' part. (Saw a thing about it on the production vlogs; had to break out my cyan/red sunglasses for that episode.)

Someone is going to have to explain to me how theaters are going to project this, because the DCI stereoscopic standard (pretty much THE stereoscopic projection standard if you're not IMAX), used by RealD and others, stores 24 FPS stereoscopic movies as 48 FPS and alternates left/right eyes. 48 FPS stereoscopic would mean handling 96 frames per second, which is far above the capacity of most already-installed digital cinema projectors.

Now, there are plenty of automatic algorithms that already improve this in popular video projectors and TV sets, but I haven't experienced it firsthand, so I can't vouch for it.

Be glad you haven't seen that yet. Those "upscale" (for lack of a better word) algorithms make the picture look absolutely horrible, much like a cheaply produced VHS porno. Now, there's nothing wrong with pr0n, but I'm not especially a fan of the cheap 80s porn esthetic. I'd rather scratch my eyes out than watch a complete

Well, there is a way to do it, a very elegant way even. One that can be, for all intents and purposes, as good as you can get with the raw material; even to the point where the average human will not be able to tell the difference.

The thing is: That solution has a big catch. How big? Well, to put it mildly, you will most likely win the Turing Award in the process of doing so and will at some point end up with a Nobel Prize in your hand, too. As you can imagine, the solution is: Artificial Intelligence; and if you want to really do it, only strong artificial intelligence will do.

The fact is, as others have quite succinctly pointed out, that the issue is in determining what is "in front" and what is "in the background" on top of how far away everything is. This is, quite simply, impossible to do right if you approach it as a purely algorithmic picture-to-picture problem. There is just not enough information inside the frames/movie to do it well enough even at the best of times.

So, what do you do? Easy, you import external information. Things like: "This is a tree; That is a human. A tree is bigger than a human. Both take up the same space in the picture. Assumption: The human is closer than the tree. Proof: The tree casts a shadow on the human and the only light source is behind the tree. Angles point to a distance of 20 meters between human and tree. And so on."

This line of reasoning imports lots of information from the outside; essential things like "What does a tree/human look like?", and "What are their relations to each other size-wise?". But if you grant that this information can be derived and used by an AI, the result can be a very precise derivation of the distances between objects.
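One concrete piece of that imported knowledge, depth from known object size under a pinhole camera model, can be sketched in a few lines. The focal length and object heights below are made-up illustrative values, not taken from any real footage:

```python
# Hedged sketch: inferring distance from apparent size under a pinhole
# camera model. All numbers here are invented for illustration.

def distance_from_size(focal_px, real_height_m, pixel_height):
    """Pinhole relation: pixel_height / focal_px = real_height_m / distance."""
    return focal_px * real_height_m / pixel_height

# Suppose a recognizer has told us "this blob is a human" (~1.8 m tall) and
# "that blob is a tree" (~10 m tall), and both span 200 px in the frame.
focal_px = 1000.0                                      # hypothetical focal length in pixels
human_dist = distance_from_size(focal_px, 1.8, 200)    # 9 m
tree_dist = distance_from_size(focal_px, 10.0, 200)    # 50 m

# Same apparent size, very different inferred depth: the human is closer.
```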

It is exactly the same line of reasoning the human brain uses for large distances (where the parallax of your eyes is too small, focus is unimportant and difference between eye positions negligible), or when you have lost vision in one eye (or just plainly covered it). Even though your brain suddenly has only half the information, it is capable of giving you a good feeling for distance and depth.

Of course, it doesn't always work, as far too many optical illusions like the Ames room show, but it works significantly better than a "pure" picture-to-picture approach, and it is the sole reason why almost everyone here feels that 2D-3D conversions are so horrible:

Their brain tells them that what they see just can't be correct, even if their eyes have actually seen it.

But of course, just using 2 cameras is much simpler. So good luck with (strong) AI. I would be surprised if you solved this issue all by yourself. :)

No, if you do it that way you just end up with flat cardboard scenes with a bit of depth. If I can't observe parallax, say a slight rotation of someone's head, when I switch eyes, it's not real 3d.

That's why I said you most likely need strong AI to get it really perfect. You need to know not only the relations between things (like my example, a tree vs. a human); you also need to know the relations of things to themselves, i.e. the properties of a human face.

Imagine, that instead of lifting whole objects from the 2D-plane, you lift individual pixels, just like a modern computer game calculates the lighting, texturing and shadowing on a per-pixel basis to give you things like Normal mapping [wikipedia.org].

To elaborate: imagine that you were asking for a magical algorithm to automatically colourise an arbitrary monochrome video. That's peanuts compared to adding accurate and, more importantly, pleasant looking stereo data to a 2D video, while creating said stereo data from thin air.

Not really as bad as you think. All it does is show frame n in one eye and frame n + 1 in the other, stretched a bit (and cropped to preserve aspect ratio) to exaggerate the depth. Things that do not move are assumed to be in the background; moving things seem to be closer. It's not as bad as you say (no resetting on key frames, for example), but yes, the effect is strange, often wrong, as well as neat.
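My reading of that frame-delay trick, sketched with NumPy (the stretch-and-crop step is omitted; only the temporal channel swap is shown):

```python
# Sketch of the frame-delay anaglyph described above: the red channel comes
# from frame n, the green/blue channels from frame n + 1, so horizontal
# motion turns into fake disparity.
import numpy as np

def delay_anaglyph(frames):
    """frames: list of HxWx3 uint8 RGB frames. Returns len(frames)-1 anaglyphs."""
    out = []
    for earlier, later in zip(frames[:-1], frames[1:]):
        ana = later.copy()
        ana[..., 0] = earlier[..., 0]   # red channel from the earlier frame
        out.append(ana)
    return out

# Two tiny 2x2 "frames": a bright pixel moving one column to the right.
f0 = np.zeros((2, 2, 3), np.uint8); f0[0, 0] = 255
f1 = np.zeros((2, 2, 3), np.uint8); f1[0, 1] = 255
ana = delay_anaglyph([f0, f1])[0]
# Red sits at the old position, green/blue at the new one.
```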

Ditto. I'm typically the sour-grapes guy of the group who always resists seeing the newest 3-D blockbusters at the theater because I can't stand it; it looks like overly dim, out-of-focus crap, and 9 times out of 10 I leave with a headache to boot.

I don't know. During Cyber Monday I was able to pick up a 3d television for the same price I was willing to pay for a 120hz 1080p television, so I did. I got a kickback on the glasses, so, while I know that can be a real expense, I didn't suffer from it.

Sure, it's a gimmick. But the saturday night movie with my kids has a whole new level of "excitement" for them, especially when they have sleep over guests. That alone is well worth it. But my Nvidia GTX on my PC works with it, and playing Left4Dead 2 or Ba

Given the amount of studies and remarks about the potentially dangerous effect of 'fake 3D' on the brain/vision development of youngsters, I think I'll keep this technology away from my kids for as long as I can. Sure, the occasional movie or fun-park attraction is OK, but having this in the living room and/or gaming computer (console) is simply asking for day-to-day use. The effects might well be largely overrated, but I prefer to play on the safe side with this. It's a fun gimmick and it does have a 3D-ish e

Currently, 3D is being used as a gimmick. However, like depth of field or lens focal length, it's an aspect of photography that can certainly be used to enhance your storytelling. There is an intimacy about seeing characters in 3D, which can make dialogue scenes more real. Also imagine a creepy character like Hannibal Lecter stepping off the screen and into the theatre with you to invade your personal space. That would make him even more creepy. If 3D sticks around, and people shoot exclusively for it (no

You're wrong. A movie is mostly in the mind (and pure audio, i.e. your CD collection, is even more so). Imagination and the brain's ability to connect dots and extrapolate do more for the entertainment than your tech tricks.

I'm an audio guy and spend a lot on design and implementation, but I realize that when a tune comes on the radio, my mind gets the same memory enjoyment from it as if it was on a high-end system. The content and my extrapolation of it (on poor playback equip) completes the ex

A friend of mine used to work for a French special effects company, and he had to work on this. He told me that this is basically a world of pain and it produces great piles of smoking shit. It just sucks, even when done properly by highly trained people. Can you imagine making 3D out of a 2D tree? Making every background 3D, or properly cutting out the characters to get the desired effect?
It sucks, it's mostly manual, get over it.

This anon has it right. If you have two synchronized 2D films of the same thing from slightly different angles, then you can try to match objects in the two frames, and use that to determine the depth. You could just apply the red filter to one film, blue to the other, superimpose, and boom, it's like you're watching the two different films with your two different eyes, and if they were filmed with cameras set properly, it will actually look right.
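The superimpose step described above is just a channel swap, assuming two synchronized RGB frames from properly spaced cameras. A minimal sketch:

```python
# Naive red/cyan anaglyph from two synchronized camera frames: red channel
# from the left eye's frame, green/blue channels from the right eye's frame.
import numpy as np

def simple_anaglyph(left, right):
    """left, right: HxWx3 uint8 RGB frames. Returns a naive red/cyan anaglyph."""
    ana = right.copy()
    ana[..., 0] = left[..., 0]   # red from the left camera
    return ana

# Flat test frames so the channel routing is easy to see.
left = np.full((2, 2, 3), 200, np.uint8)
right = np.full((2, 2, 3), 50, np.uint8)
ana = simple_anaglyph(left, right)
```

As the comment below about the Dubois algorithm points out, this naive mix ghosts badly; it is only the simplest thing that works at all.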
But it sounds like you only have one 2D film. Here, the be

In theory you can expand the image to 3D by clipping each object and vertically slicing it into layers (for a tree; horizontally for a bench), then adding the 3D effect to the layers, filling in where there are gaps and overlaying where needed, for each object in each frame of the film, then compositing all the objects and masks back onto the scene. Now that you've spent ~100 hours on that, your first frame is done; time to do the next, but now it's harder because you need to account for motion, so your clipping

It is possible. There are some algorithms that do this (semi-)automatically. Not sure how they work (perhaps using parallax from moving objects), but they do work, and I have seen the results. I came across a 3D version of one of the Star Wars movies, and I was quite impressed with the results from what is after all an automated process. The 3D in space and landscape scenes was pretty good. However closeups of talking faces revealed the weakness; the moving face confused the algorithm and the result wa

You will want to avoid the old paper red/cyan glasses and go with the slightly more expensive plastic ones that are designed for LCD monitors and TVs. Otherwise, be prepared for a LOT of ghosting. Also, Nvidia makes plastic red/cyan glasses that are designed to fit over regular glasses. You may also need to calibrate your monitor to make sure that red is really red and cyan is really cyan.

I was personally very surprised at how well red/cyan works. Of course the colors get a little muddled, but not as mu

You can't turn a two-dimensional photograph into 3D because the original has lost all the phase information that conveys needed info (e.g., "depth"). Similarly, you can't restore 2D sound to 3D, because the essential information isn't in the source recording that you'd need to "position" all the sound sources in 3D. In general, you can go from (N+1) to (N) dimensions, but you lose information. That means you can not automatically go from (N-1) dimensions to (N) without restoring that lost information...w

Let's say you have a video camera poked out of the side window of your car, and you're driving down a road alongside a wide field. The field is sparsely populated with trees, and there are mountains far off in the background.

With the use of video in such a case, the depth information can be pretty accurately inferred from the parallax effect, due to the fact that your car (and camera) are moving along the road. It's a difficult problem, but by comparing frame with frame, an algorithm might piece a somew

This is not as easy a problem as you assume. Unless metadata was recorded with things like the focal length of the lens, there is no way to determine distance information from a 2D image. Even with the metadata, it is still a painstaking process that involves manually isolating every object in the frame, tracking those objects through the shot, and assigning a depth map to them. You then need to determine your interaxial distance - a wider interaxial increases the depth of the 3D effect - and then finally

Actually, when done professionally, the object isolation is done automatically. It's a variation on motion compensation. As the object moves frame-to-frame, the object edges are revealed (e.g. an edge that becomes apparent at frame N can be applied to frames N-10 ... N+10).

As the edges and shading change, this reveals the 3D structure.

Even in a single frame, inference of object depth/texture is possible by application of [inverse] shading models.

There are a host of techniques to apply depth to a scene. Parallax from multiple camera angles is one. Vanishing point analysis is another. Prototype mapping (a human is going to be *this* shape, with *these* depths) and size of motion analysis (big motions are likely to be caused by objects closer to the camera) may also help.

However, the easiest way is to just shoot the thing in 3D in the first place.

Let's say you have a video camera poked out of the side window of your car, and you're driving down a road alongside a wide field. The field is sparsely populated with trees, and there are mountains far off in the background.
With the use of video in such a case, the depth information can be pretty accurately inferred from the parallax effect, due to the fact that your car (and camera) are moving along the road.

This is sort of how the Pulfrich effect [wikipedia.org] works anyway.
You have a camera moving sideways while pointed at a relatively static scene. Any two frames a moderate fraction of a second apart will thus effectively have been taken from slightly different parallax positions, like twin stereoscopic 3D images.

The key is when watching this film on an ordinary 2D television you view it through a pair of glasses with one lens darker than the other. Due to the way the brain processes images, the eye viewing through the

I forgot to add- if you have existing material that pans from (IIRC) right to left in the appropriate manner, you may already be able to get a 3D effect simply by viewing it through a pair of dirt-cheap Pulfrich glasses!

Clarification -- Arduino doesn't suck, just paraphrasing the unfortunate mentality of a bunch of posters on this article. It is bewildering to me that on a "news for nerds" site, people are discouraging somebody from undertaking what could turn out to be a cool tech project, even if it is known in advance that the end result isn't going to be "Avatar". And even if the best of 3D is a bomb in the theater, that doesn't mean it isn't a lot of fun to play with, as a school project, etc. I enjoyed messing with this stuff in physics lab in college.

Contra my provocative subject, Arduino is an excellent choice for serious hobbyists. And similarly, there is nothing wrong with playing around with 3D video techniques and even being willing to try rolling one's own algorithm.

Get a (homebrew friendly) life, slashdotters!

(If the OP clarifies that he's working on a big Hollywood title, I'll take this back. Until then...)

There was a recent NOVA episode about aerial photo reconnaissance during WWII. To make stereoscopic images, they'd fly the plane straight and level over the target. If they could take multiple pictures with 60% overlap, they could use two adjacent images to make one stereoscopic image that was good enough to tell a ship from a decoy.

Any motion picture where the camera pans side to side gives an opportunity to create a "3d" image. If an object moves across a still camera, you can also derive 3d information. (Also if it spins)

An interesting exercise would be to process a film, make stereoscopic only what can be done properly, and leave the rest flat. A scene would start out flat, then people and things would begin to jump out at you.

Also (while I seriously doubt this applies to the OP), the world of 3D will be an interesting place when the CV and AI academic communities start recognizing a bunch more objects and create the ability to much more accurately annotate and infer 3D within a scene. Currently 3D can be added by a difficult manual process, which is certainly too time intensive to do thoroughly and well for anything movie-length, hence the annoyingly partial jobs we've seen up to now. We haven't yet seen the theoretical

It depends on how they accomplished the pan. If it was by pivoting the camera on the stationary tripod, then it won't work. If it was by laterally moving the camera on a dolly or crane, then you've got something.

But that's how stereoscopic photos work in general, whether one is using one camera or two (or more). This wasn't a new technique.

I think the real trick was those 3-D systems they smuggled in, where they could pan around the area in virtual 3-D, which let them see the V-2 rockets as they were on their launch pads and get a feel [sic] for them.

This can be done easily with ffmpeg and imagemagick; you need two video sources. From an ffmpeg script, extract a picture sequence from both videos: one sequence from the left camera, another from the right. With a bash script using imagemagick, separate the colour channels of each frame: red from one camera, green/blue from the other. With the separation done, join (again with imagemagick) the red-channel frame from one camera and the green/blue channels from the other into a new picture sequence. When the sequence is ready, convert it back into video with ffmpeg. Try googling for the ffmpeg and imagemagick argument documentation when coding the bash script.

You don't simply want to filter the channels red vs green/blue; that creates terrible ghosting. Instead, look up the Dubois algorithm: it's a linear projection from a 6-D colorspace (left RGB plus right RGB) to a 3-D colorspace, optimized for minimal ghosting by minimizing mean squared error. Finished matrices exist for red/cyan, green/magenta and amber/blue, available from Dubois' homepage. Recently used this for a project; works great.
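A hedged sketch of that projection. The matrices below are the red/cyan values commonly attributed to Eric Dubois; double-check them against his homepage before relying on them:

```python
# Dubois-style anaglyph mixing: each output RGB pixel is a linear combination
# of the six input channels (left RGB + right RGB). Matrix values are the
# commonly reproduced red/cyan set attributed to Eric Dubois (unverified here).
import numpy as np

DUBOIS_LEFT = np.array([[ 0.456,  0.500,  0.176],
                        [-0.040, -0.038, -0.016],
                        [-0.015, -0.021, -0.005]])
DUBOIS_RIGHT = np.array([[-0.043, -0.088, -0.002],
                         [ 0.378,  0.734, -0.018],
                         [-0.072, -0.113,  1.226]])

def dubois_anaglyph(left, right):
    """left, right: HxWx3 float arrays in [0, 1]. Returns a red/cyan anaglyph."""
    ana = left @ DUBOIS_LEFT.T + right @ DUBOIS_RIGHT.T
    return np.clip(ana, 0.0, 1.0)

# Pure red in the left eye, black in the right: output is a dimmer pure red.
left = np.zeros((1, 1, 3)); left[..., 0] = 1.0
right = np.zeros((1, 1, 3))
ana = dubois_anaglyph(left, right)
```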

My Blu-ray player can simulate 3D from any 2D source (Panasonic DMP-BDT210), although I'm not exactly sure how it does it, or how good it looks (no 3D TV). You might be able to talk someone into connecting one up at your favorite big-box store if you acted interested in buying the Blu-ray player and wanted a demo of its conversion capabilities. This would at least give you a firsthand idea of how it will look, to see if YOU think it's worth it.

The convert utility in the imagemagick package does a good job of it with still images. I'd consider dumping your frames out as a series of images, running the convert utility on them, and then re-creating your movie.

I've also thought that taking that code in convert, merging it into VLC, and setting up VLC to grab from 2 cameras at once... with enough CPU and RAM, it could come very close to real-time 3D movie.

You might want to look into luminosity based research. The brightness at each pixel may contain some information of the angle of the surface with respect to the camera and a light source. At some point that looked potentially promising. But of course the technique can fail pretty easily.
Much of the work I've seen is based on trying to figure out how our brains do this all the time. Try closing one eye, see how 3D the world still looks (better than most 3D movies). You are going to have a tough challe

1. Display 2D images on a flat panel TV facing you.
2. Spin the display 45 degrees so that one edge is nearer to you than the other edge.
3. That's it. Notice how pixels on one side are closer to you while the ones on the opposite edge are farther away from you (spatially). Your display is in 3D now.

2d->3d converted media is much more likely to make people feel sick or get headaches from the video than media recorded directly in 3d. There are two reasons for this. Firstly, because you lack some information. For instance if you look at a box that is obscuring your vision of the objects behind it in the real world, each eye has different information based on its perspective. (Try looking at something with one eye, then the other, and look at what changes behind the object). 2d media will only hav

Of course, there are also the one or two gems of insight from people who know how to do _something_, or have done _something_ in the past, which refute the barrage of responses by people that don't really know how to do _something_ (which may even be ways of doing _something_ that you were about to try), and which can help you do _something_ yourself.

There is an active benchmark [middlebury.edu] of disparity estimation algorithms (full bibliography at the end of the page). Those algorithms take two pictures and estimate a depth image. From this depth image, it is possible to reconstruct the scene in 3D (but you cannot see what's beh

http://www.pnas.org/content/108/40/16849.short [pnas.org]
It turns out there is information in a single static image about the magnitude and sign of defocus (if you know something about the camera). That information is carried largely by the shape of the power spectrum, and is the reason we are able to look at a single static 2d image and say "it's out of focus." Once you know the magnitude, sign of defocus and the optics of the camera, you have depth.
Obviously there is a 2)... before 3) profit... but I think thi

Despite what some PR hustling excitables might claim, stereoscopic conversion cannot be effectively automated at this time. Do people try it? Yes. Does it generate watchable results? Sometimes by accident, yes.

The thing is, a stereoscopic conversion done painstakingly frame by frame by a highly skilled compositing artist looks pretty bad. Any automated conversion process will be orders of magnitude worse.

What you need is a ton of really excellent rotoscoping (I send my jobs out to work farms in Russia) to s

I'm waiting for the 1D to 2D algorithms to be perfected. I have this 1D sketch of the battle of Bull Run that I'd really like to get converted.
Here's the 1D version: __________________________ ________________ ___ ____________
Hopefully Slashdot doesn't get a takedown notice.
What will be really awesome is when all of these work together, so I can convert that 1D drawing into a 4D movie!

You can't just run a 2D video through an algorithm and magically get a 3D video.

You have to run the video through a compositing program (think Photoshop for video) and use that to chop and mask each scene and introduce parallax effects. Then (if your compositing program supports 3D space) you output the streams from two different virtual cameras so that you have 2 final videos that are synced and are from two different angles (one for each eye). At that point, it's trivial to encode them to whichever 3D vid

Also, if you really think you're up to the task of writing an algorithm, the place to start is reading up on all of the various SIGGRAPH research papers on image composition analysis and video processing.

The current crop of movies with 2D-to-3D conversions still took significant human and artistic effort to achieve, even though the results are mediocre. For a given frame, for every pixel in 2D, SOMETHING has to decide how far away the subject depicted must be. That is, it has to INVENT the third dimensional value. Then this value is used to calculate two new 2D frames with parallax involved.
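That parallax step can be sketched as naive depth-image-based rendering: shift each pixel horizontally by a disparity proportional to its invented depth. A real converter also has to fill the holes this exposes; here they are simply left black:

```python
# Toy depth-image-based rendering: synthesize one eye's view by shifting
# pixels horizontally according to an (invented) per-pixel depth map.
import numpy as np

def render_view(image, depth, max_disp=4, sign=+1):
    """image: HxWx3 uint8, depth: HxW floats in [0, 1] (1 = nearest).
    Returns the horizontally shifted view; disoccluded pixels stay black."""
    h, w = depth.shape
    out = np.zeros_like(image)
    disp = np.round(sign * max_disp * depth).astype(int)
    for y in range(h):
        # Paint far pixels first so nearer pixels overwrite them on conflicts.
        for x in np.argsort(depth[y], kind="stable"):
            nx = x + disp[y, x]
            if 0 <= nx < w:
                out[y, nx] = image[y, x]
    return out

img = np.zeros((1, 8, 3), np.uint8)
img[0, 2] = 255                       # one bright "foreground" pixel
depth = np.zeros((1, 8)); depth[0, 2] = 1.0
right_eye = render_view(img, depth, max_disp=2, sign=+1)
# The bright pixel lands two columns over in the synthesized view.
```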

A friend of mine (former CEO of a startup I founded) asked me to write one.

He called and kept offering more each time. I actually spent some time investigating this and decided that it was a good way to give myself a stroke.

It's hard enough implementing and getting things right when you know what to do; with 2D-to-3D there isn't even a clear algorithmic method to use, few papers, and no examples of a good automated conversion. DDD seems about the best.

I work in post-production, and while some of the stereo-handling algorithms are impressive from a technical point of view (like the stuff in Eyeon Dimension and The Foundry's Ocula), and while I think stereo 3D is here to stay for video games (at least after consoles add some improvements to head tracking), I doubt it will be more than a passing fad for movies. It's simply not compatible enough with human vision, even when done properly (head movements spoil the effect, the difference between convergence po

The problem isn't too hard if you are moving your camera sideways at an even speed, since you could just use one frame for the left view and a frame a short time later for the right view. However, if the video camera is taking some unknown path, then no two frames from the original video will in general create the correct parallax. Therefore, you would need to do a bundle adjustment on the camera movement (computationally quite painful and not always reliable for arbitrary camera motion). Then comes the

I write post-production software used to do this (and it runs on Linux!). The best results I've seen involve manually breaking each shot into dozens of layers, using rotoscoping. Each set of layers is exported as masks and imported into a compositing application where the images for the layers are projected onto the masks in 3D space. In some cases they build rough 3D models and project the layers onto the respective models. Now they can add a virtual camera and render the scene from both views. Then they b

Have you tried http://www.youtube.com/editor_3d [youtube.com]
It's quite basic, and requires dual video input.
I gave it a crack and got horrible results (mainly due to a bad camera setup on my end and a lack of patience; oh, and I didn't have the r/b glasses. Did I mention I wasn't trying very hard, either?).
With a decent dual camera setup you could probably produce them quite painlessly.

Download the Bundler + PMVS 3D reconstruction packages and feed video into them. Those packages are fairly stable and reliable, and they give you a 3D point cloud. After that, convert the 3D point cloud to a surface; there are several packages which can do it, but I can't give any advice here. I don't know any *stable* package; all of them are research software: memory leaks, random crashes, difficult parameter settings, compilation problems, etc. If you want to learn the algorithms yourself, that's at least a year's worth of math

If you look on Google under "OpenCV stereo vision" you will find links showing how the code runs. There are video examples using two web cams that run in real time at around 5 frames per second. If you record and run off line you can get reasonable playback frame rates.

This code generates a depth map for the scene, so each pixel is assigned a distance from the came
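A toy, unoptimized version of the block matching behind such depth maps (real implementations like OpenCV's StereoBM are far faster and more robust):

```python
# Toy stereo block matching on one scanline: for each pixel in the left
# image, slide a small window along the right image and pick the horizontal
# offset (disparity) with the lowest sum of absolute differences (SAD).
import numpy as np

def disparity_row(left_row, right_row, max_disp=4, win=1):
    """1-D block matching for a rectified pair; larger disparity = nearer."""
    w = len(left_row)
    disp = np.zeros(w, int)
    for x in range(w):
        lo, hi = max(0, x - win), min(w, x + win + 1)
        patch = left_row[lo:hi].astype(int)
        best_cost, best_d = None, 0
        for d in range(0, min(max_disp, lo) + 1):
            cand = right_row[lo - d:hi - d].astype(int)
            cost = np.abs(patch - cand).sum()
            if best_cost is None or cost < best_cost:
                best_cost, best_d = cost, d
        disp[x] = best_d
    return disp

# A feature at x=5 in the left row appears at x=3 in the right row,
# so its disparity should come out as 2.
left = np.zeros(10, np.uint8); left[5] = 255
right = np.zeros(10, np.uint8); right[3] = 255
d = disparity_row(left, right, max_disp=4, win=1)
```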

The company DDD has built hardware to do this; it "works", after a fashion. It is, indeed, incorporated into a number of recent 3D TVs.

Basically, there are a number of algorithms in the box, and it chooses the one that is most appropriate for a given sequence. If the system sees blue in the top of the frame, it assumes that it is sky, and puts it in the back. If the camera is trucking from one side to the other to generate parallax, it uses that to generate depth. If I recall correctly, there are some 2