So, I have been writing a science paper (more of a book than anything else) on the subject of computer vision, and approximating 3D scenes through still images. I have almost all the algorithms down and working, except 1 piece.

I need to know how I could possibly calculate the normals of a mesh through a screen space approximation. Others seem to have done this before, but I can't find a decent explanation of how. Can someone link me to a few papers, or perhaps even just explain it to me?

As soon as I finish this, I'll be able to finish my paper (book).

PS:

Not entirely sure if this should go under AI, since it's more graphics than computer vision.

If you have two images from different viewpoints, then you can reconstruct depth, and from depth you can differentiate to get normal information.
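Going from a depth map to normals comes down to taking finite differences of the depth. Here is a minimal sketch with NumPy, assuming the depth map is a height field z(x, y) seen in an orthographic view with +Z pointing toward the camera. The function name `normals_from_depth` and the `pixel_size` parameter are made up for illustration.

```python
import numpy as np

def normals_from_depth(depth, pixel_size=1.0):
    """Estimate unit normals of a height field z(x, y) by finite differences.

    Assumption: orthographic view, +Z toward the viewer, so the
    (unnormalized) normal is (-dz/dx, -dz/dy, 1).
    """
    depth = np.asarray(depth, dtype=float)
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    dz_dy, dz_dx = np.gradient(depth, pixel_size)
    normals = np.dstack((-dz_dx, -dz_dy, np.ones_like(depth)))
    # Normalize every per-pixel vector to unit length.
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    return normals

# Usage: a plane that rises 2 units per pixel along x.
ys, xs = np.mgrid[0:4, 0:5]
plane_normals = normals_from_depth(2.0 * xs)
```

Real depth maps are noisy, so you would normally smooth the depth before differentiating.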

If you have one image and you can determine the light direction (via shadows), then you can determine normals via (n . l) * p = I (n is the surface normal, l is the light direction, p is the albedo and I is the intensity of the image).

If you have multiple images from one (the same) viewpoint and multiple known light directions, then you can reconstruct the normals using the previous equation and a least squares regression. Additionally, you can use expectation maximization or other non-linear solvers. The term to search for is photometric stereo.
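To make the least squares step concrete, here is a rough sketch of basic Lambertian photometric stereo with NumPy. It assumes known unit light directions, no shadows and no specular highlights. The function name `photometric_stereo` and the array layouts are illustrative choices, not taken from any particular paper.

```python
import numpy as np

def photometric_stereo(intensities, light_dirs):
    """Recover per-pixel normals and albedo from k images under known lights.

    intensities: array of shape (k, h, w), one grayscale image per light.
    light_dirs:  array of shape (k, 3), one unit light direction per image.
    Assumption: Lambertian surface, I = albedo * (n . l), no shadowing.
    """
    intensities = np.asarray(intensities, dtype=float)
    light_dirs = np.asarray(light_dirs, dtype=float)
    k, h, w = intensities.shape
    stacked = intensities.reshape(k, h * w)
    # Solve light_dirs @ g = stacked in the least squares sense,
    # where g = albedo * n for every pixel (g has shape (3, h*w)).
    g, *_ = np.linalg.lstsq(light_dirs, stacked, rcond=None)
    albedo = np.linalg.norm(g, axis=0)
    normals = g / np.maximum(albedo, 1e-12)  # avoid division by zero
    return normals.T.reshape(h, w, 3), albedo.reshape(h, w)
```

With exactly three non-coplanar lights the system is solved exactly; extra images just make the fit more robust to noise.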

Lastly, if you can have user input and one image and have the user pick highlights then you can try to determine the light direction and then reconstruct normals.

If you have one image and you can determine the light direction (via shadows), then you can determine normals via (n . l) * p = I (n is the surface normal, l is the light direction, p is the albedo and I is the intensity of the image).

I do like how that sounds, since one of the algorithms that I developed finds shadows within the image and parents them to a light source. Then, from that, I can find an estimated light direction.

Do you mind elaborating on the technique you are explaining? Perhaps provide some links?

I apologize, this is the worst case you can find yourself in. In general, the solution is underdetermined because the surface gradient has two components and you have only one equation per pixel. If you can find two highlights in your image (from different light sources) then it's very easy to solve. General photometric stereo techniques require at minimum two equations. Intuitively, with a single equation the normal can be rotated freely about the light direction and still give the same lighting intensity. For example, imagine a ball lit from the +Z direction: given an intensity of .707 (with unit albedo), any normal tilted 45 degrees from the +Z axis would satisfy the equation.
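A quick numerical check of that ambiguity, as a sketch: assuming unit albedo and a light pointing along +Z, every normal on the cone tilted 45 degrees away from +Z produces the same Lambertian intensity, cos 45° ≈ 0.707. The helper name `normal_on_cone` is made up for this example.

```python
import numpy as np

light = np.array([0.0, 0.0, 1.0])  # assumed: light direction along +Z
albedo = 1.0                       # assumed: unit albedo
tilt = np.radians(45.0)            # angle between the normal and +Z

def normal_on_cone(azimuth):
    """Unit normal tilted by `tilt` from +Z, rotated by `azimuth` about +Z."""
    return np.array([
        np.sin(tilt) * np.cos(azimuth),
        np.sin(tilt) * np.sin(azimuth),
        np.cos(tilt),
    ])

# Eight different normals spread around the cone.
azimuths = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
intensities = np.array([albedo * (normal_on_cone(a) @ light) for a in azimuths])
print(np.round(intensities, 3))  # every entry is 0.707
```

One intensity value therefore pins down only the angle to the light, not which way the surface is leaning.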

However, that doesn't stop an algorithm from working. Given enough ingenuity and some user input you can still solve it. There are published algorithms that accomplish what you are looking for, but it's a guided process and generates depth. From depth, it's easy to get back to normals. If you are still looking to go this way then let me know and I'll dig up the paper that does this when I get home from work.

Do you have any other information? If you are working with computer vision then typically you have either 3d information or at least depth?

-= Dave

I have no form of depth information or 3D scene information.

My algorithm detects multi-level gradients by calculating an estimated rate of decay of each visible shadow within the room. That said, using that, I can detect where shadows overlap, or where more than one shadow is visible.

The final objective is a bit on the sci-fi end, but seems more and more practical every day that I work on this. I want to make a 3D scanner that can work on any existing mobile device, without any form of optical modifications or user input. Out of all the issues that I have, the two largest are normal approximation without any form of depth or 3D data, and threshold approximation, so the AI can classify whether the image contains a pattern of interest.

PS: For the time being, let's pretend performance doesn't matter.

The reason why I am trying to approximate normals is that ambient occlusion requires them. My idea is that, since AO gives depth perception to video games and special effects, why can't it give computer vision applications depth perception? I think it may come down to a matter of just solving for X.

If you have a mobile device then it can record video. Video can without a doubt be used to reconstruct 3D surfaces. Let me know if you're interested in this or if you want to stick with the static one-picture approach.

-= Dave

I think I wanna stay with the static approach, since I wouldn't have much of a science paper if I didn't (Mainly because, I wanna do something new, and extremely challenging)

Um, I can see where the video suggestion comes from. One of the methods mentioned requires multiple images, and a user who's holding a phone is not going to have a steady aim (unlike e.g. a tripod), especially not when pressing the button. So instead of taking a photo, you could take a few consecutive frames of video and use them for the algorithm. The user would still probably think it's just like taking a pic since the amount of time is very short =P

Alternatively you could take e.g. a pic when the user presses the button and a pic when the user releases it. Both pics would be from different viewpoints and could achieve the same result.

Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.