In real life you're used to moving the camera to alter the view of a certain scene; in OpenGL it's the other way around. The camera in OpenGL cannot move: it is defined to be located at (0,0,0), facing the negative Z direction. That means that instead of moving and rotating the camera, the world is moved and rotated around the camera to construct the appropriate view.

A camera represents nothing but a projection view. So you transform the world to get the projection you want. The concept is weird, but it actually makes sense...in a way...I think
–
Sidar Oct 28 '12 at 10:52

+1 for a nice site I hadn't seen before - open.gl
–
Max Oct 28 '12 at 12:09

@sharethis, I have improved my answer with a better explanation, adding perspective projection with the math and a section on the 3D camera (virtual camera). It may be helpful for you and others.
–
Md. Mahbubur R. Aaman Oct 29 '12 at 5:47

This is not actually true, since both operations (moving the camera, or the objects in the world) are symmetric, there is no way of telling which operation has taken place. You are thinking about the world moving about the camera, but equally someone else can visualize the camera moving in reverse about the world... Okay, so you can move objects relative to each other, so one way may be more intuitive, but neither person is "wrong", both ways of visualizing the situation are good in different situations. Often it is helpful to think about both.
–
user3728501 Jul 22 '13 at 21:13

To give the appearance of moving the camera, your OpenGL application must move the scene with the inverse of the camera transformation. Where OpenGL is concerned, there is no camera; more specifically, the camera is always located at the eye-space coordinate (0, 0, 0).

To simulate a camera transformation, you actually have to transform
the world with the inverse of that transformation. Example: if you
want to move the camera up, you have to move the world down instead.

Understanding perspective

In the real world, we see things in a way that is called "perspective".

Perspective refers to the concept that objects that are farther away appear to be smaller than those that are closer to you. Perspective also means that if you are sitting in the middle of a straight road, you actually see the borders of the road as two converging lines.

That’s perspective. Perspective is critical in 3D projects. Without perspective, the 3D world doesn't look real.

While this may seem natural and obvious, it's important to consider that when you create a 3D rendering on a computer you are attempting to simulate a 3D world on the computer screen, which is a 2D surface.

Imagine that behind the computer screen there is a real 3D scene of sorts, and you are watching it through the "glass" of your computer screen. Using perspective, your goal is to create code that renders what gets "projected" on this "glass" of your screen as if there was this real 3D world behind the screen. The only caveat is that this 3D world is not real…it's just a mathematical simulation of a 3D world.

So, when using 3D rendering to simulate a scene in 3D and then projecting the 3D scene onto the 2D surface of your screen, the process is called perspective projection.

Begin by envisioning intuitively what you want to achieve. If an object is closer to the viewer, the object must appear to be bigger. If the object is farther away, it must appear to be smaller. Also, if an object is traveling away from the viewer, in a straight line, you want it to converge towards the center of the screen, as it moves farther off into the distance.

Translating perspective into math

As you view the illustration in the following figure, imagine that an object is positioned in your 3D scene. In the 3D world, the position of the object can be described as (xW, yW, zW), referring to a 3D coordinate system with its origin at the eye point. That's where the object is actually positioned in the 3D scene beyond the screen.

As the viewer watches this object on the screen, the 3D object is "projected" to a 2D position described as (xP, yP), which references the 2D coordinate system of the screen (the projection plane).

To put these values into a mathematical formula, I'll use a 3D coordinate system for world coordinates, where the x axis points to the right, y points up, and positive z points inside the screen. The 3D origin refers to the location of the viewer's eye. So, the glass of the screen is on a plane orthogonal (at right angles) to the z-axis, at some z that I’ll call zProj.

You can calculate the projected positions xP and yP by dividing the world positions xW and yW by zW, like this:

xP = K1 * xW / zW
yP = K2 * yW / zW

K1 and K2 are constants that are derived from geometrical factors such as the aspect ratio of your projection plane (your viewport) and the "field of view" of your eye, which takes into account the degree of wide-angle vision.

You can see how this transform simulates perspective. Points near the sides of the screen get pushed toward the center as the distance from the eye (zW) increases. At the same time, points closer to the center (0,0) are much less affected by the distance from the eye and remain close to the center.

This division by z is the famous "perspective divide."

Now, consider that an object in the 3D scene is defined as a series of vertices. So, by applying this kind of transform to all vertices of geometry, you effectively ensure that the object will shrink when it's farther away from the eye point.

Other Important Cases

In the case of a 3D camera (virtual camera), the camera moves instead of the world.

To get a better understanding of 3D cameras, imagine you are shooting a movie. You have to set up a scene that you want to shoot and you need a camera. To get the footage, you'll roam through the scene with your camera, shooting the objects in the scene from different angles and points of view.

The same filming process occurs with a 3D camera. You need a "virtual" camera, which can roam around the "virtual" scene that you have created.

Two popular shooting styles involve watching the world through a character's eyes (also known as a first person camera) or pointing the camera at a character and keeping them in view (known as a third person camera).

This is the basic premise of a 3D camera: a virtual camera that you can use to roam around a 3D scene, and render the footage from a specific point of view.

Understanding world space and view space

To code this kind of behavior, you'll render the contents of the 3D world from the camera's point of view, not just from the world coordinate system point of view, or from some other fixed point of view.

Generally speaking, a 3D scene contains a set of 3D models. The models are defined as a set of vertices and triangles, referenced to their own coordinate system. The space in which the models are defined is called the model (or local) space.

After placing the model objects into a 3D scene, you'll transform these models' vertices using a "world transform" matrix. Each object has its own world matrix that defines where the object is in the world and how it is oriented.

This new reference system is called "world space" (or global space). A simple way to manage it is by associating a world transform matrix with each object.

In order to implement the behavior of a 3D camera, you'll need to perform additional steps: you'll reference the world not to the world origin, but to the reference system of the 3D camera itself.

A good strategy involves treating the camera as an actual 3D object in the 3D world. Like any other 3D object, you use a "world transform" matrix to place the camera at the desired position and orientation in the 3D world. This camera world transform matrix transforms the camera object from its original, forward-looking orientation (along the z-axis) to its actual world position (xc, yc, zc) and world rotation.

The following figure shows the relationships between the world (x, y, z) coordinate system and the view (camera) (x', y', z') coordinate system.

"In the next section, you'll use this perspective projection formula into ActionScript that you can use in your Flash 3D projects." Since there is no mention of a Flash 3D project in the original question, this makes me think you copy-pasted this from somewhere else, which is fine, if you cite your sources.
–
Gilles Oct 30 '12 at 15:34

@Gilles, Sorry for my mistake. I have edited my answer. I have prepared the answer studying several sources. And many thanks to you as you pointed out. :)
–
Md. Mahbubur R. Aaman Oct 31 '12 at 2:16

Mahbubur R Aaman's answer is quite correct and the links he provides explain the math accurately, but in the event you want a less technical/mathy answer, I'll try a different approach.

Positions of objects in the real world and the game world are defined with some coordinate system. A coordinate system gives meaning to position values. If I tell you that I'm at "100,50" that won't help you unless you know what those numbers mean (are they miles, kilometers, latitude and longitude, etc.). If they're Cartesian coordinates (the "normal" kind of coordinates), you also need to know what origin they're relative to; if I just say "I'm 100 feet to the East," you need to know "East of what," which is called the coordinate origin.

There's an easy way to think of this. You could tell someone "the train station is 3 kilometers north and 1.5 kilometers east of the southwest corner of the city." You could also tell someone "the train station is 1 mile directly north of where I am right now." Both coordinates are correct and identify the location of the same landmark, but they are measured from a different origin, and hence have different numeric values.

In a 3D application, there is generally a "world" coordinate system, which is used to represent the position of the camera and the objects in the game, measured with Cartesian coordinates with some arbitrary designer-specified origin (generally the center of whatever level or map you're playing). Other coordinate systems exist in the game, such as the Cartesian coordinate system with the camera at the origin. You can define any new coordinate system any way you want at any time you want, and this is done very frequently in 3D simulation to make things easier for the math.

The algorithm that actually renders an individual triangle onto your screen works in a particular way, and so it is not convenient to work directly with world coordinates when rendering. The math is not really set up to deal with information like "the object is 100 units to the right of the center of the world." The math instead wants to work with "the object is directly in front of the camera, and 20 units away." Hence, an additional step is added to the rendering math to take object world positions and translate them into the camera coordinate system.

Of course the camera has a position and an orientation as well. So if an object is at position (20, 100, 50) and the camera is at position (10, 200, -30), the object's position relative to the camera is (10, -100, 80) (the object's position minus the camera's position). When the camera moves in a game, that camera position in world coordinates is moved exactly like you'd expect.

Note that the objects are not moved; they are staying right where they were before. However, their position is now being expressed relative to a different coordinate origin. The object's world coordinates only move if the object itself moves, but its camera coordinates also change whenever the camera moves, since they are relative to the camera's position.

Also note that the description from the tutorial you're quoting is a simplified explanation and not necessarily an accurate description of what OpenGL does. I don't think the author of the article failed to understand that; the author just tried to use a simplified analogy that in this case caused confusion rather than eliminating it.

If it helps further to understand why the math cares about camera coordinates, try this exercise: hold up your hands touching your thumbs and forefingers together to make a rectangle (let's call that a "viewport") and look around at the room you're in. Find an object, and look at it, then look around it but not directly at it. When you do so, ask yourself, "where is the object in my viewport?" That object has some specific real-world longitude and latitude that you can use to pinpoint its location on Earth, but that doesn't tell you anything about what you're seeing. Saying "the object is in the upper left corner of my viewport and looks to be about 2 meters away" tells you quite a bit, though. You've created a coordinate system relative to your head and the direction you're looking that defines where an object is according to your vision. That's basically what the triangle rasterizer part of OpenGL/Direct3D needs, and that's why the math requires that object positions and orientations be transformed from their convenient world coordinates into camera coordinates.

Just adding to the other two (excellent) answers some further elaboration on a point that Mahbubur R Aaman touched on: "there is no camera".

This is quite true and represents a failing of the common "camera" analogy, because the "camera" does not actually exist. It's important to realise that the camera analogy is just exactly that - an analogy. It doesn't describe (or pretend to describe) the way things actually work behind the scenes.

So view (pun intended) it as being a means of helping you get your head round this stuff if it's new to you, but always remember that it's just a helper and not any kind of description of the way things actually are.

Now, you have two classes of object that are relevant here: the view point and everything in the world. You want to move the view point closer to some objects, but the end result of the movement is the same whether the view moves closer to the objects or the objects move closer to the view. All that you're doing is changing the distance between them; since the current distance is X and you want the new distance to be Y, it doesn't matter which you move, so long as after the move the new distance is Y. So you're not really moving at all, you're just changing a distance. (I didn't mean to come over all Einstein in this... honest!)

However, since the camera does not exist, the only thing whose distance you can change is the objects. So you change the distance of the objects, and out comes the very same result. Since all objects go through transforms anyway, this is no more or less expensive.

A simpler mathematical explanation may help more. Let's pretend that all coordinates are 1D - the viewpoint is at 0, your objects are at 4 and you want the viewpoint to go to 3. That means that the distance between them will change from 4 (4 - 0) to 1 (4 - 3). But since the camera does not exist you can't change that 0; it's always going to be 0. So instead of adding 3 to 0 (which you cannot do) you subtract 3 from 4 (which you can do) - the objects are now at 1, and the end result is the very same - distance between viewpoint and objects is 1.

While the camera doesn't exist as such, you can still calculate its position before the transformation. In some cases, however (non-axis-aligned parallel projection), you'll end up with more than one of the usual coordinates "at infinity" (positive or negative), which is less useful than the transformation matrix.
–
Martin Sojka Oct 29 '12 at 6:41

I don't think that the claim is categorically true, as one only rarely "moves" the world coordinates in a game, but actually changes the coordinates of the virtual camera.

What the concept of a camera actually does is transform the finite viewing frustum (a truncated pyramid with 8 corner points, or equivalently defined by the intersection of 6 planes) into a unit cube, which represents the clip space in the final stages of the OpenGL rendering pipeline.

In that sense the world is not moved, but one only calculates world coordinates in the coordinate system of the clip space.

Moving the camera or moving the world are two equally valid choices which both amount to the same thing. At the end of the day you are changing from one coordinate system to the other. The above answers are correct but which way you visualise it are two sides of the same coin. Transformations can go either way - they are just the inverse of each other.

Part of the rendering process does convert from world coordinates to eye coordinates. However, an easy way to model this is with a virtual camera object in your application. The camera can represent both the projection matrix (which is responsible for the perspective effect) and the view matrix, which is used to convert from world space to eye space.

So although the vertex shader uses the view matrix to change the coordinates of your geometry to eye space, it is often easier to think about a camera object moving around your virtual world which as it moves re-calculates the view matrix.

So in your application, you move the camera in world coordinates, update the camera's view matrix, pass the new view matrix to the vertex shader as a uniform or part of a uniform block, and render your scene.

I would posit instead that it's a flawed analogy. At its most basic, "moving the camera" and "moving the world" are exactly the same mathematical construct - it's just that moving the world is somewhat easier to think about conceptually, especially when it comes to hierarchical transformations. Basically, you're moving the world around the camera only in that you're translating the world vertices into the camera's coordinate space - but this is a reversible affine transformation.

However, when you start to bring visibility determination into the mix, the LAST thing you want to do is to translate the entire world around the camera. Instead, in most cases (especially the classical case of fixed BSPs or the like) you're going to be using the camera's position within the world to query your visibility structures to determine which things are likely to be visible, and then only translate THOSE things into the camera's coordinate space.


A lot of good answers here. I'll try not to repeat any of them. Sometimes it's easier to think of it in terms of a camera, like how Direct3D does it (note: I haven't played with much post-9.0c).

"Moving the world" like in the Futurama sense that someone out there quoted is a very good way to look at it ("The engines don't move the ship at all. The ship stays where it is and the engines move the universe around it!"). This was actually quite common for 2D games: you literally had a viewport that you would have a hard time adjusting, and that was sometimes your video RAM or a UI window. If OpenGL does it for those kinds of reasons, eh, hard to tell.

You can certainly think of a 2D motion in terms of a camera as well, and just that kind of thinking process can make it easier to figure out effects.

Thanks! I always found that adding to the discussion on pages that are found via a search engine turns out to be much appreciated, especially if the info is handy or interesting.
–
Joe Plante May 2 '13 at 18:43

There seems to be a whole lot of misunderstanding here, starting from the writers of OpenGL docs...

Let me quickly restore your sanity: the world doesn't move, it stays put. Whoever tries to implement the world as moving around the player will quickly run into trouble in multiplayer mode. Not to mention that updating the positions of millions (or billions) of objects in the world on each player's move would make for rather slow gameplay...

So, what really happens there, and what's up with the quote?

Well, first of all you need to understand the concept of a coordinate system. Generally, you pick one point in the world and declare it to be the "origin", that is, a point with coordinates (0,0,0). You also choose three "main" directions, which you call X, Y, and Z. Obviously, there are many ways to assign a coordinate system. Usually there is one "world coordinate system"; in this system the world is stationary (more or less). In a game, this system would be chosen by the level designer.

Now, it is also convenient to consider another coordinate system, tied to the player's eye. In this coordinate system the player is always at coordinates (0,0,0), and the world moves and rotates around him. Thus the quote is correct if you understand it as being made in the player's coordinate system.

However the world doesn't operate in the player's coordinates, it operates in the world's coordinates. And where two coordinate systems are involved, there is always a way to transform one kind of coordinates into the other. In OpenGL this is done using a 4x4 view matrix.

Ultimately, when a player moves, the world remains stationary, while the player is moved. This is in the world coordinates, the way the objects are stored in your game. The player also has a view camera associated with him, and this camera similarly moves around the world (despite what the OpenGL docs seem to be saying). However, in order to show the world on the user's screen, the coordinates of all visible objects are translated into the player's coordinate system using a transformation matrix, and then additional projection is applied to create a perspective effect. In this player's coordinate system the world actually appears to move around the player. But it's just an extremely unhelpful and confusing way of thinking about it.

"starting from the writers of OpenGL docs" Right, because I'm sure the makers of OpenGL are obviously too stupid to understand the difference between the presentation of a world (which is all OpenGL cares about) and the conceptual representation of that world (which is not something OpenGL deals with).
–
Nicol Bolas May 25 '13 at 13:49

"But it's just an extremely unhelpful and confusing way of thinking about it." It's also the truth. And the truth is always more helpful than a lie, because sooner or later that lie will catch up to you and you'll have to face the truth.
–
Nicol Bolas May 25 '13 at 13:50