A.04 Spaces
In appendix 03 we showed that, to transform a vector, we can transform its starting frame so that its coordinates are expressed with respect to a new coordinate system. It's therefore interesting to look at the common spaces the pipeline uses to render a 3D scene on the screen, and at how to move from one space to another.
The object space (also called local space) is the frame in which 3D meshes are defined. Usually, 3D graphics artists create meshes in a convenient space where it's simpler to model vertices, often exploiting some symmetry with respect to the origin of the system. That's why the local space exists.

Indeed, it’s easier to model a sphere by placing all the vertices at the same distance from the origin rather than using a random point as the center of the sphere. We can also verify it mathematically.
Equation of the sphere with center at the origin: $x^2 + y^2 + z^2 = r^2$
Equation of the sphere with center at a generic point $(x_0, y_0, z_0)$: $(x-x_0)^2 + (y-y_0)^2 + (z-z_0)^2 = r^2$
Anyway, the local space is the frame where the vertices of a mesh are defined in the first place. Often these vertices are stored in a file on disk that we can load into memory in order to create the vertex buffer to send to the input assembler. Indeed, the vertex buffer contains vertices in local space that the graphics pipeline transforms to get a 2D representation of 3D objects.
When the input assembler sends its output to the next stage (the vertex shader), we have vertices in local space that we want to place in a 3D global scene shared by all meshes. The space of the global scene is called world space, and the transformation to go from local to world space is called world transformation.

As we know, to go from one frame to another we need to express the basis vectors of the starting frame with respect to the new frame. So, we can build a matrix

Then, we can define
where the first three rows of
Example:
Given a cube in local space, suppose you want to double its size, rotate it clockwise by
As you can see in the following illustration, the first three rows of

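To make the world transformation concrete, here is a minimal plain C++ sketch (no DirectXMath; all type names, helper names, and numeric values are illustrative) that composes a world matrix from a scaling, a rotation about the y-axis, and a translation, using the row-vector convention (v' = vM) adopted by DirectXMath:

```cpp
#include <cassert>
#include <cmath>

// Minimal 4x4 row-major matrix and row-vector types (v' = v * M), matching
// the DirectXMath convention. Everything here is an illustrative stand-in.
struct Mat4 { float m[4][4]; };
struct Vec4 { float v[4]; };

Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r = {};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

Vec4 transform(const Vec4& p, const Mat4& m) {
    Vec4 r = {};
    for (int j = 0; j < 4; ++j)
        for (int k = 0; k < 4; ++k)
            r.v[j] += p.v[k] * m.m[k][j];
    return r;
}

Mat4 scaling(float s) {
    return {{{s, 0, 0, 0}, {0, s, 0, 0}, {0, 0, s, 0}, {0, 0, 0, 1}}};
}

Mat4 rotationY(float a) {  // rotation about the y-axis
    float c = std::cos(a), s = std::sin(a);
    return {{{c, 0, -s, 0}, {0, 1, 0, 0}, {s, 0, c, 0}, {0, 0, 0, 1}}};
}

Mat4 translation(float x, float y, float z) {
    return {{{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {x, y, z, 1}}};
}

// World matrix: scale first, then rotate, then translate
// (left-to-right order with row vectors).
Mat4 makeWorld(float s, float angleY, float tx, float ty, float tz) {
    return mul(mul(scaling(s), rotationY(angleY)), translation(tx, ty, tz));
}
```

With row vectors, the transformations apply left to right, so the scaling acts on the local coordinates before the rotation and translation, exactly as described above.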
After the world transformation, our meshes are all in world space. Now, we want a point of view from which to look at the 3D scene. This new space is called view space, or camera space. Again, we need to transform all the vertices of our meshes with another transformation (called view transformation) to go from world space to view space. We call a matrix

However, unlike the world transformation, we will use the same view matrix to transform all the vertices of our meshes because we (usually) don’t want to change the scene. That is, we only need a different point of view, so we must apply the same transformation to all vertices. You can consider the whole scene (the collection of all meshes) as a single large mesh we want to transform from world space to view space. Now, to build the view matrix, we can start considering the camera as an ordinary mesh we can place in world space. So, we could use a world matrix
Indeed, remember that the inverse of a rotation matrix is equal to its transpose (see appendix 03). Then, the view matrix
It’s interesting to note that, since
because both
Now, we need to calculate
Please note that we are considering
To compute

Then, we can calculate
Observe that
Finally, to compute
Both
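The steps above can be sketched in plain C++ (illustrative helper names; not the actual DirectXMath implementation): derive the camera's forward, right, and up basis vectors from the position, target, and world up direction, then invert the camera's world matrix by transposing its rotation part and projecting the negated position onto the basis.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct ViewMat { float m[4][4]; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 a) {
    float len = std::sqrt(dot(a, a));
    return {a.x / len, a.y / len, a.z / len};
}

// Left-handed look-at: compute the camera basis (right i, up j, forward k)
// in world coordinates, then invert the camera's world matrix by
// transposing the rotation and projecting the negated position onto
// the basis (the translation part of the inverse).
ViewMat lookAtLH(Vec3 pos, Vec3 target, Vec3 up) {
    Vec3 k = normalize(sub(target, pos));  // forward
    Vec3 i = normalize(cross(up, k));      // right
    Vec3 j = cross(k, i);                  // true up (already unit length)
    return {{{ i.x, j.x, k.x, 0.0f },
             { i.y, j.y, k.y, 0.0f },
             { i.z, j.z, k.z, 0.0f },
             { -dot(pos, i), -dot(pos, j), -dot(pos, k), 1.0f }}};
}

// Row-vector point transform (w = 1 implied).
Vec3 transformPoint(Vec3 p, const ViewMat& v) {
    return { p.x * v.m[0][0] + p.y * v.m[1][0] + p.z * v.m[2][0] + v.m[3][0],
             p.x * v.m[0][1] + p.y * v.m[1][1] + p.z * v.m[2][1] + v.m[3][1],
             p.x * v.m[0][2] + p.y * v.m[1][2] + p.z * v.m[2][2] + v.m[3][2] };
}
```

A quick sanity check: the camera position itself must map to the origin of view space, and the target must land on the positive z-axis at a distance equal to its world-space distance from the camera.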
DirectXMath provides the helper function XMMatrixLookAtLH to build a view matrix. You can pass the camera position and target point (both in world coordinates) as arguments to this function, which returns the related view matrix.
// pos: position (in world coordinates) of the (origin of the) view space.
// target: position (in world coordinates) the camera is aimed at.
// up == j (unit basis vector which points up).
XMVECTOR pos = XMVectorSet(x, y, z, 1.0f);
XMVECTOR target = XMVectorZero();
XMVECTOR up = XMVectorSet(0.0f, 1.0f, 0.0f, 0.0f);

// Compute the view matrix.
XMMATRIX V = XMMatrixLookAtLH(pos, target, up);

XMVectorSet and XMVectorZero are also helper functions that allow us to initialize an XMVECTOR variable. As explained in appendix 01, XMVECTOR is an alias for __m128, so we should not initialize it with a simple assignment, or the usual array initialization, because that would require multiple instructions. On the other hand, XMVectorSet and XMVectorZero use a single SIMD instruction to load four values into a 16-byte aligned __m128 variable.
Once we have the whole scene in camera space, we need to project it onto a 2D plane. That is, we want a 2D representation of a 3D scene. If we place a plane in front of the camera, and take a ray from the camera to a mesh vertex, then a 2D representation of a 3D vertex is the intersection between the ray and the plane. If the projection rays are parallel to each other, and orthogonal to the plane of projection, the position of the camera is irrelevant.

However, when the projection rays all originate at the camera position, distant objects appear smaller. This mimics human vision in real life. We refer to this type of projection as perspective.
On the other hand, if the projection rays are parallel, we lose the perspective effect, and the size of the objects is independent of their distance from the camera. We refer to this type of projection as orthographic.
You can see the difference in the following illustration, where two segments of the same size are placed at different distances from the camera. In the perspective projection, the closer segment is longer when projected onto the projection plane.

Fortunately, this is almost transparent to the programmer, who is only required to define the region of the 3D scene to project (usually, we are not interested in capturing the whole scene). For orthographic projections the region is a box. For perspective projections, it is a frustum: the portion of a pyramid between two parallel planes cutting it. For frustums the camera position is at the apex of the related pyramid. We refer to the plane closer to the camera as the near plane, while the other one is the far plane. The intersection of a pyramid and a plane parallel to the base of the pyramid results in a window. So, we can intersect the pyramid with a plane somewhere between the camera and the near plane to get a projection window. However, for this purpose, we can also use the upper face of the frustum (i.e., the intersection of the near plane and the related pyramid). In computer graphics literature, the terms near plane and far plane are generally used to indicate the related windows as well.

As you can see in the following illustration, the green ball is outside the region in both perspective and orthographic projections, so it’s not projected onto the projection window. Also, in the orthographic projection the red and yellow balls are the same size, while in the perspective projection the red ball is smaller because it is further from the camera.

To define a frustum, or a box, we need to set the distances of both near and far planes from the camera. So, it is convenient to define the frustum in view space, where the camera position is at the origin. We also need to set the dimensions of the projection window. With this information we can call a helper function provided by DirectXMath to create a projection matrix we can use to transform 3D vertices from view space to another one, called NDC (Normalized Device Coordinates) space. The frustum defined in view space becomes a parallelepiped in NDC space, whose origin is at the center of the front face of the parallelepiped (transformation of the near plane). The most important thing about the NDC space is that the meshes inside the parallelepiped (the same ones that were inside the frustum) have vertices with coordinates in the following ranges: $-1 \le x \le 1$, $-1 \le y \le 1$, and $0 \le z \le 1$ (the DirectX convention).
The following illustration shows the frustum in view space (on the left), and the related parallelepiped in NDC space (on the right). The z-axis is always orthogonal to both front and back faces of the parallelepiped in NDC space, and passes through their centers. The same can also apply in view space, but it's not a strict rule (the z-axis can be non-orthogonal to both near and far planes, and can pass through a point different from their centers).

Now, you may wonder what’s the point of this transformation. The following illustration shows a 2D representation from the top that explains what happens if you transform a frustum to a parallelepiped. The meshes inside the frustum are transformed accordingly, and the projection rays become parallel to each other. That way, we can orthographically project the mesh vertices onto a projection window (for example, the front face of the parallelepiped in NDC space), and mimic the perspective vision we are used to in real life, where the sides of a long road (or a railway) seem to meet at infinity, and where near objects appear bigger than distant ones.

Actually, we don’t really need to project 3D vertices onto the projection window because we already have a 2D representation of them once we are in NDC space. Indeed, as stated earlier, the projection rays are parallel, while the z-axis is orthogonal to the front face of the NDC parallelepiped, and passes through its center (the origin of the NDC space). This means that the x- and y-coordinates of vertices in NDC space are constant along the projection rays (only the z-coordinate can change). That is, the x- and y-coordinates of a vertex in NDC space are the same both inside the NDC parallelepiped and projected onto the front face (which lies in the plane $z = 0$).

Most of the time, that’s all we need to know in order to write applications that render 3D objects on the screen. However, as graphics programmers, we are expected to know how things work under the hood. In particular, knowing how to build a projection matrix might come in useful in the future.
As stated earlier, once we go from view space to NDC space, we implicitly get a 2D representation of 3D mesh vertex positions. So, this transformation is definitely related to the concept of projection. Indeed, the associated matrix is called the projection matrix, and it can vary depending on the type of projection we are interested in. We will start with a couple of matrices associated with the perspective projection, and then we will show the matrix associated with the orthographic projection.
In the previous section we stated that a helper function is provided by DirectXMath to automatically create the projection matrix from the frustum information. Let’s see how to build this matrix manually. For this purpose, first we will try to find a way to derive NDC coordinates from view coordinates. Then, we will see if we can express the resultant equations in a matrix form. That is, we want to find the projection matrix to go from the view space to the NDC space. Consider the following illustration.

First of all, we need to define a frustum in order to retrieve its information for later use. As for the projection window, we know that we can intersect the pyramid in view space with any plane between the camera (placed at the origin of the view space) and the near plane.
Since the z-axis is orthogonal to the projection window, and passes through its center, every 3D vertex projected onto its surface has its y-coordinate already in NDC space (that is, in the range $[-1, 1]$

Let’s start with
Also, we know that
If you want to compute the horizontal FOV
As for
Observe that a vertex in view space
where
As we know, a vertex position is a point, so the w-coordinate is 1. As for
However, before deriving
Observe that if we multiply the NDC coordinates by
Well, it turns out that clip coordinates are exactly what the rasterizer expects before executing the perspective division, so we can take advantage of this trick to find a matrix form to transform view coordinates to clip coordinates.
The rasterizer receives as input primitives with vertices in clip coordinates. This means the last stage before the rasterizer has to output vertices in clip space. If no optional stage is enabled, the last stage before the rasterizer is the vertex shader. Otherwise, it is either the geometry or the domain shader.
With the perspective division automatically performed by the rasterizer, we are able to transform the coordinates of a vertex from clip to NDC space. Now, we need to find a matrix form to go from view space to clip space. First of all, we need to multiply equations
Observe that we still need to derive
Then, to get the NDC coordinates, we simply need to divide all the components of
Now, we can eventually focus on deriving a formula for
The matrix in
because the third column is the only one that can scale and translate the third coordinate of
Observe that the result is a row vector, but I wrote it as a (transposed) column vector to better highlight its components. The coordinates of
However, in this case we know that
We also know that for a vertex in view space that lies in the far plane we have
where we used
However, in this case we know that
Substituting this into equation
So, we just found the values of
Admittedly, that’s not what we wanted to find at the start of this section (the matrix to go from view to NDC space). However, since we get the perspective division for free at the end of the rasterizer stage, we can actually consider
We built the perspective projection matrix
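Assuming the standard left-handed, row-vector form used by DirectXMath (NDC z in $[0, 1]$), the symmetric perspective matrix and the subsequent perspective division can be sketched in plain C++ as follows (all names and numeric values are illustrative, not the actual DirectXMath implementation):

```cpp
#include <cassert>
#include <cmath>

struct Clip { float x, y, z, w; };
struct ProjMat { float m[4][4]; };

// Symmetric perspective projection in the DirectXMath style (row vectors,
// left-handed, NDC z in [0, 1]); fovY is the vertical FOV, aspect the
// aspect ratio, n and f the near/far plane distances.
ProjMat perspectiveFovLH(float fovY, float aspect, float n, float f) {
    float yScale = 1.0f / std::tan(fovY * 0.5f);
    float xScale = yScale / aspect;
    return {{{ xScale, 0.0f,   0.0f,             0.0f },
             { 0.0f,   yScale, 0.0f,             0.0f },
             { 0.0f,   0.0f,   f / (f - n),      1.0f },
             { 0.0f,   0.0f,   -n * f / (f - n), 0.0f }}};
}

// View space -> clip space (using only the non-zero matrix entries), then
// the perspective division, which the rasterizer performs automatically in
// the real pipeline, to reach NDC. Note that w_clip = z_view.
Clip viewToNDC(float x, float y, float z, const ProjMat& p) {
    Clip c = { x * p.m[0][0],
               y * p.m[1][1],
               z * p.m[2][2] + p.m[3][2],
               z * p.m[2][3] };
    return { c.x / c.w, c.y / c.w, c.z / c.w, 1.0f };
}
```

After the division, a point on the near plane lands at NDC depth 0, a point on the far plane at depth 1, and a point on the top-right frustum edge maps to $x = y = 1$, which is exactly the NDC parallelepiped described earlier.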

Deriving a perspective projection matrix for this general case won't be too difficult given that we’ve already examined and solved the particular case. Indeed, after the perspective projection of the mesh vertices, we only need to translate the projection window so that the z-axis passes through its center again. However, we first need to make some preliminary observations.
In the general case the frustum is no longer symmetric with respect to the z-axis, so we can’t use the vertical FOV and the aspect ratio to define its extent. We need to explicitly set the width and height of the projection window by specifying the view coordinates of its top, bottom, left, and right sides. Also, we will project 3D vertices onto the projection window that lies in the near plane (that is, we have
In the general case a vertex
Therefore, we need to translate the first two coordinates of
Observe that we used the mid-point formula to subtract the corresponding coordinate of the center (of the projection window) from
Since we are back in the particular case, we can substitute equation
Similarly, we can substitute equation
So, with equations
If we omit the perspective division in
After the perspective division by the w-component, the vertices inside the NDC parallelepiped are the ones with NDC coordinates within the following ranges
So, the vertices in clip space inside the frustum are the ones with homogeneous coordinates within the following ranges
That is, the vertices inside the frustum are the ones bounded by the following homogeneous planes (that is, 4D planes expressed in homogeneous coordinates).
Left:
Right:
Bottom:
Top:
Near:
Far:
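Under the DirectX convention, the six plane inequalities above reduce to $-w \le x \le w$, $-w \le y \le w$, and $0 \le z \le w$. As a sketch (plain C++, illustrative names), a point-in-frustum test in homogeneous clip space, of the kind the rasterizer performs before the perspective division, looks like this:

```cpp
#include <cassert>

// A vertex in homogeneous clip space is inside the frustum iff it satisfies
// all six plane inequalities (DirectX convention: near plane at z = 0,
// far plane at z = w).
struct ClipVert { float x, y, z, w; };

bool insideFrustum(ClipVert v) {
    return -v.w <= v.x && v.x <= v.w &&
           -v.w <= v.y && v.y <= v.w &&
            0.0f <= v.z && v.z <= v.w;
}
```

Checking against $\pm w$ instead of $\pm 1$ is what makes clipping cheap: the test is linear in the homogeneous coordinates, so it can run before the (comparatively expensive) division.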
The following illustration shows a 2D representation of the frustum in the homogeneous zw-plane.

If
We have

As you can see in the image above, a clipped primitive might no longer be a triangle. Therefore, the rasterizer also needs to triangulate clipped primitives, and re-insert them in the pipeline.
Whatever perspective projection matrix you decide to use (between
If you set

This can be a problem: if a distant mesh A is in front of another mesh B, but A is rendered after B, then A could be considered at the same distance as B from the camera, and discarded from the pipeline if the depth test is enabled. We will return to the depth test in a later tutorial.
To mitigate the problem, we can set
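To see why this helps, we can evaluate the non-linear mapping from view-space depth to NDC depth. The sketch below (plain C++; the near/far values are illustrative) uses the z-row of a left-handed perspective matrix; with a very small near-plane distance, half of the $[0, 1]$ NDC depth range is consumed right in front of the camera, leaving little precision for the rest of the scene:

```cpp
#include <cassert>
#include <cmath>

// NDC depth produced by a left-handed perspective projection (DirectX
// convention, NDC z in [0, 1]): z_ndc = f * (z - n) / (z * (f - n)).
// The mapping is non-linear: most of the [0, 1] range is spent on
// view-space depths close to the near plane.
float ndcDepth(float z, float n, float f) {
    return f * (z - n) / (z * (f - n));
}
```

For example, with $n = 0.1$ and $f = 1000$, a vertex at view-space depth $0.2$ already sits at NDC depth $\approx 0.5$, while everything from depth $500$ to $1000$ is squeezed into the last thousandth of the range; pushing the near plane farther out spreads the precision more evenly.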
In an orthographic projection we also want the z-axis to pass through the center of the projection window, as in the general case of a perspective projection. However, in an orthographic projection we can move the projection window wherever we want because its location doesn’t really matter. This is an interesting property that we will use to derive an equation for

Indeed, we can reuse equations
Also, with an orthographic projection we can’t substitute
This means the matrix above allows us to go straight from view space to NDC space, without passing through the homogeneous clip space. However, the rasterizer still expects vertices in clip coordinates. So, we need a way to make the rasterizer believe we are passing clip coordinates, while also avoiding the perspective division. As you can see in the fourth column of the orthographic projection matrix, the unitary value has moved to the last element. This means that if you multiply a vertex by an orthographic projection matrix you will get 1 in the last component of the resultant vector. That way, the rasterizer will divide the remaining components by 1, which nullifies the effect of the perspective division.
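As a sketch of the matrix just discussed, here is a plain C++ stand-in (illustrative names and values; not the DirectXMath implementation) for a left-handed off-center orthographic projection:

```cpp
#include <cassert>
#include <cmath>

struct OrthoMat { float m[4][4]; };

// Left-handed off-center orthographic projection (row-vector convention),
// mapping the box [l, r] x [b, t] x [n, f] in view space to the NDC ranges
// [-1, 1] x [-1, 1] x [0, 1]. Note the 1 in the last element of the fourth
// column: the output w stays 1, so the perspective division has no effect.
OrthoMat orthoOffCenterLH(float l, float r, float b, float t,
                          float n, float f) {
    return {{{ 2.0f / (r - l),   0.0f,              0.0f,           0.0f },
             { 0.0f,             2.0f / (t - b),    0.0f,           0.0f },
             { 0.0f,             0.0f,              1.0f / (f - n), 0.0f },
             { (l + r) / (l - r), (t + b) / (b - t), n / (n - f),   1.0f }}};
}

// Transform a view-space point (w = 1); out = {x_ndc, y_ndc, z_ndc, w}.
void orthoTransform(const OrthoMat& p, float x, float y, float z,
                    float out[4]) {
    out[0] = x * p.m[0][0] + p.m[3][0];
    out[1] = y * p.m[1][1] + p.m[3][1];
    out[2] = z * p.m[2][2] + p.m[3][2];
    out[3] = 1.0f;  // w stays 1: the perspective division is a no-op
}
```

Each row is just a per-axis scale plus a translation that re-centers the box on the origin; no coordinate ends up in w, which is exactly why depth stays linear in an orthographic projection.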
DirectXMath provides many helper functions to build various projection matrices, depending on the type of projection and the handedness of the frame. In this tutorial series we will only work with left-handed coordinate systems, so, for example, to build a perspective projection matrix we can use the helper function XMMatrixPerspectiveFovLH.
XMMATRIX XMMatrixPerspectiveFovLH(
float FovAngleY,
float AspectRatio,
float NearZ,
float FarZ
);

As you can see, we only need to pass the vertical FOV, the aspect ratio, and the distances of the near and far planes. This means that with this function we can build the matrix
As for the general case of a perspective projection, we can use the helper function XMMatrixPerspectiveOffCenterLH.
XMMATRIX XMMatrixPerspectiveOffCenterLH(
float ViewLeft,
float ViewRight,
float ViewBottom,
float ViewTop,
float NearZ,
float FarZ
);

As for the orthographic projection, we can use the helper function XMMatrixOrthographicOffCenterLH.
XMMATRIX XMMatrixOrthographicOffCenterLH(
float ViewLeft,
float ViewRight,
float ViewBottom,
float ViewTop,
float NearZ,
float FarZ
);

DirectXMath also provides XMMatrixPerspectiveLH. Please refer to the official API documentation for more details.
After the perspective division, all vertices are in NDC space, and if we only consider the first two NDC coordinates, we also have their 2D representations. However, we are in a normalized 2D space (the

The rasterizer automatically transforms the vertices from NDC to render target space by using the viewport information we set with ID3D12GraphicsCommandList::RSSetViewports. Observe that, in computer graphics literature, the render target space is also called screen space (along with the related screen coordinates) since it will be eventually mapped to a window’s client area on the screen.
In a previous tutorial (02 - D3D12HelloTriangle) we stated that a viewport can be thought of as a structure that defines a rectangle on the back buffer where we are going to map the 2D projection window. Now, we can be more specific in stating that a viewport is a structure with information used by the rasterizer to build a matrix which transforms vertices from NDC space to a selected rectangle on the render target. That way, we are actually mapping the 2D (normalized) projection window to a rectangle on the back buffer.

In order not to leave anything to chance, let’s see how we can manually build this matrix from viewport information. Suppose we want to draw on a selected
to the following render target ranges
Starting with the x-coordinate, we need to map
As for the y-coordinate, we need to consider the change of direction between NDC and screen space. That is,
As for the z-coordinate, we only need to scale
At this point, we only need to translate the resulting coordinates to shift the origin of the
Now, we can derive our screen coordinates
which in matrix form becomes
Although, most of the time we don’t want to rescale the NDC z-coordinate, so we have
To avoid stretching in the final image on the screen, it is advisable to set $w$ and $h$ so that we have the same aspect ratio as the projection window, which, in turn, should be equal to the aspect ratio of the render target, and of the window’s client area as well.
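The mapping just derived can be sketched as follows (plain C++, illustrative names; depth left unscaled, which is the common case):

```cpp
#include <cassert>
#include <cmath>

struct ScreenPos { float x, y, z; };

// Viewport transform applied by the rasterizer: NDC [-1, 1] x [-1, 1] maps
// to a w x h rectangle whose top-left corner is (topLeftX, topLeftY).
// The y direction is flipped (NDC y points up, render-target y points down)
// and the NDC depth is kept as-is.
ScreenPos ndcToScreen(float xn, float yn, float zn,
                      float topLeftX, float topLeftY, float w, float h) {
    return { (xn + 1.0f) * 0.5f * w + topLeftX,
             (1.0f - yn) * 0.5f * h + topLeftY,
             zn };
}
```

For a full-window viewport of an 800x600 render target, the NDC corner $(-1, 1)$ lands on the top-left pixel position $(0, 0)$ and $(1, -1)$ on the bottom-right $(800, 600)$, matching the direction flip discussed above.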
Once mesh vertices are in render target space the rasterizer can identify the texels covered by the primitives, and emit pixels at the related positions to be consumed by the pixel shader.
[1] Introduction to 3D Game Programming with DirectX 12 (Luna)
If you found the content of this tutorial somewhat useful or interesting, please consider supporting this project by clicking on the Sponsor button. Whether a small tip, a one time donation, or a recurring payment, it's all welcome! Thank you!