NVIDIA GeForce4 Ti 4400
and GeForce4 Ti 4600 (NV25) Review

ANISOTROPIC FILTERING

MIP mapping improves image quality for scenes with objects
that extend from the foreground deep into the background. For
each texture it creates a set of MIP levels (copies of the texture
at progressively lower detail), and the level to use is chosen
according to the on-screen size and the resulting scale of the
textured surface. The farther away a triangle is, the blurrier the
MIP level used for it. Trilinear filtering smooths the sharp
boundaries between MIP levels. Thus, while bilinear filtering removes
sharp edges between texture pixels, trilinear filtering softens the
image further, so that only nearby objects look sharp. At the same
time, walls seen at a steep angle come out too blurry. Anisotropic
filtering is meant to handle exactly these surfaces, which bilinear
and trilinear filtering cope with poorly.
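
The way a MIP level is picked and then blended by trilinear filtering
can be illustrated with a minimal sketch. It assumes the textbook
log2 rule for level selection, not the exact logic of any particular
chip; the function names mip_level and trilinear_blend are
hypothetical and used only for illustration.

```python
import math


def mip_level(texels_per_pixel, num_levels):
    """Fractional MIP level for a surface where one screen pixel covers
    `texels_per_pixel` texels of the full-size texture along one axis."""
    # Level 0 is the original texture; each further level halves the
    # resolution, so the matching level grows as log2 of the minification.
    lod = math.log2(max(texels_per_pixel, 1.0))  # magnification stays at level 0
    return min(lod, num_levels - 1.0)


def trilinear_blend(lod, num_levels):
    """Return (lower level, upper level, weight of the upper level)."""
    lower = int(math.floor(lod))
    upper = min(lower + 1, num_levels - 1)
    return lower, upper, lod - lower
```

A surface shrunk 4 times lands exactly on level 2, while a fractional
level such as 2.5 is blended half and half from levels 2 and 3, which
is what hides the visible boundary between MIP levels.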

Different chip makers implement this function differently.
Besides, the speed of anisotropic filtering on ATI and NVIDIA
hardware differs considerably as well. Only the resulting quality
is similar.

But is that true? As you know, NVIDIA's anisotropy
(on the GeForce3) gives high quality, but it is expensive as well:
the performance drop can reach 50%! ATI's anisotropy (on the
RADEON 8500) is much cheaper and apparently provides the same quality.

The quality of anisotropy can be judged from walls, floors
and the like. Our attentive readers know that the RADEON 8500
doesn't apply any anisotropy to some surfaces located at angles
different from 90 degrees. Look at the screenshots from Serious
Sam:

ATI RADEON 8500

NVIDIA GeForce4

Here are animated GIF files:

ATI RADEON 8500

NVIDIA GeForce4

At some angles the RADEON 8500 provides no sharpness.
The NVIDIA GeForce3 and GeForce4 do not have such problems. It isn't
good, though, if a user can't choose between full anisotropy with
its heavy performance losses and a cheaper approximation.

The GeForce4 has, in fact, the same anisotropy
method as the GeForce3, i.e. three levels, each with its own maximum
number of texture samples used for anisotropic filtering
(Level2 - 8, Level4 - 16, Level8 - 32 samples).

NVIDIA's and ATI's approaches to anisotropic
filtering implementation

While bilinear and trilinear filtering are strictly defined
mathematically (though some time ago NVIDIA gave the name trilinear
filtering to an approximation, namely dithering of values from
different MIP levels), the concept of anisotropic filtering doesn't
imply any definite implementation algorithm. NVIDIA and ATI approach
this issue differently. Let me show you some figures:

NVIDIA: the figure shows how bilinear samples are fetched
in texture space during anisotropic filtering. Depending on the
filtering quality setting and the inclination of the surface, a
standard bilinear (or trilinear) filter is applied from one to four
times at points lying on a straight line that divides the pixel's
projection onto the texture surface along its long side (the line
is shown with an arrow in the figure). The values obtained this way
(blue circles) are averaged to give the filtering result. Each value
is based on the four closest discrete texture values (rectangles)
and can have its own independent coordinates. This approach suits
arbitrarily oriented textures, but it demands a lot of performance:
for the visible parts of triangles that are not parallel to the
screen, the number of fetched texture samples grows several times,
and so does the shading time.
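
The scheme described above, several ordinary bilinear taps placed
along the long axis of the pixel's footprint and then averaged, can
be sketched roughly as follows. This is an illustration of the
principle only, not NVIDIA's hardware algorithm: the even spacing of
the taps, the edge clamping and the names bilinear and aniso_sample
are all assumptions made for the example.

```python
def bilinear(tex, u, v):
    """Bilinear sample of a 2D list `tex` at texel coordinates (u, v),
    clamped at the texture edges."""
    h, w = len(tex), len(tex[0])
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    x0, y0 = int(u), int(v)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = u - x0, v - y0
    # Blend the four closest texels: first along x, then along y.
    top = tex[y0][x0] * (1 - fx) + tex[y0][x1] * fx
    bottom = tex[y1][x0] * (1 - fx) + tex[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def aniso_sample(tex, u, v, axis_u, axis_v, taps):
    """Average `taps` bilinear samples spread evenly along the long axis
    (axis_u, axis_v) of the pixel footprint, centred on (u, v)."""
    total = 0.0
    for i in range(taps):
        t = (i + 0.5) / taps - 0.5  # tap positions inside (-0.5, 0.5)
        total += bilinear(tex, u + t * axis_u, v + t * axis_v)
    return total / taps
```

Every tap has its own coordinates, so the axis may point in any
direction, but each extra tap costs four more texel fetches, which
is where the performance drop comes from.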

ATI's approach is more limited but also more
efficient:

The values are fetched along a line which can lie
either horizontally or vertically in the texture's plane. For
projection vectors that are close to the coordinate axes (the arrow
in the figure) the filtering quality will be high, but as the vector
turns away from an axis the effect decreases until the method stops
making sense at all. In real applications the filtering works well
on walls and ceilings, but on surfaces located at angles other than
a right angle the result becomes less and less noticeable until the
critical angle of 45 degrees is reached. However, this approach is
beneficial from the computational point of view. First of all, we
can select regular strips of 2xN texture points (squares in the
figure), which can be fetched efficiently in N/2 cycles by standard
texture units designed for bilinear filtering. Then we filter the
values (circles in the figure), each time using the same offsets
relative to the discrete points of the original texture. This
operation can be done in one clock by a special circuit of ten
multipliers integrated into the texture unit; the interpolation
parameters are, thankfully, calculated just once and stay the same
for all 1..5 calculated points. Besides, this already efficient
algorithm can be sped up further by calculating in advance texture
variants compressed along each axis (so-called RIP mapping).
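
Two ideas from this description lend themselves to a small sketch:
restricting the sampling line to a texture axis, and precalculating
axis-compressed texture variants (RIP mapping). The code below is a
simplified model, not ATI's circuit. It assumes power-of-two texture
sizes and a plain box filter, and the names snap_to_texture_axis,
off_axis_ratio, halve_x, halve_y and build_rip_map are hypothetical.

```python
def snap_to_texture_axis(axis_u, axis_v):
    """Keep only the component of the footprint's long axis that lies
    along the nearer texture axis (horizontal or vertical)."""
    if abs(axis_u) >= abs(axis_v):
        return (axis_u, 0.0)
    return (0.0, axis_v)


def off_axis_ratio(axis_u, axis_v):
    """Dropped component divided by the kept one: 0.0 for an axis-aligned
    footprint, 1.0 at the critical angle of 45 degrees."""
    kept = max(abs(axis_u), abs(axis_v))
    dropped = min(abs(axis_u), abs(axis_v))
    return dropped / kept if kept else 0.0


def halve_x(tex):
    """Average horizontal pairs of texels (width must be even)."""
    return [[(row[2 * i] + row[2 * i + 1]) / 2 for i in range(len(row) // 2)]
            for row in tex]


def halve_y(tex):
    """Average vertical pairs of texels (height must be even)."""
    return [[(tex[2 * j][i] + tex[2 * j + 1][i]) / 2 for i in range(len(tex[0]))]
            for j in range(len(tex) // 2)]


def build_rip_map(tex):
    """Return {(i, j): texture shrunk 2**i times in x and 2**j times in y}."""
    rip = {}
    column, j = tex, 0
    while True:
        level, i = column, 0
        while True:
            rip[(i, j)] = level
            if len(level[0]) < 2:
                break
            level, i = halve_x(level), i + 1
        if len(column) < 2:
            break
        column, j = halve_y(column), j + 1
    return rip
```

In this model the off-axis ratio shows how much of the footprint the
axis-aligned line misses as the surface rotates, and a 4x4 texture
yields nine RIP variants, from the original down to a single texel.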

NVIDIA's approach needs more time to produce the
result, but it handles surfaces at any angle equally well, not
only those positioned strictly horizontally or vertically. ATI's
method has a rational core, as most modern games use mostly
horizontal and vertical surfaces.

Quake3

Return to Castle Wolfenstein

3DMark2001, Game1 Low details

3DMark2001, Game2 Low details

3DMark2001, Game3 Low details

3DMark2001, Game4

As you can see, the performance of the GeForce4
falls by a greater margin than that of the GeForce3. Level8 kills
all the speed advantages of the GeForce4. But is it possible to get
the same quality as that of the RADEON 8500 at least at Level4?
Yes! There will be losses relative to Level8, but high quality can
be obtained by lowering the LOD BIAS value. Until recently this
parameter could be changed only in Direct3D with the help of
tweakers such as RivaTuner. The 27.* drivers allow changing it in
OpenGL as well, but only through the Registry. Let's see what we
can get by setting LOD BIAS to -1 in Serious Sam: The Second
Encounter.

Anisotropic filtering Level 8

Anisotropic filtering Level 4, LOD BIAS = 0

Anisotropic filtering Level 4, LOD BIAS = -1

Anisotropic filtering Level 2, LOD BIAS = -1

Well, the effect is achieved. However, there are
also some side effects, namely moire and texture noise, but the
RADEON 8500 shows the same when anisotropy is enabled (at its
maximum level). The performance drop here is not so great. At
Level2, decreasing the LOD BIAS doesn't help any more, though the
quality of Level4 at LOD BIAS = 0 can be achieved.
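
Why a negative LOD BIAS sharpens the picture and brings noise along
with it can be shown with a small sketch. It assumes the usual model
in which the bias is simply added to the computed MIP level; the
function names biased_mip_level and texels_per_pixel_at_level are
hypothetical, and the driver's real arithmetic may differ.

```python
import math


def biased_mip_level(texels_per_pixel, num_levels, lod_bias=0.0):
    """MIP level chosen for a given minification factor after adding
    the bias; the result is clamped to the existing levels."""
    lod = math.log2(max(texels_per_pixel, 1e-9)) + lod_bias
    return min(max(lod, 0.0), num_levels - 1.0)


def texels_per_pixel_at_level(texels_per_pixel, level):
    """How many texels of the chosen level fall into one screen pixel."""
    return texels_per_pixel / 2.0 ** level
```

For a surface shrunk 8 times, a bias of 0 selects level 3 with one
texel per pixel, while a bias of -1 selects the sharper level 2 with
two texels per pixel. The extra detail is what makes the image look
crisper, and the undersampling of that detail is where moire and
texture noise come from.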

ANTI-ALIASING (AA)

This function is used to remove the stair-step effect.
With AA enabled the performance drop is even more considerable.

The Quincunx mode is fast, but it often makes textures
look soapy. On the GeForce4 we can use the next AA level (4x), which
gives excellent quality.

Let's look at the quality of the two most interesting
AA types on the GeForce3 Ti 500 and the GeForce4 and compare them.

GeForce3 Ti 500

GeForce4

3DMark2001, Game 1

No AA

AA Quincunx

AA 4x

AA 4x

AA 4xS

3DMark2001, Game 2

No AA

AA Quincunx

AA 4x

AA 4x

AA 4xS

3DMark2001, Game 3

No AA

AA Quincunx

AA 4x

AA 4x

AA 4xS

There is not much difference between the GF3 and
GF4 at the AA 4x and AA Quincunx levels. AA 4xS doesn't improve
visual quality either.

New hybrid AA mode: 4xS.

The new hybrid mode of full-screen anti-aliasing (MS and SS
simultaneously) is available on NV25 based cards. In every original
2x2 AA unit, two 2x1 subunits positioned one above the other, each
obtained in the way typical of 2x MSAA, are averaged (a usual 4x
MSAA unit is shown on the right for comparison):

S1 is the first 2x1 subunit, and S2 is the
second one. Inside a subunit the samples are calculated by the
multisampling method, i.e. from one selected texture value, but the
texture values of the upper and lower subunits can differ, unlike
in a usual 4x MSAA. From the accelerator's standpoint we just render
a vertically doubled image in the standard 2x MSAA mode (2x1
units). This mode can also be enabled on the NV20, but only through
undocumented driver parameters in the Registry. NV25 based cards
allow setting it in the driver's control panel. It should be noted
that the NV25 performs excellently: although the number of
interpolated texture values is now twice as large, the performance
differs from the 4x mode by just a couple of percent, and visual
quality is much better. This method can't considerably improve the
situation on polygon edges, where SSAA and MSAA look similar, but
textures should now be less blurry. Moreover, for horizontal surfaces
(landscapes, floors, ceilings) this method works as a kind of
anisotropic filtering (2x quality). Later we will take a close look
at how the 4xS works on real images and at its performance results.
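
The difference between a usual 4x MSAA unit and the 4xS unit can be
sketched as a per-pixel resolve. This is a simplified model, not the
NV25 resolve hardware. It assumes the background contributes a value
of 0, and the names resolve_msaa and resolve_4xs are hypothetical.

```python
def resolve_msaa(texture_value, coverage):
    """Multisampling: every covered sample reuses one texture value.
    `coverage` is a list of 0/1 flags, one per sample position;
    uncovered samples are assumed to show a background of 0."""
    return texture_value * sum(coverage) / len(coverage)


def resolve_4xs(tex_upper, tex_lower, cov_upper, cov_lower):
    """4xS: two 2x1 subunits, each resolved like 2x MSAA with its own
    texture value, are averaged into the final pixel."""
    s1 = resolve_msaa(tex_upper, cov_upper)  # upper subunit, 2 samples
    s2 = resolve_msaa(tex_lower, cov_lower)  # lower subunit, 2 samples
    return (s1 + s2) / 2
```

When both subunits see the same texture value the result equals that
of 4x MSAA with the same coverage, which matches the remark that
edges gain little. When the values differ vertically they are
averaged inside the pixel, which is the supersampling part of the
mode.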

And now let's estimate the performance drop.

Quake3

Return to Castle Wolfenstein

3DMark2001, Game1 Low details

3DMark2001, Game2 Low details

3DMark2001, Game3 Low details

It is interesting that in AA 2x and AA Quincunx
the performance is almost the same (thanks to the high memory
bandwidth of the GeForce4 Ti 4600 and its optimization for the
multisampling mode). The other modes have also become more
attractive thanks to the higher performance of the GeForce4.

The 4xS mode (it works only in Direct3D) turned out
to be quite strange. Its speed and quality are approximately at
the 4x level. This mode will probably be improved.

The combined use of anisotropic filtering and AA
will be examined in our next reviews of production video cards
based on the GeForce4 Ti.

Conclusion

The bugs of the NV20 and its disadvantages in comparison with the
R200 have been fixed in the NV25.

Although the technology is the same and the number of transistors
is just a bit higher, the chip running at the same frequency as
the previous model is much more efficient, especially in intensive
tasks, and has a much higher frequency limit.

AA has become cheaper, especially in the Quincunx mode.

Dual monitor support is excellent at both the software and
hardware levels.

The chip doesn't belong to a new generation; it is just a
debugged and improved version of the previous one.

Similar technology and complexity promise that cards on this
chip won't be priced higher than those on the Ti500. In this
respect the chip can be considered a success: much higher
performance and wider capabilities for the same money.

The use of BGA memory is a successful and justified move.

Some features (anisotropy, EMBM) are a little worse than
in the NV20, but given the much higher clock speed this can be
forgiven.

I know that many fans and owners of NVIDIA cards expected more
of this chip, as almost a year has passed since the GeForce3 was
released. However, we consider the strategy of gradual growth and
optimization more justified as activity on the IT market slows.
Most of the new DirectX 8 capabilities that arrived together with
the GeForce3 are only starting to make their way into real
applications.

The Ti 4400 will probably be positioned as a direct
competitor to the RADEON 8500, including on price. The Ti 4600
will take a higher position, and the lack of competitors will let
its price go up without limit.

NVIDIA's new chip (NV25) is able to sit firmly
in the upper-level gaming market and will probably be the main
carrier of advanced DX8 technologies.