Abstract
In this paper we discuss the mechanism by which process variations determine the overlay accuracy of optical metrology. We start by focusing on scatterometry, and showing that the underlying physics of this mechanism involves interference effects between cavity modes that travel between the upper and lower gratings in the scatterometry target. A direct result is the behavior of accuracy as a function of wavelength, and the existence of relatively well defined spectral regimes in which the overlay accuracy and process robustness degrade (`resonant regimes'). These resonances are separated by wavelength regions in which the overlay accuracy is better and independent of wavelength (we term these `flat regions'). The combination of flat and resonant regions forms a spectral signature which is unique to each overlay alignment and carries certain universal features with respect to different types of process variations. We term this signature the `landscape', and discuss its universality. Next, we show how to characterize overlay performance with a finite set of metrics that are available on the fly, and that are derived from the angular behavior of the signal and the way it flags resonances. These metrics are used to guarantee the selection of accurate recipes and targets for the metrology tool, and for process control with the overlay tool. We end with comments on the similarity of imaging overlay to scatterometry overlay, and on the way that pupil overlay scatterometry and field overlay scatterometry differ from an accuracy perspective.

1. Overlay Metrology In The Accuracy Age
Overlay metrology techniques typically assume that the only cause for the asymmetric part in their signal is the overlay they are designed to measure. This is true for overlay by pupil-scatterometry [1,2] where the signal asymmetry is obtained per angle, for overlay by field-scatterometry [3] where the signal asymmetry is an angular average of the corresponding one in pupil-scatterometry, and for imaging based overlay where the signal asymmetry is the actual asymmetry in the field conjugate bright field image of the target.

Unfortunately this assumption is unrealistic and leads to inaccuracy [4-21]. There are different types of metrology target asymmetries that contaminate the signal and mix with the overlay-induced asymmetry. Examples of these appear in Figure 1 and include periodic bar asymmetries like side-wall angle asymmetry, top and bottom tilts, fin-depth asymmetries, and pad-to-pad variations, as well as aperiodic asymmetries, including contamination of the metrology signal by an asymmetric target environment. When such asymmetries are present, they can induce a sizable, nanometer-scale inaccuracy in the value of the reported overlay. This inaccuracy depends on the process and on the way it changes with time and wafer position. As a result it cannot be calibrated away.

Fig. 1. Examples of different types of target asymmetries.

The overlay accuracy challenge can become especially significant considering the overlay error budget in advanced-node semiconductor manufacturing, which itself can be as tight as a few nanometers. This is a result of the continuous shrinkage of device size and puts stringent bounds on the repeatability, tool-induced shifts, tool-matching, and accuracy budgets of the overlay metrology. Traditionally, the former three (precision, TIS, and matching) were taken to comprise the `total measurement uncertainty', or TMU, while the accuracy portion of the budget was examined less frequently. Conceptually, this means that the statistical stability of the measurement was regularly evaluated and optimized, while its systematic error was not. Today, optical overlay metrologies have evolved to a stage where one can relatively easily optimize the TMU to be less than a third of a nanometer, and sometimes to the Angstrom level, with a production-worthy measurement time. Despite this, and as we shall show in this paper, overlay inaccuracy can still be much larger, making the optimization of overlay accuracy of prime importance.

The simplest way to observe overlay inaccuracies in fab data is to consider the overlay maps of the same alignment, probed with different combinations of target design and tool setup, all of which have good TMU. An example is seen in Figure 2 and Table 1, where we show wafer maps of the overlay in a single alignment for four measurement conditions, and where one can observe that the per-site overlay numbers of these conditions can differ by up to ~6nm at the wafer edge. This is despite the precision of all four being very similar and small. Similar examples exist with TIS3S instead of precision, and with TMU.

Figure 2. Examples of experimental evidence for inaccuracies: overlay maps taken at several measurement conditions with good precision that differ significantly in their overlay values.

Table 1. Expansion terms and precision for a set of measurement conditions of the same alignment.

Optimizing the measurement conditions (which we define to mean a combination of tool setup and metrology target design) for best TMU is a straightforward exercise, since one can quickly and directly measure the TMU. One can also improve the repeatability and calibration level of the tool to better the TMU. In the context of accuracy, however, the situation is quite different, and improving tool `imperfections' such as stage inaccuracies or optics misalignments hardly affects the inaccuracy, which is largely independent of such small tool perturbations. Also, direct measurement of the accuracy can be complex and time-consuming. It is not clear how statistically reliable and accurate the available `reference metrologies' such as CDSEM are, how their accuracy budget depends on the specific alignment and process, how different the overlay measured after the lithography step is from its counterpart after etch (the ADI-AEI bias), and how important the overlay difference between the metrology target and the device/reference target is. For data on this subject see reference [22]. Even if one had a complete budget for all these items, it does not seem realistic or desirable to assume that the optimization and in-line control of the quality of the optical overlay will be done with non-optical technologies. To understand why, recall that the optimization of measurement conditions happens not only once during process integration in R&D, but also when the process changes during R&D or pre-production, and perhaps also during process excursions in HVM.

In contrast to data, in simulations accuracy is much better defined. In cases where the process variations that contaminate the overlay signal still leave the grating bars periodic and symmetric, the definition is unambiguous. In other cases, where the bars acquire certain geometric asymmetries, there may be an ambiguity in defining the overlay, but it is typically expected to be small when the asymmetries are reasonable (for example, around ~0.5nm when the side-wall-angle asymmetry is around 2deg).

The above, together with evidence from simulations, theory, and data, which we discuss in this paper, lead us to avoid relying on reference metrology for optimization or control, and to take a different route; developing metrics that characterize the signal and that indicate in which measurement conditions the accuracy is better. As we shall see, the nature and mathematical definition of these metrics follows from simple physics observations into how the optical signal is formed from interferences, and how it responds to different type of process variations.

In the rest of the paper we proceed as follows. In Section 2 we focus on scatterometry and detail how one extracts the overlay from the optical signal, and how process variations can lead to inaccuracy. We then discuss the underlying physics that determines the scatterometry signal in the absence and presence of process variations, and end with the landscape perspective to overlay accuracy. In Section 3 we discuss basic features of landscape phenomenology and especially what features of the landscape are kept invariant under different type of process variations. In Section 4 we describe the metrics defined to locate, on the landscape, the spectral regions where overlay accuracy is optimized and how one can use these metrics for setup and target optimization, and for control. We end in Section 5 where we draw analogies between image-based-overlay from the landscape point of view, and use the same point of view to compare field-based-scatterometry and pupil-based-scatterometry.

Before we proceed we comment that when we refer to `overlay accuracy’ in this paper, we mean the accuracy of the overlay measurement on the metrology target. We do not discuss the accuracy budget connecting the metrology target to the device (including the ADI-AEI bias and other issues related to the different patterning of the target and of the device). Instead, we assume that to control the total overlay budget we first need to control it on the metrology target and that is our focus in this paper.

2. Inaccuracy Mechanisms In Scatterometry

In its simplest version, the SCOL signal comprises two pupil images of two grating-over-grating structures, or two scatterometry `cells'. The gratings represent the two lithography steps one wishes to align, and have a relative offset that is different in each cell. By setting the induced offsets of the two cells to the known values ±f0, one sets the average of the offsets to be the sought-after overlay. The pixels of these images correspond to different diffraction angles, and in particular, 1st order SCOL [1,2] defines the images' asymmetry, D, between the +1st and -1st orders. It is expected to be an odd function of the offset between the two gratings and, for example, in the linear regime, one will be able to write

D(p) = G(p) × OFFSET (1)

Here we denote the relative offset between the gratings by OFFSET and the measurement conditions by p. The proportionality coefficient G(p) is the overlay sensitivity, and one can use equation (1), applied to both SCOL cells, to extract the overlay and the sensitivity for each measurement condition p. Importantly, in SCOL one has direct and in-line access to the signal of each illumination angle used in the tool, and so in that case p is any combination of target design, wavelength, polarization, and illumination angle/pupil pixel. This means that even after we set all illumination conditions at setup-optimization time, we still have direct access at run time to the pupil images themselves and to the overlay information that they carry on a per-illumination-angle basis.
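To make the two-cell extraction concrete, here is a minimal sketch that inverts equation (1) for a cell pair with induced offsets ±f0 (the names f0, D_plus, D_minus, and G are illustrative notation; in practice the extraction is done per pupil pixel and is more elaborate):

```python
def extract_overlay(D_plus, D_minus, f0):
    """Two-cell first-order SCOL extraction in the linear regime of eq. (1).

    The two cells carry offsets OVL + f0 and OVL - f0, so their differential
    signals are D_plus = G * (OVL + f0) and D_minus = G * (OVL - f0).
    """
    G = (D_plus - D_minus) / (2.0 * f0)                  # overlay sensitivity
    ovl = f0 * (D_plus + D_minus) / (D_plus - D_minus)   # reported overlay
    return ovl, G

# Synthetic check with OVL = 2.5 nm, f0 = 15 nm, G = 0.8.
ovl, G = extract_overlay(0.8 * (2.5 + 15.0), 0.8 * (2.5 - 15.0), 15.0)
```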

In Section 1 we discussed additional sources of signal asymmetry that are not offset related (see Figure 1). These add additional terms to the right-hand side of equation (1), contaminating D and causing it to be nonzero even when OFFSET = 0. In general, these terms will be functions of p and of the OFFSET, and so the inaccuracy will, in general, be a function of wavelength, polarization, target design, overlay, and illumination angle. In addition, it is easy to see that if the amplitude of such terms causes a ~10% contamination of equation (1), the inaccuracy will be at the level of 1nm-3nm.
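The nanometer scale of this effect follows directly from the toy two-cell extraction: adding an offset-independent asymmetry term A to both differential signals biases the reported overlay by A/G, so a contamination of 10% of the overlay term (A = 0.1 × G × f0) shifts the answer by 0.1 × f0, i.e. 1.5nm for an assumed f0 = 15nm. The numbers below are illustrative:

```python
def reported_overlay(ovl, f0, G, A):
    """Overlay reported by the two-cell algorithm of equation (1) when an
    offset-independent asymmetry term A contaminates both signals."""
    D_plus = G * (ovl + f0) + A
    D_minus = G * (ovl - f0) + A
    return f0 * (D_plus + D_minus) / (D_plus - D_minus)

# True overlay is zero, yet the reported value is biased by A / G = 1.5 nm.
ovl_hat = reported_overlay(ovl=0.0, f0=15.0, G=0.8, A=0.1 * 0.8 * 15.0)
```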

In Figure 3 we show examples, from simulations, for the way the inaccuracy behaves as a function of wavelength for a certain target design and polarization. The overlay values shown in the plots are a result of a pupil algorithm that combines the different overlay values from the different illumination angles\pupil pixels into a single reported overlay value.

Figure 3. Two different examples of the SCOL inaccuracy as a function of wavelength for three different alignment steps in the FEOL (left panel) and in DRAM (right panel).

Observing Figure 3, we see that there seems to be an underlying structure to it; some spectral regions are distinguished by better and more stable accuracy values, while in other regions one gets the opposite impression of inaccurate overlay and of strong derivatives of the overlay with respect to wavelength. Also, Figure 3 explains Figure 2 and Table 1: sampling the spectrum with a discrete and sparsely chosen set of wavelengths may give the impression that each wavelength is in fact measuring something else, but the full simulated spectrum shown in Figure 3 tells a different story: that of an underlying signature whose physical origin we discuss at some length in the next sections.

As a prelude to this discussion it is useful to note that the direct access to the pupil in SCOL offers a wealth of in-line information on the signals that opens a door to algorithms that characterize the quality of the measurement and optimize its accuracy. These algorithms effectively search for and identify certain angular patterns in the pupil, which are unrelated to the overlay, and which we find to be important for accuracy.

2.2 Overlay inaccuracy seen from the pupil: ‘Pupil arcs’

We start this section by extending Figure 3 in Figure 4, where we plot the pupil images of the overlay (the nanometer pupil signature of the overlay) for several wavelengths from the curves in the middle panel of Figure 3. From Figure 4 it is clear that there is a striking difference between the pupil images of wavelengths that are in the unstable and inaccurate regions and those that are not.

Figure 4. Pupil images of the overlay angular distribution showing that it can be used to distinguish stable and accurate from unstable and inaccurate measurement conditions.

In particular, wavelengths that are close to, or in a, wavelength region of degraded accuracy and stability, are accompanied by pupil images that contain contours that split them into different regions whose inaccuracy values are very different (and typically large in amplitude and opposite in sign). In contrast, wavelength regions that are more accurate have smooth pupils and very moderate per-pixel variability. We found that this type of behavior is very common in many alignment schemes present in various processes (including both the logic and memory segments). Because in many cases we see that these pupil contours have the shape of arcs, we term them `pupil arcs’ in the rest of this paper.

It is interesting to observe how the pupil changes when we continuously change the wavelength through a region that is inaccurate and has a large derivative with respect to wavelength. This is shown in Figure 5, where we see that, as the inaccurate regime is approached, an arc appears in the pupil, sweeps through it from right to left as the wavelength increases, and exits it as the region is swept through.

Figure 5. Pupil arcs that enter, sweep through, and exit the pupil of scatterometry signals as the wavelength goes through a wavelength region whose overlay is inaccurate and strongly varying.

Such arcs can also be easily identified in A500-LCM data, as we demonstrate in Figure 6.

Figure 6. A comparison of three overlay pupil images from A500-LCM of the same physical overlay target. Left panel: an arc-free measurement condition. Middle and right panels: measurement conditions with an arc. Observe the different scales between the three pupil images.

2.3 The physics behind pupil arcs and overlay inaccuracy

From simulations and data we see that pupil arcs are very common in scatterometry. So common that, to a large extent, the proximity of measurement conditions to wavelengths whose pupil image contains arcs determines these measurement conditions' accuracy and robustness.

This makes it important to understand what the physical origin of this phenomenon is, and for that purpose, we consider the problem with a multi-scattering approach pictorially described in Figure 7. The figure shows how the diffractions off the scatterometry cell produce a set of coherent waves traveling to the pupil after they go through different scattering processes.

Figure 7. A pictorial representation of the multi-scattering model developed to understand the root cause of arcs.

The simplest processes, indexed by (#0) and (#1) in the figure, are those where light diffracts off the upper grating (#0) or the lower grating (#1). These, together with all the other scattering processes, interfere in the back focal (pupil) plane of the scatterometer and form the scatterometry signal. Additionally, if the gratings are optically thick enough, we shall have multiple scatterings inside the gratings themselves.

To proceed we consider the case of a signal formed from processes (#0) and (#1). In that case, the information on the relative offset between the upper and lower gratings is contained in the relative phase between the corresponding electric fields, and is captured by the following interference term

cos(OPD + φ^1 + φ^2) × sin(2π × OFFSET / P) (2)

Here OPD is the optical path difference between the electric field in process (#0) and in process (#1). It is an easily calculable analytic function and depends on the intermediate slab thickness, H, on its index of refraction, on the target pitch, P, and on illumination conditions like the wavelength, λ, and the illumination angle. The phases φ^1 and φ^2 denote the phases acquired by the waves' scattering off the gratings themselves. Comparing equations (1) and (2) we see that, in fact, G from equation (1) is linear in the cosine appearing in equation (2).

Perturbing the geometry of Figure 7 by adding asymmetries of the type that appear in Figure 1 can be shown to add terms to (2), which translate into signal contaminations of several forms, like a constant or a sin(OPD + φ^1 + φ^2). In Figure 8 we plot the overlay inaccuracy from these simple models, both as a function of wavelength and on the pupil. It is clear that the simple scattering model captures much of the phenomenology we discussed above, including the important connection between arcs in the pupil and the inaccuracy. This happens because at certain combinations of tool knobs and target design the argument of the cosine equals π/2 + 2πn for integer n, making the contribution of the overlay to the asymmetric part of the signal small compared to the contribution of the asymmetric signal contaminations from the asymmetries of Figure 1.
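A toy numerical version of this two-path picture shows the mechanism. Under illustrative assumptions (normal incidence, a single intermediate slab of index n and thickness H, a fixed total scattering phase phi, and a sensitivity G proportional to the cosine of equation (2)), the sensitivity collapses wherever the cosine crosses zero, and a small fixed contamination A then produces a large overlay bias:

```python
import numpy as np

# Illustrative stack: OPD ~ 4*pi*n*H/lambda for a round trip through the slab.
n, H, phi = 1.46, 600.0, 0.3                   # index, thickness [nm], phase [rad]
wavelengths = np.linspace(400.0, 800.0, 2001)  # [nm]
G = np.cos(4.0 * np.pi * n * H / wavelengths + phi)  # sensitivity, up to a prefactor

A = 0.05                                       # fixed asymmetry contamination
inaccuracy = A / G                             # overlay bias of equations (1)-(2)

# Resonances: wavelengths where the cosine argument hits pi/2 + 2*pi*n.
resonant = wavelengths[np.abs(G) < 0.05]
```

Sweeping the wavelength sweeps the cosine argument, so the bias is small and flat between zero crossings and blows up near them, which is exactly the flat-region/resonance structure of the landscape.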

Figure 8. The way the inaccuracy behaves as a function of wavelength in the model described in Figure 7.

What we have shown is that the underlying physical reason for arcs appearing in the pupil, and for the way the inaccuracy behaves in the space of measurement conditions, is the destructive interference between the different electric fields that are represented by cavity modes in the grating-over-grating cell. These modes can also develop inside a single grating, in which case the mathematical description is quite similar, as is the phenomenological one and the functional effect on overlay accuracy. Importantly, and because it is such a simple and basic physical phenomenon related to the very nature of grating-over-grating systems, we see that these destructive interferences are the rule and not the exception; indeed, we observe them in simulations of all alignment steps and in much scatterometry data.
2.4 Pupil R

Let us complement the perspective of subsections 2.2-2.3 by plotting a pupil-derived quantity that can flag the existence of arcs in the spectrum. We term this metric R. It ranges from 0 to 1; at R=1 there is an arc close to the pupil center, while at R=0 there is no arc and we are in a region with better accuracy and process robustness. In Figure 9 we show how R varies with wavelength. The result is from full electromagnetic simulations for the same alignment whose inaccuracy is shown in the left panel of Figure 3, and we see that R easily identifies the regions where the overlay accuracy is good and where it is slowly varying with wavelength. In fact, Figure 9 makes the existence of a spectral signature in the overlay-through-wavelength plots very clear; there are resonance-like regions in which arcs appear in the pupil, R peaks towards 1, and accuracy degrades. These are interleaved by regions where accuracy improves and where the overlay depends weakly on the wavelength (see the shaded light blue regions in Figure 9). We term these regions `resonant regions' and `flat regions' correspondingly, and the entirety of the spectral signature the `landscape' of overlay metrology.
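The exact definition of R is tool-specific and not spelled out here; as an illustration of how a pupil-derived arc flag can work, the following crude stand-in measures how strongly the per-pixel overlay map splits into regions of opposite sign, which is the arc signature described above:

```python
import numpy as np

def resonance_locator(pixel_ovl):
    """Crude stand-in for the resonance locator R, scaled to [0, 1]: the
    fraction of pupil pixels whose per-pixel overlay disagrees in sign with
    the pupil mean, doubled so that a half/half sign split gives R = 1.
    A smooth pupil gives R near 0; an arc splitting the pupil drives R up."""
    mean = pixel_ovl.mean()
    disagree = np.mean(np.sign(pixel_ovl) != np.sign(mean))
    return min(1.0, 2.0 * float(disagree))

rng = np.random.default_rng(0)
smooth = 2.0 + 0.1 * rng.standard_normal((32, 32))   # flat-region-like pupil
split = smooth.copy()
split[:, 16:] = -8.0                                 # arc-like sign split
```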

Figure 9. In blue: the inaccuracy. In red: the metric R multiplied by 10. Observe how R behaves as a function of wavelength and detects the inaccurate and unstable regions.

Figure 10. Complementing the inaccuracy-versus-wavelength behavior of the model described in Figure 7 with the metric R predicted by the model. In blue: the inaccuracy (same as Figure 8); in red: R multiplied by 10.

In the next section we further discuss the landscape and how it changes with process variations. That will lead to a useful and intuitive perspective into overlay accuracy.

3. Landscape Phenomenology And Universality

In this section we discuss the way the landscape changes as a function of process variations. We will show that the changes preserve certain features of the landscape, making it a spectral signature that is unique to each alignment.

3.1 Flat regions and resonances
In Section 2 we mentioned that pupil arcs and resonances are very common in all alignment schemes. It is interesting to consider several statistical properties that we find from our simulation database.

• Resonance width: we find that resonances can have a variety of widths and that, to a large extent, the natural scale that determines these widths is the optical thickness of the inter-grating stack (as one can anticipate from the discussion and equations of section 2.3). In practice, however, in logic and memory, the resonances' widths range from 7nm-10nm up to 70nm.
• Width of flat regions (or distance between resonances): here the statistics is quite similar to that of resonances. Still, it is important to note that the presence of narrow resonances in a certain alignment scheme does not mean that the sizes of the flat regions in that alignment will also be narrow.

3.2 How the landscape varies with process variations: landscape universality
In this section we divide process variations into two types, distinguished by the way they break the symmetry of the scatterometry target.

3.2.1 Asymmetric process variations
Asymmetric process variations are those process imperfections that break the symmetry in the SCOL cell and that appear in Figure 1. In the absence of these, the inaccuracy will always be zero (which does not mean that process robustness will be good – see section 3.2.2).

In the left panel of Figure 11 we show simulation results for the way the inaccuracy landscape of a FEOL alignment changes as a function of asymmetric process variations taken to be an asymmetric side wall angle of different amplitudes. In the figure we quantify the asymmetric variation by the amount of geometric asymmetry, measured in nanometers, it causes. In the right panel of the figure we show how the inaccuracy scales on a single wavelength as a function of the amplitude of the geometric asymmetry.

Figure 11. The effect of asymmetric process variations on the landscape: linear dilation in the inaccuracy only. This is also seen in the right panel of the figure where we show the inaccuracy at

As the figure shows, even up to 10nm of geometric asymmetry we are in the linear regime, where the inaccuracy simply scales. Importantly, as seen in the figure, this means that the number, location on the wavelength axis, and width of the resonances do not change as a function of asymmetric process variations. Instead, the only change is a linear increase in the amplitude of the inaccuracy around the resonance and elsewhere. This is also seen by directly plotting, in Figure 12, the pupil metric R for the same landscape shown in Figure 11. Recalling that R signals the resonance, we see that the asymmetric variation did not change the location of the resonance.

Figure 12. The effect of asymmetric process variations on the landscape of R: the effect is very sub-leading, showing that the resonance does not move in the presence of the asymmetry.

3.2.2 Symmetric process variations
Symmetric process variations are those that do not break the symmetry of the target, but instead change the signal in a different way (for example, these may change the function G from equation (1)). Such variations include film thickness changes, CD changes, changes in the optical properties of the materials, and modifications to the grating heights and pattern. In the left panel of Figure 13 we show an example of how one type of such variations (film thickness variations) changes the landscape of another alignment: they laterally shift it and dilate it along the wavelength axis.

Figure 13. The effect of symmetric process variations on the landscape: lateral shifts and dilation in the wavelengths axis.

Despite these lateral dilations, and similarly to the case of asymmetric process variations, the number of resonances and their relative distance do not change as long as the variations are reasonable (shrinking the thickness of the layers by an order of magnitude, or removing them altogether, will surely change the landscape qualitatively, but in this paper we assume we stay in the range of up to around ~10%-20% variations). In fact, focusing on the resonances at λ = 440nm, 510nm, and 546nm, we see that they shift as per the right panel of Figure 13, which can be described by the following approximate relation for thickness variations

δλ / λ ≈ δH / H (3)
with δλ representing the shift in the wavelength position of a resonance as a result of the variation, δH, of the thickness of one of the films. Looking back at equation (2), we see that it is natural to expect the above relation because the OPD in equation (2) is linear in 1/λ.
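As a quick numeric consequence of the relation δλ/λ ≈ δH/H: a resonance sitting at 510nm over a film of assumed thickness 600nm that thickens by 12nm (2%) is expected to shift by roughly 2% of 510nm, i.e. about 10nm (all numbers illustrative):

```python
lam, H, dH = 510.0, 600.0, 12.0   # resonance wavelength and film thickness [nm]
d_lam = lam * dH / H              # predicted resonance shift [nm]
```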

3.3 Process robustness
The discussion in section 3.2, together with the fact that the inaccuracy typically peaks in the vicinity of resonances (where it has an S-shaped curve; see, for example, Figure 11) and has a higher wavelength derivative there (see, for example, Figure 9), means that to achieve process robustness one needs to be as far away as possible from resonances. The reason is two-fold:

• As shown in section 3.2.1, the inaccuracy responds linearly to asymmetric process variations. Therefore, and because the coefficient of that linearity peaks close to a resonance (the `bumps’ of the S-shaped curve around a resonance), setups around a resonance are not robust with respect to asymmetric process variations.
• As shown in section 3.2.2, the landscape shifts laterally and linearly with symmetric process variations. Therefore, and because the wavelength derivative coefficient of that linearity peaks close to a resonance, setups around a resonance are not robust with respect to symmetric process variations. In fact, even in the event that the target is symmetric and all asymmetric process variations are zero, one should avoid working in regimes close to resonances. This is because the signal varies quickly as a function of wavelength in that regime, which means (see again equation 3) that the signal will strongly change as a function of the process. This means that non-accuracy related attributes like the overlay repeatability and/or sensitivity to calibration (TIS3S) will be unpredictable there.
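The selection logic implied by both points can be sketched as follows: given the resonance locator R sampled across wavelength, flag resonances where R peaks and pick the setup wavelength farthest from all of them. The threshold and the synthetic two-resonance landscape below are illustrative, not a tool algorithm:

```python
import numpy as np

def most_robust_wavelength(wavelengths, R, peak_level=0.8):
    """Return the wavelength farthest from every resonance, where a resonance
    is any sample whose resonance locator exceeds peak_level."""
    res = wavelengths[R >= peak_level]
    if res.size == 0:
        return wavelengths[len(wavelengths) // 2]
    # Distance from each candidate wavelength to the nearest resonance.
    dist = np.min(np.abs(wavelengths[:, None] - res[None, :]), axis=1)
    return wavelengths[np.argmax(dist)]

wl = np.linspace(400.0, 700.0, 301)
# Two synthetic resonances, at 450 nm and 620 nm.
R = np.exp(-(((wl - 450.0) / 8.0) ** 2)) + np.exp(-(((wl - 620.0) / 8.0) ** 2))
best = most_robust_wavelength(wl, R)   # lands midway between the resonances
```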

To demonstrate the importance of staying away from resonances, observe Figure 14, which shows how a process change of 10% in the height of a grating changes the landscape of an FEOL alignment. It is clear that choosing the measurement to be at λ=540nm before the process change will not be good after the process change, as this setup will then sit at a resonance wavelength.

Figure 14. The effect of symmetric process variations on the landscape: lateral shifts and dilations in the wavelengths axis. The dashed black vertical line denotes a possible setup which is good for the nominal film stack and bad (in a resonance signaled by having R=1) for the same stack but with one of its film layers thickness varied by 8nm.

Finally, we note that the above classification of process variations allows one to pick the pupil metrics used for setup optimization and control correctly; there should be a reasonably small number of metrics, forming a set that is complete with respect to the types of process variations one expects, and that are independent and sensitive to a wide range of process variations. We discuss these metrics in some detail in section 4, but the goal will be to have metrics whose behavior under symmetric and asymmetric process variations is different.

3.4 The inaccuracy in the flat regions
From the simulation analysis we have made, and from working with A500-LCM data, we see that flat regions are by and large accurate. A more quantitative statement can be made with reference to Figure 15, where we plot the simulated inaccuracy of flat regions as a function of their width. The example in the figure is from the FEOL logic segment. In that illustration we see that the wider the flat region, and so the farther its setups are from resonances, the more accurate it is likely to be.

Figure 15. Examples for correlations between the width of flat regions and their inaccuracy.

In simulations we see that asymmetric process variations whose geometric asymmetry is on the one-to-few nanometer scale (for example, a side-wall angle asymmetry of ~2deg whose corresponding geometric asymmetry is ~1nm) can induce inaccuracies of up to ~5nm close to resonances, and that are often sub-nanometer in flat regions. Turning to data, we see that when compared to other metrologies like CDSEM done at AEI, scatterometry can have expansion differences at the 0.5nm-1nm level at the wafer edge, and up to 5nm in extreme cases.

Another way to examine the quantitative size of optical overlay inaccuracies is to search for all of the measurement conditions that belong to, or are close to, flat regions in the multitude of landscapes available for measurement, examine their overlay values, and test for self-consistency, with the expectation that if all flat regions are accurate, they should all agree on the overlay value. This, however, turns out not always to be the case. The nonzero differences between the overlay values of two flat, or near-flat, measurement conditions may provide the scale of the inaccuracy problem, and we find it to be at the level of around 3nm if one considers the raw overlay values' site-by-site 3-sigma across the wafer, or around 1nm if one restricts to modeled overlay values (equivalent to around 0.004ppm in the expansion term). In some cases, however, we have identified more difficult situations where the inaccuracies can induce a ~5nm difference between the modeled overlay values of the different conditions at the wafer edge (or a ~0.03ppm difference in the expansion terms). This indicates that inaccurate flat regions, which have the form of inaccurate plateaus, also exist in the landscape. While these regions are process robust, they are inaccurate and so require special care. We find that pupil metrics are an important tool in distinguishing between accurate flat regions and inaccurate ones.
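A minimal version of this self-consistency test, with synthetic maps standing in for measured ones (rows are measurement conditions, columns are wafer sites; all numbers illustrative):

```python
import numpy as np

def flat_region_consistency(ovl_maps):
    """Per-site 3-sigma of the reported overlay across measurement conditions.
    If every condition sat in an accurate flat region, this spread would be
    dominated by precision and stay small."""
    return 3.0 * float(np.max(np.std(ovl_maps, axis=0)))

rng = np.random.default_rng(1)
truth = rng.normal(0.0, 1.0, size=50)                      # per-site true overlay
maps = np.stack([truth + rng.normal(0.0, 0.05, size=50) for _ in range(4)])
consistent = flat_region_consistency(maps)                 # small: conditions agree
maps[3] += 1.0                                             # one inaccurate plateau
inconsistent = flat_region_consistency(maps)               # flags the disagreement
```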

3.5. Landscape aspects in target optimization
As evident even from the semi-analytical model described in section 2.3, the choice of the scatterometry metrology target parameters (such as pitch, CD, etc.) can shift or, in some cases, even modify the landscape. By selecting the right combinations of target parameters and measurement setups (specifically, wavelength and polarization) it is possible to avoid the landscape resonances. In the target optimization process we aim to select combinations which remain well within flat regions even when reasonable symmetric process variations in the film structure and/or gratings are taken into account. Pupil metrics such as R can be used to detect resonances and eliminate unfavorable combinations of target designs and setups. Furthermore, additional metrics that are important to the target performance (such as the 1st order diffraction efficiency, sensitivity to overlay, etc.) can also be tuned. This is because, beyond avoiding resonances with metrics like R, the metrics mentioned above may not all be the same between and within different flat regions.

4. Landscape Pupil Metrics For Recipe Optimization And Control

In this section we introduce a subset of metrics, all derived from the pupil image, that characterize the landscape and the different spectral regions it contains. These metrics are available for accurate recipe optimization in setup as well as during production. In fact, because they capture the way the landscape changes with symmetric and asymmetric process variations, we use them to control the process at all levels: from a single measurement to multiple lots.

4.1 Metrics sensitive to symmetric process variations
These include the `resonance locator’ R, the overlay sensitivity, the wafer reflected intensity, and the diffraction efficiency. Other metrics involve more advanced pupil analytics that can detect arcs lying further away from the setup at hand. As an example of how such metrics behave, Figure 16 plots the landscape of the sensitivity and of R, and the way it changes with symmetric and asymmetric process variations in another alignment.

Figure 16. The landscape of R and the way it changes with two different types of process variations. In the left panel: a symmetric process variation of the film-thickness type – different colors correspond to +6nm, 0nm, and -6nm changes in the thickness, and we see that the landscape is shifted linearly. In the right panel: an asymmetric process variation, which leaves R unchanged.

It is clear that R responds only to the symmetric process variations. As a result, observing how R changes as a function of wafer position can be used to estimate, with the overlay tool, the wafer maps of certain types of variations, and especially to evaluate the robustness of the overlay setup with respect to them.
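Since R shifts linearly with a symmetric variation such as film thickness, a calibrated slope turns the per-site R values into a wafer map of that variation. The sketch below uses invented calibration numbers (`dR_per_nm`, `R_nominal`) purely for illustration of the idea:

```python
import numpy as np

# Hypothetical use of R for process monitoring: if a symmetric variation
# (e.g. film thickness) shifts R linearly, a calibrated slope converts
# per-site R values into a wafer map of the thickness variation.
dR_per_nm = 0.02                                  # assumed slope (1/nm)
R_nominal = 0.50                                  # assumed R for the nominal stack
R_sites = np.array([0.50, 0.54, 0.46, 0.62])      # hypothetical per-site R values
thickness_map = (R_sites - R_nominal) / dR_per_nm # nm of variation per site
```

A site whose R matches the nominal value maps to zero variation; the others scale linearly with their R offset.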

4.2 Metrics sensitive to asymmetric process variations
These include metrics which are the output of a pupil analysis that probes the sensitivity of the reported overlay value to the pupil sampling used to calculate it. Because the pupil image is two dimensional, these metrics divide into two orthogonal sets. In Figure 17 we show one of these metrics and how it grows linearly from zero as the asymmetric process variation grows from zero. The simulation is of the same FEOL alignment shown in Figure 11.

Figure 17. Left panel: the landscape of one of the metrics that is sensitive to asymmetric process variations, and how it scales with the variation (chosen to be a side-wall angle asymmetry). Right panel: the way the metric changes linearly with the variation at λ=520nm.

As is clear from Figure 17, the metric responds directly and linearly to the induced asymmetry (which we parameterize, as we do for Figure 11, as the geometric, nanometer, asymmetry of the target bar).
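A linear, zero-intercept response means the metric can itself be calibrated into an estimator of the asymmetry. A minimal sketch with invented calibration data (the slope and metric values are hypothetical, not taken from Figure 17):

```python
import numpy as np

# Hypothetical calibration: programmed bar asymmetries (nm) vs. the
# measured pupil metric at a fixed wavelength, assumed exactly linear.
asym_nm = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
metric = np.array([0.00, 0.11, 0.22, 0.33, 0.44])

# Fit the slope once; thereafter the metric alone estimates the asymmetry.
slope = np.polyfit(asym_nm, metric, 1)[0]
estimated_asym = metric / slope        # recovered asymmetry in nm
```

In production, of course, the slope would depend on the stack and the measurement setup and would have to be re-calibrated per alignment.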

5. The Landscapes Of Image-Based Overlay And Field-Based Scatterometry

The focus of this paper in sections 2-4 was overlay metrology by pupil scatterometry. Let us now move on to discuss other technologies: field-based scatterometry and imaging-based overlay.

5.1 Pupil-scatterometry vs Field-scatterometry
The basic architectural difference between pupil scatterometry and field scatterometry is shown in Figure 18, where we sketch both technologies’ optical architectures. Specifically, in pupil scatterometry (see the left panel of Figure 18), the detector is placed in the back focal (pupil) plane, and one collects two signals: one for each of the scatterometry cells (here we consider a single alignment direction). Then, the per-pixel pupil asymmetry D of each of the two collected pupil images is used, together with equation (1), to extract the overlay per pupil pixel. All the pupil pixels’ overlay values are then combined algorithmically into a single number – the reported overlay. In particular, the algorithms one uses can detect the presence of arcs in the pupil, or their near-presence in nearby setups, and treat them algorithmically.
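To make the per-pixel extraction concrete, here is a schematic version. It assumes the standard linear two-cell differential model with programmed offsets ±f0 (the paper's equation (1) may differ in detail), and combines the pixels with a simple median rather than the arc-aware algorithms discussed above:

```python
import numpy as np

def overlay_per_pixel(D1, D2, f0):
    """Per-pupil-pixel overlay from the asymmetries of the two cells,
    assuming the linear two-cell model with programmed offsets +/-f0:
        D1 = K*(OVL + f0),  D2 = K*(OVL - f0)
    so that  OVL = f0 * (D1 + D2) / (D1 - D2),
    with the unknown per-pixel sensitivity K cancelling out."""
    D1, D2 = np.asarray(D1, float), np.asarray(D2, float)
    return f0 * (D1 + D2) / (D1 - D2)

# Hypothetical pupil maps: sensitivity K varies across the pupil,
# true overlay 3nm, programmed offset f0 = 15nm.
K = np.linspace(0.5, 2.0, 25).reshape(5, 5)
ovl_true, f0 = 3.0, 15.0
D1, D2 = K * (ovl_true + f0), K * (ovl_true - f0)
ovl_map = overlay_per_pixel(D1, D2, f0)
reported = np.median(ovl_map)   # one simple way to combine the pixels
```

In this idealized, asymmetry-free case every pupil pixel returns the same overlay; the interesting algorithmic work begins when arcs make the per-pixel values disagree.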

Figure 18. Sketches of a pupil scatterometer (left) and a field scatterometer (right).

In contrast, in overlay by field scatterometry (right panel of Figure 18), one places the detector in a field plane beyond the pupil plane. Here, again, one collects the signal twice, each time changing the configuration of the illumination/collection path so that only a single diffraction order (either the +1st or the -1st) arrives at the detector. Because the field scatterometer has a large spot, the field image for each order contains both scatterometry cells and separates them spatially on the detector. Unlike the pupil scatterometer, in the field scatterometer the angular information is combined by hardware and is effectively integrated over uniformly. This means the effect of arcs on the landscape differs between field and pupil scatterometers.
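This architectural difference can be caricatured in a few lines: given the same simulated pupil data, the pupil scatterometer can mask out arc-contaminated pixels before combining them, while the field scatterometer effectively sums the signals over all angles first. The sketch below uses a hypothetical linear two-cell signal model with programmed offsets ±f0 and an invented arc contamination; it is not either tool's actual algorithm:

```python
import numpy as np

def pupil_vs_field_overlay(D1, D2, f0, keep):
    """Schematic comparison: per-pixel overlay with arc pixels masked out
    (pupil scatterometry) vs overlay computed from the angle-integrated
    signals (field scatterometry), assuming D = K*(OVL +/- f0) per pixel."""
    per_pixel = f0 * (D1 + D2) / (D1 - D2)
    pupil_ovl = per_pixel[keep].mean()                              # arcs excluded
    field_ovl = f0 * (D1.sum() + D2.sum()) / (D1.sum() - D2.sum())  # arcs included
    return pupil_ovl, field_ovl

# Hypothetical pupil: true overlay 3nm, offset f0 = 15nm, one arc row.
K, ovl, f0 = np.ones((5, 5)), 3.0, 15.0
D1, D2 = K * (ovl + f0), K * (ovl - f0)
arc = np.zeros((5, 5), bool)
arc[2] = True                       # an arc crossing the pupil
D1[arc] += 4.0
D2[arc] += 4.0                      # arc-induced signal asymmetry
pupil_est, field_est = pupil_vs_field_overlay(D1, D2, f0, ~arc)
```

With these numbers the masked pupil estimate recovers the true 3nm overlay, while the angle-integrated estimate is pulled away from it by the arc, which is the qualitative difference Figure 19 exhibits.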

As an example, we show in Figure 19 the landscape of pupil scatterometry versus field scatterometry for three different alignments (the leftmost and middle ones are one type of alignment, while the rightmost is another, from the BEOL). The target design and other tool knobs (like polarization) are the same for both scatterometers. This figure demonstrates how different the landscapes can be if one treats the arcs algorithmically (blue curve – pupil scatterometry) or uniformly integrates them into the reported overlay by measuring in the field plane.

Figure 19. The landscape of the pupil scatterometry (blue) and field scatterometry (orange) for a MEOL alignment (left panel) and a BEOL alignment (right panel).

Another architectural choice is the coherence level of the illumination, but this primarily changes the way each technology reacts to target-size effects, and here we focus on a simpler scenario in which the `only’ source of asymmetry is periodic and does not cause different pupil points to interfere.

An additional architectural difference between pupil scatterometry and field scatterometry is the typically larger spectral bandwidth of the latter, and one might wonder whether a finite bandwidth would smear the resonances and cause them to disappear. As we show in Figure 20, however, this is not the case.

Figure 20. A comparison of bandwidth averaging in pupil scatterometry (left panel) and field scatterometry (right panel). The different bandwidths are represented by the black filled circles (1nm of bandwidth), solid red curve (a bandwidth of 10nm), and the dashed blue line (a bandwidth of 20nm). The results are all for a BEOL alignment (the one on the right panel of Figure 19).

Figure 20 shows that the bandwidth has a very minor effect on the landscape of both types of scatterometers. To understand this, we recall that the resonances originate in the arcs in the pupil, and that these move with wavelength. Therefore, when one incoherently averages the pupil images over a set of wavelengths, one does not get rid of the arcs; instead, they may contaminate the wavelengths that are arc free.

Bandwidth averaging will be useful only in regions that are very far from arcs, because there, perhaps, one may be able to average out some of the signal contamination from the asymmetries. This cancellation effect seems to depend strongly on the alignment’s stack; in particular, in Figure 20 it is shown to be a minor effect.
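The argument that a finite bandwidth spreads, rather than cancels, the arcs can be illustrated with a toy pupil model in which the contaminated row moves with wavelength (all numbers invented):

```python
import numpy as np

n = 9
wavelengths = [518.0, 520.0, 522.0]   # a hypothetical finite band
avg = np.zeros((n, n))
for i, wl in enumerate(wavelengths):
    img = np.ones((n, n))             # clean per-pixel signal level
    img[3 + i, :] += 5.0              # the arc sits in a wavelength-dependent row
    avg += img / len(wavelengths)     # incoherent (intensity) averaging

# At a single wavelength one row (9 pixels) is contaminated; after
# averaging, three rows (27 pixels) are, each only partially diluted.
single_bad = 9
avg_bad = int(np.count_nonzero(avg > 1.0 + 1e-9))
```

The contamination is diluted per pixel but reaches three times as many pixels, so wavelengths that were arc free before averaging are no longer so, which is the behavior Figure 20 shows.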

5.2 Imaging-based overlay vs scatterometry: a look-up table
It is interesting to see how similar scatterometry and imaging-based overlay are from the perspective of the landscape. This similarity is a direct result of both being metrologies of an offset-induced asymmetry: both translate this asymmetry into nanometers of overlay, and both suffer from asymmetric process variations that contaminate the signal with bar asymmetries that can be misinterpreted as the overlay-induced asymmetry and cause inaccuracy. To see this, observe Table 2, a look-up table which `translates’ each attribute of scatterometry overlay into imaging-based overlay.

Table 2. A look-up-table for the analogy between imaging and scatterometry

The key to understanding Table 2 is to observe that both the scatterometry signal and the imaging signal are the result of interference between different types of coherent diffraction. In scatterometry it is of the type described in equation (2), hence the fact that the wavelength is a principal knob in the scatterometry landscape. In imaging, the signal is the interference between two diffraction orders, and a knob that easily controls it is the focus. To demonstrate this analogy, we show in Figure 21 an example of how an imaging landscape looks, where we plot the inaccuracy (in blue) and a metric which, like the pupil R, indicates the presence of a resonance. Figure 21 shows a good resemblance to the way the scatterometry landscape looks.

Figure 21. An example of a landscape of an AIM target in IBO with a side-wall angle asymmetry of 2 degrees in the bottom layer. The inaccuracy is in blue and the resonance-detecting metric is in orange.

In imaging, the wavelength itself can also serve to move across the landscape and away from resonances. To see this, we plot in Figure 22 a two-dimensional imaging landscape of a different alignment in the wavelength-focus space. We also show how the landscape shifts with symmetric process variations (film-thickness variations in this example).

Figure 22. The two-dimensional landscape of an AIM target in imaging-based overlay. The difference between the left and right panels is a symmetric process variation. The text boxes denote regions that are less process robust and inaccurate – these are the imaging-based-overlay resonances, or contrast reversals.

As the figure shows, one can pass through a resonance by changing the focus, and this is accompanied by an increase in the imaging-based analog of the R metric. Also, the process variation between the two panels can be seen to shift the overall landscape toward lower wavelengths as one goes from the left panel to the right one; this can be seen by tracking the resonant regions marked by the text boxes. Despite these shifts, however, and as in scatterometry, the landscape keeps its invariants – the number, and even the shape, of the resonance ridges and of the flat regions in between them do not change.
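Recipe selection in such a two-dimensional landscape can be sketched as a simple grid search: flag the resonant (wavelength, focus) cells with an R-like metric and pick the candidate setup farthest from any of them. The grid, threshold, and distance criterion below are invented for illustration and are not the production recipe optimizer:

```python
import numpy as np

# Hypothetical R-like metric on a (focus x wavelength) grid; large = resonant.
R_grid = np.array([[0.1, 0.9, 0.1, 0.1],
                   [0.1, 0.8, 0.1, 0.1],
                   [0.1, 0.9, 0.2, 0.1]])
resonant = np.argwhere(R_grid > 0.5)        # flagged resonance cells
candidates = np.argwhere(R_grid <= 0.5)     # admissible setups

# Distance (in grid steps) from each candidate to its nearest resonance.
dists = np.min(np.linalg.norm(candidates[:, None, :] - resonant[None, :, :],
                              axis=2), axis=1)
best_focus, best_wl = candidates[np.argmax(dists)]   # most robust setup
```

This mirrors the qualitative recipe: stay well inside a flat region of the (wavelength, focus) plane, maximizing the margin to the resonance ridges so that the process-variation-induced shifts of the landscape do not push the setup into one.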

6. Summary

In this paper we surveyed the physics by which process variations of different types determine the accuracy and process robustness of overlay metrology. We primarily focused on pupil scatterometry and showed that the inaccuracy, as a function of wavelength, has the form of a spectral signature which is unique to each alignment scheme and changes in a very particular manner with process variations: it laterally shifts and dilates under `symmetric’ variations like film thickness, and scales its amplitude under asymmetric ones such as asymmetric side-wall angles. This signature, which we call the `landscape’, can be approximately divided into two types of spectral regions: resonance-like regions and, in between them, flat regions. For better accuracy and process robustness one needs to optimize the measurement conditions to lie in the flat regions, and in pupil scatterometry we accomplish this by using pupil-derived quantities that sense the resonances and flag them. These quantities are also available at run time and allow one to control the overlay recipe.

Because the resonant regions have their origin in certain pupil patterns (referred to as arcs on account of their typical two-dimensional shape), and because pupil scatterometry and field scatterometry treat the pupil information differently (the former accesses it algorithmically while the latter integrates over it in hardware), the landscape perspective on overlay accuracy, and the way inaccuracies behave close to a resonance, also explains the differences between the performance of these two technologies.

Finally, we also saw that simple analogy arguments between scatterometry and imaging, and between the ways their signals are formed, lead one to expect resonance-like behaviors in imaging as well, and to write down simple forms for the metrics that identify them. Testing this analogy in simulations, we saw that a simple imaging landscape, in which the focus plays the role of the wavelength, behaves very similarly to the scatterometry landscape. Taking one additional step, we also saw that allowing the wavelength to change in imaging opens the door to very wide process windows and to the ability to avoid resonances by changing both the focus and the wavelength.

This behavior in imaging, together with the algorithmic access one has to the angular information in pupil scatterometry, provides a wide process window for accurate control of the overlay in advanced semiconductor manufacturing.