Abstract

Time-Division Duplexing (TDD) makes it possible to estimate the downlink channels for an arbitrarily large number of base station antennas from a finite number of orthogonal pilot signals in the uplink, by exploiting channel reciprocity. Therefore, while the number of users per cell served in any time-frequency channel coherence block is necessarily limited by the number of pilot sequence dimensions available, the number of base station antennas can be made as large as desired. Based on this observation, a recently proposed very simple “Massive MIMO” scheme was shown to achieve unprecedented spectral efficiency under realistic conditions of user spatial distribution, distance-dependent pathloss, and channel coherence time and bandwidth.

The main focus and contribution of this paper is a novel network-MIMO TDD architecture that achieves spectral efficiencies comparable with “Massive MIMO”, with one order of magnitude fewer antennas per active user per cell. The proposed architecture is based on a family of network-MIMO schemes defined by small clusters of cooperating base stations, zero-forcing multiuser MIMO precoding with suitable inter-cluster interference constraints, uplink pilot signal reuse across cells, and frequency reuse. The key idea consists of partitioning the user population into geographically determined “bins”, such that all users in the same bin are statistically equivalent, and using the optimal network-MIMO architecture in the family for each bin. A scheduler serves the different bins on the time-frequency slots in order to maximize a desired network utility function that captures some desired notion of fairness. This results in a mixed-mode network-MIMO architecture, where different schemes, each of which is optimized for the served user bin, are multiplexed in time-frequency.

In order to carry out the performance analysis and the optimization of the proposed architecture in a clean and computationally efficient way, we consider the large-system regime where the number of users, the number of antennas, and the channel coherence block length go to infinity with fixed ratios. The performance predicted by the large-system asymptotic analysis matches finite-dimensional simulations very well. Overall, the system spectral efficiency obtained by the proposed architecture is similar to that achieved by “Massive MIMO”, with a 10-fold reduction in the number of antennas at the base stations (roughly, from 500 to 50 antennas).

Multiuser MIMO (MU-MIMO) technology is being intensively studied for next-generation wireless cellular systems (e.g., LTE-Advanced [1]). Schemes where antennas of different Base Stations (BSs) are jointly processed by centralized BS controllers are usually referred to as “network-MIMO” architectures (e.g., [2]). It is well known that the improvement obtained from transmit antenna joint processing is limited by a “dimensionality” bottleneck [9]. In particular, the high-SNR capacity of a single-user MIMO system with $N_t$ transmit antennas, $N_r$ receive antennas, and fading coherence block length of $T$ complex dimensions scales as $C(\mathsf{SNR}) = \min\{N_t, N_r, T/2\}\log\mathsf{SNR} + O(1)$ [13]. Therefore, even by pooling all base stations into a single distributed macro-transmitter with $N_t \gg 1$ antennas and all user terminals into a single distributed macro-receiver with $N_r \gg 1$ antennas, the system degrees of freedom are eventually limited by the fading coherence block length $T$. While this dimensionality bottleneck is an inherent fact, emerging from the high-SNR behavior of the capacity of MIMO block-fading channels [14] (see also [15]), the same behavior also characterizes the capacity scaling of MU-MIMO systems based on explicit training for channel estimation, and can be interpreted as the effect of the “overhead” incurred by pilot signals [16].

For frequency-division duplex (FDD) systems, the training overhead required to collect channel state information at the transmitters (CSIT) grows linearly with the number of cooperating transmit antennas. Such overhead restricts the MU-MIMO benefits that can be harvested with a large number of transmit antennas, as shown in [10] using system-level simulation and in [11] using closed-form analysis based on the limiting distribution of certain large random matrices.

For Time-Division Duplexing (TDD) systems, the CSIT can be obtained from uplink training by exploiting channel reciprocity [17]. In this case, the pilot signal overhead scales linearly with the number of active users per cell, but it is independent of the number of cooperating antennas at the BSs. As a result, for a fixed number of users scheduled for transmission, the TDD system performance can be significantly improved by increasing the number of BS antennas.

Following this idea, Marzetta [18] has shown that simple Linear Single-User BeamForming (LSUBF) and random user scheduling, without any inter-cell cooperation, yield unprecedented spectral efficiency in TDD cellular systems, provided that a sufficiently large number of transmit antennas per active user is employed at each BS. This scheme, nicknamed hereafter “Massive MIMO”, was analyzed in the limit of an infinite number of BS antennas per user per cell. In this regime, the effects of Gaussian noise and uncorrelated inter-cell interference disappear, and the only remaining impairment is the inter-cell interference due to pilot contamination [19], i.e., the correlated interference from other cells due to users re-using the same pilot signal (see Section 3).

In this work, we also focus on TDD systems and exploit reciprocity. The main contribution of this paper is a novel network-MIMO architecture that achieves spectral efficiencies comparable with “Massive MIMO”, with a more practical number of BS antennas per active user (one order of magnitude fewer antennas for approximately the same spectral efficiency). As in [18], we also analyze the proposed system in the limit of a large number of antennas. However, a different system scaling is considered, where the number of antennas per active user per cell is finite. This is obtained by letting the number of users per cell, the number of antennas per BS, and the channel coherence block length go to infinity, with fixed ratios [11]. We find that in this regime the LSUBF scheme advocated in [18] performs very poorly. In contrast, we consider a family of network-MIMO schemes based on small clusters of cooperating base stations, Linear Zero-Forcing BeamForming (LZFBF) with suitable inter-cluster interference constraints, uplink pilot signal reuse across cells, and frequency reuse. The key idea consists of partitioning the user population into geographically determined “bins”, containing statistically equivalent users, and optimizing the network-MIMO scheme for each individual bin. Then, users in different bins are scheduled over the time-frequency slots, in order to maximize an appropriately chosen network utility function reflecting some desired notion of “fairness”. The geographic nature of the proposed scheme yields very simple system operations: each time a given bin is scheduled, a subset of active users in the selected bin is chosen at random or in a deterministic round-robin fashion, without performing any CSIT-based user selection. This allows a fast turn-around between feedback and transmission, which can take place in the same channel coherence block. The resulting architecture is a mixed-mode network-MIMO, where different schemes, each of which is optimized for the served user bin, are multiplexed in time-frequency.

Using results and tools from the large-system analysis developed in [11] and adapted to the present scenario, we obtain the asymptotic achievable rate for each scheme in closed form. The performance predicted by the large-system analysis matches finite-dimensional simulations very well, in agreement with [11] and with several well-known works on single-user MIMO in the large antenna regime [22]. The large-system analysis developed here is instrumental to the systematic design and optimization of the proposed system architecture, since it allows an accurate and rapid selection of the best network-MIMO scheme for each user bin without resorting to cumbersome and time-consuming Monte Carlo simulation. In fact, the system parameters in the considered family of network-MIMO schemes are strongly mutually dependent, and the system optimization would be infeasible without the analytical tools developed here.

We hasten to say that the ideas of dynamic clustering of cooperating BSs and of multimodal MU-MIMO downlink have appeared in a large number of previous works (see, for example, [24]). Giving a fair account of this vast literature would be impossible within the space limits of this paper. Nevertheless, we wish to stress here that the novel contribution of this paper is a systematic approach to multi-modal system optimization based on simple closed-form expressions of the spectral efficiency of each network-MIMO scheme in the family, and on scheduling across the schemes (or “modes”) in order to maximize a desired network utility function.

The remainder of this paper is organized as follows. In Section 2, we describe the family of proposed network-MIMO schemes. We discuss the uplink training, MMSE channel estimation and pilot contamination effect for TDD-based systems in Section 3. In Section 4, we analyze the network-MIMO architectures under consideration and provide expressions for their achievable rate in the large-system limit. Scheduling under specific fairness criteria and the corresponding system spectral efficiency are presented in Section 5. Numerical results, including a comparison with finite-dimensional simulation results, are presented in Section 6, and concluding remarks are given in Section 7.

The TDD cellular architecture for high-data rate downlink proposed in this work is based on the following elements:

A family of network-MIMO schemes, defined in terms of the size and shape of clusters of cooperating BSs, pilot reuse across clusters, frequency reuse factor, and downlink linear precoding scheme;

A partitioning of the user population into bins, according to their position in the cellular coverage area;

The determination of the optimal network-MIMO scheme in the family for each user bin, creating an association between user bins and network-MIMO schemes;

Scheduling of the user bins in time-frequency in order to maximize a suitable concave and componentwise non-decreasing network utility function of the ergodic user rates. The network utility function is chosen in order to reflect some desired notion of fairness (e.g., proportional fairness [29]). When a given bin is scheduled, the associated optimized network-MIMO architecture is used.

Invoking well-known convergence results [22], we use the “large-system analysis” approach for multi-antenna cellular systems pioneered in [33]. In particular, we use the results of [11], which can be easily applied to our system model, and analyze the performance of the network-MIMO schemes in the considered family while scaling the number of users in each bin, the number of antennas per BS and the small-scale fading coherence block length to infinity, with fixed ratios. We define a system size parameter indicated by N, and let all the above quantities scale linearly with N→∞. Specifically, we let MN denote the number of BS antennas, LN denote the channel coherence block length, and UN denote the number of users per location (a bin is defined as a set of discrete locations in the cellular coverage, see Section 2.1), for given constants M,L and U.

Base stations, cells and clusters: The system geometry is concisely described by using lattices on the real line $\mathbb{R}$ (for 1-dimensional layouts [8]) or on the real plane $\mathbb{R}^2$ (for 2-dimensional layouts [18]). Consider nested lattices $\Lambda \subseteq \Lambda_{\rm bs} \subseteq \Lambda_u$ in $\mathbb{R}$ (resp., $\mathbb{R}^2$). The system coverage region is given by the Voronoi cell $\mathcal{V}$ of $\Lambda$ centered at the origin. BSs are located at points $b \in \Lambda_{\rm bs} \cap \mathcal{V}$. The finer lattice $\Lambda_u$ defines a grid of discrete user locations, as explained later in this section. We let $B = |\Lambda_{\rm bs} \cap \mathcal{V}|$ denote the number of BSs in the system.

For the sake of symmetry, in order to avoid “border effects” at the edges of the coverage region, all distances and all spatial coordinates are defined modulo $\Lambda$. The modulo-$\Lambda$ distance between two points $u, v$ in $\mathbb{R}$ (resp., $\mathbb{R}^2$) is defined as

$$d_\Lambda(u,v) = |(u - v) \bmod \Lambda|, \tag{1}$$

where $x \bmod \Lambda = x - \arg\min_{\lambda\in\Lambda}|x-\lambda|$. Cell $\mathcal{V}_b$ is defined as the Voronoi region of BS $b \in \Lambda_{\rm bs}\cap\mathcal{V}$ with respect to the modulo-$\Lambda$ distance, i.e., $\mathcal{V}_b = \{x\in\mathbb{R} : d_\Lambda(x,b) \le d_\Lambda(x,b'),\ \forall b' \in \Lambda_{\rm bs}\cap\mathcal{V}\}$ (replace $\mathbb{R}$ with $\mathbb{R}^2$ for the 2-dimensional case). The collection of cells $\{\mathcal{V}_b\}$ forms a partition of $\mathcal{V}$ into congruent regions.
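The modulo-$\Lambda$ distance and the resulting cell assignment can be illustrated with a minimal Python sketch for the 1-dimensional layout. It assumes, for illustration only, that $\Lambda = B\mathbb{Z}$ and that the BSs sit at the integers $0,\ldots,B-1$ (an equivalent representation of $\Lambda_{\rm bs}\cap\mathcal{V}$ modulo $\Lambda$); the function names are hypothetical.

```python
def mod_lattice(x: float, period: float) -> float:
    """Reduce x modulo the lattice period*Z: subtract the nearest lattice point."""
    # The lattice point closest to x is period * round(x / period).
    return x - period * round(x / period)


def mod_distance(u: float, v: float, period: float) -> float:
    """Modulo-lattice distance |(u - v) mod Lambda| for Lambda = period*Z."""
    return abs(mod_lattice(u - v, period))


def serving_bs(x: float, num_bs: int) -> int:
    """Index of the BS whose wrap-around Voronoi cell contains location x.

    Assumption: BSs are placed at 0, 1, ..., num_bs - 1 and Lambda = num_bs*Z.
    """
    period = float(num_bs)
    return min(range(num_bs), key=lambda b: mod_distance(x, float(b), period))
```

With `num_bs = 8`, a user at `x = 7.8` is assigned to BS 0 rather than BS 7, which is the wrap-around behavior that removes border effects.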

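A “clustering pattern” $u(\mathcal{C})$, defined by the set of BS locations $\mathcal{C} = \{b_0,\ldots,b_{C-1}\}$ with $b_j \in \Lambda_{\rm bs}\cap\mathcal{V}$ and rooted at $b_0 = 0$, is the collection of BS location sets (referred to as “clusters” in the following)

$$u(\mathcal{C}) = \{\{\mathcal{C}+c\} : c \in \Lambda_{\rm bs}\cap\mathcal{V}\}. \tag{2}$$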

We focus on systems based on single-cell processing ($C=1$), or with joint processing over clusters of small size: $C=2$ in the 1-dimensional case, and $C=3$ in the 2-dimensional case, as shown in Figure 2. It turns out that larger clusters do not achieve better performance, due to the large training overhead incurred, while requiring higher complexity. Therefore, our results are not restrictive in terms of cluster size, since they capture the best system parameter configurations.

User location bins: We assume a uniform user spatial distribution over the coverage region. For the sake of analytical simplicity, we discretize the user distribution into a regular grid of user locations, corresponding to the points of the lattice translate $\tilde{\Lambda}_u = \Lambda_u + u_0$, where $u_0 \neq 0$ is chosen such that $\tilde{\Lambda}_u$ is symmetric with respect to the origin and no points of $\tilde{\Lambda}_u$ fall on the cell boundaries.

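A “user bin” $v(\mathcal{X})$, defined by the set of user locations $\mathcal{X} = \{x_0,x_1,\ldots,x_{m-1}\}$ with $x_i \in \tilde{\Lambda}_u$, is the collection of user location sets (referred to as “groups” in the following)

$$v(\mathcal{X}) = \{\{\mathcal{X}+c\} : c \in \Lambda_{\rm bs}\cap\mathcal{V}\}. \tag{3}$$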

In particular, we choose $\mathcal{X}$ to be a symmetric set of points with respect to the positions of the BSs comprising cluster $\mathcal{C}$. The reason for this symmetry is two-fold: on one hand, a symmetric set generalizes the single location case and yet provides a set of statistically equivalent users (same set of distances from all BSs in the cluster), thus providing a richer system optimization parameter space. On the other hand, symmetry yields very simple closed-form expressions in the large-system analysis, by means of [11].

Cluster/group association and user group rate: The BSs forming a cluster are jointly coordinated by a “cluster controller” that collects all relevant channel state information and computes the beamforming coefficients for the desired MU-MIMO precoding scheme. For given sets $\{\mathcal{X},\mathcal{C}\}$, the users in group $\mathcal{X}+c$ are served by the cluster $\mathcal{C}+c$, for all $c\in\Lambda_{\rm bs}\cap\mathcal{V}$ (see Figure 2). By construction, each BS belongs to $C$ clusters and transmits signals from all the $C$ corresponding cluster controllers. These signals may share the same frequency band, or be defined on orthogonal subbands, depending on the system frequency reuse factor defined later in this section. There are $mUN$ users in each group $\mathcal{X}+c$, and $CMN$ jointly coordinated antennas in each cluster $\mathcal{C}+c$. We assume $mU \ge CM$, such that the downlink DoFs are always limited by the number of antennas. The number of users effectively scheduled and served on each given slot is denoted by $SN$. We refer to these users as the “active users”, and to the coefficient $S\in[0,CM]$ as the “loading factor”. Depending on the geometry of $\mathcal{X}$ and $\mathcal{C}$ and on the type of beamforming used (see Section 4), $S$ can be optimized for each pair $\{\mathcal{X},\mathcal{C}\}$. We restrict attention to schemes that serve an equal number $SN/m$ of active users per location $x\in\mathcal{X}+c$. As anticipated, by symmetry the users in the same bin are statistically equivalent. Therefore, without loss of generality, we may assume that a round-robin scheduler picks all subsets of size $SN$ out of the $mUN$ users in each group for the same fraction of time. In this way, the aggregate spectral efficiency of the group (referred to in the following as the “group spectral efficiency”) is shared evenly among all the users in the group.

Frequency reuse: The frequency reuse factor of the scheme is denoted by $F$. This can also be optimized for each given pair $\{\mathcal{X},\mathcal{C}\}$. The system bandwidth is partitioned into $F$ subbands of equal width. For $F=1$, all clusters in $u(\mathcal{C})$ transmit on the whole system bandwidth. For $F>1$, clusters are assigned different subbands according to a regular reuse pattern. For the 1-dimensional layout, any integer $F$ dividing $B$ is possible. For the 2-dimensional layout, we consider reuse factors given by $F = i^2 + ij + j^2$ for non-negative integers $i$ and $j$ [38]. For later use, we define $\mathcal{D}(f)$ as the set of clusters active on subband $f\in\{0,\ldots,F-1\}$.
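As a small illustration of the admissible reuse factors, the following Python sketch enumerates the divisors of $B$ for the 1-dimensional layout and the values $i^2+ij+j^2$ for the 2-dimensional layout. The function names and the upper bound `max_f` are illustrative assumptions, not part of the system definition.

```python
def line_reuse_factors(num_bs: int) -> list[int]:
    """1-D layout: every integer F dividing the number of BSs B."""
    return [f for f in range(1, num_bs + 1) if num_bs % f == 0]


def planar_reuse_factors(max_f: int) -> list[int]:
    """2-D layout: all F = i^2 + i*j + j^2 with 1 <= F <= max_f."""
    values = {
        i * i + i * j + j * j
        for i in range(max_f + 1)
        for j in range(max_f + 1)
    }
    return sorted(f for f in values if 1 <= f <= max_f)
```

For example, `planar_reuse_factors(12)` returns `[1, 3, 4, 7, 9, 12]`, the familiar hexagonal reuse patterns.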

The average received signal power for a user located at $x\in\mathcal{V}$ from a BS antenna located at $b\in\mathcal{V}$ is denoted by $g(x,b)$, a polynomially decreasing function of the distance $d_\Lambda(x,b)$. The AWGN noise power spectral density is normalized to 1. For a given clustering pattern $u(\mathcal{C})$ and user bin $v(\mathcal{X})$, the fading channel coefficients from the $CMN$ antennas of BS cluster $\mathcal{C}+c$ to an active user $k\in\{1,\ldots,SN/m\}$ at location $x+c'$, $x\in\mathcal{X}$, on frequency subband $f$, form a random vector denoted by $\underline{\mathbf{h}}_{k,c',c}(f;x)\in\mathbb{C}^{CMN\times 1}$, with circularly-symmetric complex Gaussian entries, i.i.d. across the BS antennas, the subbands and the users (independent small-scale Rayleigh fading). In the considered network-MIMO schemes, active users are served with equal transmit power $1/S$. Hence, the total transmit power per cluster is equal to $N$. Since each BS simultaneously participates in $C$ clusters, the total transmit power per BS is also equal to $N$. Since we consider the limit for $N\to\infty$, the channel coefficients are normalized to have variance $1/N$, such that the received signal power is independent of $N$. This provides the correct scaling of the elements of the random channel matrices in order to obtain the large-system limit results. We let the channel vector covariance matrix be given by $\mathbb{E}[\underline{\mathbf{h}}_{k,c',c}(f;x)\,\underline{\mathbf{h}}^{\mathsf H}_{k,c',c}(f;x)] = \frac{1}{N}\mathbf{G}_{c',c}(x)$, where $\mathbf{G}_{c',c}(x) = \mathrm{diag}(g(x+c',b+c)\,\mathbf{I}_{MN} : b\in\mathcal{C})$. Notice that $\mathbf{G}_{c',c}(x)$ is independent of the user index $k$ and of the subband index $f$, since the channels are identically distributed across subbands and co-located users.

Under the standard block-fading assumption [12], the channel vectors are constant on each subband for blocks of length $LN$ signal dimensions. Without loss of generality, we assume that these coherence blocks also correspond to the scheduling slot. Each slot is partitioned into an uplink training phase of length $L_PN$ and a downlink data phase of length $L_DN$. In this section we deal with the data phase, while the training phase is addressed in Section 3. For the sake of notational simplicity, the slot “time” index is omitted: since we care about ergodic (average) rates, only the per-block marginal channel statistics matter. The data-bearing signal transmitted by cluster $\mathcal{C}+c$ on subband $f$ is denoted by

$$\mathbf{X}_c(f) = \mathbf{U}_c(f)\,\mathbf{V}_c^{\mathsf H}(f) \tag{4}$$

where the matrix $\mathbf{U}_c(f)\in\mathbb{C}^{L_DN\times SN}$ contains the codeword (information-bearing) symbols arranged by columns. We assume that users’ codebooks are drawn from an i.i.d. Gaussian random coding ensemble with symbols $\sim\mathcal{CN}(0,1/S)$. Achievable rates shall be obtained via the familiar random coding argument [39] with respect to this input distribution. The matrix $\mathbf{V}_c(f)\in\mathbb{C}^{CMN\times SN}$ contains the beamforming vectors arranged by columns, normalized to have unit norm. It is immediate to verify that, indeed, the average transmit power of any cluster $\mathcal{C}+c$, active on frequency $f$, is given by $\frac{1}{L_DN}\mathrm{tr}(\mathbb{E}[\mathbf{X}_c^{\mathsf H}(f)\mathbf{X}_c(f)]) = N$, as desired.
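The power normalization can be checked in a few lines. The sketch below assumes that the codeword symbols in $\mathbf{U}_c(f)$ are independent of the beamforming matrix $\mathbf{V}_c(f)$ (which depends only on channel estimates), so that $\mathbb{E}[\mathbf{U}_c^{\mathsf H}(f)\mathbf{U}_c(f)] = \frac{L_DN}{S}\mathbf{I}_{SN}$ can be factored out, and it uses the unit-norm columns of $\mathbf{V}_c(f)$, i.e., $\mathrm{tr}(\mathbf{V}_c^{\mathsf H}(f)\mathbf{V}_c(f)) = SN$:

```latex
\begin{align*}
\frac{1}{L_D N}\,\mathrm{tr}\,\mathbb{E}\!\left[\mathbf{X}_c^{\mathsf H}(f)\,\mathbf{X}_c(f)\right]
 &= \frac{1}{L_D N}\,\mathrm{tr}\,\mathbb{E}\!\left[\mathbf{V}_c(f)\,\mathbf{U}_c^{\mathsf H}(f)\,\mathbf{U}_c(f)\,\mathbf{V}_c^{\mathsf H}(f)\right] \\
 &= \frac{1}{L_D N}\cdot\frac{L_D N}{S}\,\mathbb{E}\!\left[\mathrm{tr}\!\left(\mathbf{V}_c(f)\,\mathbf{V}_c^{\mathsf H}(f)\right)\right] \\
 &= \frac{1}{S}\cdot SN \;=\; N .
\end{align*}
```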

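Recalling the definition of $\mathcal{D}(f)$, the received signal for user $k$ at location $x+c$, $x\in\mathcal{X}$, is given by

$$\mathbf{y}_{k,c}(f;x) = \sum_{c'\in\mathcal{D}(f)} \mathbf{X}_{c'}(f)\,\underline{\mathbf{h}}_{k,c,c'}(f;x) + \mathbf{z}_{k,c}(f;x), \tag{5}$$

where $\mathbf{z}_{k,c}(f;x)\sim\mathcal{CN}(\mathbf{0},\frac{1}{F}\mathbf{I}_{L_DN})$. Notice that a scheme using frequency reuse $F>1$ transmits with total cluster power $N$ over a fraction $1/F$ of the whole system bandwidth. This is taken into account by letting the noise variance per component be equal to $1/F$ in the signal model (Equation 5).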

By construction, the encoded data symbols for user $k$ at location $x+c$, $x\in\mathcal{X}$, are the entries of the $k$-th column of $\mathbf{U}_c(f)$. The columns $k'\neq k$ of $\mathbf{U}_c(f)$ form the intra-cluster (multiuser) interference for user $k$. All other signals $\mathbf{U}_{c'}(f)$, with $c'\in\mathcal{D}(f)$, $c'\neq c$, form the Inter-Cluster Interference (ICI). As seen in Section 4, intra-cluster interference and ICI are handled by a combination of beamforming and frequency reuse.

The CSIT is obtained on a per-slot basis, by letting all the scheduled (i.e., active) users in the slot send pilot signals over the $L_PN$ dimensions dedicated to uplink training. We fix $\{\mathcal{X},\mathcal{C}\}$ and focus on the $SN$ active users in the groups $\mathcal{X}+c$, $c\in\mathcal{D}(f)$. These users must send $SN$ orthogonal pilot signals to allow channel estimation at their corresponding serving clusters $\mathcal{C}+c$, $c\in\mathcal{D}(f)$.

Let $L_P = QS$, where $Q\ge 1$ is an integer pilot reuse factor that can be optimized for each $\{\mathcal{X},\mathcal{C}\}$. Let $\boldsymbol{\Phi}\in\mathbb{C}^{QSN\times QSN}$ be a scaled unitary matrix, such that $\boldsymbol{\Phi}^{\mathsf H}\boldsymbol{\Phi} = \alpha_{\rm ul}QSN\,\mathbf{I}_{QSN}$, where $\alpha_{\rm ul}$ denotes the uplink transmit power per user during the training phase. The columns of $\boldsymbol{\Phi}$ are partitioned into $Q$ disjoint blocks of $SN$ columns each, denoted by $\boldsymbol{\Phi}_0,\ldots,\boldsymbol{\Phi}_{Q-1}$ and referred to as training codebooks. These are assigned to the groups in a periodic fashion, such that the same training codebook $\boldsymbol{\Phi}_q$ is reused at every $Q$-th group $\mathcal{X}+c$, $c\in\mathcal{D}(f)$. For later use, we let $q(c)\in\{0,\ldots,Q-1\}$ denote the index of the training codebook allocated to group $\mathcal{X}+c$, and define $\mathcal{P}(q,f) = \{c\in\mathcal{D}(f) : q(c)=q\}$ as the set of clusters active on subband $f$ and using training codebook $q$. Pilot reuse is akin to frequency reuse, but in general $Q$ and $F$ may be different in order to allow for additional flexibility in the system optimization.

The uplink signal received by the $CMN$ antennas of cluster $\mathcal{C}+c$, $c\in\mathcal{D}(f)$, during the training phase, is given by

$$\mathbf{Y}_c(f) = \sum_{c'\in\mathcal{D}(f)} \boldsymbol{\Phi}_{q(c')}\,\underline{\mathbf{H}}^{\mathsf H}_{c',c}(f;\mathcal{X}) + \mathbf{Z}_c(f). \tag{6}$$

Because of TDD reciprocity, the uplink channel matrix $\underline{\mathbf{H}}_{c',c}(f;\mathcal{X})\in\mathbb{C}^{CMN\times SN}$ contains the downlink channels $\underline{\mathbf{h}}_{k,c',c}(f;x)$ arranged by columns, for all active users $k=1,\ldots,SN/m$ at all locations $x+c'$, $x\in\mathcal{X}$. In (Equation 6), $\mathbf{Z}_c(f)\in\mathbb{C}^{L_PN\times CMN}$ denotes the uplink AWGN with components $\sim\mathcal{CN}(0,1)$. The goal of the uplink training phase is to provide each cluster $\mathcal{C}+c$ with an estimate of the channel vectors $\underline{\mathbf{h}}_{k,c,c}(f;x)$ for all the active users in the corresponding served group $\mathcal{X}+c$.

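By projecting $\mathbf{Y}_c(f)$ onto the column of $\boldsymbol{\Phi}_{q(c)}$ associated with user $k$ at location $x+c$, $x\in\mathcal{X}$, and dividing by $\alpha_{\rm ul}QSN$, the relevant observation for estimating $\underline{\mathbf{h}}_{k,c,c}(f;x)$ is given by

$$\underline{\mathbf{r}}_{k,c}(f;x) = \sum_{c'\in\mathcal{P}(q(c),f)} \underline{\mathbf{h}}_{k,c',c}(f;x) + \underline{\mathbf{n}}_{k,c}(f) \tag{7}$$

where $\underline{\mathbf{n}}_{k,c}(f)\sim\mathcal{CN}(\mathbf{0},(\alpha_{\rm ul}QSN)^{-1}\mathbf{I}_{CMN})$. For any $c'\in\mathcal{P}(q(c),f)$, the MMSE estimate of $\underline{\mathbf{h}}_{k,c',c}(f;x)$ from $\underline{\mathbf{r}}_{k,c}(f;x)$ is obtained as

$$\hat{\underline{\mathbf{h}}}_{k,c',c}(f;x) = \mathbf{G}_{c',c}(x)\Big[(\alpha_{\rm ul}QS)^{-1}\mathbf{I}_{CMN} + \sum_{c''\in\mathcal{P}(q(c),f)}\mathbf{G}_{c'',c}(x)\Big]^{-1}\underline{\mathbf{r}}_{k,c}(f;x), \tag{8}$$

with $\underline{\mathbf{h}}_{k,c',c}(f;x) = \hat{\underline{\mathbf{h}}}_{k,c',c}(f;x) + \underline{\mathbf{e}}_{k,c',c}(f;x)$, where the channel estimate $\hat{\underline{\mathbf{h}}}_{k,c',c}(f;x)$ and the error vector $\underline{\mathbf{e}}_{k,c',c}(f;x)$ are zero-mean uncorrelated jointly complex circularly symmetric Gaussian vectors (and therefore statistically independent due to joint Gaussianity). After some straightforward algebra (omitted for brevity), we obtain the covariance matrices $\mathbb{E}[\hat{\underline{\mathbf{h}}}_{k,c',c}(f;x)\,\hat{\underline{\mathbf{h}}}^{\mathsf H}_{k,c',c}(f;x)] = \frac{1}{N}\boldsymbol{\Xi}_{c',c}(x)$ and $\mathbb{E}[\underline{\mathbf{e}}_{k,c',c}(f;x)\,\underline{\mathbf{e}}^{\mathsf H}_{k,c',c}(f;x)] = \frac{1}{N}\boldsymbol{\Sigma}_{c',c}(x)$, where $\boldsymbol{\Xi}_{c',c}(x) = \mathrm{diag}(\xi_{c',c,b}(f;x)\,\mathbf{I}_{MN} : b\in\mathcal{C})$ and $\boldsymbol{\Sigma}_{c',c}(x) = \mathrm{diag}(\sigma_{c',c,b}(f;x)\,\mathbf{I}_{MN} : b\in\mathcal{C})$, and where we define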

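$$\xi_{c',c,b}(f;x) = g(x+c',b+c)\,\frac{\gamma_{c',c,b}(f;x)}{1+\gamma_{c',c,b}(f;x)}, \tag{9}$$

$$\sigma_{c',c,b}(f;x) = \frac{g(x+c',b+c)}{1+\gamma_{c',c,b}(f;x)}, \tag{10}$$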

with

$$\gamma_{c',c,b}(f;x) = \frac{g(x+c',b+c)}{(\alpha_{\rm ul}QS)^{-1} + \sum_{c''\in\mathcal{P}(q(c),f)\setminus c'} g(x+c'',b+c)} \tag{11}$$

The desired channel estimate at cluster $\mathcal{C}+c$ is given by $\hat{\underline{\mathbf{h}}}_{k,c,c}(f;x)$, obtained by letting $c'=c$ in (Equation 8) through (Equation 11). Notice that the training phase observation $\underline{\mathbf{r}}_{k,c}(f;x)$ in (Equation 7) contains the superposition of all the channel vectors $\underline{\mathbf{h}}_{k,c',c}(f;x)$ of the users $k$ at location $x+c'$, $x\in\mathcal{X}$, for all $c'\in\mathcal{P}(q(c),f)$, i.e., of all users sharing the same pilot signal. This is the so-called pilot contamination effect, which is a major limiting factor in the performance of TDD systems [19]. Because of pilot contamination, the MMSE estimate $\hat{\underline{\mathbf{h}}}_{k,c,c}(f;x)$ is correlated with the channels $\underline{\mathbf{h}}_{k,c',c}(f;x)$, for all $c'\in\mathcal{P}(q(c),f)$.
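To make the effect of pilot contamination on estimation quality concrete, the following Python sketch evaluates the per-BS quantity $\gamma$ of (Equation 11) together with the estimate and error variances that follow from standard linear MMSE estimation applied to the observation (Equation 7). The closed forms $\xi = g\gamma/(1+\gamma)$ and $\sigma = g/(1+\gamma)$ are derived here under that Gaussian model and are stated as an assumption; the function name and argument layout are hypothetical.

```python
def mmse_coefficients(gains, target, alpha_ul, Q, S):
    """Per-BS MMSE quantities for one pilot shared by several users.

    gains[i]  : pathloss g(x + c_i, b + c) from the i-th co-pilot user to BS b + c
    target    : index in `gains` of the user whose channel is being estimated
    alpha_ul, Q, S : uplink power, pilot reuse factor, loading factor
    Returns (gamma, xi, sigma); all variances are up to the common 1/N scaling.
    """
    g = gains[target]
    noise = 1.0 / (alpha_ul * Q * S)      # normalized pilot-phase noise level
    contamination = sum(gains) - g        # co-pilot users other than the target
    gamma = g / (noise + contamination)   # estimation "SINR", as in (11)
    xi = g * gamma / (1.0 + gamma)        # variance of the channel estimate
    sigma = g / (1.0 + gamma)             # variance of the estimation error
    return gamma, xi, sigma
```

By construction `xi + sigma` equals the channel gain `g`, and increasing the gains of the co-pilot users lowers `gamma`, which shifts variance from the estimate to the error.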

Next, we express the channel vector $\underline{\mathbf{h}}_{k,c',c}(f;x)$ for $c'\in\mathcal{P}(q(c),f)$ in terms of the estimate $\hat{\underline{\mathbf{h}}}_{k,c,c}(f;x)$ and a component independent of $\hat{\underline{\mathbf{h}}}_{k,c,c}(f;x)$. This decomposition is useful to prove the main results of Theorems 1, 2 and 3 in Section 4.2, and it is the key to understanding qualitatively the pilot contamination effect. From (Equation 8), and since $\boldsymbol{\Sigma}_{c,c}(x)$ is invertible, we have
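The decomposition can be sketched as follows. This is a reconstruction under the Gaussian model of Section 3, not necessarily the exact form or numbering of the original displays; it assumes the standard linear MMSE estimator for the observation (Equation 7), strictly positive gains so that $\mathbf{G}_{c,c}(x)$ is invertible, and the shorthand $\mathbf{A}_c(x)$, which is introduced here only for compactness. Since all matrices involved are diagonal, they commute:

```latex
\begin{align*}
\mathbf{A}_c(x) &:= (\alpha_{\rm ul} Q S)^{-1}\mathbf{I}_{CMN}
      + \sum_{c'' \in \mathcal{P}(q(c),f)} \mathbf{G}_{c'',c}(x), \\
\hat{\underline{\mathbf{h}}}_{k,c',c}(f;x)
   &= \mathbf{G}_{c',c}(x)\,\mathbf{A}_c^{-1}(x)\,\underline{\mathbf{r}}_{k,c}(f;x), \\
\underline{\mathbf{r}}_{k,c}(f;x)
   &= \mathbf{A}_c(x)\,\mathbf{G}_{c,c}^{-1}(x)\,\hat{\underline{\mathbf{h}}}_{k,c,c}(f;x), \\
\underline{\mathbf{h}}_{k,c',c}(f;x)
   &= \mathbf{G}_{c',c}(x)\,\mathbf{G}_{c,c}^{-1}(x)\,\hat{\underline{\mathbf{h}}}_{k,c,c}(f;x)
      + \underline{\mathbf{e}}_{k,c',c}(f;x).
\end{align*}
```

The last line shows that the channel towards a co-pilot user in another cluster is a scaled copy of the desired estimate plus an error term, which is the mechanism behind pilot contamination.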

Joint Gaussianity, the mutual orthogonality of $\hat{\underline{\mathbf{h}}}_{k,c',c}(f;x)$ and $\underline{\mathbf{e}}_{k,c',c}(f;x)$, and the fact that all covariance matrices are diagonal imply that $\hat{\underline{\mathbf{h}}}_{k,c,c}(f;x)$ and $\underline{\mathbf{e}}_{k,c',c}(f;x)$ are mutually independent.

As anticipated, (Equation 13) reveals qualitatively the pilot contamination effect. With LSUBF, as in [18], cluster $\mathcal{C}+c$ serves user $k$ at location $x+c$ with beamforming vector $\hat{\underline{\mathbf{h}}}_{k,c,c}(f;x)/\|\hat{\underline{\mathbf{h}}}_{k,c,c}(f;x)\|$, which is strongly correlated with the channel vector $\underline{\mathbf{h}}_{k,c',c}(f;x)$ towards the unintended user $k$ at location $x+c'$, sharing the same pilot signal. It follows that some constant amount of interfering power, which does not vanish as $N\to\infty$, is sent in the “spatial direction” of this user, leading to an interference-limited system, as exactly quantified by Theorem 1 in Section 4.2. For the family of LZFBF schemes considered in this work, the pilot contamination effect is less intuitive, and it is precisely quantified by Theorems 2 and 3 in Section 4.2.

In the family of network-MIMO schemes considered in this work, the beamforming matrix $\mathbf{V}_c(f)$ is calculated as a function of the estimated channel matrix $\hat{\underline{\mathbf{H}}}_{c,c}(f;\mathcal{X})$. The schemes differ by the type of beamforming employed. In particular, we consider LZFBF where any active user $k$ at location $x+c$, $x\in\mathcal{X}$, imposes ZF constraints on $J\ge 0$ clusters. A ZF constraint consists of the set of linear equations

$$\mathbf{v}^{\mathsf H}_{j,c'}(f;x')\,\hat{\underline{\mathbf{h}}}_{k,c,c'}(f;x) = 0,\quad \forall\,(j,x',c')\neq(k,x,c) \tag{14}$$

where $\mathbf{v}_{j,c'}(f;x')$ denotes the column of $\mathbf{V}_{c'}(f)$ corresponding to user $j$ at location $x'+c'$, $x'\in\mathcal{X}$.

Next, we provide expressions for the cluster precoders for different choices of the parameter $J$.

Case $J=0$: In this case no ZF constraints are imposed. Hence, we have

$$\mathbf{V}_c(f) = \mathrm{UNorm}\{\hat{\underline{\mathbf{H}}}_{c,c}(f;\mathcal{X})\} \tag{15}$$

where the operation $\mathrm{UNorm}\{\cdot\}$ indicates a scaling of the columns of the matrix argument such that they have unit norm. It is immediate to see that (Equation 15) coincides with the Linear Single-User Beamforming (LSUBF) considered in [18].

Case $J=1$: In this case any active user imposes ZF constraints on its own serving cluster. This yields the classical single-cluster LZFBF, for which

$$\mathbf{V}_c(f) = \mathrm{UNorm}\{\hat{\underline{\mathbf{H}}}^{+}_{c,c}(f;\mathcal{X})\}, \tag{16}$$

where

$$\mathbf{M}^{+} = \mathbf{M}\,[\mathbf{M}^{\mathsf H}\mathbf{M}]^{-1} \tag{17}$$

denotes the (Hermitian transpose of the) Moore-Penrose pseudo-inverse of the full column-rank matrix $\mathbf{M}$. It follows that $\mathbf{v}_{k,c}(f;x)$ is orthogonal to the estimated channels $\hat{\underline{\mathbf{h}}}_{j,c,c}(f;x')$ for all other active users $(j,x')\neq(k,x)$ in the same cluster $\mathcal{C}+c$, i.e., ZF is used to tackle intra-cluster interference, but nothing is done with respect to ICI.

Case $J>1$: In this case, beyond the ZF constraints imposed on the serving cluster, each user imposes additional ZF constraints on $J-1$ neighboring clusters in order to mitigate the ICI. Mitigating ICI through the beamforming design provides an alternative approach to frequency reuse and, in general, might be used jointly with frequency reuse. Consider cluster $\mathcal{C}+c$. This is subject to ZF constraints imposed by its own users (i.e., users in group $\mathcal{X}+c$), as well as by some users at some locations $x'+c'$, $x'\in\mathcal{X}$, for $J-1$ neighboring clusters $c'\neq c$. In order to enable such constraints, the $c$-th cluster controller must be able to estimate the channels of these out-of-cluster users. This can be done if these users employ training codebooks with indices $q\neq q(c)$. In particular, $J>1$ can be used only if the pilot reuse factor $Q$ is larger than 1. In some cases, only the channel subvectors to the nearest BS in the cluster can be effectively estimated, since there are other users sharing the same pilot signal that are received with a stronger path coefficient. Then, the channel subvectors that cannot be estimated are treated as zero. Since these schemes are complicated to explain in full generality, we shall illustrate two specific examples, the generalization of which is cumbersome but conceptually straightforward, and can be worked out by the reader if interested in other specific cases.

Letting $R^{(N)}_{k,c}(f;x)$ denote the spectral efficiency (in bit/s/Hz) of user $k$ at location $x+c$, $x\in\mathcal{X}$, served by cluster $c$ according to a scheme as defined above, we define the group spectral efficiency of bin $v(\mathcal{X})$ as

As a corollary of Theorem ?, we can recover the result of [18]. It is sufficient to let M→∞ in ( ?) and obtain the regime of infinite number of BS antennas per active user. Particularizing this for fixed S, C=1, and Q=1 as in [18], the group spectral efficiency becomes

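$$\lim_{M\to\infty} R_{\mathcal{X},\{0\}}(F,1,0) = \frac{S}{mF}\sum_{x\in\mathcal{X}} \log\left(1 + \frac{g(x,0)^2}{\sum_{c\in\mathcal{P}(0,0)\setminus 0} g(x,c)^2}\right) \tag{19}$$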

As observed in [18], in this regime the system spectral efficiency is limited only by the ICI due to pilot contamination.
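A minimal Python sketch of (Equation 19) is given below, to show how the pilot-contamination-limited rate is evaluated once the pathloss values are known. The base-2 logarithm (for bit/s/Hz) and the input layout are assumptions of this sketch, and the function name is hypothetical; the pathloss values must be supplied by the caller.

```python
import math


def massive_mimo_limit(S, F, desired, interfering):
    """Evaluate the M -> infinity group spectral efficiency of (19).

    desired[i]     : g(x_i, 0), gain from location x_i to its serving BS
    interfering[i] : list of g(x_i, c) for the other cells c reusing the pilot
    S, F           : loading factor and frequency reuse factor
    """
    m = len(desired)  # number of locations in the bin
    total = 0.0
    for g0, others in zip(desired, interfering):
        # Signal-to-interference ratio set by pilot contamination alone.
        sir = g0 ** 2 / sum(g ** 2 for g in others)
        total += math.log2(1.0 + sir)
    return S / (m * F) * total
```

Note that the noise level and the number of antennas do not appear: only the ratios of squared pathloss values matter in this limit.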

The next result yields the achievable group spectral efficiency of LZFBF in the case of single-cell processing (i.e., for $C=1$). We define $\mathcal{E}(x)$ as the set of $J-1$ clusters $c\neq 0$ with centers closest to $x\in\mathcal{X}$ (if $J=1$ then $\mathcal{E}(x)=\emptyset$). Then, we have:

In passing, we notice that the limit of ( ?) for M→∞, coincides with (Equation 19). Therefore, as observed in [18], in the “Massive MIMO” regime LZFBF yields no advantage over the simpler LSUBF.

The case of LZFBF with multicell processing ($C>1$) needs some more notation. First, as illustrated in Examples ? and ?, we consider the cases $J=1$, $J=Q$ and $J=C(Q-1)+1$, referred to as cases (a), (b) and (c), respectively, for the sake of brevity. In case (c) it is useful to define $b(x,c) = \arg\min\{d_\Lambda(x,c+b) : b\in\mathcal{C}\}$, i.e., the closest BS to location $x\in\mathcal{X}$ in cluster $c\in\mathcal{E}(x)$. For $C>1$, an exact asymptotic ICI power expression cannot be found, because pilot contamination induces a complicated statistical dependence between beamforming vectors and channel vectors. However, the following result yields an achievable rate based on an upper bound on the ICI power (see details in Appendix Section 9):

Consider a system with $K$ bins, $\{v(\mathcal{X}_0),\ldots,v(\mathcal{X}_{K-1})\}$, defined by sets $\mathcal{X}_k$ of symmetric locations chosen to uniformly discretize the cellular coverage region $\mathcal{V}$. The net bin spectral efficiency in bit/s/Hz, for each bin $v(\mathcal{X}_k)$, is obtained by maximizing over all possible schemes in the family, i.e., over all possible clusters $\mathcal{C}$ of size $C=1,2,\ldots$, frequency reuse factor $F$, loading factor $S$, pilot reuse factor $Q$, and beamforming scheme indicated by $J$, the product

$$\max\{1 - QS/L,\,0\}\times R_{\mathcal{X}_k,\mathcal{C}}(F,C,J) \tag{20}$$

where the first term takes into account the pilot dimensionality overhead, and the second term is the spectral efficiency of the data phase for a given network-MIMO scheme, given by Theorems ?, ? or ?, depending on the case. The maximization of (Equation 20) is subject to the constraint JS≤CM, which becomes relevant for J>0 (i.e., for LZFBF precoding). Maximizing (Equation 20) requires searching over a discrete parameter space (apart from S, which is continuous). The simple closed-form expressions given in Theorems ?, ? and ? allow for an efficient system optimization, avoiding lengthy Monte Carlo simulations.
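The search described above can be sketched in Python as follows. The callable `data_rate(F, C, J, Q, S)` is a hypothetical placeholder for the closed-form data-phase spectral efficiency given by the theorems, the candidate lists and the grid resolution `s_steps` are illustrative assumptions, and the continuous loading factor $S$ is simply sampled on a uniform grid.

```python
from itertools import product


def best_scheme(data_rate, M, L, clusters, reuse, pilots, zf_levels, s_steps=50):
    """Grid search maximizing max{1 - Q*S/L, 0} * data_rate(F, C, J, Q, S)."""
    best_value, best_params = 0.0, None
    for C, F, Q, J in product(clusters, reuse, pilots, zf_levels):
        if J > 1 and Q == 1:
            continue  # the text notes that J > 1 requires pilot reuse Q > 1
        # S ranges over (0, C*M], tightened to J*S <= C*M when J > 0.
        s_max = C * M / J if J > 0 else C * M
        for i in range(1, s_steps + 1):
            S = s_max * i / s_steps
            overhead = max(1.0 - Q * S / L, 0.0)  # pilot dimensionality cost
            value = overhead * data_rate(F, C, J, Q, S)
            if value > best_value:
                best_value = value
                best_params = {"C": C, "F": F, "Q": Q, "J": J, "S": S}
    return best_value, best_params


def toy_rate(F, C, J, Q, S):
    """Toy stand-in for the theorems: rate grows linearly in S, shrinks with F."""
    return S / F


value, params = best_scheme(
    toy_rate, M=4, L=4, clusters=[1], reuse=[1, 3], pilots=[1], zf_levels=[0]
)
```

With the toy rate, the objective is $(1 - S/4)\,S/F$, so the search returns $F=1$ and $S=2$, which illustrates the trade-off between serving more users and spending more of the coherence block on pilots.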

Suppose that for each bin v(\Xck), the best scheme in the family of network-MIMO schemes is found, and let R⋆(\Xck) denote the corresponding maximum of (Equation 20). Then, a scheduler allocates the different bins on the time-frequency slots in order to maximize some desired network utility function of the user rates. With randomized or round-robin selection of the active users in each bin, each user in bin v(\Xck) shares on average an equal fraction of the product ρkR⋆(\Xck), where ρk is the fraction of time-frequency slots allocated to bin v(\Xck). Under the assumption that users in the same bin should be treated with equal priority, we can focus on the maximization of a componentwise non-decreasing concave network utility function of the bin spectral efficiencies, denoted by \Gc(R0,…,RK−1). The scheduler determines the fractions {ρk} by solving the following convex problem:

With proportional fairness (PF) scheduling (Equation 22), the solution gives the bin time-frequency sharing fractions ρk=1/K (each bin is given an equal share of the slots). In contrast, if the minimum user rate is relevant, we can impose max-min fairness by considering the function

$$ \Gc(R_0,\ldots,R_{K-1}) = \min_{k=0,\ldots,K-1} R_k. \qquad (23) $$

This results in the bin time-frequency sharing fractions ρk = [1/R⋆(\Xck)] / [∑_{j=0}^{K−1} 1/R⋆(\Xcj)]. More generally, a whole family of scheduling rules including (Equation 22) and (Equation 23) as special cases is obtained by using the so-called α-fairness network utility function, as defined in [29].
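The two sharing rules quoted in the text, and the way an α-fair rule interpolates between them, can be checked with a few lines of Python. This is a sketch under assumptions: the per-bin optima `R_star` are made-up numbers, and the α-fair closed form ρk ∝ R⋆k^((1−α)/α) is derived here from the standard α-fair utility with Rk = ρk R⋆k and ∑ρk = 1, not quoted from the paper or from [29].

```python
def equal_shares(R_star):
    """Equal sharing, rho_k = 1/K (the fractions the text gives for Equation 22)."""
    K = len(R_star)
    return [1.0 / K] * K


def max_min_shares(R_star):
    """Max-min fairness: rho_k proportional to 1/R_star[k], so all rho_k*R_star[k] coincide."""
    inv = [1.0 / r for r in R_star]
    total = sum(inv)
    return [v / total for v in inv]


def alpha_fair_shares(R_star, alpha):
    """Assumed alpha-fair rule: rho_k proportional to R_star[k] ** ((1 - alpha) / alpha)."""
    w = [r ** ((1.0 - alpha) / alpha) for r in R_star]
    total = sum(w)
    return [v / total for v in w]


def bin_rates(shares, R_star):
    """Bin spectral efficiencies rho_k * R_star[k] delivered by the scheduler."""
    return [rho * r for rho, r in zip(shares, R_star)]


R_star = [8.0, 4.0, 2.0]  # hypothetical per-bin optima of (20), bit/s/Hz
print(bin_rates(equal_shares(R_star), R_star))    # unequal bin rates
print(bin_rates(max_min_shares(R_star), R_star))  # equalized bin rates
print(alpha_fair_shares(R_star, alpha=1.0))       # coincides with equal shares
```

Setting `alpha=1` recovers the equal shares, and letting `alpha` grow large approaches the max-min fractions, which is the sense in which both rules are special cases of one family.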

In this section, we present some illustrative numerical results showing the following facts: 1) the asymptotic large-system analysis yields a very accurate approximation of the performance (obtained by Monte Carlo simulation) of actual finite-dimensional systems; 2) the proposed architecture, based on partitioning the user population into homogeneous bins and serving each bin with a specifically tailored network-MIMO scheme, provides significant gains with respect to the “Massive MIMO” scheme of [18], in the relevant regime of a finite number of BS antennas per active user.

At this point, it is worthwhile to make a comment on the convergence of finite-dimensional systems to the large-system limit as N→∞. The approach of analyzing multiuser communication systems affected by random parameters (such as random channel matrices or random spreading matrices in CDMA) in the limit of large dimension in order to exploit the rich, powerful and elegant theory of limiting distributions of large random matrices [20] was pioneered in [40] in the case of random-spreading CDMA, and successfully applied to single-user MIMO channels (see for example [42]) and to network-MIMO cellular systems [33]. It was observed experimentally and proved mathematically (e.g., see [22]) that the convergence of the actual finite-dimensional system spectral efficiency to the corresponding large-system limit is very fast, as the system dimension N increases. In particular, well-known techniques can be used to analyze the “fluctuation” of the quantities of interest around their large-system limit for large but finite N. Typically, finite-N “concentration” results are analogous to the Central Limit Theorem for i.i.d. random variables, but the convergence is much faster owing to the fact that the eigenvalues of the matrices appearing in the spectral efficiency expressions are strongly correlated (see for example the discussion of the results in [23]). Since this convergence analysis is standard but cumbersome, and invariably points out that the large-system results are very good predictions of the actual performance in cases of practical interest, here we focused only on the limit for N→∞ and provided a comparison with finite-dimensional simulation in order to corroborate our claims.

Figure 5 shows the group spectral efficiency in (Equation 20) as a function of the bin locations within a cell for different schemes identified by the parameters (F,C,J) and Q. The group spectral efficiency obtained by Monte Carlo simulation (dotted) is compared against the corresponding values from the closed-form large-system analysis (solid), for the 1-dimensional cell layout of Figure 2 with B=24 BSs, M=30 antenna factor per BS, L=40 coherence block dimension factor, and K=10 bins in each cluster, where clusters and location bins are given in Example ? and ?, with x uniformly distributed in [0,1/2]. The pathloss model is the same as in [28], where g(x,b)=G0/(1+(dΛ(x,b)/δ)^α), with G0=10^6, α=3.76, and δ=0.05, and reflects (after suitable normalizations) a typical cellular scenario with 1 km diameter cells in a suburban environment. The (1,1,1) scheme with Q=1 yields the best performance for locations near the cell center. However, at the cell edges, C=2, J=2, or F=2 (not included in the figure) attains significantly better performance. As anticipated above, the limit for N→∞ matches the Monte Carlo simulation very accurately even for very small N (we used the minimum possible N=1 in this case). For this reason, in the following we present only the results for the large-system limit, obtained using the closed-form expressions of Theorems ?, ? and ?.
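For concreteness, the pathloss law quoted above can be evaluated as follows. This is a small sketch under assumptions: the function takes the user-to-BS distance d directly as input (the modulo-lattice distance dΛ(x,b) is defined elsewhere in the paper and is not recomputed here), G0 is taken as 10^6, which is an assumption about the intended value, and the function name and default arguments are illustrative.

```python
def pathloss(d, G0=1e6, alpha=3.76, delta=0.05):
    """Pathloss gain g = G0 / (1 + (d / delta) ** alpha) at distance d.

    d and delta must be expressed in the same (normalized) distance unit.
    """
    return G0 / (1.0 + (d / delta) ** alpha)


# Gain at a few normalized distances from the BS, using the 1-D layout parameters.
for d in (0.0, 0.05, 0.25, 0.5):
    print(d, pathloss(d))
```

At d = δ the gain is exactly half of G0, which makes δ the 3 dB breakpoint of the model; beyond it the gain decays roughly as d^(−α).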

In the 2-dimensional case, we considered the layout with B=19 hexagonal cells as shown in Figure 1. For comparison, we assume the same system model as in [18], with channel coherence block dimension, cell radius, and pathloss model given by LP=84, 1.6 km, and g(x,b) in the same form as before, with parameters G0=10^6, δ=0.1 km, and α=3.8, respectively. Log-normal shadowing, considered in [18], is not considered here (see the comment in Section 7). We considered schemes with cluster size C=1 and C=3, and K=16 bins with 48 user locations, where the cluster and bin layouts are qualitatively described in Figure 2. The frequency reuse factor F and pilot reuse factor Q are selected between 1 and 3 and, when F or Q=3, the frequency subbands or training codebooks are allocated to clusters as shown in Figure 1, where different colors denote different subbands or training codebooks. Fig. ? illustrates the optimum over the family of network-MIMO schemes for (a) M=20 and (b) M=100. In both cases, (1,1,1) is optimal in the inner part of the cell, but schemes with (3,3,1) or (3,1,1) yield better performance for locations near the cell boundary. We notice also that the inner area within which the (1,1,1) scheme is the best increases with the BS antenna factor M.

Next, we compare the performance of the proposed architecture with the one advocated in [18]. Figure 6 shows the bin-optimized spectral efficiency normalized by the spectral efficiency of the (1,1,0), Q=1 scheme (corresponding to [18]), under the two-dimensional layout with M=50. The gain of the proposed architecture ranges from about 40% to 580%, depending on the users’ location. Figure 7 shows the system throughput as a function of M in the two-dimensional layout. It shows the throughput obtained for fixed parameters in the considered family of network-MIMO schemes, as well as for the bin-optimized mixed-mode architecture, in which the scheduler chooses the bin and the associated network-MIMO scheme as described in Section 5, and compares both with the reference performance of the (1,1,0), Q=1 scheme. The cluster scheme includes two cases: the cluster pattern is either fixed as one of the two shown in Figure 2, or can be switched to the closest one depending on the user locations. The system throughput of Figure 7 is obtained under PF scheduling (see (Equation 22)). For the sake of comparison, we assumed 20 MHz bandwidth and the coherence block size L=84 as in [18] (considering the parameters of the 3GPP LTE TDD system). As the figure reveals, the (3,3,1) schemes perform very well for small M<20 while, as M increases, the (1,1,1) scheme is best. The bin-optimized architecture improves the throughput further at any value of M. The dotted horizontal line in Figure 7 denotes the cell throughput claimed in [18] in the limit of an infinite number of transmit antennas per user with the (1,1,0), Q=1 scheme. We notice that this limit is approached very slowly, and more than 10000 antennas per BS are required (clearly impractical). For a finite number of antennas, the proposed architecture achieves the same throughput as the scheme in [18] with a 10-fold reduction in the number of antennas at the base stations (roughly, from 500 to 50 antennas, as indicated by the arrow).

We studied a novel network-MIMO TDD architecture that achieves spectral efficiencies comparable with the recently proposed “Massive MIMO” scheme, with one order of magnitude fewer antennas per active user per cell. The proposed strategy operates by partitioning the user population into geographically determined “bins”. The time-frequency scheduling slots are allocated to the bins in order to form independent MU-MIMO transmissions, each of which is optimized for the corresponding bin. This strategy allows the uplink training reuse factor, the frequency reuse factor, the active user loading factor, the BS cooperative cluster size and the type of MU-MIMO linear beamforming to be finely tailored to the particular user bin. We considered system optimization over 1-dimensional and 2-dimensional cell layouts, based on a family of network-MIMO schemes ranging from single-cell processing to joint processing over clusters of coordinated BSs, with linear precoders ranging from conventional linear single-user beamforming to zero-forcing beamforming with additional zero-forcing constraints for neighboring cells. In order to carry out the system optimization, we developed efficient closed-form expressions for the achievable spectral efficiency for each scheme in the family and each bin in the cellular layout. Our closed-form analysis is based on the large-system limit, where all system dimensions scale to infinity with fixed ratios, and makes use of recent results (by some of the authors of this paper) on the analysis of cellular systems with linear zero-forcing beamforming and channel estimation errors [11]. The performance predicted by the large-system asymptotic analysis is shown to match very well with finite-dimensional simulations. Our numerical results show that different schemes in the considered family achieve the best spectral efficiency at different user locations. This suggests the need for a location-adaptive scheme selection to serve the whole coverage region efficiently. The resulting overall system is therefore a “mixed-mode” network-MIMO architecture, where different schemes, each of which is optimized for the corresponding user bin, are multiplexed in the time-frequency plane.

As a final remark, it is worthwhile to point out that the approach of partitioning the users into homogeneous sets, serving each set according to a specifically optimized scheme, and using a scheduler to multiplex different schemes in order to maximize some desired network utility function, can be generalized to the case of shadowing, and to the case of users with different mobility. This generalization is, however, non-trivial. For example, in the presence of slow frequency-flat shadowing, “bins” are no longer uniquely determined by the users’ geographic positions. Rather, the set of large-scale channel gains (including shadowing) should be used to classify the users into equivalence classes. Also, in the presence of users with different mobility, users should be classified on the basis of their different channel coherence block lengths as well. The issue of how to optimally cluster users into equivalence classes that can be efficiently served in parallel by MU-MIMO spatial multiplexing represents an interesting and important problem for future work.

We focus on the reference cluster \Cc (i.e., c=0), with corresponding served group of locations \Xc={x0,…,xm−1}. For notational simplicity, we omit the subchannel index f, and let \Dc denote the set of clusters active on the same subchannel as cluster 0, and \Pc denote the set of clusters that share the same pilot block as cluster 0. From (Equation 5), the (scalar) signal received at some symbol interval of the data phase, at the k-th active user receiver at location x∈\Xc, is given by

where uj,c′(x′) denotes the code symbol transmitted by cluster c′, to user j at location x′+c′:x′∈\Xc. With LSUBF downlink precoding, we have

$$ \vv_{j,c'}(x') = \left\| \hat{\underline{\hv}}_{j,c',c'}(x') \right\|^{-1} \hat{\underline{\hv}}_{j,c',c'}(x') \qquad (24) $$

Using the MMSE decomposition (Equation 9), we isolate the useful signal term from ( ?), given by

$$ u_{k,0}(x)\, \vv_{k,0}^{\herm}(x)\, \hat{\underline{\hv}}_{k,0,0}(x). \qquad (25) $$

The sum of the residual self-interference term due to the channel estimation error with the signals in ( ?) transmitted by cluster 0 to the other users, results in the intra-cluster interference term

Both numerator and denominator of the Signal-to-Interference plus Noise Ratio (SINR) appearing inside the log in (Equation 27) converge to deterministic limits as N→∞. We will use extensively the representation of the channel MMSE estimates as

$$ \hat{\underline{\hv}}_{j,c,c'}(x') = \frac{1}{\sqrt{N}}\, \Xim_{c,c'}^{1/2}(x')\, \av_{j,c,c'}(x') \qquad (28) $$

where the vectors \avj,c,c′(x′) are i.i.d. ∼\Cc\Nc(\zerov,\IdCMN), with generic components denoted by {an,b:n=1,…,MN} for all b∈\Cc. We will also make use of the following limit, which follows as a direct application of the strong law of large numbers:
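The kind of concentration invoked here can be illustrated numerically. The sketch below is not the paper's limit, which is not reproduced in this section; it only checks the generic strong-law statement that (1/N)∑_{n=1}^{MN}|a_n|² approaches M when the a_n are i.i.d. CN(0,1). It assumes an integer M for simplicity, and the function name and the values of M and N are illustrative.

```python
import random


def normalized_sq_norm(M, N, rng):
    """Return (1/N) * sum_{n=1}^{M*N} |a_n|^2 for i.i.d. a_n ~ CN(0, 1)."""
    sigma = 0.5 ** 0.5  # real and imaginary parts are independent N(0, 1/2)
    total = 0.0
    for _ in range(M * N):
        re = rng.gauss(0.0, sigma)
        im = rng.gauss(0.0, sigma)
        total += re * re + im * im
    return total / N


rng = random.Random(0)  # fixed seed so the run is repeatable
for N in (1, 10, 100, 1000):
    print(N, normalized_sq_norm(M=4, N=N, rng=rng))
```

The standard deviation of this quantity is sqrt(M/N), so the printed values should settle around M=4 as N grows.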

Next, we notice that all the terms forming interference and noise are uncorrelated. Hence, the conditional average interference power can be calculated as a sum of individual terms. The self-interference due to non-ideal CSIT is given by

Following very similar calculations (omitted for brevity) and recalling that g(x,b)=ξ0,0,b(x)+σ0,0,b(x) (see ( ?)) and that SN/m users per location x′∈\Xc are active, we obtain the intra-cluster interference power terms as

$$ \frac{1}{mC} \sum_{x'\in\Xc} \frac{\sum_{b\in\Cc} \xi_{0,0,b}(x')\, g(x,b)}{\bar{\xi}_{0,0}(x')} \qquad (31) $$

Next, we consider the ICI power term. In doing so, we must pay attention to the pilot contamination effect. In particular, we have to separate all contributions in ( ?) coming from the k-th beam of clusters c′∈\Pc (i.e., for users sharing the same pilot signal of the reference user k at x∈\Xc), from the rest. The two contributions to the ICI are

$$ \Ic_{\text{same pilot}} = \sum_{c'\in\Pc\setminus 0} u_{k,c'}(x)\, \vv_{k,c'}^{\herm}(x)\, \underline{\hv}_{k,0,c'}(x) \qquad (32) $$

and

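$$
\begin{aligned}
\Ic_{\text{no same pilot}} ={}& \sum_{c'\in\Pc\setminus 0} \sum_{j\neq k} u_{j,c'}(x)\, \vv_{j,c'}^{\herm}(x)\, \underline{\hv}_{k,0,c'}(x) \\
&+ \sum_{c'\in\Pc\setminus 0} \sum_{x'\in\Xc\setminus x} \sum_{j} u_{j,c'}(x')\, \vv_{j,c'}^{\herm}(x')\, \underline{\hv}_{k,0,c'}(x) \\
&+ \sum_{c'\in\Dc-\Pc} \sum_{x'\in\Xc} \sum_{j} u_{j,c'}(x')\, \vv_{j,c'}^{\herm}(x')\, \underline{\hv}_{k,0,c'}(x)
\end{aligned}
\qquad (33)
$$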

Both $\Ic_{\text{same pilot}}$ and $\Ic_{\text{no same pilot}}$ are independent of $\hat{\underline{\hv}}_{k,0,0}(x)$. Therefore, the conditioning in the expectation can be omitted. Each individual term appearing in the sum (Equation 33) yields