Research

Cloud Computing

Cloud Computing, such as Amazon's EC2,
offers many value propositions: seemingly "infinite" computation
capacity, available on demand, and paid for only as it is used. Many
applications from Accenture's Enterprise clients can benefit from
these value propositions. However, we believe one of
the most attractive uses is outsourcing large seasonal computation
demands. Through a couple of client examples, we have shown that
outsourcing to the Cloud can reduce infrastructure cost by two orders
of magnitude. Unfortunately, mapping an
Enterprise application to the Cloud is not easy because of the differences
in infrastructure architecture. Inspired by Google's MapReduce,
we developed GridBatch, a system that makes it easy to port large-scale
data-intensive Enterprise batch applications to the Cloud. While MapReduce
is designed for web applications (such as word count and computing reverse
links), GridBatch is designed for Enterprise data-analysis
applications. We intend to build more capabilities into GridBatch to
free programmers from reimplementing common primitives. The capabilities in
version 1.0 are described in CCGRID'08[PDF] and those in 2.0 in Cluster'09[PDF].
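As a point of reference for the MapReduce programming style mentioned above, here is a toy, single-process sketch of the canonical word-count example. It illustrates only the map/reduce model itself; it is not GridBatch code, and the function names are ours.

```python
from collections import defaultdict

def map_phase(documents):
    # Map step: emit a (word, 1) pair for every word occurrence,
    # as in MapReduce's canonical word-count example.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    # Reduce step: sum the counts emitted for each word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["the cloud", "the grid and the cloud"]
print(reduce_phase(map_phase(docs)))
```

In a real MapReduce or GridBatch deployment the map and reduce steps run on many machines with a shuffle in between; the single-process version above only shows the shape of the computation.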

Another
common use of the Cloud is to host a web presence. Since the traffic
demand is rarely known up front, the Cloud provides the ability to
dynamically shrink and grow the infrastructure in response to traffic
fluctuations. The WebScalar project conducts performance evaluations of
the cloud and designs an optimal web-server-farm architecture in the cloud.
More details are in CloudCom'09[PDF] and ACM SAC'10[PDF].

Similar
to a compute cloud, we have also built a network cloud for internal
use. It shares many attributes with a compute cloud, such as on-demand
provisioning, pay-per-use pricing, and full programmability. Full details
are described in SIGCOMM WREN'09[PDF].

Tradeoffs between optical and electronic switching

Unlike in the 70's, switching at a node is the bottleneck in today's
Internet, not the link capacity. To overcome the switching bottleneck, the
Internet backbone transport network will likely employ a combination of
electronic and optical switching in order to reduce the total equipment cost.
The cost of electronic switching increases as more traffic is switched. On the
other hand, the cost of optical switching is independent of the amount of
traffic carried in a wavelength. As a result, optical switching should be used
when there is a large amount of traffic to be sent, in order to amortize the
high cost of establishing all-optical connections, and electronic switching
should be used when the traffic is small. We derive, for the first time, the
quantitative relationship between optical and electronic switching. We show that
their relationship follows a power law, and we quantify the power-law
exponents for many different types of topologies. Understanding the tradeoff not
only allows network designers to choose the best combination of optical and
electronic switching, but also allows them to compare many alternative fiber
topologies in terms of switching cost. (Infocom'06[PDF])
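The tradeoff above can be sketched numerically: electronic cost grows with the amount of traffic switched, while the optical (wavelength) cost is fixed, so the cheaper option flips at a crossover traffic level. The cost figures below are illustrative assumptions, not numbers from the paper.

```python
def cheaper_switching(traffic_gbps,
                      elec_cost_per_gbps=10.0,       # assumed unit cost
                      optical_cost_per_lambda=60.0):  # assumed fixed cost
    """Pick the cheaper technology for a given amount of transit traffic."""
    # Electronic switching cost scales with the traffic it switches;
    # optical switching cost is per wavelength, independent of how much
    # traffic that wavelength actually carries.
    electronic = elec_cost_per_gbps * traffic_gbps
    optical = optical_cost_per_lambda
    return "electronic" if electronic <= optical else "optical"
```

With these assumed costs the crossover sits at 6 Gb/s: below it, paying per switched bit is cheaper; above it, the fixed all-optical connection amortizes.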

Grooming heterogeneous traffic in WDM/SONET rings

SONET remains the dominant transport network in the Metropolitan area. A
natural migration path to support traffic growth is to use WDM technology where
each wavelength serves as a separate SONET ring. In general, the traffic demand
between a pair of nodes is typically small compared to the wavelength capacity.
If we could intelligently "groom" the traffic, large savings in
electronic equipment could be achieved. For simplicity, most prior work
assumes that only one line speed is available; for example, a 10G box must be
installed even if there is only 1 bit/s to send, simply because it is the only
available box. We show that large savings in electronic equipment cost can be
achieved by using a mix of line speeds. A higher line speed should be used when
the traffic is large, to enjoy the economy of scale, and a lower line speed
should be used when the traffic is small, to reduce equipment cost. Given the
high cost of deploying a WDM/SONET network, it is very desirable to solve the
traffic grooming problem optimally. Even though the traffic grooming problem is
clearly NP-complete, this is possible because the SONET standard limits the
ring to at most 16 nodes. We propose several computational techniques that
exploit the problem structure, allowing the traffic grooming problem to be
quickly solved to optimality. (JSAC'07[PDF], Infocom'05[PDF], Globecom'05[PDF],
ICC'04[PDF])
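The mixed-line-speed argument can be illustrated with a small search over hypothetical line cards whose cost per Gb/s falls with speed. The speeds and costs below are made up for illustration, and this brute-force toy is far simpler than the ring-grooming formulations solved in the papers.

```python
from itertools import product

# Hypothetical line cards: (speed in Gb/s, cost). Cost per Gb/s falls
# with speed -- the economy of scale that mixing line speeds exploits.
CARDS = [(1, 1.0), (2.5, 2.0), (10, 5.0)]

def min_cost_mix(demand_gbps):
    """Cheapest combination of cards whose total speed covers the demand."""
    best_cost, best_counts = None, None
    max_n = int(demand_gbps) + 1  # never need more cards of one type
    for counts in product(range(max_n + 1), repeat=len(CARDS)):
        speed = sum(n * s for n, (s, _) in zip(counts, CARDS))
        cost = sum(n * c for n, (_, c) in zip(counts, CARDS))
        if speed >= demand_gbps and (best_cost is None or cost < best_cost):
            best_cost, best_counts = cost, counts
    return best_cost, best_counts
```

A 1 Gb/s demand is served by a single cheap 1G card, while a 10 Gb/s demand is cheapest on one 10G card rather than ten 1G cards, which is exactly the small-traffic/large-traffic split argued above.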

Designing load-balanced backbone network with performance guarantee

Designing an Internet backbone network is hard, mostly because it is hard to
predict the traffic matrix at design time and because the traffic matrix changes
over time. Today's networks are designed in an ad-hoc fashion, and there is no
guarantee that the network can support a traffic matrix that deviates from the
original estimate. The Valiant Load-Balancing architecture is a promising
solution, as
it can support any traffic matrix at the small cost of over-provisioning and
higher delay. Even though it is hard to forecast the traffic matrix at design
time, it is quite easy to estimate the traffic matrix at operation time. Armed
with the current traffic matrix, we can use "direct routing" to overcome one of the
shortcomings of the Valiant Load-Balancing architecture: the longer delay. We
show how to maximize directly routed traffic, hence minimizing delay. We prove
that direct routing is not always feasible in the current Valiant Load-Balancing
architecture, but that a slight increase in link capacity is sufficient to
guarantee feasibility. (Globecom'05[PDF])
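For background, the standard Valiant Load-Balancing capacity result can be computed directly: in a full mesh of N nodes, each with access rate r, splitting every flow evenly over all nodes in two phases lets each link support any admissible traffic matrix with only 2r/N of capacity. This is the textbook VLB bound, not the direct-routing extension contributed by the paper.

```python
def vlb_link_capacity(n_nodes, access_rate):
    # Phase 1 spreads each node's outgoing traffic (at most r) evenly
    # over the N nodes; phase 2 delivers it to the destinations. Each
    # directed link therefore carries at most r/N per phase, or 2r/N
    # in total, for *any* traffic matrix with row/column sums <= r.
    return 2.0 * access_rate / n_nodes
```

The over-provisioning cost mentioned above is visible here: total mesh capacity is N(N-1) * 2r/N, roughly twice the Nr of access capacity, in exchange for supporting every admissible traffic matrix.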

Exploiting dense parallelism in commodity hardware

To achieve high performance, instead of implementing an algorithm in software,
one can design it into a custom ASIC. There are several drawbacks to this
approach: 1) it is costly, not only because of the custom design, but also
because of the typically low volume, which cannot effectively amortize the high
fixed cost; 2) it is not flexible enough to handle requirement changes; 3) the
increase in speed is limited because hardware parallelism is not fully
exploited. Another approach is to use commodity
hardware, such as Ternary Content Addressable Memory (TCAM). There are several
advantages to using TCAM: 1) the hardware design is simple and can be easily
optimized. TCAM is made up of small cells (e.g., a 6-transistor
cell), which can be hand-optimized for speed, power, and
area, and this cell can be replicated easily to create a larger chip; 2) being a
commodity, the cost can be very low; 3) it has very high parallelism, as all
bits work in parallel in each cycle. Naturally, there are also disadvantages to
using TCAM. For example, the rigid structure means that many applications cannot
be mapped to the hardware directly. This problem can be overcome by designing
innovative algorithms. We showed how to use TCAM more efficiently (accommodating
more routes) to perform route lookup (HOTi'01[PDF] or
IEEE Micro'02[PDF]), and we also showed how to use
TCAM to perform packet classification, in particular how to handle range classifiers
(HOTi'02[PDF]).
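One reason range classifiers are awkward for TCAM is that each range must first be expanded into ternary prefixes, inflating the number of entries (a range on a W-bit field can need up to 2W-2 prefixes). The sketch below is the standard minimal range-to-prefix expansion, shown only to make the problem concrete; it is not the scheme from HOTi'02.

```python
def range_to_prefixes(lo, hi, width=16):
    """Split the inclusive integer range [lo, hi] on a `width`-bit field
    into a minimal list of (value, prefix_length) binary prefixes."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block aligned at lo...
        size = lo & -lo if lo else 1 << width
        # ...shrunk until it fits entirely inside [lo, hi].
        while size > hi - lo + 1:
            size >>= 1
        prefixes.append((lo, width - size.bit_length() + 1))
        lo += size
    return prefixes
```

For example, the range [1, 3] on a 4-bit field becomes the two prefixes 0001 and 001*, each of which maps directly to one TCAM entry with don't-care bits.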

Caching routes in Internet routers

Even though caches are widely used in computer systems, they are hardly ever
used in Internet routers and network processors, partly because the hit rate is
so low that the additional cost is not warranted. Caching aggregate routes
instead of individual routes can greatly increase the hit rate; however,
erroneous lookups can result. Exploiting the structure of routing tables, we
show how route aggregates can be correctly cached, and we show that the hit
rate is indeed high. We show that the aggregate route cache can be built
cheaply, allowing it to be used in future routers. (Globecom'02[PDF],
ICCCN'01[PDF])
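The erroneous-lookup hazard can be made concrete: a cached aggregate is unsafe whenever a longer, more-specific prefix in the full routing table overrides part of it, because a cache hit on the aggregate would then return the wrong next hop. The check below is a simplified illustration using (address, prefix-length) pairs for IPv4; it is not the aggregation construction from the papers.

```python
def is_safe_to_cache(prefix, table):
    """True if caching this (addr, length) entry can never mask a
    more-specific route present in the full routing table."""
    p_addr, p_len = prefix
    for q_addr, q_len in table:
        # A longer prefix nested inside ours would be hidden by a
        # cache hit on the aggregate, yielding the wrong next hop.
        if q_len > p_len and (q_addr >> (32 - p_len)) == (p_addr >> (32 - p_len)):
            return False
    return True
```

With a table containing 10.0.0.0/8 and 10.1.0.0/16, the /8 is unsafe to cache as-is (the /16 punches a hole in it), while a prefix with no more-specific children can be cached freely.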

Understanding network traffic

Network processor architects have a considerably easier job than
general-purpose CPU architects because, supposedly, there is an embarrassing
amount of parallelism in network traffic. Aside from the constraint that all
packets in a flow have to stay in order, most packets are independent of each
other. Therefore, to achieve high throughput, network processor architects can
simply glue together hundreds of processing units, each handling a separate
flow. We show that the amount of parallelism is actually not that large:
blindly increasing the number of processing units adds only marginal value.
Instead, we should exploit parallelism at the packet level or even at the
instruction level. (ICC'02[PDF])
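The flow-level limit can be demonstrated with a toy simulation: if each flow is pinned to one processing unit to preserve packet order, units with no active flow sit idle, so utilization is capped by the number of concurrent flows. The mapping and workload below are made-up illustrations, not the traces or methodology of the paper.

```python
from collections import Counter

def unit_utilization(packet_flow_ids, n_units):
    """Fraction of processing units that receive any work, given one
    integer flow id per packet in the trace."""
    # Pin each flow to a unit (here, by flow id modulo unit count) so
    # that per-flow packet order is preserved automatically.
    load = Counter(fid % n_units for fid in packet_flow_ids)
    return len(load) / n_units

# A trace with only 4 concurrent flows, however many packets long:
trace = [1, 2, 3, 4] * 1000
print(unit_utilization(trace, 4))    # all 4 units busy
print(unit_utilization(trace, 100))  # 96 of 100 units idle
```

However many packets the trace holds, going from 4 units to 100 buys nothing here, which is why the paper argues for packet- and instruction-level parallelism instead.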