IEEE Communications Magazine • June 2013 145
MOBILE VIDEO CHAT
In this section we discuss various components —
mobile phone transmitter and receiver, video
chat software, access segment, and core network
— involved in mobile video chat over cellular
networks and affecting the performance of video
chat. The access uplink-downlink segment is a
wireless radio link. The core network is wired.
We broadly divide the network into three
parts: mobile devices, application provider, and
mobile carriers. We discuss them in detail in the
next few subsections.
MOBILE DEVICES
The video content generators and receivers are
mobile phone devices (end devices, Fig. 2),
which are gradually being replaced by smart
phones. We use the term smart phone(s) or
mobile phone(s) to address both the transmitter
and receiver devices in this article. The improve-
ment in processing power, speed, screen resolu-
tion, RAM, and storage space boosts rich
functionalities on mobile phones, making high-
quality video chat possible in these devices.
Processing power and processing speed play a
key role in the performance of any application in
a device. Processors like NVIDIA Tegra 4
have already reached speeds of ≈ 1.9 GHz with 72
GPU cores supporting ultra-HD video processing
at low power consumption. Current smart
phones are equipped with high-resolution
low-latency front-facing cameras to capture end-user
video while the end user is viewing a 1080p
smart phone screen. Display resolution has
increased to 1080 × 1920 pixels on a 5.0 in
screen (≈ 441 ppi pixel density in the
Samsung GS4) in mobile phone devices, reducing
the impact of resolution loss. There are now
hundreds of megabytes of RAM (≈ 512 Mbytes)
and gigabytes (≈ 32 Gbytes) of storage space
available in smart phones.
With the current advancements in mobile
phones, high-quality video chat applications can
be developed and deployed on mobile devices,
which was not possible before.
APPLICATION PROVIDER
With the advancement in mobile phone hard-
ware, obviously the onus is on video chat appli-
cation providers to provide good quality of
experience to end users. Fring, Tango, Qik, and
Google Talk are some of the video chat applica-
tions contending to be the frontrunner in mobile
phones.
A video chat application needs to coordinate
with the mobile phone hardware and network
for efficient content creation satisfying all parties
including end users and network operators. Any
video chat application needs to provide a range
of functionalities — capturing, processing, and
transmission of video data. In Fig. 2, although
we show interaction of coding, decoding, and
camera for capturing with the video chat con-
troller, we elaborate each of these functionalities
below.
Video Capture — The operating systems of
mobile phones provide application programming
interfaces (APIs) for video chat software to
coordinate camera functionality to capture the
content and transmit the content data through a
wireless channel (Fig. 3). The software decides
the frame rate at which the camera will capture
unique consecutive images and then save them
to memory. An appropriate frame rate is very
important to maintain a balance between band-
width, delay, computational power, and user
expectations. When users are communicating via
sign language, an extreme case, at least 21
frames/s is recommended in order to support
finger spelling. In contrast, various studies
suggest that audio and video are perceived as
synchronized at a minimum of 5 frames/s [3];
therefore, 5 frames/s is the minimum frame rate
requirement for video chat. With increasing user
expectations, however, a frame rate of 5 frames/s
will not suffice.
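The trade-off between frame rate and bandwidth can be made concrete with a rough calculation of the uncompressed bit rate. The sketch below is illustrative only: the 320 × 240 resolution and the 12 bits per pixel (YUV 4:2:0 sampling) are assumptions, not values from this article.

```python
def raw_bit_rate_kbps(width, height, fps, bits_per_pixel=12):
    """Uncompressed video bit rate in kb/s (1 kb = 1000 bits).

    bits_per_pixel=12 assumes YUV 4:2:0 sampling (an assumption,
    not a figure taken from the article).
    """
    return width * height * bits_per_pixel * fps / 1000


# Compare the two frame rates discussed in the text at an assumed
# 320 x 240 capture resolution.
for fps in (5, 21):
    print(fps, "frames/s:", raw_bit_rate_kbps(320, 240, fps), "kb/s")
```

Under these assumptions the raw stream needs roughly 4.6 Mb/s at 5 frames/s and roughly 19.4 Mb/s at 21 frames/s, which is why the frame rate has to be chosen together with the compression settings.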
With high-end smart phones making their
way into the hands of end users and increased
network bandwidth due to 4G, future video chat
is expected to provide higher bit rates to end
users.
Figure 1. Exponential increase in mobile video chat customers (source:
GigaOM Pro [1]). [Bar chart: mobile video call/chat consumers in
millions (0 to 160) per year, 2010 to 2015.]
Figure 2. Block diagram of a mobile video chat engine. The dotted lines
indicate participation of other user(s). [Blocks: network, end device
capabilities, camera, coding, decoding, display, end-user input,
objective quality metrics, subjective assessment, privacy settings,
security and authentication, all connected to the video chat controller.]
Video Compression — To reduce the band-
width requirements, video applications transmit
compressed video. This encoding is commonly
referred to as source coding or simply video
compression, and is performed by the codec in an
application. The higher the compression ratio,
the more computational power is required, but the
lower the bandwidth requirement. The current video
coding standards, H.264 and VP8, both claim to
provide efficient compression and low bit rates.
In the study conducted by the authors in [4],
H.264 shows lower bandwidth usage and better
video quality than VP8. High-efficiency video
coding (sometimes referred to as H.265) is on its
way to standardization and promises a bit rate
reduction of up to 40–50 percent.
Although the current state-of-the-art H.264
codec has become very mature with efficient
compression and low latency, the codec needs to
be further customized for video chat. The
parameters of a codec (frame rate, group of
pictures size, quantization parameter, and
resolution) need to be suited to the end system
capabilities. For example, two mobile devices with
resolutions x × y (camera) and x1 × y1 (display)
in a group video chat may decide on a common
shared resolution x2 × y2 based on the camera
resolution of the mobile devices rather than using
the H.264 spatial scalability feature, which may
involve high latency.
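One simple way to picture the negotiation of a shared resolution is sketched below. The rule used here (take the smaller value in each dimension) and the function name `common_resolution` are assumptions for illustration; the article does not prescribe a specific rule, and a real application would also need to preserve the aspect ratio.

```python
def common_resolution(camera, display):
    """Pick a shared resolution (x2, y2) for one sender/receiver pair.

    camera  -- (x, y) capture resolution of the sending device
    display -- (x1, y1) display resolution of the receiving device
    The per-dimension minimum is a hypothetical rule: never send more
    pixels than the camera produces or than the display can show.
    """
    (x, y), (x1, y1) = camera, display
    return (min(x, x1), min(y, y1))


print(common_resolution(camera=(1280, 720), display=(1920, 1080)))  # (1280, 720)
print(common_resolution(camera=(1920, 1080), display=(800, 480)))   # (800, 480)
```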
Layered video coding approaches such as
H.264 scalable video coding (SVC) lead to
increased computational cost and increased
overall bit rate, which is detrimental to the per-
formance of one-to-one video communications.
However, it may play a significant role in group
video chat, where multiple users have varying
network conditions.
Video Transmission — The application uses
wireless driver APIs to transmit the data.
MOBILE CARRIERS
In today’s highly competitive environment, users
have the option of choosing from a plethora of
carriers. Therefore, it is not enough to simply
make services available to users. Mobile carriers
must deliver those services in such a way that
users fully enjoy a rich experience at a reason-
able price. In this section we discuss the key
technologies used by mobile carriers for video
chat. Mobile carriers are responsible for seamless
end-to-end connectivity of the video chat
application. Video chat applications demand strict QoS.
As mentioned before, they require packet delay ≤
150 ms and bandwidth > 100 kb/s.
The frame error rate needs to be ≤ 1 percent [5].
Therefore, continuous feedback is required
between the network and the video chat con-
troller (Fig. 2) for balancing between network
conditions and end users’ video chat expecta-
tions. We now focus our discussion on network
architecture relevant to video chat.
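The QoS limits quoted above (packet delay ≤ 150 ms, bandwidth > 100 kb/s, frame error rate ≤ 1 percent) can be written as a small check that a video chat controller might run on each feedback report. This is a minimal sketch; the `LinkSample` type and the `qos_violations` function are hypothetical names, not part of any real API.

```python
from dataclasses import dataclass


@dataclass
class LinkSample:
    """One feedback report from the network (hypothetical structure)."""
    delay_ms: float          # one-way packet delay
    bandwidth_kbps: float    # currently available bandwidth
    frame_error_rate: float  # fraction of frames in error, 0.0 to 1.0


def qos_violations(sample):
    """Return the names of the QoS limits from the text that are violated."""
    violations = []
    if sample.delay_ms > 150:
        violations.append("delay")
    if sample.bandwidth_kbps <= 100:
        violations.append("bandwidth")
    if sample.frame_error_rate > 0.01:
        violations.append("frame_error_rate")
    return violations


print(qos_violations(LinkSample(120, 300, 0.005)))  # []
print(qos_violations(LinkSample(200, 80, 0.5)))
```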
3G uses both a circuit-switched mobile core
network and a packet network, while 4G is a
packet-only network. The 3G-324M protocol is used
to support real-time multimedia services over
wireless circuit-switched 3G networks. The
circuit-switched network provides a 64 kb/s
circuit-switched path, but published measurements [6]
on live 3G networks give uplink speeds of
only 50 kb/s. As the current minimum
upstream-downstream bandwidth required by a video chat
application is 128 kb/s, a circuit-switched mobile
core network is unsuitable for video chat.
Figure 4 gives a high-level network architecture
of an end-to-end video chat. The dominant
standard for transmitting video telephony in
packet-switched networks is RTP/UDP/IP. IP is a
connectionless network communications protocol.
It may provide greater bandwidth, but the
bandwidth is not guaranteed. Currently two
versions of IP, version 4 (IPv4) and version 6 (IPv6),
are in use. IPv4 is a best-effort service, and IPv6 is
designed to ensure QoS. To enhance real-time
multimedia mobile services for IP, agnostic of the
access network, IP multimedia subsystem (IMS)
provides an architectural framework for flexible
multimedia management. IMS ensures QoS,
negotiating mobile device requirements using
Session Initiation Protocol (SIP) during session
setup or modification. Once end-to-end QoS is
established, the end-user terminals use Real-
Time Transport Protocol (RTP) to packetize
video chat data and send it using UDP over IP.
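To illustrate what packetizing with RTP involves, the sketch below builds the 12-byte fixed RTP header defined in RFC 3550 and prepends it to a payload. It is a simplified example, not the implementation of any particular video chat application: the payload type 96 (a dynamic type), the SSRC value, and the helper names are assumptions, and header extensions and CSRC lists are omitted.

```python
import struct


def rtp_packet(payload, seq, timestamp, ssrc, payload_type=96, marker=False):
    """Prefix payload with a 12-byte RTP fixed header (no CSRC, no extension)."""
    byte0 = 2 << 6  # version=2, padding=0, extension=0, CSRC count=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def parse_rtp_header(packet):
    """Decode the fixed header fields of an RTP packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {"version": byte0 >> 6, "marker": bool(byte1 >> 7),
            "payload_type": byte1 & 0x7F, "seq": seq,
            "timestamp": timestamp, "ssrc": ssrc}


pkt = rtp_packet(b"frame-bytes", seq=1, timestamp=90000, ssrc=0x1234ABCD)
print(parse_rtp_header(pkt))
```

The resulting bytes would then be handed to a UDP socket. The sequence number lets the receiver detect loss and reordering, and the timestamp lets it schedule playout.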
Figure 3. A simplified view of smart phone hardware-software architecture and
application provider functions. [Block diagram: the video chat application
(video capture module, transmit/receive data module) calls through the API
into the operating system (camera driver, wireless driver), which runs on
the hardware.]
To transport 3G/4G services through IP networks
and provide end-to-end QoS provisioning
for an IP packet, the Internet Engineering Task
Force (IETF) defines two models: integrated
services (IntServ) and differentiated services
(DiffServ). The IntServ model uses Resource
Reservation Protocol (RSVP) to signal and
reserve the desired QoS for each flow in the IP
network. Under IntServ, video chat has a very
strict guaranteed service providing firm bounds
on end-to-end delay and ensuring bandwidth for
the traffic. Although it is theoretically possible to
provide such QoS for each flow in the network,
in practice it is very hard, as every device along
the path of a packet needs to be fully aware of
RSVP and capable of delivering the required
QoS. The DiffServ model is relatively simple and
coarse, as it groups the network flows into
different classes, also called classes of service
(CoSs), and applies distinct QoS parameters for
each class. The type of service octet in IPv4
stores the 6-bit DiffServ code point in the IP
header to identify the CoS, whereas in IPv6 the
traffic class octet is used. Under the DiffServ
model, delay-sensitive video chat traffic falls
under the conversational CoS.
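The placement of the 6-bit DiffServ code point (DSCP) inside the type of service or traffic class octet can be shown in a few lines. The sketch below is illustrative: mapping video chat to the Expedited Forwarding code point (46) is an assumption made for the example, since the article does not name a specific code point.

```python
def dscp_to_octet(dscp):
    """Place a 6-bit DSCP in the upper six bits of the ToS / traffic class octet."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2  # the two low-order bits are left at zero


def octet_to_dscp(octet):
    """Recover the DSCP from a ToS / traffic class octet."""
    return (octet & 0xFF) >> 2


EF = 46  # Expedited Forwarding; using it for video chat is an assumption
print(hex(dscp_to_octet(EF)))  # 0xb8
```

On platforms that expose the `IP_TOS` socket option, an application could pass the resulting octet to `setsockopt` on its UDP socket. Whether routers along the path honor the marking depends on operator policy.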
The upcoming 4G communication network,
Long Term Evolution-Advanced (LTE-A),
promises peak data rates of 500 Mb/s uplink and 1
Gb/s downlink. With such increased bit
rates in networks, the quality of delivered video
is also expected to increase. Full high-definition
videos require a bandwidth of nearly 2 Mb/s,
which may become a requirement for video chat
in coming years.
LIMITATIONS
We evaluate the network QoS (Fig. 5) in an
experiment carried out with video chat applica-
tion Vtok (based on Google Talk APIs) between
two end users, one connected to 3G and another
connected to a WiFi network. The end devices
used are Samsung Galaxy II smart phones. We
use Tcpdump to capture packets sent and
received at two stations performing video chat.
In a random packet trace, we observe that while
the average sending rate of the video chat is
within the limits (Fig. 5), the packet loss is
sometimes around 50 percent, far above what the
required network QoS allows, thus reaffirming our
belief that a robust video chat is far from
realization.
Current wireless mobile video applications’
performance is limited due to the constrained
network resources, and fading and interference
caused by the wireless medium. In this section,
we discuss limitations of any real-time video
application on mobile phones. We identify tech-
niques that might aid in addressing these limita-
tions for mobile video chat.
BANDWIDTH
Video chat requires real-time communication. If
the application overutilizes the link, it causes
unfairness to other traffic, and if it underutilizes
the link, it may cause low quality of video chat.
In addition to this, the uplink-downlink
connection in 3G is asymmetric, with the uplink
offering less bandwidth. When congestion happens,
the video chat on the uplink will suffer
first, thus determining the quality for the whole
video chat. Hence, higher uplink bandwidth needs
to be widely available for live video chat.
Multicasting can be one option to reduce the
bandwidth requirement when multiple users
participate in the video chat.
Multicasting allows a single stream to be served,
which is replicated throughout the network,
thereby reducing the bandwidth required for
both uplink and downlink. Multimedia broad-
cast/multicast service (MBMS) was introduced
by UMTS to provide high-speed multimedia
multicasting and broadcasting services. e-MBMS,
the evolved version of the legacy MBMS, is con-
sidered an important architecture of LTE-A in
this regard. Although multicasting has not been
a reality in 3G so far, it has the potential to
minimize the bandwidth requirement for group
video chat.
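A back-of-the-envelope comparison shows why multicasting matters for group video chat. The sketch assumes a full-mesh group chat with no relay server, where each sender must reach every other participant, and it reuses the 128 kb/s minimum stream rate mentioned earlier; both the topology and the function name are assumptions for illustration.

```python
def uplink_kbps(participants, stream_kbps, multicast):
    """Uplink rate one sender needs to reach every other participant.

    With unicast, the sender uploads one copy per receiver; with
    multicast, it uploads a single copy that the network replicates.
    """
    receivers = participants - 1
    return stream_kbps if multicast else stream_kbps * receivers


for n in (2, 4, 8):
    print(n, "users:",
          uplink_kbps(n, 128, multicast=False), "kb/s unicast vs",
          uplink_kbps(n, 128, multicast=True), "kb/s multicast")
```

Under these assumptions the unicast uplink demand grows linearly with the group size, while the multicast demand stays at one stream.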
Figure 4. High-level network architecture in an end-to-end video chat.
[Diagram: end devices attach through UTRAN or an IP access network (WiFi,
Bluetooth, etc.) to the packet-switched (PS) core network and the IP
network (IPv4/IPv6); media flows as RTP/UDP/IP, while SIP signaling passes
through the IP multimedia subsystem and the video chat application server.]
Figure 5. Evaluation of video chat in 3G/WiFi network. [Two plots: average
sending rate (kb/s, 0 to 200) versus observation time (0 to 600 s); packets
sent and packets lost (packets/s, 0 to 80) versus observation time
(548 to 568 s).]
The bandwidth consumed by video chat can
be further reduced with a mechanism to notify
end users about incoming video chat calls. Video
chat applications keep sending packets even during
idle periods. A voice call, in such a case, has
an edge over video chat in not requiring end
users to always be logged into the data network
of the application. With automatic login during
an incoming call, the end user can receive the
video call without being logged into the video
chat application. According to Skype users and
technology news (see footnote 1), Skype, when idle,
consumes considerable bandwidth (1.3 Gbytes/month)
compared to the current capped data plans available
for cellular end users.
In other approaches, to address asymmetric
uplink-downlink connection, video chat applica-
tions can use available WiFi hotspots. Although
WiFi brings a realistic and comfortable solution
for bandwidth-constrained 3G to provide such
high-end applications, maintaining QoS during
such vertical handover is an issue.
FACE TO EYE DELAY
Mobile video chat poses unique challenges due
to its requirement for small face (transmitter) to
eye (receiver) delay. ITU G.114 recommends a
maximum delay of 150 ms for video chat in order
to ensure lip synchronization. Due to video chat
having strict delay constraints, the retransmission
mechanism of TCP may not be used. TCP
retransmission increases delay in the system,
making it difficult to adhere to the delay con-
straints of video chat QoS. With UDP, on the
other hand, the network is unable to recover
from packet losses. Delay and packet loss in the
network cause jitters, frame freeze, and video
stalls. We discuss the impact of packet loss in
later subsections.
Other causes of delays are horizontal and
vertical handovers. Video chat can freeze or stall
the video during handover when the delay is not
within the limits. As the mobile moves, its asso-
ciated access point changes as per the current
location. The delay in such horizontal handover
is inevitable for 3G. As mentioned before, cur-
rently, to solve the bandwidth constraints, mobile
carriers prefer video chat applications to use
available WiFi hotspots. Maintaining delay con-
straints during such vertical handover is also a
key limitation [7].
IEEE 802.21 aims for seamless handover
across heterogeneous networks, reducing disconnection
times and packet loss during handover
of real-time communications. LTE-coordinated
multipoint (CoMP) has been proposed by the
Third Generation Partnership Project (3GPP)
standards community [8] to take care of seam-
less connectivity for edge users and handovers in
4G networks. In LTE-CoMP the edge users are
also supported by multiple neighboring base sta-
tions, therefore giving more diversity to the data
received. When edge users cross a cell during
chat, with LTE-CoMP they will still be support-
ed by adjacent base stations and hence not expe-
rience video freeze due to handover delay.
INTERFERENCE, FADING, AND CONGESTION
Interference and fading in the wireless medium
and congestion in the network cause packet loss-
es. Packet loss may cause visible distortions in
video chat. The two main visible distortions are
blocking and blurring. We find with a random
trace (Fig. 5) that packet loss is a major issue in
video chat. Although bandwidth and delay are
within the required QoS limits, limitations
imposed by them may cause packet loss.
Packet loss is inevitable in the wireless medi-
um due to its very nature. Simultaneous trans-
missions from or to different mobile phones in
the same cell may result in interference. As the
mobile moves to the edge of the cell, its sustain-
able data rate may change due to interference
from adjacent cells. Moreover, the impact of
packet loss is further increased as there is error
propagation among successive frames of the
codec. For real-time applications, it would be
more appropriate for the application to dynami-
cally change the parameters of the codec to min-
imize the impact of packet loss.
For delay-tolerant applications, the network
operators can employ better loss-resistant net-
work protocols. But as the packets are streamed
over UDP for video chat, the onus mainly falls
over video chat application providers to imple-
ment packet loss recovery techniques to preserve
network stability.
To combat packet loss, it is very important
for the application to identify the reason for
data loss. Data losses can be due to video coding
losses, random losses, or congestion losses in the
network. Random losses are caused by the wire-
less access link: signal fading, interference, and
channel quality losses. End-to-end packet loss in
a cellular network and the Internet can be
caused by congestion loss due to buffer overflow
in the network. Decreasing transmission rate will
not help in the case of non-congestion-related
losses. To identify the root cause of the data
loss, Spike [9], an end-to-end loss differentiation
algorithm, can be used.
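The idea behind delay-based loss differentiation can be sketched as follows. This is a simplified illustration in the spirit of the Spike scheme, not a faithful reproduction of the published algorithm: the receiver tracks the relative one-way trip time (ROTT) of arriving packets, treats a sharp rise in ROTT as a sign of queue build-up, and attributes losses during such a spike to congestion. The threshold fractions `alpha` and `beta` and the class name are assumptions.

```python
class SpikeClassifier:
    """Label losses as 'congestion' or 'wireless' from ROTT samples."""

    def __init__(self, alpha=0.5, beta=1 / 3):
        self.alpha = alpha  # fraction of the ROTT range that starts a spike (assumed)
        self.beta = beta    # fraction of the ROTT range that ends a spike (assumed)
        self.rott_min = float("inf")
        self.rott_max = float("-inf")
        self.in_spike = False

    def observe(self, rott_ms):
        """Update the spike state with the ROTT of a received packet."""
        self.rott_min = min(self.rott_min, rott_ms)
        self.rott_max = max(self.rott_max, rott_ms)
        span = self.rott_max - self.rott_min
        spike_start = self.rott_min + self.alpha * span
        spike_end = self.rott_min + self.beta * span
        if not self.in_spike and span > 0 and rott_ms > spike_start:
            self.in_spike = True
        elif self.in_spike and rott_ms < spike_end:
            self.in_spike = False

    def classify_loss(self):
        """Classify a loss detected right now, e.g. via a sequence number gap."""
        return "congestion" if self.in_spike else "wireless"


clf = SpikeClassifier()
for rott in (50, 50, 120):
    clf.observe(rott)
print(clf.classify_loss())
```

An application could reduce its sending rate only when a loss is labeled as congestion, and rely on error resilience for losses labeled as wireless.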
The data recovery techniques (Fig. 6a) in the
literature can be divided into feedback-based
and non-feedback-based. Non-feedback-based
techniques are implemented at the encoder or
decoder. The decoder-based approaches, such as
interpolation, filtering, and spatial and temporal
smoothing, are not very efficient. An encoder-
based technique, while efficient, adds redundant
bits to the video chat data and therefore increas-
es bandwidth requirement. Forward error cor-
rection (FEC) and joint source and channel
coding are some of the encoder-based error con-
trol approaches. Such techniques require signifi-
cant changes to a video codec and have high
computational complexity. Of all the encoder-
based approaches, though, FEC is the most pop-
ular solution; however, studies using FEC in a
video chat application (Skype) show increase in
bandwidth overhead by 25 to 50 percent [10].
Hence, in a congested environment, FEC will
likely increase the loss rate.
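The bandwidth cost of FEC is easiest to see with the simplest possible code: one XOR parity packet per group of k data packets. The sketch below is a generic illustration and is not the scheme used by Skype or by any codec discussed here; the function names are hypothetical. With k = 4 the overhead is 25 percent and with k = 2 it is 50 percent, which happens to span the range reported above.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(group):
    """Return the XOR parity packet for a group of equal-length packets."""
    return reduce(xor_bytes, group)


def recover(received, parity):
    """Rebuild the single missing packet (marked None) in a group."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("XOR parity recovers exactly one loss per group")
    present = [p for p in received if p is not None]
    rebuilt = reduce(xor_bytes, present, parity)
    i = missing[0]
    return received[:i] + [rebuilt] + received[i + 1:]


def overhead_percent(k):
    """Extra bandwidth for one parity packet per k data packets."""
    return 100 / k


group = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = add_parity(group)
print(recover([group[0], None, group[2], group[3]], parity) == group)  # True
print(overhead_percent(4), overhead_percent(2))  # 25.0 50.0
```

The redundancy is sent whether or not packets are lost, so on a congested link the extra packets add to the load that caused the loss in the first place.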
In feedback-based error control techniques,
information sent by the decoder is used by the
encoder to adjust the coding parameters or
retransmit lost packets to achieve better data
recovery. In full retransmission, all of the data is
sent again, unlike partial retransmission. Both
approaches increase delay due to increased
round-trip time. However, partial retransmission
takes up less bandwidth than full retransmission.
Some of the examples of partial retransmission
are Reference Picture Selection, Intra Update,
and media rate control protocols such as TFRC
and DCCP.
1 http://tech.blorge.com/Structure:%20/2009/02/24/skype-stealsband-width-even-when-you-are-not-using-it/,
last accessed March 27, 2013.
Limitations imposed by packet loss for video
chat are a challenge that needs to be addressed
to give end users better quality for
bandwidth-constrained networks and delay-intolerant video
chat applications. Generally, a good video chat
application, when incurring heavy data loss,
should drop the video stream in order to preserve
the audio stream as much as possible. In
Fig. 6b we propose a three-stage packet loss
recovery technique for mobile video chat adhering
to its delay constraints. The plus sign (+)
indicates an increase in the network parameter.
The stages are numbered in order of preference.
With bandwidth availability in the network,
partial retransmission gives the best result.
Depending on the loss type, the system can adopt
encoder- or decoder-based approaches in the
presence of losses. Encoder-based approaches
are more appropriate for random losses, but
have negative impact in the presence of network
congestion. Finding the thresholds for switching
between these three stages and satisfying
end-user video chat experience is an important
research area for video chat.
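The order of preference among the three stages can be summarized as a small decision function. This is only a sketch of the selection logic described in the text: the boolean bandwidth flag and the loss-type labels are hypothetical inputs, and the thresholds that would produce them are exactly the open research question mentioned above.

```python
def choose_recovery(bw_available, loss_type):
    """Pick a packet loss recovery stage in the order of preference.

    bw_available -- True if spare bandwidth exists for retransmissions
    loss_type    -- "random" or "congestion" (e.g. from a loss
                    differentiation algorithm)
    """
    if bw_available:
        return "partial retransmission"  # stage 1
    if loss_type == "random":
        return "encoder-based"           # stage 2: redundancy helps on a lossy link
    if loss_type == "congestion":
        return "decoder-based"           # stage 3: adds no traffic to a congested path
    raise ValueError("loss_type must be 'random' or 'congestion'")


print(choose_recovery(True, "random"))       # partial retransmission
print(choose_recovery(False, "random"))      # encoder-based
print(choose_recovery(False, "congestion"))  # decoder-based
```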
CHALLENGES
A video chat controller needs to address addi-
tional challenges apart from network and appli-
cation limitations, discussed in the previous
sections. It should measure up to end-user expec-
tation in terms of quality, security, and end
device capabilities (Fig. 2). In this section, we
discuss these challenges in detail.
MEASURE AND MAINTAIN USER QOE
Traditionally, quality of service (QoS) metrics
such as packet loss, delay, and jitter have been
used to measure the application performance on
an end device. However, with increased diversity
in application requirements of multimedia ser-
vices and different content types, users’ quality
of experience (QoE) instead of networks’ QoS is
becoming an increasingly popular term in the
mobile video sector. QoE metrics tend to mea-
sure user perception of delivered multimedia
services instead of counting on network service
parameters. Many tools have evolved for evalu-
ating video quality delivered to the end user
using subjective or objective methods.
Subjective quality assessment refers to algo-
rithms that measure the “user-perceived” quality
of received video. Typically, users give a numeri-
cal score (1–5) on the perceived video quality.
Methodologies such as Oneclick or crowdsourcing
can be employed to carry out video chat
subjective assessment in a more economical way.
Oneclick [11] uses a single dedicated key click to
convey dissatisfaction of end users with the video
application in an online manner. Crowdsourcing
platforms like Amazon Mechanical Turk (see
footnote 2) take feedback from mobile end users
in exchange for monetary benefit. These end users
are crowdsourced instead of being selected as a
pool of observers in a controlled environment.
On the other hand, objective methods refer
to mathematical models that approximate results
of subjective quality assessment (Fig. 2), but are
based on criteria and metrics that can be mea-
sured objectively and automatically using com-
putational techniques. Full-reference objective
metrics such as peak signal-to-noise ratio
(PSNR) and SSIM are not suitable for real-time
interactive content because they require a copy
of the original video. However, metrics such as
blocking and blurring, which are no-reference
metrics [12], can be used to quantify the network
impairment in the video for video chat.
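A short PSNR computation makes the full-reference limitation visible: the function needs the original pixels as well as the received ones, and the receiver in a live chat never has the original. The sketch works on flat sequences of 8-bit pixel values for brevity; the function name and the sample values are illustrative.

```python
import math


def psnr(reference, received, max_value=255):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(reference) != len(received):
        raise ValueError("frames must have the same number of pixels")
    # Mean squared error between the original and the received frame.
    mse = sum((r - d) ** 2 for r, d in zip(reference, received)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)


original = [100, 100, 100, 100]
degraded = [110, 110, 110, 110]
print(round(psnr(original, degraded), 2))
```

A no-reference metric, by contrast, would take only the received frame as input.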
Ideally, a mobile video chat must have robust
mapping of subjective scores to objective mea-
surements. While the chat engine can automati-
cally assess the video quality using objective
metrics, subjective framework (such as one-click
or its variant) can be used to fine-tune its perfor-
mance. This feedback can be used by service
providers to manage network and other service
parameters of an end-to-end system.
BATTERY CONSUMPTION
Battery consumption has become a major issue
for running high-end applications in smart
phones. Mobile phones, being small and support-
ing such a wide range of applications, increase
the pressure on battery life for such devices. Fur-
thermore, battery technology is not improving at
the same pace as the processing power and speed
of mobile devices. With nearly 1500 mAh battery
power in current smart phones, if a typical video
application consumes roughly 300 mW of power
per hour when enabled, neglecting all other
power consumption on the phone, the battery is
expected to last 4–5 h. Effort has been made by
Tango in this regard. Tango uses Google push
notifications to wake up the mobile phone on
receiving a call. Although this technique avoids
draining the mobile phone battery, users do not
get the provision to sign out if they do not intend
to receive calls.
The best way to solve this issue is to design
battery-status-aware and end-user-expectation-aware
video chat applications. One such effort
for image compression is presented in the
Poly-DWT architecture [13], which can morph its
hardware requirements and image reconstruction
quality at runtime, leading to considerable
savings in power.
Figure 6. a) Summary of packet-loss recovery techniques; b) three-stage packet
loss recovery for video chat (+ indicates increase). [Panel (a): data loss
recovery techniques divide into feedback-based (full retransmission, partial
retransmission) and non-feedback-based (encoder-based, decoder-based).
Panel (b): stages 1. partial retransmission, 2. encoder-based, 3. decoder-based,
with transitions labeled +BW availability, +random losses, and
+congestion losses.]
2 https://www.mturk.com/mturk/welcome, last accessed March 27, 2013.
INTEROPERABILITY
The future video chat needs to be as ubiquitous
as a voice call. The content receiver can be a dif-
ferent mobile phone device with different
requirements under different operators and run-
ning different video chat applications.
Two end users can set up a voice call irrespective
of the system and network differences. The
public switched telephone network (PSTN) has
been the enabling technology in this regard.
Headway has been achieved in standardizing
multivendor interoperability and operator inter-
connect using IMS-based services. SIP, an open
standard, is used for control plane signaling in
IMS. Devices from any vendor supporting SIP
video chat will be able to interoperate with SIP-
based devices from any other vendor. As men-
tioned earlier, user plane traffic in IMS is
RTP/UDP/IP-based.
Currently, end users using different video
chat applications cannot communicate with each
other. This is because video chat applications
use different video codecs and technologies for
system negotiation, which have not been stan-
dardized yet. To give similar flexibility to end
users as for voice call, video chat application
providers need to work toward standardization.
SECURITY AND PRIVACY
Security and privacy concerns are inherent in
any real-time interactive communication.
Encryption techniques are well studied and
can be employed to provide the required level of
security for video chat users. Joint compression
and encryption schemes have been recently
developed, which lead to significant savings in
computational power while achieving good levels
of unintelligibility of the bitstream [14]. Howev-
er, privacy concerns still need to be addressed
for mobile video chat. Using diary and interview
techniques, [15] points out that privacy during
video chat is a key concern of end users. The
concern stems from the fact that the user at the
other end may make obscene gestures, or the
end user, when using video chat in a highly
crowded place, wants to keep it private.
Blurring techniques have been used to make
a given scene appropriate for users at the other
end to view. Although video blurring techniques
can work appropriately and protect users’
privacy in both contexts, they are not appropriate
for the new generation of video chat services
[16]. This is because the main function of the
new generation of video chat services is to bring
a user face to face, via smart phones, with another
person in his/her own surroundings in another
corner of the world. If the services show users
only blurred faces or blurred backgrounds, the
users of these services will gradually lose
their interest in talking to others.
Other attack models and countermeasures for
video chat over smart phones need in-depth
analysis. A basic level of privacy can be provided
by the application provider by maintaining pub-
lic and private profiles of end users [16]. With
regard to end-user preference, the application
can display a corresponding profile at the other
end. Also, it is the application provider’s respon-
sibility to secure end-user video chat accounts.
CONCLUSION
With the increased proliferation of smartphones
and the advancement in communication tech-
nologies, the demand for video chat has
increased tremendously. To catch up with this
trend and exploit the economic incentive associ-
ated with it, both application providers and
mobile carriers have significant roles to play in
terms of improving the quality of video chat.
Although there have been some approaches
suggested in the literature, the applicability of these
solutions over cellular networks for high-quality
end-to-end video chat has not yet been studied in
detail. In this article, we lay down the
limitations and summarize the challenges faced by
video chat. We also discuss the possible solutions
and their incompleteness. We believe that once
these challenges are addressed, mobile video
chat can develop to its full potential.
REFERENCES
[1] S. Higginbotham, “Can You See Me Now? The Future
of Video Chat,” http://gigaom.com/2010/06/07/can-you-
see-me-now-the-future-of-video-chat/
[2] M. G. Ames et al., “Making Love in the Network Closet:
the Benefits and Work of Family Videochat,” Proc. 2010
ACM Conf. Computer Supported Cooperative Work,
2010, pp. 145–54.
[3] J. Scholl et al., “Designing a Large-Scale Video Chat
Application,” Proc. 13th Annual ACM Int’l. Conf. Multi-
media, 2005, pp. 71–80.
[4] P. Seeling et al., “Video Network Traffic and Quality
Comparison of VP8 and H.264 SVC,” Proc. 3rd Wksp.
Mobile Video Delivery, 2010.
[5] 3GPP, “ETSI TS 122 105 V10.0.0; Digital Cellular Telecom-
munications System (Phase 2+); LTE; Services and Service
Capabilities (Release 10),” Tech. Rep., May 2011.
[6] J. Nurminen et al., “Sharing the Experience With Mobile
Video: A Student Community Trial,” 6th IEEE Consumer
Commun. and Networking Conf., 2009, pp. 1–5.
[7] S. Sharma, N. Zhu, and T.-c. Chiueh, “Low-Latency
Mobile IP Handoff for Infrastructure-Mode Wireless
LANs,” IEEE JSAC, 2004, pp. 643–52.
[8] 3GPP TR 36.819 v11.0.0, “Coordinated Multi-Point Operation
for LTE,” 3GPP TSG RAN WG1, tech. rep., 2011.
[9] O. Boyaci, A. Forte, and H. Schulzrinne, “Performance
of Video-Chat Applications Under Congestion,” IEEE
Int’l. Symp. Multimedia, 2009, pp. 213–18.
[10] J. Wang and D. Katabi, “Chitchat: Making Video Chat
Robust to Packet Loss,” tech. rep., July 2010.
[11] K.-T. Chen, C.-C. Tu, and W.-C. Xiao, “Oneclick: A
Framework for Measuring Network Quality of Experi-
ence,” IEEE INFOCOM ’09, 2009, pp. 702–10.
[12] Z. Wang and A. C. Bovik, Modern Image Quality Assess-
ment, Synthesis Lectures on Image, Video, and Multimedia
Processing Series, Morgan & Claypool, 2006.
[13] A. Pande and J. Zambreno, “Poly-DWT: Polymorphic
Wavelet Hardware Support for Dynamic Image Com-
pression,” ACM Trans. Embed. Comput. Sys., vol. 11,
no. 1, Apr. 2012, pp. 6:1–6:26.
[14] A. Pande, P. Mohapatra, and J. Zambreno, “Using Chaotic
Maps for Encrypting Image and Video Content,” IEEE Int’l.
Symp. Multimedia, Dec. 2011, pp. 171–78.
[15] K. O’Hara, A. Black, and M. Lipson, “Everyday Practices
with Mobile Video Telephony,” Proc. SIGCHI Conf. Human
Factors in Computing Systems, 2006, pp. 871–80.
[16] X. Xing et al., “Safevchat: Detecting Obscene Content and
Misbehaving Users in Online Video Chat Services,” Proc.
20th Int’l. Conf. World Wide Web, ACM, 2011, pp. 685–94.
BIOGRAPHIES
SHRABONI JANA (sjana@ucdavis.edu) is currently a Ph.D. stu-
dent in the Department of Electrical and Computer Engi-
neering at the University of California, Davis. She received
her M.S. from the University of California, Irvine, and her
B.E. from the National Institute of Technology, Durgapur,
India. Her research interests are in wireless networks, real-
time multimedia and mobile systems.
AMIT PANDE (pande@ucdavis.edu) completed his Ph.D. from
Iowa State University in 2010 where he was awarded
Research Excellence Award for his dissertation, Algorithms
and Architectures for Secure Embedded Multimedia Sys-
tems. Prior to this he completed his Bachelor’s in electron-
ics and communications engineering from the Indian
Institute of Technology Roorkee in 2007 with the Institute
Silver Medal. His research interests are in multimedia sys-
tems, wireless networks, embedded systems, security, pri-
vacy, forensics and trust, wireless health, and water
sustainability.
AN (JACK) CHAN (anch@ucdavis.edu) received B.Eng and
M.Phil degrees in information engineering from the Chi-
nese University of Hong Kong in 2005 and 2007, respec-
tively. He got his Ph.D. in computer science at the
University of California, Davis, in 2012. He is now working
as a scientist at Broadcom Corporation. His research inter-
ests include video and voice transmission and quality of
experience (QoE) over wireless networks, advanced IEEE
802.11-like multi-access protocols, and data path optimiza-
tion in next generation cellular networks.
PRASANT MOHAPATRA [F] (pmohapatra@ucdavis.edu) is cur-
rently the Tim Bucher Family Endowed Chair Professor and
the Chairman of the Department of Computer Science at
the University of California, Davis. He was/is on the Editorial
Boards of IEEE Transactions on Computers, IEEE Transactions
on Mobile Computing, IEEE Transactions on Parallel
and Distributed Systems, ACM WINET, and Ad Hoc Networks.
He has been on the program/organizational committees
of several international conferences. He received
his doctoral degree from Penn State University in 1993,
and received an Outstanding Engineering Alumni Award in
2008. He also received an Outstanding Research Faculty
Award from the College of Engineering at the University of
California, Davis. He is a Fellow of the AAAS. His research
interests are in the areas of wireless networks, mobile com-
munications, sensor networks, Internet protocols, and QoS.