In bus delay-tolerant networks, the route of bus is determinate but its arrival time is indeterminate. However, most conventional approaches predict future contact without considering its uncertainty, which makes a limitation on routing performance. A novel approach is proposed by employing interval algebra to characterize the contact’s uncertainty and time-varying nature. The contact is predicted by using the Bayesian estimation to achieve a better routing performance. Simulation results show that this approach achieves a good balance between delivery latency and delivery ratio.

The many advantages offered by mobile communications have pushed wireless networks beyond supporting laptops in buildings to more challenging environments. In highly mobile and wireless network environment, the network topology constantly changes and end-to-end paths can hardly be sustained. This kind of networks is the so-called“delay tolerant network (DTN)”. The DTN is considered to be an important branch in the next generation network study and many promising DTN applications have been proposed. At present, such networks have been deployed in the context of portable devices (such as pocket switched networking and opportunistic podcasting), buses, animal tracking, and underwater sensor networks [1, 2].

The network environment considered by this paper is the bus-based network. The bus network consists of buses and bus stations, in which bus stations send messages to each other, and buses are considered as data mules [3-5]. With Wi-Fi facilities, a typical environment of DTN can be constructed. The reasons for choosing this environment are: 1. based on “store-carry-forward” strategy and no end-to-end path between bus stations, the bus network has strong DTN characteristics; 2. recently, several bus DTN test beds (such as DieselNet [6] and DakNet [7]) have been implemented, which shows this is a promising research area.

Besides the common features of DTN, the bus DTN has its unique characteristics. First, each bus station is reachable through different bus shifts. Second, though buses are driven and operated according to the dispatch schedule, their moving patterns are variable because of the irregularity of traffic conditions, traffic signal, driver’s habits, etc. These characteristics make the mobility of bus semi-deterministic, which causes the path to vary in dimensions of both space and time. According to the uncertainty of contact, how to eliminate or alleviate this uncertain effect and to improve the routing performance are some of the most challenging problems in DTN.

As the routing performance is significantly impacted by the prediction accuracy of contacts, when studying the performance of routing protocols and applications in DTNs, it is important to have models which accurately characterize the uncertainty feature of contact. For these issues, uncertainty in bus network contacts cannot be perfectly described by conventional graph theory-based topology description methods such as evolution graph, temporal graphs, time varying graph, etc. We set out to address the uncertainty in bus network contacts through a novel two-sided approach. Firstly, based on interval algebra, interval number and probability density function (PDF) are proposed to quantify the uncertainty and time-varying nature of contact, and we use probabilistic contact graph to describe a bus DTN. Secondly, we proposed a method of contact prediction by using the Bayesian theory. Thirdly, based on the prediction method, we proposed a contact-prediction routing algorithm—IABR (Interval Algebra based Bus net Routing algorithm) and use automata model to verify the effectiveness of IABR. Finally, we use ONE simulator to analyze the performance of IABR. Simulation results show that our proposed algorithm achieves an improved routing performance in the bus DTN scenario.

The rest of the paper is organized as follows. Relevant reported studies about network modeling and routing schemes are briefly described in Section 2. Details of the contact prediction are presented in Section 3. The proposed routing scheme is described in Section 4. Simulation results are discussed in Section 5, followed by a summary in Section 6.

2. Related Works

In this section, we summarize some reported studies about network modeling relevant to our work, especially the description of contact. Then, we categorize the DTN routing schemes and compare their features with bus DTN.

- 2.1. DTN Contact

A contact in DTN is defined as an opportunity to send data [8]. More specifically, a contact is a specific edge and a corresponding time interval during which the edge capacity is strictly positive [9]. Traditionally, a DTN is considered as a graph G(V,E) and E is the set of contacts. In the past, the topology of a DTN over time has been modeled by evolving graph [10, 11], temporal graph [12] and time-varying graph [8]. These methods assume that the network's topology is deterministic and known (or predictable), so the transmission (when and where to forward packets) can be scheduled ahead of time. However, the mobility of nodes such as buses may not be completely predictable due to factors including the traffic conditions and the operations of traffic lights. Therefore, the future connectivity between nodes may not be completely predictable. Given these limitations, traditional shortest path routing algorithms can only compute a shortest delay path without taking the uncertainty of connectivity into consideration.

- 2.2. DTN Routing Scheme

In the past, several routing schemes have been proposed to improve the routing performance in DTNs. This section reviews the related work in the literature and highlights their differences. There are two categories for current routing schemes in DTNs [13]: deterministic routing and stochastic routing. As the deterministic routing scheme assumes that future movements and connections are completely known, we do not apply it in our environment. We thus mainly focus on the stochastic routing scheme.

According to the knowledge involved in a routing scheme, the stochastic routing can be divided into two categories: opportunity-based and prediction-based. The opportunity-based schemes utilize neither the contact history nor the prediction methods for routing. Instead, messages are exchanged only when two nodes meet at the same place by chance. For example, the Epidemic Routing [14-16] delivers the messages by flooding. This scheme can achieve high delivery ratio by consuming significant network resources. To reduce the overhead of flooding, Spray and Wait Routing [17] has been proposed. Instead of flooding, this scheme only forwards a fixed number of copies. These approaches usually distribute multiple copies in the network to ensure a high reliability and a low latency of delivery. However, they also bring a high price to buffer occupancy and bandwidth.

In the prediction-based schemes, the mobility of node is estimated based on a history of observations. PRoPHET Routing [18] uses a probabilistic metric called delivery predictability to indicate how likely a node will meet another. A node forwards a message to the neighbor node that has the path of the lowest cost to the destination (usually the shortest path). Burgess J. et al. proposed the Maxprop Routing that uses knowledge from previous encounters for making delivery optimization [19]. In this protocol, each node keeps a vector called delivery likelihood which is obtained by using incremental averaging. When two nodes meet, they exchange their vectors, so each node can calculate the shortest path to the destination. This definition of likelihood is rather volatile, which may result in accuracy when the shortest path mechanism uses these volatile likelihood vectors for path cost calculations. RCM [20] is another prediction-based algorithm for DTN routing, which uses Markov decision process to do routing decision. In this scheme, it assumes that the mobility of node is strictly cyclic. The aforementioned protocols focus on either random mobility scenarios or cyclic mobility scenarios, which are incompatible with the real scenarios in bus DTN. More importantly, they do not utilize the feature of network to improve the routing performance.

3. Network Model

In DTNs, the patterns of contacts between nodes play a key role in efficient design of protocols. As mentioned earlier, the bus DTN consists of buses and bus stations, in which bus stations send messages to each other, and buses are considered as data mules. Fig. 1 illustrates our research environment. Buses can store messages for some time, carrying them along, and forwarding them to other bus stations when there is a contact. The message forwarding depends on the contacts between different nodes, and the additional complexity of the time dimension significantly complicates the routing decision. How to abstract the timing feature and uncertainty of contact is a key challenge in DTN.

Given a bus and a bus station, when the bus enters into the radio range of the station (i.e., arrives at the station), a contact is established. That is to say, arrival time of the bus is the beginning time of contact. The arrival time is a continuous random variable from current time to infinite time. To represent the arrival time, one way is using probability density function (PDF) to describe the relative likelihood for this random variable to take on a given value; another way is simply using a 2-tuple (t,p) to denote that the probability of arriving at time t is p. Both approaches have advantages and disadvantages. The former is too complex to make routing decisions and the latter is a fixed value approach which cannot reflect the changes in network. In order to combine advantages from the two approaches, we propose interval algebra [21] to describe the arrival time.

- 3.1. Arrival Time Interval

As mentioned above, a bus may arrive in a time interval.We formally define a time interval as follows.

Definition 1. A time interval TI is a real interval: [t1,t2]={x∈R|t1≤x≤t2}, which is a set of real numbers with the property that any number that lies between two numbers in the set is also included in the set.

With this definition, when we use [t1, t2] to denote an arrival time of bus, it means that the bus will arrive between time t1 and time t2. However, unlike a completely random event, the time that a bus arrives at a station may not be uniformly distributed at a period of time, so besides using an interval time to define its time scope, we still need to describe its occurrence likelihood.

The time a bus arrives to a station is a continuous random variable that from current time to infinite time, and we use X to denote this event and f(x) to denote its probability density function. Therefore, its probability of falling into a given interval, namely [t1, t2] is:

PPT Slide

Lager Image

Thus, with the interval time [t1, t2] and its probability distribution, we can easily calculate the probability that the bus arrives at [t1, t2].

- 3.2. Probabilistic Contact Graph

According to the arrival time discussed above, we formally define a bus DTN as follows.

Definition 2. A probabilistic contact graph is a directed multi-graph G(V,E), where V is the set of bus stations in the bus DTN and E is the set of contacts. A contact c is a 5-tuple (ids, idd, bus, TIs, f (x)), where ids is a id of source station, idd is a id of destination station, and ids, idd ∈ V, bus is the bus’s ID, TIs ∈ TI represents the time interval that bus will arrive at ids, f (x) is the PDF of TIs.

As the graph is time-varying, Fig. 2 shows an example of probabilistic contact graph at t=0. For instance, No.1 bus will arrive at B during [10, 12], and its PDF is fAB-1.

The basic idea of IABR is to utilize the encounter historical information of node and the prior information of buses to predict future contacts by using Bayesian estimation. We propose a metric calculation method of path and calculate every possible path using the depth-first search (DFS) strategy, and we select the first n optimal paths to forward the packet. In this section, automata model of IABR, the prediction of contact, methods for calculation of path metric and implement of routing process are explained in detail.

q0 : Initial State, representing that the bus node has no packet and no neighbor node(a: normally run; g: arrive bus station,updating AR,forwarding no packet but recieving packet; b: contact with bus, updating AR, forwarding no packet; h: arrive bus station,updating AR,forwarding and receiving no packet); q1: state representing that the bus node has packet and neighbour bus node (c: arrive bus station,updating AR, forwarding and recieving no packet, or forwarding and recieving packets at the same time; d: arrive bus station,updating AR,forwarding packet but recieving no packet; e: end contacting with bus); q2: state representing that the bus node has packet and neighbour station node (b: contact with bus, updating AR, forwarding no packet; f: leave from station); q3: state representing that the bus node has packet and no neighbor node (a: normally run; b: contact with bus, updating AR, forwarding no packet;c: arrive bus station,updating AR, forwarding and recieving no packet, or forwarding and recieving packet at the same time; d: arrive bus station, updating AR, forwarding packet but receiving no packet); q4: state representing that the bus node has no packet and bus neighbour node (e: end contacting with bus;g: arrive bus station,updating AR,forwarding no packet but recieving packet;h: arrive bus station,updating AR,forwarding and recieving no packet). q5: End state, representing that the bus node has no packet and station neighbour node; Transition matrix of IABR automaton is as follows:

The future contacts between bus and bus station are determined by the arrival time of bus. If we know the interval time of bus, then we can easily predict the next arrival time of bus. Because of this, we predict the inter-contact time instead of predicting the contact directly.

As mentioned above, our goal is to obtain the time interval of two successive buses. According to the Bayesian inference, the posterior distribution of inter-contact time is determined by its prior distribution and the sampling distribution. In order to get the posterior distribution, we first make an assumption as below:

Assumption 1.If the sample is large enough, for each bus line, the interval of arriving time is normally distributed, i.e., its prior distribution is a normal distributionπ(μ)~N(μ0,

PPT Slide

Lager Image

), whereμ0is the departure interval and

PPT Slide

Lager Image

can be given by historical experiences.

Under this assumption, the sampling distribution under the condition of parameter μ is also normally distributed, i.e., p(xi|μ)~N(μ,σ2), and its population variance σ2 can be replaced by its sample variance S2. With the prior distribution and sampling distribution we gave above, according to Bayes' rule: h(μ|x') ∝ π(μ)p(x'|μ) , we can finally derive its posterior distribution:

PPT Slide

Lager Image

Where

PPT Slide

Lager Image

is the sample mean.

After deriving its posterior distribution, we use the concept of confidence interval (CI) to acquire its time interval. In statistics, a confidence interval is a particular kind of interval estimate of a population parameter and is used to indicate the reliability of an estimate. Supposing an arrival time interval posterior distribution h(μ|x') ∝ N(μ1,σ1) using knowledge of probability theory, we can calculate the confidence interval under confidence level [1-α]:

PPT Slide

Lager Image

where zα / 2 follows from the cumulative distribution function, n is the number of samples.

We denote L as the length of the confidence interval, and

PPT Slide

Lager Image

. The L is determined by two parameters: confidence level (1-α) and the number of samples n. Note that the zα / 2 is an increasing function of (1-α). When α increases, L decreases, and the interval time will be more accurate. When α decreases, L increases, and the interval time will be fuzzier. At the same confidence level, the bigger the number of samples is, the more accurate the interval time would be. With these properties, we can adjust the confidence level according to different situations to control the length of time interval. In order to restrict the length and with the consideration that the confidence level should not be too low, we normally use (1-α)=0.95 as the confidence level, and the corresponding zα / 2 =1.96.

- 4.3. Calculation of Path Metric

In section 4.2, we have discussed how to predict a contact. We use a time interval to represent a possible occurrence range of contact. Unlike using a single time slot, the contact with time interval may overlap with other contacts. Considering two contacts’ time intervals A=[a,b] and B=[c,d], supposing their posterior distributions are

PPT Slide

Lager Image

and

PPT Slide

Lager Image

, we calculate the probability of that A occurs after B as follows:

PPT Slide

Lager Image

where

PPT Slide

Lager Image

As every route of bus is fixed, we can get more prior knowledge than that of other DTNs. The prior knowledge includes the following information: every possible path between any station pair, the historical record of bus contact and the ideal distance between two stations. To reflect the time-varying delivery probability between each pair of stations, we introduce a new delay metric to measure a path. For a path l(c1,c2,…,cn), supposing the last perdition time is tp , the current time is tc , we denote Δt as tc - tp, and the metric can be calculated as follows:

PPT Slide

Lager Image

where μ−Δt equals to the time difference from the current time to the next contact's time of occurrence. As mentioned before, because two contacts may overlap with each other, if contact i occurs before contact i+1, it can forward straightly, and its delay

PPT Slide

Lager Image

= P(TIi+1 > TIi)×(μi+1 - Δti+1 + disi+1). If contact i occurs after contact i+1, it have to wait, and its delay

PPT Slide

Lager Image

= (1-P(ITi+1 > ITi)×(2μi+1 - Δti+1 + disi+1).

Fig. 4 shows an example of calculating the metric of a path. The information of each contact is shown in Fig. 4 (a). We first calculate ContactAB : MetricAF=MetricAB=4-3+3=4, then we calculate ContactBD and update ContactAF:

Once the measurement of a path is calculated, we can proceed to analyze the routing process. The routing process consists of two stages, namely the information updating stage and the data forwarding stage. When two nodes meet each other, they first exchange information and then forward packets. We name the two stages as information updating stage and packet forwarding stage, respectively.

- 4.4.1. Information updating stage

As all the bus routes are fixed, we can acquire every possible path between any station pair in advance. In the information updating stage, an n×m matrix called Arriving Record (AR) is used to maintain its contact history, where n is the number of buses and m is the number of bus stations. When two nodes meet each other, they exchange their AR, and if one of the nodes is bus station, it will update each time interval of contact and posterior distribution by using the method mentioned in section 4.2 and 4.3.

- 4.4.2. Packet forwarding stage

In packet forwarding stage, for each message the bus station calculates each metric of path and chooses the first n optimal paths to forward. The calculation scheme is described as follows (refer to Fig. 4).

a) For the first contact of path, update its metric and let its time interval add distance to be its new time interval, i.e. [l1, r1]

b) For the remaining contacts, considering one edge at a time in order, compare it to the previous contact, if P([li,ri] > [li-1,ri-1]) = 0, its time interval should be postponed by adding its expectations [li,ri] [li-1,ri-1]) ≠ 0, then update the metric of path. At the same time, update its time interval by adding its distance, i.e. [li,ri] [li+disi,ri+disi] (Fig. 4(c)).

c) Repeat step (b) until there is no more contacts, and one path’s metric is calculated.

d) Repeat step (a)-(c) until all paths are calculated.

e) Choose the firstnoptimal paths to forward.

The procedures of IABR are shown in Fig. 5. Note that IABR will choose n paths to forward. If n=1, IABR would be single-copy routing; if n>1, IABR would be multi-copy routing. To assess the complexity of the algorithm, supposing the number of path is m, the average hops of path is k, then the complete complexity is O(mk). In fact, this calculation among all possible paths is fast, because paths monotonically increase in cost during a DFS. Once the cost for a path is worse than that of the current best path, the search could stop.

The primary goal of any DTN routing protocol is to maximize the delivery rate of offered packets and to minimize the latency between source and destination. We choose the ONE (Opportunistic Network Environment Simulator) [22] to evaluate our protocol in comparison to other routing protocols including first contact (FC), epidemic, spray and wait (SW) and PRoPHET. Among these routing schemes, first contact is single-copy routing, SW is fixed-copy routing and others are multi-copy routing. To make simulation environment as real as possible, we build a scenario based on real road information of Beijing Olympic Sports Center area. Fig. 6 is a screenshot of this scenario in ONE simulator. Red circles represent bus stations, black circles represent buses, and the background is the road network of Beijing Olympic Sports Center area. We designed 20 routes and each route has five buses. As for most of the DTNs, buffer size is the major factor that limits its routing performance. We compare these routing protocols with its buffer size in a range which is from 10MB to 200MB. Other related settings are showed in Table 2.

Fig. 7 shows the results of delivery rate of IABR with different copies. With increase of copies, we can see the delivery rate is also increasing accordingly. In fact, when we set the copy number of IABR to 500, our number of relay messages is still much smaller than that from other multi-copy routing protocols, such as epidemic routing and PRoPHET routing, which indicates that in our scenario the overload of fixed-copy routing protocol is much smaller than that of other multi-copy routing protocols. In addition, when the number of copies is too low, many messages will be dropped which lead to the rate of delivery becoming limited. As the number of copies increase, the number of dropped messages decrease, which results in delivery rate decrease. As a result, with the increase of copies, the rate of delivery increment is decreasing, and we can see that when the number of copies increases from 200 to 500, the delivery rate only increases by less than 5%. Thus, in the following simulation, the number of copies is set to 200.

Fig. 8 shows the results of average latency with different buffer sizes. In first contact routing scheme, node only forwards messages to the first meet node, which obviously cannot sustain effective transmission, so the average latency of first contact routing scheme is very high. On one hand, when the buffer size is too small, many messages will be dropped, which will decrease average latency seriously. When the buffer size increases, more messages are delivered, which result in that the average latency increases firstly and then decreases. There is a peak value on every curve except in epidemic routing where buffer size equals 20MB. Epidemic routing shows significant fluctuation, which increases at first and starts to decrease from 50MB. However, on the other hand, IABR and SW are not very dependent on buffer size, thus the increment of buffer size has little impact on their routing performance.

Fig. 9 shows the results of delivery rate with different buffer sizes. We can see that the delivery rate of IABR and SW are consistently maintained at 87% and 55%, while PRoPHET and epidemic have a rather clear increment. This result indicates that PRoPHET and epidemic routing are quite dependent on buffer size. In contrast, IABR has a much higher utilization of buffer size.

Fig. 10 shows the results of overhead ratio with different buffer sizes. The overhead ratio refers to the ratio of non-delivered packets to delivered packet. For routing protocols, the overhead ratio reflects its overhead of delivering a packet. As the copy number of IABR and SW is fixed, their overhead ratio is not affected by buffer size, and because IABR needs to exchange information between nodes, its overhead is a little higher than that of SW. Epidemic and PRoPHET are multi-copy routing, so when buffer size increases, their delivery ratio increases, and the overhead ratio decreases correspondingly, yet it is always higher than that of IABR.

Of the protocols tested in this study, our IABR outperforms all others in terms of packet delivery ratio and achieves better results in overhead ratio and average latency comparison. As for other routing protocols, the PRoPHET and SW routing are susceptible to simulation scenarios and epidemic routing requires tremendous network resources. Our results indicate that these routing protocols are not suitable for bus DTN.

6. Conclusion

In this paper, we have proposed IABR as an effective protocol for DTN routing, particularly in the context of bus DTN scenarios. To characterize the uncertainty of contacts and its time-varying nature, a novel approach is proposed by employing interval algebra. With such model, we predict the contacts by using Bayesian estimation to achieve a better routing performance. In future work, the research can be further extended by dynamically controlling the number of copies according to real time data, which would improve the routing performance.

BIO

Haiquan Wang is a lecturer in the School of Software of Beihang University. He received his Bachelor, Master and Doctor degrees from Beihang University. His current research is in DTN architecture and communication system, and computer network security

Weijian Ma is studying for his Master degree in Beihang University from 2013. He received his Bachelor degree in Beihang University. His current research is in DTN architecture and communication system.

Hengkun Shi is studying for his Master degree in Beihang University from 2014. He received his Bachelor degree in Beihang University. His current research is in DTN architecture and communication system.

Chunhe Xia is a professor in the School of Computer Science and Engineering of Beihang University. He is the director of Beijing Key Laboratory of Network Technology. He received his Bachelor degree from Nanjing University, and he received his Master and Doctor degree from Beihang University. His current research is in computer network, information security, and network measurement.