Garbage Collection for Distributed Persistent
Objects

1. Introduction

Distributed object technology provides efficient and safe distribution
and reuse of software components in the network. Applications are
composed of independent components which are distributed over the
network and can communicate with other components. Due to the
technology, developers can create reliable, robust, secure, and
maintainable distributed applications easier and faster. Distributed
object technology enables components of software to plug-and-play,
interoperate across the network, and coexist with legacy applications.

A distributed object consists of data and methods along with
interfaces through which clients invoke the methods of the
object. When a client requests a service to an object, the object is
created and executed on its server process. The object replies to the
client with results after processing and performing the request. The
object is reclaimed as garbage when it is not reachable from any
clients or root objects. For this procedure, called garbage
collection, CORBA [CORBA-A, 1997; CORBA-S, 1997] uses reference counting, DCOM [Chappell, 1996] uses reference counting and
pinging, and Java RMI uses a lease.

The persistent object service enables objects to remain long after its
server process terminates. To be persistent, the state of an object
should be stored in persistent storage such as disks so that the
object can be activated later with the stored previous state. The
state of an object is stored and loaded either transparently or
explicitly: clients can make their objects persistent explicitly to
reuse them later; or objects are stored and loaded transparently by
the activation and de-activation mechanism. The persistent object
service and the activation mechanism can be used to conserve system
resources [Wollrath et al., 1995]. Since a
system may contain a considerable number of objects, it is
unreasonable that objects remain active and consume valuable system
resources although no requests are made on them for a long time. When
persistent objects are not reachable from any clients or roots, they
should be reclaimed as garbage.

In this position paper, we discuss the issues of garbage collection
for distributed persistent objects, and present a practical and
efficient garbage collection mechanism.

2. The issues of garbage collection for distributed persistent
objects

Garbage collection is a critical problem for administratively
decentralized large scale distributed systems which manage distributed
persistent objects. This process is important because manual garbage
collection is error-prone and it is difficult for clients to maintain
the information about references correctly. Many solutions for
distributed garbage collection have been suggested. However, they are
not suitable for large scale distributed systems because their target
systems are small in scale and they cannot collect cyclic garbage.

In the ideal distributed system, objects continue to exist as long as
they are reachable from clients or root objects and should be
reclaimed when unreachable. In practice, this is difficult to support
in administratively decentralized large scale distributed systems
because of the following reasons:

Distributed objects and references are dynamically created,
deleted, migrated, and shared across the network. Therefore, it is
difficult to determine when an object is not reachable, and whether it
is safe to reclaim it.

Distributed systems are administratively decentralized, so
clients' well-behaving cooperation should not be required.

Distributed systems are very large in scale, so it is impossible
to get a global view of clients, objects and their references.

Servers and clients can crash during garbage collection related
operations.

Messages can be lost, and the network can be partitioned for a
while.

Since persistent objects may remain after their clients and servers
terminate, garbage objects might stay in persistent storage forever
unless garbage collection is performed. When garbage collection is
performed by clients manually only, the same problem happens because
manual garbage collection is error-prone and clients' well-behaving
cooperation should not be required. If some clients terminate without
deleting their references, storage is wasted, which even causes system
crash.

CORBA supports the persistent object service and the activation
mechanism. When active objects have not received any requests for a
long time (e.g., 15 minutes in NEO), they are de-activated and stored
on persistent storage. They are activated again when they are
accessed. This process is performed transparently. Clients can store
the state of their objects explicitly as well. The persistent objects
exist until they are explicitly deleted by clients, which means that
garbage objects should be reclaimed by clients manually. DCOM also
supports the persistent object service. Objects can be stored into a
Running Object Table(ROT), and they can be activated
later. DCOM does not support a garbage collection mechanisms for the
ROT, so the client which created objects should come back and delete
the objects. In [Pjama], root objects are created
by clients, and the root objects and objects reachable from the root
objects are assumed to be alive. Garbage objects which are unreachable
from the root objects are reclaimed by the external garbage collector,
opjgc.

The current garbage collection mechanisms for active objects in memory
are not suitable for persistent objects. Reference counting is simple
and scalable, but 1) it cannot collect cyclic garbage objects, and 2)
it is impossible to maintain the referencing link counts
correctly. Since clients might terminate without reducing their
reference counts in a loosely synchronized distributed system, the
reference count of an object can be greater than zero, even though no
other objects are referencing the object. Both pinging and lease are
based on [Birrell et al., 1993] but they cannot
collect cyclic garbage. That is, garbage objects in a same cycle will
refresh each other, and they never expire. Since persistent objects
may remain long after its server process terminates, all garbage
including cyclic garbage should be reclaimed completely. Therefore, we
need a new garbage collection mechanism which is efficient, scalable
and complete.

3. Our approach

We have implemented a garbage collection mechanism for the Prospero
directory service which maintains distributed persistent objects [Neuman, 1992]. In our system model, all
distributed objects are equal, and there are no special root objects.
Objects which have a long lifetime (Time-to-Live) effectively serve as
root objects. Objects reachable from clients are assumed to be alive,
and unreachable objects are assumed garbage. Each object may have a
different reclamation method based on its characteristics. For
example, temporary objects are removed directly once they become
garbage. However, some important objects such as bank accounts are
moved to dormant account repository and notify the customers.
Our mechanism consists of two major steps: 1) acyclic garbage is
collected by timeouts; and 2) cyclic garbage is collected by
last referenceable timestamp propagation and backward
inquiry. We assume that all clocks in the network are synchronized
to some degree, and crashed networks and servers recover within a
finite period of time.

3.1 Timeouts

Information about references are maintained by timeouts which is the
same concept as that of RMI's lease. When an object is created
by a client, the client assigns a TTL (Time-To-Live) to the
object. The new object also gets an expiration time which is the
creation time plus the TTL. The TTL is the period for which the object
is guaranteed to exist after being refreshed (receiving a refresh
message). When a link is made to the object, the TTL is added to the
current time, and the resulting expiration time is stored in both the
object and the link. Since each link contains the expiration of the
target object, clients can send a refresh message before the
expiration time. To guarantee that a link will continue to work (the
target object remains valid), it must be refreshed before its
expiration.

Two optimization techniques are introduced in order to reduce the
communication overhead incurred by the refreshing process. First,
refresh messages are gathered and sent on a host-to-host basis rather
than a client-to-object basis. Second, refreshing frequency is
controlled by the likelihood of becoming garbage. [Lieberman & Hewitt, 1983] showed that the
likelihood of becoming garbage for a newly created object is higher
than that of an old one. Therefore, garbage collection should be
performed more frequently for newly created objects and less for old
objects. To controlled the frequency of refreshing, the TTL of an
object starts from a small number and is increased as the object is
accessed.

3.2 Last referenceable timestamp propagation and backward inquiry

Because timeouts do not collect cyclic garbage, last referenceable
timestamp propagation and backward inquiry are introduced. In
practice, distributed object spaces do not have special root objects
which are always alive. For that reason, our garbage collection
mechanism uses reachability from clients rather than from
roots. Cyclic garbage is reclaimed by two steps: last referenceable
timestamp propagation which detects local garbage; and backward
inquiry which performs partial synchronization and confirms garbage.

Each object maintains an LRTS (Last Referenceable TimeStamp) which
indicates the most recent time when the object was accessible (i.e.,
reachable) by clients. When an object is accessed by a client, the
LRTS of the object is set to the accessed time, and the LRTS (the
accessed time) is propagated to all reachable objects recursively when
links are refreshed. All reachable objects from the accessed object
will get new LRTSs eventually. The LRTS of inaccessible distributed
cyclic garbage, however, will stabilize at a value that is less than
or equal to the time at which the cycle became inaccessible. Objects
whose LRTSs are not more recent than a local threshold are
local garbage. Since each system selects its own local threshold, each
has a different threshold. Moreover, it is possible that LRTSs might
not arrive at target objects on time because they are sent only during
the link refreshing process. Therefore, before local garbage is
reclaimed, it should be confirmed to be garbage.

The basic idea of backward inquiry is that, before a local garbage
object is reclaimed, all referencing objects are examined to see
whether they are garbage or not. If all referencing objects are
garbage, then the object is also garbage and can be reclaimed. The
best way to check referencing objects is to ask them directly to
examine themselves whether they are garbage or not, and to reply with
results. Since referencing objects refresh their target objects before
expiration times, objects can receive refresh messages from all
referencing objects. This means that objects can receive information
about all referencing objects (i.e., an implicit reference
list) within their TTL times, even though they do not have an explicit
reference list.

Since messages necessary for cyclic garbage collection are bundled
with the messages used for timeouts, the overhead of cyclic garbage
collection is minimized.

3.3 Evaluation and implication

Our garbage collection mechanism satisfies the following properties:

The mechanism scales well in terms of the number of distributed
objects due to the two optimization techniques of the refreshing
mechanism.

No extra messages are necessary for cyclic garbage collection
because messages needed for cyclic garbage collection are bundled with
refresh messages.

The latency time to collect garbage increases in proportion to
the length of the garbage chain, which is scalable.

All garbage including distributed cyclic garbage is reclaimed.

Our mechanism does not require global synchronization of all
distributed systems.

We have a plan to apply our garbage collection mechanism on practical
distributed persistent object systems such as CORBA, DCOM and RMI. We
expect that our garbage collection mechanism will work well for
distributed persistent object systems.