Harvard Computer Science Technical Reports for 1990

TR-01-90

Donald Beaver. 1990. ``Perfect Privacy for Two-Party Protocols.''

We examine the problem of computing functions in a distributed
manner based on private inputs, revealing the value of a function
without revealing any additional information about the inputs. A
function $f(x_1,\ldots,x_n)$ is $t$-private if there is a protocol
whereby $n$ parties, each holding an input value, can compute
$f$, and no subset of $t$ or fewer parties gains any information
other than what is computable from $f(x_1,\ldots,x_n)$. The class
of $t$-private {\em boolean} functions for $t \geq \lceil \frac{n}{2}
\rceil$ was described in [CK89]. We give a characterization
of 1-private functions for two parties, without the restriction to
the boolean case. We also examine a restricted form of private
computation in the $n$-party case, and show that addition is the
only $t$-privately computable function in that model. Incorrect
proofs of this characterization appeared in [Kushilevitz,1989] and
an earlier technical report [Beaver,1989]. We present a different
proof which avoids the errors of those works. This report
supersedes Harvard Technical Report TR-11-89.
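The restricted $n$-party model in which addition is the only privately computable function can be made concrete with a standard additive secret-sharing sketch; the modulus and function names below are illustrative and not taken from the report.

```python
import random

# Toy sketch of t-private addition via additive secret sharing over Z_p.
# Each party splits its input into n random shares that sum to the input;
# any n-1 shares of a single input are uniformly random, so a coalition
# of t <= n-1 parties learns nothing beyond the final sum.
P = 2**31 - 1  # a prime modulus (illustrative choice)

def share(x, n):
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def private_sum(inputs):
    n = len(inputs)
    all_shares = [share(x, n) for x in inputs]
    # party j locally adds the j-th share of every input ...
    partial = [sum(s[j] for s in all_shares) % P for j in range(n)]
    # ... and the public partials are combined to reveal only the sum
    return sum(partial) % P

print(private_sum([3, 5, 9]))  # 17
```

Each partial sum is itself a uniformly random value; only their combination reveals the result.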

TR-02-90

David Albert. 1990. ``Cooperative Dialogues While Playing Adventure.''

To collect examples of the use of hedging phrases in informal
conversation, three pairs of subjects were (separately) asked to
cooperate in playing the computer game {\em Adventure}. While
one typed commands into the computer, the two engaged in
conversation about their options and best strategy. The
conversation was recorded on audio tape, and their computer
session was saved in a disk file. Subsequently, the audio tape
was transcribed and merged with the computer record of the game,
to produce a combined transcript showing what was typed along with
a simultaneous running commentary by the participants. This
report contains the complete transcripts and a discussion of the
collection and transcription methodology.

TR-03-90

Robert Muller. 1990. ``M-LISP: A Representation Independent Dialect of LISP with
Reduction Semantics.''

In this paper we propose to reconstruct LISP from first principles
with an eye toward reconciling S-expression LISP's metalinguistic
facilities with the kind of operational semantics advocated by
Plotkin [Pl81]. After reviewing the original definition of LISP
we define the abstract syntax and the operational semantics of the
new dialect, M-LISP, and show that its equational theory is
consistent. Next we develop the operational semantics of an
extension of M-LISP which features an explicitly callable {\em
eval} and {\em fexprs} (i.e., procedures whose arguments are
passed {\em by-representation}). Since M-LISP is independent of
any representation of its programs it has no {\em quotation}
operator or any of its related forms. To compensate for this we
encapsulate the shifting between mention and use which is
performed globally by {\em quote} within the metalinguistic
constructs that require it. The resulting equational system is
shown to be inconsistent. We leave it as an open problem to find
confluent variants of the metalinguistic constructs.

TR-04-90

Robert Muller. 1990. ``Syntax Macros in M-LISP: A Representation Independent Dialect of
LISP with Reduction Semantics.''

In this paper we consider syntactic abstraction in M-LISP, a
dialect of LISP which is independent of any representation of its
programs. Since it is independent of McCarthy's original Meta-
expression representation scheme, M-LISP has no {\em quotation}
form or any of its related forms {\em backquote}, {\em unquote},
or {\em unquote-splicing}. Given that LISP macro systems depend
on this latter representation model, it is not obvious how to
support syntax extensions in M-LISP. Our approach is based on an
adaptation of the {\em Macro-by-Example} [KW87] and {\em
Hygienic} [KFFD86] algorithms. The adaptation for the
quotation-free syntactic structure of M-LISP yields a
substantially different model of syntax macros. The most
important difference is that $\lambda$ binding patterns become
apparent when an abstraction is first (i.e., partially)
transcribed in the syntax tree. This allows us to define tighter
restrictions on the capture of identifiers. This is not possible
in S-expression dialects such as Scheme since $\lambda$ binding
patterns are not apparent until the tree is completely
transcribed.

In this paper we describe a new semantic
metalanguage which simplifies prototyping of programming
languages. The system integrates Paulson's semantic grammars
within a new dialect of LISP, M-LISP, which has somewhat closer
connections to the $\lambda$-calculus than other LISP dialects
such as Scheme. The semantic grammars are expressed as attribute
grammars. The generated parsers are M-LISP functions that can
return denotational (i.e., higher-order) representations of
abstract syntax. We illustrate the system with several examples
and compare it to related systems.

The vital role that embedded systems are playing and will
continue to play in our world, coupled with their increasingly
complex and critical nature, demands a rigorous and systematic
treatment that recognizes their unique requirements. The Time-
constrained Reactive Automaton (TRA) is a formal model of
computation that admits these requirements. Using the TRA model,
an embedded system is viewed as a set of {\em asynchronously}
interacting automata (TRAs), each representing an {\em autonomous}
system entity. TRAs are {\em input enabled}; they communicate by
signaling events on their {\em output channels} and by reacting
to events signaled on their {\em input channels}. The behavior of
a TRA is governed by {\em time-constrained causal relationships}
between {\em computation-triggering} events. The TRA model is
{\em compositional} and allows time, control, and computation
{\em non-determinism}. In this paper we present ESPRIT, a
specification language that is entirely based on the TRA model. We
have developed a compiler that allows ESPRIT specifications to be
executed in simulated time, thus providing a valuable validation
tool for embedded system specifications. We are currently
developing another compiler that will allow the execution of
ESPRIT specifications in real time, thus making it possible to
write real-time programs directly in ESPRIT.

This paper describes (1) a finite-state model for asynchronous
systems in which the time delays between the scheduling and
occurrence of the events that cause state changes are constrained
to fall between fixed numerical upper and lower time bounds; (2) a
branching-time temporal logic suitable for describing the temporal
and logical properties of asynchronous systems, for which the
structures of (1) are the natural models; and (3) a functional
verification system for asynchronous circuits, which generates,
from a boolean circuit with general feedback and specified min/max
rise and fall times for the gates, a finite-state structure as in
(1), and then exhaustively checks a formal specification of that
circuit in the language (2) against that finite-state model.

Natural language generation systems should customize object
descriptions according to the extent of their user's domain and
lexical knowledge. The task of generating customized descriptions
is formalized as a task of finding descriptions that are {\it
accurate} (truthful), {\it valid} (fulfill the speaker's
communicative goal), and {\it free of false implicatures} (do not
give rise to unintended conversational implicatures) with respect
to the current user model. An algorithm that generates
descriptions that meet these constraints is described, and the
computational complexity of the generation problem is discussed.

Referring expressions and other object descriptions should be
maximal under the Local Brevity, No Unnecessary Components, and
Lexical Preference preference rules; otherwise, they may lead
hearers to infer unwanted conversational implicatures. These
preference rules can be incorporated into a polynomial time
generation algorithm, while some alternative formalizations of
conversational implicature make the generation task NP-Hard.

Natural language generation (NLG) systems must produce different
utterances for users with different amounts of domain and lexical
knowledge. An utterance that is meant to be read by an expert
should use technical vocabulary, and avoid explicitly mentioning
facts the expert can immediately infer from the rest of the
utterance. In contrast, an utterance that is meant to be read by
a novice should avoid specialized vocabulary, and may be required
to explicitly mention facts that would be obvious to an expert.
An NLG system that does not customize utterances according to its
user's domain and lexical knowledge may generate text that is
incomprehensible to a novice, or text that leads an expert to
infer unwanted {\it conversational implicatures} (Grice 1975).
This thesis examines the problem of generating attributive
descriptions of individuals, that is, object descriptions that are
intended to inform the user that a particular object has certain
attributes. It proposes that such descriptions will be appropriate
for a particular user if they are {\it accurate, valid}, and
{\it free of false implicatures} with respect to a user-model
that represents that user's relevant domain and lexical
knowledge. Descriptions are represented as definitions of KL-ONE-
type (Brachman and Schmolze 1985) classes, and a description is
called {\it accurate} if it defines a class that subsumes the
object being described; {\it valid} if every attribute the
system wishes to communicate is either part of the description or
a default attribute that is inherited by the class defined by the
description; and {\it free of false implicatures} if it is
maximal under three preference rules: No Unnecessary Components,
Local Brevity, and Lexical Preference.
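The {\it accurate} and {\it valid} conditions above can be rendered as a toy check in which attributes are plain sets; the attribute names below are invented, and the thesis itself works with KL-ONE-style class definitions rather than flat sets.

```python
# Toy rendering of the `accurate' and `valid' conditions for a
# description, modelling attributes as simple sets. Attribute names
# are invented for illustration.
def accurate(description, object_attrs):
    # the class defined by the description subsumes the object:
    # every attribute in the description holds of the object
    return description <= object_attrs

def valid(description, goal_attrs, defaults):
    # every attribute the system wishes to communicate is either
    # stated in the description or inherited as a default
    return goal_attrs <= description | defaults

obj = {"wine", "red", "dry", "from-burgundy"}
desc = {"wine", "red"}
print(accurate(desc, obj))                           # True
print(valid(desc, {"wine", "red", "dry"}, {"dry"}))  # True
```

The third condition, freedom from false implicatures, is the computationally interesting one: it requires maximality under the preference rules rather than a simple set comparison.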

We prove several lattice theoretical fixpoint theorems based on
the classical result of Knaster-Tarski. These theorems give
sufficient conditions for a system of generally nonmonotonic
functions on a complete lattice to define a unique fixpoint. The
primary objective of this paper is to develop a domain theoretic
framework to study the semantics of general logic programs as well
as various rule-based systems where the rules define nonmonotonic
functions on lattices.
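The classical monotone case that these theorems generalize can be sketched on a finite powerset lattice: iterating a monotone operator from the bottom element reaches the least fixpoint. The operator below, written in the style of a logic-program consequence operator, is an invented example and not from the paper.

```python
# Sketch of the Knaster-Tarski least-fixpoint construction on a finite
# powerset lattice: iterate a monotone function from the bottom element
# until it stabilizes. The operator `step' is an invented example.
def lfp(f, bottom=frozenset()):
    x = bottom
    while True:
        y = f(x)
        if y == x:
            return x
        x = y

# a monotone operator resembling one step of logic-program consequence
def step(s):
    out = set(s) | {"a"}           # fact: a.
    if "a" in s: out.add("b")      # rule: b :- a.
    if "b" in s: out.add("c")      # rule: c :- b.
    return frozenset(out)

print(sorted(lfp(step)))  # ['a', 'b', 'c']
```

The paper's contribution is precisely the nonmonotonic setting, where this simple iteration no longer suffices and extra conditions are needed for a unique fixpoint.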

TR-12-90

Joe Marks. 1990. ``A Syntax and Semantics for Network Diagrams.''

The ability to automatically design graphical displays of data
will be important for the next generation of interactive computer
systems. The research reported here concerns the automated design
of network diagrams, one of the three main classes of symbolic
graphical display (the other two being chart graphs and maps).
Previous notions of syntax and semantics for network diagrams are
not adequate for automating the design of this kind of graphical
display. I present here a new formulation of syntax and semantics
for network diagrams that is used in the ANDD (Automated Network
Diagram Designer) system. The syntactic formulation differs from
previous work in two significant ways: perceptual-organization
phenomena are explicitly represented, and syntax is described in
terms of constraints rather than as a grammar of term-rewriting
rules. The semantic formulation is based on an application-
independent model of network systems that can be used to model
many real-world applications. The paper includes examples that
show how these concepts are used by ANDD to automatically design
network diagrams.

We have developed two systems, FN and ANDD, that use natural
language and graphical displays, respectively, to communicate
information about objects to human users. Both systems must deal
with the fundamental problem of ensuring that their output does
not carry unwanted and inappropriate conversational implicatures.
We describe the types of conversational implicatures that FN and
ANDD can avoid, and the computational strategies the two systems
use to generate output that is free of unwanted implicatures.

Agents collaborating to achieve a goal bring to their joint
activity different beliefs about ways in which to achieve the goal
and the actions necessary for doing so. Thus, a model of
collaboration must provide a way of representing and
distinguishing among agents' beliefs and of stating the ways in
which the intentions of different agents contribute to achieving
their goal. Furthermore, in collaborative activity, collaboration
occurs in the planning process itself. Thus, rather than
modelling plan recognition, per se, what must be modelled is the
{\it augmentation} of beliefs about the actions of multiple agents
and their intentions. In this paper, we modify and expand the
SharedPlan model of collaborative behavior (Grosz & Sidner 1990).
We present an algorithm for updating an agent's beliefs about a
partial SharedPlan and describe an initial implementation of this
algorithm in the domain of network management.

Efficient schemes for the following issues in parallel processing
are presented: fast communication, low congestion, fault
tolerance, simulation of ideal parallel computation models,
synchronization in asynchronous networks, low sensitivity to
variations in component speed, and on-line maintenance. All our
schemes employ Rabin's information dispersal idea. We also
develop an efficient information dispersal algorithm (IDA) based
on the Fast Fourier Transform and an IDA-based voting scheme to
enforce fault tolerance. Let $N$ denote the size of the hypercube
network. We present a randomized communication scheme, FSRA (for
``Fault-tolerant Subcube Routing Algorithm''), that routes in $2
\cdot \log N + 1$ time using only constant size buffers and with
probability of success $1 - N^{-\Theta(\log N)}$. (All logs
are to the base 2.) FSRA also tolerates $O(N)$ random link
failures with high probability. Similar results are also obtained
for the de Bruijn and the butterfly networks (without fault
tolerance in the latter case). FSRA is employed to simulate,
without using hashing, a class of CRCW PRAM (concurrent-read
concurrent-write parallel random access machine) programs with a
slowdown of $O(\log N)$ with almost certainty if combining is
used. A fault-tolerant simulation scheme for general CRCW PRAM
programs is also presented. A simple acknowledgement synchronizer
can make all our routing schemes in this dissertation run on
asynchronous networks without loss of efficiency. We further show
that the speed of any component--be it a processor or a link--has only
linear impact on the run-time of FSRA; that is, the extra delay in
run-time is only proportional to the drift in the component's
delay and is independent of the size of the network. On-line
maintainability makes the machine more available to the user. We
show that, under FSRA, a constant fraction of the links can be
disabled with essentially no impact on the routing performance.
This result immediately suggests several efficient maintenance
procedures. Based on the above results, a fault-tolerant parallel
computing system, called HPC (for ``hypercube parallel
computer''), is sketched at the end of this dissertation.
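Rabin's information dispersal idea can be illustrated with a toy polynomial-based scheme over GF(257): $m$ data symbols become $n$ pieces, and any $m$ pieces suffice to reconstruct. The field choice, the parameters, and reconstruction by Gaussian elimination are all illustrative; the dissertation's IDA is based on the Fast Fourier Transform.

```python
# Toy sketch of information dispersal: m data symbols are encoded as
# n pieces (evaluations of a degree-(m-1) polynomial over GF(257)),
# and any m pieces reconstruct the data.
P = 257  # prime field large enough for byte-valued symbols

def disperse(data, n):
    poly = lambda x: sum(c * pow(x, k, P) for k, c in enumerate(data)) % P
    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(pieces, m):
    # solve the m x m Vandermonde system A c = y  (mod P)
    A = [[pow(x, k, P) for k in range(m)] + [y] for x, y in pieces[:m]]
    for col in range(m):
        piv = next(r for r in range(col, m) if A[r][col] % P)
        A[col], A[piv] = A[piv], A[col]
        inv = pow(A[col][col], P - 2, P)  # Fermat inverse
        A[col] = [v * inv % P for v in A[col]]
        for r in range(m):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [(av - f * cv) % P for av, cv in zip(A[r], A[col])]
    return [A[k][m] for k in range(m)]

pieces = disperse([10, 20, 30], 5)
print(reconstruct(pieces[2:], 3))  # [10, 20, 30]
```

The fault-tolerance claim follows directly: here $n - m = 2$ pieces can be lost (or discarded by a voting scheme) without affecting reconstruction.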

We consider a learning problem for the representation class of
$\mu$-formula decision trees, a generalization of $\mu$-formulas
and $\mu$-decision trees. (The ``$\mu$'' form of a representation
has the restriction that no variable appears more than once). The
learning model is one of exact identification by oracle queries,
where the learner's goal is to discover an unknown function by asking
membership queries (is the function true on some specified input?)
and equivalence queries (is the function identical to some
hypothesis we present, and if not what is an input on which they
differ?). We present an identification algorithm using these two
types of queries that runs in time polynomial in the number of
variables, and show that no such polynomial time algorithm
exists that uses either membership or equivalence queries alone
(in the latter case under the stipulation that the hypotheses are
drawn from the same representation class). We further extend the
algorithm to identify a broader class in which the formulas are
taken over a more powerful basis that includes arbitrary threshold gates.
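The query model itself can be made concrete with a much simpler class than $\mu$-formula decision trees: the toy learner below exactly identifies a monotone conjunction using membership queries alone. It is chosen only to illustrate the oracle interface, not the paper's algorithm, which needs both membership and equivalence queries.

```python
# Toy illustration of the exact-identification query model: learn a
# monotone conjunction over n variables with membership queries alone.
# (The paper's class, mu-formula decision trees, provably needs both
# query types; this simpler class just makes the oracle shape concrete.)
def learn_conjunction(n, member):
    relevant = []
    for i in range(n):
        # query the all-true assignment with variable i turned off;
        # the answer is False exactly when variable i is in the target
        x = [True] * n
        x[i] = False
        if not member(x):
            relevant.append(i)
    return relevant

target = {0, 3}  # unknown to the learner: x0 AND x3
member = lambda x: all(x[i] for i in target)
print(learn_conjunction(5, member))  # [0, 3]
```

Each query probes one variable, so identification here uses exactly $n$ membership queries.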

Graphical scene modeling is usually a time-consuming, tedious, and
expensive manual activity. This paper proposes an approach to
partially automating the process through a paradigm for cooperative
user-computer scene modeling that we call {\em cooperative
computer-aided design} (CCAD). Formal grammars, referred to as
{\em schema grammars}, are used to imbue the modeling system with
an elementary ``understanding'' of the kinds of scenes to be
created. The grammar interpreter constructs part or all of the
scene after accepting from the user partially completed scene
components and descriptions of scene properties that must be
incorporated into the final scene. This approach to modeling
harnesses the power of the computer to construct scene detail,
thereby freeing the human user to focus on essential creative
decisions. This paper describes the structure and interpretation
of schema grammars, and provides techniques for controlling the
combinatorial explosion that can result from the undirected
interpretation of the grammars. CCAD is explored in the context of
two experimental systems---FLATS, for architectural design, and
LG, for modeling landscapes.

Design activity is often characterized by a search in which the
designer examines various alternatives at several stages during
the design process. Current computer-aided design (CAD) systems,
however, provide very little support for this exploratory aspect
of design. My research provides the foundation for {\em
cooperative computer-aided design (CCAD)} ---a novel CAD
technology that intersperses partial exploration of design
alternatives by the computer with guiding design operations by the
system user. CCAD combines the strengths of manual and automated
design and modeling by allowing the user to make creative
decisions and provide specialized detail, while exploiting the
power of the machine to explore many design alternatives and
create detail that is resonant with the user's design decisions.
In the CCAD paradigm, the user expresses initial design decisions
in the form of a partial design and a set of properties that the
final design must satisfy. The user then initiates the generation
by the system of alternative {\em partial} developments of the
initial design subject to a ``language'' of valid designs. The
results are then structured in a spatial framework through which
the user moves to explore the alternatives. The user selects the
most promising partial design, refines it manually, and then
requests further automatic development. This process continues
until a satisfactory complete design is created. I present the
interpretation of schema grammars---a class of generative grammars
for manipulating graphical objects---as the fundamental generative
mechanism underlying CCAD. I describe in detail several mechanisms
for providing user control over the generative process, thereby
controlling the combinatorial explosion inherent in an
unrestricted, undirected interpretation of generative grammars. I
explore graphical browsing as a facility for efficiently perusing
the set of design alternatives generated by the system. I also
describe FLATS ({\bf F}loor plan {\bf LA}you{\bf T} {\bf
S}ystem)---a prototype CCAD system for the design of small
architectural floor plans.

TR-19-90

William A. Woods. 1990. ``Understanding Subsumption and Taxonomy: A Framework for Progress.''

This paper continues a theme begun in my paper, ``What's in a
Link,'' -- seeking a solid foundation for network representations
of knowledge. Like its predecessor, its goal is to clarify issues
and establish a framework for progress. The paper analyzes the
concepts of subsumption and taxonomy and synthesizes a framework
that integrates and clarifies many previous approaches and goes
beyond them to provide an account of abstract and partially
defined concepts. The distinction between definition and
assertion is reinterpreted in a framework that accommodates
probabilistic and default rules as well as universal claims and
abstract and partial definitions. Conceptual taxonomies in this
framework are shown to be useful for indexing and organizing
information and for managing the resolution of conflicting
defaults. The paper introduces a distinction between intensional
and extensional subsumption and argues for the importance of the
former. It presents a classification algorithm based on
intensional subsumption and shows that its typical case complexity
is logarithmic in the size of the knowledge base.

TR-20-90

William A. Woods and James G. Schmolze. 1990. ``The KL-ONE Family.''

The knowledge representation system KL-ONE has been one of the
most influential and imitated knowledge representation systems in
the Artificial Intelligence community. Begun at Bolt Beranek and
Newman in 1978, KL-ONE pioneered the development of taxonomic
representations that can automatically classify and assimilate new
concepts based on a criterion of terminological subsumption. This
theme generated considerable interest in both the formal community
and a large community of potential users. The KL-ONE community
has since expanded to include many systems at many institutions
and in many different countries. This paper introduces the KL-ONE
family and discusses some of the main themes explored by KL-ONE
and its successors. We give an overview of current research,
describe some of the systems that have been developed, and outline
some future research directions.

The $s$-session problem is studied in {\em asynchronous} and
{\em semi-synchronous} networks. Processes are located at the
nodes of an undirected graph $G$ and communicate by sending
messages along links that correspond to the edges of $G$. A
session is a part of an execution in which each process takes at
least one step; an algorithm for the $s$-session problem
guarantees the existence of at least $s$ disjoint sessions. The
existence of many sessions guarantees a degree of interleaving
which is necessary for certain computations. It is assumed that
the (real) time for message delivery is at most $d$. In the
asynchronous model, it is assumed that the time between any two
consecutive steps of any process is in the interval $[0,1]$; in
the semi-synchronous model, the time between any two consecutive
steps of any process is in the interval $[c,1]$ for some $c$ such
that $0 < c \leq 1$, the {\em synchronous} model being the
special case where $c=1$. In the {\em initialized} case of the
problem, all processes are initially synchronized and take a step
at time 0. For the asynchronous model, an upper bound of
$diam(G)(d+1)(s-1)$ and a lower bound of $diam(G)d(s-1)$ are
presented; $diam(G)$ is the {\em diameter} of $G$. For the
semi-synchronous model, an upper bound of $1+\min\{\lfloor \frac{1}{c}
\rfloor +1, diam(G)(d+1)\}(s-2)$ is presented. The main
result of the paper is a lower bound of $1+\min\{\lfloor \frac{1}{2c}
\rfloor, diam(G)d\}(s-2)$ for the time complexity of any
semi-synchronous algorithm for the $s$-session problem, under the
assumption that $d \geq \frac{d}{\min\{\lfloor\frac{1}{2c}\rfloor,
diam(G)d\}} + 2$. These results imply a time separation
between semi-synchronous (in particular, synchronous) and
asynchronous networks. Similar results are proved for the case
where delays are not uniform. In the {\em uninitialized} case of
the problem, all processes but one, the {\em initiator}, start
in a {\em quiescent} state which they may leave upon receiving a
message. Similar results are proved for this case.

TR-22-90

Azer Bestavros and Thomas Cheatham. 1990. ``Efficient execution of homogeneous tasks with unequal run times on
the Connection Machine.''

Many scientific applications require the execution of a large
number of identical {\em tasks}, each on a different set of data.
Such applications can easily benefit from the power of SIMD
architectures ({\em e.g.}, the Connection Machine) by having the
array of processing elements (PEs) execute the task in parallel on
the different data sets. It is often the case, however, that the
task to be performed involves the repetitive application of the
same sequence of steps, {\em a body}, for a number of times that
depends on the input or computed data. If the usual {\em task-level
synchronization} is used, the utilization of the array of PEs
degrades substantially. In this paper, we propose a {\em body-
level synchronization} scheme that would boost the utilization of
the array of PEs while keeping the required overhead to a minimum.
We mathematically analyze the proposed technique and show how to
optimize its performance for a given application. Our technique is
especially efficient when the number of tasks to be executed is
much larger than the number of physical PEs available.
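The gain from body-level over task-level synchronization can be sketched with a toy scheduling model: under task-level synchronization, tasks are issued in batches of $P$ and each batch runs for its maximum iteration count, whereas under body-level synchronization an idle PE immediately draws the next task. The workload below is invented for illustration.

```python
import numpy as np

# Toy model: T tasks with unequal iteration counts on P PEs.
def task_level(counts, P):
    # each batch of P tasks runs for the batch maximum
    return sum(max(counts[i:i + P]) for i in range(0, len(counts), P))

def body_level(counts, P):
    pes = np.zeros(P, dtype=int)   # remaining iterations per PE
    queue = list(counts)
    steps = 0
    while queue or pes.any():
        for j in range(P):         # idle PEs draw new tasks
            if pes[j] == 0 and queue:
                pes[j] = queue.pop(0)
        pes[pes > 0] -= 1          # one synchronized body step
        steps += 1
    return steps

work = [9, 1, 1, 1, 9, 1, 1, 1]
print(task_level(work, 4), body_level(work, 4))  # 18 10
```

With many more tasks than PEs, the body-level total approaches the average iteration count rather than the sum of batch maxima, matching the utilization argument above.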

Intelligent agents collaborating to achieve a common goal direct a
significant portion of their effort to planning together. Because
having a plan to perform a given task involves having knowledge
about the ways in which the performance of a set of actions will
lead to the performance of that task, the representation of
actions and of relations among them is a central concern. This
paper provides a formalism for representing act-type relations and
complex act-type constructors in multiagent domains. To determine
a set of relations that would span the space of complex
collaborative actions, I analyzed a videotape of two people
building a piece of furniture, and identified a group of relations
that are adequate to represent the relationships among the actions
occurring in the data. The definitions provided here are more
complex than those used in earlier studies; in particular, I
refine and expand Pollack's (1986) set of act-type relations
(generation and enablement), provide constructors for building
complex act-types from (simpler) act-types (simultaneity,
conjunction, sequence and iteration), and make it possible to
represent the joint actions of multiple agents.

We present efficient and practical algorithms for a large,
distributed system of processors to achieve reliable computations
in a secure manner. Specifically, we address the problem of
computing a general function of several private inputs distributed
among the processors of a network, while ensuring the correctness
of the results and the privacy of the inputs, despite accidental
or malicious faults in the system. Communication is often the
most significant bottleneck in distributed computing. Our
algorithms maintain a low cost in local processing time, are the
first to achieve optimal levels of fault-tolerance, and most
importantly, have low communication complexity. In contrast to
the best known previous methods, which require large numbers of
rounds even for fairly simple computations, we devise protocols
that use small messages and a constant number of rounds {\em
regardless} of the complexity of the function to be computed.
Through direct algebraic approaches, we separate the {\em
communication complexity of secure computing} from the {\em
computational complexity} of the function to be computed. We
examine security under both the modern approach of computational
complexity-based cryptography and the classical approach of
unconditional, information-theoretic security. We develop a clear
and concise set of definitions that support formal proofs of
claims to security, addressing an important deficiency in the
literature. Our protocols are provably secure. In the realm of
information-theoretic security, we characterize those functions
which two parties can compute jointly with absolute privacy. We
also characterize those functions which a weak processor can
compute using the aid of powerful processors without having to
reveal the instances of the problem it would like to solve. Our
methods include a promising new technique called a {\em locally
random reduction} , which has given rise not only to efficient
solutions for many of the problems considered in this work but to
several powerful new results in complexity theory.

This thesis examines the problem of interprocessor communication
in realistic parallel computers. In particular, we consider the
problem of permutation routing and its generalizations in the
mesh, hypercube and butterfly networks. Building on previous
research, we derive lower bounds for a wide class of deterministic
routing algorithms which imply that such algorithms create heavy
traffic congestion. In contrast, we show that randomized routing
algorithms result in both efficient and optimal upper bounds in
the above networks. Experiments were also performed to test the
behaviour of the randomized algorithms. These experiments suggest
interesting theoretical problems. We also examine the problem of
efficient interprocessor communication in a model suggested by
recent advances in optical computing. The main argument of this
thesis is that communication can be made efficient if
randomization is used in the routing algorithms.
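The thesis's central contrast, deterministic oblivious routing versus randomized routing, can be sketched for the hypercube: the simulation below counts edge congestion under greedy bit-fixing, both directly and with Valiant's trick of routing through a random intermediate node in two phases. The dimensions and workload are invented for illustration.

```python
import random

# Sketch of oblivious bit-fixing routing on a d-dimensional hypercube,
# and a Valiant-style randomized two-phase variant.
def bit_fix_path(src, dst, d):
    # correct differing address bits from lowest to highest
    path, cur = [src], src
    for i in range(d):
        if (cur ^ dst) >> i & 1:
            cur ^= 1 << i
            path.append(cur)
    return path

def max_edge_congestion(pairs, d):
    load = {}
    for s, t in pairs:
        p = bit_fix_path(s, t, d)
        for a, b in zip(p, p[1:]):
            load[a, b] = load.get((a, b), 0) + 1
    return max(load.values(), default=0)

d = 8
N = 1 << d
perm = list(range(N))
random.shuffle(perm)
direct = max_edge_congestion(list(enumerate(perm)), d)
# phase 1: route to random intermediates; phase 2: on to destinations
mid = [random.randrange(N) for _ in range(N)]
two_phase = max(max_edge_congestion(list(enumerate(mid)), d),
                max_edge_congestion(list(zip(mid, perm)), d))
print(direct, two_phase)
```

For adversarial permutations such as the transpose, deterministic bit-fixing suffers congestion on the order of $\sqrt{N}$, while the randomized two-phase scheme keeps congestion logarithmic with high probability, which is the separation the thesis argues for.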