"... tutorial article, which has been submitted for publication in a journal or for consideration by the commissioning organization. The report represents the ideas of its author, and should not be taken as the official views of the School or the University. Any discussion of the content of the report sh ..."

tutorial article, which has been submitted for publication in a journal or for consideration by the commissioning organization. The report represents the ideas of its author, and should not be taken as the official views of the School or the University. Any discussion of the content of the report should be sent to the author, at the address shown on the cover.

... Our HILBASR algorithm guarantees that the probability of identifying the query initiator is always bounded by 1/K, even if the attacker knows the locations of all users. HILBASR uses the Hilbert [6] ordering to group users into buckets of K users. The Hilbert space-filling curve transforms the 2-D coordinates of each user into a 1-D value. With high probability, if two points are in close proximity i...
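
The bucketing idea can be sketched briefly. The snippet below is not the authors' HILBASR code; the 2-D Hilbert index routine is the standard iterative bit-manipulation formulation, and the grid resolution and function names are illustrative assumptions. Users are ranked by their Hilbert index and cut into consecutive groups of K, so the query initiator is hidden inside a group of K users.

```python
def hilbert_xy_to_d(order, x, y):
    """Map integer grid coordinates (x, y) on a 2^order x 2^order grid
    to a 1-D Hilbert index (standard iterative formulation)."""
    n = 1 << order          # grid side length
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the curve pattern repeats inside the quadrant.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_buckets(users, K, order=16):
    """Sort users by Hilbert index and cut the sorted order into buckets
    of K. `users` is a list of (user_id, x, y) with coordinates already
    quantized onto the 2^order grid (an assumption of this sketch).
    The final bucket may hold fewer than K users; a real implementation
    would merge it with its neighbour."""
    ranked = sorted(users, key=lambda u: hilbert_xy_to_d(order, u[1], u[2]))
    return [ranked[i:i + K] for i in range(0, len(ranked), K)]
```

Because nearby users tend to receive nearby Hilbert indices, each bucket is usually spatially compact, which keeps the resulting anonymizing region small.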

"... Network coordinates provide a mechanism for selecting and placing servers efficiently in a large distributed system. This approach works well as long as the coordinates continue to accurately reflect network topology. We conducted a long-term study of a subset of a million-plus node coordinate syste ..."

Network coordinates provide a mechanism for selecting and placing servers efficiently in a large distributed system. This approach works well as long as the coordinates continue to accurately reflect network topology. We conducted a long-term study of a subset of a million-plus node coordinate system and found that it exhibited some of the problems for which network coordinates are frequently criticized, for example, inaccuracy and fragility in the presence of violations of the triangle inequality. Fortunately, we show that several simple techniques remedy many of these problems. Using the Azureus BitTorrent network as our testbed, we show that live, large-scale network coordinate systems behave differently than their tame PlanetLab and simulation-based counterparts. We find higher relative errors, more triangle inequality violations, and higher churn. We present and evaluate a number of techniques that, when applied to Azureus, efficiently produce accurate and stable network coordinates.

... that applications that use them often need to make assumptions about maximum distances away from the “true” origin. For example, one could use Hilbert functions to map coordinates into a single dimension [4]. This requires an a priori estimate of the maximum volume the coordinates may fill up. Mapping functions like Hilbert require that the current centroid not drift from the origin without bound. Drift ...
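
The dependence on an a priori volume estimate is easiest to see once coordinates must be quantised onto the bounded integer grid that a Hilbert-style mapping expects. The sketch below is illustrative only; the extent, resolution, and function name are assumptions, not anything from the cited system. It shows why a drifting centroid is harmful: components beyond the assumed extent simply saturate at the grid edge.

```python
def quantize_component(value, max_extent, bits=16):
    """Map one network-coordinate component from [-max_extent, +max_extent]
    onto a 2^bits integer grid, clamping anything outside that range.
    If the coordinate centroid drifts past max_extent, points pile up at
    the grid boundary and the 1-D mapping loses its locality."""
    clamped = max(-max_extent, min(max_extent, value))
    cells = (1 << bits) - 1
    return round((clamped + max_extent) / (2.0 * max_extent) * cells)
```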

"... . This paper presents and discusses a radically different approach to multi-dimensional indexing based on the concept of the spacefilling curve. It reports the novel algorithms which had to be developed to create the first actual implementation of a system based on this approach, on some compara ..."

... This paper presents and discusses a radically different approach to multi-dimensional indexing based on the concept of the space-filling curve. It reports on the novel algorithms that had to be developed to create the first actual implementation of a system based on this approach, on some comparative performance tests, and on its actual use within the TriStarp Group at Birkbeck to provide a Triple Store repository. An important result that goes beyond this requirement, however, is that the performance improvement over the Grid File is greater the higher the dimension. 1 Introduction Underlying any DBMS is some form of repository management system or data store. The classic and dominant model for such repositories is that of some form of logical record or data aggregate type with a collection of instances conforming to that type, usually termed a file. Such file systems are, of course, also used directly in many applications. The data model of a DBMS may be radically different f...

... in up to 10 dimensions for the Hilbert Curve, beyond which memory requirements become prohibitive. In higher-dimensional space, the calculation method of mapping from one to n dimensions given by Butz [2] is used. Some improvements to Butz's method are given by Lawder, together with details of the inverse mapping. 8 Query Execution 8.1 Overall Approach We noted earlier that a page represents a section ...
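
Butz's method and Lawder's refinements compute such mappings for arbitrary dimension; a full n-dimensional version is beyond a short example, but the familiar 2-D inverse mapping below (a standard formulation, not Butz's or Lawder's code) shows the kind of bit-level calculation involved in going from a 1-D key back to coordinates.

```python
def hilbert_d_to_xy(order, d):
    """Inverse 2-D Hilbert mapping: convert curve index d back to (x, y)
    on a 2^order x 2^order grid (standard iterative formulation)."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/reflect the partial coordinates within the s x s sub-square.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y
```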

"... In this paper a new definition of distance-based outlier and an algorithm, called HilOut, designed to efficiently detect the top n outliers of a large and high-dimensional data set are proposed. Given an integer k, the weight of a point is defined as the sum of the distances separating it from its k ..."

In this paper, a new definition of distance-based outlier and an algorithm, called HilOut, designed to efficiently detect the top n outliers of a large and high-dimensional data set are proposed. Given an integer k, the weight of a point is defined as the sum of the distances separating it from its k nearest neighbors. Outliers are those points scoring the largest values of weight. The algorithm HilOut makes use of the notion of space-filling curve to linearize the data set, and it consists of two phases. The first phase provides an approximate solution, within a rough factor, after the execution of at most d + 1 sorts and scans of the data set, with temporal cost quadratic in d and linear in N and in k, where d is the number of dimensions of the data set and N is the number of points in the data set. During this phase, the algorithm isolates points that are candidates to be outliers and reduces this set at each iteration. If the size of this set becomes n, then the algorithm stops, reporting the exact solution. The second phase calculates the exact solution with a final scan examining further the candidate outliers remaining after the first phase. Experimental results show that the algorithm always stops, reporting the exact solution, during the first phase after much less than d + 1 steps. We present both an in-memory and disk-based implementation of the HilOut algorithm and a thorough scaling analysis for real and synthetic data sets showing that the algorithm scales well in both cases.
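
For reference, the weight definition can be written down directly. The brute-force computation below is only an illustrative baseline, not the HilOut algorithm; it costs O(N^2) distance evaluations, which is exactly the work HilOut's space-filling-curve linearization is designed to avoid.

```python
import heapq
import math

def weight(p, data, k):
    """Weight of point p: the sum of its distances to its k nearest
    neighbours in `data` (p itself excluded). Brute force."""
    dists = sorted(math.dist(p, q) for q in data if q is not p)
    return sum(dists[:k])

def top_n_outliers(data, k, n):
    """Exact top-n distance-based outliers: the n points of largest weight.
    This is the answer that HilOut approximates and then verifies."""
    return heapq.nlargest(n, data, key=lambda p: weight(p, data, k))
```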

...ubes whose center-points are considered as points in a space of finite granularity. Let p be a point in D. The inverse image of p under this mapping is called its Hilbert value and is denoted by H(p) [8, 18, 23]. Let DB be a set of points in D. These points can be sorted according to the order in which the curve passes through them. We denote by H(DB) the set {H(p) | p ∈ DB} sorted with respect to the order ...

"... The top-k dominating query returns k data objects which dominate the highest number of objects in a dataset. This query is an important tool for decision support since it provides data analysts an intuitive way for finding significant objects. In addition, it combines the advantages of top-k and sky ..."

The top-k dominating query returns k data objects which dominate the highest number of objects in a dataset. This query is an important tool for decision support since it provides data analysts an intuitive way for finding significant objects. In addition, it combines the advantages of top-k and skyline queries without sharing their disadvantages: (i) the output size can be controlled, (ii) no ranking functions need to be specified by users, and (iii) the result is independent of the scales at different dimensions. Despite their importance, top-k dominating queries have not received adequate attention from the research community. In this paper, we design specialized algorithms that apply on indexed multi-dimensional data and fully exploit the characteristics of the problem. Experiments on synthetic datasets demonstrate that our algorithms significantly outperform a previous skyline-based approach, while our results on real datasets show the meaningfulness of top-k dominating queries.
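
The query has a simple exhaustive formulation, shown below purely as a reference point; it assumes, for illustration, that smaller values are better in every dimension. The index-based algorithms in the paper exist precisely to avoid this O(N^2) all-pairs comparison.

```python
def dominates(p, q):
    """p dominates q if p is at least as good in every dimension and
    strictly better in at least one (here, smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def top_k_dominating(points, k):
    """Exhaustive top-k dominating query: score every point by the number
    of points it dominates and return the k highest scorers."""
    scored = [(sum(dominates(p, q) for q in points), p) for p in points]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]
```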

"... Mapping to one-dimensional values and then using a onedimensional indexing method has been proposed as a way of indexing multi-dimensional data. Most previous related work uses the Z-Order Curve but more recently the Hilbert Curve has been considered since it has superior clustering properties. Any ..."

Mapping to one-dimensional values and then using a one-dimensional indexing method has been proposed as a way of indexing multi-dimensional data. Most previous related work uses the Z-Order Curve but more recently the Hilbert Curve has been considered since it has superior clustering properties. Any approach, however, can only be of practical value if there are effective methods for executing range and partial match queries. This paper describes such a method for the Hilbert Curve.

... by Bially [1]. In more than about 9 dimensions, state diagrams also become too large to accommodate in memory and mappings need to be calculated instead, for example in the manner detailed by Butz [2] and developed by Lawder [12]. 4 Application of the Hilbert Curve We now describe how the Hilbert Curve is used in a practical application. We refer to actual records placed in a data file as datum-poin...

"... Data partitioning and load balancing are important components of parallel computations. Many different partitioning strategies have been developed, with great effectiveness in parallel applications. But the load-balancing problem is not yet solved completely; new applications and architectures requi ..."

Data partitioning and load balancing are important components of parallel computations. Many different partitioning strategies have been developed, with great effectiveness in parallel applications. But the load-balancing problem is not yet solved completely; new applications and architectures require new partitioning features. Existing algorithms must be enhanced to support more complex applications. New models are needed for non-square, non-symmetric, and highly connected systems arising from applications in biology, circuits, and materials simulations. Increased use of heterogeneous computing architectures requires partitioners that account for non-uniform computing, network, and memory resources. And, for greatest impact, these new capabilities must be delivered in toolkits that are robust, easy-to-use, and applicable to a wide range of applications. In this paper, we discuss our approaches to addressing these issues within the Zoltan Parallel Data Services toolkit.

... developed software for several query methods (including variations of box-assignment) and fast conversion between Hilbert SFC keys and spatial coordinates; his work was based on earlier work by Butz [35] and Thomas [36]. Lawder [37, 38] presents a Hilbert-like SFC and practical conversions and spatial queries for high-dimensional database access; while his algorithms would work for a true Hilbert SFC,...

"... Fractal image compression allows fast decoding but suffers from long encoding times. During the encoding a large number of sequential searches through a list of domains (portions of the image) are carried out while trying to find a best match for another image portion called range. In this article ..."

Fractal image compression allows fast decoding but suffers from long encoding times. During the encoding, a large number of sequential searches through a list of domains (portions of the image) are carried out while trying to find a best match for another image portion called a range. In this article we review and extend the methods that have been developed to reduce the time complexity of this searching. We also present a new taxonomy of the methods, provide an evaluation, and propose two new techniques.
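
The costly inner loop being described is a best-match search over domain blocks. The sketch below is illustrative only: it assumes the domain blocks have already been down-sampled to the range size and ignores the eight isometries a real encoder would also try. It fits the grey-level scaling s and offset o by least squares for every domain and keeps the smallest error, which is the sequential search that the surveyed speed-up techniques try to shortcut.

```python
import numpy as np

def best_domain_match(range_block, domain_blocks):
    """Sequential search for the domain block that best approximates
    `range_block` under a grey-level map d -> s*d + o (least squares).
    Returns (domain index, squared error, s, o)."""
    r = range_block.ravel().astype(float)
    best = (None, float("inf"), 0.0, 0.0)
    for idx, dom in enumerate(domain_blocks):
        d = dom.ravel().astype(float)
        if d.std() > 0:
            s, o = np.polyfit(d, r, 1)   # optimal contrast and brightness
        else:
            s, o = 0.0, r.mean()         # flat domain: offset-only fit
        err = float(np.sum((s * d + o - r) ** 2))
        if err < best[1]:
            best = (idx, err, s, o)
    return best
```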

"... Photon mapping is useful in the acceleration of global illumination and caustic effects computed by path tracing. For hardware accelerated rendering, photon maps would be especially useful for simulating caustic lighting effects on non-Lambertian surfaces. For this to be possible, an efficient hardw ..."

Photon mapping is useful in the acceleration of global illumination and caustic effects computed by path tracing. For hardware-accelerated rendering, photon maps would be especially useful for simulating caustic lighting effects on non-Lambertian surfaces. For this to be possible, an efficient hardware algorithm for the computation of the k nearest neighbours to a sample point is required. Existing ...
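
The kernel operation being asked for is a k-nearest-neighbour query around a sample point. A minimal CPU reference, not a hardware algorithm, might look like the sketch below; the flat linear scan is exactly what photon-map implementations replace with a kd-tree or a GPU-friendly structure.

```python
import heapq
import math

def k_nearest_photons(sample, photon_positions, k):
    """Return the k photon positions closest to `sample` (linear scan).
    Real photon maps organise the photons in a kd-tree or similar so this
    query does not have to touch every stored photon."""
    return heapq.nsmallest(k, photon_positions,
                           key=lambda p: math.dist(sample, p))
```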

"... The top-k dominating query returns k data objects which dominate the highest number of objects in a dataset. This query is an important tool for decision support since it provides data analysts an intuitive way for finding significant objects. In addition, it combines the advantages of top-k and sky ..."

The top-k dominating query returns k data objects which dominate the highest number of objects in a dataset. This query is an important tool for decision support since it provides data analysts an intuitive way for finding significant objects. In addition, it combines the advantages of top-k and skyline queries without sharing their disadvantages: (i) the output size can be controlled, (ii) no ranking functions need to be specified by users, and (iii) the result is independent of the scales at different dimensions. Despite their importance, top-k dominating queries have not received adequate attention from the research community. This paper is an extensive study on the evaluation of top-k dominating queries. First, we propose a set of algorithms that apply on indexed multi-dimensional data. Second, we investigate query evaluation on data that are not indexed. Finally, we study a relaxed variant of the query which considers dominance in dimensional subspaces. Experiments using synthetic and real datasets demonstrate that our algorithms significantly outperform a previous skyline-based approach. We also illustrate the applicability of this multi-dimensional analysis query by studying the meaningfulness of its results on real data.