Mastergradsoppgaver i informatikk (Master's theses in informatics)
http://hdl.handle.net/10037/222
Tue, 03 Mar 2015 18:48:21 GMT

Controlled sharing of body-sensor data for sports analytics using code consent capabilities
http://hdl.handle.net/10037/6502
Zhang, Wei
With the advent of body sensor technology, athletes can easily record individual
physiological metrics such as heart rate, steps, and blood sugar. In parallel,
there is an increasing number of web services that use the raw body-sensor
data as input to sports analytics. For the individual athletes, this can yield
valuable insights on their performance and suggestions on individual training
programs, which consequently aid their development.
Once the data is imported into these analytics systems, however, the athletes are left with little control over it. This thesis presents code consent, a user-centric mechanism that combines informed consent and capabilities to enable athletes to share their private data in a more controllable manner. Furthermore, it gives both athletes and analytics services the flexibility to delegate authority across protection domains by chaining keyed cryptographic hashes.
The actions and terms of informed consent are transformed into a reference to source code and the attributes of a capability. When a capability is executed, the access-control policy for the resource is enforced, and the operation on the resource is performed in an OpenCPU server, which is an R sandbox. Through a use case, we demonstrate how a user is able to share a graph of his aggregated data with others by delegating a capability. This thesis details the implementation of constructing a code consent capability, as well as the verification, delegation, and execution of capabilities. The security of the prototype is also discussed for the case where users revoke capabilities. In the prototype implementation, we also evaluate the end-to-end latency of executing a capability, which includes the time to verify the signature, execute the program code, and download the output file. The performance analysis guides us toward optimizations of the prototype, such as capability caching and function chaining.
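The delegation by chained keyed hashes can be illustrated with a minimal, macaroon-style sketch; the token format, attribute strings, and function names below are illustrative assumptions, not the construction used in the thesis.

    import hmac, hashlib

    def _mac(key: bytes, msg: str) -> bytes:
        # Keyed cryptographic hash used to chain attributes onto a capability.
        return hmac.new(key, msg.encode(), hashlib.sha256).digest()

    def mint(root_key: bytes, cap_id: str) -> tuple[list[str], bytes]:
        # The issuing service signs the capability identifier with its root key.
        return [cap_id], _mac(root_key, cap_id)

    def delegate(attrs: list[str], sig: bytes, new_attr: str) -> tuple[list[str], bytes]:
        # A holder narrows the capability by chaining another attribute;
        # the previous signature becomes the key for the next keyed hash.
        return attrs + [new_attr], _mac(sig, new_attr)

    def verify(root_key: bytes, attrs: list[str], sig: bytes) -> bool:
        # Only the issuer, who knows the root key, can replay the chain and
        # check the presented signature.
        cur = _mac(root_key, attrs[0])
        for attr in attrs[1:]:
            cur = _mac(cur, attr)
        return hmac.compare_digest(cur, sig)

    root = b"issuer-secret"
    attrs, sig = mint(root, "cap:wei/heart-rate")
    attrs, sig = delegate(attrs, sig, "operation=plot_aggregate")  # share only a graph
    assert verify(root, attrs, sig)

Because each delegation only requires hashing with the previous signature, a capability can be narrowed and passed across protection domains without contacting the issuer.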
Thu, 15 May 2014 00:00:00 GMT
http://hdl.handle.net/10037/6502

ROPE: Reducing the Omni-kernel Power Expenses
http://hdl.handle.net/10037/6389
Karlberg, Jan-Ove
Over the last decade, power consumption and energy efficiency have arisen as important performance metrics for data center computing systems hosting cloud services. The incentives for reducing power consumption are several, and often span economic, technological, and environmental dimensions.
Because of the vast number of computers currently employed in data centers, the economy of scale dictates that even small reductions in power expenditure on machine level can amount to large energy savings on data center scale.
Clouds commonly employ hardware virtualization technologies to allow for higher degrees of utilization of the physical hardware. The workloads encapsulated by virtual machines constantly compete for the available physical hardware resources of their host machines. To prevent execution of one workload from seizing resources that are intended for another, full visibility into, and control over, resource allocation is necessary. The Omni-kernel architecture is a novel operating system architecture especially designed for pervasive monitoring and scheduling. Vortex is an experimental implementation of this architecture.
This thesis describes ROPE (Reducing the Omni-kernel Power Expenses), which introduces power management functionality to the Vortex implementation of the Omni-kernel architecture. ROPE reduces the power consumption of Vortex, and does so while limiting performance degradation. We evaluate the energy savings and performance impacts of deploying ROPE using both custom-tailored and industry-standard benchmarks. We also discuss the implications of the Omni-kernel architecture with regard to power management, and how energy efficiency can be accommodated in this architecture.
Thu, 15 May 2014 00:00:00 GMT
http://hdl.handle.net/10037/6389

RS-Seismic Processing and Web-Visualization
http://hdl.handle.net/10037/6388
Pedersen, Tom Arne
The University of Tromsø is conducting regular marine seismic acquisition cruises for scientific and educational purposes in the polar regions of the Norwegian Sea and beyond. Leading experts in the field, currently employed by the Department of Geology at UiT, have found the current seismic visualization tools lacking in several respects. The available seismic software provides a multitude of settings and filters to improve visualization, but has limitations when it comes to user friendliness, and falls short on data interaction. Because of these limitations the scientists still revert to thermal paper plots for seismic data interaction. The thermal printers in UiT's possession are old, bulky, prone to mechanical failure, and expensive to replace.
The work done in this thesis attempts to address the needs of the Department of Geology, and in response presents RS, a system to process, visualize, and interact with both "live" and previously recorded 2D seismic data. RS provides processing and filtering of seismic data, and presents this data in a web-based user interface, using an open-source JavaScript tile viewer to visualize the seismic data. RS also comprises, among other components, a two-tiered Go backend, a custom-built tile maker, WebSockets for bidirectional communication, and NoSQL database storage.
The goal of the system is to provide a new visualization tool to replace or supplement current visualization platforms. RS does this by presenting a visualization client which can run on any device with a modern browser, giving every authorized person on a seismic vessel the ability to view and interact with seismic data.
To present seismic plots to the end user, RS uses binary seismic data acquired by existing seismic software and hardware. This data is a combination of the seismic data and a variety of metadata from sensors aboard the vessel. Data is filtered for noise, and cached before images are created. These images are served by a web server, and made available to the end user through his preferred platform.
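As a rough illustration of the tile-making step, the sketch below cuts a 2D seismic section into fixed-size tiles laid out as zoom/column/row files, the layout typical web tile viewers expect; the tile size, array shape, and file layout are assumptions for illustration, not RS's actual Go implementation.

    import os
    import numpy as np
    from PIL import Image

    TILE = 256  # tile edge length in pixels, as assumed by most web tile viewers

    def make_tiles(section: np.ndarray, out_dir: str, zoom: int = 0) -> None:
        """Cut a 2D grayscale seismic section (uint8) into TILE x TILE PNG tiles."""
        h, w = section.shape
        for ty in range(0, h, TILE):
            for tx in range(0, w, TILE):
                tile = section[ty:ty + TILE, tx:tx + TILE]
                col_dir = os.path.join(out_dir, str(zoom), str(tx // TILE))
                os.makedirs(col_dir, exist_ok=True)
                Image.fromarray(tile).save(os.path.join(col_dir, f"{ty // TILE}.png"))

    # Example: a synthetic 1024 x 4096 section standing in for filtered seismic data.
    section = (np.random.rand(1024, 4096) * 255).astype(np.uint8)
    make_tiles(section, "tiles")

The browser-side tile viewer then only requests the tiles that are currently in view, which keeps the interface responsive for large sections.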
Thu, 15 May 2014 00:00:00 GMT
http://hdl.handle.net/10037/6388

Large Multiples : exploring the large-scale scattergun approach to visualization and analysis
http://hdl.handle.net/10037/6383
Holsbø, Einar
We create 2.5 quintillion bytes of data every day. A whole 90% of the world’s data was created in the last two years.1 One contribution to this massive bulk of data is Twitter: Twitter users create 500 million tweets a day,2 which fact has greatly impacted social science [24] and journalism [39].
Network analysis is important in social science [6], but with so much data there is a real danger of information overload, and there is a general need for tools that help users navigate and make sense of this.
Data exploration is one way of analyzing a data set. Exploration-based analysis is to let the data suggest hypotheses, as opposed to starting out with a hypothesis to either confirm or refute. Visualization is an important exploration tool.
Given the ready availability of large-scale displays [1], we believe that an ideal visual exploration system would leverage these, and leverage the fact that there are many different ways to visualize something. We propose to use wall-sized displays to provide many different views of the same data set and as such let the user explore the data by exploring visualizations. Our thesis is that a display wall architecture [1, 42] is an ideal platform for such a scheme, providing both the resolution and the compute power required. Proper utilization of this would allow for useful sensemaking and storytelling.
To evaluate our thesis we have built a system for gathering and analyzing Twitter data, and exploring it through multiple visualizations.
Our evaluation of the prototype has provided us with insights that will allow us to create a practicable system, and demonstrations of the prototype have uncovered interesting stories in our case study data set. We find that it is strictly necessary to use clever pre-computation, pipelining, or streaming to meet the strict latency requirements of interactive visualization.
Our further experiments with the system have led to new discoveries in streaming graph processing.
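One reading of the latency point above is that aggregates have to be maintained incrementally as tweets stream in, so each view on the display wall can refresh without rescanning the data set. The sketch below is a minimal illustration of that idea; the fields, window length, and class name are assumptions, not the system's actual pipeline.

    from collections import Counter, deque
    from datetime import datetime, timedelta

    class StreamingAggregates:
        """Per-minute tweet counts and hashtag counts over a sliding window,
        updated one tweet at a time so visualizations can poll cheap,
        ready-made aggregates instead of rescanning raw data."""

        def __init__(self, window_minutes: int = 60):
            self.window = timedelta(minutes=window_minutes)
            self.events = deque()        # (timestamp, hashtags) inside the window
            self.per_minute = Counter()  # minute bucket -> tweet count
            self.hashtags = Counter()    # hashtag -> count inside the window

        def add(self, ts: datetime, hashtags: list[str]) -> None:
            self.events.append((ts, hashtags))
            self.per_minute[ts.replace(second=0, microsecond=0)] += 1
            self.hashtags.update(hashtags)
            # Evict tweets that have fallen out of the window.
            while self.events and self.events[0][0] < ts - self.window:
                old_ts, old_tags = self.events.popleft()
                self.per_minute[old_ts.replace(second=0, microsecond=0)] -= 1
                self.hashtags.subtract(old_tags)

        def top_hashtags(self, n: int = 10):
            return self.hashtags.most_common(n)

Each visualization on the wall would read from ready-made aggregates like these rather than from the raw tweet store, which is what makes interactively fast updates feasible.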
Thu, 15 May 2014 00:00:00 GMT
http://hdl.handle.net/10037/6383

Kvik : interactive exploration of genomic data from the NOWAC postgenome biobank
http://hdl.handle.net/10037/6382
Fjukstad, Bjørn
Recent technological advances provide large amounts of data for epidemiological analyses that can yield novel insights into the dynamics of carcinogenesis. These analyses are often performed without a prior hypothesis and therefore require an exploratory approach. Realizing exploratory analysis requires the development of new systems that provide interactive exploration and visualization of large-scale scientific datasets.
This thesis presents Kvik, an interactive system for exploring the dynamics of carcinogenesis through integrated studies of biological pathways and genomic data. Kvik is designed as a three-tiered application, an architecture that is commonly used for peta-scale applications. It provides researchers with a lightweight web application for navigating through biological pathways from the KEGG database integrated with genomic data from the NOWAC postgenome biobank.
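As an illustration of the three-tier split, the sketch below shows a tiny data-tier HTTP endpoint that the web application could query for expression values of the genes in one KEGG pathway; the route, data, and names are hypothetical and not Kvik's actual API.

    from flask import Flask, jsonify

    app = Flask(__name__)

    # Hypothetical stand-ins for the real data sources: gene membership per
    # KEGG pathway, and per-gene expression values from the biobank.
    PATHWAY_GENES = {"hsa04110": ["CCND1", "TP53", "CDK4"]}
    EXPRESSION = {"CCND1": 2.31, "TP53": -0.42, "CDK4": 1.05}

    @app.route("/pathway/<pathway_id>/expression")
    def pathway_expression(pathway_id):
        # The presentation tier requests expression values for all genes in a
        # pathway and overlays them on the pathway diagram in the browser.
        genes = PATHWAY_GENES.get(pathway_id, [])
        return jsonify({gene: EXPRESSION.get(gene) for gene in genes})

    if __name__ == "__main__":
        app.run(port=8080)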
In collaboration with researchers from the NOWAC systems epidemiology group, we have described the requirements for such a system, and by using an iterative approach we implemented Kvik through small development cycles, involving the end-users in the development process. Throughout the project we have gained valuable interdisciplinary experience in developing systems for use in explorative analysis of carcinogenesis.
Through an evaluation of the exploration tasks and workflow of an end-user, we demonstrate that Kvik has the capability of interactive exploration of genomic data and biological pathways.
We believe Kvik is important to enable novel discoveries from the data produced in the NOWAC systems epidemiology project. It provides epidemiology researchers with access to powerful compute and storage resources enabling the use of advanced statistical methods for the analysis. Finally, from our experiences in developing Kvik, we provide use cases and requirements for future analysis, computation and storage systems developed in our research group and by others.
Thu, 15 May 2014 00:00:00 GMT
http://hdl.handle.net/10037/6382

DPC. The distributed personal computer
http://hdl.handle.net/10037/5905
Bjørndalen, Karen E. Hough
Nowadays people have many different personal devices, like laptops and tablets, that they use to access and process data. Very often it is desirable to access and process the same data on different devices without having to copy it from one device to another. Commercial cloud services achieve this well, but recent events, such as the Snowden disclosures, have illustrated some of the trust issues of using external services.
This project introduces the Distributed Personal Computer (DPC), which aims to give a single system view of the user's personal devices without the use of external services. The DPC is meant to be for a single user with multiple devices. A prototype has been designed and implemented, and experiments have been conducted to evaluate the prototype.
The implemented prototype, and the experiments conducted on it, show that the concept of the DPC is worth pursuing further. The experiments show that the operation overhead is small enough to allow several hundred operations to run per second, and that the architecture and prototype for the DPC appear to be good enough for personal use.
Wed, 15 Jan 2014 00:00:00 GMT
http://hdl.handle.net/10037/5905

Mario. A system for iterative and interactive processing of biological data
http://hdl.handle.net/10037/5762
Ernstsen, Martin
This thesis addresses challenges in metagenomic data processing on clusters of computers; in particular the need for interactive response times during development, debugging and tuning of data processing pipelines. Typical metagenomics pipelines batch process data, and have execution times ranging from hours to months, making configuration and tuning time consuming and impractical.
We have analyzed the data usage of metagenomic pipelines, including a visualization frontend, to develop an approach that uses an online, data-parallel processing model, where changes in the pipeline configuration are quickly reflected in updated pipeline output available to the user.
We describe the design and implementation of the Mario system that realizes the approach. Mario is a distributed system built on top of the HBase storage system that provides data processing using commonly used bioinformatics applications, interactive tuning, automatic parallelization and data provenance support.
We evaluate Mario and its underlying storage system, HBase, using a benchmark developed to simulate I/O loads that are representative for biological data processing. The results show that Mario adds less than 100 milliseconds to the end-to-end latency of processing one item of data. This low latency, combined with Mario's storage of all intermediate data generated by the processing, enables easy parameter tuning. In addition to improved interactivity, Mario also offers integrated data provenance, by storing detailed pipeline configurations associated with the data.
The evaluation of Mario demonstrates that it can be used to achieve more interactivity in the configuration of pipelines for processing biological data. We believe that biology researchers can take advantage of this interactivity to perform better parameter tuning, which may lead to more accurate analyses, and ultimately to new scientific discoveries.
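The combination of stored intermediate data and stored pipeline configurations can be sketched as caching each stage's output under a key derived from its input and parameters, so that re-running a pipeline after a parameter change only recomputes the stages whose keys changed. The key scheme and in-memory store below are illustrative assumptions; Mario itself keeps this state in HBase.

    import hashlib, json

    STORE = {}  # stand-in for Mario's HBase-backed storage of intermediate data

    def stage_key(stage: str, params: dict, input_key: str) -> str:
        # Provenance key: the tool, its exact parameters, and the identity of
        # its input together determine the output.
        blob = json.dumps({"stage": stage, "params": params, "input": input_key},
                          sort_keys=True)
        return hashlib.sha1(blob.encode()).hexdigest()

    def run_stage(stage: str, params: dict, input_key: str, compute) -> str:
        key = stage_key(stage, params, input_key)
        if key not in STORE:  # recompute only when the configuration or input changed
            STORE[key] = compute(STORE[input_key], params)
        return key

    # Example: tweaking the filter threshold recomputes filtering and everything
    # downstream of it, while results for unchanged configurations are reused.
    STORE["raw"] = ["read1", "read2", "read3"]
    filtered = run_stage("quality_filter", {"min_q": 30}, "raw",
                         lambda reads, p: [r for r in reads if r != "read2"])
    assembled = run_stage("assemble", {"kmer": 31}, filtered,
                          lambda reads, p: "contigs")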
Fri, 15 Nov 2013 00:00:00 GMT
http://hdl.handle.net/10037/5762

Feature Detector: a support system for tracking satellite detected dynamic and permanent features
http://hdl.handle.net/10037/5433
Jacobsen, Joakim
Even with today's technologies, many tasks rely on humans to be completed correctly. Engineers must monitor steps in large chains of operations, and verify the results before the next process is allowed to continue. In many such systems, a lot of useful data passes by without ever being stored for efficient future use. Even though some operations must be verified by an experienced human eye, many, if not all, could benefit from computer assistance.
When processing satellite imagery there are a lot of steps involved before there is a final product. This thesis examines one specific part of the process: how to determine whether an observed feature is permanent or not. A self-learning geographical information system implementation that can determine the state of specific features will be presented. The system is capable of filtering out permanent installations from vessel traffic in highly dense areas. Further, we show that with such a system at the core, other useful functionality can easily be built on top of it. Such functionality could be tracking of vessels, oil spills or ice floes, the latter two of which have been implemented.
With such a system at hand, the day-to-day tasks of engineers monitoring satellite observations can be made easier and less error prone. In addition, such a historical view of the data can help improve existing services as well as those still under development.
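The core idea of separating permanent installations from moving targets can be sketched by snapping detections to grid cells and classifying a cell as permanent once it has accumulated detections across enough independent acquisitions. The grid resolution and threshold below are illustrative assumptions, not the model used in the thesis.

    from collections import defaultdict

    CELL_DEG = 0.01        # grid resolution in degrees (assumption)
    PERMANENT_PASSES = 5   # acquisitions needed before a cell counts as permanent (assumption)

    # cell -> set of acquisition ids in which a feature was detected there
    observations = defaultdict(set)

    def register(acquisition_id: str, lat: float, lon: float) -> None:
        cell = (round(lat / CELL_DEG), round(lon / CELL_DEG))
        observations[cell].add(acquisition_id)

    def is_permanent(lat: float, lon: float) -> bool:
        cell = (round(lat / CELL_DEG), round(lon / CELL_DEG))
        return len(observations[cell]) >= PERMANENT_PASSES

    # A fixed installation shows up in pass after pass; a transiting vessel does not.
    for i in range(6):
        register(f"pass-{i}", 71.253, 25.001)   # e.g. an offshore installation
    register("pass-3", 70.981, 24.660)          # a vessel seen only once
    print(is_permanent(71.253, 25.001), is_permanent(70.981, 24.660))  # True False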
Wed, 15 May 2013 00:00:00 GMT
http://hdl.handle.net/10037/5433

Image and video processing using graphics hardware
http://hdl.handle.net/10037/4346
Lanes, Børge
Graphics Processing Units have in recent years evolved into inexpensive, high-performance many-core computing units. They were earlier accessible only through graphics APIs, but new hardware architectures and programming tools have made it possible to program these devices using arbitrary data types and standard languages like C.
This thesis investigates the development process and performance of image and video processing algorithms on graphics processing units, regardless of vendor. The tool used for programming the graphics processing units is OpenCL, a relatively new specification for heterogeneous computing. Two image algorithms are investigated, the bilateral filter and the histogram. In addition, an attempt was made at a template-based solution for generation and auto-optimization of device code, but this approach had shortcomings that make it insufficiently usable at this time.
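As a minimal illustration of the kind of OpenCL program investigated, the sketch below computes an image histogram with a naive kernel that uses one atomic increment per pixel; it is a generic example with assumed names, not code from the thesis, and pyopencl is used here only to keep the examples in one language.

    import numpy as np
    import pyopencl as cl

    KERNEL = """
    __kernel void hist(__global const uchar *pixels, __global uint *bins, const uint n) {
        uint i = get_global_id(0);
        if (i < n)
            atomic_inc(&bins[pixels[i]]);   /* naive: one atomic add per pixel */
    }
    """

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    pixels = np.random.randint(0, 256, size=1 << 20, dtype=np.uint8)  # stand-in image
    bins = np.zeros(256, dtype=np.uint32)

    mf = cl.mem_flags
    pix_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=pixels)
    bin_buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=bins)

    prg = cl.Program(ctx, KERNEL).build()
    prg.hist(queue, (pixels.size,), None, pix_buf, bin_buf, np.uint32(pixels.size))
    cl.enqueue_copy(queue, bins, bin_buf)
    assert bins.sum() == pixels.size  # every pixel counted exactly once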
Sat, 01 May 2010 00:00:00 GMT
http://hdl.handle.net/10037/4346

GeStore : incremental computation for metagenomic pipelines
http://hdl.handle.net/10037/4272
Pedersen, Edvard
Genomics is the study of the genomes of organisms, and involves cultivating organisms in a lab and analyzing them. Metagenomics is the study of genomic samples collected directly from the environment, allowing researchers to study organisms that are difficult to cultivate in a petri dish. DNA sequencing and the analysis of these sequences is an important tool for both genomics and metagenomics. The integration of the data produced by sequencing with existing meta-data collections is particularly interesting for metagenomics, as a single biological sample can contain thousands of different organisms.
The recent developments in DNA sequencing technology mean that the volume of data that can be produced per dollar is increasing faster than the volume of data that can be analyzed and stored per dollar. This data growth means that the initial analysis of these massive data sets becomes increasingly expensive. In addition, there is a need to periodically update old results using new meta-data from the many knowledge bases (meta-data collections) for biological data. Today, this typically requires rerunning the entire experimental analysis. Incremental analysis is therefore particularly interesting for metagenomics, since environmental samples potentially contain thousands of organisms.
In metagenomic analysis, different sets of tools are used depending on the type of information required. These tools are generally arranged in a pipeline, where the output files of one tool act as the input for the next. The analysis done by some steps is dependent on different meta-data collections. When meta-data is updated, these steps and all subsequent steps typically need to be executed again. Incremental updates can save significant computation time by running these pipelines against only the updated segments, rather than the full meta-data collections.
We believe that systems for incremental updates for metagenomic analysis pipelines have the following requirements: (i) reduce the computational resource requirements by using incremental update techniques; (ii) the meta-data collections should be accessible without the use of proprietary or computationally expensive techniques; (iii) do the incremental updates on demand, due to different needs of experiments, through handling meta-data updates and generating arbitrary delta meta-data collections; (iv) support most genomic analysis tools and run on most job management systems; (v) no changes should be made to the tools that the pipeline is comprised of, since modifying the many available tools is impractical; (vi) the changes to the job management and resource allocation system should be minimal, to save implementation time for the pipeline system maintainer; and (vii) maintain a view of previous meta-data collections, so old experiments can be repeated with the correct meta-data collection version.
To our knowledge, no existing incremental update system satisfies all seven requirements. Often they do not support on-demand processing or maintaining views of old data; in addition, many systems require computations to be done within a specific framework or programming language.
In this thesis we describe the GeStore incremental update system, which satisfies all seven requirements. GeStore reduces the size of the meta-data collections, and thus the computational requirements for the pipeline, by leveraging incremental update techniques, satisfying requirements (i) and (iii). In addition, it reduces the storage requirements of the meta-data collections, while still maintaining a complete view of the meta-data collection in a plain-text format, fulfilling requirements (ii) and (vii). It also presents a simple interface to the application programmer, so that integrating the system with existing pipeline solutions does not require large changes to the pipeline system or tools, in accordance with requirements (iv), (v) and (vi).
GeStore has been implemented using the MapReduce framework, along with HBase, to provide scalable meta-data processing. We demonstrate the system by generating subsets of meta-data collections for use by the widely used genomic tool BLAST.
In our evaluation, we have integrated GeStore with an existing pipelining system, GePan, a metagenomic pipeline system developed for a local biotech company in Tromsø, Norway, and used real-world data to evaluate the performance and benefits of GeStore.
Our experimental results show that GeStore is able to reduce the runtime of the incremental updates by up to 65% when compared to unmodified GePan, while introducing a low storage overhead and requiring minimal changes to GePan.
We believe that efficient on-demand updates of metagenomic data, as provided by GeStore, will be useful to our biology collaborators.
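The central incremental-update idea, running the pipeline against only the changed part of a meta-data collection, can be sketched roughly as below: two versions of a FASTA-style reference collection are compared, and only added or modified entries are emitted as a delta for reprocessing. The parsing and file names are illustrative assumptions, not GeStore's HBase/MapReduce implementation.

    def read_fasta(path: str) -> dict[str, str]:
        """Parse a FASTA file into {sequence id: sequence}."""
        entries, seq_id, chunks = {}, None, []
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if line.startswith(">"):
                    if seq_id is not None:
                        entries[seq_id] = "".join(chunks)
                    seq_id, chunks = line[1:].split()[0], []
                else:
                    chunks.append(line)
        if seq_id is not None:
            entries[seq_id] = "".join(chunks)
        return entries

    def delta(old: dict[str, str], new: dict[str, str]) -> dict[str, str]:
        """Entries that are new or changed since the previous collection version."""
        return {k: v for k, v in new.items() if old.get(k) != v}

    # Example: only the delta needs to be formatted and searched with BLAST;
    # hits from the previous run can be reused and merged with hits against the delta.
    old = read_fasta("nr_2012_01.fasta")   # hypothetical previous version
    new = read_fasta("nr_2012_06.fasta")   # hypothetical updated version
    changed = delta(old, new)
    print(f"{len(changed)} of {len(new)} entries need reprocessing")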
Fri, 01 Jun 2012 00:00:00 GMT
http://hdl.handle.net/10037/4272