Abstract

Some of our most precious personal data today resides on the devices most personal to us, specifically the mobile phone. One of the hardest challenges in managing such a device is safeguarding it against data loss in the event of an unexpected crash or loss of the device. A thorough system is needed that keeps personal loss or damage to a minimum when data loss does occur. We present an application that guarantees periodic, uninterrupted and automated backup of your mobile device data. The application uses incremental backup to save your data onto a server dedicated to this purpose. Incremental backup means less data transferred per backup and lower battery consumption. It also means that backups can be taken more frequently, since each incremental backup uses only a fraction of the space occupied by the actual data. Furthermore, the application allows the user to constrain bandwidth usage, if need be, so that the application does not hog all the available bandwidth. The application guarantees complete restoration of the device data to any date for which a backup is available.

Project URL:

© 2011 Nagarajan S, Neha Bhatia, Savalia Jay Mansukhbhai, Tejaswini L. Naik, Ritika Wadhawan. This material is available under the Creative Commons Attribution-Noncommercial-Share Alike License. See for details.

Acknowledgement

Our sincere thanks to Dr. Debabrata Das for his support and encouragement during the course of the project. We express our gratitude to Dr. Shrisha Rao for his guidance and to our advisor Dr. Jaya Sreevalsan Nair. Our sincere thanks to all the teaching assistants for their support.

1 Introduction

1.1 Problem Definition

To create a system that automatically takes periodic backups of a mobile device, and that can also be used to create a cloned device easily.

1.2 Technical Description

The target platform for our application, the Nokia N810, runs Maemo OS, an open source flavour of Debian GNU/Linux. The device ships with a backup/restore facility, albeit an inelegant one. This utility works by simply taking a complete backup of all the device data, and storing it on the device itself. Taking a complete backup every time is redundant, since copies of the same data are saved multiple times. In fact, performing a backup when little or nothing has changed on the device simply creates the same dump once more. Also, a backup stored on the device is pointless if the device crashes or is lost. Moreover, battery and network bandwidth are unnecessarily consumed for this purpose.

Our application was built to overcome these drawbacks. It works on the principle of incremental backup, which saves only the data that has changed since the last backup. If the data is relatively unchanged since the last backup, only metadata is saved, indicating that little or nothing has changed. Incremental backup allows frequent backups at low cost, since each backup occupies only a fraction of the space taken up by the actual data. Network bandwidth is conserved because only the differences (deltas) of the changed files are needed to synchronize the server's copy of the backup with the client's copy. Hence, the backup/restore utility saves transfer time, space on the server, and battery. The application provides multiple restore points, allowing the user to roll back the device to any date for which a backup is available.
1.3 Gap Analysis

The current market already provides plenty of solutions for mobile device backup. PC-based tools provide backup/sync facilities between the mobile device and the PC [1][2][3]. There also exist Web-based services, some that still require a PC to assist in synchronizing the mobile device to online storage [4][5][6], and others that can run from the device itself [7]. These solutions, however, require the user to initiate the service and perform the backup as required.

Our product aims to solve this issue by providing periodic backup, one that runs at a specified frequency and, most importantly, without the user's intervention. The process is initiated directly from the device, and thus completely avoids the need for a PC. Data is backed up to online server space dedicated to this purpose, so the backed-up data is available anywhere, anytime.

2 Components

2.1 RSync

RSync is a remote file (or data) synchronization protocol. It allows synchronization of files between two computers, i.e. it ensures that both copies of a file are the same. If there are any differences, RSync detects them and sends them across, so that the client or server can update its copy of the file to make the copies the same [8].

RSync is capable of synchronizing files without sending the whole file across the network [9]. Only data corresponding to about 2% of the total file size is exchanged, in addition to any new data in the file. New data has to be sent across the wire completely. Because of the way it works, RSync can also be used as an incremental download/upload protocol, allowing upload or download of a file over many sessions [9]. If the current upload or download fails, it can simply be resumed later.

RSync is also an executable program on UNIX systems, which implements the RSync protocol.

Figure 1: RSync Network Protocol

Use of RSync in our application

Our application uses RSync for synchronizing, as well as minimizing, file transfers between the mobile device and the backup server. RSync transfers only delta differences; however, it merges these differences into the previous backup, resulting in a single, larger conglomerate. The problem with this approach is that, at any point in time, only the latest backup is available to the user. Our application adds extra layers around RSync to achieve what is desired: incremental backup. This is detailed in Sec. 3.

2.2 diff

diff is the standard UNIX utility that outputs the difference between two files or two directories. It forms an integral part of our application, and is used to separate out the files changed since the last backup into a different location, thus providing restore points. More details are given in Sec. 3.

3 Implementation

The detailed functioning of the backup-restore model is as follows. The very first time, the complete data of the mobile device is backed up onto the server. This constitutes the first full backup. Subsequently, every time the backup process is initiated, an incremental backup is performed. Incremental backup works by saving only the changes since the last backup. This includes:

- Newly added files since the last backup
- Modifications to files since the last backup
- Flagging of files deleted since the last backup

In the case of modified files, only the changes to the file (the delta difference) are transferred over the network, rather than the whole file. The file is then reconstructed on the server side by merging the delta difference with the same file from the previous backup (see Sec. 3.1 for more).

The backup and restore process can be understood using the following mathematical model. Let $X_0$ represent the data initially present in the mobile device (client system) at time $t_0$. If a backup of this data is taken onto the server system, the server will contain a complete copy of $X_0$. Let $X_1$ represent the data in the mobile device at time $t_1$. The difference in data generated since time $t_0$ is then given by $x_1 = X_1 - X_0$. Initiating a backup at time $t_1$ results in only $x_1$ being transferred over the network connection. On the server side, this $x_1$ is merged with (added to) $X_0$ to produce $X_1$.

Generalizing, at time $t_k$, when the system is in state $X_k$, initiating a backup results in the transfer of $x_k$, which represents the data change since the last backup at time $t_{k-1}$. The server's copy, which at time $t_{k-1}$ was $X_{k-1}$, is now updated to $X_k$, since $X_{k-1} + x_k = X_k$.
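The model above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a device state is a dictionary mapping file names to contents; the function names `delta`, `merge` and `restore` and the sample states are hypothetical and are not taken from the application itself.

```python
def delta(prev, cur):
    """x_k = X_k - X_{k-1}: new or modified files, plus names flagged as deleted."""
    changed = {name: data for name, data in cur.items() if prev.get(name) != data}
    deleted = sorted(name for name in prev if name not in cur)
    return {"changed": changed, "deleted": deleted}


def merge(prev, x):
    """X_{k-1} + x_k = X_k: apply one increment to the previous state."""
    state = {name: data for name, data in prev.items() if name not in x["deleted"]}
    state.update(x["changed"])
    return state


def restore(bins, k):
    """Replay bins 0..k, starting from an empty state, to rebuild X_k."""
    state = {}
    for x in bins[:k + 1]:
        state = merge(state, x)
    return state


# Hypothetical device states at t_0, t_1, t_2 (file name -> contents).
states = [
    {"A": b"alpha", "B": b"beta"},
    {"A": b"alpha", "B": b"beta", "D": b"delta"},  # D added
    {"A": b"alpha v2", "D": b"delta"},             # A modified, B deleted
]
# The first bin is the full backup: the difference from an empty device.
bins = [delta(prev, cur) for prev, cur in zip([{}] + states, states)]
```

Here `restore(bins, k)` returns `states[k]` for each `k`, which is the identity $X_{k-1} + x_k = X_k$ applied repeatedly. Whole files stand in for the byte-level deltas that RSync actually transfers.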

3.1 Backup Process

Figure 2: Backup Process Flow (Steps 1-3) at time $t_k$

On the server, two copies of the sync folder are maintained: one for time $t_i$ (called current sync, or simply sync), and the other for time $t_{i-1}$ (previous sync; initially empty at time $t_0$). Assume that the backup process is initiated at time $t_i = t_k$.

Step 1 The device data is synchronized to the server sync location. RSync transfers $x_k$ worth of data, thereby bringing the server sync location from state $X_{k-1}$ to $X_k$. This data is merged with the existing sync data, and all the files are brought up to date with the current versions of the same files on the client machine.

Step 2 The diff utility is then used to calculate the difference between the current sync and the previous sync. This gives us all the files that have changed since the last backup, which together form the required incremental backup.

Step 3 The files changed since the last backup, as listed by diff, are stored in a separate bin in the Backup-data Store, a location in the server space that houses all the incremental backups. Note that the first such bin, marked $X_0$, represents the full backup taken at time $t_0$, while every other bin, taken at times $t_i$, $i = 1, 2, \ldots, k$, represents the incremental backup at that time.

Figure 3: Backup Process Flow (Step 4) at time $t_k$

Step 4 The previous sync location is locally synchronized with the current sync, to prepare it for the next incremental backup at time $t_{k+1}$.

This 4-step cycle is repeated whenever backup is initiated.
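The 4-step cycle can be sketched on a local file system as follows. This is a simplified illustration under stated assumptions, not the application's code: plain directory copies stand in for RSync, a byte-for-byte comparison stands in for diff, directories are assumed to be flat, and the names `mirror`, `run_backup` and the `.deleted` record file are hypothetical.

```python
import shutil
from pathlib import Path

DELETED_LIST = ".deleted"  # hypothetical name for the per-bin deletion record


def mirror(src, dst):
    """Make the flat directory dst an exact copy of src (stand-in for an rsync run)."""
    dst.mkdir(parents=True, exist_ok=True)
    names = {p.name for p in src.iterdir() if p.is_file()}
    for p in dst.iterdir():
        if p.is_file() and p.name not in names:
            p.unlink()
    for name in names:
        shutil.copy2(src / name, dst / name)


def run_backup(device, sync, prev_sync, store, label):
    """Run one 4-step cycle; return (changed, deleted) file names for this bin."""
    prev_sync.mkdir(parents=True, exist_ok=True)  # previous sync is empty at t_0
    # Step 1: bring the server's current sync up to the device state.
    mirror(device, sync)
    # Step 2: compare current sync with previous sync (stand-in for diff).
    cur = {p.name for p in sync.iterdir() if p.is_file()}
    old = {p.name for p in prev_sync.iterdir() if p.is_file()}
    changed = sorted(
        n for n in cur
        if n not in old or (sync / n).read_bytes() != (prev_sync / n).read_bytes()
    )
    deleted = sorted(old - cur)
    # Step 3: store the changed files and the deletion flags in a new bin.
    bin_dir = store / label
    bin_dir.mkdir(parents=True)
    for name in changed:
        shutil.copy2(sync / name, bin_dir / name)
    (bin_dir / DELETED_LIST).write_text("\n".join(deleted))
    # Step 4: synchronize previous sync with current sync for the next cycle.
    mirror(sync, prev_sync)
    return changed, deleted
```

Calling `run_backup` once per backup, with a fresh `label` such as `"t0"`, `"t1"` and so on, yields one bin per restore point. The sketch assumes that no device file is itself named `.deleted`.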

Possible scenarios during backup

The following scenarios illustrate how the incremental backup process handles the elementary situations of backup: (i) no backup is present yet, (ii) a new file is added, (iii) a file is deleted, and (iv)(a),(b) a file is modified. The purpose of these use cases is to show how space-saving and efficient incremental backup can be compared with full backup.

Case (i): First time backup

Figure 4: Backup at time $t_0$: Full

Prior to backup, the client and server rsync processes exchange metadata to enable the client to send only the relevant data. This metadata corresponds to about 2% of the total data size, and represents the hash values of the data on the client/server, which are used for comparison [9]. The first time, when no backup copy exists on the server, the client must transfer all of the files to the server. This is the only time when incremental backup is slightly more expensive than full backup, given that the 2% overhead is unavoidable.

Case (ii): New File Added

Figure 5: Backup at time $t_1$: Incremental

Note that the files A, B and C have not changed since time $t_0$, whereas D has been newly created. Metadata exchanged prior to the start of the actual data transfer indicates to the client that the server already has the latest copies of all the files, except the newly added one (D). Thus, for the incremental backup at time $t_1$, the client transfers only the complete new file. The server-side bin for the incremental backup at $t_1$ simply consists of D, the complete file.

Case (iii): File Deleted

Figure 6: Backup at time $t_2$: Incremental

When a file has been deleted on the client, the client infers this only after examining the metadata it receives from the server side. Note that this case is an elementary one, so we assume that no other files have changed. The client now needs to send only metadata back to the server, flagging the file for deletion. This information forms the incremental backup at time $t_2$ on the server side, and is used to infer which files are to be excluded from a given restore point.

Case (iv)(a): File modified with increase in file size

Figure 7: Backup at time $t_3$: Incremental

In the above case, files A and D have not changed, whereas C has been modified and is now larger by 20 KB. The metadata exchanged between the server and the client indicates to the client that 20 KB of the file consists of new data, and also what percentage of the other 48 KB has remained unchanged. The client therefore sends the 20 KB of new data, and the changed portion of the other 48 KB, along with the byte offsets. This exchange forms the delta difference of the file. On the server side, this delta difference is merged with the previous version of the file to form the complete file that is in synchronization with the client's copy. This complete file is stored in the incremental backup bin on the server for time $t_3$.

Case (iv)(b): File modified with reduction in file size

Figure 8: Backup at time $t_4$: Incremental

This case is similar to Case (iv)(a), except that this time the file size (of A) has been reduced by the simple deletion of data. If we assume that no part of the existing data was modified, only deleted, then the data transfer from the client to the server contains only metadata. This metadata contains information such as the offsets of the bytes to be removed. In the same way as in Case (iv)(a), the changes indicated in the metadata are applied to the previous version of the file to create one that is in synchronization with the client's copy. The bin at the server for the incremental backup at time $t_4$ consists of the complete changed file.
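The delta difference used in Cases (iv)(a) and (iv)(b) can be illustrated with a simplified block-matching sketch in Python. It is not the RSync implementation: RSync uses a rolling weak checksum to find matching blocks cheaply, whereas this sketch hashes the data at every offset, and the block size of 4 bytes, the function names and the `("copy", offset)` / `("data", bytes)` encoding are all assumptions made for the example.

```python
import hashlib

BLOCK = 4  # tiny block size, chosen only to keep the example readable


def block_signatures(old, block=BLOCK):
    """Receiver side: map the hash of each full block of the old file to its offset."""
    sigs = {}
    for off in range(0, len(old) - block + 1, block):
        sigs.setdefault(hashlib.sha256(old[off:off + block]).digest(), off)
    return sigs


def make_delta(new, sigs, block=BLOCK):
    """Sender side: describe the new file as copies of known blocks plus literal bytes."""
    delta, literal, i = [], bytearray(), 0
    while i < len(new):
        chunk = new[i:i + block]
        off = sigs.get(hashlib.sha256(chunk).digest()) if len(chunk) == block else None
        if off is not None:
            if literal:  # flush literal bytes collected before this match
                delta.append(("data", bytes(literal)))
                literal = bytearray()
            delta.append(("copy", off))
            i += block
        else:
            literal.append(new[i])
            i += 1
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(old, delta, block=BLOCK):
    """Receiver side: rebuild the new file from the old file and the delta."""
    out = bytearray()
    for kind, value in delta:
        out += old[value:value + block] if kind == "copy" else value
    return bytes(out)
```

With an old file `b"AAAABBBBCCCC"`, a grown file `b"AAAAxyBBBBCCCCzz"` produces a delta carrying only the four new bytes as literal data, while a shrunk file `b"AAAACCCC"` produces a delta made up entirely of `copy` entries, which corresponds to the metadata-only transfer described in Case (iv)(b).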

3.2 Restore Process

Figure 9: Restore Process Flow at time $t_k$

Step 1 The user selects a restore point, say $X_k$, to roll the system back to. $X_k$ represents the state of the system as it was at time $t_k$. The restore process communicates this restore point to the merger process on the server.

Step 2 The merger recreates the state $X_k$ by incrementally adding the differences from each of the data bins in the Backup-data Store, starting from $X_0$, until the restore point is reached, as follows:

$$X_0 + x_1 + x_2 + \cdots + x_k = X_0 + \sum_{i=1}^{k} x_i = X_k$$

Step 3 The restore-point dump created by the merger is synchronized to the device data dump on the client side, thus bringing the client device up to the desired restore point.

4 Analysis and Results

A comparison of full backup and incremental backup

The device used for deploying and testing our application, the Nokia N810, ships with a built-in backup/restore tool that performs a full backup of the device data each time, even though data backed up previously is copied once again, leading to redundant copies.

The heart of our application is the feature of incremental backup. Our tests show a large margin of cost savings when performing incremental backup as against a full backup each time. By costs, we mean overall smaller data transfer sizes, lower bandwidth utilization, server-side space savings, and lower battery consumption. The following table and chart illustrate the cost savings of incremental backup as against full backup.

Table: Time instant ($t_0$ to $t_{10}$, and Total); Actual data in device (KB); Data transferred for full backup; Data transferred for incremental backup.

Figure 10: Cost savings in incremental backup over full backup

The chart above shows the results of our observations over 10 successive incremental backups after the first full backup. The change of data in the device over the successive intervals can be considered to have resulted from a combination of added, deleted and modified files. Only for the initial run is incremental backup slightly more expensive than full backup. Every subsequent backup transfers only a fraction of the actual data in the device. Overall, as is clear from the table above, incremental backup resulted in the storage of a mere KB of data, as against a massive KB of data by full backup. This indicates a saving of nearly 80% over just 10 backups.
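The saving quoted above can be related to the model of Sec. 3. As a sketch, write $|X|$ for the size in KB of a data set (a notation assumed here, not used elsewhere in this report) and ignore the roughly 2% metadata overhead. A full backup at every instant transfers $|X_i|$ each time, whereas incremental backup transfers $|X_0|$ once and $|x_i|$ thereafter, so the relative saving after $k$ incremental backups is:

```latex
\[
  S_k \;=\; 1 \;-\; \frac{|X_0| + \sum_{i=1}^{k} |x_i|}{\sum_{i=0}^{k} |X_i|}
\]
```

Under these assumptions, the reported saving of nearly 80% over 10 backups corresponds to $S_{10} \approx 0.8$.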

5 Conclusion

This project is an attempt at overcoming many of the shortfalls of the backup solutions in the market today. It provides the convenience of running the backup directly from the device itself, avoiding the need for a secondary device such as a personal computer. The backup is incremental, which guarantees space savings with every backup and makes more frequent backups affordable. The process is automated and performed without the user's intervention. Since the backed-up data is saved not on the device itself but on a server dedicated to this purpose, it guarantees minimal loss in the unfortunate event of the device crashing or being lost.

A Appendix

Our backup/restore application was developed for and tested on the Nokia N810, an internet tablet. The following sections provide details on the tools required for development and the process involved in deploying RSync on the mobile device.

A.1 Tools Required for development

The Nokia N810 runs Maemo OS, a flavour of Debian GNU/Linux. The following tools are required for development:

- Maemo SDK [10] [11], an emulator that runs the Maemo OS on a PC, enabling off-device development
- Linux OS (Maemo SDK is supported on Ubuntu 10.04) [12]
- ScratchBox, a cross-compilation toolkit designed to make embedded Linux application development easier. It also provides a full set of tools to integrate and cross-compile an entire Linux distribution. [13]
- ESBox, a multi-platform Eclipse Ganymede-based IDE supporting Maemo development in Scratchbox. [14]

Note: The packages necessary to resolve dependencies created at the time of deployment of the application on the Nokia N810 may be found at [15] and [16].

A.2 Deployment of RSync on the mobile device

RSync is used on both the server and the client as a file-synchronizer utility. RSync is readily available as an executable for the Intel i386 architecture. However, since the Nokia N810 uses the armel architecture, the RSync source needs to be cross-compiled to produce the necessary binary executable file. The following procedure details the steps needed to cross-compile RSync for the ARM processor:

Steps to cross-compile RSync

1. Install Maemo SDK. This provides ScratchBox, the cross-compiler necessary to produce binaries for the ARM architecture.
2. Download the source code of RSync (available as a .tar.gz file at [17]).
3. Extract the compressed archive.
4. On the PC, open the terminal and navigate to the folder created by uncompressing the RSync archive.
5. Run the following command to configure the RSync source for cross-compilation, then run make to build the binary:

    ./configure CC=/scratchbox/compilers/cs2007q3-glibc2.5-arm7/bin/arm-none-linux-gnueabi-gcc --host=armel

6. Transfer the binary produced from the compiled code to the device.
7. On the device, open the terminal and run rsync using the command:

    ./rsync <server-location> <client-location>
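Once the binary is deployed, a periodic job needs to assemble the rsync command line for each run. The following Python sketch shows one way this might be done, including the bandwidth cap mentioned in the abstract. The options `-a`, `-z`, `--delete` and `--bwlimit` are standard rsync options, but the report does not state which options the application actually passes, so this particular combination, the function name and its parameters are assumptions.

```python
def build_rsync_command(source, destination, bwlimit_kbps=None, rsync="rsync"):
    """Return the argument list for one rsync run; nothing is executed here."""
    # -a preserves file attributes, -z compresses data in transit,
    # --delete removes files from the destination that no longer exist at the source.
    cmd = [rsync, "-az", "--delete"]
    if bwlimit_kbps is not None:
        # Cap the transfer rate (in KB/s) so the backup does not hog the link.
        cmd.append("--bwlimit=%d" % bwlimit_kbps)
    cmd += [source, destination]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; passing `rsync="./rsync"` selects the cross-compiled binary from step 7.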
