Abstract

The Internet of Things (IoT) comes with great possibilities as well as major security and privacy issues. Although digital forensics has long been studied in both academia and industry, mobility forensics is relatively new and unexplored. Mobility forensics deals with tools and techniques that work towards forensically sound recovery of data and evidence from mobile devices [1]. In this paper, we explore mobility forensics in the context of IoT. We discuss in detail the process of collecting and classifying data from IoT smart home devices. We also present an attack-scenario-based analysis of the collected data and propose a mobility forensics model that fits such scenarios. The paper concludes with a detailed discussion of related research problems and future work.

Introduction

Although mobility forensics for IoT devices is not a well-defined area of knowledge, this description summarizes the idea in general: “Mobility forensics addresses technology’s movement toward mobile devices (smart phones, tablets, small computers) and specialized tools and techniques needed to successfully recover data and evidence from those devices” [1].

While many IoT sensors are stationary, many others are mobile. Examples of mobile sensors include FitBit devices, sensors within smartphones, and Controller Area Network (CAN) sensors in cars. The context of computing is evolving rapidly and significantly, driven by new mobile and IoT devices in our homes and industries. Successful forensics on devices of this kind needs new and updated tools and specialized techniques. Unfortunately, not much has been done so far in the field of IoT digital forensics. One reason is that IoT devices are not yet widely deployed and industry is focused on implementing the technology rather than securing it. But a lack of security in IoT devices may lead to catastrophe in futuristic scenarios such as the Smart City [2] and Smart Grid [3]. This paper takes a close look at security, privacy, and vulnerability issues with IoT devices from a forensic point of view. As an example, we have analyzed the data collected by an IoT smart home device called “Sen.se Mother” [4], and have developed scenarios showing how the collected data can help with forensic investigations. We then propose a model to determine the implications of collected data and discuss what it adds to the digital forensics literature. This is a first attempt of its kind towards mobility forensics for the Internet of Things. Although this paper’s findings are based on IoT devices in smart homes, the results can be generalized to other IoT environments, such as industrial and other installations.

Literature Review

Although the field of IoT security research is relatively new, there has already been much interesting work. Weber [5] introduced some of the security, data authentication, access control and client privacy issues in IoT. Jing et al. [6] discuss IoT security in general and divide the IoT architecture into layers to improve security; they also try to solve various cross-layer issues. Perumal et al. [7] talk about forensic investigation in machine-to-machine (M2M) communications and the Smart Grid. Hegerty [8] discusses the fundamental challenges IoT poses to digital forensics and identifies key areas that solutions should address.

An interesting paper by Oriwoh et al. [9] covers the modeling perspective of IoT forensics. The paper presents an example IoT crime scenario and attempts to identify sources of evidence within it. It also discusses how IoT digital forensics differs from classic digital forensics and emphasizes the requirement for a “Next Best Thing (NBT) Model of digital forensics” [9]. The work in our paper moves the research closer to that goal. Valera et al. [10] cover a special application of IoT: medical devices. They also suggest that their set of security techniques and cryptographic SIM cards can make IoT devices with RFID/NFC more secure. Arias et al. [11] use the Nest Thermostat and the Nike+ Fuelband as example IoT devices to discuss some common design practices and their implications for the security and privacy of these devices.

Copos et al. [12] collected network data from an IoT device, the Nest Thermostat, using dumpcap, a network traffic sniffing tool. Then, from the collected data, they tried to infer whether someone was home. Peisert et al. [13] show how using a model can result in forensic analysis requiring a much smaller amount of carefully selected, highly useful data. In our paper, we present a new model that summarizes the findings from IoT devices and helps the investigator follow a structured process of investigation.

Problem Statement

The “Internet of Things” is not just a dream anymore. In the form of smart grid, smart homes, smart devices, smart cars (V2V), and M2M (in general), IoT is already here. Lack of security in IoT reflects the lack of security in cyberspace. This raises several important questions:

Are those devices secure in the environment in which they function?

How much are we aware of privacy issues?

Can the new type of data and traces from these devices be utilized for forensic purposes, and if so, how can we collect and model them efficiently?

What kind of data and attack-related information and semantics can be retrieved from IoT devices?

What can we learn from looking at IoT device data trails?

How useful is the above-mentioned data for forensic purposes?

What are some possible scenarios where such data and information can aid forensic investigations?

How is collected data interpreted in those scenarios?

What new questions arise in addition to those posed by classic digital forensics?

Can we develop a new forensic model to incorporate these questions?

These questions are very important for individuals, organizations and governments. Unfortunately, not much work has been done in the field of IoT mobility forensics. This paper is a significant step towards filling the gap.

Methods and Procedures

IoT Device Selection and Setup

We studied multiple IoT devices [4, 14, 15] and considered two of them.

“Sen.se Mother” [4] comes with a hub (called Mother) and 4 sensors (called Cookies). Cookies can be attached to an object or a living being (for example, a door, a keyring, a person or a pet). They can be used for multiple purposes: (1) tracking how much you have walked or run or how much coffee you have consumed, (2) child care (to determine that the child is in the house), (3) a door alarm, (4) a medication reminder, and (5) sensing temperature and sleeping habits. In addition, devices from other platforms such as Nest and Philips Hue can be used with the Sen.se Mother hub.

The Hub from Samsung [14] can likewise connect additional smart devices (sold by Samsung), so its functionality can be extended by choosing among the many kinds of devices that work with it.

As Sen.se Mother offers more flexibility and more diverse applications, we chose it for our study. Installing Sen.se Mother was not straightforward. We had to collect the MAC address of the device using Wireshark to get it connected to the Internet. Once the Hub was connected, we deployed 4 sensors to collect different types of data. Collected data is visually displayed in the Sen.se Dashboard [4], a web portal provided by the vendor.

The Cookies and the Hub

As noted in the previous section, the Hub is a collector entity connected to the Internet. The primary job of the Hub is to act as a supervisor, configure the Cookies for specific tasks, collect data from the Cookies, and send it to the web portal. Cookies are sensors deployed to collect application-specific data. A Cookie can save up to ten days of data without connecting to the Mother. As soon as a Cookie is reconnected to the Hub, it uploads all contents of its memory. A Cookie contains a replaceable CR2016 button cell with one year of battery life. Cookies communicate with the Mother by radio (at 915 MHz in North America and 868 MHz in Europe). Every type of movement has its own pattern and signature. By placing a Cookie on an object or person we can capture and analyze movements. The Cookie will recognize a specific action that is to be monitored and will transmit sensed data for the chosen application. Some Cookies also contain thermometers. They can transmit the ambient temperature as well as sudden abnormal changes in temperature to the Hub. Another interesting feature of the Cookie is its ability to signal the presence or absence of a person or object. A Cookie can be used for only one application at any given time. We deployed 4 sensors with 4 specific tasks:

Bedroom door for security notification

Thermostat for room temperature monitoring

Sensing physical exercise

Sensing presence at home

When the smart-phone app for Sen.se Mother is connected to the web portal, it receives real-time notifications of events [4].

Sen.se API Documentation

Sen.se has an application programming interface (API) and associated documentation [16]. The API serves three types of audiences:

Users who want to access the data produced by their devices and build their own programs using those data and devices

Developers using the Sen.se platform to create new applications for Sen.se users. Users can install these applications (referred to as native apps) in the same fashion as regular Sen.se applications, and they will be displayed on the web portal

Developers willing to use the data provided by a Sen.se platform to enrich an external application (such as Android apps)

The Sen.se API is REST-oriented and returns data in JSON format. Although this API is a good way to access the data collected by sensors, it doesn’t give the user or developer any opportunity to access actual devices.
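To illustrate what consuming a REST API that returns JSON might look like, the following Python sketch parses a sample payload into time-ordered events. The payload layout and the field names (events, sensor, timestamp, value) are assumptions made purely for this example and are not taken from the Sen.se documentation [16]; the sketch makes no network request.

```python
import json
from datetime import datetime

# Hypothetical payload: the field names are illustrative assumptions,
# not the actual Sen.se response schema.
SAMPLE_RESPONSE = """
{
  "events": [
    {"sensor": "temperature", "timestamp": "2016-05-12T11:45:00", "value": 71.5},
    {"sensor": "door", "timestamp": "2016-05-12T11:40:00", "value": 3}
  ]
}
"""


def parse_events(raw):
    """Return (sensor, datetime, value) tuples sorted by timestamp."""
    payload = json.loads(raw)
    events = [
        (e["sensor"], datetime.fromisoformat(e["timestamp"]), e["value"])
        for e in payload.get("events", [])
    ]
    return sorted(events, key=lambda event: event[1])


events = parse_events(SAMPLE_RESPONSE)
```

Sorting by timestamp at parse time is a convenience for the kind of timeline reasoning a forensic investigator would perform on the retrieved data.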

Data Collection

Four Cookies collected data for the four applications from May 11, 2016 to May 31, 2016. First, the bedroom door application, which tracks activity at the bedroom door, stored and reported the activities it detected. The reported data included both false positives and false negatives. When the sensor sensitivity level is too low, the sensor occasionally fails to detect very light activity at the door. On the other hand, if the sensitivity is very high, opening other doors may trigger the alarm. But the sensitivity is adjustable, and it is easy to find the level suitable for a given scenario. The collected data shown in the web portal contains the time, the place, and the number of movements detected.

The second sensor was used to collect temperature data. Whenever the temperature crosses a user-specified threshold, a notification is sent to a smart phone application. In our scenario, the lower limit was set to 59°F and the upper limit was set to 78°F.
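The threshold rule described above can be sketched in a few lines of Python. The limits are the ones used in our set-up; the sample readings are hypothetical, and treating a reading exactly at a limit as normal is an assumption of this sketch, not documented behavior of the device.

```python
LOWER_LIMIT_F = 59.0  # lower limit used in our scenario
UPPER_LIMIT_F = 78.0  # upper limit used in our scenario


def threshold_crossings(readings, lower=LOWER_LIMIT_F, upper=UPPER_LIMIT_F):
    """Return (index, reading) pairs for readings outside [lower, upper].

    Assumption: a reading exactly equal to a limit does not trigger
    a notification.
    """
    return [(i, t) for i, t in enumerate(readings) if t < lower or t > upper]


# Hypothetical temperature readings in degrees Fahrenheit.
sample_readings = [70.2, 58.4, 65.0, 78.0, 79.1]
alerts = threshold_crossings(sample_readings)
```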

Another sensor was deployed to trace the presence of a person at home. We observed false positives and false negatives at times. When the subject was sleeping and the sensor did not detect any movement in the room, it reported that the subject had left home. Also, the subject may have left the Cookie behind, violating the sensor’s assumption that the subject is in the same place as the Cookie. Collection of such data can be very important for scenarios like child care. At the same time, this kind of data is very sensitive from the privacy and cyber security point of view. In the wrong hands, presence- and absence-related data can be very harmful and consequential.

The fourth sensor was deployed to monitor the physical activity of the subject such as walking. Again, there was a high rate of false negatives, although very few false positives. For example, the sensor reported that the subject spent four days out of seven without walking, which is absurd; most likely, the subject forgot to carry the sensor. Or, perhaps an attacker deliberately manipulated sensor data. Again, this is sensitive information from both the privacy and security points of view. The data collected can reveal the subject’s pattern of life, which may prove useful to an attacker.
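One simple way to surface such suspicious gaps for manual review is to flag days on which no activity at all was recorded. The sketch below assumes the data is available as a mapping from day to step count; the values are hypothetical and only mirror the four-out-of-seven pattern described above. A flagged day is a prompt for further investigation, not evidence of tampering.

```python
def inactive_days(daily_steps):
    """Return the days on which zero steps were recorded."""
    return [day for day, steps in daily_steps.items() if steps == 0]


# Hypothetical week of step counts, not our measured data.
week = {
    "Mon": 4200, "Tue": 0, "Wed": 0, "Thu": 5100,
    "Fri": 0, "Sat": 3900, "Sun": 0,
}
flagged = inactive_days(week)
```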

The data collected from the sensors gave us insight into what the next steps of our work should be. From the API documentation, we understood that we can only access the data stored in the database; we had no direct access to the devices through the API. The API queries only enabled us to read the database behind the REST API [17]. Sen.se already has apps to access and display that data. Unless the goal is to develop a new app, there is not much motivation, from the forensics point of view, to write one merely to collect data. On the other hand, Oriwoh et al. [9] have shown that by applying a scenario-based approach, we can determine the forensic significance of the data collected by apps. Hence, from practical attack and crime scenarios, we can interpret the data collected from a crime scene. After analyzing such scenarios and data, we have created a general model that formalizes a digital forensics approach for IoT. This approach also enables us to answer the research questions we started with.

Results

Data Classification

We classified the data collected from the four applications. In Table 1, for each set of information collected, we identify the source of the information and whether the information reveals the subject’s location or daily routine. We also indicate the severity of the leak. The severity is considered high if both location and daily routine can be derived from the data; medium if only one of those is fully exposed; and low if neither is exposed. Finally, we describe the forensic interpretation of the data.
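The severity rule can be written as a small function. This is a minimal sketch that reduces each criterion to a boolean meaning "fully exposed"; partial exposure is not modeled, and the function name and signature are our own illustration.

```python
def leak_severity(reveals_location, reveals_routine):
    """Map the two exposure flags to a severity level.

    High: both location and daily routine are exposed.
    Medium: exactly one of them is exposed.
    Low: neither is exposed.
    """
    exposed = int(bool(reveals_location)) + int(bool(reveals_routine))
    return {2: "High", 1: "Medium", 0: "Low"}[exposed]


levels = [
    leak_severity(True, True),
    leak_severity(True, False),
    leak_severity(False, True),
    leak_severity(False, False),
]
```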

Scenario Analysis

Attack scenario analysis helps us understand how the data collected will be useful in practical scenarios. Here are some examples of such scenarios.

Event 1: Burglary

Identification: Door sensor data indicates the time when the owner left home. The data also shows activity at the door at 11:40 am, even though the owner was not home at that time. The burglary happened on the same day.

Interpretation: Does the data suggest that the burglar knew the owner’s daily schedule? This would help us investigate the incident. For example, would looking into CCTV camera footage from across the street that was collected at 11:40 am be useful?

Preservation: Data collected by the sensor was stored in the cloud in near real time.

Analysis and presentation: Data presented on graphs is easy to understand and present to a court, so a graph correlating events with burglaries would be helpful.
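The identification step in this scenario amounts to finding door events that fall inside a period when the owner was away. A minimal Python sketch follows; the date and the absence interval are hypothetical, and only the 11:40 am door event comes from the scenario.

```python
from datetime import datetime


def events_during_absence(door_events, absences):
    """Return door events whose timestamp lies inside any absence interval."""
    return [
        t for t in door_events
        if any(start <= t <= end for start, end in absences)
    ]


def at(clock):
    """Build a datetime on a hypothetical day from an HH:MM string."""
    return datetime.fromisoformat("2016-05-12T" + clock + ":00")


# Hypothetical absence interval derived from presence/door data.
absences = [(at("08:30"), at("17:45"))]
door_events = [at("08:29"), at("11:40"), at("17:50")]

suspicious = events_during_absence(door_events, absences)
```

The resulting list of suspicious timestamps is what an investigator might then cross-check against other sources, such as the CCTV footage mentioned above.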

Event 2: Abnormal death of a businessman, Mr. X

Identification: Medication sensor data indicates that Mr. X took medicine at an abnormal time. In addition, the walk sensor indicates that Mr. X was walking at the time the medication sensor at home reported activity. Does this mean someone has tampered with Mr. X’s medication while he was out for a walk? Or is it simply a bug in the sensor app that shows Mr. X taking his medicine at an irregular time? Does this lead us to the reason for his untimely death?

Interpretation: What does the data tell us? Is it meaningful? Is our interpretation correct? Can we trust the data? What about false positives? False negatives?

Preservation: Data collected by the sensor was stored in the cloud in near real time.

Analysis and presentation: Data presented on graphs is easy to understand. Investigators may look into other related sensors, such as door activity or motion sensors. Data can be correlated to the events either manually or using automated software tools. Both methods have scope for further improvement.

Event 3: A banker’s laptop at home accessed by intruder

Identification: A transaction was made using the banker’s user name at 7:14 in the morning. The presence/absence sensor indicates that the banker was not home at that time. There was no indication of a break-in. Door sensor data from the room where the laptop was kept indicates activity at the door at 7:12 am (after the owner left home).

Interpretation: Does the data suggest that the intruder knew the owner’s daily walking schedule? As there was no break-in, does this mean someone from inside the house came into the room and accessed the laptop? Could the banker have faked the scene to steal the money?

Preservation: Data collected by the sensor was stored in the cloud in near real time, with some possibility of false positives and false negatives.

Analysis and presentation: Data presented on graphs is easy to understand and present to a court.
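For this scenario, the relevant question is which sensor events occurred shortly before a reference event such as the transaction. The sketch below selects door events within a time window preceding the reference time. The ten-minute window, the date, and the extra door events are assumptions for illustration; the 7:12 and 7:14 timestamps come from the scenario.

```python
from datetime import datetime, timedelta


def events_shortly_before(reference, events, window=timedelta(minutes=10)):
    """Return events that happened at most `window` before `reference`."""
    return [t for t in events if timedelta(0) <= reference - t <= window]


# Hypothetical date; the clock times 7:12 and 7:14 are from the scenario.
transaction = datetime(2016, 5, 12, 7, 14)
door_events = [
    datetime(2016, 5, 12, 6, 30),  # hypothetical earlier event
    datetime(2016, 5, 12, 7, 12),  # door activity from the scenario
    datetime(2016, 5, 12, 7, 20),  # hypothetical later event
]

related = events_shortly_before(transaction, door_events)
```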

This scenario-based analysis leads us to a general model for IoT mobility forensics.

Mobility Forensics Model

Figure 1 illustrates the model. The questions presented in this model are almost the same as those in classical criminal investigation and digital forensics. But their semantics and scope change because the IoT environment differs from the conventional one.

Figure 1. IoT mobility forensics model

What happened? What is the description of the incident (cyber-attack, crime etc.)? Does it directly impact human life? Is the incident confined to IoT devices only? Does it affect other computers and connected smart electronics devices?

When did it happen? The time of the event is crucial for crime investigations and digital forensics. IoT devices are especially sensitive to time traces. Many critical systems, life-saving machines and IoT devices depend on millisecond-level timing precision.

How did it happen? Identifying the transition steps from the safe state to the compromised state is one of the most important parts of mobility forensics.

Who and/or what did it? Identifying the person or object responsible for the event is the fundamental motivation of mobility forensics. Organizations, investigators and security researchers want to follow the event trails, both electronic and non-electronic, to find the entity responsible for the attack.

Why did it happen? Finding the reason for the event is just as important as finding the entity behind it. When IoT devices are present, an attack scenario can be more complex than before, but at the same time data and digital evidence collected through IoT devices will contribute to unraveling the complexity using forensics.

What data was collected? This is an important question for IoT mobility forensics. In an attack scenario, forensic decisions may be affected by the amount of data collected by IoT devices. Moreover, how much of the data collected is useful and relevant to the attack is also an important factor.
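The six questions above can be treated as a checklist that an investigator fills in as a case progresses. The following Python sketch is one possible representation and not an implementation of the model; the class name, field names, and sample answers are our own illustrative choices.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class IncidentRecord:
    """One field per question of the model; None means unanswered."""
    what_happened: Optional[str] = None
    when: Optional[str] = None
    how: Optional[str] = None
    who_or_what: Optional[str] = None
    why: Optional[str] = None
    data_collected: Optional[str] = None

    def open_questions(self) -> List[str]:
        """Return the names of the questions that are still unanswered."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Hypothetical partial record loosely based on the burglary scenario.
record = IncidentRecord(
    what_happened="Burglary",
    when="11:40 am",
    data_collected="Door sensor events",
)
remaining = record.open_questions()
```

Keeping the unanswered questions explicit helps the investigator follow a structured process and see which evidence is still missing.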

Data Manipulation and Counter Measures

Understanding the data and model from IoT mobility forensics suggests some other important questions.

How much can we trust the data extracted from IoT devices?

How will the attacker changing the data before or after collection affect the forensic analysis?

Can we prevent or detect such manipulations?

Corruption and manipulation of digital data have always been issues for the security community. Even if the attacker doesn’t compromise the integrity of the data, the data collection process itself may produce incorrect information intermingled with accurate data. In our discussion, we have assumed that collection-related errors are known to happen and that investigators are aware that certain portions of the collected data are erroneous. Recently, Altolini et al. [18] proposed an encryption and authentication mechanism for low power IoT platforms. Such an implementation can help prevent data manipulation by attackers.
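As a generic illustration of how manipulation after collection can be detected, the sketch below attaches a keyed message authentication code (HMAC-SHA256) to a sensor record using Python's standard library. This is not the mechanism proposed in [18]; the record layout and the key are hypothetical, and key management on a low power device is outside the scope of the sketch. Note that a MAC detects tampering but does not prevent it.

```python
import hashlib
import hmac
import json


def sign_record(record, key):
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding."""
    message = json.dumps(record, sort_keys=True).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_record(record, tag, key):
    """Check the tag using a constant-time comparison."""
    return hmac.compare_digest(sign_record(record, key), tag)


key = b"example-shared-key"  # hypothetical key, for illustration only
record = {"sensor": "door", "timestamp": "2016-05-12T11:40:00", "movements": 3}
tag = sign_record(record, key)

# An attacker who alters the timestamp without the key invalidates the tag.
tampered = dict(record, timestamp="2016-05-12T13:40:00")
original_ok = verify_record(record, tag, key)
tampered_ok = verify_record(tampered, tag, key)
```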

Discussion

Our findings thus far point to more questions that need to be addressed. We briefly discuss some of these questions here, with some answers from our set-up.

Can the attacker “get into” the sensors? Kasinathan et al. [19] suggest that attackers can gain access to sensors under the right conditions.

Can the attacker “get into” the Hub? The Hub is directly connected to the Internet and interacts with the web portal. Work on IoT intrusion detection [23] suggests such attacks on hubs are feasible.

What is the communication medium? In addition to traditional wireless networks, IoT devices are connected through cellular networks, radio, Bluetooth and other low power communication media. This diversity makes the communication more vulnerable than otherwise, and makes using generic protections against attacks harder.

Can we knock down the sensors with a classic flooding attack? Although we did not try this on our devices, Kasinathan et al. [19] suggest that DoS and flooding attacks may disable IoT devices.

Can data be manipulated deliberately to obstruct or mislead justice in a court of law? We have discussed this issue in the previous section; it needs more attention from the security community.

Is it possible to sniff the hub and sensors? In our experimental set-up, we were able to derive device identity (specifically, the MAC address of the Hub) by observing network packets. Copos et al. [12] provide an example of how sniffing can lead to a major security breach.

False Positives and False Negatives

Incorrect results have long been an issue in security research. Researchers have tried to avoid and mitigate such erroneous results by applying different methods [20]. Unfortunately, there is no single reason behind false positives and false negatives, and likewise no standard solution. As we indicated in our previous discussion, the main reasons for false positive and false negative reports from the IoT devices we studied are inaccurate sensing and human error. Sensors are limited by the physical information they register and by the implementation of the detection algorithm. Many sensors are tunable. That being said, the users of such data and models should be aware of the existence of false positives and false negatives, and should take proper steps to detect and minimize false results from IoT devices.
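When an independent record of what actually happened is available (for example, a manually kept log, which is an assumption of this sketch), false positives and false negatives can be counted by comparing sensor reports against it. The sample values below are hypothetical and are not our measured data.

```python
def confusion_counts(reported, actual):
    """Count agreement and disagreement between sensor reports and ground truth.

    Both arguments are equal-length sequences of booleans, one per
    observation period: `reported` is what the sensor said, `actual`
    is what really happened.
    """
    pairs = list(zip(reported, actual))
    return {
        "true_positive": sum(1 for r, a in pairs if r and a),
        "false_positive": sum(1 for r, a in pairs if r and not a),
        "false_negative": sum(1 for r, a in pairs if a and not r),
        "true_negative": sum(1 for r, a in pairs if not r and not a),
    }


# Hypothetical observations.
reported = [True, True, False, False, True]
actual = [True, False, True, False, True]
counts = confusion_counts(reported, actual)
```

Repeating such a count at different sensitivity settings is one way to choose a setting that balances the two kinds of error for a given scenario.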

Limitations

Some limitations of our work are:

Data is collected only from smart home devices

The forensic model proposed here has not been implemented, deployed, and tested

We assume implementation of the model will be scalable for the fast growing number of devices, which may not be true

Our findings depend on data collected from one type of device. Perhaps different kinds of devices would produce more consistent results.

Future Work

This paper contains specific findings and results based on the smart home IoT device Sen.se Mother, its Hub, and its Cookies. Future work should include more generic scenarios where multiple types of IoT devices and their data are analyzed. Working towards a more robust and mature model for IoT mobility forensics by providing better data analysis would be an improvement. In-depth analysis and discussion of the data collected is left for future work. As a huge amount of data is collected and stored, the privacy of users is an important issue. If large companies like Google and government organizations such as the US Information Awareness Office (IAO) have access to such data [21], they may violate users’ privacy and use the data for profit or special purposes. Hence, privacy is a serious research problem in IoT security. Another interesting open problem is the reverse question: given a digital forensics scenario and a forensic model, what useful data can IoT devices collect for us? Answering it can yield significant results, useful to both the security community and the manufacturers of IoT devices. In our future work, we plan to focus on one specific question among those we have discussed here.

Conclusion

As the field of IoT is booming, we need to secure these devices and systems. More work on mobility forensics for IoT can help achieve that goal. This paper analyzes data collected from IoT devices and proposes a new forensics model to make the IoT world more secure. The methods discussed in this paper are useful for both industry and academia. Those working on criminal investigation and evidence collection in the realm of cyber security can get valuable ideas from the work presented here. We also hope that our discussion will make users more aware of IoT security and privacy issues.

Acknowledgement: Special thanks to Intel and the INSuRE [22] project team for funding the Sen.se devices. We would also like to thank the anonymous reviewers for their useful feedback that helped us improve our work. This work was supported by the National Science Foundation Grant Number DUE-1344369 to Purdue University, and by a subcontract from Purdue University to the University of California funded by that grant. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, Purdue University, or the University of California.