FINDING THE POTENTIAL PRIVACY GAP IN THE BIG DATA SUPPLY CHAIN

Master Thesis

Flogerta Banaj
Klaudia Dardha

Presented in June 2014
Supervised by Markus Lahtinen
Examined by Björn Johansson, Benjamin Weaver

Abstract

We live in a digitalized society. The abundant data we produce, today called Big Data, is changing our lives and will soon disrupt them. Various studies and analyses argue for the advantages that Big Data brings, not only as a competitive advantage for data holders, but also in health, in government, and for citizens and society as a whole. Nevertheless, Big Data raises significant questions and poses challenges for privacy, so the path to Big Data gains is both risky and rocky. The decisions we take over that data have real human consequences, such as ethical issues. Any data on social subjects raise privacy issues, and when the risk of misuse, intentional or not, is huge, it becomes an issue for the entire information society. In this research, we explore potential gaps among the participants and deduce various reasons for these breaches, and from them reasons for improving the interplay among the participants. The study reflects on the interplay between government, business and consumer in a Big Data Supply Chain. It shows an existing inconsistency, partly caused by a lack of enforcement of government legislation, which is in turn attributed to the lack of an educated public. Data holders lack transparency, and consumers retain their trust toward them. The communication, barriers and legal rights in their interplay are vague, which leads to an important question about ownership. When data sets are available to be gathered and used in analysis, their usage rights and requirements remain unclear.

Keywords: Big Data; Big Data Supply Chain; Gap; Privacy; Data Subject; Data Holder; Stakeholders; Ethics; Ownership; Transparency

Acknowledgments

We would like to express our gratitude to Trevor Peirce, Cees Boon and Jon Page for their immense contribution and help through their participation in our research. Their availability, patience and good humor made our communication easier, and they provided valuable suggestions and assistance. We would also like to thank our supervisor Markus Lahtinen for his tireless supervision, inspiration and guidance in this study.

Special thanks

Three intensive months of work, spent together from morning till night in study areas, the library and at home, sometimes made discussions exhausting. So a special thanks goes to each of us, Flogerta and Klaudia, for supporting each other and never losing the good spirit in our team.

Flogerta Banaj and Klaudia Dardha
Lund, June

1 Introduction

This introductory chapter provides a general outline of the social-technical dimension of Big Data and leads up to the research question of the study. The purpose and motivation of our research follow the background and problem area of the study. Finally, we discuss the delimitation of the study and define some key concepts.

1.1 Background

The case of Mario Costeja González was one of more than 200 similar cases in Spain (EuropeanVoice, 2014), but its judgment brought crucial consequences for all European Union (EU) citizens. He complained that Google's search engine linked his name to a 1998 announcement of an auction of his house to cover unpaid social security contributions, and he considered that information undesirable (EuropeanVoice, 2014). On 13 May 2014 the European Union's highest court, in line with the European Commission's 2012 data protection proposal, granted EU citizens the right to be forgotten, with exceptions for politicians and for historical data purposes (EuropeanVoice, 2014).

Currently, most countries in the world are part of a social and technological revolution. Their way of living, communicating and doing business is moving to the digital world (Bharadwaj, El Sawy, Pavlou, & Venkatraman, 2013). Citizens in these developed countries are constantly surrounded by the Internet: in phones, homes, cities, and cars. All this abundant data, today called Big Data, will disrupt our lives and change the relationships between government, business, consumers and citizens (Needham, 2013). According to Russom (2011), companies have gathered more data in the last two years than they did in all the previous 2,000 years. In addition, Epic (2014) predicts that until 2020 the amount of data will double every two years. This data is generated from different sources such as sensors, digital pictures or videos, social media posts, and purchase transactions (IBM, 2012).
Every day, Google processes 24 petabytes of data and 400 million tweets are sent (Mayer-Schönberger & Cukier, 2013). Information and communication systems are invisibly merging into our environment. Thus, not only will many objects around us be on the network in one way or another through the Internet of Things (IoT), but Radio Frequency Identification (RFID) and sensor network technologies will also help to meet these requirements (Gubbi, Buyya, Marusic, & Palaniswami, 2013).

Surely, many companies were doing Big Data long before the term became worthy of market capitalization, but many factors have brought the practice into the mainstream (Murphy & Barton, 2014). According to Mayer-Schönberger and Cukier (2013), Big Data is emerging as a revolution and is disrupting the way we live, work and think. It is not the big numbers in Big Data that stand out, but the big cultural changes that it implies (Mayer-Schönberger & Cukier, 2013).

Various definitions of Big Data have emerged lately. Most of them focus on the size of data in storage. Big Data was first characterized by Laney (2001) in terms of volume, the rapidly increasing amount of data; velocity, the speed at which data flow in and out; and variety, the dispersed types and sources of data. Chen, Chiang, and Storey (2012) describe Big Data as large and complex data sets in applications that need advanced and unique data storage, management, analysis and visualization technologies. The three Vs that Laney identified dispel the myth that Big Data is only about data volume. Size is substantial, but there are other important attributes of Big Data, namely data variety and data velocity (Russom, 2011).

In contrast to previous approaches, where a sampling technique was used, it is now possible to use all the observations without needing a hypothesis before collecting the data (Mayer-Schönberger & Cukier, 2013). Consequently, hypotheses can be changed after different regressions have been run one by one, which gives rise to one of the significant challenges for data and information: the changing of ownership (Pavolotsky, 2013). Big Data offers a competitive advantage by utilizing the vast amount of data and can radically improve a company's performance (McAfee & Brynjolfsson, 2012), but it can also raise issues in terms of ethics and privacy (Davis & Patterson, 2012).
One of the crucial ethical issues of the information age is information privacy (Stone, Gueutal, Gardner, & McClure, 1983). Stone et al. (1983) defined information privacy as the ability of individuals to control information about themselves. Meanwhile, according to Mason (1986), four critical areas of concern arise in information ethics: privacy, meaning what personal information individuals should be required to disclose and under what conditions; accuracy, meaning what policies and standards are needed to protect individuals from errors; property, meaning who owns data and how ownership should be determined; and access, meaning who can have access to what information. Confusingly, the terms "ownership" and "use" are often used interchangeably although they have different meanings (Van Alstyne, Brynjolfsson, & Madnick, 1995). While usage rights involve the capability to access, create, standardize, and modify data, ownership means the right to establish these privileges for others (Van Alstyne et al., 1995). Mason (1986) related the information ethics concerns to ethical dilemmas arising from the collection, use, and management of information.

Based on the fact that information in a supply chain mechanism flows among the participants (Lee, Padmanabhan, & Whang, 2004), we investigate data management in the Big Data Supply Chain. The Data Supply Chain, as a methodology to manage a company's data assets (EvanLevy, 2001), leads to a conceptual framework of collecting, analyzing, sharing and feeding back information (Groth, 2013). Furthermore, the data supply chain expands the traditional corporate information life cycle in order to encompass the data sourcing, provisioning, and logistical activities that are needed to manage the company's data successfully (EvanLevy, 2001). In the context of Big Data, the participants within this chain are the data generators and the data users, who are interchangeably the individuals, the public/development sector, and the private sector (WorldEconomicForum, 2012). For the purpose of this study we will use the taxonomy of privacy of Solove (2006), in which the stakeholders are grouped into data subjects, such as the individual, and data holders, such as business, government and other individuals. We will generalize these stakeholders respectively as the consumer, the government, and the business.

1.2 Problem area

Notice and consent are problematic, particularly so for Big Data, as many consumers do not read privacy policies (McDonald & Faith Cranor, 2008). Privacy policies displayed on devices with small form factors, such as cell phones, are at best difficult to read, and layered notices, which provide a brief summary of the privacy terms with a link to the full privacy policy, are often unavailable (McDonald & Faith Cranor, 2008). Moreover, even if an individual reads and understands the privacy policy, that policy may be wrong or at least incomplete (Pavolotsky, 2013). Thus Big Data, the value of which lies in identifying secondary (and so unimagined) uses of data, stretches the practical limits of meaningful consent (Pavolotsky, 2013). Enabling individuals to exert some control over how these data are used will be an important aspect of an overall solution (McKinseyGlobalInstitute, 2011). Big Data poses other challenges to existing approaches in data privacy, pointing to the need for dialogue among stakeholders on a policy framework that can help create a sustainable data ecosystem that strongly protects individual rights (McKinseyGlobalInstitute, 2011). It may be unreasonable to expect consent from every person who deliberately posts information about themselves or others, but it is problematic to justify as ethical the actions of organizations that analyze that information just because the data are accessible (Boyd & Crawford, 2012).
The fact that content is publicly available does not imply that it is intended to be consumed by just anyone, which raises serious issues in the ethics of online data collection and analysis (Boyd & Crawford, 2012). Users are not aware of the multiplicity of agents and algorithms currently gathering and storing their data for future use (Boyd & Crawford, 2012). Nor are they necessarily conscious of all the multiple uses, profits, and other gains that come from information they have posted (Boyd & Crawford, 2012). Therefore, Big Data raises the question of how the information will be used.

1.3 Motivation of proposed study and Research question

A critical evaluation of the Big Data phenomenon is vitally important as a consequence of its data-driven approach. The decisions we take over that data have real and significant human consequences (Mayer-Schönberger & Cukier, 2013), such as the ethical issues of online data collection and analysis (Boyd & Crawford, 2012). Understanding the impact of this development is crucial, and broad social discussion is needed to ensure sustainable social and economic development. Studies have clearly found Internet users' information privacy concerns to be a main driver of their disposition to disclose personal information to online companies (Son & Kim, 2008). Hence, if we fail to balance the social values we are sensitive to, such as privacy, identity, reputation and transparency, our Big Data society risks losing these values for the sake of innovation and efficiency (Richards & King, 2014).

As already presented, the Big Data phenomenon is quite new, which leads to the emergence of new and previously unfaced problems. There has been various research on the Big Data technology approach and on the advantages and opportunities that Big Data can bring. There have also been several studies of privacy in IS research, but few investigations of privacy in the Big Data Supply Chain, which is a new dimension within the individual and organizational social context. Hence, we will conduct the research based on the following research question:

Research question: What is the potential privacy gap in the Big Data Supply Chain?

1.4 Purpose

The purpose of our study is to identify the potential gap between privacy and openness among the stakeholders acting in a Big Data Supply Chain. The study will explore the interplay between privacy and openness in the context of an information society driven by Big Data. Any data on human subjects certainly raise privacy issues, and the real risks of abuse of such data are difficult to measure (Boyd & Crawford, 2012). The difficulty is that privacy gaps are hard to make specific, since the damage may be done at the time, after 20 years, and so on.
In the coming decade, the full integration of our online and offline lives will become an increasingly important issue facing society (Bus & Nguyen, 2013). This research is therefore important, as the information privacy of many individuals seems to have been seriously threatened, if not compromised (Son & Kim, 2008). Individuals urgently need to be able to manage information related to them in line with their preferences, contexts, and values, within existing social and legal boundaries (Bus & Nguyen, 2013). We will investigate these potential privacy gaps in a Big Data Supply Chain in order to bridge these weaknesses and improve the interplay between government, business and consumer.

1.5 Delimitation

Given that the main mission of the IS field is to provide an extensive view of IT within individual, organizational, and social contexts (Bakos & Treacy, 1986), we delimit our research, as a critical evaluation of Big Data, to its social perspective. As Big Data becomes a more common tool in business decision-making, a number of new social risks arise. The most evident is the risk of privacy violations (Bollier & Firestone, 2010). Although there are physical aspects of privacy, our study is delimited to information privacy concerns. People disclose personal information to obtain the benefits of a close relationship; the benefits of disclosure are balanced against an evaluation of the risks (Mary J. Culnan & Armstrong, 1999). Additionally, our study endeavors to gain a better understanding of the risks in an individual's intention to disclose personal information, and of the influence of other stakeholders.

Being unable to gather information from the National Security Agency or on Chinese legislation, we delimited our study to European Commission (EC) regulation, and the interviews on the legislative side were also with EC representatives. The research is further delimited to measuring privacy concerns in the individual context rather than at the group or organizational level. In addition, the businesses considered are limited to those participating in the Big Data Supply Chain. Finally, given the limited time available to study them in detail, we took into account the different features of online and offline privacy. In the offline setting there is no clear way to visually assess how a consumer's personal data are dealt with, whereas for online firms information transparency is easier to track (Awad & Krishnan, 2006).
For this reason we chose to concentrate only on online privacy issues.

1.6 Terms and definition

Gap
The term gap is used in our research question as well as throughout the study. We use gap to represent an undesirable break in continuity (OxfordDictionaries, 2014) between the participants of the Big Data Supply Chain, which allows for infringements of privacy and ethical issues.

Stakeholders
Throughout our study we use the concept of the stakeholder as a person, group or organization that has an interest or concern in an organization (BusinessDictionary, 2014), which for us is the Big Data Supply Chain. For the purpose of simplicity, we group these interested actors into three large groups: business, government and consumers.

Data Supply Chain
The Data Supply Chain seeks to broaden the traditional corporate information lifecycle to include the various data sourcing, storing and logistical activities required to manage data (EvanLevy, 2014). For this purpose, we look at Big Data within the Data Supply Chain and thus call it the Big Data Supply Chain (EvanLevy, 2014).

Privacy
General privacy is considered to be the right to freedom from secret surveillance and the right to determine whether, when, how and to whom one's personal information is to be revealed (BusinessDictionary, 2014). Although privacy may be categorized into four areas, physical, informational, decisional and dispositional (BusinessDictionary, 2014), we focus our study only on information privacy.

Information Privacy
Information privacy is a multidimensional notion that depends on context and also varies with a person's life experiences (Xu, Dinev, Smith, & Hart, 2008). It is a limitation on revealing or searching for data that are unknown or unknowable to others (BusinessDictionary, 2014).

Big Data Analytics
Advanced analytic techniques that operate on Big Data (Russom, 2011).

Ethics
Ethics refers to the moral principles that govern a person's behavior or the conducting of an activity (OxfordDictionaries, 2014).

1.7 Structure of the thesis

The rest of the thesis continues as follows:

Chapter 2: Big Data socio technical phenomenon explains the theoretical guideline on which our research framework is constructed. We then critically evaluate the implications for individual information privacy in the new Big Data ecosystem, as a socio-technical phenomenon.

Chapter 3: Methodological Approach presents the aim and motivation behind our research structure and methodology. We start with the research strategy and approach and continue with data collection and analysis, considering the reliability, validity and generalizability of the study. Finally, we end with thematizing and an extended research framework.

Chapter 4: Empirical results and data analysis illustrates our empirical findings from the interviews and diaries conducted. A short analysis follows each of the main themes on which we based our research.

Chapter 5: Discussion and findings includes the observations we extracted from the empirical findings combined with the literature review. We point out and discuss the main findings of our study.

Chapter 6: Conclusion summarizes the findings, the limitations we encountered, and suggestions for further research. In other words, we give an answer to our research question.

2 Privacy in the context of Big Data

This chapter explains the theoretical guideline on which our research framework is constructed. First, we conduct a top-down analysis of the macro environment in which Big Data operates. Then, analyzing it as a Big Data Supply Chain, we look for the potential privacy gap among its stakeholders. The chapter ends with the selection of the five main themes on which our research framework is based: Big Data Supply Chain; Individual; Government; Business; and Consumer.

2.1 Introduction

According to Recker (2013), a literature review is important in conducting research because of its enormous contribution to identifying findings in specific problem domains, the theories available to analyze the problem raised, and the methodologies appropriate for the research. We organized our literature review as a top-down analysis. We start from the macro level on which our research is based, namely Big Data, and then move down to the micro level, namely privacy.

First, we start with a description of Big Data and the new opportunities and challenges that emerge from it. Furthermore, we approach Big Data through the Data Supply Chain to make the description of the data flow more understandable (McKinseyGlobalInstitute, 2011). Second, we analyze the impact of Big Data in terms of privacy. Based on that, we start with the transition of privacy from physical privacy to information privacy, related to the evolution of technology (Westin, 2003). We then consider how privacy concerns are affected more strongly in the presence of a new information technology phenomenon that the public perceives as a threat (Mary J. Culnan, 1993; Westin, 1967). Finally, we conclude with the theoretical framework adopted from Solove (2006), the taxonomy of privacy.
We used the privacy calculus theory to examine the position of Big Data towards consumers' privacy, as it attempts to explain the costs and the benefits behind the individual behavior of the consumer (Mary J. Culnan, 1993).

2.2 The changing landscape towards Big Data

The term Big Data first appeared in 2001 (Laney, 2001), although many of the technologies it builds on, such as clustering, parallel computing, and network file systems, have been used in this domain for many years (Needham, 2013). Furthermore, the increase in information storage capacity over the last twenty years and the exponential growth of technological information processing capacity (Hilbert & López, 2011) have led many organizations towards the use of Big Data. Starting from the early 2000s, the Internet and the Web began to offer new data collection and analytical research opportunities (Chen et al., 2012). The Internet of Things, a term first used by Ashton (2009), has extended the amount of data captured today as the data are generated. Because of the Internet connectivity among things, information systems can collect up-to-date information on physical objects and processes (Mattern & Floerkemeier, 2010), which continuously increases the amount of data. Nowadays, Big Data goes beyond social networking and machine-generated web logs and involves processing tremendous amounts of data (Epic, 2014) in parallel and quickly (Needham, 2013).

When minicomputers first appeared in the early 1970s, corporations started to take advantage of distributed computing resources, which became the essential ground for the decision support system (DSS) and for distinguishing DSS from MIS (Hosack, Hall, Paradice, & Courtney, 2012). It was not until the 1990s that the term business intelligence became popular in the IT and business communities (Chen et al., 2012). However, the analytical techniques commonly used in these systems in the 1990s are mainly based on statistical methods that emerged in the 1970s and on data mining techniques that evolved in the 1980s (Chen et al., 2012). Big Data techniques also rely mostly on elaborate commercial technologies of relational DBMS, data warehousing, OLAP, ETL and BPM (Chaudhuri, 2011).
Nowadays, as the amount of available data keeps growing and knowledge lies hidden within that abundant data, traditional business intelligence tools are not designed to process it (AppliedDataLabs, 2014), which brings Big Data to the fore. In support of this, Big Data has been identified in Gartner's IT Hype Cycle (Gartner, 2013), shown in Figure 2-1, as one of the emerging technologies in IT that will take 5-10 years for market adoption.

Figure 2-1 Hype Cycle for Emerging Technologies, 2013 (Gartner, 2013)

Boyd and Crawford (2012) generalized Big Data as a social-technical phenomenon that is associated with opportunities, problems and big responsibilities (Filippi, 2014). We can find Big Data in different aspects of everyday life, from government and e-commerce to health (Chen et al., 2012), which makes it a source of strategic use for both business and government (Bollier & Firestone, 2010). Big Data is also known as a potential new class of economic asset (WorldEconomicForum, 2012). However, with Big Data as a social-technical phenomenon, uncertainty arises as a consequence of the new data ecosystem that the technology has built up (Chen et al., 2012). Correlation techniques and data analysis can lead to the prediction of both behavior and events, which can potentially raise new questions and challenges (Bollier & Firestone, 2010). Hence, we have to analyze the ethical implications for the government and the businesses that derive value from consumers (Bollier & Firestone, 2010), and the consumers' knowledge about the value they generate. In addition, WorldEconomicForum (2012) emphasizes that, alongside the individual as a data generator, there is a need to develop instruments that assure the individual's privacy and security. The stakeholders responsible for data generation and data use can also control the ethical dimension through their behavior, for example in the privacy policy agreement process (McDonald & Faith Cranor, 2008).

Big Data Supply Chain

First mentioned by EvanLevy (2001), the Data Supply Chain was seen as a new approach to managing a company's data assets, expanding traditional corporate information lifecycle management. DataSociety (2014) claims that, as data move between participants, what describes the whole phenomenon is a Data Supply Chain. In the context of Big Data, the processes that the data go through are as follows:

Figure 2-2 Big Data Supply Chain (Croll, 2014)

These activities helped us define the participants that take part in a Big Data Supply Chain:

Data subjects, the ones who generate the data and information (Solove, 2006). Furthermore, according to European Directive 95/46, the data subject is any individual who can be identified, directly or indirectly, by reference to a personal number or to his physical, physiological, cultural or social identity (EuropeanCommission, 2014).

Data holders, the ones who collect, analyze, act on and share, measure results of, and create feedback on the information gathered or produced (Solove, 2006). EuropeanCommission (2014) describes data holders as the recipient, that is, a natural or legal person, public authority, agency or any other entity to whom data are disclosed, whether a third party or not.

1-Collect: The collection process within the Big Data Supply Chain is based on the several types and sources of data emerging nowadays (Chen et al., 2012). Collection can be traditional, for example purchase transactions captured through transaction systems (IBM, 2012), but it can also take place through data broker companies (US.GOA, 2006). Data brokers, or third parties (EuropeanCommission, 2014), base their business model on collecting information on consumers and then reselling it to their customers in the private or public sector (US.GOA, 2006).

2-Big Data Analytics: Companies that compete on analytics differentiate themselves from their competitors (Davenport, 2006). Once the data are collected, they are ingested and manipulated (Chen et al., 2012). However, to achieve a competitive advantage from analytics, a company has to focus its analytics effort, establish an analytical culture, hire the right people, and use the right technology (Davenport, 2006).

3-Sharing and acting: Davenport (2006) argues that the use of data has disrupted the decision-making process. All the analysis performed on the gathered data is of no value if we are not able to act on it (Chen et al., 2012). However, the collection is not only a technical matter; it also involves legislation and organizational politics (Croll, 2014).

4-Measure results and create feedback: Big Data is mostly about feedback (Croll, 2014). The collection process or the analysis alone will not add value without feedback (Croll, 2014).

The data within the Big Data Supply Chain are generated by three main participants: individuals, the public/development sector and the private sector (WorldEconomicForum, 2012) (Figure 2-3).

Figure 2-3 Complex Data Infrastructure/Ecosystem (WorldEconomicForum, 2012)

Furthermore, McKinseyGlobalInstitute (2011) claimed that the data in a Data Supply Chain are roughly duplicated, which leads to disruption and eventually to a changed ownership of data. In addition, consumers' privacy is at the center of an ongoing debate between business representatives, governments, and consumers' representatives (Mary J Culnan & Bies, 2003). Based on this approach, we decided to investigate the stakeholders of the Big Data Supply Chain in order to characterize any potential privacy gap among them.

2.1.2 Big Data Supply Chain in the Macro-environment

Treating Big Data as a social-technical phenomenon (Boyd & Crawford, 2012), we looked at the business and social implications that affect the participants of the Big Data Supply Chain (McAfee & Brynjolfsson, 2012). Although Big Data enhances the decision process within a business entity, several socially related uncertainties arise, of which the riskiest is the challenge of privacy (McAfee & Brynjolfsson, 2012). Having identified the stakeholders in a Big Data Supply Chain in the first theme, we investigate their relation to the Big Data Supply Chain in order to characterize any potential privacy gap among them. Accordingly, we divide this theme into three:

- Big Data Supply Chain - Government
- Big Data Supply Chain - Business
- Big Data Supply Chain - Consumers

In addition, when this environment is seen in the context of a macro-environment, four factors come into play: Political, Economic, Social and Technological (Recklies, 2006).

The political factor is revealed, through the government, in the data regulation process, e.g. the General Data Protection Regulation (EuropeanCommission, 2014). Moreover, privacy policies attempt to address the privacy issues related to data movement, but they are complicated, long and difficult to read (McDonald & Faith Cranor, 2008). Furthermore, some companies do not provide appropriate privacy protection for consumers because they either do not have privacy policies or do not comply with the Fair Information Practices (FIP) (Bélanger & Crossler, 2011). However, there are several political initiatives to assure privacy within the Big Data Supply Chain, e.g. Privacy of Big Data (WhiteHouse, 2014).

The economic factor: competitive strategies are based on the vast amount of consumer data that companies possess (Mary J. Culnan & Armstrong, 1999).
Because it is hard for a company to differentiate itself based only on products, differentiation is achieved through data analytics (Davenport, 2006).

The social factor: Big Data as a social-technical phenomenon is associated with several social opportunities (Chen et al., 2012). Several challenges also arise (Boyd & Crawford, 2012), e.g. privacy in health care data and prejudice resulting from the information ingested (Mary J. Culnan & Armstrong, 1999) from this Big Data.

The technological factor: the low cost of hardware and the evolution of cloud computing make it easier to store data than before (Agrawal, Das, & El Abbadi, 2011).

2.2 Privacy - a Big Data challenge?!

Privacy as a concept has been studied in several contexts, including Information Systems (Pavlou, 2011). Hence, the research within its domain is quite broad. To narrow the objective of the research, we follow Smith, Dinev, and Xu (2011), according to whom research in the privacy domain rests on three main aspects: 1) the definition of privacy, 2) the relationship of privacy with other constructs, and 3) the importance of privacy in a specific context. For the purpose of our study the review is organized as follows: 1) the definition of privacy, analyzed as a transition from physical privacy to information privacy, 2) the privacy relationship, expressed as privacy concerns and supported by the privacy calculus, 3) the importance of privacy in the context of the Big Data Supply Chain.

Context is defined as the stimuli and phenomena that exist in the environment external to the individual (Mowday & Sutton, 1993). Big Data, perceived as such a stimulus phenomenon, is therefore the environment in which the potential privacy gap will be evaluated. Moreover, the previous experience of Internet technology, which completely disrupted the privacy landscape, leads scholars to assume that we are now facing another disruptive force like the first one (Smith et al., 2011). Thus, Big Data as a social-technical phenomenon is a potential context that will reshape the privacy debate among the participants, and above all the evaluation of the privacy concerns that arise within the enormous amount of data. In order to evaluate it, we have to take into consideration not only the macro-environment factors but also the micro-environment, which contains both internal and external factors (Jackson, Joshi, & Erhardt, 2003).

Privacy ethical dimension

Typically, general privacy is subsumed under the theme of ethics (Smith et al., 2011). Indeed, it is mostly found in association with ethical issues in several sociology readings (Bynum, 2008; Pearlson & Saunders, 2009).
General privacy has been investigated on the basis of many ethical theories, such as social contract theory, duty-based theory, stakeholder theory, virtue ethics theory, and the power responsibility equilibrium model (Caudill & Murphy, 2000). Nevertheless, privacy is not the same as ethics, and it can be analyzed without dealing with the ethical perspective (Smith et al., 2011).

Privacy transition from physical to information one

At first, privacy was perceived as physical privacy (Smith et al., 2011). Today, communication is based on digitization and is stored as information (Bélanger & Crossler, 2011). Furthermore, the information collected about individuals and groups evolved into a phenomenon,

converting information privacy into general privacy (Smith et al., 2011). Accordingly, we will use the term privacy to refer to the information privacy that we are analyzing in the context of the Big Data Supply Chain. However, information privacy is hard to define clearly: "Nobody can articulate what it means" (Solove, 2006). Although there are many definitions, they differ little in their essential elements, which usually involve any potential secondary use of an individual's personal information (Belanger, Hiller, & Smith, 2002). Secondary use here means using data for a purpose other than the one for which it was first collected (Smith, Milberg, & Burke, 1996). According to a literature review of information privacy in IS, the main taxonomies of information privacy are as follows (Belanger et al., 2002):

Table 2-1 Information Privacy Taxonomy

Author | Taxonomy | Constructs
(Smith et al., 1996) | Information privacy consists of four dimensions: collection, unauthorized secondary use, improper access, and errors. | Collection; Unauthorized secondary use; Improper access; Errors
(Solove, 2006) | Information privacy includes information collection, information processing, information dissemination, and invasion. | Information collection; Information processing; Information dissemination; Invasion
(Skinner, Han, & Chang, 2006) | Information privacy in a joint environment is centered on time, matter, and space dimensions, where the space dimension covers information privacy related to the individual, group, and organization. | Time; Matter; Space dimensions (Individual, Group, Organization)
(Smith et al., 2011) | Information privacy research is related to the individual, group, organizational, and societal levels. | Individual privacy; Group privacy; Organizational privacy; Societal privacy
(Clarke, 1999) | He defines information privacy as the concern that an individual has in controlling, significantly changing, or in the administration of data about themselves. | Controlling

Furthermore, the concept of information privacy existed long before information and communication technologies changed its occurrences, impacts, and management (Bélanger & Crossler, 2011). With the development of IT, the boundaries started to shrink, and eventually the privacy paradox arose, which associates privacy with a commodity (Bennett, 1995). Thus, a cost-benefit analysis can be applied to it (Smith et al., 2011). We revised the model of the

information privacy evolution as a function of IT development (Westin, 2003), adapted from Smith et al. (2011), in the context of Big Data, as shown in Table 2-2.

Table 2-2 Evolution of information privacy concept following the evolution of IT (Adapted from Smith et al. (2011))

Even when it is treated as a commodity, privacy is still an individual and social value (Westin, 2003).

Privacy concerns

Our review of the literature on privacy concerns found the following. Privacy concerns are mainly defined as an individual's subjective judgment of legitimacy from the perspective of information privacy (Malhotra, Kim, & Agarwal, 2004). More particularly, privacy concerns are associated with organizations' information privacy policies (Smith et al., 1996). Furthermore, Internet privacy concerns represent individuals' perceptions of what happens with the information they provide via the Internet (Dinev & Hart, 2006). Accordingly, some consumers claim that their personal information is collected and analyzed in a non-transparent way (Mary J Culnan & Bies, 2003). Privacy concerns are not new, but they emerge even more strongly in the presence of a new information technology phenomenon that the public perceives as a threat (Mary J. Culnan,

1993; Westin, 1967). According to Mary J. Culnan (1993), the use of technology for a strategic purpose can raise privacy concerns if it is not based on some common values. Thus, privacy is of growing concern to multiple stakeholders, including business leaders, privacy activists, scholars, government regulators, and individual consumers (Smith et al., 2011). But how are privacy concerns measured? One scale is the concern for information privacy (CFIP), which covers collection, errors, secondary use, and unauthorized access to information (Smith et al., 1996). Furthermore, the Internet users' information privacy concerns (IUIPC) scale relates to Internet users and has three first-order components: collection, control, and awareness (Malhotra et al., 2004). Moreover, Smith et al. (2011) stated that privacy concerns need a macro model in order to be evaluated from the perspective of their sources and of the outcomes that result from them. So they introduced the APCO macro model, composed of three components: "Antecedents => Privacy Concerns => Outcomes". This model is limited to the personal level of privacy, excluding the group and organizational levels (Smith et al., 2011). The macro-environment analysis presented earlier thus leads to the antecedents of these privacy concerns in the political, economic, social, and technological (PEST) factors. In summary, the measuring scales of privacy concerns and their respective constructs are shown in the table below.
Table 2-3 Measuring scale of privacy concerns

Author | Measure Scale of Privacy Concerns | Constructs
(Smith et al., 1996) | CFIP (Concerns for Information Privacy) | Collection of data; Unauthorized secondary use of data; Improper access to data; Errors in data
(Malhotra et al., 2004) | IUIPC (Internet User Information Privacy Concerns) | Collection; Control; Awareness
(Smith et al., 2011) | APCO Macro Model (Antecedents, Privacy Concerns, Outcomes) | Privacy experiences; Privacy awareness; Personality differences; Demographics differences; Culture/Climate; Regulation; Behavior reactions; Trust; Privacy notice; Privacy benefits; Privacy risk

Nevertheless, the outcomes of privacy concerns generate a taxonomy of information privacy-protective responses (IPPR) (Son & Kim, 2008). Furthermore, these concerns affect individuals' attitudes towards their preferred regulatory environment and their willingness to be profiled (Milberg, Smith, & Burke, 2000; Van Slyke, Shim, Johnson, & Jiang, 2006). The concerns also affect trust, which, besides perceived risk, is related to the willingness to disclose information (Mayer-Schönberger & Cukier, 2013; Mayer, Davis, & Schoorman, 1995), where privacy risk is perceived as any potential loss of control over personal information (Dinev & Hart, 2006). In short, these concerns influence technology acceptance, for example online purchasing (Malhotra et al., 2004; Smith et al., 1996). As a result, privacy concerns have to be addressed in order to find the potential privacy gap in the Big Data Supply Chain. Furthermore, there is a privacy paradox that the privacy calculus explains (Mary J. Culnan, 1993).

Privacy calculus theory

Privacy calculus is the theory used to evaluate the tradeoff of costs and benefits in individual behavior (Mary J. Culnan, 1993), assuming that a consequentialist tradeoff of costs and benefits is salient in determining an individual's behavioral reactions. This view is found in various works (Eddy, Stone, & Stone-Romero, 1999; Klopfer & Rubenstein, 1977), where privacy is perceived not as a static concept but as a generator of debate in terms of costs and benefits (Klopfer & Rubenstein, 1977). Thus, the privacy calculus leads to a risk-benefit approach when an individual is asked to disclose personal information (Mary J. Culnan, 1993). Privacy risk is here understood as a person's belief in a high potential loss of personal information to an organization (Pavlou, Liang, & Xue, 2007).
Privacy benefit, from the privacy calculus perspective, is associated with the personal assumption that a particular behavior leads to the highest level of outcomes (Stone & Stone, 1990), such as financial rewards, personalization, and social adjustment benefits.

2.3 Theoretical framework

After the literature review of information privacy and Big Data, we chose the theoretical framework on which the study is based. We will use the taxonomy of privacy (Solove, 2006) for the purpose of our study. As we claimed previously, data movement is described by a Big Data Supply Chain, which is composed of four main activities: collection, processing, sharing, and measuring and creating feedback. According to the taxonomy of Solove (2006), information privacy includes information collection, information processing, information

dissemination, and invasion. These are similar to the activities that make up the Big Data Supply Chain. Thus, the potential privacy gap will be sought among the stakeholders within the Big Data Supply Chain in relation to the privacy taxonomy of Solove (2006). We adapted a framework to characterize the potential privacy gap among the stakeholders in the Big Data Supply Chain. Even though privacy concerns relate to consumers, business entities, and the government (Mary J Culnan & Bies, 2003), we limited the research to individual-level privacy concerns. Moreover, Big Data in itself has no value framework, but by analyzing the privacy implications within it as a Data Supply Chain we ensure that Big Data is aligned with the values the participants hold (Davis & Patterson, 2012). Thus, as an outcome of this analysis we arrived at a framework of information privacy adapted from Solove (2006). Furthermore, we added government intervention through mechanisms such as regulation assurance and policy assurance within the processes of the data holder. The framework is also completed with the impact of education on the data subject. The adapted theoretical framework is shown in Figure 2-4.

Figure 2-4 Taxonomy of Privacy in a Big Data Ecosystem (Based on Solove (2006))

3 Methodological Approach

In this chapter we present the aim and motivation behind our research structure and methodology. As part of this, we extend our research framework by reviewing the Privacy and Big Data Supply Chain literature, in order to dig into a potential gap between privacy and openness among stakeholders in relation to each of the theoretical themes established in our framework. The micro and macro model factors are used to guide our data collection through interviews and diaries. Finally, we describe and analyze in detail the results of our research, as well as the steps we took to ensure its quality.

3.1 Research Strategy

Our research aims to identify the potential gap between privacy and openness among stakeholders acting in a Big Data Supply Chain. Thus, our focus when selecting the approach for our research was on choosing the best methodology for our data collection and analysis requirements. We chose to talk directly with experts from different sectors within the Big Data ecosystem, and found it most appropriate to observe the daily habits of several individuals for one week through a diary written by them. Indeed, diverse interpretations of what counts as good-quality work in our research will exist (Seale, 1999). A qualitative approach will help us to study the three stakeholders as well as the social frameworks in which they live, work and behave (Recker, 2013). In our study we assume that social reality takes form based on social contexts and individual experiences, which allows for subjective interpretations (epistemology) (Bhattacherjee, 2012). The qualitative approach is used to collect data in a natural setting close to the participants of the study, while data analysis determines patterns or themes (Creswell, 2012).
Brinkmann and Kvale (2005) describe the qualitative research interview as one that enquires into individuals' lives in detail. They add that it allows researchers to describe personal aspects of people's lives. Diary methods approximate observational research and are mainly beneficial in situations where first-hand observation is not possible (Czarniawska-Joerges, 2007). Latham (2003) sees the diary as a kind of performance or reportage of the participants' week. Initially, we thought of using a mixed method, but we found it unsuitable to ask consumers questions about privacy and ethical issues. In the end we would have obtained only their perception of the importance of personal information, and nothing about our assumption of negligent behavior. As Sapsford (2006) argues, when measuring traits of personal behavior a direct question would probably be unsuitable in any circumstance, because there is no reason to assume that respondents are able to place themselves in terms of the researcher's theoretical constructs.

In conclusion, our study cannot be widely generalized, not least because of the small number of observations. According to Kvale and Brinkmann (2009), a lack of generalizability is a characteristic of qualitative research. Nevertheless, we expect our study to be a good starting point for future research and further quantitative validation in the context of individual privacy within the Big Data Supply Chain.

3.2 Research Approach

Descriptive research, according to Sandelowski (2000), is a lower form of inquiry, while qualitative descriptive studies, through their design of sampling, data collection and analysis, summarize everyday events. We therefore considered this approach the most suitable for our study, given that we want to show the potential gap between privacy and openness among the participants acting in a Big Data Supply Chain. Our main goal was to infer from the data a proposition that best describes the potential gap among stakeholders (Josephson & Josephson, 1996). In our study we do not aim to establish the three stakeholders as the only ones taking part in a Big Data Supply Chain, but rather to discover where this gap lies and what its causes are, thereby generating enough variation to yield desirable results (McGrath, 2001). Further on, deductive theory derived from more general principles (Ezzy, 2013), used to determine other implied stakeholders and causes of gaps as stated in our study, may strengthen our research.

3.3 Data collection

Initially, we had agreements, at different times, with two important institutions in Sweden to assist us during our thesis with their data, documents, help and expertise. Both later withdrew: the first found the time available for the thesis too limited, and the second was not comfortable with our research on privacy.
After that, we decided on a mixed research strategy: interviews with experts (business side and legal side) and a quantitative study of individuals. But quantitative research can be quite challenging, as a large sample is needed to make it generalizable and avoid bias (Kvale & Brinkmann, 2009). Thus, we changed direction to diaries as a qualitative method. We would have preferred face-to-face data collection, as it gives rich qualitative responses (Bhattacherjee, 2012), but due to the time limit and the dispersed locations we used Skype interviews. By this means we were also able to record the interviews and transcribe them afterwards. Although the missing image and body gestures, with only words to go on, can be a disadvantage (Kvale & Brinkmann, 2009), the distance imposed by the phone made our interviewees feel free and comfortable. For the diaries we chose a convenience sample (Marshall, 1996), a technique that allows selecting the most accessible subjects at the lowest cost in terms of time, effort and money. Although it might lead to poor data quality and credibility (Marshall, 1996), the time restriction justifies this selection. For the purpose of generalizability, we chose diarists of different ages and places of residence. We asked them to keep a diary on the specific issue we were interested in studying. When a diary is well managed, it can be used to gather data that are not obtainable through interviews or other data collection methods (Alaszewski, 2006). Although we can assume some reserve in how diarists report their activities, the use of time in a typical day of an individual's life can tell a great deal about a social pattern (Corti, 1993). We focused on a structured diary for our study, and also provided instructions and questions on how to fill in and keep it. We explained the purpose of our study to the diarists at the first interview, and tried to persuade them to take part by keeping a daily diary.

3.4 Thematizing of study

Our research phenomenon will be investigated across three categories of actors, part of the three-dimensional area: Consumers, Government and Business. The data will be collected in two different ways: interviews for Government and Business, and diaries for Consumers. To structure our study we used the books by Kvale and Brinkmann (2009) and Alaszewski (2006), on qualitative research interviews and on diaries for social research respectively. Kvale and Brinkmann (2009) suggest using a linear progression of seven stages, as shown in Figure 3-1, from the original purpose to the final report.
They note, however, that the interviewer's initial scope and purpose might evolve as new conditions arise. Alaszewski (2006) raises some key issues in designing the diary, choosing the population and analyzing the diary, which a researcher should consider when choosing diaries as a qualitative research method. Figure 3-2 visualizes the steps carried out in a diary investigation, adapted from the steps Kvale and Brinkmann (2009) used.

Figure 3-1 Seven stages of an Interview Investigation (Kvale & Brinkmann (2009))

Figure 3-2 Diary Investigation (Based on: Alaszewski (2006))

During the first stage, thematizing, we express the purpose of our study and formulate our research question before the interview begins (Kvale & Brinkmann, 2009). We also have to clarify the theme of our study (Kvale & Brinkmann, 2009), which means developing a theoretical understanding of the phenomena we wish to investigate. In addition, starting from the literature review, hypotheses derived from social theory can be tested against the observations and then enriched with our findings and knowledge along the way (Alaszewski, 2006). Our study was initially inspired by the use of our information by third parties. We read a number of forums and articles from the New York Times, IBM, Gartner, Forbes, and press releases from the EU, NSA etc., which raised the ethical issue of Big Data, and we decided to investigate some of the questions that previous research left unsolved. Several thoughts and hypothetical answers to our research question were assumed from the beginning of the study. Although they were initially based on the literature review, later investigation through interviews and diaries somewhat changed the initial answer we had in mind. This was a crucial component in thematizing our study through interview (Kvale & Brinkmann, 2009) and diary research, as it built a knowledge base on which to develop our further findings. We performed a PEST analysis before the interviews and the release of the diary, in order to make a strategic plan (Recklies, 2006) of the Big Data ecosystem.
Finally, we concentrated on the interview responses as well as the diaries, and coded both based on the data gathered.

Big Data Supply Chain

In line with the aim of our study, where we were interested in the interplay between privacy and openness driven by Big Data, we wanted to investigate within the Big Data Supply Chain, as information in a supply chain mechanism flows among the participants (Lee et al., 2004). Because data in a Data Supply Chain is duplicated, which leads to changing ownership (DataSociety, 2014), we decided to investigate the participants of the Data Supply Chain. We chose this theme initially to define the participants, such as individuals, the public/development sector and the private sector (WorldEconomicForum, 2012), that act as data subject and data holder. Additionally, we explored the roles that the consumer, government, and business entity have in this Data Supply Chain.

1. Data subjects, the ones who generate the data and information.
2. Data holders, the ones who collect, analyze, store, and share the information gathered or produced.
3. Their roles can sometimes interchange, meaning that a data holder can become a data subject and vice versa. For this reason we also looked at data ownership and its management within the Data Supply Chain.

Big Data Supply Chain Participants

After identifying the participants in a Big Data Supply Chain under the first theme, we chose to adopt a theme for each relationship, according to their individual interplay within the Big Data Supply Chain. Hence, we split this theme into three: Big Data Supply Chain - Government, Big Data Supply Chain - Business, and Big Data Supply Chain - Consumers. Accordingly, we chose the following subthemes:

1. Legal assurance and its effectiveness in the Internet age we are living in. For this purpose we also looked at government intrusion into individual data privacy.
2. Policy assurance, as privacy policies are complicated, long and difficult to read (McDonald & Faith Cranor, 2008).
3. Management of data protection. For this component we also investigated responsibility among the participants: how data will be managed, and who should be responsible for its integrity in the case of secondary use.
4. Secondary use of data, as a consequence either of a lack of legal assurance or of flaws in policy assurance.
5. Transparency, because of its importance in shedding light on how personal information is managed. The consumer's privacy benefit on the one hand and his concerns about privacy issues on the other were the factors that explained this component.

3.4.3 Big Data Supply Chain Individual

Because several factors, such as culture, regulatory laws, previous experiences, and personal characteristics, lead individuals to show various levels of information privacy concern (Malhotra et al., 2004), we chose to study this theme separately from the previous one, BDSC - Consumer. Individuals from different countries, with different cultures, values, and laws, might form different perceptions of information privacy, and the impact on them might also differ (Bélanger & Crossler, 2011). From this theme we derived two potential gap components that both rely on privacy awareness:

1. Educated public, as it is only through an educated public that we can have any form of effective policing.
2. Trust, as an increasing number of people share information that misrepresents the real world and disrupts the validity of Big Data analytics.

Extended research framework

Based on the literature review, European Commission press releases, and the PEST analysis, we first identified the stakeholders in a Big Data Supply Chain, comprising data subjects and data holders, and afterwards we extended the theoretical framework by connecting the theme-based framework with potential gap components, as shown in Table 3-1. We associated the theme of the Big Data Supply Chain, as the general ecosystem where the phenomenon lies, with the ownership issue. We then associated each of the three themes under Big Data Supply Chain - Participants with potential gap components such as legal/policy assurance, management, responsibility, secondary use and transparency. Finally, we added trust, educated public and privacy awareness under the BDSC - Individual theme as the last potential gap components in this privacy gap.
The division of the information into macro- and micro-environmental frameworks, the analysis we previously made based on the literature, and the potential gap components we identified helped us to formulate accurate questions for the interviews and diary, as well as to analyze the content.

Table 3-1 Extended research framework

Themes | Theories | Potential Gap Components
Big Data Supply Chain | PEST analysis | Ownership
BDSC Participants (BDSC - Government, BDSC - Business, BDSC - Consumer) | Privacy Calculus Theory | Legal Assurance; Policy Assurance; Management; Responsibility; Secondary Use; Transparency
BDSC - Individual | Privacy Calculus Theory | Trust; Educated Public; Privacy Awareness

3.5 Design of collection techniques

Design of interview guides

We used a semi-structured interview for the experts, in which we outlined the topics and questions to be covered but also left the interviewee free to follow his own judgment and direction (Kvale & Brinkmann, 2009). We explained the purpose of our study from the beginning and asked direct why, what and how questions (Kvale & Brinkmann, 2009). Thus, we developed two interview guides based on our research framework: the first with the study's thematic research questions, and the second with the interview questions to be asked (Kvale & Brinkmann, 2009), as shown in appendix 2 and 2.1. Our prior literature findings helped us to know what to ask and to interpret the meaning of the answers for our study during the interview (Kvale & Brinkmann, 2009). The interviews were consequently designed to be flexible depending on the type of expert we were going to interview, whether Big Data experts or regulatory researchers within the Big Data ecosystem. We listened attentively to the answers and encouraged the interviewees with follow-up questions to keep to the direction of our research investigation. The interview guide followed Kvale and Brinkmann's (2009) approach to conducting interviews, keeping a semi-structured form and including an outline of the subjects to be covered in the interview. The interview guide was divided into 7 parts:

Part 1: Introduction and general questions. We introduced ourselves, thanked the interviewee, and asked him about his background and some general questions about Big Data as a concept, its opportunities and challenges.
Part 2: Management. This part covered questions related to the management of Big Data within a company: which department makes most use of it, who holds the responsibilities within the organization, and who the providers of data are, i.e. where companies get the data they analyze.
Part 3: Authority. This part covers questions regarding authority: legal assurance, policies of data transfer, data management in relation to ethical issues, misinterpretation, and finally responsibilities toward data analytics.
Part 4: Ownership. We addressed some important questions about ownership of information, changing ownership, data collection, integrity and permanent privacy choices.
Part 5: Big Data Privacy. Here we aimed to learn the interviewee's point of view on privacy. We also dug into privacy issues and obligations and the use of technology in the context of privacy.
Part 6: Future vision. This was the penultimate part, where we wanted to explore the future of privacy and the interviewee's concerns about privacy.
Part 7: Closing and debrief. At the end, we asked the interviewee whether there was any other issue he wanted to add that we had not covered. We then thanked the interviewees for their collaboration and let them know when we would send the interview transcriptions.

A full summary of the interview questions related to our research framework is in the appendix.

Design of diary instructions

We used a structured diary for collecting the data from consumers (individuals) (Alaszewski, 2006). Diarists were asked to keep the diary for one week (7 days), and for each day there was a timeline to follow and questions to consider when writing it (Alaszewski, 2006).
We also provided a set of instructions for diarists on how they were expected to keep and fill in the diary recording system (Alaszewski, 2006). In addition, the set of questions following the instructions was in accordance with the purpose and structure of the study (Corti, 1993). We checked in 1-2 days after handing out the diary, to verify accuracy and to see whether the diarist had any questions about its use (Alaszewski, 2006). The diary's formal structure required the diarists to make regular entries in order to ensure relevant data (Alaszewski, 2006). For the format and design of the diary we followed the guidelines of Corti (1993). In addition to the instructions on the first page of the diary, we also provided a sample of how the diary could be kept (Corti, 1993). Following Corti (1993), our diary was designed as:

1. Seven structured pages, one for each day.
2. The first page contains a clear set of instructions on how to keep and fill in the diary, emphasizing the importance of writing it as soon as the events occur.

3. A set of questions for the respondent was also placed on the first page of the diary, to be considered while filling it in, or used as a questionnaire in case the diarist abandoned the diary.
4. The second page contained a sample entry filled in correctly for one day.
5. The subsequent pages were marked with the day and date, with the time grouped into 4 time frames.
6. One final page for any further comments, as they might help us in our coding and analysis.

A full summary of the diary design can be found in appendix 3. Finally, after getting the diaries back from the respondents, we made an interview call asking only one final question, which would contribute to the analysis of the diary (Corti, 1993).

3.6 Selection respondents

For the purpose of our study we chose different sets of respondents for the interviews and the diaries. To give our study generalizability, we selected two experts to interview for each category (Big Data experts, and regulatory experts in the European Commission). For the diarists, we used a convenience sample (Marshall, 1996), selecting 5 respondents of different ages, living in different regions but resident in Europe for at least the last 10 months.

Selection of interviewee

Cees Boon has been the director of Scenarius, a management consulting company in the Netherlands, for almost 4 years. He has worked for more than 19 years as a retail intelligence consultant at Newway. We found Cees an appropriate interviewee for our research as an expert in Big Data ethics, as he was the founder of many professional network groups, notably Big Data Ethics. Cees also has experience in business intelligence, science and IT. Jon Page is now a freelance consultant focused on defining BI Strategy and ensuring linkage with Corporate Strategy. He is the 'thought leader' for EMEA in The Big Data Institute (Stockholm).
As he has recently been working with HP and EMC, driving Big Data solution development and sales, we found him suitable to interview as an expert in the Big Data field. His previous background in Business Intelligence and Data Warehousing at Oracle EMEA, together with seven years at Teradata Europe, of which he was a founder member, gave him additional expertise for the questions we planned to ask. He has directly assisted in the design of some of Europe's largest data warehouses.

Trevor Peirce is the Activity Chain Leader for governance, security, and privacy in the European Commission - Internet of Things European Research Cluster (Brussels). He is also a Public Policy Activity Leader and CFO in RFID, Brussels, particularly oriented toward the European Parliament, the European Commission and European Member State governments. His engagement in the European Commission on privacy and security, together with his work as a Supply Chain Technology Consultant, made him relevant for our investigation. We were mostly interested in the reaction, role and contribution of the regulatory intermediary in the Big Data Supply Chain, an area in which Trevor had more than 8 years of experience.

Interviewee 4 is an affiliated senior researcher at the Institute of European Law at K.U. Leuven. His experience as a member of the legal service at the European Commission and at the European Data Protection Supervisor made him a valuable respondent for our study. His studies within the field and his recent PhD thesis on data protection and personal data, as well as other related publications, provided knowledge valuable to our study and gave us fine insights. Table 3.2 outlines the interview selections and respective interview details.

Table 3-2 Data collection (Interviews) summary

Contact with diarists

The approach for recruiting diarists was related to the purpose of our research and the data we were seeking (Alaszewski, 2006). Additionally, the selection of the cases in the study was guided by the nature of the design (Alaszewski, 2006). Through the diary research methodology we tried to make generalizations for a large population (Alaszewski, 2006) by choosing only a set of respondents. We therefore used a convenience sample (Marshall, 1996), a technique that allowed us to select the most accessible subjects, choosing different people aged 18-30, male and female, who had resided in Europe for at least 10 months. We were not interested in a specific population group but rather in a selection as randomized as possible. Although we would have preferred a larger number of respondents, to have more generalizable results (Alaszewski, 2006), this was constrained by the time required for coding. Lacking that, a random selection process was used to reduce the probability of bias undermining the generalizability of the findings (Alaszewski, 2006). Table 3.3 outlines the diarist selections and respective diary details.

Table 3-3 Data collection (Diarists) summary

1 For the last 10 months has been living in Sweden.

Transcribing

Transcribing the interview recordings and diaries enables the production of a neat, typed copy (Kvale & Brinkmann, 2009). Kvale and Brinkmann (2009) suggest being cautious when transcribing from interview to typed copy, as the spoken context might change or transform due to factors such as body language rather than spoken words alone. Kvale and Brinkmann (2009) also say that there is no single way to transcribe, as it depends on the purpose and the analysis. We were attentive and wanted to capture as much as possible from the interviewee, and thus transcribed in a narrative rather than artistic way. Nevertheless, we left out pauses, conjunctions between sentences, and other clarifications beyond the general answer to the question itself. The diaries were easier, because the diarists themselves chose the writing style they wanted. Interviews, as well as diaries, were conducted in English, so no ambiguity arose from translation. After transcribing the interviews, we sent the transcripts back to the interviewees, asking them to correct any unintended misinterpretation. Kvale and Brinkmann (2009) advise that the transcript should represent a good, careful attempt to reproduce the interview, rather than just being accurate. After coding each interview, we sent the interviewee a copy so they would also know how their words were interpreted, but we did not do the same for the diaries, due to the difference in expertise between them. We transcribed each interview as soon as it ended, so that our memories were fresh in case the recording was hard to understand. We also cross-checked the transcripts between us to be sure that nothing was missed or misinterpreted.

3.8 Analyzing

Our sources for collecting data were interviews and diaries. Kvale and Brinkmann (2009) recommend that researchers think about how they will analyze the interview transcripts before they start interviewing, as once the interviews are under way it is in principle too late to start thinking about the analysis. The chosen method of analysis will guide the preparation of the interview guide, the interview process and, in the end, the transcription (Kvale & Brinkmann, 2009). Qualitative content analysis can be reflexive and interactive, as the researcher continuously modifies the treatment of data to accommodate new data and new insights about them (Sandelowski, 2000). The written text created by diaries does not have the inbuilt structure of diaries used in experimental or survey research, so the researcher needs to decide how to manage this text (Alaszewski, 2006). We used coding, which categorizes the text into segments and categories (Bhattacherjee, 2012). For diaries, the process of coding individual entries can be very intensive, in almost the same way as treating qualitative interview transcripts (Corti, 1993).
Coding, according to Kvale and Brinkmann (2009), involves assigning one or more keywords to a text segment, which allowed us to uncover potential gaps within the Big Data Supply Chain. Coding the text's meaning into categories made it possible to quantify how often specific themes were addressed in the text (Kvale & Brinkmann, 2009), themes that we used for our research framework. Content analysis of diaries involves taking a number of written texts such as diaries, breaking them into their constituent parts and re-assembling these parts into a new scientific text (Alaszewski, 2006). The process of identifying themes in a diary begins after its content has been transcribed (Alaszewski, 2006), which in our case was not needed as the diaries were kept in written form. The coding process in both forms required observing the text and rereading it many times in order to identify and confirm themes and to organize and summarize all the evidence relevant to each theme. We used investigator triangulation when coding in order to ensure inter-coder reliability within the process (Golafshani, 2003). We coded the interview and diary transcripts separately and then exchanged them between us to establish the existing codes and any new codes we might add during coding, and compared our coding with each other in order to arrive at a final coding structure (Table 3.4 and Table 3.5). Additionally, we used text coding for each theme and its subheadings for interviews, and color coding for the three main themes we were interested in for diaries.

Table 3-4 Coding structure for data analysis - Interviews

Table 3-5 Coding structure for data analysis - Diaries: Usability; Privacy Concerns; Privacy Benefits; Privacy Awareness

3.9 Ensuring research quality

Reliability

Seale (1999) says that, when using qualitative approaches, a skeptical audience needs to be reassured. Qualitative studies differ from quantitative ones in that they rely on credibility and dependability, and validity in qualitative research does not just provide a measure of the studied phenomenon but rather shows whether the data reflect the phenomenon of interest (Kvale & Brinkmann, 2009). We tried to achieve this by selecting interviewees from different sectors and backgrounds, by employing investigator triangulation in coding (Seale, 1999), and by providing a more reliable method of research for the consumer, such as an observation method. Finally, we used semi-structured interviews so that respondents could feel free to go beyond the formal questions, and diaries as a narrative way for individuals to express themselves as they wish. However, structuring the diaries, including written instructions and a sample of how they should be used, is a relatively impersonal way of influencing diarists (Alaszewski, 2006).
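The theme counting and investigator triangulation described in this chapter can be illustrated with a short sketch. It is not the procedure used in the study; it only shows, under stated assumptions, how two coders' labels for the same text segments could be tallied per theme and compared. The segment labels are invented for illustration, the function names are hypothetical, and the theme names are the labels listed under Table 3-5. Cohen's kappa is an agreement measure introduced here as an assumption; the study itself does not report one.

```python
from collections import Counter

# Theme labels as listed under Table 3-5 (diary coding structure).
THEMES = ["Usability", "Privacy Concerns", "Privacy Benefits", "Privacy Awareness"]

# Hypothetical data: the theme each coder assigned to the same six diary
# segments. These values are invented for illustration only.
coder_a = ["Usability", "Privacy Concerns", "Privacy Concerns",
           "Privacy Benefits", "Privacy Awareness", "Usability"]
coder_b = ["Usability", "Privacy Concerns", "Privacy Awareness",
           "Privacy Benefits", "Privacy Awareness", "Usability"]


def theme_frequencies(codes):
    """Count how often each theme was assigned (zero if never used)."""
    counts = Counter(codes)
    return {theme: counts.get(theme, 0) for theme in THEMES}


def percent_agreement(a, b):
    """Share of segments to which both coders assigned the same theme."""
    if len(a) != len(b) or not a:
        raise ValueError("both coders must label the same non-empty set of segments")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement between two coders, corrected for chance agreement."""
    n = len(a)
    observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability that both coders pick the same theme
    # if each labels independently with their own observed frequencies.
    expected = sum(count_a[t] * count_b[t] for t in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    print(theme_frequencies(coder_a))
    print(round(percent_agreement(coder_a, coder_b), 3))
    print(round(cohens_kappa(coder_a, coder_b), 3))
```

In this invented example the coders disagree on one of six segments. Segments with differing labels would be the ones to discuss when settling the final coding structure.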
