‘Parallelized’ Data Mining (PDM) Security

Parallel data mining is currently attracting much research. The objects involved in parallel data mining include a special type of entity that can migrate from one processor to another, where it can resume or initiate its execution. In this article we consider the security issues that need to be addressed before these systems in general, and parallelized systems in particular, can be a viable solution for a broad range of commercial tools.

In this section we briefly describe some properties of these systems, and of parallelized systems in particular. This is not intended to be a complete description of these topics; we focus on issues with possible security implications.

When we speak of an ‘entity’ here, we mean an object, process, material, or data stream that exhibits some kind of independent, self-contained ‘intelligence’. An entity is often assumed to represent another entity, such as the integrated output of a classified cluster, or some other organization or environment on whose behalf it is acting. No single universal definition of an entity exists, but there are certain widely agreed characteristics of entities: fluctuating ambiance (a changing environment), autonomy, and elasticity.

Fluctuating Ambiance means that the entity receives input from its environment and that it can perform actions which change the environment in some way.

Autonomy means that an entity is able to act without the direct intervention of other entities (or other objects), and that it has control over its own actions and internal state.

Elasticity can be defined to include the following properties:

Responsive: an entity’s ability to perceive its environment and respond in a timely fashion to changes that occur in it;

Pro-active: entities are able to exhibit opportunistic, goal-driven behavior and to take the initiative where appropriate;

Social: entities should be able to interact, when appropriate, with other entities and with humans in order to solve their own problems (such as distributing instructions to various data sections, or assigning instructions to processors according to certain considerations) and to help other entities with their activities.

A number of other attributes are sometimes discussed in the context of ‘Augur’. These include but are not limited to:

Rationality: The assumption that an entity will not act in a manner that prevents it from accomplishing its goals and will always attempt to fulfill those goals.

Candor: The concept that an entity will not ‘knowingly’ communicate false information.

Cordiality: An entity cannot have conflicting goals that either force it to transmit false information or to effect actions that cause its goals to be unfulfilled or impeded.

Mobility: The ability of an entity to move across networks and between different hosts to fulfill its goals.

Platforms, or the underlying infrastructure, provide entities with environments in which they can execute. A platform typically also provides additional services, such as communication facilities, to the entities it is running. For entities to form a useful parallel system in which they can communicate and cooperate, certain functionality needs to be provided to them. This includes functionality to find other entities or to find particular services. It can be implemented as services offered by other processes, or as services more tightly integrated with the infrastructure itself. Examples of such services include facilitators, mediators, and matchmakers.
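To make the idea of a matchmaker a little more tangible, here is a minimal sketch of a directory in which entities register the services they offer and look up who offers a given service. The class name Matchmaker, its methods, and the entity and service names are all our own assumptions for illustration; they do not come from any particular platform.

```python
from collections import defaultdict


class Matchmaker:
    """Hypothetical directory mapping a service name to the entities offering it."""

    def __init__(self):
        self._providers = defaultdict(set)

    def register(self, entity_id, service):
        """Record that entity_id offers the named service."""
        self._providers[service].add(entity_id)

    def unregister(self, entity_id, service):
        """Remove entity_id from the providers of the named service."""
        self._providers[service].discard(entity_id)

    def find(self, service):
        """Return the sorted ids of entities currently offering the service."""
        return sorted(self._providers.get(service, ()))


# Illustrative entity and service names only.
matchmaker = Matchmaker()
matchmaker.register("entity-a", "clustering")
matchmaker.register("entity-b", "clustering")
matchmaker.register("entity-b", "classification")
print(matchmaker.find("clustering"))
```

Note that this sketch trusts every registration. From a security point of view, a real directory would have to authenticate who registers what, otherwise a malicious entity could advertise services it does not provide.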

Security Issues with Parallel Data Mining

In this section we discuss security issues based on the characteristics described above:

1) Entity Execution: Naturally, entities need to execute somewhere. A host, and the immediate environment of an entity, is ultimately accountable for the correct execution and protection of the entity. This leads directly to the question of where access control decisions should be made and enforced. Does the entity contain all the logic and information required to decide whether an incoming request is authentic (originating from its claimant) and, if so, whether it is authorized (has the right to access the requested information or service)? Or can the entity rely on the platform for access control services? The environment might also need protection from the entities that it hosts. An entity should, for example, be prevented from launching a denial-of-service attack by consuming all resources on a processor, thus preventing the host from doing other work (such as executing other scheduled entities).
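The two questions above (is the request authentic, and is it authorized?) can be sketched as a two-step check. This is only a minimal illustration, assuming the platform holds a shared secret key per entity and a simple access-control list; the names KEYS, ACL, sign, and check_request, as well as the entity ids and actions, are hypothetical.

```python
import hashlib
import hmac

# Hypothetical shared keys and access-control list held by the platform.
KEYS = {"entity-a": b"key-a", "entity-b": b"key-b"}
ACL = {
    "entity-a": {"read_partition"},
    "entity-b": {"read_partition", "write_model"},
}


def sign(sender, action, key):
    """Compute the tag a sender attaches to its request."""
    message = f"{sender}|{action}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def check_request(sender, action, tag):
    """Return 'ok', 'unauthenticated', or 'unauthorized' for a request."""
    key = KEYS.get(sender)
    # Step 1: authentication. Does the request originate from its claimant?
    if key is None or not hmac.compare_digest(sign(sender, action, key), tag):
        return "unauthenticated"
    # Step 2: authorization. Does the claimant have the right to do this?
    if action not in ACL.get(sender, set()):
        return "unauthorized"
    return "ok"
```

Whether a function like check_request lives inside the entity or in the platform is exactly the design question raised above. The sketch also ignores replayed requests and the resource-consumption problem, which would need separate measures.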

2) Fluctuating Ambiance: What the term ‘environment’ denotes depends entirely on the application and appears to be fairly arbitrary in the literature on entities; it can, for example, be the Internet or the host on which the entity is executing. An entity is assumed to be ‘conscious’ of certain states or events in its environment. Depending on the nature and origin of this information, its authenticity and availability need to be considered. If an entity’s environment is limited to the processor on which it is executing, no specific security measures might be necessary (assuming the host environment is difficult to spoof, given the time and effort that would take). The situation is, however, likely to be totally different if the entity receives environment information from, or via, the Internet.

3) Autonomy: This property, when combined with other features given to entities, can introduce serious security concerns. If an entity is, for example, given the authority to carry out an objective, it should not be possible for another party to force the entity into committing to something it would not normally commit to. Neither should an entity be able to make commitments it cannot fulfill. Hence, issues around delegation need to be considered for entities and their instructions. The autonomy property does not necessarily introduce any new security concerns; it is held by many existing systems. It is worth mentioning that worms and viruses also hold this property, which enables them to spread efficiently without requiring any (intentional or unintentional) interaction from other objects. The lesson is that powerful features can also be repurposed for malicious ends if they are not properly controlled.

4) Communication Botheration: Of the ‘Elasticity’ properties, social behavior is certainly interesting from a security point of view. It means that entities can communicate with other entities. Just as an entity’s communication with its environment needs to be protected, so does its communication with other entities. The following security properties should be provided:

Confidentiality: Assurance that communicated information is not accessible to unauthorized parties;

Authentication of origin: Assurance that communication originates from its claimant;

Availability: Assurance that communication reaches its intended recipient in a timely fashion (‘Secure Negotiation’ protocols play a huge role here);

Non-repudiation: Assurance that the originating entity can be held responsible for its communications.

It’s a fact that “security usually comes at a cost”. Most solutions that provide the security functionality mentioned above require additional computing and communication resources. Therefore, security needs to be dynamic. In some cases it makes sense to protect all communication within a system to the same level, as the actual negotiation of security mechanisms may then be avoided. However, in large-scale parallelized data mining systems, security services and mechanisms need to be adjusted to the purpose and nature of the communications of various applications with varying security requirements. Some implementations assume that security can be provided transparently by a lower layer, that is, by adding it to the data sections as they are distributed to the various problems. This approach might be sufficient in closed, or more precisely localized, systems where the entities can trust each other and the sole concern is external malicious parties.
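As a rough illustration of what ‘dynamic’ security could mean, the sketch below picks the cheapest protection level that both communicating parties support and that still meets the requirement of the application. The level names in LEVELS and the function negotiate are assumptions made for this example; they are not taken from any real protocol.

```python
# Hypothetical protection levels, ordered from cheapest to most expensive.
LEVELS = ["none", "integrity", "integrity+confidentiality"]


def negotiate(required, sender_supports, receiver_supports):
    """Pick the cheapest level both sides support that meets the requirement.

    Returns None when no common level is strong enough.
    """
    minimum = LEVELS.index(required)
    for level in LEVELS[minimum:]:
        if level in sender_supports and level in receiver_supports:
            return level
    return None


# Example: the receiver cannot do plain integrity protection, so the
# parties have to settle on the more expensive level.
chosen = negotiate(
    "integrity",
    {"none", "integrity", "integrity+confidentiality"},
    {"none", "integrity+confidentiality"},
)
print(chosen)
```

A system that protects everything to the same level simply skips this step, which is the trade-off described above: less negotiation, but more cost for traffic that did not need the protection.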

5) Maneuverability: The use of mobile entities raises a number of security concerns. Entities need protection from other entities and from the hosts on which they execute. Similarly, hosts need to be protected from entities and from other parties that can communicate with the platform (for example, tools that become co-mingled with processes through various forms of injection and other vulnerabilities). The problems associated with protecting hosts from malicious code are well understood. The problem posed by malicious hosts to entities and the environment seems more complex to solve. Since an entity is under the control of the executing host, the host can in principle do anything to the entity and its code.

The attacks that a malicious host can mount can be summarized as follows.

Observation of code, data and flow control

Manipulation of code, data and flow control, including manipulating the route of an entity

Incorrect execution of code

Denial of execution, either of part of an entity or of the whole

Masquerading as a different host

Eavesdropping on and manipulating other entities’ communications
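Some of the attacks listed above can at least be detected after the fact. As a minimal sketch, assume the owner of an entity keeps a secret key that no host ever sees, and computes a tag over the entity’s code and planned route before sending it out. When the entity returns, the owner recomputes the tag to see whether code or route were manipulated. The function names seal and is_untampered, the key, and the sample data are hypothetical.

```python
import hashlib
import hmac
import json


def seal(owner_key, code, route):
    """Return a keyed tag over the entity's code and planned route."""
    payload = json.dumps({"code": code, "route": route}, sort_keys=True).encode()
    return hmac.new(owner_key, payload, hashlib.sha256).hexdigest()


def is_untampered(owner_key, code, route, tag):
    """Check the returned code and route against the tag kept by the owner."""
    return hmac.compare_digest(seal(owner_key, code, route), tag)


# Illustrative values only.
owner_key = b"owner-secret"
code = "count_items(partition)"
route = ["host-1", "host-2", "host-3"]
tag = seal(owner_key, code, route)

# A malicious host swaps the order of the remaining hops.
print(is_untampered(owner_key, code, ["host-1", "host-3", "host-2"], tag))
```

This only covers manipulation of the static parts of an entity, and only detects it once the entity is back with its owner. It does nothing against observation, incorrect execution, or denial of execution, which is part of why the malicious-host problem is the harder one.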

6) Rationality, Candor, and Cordiality: The meaning (from a security point of view) of these properties seems to be: “Entities are well behaved and will never act in a malicious manner.” If we make this a bona fide requirement, then the redundancy required for such a system is likely to make the system useless. Assurance that only information from trusted sources is acted upon and that entities (or their initiators) can be held responsible for their actions, together with monitoring and logging of entity behavior, are mechanisms that can help in designing a system where the impact of malicious entities is minimized.
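The two mechanisms mentioned above, acting only on information from trusted sources and logging behavior, can be sketched together. The example below assumes a fixed allow-list of sources and an append-only log in which every record includes the digest of the previous one, so that later edits to a record become detectable. The names TRUSTED_SOURCES, AuditLog, and handle are our own and purely illustrative.

```python
import hashlib
import json

TRUSTED_SOURCES = {"host-1", "host-2"}  # hypothetical allow-list


def _digest(body):
    """Hash a record body in a stable (sorted-key) JSON encoding."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """Append-only log in which each record commits to the previous one."""

    def __init__(self):
        self.records = []

    def append(self, source, message, accepted):
        previous = self.records[-1]["digest"] if self.records else ""
        body = {"source": source, "message": message,
                "accepted": accepted, "previous": previous}
        self.records.append({**body, "digest": _digest(body)})

    def verify(self):
        """Return True if no record was altered or removed from the chain."""
        previous = ""
        for record in self.records:
            body = {k: v for k, v in record.items() if k != "digest"}
            if record["previous"] != previous or record["digest"] != _digest(body):
                return False
            previous = record["digest"]
        return True


def handle(log, source, message):
    """Log every message, but report acceptance only for trusted sources."""
    accepted = source in TRUSTED_SOURCES
    log.append(source, message, accepted)
    return accepted
```

Because the chain is not keyed, someone who can rewrite the whole log could also recompute every digest; the sketch shows the idea of accountability through logging rather than a hardened design.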

7) Identification and authentication: Identification is not primarily a security issue in itself; however, the means by which an entity is identified are likely to affect the way it can be authenticated. That is, if the mechanism that labels an entity is disabled or compromised, every decision that depends on that label is compromised as well. For example, an entity could simply be identified by something like a serial number, or its identity could be associated with its origin, owner, capabilities, or privileges. If identities are not permanent, security-related decisions cannot (more precisely, should not) be made on the basis of an entity’s identity. While an entity’s identity is of major importance to certain applications and services, it is not needed in others. In fact, entities are likely to be ideal for providing anonymity to their initiators, as they are independent pieces of code, possess some degree of autonomy, and do not require direct third-party interaction.

2 thoughts on “‘Parallelized’ Data Mining (PDM) Security..”

In the context of parallel DM architecture, you have mentioned some of the intense complexities which we always used to face while having a regular servicing of deployments in fortune firms.
Security does come at a cost, also with enhanced accrual accumulated complexities.
Good post refreshing glom of concerns!