Key Centre of Design Computing and Cognition
The University of Sydney, Australia
{andrew, justin, adong}@arch.usyd.edu.au

Abstract. This paper presents two novel features of an emergent data visualization method coined “cellular ants”: unsupervised data class labeling and shape negotiation. This method merges characteristics of ant-based data clustering and cellular automata to represent complex datasets in meaningful visual clusters. Cellular ants demonstrate how a decentralized multi-agent system can autonomously detect data similarity patterns in multi-dimensional datasets and then determine the corresponding visual cues, such as position, color and shape size, of the visual objects. Data objects are represented as individual ants placed within a fixed grid, which decide their visual attributes through a continuous iterative process of pair-wise localized negotiations with neighboring ants. The characteristics of this method are demonstrated by evaluating its performance for various benchmarking datasets.

1 Introduction

This paper proposes a simple approach towards unsupervised data visualization. It uses principles of self-organization to determine the visual representation of complex, high-dimensional datasets. Self-organizing systems generally consist of a number of similar elements that perform numerous internal interactions, which can spontaneously generate an inherently complex pattern on a global level. The rules that govern this process are informed by local information only, without any reference to the global pattern. The proposed method, coined cellular ants, uses self-organization to determine the visual attributes of data items, including position, shape, color and size.
By self-adapting the visual representation to data attributes, this approach goes beyond the traditional notion of using fixed and predefined data mapping rules.

The cellular ant method combines insights from ant-based clustering in the field of data mining and cellular automata in the field of artificial life with data mapping principles from the data visualization domain. It can be considered as a simple data clustering technique that is capable of creating visual representations similar to those of multidimensional scaling. As a non-optimized prototype, it demonstrates how simple behavior rules are capable of clustering complex, high-dimensional and large datasets. This work builds upon the methodology defined in [1], which introduced data scaling in a toroidal grid. In this paper, two novel features are introduced: color negotiation (similar to data labeling or data clustering) and shape negotiation.

2 Related Work

Ant-based sorting was introduced by Deneubourg et al. [2] to describe different types of emergent phenomena in nature. Ants are represented as simple agents that are capable of roaming around in a toroidal grid, on which objects, representing data items, are randomly scattered. Ant actions are biased by probabilistic functions, so that ants are more likely to pick up objects that are isolated, and more likely to drop them in the vicinity of similar ones. A predefined object distance measure variable α determines this degree of similarity between pairs of data objects. Ant-based clustering has been used for data mining purposes and has been combined with fuzzy-set theory [3], topographic maps [4], or bio-inspired genetic algorithms [5]. The cellular ant method differs from the standard method by mapping data items directly onto the ants themselves. Recent examples of this approach exist: Labroche et al. [6] associate data objects with ants and simulate meetings between them to dynamically build partitions, according to the data labels that best fit their genome.

Multi-dimensional scaling (MDS) displays the structure of distance-like datasets as geometrical pictures [7].
MDS representations are arranged in 2D space, in which the distance between pairs of data items denotes the degree of data similarity. Several similar data visualization techniques exist, for instance in combination with animation [8] or recursive pattern arrangements [9]. Multi-dimensional scaling differs from clustering in that clustering partitions data into classes, while MDS computes positions, without providing an explicit decomposition into groups. Self-Organizing Maps (SOM) are an unsupervised clustering technique capable of detecting and spatially grouping similar data objects in topologically distinct classes [10]. This visualization method orders an initially random distribution of high-dimensional data objects as the emergent outcome of an iterative training process. In this paper, we describe how the cellular ant method is capable of unsupervised clustering, as it is capable of coloring ants in classes depending on an emergent data scaling topography.

Because the cellular ant methodology governs ants by principles of stigmergy and state density, it resembles cellular automata. A cellular automaton is a computational method originally proposed by Ulam and Von Neumann [11]. It consists of a number of cells that each represent a discrete state (e.g. alive or dead). Cells are governed by behavior rules that are iteratively applied, and generally only consider the states of the neighboring cells. The cellular ant approach combines ant-based clustering and cellular automata, as the ants’ reasoning takes into account grid cell states, rather than probabilistic functions. While ants can ‘act’ upon the environment and even change it to some degree, cellular automata ‘make up’ the environment itself. A recent clustering method [12] also maps data objects onto ants that resemble cellular automata elements.
It differs from our methodology as it does not order clusters, and is based on probability functions and internal ant states.

Agent-based visualizations have typically been used to display intrinsic relations (e.g. messages, shared interests) between agents for monitoring and engineering purposes [13], to represent complex fuzzy systems [14], or to support the choice of the most effective visualization method [15]. Other systems organize the visualization data flow, for instance by determining visualization pipeline parameters [16] or regulating rendering variables in distributed environments [17]. To our knowledge, agents have not yet been used to generate visualizations based on detected data correlations. A few simple prototype applications of agent-based data visualization have been developed that are capable of representing complex data properties through an emergent, decentralized process: for instance, the infoticle (information-particle) metaphor is capable of representing time-varying data properties as recognizable motion typologies of dynamic particle or flocking patterns [18].

3 Approach

3.1 Cellular Ant Concept

Each single normalized data item (e.g. database tuple, row, object) corresponds to a single agent, coined cellular ant. Each single ant (and thus data item) is represented as a single colored square cell within a toroidal, rectangular grid. Each ant is governed by a set of simple behavior rules. These behavior rules are applied simultaneously to all ants, in a discrete and iterative way. Each ant can only communicate with ants in its immediate vicinity, limited to its eight neighboring cells. The dynamic behavior of an ant depends only on the data values it represents, and the data values of its immediate neighbors. A cellular ant is capable of determining its visual cues autonomously, as it can move around or stay put, swap its position with a neighbor, and adapt a color or shape size, by a process of pair-wise negotiation.
Each cellular ant is determined by four different negotiation processes: data scaling, position swapping, color determination and shape size adaptation. A detailed description of the data scaling and the ant swapping methodologies can be found in [1]. This paper will instead focus on the recent additions that determine the other visual cues of an agent.

At initialization, ants are randomly positioned within a grid. Similar to the classical MDS (CMDS) method, each ant calculates the Euclidean distance between its own normalized data item and that of each of its eight neighbors. This data distance measure represents an approximation of the similarity between pairs of data items, even when they contain multidimensional data values. Next, an ant will only consider and count those ants for which the pair-wise similarity distance is below a specific data similarity tolerance threshold value t. Value t is conceptually similar to the object distance measure α in common ant-based clustering approaches. However, t originates from a cellular automata approach in that it is a fixed and discrete value, which generates a Boolean result (either a pair of data objects is “similar enough” or not) instead of a continuous similarity value (e.g. representing a numerical degree of similarity between pairs of data objects). Depending on the number of ants in its neighborhood it considers as ‘similar’, an ant will then decide either to stay put, or to move. For instance, an ant decides to stay put when it has more than four similar neighbors. The value four was chosen from the experience of cellular automata simulations, which tend to generate interesting cell constellations for this number.

As a result, ants with similar data items group together emergently. However, these clusters have little visualization value as they only convey the relative amounts of data objects. Therefore, a positional swapping rule was introduced that orders clusters internally as well as globally with respect to data similarity.
As a result, diagrams are generated that look conceptually similar to those of basic CMDS approaches.

3.2 Color Negotiation

Conceptually, the color of an ant can be considered as the representation of its assumed data class or data label, so that the resulting diagrams resemble those of (ant-based) data clustering in the domain of data mining, but have the additional capability of being spatially and visually ordered. At initialization, all ants are assigned an unspecific color (white). At each iteration, ants execute the following behavior rules. Each ant that has not been swapped (and thus is probably well placed within its neighborhood) and is fully surrounded by eight similar neighbors, considers the degree of data similarity with all of its neighbors. If this degree is below a predefined, discrete color seed similarity threshold c, it will request the system to be assigned a unique color. As a result, such ants will act as initial ‘color seeds’. All other ants will consider whether their neighborhood contains four or more data objects whose data distance is smaller than t but larger than c. If so, such an ant is ‘satisfied’ with its current position and will adopt the color of the most similar ant in its neighborhood.

In practice, once colors are introduced within the grid, they will spread gradually over the ant population. Once the collection of ants is sufficiently ordered, several color seeds are introduced. Because of the multitude of pair-wise interactions, any surplus of colors (with respect to data clusters) will disappear, while any shortfall of colors will reemerge once a potential seed is surrounded by eight neighbors. However, data clusters that contain fewer than nine members in a dataset cannot be recognized. In some cases, ants continuously ‘swap’ from one color to another, within a single visual cluster.
This dynamic phenomenon generally indicates that the positional rules could not accurately spatially group two different data types, which nonetheless were recognized by the color clustering rule. A future research direction could consist of inversely informing the positional clustering of the color label values.

3.3 Size Negotiation

Instead of mapping a data value to a specific shape size, each ant can map one of its data attributes onto its size by negotiating with its neighbors. Conceptually, the size of an agent does not necessarily correspond to the ‘exact’ value of that data attribute, but rather to how a data value locally relates to its neighborhood, and therefore whether clusters are homogeneous with respect to a specific data attribute. Because no direct, predefined mapping rule between value and visual cue exists, the shape size scale can automatically adapt to any data scale, in an autonomous and self-organizing way.

For each iteration step, the visual shape size of an ant is determined as follows. First, an ant A chooses a random neighboring ant B with whom it compares its one-dimensional data value D_A and circular radius size S_A, measured in screen pixels. Step size P is a predefined number of pixels. Ant A evaluates whether its radius versus data value ratio is similar to that of ant B, and adapts its own as well as its neighbor’s shape size accordingly. If, in comparison to ant B, its size S_A is too large in relation to its data value D_A, it will decide to ‘shrink’ by decreasing its amount of available pixels by P pixels, and then provides these P pixels to ant B.

\[
\begin{cases}
\dfrac{S_A}{D_A} > \dfrac{S_B}{D_B} \;\Rightarrow\; \begin{cases} S_A = S_A - P \\ S_B = S_B + P \end{cases} \\[2ex]
\dfrac{S_A}{D_A} < \dfrac{S_B}{D_B} \;\Rightarrow\; \begin{cases} S_A = S_A + P \\ S_B = S_B - P \end{cases}
\end{cases}
\qquad
\begin{cases}
S_A > S_{\max} \;\Rightarrow\; \begin{cases} S_A = S_A - P \\ S_B = S_B - P \end{cases} \\[2ex]
S_A < S_{\min} \;\Rightarrow\; \begin{cases} S_A = S_A + P \\ S_B = S_B + P \end{cases}
\end{cases}
\tag{1}
\]
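The pair-wise exchange in Equation (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `negotiate_size`, the use of cross-multiplication instead of division, and the choice to apply the boundary rule directly after the exchange rule are all assumptions made for this sketch.

```python
def negotiate_size(s_a, d_a, s_b, d_b, step, s_min, s_max):
    """Apply one negotiation between ant A and a neighbor B; return (s_a, s_b).

    s_a, s_b: current radii in pixels; d_a, d_b: the one-dimensional data values.
    step is the step size P; s_min and s_max are the size boundaries.
    Assumes non-negative (normalized) data values.
    """
    # Compare S_A / D_A with S_B / D_B by cross-multiplying, which avoids a
    # division by zero when a data value is 0 (an assumption of this sketch).
    lhs = s_a * d_b
    rhs = s_b * d_a
    if lhs > rhs:
        # A is too large for its data value: hand P pixels over to B.
        s_a, s_b = s_a - step, s_b + step
    elif lhs < rhs:
        # A is too small for its data value: take P pixels from B.
        s_a, s_b = s_a + step, s_b - step

    # Boundary rule: an ant outside the size limits adjusts itself and
    # its neighbor in the same direction.
    if s_a > s_max:
        s_a, s_b = s_a - step, s_b - step
    elif s_a < s_min:
        s_a, s_b = s_a + step, s_b + step
    return s_a, s_b


# A and B start at the same radius, but B carries the larger data value,
# so A gives one pixel to B and the summed size stays constant.
print(negotiate_size(10, 0.2, 10, 0.8, step=1, s_min=2, s_max=20))
```

Note that the exchange rule alone conserves the total size of the pair, whereas the boundary rule changes it; this is how the boundary rule can shift the overall size scale.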

These rules ensure that no visual overlapping of ant shapes can occur. An additional rule checks that ants do not grow too large or too small: when an ant becomes too large, it will ‘punish’ and shrink its neighbor, so that in the future, this ‘action’ will no longer be required. This constraint will emergently ‘detect’ the upper and lower shape size boundaries according to the data scale, and spreads throughout all ants. Because all ants comply with these rules in random directions and over multiple iterations, a stable constellation of shape sizes appears in an emergent way.

3.6 Performance Measurements

A simple performance graph informs users of the actual visualization state. The number of similar ants in each ant neighborhood is squared for each ant, and summed over all ants. The visualization efficiency over time corresponds to the slope of the corresponding graph: once a plateau value has been reached over a number of iterations, the visualization has reached a stable state and can be halted. Figure 1 captures the clustering performance for the ‘Thyroid’ dataset as a function of the similarity tolerance threshold t (vertical) and the color seed similarity threshold c, for different numbers of iterations and different initial seeds. These ‘solution space’ diagrams enable users to pick the most appropriate variable values. The diagrams illustrate the ‘hotspots’ of effective clustering values, and the limited influence of the number of iterations and the random initialization seeds on the quality of the results. Each initialization seed will result in different constellations and thus clustering error rates (see Table 1 for standard deviation).
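The performance measure described above, together with the similar-neighbor count from Section 3.1 on which it relies, can be sketched as follows. The grid representation (a list of rows holding a data tuple or `None` for an empty cell), the function names, and the plateau test `has_plateaued` with its `window` parameter are assumptions of this sketch; the paper does not prescribe them.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two normalized data items (tuples)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similar_neighbors(grid, row, col, t):
    """Count occupied cells among the eight toroidal neighbors of an occupied
    cell whose data distance is below the tolerance threshold t.
    The grid should be at least 3x3 so that the eight neighbors are distinct.
    """
    rows, cols = len(grid), len(grid[0])
    me = grid[row][col]
    count = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            # The modulo wraps coordinates around the edges (toroidal grid).
            other = grid[(row + dr) % rows][(col + dc) % cols]
            if other is not None and euclidean(me, other) < t:
                count += 1
    return count


def performance(grid, t):
    """Sum over all ants of the squared number of similar neighbors."""
    return sum(
        similar_neighbors(grid, r, c, t) ** 2
        for r, line in enumerate(grid)
        for c, cell in enumerate(line)
        if cell is not None
    )


def has_plateaued(history, window):
    """Hypothetical stop test: the last `window` performance values are equal."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) == min(recent)


# One row of dissimilar ants in an otherwise homogeneous 3x3 grid.
mixed = [[(0.0, 0.0)] * 3, [(1.0, 1.0)] * 3, [(1.0, 1.0)] * 3]
print(similar_neighbors(mixed, 1, 1, t=0.1))  # more than four means 'stay put'
print(performance(mixed, t=0.1))
```

Squaring the neighbor count rewards compact clusters more than scattered pairs, so the value rises as the grid becomes ordered and flattens once the ants stop moving.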

4 Application

4.1 Case Studies

The synthetic dataset visualized in Figure 2 consists of 500 data items with two data dimensions. Data objects and classes are distributed using a Gaussian distribution function to demonstrate the data scaling, color negotiation and size negotiation capabilities. The color negotiation successfully resulted in four distinct clusters or data class labels. As shown by the highlighted ants, the clusters are internally ordered: data items that are similar in data space are positioned near each other in visualization space. Also, the clusters are globally ordered: clusters that are dissimilar in data space have no ‘common’ borders in visualization space. For instance, the purple and yellow clusters (or blue and green) have no common orthogonally directed borders and have empty cells between their borders in visualization space, as they are diagonally positioned in data space, and thus have a larger ‘global’ data distance.
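A comparable test setup can be sketched as below: a two-dimensional Gaussian dataset with four classes, normalized and scattered at random over a grid. The class centers, the standard deviation `sigma`, min-max normalization, the square grid and the helper names are all assumptions made for illustration; the paper only states that the data is Gaussian distributed, that items are normalized, and that the ant density is kept at about 75%.

```python
import math
import random


def make_gaussian_dataset(n_items, centers, sigma, rng):
    """Draw n_items 2D points, cycling through the given class centers."""
    data, labels = [], []
    for i in range(n_items):
        k = i % len(centers)
        cx, cy = centers[k]
        data.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
        labels.append(k)
    return data, labels


def normalize(data):
    """Min-max scale every dimension to the range [0, 1]."""
    dims = list(zip(*data))
    lows = [min(d) for d in dims]
    highs = [max(d) for d in dims]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(item, lows, highs))
        for item in data
    ]


def scatter_on_grid(data, density, rng):
    """Place each item in a random free cell of a square grid whose size
    gives roughly the requested ant density (items / cells)."""
    side = math.ceil(math.sqrt(len(data) / density))
    cells = [(r, c) for r in range(side) for c in range(side)]
    rng.shuffle(cells)
    grid = [[None] * side for _ in range(side)]
    for item, (r, c) in zip(data, cells):
        grid[r][c] = item
    return grid


rng = random.Random(0)  # fixed seed so that a run can be repeated
centers = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]  # hypothetical
data, labels = make_gaussian_dataset(500, centers, sigma=0.1, rng=rng)
grid = scatter_on_grid(normalize(data), density=0.75, rng=rng)
print(len(grid), "x", len(grid[0]), "grid")
```

Using a different seed for each run corresponds to the different random initialization seeds over which the reported results are averaged.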

Fig. 2. Visualization with color and shape negotiation, size representing the 1st data attribute (left), and data scatterplot (right), on which the 1st data attribute is mapped on the X-axis. Corresponding ants (left) and data items (right) are highlighted in red, cyan and white.

The display of different circular shape sizes enables the user to understand how a single data attribute is distributed over the clustering representation. For instance, the three largest ants (highlighted in red) are positioned within an outlying green sub-cluster (see Fig. 2). The ants highlighted in cyan and white show that the smallest ants in shape size correspond to those ants with the smallest data value for that attribute.

Figure 3 shows two different clustering techniques applied to the car dataset, containing 38 items and 7 data dimensions, as taken from [19]. On the left, the multidimensional scaling technique positioned the cars in three apparent clusters (the color coding was artificially added by the authors for visual clarification). The cellular ant method, on the right, positioned the cars in a single visual cluster, but recognized 3 separate class labels that roughly correspond to those apparent clusters.

Fig. 3. The car dataset represented by MDS (left, based on [19], color coding artificially added by the authors) and the Cellular Ants representation with color negotiation (right).

Figure 4 illustrates how shape size negotiation is used to clarify data dependencies for high-dimensional datasets, without prior knowledge of the data scale and without using any predefined data mapping rules. Figure 4 uses the same representation as Figure 3, and maps a single data attribute to the decentralized shape size negotiation. As a result, one can investigate how the clusters are internally ordered for different attributes. Here, it shows the relative dominance of the cylinder count and MPG within specific clusters, and some cars visually stand out within the formed clusters.

Fig. 4. A Cellular Ant representation in a toroidal grid using color and shape negotiation. Data attribute represented by the shape size: cylinders (left) and Miles per Gallon (MPG) (right).

The cellular ant method has been evaluated with typical benchmarking data, such as the IRIS dataset. The iteration timeline in Figure 5 (left) shows how several colors were introduced, but only three remained. In effect, the IRIS dataset is clustered in two distinct visual clusters, but the color negotiation recognizes that three different data classes exist, of which two are very similar. This interplay between visual and spatial clustering has a high visualization value. The figure shows a momentary snapshot only: during the simulation the orange and yellow colors take over ants from each other. Using shape negotiation, one can investigate how a data attribute is relatively distributed over a cluster. As shown in Figure 5 (right), subclusters of high or low data values are made apparent, demonstrating the ordering power of the swapping rule. For instance, one can perceive that for attribute 4, the yellow type (Virginica) has larger data values than the orange one (Versicolor), and that this attribute is very volatile for the red type (Setosa) (varying between values 0.1 and 0.3) when considering their relative numerical proportion to one another within the cluster.

Fig. 5. Clustered IRIS dataset (150 data items, 4 attributes, 3 clusters, 1821 iterations) in a toroidal grid. Left: iteration timeline. Middle: resulting spatial clustering with color negotiation. Right: same result, with shape size negotiation for attribute 4.

Table 1 lists the performance of the color negotiation (or data classification) for various standard benchmarking datasets, after executing the cellular ant algorithm over 50 runs, each with a different, random initialization seed. The clustering error rate is calculated by counting the ants with correct colors over the whole population, and dividing this sum by the total number of ants. In general, these results are worse than, but relatively similar to, those of comparable clustering methods, such as those reported in [6].

Dataset   #Objects  #Attributes  #Clusters  #Clusters, average [std]  Clustering error, average [std]
Iris      150       4            3          2.68 [0.65]               0.37 [0.11]
Pima      768       8            2          1.14 [0.35]               0.36 [0.04]
Thyroid   215       5            3          3.45 [0.80]               0.41 [0.14]

Table 1. Performance measurements of the color negotiation method for different benchmarking datasets. Averages are taken over 50 runs, each with a different random seed.

5 Discussion

The performance of the current implementation depends on two variables: the data similarity tolerance threshold t and the color seed similarity threshold c. The ant density (the dataset size divided by the number of available cells in the grid) has been kept constant at about 75%. Similarly to the object distance measure α in common ant-based clustering approaches, the optimal values of t and c cannot be determined without prior knowledge of the dataset, unless the value is adaptable [20].

We consider the current implementation as a simple proof-of-concept prototype, and kept its implementation as simple as possible. Therefore, the scaling and clustering performance of the cellular ant method is not as effective as that of existing MDS methods. Its first aim is not to compete with alternative approaches, but rather to be considered as an early prototype towards more powerful cellular automata clustering algorithms, or towards data visualizations that are emergent and self-adaptive. As shown in the diagrams, the combination of spatial clustering with data class clustering can result in visual representations that are meaningful and useful. The following aspects can also be considered.
• Performance. In its current simple form of implementation, the number of required iterations seems to be similar to that of comparable approaches in the field of ant-based data mining. However, the ‘data-to-ant’ model always requires fewer iteration steps because all data objects are able to move to increasingly ideal positions simultaneously.
Similarly to existing ant-based data mining optimizations, the clustering performance could be improved by increasing the data similarity cap value over time, so that clusters grow more rapidly and steadily. The method requires a considerable number of calculations, as each ant is required to calculate many pair-wise dependencies for each iteration step.
• Clustering Quality. Grid density influences the clustering quality in two ways. Low grid densities do not ensure that ants with equal colors (data labels) will be represented in a single spatial cluster, because two or more clusters might emerge without ever ‘touching’ and ‘merging’. Grids that are too dense generate single, large groups with diverse labels and thus little visualization value.
• Simplicity. The current behavior rules have been kept as simple as possible, to demonstrate the potential value of the cellular automata-like decentralized negotiation for data mining and data visualization purposes. Further calculation optimization or solutions towards data size scalability can be accomplished by considering a combination of the following three approaches: 1) real-time data optimization, including data approximation and gradual data streaming, 2) agent adaptation, which includes the distribution or balancing of loads between multiple agents, and 3) agent cooperation, by generating adaptive coalition formations of ‘super agents’ that have similar objectives, experience or goals.

6 Conclusion

This paper presented two new features of the cellular ant method: color (data clustering) and shape size negotiation. It combines ant-based data mining algorithms with cellular automata insights, or data scaling with data clustering, to derive an approach that is capable of representing multidimensional datasets.
The resulting diagrams are visually similar to those of ant-based data mining clustering approaches. However, the clusters are also similar to multi-dimensional scaling images, as they are ordered internally as well as globally over multiple data dimensions. As a simple prototype towards self-organizing visualization, inter-agent negotiations determine typical visual cues, such as position, color and size, depending on multidimensional data properties. Color negotiation can recognize data clusters of similar type. Shape size negotiation displays the relative distribution of a single data attribute and the internal structure of clusters. The self-adaptive, unsupervised data mapping process of the cellular ants proposes a conceptual alternative to the common fixed data mapping rules that are based on preconceived dataset assumptions.

Some of the limitations of the method are caused by the simplicity of the rule-based approach, and its dependency on fixed, discrete cellular automata characteristics, instead of more continuous probability functions. Several optimizations can be accomplished, for instance by altering the data similarity tolerance threshold over time, or by informing agents of the global effectiveness. As a simple prototype, it demonstrates a potential future in which data visualization agents are capable of autonomously detecting complex data patterns and proactively acting upon them to make underlying data phenomena more visually apparent and the perceptual and cognitive understanding by humans more effective.

References

1. Vande Moere, A., Clayden, J.J.: Cellular Ants: Combining Ant-Based Clustering with Cellular Automata. International Conference on Tools with Artificial Intelligence (ICTAI'05). IEEE (2005) 177-184
2. Deneubourg, J., Goss, S., Franks, N., Sendova-Franks, A., Detrain, C., Chrétien, L.: The Dynamics of Collective Sorting: Robot-Like Ants and Ant-Like Robots. From Animals to Animats: 1st International Conference on Simulation of Adaptive Behavior (1990) 356-363
3. Schockaert, S., De Cock, M., Cornelis, C., Kerre, E.E.: Fuzzy Ant Based Clustering. Lecture Notes in Computer Science 3172 (2004) 342-349
4. Handl, J., Knowles, J., Dorigo, M.: Ant-Based Clustering and Topographic Mapping. Artificial Life 12 (2005) 35-61
5. Ramos, V., Abraham, A.: Evolving a Stigmergic Self-Organized Data-Mining. International Conference on Intelligent Systems, Design and Applications, Budapest (2004) 725-730
6. Labroche, N., Monmarché, N., Venturini, G.: A New Clustering Algorithm Based on the Chemical Recognition System of Ants. European Conference on Artificial Intelligence, Lyon, France (2003) 345-349
7. Torgerson, W.S.: Multidimensional Scaling. Psychometrika 17 (1952) 401-419
8. Bentley, C.L., Ward, M.O.: Animating Multidimensional Scaling to Visualize N-Dimensional Data Sets. Symposium on Information Visualization. IEEE (1996) 72-73
9. Ankerst, M., Berchtold, S., Keim, D.A.: Similarity Clustering of Dimensions for an Enhanced Visualization of Multidimensional Data. Symposium on Information Visualization (Infovis'98). IEEE (1998) 52-60
10. Kohonen, T.: The Self-Organizing Map. Proceedings of the IEEE 78 (1990) 1464-1480
11. Von Neumann, J.: Theory of Self-Reproducing Automata. University of Illinois Press, Illinois (1966)
12. Chen, L., Xu, X., Chen, Y., He, P.: A Novel Ant Clustering Algorithm Based on Cellular Automata. Conference of the Intelligent Agent Technology (IAT'04). IEEE (2004) 148-154
13. Schroeder, M., Noy, P.: Multi-Agent Visualisation based on Multivariate Data. International Conference on Autonomous Agents. ACM Press, Montreal, Quebec, Canada (2001) 85-91
14. Pham, B., Brown, R.: Multi-Agent Approach for Visualisation of Fuzzy Systems. Lecture Notes in Computer Science 2659 (2003) 995-1004
15. Healey, C.G., Amant, R.S., Chang, J.: Assisted Visualization of E-Commerce Auction Agents. Graphics Interface 2001. Canadian Information Processing, Ottawa (2001) 201-208
16. Ebert, A., Bender, M., Barthel, H., Divivier, A.: Tuning a Component-based Visualization System Architecture by Agents. International Symposium on Smart Graphics. Hawthorne, IBM T.J. Watson Research Center (2001)
17. Roard, N., Jones, M.W.: Agent Based Visualization and Strategies. Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG), Pilzen (2006)
18. Vande Moere, A.: Time-Varying Data Visualization using Information Flocking Boids. Symposium on Information Visualization (Infovis'04). IEEE, Austin, USA (2004) 97-104
19. Wojciech, B.: Multivariate Visualization Techniques. Vol. 2006 (2001)
20. Handl, J., Meyer, B.: Improved Ant-based Clustering and Sorting in a Document Retrieval Interface. International Conference on Parallel Problem Solving from Nature (PPSN VII) LNCS 2439 (2002) 913-923