V. R. Kanagavalli a,*, Dr. K. Raja b,1

Abstract - Information is presented, transferred and shared using natural language, even by naive users. The biggest challenge, and an active research area, has been to enable machines to understand and decipher what is communicated to them in natural language. Free-form text documents are found aplenty in this information era. Despite the availability of other mechanisms, people have always found it easier to express their ideas in an unstructured manner than in a structured one. Knowledge acquisition is also necessarily from text corpora, which may contain scientific concepts, historical happenings, literature reviews, product reviews, tourist reports, environmental impact reports, etc. Event reporting in case of disasters or special occasions is also generally done in free-form text rather than by structured methods, since it allows more detailed descriptions to be added. Many of these text documents, which act as information sources, and the queries posed by users implicitly contain a geographic or spatial reference component. This is consistent with previous studies reporting that more than 80% of searches pertain to geographic locations. Text documents imply the use of natural language, which gives rise to explicit vague (fuzzy) descriptions involving linguistic terms such as "near to", "far from", "to the east of" and "very close", as well as implicit vague spatial references. Fuzzy logic is an extension of crisp Boolean logic that accommodates the fuzziness of an element belonging to a set. This paper studies the feasibility of fuzzy logic techniques in resolving spatial uncertainty in text.

Index Terms - Spatial Uncertainty, Fuzzy Logic,
Possibility Theory, Granulation.

I. INTRODUCTION

Information stored in multiple forms and sources has to be processed in order to extract the knowledge it contains. Documents are typically searched to find basic concepts, working methodology, and the availability and location of a product, a natural entity or an event.

Information may be searched from a variety of sources such as image databases, web servers, multimedia servers, spreadsheets, geo/spatial databases, blogs, discussion forums, social websites and text documents, in both structured and unstructured formats. Although there are many types of information, the growing need to share information has made most of it either text oriented or multimedia oriented, since people record their experiences as free-flowing text or as audio/video files. The assimilation of information from multimedia formats is less ambiguous, whereas knowledge extraction from text has ambiguity, vagueness and uncertainty as inherent features. Apart from customer reviews, tourist feedback, and personal and official blog sites, there is a repository of information in text corpora in the form of journals, encyclopedia articles, books, technical reports, environmental impact reports, laws and legislation. Much of this information has a spatial component. For instance, the occurrence of an event, the availability of a natural entity, the management and allocation of resources, and the analysis of historical documents are always tied to a spatial location. Understanding the spatial references in text is applicable in domains such as disaster management, tourism, archaeology, environmental preservation, and laws and legislation [1][2][6]. The handling of spatial terms becomes complex if a document contains fuzzy and ambiguous expressions such as "in front of", "at the back side of", "to the left of", "to the right of", "close to the east of Estancia", "quite near the bus stop", "far away from the residential area", "in and around Chennai", "behind the Nehru Park", "just before", "a few steps away from", "a stone's throw away" and "downtown". Information retrieval deals with the problem of finding relevant documents in a collection, whereas information extraction identifies relevant text in a document.
In the proposed work we deal with the inherent uncertainty arising in natural language descriptions of events or spatial entities. We focus on information retrieval and information extraction from text documents, and on resolving spatial uncertainty using fuzzy logic techniques. Our work goes beyond information retrieval and enters the periphery of fuzzy knowledge management, since the results of the queries are analyzed, organized and reduced to a form that is easy to understand.

II. FUZZY LOGIC

Fuzzy theory, proposed by Zadeh in 1965 [16], is a natural way to deal with the vagueness and uncertainty arising from human linguistic labels. The concept of a fuzzy set extends the notion of a regular crisp set and expresses classes with ill-defined boundaries such as "young", "good" and "important". Within this framework, there is a gradual rather than sharp transition between non-membership and
full membership. A degree of membership in the interval [0, 1] is associated with every element in the universal set X. Such a membership-assigning function ($A : X \to [0, 1]$) is called a membership function, and the set A defined by it is called a fuzzy set [4]. If X is a collection of objects denoted by x, then a fuzzy set A in X is defined as a set of ordered pairs:

$$A = \{(x, A(x)) \mid x \in X\} \quad (1)$$
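As an illustration of equation (1), the following minimal Python sketch builds a fuzzy set for the linguistic term "near" over a few distances. The linear shape of the membership function and the breakpoints (1 km for full membership, 5 km for non-membership) are assumptions chosen for the example, not values taken from this paper.

```python
def near_membership(distance_km: float, full: float = 1.0, zero: float = 5.0) -> float:
    """Degree in [0, 1] to which a distance counts as 'near'.

    `full` and `zero` are hypothetical breakpoints: distances up to `full`
    are fully 'near', distances from `zero` onward are not 'near' at all,
    and membership falls linearly in between.
    """
    if distance_km <= full:
        return 1.0
    if distance_km >= zero:
        return 0.0
    return (zero - distance_km) / (zero - full)


# A fuzzy set as a list of ordered pairs (x, A(x)), mirroring equation (1).
distances = [0.5, 2.0, 3.0, 6.0]
near_set = [(x, near_membership(x)) for x in distances]
print(near_set)
```

A crisp set is the special case in which every membership value is either 0 or 1.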

III. UNCERTAINTY

Uncertainty is a partial belief that arises whenever we lack certainty about the existence of an object or the occurrence of an event. Degrees of uncertainty are clearly a higher-level notion than degrees of truth. When the state of facts is complete, computing the degrees of belief reduces to computing degrees of truth [21]. Uncertainty in a situation can arise from vagueness (or fuzziness) and from ambiguity. Vagueness results from a lack of clarity about the class to which an element belongs. In other words, the membership of the element in a class is crisp but the set itself is fuzzy (crisp membership in a fuzzy set). Ambiguity arises when there is a fuzzy membership in a crisp set [23]. The authors of [26] have proposed a probability measure for fuzzy sets to resolve the uncertainty, or lack of knowledge, on the part of the observer. Their argument is based on Laplace's Genie, which is not uncertain about the membership of an element in a set: all the sets are crisp, and only the human lacks knowledge about the membership of the element. They argue that the probability measure of a fuzzy set A is proportional to the expected value of the product of the membership function provided by the expert and the user's probability of nature's classification of each x, an element belonging to the subsets of the fuzzy set A. Uncertainty propagates either in the collection of data (in the case of maps, satellite images and other images) or in the expression of data (in the case of text descriptions). An example of a vague spatial expression is "a large dump yard is formed in the periphery of the major city", whereas the spatial expression "a big round object is found on the NH4" carries ambiguity. The first step in resolving uncertainty is to analyze its nature, that is, whether it is objective or subjective. Objective uncertainty is inherent in nature and hence cannot be eliminated completely even after obtaining more information about it.
Subjective uncertainty arises from a lack of knowledge (ambiguity or vagueness) about the event or natural entity, and hence can be reduced by obtaining more information about it.

3.1 Uncertainty in spatial expressions

A natural language expression describing an entity (either spatial or non-spatial) is said to be a spatial expression if it carries qualitative or quantitative spatial information about the entity. Quantitative spatial expressions are easy to detect and decipher, whereas qualitative spatial information poses challenges in both detection and interpretation. Qualitative spatial information can be a unary or a binary relation, depending on the number of entities involved in the spatial expression. Unary spatial relations include absolute orientation, length and geometric shape. Binary spatial relations can carry directional, relative orientation, distance, neighborhood or topological information. The inherent uncertainty in these qualitative spatial expressions is due to the variation in users' perception of spatial entities or events, and to the dominance of qualitative spatial information in natural language descriptions. Apart from the qualitative nature of

The specification of a membership function is subjective: it differs with each individual's perception of the abstract concept, and hence is very different from the randomness associated with probability. Fuzzy memberships represent similarities of objects to imprecisely defined properties, whereas probabilities convey assessments of relative frequencies [T2]. Fuzzy linguistic variables were proposed by Zadeh on the basis of the principle of incompatibility, which states: "As the complexity of a system increases, our ability to make precise and yet significant statements about its behavior diminishes until a threshold is reached beyond which precision and significance become almost mutually exclusive characteristics." A fuzzy if-then rule assumes the form

If x is A then y is B (2)

where A and B are linguistic values defined by fuzzy sets on universes of discourse X and Y, respectively. "x is A" is called the antecedent or premise, while "y is B" is called the consequence or conclusion. Fuzzy reasoning, or approximate reasoning, is an inference procedure that derives conclusions from a set of fuzzy if-then rules and known facts. It is similar to the Modus Ponens rule applied in propositional logic. If $A$, $A'$ and $B$ are fuzzy sets of X, X and Y, respectively, and $A \to B$ is expressed as a fuzzy relation $R$ on $X \times Y$, then the fuzzy set $B'$ induced by "x is $A'$" and the fuzzy rule "if x is A then y is B" is defined by

$$B'(y) = \max_{x} \min[A'(x), R(x, y)] \quad (3)$$
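Equation (3) can be sketched on small discrete universes as follows. The fuzzy sets and their membership values are hypothetical, and building the relation R with the minimum operator (Mamdani's choice) is an assumption of this sketch; the paper does not state which implication operator is used.

```python
def implication_relation(a, b):
    """Fuzzy relation R(x, y) for the rule 'if x is A then y is B'.

    Assumption: R(x, y) = min(A(x), B(y)), i.e. Mamdani's implication.
    """
    return {(x, y): min(mu_a, mu_b) for x, mu_a in a.items() for y, mu_b in b.items()}


def induce_consequent(a_prime, relation, ys):
    """Max-min composition of equation (3): B'(y) = max_x min(A'(x), R(x, y))."""
    return {
        y: max(min(mu_x, relation[(x, y)]) for x, mu_x in a_prime.items())
        for y in ys
    }


# Hypothetical discrete fuzzy sets.
a = {"1km": 1.0, "3km": 0.5, "6km": 0.0}        # A: "near"
b = {"low": 0.2, "high": 1.0}                   # B: "relevant"
a_prime = {"1km": 0.3, "3km": 1.0, "6km": 0.4}  # A': the observed fact

r = implication_relation(a, b)
b_prime = induce_consequent(a_prime, r, b.keys())
print(b_prime)
```

Because the observed fact A' only partially matches A, the induced set B' is a weakened version of B.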

The method of fuzzy reasoning is based on four steps:
(1) Find the degrees of compatibility by comparing the known facts with the antecedents of the fuzzy rules.
(2) Form the firing strength, which indicates the degree to which the antecedent part of the rule is satisfied.
(3) Generate the qualified consequent membership functions, which represent how the firing strength is propagated and used in a fuzzy implication statement.
(4) Aggregate all the qualified consequent membership functions to obtain an overall output membership function.
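The four steps can be sketched for single-antecedent rules as shown below. The two rules, the triangular membership functions and the choice of min for qualification and max for aggregation are illustrative assumptions, not the rule base of the proposed system. With a single antecedent and a crisp input, the degree of compatibility (step 1) and the firing strength (step 2) coincide.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical rules of the form: IF distance is <antecedent> THEN relevance is <consequent>.
RULES = [
    (lambda d: tri(d, -1.0, 0.0, 4.0), lambda r: tri(r, 0.5, 1.0, 1.5)),    # near -> high
    (lambda d: tri(d, 2.0, 6.0, 10.0), lambda r: tri(r, -0.5, 0.0, 0.5)),   # far  -> low
]


def infer(distance, grid):
    """Return the aggregated output membership sampled on `grid`."""
    output = [0.0] * len(grid)
    for antecedent, consequent in RULES:
        strength = antecedent(distance)                # steps 1 and 2
        for i, r in enumerate(grid):
            qualified = min(strength, consequent(r))   # step 3: clip the consequent
            output[i] = max(output[i], qualified)      # step 4: aggregate with max
    return output


grid = [0.0, 0.25, 0.5, 0.75, 1.0]
print(infer(3.0, grid))
print(infer(1.0, grid))
```

A defuzzification step (for example, a centroid) could follow if a single crisp relevance score is needed; that step is outside the four listed above.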

Journal of Computer Applications (JCA) ISSN: 0974-1925, Volume VI, Issue 2, 2013

the spatial information, the presence of homonyms and synonyms adds to the ambiguity and uncertainty present in the text description. The identification of the spatial component in the text relies on the available information or knowledge base, especially in the case of special names given to certain spatial locations ("City of temples", "Capital of Tamilnadu", "uranga nagaram" (Madurai), or "monument of love in the city", which denotes the Taj Mahal if the city under discussion is Agra). Such names draw on the user's knowledge of the locality and are highly context dependent. These implicitly mentioned spatial locations pose a challenge to the geo-indexing process. Spatial descriptions or spatial expressions are inherent in many applications. The current trend is to make machines understand human language and intention in order to increase the user friendliness of devices. It is easier for humans to express their spatial knowledge in their native natural language than to provide it to devices as latitude-longitude pairs in applications such as robotics, location-based services and logistics. Geospatial reasoning and the understanding of natural language directions have been active research areas in recent times [7] [8] [9]. The ambiguity arising from homonymy in spatial terms is resolved by assigning unique geographic coordinates [10]. Various methods of handling uncertainty, such as Bayesian inference, fuzzy sets, fuzzy logic, possibility theory, time Petri nets, evidence theory and rough sets, are discussed by the authors of [18]. Walley has presented a comparison of four measures of uncertainty in expert systems: Bayesian probabilities, coherent lower and upper previsions, belief functions of evidence theory, and possibility measures in fuzzy theory [19]. Klir studied uncertainty and information as a foundation of generalized information theory [20].
A summary of the most widely used techniques is presented by the authors of [17]. The theories most used are probability theory, fuzzy theory, derived uncertainty theory and info-gap theory. Probability theory handles objective and subjective uncertainty using statistics. The Monte Carlo method, the Bayesian method and Dempster-Shafer evidence theory are methods extended from existing probability theory. Probability theory is based on the law of the excluded middle: an element belongs either to the set A or to its complement, but not to both. Fuzzy theory handles subjective uncertainty with ease. Ambiguity arising from natural language expressions can be handled through fuzzy sets with membership functions.

IV. INFORMATION RETRIEVAL AND EXTRACTION

Information retrieval techniques find relevant documents in a text corpus (both structured and unstructured), web databases, multimedia databases, etc., whereas information extraction identifies useful content (text, image, pixels) within a document. Information retrieval is used to answer queries relating to a fact, a procedure or references. It is used in many academic, scientific, business and administrative applications such as building citation databases, web advertising, bioinformatics, news tracking, customer feedback analysis and opinion mining. The basic activities in retrieving documents from a text corpus are indexing, querying, comparison and feedback. The major text processing steps include stemming, lemmatization, part-of-speech (POS) tagging, syntactic analysis and Named Entity Recognition (NER). In our work we concentrate on queries with spatial references posed to the collection of documents. Spatial information retrieval techniques include machine learning techniques, geographic thesauri and gazetteers, text geo-tagging and geo-footprints, and geographic ontologies.
The issue with spatial cognition through information retrieval from text documents is that it is highly domain specific, since the gazetteers and thesauri that can be designed cannot be all-encompassing and exhaustive [5]. The information extracted from the text is used to answer queries about the relationships between events or entities and places. The challenge in spatial information extraction is to minimize the errors that occur while extracting entity or place names from the text, which would otherwise be propagated into the database. Although there are various metrics, such as the coverage of the collection, system response time, the form of presentation of the output, the user effort involved in obtaining answers to a query, recall and precision, the last two are the most frequently used. The authors of [29] list the various performance metrics used in information retrieval. Precision is the proportion of retrieved documents that are actually relevant to the query posed by the user:

$$\text{Precision } (P) = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|} \quad (4)$$

Recall is the proportion of relevant documents that are actually retrieved:

$$\text{Recall } (R) = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|} \quad (5)$$

The fallout measure is defined as the proportion of non-relevant documents that are retrieved:

$$\text{Fallout} = \frac{|\{\text{non-relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{non-relevant documents}\}|} \quad (6)$$
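The three set-based measures defined above can be computed with a short Python sketch. The document identifiers and the function name `retrieval_metrics` are hypothetical and serve only to illustrate the definitions.

```python
def retrieval_metrics(retrieved, relevant, collection):
    """Return (precision, recall, fallout) for one query, following the definitions above."""
    retrieved, relevant, collection = set(retrieved), set(relevant), set(collection)
    hits = retrieved & relevant              # relevant documents that were retrieved
    non_relevant = collection - relevant     # everything in the collection that is not relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    fallout = len(retrieved & non_relevant) / len(non_relevant) if non_relevant else 0.0
    return precision, recall, fallout


# Hypothetical collection of ten documents, d1 .. d10.
collection = [f"d{i}" for i in range(1, 11)]
p, r, f = retrieval_metrics(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d2", "d5"],
    collection=collection,
)
print(p, r, f)
```

Here two of the four retrieved documents are relevant, two of the three relevant documents are retrieved, and two of the seven non-relevant documents are retrieved.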

The F-measure, another popular measure in information retrieval and information extraction, combines recall and precision in a single measure:

$$\text{F-measure} = \frac{2 \cdot R \cdot P}{R + P} \quad (7)$$

Average precision is yet another measure, which combines precision, relevance ranking and recall:

$$\text{Average Precision} = \frac{1}{R} \sum_{n=1}^{N} P(n)\,\mathrm{rel}(d_n) \quad (8)$$

where $P(n)$ is the precision over the top n ranked documents and $\mathrm{rel}(d_n)$ is 1 when the n-th ranked document is relevant and 0 otherwise. R-precision is the precision at rank R:

$$\text{R-precision} = \frac{1}{R} \sum_{i=1}^{R} d_i \quad (9)$$

where $d_i$ is a variable representing the relevance level of the i-th document in the ranked output for a given query, N is the number of documents in the corpus, L is the size of the document collection, and R is the number of relevant documents for the query.

V. FUZZY INFORMATION RETRIEVAL

Fuzzy information retrieval treats a document as a fuzzy set of keywords, leading to the argument that the higher the membership of a keyword in the set, the higher the relevance of the document to that keyword. Also, a weight is
assigned to the terms in the query, which is compared with the weight of the term in the document. If the weights are above the threshold level, the documents are deemed relevant to the query [28]. The authors of [27] have proposed an automatic term weighting technique that requires less expertise. The basic idea is to assign weights to the index terms depending on factors such as their tf-idf value, the level of ambiguity present in the term, and whether the term is connected to any other index term. A set of fuzzy rules is applied to determine the weight of each index term. The authors' previous work [15] proposes a method for detecting the presence of spatial uncertainty in text and for dealing with spatial ambiguity using named entity extraction techniques coupled with self-learning fuzzy logic techniques. Ontology-based methods cannot resolve deictic references, since they cannot handle the observer's point of view in the narration. The traditional Boolean logic system is inefficient and insufficient for handling the uncertainty and ambiguity associated with natural language descriptions of events involving spatial descriptions, which are based on human cognition and perception of the event. There are works that use fuzzy logic for handling uncertainty in GIS [11]. These works concentrate on the spatial uncertainty that creeps into the system during data acquisition and data representation. Users may need information at various levels of detail, which calls for granulation of information. There are works that deal with the granularity of the objects, locations and actions embedded in an event [12] [13] [14]. Granularity at various levels can be achieved by using fuzzy sets. The degree of membership of a location in a known or indexed location can be easily determined using the fuzzy membership values assigned to the spatial terms extracted from the text.
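The threshold comparison between query term weights and document term weights described above can be sketched as follows. Treating each query weight as a per-term threshold that the document weight must reach is one possible reading of the scheme in [28] and is an assumption here; the term weights and document names are hypothetical.

```python
def is_relevant(query_thresholds, doc_weights):
    """True if, for every query term, the term's weight in the document meets its threshold.

    Terms absent from the document are treated as having weight 0.0.
    """
    return all(
        doc_weights.get(term, 0.0) >= threshold
        for term, threshold in query_thresholds.items()
    )


# Hypothetical fuzzy index: the weight (membership) of each term in each document.
docs = {
    "doc_a": {"chennai": 0.9, "temple": 0.6},
    "doc_b": {"chennai": 0.4, "temple": 0.8},
    "doc_c": {"temple": 0.9},
}
query = {"chennai": 0.5, "temple": 0.5}

matches = [name for name, weights in docs.items() if is_relevant(query, weights)]
print(matches)
```

Lowering a query threshold widens the result set, which is one simple way to trade precision for recall.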
Fuzzy sets also make it possible to understand the spatial context at multiple levels of granularity, such as the type of place specified or a specific building, which is achieved by framing more rules or membership functions. Fuzzy modeling uses one-dimensional features and can be used in situations where there is little or no dependency between input variables. Fuzzy membership grades are adept at handling imprecise information, and an individual fuzzy membership grade can be combined with other membership grades in the reasoning process. In [24] the authors present Cartesian granule features formed over the cross product of fuzzy partition labels. Possibility theory exploits the incomplete knowledge present in a system and is of direct relevance to knowledge representation, the semantics of natural languages, decision analysis and computation with imprecise probabilities [25]. A fuzzy truth value reflects partial truth and uncertainty about truth. A possibility distribution is used to quantify the extent to which a document satisfies a query involving fuzzy spatial terms. A qualitative representation can be converted to a quantitative representation using fuzzy membership functions. The authors of [22] introduced a scheme for representing qualitative spatial information by associating qualitative relations with fuzzy sets. Starting with the concept of absolute distance, they extended the metric notion of proximity to non-metric notions of proximity. They present a fuzzy agglomerative hierarchical clustering
algorithm for clustering documents and obtaining the cluster centers of the document clusters. Fuzzy logic rules are framed based on the document clusters and their cluster centers. Finally, the constructed fuzzy logic rules are applied to modify the user's query for query expansion and to guide the information retrieval system in retrieving documents relevant to the user's request. The authors of this work propose the use of fuzzy logic for effective retrieval and modeling of ambiguous spatial terms and spatial expressions in text.

VI. PROPOSED WORK

The authors of this work intend to handle documents containing crisp or fuzzy spatial expressions using fuzzy logic techniques, since a fuzzy set can either reduce to a crisp set or act as a fuzzy set, as per the needs of the user. There is a multitude of techniques, such as Cartesian granule features, possibility distributions, fuzzy rules and fuzzy sets, for resolving uncertainty and ambiguity. Fuzzy modeling has the advantage of accommodating domain knowledge and has an inherent ability to deal with the uncertainty associated with human knowledge expressed in natural language. Fig. 1 shows the proposed architecture, depicting the sequence of steps taken to resolve the uncertainty and ambiguity in text documents. The input to the system is a text document stored in a corpus, and the output is the classification of the spatial terms present in the document, with fuzzy membership values assigned to them. The text of each document is split into tokens and the occurrences of unique spatial tokens in the text are tabulated. A ranked list of documents, with the fuzzified relevance of each document with respect to a spatial query, can be generated from the fuzzy membership values of the spatial terms present in the document.
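The tokenization and tabulation step described above can be sketched as follows. The small set `SPATIAL_TERMS` is a hypothetical stand-in for a gazetteer or spatial lexicon, and the regular-expression tokenizer is a simplification; neither is the component used in the proposed system.

```python
import re
from collections import Counter

# Hypothetical lexicon of spatial terms; a real system would consult a gazetteer.
SPATIAL_TERMS = {"near", "behind", "east", "around", "chennai"}


def spatial_token_counts(text):
    """Split text into lowercase word tokens and count those found in the lexicon."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(token for token in tokens if token in SPATIAL_TERMS)


counts = spatial_token_counts(
    "The camp is near Chennai, to the east of the lake and near the park."
)
print(counts)
```

The resulting counts are the raw input from which term frequencies and, later, fuzzy membership values could be derived.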
Our system handles the fuzziness present in linguistic spatial modifiers and spatial terms, as well as the fuzziness associated with the quantification of spatial properties. For example, in handling spatial expressions such as "about 5 km near the temple" or "roughly 20 yards from the university campus", the numbers are identified as fuzzy numbers and related to a spatial location. The system thus handles both fuzzy linguistic modifiers and fuzzy numbers associated with spatial locations. Fuzzy modeling is divided into two stages, the first stage being the identification of the surface structure. We outline the general steps associated with the identification of the surface structure, along with the specific tasks carried out in our system to implement them.
(1) Select relevant input and output variables. The input to the system is the set of tokens extracted from the text documents. The output is the membership function, or the degree of belonging, of the input token to a fuzzy set, Fs.
(2) Choose a specific type of fuzzy inference system.

A linguistic fuzzy rule-based system is used for capturing qualitative knowledge and hence is adept at handling natural language descriptions. A fuzzy rule is of the general form

If x is A then y is B (10)

where x and y are variables and A and B are fuzzy sets; x and A form the antecedent part and y and B form the consequent part of the rule. Our rule-based system comprises rules that determine the degree to which a term belongs to the spatial category. Fuzzy rules are used here to ascertain the spatial nature of each token extracted. A fuzzy tf-idf is used, in which each spatial term is assigned more weight if it has a greater membership in the set of spatial terms. The spatial references are partially disambiguated through manual annotation using the gazetteer, and the most plausible spatial references are then fed into the next module, which interacts with the fuzzy rule base. There the granularity is fine-tuned, and the possible locations are extracted from the document set and displayed to the user through an interface. Spatial information may be retrieved from the document or queried by the user for three different purposes: to find the distance between two points or regions in space; to find a route (either the shortest route or the best route, depending on the requirement) from one point to another; or to obtain a rough estimate of the time needed to travel to a specific point or region. The spatial information searched for generally includes the spatial relationships of geometrically defined spatial entities. Spatial information retrieval from text finds out which of the documents in the text corpus contain references to a spatial point or region of interest.
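One way to read the fuzzy tf-idf weighting described above is to scale the classical tf-idf score of a term by its membership in the fuzzy set of spatial terms. The sketch below follows that reading; the multiplicative combination, the membership values and the toy corpus are assumptions, since the paper does not give the exact formula.

```python
import math


def fuzzy_tfidf(term, doc_tokens, corpus, membership):
    """tf-idf of `term` in `doc_tokens`, scaled by its spatial membership degree.

    Assumption: weight = tf * idf * membership, with idf = ln(N / df).
    """
    if not doc_tokens:
        return 0.0
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for doc in corpus if term in doc)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf * membership.get(term, 0.0)


# Hypothetical tokenized corpus and spatial membership degrees.
corpus = [
    ["near", "temple", "road"],
    ["temple", "festival"],
    ["road", "near", "near", "bridge"],
]
membership = {"near": 0.9, "road": 0.3}

w_near = fuzzy_tfidf("near", corpus[0], corpus, membership)
w_road = fuzzy_tfidf("road", corpus[0], corpus, membership)
w_festival = fuzzy_tfidf("festival", corpus[1], corpus, membership)
print(w_near, w_road, w_festival)
```

In the first document, "near" and "road" have the same tf-idf value, so the difference in their weights comes only from their different spatial memberships, while a term with no spatial membership receives weight zero.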
Resolving the uncertainty inherent in spatial expressions and spatial references in text documents can support efficient and effective question answering systems, discourse analysis, disaster management applications, mobile robotics, resource finding in tourist applications, GIS, etc.

VII. CONCLUSION AND FUTURE WORK

The proposed system retrieves the documents in the text corpus that are relevant to spatial queries. The order and ranking of the retrieved documents differ from those of traditional Boolean retrieval, since the documents are ranked on the basis of relevance using fuzzy logic techniques. The fuzzy membership functions determine the spatial relevance of the documents, and the fuzzy rules decide the relevance. The spatial similarity between two documents is also evaluated on the basis of the fuzzy rule base. The granularity of the query and the spatial information present in the text are used to resolve the uncertainty of the spatial information. Granularity can be adjusted as per user requirements by modifying or adding rules through the user interface provided in the proposed system. Possibility functions and fuzzy logic techniques, rather than probability theory, are used to model the uncertainty of the spatial information present in the text. A limitation of this work is that it answers spatial queries involving spatial attributes only, and not spatial queries involving geometric shapes, since it queries textual data and not a spatial database. Also, the proposed system handles only point-in-polygon and region queries. Path queries cannot be handled, since they require network information that is generally not found in a text corpus. Distance and buffer zone queries also require information that is usually not found in a text corpus. The authors' future work involves handling multimedia queries to an extent.

REFERENCES
[1] C. B. Jones and Ross S. Purves, 2008. Geographical Information Retrieval. International Journal of Geographical Information Science, 22(3).
[2] S. Kikuchi et al., Place of possibility theory in transportation analysis. Transportation Research Part B, 2006. Elsevier.
[3] Rock, Nathaniel Robert. "Mapping geospatial events based on extracted spatial information from web documents." Master's thesis, University of Iowa, 2011. http://ir.uiowa.edu/etd/1068
[4] George J. Klir and Bo Yuan. Fuzzy Sets and Fuzzy Logic: Theory and Applications.
[5] Debra, Rajiv Chopra, Rohini Srihari. Domain Specific Understanding of Spatial Expressions. citeseerx.ist.psu.edu / viewdoc / download ?doi=10
[6] Kate Byrne and Ewan Klein. Automatic Extraction of Archaeological Events from Text. May 2009.
[7] Hans W. Guesgen. Reasoning About Distance Based on Fuzzy Sets. Applied Intelligence 17, 265-270, 2002.
[8] Thomas Kollar et al., Toward Understanding Natural Language Directions. Naval Research
[9] B. Bitters. Geospatial Reasoning in a Natural Language Processing (NLP) Environment.

BIOGRAPHY

V. R. Kanagavalli received her B.Sc. (Mathematics) from Madras University in 1995, her MCA from Bharathidasan University in 1998, and her M.Phil. from Bharathidasan University in 2006. She is currently pursuing her doctoral degree at Sathyabama University, Chennai. She is an Associate Professor at Sri Sai Ram Engineering College, Chennai. She has around 16 years of teaching experience. She has published papers in national and international journals and conferences. Her areas of interest include knowledge based systems and Natural Language Processing.

Dr. K. Raja has been working as Dean (Academic) at Alpha College of Engineering, Chennai, since Dec 2012. He acted as Principal, NSIT, Salem, from Nov 2008 to Nov 2013. He received his B.Sc. (Mathematics) from Madras University in 1989, his B.E. (Computer Science & Engineering) and P.G. Diploma in Personnel Management from Annamalai University in 1993, his M.B.A. from Madurai Kamaraj University in 1997, his M.E. (Computer Science & Engineering) from Madras University in 2001, his Ph.D. in Knowledge Based Systems from Sathyabama University, India, in 2006, his M.Phil. (Human Resource Management) from Annamalai University in 2007, and his MLIS (Master in Library Information Science) from Annamalai University in 2011. He is a life member of various professional bodies, including the Institution of Engineers (India), the Computer Society of India, the Indian Society for Technical Education, and the International Association of Computer Science and Information Technology. He has more than two decades of teaching experience. He has published various research papers and participated in various national and international conferences. He is a reviewer for various national and international journals. His areas of interest are Knowledge Based Systems, Knowledge Management, Technology Management, Computer Networks, System Software, Software Engineering, Network Security, Data Structures and Algorithms, HR and Quality Systems.