Extensions to Linguistic Summaries Indicators based on Neutrosophic Theory

Abstract. The rapid development of markets and companies, especially those that apply information technology,
has made it easy to store large volumes of digital information. Nevertheless, extracting potentially useful
knowledge from this information is difficult, and the results are often not easily understandable by humans. One
technique applied to this problem is linguistic data summarization, whose objective is to discover knowledge by
extracting patterns from databases, from which explicit and concise summaries are generated. Another important
element of linguistic summaries is the set of indicators (T) for their evaluation, proposed by Zadeh, which
evaluate linguistic terms in fuzzy sets. However, these indicators do not include analysis over indeterminate
sets. This paper discusses the use of linguistic data summarization in project management environments and
proposes new T indicators that incorporate neutrosophic sets with single valued neutrosophic numbers. The authors
compare the T-values proposed by Zadeh with the T-values based on neutrosophic theory in the evaluation of the
linguistic summaries retrieved.

1 Introduction
Market growth, including in the digital world, has led to the availability of large volumes of data in different
formats and from various sources. Unfortunately, the greater the data volume, the more difficult its
interpretation. The important information in those data consists of non-trivial, encoded dependencies. These
dependencies are usually hidden, and their discovery requires some intelligence.
In general, many companies have limitations in data analysis that affect their decisions. Decision-making
problems can be classified as structured or unstructured. Structured decision-making problems have defined
solution methods and are supported by procedures and rules. On the other hand, unstructured decision making
addresses low-frequency problems that need specific solutions. Examples of unstructured problems are
alternative selection [1] [2] [3], diagnostics, prediction [4] [5] [6], prognosis, classification, machine
learning and data mining [7]. In this paper, the authors focus on a data mining problem, using linguistic data
summarization (LDS) techniques.
Companies frequently have large databases that contain heterogeneous data that are difficult to understand. In
this context, Kacprzyk [8] and other authors developed linguistic data summarization algorithms. This technique
is oriented to producing linguistic summaries in natural language from numeric data. In addition, it helps
organizations resolve the "data rich, information poor" dilemma when making decisions.
Regarding linguistic data summarization, Kacprzyk and Zadrożny [8] said that “data summarization is one of the
basic capabilities that is now needed by any “intelligent” system that is meant to operate in real life”. They
define a set of six protoforms that describe the structure of linguistic summaries and of the queries used to
search for them [9], see Table 1. All summaries are represented in the following two basic structures:
“Qy’s are S” and “QRy’s are S”, where Q is a quantifier, y denotes the objects, S is the summarizer and R is a
filter.

In order to evaluate summaries, different authors have developed T measures [8][11], among them the degree of
truth, degree of imprecision, degree of coverage, degree of appropriateness, length of the summary and degree of
validity.
• Degree of truth (T1): evaluates the truth of the summary based on the objects' membership to the summary
and to the summary's quantifier.
• Degree of imprecision (T2): calculates the degree of vagueness and imprecision of the summary by
considering the alternative values for each summarizer.
• Degree of coverage (T3): calculates the relative frequency of the objects that belong to the summarizer's
fuzzy set and to the filter's fuzzy sets.
• Degree of appropriateness (T4): measures the usefulness of the summary, combining the relative frequency
of coincidence between the objects and the summary with the degree of coverage. This measure reports low
values when the degree of coverage is high.
• Length of summary (T5): measures the length of the summary based on the number of variables involved in
it. Very long summaries are usually incomprehensible.
• Degree of validity (T6): combines the rest of the T values using OWA aggregators.

However, some of these measures fail when the database contains objects with high vagueness, such as objects
with a high level of neutrality with respect to their membership in specific fuzzy sets.
Traditional linguistic data summarization techniques do not consider neutrality in the data. New measures that
take the neutrality of linguistic summaries into account can help to select the best summaries for decision
making. Neutrosophic number theory extends fuzzy logic theory and improves the treatment of neutrality.
Neutrosophy was introduced by Smarandache in 1995 [12]; essential to this theory is the definition of
neutrosophic sets given by Smarandache and Wang et al. in [13]. The use of neutrosophic theory in linguistic
data summarization techniques allows the introduction of concepts of indeterminacy and also improves the
interpretability of the summaries [14].
The aim of this work is to extend the measures used to evaluate linguistic summaries by considering different
elements of neutrosophic theory. The new measures and the traditional measures are applied to evaluate
linguistic summaries in a project management environment.
The remainder of the paper is structured as follows. Section 2 describes preliminary concepts and notation of
linguistic data summarization techniques and neutrosophic theory. In Section 3, the authors present different
extensions to the traditional measures proposed by Zadeh, introducing neutrosophic sets and other concepts into
the computation of the quality measures of linguistic summaries. In Section 4, the authors compare the results
of the measures in a project management environment. Finally, the paper ends with conclusions and
recommendations for further work in Section 5.

2 Preliminary concepts and notation
In this section, the authors present preliminary concepts associated with this work. The first subsection
discusses in detail the measures proposed by Zadeh to evaluate linguistic summaries. In the second subsection,
the authors present preliminary concepts of neutrosophic theory that are useful in linguistic data
summarization environments.
2.2 Measures to evaluate the linguistic summaries quality, based on traditional fuzzy sets
Different authors have proposed measures to evaluate the quality of linguistic summaries. Among them, the set
of measures proposed by Zadeh is well known. In this section, the measures proposed by Zadeh [15] are
described in detail.
The degree of truth (T1) is a measure of how much the data support a linguistic summary.
For summaries with the structure “Qy’s are S”, equation (1) can be used, while for summaries with the
structure “QRy’s are S”, where R is a filter, equation (2) can be used.

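$$T(Qy\text{'s are } S) = \mu_Q\left(\frac{1}{n}\sum_{i=1}^{n}\mu_S(y_i)\right) \qquad (1)$$

$$T(QRy\text{'s are } S) = \mu_Q(r) \qquad (2)$$

Where

$$r = \frac{\sum_{i=1}^{n}\left(\mu_R(y_i) \wedge \mu_S(y_i)\right)}{\sum_{i=1}^{n}\mu_R(y_i)} \qquad (3)$$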
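As a minimal sketch of equations (1) to (3), the following Python code computes the degree of truth for both summary structures. The quantifier `quantifier_most` and its breakpoints (0.3 and 0.8) are hypothetical, the conjunction in equation (3) is assumed to be the minimum, and returning 0 when no object satisfies the filter is an assumed convention.

```python
from typing import Callable, Sequence


def quantifier_most(x: float) -> float:
    """Hypothetical membership function for the relative quantifier 'most'.

    Piecewise linear: 0 up to 0.3, 1 from 0.8 (illustrative breakpoints only).
    """
    if x <= 0.3:
        return 0.0
    if x >= 0.8:
        return 1.0
    return (x - 0.3) / 0.5


def truth_simple(mu_q: Callable[[float], float], mu_s: Sequence[float]) -> float:
    """Equation (1): mu_Q applied to the mean membership of the objects in S."""
    return mu_q(sum(mu_s) / len(mu_s))


def truth_filtered(
    mu_q: Callable[[float], float],
    mu_r: Sequence[float],
    mu_s: Sequence[float],
) -> float:
    """Equations (2)-(3): mu_Q(r), with min assumed as the conjunction."""
    denominator = sum(mu_r)
    if denominator == 0:
        return 0.0  # assumed convention: no object satisfies the filter
    r = sum(min(a, b) for a, b in zip(mu_r, mu_s)) / denominator
    return mu_q(r)


# Memberships of four objects in the summarizer S and in the filter R (made-up data).
mu_s = [0.9, 0.8, 0.2, 1.0]
mu_r = [1.0, 0.5, 0.0, 1.0]
print(truth_simple(quantifier_most, mu_s))
print(truth_filtered(quantifier_most, mu_r, mu_s))
```

In this made-up example the filter removes the third object, which has low membership in S, so the filtered summary obtains a higher degree of truth than the unfiltered one.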

The degree of imprecision (T2) is a useful validity criterion. Basically, a very vague linguistic summary can
have a very high degree of truth without being a relevant summary (for example, “almost all projects with low
performance indicators are badly evaluated”).

$$T_2 = 1 - \sqrt[m]{\prod_{j=1}^{m} \operatorname{in}(S_j)} \qquad (4)$$

Where m is the number of summarizers involved in the summary and in(Sj) is defined as:

$$\operatorname{in}(S_j) = \frac{\operatorname{card}\{x \in X_j : \mu_{S_j}(x) > 0\}}{\operatorname{card}\, X_j} \qquad (5)$$

Equation (5) compares the cardinality of the support of Sj with the cardinality of its domain Xj. That is, the
"flatter" the fuzzy set Sj is, the higher the value of in(Sj).
The degree of imprecision T2 depends only on the form of the summary; its calculation does not involve the
records in the database and, for this reason, does not require searching the database.
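The following Python sketch illustrates equations (4) and (5) on a discretised domain, assuming that the condition in equation (5) is a membership greater than 0. The membership functions `low` and `flat` and the ten-point domain are hypothetical and serve only to show that a flat set yields a T2 of 0.

```python
import math
from typing import Callable, Sequence


def degree_of_fuzziness(mu: Callable[[float], float], domain: Sequence[float]) -> float:
    """Equation (5) on a discretised domain: share of points with membership > 0."""
    support = sum(1 for x in domain if mu(x) > 0)
    return support / len(domain)


def degree_of_imprecision(
    summarizers: Sequence[Callable[[float], float]],
    domains: Sequence[Sequence[float]],
) -> float:
    """Equation (4): one minus the geometric mean of in(S_j) over the m summarizers."""
    m = len(summarizers)
    product = math.prod(
        degree_of_fuzziness(mu, domain) for mu, domain in zip(summarizers, domains)
    )
    return 1.0 - product ** (1.0 / m)


def low(x: float) -> float:
    """Hypothetical narrow fuzzy set: positive membership only for x < 4."""
    return max(0.0, 1.0 - x / 4.0)


def flat(x: float) -> float:
    """Hypothetical flat fuzzy set: every point of the domain fully belongs to it."""
    return 1.0


domain = list(range(10))  # discretised domain 0..9 (illustrative)
print(degree_of_imprecision([low], [domain]))   # narrow summarizer
print(degree_of_imprecision([flat], [domain]))  # flat summarizer
```

Note that the function only receives the summarizers and their domains, which mirrors the observation that T2 can be computed without searching the database.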
The degree of coverage (T3) is defined by:

The degree of appropriateness (T4) describes how relevant the summary is for the particular environment
represented by the objects in the database, and is defined as:

 
$$T_4 = \left|\prod_{j=1}^{m} r_j - T_3\right| \qquad (9)$$

Where:

$$r_j = \frac{\sum_{i=1}^{n} h_i}{n}, \quad j = 1,\ldots,m$$

$$h_i = \begin{cases} 1 & \text{if } \mu_{S_j}(y_i) > 0 \\ 0 & \text{otherwise} \end{cases}$$
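A minimal Python sketch of equation (9) follows, assuming that h_i is 1 when the membership of object y_i in S_j is greater than 0. The degree of coverage T3 is passed in as a parameter rather than computed, and the membership matrix and the value 0.5 for T3 are made-up data.

```python
import math
from typing import Sequence


def degree_of_appropriateness(memberships: Sequence[Sequence[float]], t3: float) -> float:
    """Equation (9): T4 = |prod_j r_j - T3|.

    memberships[j][i] holds the membership of object y_i in summarizer S_j;
    t3 is the degree of coverage, supplied by the caller.
    """
    n = len(memberships[0])
    # r_j: relative frequency of objects with positive membership in S_j.
    r = [sum(1 for value in row if value > 0) / n for row in memberships]
    return abs(math.prod(r) - t3)


# Two summarizers evaluated over four objects (illustrative values).
memberships = [
    [0.2, 0.0, 0.9, 0.5],  # S_1: three objects with positive membership
    [0.0, 0.0, 0.7, 0.1],  # S_2: two objects with positive membership
]
print(degree_of_appropriateness(memberships, t3=0.5))
```

Here r_1 = 0.75 and r_2 = 0.5, so the product 0.375 is the coverage that would be expected if the summarizers were independent, and T4 measures how far the observed coverage is from it.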

The length of the summary (T5) can be defined as:


$$T_5 = 2\left(0.5^{\operatorname{card} S}\right) \qquad (10)$$

Finally, the total degree of validity (T6) can be calculated using different aggregation operators; for
example, in [15] this indicator is defined as the weighted average of the previous five degrees of validity, i.e.
$$T_{LS} = \sum_{i=1}^{k} w_i T_i \qquad (11)$$

This is the total validity of a linguistic summary, where:
• k is the number of T values calculated; in this case there are five, from T1 to T5.
• wi is the weight assigned to Ti in the aggregation, with i = 1,…,5. Each weight is a value in [0,1].
The combination of T values is very useful for detecting the most relevant summaries. The optimal summary
S* ∈ {S} is found as:

$$S^{*} = \arg\max_{LS \in \{1,\ldots,k\}} T_{LS} \qquad (12)$$
Where k is the total number of linguistic summaries generated and arg is a function that returns the linguistic
summary obtained as a result of the operation.
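Equations (11) and (12) can be sketched in Python as a weighted sum followed by an arg max over the candidate summaries. The weights, the summary names LS1 and LS2 and their T values below are hypothetical.

```python
from typing import Dict, Sequence


def total_validity(t_values: Sequence[float], weights: Sequence[float]) -> float:
    """Equation (11): weighted sum of the T degrees of one summary."""
    if len(t_values) != len(weights):
        raise ValueError("one weight per T value is required")
    return sum(w * t for w, t in zip(weights, t_values))


def best_summary(candidates: Dict[str, Sequence[float]], weights: Sequence[float]) -> str:
    """Equation (12): name of the summary with the highest total validity."""
    return max(candidates, key=lambda name: total_validity(candidates[name], weights))


# Hypothetical weights for T1..T5 (they sum to 1) and made-up T values.
weights = [0.4, 0.15, 0.15, 0.15, 0.15]
candidates = {
    "LS1": [0.9, 0.6, 0.5, 0.1, 0.5],
    "LS2": [0.7, 0.8, 0.9, 0.3, 1.0],
}
print(best_summary(candidates, weights))
```

In this example LS2 is selected even though LS1 has the higher degree of truth, which illustrates how a single aggregated score can hide the behaviour of individual indicators.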
The study of the state of the art in linguistic data summarization led to the following partial conclusions:
• The proposed T values do not consider the indeterminacy or the falsity of objects with respect to their
membership in different fuzzy sets. As a result, objects with high indeterminacy can be given the same
weight in the calculation as objects with a high membership value and low indeterminacy.
• The calculation of many T values considers all elements with a membership value greater than 0. This
condition is not adequate, because it gives objects with very low membership the same relevance as
objects with the highest membership. The authors of this paper consider it necessary to limit the
calculation to objects with membership values greater than an epsilon value.
• In particular, there are scenarios with high vagueness and ambiguous concepts where it is necessary to
take into account elements such as the neutrality and uncertainty of concepts in the decision-making
process. For example, in project management, experts sometimes do not have a definitive opinion about a
decision and have to take neutral positions before reaching a definitive decision.
• The authors of this paper consider that the aggregation of the different T values proposed in T6 (total
degree of validity) in some cases introduces noise into the selection of summaries, and recommend the use
of the Pareto approach [16].

The next subsection presents the preliminary concepts of neutrosophic theory necessary to introduce the
extensions to the traditional T values.
2.3 Preliminary concepts about the neutrosophic theory
In [17][18], Smarandache introduced the concepts of neutrosophic set and neutrosophic logic, which allow
indeterminate and inconsistent information to be handled efficiently. The neutrosophic set is a generalization
of the fuzzy set [19], intuitionistic fuzzy sets, interval-valued fuzzy sets [20] [21] and interval-valued
intuitionistic fuzzy sets [6]. A neutrosophic set has three degrees: a truth-membership degree, an
indeterminacy-membership degree and a falsity-membership degree. All these degrees lie in the non-standard
interval ]⁻0, 1⁺[.
However, neutrosophic theory in this general form is difficult to apply directly in real scientific and
engineering areas. For this reason, Smarandache [17] proposed the neutrosophic set theory, a more general form
of intuitionistic fuzzy logic, whose truth, indeterminacy and falsity functions lie in [0, 1]. Publications on
neutrosophic set theory and its applications in several fields have been increasing in recent years, as
evidenced by the works presented in [22] [23] [24] [25] [26] [27].
Particularly important for this work is Definition 1 of neutrosophic sets, given by Smarandache and Wang et al.
in [13], [12], [27].
Definition 1. Let M be a neutrosophic set in universe X characterized by a tuple (Label, X, μM(x), τM(x), σM(x)),
where: Label is a linguistic term that represents the name of the set, X represents the universe of discourse,
μM(x) ∈ [0,1] represents the truth-membership function, τM(x) ∈ [0,1] represents the indeterminacy-membership
function and σM(x) ∈ [0,1] represents the falsity-membership function, where 0 ≤ μM(x) + τM(x) + σM(x) ≤ 3.
This definition implies that each value x ∈ X of the domain, when evaluated in the neutrosophic set M, returns
the value M(x) = (μM(x), τM(x), σM(x)), where the first component represents the membership degree of x to M,
the second component represents the indeterminacy degree of x to M and the third component represents the
non-membership degree of x to M.

The Single Valued Neutrosophic Set (SVNS) concept permits the application of neutrosophic set theory in real
scientific and engineering applications [13], see Definition 2. Many studies have been carried out on this
theory, and it has been used in many application fields. In this theory, the truth, falsity and indeterminacy
values of a situation are considered, which is useful because many uncertain and complex situations arise in
decision-making applications.
Definition 2. Let X be a set of objects. Each x ∈ X is represented by a single valued neutrosophic number (SVN)
characterized by a vector (V, I, F), where V indicates the truth value, I the indeterminacy value and F the
falsity value.
Another important group of definitions is proposed by Subas [28]. He defines a single valued triangular
neutrosophic number x = ((a, b, c); μA(x), τA(x), σA(x)) where a, b, c ∈ ℝ⁺ if x is a positive single valued
triangular neutrosophic number and a, b, c ∈ ℝ⁻ if x is a negative single valued triangular neutrosophic
number.
In neutrosophic theory, different authors define operations between single valued neutrosophic numbers [29]
[30] [18] as follows.
Let A be a variable represented by the number ((a1, a2, a3); uA, rA, fA) and B a variable represented by the
number ((b1, b2, b3); uB, rB, fB).
Sum: A (+) B = ((a1+b1, a2+b2, a3+b3); T(uA, uB), S1(rA, rB), S2(fA, fB)) (13)
Difference: A (−) B = ((a1−b3, a2−b2, a3−b1); T(uA, uB), S1(rA, rB), S2(fA, fB)) (14)
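The sum and difference can be sketched in Python as follows. The class name `TriangularSVN` is hypothetical, the difference uses the usual triangular form (a1 − b3, a2 − b2, a3 − b1), and the operators T, S1 and S2, which the text does not fix, are assumed here to be T = min and S1 = S2 = max.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TriangularSVN:
    """Single valued triangular neutrosophic number ((a1, a2, a3); u, r, f)."""

    abc: Tuple[float, float, float]
    u: float  # truth degree
    r: float  # indeterminacy degree
    f: float  # falsity degree

    def _degrees(self, other: "TriangularSVN") -> Tuple[float, float, float]:
        # Assumption: T = min for truth, S1 = S2 = max for indeterminacy and falsity.
        return (min(self.u, other.u), max(self.r, other.r), max(self.f, other.f))

    def __add__(self, other: "TriangularSVN") -> "TriangularSVN":
        a1, a2, a3 = self.abc
        b1, b2, b3 = other.abc
        return TriangularSVN((a1 + b1, a2 + b2, a3 + b3), *self._degrees(other))

    def __sub__(self, other: "TriangularSVN") -> "TriangularSVN":
        a1, a2, a3 = self.abc
        b1, b2, b3 = other.abc
        # Lower bound minus upper bound, and upper bound minus lower bound.
        return TriangularSVN((a1 - b3, a2 - b2, a3 - b1), *self._degrees(other))


A = TriangularSVN((1.0, 2.0, 3.0), u=0.8, r=0.1, f=0.2)
B = TriangularSVN((0.5, 1.0, 1.5), u=0.6, r=0.3, f=0.1)
print(A + B)
print(A - B)
```

With min and max as operators, the result of either operation is never more certain than the less certain of its two operands; other t-norms and t-conorms could be substituted in `_degrees`.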

3 Extensions to T-values to evaluate linguistic summaries based on neutrosophic numbers
In this section, different extensions to the traditional T-values are proposed.
Inspired by rough set theory [31], the authors of this work propose the following equations and notation.
Let YA* be the set of objects whose membership to the neutrosophic set A (Definition 2) is greater than an
alpha-cut, where α ∈ [0,1], and YA* the set of objects whose membership to the neutrosophic set A is greater
than 0.

$$Te_{5a} = \frac{1}{1 + \left(\dfrac{R + S - 0.15\,m}{0.2\,m}\right)^{2}} \qquad (40)$$
Where m is the number of variables, R is the cardinality of the filters of the linguistic summary and S is the
cardinality of the summarizer of the linguistic summary.
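To illustrate how the two length indicators behave, the Python sketch below implements T5 from equation (10) next to Te5a, assuming that equation (40) has the generalised bell form 1 / (1 + ((R + S − 0.15m) / (0.2m))²). The value m = 20 and the cardinalities used in the example are made up.

```python
def t5_length(card_s: int) -> float:
    """Equation (10): T5 = 2 * 0.5 ** card(S); halves with each extra summarizer."""
    return 2.0 * 0.5 ** card_s


def te5a_length(card_r: int, card_s: int, m: int) -> float:
    """Equation (40), read as a bell function centred at 0.15*m with width 0.2*m.

    card_r and card_s are the cardinalities of the filter and the summarizer,
    and m is the number of variables considered in the search.
    """
    x = (card_r + card_s - 0.15 * m) / (0.2 * m)
    return 1.0 / (1.0 + x ** 2)


m = 20  # hypothetical number of variables: centre at 3 terms, width 4
for card_r, card_s in [(0, 1), (1, 2), (3, 4), (5, 6)]:
    print(card_r, card_s, t5_length(card_s), te5a_length(card_r, card_s, m))
```

Under this reading, Te5a peaks when the summary involves about 15% of the available variables and decreases smoothly on both sides, whereas T5 ignores the filter and the number of variables and decays exponentially with the size of the summarizer.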
In order to select the best summaries, the authors of this work recommend using the Pareto approach,
combining all the proposed extensions in the selection. The best summaries will be those that have the best
values with respect to the combination of the proposed T values.
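A minimal sketch of a Pareto-based selection is shown below. It assumes that every indicator is to be maximised and keeps the summaries that are not dominated by any other summary; the summary names and scores are invented for illustration and are not results from this work.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every indicator and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Names of the summaries that no other summary dominates."""
    return [
        name
        for name, own in scores.items()
        if not any(dominates(other, own) for key, other in scores.items() if key != name)
    ]


# Hypothetical summaries scored on two indicators (higher is better).
scores = {
    "O1": (0.9, 0.4),
    "O2": (0.6, 0.8),
    "O3": (0.5, 0.3),
}
print(pareto_front(scores))
```

Unlike a weighted aggregation, this selection needs no weights: O1 and O2 are both kept because each is better than the other on one indicator, while O3 is discarded because both of them dominate it.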

6 Results and discussion
To validate the extensions proposed in this work, data related to project management were used. From them,
linguistic summaries were generated to support decisions in the project management environment. This
environment is characterized by the following elements:
• There are different information systems with large amounts of heterogeneous data.
• There are different project management schools [32] that develop good practices through standards such
as PMBOK [33], ISO 21500 [34] and CMMI v1.3 [35, p. 1], but difficulties in projects persist.
• Different studies developed by the Standish Group [36] show that there are numerous difficulties in
projects associated with ICT. Approximately 52% of projects are renegotiated, while only around 33% are
successful.
• In particular, ICT projects are affected by numerous risks due to their high dependence on the
creativity and skills of their human resources.
• Among the fundamental causes of project failure are poor management and inadequacies in planning,
control and monitoring processes [37]. These causes can be mitigated if techniques are available for
knowledge discovery and for analysing historical data summarized in linguistic form. With them,
decision-makers would have easily understandable information to facilitate tasks such as decision
analysis, prediction or forecasting [38].
• Given these causes [37], it is useful to have techniques for discovering linguistic summaries that
allow the complex interrelationships between variables to be presented in natural language [39][40][41].
In this scenario, learning from the mistakes and successes contained in project history data becomes a
necessity. On the other hand, most decision makers in project management are not experts in data mining and
require understandable information for decision making.
In this context, linguistic data summarization techniques are applied as one of the descriptive knowledge
discovery techniques, with a promising and interesting approach to producing linguistic summaries

Variables | Description
--- | ---
cant_rrhh_eval_b, cant_rrhh_eval_m, cant_rrhh_eval_r | Variables to calculate the number of persons at each performance level.
time_availability, time_plan, time_real, tptp, tptr, trtr | Variables associated with the availability of time, planned time and real time dedicated by human resources.
icd, iref (quality); ie, ire, irp (control of time); irl (procurements); irha, irhe, irhf, irht, irrh (human resources) | Main indicators evaluated during the cut of the project; these indicators are associated with the control of scheduling.
eval_fuzzysystem_advanced_01 | Variable to calculate the evaluation of the project.

The summaries were generated using the AprioriUnificatorLDS algorithm [43], based on the combination of the
Apriori algorithm and fuzzy logic techniques. This algorithm generated 79 linguistic summaries, which were
evaluated by 7 experts; the preferences of these experts were considered in the final results.
In order to evaluate the summaries, each expert provides his or her preferences through a vector
X = (x_j1^i, x_j2^i, …, x_jk^i), where x_jk^i represents the preference of expert ei about summary j with
respect to criterion ck. The preferences of the experts are then aggregated using the 2-tuple computing with
words technique [44]. The criteria used to evaluate the summaries were: level of novelty, complexity,
simplicity and relevance for decision making. In this work, the authors specifically use the weighted average
operator to combine the preferences of the experts. The summaries with high relevance for the experts were:

First summary (O1): Many “projects” with (Around 50% “quantity of human resources with high competences”) or
(Around 50% “quantity of human resources with low competences”) or (High “quantity of human resources with bad
evaluation”) or (Mean “quantity of human resources with good evaluation”) or (Mean “quantity of human
resources with regular evaluation”) have Bad “Performance indicator”.

This summary indicates that human resource performance has a high influence on project evaluation. It explains
that many projects with a bad evaluation also have badly performing human resources. Organizations that
develop these projects have to improve human resource control.
About its T-values: the T-values proposed by Zadeh assign this summary a lower appropriateness than the
T-values based on neutrosophic theory. In this case, the project managers' preferences are closer to the
neutrosophic T-values than to Zadeh's T-values. The truth values among the neutrosophic T-values are also
closer to the project managers' preferences than those of Zadeh's T-values.

Second summary (O2): Around 50% of “projects” with (High “quantity of human resources with bad evaluation”)
have (Perfect “real time of real work”).

This summary has a high degree of truth and states that around 50% of the projects with a “High” quantity of
badly evaluated human resources have a “Perfect” “real time of real work”. This suggests that in these
projects there are false declarations of the real time dedicated. The Project Management Office (PMO) that
controls these projects has to improve the control of the time declared by human resources and analyze the
false declarations with project managers and team leaders.

Third summary (O3): Around 50% of “projects” with (Regular “useful performance of human resources”) or (Low
“quantity of human resources well evaluated”) or (Regular “performance of human resources”) or (Mean “time
availability”) have (Bad “efficacy”).

This summary indicates that human resources have a high influence on project quality. Low performance of
human resources or low available time frequently affects the quality of the project. In this case, the project
manager has to check the time dedicated by human resources and the quality of the project.

Fourth summary (O4): Around 50% of “projects” with (Bad “performance indicator”) or (Low “quantity of human
resources well evaluated”) or (Regular “Project evaluation”) or (Low “quantity of human resources regular
evaluated”) have (Bad “production on process of project”).

This summary shows that around 50% of the projects with a bad evaluation present difficulties with their
production in process. This situation should be attended to quickly because it could trigger conflicts with
clients in the future.

Fifth summary (O5): Around 50% of “projects” with (Mean “quantity of human resources bad evaluated”) have
(Bad “efficacy”).

This summary means that projects with badly performing human resources present serious problems in project
quality. The PMO that controls these projects has to increase quality control. In order to raise quality
levels, it should take decisions such as increasing the rewards for human resources, penalizing bad
performance or hiring new workers with better competencies.

This summary shows a high dependence between human competences and efficiency. In this case, project managers
should keep working to improve these competencies.
In order to compare the results of the two methods, the authors create three ranking lists of linguistic
summaries:
• The first ranking, called “ideal ranking”, represents the order of the summaries according to the
preferences of the project managers involved in the validation.
• The second ranking contains the order of the summaries according to the T-values from Zadeh.
• The third ranking represents the order of the summaries according to the T-values based on neutrosophic
theory proposed in this work.
The authors calculate the deviation of each of the other rankings from the “ideal ranking” using the least
squares method, see equation (41).
$$D(ideal, output) = \sum_{i=1}^{z} \left(ideal_i - output_i\right)^{2} \qquad (41)$$

Where z represents the number of summaries obtained, and ideali and outputi represent the position of summary i
in the ideal ranking and in the ranking produced by the method, respectively. The method with the lowest
deviation from the “ideal ranking” is the method with the best results. The results of the T-values described
above for the six summaries analyzed are presented in Table 3.
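The comparison can be sketched in Python as follows, reading equation (41) as a sum of squared differences between ranking positions. The three rankings below are invented for illustration and are not the values reported in Table 3.

```python
from typing import Sequence


def ranking_deviation(ideal: Sequence[int], output: Sequence[int]) -> int:
    """Equation (41): sum of squared differences between ranking positions."""
    if len(ideal) != len(output):
        raise ValueError("both rankings must cover the same summaries")
    return sum((a - b) ** 2 for a, b in zip(ideal, output))


# Position of each of six summaries in each ranking (hypothetical data).
ideal = [1, 2, 3, 4, 5, 6]
method_a = [2, 1, 3, 4, 6, 5]
method_b = [1, 2, 3, 5, 4, 6]

print(ranking_deviation(ideal, method_a))
print(ranking_deviation(ideal, method_b))
```

In this made-up case the second method swaps only one pair of summaries, so its deviation is lower and it would be considered closer to the ideal ranking.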

The results of the comparison show that the ranking obtained from the proposed T-values is closer to the
“ideal ranking” than the ranking based on Zadeh's T-values. The proposed T-values make it possible to evaluate
the indeterminacy and the falsity of the membership of objects to linguistic summaries, while Zadeh's T-values
do not allow these values to be evaluated.
The T-values based on neutrosophic theory evaluate more dimensions of the summaries and report more data
useful for selecting the relevant summaries. For example, the combination of Te1a, Te1b and Te1c (equations 31,
32 and 33 respectively) reports more information about the truth of the summary than the T1 proposed by Zadeh.
The length of the summary, T5, proposed by Zadeh does not consider the number of variables involved in the
search, while Te5 (equation 40) does. The indicator Te5 is represented by a bell function, which behaves
better than the exponential function proposed by Zadeh.

Conclusion
The use of neutrosophic theory in linguistic data summarization techniques permits the introduction of the
concept of indeterminacy into linguistic summaries and improves the interpretability of the summaries.
The incorporation of neutrosophic sets in the calculation of the T indicators gives a fairer notion of the
certainty of the objects of the summary.
The ranking of summaries obtained from the T-values based on neutrosophic theory is closer to the “ideal
ranking” than the ranking based on Zadeh's T-values.
The proposed T-values make it possible to evaluate the indeterminacy and the falsity of the membership of
objects to linguistic summaries, while Zadeh's T-values do not allow these values to be evaluated.
The T-values based on neutrosophic theory evaluate more dimensions of the summaries and report more data
useful for selecting the relevant summaries.
The incorporation of the alpha-cut value prevents objects with low influence on the summaries from entering
the calculation of the T-values.
Experts consider the neutrosophic T-values more expressive than the traditional T-values. The application of
linguistic data summarization techniques combined with neutrosophic numbers in the project management
environment reports good results and should be applied in future work.
The summaries obtained allow project managers and PMO personnel to improve their decisions. The use of
neutrosophic theory combined with linguistic data summarization techniques constitutes a new area of research.
The summaries show a high dependence between human competences and efficiency. The summaries obtained permit
the detection, in some projects, of false declarations of the "real time dedicated" indicator and make it
possible to increase the control of projects with these difficulties.
The summaries obtained help to detect projects with serious problems in the indicator “production on process”
and to issue alerts to PMO personnel about these projects.