
Abstract:

A plurality of topics encompassed in a document are determined and, for
each such topic, a sentiment for that topic is likewise determined.
Thereafter, credibility of the document is determined based on the
resulting plurality of sentiments. In one embodiment, credibility of at
least one target document is established by first determining, for each
of a plurality of portions of the at least one target document, at least
one topic encompassed in the portion to provide a plurality of target
topics. Likewise, sentiment scores are determined for each portion.
Thereafter, for each prior topic of a plurality of prior topics, a
topic-sentiment score is determined based on sentiment scores
corresponding to those portions of the plurality of portions having a
target topic corresponding to the prior topic. A credibility index is
determined based on the resulting plurality of topic-sentiment scores.

Claims:

1-28. (canceled)

29. A method comprising: identifying, by one or more devices, one or more
particular topics based on one or more particular documents; identifying,
by the one or more devices, one or more portions, of a target document,
that correspond to a particular topic of the one or more particular
topics; determining, by the one or more devices, a measure of credibility
of the target document based on the one or more portions, of the target
document, corresponding to the particular topic; and performing, by the
one or more devices, an action based on the measure of credibility of the
target document.

30. The method of claim 29, where performing the action includes:
eliminating the one or more portions of the target document so as to
increase the measure of credibility of the target document.

31. The method of claim 29, where performing the action includes:
enhancing the one or more portions of the target document so as to
increase the measure of credibility of the target document.

32. The method of claim 29, where performing the action includes:
identifying at least one particular portion, of the one or more portions,
that are negatively affecting the measure of credibility of the target
document, and performing the action on the at least one particular
portion.

33. The method of claim 29, where determining the measure of credibility
comprises: determining one or more scores based on the one or more
portions of the target document, and determining the measure of
credibility based on the one or more scores.

34. The method of claim 33, where the measure of credibility is an
average, a weighted average, or a median of the one or more scores.

35. The method of claim 29, where identifying the one or more particular
topics comprises: generating one or more topic models, and identifying
the one or more particular topics by using the one or more topic models.

36. The method of claim 29, where the target document includes a proposed
communication by an entity.

37. The method of claim 29, further comprising: providing, for display,
information regarding key words describing the one or more particular
topics and the measure of credibility of the target document.

38. An apparatus comprising: one or more processors to: identify one or
more particular topics based on one or more particular documents;
identify one or more portions, of a target document, that correspond to a
particular topic of the one or more particular topics; determine a
measure of credibility of the target document based on the one or more
portions, of the target document, corresponding to the particular topic;
and perform an action based on the measure of credibility of the target
document.

39. The apparatus of claim 38, where, when performing the action, the one
or more processors are to: identify at least one particular portion, of
the one or more portions, that are negatively affecting the measure of
credibility of the target document, and perform the action on the at
least one particular portion.

40. The apparatus of claim 38, where, when performing the action, the one
or more processors are to: enhance at least one particular portion, of
the one or more portions, based on the measure of credibility of the
target document so as to increase the measure of credibility of the
document.

41. The apparatus of claim 38, where, when performing the action, the one
or more processors are to: eliminate at least one particular portion, of
the one or more portions, based on the measure of credibility of the
target document so as to increase the measure of credibility of the
document.

42. The apparatus of claim 38, where, when identifying the one or more
portions that correspond to the particular topic, the one or more processors
are to: identify a plurality of portions of the target document based on
punctuation, paragraph boundaries, or section boundaries, and identify
the one or more portions, of the plurality of portions, that relate to
the particular topic.

43. The apparatus of claim 38, where, when determining the measure of
credibility of the target document, the one or more processors are to:
determine one or more scores based on the one or more portions of the
target document, and determine the measure of credibility based on the
one or more scores and one or more weights associated with prior topic
scores.

44. A non-transitory computer-readable medium storing instructions, the
instructions comprising: one or more instructions that, when executed by
at least one processor, cause the at least one processor to: identify one
or more particular topics based on one or more particular documents;
identify one or more portions, of a target document, that correspond to a
particular topic of the one or more particular topics; determine a
measure of credibility of the target document based on the one or more
portions, of the target document, corresponding to the particular topic;
and perform an action based on the measure of credibility of the target
document.

45. The non-transitory computer-readable medium of claim 44, where the
instructions further comprise: one or more instructions that, when
executed by the at least one processor, cause the at least one processor
to: provide, for display, one or more highlighted words that correspond
to the particular topic in the target document.

46. The non-transitory computer-readable medium of claim 44, where the
instructions further comprise: one or more instructions that, when
executed by the at least one processor, cause the at least one processor
to: provide, for display, highlighted text within the one or more
portions of the target document that negatively affect the measure of
credibility.

47. The non-transitory computer-readable medium of claim 44, where the
one or more instructions to perform the action include: one or more
instructions that, when executed by the at least one processor, cause the
at least one processor to: eliminate the one or more portions of the
target document so as to increase the measure of credibility of the
target document.

48. The non-transitory computer-readable medium of claim 44, where the
one or more instructions to perform the action include: one or more
instructions that, when executed by the at least one processor, cause the
at least one processor to: enhance the one or more portions of the target
document so as to increase the measure of credibility of the target
document.

Description:

FIELD

[0001] The instant disclosure relates generally to the automated analysis
of text and, in particular, to the determination of credibility of one or
more documents.

BACKGROUND

[0002] Generally, "credibility" is defined as the quality, capability or
power to elicit belief or trust. To the extent that credibility is thus
necessarily dependent upon the subjective determinations of others, the
process of determining the credibility of someone or something (referred
to hereinafter as an entity) is likewise often a highly subjective
process. Additionally, the effort required to accurately assess
credibility is typically significant to the extent that it requires
gathering data from a relatively large number of people knowledgeable
about the entity in question.

[0003] The relatively recent development of the Internet and World Wide
Web has led to a commensurate explosion in the availability of textual
documents authored by entities of every conceivable type. Given the
ubiquity and relative ease of accessing such text, interest in techniques
(which techniques typically fall within the general categories of natural
language processing and/or machine learning) for automatically processing
documents in order to "understand" what information they may expressly or
inherently convey has increased. Only recently have developers of such
techniques turned to the task of assessing credibility of a document. As
used herein, a document may comprise a distinct, uniquely identified
collection of text, such as a word processing document, advertising copy,
a web page, a web log entry, etc. or portions thereof.

[0004] For example, techniques have been developed for assessing the
credibility of a document in which the importance or credibility of a
document is determined based at least in part upon the credibility of its
source, e.g., its author or publisher. Obviously, for such techniques to
work, data concerning the reliability of the source must be available or,
at the very least, readily obtainable, which may not always be the case.
Additionally, given the myriad influences that go into the development of
a source's reputation for credibility, it is not unreasonable to assume
that a source's credibility won't always correlate precisely with the
credibility of the document.

[0005] In another technique, the credibility of a topic or concept over
time is determined by comparing the frequency with which an expression of
that topic or concept is detected in a corpus of documents against the
frequency with which a related expression of that topic or concept (e.g.,
a negative or inverse expression of the topic or concept in question) is
detected in the documents. The intuition in this technique is that the
frequency with which a concept is repeated may serve as a form of proxy
for its credibility. For example, over time, the expression "global
warming is real" may occur with increasing frequency as compared to the
related expression "global warming is a hoax," with the resulting
inference that the concept of "global warming is real" is becoming
increasingly credible. However, this technique may likewise suffer from
accuracy problems to the extent that text, particularly in the context of
the Internet and/or World Wide Web, is often reproduced for reasons other
than a subjective belief or trust in its semantic content. As a result,
the frequency numbers could be easily skewed, thus resulting in an
equally skewed credibility determination.

SUMMARY

[0006] The instant disclosure describes techniques for determining the
credibility of a document based on sentiments corresponding to topics
encompassed in the document. In an embodiment, a plurality of topics
encompassed in a document are determined and, for each such topic, a
sentiment for that topic is likewise determined. Thereafter, credibility
of the document is determined based on the resulting plurality of
sentiments. For example, the credibility may be based on a combination of
the plurality of sentiments, such as an average where the sentiments are
expressed as numerical scores. Based on the credibility thus determined,
the document may be revised. Topic and sentiment determinations may be
performed using respective topic and sentiment models. Additionally,
information regarding any of the plurality of topics, the plurality of
sentiments or the credibility of the document may be displayed. Because
credibility is determined based on sentiments corresponding to topics
described in the document itself, the accuracy of the credibility
determination may be improved.

[0007] In an embodiment, credibility of at least one target document is
established by first determining, for each of a plurality of portions of
the at least one target document, at least one topic encompassed in the
portion to provide a plurality of target topics. Likewise, sentiment
scores are determined for each portion. In an embodiment, each portion
may comprise an individual sentence within the at least one target
document. Thereafter, for each prior topic of a plurality of prior
topics, a topic-sentiment score is determined based on sentiment scores
corresponding to those portions of the plurality of portions having a
target topic corresponding to the prior topic. A credibility index is
determined based on the resulting plurality of topic-sentiment scores. In
an embodiment, the plurality of prior topics and corresponding prior
topic scores may be determined based on analysis of prior documents. In
this case, the determination of the credibility index may be carried out
as a weighted average of the topic-sentiment scores in which ones of the
prior topic scores are used as the weights. Related apparatus are
likewise disclosed. Using the techniques described herein, the accuracy
of the credibility techniques may be improved in that the credibility
determination is focused on the topics found in the document to be
analyzed.
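
The weighted-average form of the credibility index described above can be written compactly. The symbols are notation introduced here for illustration only and do not appear in the disclosure: K is the number of prior topics, w_k is the prior topic score of prior topic k, P_k is the set of portions whose target topic corresponds to prior topic k, sigma_p is the sentiment score of portion p, s_k is the resulting topic-sentiment score, and C is the credibility index. Normalizing by the sum of the weights is an assumption consistent with the usual definition of a weighted average.

```latex
s_k = \frac{1}{|P_k|} \sum_{p \in P_k} \sigma_p ,
\qquad
C = \frac{\sum_{k=1}^{K} w_k \, s_k}{\sum_{k=1}^{K} w_k}
```

When all weights w_k are equal, C reduces to the plain average of the topic-sentiment scores.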

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] The features described in this disclosure are set forth with
particularity in the appended claims. These features will become apparent
from consideration of the following detailed description, taken in
conjunction with the accompanying drawings. One or more embodiments are
now described, by way of example only, with reference to the accompanying
drawings wherein like reference numerals represent like elements and in
which:

[0009] FIG. 1 is a block diagram of a processing device that may be used
to implement various embodiments described herein;

[0010] FIG. 2 is a block diagram of an apparatus for determining
credibility of a document in accordance with an embodiment described
herein;

[0011] FIG. 3 is a flow chart illustrating processing for determining
credibility of a document in accordance with the embodiment of FIG. 2;

[0012] FIG. 4 is a block diagram of an apparatus for determining
credibility of a document in accordance with an alternative embodiment
described herein;

[0013] FIG. 5 is a flow chart illustrating processing for determining
credibility of a document in accordance with the embodiment of FIG. 4;
and

[0014] FIGS. 6-8 illustrate examples of a graphical user interface that
may be used in conjunction with the various embodiments described herein.

DETAILED DESCRIPTION OF THE PRESENT EMBODIMENTS

[0015] FIG. 1 illustrates a representative processing device 100 that may
be used to implement the teachings of the instant disclosure. The device
100 may be used to implement, for example, one or more components of the
apparatus 200, 400 as described in greater detail below. Regardless, the
device 100 comprises a processor 102 coupled to a storage component 104.
The storage component 104, in turn, comprises stored executable
instructions 116 and data 118. In an embodiment, the processor 102 may
comprise one or more of a microprocessor, microcontroller, digital signal
processor, co-processor or the like or combinations thereof capable of
executing the stored instructions 116 and operating upon the stored data
118. Likewise, the storage component 104 may comprise one or more devices
such as volatile or nonvolatile memory including but not limited to
random access memory (RAM) or read only memory (ROM). Further still, the
storage component 104 may be embodied in a variety of forms, such as a
hard drive, optical disc drive, floppy disc drive, etc. Processor and
storage arrangements of the types illustrated in FIG. 1 are well known to
those having ordinary skill in the art. In one embodiment, the processing
techniques described herein are implemented as a combination of
executable instructions and data within the storage component 104.

[0016] As shown, the device 100 may comprise one or more user input
devices 106, a display 108, a peripheral interface 110, other output
devices 112 and a network interface 114 in communication with the
processor 102. The user input device 106 may comprise any mechanism for
providing user input (such as, but not limited to, user inputs for
selecting topic-sentiment scores, sentiment scores used to determine a
topic-sentiment score, etc. as described below) to the processor 102. For
example, the user input device 106 may comprise a keyboard, a mouse, a
touch screen, microphone and suitable voice recognition application or
any other means whereby a user of the device 100 may provide input data
to the processor 102. The display 108 may comprise any conventional
display mechanism such as a cathode ray tube (CRT), flat panel display,
or any other display mechanism known to those having ordinary skill in
the art. In an embodiment, the display 108, in conjunction with suitable
stored instructions 116, may be used to implement a graphical user
interface. Implementation of a graphical user interface in this manner is
well known to those having ordinary skill in the art. The peripheral
interface 110 may include the hardware, firmware and/or software
necessary for communication with various peripheral devices, such as
media drives (e.g., magnetic disk or optical disk drives), other
processing devices or any other input source used in connection with the
instant techniques. Likewise, the other output device(s) 112 may
optionally comprise similar media drive mechanisms, other processing
devices or other output destinations capable of providing information to
a user of the device 100, such as speakers, LEDs, tactile outputs, etc.
Finally, the network interface 114 may comprise hardware, firmware and/or
software that allows the processor 102 to communicate with other devices
via wired or wireless networks, whether local or wide area, private or
public, as known in the art. For example, such networks may include the
World Wide Web or Internet, or private enterprise networks, as known in
the art.

[0017] While the device 100 has been described as one form for
implementing the techniques described herein, those having ordinary skill
in the art will appreciate that other, functionally equivalent techniques
may be employed. For example, as known in the art, some or all of the
functionality implemented via executable instructions may also be
implemented using firmware and/or hardware devices such as application
specific integrated circuits (ASICs), programmable logic arrays, state
machines, etc. Furthermore, other implementations of the device 100 may
include a greater or lesser number of components than those illustrated.
Once again, those of ordinary skill in the art will appreciate the wide
number of variations that may be used in this manner. Further still,
although a single processing device 100 is illustrated in FIG. 1, it is
understood that a combination of such processing devices may be
configured to operate in conjunction (for example, using known networking
techniques) to implement the teachings of the instant disclosure.

[0018] Referring now to FIG. 2, an embodiment of an apparatus 200 for
determining credibility of a document is illustrated. As noted above, the
various components 202-206 forming the apparatus 200 may be implemented
by the processing device 100 using executable instructions 116 to
implement the functionality described herein. Of course, as further noted
above, some or all of the apparatus 200 may also be implemented using
firmware and/or hardware devices as a matter of design choice. As shown,
the apparatus 200 comprises a topic determination component 202
operatively connected to a sentiment determination component 204 that, in
turn, is operatively connected to a credibility determination component
206 that provides, as output, a credibility determination 208 of a
document 220. As noted previously, the document 220 may comprise text in
any of a number of formats and representations; the instant disclosure is
not limited in this regard. Furthermore, although a single document is
illustrated in FIG. 2, those having ordinary skill in the art will
appreciate that the apparatus 200 may be equally applied to a plurality
of documents.

[0019] Regardless, the topic determination component 202 analyzes the
document 220 to identify one or more topics therein. In particular, the
topic determination component 202 may implement any of a number of
well-known generative modeling techniques, such as latent Dirichlet
allocation (LDA), probabilistic latent semantic analysis (PLSA) or
various extensions thereof, to generate one or more topic models. Using
such topic models, it is possible to discover or determine the existence
of one or more topics in a given document and, furthermore, assign a
score to a given topic, i.e., a probabilistic assessment that the
document encompasses that specific topic. As used herein, a topic is a
set of semantically coherent words or phrases within one or more
documents. The resulting plurality of topics thus determined by the topic
determination component 202 may then be provided to the sentiment
determination component 204, as shown.

[0020] The sentiment determination component 204 determines sentiments for
each of the plurality of topics to provide a plurality of sentiments. As
used herein, a sentiment may be a set of words or phrases representing an
opinion about a topic. To this end, the sentiment determination component
204 may apply any of a number of well-known sentiment analysis techniques
to those portions of the document corresponding to a specific topic. For
example, an overall sentiment for a given document may be determined and
then applied to each of the topics identified therein. A more granular
approach may be employed in which those portions of a document containing
words especially related to a given topic are analyzed for a
corresponding sentiment. For example, lists of words attributable to
specific sentiments (e.g., "bad", "horrible", "pathetic" for a negative
sentiment; "O.K.", "adequate", "indifferent" for a neutral sentiment; and
"great", "pleased", "wonderful" for a positive sentiment) may be
maintained and used to score the sentiment for the topic-specific
portions as noted above.
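
The word-list approach described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the nine words are the examples given in the text, while the numeric values (-1, 0, +1), the function name sentiment_score, and the choice to average matched words are assumptions made for the sketch.

```python
# Hypothetical lexicon: the words come from the examples in the text;
# the numeric values -1 / 0 / +1 are an assumption for this sketch.
LEXICON = {
    "bad": -1, "horrible": -1, "pathetic": -1,
    "o.k.": 0, "adequate": 0, "indifferent": 0,
    "great": 1, "pleased": 1, "wonderful": 1,
}


def sentiment_score(text):
    """Return the mean lexicon value of matched words, or None if none match."""
    values = []
    for raw in text.lower().split():
        # Strip surrounding punctuation but keep periods, so "o.k." survives.
        word = raw.strip(",;:!?\"'()")
        if word not in LEXICON:
            # Drop a sentence-final period, e.g. "horrible." -> "horrible".
            word = word.rstrip(".")
        if word in LEXICON:
            values.append(LEXICON[word])
    return sum(values) / len(values) if values else None


print(sentiment_score("The seats are great, but the radio is horrible."))
```

Returning None when no lexicon word is found keeps "no opinion detected" distinct from a neutral score of zero; a caller may prefer to treat the two alike.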

[0021] The credibility determination component 206 operates to determine
the credibility 208 of the document 220 based on the plurality of
sentiments provided by the sentiment determination component 204. In an
embodiment, this may be accomplished by combining the various sentiments
determined for each of the detected topics in some fashion. For example,
in one embodiment, the credibility of the document may be provided as the
average of the sentiment scores across at least some of the detected
topics, assuming in this case numerical sentiment scores. Other
techniques for determining credibility based on the topics and
corresponding sentiments may equally be employed, e.g., a weighted average,
selection of the median sentiment score, discarding outlier scores prior
to averaging, etc. In this way, the effect of the document to inspire
belief or trust is modeled according to the sentiments attributable to
the various topics contained within the document itself, rather than an
extrinsic stand-in factor, such as the source of the document.
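
The combining options named above (average, weighted average, median, and discarding outliers before averaging) can be sketched in a few lines. The function name credibility, its parameters, and the rule of discarding a fixed number of the lowest and highest scores are assumptions for illustration; the text does not specify how outliers are identified.

```python
from statistics import mean, median


def credibility(scores, method="mean", weights=None, trim=0):
    """Combine numerical per-topic sentiment scores into one credibility value."""
    if not scores:
        raise ValueError("at least one sentiment score is required")
    if method == "mean":
        return mean(scores)
    if method == "median":
        return median(scores)
    if method == "weighted":
        if weights is None or len(weights) != len(scores):
            raise ValueError("weights must match scores in length")
        total = sum(weights)
        if total == 0:
            raise ValueError("weights must not sum to zero")
        return sum(w * s for w, s in zip(weights, scores)) / total
    if method == "trimmed":
        # Assumed outlier rule: drop the `trim` lowest and `trim` highest scores.
        ordered = sorted(scores)
        kept = ordered[trim:len(ordered) - trim]
        if not kept:
            raise ValueError("trim removes every score")
        return mean(kept)
    raise ValueError(f"unknown method: {method}")


topic_sentiments = [0.8, 0.6, -0.4, 0.2]  # made-up scores for four topics
print(credibility(topic_sentiments))
print(credibility(topic_sentiments, "median"))
```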

[0022] In an alternative embodiment, the tasks of determining the
plurality of topics and corresponding plurality of sentiments may be
combined into a single component implementing a joint topic-sentiment
determination technique. Various joint topic-sentiment modeling
techniques based on the LDA and PLSA generative modeling techniques may
be employed for this purpose. For example, an extension of PLSA may be
found in Mei et al., "Topic Sentiment Mixture: Modeling Facets and
Opinions in Weblogs", Proceedings of the 16th International Conference on
World Wide Web (WWW 2007), p. 171-189, the teachings of which are
incorporated herein by this reference. Another example of a suitable
extension, in this case of the LDA technique, is described by C. Lin and
Y. He, "Joint sentiment/topic model for sentiment analysis", Proceedings
of the ACM international conference on Information and knowledge
management (CIKM) 2009, the teachings of which are incorporated herein by
this reference. Using such techniques, topic discovery is performed
simultaneously with sentiment determinations for the discovered topics.
Regardless, the resulting topic and sentiment determinations may then be
used to determine the credibility of the document as noted above.

[0023] Referring now to FIG. 3, a flowchart illustrating the processing
implemented by the apparatus 200 is provided. Thus, beginning at block
302, a plurality of topics are detected within a document and, at block
304, a sentiment for each topic is likewise determined. Thereafter, at
block 306, a credibility of the document is determined based on the
plurality of sentiments. In an embodiment, the document being analyzed
may comprise a proposed communication by an entity. For example, a
company may be proposing to issue a press release about its most recent
addition to its product lineup. By first assessing the credibility of the
proposed communication in accordance with the instant techniques, the
potential need for changes to improve its credibility may be determined.
This is optionally illustrated at block 308 where revisions to the
document (i.e., the proposed communication) are made based on the
credibility determination. In an embodiment described below, it is
possible to identify specific portions of a proposed communication
affecting the credibility determination one way or another. By
eliminating or enhancing the relevant portions of the document, the
perceived credibility of the document may be improved.

[0024] As a further optional step, illustrated at block 310, information
regarding any of the plurality of topics, the plurality of corresponding
sentiments and/or the resulting credibility may be displayed to a user.
For example, and with reference to FIG. 1, such information could be
displayed via the display 108. Thus, in an embodiment, key words
describing each of the identified topics could be displayed. Likewise,
scores for each of the topics and/or corresponding sentiments or, further
still, the determined credibility could also be displayed. More detailed
examples of a suitable graphical user interface for this purpose are
further described below relative to FIGS. 6-8.

[0025] Referring now to FIG. 4, an embodiment of an alternative apparatus
400 in accordance with the instant disclosure is described. As noted
above, the various components 402-412 forming the apparatus 400 may be
implemented by the processing device 100 using executable instructions
116 to implement the functionality described herein. Of course, as
further noted above, some or all of the apparatus 400 may also be
implemented using firmware and/or hardware devices as a matter of design
choice. The implementation illustrated in FIG. 4 (the operation of which
is further described with reference to FIG. 5) is particularly suitable
for assessing the credibility of an entity (e.g., a product and/or
manufacturer thereof) by analysis of various documents relating thereto.

[0026] As shown, the apparatus 400 comprises a topic scoring component 402
and one or more topic models 404 that operate in conjunction therewith.
Likewise, a sentiment scoring component 406 and one or more sentiment
models 408 that operate in conjunction therewith are also provided. Both
the topic scoring component 402 and the sentiment scoring component 406
are operatively connected to a topic-sentiment scoring component 410
that, in turn, is operatively connected to a credibility determination
component 412. In this embodiment, input to the apparatus 400 is provided
in the form of one or more target documents 420 for which a credibility
determination is to be made. In a further embodiment, as described in
more detail below, the credibility determination 414 made by the
apparatus is based on prior topics discovered in one or more prior
documents 430.

[0027] The topic scoring component 402 uses the one or more topic models
404 to analyze the documents 420, 430 to discover topics encompassed in
one or more documents. More particularly, as described in further detail
below, the topic scoring component 402 analyzes portions of the
document(s) to discover one or more topics within each portion.
Similarly, the sentiment scoring component 406 analyzes each of the
portions to determine a sentiment score associated with that portion. In
turn, the topic-sentiment scoring component 410 then determines a
topic-sentiment score based on the sentiment scores associated with those
portions having a detected topic corresponding to a prior topic. Finally,
the credibility determination component 412 determines a credibility
index 414 based on at least some of the topic-sentiment scores for each
of the prior topics. The prior topics, in turn, may be ascertained based
on analysis of the prior documents by the topic scoring component 402, or
may be provided separately in advance. A more detailed explanation of the
apparatus of FIG. 4 is provided with further reference to FIG. 5.

[0028] Referring now to FIG. 5, operation of the apparatus 400 is
described in further detail. Beginning at block 502, a plurality of prior
topics and prior topic scores may be optionally determined based on topic
model analysis (via the topic scoring component 402) of the one or more
prior documents 430. As noted above, the prior topic scores may express
the degree of certainty that a given prior topic is expressed in the
prior document(s) 430. For example, in the case of a product
manufacturer, the prior document(s) 430 may comprise press releases or
the like concerning the product or related products. To further
illustrate this, assume that the product in question is an automobile. In
this case, the prior topics may include "Reliability", "Comfort",
"Exterior", "Interior", "Performance", "Fuel Rating", etc. Alternatively,
rather than discovering the plurality of prior topics through analysis of
the prior documents 430, the prior topics could simply be provided
separately. For example, a subject matter expert or the like may
determine what the relevant prior topics and corresponding prior topic
scores are.

[0029] At block 504, a target topic is determined (via the topic scoring
component 402) for each portion of a plurality of portions of the at
least one target document 420, thereby providing a plurality of target
topics. In an embodiment, the at least one target document 420 may
comprise review text or other web-based data that is authored by someone
other than the entity for whom credibility is being determined. For
example, the at least one target document 420 may comprise review text or
other web-based text. Techniques for obtaining such target documents 420,
particularly via the Internet and/or World Wide Web, are well known in
the art, for example, via the use of one or more web crawlers (sometimes
also referred to as web robots or "bots") programmed to visit websites of
relevant entities and extract (copy) the desired text.

[0030] Furthermore, in an embodiment, each of the plurality of portions
comprises an individual sentence within the at least one target document.
Techniques for identifying individual sentences, e.g., through the use of
punctuation detection, are well-known in the art. It is anticipated,
however, that portions may be equally identified according to any other
desired criteria. For example, each portion may be delimited according to
paragraph or section boundaries, or even as separate documents (in those
instances where each target document comprises a relatively small
quantity of text). Once again, the instant disclosure is not limited in
this regard.
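
By way of non-limiting illustration, sentence-level portions might be
identified through punctuation detection roughly as sketched below. The
function name split_into_portions and the regular expression are
assumptions made for this example only; the naive rule shown would
mis-split abbreviations such as "e.g." and is not the technique of any
particular embodiment.

```python
import re


def split_into_portions(text: str) -> list[str]:
    """Split text into sentence-like portions (hypothetical helper).

    Assumption: a portion ends at '.', '!' or '?' followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings, which arise when the input itself is empty.
    return [p for p in parts if p]


review = "The car is reliable. The interior feels cheap! Would I buy it again?"
portions = split_into_portions(review)
```

Delimiting by paragraph instead would only require a different splitting
rule (for example, splitting on blank lines).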

[0031] Referring once again to FIG. 5, processing continues at block 506
where, for each of the plurality of portions, a sentiment score is
determined via the sentiment scoring component 406. Again, where each
portion comprises an individual sentence, then a sentiment for each
sentence is determined. It is noted that the topics and sentiments
detected for each portion at blocks 504 and 506 may be associated with
those portions in the form, for example, of metadata stored along with
the actual text itself.
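
One possible way of associating the detected topic and sentiment with
each portion as metadata is sketched below. The Portion class, the
annotate function and the two stand-in scoring callables are hypothetical
and introduced only for illustration; they do not represent the topic
scoring component 402 or the sentiment scoring component 406 themselves.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Portion:
    """A portion of a target document plus its metadata (hypothetical)."""
    text: str
    metadata: dict = field(default_factory=dict)


def annotate(texts: list[str],
             topic_of: Callable[[str], str],
             sentiment_of: Callable[[str], float]) -> list[Portion]:
    # Store the topic and sentiment alongside the actual text itself.
    return [
        Portion(t, {"topic": topic_of(t), "sentiment": sentiment_of(t)})
        for t in texts
    ]


# Stand-in scorers (assumptions): a keyword rule and a fixed score.
annotated = annotate(
    ["The engine never fails."],
    topic_of=lambda t: "Reliability" if "engine" in t else "Other",
    sentiment_of=lambda t: 1.0,
)
```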

[0032] Thereafter, at block 508, for each of the plurality of prior topics
(regardless of how those prior topics were ascertained), a topic-sentiment
score is determined (by the topic-sentiment scoring component 410) based
on the sentiment scores for those portions of the plurality of portions
having a target topic corresponding to the prior topic. For example, and
with further reference to the automobile example noted above, assume a
first portion has associated therewith the "Reliability" topic, a second
portion has associated therewith the "Interior" topic and a third portion
has associated therewith the "Fuel Rating" topic. In that instance where
the prior topic is "Interior", only the sentiment for the second portion
(and any other portions likewise having a target topic of "Interior")
will be considered when developing a topic-sentiment score for that prior
topic. In an embodiment, each topic-sentiment score is determined as an
average of the sentiment scores associated with the qualifying (i.e.,
matching) portions. However, it is understood that techniques other than
averaging (e.g., weighted averages, selecting a median value, discarding
outlier values prior to averaging, etc.) may be equally employed for this
purpose.
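
The averaging described above may be sketched as follows. The numeric
sentiment scale (here, -1.0 to 1.0), the dictionary layout of each portion
and the function name topic_sentiment_scores are assumptions made for this
example; prior topics having no matching portion are simply omitted from
the result.

```python
from collections import defaultdict
from statistics import mean


def topic_sentiment_scores(portions: list[dict],
                           prior_topics: list[str]) -> dict[str, float]:
    """Average the sentiment of the portions matching each prior topic."""
    by_topic = defaultdict(list)
    for p in portions:
        by_topic[p["topic"]].append(p["sentiment"])
    # Only portions whose target topic matches a prior topic contribute.
    return {t: mean(by_topic[t]) for t in prior_topics if t in by_topic}


portions = [
    {"topic": "Reliability", "sentiment": 1.0},
    {"topic": "Interior", "sentiment": -1.0},
    {"topic": "Interior", "sentiment": 0.0},
    {"topic": "Fuel Rating", "sentiment": 1.0},
]
scores = topic_sentiment_scores(portions, ["Reliability", "Interior", "Comfort"])
```

Substituting statistics.median for mean would give the median-based
alternative mentioned above.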

[0033] Having determined the various topic-sentiment scores, processing
continues at block 510 where the credibility index 414 is determined (by
the credibility determination component 412) based on at least some of
the plurality of topic-sentiment scores. In an embodiment, this is done
by calculating an average or a weighted average of the topic-sentiment
scores in which the prior topic scores are used as the weights. In this
manner, the credibility index 414 is most heavily influenced by those
prior topics having the greatest likelihood of being encompassed by the
prior document(s), i.e., that have been most frequently discussed
previously. It is once again noted, however, that the weighted average
used to determine the credibility index 414 is not the only technique
that may be employed for this purpose; other techniques (e.g., selecting
the median valued topic-sentiment score, discarding outliers prior to
averaging/weighted averaging, etc.) may serve equally well depending on
the desired application.
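
A minimal sketch of the weighted-average variant follows, in which the
prior topic scores serve as the weights. The function name
credibility_index, the example weights and the choice to normalize only
over topics that actually have a topic-sentiment score are assumptions of
this example rather than requirements of the described technique.

```python
def credibility_index(topic_sentiment: dict[str, float],
                      prior_topic_scores: dict[str, float]) -> float:
    """Weighted average of topic-sentiment scores (hypothetical helper)."""
    weighted_sum = 0.0
    total_weight = 0.0
    for topic, score in topic_sentiment.items():
        # A topic without a prior topic score gets zero weight (assumption).
        weight = prior_topic_scores.get(topic, 0.0)
        weighted_sum += weight * score
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no topic-sentiment score has a non-zero weight")
    return weighted_sum / total_weight


index = credibility_index(
    {"Reliability": 1.0, "Interior": -0.5},
    {"Reliability": 0.75, "Interior": 0.25},
)
```

With these illustrative values the index is (0.75 x 1.0 + 0.25 x -0.5) /
1.0 = 0.625, so the more heavily weighted "Reliability" topic dominates.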

[0034] As noted above, the information regarding the topics, sentiments
and credibility determinations may be displayed to a user. In FIG. 5,
this is illustrated by the optional processing according to blocks
512-520, and as further illustrated by the representative graphical user
interfaces shown in FIGS. 6-8. Thus, at block 512, the plurality of
topic-sentiment scores may be displayed on a display. An example of this
is shown in FIG. 6, which illustrates a graphical user interface 600
comprising a topic-sentiment score region 602, a text display region 604
and a details display region 606. In the illustrated example, the text
display region 604 includes a planned communication (prior document) in
the form of a press release. Based on the topic discovery processing
noted above, a plurality of topics 608 found in the document are also
shown and highlighted (via the underlined words in the text display
region 604). In an embodiment, each topic 608 may be displayed according
to a specific color or other unique highlighting feature, and the
corresponding text in the text display region 604 may be similarly
highlighted by color, etc.

[0035] For each detected topic, a plurality of topic-sentiment scores are
displayed in the topic-sentiment score region 602. In the illustrated
example, the topic-sentiment scores are grouped according to their
corresponding topic (e.g., "Fun-To-Drive", "Performance", "Reliability",
etc.) and further displayed as data points in a timeline graph.
Furthermore, for comparison purposes, each topic is further divided into
sub-groups according to any desired data dimension. In this illustrated
example, each topic is divided according to two "automobile brands"
(shown as "Toyota" and "Nissan") as differentiated by the circular and
square data points. Of course, those having ordinary skill in the art
will appreciate that more than two different data dimension values may be
compared in each graph, and that well-known data mining techniques may be
employed to define other data dimensions that may be suitably employed
for this purpose. For example, in the automotive example illustrated in
FIG. 6, each topic could be further sub-divided according to two or more
specific models of cars, by regions, consumer age groups, etc. Once
again, techniques for permitting a user to select which particular data
dimension they would like to use for comparison purposes are well-known
in the art. Within each timeline graph, the topic-sentiment scores
corresponding to each specified automobile brand are separated by year,
thereby permitting ready comparison of the per topic sentiments meeting
the selected display criteria.
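
The grouping behind such timeline graphs might be computed as sketched
below, with one series per topic and data dimension value, separated by
year. The record layout, the use of a plain average within each year and
the function name timeline_series are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def timeline_series(records: list[dict]) -> dict:
    """Average sentiment per (topic, brand) series, keyed by year."""
    grouped = defaultdict(list)
    for r in records:
        grouped[(r["topic"], r["brand"], r["year"])].append(r["sentiment"])
    series = defaultdict(dict)
    for (topic, brand, year), values in grouped.items():
        series[(topic, brand)][year] = mean(values)
    # Sort each series by year so it can be plotted left to right.
    return {key: dict(sorted(pts.items())) for key, pts in series.items()}


records = [
    {"topic": "Reliability", "brand": "Toyota", "year": 2007, "sentiment": 1.0},
    {"topic": "Reliability", "brand": "Toyota", "year": 2007, "sentiment": 0.0},
    {"topic": "Reliability", "brand": "Toyota", "year": 2006, "sentiment": 1.0},
    {"topic": "Reliability", "brand": "Nissan", "year": 2007, "sentiment": -1.0},
]
series = timeline_series(records)
```

Comparing by model, region or consumer age group instead would only
require replacing the "brand" key with the selected data dimension.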

[0036] Referring once again to FIG. 5, processing may continue at block
514 where it is determined whether a user input selection of specific
topic-sentiment scores has been provided (e.g., received by the processor
102 via, for example, a user input device 106, such as a mouse-cursor
combination). If so, processing continues at block 516 where at least one
sentiment score underlying the selected topic-sentiment scores is
displayed. This is illustrated in FIG. 7 where the graphical user
interface 600 has been updated to reflect the selection of the
topic-sentiment scores for the year 2007 under the "Reliability" topic.
In the illustrated example, the selected topic-sentiment scores are
highlighted and related information 702, 704 regarding the sentiment
scores for each topic-sentiment score is shown in overlay form. That is,
for each of the two topic-sentiment scores, the corresponding year and
actual topic-sentiment score are displayed, along with a breakdown of the
number of portions (e.g., sentences) found in the at least one target
document for each sentiment. The breakdown of portions by sentiment is
further illustrated in the details display region 606 in the form of a
bar chart 706. In the illustrated example, the breakdown corresponds to
the topic-sentiment score for the "Toyota" sub-group within the
"Reliability" topic. In this manner, the user is able to better gauge the
relative confidence for the selected topic-sentiment score. In addition
to the bar chart 706, selection buttons 708 are provided that permit the
user to further explore the data underlying the individual sentiment
scores.
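
The breakdown of portions by sentiment underlying a selected
topic-sentiment score may be tallied as sketched below. The sentiment
labels ("positive", "neutral", "negative"), the record layout and the
function name sentiment_breakdown are assumptions made for this example.

```python
from collections import Counter


def sentiment_breakdown(portions: list[dict], topic: str,
                        brand: str, year: int) -> Counter:
    """Count portions per sentiment label for one selected data point."""
    return Counter(
        p["label"]
        for p in portions
        if p["topic"] == topic and p["brand"] == brand and p["year"] == year
    )


portions = [
    {"topic": "Reliability", "brand": "Toyota", "year": 2007, "label": "positive"},
    {"topic": "Reliability", "brand": "Toyota", "year": 2007, "label": "positive"},
    {"topic": "Reliability", "brand": "Toyota", "year": 2007, "label": "negative"},
    {"topic": "Reliability", "brand": "Nissan", "year": 2007, "label": "neutral"},
]
breakdown = sentiment_breakdown(portions, "Reliability", "Toyota", 2007)
```

The resulting counts could feed a bar chart, and a larger number of
underlying portions generally supports greater confidence in the selected
score.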

[0037] Referring back once again to FIG. 5, the ability to see the
portions underlying the sentiment scores is reflected in block 518 where
further user selection input may be provided such that processing
continues at block 520 where those portions of the plurality of portions
corresponding to a selected sentiment score are displayed. This is
reflected in FIG. 8 where selection of either the "Previous" or "Next"
selection button 708 causes a portion 802 of the plurality of portions to
be displayed in the details display region 606. In the illustrated
embodiment, the text within the portion giving rise to the sentiment can
be highlighted (as shown by the underlined text) thereby permitting the
user to drill all the way down to the specific words used in the target
text. Once again, such detailed views are possible because the topic and
sentiment metadata is stored along with each portion analyzed as
described above.

[0038] While particular preferred embodiments have been shown and
described, those skilled in the art will appreciate that changes and
modifications may be made without departing from the instant teachings.
It is therefore contemplated that any and all modifications, variations
or equivalents of the above-described teachings fall within the scope of
the basic underlying principles disclosed above and claimed herein.

[0039] In particular, specific examples of the types of communications and
documents that may be used in conjunction with the disclosed techniques
have been described above. However, it is understood that the disclosed
techniques may be applied to a wide variety of different types of
communications and documents. For example, the communications and/or
documents may be centered around a specific business function such as, by
way of non-limiting example, "human resources", "public relations",
"recruiting", etc.; that is, the topics may be specialized to the desired business
function. In a similar vein, the topics may be selected according to a
specialized audience type (e.g., employees, partners, supplier vendors,
etc.) or geography (e.g., domestic, international, rural, metropolitan,
etc.). Further specialized adaptations leveraging the techniques
described herein may be readily devised.