
Abstract:

Social media content items are mapped to relevant time-based media events.
These mappings may be used as the basis for multiple applications, such
as ranking of search results for time-based media, automatic
recommendations for time-based media, prediction of audience interest for
media purchasing/planning, and estimating social interest in the
time-based media. Social interest in time-based media (e.g., video and
audio streams and recordings) segments is estimated through a process of
data ingestion and integration. The estimation process determines social
interest in specific events represented as segments in time-based media,
such as particular plays in a sporting event, scenes in a television
show, or advertisements in an advertising block. The resulting estimates
of social interest also can be graphically displayed.

Claims:

1. A computer-executed method for displaying social interest in time-based
media events, the method comprising:
selecting a series of events occurring in a time-based medium, each event
associated with one of a series of chronological time segments in the
time-based medium;
for each event associated with one of the time segments, determining a
level of social interest in the event from a plurality of social media
content items that are relevant to the event;
for each of the time segments, determining a level of social interest in
the time segment based upon an aggregate level of social interest in the
events aligned with the time segment; and
graphically displaying the level of social interest in each time segment.

2. The computer-executed method of claim 1, wherein graphically displaying
each time segment comprises displaying different event types such that
the different event types each are visually distinct.

3. The computer-executed method of claim 1, further comprising filtering
the time segments according to a search term, wherein the graphically
displaying displays only a subset of the series of chronological time
segments corresponding to the search term.

4. The computer-executed method of claim 1, wherein graphically displaying
each time segment comprises:
determining a first portion of the segment corresponding to a positive
sentiment;
determining a second portion of the segment corresponding to a negative
sentiment; and
displaying the segment with the first and second portions of the segment
visually distinguished from each other.

5. The computer-executed method of claim 1, further comprising determining
a confidence score indicative of the probability that the plurality of
candidate social media content items are relevant to the event, further
comprising:
extracting event features from annotations associated with the event;
extracting social media features from the plurality of social media
content items; and
mapping the event to the social media content items based on a
relationship between the event features and social media features.

6. The computer-executed method of claim 1, further comprising annotating
the event with the annotations using metadata instances relevant to the
event.

7. A tangible computer readable medium storing a computer program
executable by a processor, the computer program producing a user
interface of a social interest estimation system, the user interface
comprising:
a social interest heat map area for displaying a social interest heat map
showing levels of social interest for a plurality of events corresponding
to a series of chronological time segments in a time-based medium;
a media display area, visually distinguished from and concurrently
displayed with the social interest heat map area, for displaying an event
selected from the social interest heat map; and
a social media display area, visually distinguished from and concurrently
displayed with the social interest heat map and media display areas, for
displaying social media content items for the selected event.

8. The computer readable medium of claim 7, wherein the social interest
heat map area displays different event types such that the different
event types each are visually distinct.

9. The computer readable medium of claim 7, wherein the user interface
further comprises a field for filtering the time segments according to a
search term, wherein the social interest heat map area displays only a
subset of the series of segments corresponding to the search term.

10. The computer readable medium of claim 7, wherein each time segment
comprises a first portion of the segment corresponding to positive
sentiment that is visually distinguished from a second portion of the
segment corresponding to negative sentiment.

11. A system for displaying social interest in time-based media events,
the system comprising:
means for selecting a series of events occurring in a time-based medium,
each event associated with one of a series of chronological time segments
in the time-based medium;
means for determining for each event associated with one of the time
segments a level of social interest in the event from a plurality of
social media content items that are relevant to the event;
means for determining for each of the time segments a level of social
interest in the time segment based upon an aggregate level of social
interest in the events aligned with the time segment; and
means for graphically displaying the level of social interest in each
time segment.

12. The system of claim 11, wherein graphically displaying each time
segment comprises displaying different event types such that the
different event types each are visually distinct.

13. The system of claim 11, further comprising means for filtering the
time segments according to a search term, wherein the graphically
displaying displays only a subset of the series of chronological time
segments corresponding to the search term.

14. The system of claim 11, wherein the means for graphically displaying
each time segment comprises:
means for determining a first portion of the segment corresponding to a
positive sentiment;
means for determining a second portion of the segment corresponding to a
negative sentiment; and
means for displaying the segment with the first and second portions of
the segment visually distinguished from each other.

15. The system of claim 11, further comprising means for determining a
confidence score indicative of the probability that the plurality of
candidate social media content items are relevant to the event, further
comprising:
means for extracting event features from annotations associated with the
event;
means for extracting social media features from the plurality of social
media content items; and
means for mapping the event to the social media content items based on a
relationship between the event features and social media features.

16. The system of claim 11, further comprising means for annotating the
event with the annotations using metadata instances relevant to the
event.

17. A system for displaying social interest in time-based media events,
the system comprising:
a computer processor; and
a computer-readable storage medium storing computer program modules
configured to execute on the computer processor, the computer program
modules comprising:
a multimedia store configured to store a series of events occurring in a
time-based medium, each event associated with one of a series of
chronological time segments in the time-based medium;
a social interest estimator configured to determine for each event
associated with one of the time segments a level of social interest in
the event from a plurality of social media content items that are
relevant to the event and to determine for each of the time segments a
level of social interest in the time segment based upon an aggregate
level of social interest in the events aligned with the time segment; and
a user interface engine configured to graphically display the level of
social interest in each time segment.

18. The system of claim 17, wherein the graphical display of each time
segment comprises displaying different event types such that the
different event types each are visually distinct.

19. The system of claim 17, further comprising a domain ontology engine
configured to filter the time segments according to a search term,
wherein the graphical display displays only a subset of the series of
chronological time segments corresponding to the search term.

20. The system of claim 17, further comprising an annotation engine
configured to annotate the event with the annotations using metadata
instances relevant to the event.

Description:

PRIORITY INFORMATION

[0001]This application claims priority under 35 U.S.C. §119(e) to
U.S. Provisional Patent Application No. 61/226,002, filed on Jul. 16,
2009 and entitled "Method of Estimating Social Interest in American
Football," which is hereby incorporated herein by reference.

BACKGROUND OF THE INVENTION

[0002]The present invention relates generally to using social media to
estimate interest in media events, and in particular to aggregating
social media content items and references to the media events therein for
estimating social interest in time-based media.

[0003]Online social media services, such as social networking sites,
search engines, news aggregators, blogs, and the like provide a rich
environment for users to comment on events of interest and communicate
with other users. Content items contributed by users of these social
media services often include references to events that appear in
time-based media such as television shows, news reports, sporting events,
movies, concert performances, and the like. However, although the content
items refer to the time-based media, the social media content items
themselves typically are isolated from the events and time-based media in
which those events occur.

SUMMARY OF THE INVENTION

[0004]Social media content items and references to events that occur
therein are aligned with the time-based media events they describe. These
mappings may be used as the basis for multiple applications, such as
ranking of search results for time-based media, automatic recommendations
for time-based media, prediction of audience interest for media
purchasing/planning, and estimating social interest in the time-based
media. Social interest in time-based media (e.g., video and audio streams
and recordings) segments is estimated through a process of data ingestion
and integration. The estimation process determines social interest in
specific segments of time-based media, such as particular plays in a
sporting event, scenes in a television show, or steps in an instructional
video. The social interest in a given event is determined by aggregating
social media content items with confidence scores indicating the
likelihood that the content items refer to the given event.

[0005]For an event appearing in time-based media, which event may have
been identified by segmentation of the time-based media, social media
content items are identified as potentially relevant to the event. The
probability that the content item is relevant to the time-based media
event is determined for each social media content item, and a confidence
score reflecting the probability is assigned to the content item. Content
items with higher probabilities are aligned with the event, aggregated,
and stored. The aggregated content items are associated with an aggregate
score for the time-based media event, where the aggregate score is an
estimate of the level of social interest in the time-based media event.
The estimated level of social interest also can be graphically displayed.
The features and advantages described in this summary and the following
detailed description are not all-inclusive. Many additional features and
advantages will be apparent to one of ordinary skill in the art in view
of the drawings, specification, and claims hereof.

BRIEF DESCRIPTION OF DRAWINGS

[0006]FIG. 1 illustrates the computing environment of one embodiment of a
system for associating social media content items with time-based media
events and determining social interest in the events based on the
resulting associations.

[0007]FIG. 2 is a block diagram of one embodiment of a social interest
information provider.

[0008]FIG. 3 is a conceptual diagram illustrating the video/metadata
alignment/annotation and social media/event alignment processes at a high
level according to one embodiment.

[0009]FIG. 3A is a flow diagram illustrating one embodiment of a method
for associating social media content items with time-based media events,
and a related method of determining social interest in the events based
on the resulting associations.

[0010]FIG. 4 is a flow diagram illustrating one embodiment of a video
event segmentation process.

[0011]FIG. 5 is a flow diagram illustrating one embodiment of a metadata
alignment/annotation process.

[0012]FIG. 6 is a flow diagram illustrating one embodiment of a social
media/event alignment process.

[0013]FIG. 7 is a flow diagram illustrating one embodiment of a social
interest estimation process.

[0014]FIGS. 8A and 8B show two embodiments of social interest heat maps
showing levels of social interest for a plurality of events corresponding
to a series of chronological time segments in a time-based medium.

[0015]FIGS. 9A-9C show three embodiments of user interfaces of a social
interest estimation system.

[0016]FIGS. 10A and 10B show two embodiments of user interfaces of a
social interest estimation system showing a sentiment view.

[0017]FIGS. 11A-11C show three embodiments of user interfaces of a social
interest estimation system showing a filtered view.

[0018]FIG. 12A shows one embodiment of a user interface of a social interest
estimation system showing a focused unexpanded view.

[0019]FIG. 12B shows one embodiment of a user interface of a social interest
estimation system showing a focused expanded view.

[0020]FIGS. 13A-D show yet another embodiment of a user interface
displaying social interest heat maps showing levels of social interest
for a plurality of events corresponding to a series of chronological time
segments in a time-based medium.

[0021]The figures depict various embodiments of the present invention for
purposes of illustration only. One skilled in the art will readily
recognize from the following discussion that alternative embodiments of
the structures and methods illustrated herein may be employed without
departing from the principles of the invention described herein.

DETAILED DESCRIPTION

[0022]FIG. 1 illustrates the computing environment 100 for one embodiment
of a system 130 for associating social media content items and references
to events therein with time-based media events and determining social
interest in the events based on the resulting associations.

[0024]The social media sources 110 include social networks, blogs, news
media, forums, user groups, etc. These sources generally provide a
plurality of users with the ability to communicate and interact with
other users of the source. Users can typically contribute various content
items (e.g., posts, videos, photos, links, status updates, blog entries,
tweets, and the like), which may refer to media events, and can engage in
discussions, games, online events, and other participatory services.

[0026]The social interest information provider 130 provides a system for
associating social media content items and references to events therein
with time-based media events and determining social interest in the
events based on the resulting associations, and is further described in
conjunction with FIG. 2.

[0027]The network 140 may comprise any combination of local area and/or
wide area networks, the Internet, or one or more intranets, using both
wired and wireless communication systems.

[0028]The client devices 150 comprise computing devices that can receive
input from a user and can transmit and receive data via the network 140.
For example, a client device 150 may be a desktop computer, a laptop
computer, a smart phone, a personal digital assistant (PDA), or any
other device including computing functionality and data communication
capabilities. A client device 150 is configured to communicate with the
social media sources 110 and the social interest information provider
system 130 via the network 140.

[0029]FIG. 2 is a block diagram of one embodiment of a social interest
information provider 130. The embodiment of the social interest
information provider 130 shown in FIG. 2 is a computer system that
includes a web server 200 and associated API 202, a domain ontology
engine 205, an author identifier 210, a closed captioning extractor 215,
an event segmentation engine 220, a feature extraction engine 225, a
metadata alignment engine 230, an annotation engine 235, a comparative
feature extraction engine 240, a media event/alignment engine 245, a
social interest estimator 250, a user interface engine 255, domain
ontologies 257, a social media content store 260, a social media author
store 263, a usage stats store 265, a closed captioning store 267, a
multimedia store 270, an event metadata store 273, a mapping store 275, a
video event store 280, a social interest store 285, and an annotated
event store 290. This system may be implemented using a single computer,
or a network of computers, including cloud-based computer
implementations. The computers are preferably server class computers
including one or more high-performance CPUs, 1 GB or more of main memory,
as well as 500 GB to 2 TB of computer-readable, persistent storage, and
running an operating system such as LINUX or variants thereof. The
operations of the system 130 as described can be controlled through
either hardware or through computer programs installed in computer
storage and executed by the processors of such servers to perform the
functions described herein. The system 130 includes other hardware
elements necessary for the operations described here, including network
interfaces and protocols, security systems, input devices for data entry,
and output devices for display, printing, or other presentations of data;
these and other conventional components are not shown so as to not
obscure the relevant details.

[0030]As noted above, the system 130 comprises a number of "engines,"
each referring to computational logic for providing the specified functionality.
An engine can be implemented in hardware, firmware, and/or software. An
engine may sometimes be equivalently referred to as a "module" or a
"server." It will be understood that the named components represent one
embodiment of the present invention, and other embodiments may include
other components. In addition, other embodiments may lack the components
described herein and/or distribute the described functionality among the
components in a different manner. Additionally, the functionalities
attributed to more than one component can be incorporated into a single
component. Where the engines described herein are implemented as
software, the engine can be implemented as a standalone program, but can
also be implemented through other means, for example as part of a larger
program, as a plurality of separate programs, or as one or more
statically or dynamically linked libraries. In any of these software
implementations, the engines are stored on the computer readable
persistent storage devices of the system 130, loaded into memory, and
executed by the one or more processors of the system's computers. The
operations of the system 130 and its various components will be further
described below with respect to FIG. 2 and the remaining figures. As will
become apparent, the various data processing operations described herein
are sufficiently complex and time consuming as to require the operation
of a computer system such as the system 130.

[0031]The web server 200 links the social interest information provider
130 to the client devices 150, the time-based media sources 120, and the
social media sources 110 via network 140, and is one means for doing so.
The web server 200 serves web pages, as well as other web related
content, such as Java, Flash, XML, and so forth. The web server 200 may
include a mail server or other messaging functionality for receiving and
routing messages between the social interest information provider 130 and
client devices 150.

[0032]The API 202, in conjunction with web server 200, allows one or more
external entities to access information from the social interest
information provider 130. The web server 200 may also allow external
entities to send information to the social interest information provider
130 by calling the API 202. For example, an external entity sends an API
request to the social interest information provider 130 via the network
140 and the web server 200 receives the API request. The web server 200
processes the request by calling an API 202 associated with the API
request to generate an appropriate response, which the web server 200
communicates to the external entity via the network 140. The API 202 can
be used for the social interest information provider 130 to receive
extracted features and other inputs to the social media/event alignment
330 and social interest estimation 340 processes from third parties (such
as entities providing the time-based media), which then would be used by
the social interest information provider 130 in those processes.

[0034]Domain ontology engine 205 provides domain ontologies indicating
vocabularies specific to different media domains for storage in the
domain ontologies 257, and is one means for doing so. The domain
ontologies 257 encode information relevant to specific domains, and are
beneficial, since nicknames, slang, acronyms, and other shortened terms
commonly are used in certain domains. Domain ontologies 257 may be
organized hierarchically as graphs, where each node in the graph
represents a concept (e.g. "football play," "scoring play") and each edge
represents a relation between concepts (e.g. "type of"). Concept
instances (e.g., a specific touchdown play from a specific football game)
may also be encoded in the domain ontology, as well as vocabularies that
provide alternate terminology for concept nodes (e.g., "TD" for the
concept "touchdown"). The domain ontologies 257 may be engineered based
on the knowledge of human experts or machine-generated. The domain
ontologies are used for initial filtering of social media posts and in
the social media/event alignment process. An exemplary list of social
interest domains for which time-based media is used according to the
present invention includes broadcast video such as television programs
(e.g., sports, news, episodic television, reality/live event shows, and
movies) and advertising in conjunction with any of these domains. More
specific domains also are possible, e.g., football games, entertainment
news, specific reality TV shows, etc., each of which may have its own
domain-specific ontology. The domain ontology engine 205 also is
configured to filter the time segments according to a search term, such
that the graphical display shows only the subset of the series of
chronological time segments corresponding to the search term.
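
For illustration, the following is a minimal Python sketch of how such a
hierarchical ontology might be represented: concepts as nodes, "type of"
relations as edges, and alternate vocabulary terms attached to concepts.
The class and function names are hypothetical, not from the patent.

```python
# Minimal sketch of a hierarchical domain ontology: nodes are concepts,
# edges are "type of" relations, and concepts may carry alternate
# vocabulary terms (e.g., "TD" for "touchdown").

class Concept:
    def __init__(self, name, vocabulary=None):
        self.name = name
        self.vocabulary = set(vocabulary or [])  # alternate terminology
        self.children = []                       # "type of" relations

    def add_child(self, concept):
        self.children.append(concept)
        return concept

# Build a tiny football ontology.
play = Concept("football play")
scoring = play.add_child(Concept("scoring play"))
scoring.add_child(Concept("touchdown", vocabulary={"TD", "touch down"}))

def matches(concept, term):
    """True if a term names this concept or any descendant concept."""
    term = term.lower()
    if term == concept.name or term in {v.lower() for v in concept.vocabulary}:
        return True
    return any(matches(child, term) for child in concept.children)

print(matches(play, "TD"))  # True: the slang term maps to "touchdown"
```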

[0035]The author identifier 210 identifies the author, or provider, of
each social media content item, e.g., as provided to the social interest
information provider 130 by the social media sources 110 with the content
items, and is one means for doing so. Additional information about the
authors may be extracted from the content items themselves, e.g., as
stored in the social media content store 260, or extracted from other
external sources. The author information is stored in the social media
author store 263.

[0036]The closed captioning extractor 215 extracts closed captioning data
from the time-based media, and is one means for doing so. Closed
captioning data typically can be extracted from broadcast video or other
sources encoded with closed captions using open source software such as
CCExtractor available via SourceForge.net. For time-based media not
encoded with closed captioning data, imperfect methods such as automatic
speech recognition can be used to capture and convert the audio data into
a text stream comparable to closed captioning text. This can be done, for
example, using open source software such as Sphinx 3 available via
SourceForge.net. Once the closed captioning is ingested, it is preferably
aligned to speech in a video. Various alignment methods are known in the
art. One such method is described in Hauptmann, A. and Witbrock, M.,
Story Segmentation and Detection of Commercials in Broadcast News Video,
ADL-98 Advances in Digital Libraries Conference, Santa Barbara, Calif.
(April 1998), which uses dynamic programming to align words in the closed
captioning stream to the output of a speech recognizer run over the audio
track of the video. The closed captioning information is stored in the
closed captioning store 267.
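
The following is a hedged sketch of the dynamic-programming word alignment
idea in the spirit of the Hauptmann and Witbrock approach cited above (a
standard Needleman-Wunsch style global alignment, not their actual
implementation). Once caption words are paired with timestamped recognizer
words, the timestamps can be transferred to the captions.

```python
# Sketch: dynamic-programming alignment of closed-caption words to ASR
# output. Costs are placeholders; a production system would tune them.

def align(caption_words, asr_words, gap_cost=1, sub_cost=1):
    """Global alignment returning (caption_index, asr_index) pairs."""
    n, m = len(caption_words), len(asr_words)
    # cost[i][j]: best cost aligning first i caption words, first j ASR words
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap_cost
    for j in range(1, m + 1):
        cost[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 0 if caption_words[i - 1] == asr_words[j - 1] else sub_cost
            cost[i][j] = min(cost[i - 1][j - 1] + match,   # match/substitute
                             cost[i - 1][j] + gap_cost,    # skip caption word
                             cost[i][j - 1] + gap_cost)    # skip ASR word
    # Trace back to recover the aligned word pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        match = 0 if caption_words[i - 1] == asr_words[j - 1] else sub_cost
        if cost[i][j] == cost[i - 1][j - 1] + match:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + gap_cost:
            i -= 1
        else:
            j -= 1
    return list(reversed(pairs))
```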

[0037]The multimedia store 270 stores various forms of time-based media.
Time-based media includes any data that changes meaningfully with respect
to time. Examples include, but are not limited to, videos (e.g.,
television programs or portions thereof, movies or portions thereof),
audio recordings, MIDI sequences, animations, and combinations thereof.
Time-based media can be obtained from a variety of sources, such as local
or network stores, as well as directly from capture devices such as
cameras, microphones, and live broadcasts. It is anticipated that other
types of time-based media within the scope of the invention will be
developed in the future (e.g., 3D media, holographic presentations,
immersive media, and so forth).

[0038]The event segmentation engine 220 segments time-based media into
semantically meaningful segments corresponding to discrete portions or
"events," and is one means for doing so. Different types of media may
have different types of events which are recognized as part of a video
event segmentation process. For example, a television program or movie
may have scenes and shots; a sporting event may have highly granular
events (e.g., plays, passes, catches, hits, shots, baskets, goals, and
the like) as well as less granular events (e.g., sides, downs, innings,
and the like). A news program may have events such as stories, interviews,
shots, commentary and the like. The video event segmentation process
includes three main components according to one embodiment: shot boundary
detection, event detection, and boundary determination. These components
for event segmentation may vary by domain. The output of video event
segmentation is a set of segmented video events that is stored in the
video event store 280.

[0039]The feature extraction engine 225 converts segmented time-based
media events retrieved from the video event store 280 into feature vector
representations for aligning the events with metadata, and is one means
for doing so. The features may include image and audio properties and may
vary by domain. Feature types may include, but are not limited to,
scale-invariant feature transform (SIFT), speeded up robust features
(SURF), local energy based shape histogram (LESH), color histogram, and
gradient location orientation histogram (GLOH).

[0040]The metadata alignment engine 230 aligns video event segments with
semantically meaningful information regarding the event or topic that the
event is about, and is one means for doing so. The metadata alignment
engine 230 uses metadata instances from the event metadata store 273. A
metadata instance is the metadata for a single event, i.e., a single
piece of metadata. The annotation engine 235 annotates the segments with
the metadata, and is one means for doing so. Metadata instances may
include automatic annotations of low level content features, e.g., image
features or content features, hand annotations with text descriptions, or
both. The metadata may be represented as text descriptions of time-based
media events and/or feature vector representations extracted from
examples of events. The annotations are stored in the annotated event
store 290.

[0041]The comparative feature extraction engine 240 converts an annotated
event and a corresponding social media content item into a feature vector
representation, and is one means for doing so. The three major types of
features extracted by the comparative feature extraction engine 240 are
content features, geo-temporal features, and authority features. The
media/event alignment engine 245 aligns the social media content item 610
and annotated event 530 using the extracted features 620, and is one
means for doing so. The media/event alignment engine 245 outputs an
annotated event/social media mapping and associated confidence score to
the mapping store 275.
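
To make the three feature families concrete, here is an illustrative
Python sketch that reduces an (annotated event, content item) pair to a
small numeric vector. All field names, the exponential temporal decay,
and the follower-count proxy for authority are assumptions for
illustration, not the patent's method.

```python
# Sketch of comparative feature extraction: content, geo-temporal, and
# authority features for one event/content-item pair.
import math

def comparative_features(event, item):
    """Reduce an (annotated event, content item) pair to numeric features."""
    event_terms = {t.lower() for t in event["annotation_terms"]}
    item_terms = {w.strip("!?.,").lower() for w in item["text"].split()}
    content = len(event_terms & item_terms) / max(len(event_terms), 1)
    hours_apart = abs(item["timestamp"] - event["timestamp"]) / 3600.0
    geo_temporal = math.exp(-hours_apart)   # decays with temporal distance
    authority = min(item["author_followers"] / 10_000.0, 1.0)
    return [content, geo_temporal, authority]

event = {"annotation_terms": ["touchdown", "Brady"], "timestamp": 1_000_000}
item = {"text": "What a touchdown by Brady!", "timestamp": 1_000_900,
        "author_followers": 2_500}
print(comparative_features(event, item))  # [1.0, ~0.78, 0.25]
```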

[0042]The following is a non-comprehensive list of media types that can be
associated with time-based media: audio of commentators on, or
participants of, the event or topic (e.g., announcers on TV or radio) and
text transcriptions thereof (generated manually or automatically),
event-related metadata (e.g., recipes, instructions, scripts, etc.),
statistical data (e.g., sports statistics or financial data streams),
news articles, social media content items, and media usage statistics
(e.g., user behavior such as viewing, rewinding, pausing, etc.). The social
media content items include long form and short form social media content
items such as posts, videos, photos, links, status updates, blog entries,
tweets, and the like from various social media and mainstream news
sources that are stored in the social media content store 260. In
general, social networks allow their users to publish text-based content
items to other members of their network, which content items may be open
and viewable by the public through open application program interfaces.

[0043]Typically social media content items are of two varieties: static
text-based media and dynamic text-based media. Static text-based media
describes a large class of information on the Internet (e.g., blogs, news
articles, webpages, etc.). This information changes only minimally once
posted (i.e., is relatively static) and is entirely made up of words
(i.e., is text-based). Dynamic text-based media refer to any of a set of
"data feeds" composed of short, frequently updated user posts to social
network websites that often describe the states and opinions of their
authors.

[0044]For some domains, usage statistics may be ingested, either alone or
generated from the time-based media in the multimedia store 270, and
stored in the usage stats store 265. Usage statistics may include
information regarding how the multimedia data was consumed, e.g., number
of views, length of views, number of pauses, time codes at which a pause
occurs, etc. The statistics can be aggregated with respect to different
populations, such as by user type, location, usage type, media type, and
so forth. The statistics can represent means, modes, medians, variances,
rates, velocities, population measures, and the like.

[0045]The social interest estimator 250 aggregates information from the
annotated event store 290 and the mapping store 275 to estimate social
interest in a given media event using a social interest score, and is one
means for doing so. The social interest score is estimated by the social
interest estimator 250 by cycling through all (or selected) annotated
events, and for each event, taking a weighted sum of the confidence
scores for each social media content item that exceeds a given threshold.
The resulting social interest score is stored in the social interest
store 285.
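
A minimal Python sketch of this scoring scheme follows: a weighted sum of
per-item confidence scores that exceed a threshold. The per-item weights
stand in for the source/author/sentiment weighting discussed later; their
values here are placeholders.

```python
# Sketch of the social interest score: weighted sum of confidence scores
# for content items mapped to one annotated event, above a threshold.

def social_interest_score(mappings, threshold=0.5):
    """mappings: list of (confidence, weight) pairs for one event."""
    return sum(confidence * weight
               for confidence, weight in mappings
               if confidence > threshold)

mappings = [(0.9, 1.0), (0.7, 0.5), (0.4, 1.0)]  # last item is below threshold
print(social_interest_score(mappings))           # 0.9*1.0 + 0.7*0.5 = 1.25
```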

[0046]The user interface engine 255 converts the social interest into a
format for display on a user interface, e.g., for depicting social
interest heat maps as shown in FIGS. 8A-13D, and is one means for doing
so. The user interface engine 255 allows the client devices 150 to
interact with the user interfaces providing the social interest score.

[0047]The user interface engine 255 provides a user interface display with
three main areas: (1) a social interest heat map area for displaying a
social interest heat map showing the levels of social interest for a
plurality of events corresponding to a series of chronological time
segments, (2) a media display area, visually distinguished from and
concurrently displayed with the social interest heat map area, for
displaying an event selected from the social interest heat map, and (3) a
social media display area, visually distinguished from and concurrently
displayed with the social interest heat map and media display areas, for
displaying social media content items for the selected event.

[0048]Different event types may be displayed such that the different event
types each are visually distinct within the social interest heat map
area, e.g., for a football game on broadcast television, showing events
corresponding to plays of the game in one manner (e.g., a first color)
and events corresponding to commercials in between plays of the game in a
different manner (e.g., a second color).

[0049]In addition, the user interface engine 255 may provide additional
functionality for the user interface. For example, it may provide a user
interface field for filtering the time segments according to a keyword or
search term, such that the social interest heat map area then displays
only the subset of time segments matching the search term. See FIG. 11A,
reference numeral 1105. In another example, the user interface may allow
for separate display of positive and negative sentiment among aggregated
content items for each event segment. A first portion of the segment may
correspond to a positive sentiment and a second portion may correspond to
a negative sentiment, and both portions may be displayed such that they
are visually distinguished from each other. See FIG. 10A, reference
numerals 1010, 1012. In some embodiments, an additional portion of the
segment may correspond to neutral or uncertain sentiment. The domain
ontology engine 205 may provide the filtering aspects for the user
interface, and the social interest estimator 250 may provide the
sentiment analysis.
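
For illustration, a minimal Python sketch of how the sentiment portions
of a segment's display might be computed from per-item labels; the labels
themselves are assumed to come from an upstream sentiment classifier that
is not specified here.

```python
# Sketch: split a segment's displayed bar into sentiment portions based
# on the aggregated content items' sentiment labels.
from collections import Counter

def sentiment_portions(item_sentiments):
    """item_sentiments: list of 'positive' / 'negative' / 'neutral' labels."""
    counts = Counter(item_sentiments)
    total = max(len(item_sentiments), 1)
    return {label: counts[label] / total
            for label in ("positive", "negative", "neutral")}

print(sentiment_portions(["positive", "positive", "negative", "neutral"]))
# {'positive': 0.5, 'negative': 0.25, 'neutral': 0.25}
```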

Mapping Social Media Content Items to Time-Based Media

[0050]FIG. 3 is a conceptual diagram illustrating the video/metadata
alignment/annotation 320 and social media/event alignment 330 processes
at a high level according to one embodiment. Beginning with metadata
instances 307 and events in time-based media 301 as input, annotated
events 309 are formed. As shown, time-based media (TBM) 301 includes
multiple segments (seg. 1-M) 303, which contain events in the time-based
media, as described herein. The video/metadata alignment/annotation 320
process aligns one or more metadata instances (1-N) 307 with the events
to form annotated events 309, as further described in conjunction with
FIG. 5. The social media/event alignment 330 process aligns, or "maps,"
the annotated events 309 from the video/metadata alignment/annotation 320
to one or more social media content items (A-O) 311, as further described
in conjunction with FIG. 6. Note that in both processes 320, 330, the
various alignments are one-to-one, many-to-one, and/or many-to-many.
Thus, a given social media content item 311 can be mapped to multiple
different annotated events 309, and an annotated event 309 can be mapped
to multiple different social media content items 311. Once so mapped, the
relationships between content items and events can be quantified to
estimate social interest, as further explained below.
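
Since the alignments are one-to-one, many-to-one, and/or many-to-many, a
mapping store needs to be indexable in both directions. The following is
an illustrative Python sketch of such a structure with confidence scores;
the identifiers are made up for the example.

```python
# Sketch of a many-to-many event/content-item mapping store with
# confidence scores, indexable from either side.
from collections import defaultdict

class MappingStore:
    def __init__(self):
        self.by_event = defaultdict(list)  # event_id -> [(item_id, confidence)]
        self.by_item = defaultdict(list)   # item_id  -> [(event_id, confidence)]

    def add(self, event_id, item_id, confidence):
        self.by_event[event_id].append((item_id, confidence))
        self.by_item[item_id].append((event_id, confidence))

store = MappingStore()
store.add("event_309a", "item_311a", 0.92)  # one item maps to many events
store.add("event_309b", "item_311a", 0.31)
store.add("event_309a", "item_311b", 0.85)  # one event maps to many items
```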

[0051]FIG. 3A is a flow diagram illustrating one embodiment of a method
for aligning social media content items (and references to events
therein) with time-based media events, and a related method of
determining social interest in the events based on the resulting
associations.

[0052]Generally, social media content items are candidates for aligning
with time-based media events, and a confidence score is determined for
each indicative of a probability that the content item is relevant to the
event. Based on the confidence scores, the content items may be aligned
with the event, and the alignments are collected in a data store. The
confidence scores are aggregated to produce an aggregate score, and a
level of social interest in the event is established based upon the
aggregate score.

[0053]As a preliminary step in the method, multiple streams of data are
ingested 300 at the social interest information provider 130 for
processing. Data may be received at the social interest information
provider 130 directly from content providers, or via social media sources
110 or time-based media sources 120, e.g., from broadcast television
feeds, directly from content producers, and/or from other third parties.
In one embodiment, web server 200 is one means for ingesting 300 the
data. The types of data may include, but are not limited to, time-based
media, closed captioning data, statistics, social media posts, mainstream
news media, and usage statistics, such as described above.

[0054]The ingested data is stored in data stores specific to one or more
data types that serve as the input data sources for the primary processes
of the method of FIG. 3A (each shown in bold). For example, time-based
media data is stored in the multimedia store 270. The time-based media in
the multimedia store 270 may undergo additional processing before being
used within the methods shown in FIGS. 3-7. For example, closed
captioning data can be extracted from, or created for 305, the time-based
media, e.g., by closed captioning extractor 215. In addition, for some
domains, usage statistics may be ingested, either alone or generated from
the time-based media in the multimedia store 270, and stored in the usage
stats store 265. In addition, event metadata associated with multimedia
is stored in the event metadata store 273, social media content items as
described herein are stored in the social media content store 260,
information about authors of social media content items are stored in the
social media author store 263, and domain ontologies indicating, for
example, vocabularies specific to different media types, are stored in
the domain ontologies 257.

[0055]As a result of the ingestion referenced above, the multimedia store
270 includes various forms of time-based media. The time-based media may
be of various types, as described in conjunction with FIG. 2.

[0056]As shown in FIG. 3A, there are three major processes involved in the
method according to the depicted embodiment: video event segmentation
310, video metadata alignment 320, and social media/event
alignment/mapping 330. In addition, an optional process, social interest
estimation 340, may be included in the method. Each of these processes
310-340 are described below.

Video Event Segmentation

[0057]The first process is video event segmentation 310, in which the
time-based media is segmented into semantically meaningful segments
corresponding to discrete events depicted in video. The input to the
video event segmentation 310 process is a raw video (and/or audio) stream
that is retrieved from the multimedia store 270 according to one
embodiment, and may be performed, e.g., by the event segmentation engine
220, which is one means for performing this function.

[0058]The video event segmentation 310 process is domain dependent to some
extent, e.g., in video of sporting events, event segments may be equated
with individual plays, while in broadcast television, event segments may
be equated with individual scenes and advertisements. Thus the event
types and segment size may vary based on the domain type, and for some
media, e.g., short format media such as very short video clips, the
entire clip is treated as one segment. The system may be pre-configured
with information about which domain the video belongs to. This
configuration may be implemented by hand on a case-by-case basis, or
derived from a preloaded schedule based on the source of video and time of
day (using, for example, a programming guide of broadcast television
shows).

[0059]Segmentation may be achieved via human annotation, known automated
methods, or a hybrid human/automatic approach in which automatic segment
boundaries are corrected by human annotators according to various
embodiments. One automated method is described in Fleischman, M. and Roy,
D., Unsupervised Content-Based Indexing of Sports Video Retrieval, 9th
ACM Workshop on Multimedia Information Retrieval (MIR), Augsburg, Germany
(September 2007).

[0060]The video event segmentation 310 process includes three main
components according to one embodiment: shot boundary detection, event
detection, and boundary determination. These components may vary by
domain. For example, for sporting events an additional component may
correspond to scene classification (e.g., field or stadium
identification).

[0061]The output of video event segmentation 310 is a set of segmented
video events that are stored in the video event store 280. Video event
segmentation 310 is described in further detail in conjunction with FIG.
4.

Metadata Alignment/Annotation

[0062]The next process is metadata alignment/annotation 320, in which the
segments from video event segmentation 310 are annotated with
semantically meaningful information regarding the event that the segment
is relevant to, or depicts. Input to metadata alignment/annotation 320 is
a video event retrieved from the video event store 280 and metadata from
the event metadata store 273. Such metadata can include, but is not
limited to: the type of event occurring, the agents involved in the
event, the location of the event, the time of the event, the
results/causes of the event, etc.

[0063]As with event segmentation 310, the metadata alignment/annotation
320 process is domain dependent. For example, in American football,
metadata for an event may include information such as "Passer: Tom Brady,
Result: Touchdown, Receiver: Randy Moss," while metadata for an event in
a Television series may include information such as: "Agent: Jack Bauer,
Location: White House, Time: 3:15 pm," and for an advertisement the
metadata may include information such as "Brand: Walmart, Scene: father
dresses up as clown, Mood: comic." As illustrated in these examples, the
metadata can be structured as tuples of <name, value> pairs.
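
To make the tuple structure concrete, here are the examples above
rendered as simple Python dictionaries, one metadata instance per event:

```python
# Metadata instances as <name, value> pairs, following the examples in
# the text.
football_event_metadata = {
    "Passer": "Tom Brady",
    "Result": "Touchdown",
    "Receiver": "Randy Moss",
}
ad_metadata = {
    "Brand": "Walmart",
    "Scene": "father dresses up as clown",
    "Mood": "comic",
}
```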

[0064]The metadata includes text and, for certain domains, lower level
image and audio properties. Metadata may be generated using human
annotation (e.g., via human annotators watching events or samples
thereof) and, in certain domains, may be supplemented with automatic
annotations for use in the alignment process (e.g., describing lower
level image and audio properties of the event such as number and length
of each shot, average color histograms of each shot, power levels of the
associated audio, etc.) The annotation is stored in the annotated event
store 290.

[0065]Metadata alignment/annotation 320 includes two steps according to
one embodiment: event feature extraction and video metadata alignment.
Metadata alignment/annotation 320 is described in further detail in
conjunction with FIG. 5.

[0066]According to another embodiment, data ingestion 300, video event
segmentation 310, and video metadata alignment 320 could be performed by
a separate entity, such as a content provider or owner, e.g., which does
not want to release the content to others. In this embodiment, the social
interest information provider 130 would provide software, including the
software modules and engines described herein, to the separate entity to
allow them to perform these processes on the raw time-based media. The
separate entity in return could provide the social interest information
provider 130 with the extracted features and other inputs to the social
media/event alignment 330 and social interest estimation 340 processes,
which then would be used by the social interest information provider 130
in those processes. These data exchanges could take place via an
application programming interface (API) provided by the social interest
information provider 130 and exposed to the separate entity, e.g., via
web server 200. The social interest information provider 130 would then
compute the social interest information and provide that back to the
entity, as either data, or displayed information, for example using the
interfaces shown in FIGS. 8A-13D.

[0067]Social Media/Event Alignment

[0068]The next step is to integrate the annotated time-based media event
segments with social media content items that refer to the events. Input
to social media/event alignment 330 according to one embodiment is an
annotated event retrieved from the annotated event store 290, a social
media content item retrieved from the social media content store 260, a
domain ontology retrieved from the domain ontologies 257, and optionally
author information about the social media content item author retrieved
from the social media author store 263.

[0069]Unfortunately, social media content items often are ambiguous as to
whether they refer to an event at all, and if so, which event they refer
to. For example, a simple social media content item, such as the single
word post "Touchdown!" may refer to an event in a football game, or it
may be used as a metaphor for a success in areas unrelated to football.
In order to address such ambiguities, the social media/event alignment
330 determines a confidence score that a given social media content item
refers to a specific event. The method takes as input a single social
media content item and a single annotated event, and outputs a score
representing the confidence (i.e., likelihood, probability) that the
social media content item is relevant to the event. A social media
content item can be relevant to an event by referring to the event. The
social media/event alignment 330 function operates on features of the
individual social media content items and annotated events, and can be
trained using supervised learning methods or optimized by hand. The
media/event alignment engine 245 is one means for performing this
function.
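
Since the patent allows the alignment function to be either hand-tuned or
trained, the following Python sketch shows one plausible hand-weighted
form: a logistic combination of the comparative features (content,
geo-temporal, authority) sketched earlier. The weights and bias are
arbitrary placeholders, not values from the patent.

```python
# Sketch of an alignment/confidence function: a hand-weighted logistic
# combination of comparative features, squashed to a [0, 1] confidence.
import math

def alignment_confidence(features, weights=(2.0, 1.0, 0.5), bias=-1.5):
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Using the comparative feature vector sketched earlier:
print(alignment_confidence([1.0, 0.78, 0.25]))  # ~0.80
```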

[0070]The output of social media/event alignment 330 is a mapping between
an annotated event and a social media content item (and/or references to
events therein) and an associated confidence score. The mapping and
confidence score are stored in a mapping store 275. The social
media/event alignment 330 process is described in further detail in
conjunction with FIG. 6.

[0071]The mappings output by social media/event alignment 330 are useful
in and of themselves, as they may be used as the basis for multiple
applications, such as, ranking of search results for time-based media,
automatic recommendations for time-based media, prediction of audience
interest for media purchasing/planning, and estimation of social interest
as described further below.

Social Interest Estimation

[0072]One of the uses of the social media/event mappings is the estimation
of social interest in various events. Social interest in an event may be
estimated by aggregating the information gleaned from the processes
described with respect to FIG. 3A. The input to social interest
estimation 340 is an annotated event retrieved from the annotated event
store 290 and the annotated event social media mapping retrieved from the
mapping store 275. In addition, inputs from the social media content
store 260 and social media author store 263 may be used as part of the
weighting process. The social interest estimator 250 is one means for
performing this function.

[0073]The social interest estimation 340 is achieved for an annotated
event by cycling through all social media content items associated with
that event (as indicated by the presence of an annotated event/social
media mapping 630 (FIG. 6) in the mapping store 275), and taking a
weighted sum of the confidence scores for each social media content item.
In one embodiment, a weighted sum of the confidence scores is taken for
social media content items that exceed a threshold. In other embodiments,
no threshold is used, or a "sliding scale" function of (score, weight)
pairs is used, in which each weight is applied to its score before the
score is added to the sum. The effect of this weighting is that the
events that are associated
with more social media content items (and references to events therein)
correlate with higher estimated social interest. In addition, social
interest in an event often is dependent on the source, author, and/or
sentiment of the social media content item referencing it, as described
further in conjunction with weighting function 710 in FIG. 7.

[0074]The output of the social interest estimation 340 is a social
interest score that is stored in the social interest store 285. The
social interest estimation 340 is described in further detail in
conjunction with FIG. 7. In addition, the social interest estimation 340
results may be displayed to a user of a social interest information
device 150, e.g., using user interface engine 255, as described in
conjunction with FIGS. 8A-13D.

[0075]The social interest score may be used as the basis for multiple
applications, such as data analytics, media planning, ranking of search
results for time-based media, automatic recommendations for time-based
media, direct end-user data navigation via a user interface, and
prediction of audience interest for media purchasing/planning to name a
few.

Event Segmentation

[0076]FIG. 4 is a flow diagram illustrating one embodiment of a video
event segmentation process 310. As described in FIG. 3A, video event
segmentation 310 segments time-based media into semantically meaningful
segments corresponding to discrete video portions or "events," e.g., via
event segmentation engine 220, which is one means for performing this
function.

[0077]Input to the video event segmentation process 310 is a video stream
405 from the multimedia store 270. Video event segmentation 310 includes
three phases: shot boundary detection 410, event detection 420, and event
boundary determination 430, each of which is described in greater detail
below. The output of video event segmentation 310 is a segmented video
event 435, which is stored in the video event store 280.

[0078]Shot Boundary Detection

[0079]The first step in segmenting is shot boundary detection 410 for
discrete segments (or "shots") within a video. Shot boundaries are points
of non-continuity in the video, e.g., associated with a change in a
camera angle or scene. Shot boundaries may be determined by comparing
color histograms of adjacent video frames and applying a threshold to
that difference. Shot boundaries may be determined to exist wherever the
difference in the color histograms of adjacent frames exceeds this
threshold. Many techniques are known in the art for shot boundary
detection. One exemplary algorithm is described in Tardini et al., Shot
Detection and Motion Analysis for Automatic MPEG-7 Annotation of Sports
Videos, 13th International Conference on Image Analysis and Processing
(November 2005). Other techniques for shot boundary detection 410 may be
used as well, such as using motion features. Another known technique is
described in A. Jacobs, et al., Automatic shot boundary detection
combining color, edge, and motion features of adjacent frames, Center for
Computing Technologies, Bremen, Germany (2004).
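
Below is a hedged Python sketch of the histogram-difference approach
described above, assuming OpenCV (cv2) is available; the Bhattacharyya
distance and the threshold value are illustrative choices to be tuned per
domain, not prescribed by the patent.

```python
# Sketch: shot boundary detection by thresholding the difference between
# color histograms of adjacent frames.
import cv2

def shot_boundaries(video_path, threshold=0.5):
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, frame_idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Distance between adjacent-frame color histograms.
            diff = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
            if diff > threshold:
                boundaries.append(frame_idx)
        prev_hist, frame_idx = hist, frame_idx + 1
    cap.release()
    return boundaries
```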

[0080]Event Detection

[0081]Event detection 420 identifies the presence of an event in a stream
of (one or more) segments using various features corresponding, for
example, to the image, audio, and/or camera motion for a given segment. A
classifier using such features may be optimized by hand or trained using
machine learning techniques such as those implemented in the WEKA machine
learning package described in Witten, I. and Frank, E., Data Mining:
Practical machine learning tools and techniques (2nd Edition), Morgan
Kaufmann, San Francisco, Calif. (June 2005). The event detection process
420 details may vary by domain.

[0082]Image features are features generated from individual frames within
a video. They include low level and higher level features based on those
pixel values. Image features include, but are not limited to, color
distributions, texture measurements, entropy, motion, detection of lines,
detection of faces, presence of all black frames, graphics detection,
aspect ratio, and shot boundaries.

[0083]Speech and audio features describe information extracted from the
audio and closed captioning streams. Audio features are based on the
presence of music, cheering, excited speech, silence, detection of volume
change, presence/absence of closed captioning, etc. According to one
embodiment, these features are detected using boosted decision trees.
Classification operates on a sequence of overlapping frames (e.g., 30 ms
overlap) extracted from the audio stream. For each frame, a feature
vector is computed using Mel-frequency cepstral coefficients (MFCCs), as
well as energy, the number of zero crossings, spectral entropy, and
relative power between different frequency bands. The classifier is
applied to each frame, producing a sequence of class labels. These labels
are then smoothed using a dynamic programming cost minimization
algorithm, similar to those used in hidden Markov models.
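
The label-smoothing step can be sketched as a Viterbi-style dynamic
program, consistent with the hidden Markov model analogy in the text:
given per-frame class probabilities, choose the label sequence minimizing
negative log probability plus a penalty for switching labels. The switch
penalty here is a placeholder.

```python
# Sketch: dynamic-programming smoothing of per-frame audio class labels.
import math

def smooth_labels(frame_probs, classes, switch_penalty=2.0):
    """frame_probs: list of dicts mapping class -> probability per frame."""
    cost = {c: -math.log(frame_probs[0][c]) for c in classes}
    back = []
    for probs in frame_probs[1:]:
        new_cost, pointers = {}, {}
        for c in classes:
            best_prev = min(
                classes,
                key=lambda p: cost[p] + (0 if p == c else switch_penalty))
            new_cost[c] = (cost[best_prev]
                           + (0 if best_prev == c else switch_penalty)
                           - math.log(probs[c]))
            pointers[c] = best_prev
        cost, back = new_cost, back + [pointers]
    # Trace back the cheapest label path.
    label = min(classes, key=lambda c: cost[c])
    path = [label]
    for pointers in reversed(back):
        label = pointers[label]
        path.append(label)
    return list(reversed(path))

probs = [{"speech": 0.9, "music": 0.1}, {"speech": 0.4, "music": 0.6},
         {"speech": 0.8, "music": 0.2}]
print(smooth_labels(probs, ["speech", "music"]))
# ['speech', 'speech', 'speech'] -- the noisy middle frame is smoothed away
```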

[0084]In addition to audio features, features may be extracted from the
words or phrases spoken by narrators and/or announcers. From a domain
specific ontology 257, a predetermined list of words and phrases is
selected and the speech stream is monitored for the utterance of such
terms. A feature vector representation is created in which the value of
each element represents the number of times a specific word from the list
was uttered. The presence of such terms in the feature vector correlates
with the occurrence of an event associated with the predetermined list of
words. For example, the uttering of the phrase "touchdown" is correlated
with the occurrence of a touchdown in sports video.
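
This word-list feature can be rendered as a very small sketch: each
element of the vector counts utterances of one term from the
domain-specific list (the example terms are illustrative).

```python
# Sketch: ontology-driven word feature vector, one count per listed term.
def term_count_vector(transcript, term_list):
    words = transcript.lower().split()
    return [words.count(term.lower()) for term in term_list]

terms = ["touchdown", "interception", "fumble"]   # from the domain ontology
print(term_count_vector("touchdown another touchdown", terms))  # [2, 0, 0]
```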

[0085]Unlike image and audio features, camera motion features represent
more precise information about the actions occurring in a video. The
camera acts as a stand-in for a viewer's focus. As actions occur in a
video, the camera moves to follow them; this camera motion thus mirrors the
actions themselves, providing informative features for event
identification. Like shot boundary detection, there are various methods
for detecting the motion of the camera in a video (i.e., the amount it
pans left to right, tilts up and down, and zooms in and out). One
exemplary system is described in Bouthemy, P., et al., A unified approach
to shot change detection and camera motion characterization, IEEE Trans.
on Circuits and Systems for Video Technology, 9(7) (October 1999); this
system computes the camera motion using the parameters of a
two-dimensional affine model to fit every pair of sequential frames in a
video. According to one embodiment, a 15-state first-order hidden Markov
model is used, implemented with the Graphical Modeling Toolkit, and then
the output of the Bouthemy et al. method is converted into a stream of clustered
characteristic camera motions (e.g., state 12 clusters together motions
of zooming in fast while panning slightly left). Some domains may use
different, or additional, methods of identifying events. For example, in
American football, an additional factor may be scene classification. In
scene classification, once a shot boundary is detected a scene classifier
is used to determine whether that shot is primarily focused on a
particular scene, e.g., a playing field. Individual frames (called key
frames) are selected from within the shot boundaries and represented as a
vector of low level features that describe the key frame's color
distribution, entropy, motion, etc. A shot is determined to be of a
particular scene if a majority of the sampled frames is classified as
that scene.
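
The majority-vote step can be sketched compactly; the stand-in frame
classifier below is hypothetical, as the patent leaves the low-level
classifier unspecified.

```python
# Sketch: scene classification by majority vote over key frames.
def classify_shot(key_frames, classify_frame, scene="playing field"):
    """Label the shot with `scene` if most key frames vote for it."""
    votes = [classify_frame(frame) for frame in key_frames]
    return scene if votes.count(scene) > len(votes) / 2 else "other"

# Trivial stand-in classifier for illustration.
frames = [{"is_field": True}, {"is_field": True}, {"is_field": False}]
print(classify_shot(frames,
                    lambda f: "playing field" if f["is_field"] else "other"))
# 'playing field' (2 of 3 key frames vote for the scene)
```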

[0086]Event Boundary Determination

[0087]Once a segment of video is determined to contain the occurrence of
an event, the beginning and ending boundaries of that event must be
determined 430. In some cases, the shot boundaries determined in 410 are
estimates of the beginning and end of an event. The estimates can be
improved as well by exploiting additional features of the video and audio
streams to further refine the boundaries of video segments. Event
boundary determination 430 may be performed using a classifier that may
be optimized by hand or using supervised learning techniques. The
classifier may make decisions based on a set of rules applied to a
feature vector representation of the data. The features used to represent
video overlap with those used in the previous processes. Events have
beginning and end points (or offsets), and those boundaries may be
determined based on the presence/absence of black frames, shot
boundaries, aspect ratio changes, etc., and have a confidence measure
associated with the segmentation. The result of event boundary
determination 430 (concluding video event segmentation 310) is a (set of)
segmented video event 435 that is stored in the video event store 280.

Metadata Alignment/Annotation

[0088]FIG. 5 is a flow diagram illustrating one embodiment of a metadata
alignment/annotation 320 process. As described in FIG. 3A, the metadata
alignment/annotation 320 process produces annotations of the segments
from video event segmentation 310, which annotations include semantically
meaningful information regarding the event or topic that the segment is
about. Metadata alignment/annotation 320 includes two steps: video feature extraction 315 and video metadata alignment 520.

Video Feature Extraction

[0089]For any given video event that is to be aligned with metadata, the
first step is to convert the video event into a feature vector
representation via feature extraction 315. The feature extraction engine
225 is one means for performing this function. Input to the process is a
segmented video event 435 retrieved from the video event store 280.
Output from the video feature extraction 315 is a video event feature
representation 510. The features may be identical to (or a subset of) the
image/audio properties discussed above for video events and stored in the
event metadata store 273, and may vary by domain.

Video Metadata Alignment

[0090]Video metadata alignment 520 takes as input the feature vector
representation 510 of an event and a metadata instance 505, defined above
as metadata corresponding to a single event. The metadata alignment
engine 230 is one means for performing this function. It cycles through
each metadata instance 505 in the event metadata store 273 and uses an
alignment function to estimate the likelihood that a particular event may
be described by a particular metadata instance. As described
above, metadata instances may include automatic annotations of low level
content features (e.g., image or audio features), hand annotations of
text descriptions, or both. For domains in which the metadata includes
low level features, the alignment function may be a simple cosine
similarity function that compares the feature representation 510 of the
event to the low level properties described in the metadata instance 505.
For domains in which metadata instances do not include automatic
annotations of low level features, the video metadata alignment 520
method may employ a model which encodes relationships between low level
features and descriptive text. One exemplary model is described in
Fleischman, M. and Roy, D., Grounded Language Modeling for Automatic
Speech Recognition of Sports Video, Proceedings of the Association of
Computational Linguistics (ACL), Columbus, Ohio, pp. 121-129 (June 2008).
This method uses grounded language models that link visual and text
features extracted from a video to the metadata terms used to describe an
event. For the purposes of this example, grounded language models can be
manually estimated based on the visual and text features used for event
segmentation, from which the following equation describes the likelihood that any particular metadata annotation describes a particular video event:

P(annotation | event) = Π P(w | event features), taken over each term w in the annotation

The grounded language model is used to calculate the probability that each
video event found is associated with each human generated metadata
annotation.

[0091]When all metadata instances 505 in the event metadata store 273
corresponding to the event have been examined, if the most likely
alignment 525 (i.e., alignment with the highest probability or score)
passes a threshold, the video event associated with the feature
representation 510 is annotated with the metadata instance 505 and the
resulting annotated event 530 is stored in an annotated event store 290
along with a score describing the confidence of the annotation. If no
alignment passes the threshold, the event is marked as not annotated. In
order to set this threshold, a set of results from the process is hand
annotated into two categories: correct and incorrect results.
Cross-validation may then be used to find the threshold that maximizes
the precision/recall of the system over the manually annotated result
set.
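As a rough sketch of this alignment loop for domains with low level metadata features, the following Python fragment cycles through metadata instances, scores each with cosine similarity, and annotates only when the best score passes the threshold; the data structures and the threshold value are assumptions of the example.

    import numpy as np

    def cosine(a, b):
        a, b = np.asarray(a, float), np.asarray(b, float)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    def align_event(event_features, metadata_instances, threshold=0.7):
        # Score every metadata instance against the event's feature
        # vector; return the best instance only if it passes the
        # threshold, else None (the event is marked as not annotated).
        scored = [(m, cosine(event_features, m["features"]))
                  for m in metadata_instances]
        best, score = max(scored, key=lambda pair: pair[1])
        return (best, score) if score >= threshold else (None, score)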

Social Media/Event Alignment

[0092]FIG. 6 is a flow diagram illustrating one embodiment of a social
media/event alignment 330 process. Social media/event alignment 330
associates (maps) the annotated time-based media event segments with
social media content items and references to the events therein.

[0093]Filtering

[0094]As an initial and optional step, social media filtering step 605
occurs; the domain ontologies 257 are one means for performing this
function. Social media content items are filtered in order to create a
set of candidate content items with a high likelihood that they are
relevant to a specific event. Content items can be relevant to an event
by including a reference to the event.

[0095]In this optional step, before social media content items are
integrated with video events, a candidate set of content items is
compiled based on the likelihood that those posts are relevant to the
events, for example, by including at least one reference to a specific
event. The comparative feature extraction engine 240 is one means for
performing this function. At its simplest, this candidate set of content items can be the result of filtering 605 associated with a given time frame of the event in question. Temporal filters alone often are far too general, as many content items will only coincidentally co-occur in time with a given event. In addition, for broadcast television, the increasing use of digital video recorders has significantly broadened the relevant timeframe for events.

[0096]Additional filters 605 are applied based on terms used in the
content item's text content (e.g., actual texts or extracted text from
closed caption or audio) that also appear in the metadata for an event
and/or domain specific terms in the ontologies 257. For example, a content item of a social network posting of "Touchdown Brady! Go Patriots" has a high probability of referring to an event in a Patriots football game due to the use of the player name, team name, and play name, and this content item would be relevant to the event. In another example, a content item of a post stating "I love that Walmart commercial" has a high probability of referring to an advertisement event for Walmart due to the use of the store name and the term "commercial," and thus would likewise be relevant to this event. To perform this type of filtering, terms are used from the metadata of an event as well as those domain-specific terms stored in ontology 257.
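A minimal sketch of such a term-overlap content filter, assuming simple punctuation stripping and whitespace tokenization with illustrative term lists, follows:

    import string

    def passes_content_filter(item_text, event_terms, ontology_terms):
        # Keep the item if its text shares at least one term with the
        # event metadata or with the domain ontology.
        cleaned = item_text.lower().translate(
            str.maketrans("", "", string.punctuation))
        tokens = set(cleaned.split())
        return bool(tokens & (set(event_terms) | set(ontology_terms)))

    post = "Touchdown Brady! Go Patriots"
    print(passes_content_filter(post, ["touchdown", "brady"], ["patriots"]))
    # True: play name, player name, and team name all match.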

[0097]A social media content item can be relevant to an event without
necessarily including a direct textual reference to the event. Various
information retrieval and scoring methods can be applied to the content
items to determine relevancy, based on set-theoretic (e.g., Boolean
search), algebraic (e.g., vector space models, neural networks, latent
semantic analysis), or probabilistic models (e.g., binary independence,
or language models), and the like.

[0098]Social media content items that do not pass certain of these initial
filters, e.g., temporal or content filters, are removed from further
processing, reducing the number of mappings that occur in the latter
steps. The output of social media filtering 605 is an updated social
media content store 260, which indicates, for each content item, whether
that content item was filtered by temporal or content filters. Additional
filters may apply in additional domains.

[0099]Alignment/Mapping

[0100]Social media/annotated event alignment 330 includes a feature
extraction process 620 and an alignment function 625. The feature
extraction process 620 converts input of an annotated event 530 and a
social media content item 610 into a feature vector representation, which
is then input to the alignment function 625. The feature extraction
process 620 also may receive input from the social media author store 263
and the domain ontologies 257. The three major types of features
extracted in this process 620 are content features 620c, geo-temporal
features 620b, and authority features 620a. The comparative feature
extraction engine 240 is one means for performing this function, which
identifies a relationship between the event features and social media
features. The relationship may be co-occurrence, correlation, or other
relationships as described herein.

[0101]Content features 620c refer to co-occurring information within the
content of the social media content items and the metadata for the video
events, e.g., terms that exist both in the content item and in the
metadata for the video event. Domain ontologies 257 may be used to expand
the set of terms used when generating content features.

[0102]Geo-temporal features 620b refer to the difference between the location and time at which the input media was generated and the location and time associated with the social media content item about the event.
useful as the relevance of social media to an event is often inversely
correlated with the distance from the event (in time and space) that the
media was produced. In other words, social media relevant to an event is
often produced during or soon after that event, and sometimes by people
at or near the event (e.g., a sporting event) or exposed to it (e.g.,
within broadcast area for television-based event).

[0103]For video events, geo-temporal information can be determined based on the location and/or time zone of the event or broadcast of the event, the time it started, the offset in the video at which the start of the event is determined, and the channel on which it was broadcast. For social media,
geo-temporal information can be part of the content of the media itself
(e.g., a time stamp on a blog entry or status update) or as metadata of
the media or its author.

[0104]The temporal features describe the difference in time between when the social media content item was created and the time that the event itself took place. In general, smaller differences in time of production are indicative of more confident alignments. Such differences can be passed through a sigmoid function such that as the difference in time increases, the probability of alignment decreases, but plateaus at a certain point. The parameters of this function may be tuned based on an annotated verification data set. The spatial features describe the distance between the location of the content item's author and the geographical area of the event or broadcast. Spatial differences are less indicative because people often comment on events that take place far from their location. A sigmoid function may be used to model this relationship as well, although its parameters are tuned based on different held out data.
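For concreteness, a sigmoid of the sort described might be sketched as below; the parameters a and b and the plateau floor are stand-ins for values tuned on an annotated verification set.

    import math

    def temporal_score(dt_minutes, a=0.1, b=60.0, floor=0.05):
        # High near dt = 0, sigmoid falloff around b minutes, then a
        # plateau at `floor` rather than decaying all the way to zero.
        return floor + (1 - floor) / (1 + math.exp(a * (dt_minutes - b)))

    # temporal_score(5) ~ 0.996, temporal_score(60) ~ 0.525,
    # temporal_score(600) ~ 0.05 (the plateau)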

[0105]Authority features 620a describe information related to the author
of the social media and help to increase the confidence that a social
media content item refers to a video event. The probability that any
ambiguous post refers to a particular event is dependent upon the prior
probability that the author would post about a similar type of event
(e.g., a basketball game for an author who has posted content about prior
basketball games). The prior probability can be approximated based on a
number of features including: the author's self-generated user profile
(e.g., mentions of a brand, team, etc.), the author's previous content
items (e.g., about similar or related events), and the author's friends
(e.g., their content contributions, profiles, etc.). These prior
probability features may be used as features for the mapping function.

[0106]The alignment function 625 takes the set of extracted features
620a-c and outputs a mapping 630 and a confidence score 640 representing
the confidence that the social media content item refers to the video
event. The media/event alignment engine 245 is one means for performing
this function. For each feature type 620a-c, a feature specific
sub-function generates a score indicating whether the social media
content item refers to the annotated event. Each sub-function's score is
based only on the information extracted in that particular feature set.
The scores for each sub-function may then be combined using a weighted
sum, in order to output a mapping 630 and an associated confidence score
640, as shown below for an event x and a social media content item y:

align(feat(x,y)) = α·content(x,y) + β·geotemporal(x,y) + γ·authority(x,y)

[0107]where α, β, and γ are the respective weights applied to the three feature types, and align(feat(x,y)) is the
confidence score. Both the weights in the weighted sum, as well as the
sub-functions themselves may be trained using supervised learning
methods, or optimized by hand. The output of the social media/event
alignment function 330 is a mapping between an annotated event and a
social media content item. This mapping, along with the real-value
confidence score is stored in the mapping store 275.
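A schematic (and deliberately simplified) rendering of this weighted combination in Python, with a toy content sub-function and illustrative weight values standing in for trained components:

    def content_score(event_terms, item_text):
        # Fraction of event metadata terms that appear in the item text.
        tokens = set(item_text.lower().split())
        hits = sum(1 for t in event_terms if t in tokens)
        return hits / max(1, len(event_terms))

    def align(event_terms, item_text, geotemporal_s, authority_s,
              alpha=0.5, beta=0.3, gamma=0.2, threshold=0.5):
        # Weighted sum of the three feature-specific sub-function scores;
        # returns whether a mapping is made plus the confidence score.
        confidence = (alpha * content_score(event_terms, item_text)
                      + beta * geotemporal_s + gamma * authority_s)
        return confidence >= threshold, confidence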

Social Interest Estimation

[0108]FIG. 7 is a flow diagram illustrating one embodiment of a social
interest estimation process 340. Social interest in an event may be
estimated by aggregating the information gleaned from the video event
segmentation 310, video metadata alignment 320, and social media/event
alignment 330 processes. The social interest estimator 250 is one means
for performing this function.

[0109]Input to the social interest estimation process 340 includes an
annotated event 530 retrieved from the annotated event store 290 and an
annotated event/social media mapping 630 retrieved from the mapping store
275. In addition, data from the social media content store 260 and social
media author store 263 may be used for the weighting function 710.

[0110]For each of the media types, social interest is estimated based on a
weighted count of references to particular events in each social media
content item. Social media content items relevant to an event are
indicative of interest, and by discovering and aggregating such content
items and references to events therein, a social interest score is
generated that represents the level of social interest of the event based
on the aggregated content items.

[0111]For a particular event, the social interest estimation process 340
includes the computation of a weighted sum over all social media content
items that include at least one reference to an event. The computation
proceeds by cycling through all social media content items that refer to
that event (as determined in the social media/annotated event alignment
330 process). For each item aligned to that event the social interest
score for that event is incremented by a weighted value based on the
metadata of the content item. Thus, the output social interest score 720 can be thought of as an aggregate of the confidence scores 640 of the content items mapped to that event.

[0112]These weights typically can be set from zero to one depending on the configuration of the system. The weights are multiplicative, and are based on various factors described below: social media content weights 710a, source-based weights 710b, author-based weights 710c, and/or event-based weights 710d.
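One way to picture the computation, with illustrative data shapes (the mapping tuples and weight functions are assumptions of the sketch):

    def social_interest(event_id, mappings, weights):
        # mappings: (event_id, confidence, item) tuples from the mapping
        # store; weights: dict of functions item -> [0, 1] for the
        # content, source, author, and event-based factors.
        score = 0.0
        for ev, confidence, item in mappings:
            if ev != event_id:
                continue
            w = (weights["content"](item) * weights["source"](item)
                 * weights["author"](item) * weights["event"](item))
            score += confidence * w          # multiplicative weighting
        return score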

[0113]Social media content weights 710a can be used in the social interest estimation process 340 based on the sentiment of the social media content items that mention the event. For example, scores can be weighted such that interest
is computed based only on posts that describe positive sentiment toward
an event (i.e., only posts from authors who expressed positive sentiment
toward the event are incorporated in the weighted sum). The sentiment
expressed in a social media content item may be automatically identified
using a number of techniques. Exemplary techniques are described in B.
Pang and L. Lee, Opinion Mining and Sentiment Analysis, Foundations and
Trends in Information Retrieval 2(1-2), pp. 1-135 (2008).

[0114]Source-based weights 710b can be used in the social interest
estimation process 340 based on how (e.g., in what form) an event is
mentioned. Some sources may be given higher weight if they are determined
to be more influential as measured by, for example, the size of their
audience (as estimated, for example, by QuantCast Corporation, San
Francisco, Calif.) or the number of inbound links to the source site.
Further, certain sources may be given higher weight in order to generate
social interest scores for specific communities of users. For example, a
social interest score may be computed based on only social media content
items generated by sources of a particular political leaning (e.g.,
Republican or Democrat) by setting to zero the weights of all content
items with sources that are not predetermined to be of that particular
political leaning (e.g., where the political leaning of a source is
determined by a human expert or a trained machine classifier).

[0115]Author-based weights 710c can be used in the social interest
estimation process 340 to bias the social interest estimate toward
specific communities of users. For example, the estimate of social
interest may be biased based on demographic information about the author
of the post, such that, for example, only posts that were generated by
men older than 25 years old are given weight greater than zero.
Determination of such demographic information may come from an
examination of publicly available data posted by the author themselves,
by human annotation of specific authors based on expert opinion, or by
machine classifiers trained on human labeled examples. In the sports
context, the estimate of social interest can be weighted toward only fans of
the home team by filtering posts based on their location of origin (i.e.
only posts from authors in the home team's city are incorporated in the
weighted sum) or previous history of posts (i.e. the author has a history
of posting positive remarks about the home team).

[0116]Event-based weights 710d can be used in the social interest
estimation process 340 based on evidence of social interest within the
time-based media stream itself. Examples of such media include, but are
not limited to, series television shows, and broadcast sports games. In
such time-based media, multiple features exist that provide information
useful for estimating social interest. Examples of this include, but are
not limited to, visual analysis (e.g., looking for specific events, such
as explosions), audio analysis (e.g., identification of high energy sound
events, such as excited speech), natural language analysis (e.g.
identification of key terms and phrases, such as "home run"), and video
event analysis (e.g., evaluation of replayed events such as those shown
at the beginning of series television shows or intermittently in sports
broadcasts such as an instant replay in a sporting event). Weights based
on such events themselves are predetermined using analysis of human
labeled examples.

[0117]Further, the social interest scores can be weighted based on the
behaviors of viewers of the time-based media, as stored in the usage
statistics 265. Such user behavior is integrated based upon the timing of
user content items relative to media and presentation times of the events
(e.g., how often a particular event was replayed). Analysis of these
behaviors across multiple users can be indicative of social interest, for
example, when the same section of media is paused and reviewed multiple
times (by multiple people). Other recordable user behavior from the usage
statistics 265 that can be used for the social interest estimation
process 340 includes, but is not limited to, viewing times, dwell times,
click through rates for advertisements, search queries, sharing behavior,
etc.

[0118]The output of the social interest estimation process 340 is a social
interest score 720 that is stored in the social interest store 285. The
social interest score 720 may be used to provide information for a user
interface, e.g., as described in the displays depicted herein, via user
interface engine 255, which is one means for performing this function.

[0119]To further illustrate the methods for associating social media
content items with time-based media events, and for determining social
interest in the events based on the resulting associations, two examples
follow in the domains of American football and commercial advertising.

Example: American Football

[0120]As described in conjunction with FIG. 3A, multiple streams of data
are ingested as a preliminary step in the method. For the football
domain, in addition to the data discussed in FIG. 3, an additional source
of data comes from statistical feeds that contain detailed metadata about
events (with text descriptions of those events) in a football game.
Statistical feeds are available from multiple sources such as the NFL's
Game Statistics and Information System and private companies such as
Stats, Inc.

Video Event Segmentation

[0121]In the video event segmentation 310 process for American football,
the time-based media, e.g., a broadcast television feed for a football
game, is segmented into semantically meaningful segments corresponding to
discrete "events" that include plays in a game (and advertisements in
between).

[0122]The first step in segmenting events in a football video is to detect
the shot boundaries of a video. Shot boundaries are points in a video of
non-continuity, often associated with the changing of a camera angle or a
scene. In the domain of American football, changes in camera angles are
typically indicative of changes in plays.

[0123]In the football domain, event detection 420 may operate by first
identifying shots that depict the football field. Once a shot boundary is
detected, a scene classifier is used to determine whether that shot is
primarily focused on the playing field. Field shots may then be further
classified as depicting a game event (i.e. a play). In the football
domain, during event boundary determination 430 the beginning and end
points (i.e., in/out points) of an event may be refined to reflect more
appropriate start and stop points of a play. Such in/out points may be
adjusted based on clock characterization and/or utterance segmentation.
In a professional football game, the beginning and end of a play are
sometimes (but not always) associated with the starting or stopping of
the play clock. This play clock is often shown as a graphic overlay in a
broadcast football game. The starting/stopping of this play clock can be
determined by monitoring the amount of change (in pixels) of a frame
sub-region (i.e., the region containing the play clock graphic) in the
video over time. When the aggregate change in such sub-regions falls
below a threshold for greater than one second, the state of the
play-clock is assumed to be "inactive." If the aggregate change goes
above a threshold, the state of the play-clock is assumed to be "active."
Changes in the state of the play-clock are strong indicators that an
event has either begun or ended in the video.
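A sketch of this play-clock monitor using OpenCV; the sub-region coordinates, the pixel-change threshold, and the frame rate are illustrative assumptions.

    import cv2
    import numpy as np

    def play_clock_states(video_path, region, diff_thresh=4.0, fps=30):
        # region = (x, y, w, h): the frame sub-region containing the
        # play clock graphic overlay.
        x, y, w, h = region
        cap = cv2.VideoCapture(video_path)
        states, prev, quiet = [], None, 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            crop = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2GRAY)
            if prev is not None:
                change = float(np.mean(cv2.absdiff(crop, prev)))
                quiet = quiet + 1 if change < diff_thresh else 0
                # "inactive" only after more than one second of low change
                states.append("inactive" if quiet > fps else "active")
            prev = crop
        cap.release()
        return states   # state transitions suggest event begin/end points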

[0124]Aesthetic judgment is often required when determining boundaries for
the precise start and end points of a play. Approximating such judgments
can be accomplished using the utterance boundaries in the speech of the
game announcers. These utterance boundaries can be detected by
identifying pauses in the stream of speech in the video. Pauses can be
identified using audio processing software, such as is found in Sphinx 3.

[0125]Thus, the output of video event segmentation 310 for an American
football game on broadcast television is a set of segmented video events
corresponding to plays in a game.

Video Metadata Alignment/Annotation

[0126]The process of metadata alignment/annotation 320 in American
football operates on the video stream segmented into events based on
plays in the game. These events are annotated with metadata concerning
the type of event shown (e.g. "touchdown"), key players in those events
(e.g. "Tom Brady"), the roles of those players (e.g. "Passer"), and,
details of the event (e.g. "number of yards gained"). This metadata can
be added manually by human experts, fully automatically by a machine
algorithm, or semi-automatically using a human-machine hybrid approach.
Metadata is stored in the event metadata store 273.

[0127]For each event (i.e., play) that is to be aligned with metadata, the
play is converted into a feature vector representation via feature
extraction 315. Video metadata alignment 520 then takes as input the
feature vector representation 510 of a single play and a metadata
instance 505. It cycles through each metadata instance 505 in the event
metadata store 273 and estimates the likelihood that the particular play
may be described by a particular metadata instance using, for example, a
probabilistic model. One exemplary model is the grounded language model
described above.

[0129]In addition to exact matches, the domain ontology 257 of football
terms is used to expand the term set to include synonyms and hypernyms
(e.g., "TD" or "score" for "touchdown"), as well as nicknames for players
(e.g. "Tom Terrific" for "Brady").

[0130]Authority feature representations express the prior probability that
any author of social media content may be referring to a football event.
One factor in the estimation of this probability may be based on the
friends, followers, or other connections to a user in their social
network. Such connections are indicative of an author's likelihood to
post about a football event, which can provide additional features for
the social media/event alignment 330 function. The more friends a user has who post about football events, the more likely the user is to post about football events. To capture this information, meta-scores are generated for a user based on the frequency with which their contacts have posted about football events. The meta-scores are the average, mode, and median of the frequencies of their friends' football posts.
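These meta-scores reduce to simple summary statistics, as in the following sketch (the per-friend frequencies are illustrative):

    from statistics import mean, median, mode

    def friend_meta_scores(friend_post_freqs):
        # friend_post_freqs: football posts per week for each friend.
        return {"average": mean(friend_post_freqs),
                "mode": mode(friend_post_freqs),
                "median": median(friend_post_freqs)}

    print(friend_meta_scores([0, 2, 2, 5, 1]))
    # {'average': 2, 'mode': 2, 'median': 2}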

[0131]The output of social media/event alignment 330 is a mapping between
the annotated play and each social media content item, with an associated
confidence score.

[0132]If information about the social interest in the play is desired, it
may be estimated by aggregating the information gleaned from the above
processes. The social interest estimation 340 may be calculated for every
play in the game. The likely result is higher social interest scores for
plays such as touchdowns, and lower social interest scores for lesser
plays.

Example: Advertising

[0133]As described in conjunction with FIG. 3A, multiple streams of data
are ingested as a preliminary step in the method.

[0134]Video Event Segmentation

[0135]For the advertising domain, during the video event segmentation 310
process, the time-based media is segmented into semantically meaningful
segments corresponding to discrete "events" which are identified with
advertisements (i.e. commercials).

[0136]Event detection 420 in the advertising domain may operate by
identifying one or more shots that may be part of an advertising block
(i.e. a sequence of commercials within or between shows). Advertising
blocks are detected using image features such as the presence of all black frames, graphics detection (e.g. presence of a channel logo in the frame), aspect ratio, and shot boundaries. Speech/audio features may also be used, including detection of volume change and the presence/absence of closed captioning.

[0137]Event boundary detection 430 operates on an advertisement block and
identifies the beginning and ending boundaries of individual ads within
the block. Event boundary determination may be performed using a
classifier based on features such as the presence/absence of black
frames, shot boundaries, and aspect ratio changes. Classifiers may be
optimized by hand or using machine learning techniques.

Video Metadata Alignment/Annotation

[0138]As with event segmentation 310, the video metadata
alignment/annotation 320 process is domain dependent. In the
advertisement domain, metadata for an advertisement may include
information such as "Brand: Walmart, Scene: father dresses up as clown,
Mood: comic." This metadata is generated by human annotators who watch
sample ad events and log metadata for ads, including the key
products/brands involved in the ad, the mood of the ad, the
story/creative aspects of the ad, the actors/celebrities in the ad, etc.

[0139]Metadata for advertisements may also include low level image and
audio properties of the ad (e.g. number and length of shots, average
color histograms of each shot, power levels of the audio, etc.).

[0140]For each event (i.e., advertisement) that is to be aligned with
metadata, the advertisement is converted into a feature vector
representation via feature extraction 315. Video metadata alignment 520
then takes as input the feature vector representation 510 of a single
advertisement and a metadata instance 505. It cycles through each
metadata instance 505 in the event metadata store 273 and estimates the
likelihood that the particular advertisement may be described by a
particular metadata instance using, for example, a simple cosine
similarity function that compares the low level feature representation of
the ad event to the low level properties in the metadata.

[0142]In addition to exact matches, the domain ontologies 257 that encode information relevant to the advertising domain may be used to expand the term set to include synonyms and hypernyms (e.g., "hilarious" for "comic"), names of companies, products, stores, etc., as well as advertisement-associated words (e.g., "commercial").

[0143]The output of social media/event alignment 330 is a mapping between
the annotated advertisement and each social media content item, with an
associated confidence score.

[0144]If information about social interest in the advertisement is
desired, it may be estimated by aggregating the information gleaned from
the above processes. The social interest estimation 340 may be calculated
for every advertisement in an advertising block or television show. The
likely result is higher social interest scores for particularly
interesting or funny advertisements, and lower social interest scores for
less exciting or repetitive advertisements.

[0145]Although American football and advertising domains are described
above, the methods described herein can be adapted to any domain using
time-based media. The method of adaptation is general across different
domains and focuses on two changes. First, techniques and features used
for event segmentation and annotation are adapted to reflect domain
specific characteristics. For example, detecting events in football
exploits the visibility of grass as it is represented in the color
distributions in a video frame, while detecting events in news video may
exploit clues in the closed captioning stream and graphic overlays in the
frames. The second change involves the ontology used to link events to
social media content items which refer to them. While for football, the
requisite ontology contains concepts related to football players, teams,
and events, domains such as news video require ontologies with germane concepts such as current events and culturally popular figures.

Display of Social Interest Estimation

[0146]As mentioned above, the social interest estimations can be used in
various ways. One such application is to display social interest in
various user interfaces and graphic representations. FIGS. 8A and 8B show
two embodiments of social interest heat maps 810, 820 showing levels of
social interest for a plurality of events corresponding to a series of
chronological time segments in a time-based medium.

[0147]FIG. 8A shows a social interest heat map 810 corresponding to a
football game, in which individual events (plays and advertisements) 815
are shown as vertical bars chronologically across a timeline 830; the
time location of a bar corresponds to the beginning point of the event.
The level of estimated social interest in each event 815 is shown by the height of its bar, i.e., by the number of social media content items 850 corresponding to each event 815, with a taller bar representing greater social interest.
Two event types, advertisements 870 and plays 880, are shown.

[0148]FIG. 8B shows a similar social interest heat map corresponding to a
football game, in which individual events (plays and advertisements) 860
are shown chronologically across a timeline 840. The level of estimated
social interest in each event 860 is shown by intensity of color of the
corresponding bar 860, with a darker bar representing greater social
interest. Other color/intensity/texture/pattern scales can be used to
represent the level of interest. Two event types, advertisements 890 and
plays 860, are shown.

[0149]FIGS. 13A-D show yet another embodiment of a user interface 1300
displaying social interest heat maps 1310 showing levels of social
interest for a plurality of events corresponding to a series of
chronological time segments in a time-based medium.

[0150]FIG. 13A shows a user interface 1300a with each social interest heat
map 1310 (horizontal bars) corresponding to a different channel. The
width of the maps 1310 corresponds to a time period as shown in the
navigation bar 1320, between the two ends 1325. Channels have multiple
distinct shows, shown as cells 1315, thereby forming a grid. The level of
social interest is indicated by intensity of color in a given cell 1315,
with the darkest cells indicative of the highest social interest in the
show. The navigation bar 1320 allows the user to select the timeframe for
viewing, and the ends 1325 allow the size of the navigation bar to be
expanded to adjust the visible portion of the social interest heat maps
in the user interface 1300, with the left end 1325a controlling the
beginning time and the right end 1325b controlling the ending time for
the social interest heat maps 1310.

[0151]FIG. 13B shows a user interface 1300b similar to that shown in FIG.
13A, except that the social interest heat maps 1310 include indication of
advertisements 1330 that appear during the shows 1315. Lines correspond to individual advertisements, with the darkness of each line as an indicator of social interest in the advertisement; darker indicates greater interest.

[0152]FIG. 13C shows a user interface 1300c similar to that shown in FIG.
13A, except that the social interest heat maps 1310 are zoomed out to the
level of days to show a different time scale on the navigation bar 1337.
Here, each division 1340 in the navigation bar corresponds to a single
day. The cells 1345 correspond to times of day, e.g., Primetime. The
darkness of color of each cell is representative of the social interest
in shows and/or advertisements during that time frame.

[0153]FIG. 13D shows a user interface 1300d similar to that shown in FIG.
13A, except that the social interest heat maps 1310 are zoomed out to the
level of months to show a different time scale. The division 1365 in the
navigation bar 1337 corresponds to a quarter of a year. The cells 1355 in
the grid correspond to months of the year. The darkness of color of each
cell is representative of the social interest in shows and/or
advertisements during that time frame.

[0154]FIGS. 9A-9C show three embodiments of user interfaces 900 of a
social interest estimation system. Each figure shows a social interest
heat map area 910, media display area 920, and a social media display
area 930 (not shown in FIG. 9C).

[0155]FIG. 9A shows in the social interest heat map area 910a three social
interest heat maps 940a-c similar to the one described in conjunction
with FIG. 8A, each map 940a-c corresponding to a different channel of
media content. The media display area 920a shows a media player for
displaying the time-based media associated with the selected event 915,
in this example a Dodge Charger advertisement. The social media display
area 930a shows statistical information about the social media
corresponding to the selected event, as well as the individual social
media content items.

[0156]FIG. 9B shows in the social interest heat map area 910b several
social interest heat maps 960 similar to the one described in conjunction
with FIG. 8B, each map 960 corresponding to a different channel, as well
as an overall social interest heat map 970 corresponding to a selected
event across all channels. The media display area 920b shows a media
player for displaying the time-based media associated with a user
selected event 935, in this example an advertisement scene. The user can
select any event 935 in the display and invoke the player to show the
video content of the event. The social media display areas 930b1 and
930b2 show the individual social media content items (930b1) and
statistical information about the social media corresponding to the
selected event (930b2).

[0157]FIG. 9C shows in the social interest heat map area 910c four social
interest heat maps 950a-d similar to the one described in conjunction
with FIG. 8B, each map 950a-d corresponding to a different channel. The
media display area 920c shows a media player for displaying the
time-based media associated with the selected event 925, in this example
a pass in a football game. Again, the user can control the player to show
an event by selecting the event 925 in a map 950.

[0158]FIGS. 10A and 10B show two embodiments of user interfaces 1000 of a
social interest estimation system showing a sentiment view. The user
interfaces 1000 are similar to those shown in FIGS. 9A-9B, except that
the social interest heat maps 940, 970 provide information indicating the
sentiment of the social media content items, i.e., whether they are
negative or positive, e.g., based on the sentiment detection process
described herein.

[0159]FIG. 10A shows for the event 915 in the social interest heat maps
940, a (top) positive portion 1010 corresponding to the number of social
media content items with positive sentiment, and a (bottom) negative
portion 1012 corresponding to the number of social media content items
with negative sentiment. The positive 1010 and negative 1012 portions are
visually distinguished from each other, such that their relative
percentages within the whole of the event bar are visible. A radio button
1015 is shown for toggling on and off the sentiment view.

[0160]FIG. 10B shows for an event 1015 in the overall social interest heat
map 970, a (top) positive portion 1020 corresponding to the number of
social media content items with positive sentiment, and a (bottom)
negative portion 1022 corresponding to the number of social media content
items with negative sentiment. The positive 1020 and negative 1022 portions are visually distinguished from each other, such that their relative percentages within the whole of the event bar are visible.

[0161]FIGS. 11A-11C show three embodiments of user interfaces 1100 of a
social interest estimation system showing a filtered view. The user
interfaces 1100 are similar to those shown in FIGS. 9A-9C, except that
the social interest heat maps 940, 970 provide information for only a
filtered subset of the social media content items.

[0162]FIG. 11A shows a text-based filter "doritos" applied to the data
such that social media content item bars corresponding to Doritos brand
advertisements (1110) show up darker, or otherwise visually
distinguished, from the non-Doritos brand social media content item bars
(1115).

[0163]FIG. 11B shows a text-based filter applied to the data (not shown)
such that only social media content item bars corresponding to the
applied filter are visible in the overall social interest heat map 970.

[0164]FIG. 11C shows a filter applied to the data corresponding to players
in the user's fantasy football league, such that only social media
content item bars corresponding to plays by the fantasy football players
are shown in the social interest heat maps 950. An additional players
area 1120 shows the players in the user's fantasy football league.

[0165]FIG. 12A shows one embodiment of user interface 1200 of a social
interest estimation system showing a focused unexpanded view. The user
interface 1200 is similar to that of FIG. 10A, except that the social
interest heat map 940a has a subsection 1210 of the social interest heat
map selected. FIG. 12B shows a user interface 1250 similar to that of
FIG. 12A, except that it shows a zoom view 1260 of the social interest
heat map 940a with the subsection 1210 from FIG. 12A expanded.

[0166]The foregoing description of the embodiments of the invention has
been presented for the purpose of illustration; it is not intended to be
exhaustive or to limit the invention to the precise forms disclosed.
Persons skilled in the relevant art can appreciate that many
modifications and variations are possible in light of the above
disclosure.

[0167]Some portions of this description describe the embodiments of the
invention in terms of algorithms and symbolic representations of
operations on information. These algorithmic descriptions and
representations are commonly used by those skilled in the data processing
arts to convey the substance of their work effectively to others skilled
in the art. These operations, while described functionally,
computationally, or logically, are understood to be implemented by
computer programs or equivalent electrical circuits, microcode, or the
like. Furthermore, it has also proven convenient at times to refer to
these arrangements of operations as modules, without loss of generality.
The described operations and their associated modules may be embodied in
software, firmware, hardware, or any combinations thereof.

[0168]Any of the steps, operations, or processes described herein may be
performed or implemented with one or more hardware or software modules,
alone or in combination with other devices. In one embodiment, a software
module is implemented with a computer program product comprising a
computer-readable medium containing computer program code, which can be
executed by a computer processor for performing any or all of the steps,
operations, or processes described.

[0169]Embodiments of the invention may also relate to an apparatus for
performing the operations herein. This apparatus may be specially
constructed for the required purposes, and/or it may comprise a
general-purpose computing device selectively activated or reconfigured by
a computer program stored in the computer. Such a computer program may be
persistently stored in a non-transitory, tangible computer readable
storage medium, or any type of media suitable for storing electronic
instructions, which may be coupled to a computer system bus. Furthermore,
any computing systems referred to in the specification may include a
single processor or may be architectures employing multiple processor
designs for increased computing capability.

[0170]Embodiments of the invention may also relate to a product that is
produced by a computing process described herein. Such a product may
comprise information resulting from a computing process, where the
information is stored on a non-transitory, tangible computer readable
storage medium and may include any embodiment of a computer program
product or other data combination described herein.

[0171]Finally, the language used in the specification has been principally
selected for readability and instructional purposes, and it may not have
been selected to delineate or circumscribe the inventive subject matter.
It is therefore intended that the scope of the invention be limited not
by this detailed description, but rather by any claims that issue on an
application based hereon. Accordingly, the disclosure of the embodiments
of the invention is intended to be illustrative, but not limiting, of the
scope of the invention, which is set forth in the following claims.