Abstract:

A visual query is received from a client system, along with location
information for the client system, and processed by a server system. The
server system sends the visual query and the location information to a
visual query search system, and receives from the visual query search
system enhanced location information based on the visual query and the
location information. The server system then sends a search query,
including the enhanced location information, to a location-based search
system. The server system receives one or more search results from the
location-based search system and provides at least one of them to the
client system.

Claims:

1. A computer-implemented method of processing a visual query comprising:
at a server system having one or more processors and memory storing one
or more programs for execution by the one or more processors: receiving a
visual query from a client system; receiving location information from
the client system; sending the visual query and the location information
to a visual query search system; receiving from the visual query search
system enhanced location information for the client system based on the
visual query and the location information; sending a search query to a
location-based search system, the search query including the enhanced
location information; receiving one or more search results in
accordance with the enhanced location information; and sending at least
one of the search results to the client system.

3. The method of claim 2, wherein the at least one search result includes
a result in the direction of the pose.

4. The method of claim 2, wherein the device has an orientation and the
visual query has an asymmetrical aspect ratio, the method further
comprising: constructing a viewing frustum based on the pose and based on
the orientation of the device, determined based on one or more of: the
device sensors and the asymmetrical aspect ratio of the visual query; and
sending at least one search result within the viewing frustum to the
client system.

5. The method of claim 4, including receiving a plurality of initial
search results from the location-based search system and filtering the
initial search results to exclude search results outside the viewing
frustum.

6. The method of claim 1, wherein the enhanced location information has
greater associated accuracy than the location information received from
the client system.

7. The method of claim 1, further comprising: identifying an accuracy
value for the enhanced location information; favoring search results near
the enhanced location when the enhanced location has an accuracy value at
or above a threshold; favoring search results with a high prominence
value when the enhanced location has an accuracy value below the
threshold; and sending at least one favored search result to the client
system.

8. The method of claim 1, further comprising: creating an interactive
results document comprising a bounding box outlining a respective
sub-portion of the visual query and including at least one user
selectable link to at least one of the search results, wherein the
bounding box is created by projecting earth coordinates of a search
result onto screen coordinates of the visual query; and sending the
interactive results document to the client system.

9. The method of claim 2, wherein the enhanced location information
comprises first enhanced location information, the method further
comprising: receiving a second visual query from the client system;
receiving second location information from the client system; requesting,
from the visual query search system, second enhanced location information
for the client system based on the second visual query and the second
location information; when the request for second enhanced location
information is successful, resulting in receipt of second enhanced
location information having greater accuracy than the second location
information received from the client system, sending a second search
query to a location-based search system, the second search query
including the second enhanced location information, and receiving one or
more search results in accordance with the second search query; when the
request for second enhanced location information is not successful,
sending a third search query to the location-based search system, the
third search query including the first enhanced location information; and
receiving one or more search results in accordance with the third search
query; and sending at least one of the search results to the client
system.

10. The method of claim 1, further comprising: receiving a plurality of
search results from the location-based search system in response to the
search query, each of the search results having an associated positional
accuracy; selecting one or more of the search results having the highest
associated positional accuracy; and returning the selected search results
to the client system.

11. The method of claim 1, wherein each of the search results from the
location-based search system comprises a respective local listing having
an associated position and positional accuracy, the method further
comprising: selecting one or more first search results, each comprising a
local listing having an associated position that A) satisfies a first
positional closeness requirement with respect to the enhanced location
information for the client system, and B) satisfies an accuracy
requirement that the local listing's associated position has positional
accuracy that is equal to or greater than a threshold; and sending the
first search results to the client system.

12. The method of claim 11, further comprising: selecting one or more
second search results in accordance with a requirement that each
identified second search result satisfy a second positional closeness
requirement with respect to at least one of the first search results; and
sending the first search results and second search results to the client
system.

13. The method of claim 1, wherein each of the search results from the
location-based search system comprises a respective local listing having
an associated position and positional accuracy, the method further
comprising: selecting search results to send to the client system in
accordance with the associated position and positional accuracy of each
of the search results, the selecting including excluding from the
selected search results those search results that A) have positional
accuracy less than a threshold, and B) do not satisfy a positional
closeness requirement with respect to at least one of the selected search
results that has positional accuracy equal to or greater than the
threshold and that satisfies a first positional closeness requirement
with respect to the enhanced location information for the client system.

14. The method of claim 1, including sending to the client system a
street view image determined by the visual query search system to match
the visual query.

15. A computer-implemented method of processing a visual query
comprising: at a server system having one or more processors and memory
storing one or more programs for execution by the one or more processors:
receiving a visual query from a client system; receiving location
information from the client system; requesting, from a visual query
search system, enhanced location information for the client system based
on the visual query and the location information; when the request for
enhanced location information is successful, resulting in receipt of
enhanced location information having greater accuracy than the location
information received from the client system, sending a first search query
to a location-based search system, the first search query including the
enhanced location information, and receiving one or more search results
in accordance with the first search query; when the request for enhanced
location information is not successful, sending a second search query to
the location-based search system, the second search query including the
received location information from the client system; and receiving one
or more search results in accordance with the second search query; and
sending at least one of the search results to the client system.

16. A server system, for processing a visual query, comprising: one or
more central processing units for executing programs; memory storing one
or more programs to be executed by the one or more central processing
units; the one or more programs comprising instructions for: receiving a
visual query from a client system; receiving location information from the
client system; sending the visual query and the location information to a
visual query search system; receiving from the visual query search system
enhanced location information for the client system based on the visual
query and the location information; sending a search query to a
location-based search system, the search query including the enhanced
location information; receiving one or more search results in
accordance with the enhanced location information; and sending at least
one of the search results to the client system.

17. A server system, for processing a visual query, comprising: one or
more central processing units for executing programs; memory storing one
or more programs to be executed by the one or more central processing
units; the one or more programs comprising instructions for: receiving a
visual query from a client system; receiving location information from the
client system; requesting, from a visual query search system, enhanced
location information for the client system based on the visual query and
the location information; when the request for enhanced location
information is successful, resulting in receipt of enhanced location
information having greater accuracy than the location information
received from the client system, sending a first search query to a
location-based search system, the first search query including the
enhanced location information, and receiving one or more search results
in accordance with the first search query; when the request for enhanced
location information is not successful, sending a second search query to
the location-based search system, the second search query including the
received location information from the client system; and receiving one
or more search results in accordance with the second search query; and
sending at least one of the search results to the client system.

18. A non-transitory computer readable storage medium storing one or more
programs configured for execution by a computer, the one or more programs
comprising instructions for: receiving a visual query from a client
system; receiving location information from the client system; sending
the visual query and the location information to a visual query search
system; receiving from the visual query search system enhanced location
information for the client system based on the visual query and the
location information; sending a search query to a location-based search
system, the search query including the enhanced location information;
receiving one or more search results in accordance with the enhanced
location information; and sending at least one of the search results to
the client system.

19. A non-transitory computer readable storage medium storing one or more
programs configured for execution by a computer, the one or more programs
comprising instructions for: receiving a visual query from a client
system; receiving location information from the client system;
requesting, from a visual query search system, enhanced location
information for the client system based on the visual query and the
location information; when the request for enhanced location information
is successful, resulting in receipt of enhanced location information
having greater accuracy than the location information received from the
client system, sending a first search query to a location-based search
system, the first search query including the enhanced location
information, and receiving one or more search results in accordance with
the first search query; when the request for enhanced location
information is not successful, sending a second search query to the
location-based search system, the second search query including the
received location information from the client system; and receiving one
or more search results in accordance with the second search query; and
sending at least one of the search results to the client system.

Description:

RELATED APPLICATIONS

[0001] This application claims priority to the following U.S. Provisional
Patent Application which is incorporated by reference herein in its
entirety: U.S. Provisional Patent Application No. 61/266,499, filed Dec.
3, 2009, entitled "Hybrid Use of Location Sensor Data and Visual Query to
Return Local Listing for Visual Query."

[0003] The disclosed embodiments relate generally to systems and methods
of processing visual queries, and in particular to obtaining search
results, including local listings physically located near a client
device, in response to the visual query and location information
associated with the client device.

BACKGROUND

[0004] Text-based or term-based searching, wherein a user inputs a word or
phrase into a search engine and receives a variety of results, is a useful
tool for searching. However, term-based queries require that a user be
able to input a relevant term. Sometimes a user may wish to know
information about a place where he is currently standing. For example, a
user might want to know the name of a company in a particular building,
find a phone number associated with an organization located in a
particular building, or read a review about a restaurant he is standing
near. Accordingly, a system that can receive from a client device a
visual query and information about the location of the client device and
that can use both the location information and the visual query to
provide relevant search results would be desirable.

SUMMARY

[0005] Some of the limitations and disadvantages described above are
overcome by providing methods, systems, computer readable storage
mediums, and graphical user interfaces (GUIs) described below.

[0006] Some embodiments of methods, systems, computer readable storage
mediums, and graphical user interfaces (GUIs) provide the following.
According to some embodiments, a computer-implemented method of
processing a visual query includes performing the following operations on
a server system having one or more processors and memory storing one or
more programs for execution by the one or more processors. A visual query
is received from a client system. Location information is also received
from the client system, indicating a current location of the client
system. In some embodiments, the client system obtains the location
information from GPS information, cell tower information, and/or local
wireless network information. The server system sends the visual query
and the location information to a visual query search system. The server
system receives from the visual query search system enhanced location
information based on the visual query and the location information. The
server system then sends a search query, including the enhanced location
information, to a location-based search system. The server system
receives one or more search results from the location-based search system
and provides at least one of them to the client system.
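
By way of a non-limiting illustration, the server-side flow described in
this paragraph could be sketched as follows. The names
visual_search_system, location_search_system, and their methods are
hypothetical placeholders, not part of the disclosure:

    def process_visual_query(visual_query, location_info,
                             visual_search_system, location_search_system):
        # Send the visual query and the sensor-derived location to the
        # visual query search system, which returns enhanced location
        # information for the client system.
        enhanced_location = visual_search_system.enhance_location(
            visual_query, location_info)

        # Send a search query that includes the enhanced location
        # information to the location-based search system.
        results = location_search_system.search(
            query=visual_query, location=enhanced_location)

        # At least one of the received search results is then sent back
        # to the client system.
        return results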

[0007] In some embodiments, a server system including one or more central
processing units for executing programs and memory storing one or more
programs to be executed by the one or more central processing units is
provided. The programs include instructions for performing the following
operations. A visual query is received from a client system. Location
information is also received from the client system, indicating a current
location of the client system. In some embodiments, the client system
obtains the location information from GPS information, cell tower
information, and/or local wireless network information. The server system
sends the visual query and the location information to a visual query
search system. The server system receives from the visual query search
system enhanced location information based on the visual query and the
location information. The server system then sends a search query,
including the enhanced location information, to a location-based search
system. The server system receives one or more search results from the
location-based search system and provides at least one of them to the
client system.

[0008] Some embodiments provide a computer readable storage medium storing
one or more programs configured for execution by a computer. The programs
include instructions for performing the following operations. A visual
query is received from a client system. Location information is also
received from the client system, indicating a current location of the
client system. In some embodiments, the client system obtains the
location information from GPS information, cell tower information, and/or
local wireless network information. The server system sends the visual
query and the location information to a visual query search system. The
server system receives from the visual query search system enhanced
location information based on the visual query and the location
information. The server system then sends a search query, including the
enhanced location information, to a location-based search system. The
server system receives one or more search results from the location-based
search system and provides at least one of them to the client system.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] FIG. 1 is a block diagram illustrating a computer network that
includes a visual query server system.

[0010] FIG. 2 is a flow diagram illustrating the process for responding to
a visual query, in accordance with some embodiments.

[0011] FIG. 3 is a flow diagram illustrating the process for responding to
a visual query with an interactive results document, in accordance with
some embodiments.

[0012] FIG. 4 is a flow diagram illustrating the communications between a
client and a visual query server system, in accordance with some
embodiments.

[0013] FIG. 5 is a block diagram illustrating a client system, in
accordance with some embodiments.

[0014] FIG. 6 is a block diagram illustrating a front end visual query
processing server system, in accordance with some embodiments.

[0015] FIG. 7 is a block diagram illustrating a generic one of the
parallel search systems utilized to process a visual query, in accordance
with some embodiments.

[0016] FIG. 8 is a block diagram illustrating an OCR search system
utilized to process a visual query, in accordance with some embodiments.

[0017] FIG. 9 is a block diagram illustrating a facial recognition search
system utilized to process a visual query, in accordance with some
embodiments.

[0018] FIG. 10 is a block diagram illustrating an image to terms search
system utilized to process a visual query, in accordance with some
embodiments.

[0019] FIG. 11 illustrates a client system with a screen shot of an
exemplary visual query, in accordance with some embodiments.

[0020] FIGS. 12A and 12B each illustrate a client system with a screen
shot of an interactive results document with bounding boxes, in
accordance with some embodiments.

[0021] FIG. 13 illustrates a client system with a screen shot of an
interactive results document that is coded by type, in accordance with
some embodiments.

[0022] FIG. 14 illustrates a client system with a screen shot of an
interactive results document with labels, in accordance with some
embodiments.

[0023] FIG. 15 illustrates a screen shot of an interactive results
document and visual query displayed concurrently with a results list, in
accordance with some embodiments.

[0024] FIGS. 16A-16C are flow diagrams illustrating the process for using
both location sensor data and a visual query to return local listings for
the visual query, according to some embodiments.

[0025] FIG. 17 is a flow diagram illustrating a frustum method of
selecting search results, in accordance with some embodiments.

[0026] FIG. 18 is a flow diagram illustrating a method of selecting search
results based on prominence and location data, in accordance with some
embodiments.

[0027] FIG. 19 is a flow diagram illustrating a method of selecting search
results based on relative position and accuracy data, in accordance with
some embodiments.

[0028] FIG. 20 is a flow diagram illustrating communications between a
client and a visual query server system with location information
augmentation, in accordance with some embodiments.

[0029] FIG. 21 illustrates a client system display of a results list and a
plurality of actionable search result elements returned for a street view
visual query including a building, in accordance with some embodiments.

[0030] FIG. 22 illustrates a client system display of a plurality of
actionable search result elements overlaying a visual query which are
returned for a street view visual query including a building, in
accordance with some embodiments.

[0031] FIG. 23 is a block diagram illustrating a location-augmented visual
query processing server system, in accordance with some embodiments.

[0032] FIG. 24 is a block diagram illustrating a location-based query
processing server system, in accordance with some embodiments.

[0033] Like reference numerals refer to corresponding parts throughout the
drawings.

DESCRIPTION OF EMBODIMENTS

[0034] Reference will now be made in detail to embodiments, examples of
which are illustrated in the accompanying drawings. In the following
detailed description, numerous specific details are set forth in order to
provide a thorough understanding of the present invention. However, it
will be apparent to one of ordinary skill in the art that the present
invention may be practiced without these specific details. In other
instances, well-known methods, procedures, components, circuits, and
networks have not been described in detail so as not to unnecessarily
obscure aspects of the embodiments.

[0035] It will also be understood that, although the terms first, second,
etc. may be used herein to describe various elements, these elements
should not be limited by these terms. These terms are only used to
distinguish one element from another. For example, a first contact could
be termed a second contact, and, similarly, a second contact could be
termed a first contact, without departing from the scope of the present
invention. The first contact and the second contact are both contacts,
but they are not the same contact.

[0036] The terminology used in the description of the invention herein is
for the purpose of describing particular embodiments only and is not
intended to be limiting of the invention. As used in the description of
the invention and the appended claims, the singular forms "a," "an," and
"the" are intended to include the plural forms as well, unless the
context clearly indicates otherwise. It will also be understood that the
term "and/or" as used herein refers to and encompasses any and all
possible combinations of one or more of the associated listed items. It
will be further understood that the terms "comprises" and/or
"comprising," when used in this specification, specify the presence of
stated features, integers, steps, operations, elements, and/or
components, but do not preclude the presence or addition of one or more
other features, integers, steps, operations, elements, components, and/or
groups thereof.

[0037] As used herein, the term "if" may be construed to mean "when" or
"upon" or "in response to determining" or "in response to detecting,"
depending on the context. Similarly, the phrase "if it is determined" or
"if (a stated condition or event) is detected" may be construed to mean
"upon determining" or "in response to determining" or "upon detecting
(the stated condition or event)" or "in response to detecting (the stated
condition or event)," depending on the context.

[0038] FIG. 1 is a block diagram illustrating a computer network that
includes a visual query server system according to some embodiments. The
computer network 100 includes one or more client systems 102 and a visual
query server system (sometimes called the visual query system) 106. One
or more communications networks 104 interconnect these components. The
communications network 104 may be any of a variety of networks, including
local area networks (LAN), wide area networks (WAN), wireless networks,
wireline networks, the Internet, or a combination of such networks.

[0039] The client system 102 includes a client application 108, which is
executed by the client system, for receiving a visual query (e.g., visual
query 1102 of FIG. 11). A visual query is an image that is submitted as a
query to a search engine or search system. Examples of visual queries,
without limitations include photographs, scanned documents and images,
and drawings. In some embodiments, the client application 108 is selected
from the set consisting of a search application, a search engine plug-in
for a browser application, and a search engine extension for a browser
application. In some embodiments, the client application 108 is an
"omnivorous" search box, which allows a user to drag and drop any format
of image into the search box to be used as the visual query.

[0040] A client system 102 sends queries to and receives data from the
visual query server system 106. The client system 102 may be any computer
or other device that is capable of communicating with the visual query
server system 106. Examples include, without limitation, desktop and
notebook computers, mainframe computers, server computers, mobile devices
such as mobile phones and personal digital assistants, network terminals,
and set-top boxes.

[0041] The visual query server system 106 includes a front end visual
query processing server 110. The front end server 110 receives a visual
query from the client 102, and sends the visual query to a plurality of
parallel search systems 112 for simultaneous processing. The search
systems 112 each implement a distinct visual query search process and
access their corresponding databases 114 as necessary to process the
visual query by their distinct search process. For example, a face
recognition search system 112-A will access a facial image database 114-A
to look for facial matches to the image query. As will be explained in
more detail with regard to FIG. 9, if the visual query contains a face,
the facial recognition search system 112-A will return one or more search
results (e.g., names, matching faces, etc.) from the facial image
database 114-A. In another example, the optical character recognition
(OCR) search system 112-B, converts any recognizable text in the visual
query into text for return as one or more search results. In the optical
character recognition (OCR) search system 112-B, an OCR database 114-B
may be accessed to recognize particular fonts or text patterns as
explained in more detail with regard to FIG. 8.

[0042] Any number of parallel search systems 112 may be used. Some
examples include a facial recognition search system 112-A, an OCR search
system 112-B, an image-to-terms search system 112-C (which may recognize
an object or an object category), a product recognition search system
(which may be configured to recognize 2-D images such as book covers and
CDs and may also be configured to recognize 3-D images such as
furniture), a bar code recognition search system (which recognizes 1D and
2D style bar codes), a named entity recognition search system, a landmark
recognition search system (which may be configured to recognize particular famous
landmarks like the Eiffel Tower and may also be configured to recognize a
corpus of specific images such as billboards), place recognition aided by
geo-location information provided by a GPS receiver in the client system
102 or mobile phone network, a color recognition search system, and a
similar image search system (which searches for and identifies images
similar to a visual query). Further search systems can be added as
additional parallel search systems, represented in FIG. 1 by system
112-N. All of the search systems, except the OCR search system, are
collectively defined herein as search systems performing an image-match
process. All of the search systems including the OCR search system are
collectively referred to as query-by-image search systems. In some
embodiments, the visual query server system 106 includes a facial
recognition search system 112-A, an OCR search system 112-B, and at least
one other query-by-image search system 112.

[0043] The parallel search systems 112 each individually process the
visual search query and return their results to the front end server
system 110. In some embodiments, the front end server 110 may perform one
or more analyses on the search results such as one or more of:
aggregating the results into a compound document, choosing a subset of
results to display, and ranking the results as will be explained in more
detail with regard to FIG. 6. The front end server 110 communicates the
search results to the client system 102.

[0044] The client system 102 presents the one or more search results to
the user. The results may be presented on a display, by an audio speaker,
or any other means used to communicate information to a user. The user
may interact with the search results in a variety of ways. In some
embodiments, the user's selections, annotations, and other interactions
with the search results are transmitted to the visual query server system
106 and recorded along with the visual query in a query and annotation
database 116. Information in the query and annotation database can be
used to improve visual query results. In some embodiments, the
information from the query and annotation database 116 is periodically
pushed to the parallel search systems 112, which incorporate any relevant
portions of the information into their respective individual databases
114.

[0045] The computer network 100 optionally includes a term query server
system 118, for performing searches in response to term queries. A term
query is a query containing one or more terms, as opposed to a visual
query which contains an image. The term query server system 118 may be
used to generate search results that supplement information produced by
the various search engines in the visual query server system 106. The
results returned from the term query server system 118 may be in any
format, including textual documents, images, video, etc. While the term
query server system 118 is shown as a
separate system in FIG. 1, optionally the visual query server system 106
may include a term query server system 118.

[0046] Additional information about the operation of the visual query
server system 106 is provided below with respect to the flowcharts in
FIGS. 2-4.

[0047] FIG. 2 is a flow diagram illustrating a visual query server system
method for responding to a visual query, according to certain embodiments
of the invention. Each of the operations shown in FIG. 2 may correspond
to instructions stored in a computer memory or computer readable storage
medium.

[0048] The visual query server system receives a visual query from a
client system (202). The client system, for example, may be a desktop
computing device, a mobile device, or another similar device (204) as
explained with reference to FIG. 1. An example visual query on an example
client system is shown in FIG. 11.

[0049] The visual query is an image document of any suitable format. For
example, the visual query can be a photograph, a screen shot, a scanned
image, or a frame or a sequence of multiple frames of a video (206). In
some embodiments, the visual query is a drawing produced by a content
authoring program (736, FIG. 5). As such, in some embodiments, the user
"draws" the visual query, while in other embodiments the user scans or
photographs the visual query. Some visual queries are created using an
image generation application such as Acrobat, a photograph editing
program, a drawing program, or an image editing program. For example, a
visual query could come from a user taking a photograph of his friend on
his mobile phone and then submitting the photograph as the visual query
to the server system. The visual query could also come from a user
scanning a page of a magazine, or taking a screen shot of a webpage on a
desktop computer and then submitting the scan or screen shot as the
visual query to the server system. In some embodiments, the visual query
is submitted to the server system 106 through a search engine extension
of a browser application, through a plug-in for a browser application, or
by a search application executed by the client system 102. Visual queries
may also be submitted by other application programs (executed by a client
system) that support or generate images which can be transmitted to a
remotely located server by the client system.

[0050] The visual query can be a combination of text and non-text elements
(208). For example, a query could be a scan of a magazine page containing
images and text, such as a person standing next to a road sign. A visual
query can include an image of a person's face, whether taken by a camera
embedded in the client system or a document scanned by or otherwise
received by the client system. A visual query can also be a scan of a
document containing only text. The visual query can also be an image of
numerous distinct subjects, such as several birds in a forest, a person
and an object (e.g., car, park bench, etc.), a person and an animal
(e.g., pet, farm animal, butterfly, etc.). Visual queries may have two or
more distinct elements. For example, a visual query could include a
barcode and an image of a product or product name on a product package.
For instance, the visual query could be a picture of a book cover that
includes the title of the book, cover art, and a bar code. In some
instances, one visual query will produce two or more distinct search
results corresponding to different portions of the visual query, as
discussed in more detail below.

[0051] The server system processes the visual query as follows. The front
end server system sends the visual query to a plurality of parallel
search systems for simultaneous processing (210). Each search system
implements a distinct visual query search process, i.e., an individual
search system processes the visual query by its own processing scheme.

[0052] In some embodiments, one of the search systems to which the visual
query is sent for processing is an optical character recognition (OCR)
search system. In some embodiments, one of the search systems to which
the visual query is sent for processing is a facial recognition search
system. In some embodiments, the plurality of search systems running
distinct visual query search processes includes at least: optical
character recognition (OCR), facial recognition, and another
query-by-image process other than OCR and facial recognition (212). The
other query-by-image process is selected from a set of processes that
includes but is not limited to product recognition, bar code recognition,
object-or-object-category recognition, named entity recognition, and
color recognition (212).

[0053] In some embodiments, named entity recognition occurs as a
post-processing step of the OCR search system, wherein the text result of the OCR is
analyzed for famous people, locations, objects and the like, and then the
terms identified as being named entities are searched in the term query
server system (118, FIG. 1). In other embodiments, images of famous
landmarks, logos, people, album covers, trademarks, etc. are recognized
by an image-to-terms search system. In other embodiments, a distinct
named entity query-by-image process separate from the image-to-terms
search system is utilized. The object-or-object-category recognition
system recognizes generic result types like "car." In some embodiments,
this system also recognizes product brands, particular product models,
and the like, and provides more specific descriptions, like "Porsche."
Some of the search systems could be special user specific search systems.
For example, particular versions of color recognition and facial
recognition could be special search systems used by the blind.

[0054] The front end server system receives results from the parallel
search systems (214). In some embodiments, the results are accompanied by
a search score. For some visual queries, some of the search systems will
find no relevant results. For example, if the visual query was a picture
of a flower, the facial recognition search system and the bar code search
system will not find any relevant results. In some embodiments, if no
relevant results are found, a null or zero search score is received from
that search system (216). In some embodiments, if the front end server
does not receive a result from a search system after a pre-defined period
of time (e.g., 0.2, 0.5, 1, 2 or 5 seconds), it will proceed as if that
timed-out server produced a null search score and will process the
received results from the other search systems.
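
As a minimal sketch of this timeout behavior, assuming (hypothetically)
that each search system exposes a search method returning a (results,
score) pair:

    from concurrent.futures import ThreadPoolExecutor, TimeoutError

    def collect_parallel_results(visual_query, search_systems, timeout=2.0):
        # Dispatch the visual query to all parallel search systems at once.
        scored = {}
        with ThreadPoolExecutor(max_workers=len(search_systems)) as pool:
            futures = {system.name: pool.submit(system.search, visual_query)
                       for system in search_systems}
            for name, future in futures.items():
                try:
                    results, score = future.result(timeout=timeout)
                    # A system that finds no relevant results contributes
                    # a null (zero) search score.
                    scored[name] = (results, score if results else 0)
                except TimeoutError:
                    # A timed-out system is treated as having produced a
                    # null search score; the other systems' results are
                    # still processed.
                    scored[name] = ([], 0)
        return scored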

[0055] Optionally, when at least two of the received search results meet
pre-defined criteria, they are ranked (218). In some embodiments, one of
the pre-defined criteria is that the results are not void. In some
embodiments, one of the pre-defined criteria excludes results having a
numerical score (e.g., for a relevance
factor) that falls below a pre-defined minimum score. Optionally, the
plurality of search results are filtered (220). In some embodiments, the
results are only filtered if the total number of results exceeds a
pre-defined threshold. In some embodiments, all the results are ranked
but the results falling below a pre-defined minimum score are excluded.
For some visual queries, the content of the results is filtered. For
example, if some of the results contain private information or personal
protected information, these results are filtered out.
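
One possible realization of the ranking and filtering operations (218,
220) is sketched below; the score threshold, result cap, and result
fields are assumed rather than drawn from the disclosure:

    PREDEFINED_MIN_SCORE = 0.25  # hypothetical pre-defined minimum score
    MAX_RESULTS = 20             # hypothetical pre-defined threshold

    def rank_and_filter(search_results):
        # Exclude void results, results scoring below the pre-defined
        # minimum, and results containing private or protected information.
        kept = [r for r in search_results
                if r.score is not None
                and r.score >= PREDEFINED_MIN_SCORE
                and not r.contains_private_info]
        # Rank the remaining results by score, best first, and cap the
        # list if the total number of results exceeds the threshold.
        kept.sort(key=lambda r: r.score, reverse=True)
        return kept[:MAX_RESULTS]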

[0056] Optionally, the visual query server system creates a compound
search result (222). One embodiment of this is when more than one search
system result is embedded in an interactive results document as explained
with respect to FIG. 3. The term query server system (118, FIG. 1) may
augment the results from one of the parallel search systems with results
from a term search, where the additional results are either links to
documents or information sources, or text and/or images containing
additional information that may be relevant to the visual query. Thus,
for example, the compound search result may contain an OCR result and a
link to a named entity in the OCR document (224).

[0057] In some embodiments, the OCR search system (112-B, FIG. 1) or the
front end visual query processing server (110, FIG. 1) recognizes likely
relevant words in the text. For example, it may recognize named entities
such as famous people or places. The named entities are submitted as
query terms to the term query server system (118, FIG. 1). In some
embodiments, the term query results produced by the term query server
system are embedded in the visual query result as a "link." In some
embodiments, the term query results are returned as separate links. For
example, if a picture of a book cover were the visual query, it is likely
that an object recognition search system will produce a high scoring hit
for the book. As such, a term query for the title of the book will be run
on the term query server system 118 and the term query results are
returned along with the visual query results. In some embodiments, the
term query results are presented in a labeled group to distinguish them
from the visual query results. The results may be searched individually,
or a search may be performed using all the recognized named entities in
the search query to produce particularly relevant additional search
results. For example, if the visual query is a scanned travel brochure
about Paris, the returned result may include links to the term query
server system 118 for initiating a search on a term query "Notre Dame."
Similarly, compound search results include results from text searches for
recognized famous images. For example, in the same travel brochure, live
links to the term query results for famous destinations shown as pictures
in the brochure like "Eiffel Tower" and "Louvre" may also be shown (even
if the terms "Eiffel Tower" and "Louvre" did not appear in the brochure
itself.)
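
A sketch of the named entity augmentation described above follows; the
entity extractor and the term query system interface are assumed, not
taken from the disclosure:

    def augment_with_term_queries(ocr_text, extract_named_entities,
                                  term_query_system):
        # Recognize likely relevant words (e.g., famous people or places)
        # in the OCR text and run each one as a term query.
        links = []
        for entity in extract_named_entities(ocr_text):
            term_results = term_query_system.search(entity)
            # Each term query result is embedded in the visual query
            # result as a link, labeled to distinguish it from the
            # visual query results.
            links.append({"entity": entity, "term_results": term_results})
        return links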

[0058] The visual query server system then sends at least one result to
the client system (226). Typically, if the visual query processing server
receives a plurality of search results from at least some of the
plurality of search systems, it will then send at least one of the
plurality of search results to the client system. For some visual
queries, only one search system will return relevant results. For
example, in a visual query containing only an image of text, only the OCR
server's results may be relevant. For some visual queries, only one
result from one search system may be relevant. For example, only the
product related to a scanned bar code may be relevant. In these
instances, the front end visual processing server will return only the
relevant search result(s). For some visual queries, a plurality of search
results are sent to the client system, and the plurality of search
results include search results from more than one of the parallel search
systems (228). This may occur when more than one distinct image is in the
visual query. For example, if the visual query were a picture of a person
riding a horse, results for facial recognition of the person could be
displayed along with object identification results for the horse. In some
embodiments, all the results for a particular query by image search
system are grouped and presented together. For example, the top N facial
recognition results are displayed under a heading "facial recognition
results" and the top N object recognition results are displayed together
under a heading "object recognition results." Alternatively, as discussed
below, the search results from a particular image search system may be
grouped by image region. For example, if the visual query includes two
faces, both of which produce facial recognition results, the results for
each face would be presented as a distinct group. For some visual queries
(e.g., a visual query including an image of both text and one or more
objects), the search results may include both OCR results and one or more
image-match results (230).
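
By way of illustration only, grouping results by their originating search
system could look like the following sketch (the result fields are
assumed):

    from collections import defaultdict

    def group_results_by_system(search_results, top_n=3):
        # Collect results under the name of the system that produced them.
        groups = defaultdict(list)
        for result in search_results:
            groups[result.system_name].append(result)
        # Keep the top N results per system, to be displayed together
        # under a heading such as "facial recognition results".
        return {name: sorted(items, key=lambda r: r.score,
                             reverse=True)[:top_n]
                for name, items in groups.items()}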

[0059] In some embodiments, the user may wish to learn more about a
particular search result. For example, if the visual query was a picture
of a dolphin and the "image to terms" search system returns the following
terms "water," "dolphin," "blue," and "Flipper;" the user may wish to run
a text based query term search on "Flipper." When the user wishes to run
a search on a term query (e.g., as indicated by the user clicking on or
otherwise selecting a corresponding link in the search results), the
query term server system (118, FIG. 1) is accessed, and the search on the
selected term(s) is run. The corresponding search term results are
displayed on the client system either separately or in conjunction with
the visual query results (232). In some embodiments, the front end visual
query processing server (110, FIG. 1) automatically (i.e., without
receiving any user command, other than the initial visual query) chooses
one or more top potential text results for the visual query, runs those
text results on the term query server system 118, and then returns those
term query results along with the visual query result to the client
system as a part of sending at least one search result to the client
system (232). In the example above, if "Flipper" was the first term
result for the visual query picture of a dolphin, the front end server
runs a term query on "Flipper" and returns those term query results along
with the visual query results to the client system. This embodiment,
wherein a term result that is considered likely to be selected by the
user is automatically executed prior to sending search results from the
visual query to the user, saves the user time. In some embodiments, these
results are displayed as a compound search result (222) as explained
above. In other embodiments, the results are part of a search result list
instead of or in addition to a compound search result.

[0060] FIG. 3 is a flow diagram illustrating the process for responding to
a visual query with an interactive results document. The first three
operations (202, 210, 214) are described above with reference to FIG. 2.
From the search results which are received from the parallel search
systems (214), an interactive results document is created (302).

[0061] Creating the interactive results document (302) will now be
described in detail. For some visual queries, the interactive results
document includes one or more visual identifiers of respective
sub-portions of the visual query. Each visual identifier has at least one
user selectable link to at least one of the search results. A visual
identifier identifies a respective sub-portion of the visual query. For
some visual queries, the interactive results document has only one visual
identifier with one user selectable link to one or more results. In some
embodiments, a respective user selectable link to one or more of the
search results has an activation region, and the activation region
corresponds to the sub-portion of the visual query that is associated
with a corresponding visual identifier.

[0062] In some embodiments, the visual identifier is a bounding box (304).
In some embodiments, the bounding box encloses a sub-portion of the
visual query as shown in FIG. 12A. The bounding box need not be a square
or rectangular box shape but can be any sort of shape including circular,
oval, conformal (e.g., to an object in, entity in or region of the visual
query), irregular or any other shape as shown in FIG. 12B. For some
visual queries, the bounding box outlines the boundary of an identifiable
entity in a sub-portion of the visual query (306). In some embodiments,
each bounding box includes a user selectable link to one or more search
results, where the user selectable link has an activation region
corresponding to a sub-portion of the visual query surrounded by the
bounding box. When the space inside the bounding box (the activation
region of the user selectable link) is selected by the user, search
results that correspond to the image in the outlined sub-portion are
returned.
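
A minimal data-structure sketch of a bounding box whose interior serves
as the activation region of a user selectable link; the field names are
illustrative, not taken from the disclosure:

    from dataclasses import dataclass, field

    @dataclass
    class BoundingBox:
        left: int
        top: int
        right: int
        bottom: int
        result_ids: list = field(default_factory=list)  # linked results

        def contains(self, x, y):
            # The space inside the bounding box is the activation region
            # of the user selectable link.
            return (self.left <= x <= self.right
                    and self.top <= y <= self.bottom)

    def resolve_selection(bounding_boxes, x, y):
        # Return the search results linked to whichever bounding box the
        # user selected within the interactive results document.
        for box in bounding_boxes:
            if box.contains(x, y):
                return box.result_ids
        return []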

[0063] In some embodiments, the visual identifier is a label (307) as
shown in FIG. 14. In some embodiments, the label includes at least one term
associated with the image in the respective sub-portion of the visual
query. Each label is formatted for presentation in the interactive
results document on or near the respective sub-portion. In some
embodiments, the labels are color coded.

[0064] In some embodiments, each respective visual identifier is
formatted for presentation in a visually distinctive manner in accordance
with a type of recognized entity in the respective sub-portion of the
visual query. For example, as shown in FIG. 13, bounding boxes around a
product, a person, a trademark, and the two textual areas are each
presented with distinct cross-hatching patterns, representing differently
colored transparent bounding boxes. In some embodiments, the visual
identifiers are formatted for presentation in visually distinctive
manners such as overlay color, overlay pattern, label background color,
label background pattern, label font color, and border color.

[0065] In some embodiments, the user selectable link in the interactive
results document is a link to a document or object that contains one or
more results related to the corresponding sub-portion of the visual query
(308). In some embodiments, at least one search result includes data
related to the corresponding sub-portion of the visual query. As such,
when the user selects the selectable link associated with the respective
sub-portion, the user is directed to the search results corresponding to
the recognized entity in the respective sub-portion of the visual query.

[0066] For example, if a visual query was a photograph of a bar code,
there may be portions of the photograph which are irrelevant parts of the
packaging upon which the bar code was affixed. The interactive results
document may include a bounding box around only the bar code. When the
user selects inside the outlined bar code bounding box, the bar code
search result is displayed. The bar code search result may include one
result, the name of the product corresponding to that bar code, or the
bar code results may include several results such as a variety of places
in which that product can be purchased, reviewed, etc.

[0067] In some embodiments, when the sub-portion of the visual query
corresponding to a respective visual identifier contains text comprising
one or more terms, the search results corresponding to the respective
visual identifier include results from a term query search on at least
one of the terms in the text. In some embodiments, when the sub-portion
of the visual query corresponding to a respective visual identifier
contains a person's face for which at least one match (i.e., search
result) is found that meets predefined reliability (or other) criteria,
the search results corresponding to the respective visual identifier
include one or more of: name, handle, contact information, account
information, address information, current location of a related mobile
device associated with the person whose face is contained in the
selectable sub-portion, other images of the person whose face is
contained in the selectable sub-portion, and potential image matches for
the person's face. In some embodiments, when the sub-portion of the
visual query corresponding to a respective visual identifier contains a
product for which at least one match (i.e., search result) is found that
meets predefined reliability (or other) criteria, the search results
corresponding to the respective visual identifier include one or more of:
product information, a product review, an option to initiate purchase of
the product, an option to initiate a bid on the product, a list of
similar products, and a list of related products.

[0068] Optionally, a respective user selectable link in the interactive
results document includes anchor text, which is displayed in the document
without having to activate the link. The anchor text provides
information, such as a key word or term, related to the information
obtained when the link is activated. Anchor text may be displayed as part
of the label (307), or in a portion of a bounding box (304), or as
additional information displayed when a user hovers a cursor over a user
selectable link for a pre-determined period of time such as 1 second.

[0069] Optionally, a respective user selectable link in the interactive
results document is a link to a search engine for searching for
information or documents corresponding to a text-based query (sometimes
herein called a term query). Activation of the link causes execution of
the search by the search engine, where the query and the search engine
are specified by the link (e.g., the search engine is specified by a URL
in the link and the text-based search query is specified by a URL
parameter of the link), with results returned to the client system.
Optionally, the link in this example may include anchor text specifying
the text or terms in the search query.
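
As a hedged example of constructing such a link, where the search engine
URL is a placeholder and the term query is carried as a URL parameter:

    from urllib.parse import urlencode

    def build_term_query_link(term_query, anchor_text=None,
                              engine_url="https://search.example.com/search"):
        # The search engine is specified by the URL in the link and the
        # text-based query is specified by a URL parameter of the link.
        url = engine_url + "?" + urlencode({"q": term_query})
        # Anchor text displays the query terms without the user having
        # to activate the link.
        return '<a href="{}">{}</a>'.format(url, anchor_text or term_query)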

[0070] In some embodiments, the interactive results document produced in
response to a visual query can include a plurality of links that
correspond to results from the same search system. For example, a visual
query may be an image or picture of a group of people. The interactive
results document may include bounding boxes around each person, which
when activated returns results from the facial recognition search system
for each face in the group. For some visual queries, a plurality of links
in the interactive results document corresponds to search results from
more than one search system (310). For example, if a picture of a person
and a dog was submitted as the visual query, bounding boxes in the
interactive results document may outline the person and the dog
separately. When the person (in the interactive results document) is
selected, search results from the facial recognition search system are
returned, and when the dog (in the interactive results document) is
selected, results from the image-to-terms search system are returned. For
some visual queries, the interactive results document contains an OCR
result and an image match result (312). For example, if a picture of a
person standing next to a sign were submitted as a visual query, the
interactive results document may include visual identifiers for the
person and for the text in the sign. Similarly, if a scan of a magazine
was used as the visual query, the interactive results document may
include visual identifiers for photographs or trademarks in
advertisements on the page as well as a visual identifier for the text of
an article also on that page.

[0071] After the interactive results document has been created, it is sent
to the client system (314). In some embodiments, the interactive results
document (e.g., document 1200, FIG. 15) is sent in conjunction with a
list of search results from one or more parallel search systems, as
discussed above with reference to FIG. 2. In some embodiments, the
interactive results document is displayed at the client system above or
otherwise adjacent to a list of search results from one or more parallel
search systems (315) as shown in FIG. 15.

[0072] Optionally, the user will interact with the results document by
selecting a visual identifier in the results document. The server system
receives from the client system information regarding the user selection
of a visual identifier in the interactive results document (316). As
discussed above, in some embodiments, the link is activated by selecting
an activation region inside a bounding box. In other embodiments, the
link is activated by a user selection of a visual identifier of a
sub-portion of the visual query, which is not a bounding box. In some
embodiments, the linked visual identifier is a hot button, a label
located near the sub-portion, an underlined word in text, or other
representation of an object or subject in the visual query.

[0073] In embodiments where the search results list is presented with the
interactive results document (315), when the user selects a user
selectable link (316), the search result in the search results list
corresponding to the selected link is identified. In some embodiments,
the cursor will jump or automatically move to the first result
corresponding to the selected link. In some embodiments in which the
display of the client 102 is too small to display both the interactive
results document and the entire search results list, selecting a link in
the interactive results document causes the search results list to scroll
or jump so as to display at least a first result corresponding to the
selected link. In some other embodiments, in response to user selection
of a link in the interactive results document, the results list is
reordered such that the first result corresponding to the link is
displayed at the top of the results list.

[0074] In some embodiments, when the user selects the user selectable link
(316) the visual query server system sends at least a subset of the
results, related to a corresponding sub-portion of the visual query, to
the client for display to the user (318). In some embodiments, the user
can select multiple visual identifiers concurrently and will receive a
subset of results for all of the selected visual identifiers at the same
time. In other embodiments, search results corresponding to the user
selectable links are preloaded onto the client prior to user selection of
any of the user selectable links so as to provide search results to the
user virtually instantaneously in response to user selection of one or
more links in the interactive results document.

[0075] FIG. 4 is a flow diagram illustrating the communications between a
client and a visual query server system. The client 102 receives a visual
query from a user/querier (402). In some embodiments, visual queries can
only be accepted from users who have signed up for or "opted in" to the
visual query system. In some embodiments, searches for facial recognition
matches are only performed for users who have signed up for the facial
recognition visual query system, while other types of visual queries are
performed for anyone regardless of whether they have "opted in" to the
facial recognition portion.

[0076] As explained above, the format of the visual query can take many
forms. The visual query will likely contain one or more subjects located
in sub-portions of the visual query document. For some visual queries,
the client system 102 performs type recognition pre-processing on the
visual query (404). In some embodiments, the client system 102 searches
for particular recognizable patterns in this pre-processing system. For
example, for some visual queries the client may recognize colors. For
some visual queries the client may recognize that a particular
sub-portion is likely to contain text (because that area is made up of
small dark characters surrounded by light space, etc.). The client may
contain any number of pre-processing type recognizers, or type
recognition modules. In some embodiments, the client will have a type
recognition module (barcode recognition 406) for recognizing bar codes.
It may do so by recognizing the distinctive striped pattern in a
rectangular area. In some embodiments, the client will have a type
recognition module (face detection 408) for recognizing that a particular
subject or sub-portion of the visual query is likely to contain a face.
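
One way to organize such a set of pre-processing type recognizers is as a registry of detector functions, each reporting a confidence and a sub-portion. The following Python sketch is illustrative only; the detector bodies are placeholders for the heuristics described above (a striped pattern for bar codes, general facial characteristics for faces), not a working implementation.

    # Hypothetical registry of client-side type recognition modules.
    def detect_barcode(image):
        """Placeholder: look for a distinctive striped pattern in a
        rectangular area; return (confidence, region) or None."""
        return None

    def detect_face(image):
        """Placeholder: look for general facial characteristics;
        return (confidence, region) or None."""
        return None

    def detect_text(image):
        """Placeholder: look for small dark characters surrounded by
        light space; return (confidence, region) or None."""
        return None

    RECOGNIZERS = {"barcode": detect_barcode,
                   "face": detect_face,
                   "text": detect_text}

    def preprocess(image):
        results = []
        for subject_type, detect in RECOGNIZERS.items():
            hit = detect(image)
            if hit is not None:
                confidence, region = hit
                results.append({"subject_type": subject_type,
                                "confidence": confidence,
                                "region": region})
        return results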

[0077] In some embodiments, the recognized "type" is returned to the user
for verification. For example, the client system 102 may return a message
stating "a bar code has been found in your visual query, are you
interested in receiving bar code query results?" In some embodiments, the
message may even indicate the sub-portion of the visual query where the
type has been found. In some embodiments, this presentation is similar to
the interactive results document discussed with reference to FIG. 3. For
example, it may outline a sub-portion of the visual query and indicate
that the sub-portion is likely to contain a face, and ask the user if
they are interested in receiving facial recognition results.

[0078] After the client 102 performs the optional pre-processing of the
visual query, the client sends the visual query to the visual query
server system 106, specifically to the front end visual query processing
server 110. In some embodiments, if pre-processing produced relevant
results, i.e., if one of the type recognition modules produced results
above a certain threshold, indicating that the query or a sub-portion of
the query is likely to be of a particular type (face, text, barcode
etc.), the client will pass along information regarding the results of
the pre-processing. For example, the client may indicate that the face
recognition module is 75% sure that a particular sub-portion of the
visual query contains a face. More generally, the pre-processing results,
if any, include one or more subject type values (e.g., bar code, face,
text, etc.). Optionally, the pre-processing results sent to the visual
query server system include one or more of: for each subject type value
in the pre-processing results, information identifying a sub-portion of
the visual query corresponding to the subject type value, and for each
subject type value in the pre-processing results, a confidence value
indicating a level of confidence in the subject type value and/or the
identification of a corresponding sub-portion of the visual query.
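
By way of illustration, the optional pre-processing results accompanying the visual query might be serialized as follows; the 75% face example reprises the paragraph above, but the field names and exact shape are assumptions.

    # Hypothetical payload sent alongside the visual query when a type
    # recognition module scores above threshold.
    preprocessing_results = [
        {
            "subject_type": "face",   # e.g., bar code, face, text
            "sub_portion": {"x": 120, "y": 80, "w": 200, "h": 240},
            "confidence": 0.75,       # 75% sure this region is a face
        },
    ]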

[0079] The front end server 110 receives the visual query from the client
system (202). The visual query received may contain the pre-processing
information discussed above. As described above, the front end server
sends the visual query to a plurality of parallel search systems (210).
If the front end server 110 received pre-processing information regarding
the likelihood that a sub-portion contained a subject of a certain type,
the front end server may pass this information along to one or more of
the parallel search systems. For example, it may pass on the information
that a particular sub-portion is likely to be a face so that the facial
recognition search system 112-A can process that subsection of the visual
query first. Similarly, the same information (that a particular
sub-portion is likely to be a face) may be used by the other parallel
search systems to ignore that sub-portion or to analyze other sub-portions
first. In some embodiments, the front end server will not pass on the
pre-processing information to the parallel search systems, but will
instead use this information to augment the way in which it processes the
results received from the parallel search systems.
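
The two ways the front end server may use the pre-processing information can be sketched roughly as below; the search system interface and names are hypothetical.

    # Illustrative sketch: either forward the pre-processing hints to the
    # parallel search systems, or withhold them for result post-processing.
    def dispatch_visual_query(visual_query, hints, search_systems,
                              forward_hints=True):
        for system in search_systems:
            if forward_hints:
                # e.g., a "face" hint lets the facial recognition system
                # process that sub-portion first, while other systems may
                # ignore it or analyze other sub-portions first.
                system.submit(visual_query, hints=hints)
            else:
                # Hints are kept back and applied later, while ranking and
                # filtering the results returned by the search systems.
                system.submit(visual_query)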

[0080] As explained with reference to FIG. 2, for at least some visual queries,
the front end server 110 receives a plurality of search results from the
parallel search systems (214). The front end server may then perform a
variety of ranking and filtering, and may create an interactive search
result document as explained with reference to FIGS. 2 and 3. If the
front end server 110 received pre-processing information regarding the
likelihood that a sub-portion contained a subject of a certain type, it
may filter and order by giving preference to those results that match the
pre-processed recognized subject type. If the user indicated that a
particular type of result was requested, the front end server will take
the user's requests into account when processing the results. For
example, the front end server may filter out all other results if the
user only requested bar code information, or the front end server will
list all results pertaining to the requested type prior to listing the
other results. If an interactive visual query document is returned, the
server may pre-search the links associated with the type of result the
user indicated interest in, while only providing links for performing
related searches for the other subjects indicated in the interactive
results document. Then the front end server 110 sends the search results
to the client system (226).
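
A minimal sketch of the filtering and ordering preference described above follows, assuming (for illustration only) that each result carries a subject_type attribute.

    # Illustrative: favor results matching the recognized or requested type.
    def order_by_type(results, preferred_type, exclusive=False):
        matching = [r for r in results if r.subject_type == preferred_type]
        others = [r for r in results if r.subject_type != preferred_type]
        if exclusive:
            # e.g., the user only requested bar code information.
            return matching
        # Otherwise list results of the requested type before the others.
        return matching + others

For example, order_by_type(all_results, "barcode", exclusive=True) models the bar-code-only case mentioned above, while exclusive=False models listing the requested type prior to the other results.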

[0081] The client 102 receives the results from the server system (412).
When applicable, these results will include the results that match the
type of result found in the pre-processing stage. For example, in some
embodiments they will include one or more bar code results (414) or one
or more facial recognition results (416). If the client's pre-processing
modules had indicated that a particular type of result was likely, and
that result was found, the found results of that type will be listed
prominently.

[0082] Optionally the user will select or annotate one or more of the
results (418). The user may select one search result, may select a
particular type of search result, and/or may select a portion of an
interactive results document (420). Selection of a result is implicit
feedback that the returned result was relevant to the query. Such
feedback information can be utilized in future query processing
operations. An annotation provides explicit feedback about the returned
result that can also be utilized in future query processing operations.
Annotations take the form of corrections of portions of the returned
result (like a correction to a mis-OCRed word) or a separate annotation
(either free form or structured.)

[0083] The user's selection of one search result, generally selecting the
"correct" result from several of the same type (e.g., choosing the
correct result from a facial recognition server), is a process that is
referred to as a selection among interpretations. The user's selection of
a particular type of search result, generally selecting the result "type"
of interest from several different types of returned results (e.g.,
choosing the OCRed text of an article in a magazine rather than the
visual results for the advertisements also on the same page), is a
process that is referred to as disambiguation of intent. A user may
similarly select particular linked words (such as recognized named
entities) in an OCRed document as explained in detail with reference to
FIG. 8.

[0084] The user may alternatively or additionally wish to annotate
particular search results. This annotation may be done in freeform style
or in a structured format (422). The annotations may be descriptions of
the result or may be reviews of the result. For example, they may
indicate the name of subject(s) in the result, or they could indicate
"this is a good book" or "this product broke within a year of purchase."
Another example of an annotation is a user-drawn bounding box around a
sub-portion of the visual query and user-provided text identifying the
object or subject inside the bounding box. User annotations are explained
in more detail with reference to FIG. 5.

[0085] The user selections of search results and other annotations are
sent to the server system (424). The front end server 110 receives the
selections and annotations and further processes them (426). If the
information was a selection of an object, sub-region or term in an
interactive results document, further information regarding that
selection may be requested, as appropriate. For example, if the selection
was of one visual result, more information about that visual result would
be requested. If the selection was a word (either from the OCR server or
from the Image-to-Terms server) a textual search of that word would be
sent to the term query server system 118. If the selection was of a
person from a facial image recognition search system, that person's
profile would be requested. If the selection was for a particular portion
of an interactive search result document, the underlying visual query
results would be requested.
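
The follow-on request chosen for each kind of selection can be sketched as a simple dispatch; the selection object, the services container, and the handler names below are hypothetical.

    # Illustrative dispatch on the kind of user selection received (426);
    # the collaborating services are injected and are hypothetical.
    def process_selection(selection, services):
        if selection.kind == "visual_result":
            # One visual result: request more information about it.
            return services.details_for(selection.result)
        if selection.kind == "word":
            # Word from the OCR server or the Image-to-Terms server: run a
            # textual search via the term query server system 118.
            return services.term_query_server.search(selection.text)
        if selection.kind == "person":
            # Match from the facial image recognition search system:
            # request that person's profile.
            return services.profile_for(selection.person)
        if selection.kind == "document_region":
            # Portion of an interactive search result document: fetch the
            # underlying visual query results.
            return services.results_for(selection.region)
        raise ValueError("unknown selection kind: %s" % selection.kind)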

[0086] If the server system receives an annotation, the annotation is
stored in a query and annotation database 116, explained with reference
to FIG. 5. Then the information from the annotation database 116 is
periodically copied to individual annotation databases for one or more of
the parallel server systems, as discussed below with reference to FIGS.
7-10.

[0087] FIG. 5 is a block diagram illustrating a client system 102 in
accordance with one embodiment of the present invention. The client
system 102 typically includes one or more processing units (CPU's) 702,
one or more network or other communications interfaces 704, memory 712,
and one or more communication buses 714 for interconnecting these
components. The client system 102 includes a user interface 705. The user
interface 705 includes a display device 706 and optionally includes an
input means such as a keyboard, mouse, or other input buttons 708.
Alternatively or in addition the display device 706 includes a touch
sensitive surface 709, in which case the display 706/709 is a touch
sensitive display. In client systems that have a touch sensitive display
706/709, a physical keyboard is optional (e.g., a soft keyboard may be
displayed when keyboard entry is needed). Furthermore, some client
systems use a microphone and voice recognition to supplement or replace
the keyboard. Optionally, the client 102 includes a GPS (global
positioning satellite) receiver, or other location detection apparatus
707 for determining the location of the client system 102. In some
embodiments, the client 102 also includes one or more of: a magnetometer
742, one or more accelerometers 744, or other sensors 746 for providing
location information regarding the client device. In some embodiments,
visual query search services are provided that require the client system
102 to provide location information to the visual query server system,
indicating the location of the client system 102.

[0088] The client system 102 also includes an image capture device 710
such as a camera or scanner. Memory 712 includes high-speed random access
memory, such as DRAM, SRAM, DDR RAM or other random access solid state
memory devices; and may include non-volatile memory, such as one or more
magnetic disk storage devices, optical disk storage devices, flash memory
devices, or other non-volatile solid state storage devices. Memory 712
may optionally include one or more storage devices remotely located from
the CPU(s) 702. Memory 712, or alternately the non-volatile memory
device(s) within memory 712, comprises a non-transitory computer readable
storage medium. In some embodiments, memory 712 or the computer readable
storage medium of memory 712 stores the following programs, modules and
data structures, or a subset thereof: [0089] an operating system 716
that includes procedures for handling various basic system services and
for performing hardware dependent tasks; [0090] a network communication
module 718 that is used for connecting the client system 102 to other
computers via the one or more communication network interfaces 704 (wired
or wireless) and one or more communication networks, such as the
Internet, other wide area networks, local area networks, metropolitan
area networks, and so on; [0091] an image capture module 720 for
processing a respective image captured by the image capture device/camera
710, where the respective image may be sent (e.g., by a client
application module) as a visual query to the visual query server system;
[0092] one or more client application modules 722 for handling various
aspects of querying by image, including but not limited to: a
query-by-image submission module 724 for submitting visual queries to the
visual query server system; optionally a region of interest selection
module 725 that detects a selection (such as a gesture on the touch
sensitive display 706/709) of a region of interest in an image and
prepares that region of interest as a visual query; a results browser 726
for displaying the results of the visual query; and optionally an
annotation module 728 with optional modules for structured annotation
text entry 730 such as filling in a form or for freeform annotation text
entry 732, which can accept annotations in a variety of formats, and an
image region selection module 734 (sometimes referred to herein as a
result selection module) which allows a user to select a particular
sub-portion of an image for annotation; [0093] an optional content
authoring application(s) 736 that allow a user to author a visual query
by creating or editing an image rather than just capturing one via the
image capture device 710; optionally, one or such applications 736 may
include instructions that enable a user to select a sub-portion of an
image for use as a visual query; [0094] an optional local image analysis
module 738 that pre-processes the visual query before sending it to the
visual query server system. The local image analysis may recognize
particular types of images, or sub-regions within an image. Examples of
image types that may be recognized by such modules 738 include one or
more of: facial type (facial image recognized within visual query), bar
code type (bar code recognized within visual query), and text type (text
recognized within visual query); and [0095] additional optional client
applications 740 such as an email application, a phone application, a
browser application, a mapping application, instant messaging
application, social networking application etc. In some embodiments, the
application corresponding to an appropriate actionable search result can
be launched or accessed when the actionable search result is selected.

[0096] Optionally, the image region selection module 734 which allows a
user to select a particular sub-portion of an image for annotation, also
allows the user to choose a search result as a "correct" hit without
necessarily further annotating it. For example, the user may be presented
with a top N number of facial recognition matches and may choose the
correct person from that results list. For some search queries, more than
one type of result will be presented, and the user will choose a type of
result. For example, the image query may include a person standing next
to a tree, but only the results regarding the person are of interest to
the user. Therefore, the image selection module 734 allows the user to
indicate which type of image is the "correct" type--i.e., the type he is
interested in receiving. The user may also wish to annotate the search
result by adding personal comments or descriptive words using either the
annotation text entry module 730 (for filling in a form) or freeform
annotation text entry module 732.

[0097] In some embodiments, the optional local image analysis module 738
is a portion of the client application (108, FIG. 1). Furthermore, in
some embodiments the optional local image analysis module 738 includes
one or more programs to perform local image analysis to pre-process or
categorize the visual query or a portion thereof. For example, the client
application 722 may recognize that the image contains a bar code, a face,
or text, prior to submitting the visual query to a search engine. In some
embodiments, when the local image analysis module 738 detects that the
visual query contains a particular type of image, the module asks the
user if they are interested in a corresponding type of search result. For
example, the local image analysis module 738 may detect a face based on
its general characteristics (i.e., without determining which person's
face) and provides immediate feedback to the user prior to sending the
query on to the visual query server system. It may return a result like,
"A face has been detected, are you interested in getting facial
recognition matches for this face?" This may save time for the visual
query server system (106, FIG. 1). For some visual queries, the front end
visual query processing server (110, FIG. 1) only sends the visual query
to the search system 112 corresponding to the type of image recognized by
the local image analysis module 738. In other embodiments, the front end
visual query processing server may send the visual query to all of the
search systems 112A-N, but will rank results from the search system 112
corresponding to the type of image recognized by the local image analysis
module 738. In some embodiments, the manner in which local image analysis
impacts the operation of the visual query server system depends on the
configuration of the client system, or configuration or processing
parameters associated with either the user or the client system.
Furthermore, the actual content of any particular visual query and the
results produced by the local image analysis may cause different visual
queries to be handled differently at either or both the client system and
the visual query server system.

[0098] In some embodiments, bar code recognition is performed in two
steps, with analysis of whether the visual query includes a bar code
performed on the client system at the local image analysis module 738.
Then the visual query is passed to a bar code search system only if the
client determines the visual query is likely to include a bar code. In
other embodiments, the bar code search system processes every visual
query.
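
The two-step arrangement might look roughly like this; the threshold value and module interfaces are illustrative assumptions, not specifics of the disclosed embodiments.

    # Illustrative gate: the bar code search system is queried only when
    # the client-side analysis deems a bar code likely.
    BARCODE_LIKELIHOOD_THRESHOLD = 0.5   # assumed cutoff

    def maybe_search_barcode(visual_query, local_image_analysis,
                             barcode_search_system):
        likelihood = local_image_analysis.barcode_likelihood(visual_query)
        if likelihood < BARCODE_LIKELIHOOD_THRESHOLD:
            return None   # step 1 failed: skip the bar code server entirely
        return barcode_search_system.search(visual_query)   # step 2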

[0100] FIG. 6 is a block diagram illustrating a front end visual query
processing server system 110 in accordance with one embodiment of the
present invention. The front end server 110 typically includes one or
more processing units (CPU's) 802, one or more network or other
communications interfaces 804, memory 812, and one or more communication
buses 814 for interconnecting these components. Memory 812 includes
high-speed random access memory, such as DRAM, SRAM, DDR RAM or other
random access solid state memory devices; and may include non-volatile
memory, such as one or more magnetic disk storage devices, optical disk
storage devices, flash memory devices, or other non-volatile solid state
storage devices. Memory 812 may optionally include one or more storage
devices remotely located from the CPU(s) 802. Memory 812, or alternately
the non-volatile memory device(s) within memory 812, comprises a
non-transitory computer readable storage medium. In some embodiments,
memory 812 or the computer readable storage medium of memory 812 stores
the following programs, modules and data structures, or a subset thereof:
[0101] an operating system 816 that includes procedures for handling
various basic system services and for performing hardware dependent
tasks; [0102] a network communication module 818 that is used for
connecting the front end server system 110 to other computers via the one
or more communication network interfaces 804 (wired or wireless) and one
or more communication networks, such as the Internet, other wide area
networks, local area networks, metropolitan area networks, and so on;
[0103] a query manager 820 for handling the incoming visual queries from
the client system 102 and sending them to two or more parallel search
systems; as described elsewhere in this document, in some special
situations a visual query may be directed to just one of the search
systems, such as when the visual query includes a client-generated
instruction (e.g., "facial recognition search only"); [0104] a results
filtering module 822 for optionally filtering the results from the one or
more parallel search systems and sending the top or "relevant" results to
the client system 102 for presentation; [0105] a results ranking and
formatting module 824 for optionally ranking the results from the one or
more parallel search systems and for formatting the results for
presentation; [0106] a results document creation module 826 that is used,
when appropriate, to create an interactive search results document; module 826
may include sub-modules, including but not limited to a bounding box
creation module 828 and a link creation module 830; [0107] a label
creation module 831 for creating labels that are visual identifiers of
respective sub-portions of a visual query; [0108] an annotation module
832 for receiving annotations from a user and sending them to an
annotation database 116; [0109] an actionable search results module 838
for generating, in response to a visual query, one or more actionable
search result elements, each configured to launch a client-side action;
examples of actionable search result elements are buttons to initiate a
telephone call, to initiate an email message, to map an address, to make a
restaurant reservation, and to provide an option to purchase a product;
and [0110] a local listings selection module 840 for selecting and
filtering location search results returned from a location-based search
system 112-G (FIG. 24) by various methods explained with reference to
FIGS. 16A-19; [0111] a query and annotation database 116 which comprises
the database itself 834 and an index to the database 836.

[0112] The results ranking and formatting module 824 ranks the results
returned from the one or more parallel search systems (112-A-112-N, FIG.
1). As already noted above, for some visual queries, only the results
from one search system may be relevant. In such an instance, only the
relevant search results from that one search system are ranked. For some
visual queries, several types of search results may be relevant. In these
instances, in some embodiments, the results ranking and formatting module
824 ranks all of the results from the search system having the most
relevant result (e.g., the result with the highest relevance score) above
the results for the less relevant search systems. In other embodiments,
the results ranking and formatting module 824 ranks a top result from
each relevant search system above the remaining results. In some
embodiments, the results ranking and formatting module 824 ranks the
results in accordance with a relevance score computed for each of the
search results. For some visual queries, augmented textual queries are
performed in addition to the searching on parallel visual search systems.
In some embodiments, when textual queries are also performed, their
results are presented in a manner visually distinctive from the visual
search system results.
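
The two inter-system orderings described above can be sketched as follows, assuming (for illustration) that each search system returns a non-empty list of results carrying a relevance score; all names are hypothetical.

    # Illustrative sketches of the two ranking strategies for results
    # arriving from several parallel search systems.
    def rank_by_best_system(per_system_results):
        # Place all results from the system with the single most relevant
        # result above those of the less relevant systems.
        ordered = sorted(per_system_results,
                         key=lambda rs: max(r.score for r in rs),
                         reverse=True)
        return [r for rs in ordered for r in rs]

    def rank_top_of_each_first(per_system_results):
        # Promote each system's top result above all remaining results.
        tops = [max(rs, key=lambda r: r.score) for rs in per_system_results]
        rest = [r for rs in per_system_results for r in rs if r not in tops]
        tops.sort(key=lambda r: r.score, reverse=True)
        return tops + rest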

[0113] The results ranking and formatting module 824 also formats the
results. In some embodiments, the results are presented in a list format.
In some embodiments, the results are presented by means of an interactive
results document. In some embodiments, both an interactive results
document and a list of results are presented. In some embodiments, the
type of query dictates how the results are presented. For example, if
more than one searchable subject is detected in the visual query, then an
interactive results document is produced, while if only one searchable
subject is detected the results will be displayed in list format only.

[0114] The results document creation module 826 is used to create an
interactive search results document. The interactive search results
document may have one or more detected and searched subjects. The
bounding box creation module 828 creates a bounding box around one or
more of the searched subjects. The bounding boxes may be rectangular
boxes, or may outline the shape(s) of the subject(s). The link creation
module 830 creates links to search results associated with their
respective subject in the interactive search results document. In some
embodiments, clicking within the bounding box area activates the
corresponding link inserted by the link creation module.
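
Hit-testing a click or tap against the bounding boxes can be sketched as below. Resolving nested boxes by preferring the smallest box that was hit is an assumption for illustration, not a requirement of the described embodiments.

    # Illustrative hit test: find the bounding box containing the tap and
    # activate the link the link creation module associated with it.
    def activate_link_at(tap_x, tap_y, bounding_boxes):
        hits = [b for b in bounding_boxes
                if b.x <= tap_x <= b.x + b.width
                and b.y <= tap_y <= b.y + b.height]
        if not hits:
            return None
        # With nested boxes (e.g., a trademark box inside a package box),
        # prefer the innermost, i.e., smallest, box that was hit.
        innermost = min(hits, key=lambda b: b.width * b.height)
        return innermost.link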

[0115] The query and annotation database 116 contains information that can
be used to improve visual query results. In some embodiments, the user
may annotate the image after the visual query results have been
presented. Furthermore, in some embodiments the user may annotate the
image before sending it to the visual query search system. Pre-annotation
may help the visual query processing by focusing the results, or running
text based searches on the annotated words in parallel with the visual
query searches. In some embodiments, annotated versions of a picture can
be made public (e.g., when the user has given permission for publication,
for example by designating the image and annotation(s) as not private),
so as to be returned as a potential image match hit. For example, if a
user takes a picture of a flower and annotates the image by giving
detailed genus and species information about that flower, the user may
want that image to be presented to anyone who performs a visual query
search looking for that flower. In some embodiments, the information
from the query and annotation database 116 is periodically pushed to the
parallel search systems 112, which incorporate relevant portions of the
information (if any) into their respective individual databases 114.

[0116] FIG. 7 is a block diagram illustrating one of the parallel search
systems utilized to process a visual query. FIG. 7 illustrates a
"generic" server system 112-N in accordance with one embodiment of the
present invention. This server system is generic only in that it
represents any one of the visual query search servers 112-N. The generic
server system 112-N typically includes one or more processing units
(CPU's) 502, one or more network or other communications interfaces 504,
memory 512, and one or more communication buses 514 for interconnecting
these components. Memory 512 includes high-speed random access memory,
such as DRAM, SRAM, DDR RAM or other random access solid state memory
devices; and may include non-volatile memory, such as one or more
magnetic disk storage devices, optical disk storage devices, flash memory
devices, or other non-volatile solid state storage devices. Memory 512
may optionally include one or more storage devices remotely located from
the CPU(s) 502. Memory 512, or alternately the non-volatile memory
device(s) within memory 512, comprises a non-transitory computer readable
storage medium. In some embodiments, memory 512 or the computer readable
storage medium of memory 512 stores the following programs, modules and
data structures, or a subset thereof: [0117] an operating system 516
that includes procedures for handling various basic system services and
for performing hardware dependent tasks; [0118] a network communication
module 518 that is used for connecting the generic server system 112-N to
other computers via the one or more communication network interfaces 504
(wired or wireless) and one or more communication networks, such as the
Internet, other wide area networks, local area networks, metropolitan
area networks, and so on; [0119] a search application 520 specific to the
particular server system, it may for example be a bar code search
application, a color recognition search application, a product
recognition search application, an object-or-object category search
application, or the like; [0120] an optional index 522 if the particular
search application utilizes an index; [0121] an optional image database
524 for storing the images relevant to the particular search application,
where the image data stored, if any, depends on the search process type;
[0122] an optional results ranking module 526 (sometimes called a
relevance scoring module) for ranking the results from the search
application; the ranking module may assign a relevancy score to each
result from the search application, and if no results reach a pre-defined
minimum score, may return a null or zero value score to the front end
visual query processing server indicating that the results from this
server system are not relevant; and [0123] an annotation module 528 for
receiving annotation information from an annotation database (116, FIG.
1), determining if any of the annotation information is relevant to the
particular search application, and incorporating any determined relevant
portions of the annotation information into the respective annotation
database 530.

[0124] FIG. 8 is a block diagram illustrating an OCR search system 112-B
utilized to process a visual query in accordance with one embodiment of
the present invention. The OCR search system 112-B typically includes one
or more processing units (CPU's) 602, one or more network or other
communications interfaces 604, memory 612, and one or more communication
buses 614 for interconnecting these components. Memory 612 includes
high-speed random access memory, such as DRAM, SRAM, DDR RAM or other
random access solid state memory devices; and may include non-volatile
memory, such as one or more magnetic disk storage devices, optical disk
storage devices, flash memory devices, or other non-volatile solid state
storage devices. Memory 612 may optionally include one or more storage
devices remotely located from the CPU(s) 602. Memory 612, or alternately
the non-volatile memory device(s) within memory 612, comprises a
non-transitory computer readable storage medium. In some embodiments,
memory 612 or the computer readable storage medium of memory 612 stores
the following programs, modules and data structures, or a subset thereof:
[0125] an operating system 616 that includes procedures for handling
various basic system services and for performing hardware dependent
tasks; [0126] a network communication module 618 that is used for
connecting the OCR search system 112-B to other computers via the one or
more communication network interfaces 604 (wired or wireless) and one or
more communication networks, such as the Internet, other wide area
networks, local area networks, metropolitan area networks, and so on;
[0127] an Optical Character Recognition (OCR) module 620 which tries to
recognize text in the visual query, and converts the images of letters
into characters; [0128] an optional OCR database 114-B which is utilized
by the OCR module 620 to recognize particular fonts, text patterns, and
other characteristics unique to letter recognition; [0129] an optional
spell check module 622 which improves the conversion of images of letters
into characters by checking the converted words against a dictionary and
replacing potentially mis-converted letters in words that otherwise match
a dictionary word; [0130] an optional named entity recognition module 624
which searches for named entities within the converted text, sends the
recognized named entities as terms in a term query to the term query
server system (118, FIG. 1), and provides the results from the term query
server system as links embedded in the OCRed text associated with the
recognized named entities; [0131] an optional text match application 632
which improves the conversion of images of letters into characters by
checking converted segments (such as converted sentences and paragraphs)
against a database of text segments and replacing potentially
mis-converted letters in OCRed text segments that otherwise match a text
match application text segment, in some embodiments the text segment
found by the text match application is provided as a link to the user
(for example, if the user scanned one page of the New York Times, the
text match application may provide a link to the entire posted article on
the New York Times website); [0132] a results ranking and formatting
module 626 for formatting the OCRed results for presentation and
formatting optional links to named entities, and also optionally ranking
any related results from the text match application; and [0133] an
optional annotation module 628 for receiving annotation information from
an annotation database (116, FIG. 1), determining if any of the annotation
information is relevant to the OCR search system, and incorporating any
determined relevant portions of the annotation information into the
respective annotation database 630.

[0134] FIG. 9 is a block diagram illustrating a facial recognition search
system 112-A utilized to process a visual query in accordance with one
embodiment of the present invention. The facial recognition search system
112-A typically includes one or more processing units (CPU's) 902, one or
more network or other communications interfaces 904, memory 912, and one
or more communication buses 914 for interconnecting these components.
Memory 912 includes high-speed random access memory, such as DRAM, SRAM,
DDR RAM or other random access solid state memory devices; and may
include non-volatile memory, such as one or more magnetic disk storage
devices, optical disk storage devices, flash memory devices, or other
non-volatile solid state storage devices. Memory 912 may optionally
include one or more storage devices remotely located from the CPU(s) 902.
Memory 912, or alternately the non-volatile memory device(s) within
memory 912, comprises a non-transitory computer readable storage medium.
In some embodiments, memory 912 or the computer readable storage medium
of memory 912 stores the following programs, modules and data structures,
or a subset thereof: [0135] an operating system 916 that includes
procedures for handling various basic system services and for performing
hardware dependent tasks; [0136] a network communication module 918 that
is used for connecting the facial recognition search system 112-A to
other computers via the one or more communication network interfaces 904
(wired or wireless) and one or more communication networks, such as the
Internet, other wide area networks, local area networks, metropolitan
area networks, and so on; [0137] a facial recognition search application
920 that searches a facial image database 114-A for facial images
matching the face(s) presented in the visual query and searches the social
network database 922 for information regarding each match found in the
facial image database 114-A; [0138] a facial image database 114-A for
storing one or more facial images for a plurality of users; optionally,
the facial image database includes facial images for people other than
users, such as family members and others known by users and who have been
identified as being present in images included in the facial image
database 114-A; optionally, the facial image database includes facial
images obtained from external sources, such as vendors of facial images
that are legally in the public domain; [0139] optionally, a social
network database 922 which contains information regarding users of the
social network such as name, address, occupation, group memberships,
social network connections, current GPS location of mobile device, share
preferences, interests, age, hometown, personal statistics, work
information, etc. as discussed in more detail with reference to FIG. 12A;
[0140] a results ranking and formatting module 924 for ranking (e.g.,
assigning a relevance and/or match quality score to) the potential facial
matches from the facial image database 114-A and formatting the results
for presentation; in some embodiments, the ranking or scoring of results
utilizes related information retrieved from the aforementioned social
network database; in some embodiment, the search formatted results
include the potential image matches as well as a subset of information
from the social network database; and [0141] an annotation module 926 for
receiving annotation information from an annotation database (116, FIG.
1), determining if any of the annotation information is relevant to the
facial recognition search system, and storing any determined relevant
portions of the annotation information into the respective annotation
database 928.

[0142] FIG. 10 is a block diagram illustrating an image-to-terms search
system 112-C utilized to process a visual query in accordance with one
embodiment of the present invention. In some embodiments, the
image-to-terms search system recognizes objects (instance recognition) in
the visual query. In other embodiments, the image-to-terms search system
recognizes object categories (type recognition) in the visual query. In
some embodiments, the image-to-terms system recognizes both objects and
object-categories. The image-to-terms search system returns potential
term matches for images in the visual query. The image-to-terms search
system 112-C typically includes one or more processing units (CPU's)
1002, one or more network or other communications interfaces 1004, memory
1012, and one or more communication buses 1014 for interconnecting these
components. Memory 1012 includes high-speed random access memory, such as
DRAM, SRAM, DDR RAM or other random access solid state memory devices;
and may include non-volatile memory, such as one or more magnetic disk
storage devices, optical disk storage devices, flash memory devices, or
other non-volatile solid state storage devices. Memory 1012 may
optionally include one or more storage devices remotely located from the
CPU(s) 1002. Memory 1012, or alternately the non-volatile memory
device(s) within memory 1012, comprises a non-transitory computer
readable storage medium. In some embodiments, memory 1012 or the computer
readable storage medium of memory 1012 stores the following programs,
modules and data structures, or a subset thereof: [0143] an operating
system 1016 that includes procedures for handling various basic system
services and for performing hardware dependent tasks; [0144] a network
communication module 1018 that is used for connecting the image-to-terms
search system 112-C to other computers via the one or more communication
network interfaces 1004 (wired or wireless) and one or more communication
networks, such as the Internet, other wide area networks, local area
networks, metropolitan area networks, and so on; [0145] an image-to-terms
search application 1020 that searches for images matching the subject or
subjects in the visual query in the image search database 114-C; [0146]
an image search database 114-C which can be searched by the search
application 1020 to find images similar to the subject(s) of the visual
query; [0147] a terms-to-image inverse index 1022, which stores the
textual terms used by users when searching for images using a text based
query search engine 1006; [0148] a results ranking and formatting module
1024 for ranking the potential image matches and/or ranking terms
associated with the potential image matches identified in the
terms-to-image inverse index 1022; and [0149] an annotation module 1026
for receiving annotation information from an annotation database (116,
FIG. 1), determining if any of the annotation information is relevant to
the image-to-terms search system 112-C, and storing any determined
relevant portions of the annotation information into the respective
annotation database 1028.

[0150] FIGS. 5-10 are intended more as functional descriptions of the
various features which may be present in a set of computer systems than
as a structural schematic of the embodiments described herein. In
practice, and as recognized by those of ordinary skill in the art, items
shown separately could be combined and some items could be separated. For
example, some items shown separately in these figures could be
implemented on single servers and single items could be implemented by
one or more servers. The actual number of systems used to implement
visual query processing and how features are allocated among them will
vary from one implementation to another.

[0151] Each of the methods described herein may be governed by
instructions that are stored in a non-transitory computer readable
storage medium and that are executed by one or more processors of one or
more servers or clients. The above identified modules or programs (i.e.,
sets of instructions) need not be implemented as separate software
programs, procedures or modules, and thus various subsets of these
modules may be combined or otherwise re-arranged in various embodiments.
Each of the operations shown in FIGS. 5-10 may correspond to instructions
stored in a computer memory or non-transitory computer readable storage
medium.

[0152] FIG. 11 illustrates a client system 102 with a screen shot of an
exemplary visual query 1102. The client system 102 shown in FIG. 11 is a
mobile device such as a cellular telephone, portable music player, or
portable emailing device. The client system 102 includes a display 706
and one or more input means 708 such as the buttons shown in this figure. In
some embodiments, the display 706 is a touch sensitive display 709. In
embodiments having a touch sensitive display 709, soft buttons displayed
on the display 709 may optionally replace some or all of the
electromechanical buttons 708. Touch sensitive displays are also helpful
in interacting with the visual query results as explained in more detail
below. The client system 102 also includes an image capture mechanism
such as a camera 710.

[0153] FIG. 11 illustrates a visual query 1102 which is a photograph or
video frame of a package on a shelf of a store. In the embodiments
described here, the visual query is a two dimensional image having a
resolution corresponding to the size of the visual query in pixels in
each of two dimensions. The visual query 1102 in this example is a two
dimensional image of three dimensional objects. The visual query 1102
includes background elements, a product package 1104, and a variety of
types of entities on the package including an image of a person 1106, an
image of a trademark 1108, an image of a product 1110, and a variety of
textual elements 1112.

[0154] As explained with reference to FIG. 3, the visual query 1102 is
sent to the front end server 110, which sends the visual query 1102 to a
plurality of parallel search systems (112A-N), receives the results and
creates an interactive results document.

[0155] FIGS. 12A and 12B each illustrate a client system 102 with a screen
shot of an embodiment of an interactive results document 1200. The
interactive results document 1200 includes one or more visual identifiers
1202 of respective sub-portions of the visual query 1102, which each
include a user selectable link to a subset of search results. FIGS. 12A
and 12B illustrate an interactive results document 1200 with visual
identifiers that are bounding boxes 1202 (e.g., bounding boxes 1202-1,
1202-2, 1202-3). In the embodiments shown in FIGS. 12A and 12B, the user
activates the display of the search results corresponding to a particular
sub-portion by tapping on the activation region inside the space outlined
by its bounding box 1202. For example, the user would activate the search
results corresponding to the image of the person by tapping on a
bounding box 1306 (FIG. 13) surrounding the image of the person. In other
embodiments, the selectable link is selected using a mouse or keyboard
rather than a touch sensitive display. In some embodiments, the first
corresponding search result is displayed when a user previews a bounding
box 1202 (i.e., when the user single clicks, taps once, or hovers a
pointer over the bounding box). The user activates the display of a
plurality of corresponding search results when the user selects the
bounding box (i.e., when the user double clicks, taps twice, or uses
another mechanism to indicate selection.)

[0156] In FIGS. 12A and 12B the visual identifiers are bounding boxes 1202
surrounding sub-portions of the visual query. FIG. 12A illustrates
bounding boxes 1202 that are square or rectangular. FIG. 12B illustrates
a bounding box 1202 that outlines the boundary of an identifiable entity
in the sub-portion of the visual query, such as the bounding box 1202-3
for a drink bottle. In some embodiments, a respective bounding box 1202
includes smaller bounding boxes 1202 within it. For example, in FIGS. 12A
and 12B, the bounding box identifying the package 1202-1 surrounds the
bounding box identifying the trademark 1202-2 and all of the other
bounding boxes 1202. Some embodiments that include text also include
active hot links 1204 for some of the textual terms. FIG. 12B shows an
example where "Active Drink" and "United States" are displayed as hot
links 1204. The search results corresponding to these terms are the
results received from the term query server system 118, whereas the
results corresponding to the bounding boxes are results from the query by
image search systems.

[0157] FIG. 13 illustrates a client system 102 with a screen shot of an
interactive results document 1200 that is coded by type of recognized
entity in the visual query. The visual query of FIG. 11 contains an image
of a person 1106, an image of a trademark 1108, an image of a product
1110, and a variety of textual elements 1112. As such the interactive
results document 1200 displayed in FIG. 13 includes bounding boxes 1202
around a person 1306, a trademark 1308, a product 1310, and the two
textual areas 1312. The bounding boxes of FIG. 13 are each presented with
separate cross-hatching which represents differently colored transparent
bounding boxes 1202. In some embodiments, the visual identifiers of the
bounding boxes (and/or labels or other visual identifiers in the
interactive results document 1200) are formatted for presentation in
visually distinctive manners such as overlay color, overlay pattern,
label background color, label background pattern, label font color, and
bounding box border color. The type coding for particular recognized
entities is shown with respect to bounding boxes in FIG. 13, but coding
by type can also be applied to visual identifiers that are labels.

[0158] FIG. 14 illustrates a client device 102 with a screen shot of an
interactive results document 1200 with labels 1402 being the visual
identifiers of respective sub-portions of the visual query 1102 of FIG.
11. The label visual identifiers 1402 each include a user selectable link
to a subset of corresponding search results. In some embodiments, the
selectable link is identified by descriptive text displayed within the
area of the label 1402. Some embodiments include a plurality of links
within one label 1402. For example, in FIG. 14, the label hovering over
the image of a woman drinking includes a link to facial recognition
results for the woman and a link to image recognition results for that
particular picture (e.g., images of other products or advertisements
using the same picture.)

[0159] In FIG. 14, the labels 1402 are displayed as partially transparent
areas with text that are located over their respective sub-portions of
the interactive results document. In other embodiments, a respective
label is positioned near but not located over its respective sub-portion
of the interactive results document. In some embodiments, the labels are
coded by type in the same manner as discussed with reference to FIG. 13.
In some embodiments, the user activates the display of the search results
corresponding to a particular sub-portion associated with a label 1402
by tapping on the activation region inside the space outlined by the
edges or periphery of the label 1402. The same previewing and selection
functions discussed above with reference to the bounding boxes of FIGS.
12A and 12B also apply to the visual identifiers that are labels 1402.

[0160] FIG. 15 illustrates a screen shot of an interactive results
document 1200 and the original visual query 1102 displayed concurrently
with a results list 1500. In some embodiments, the interactive results
document 1200 is displayed by itself as shown in FIGS. 12-14. In other
embodiments, the interactive results document 1200 is displayed
concurrently with the original visual query as shown in FIG. 15. In some
embodiments, the list of visual query results 1500 is concurrently
displayed along with the original visual query 1102 and/or the
interactive results document 1200. The type of client system and the
amount of room on the display 706 may determine whether the list of
results 1500 is displayed concurrently with the interactive results
document 1200. In some embodiments, the client system 102 receives (in
response to a visual query submitted to the visual query server system)
both the list of results 1500 and the interactive results document 1200,
but only displays the list of results 1500 when the user scrolls below
the interactive results document 1200. In some of these embodiments, the
client system 102 displays the results corresponding to a user selected
visual identifier 1202/1402 without needing to query the server again
because the list of results 1500 is received by the client system 102 in
response to the visual query and then stored locally at the client system
102.

[0161] In some embodiments, the list of results 1500 is organized into
categories 1502. Each category contains at least one result 1503. In some
embodiments, the category titles are highlighted to distinguish them
from the results 1503. The categories 1502 are ordered according to their
calculated category weight. In some embodiments, the category weight is a
combination of the weights of the highest N results in that category. As
such, the category that has likely produced more relevant results is
displayed first. In embodiments where more than one category 1502 is
returned for the same recognized entity (such as the facial image
recognition match and the image match shown in FIG. 15) the category
displayed first has a higher category weight.
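
The category weighting could be computed along these lines; the choice of N and of summation as the combining rule are assumptions for illustration, as is the shape of the inputs.

    # Illustrative category weight: combine the weights of the highest N
    # results in the category, then order categories by that weight.
    TOP_N = 3   # assumed value of N

    def category_weight(results):
        top = sorted((r.weight for r in results), reverse=True)[:TOP_N]
        return sum(top)

    def order_categories(categories):
        # categories: mapping from category title to its list of results.
        return sorted(categories.items(),
                      key=lambda item: category_weight(item[1]),
                      reverse=True)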

[0162] As explained with respect to FIG. 3, in some embodiments, when a
selectable link in the interactive results document 1200 is selected by a
user of the client system 102, the cursor will automatically move to the
appropriate category 1502 or to the first result 1503 in that category.
Alternatively, when a selectable link in the interactive results document
is selected by a user of the client system 102, the list of results 1500
is re-ordered such that the category or categories relevant to the
selected link are displayed first. This is accomplished, for example, by
either coding the selectable links with information identifying the
corresponding search results, or by coding the search results to indicate
the corresponding selectable links or to indicate the corresponding
result categories.

[0163] In some embodiments, the categories of the search results
correspond to the query-by-image search system that produces those search
results. For example, in FIG. 15 some of the categories are product match
1506, logo match 1508, facial recognition match 1510, and image match 1512.
The original visual query 1102 and/or an interactive results document
1200 may be similarly displayed with a category title such as the query
1504. Similarly, results from any term search performed by the term query
server may also be displayed as a separate category, such as web results
1514. In other embodiments, more than one entity in a visual query will
produce results from the same query-by-image search system. For example,
the visual query could include two different faces that would return
separate results from the facial recognition search system. As such, in
some embodiments, the categories 1502 are divided by recognized entity
rather than by search system. In some embodiments, an image of the
recognized entity is displayed in the recognized entity category header
1502 such that the results for that recognized entity are distinguishable
from the results for another recognized entity, even though both results
are produced by the same query by image search system. For example, in
FIG. 15, the product match category 1506 includes two product entities
and as such has two entity categories 1502--a boxed product 1516 and a
bottled product 1518, each of which has a plurality of corresponding
search results 1503. In some embodiments, the categories may be divided
by recognized entity and type of query-by-image system. For example, in
FIG. 15, there are two separate entities that returned relevant results
under the product match category 1506.

[0164] In some embodiments, the results 1503 include thumbnail images. For
example, as shown for the facial recognition match results in FIG. 15,
small versions (also called thumbnail images) of the pictures of the
facial matches for "Actress X" and "Social Network Friend Y" are
displayed along with some textual description such as the name of the
person in the image.

[0165] FIGS. 16A-16C are flow diagrams illustrating a process for using
both location sensor data and a visual query to return local listings for
the visual query according to some embodiments. FIGS. 17-19 illustrate
various methods of selecting search results identified using the process
illustrated in FIGS. 16A-16C. Each of the operations shown in FIGS.
16A-19 may correspond to instructions stored in a computer memory or
computer readable storage medium. Specifically, many of the operations
correspond to executable instructions in the local listings selection
module 840 of the front end search system 110 (FIG. 6), the search
application 2320 of the location-augmented search system 112-F (FIG. 23)
and the search application 2420 of the location-based search system (FIG.
24).

[0166] Using location information or enhanced location information to
improve visual query searching is useful for "street view visual
queries." For example, if a user stands on a street corner and takes a
picture of a building as the visual query, and it is processed using
current location information (i.e., information identifying the location
of the client device) as well as the visual query, the search results
will include information about the business(es) or organization(s)
located in that building.

[0167] As illustrated in FIG. 16A, a front end server receives a visual
query from a client system (202). The front end server also receives
location information (1602). In some embodiments, the location
information includes GPS sensor information or cell phone tower
information (1604). This location information is typically rough, i.e.,
it has a relatively low accuracy, and the following description will
discuss ways to improve its accuracy. The location information received
is likely to pinpoint the user within a specified range. In some
embodiments, the location information locates the client system with an
accuracy of 75 feet or better; in some other embodiments (as described
above) the location information has an accuracy of no worse than A, where
A is a predefined value of 100 meters or less.

[0168] In some embodiments, the location information is computed based on
previously received location information (1606). In some embodiments,
other sensor information is also received from the client device (1608).
The other sensor information may include information from one or more of:
a magnetometer 742, an accelerometer 744, or other sensor 746 in the
client device 102 (discussed with reference to FIG. 5.) In some
embodiments, the additional sensor information is used to calculate a
rough direction in which the user is looking, or azimuth, referred to herein
as a pose. In some embodiments, the additional sensor information is used
to calculate the movement of the user since the time of the visual query
using the dead reckoning principle.
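
A simplified sketch of both calculations follows; the sensor axis conventions and the flat-earth step model are assumptions for illustration, not the method required by the embodiments.

    import math

    def pose_degrees(mag_x, mag_y):
        # Simplified azimuth from the horizontal magnetic field components
        # (0 = magnetic north, increasing clockwise); a real device would
        # also correct for tilt and magnetic declination.
        return math.degrees(math.atan2(mag_y, mag_x)) % 360

    def dead_reckon(lat, lon, heading_deg, distance_m):
        # Advance the last known position by distance_m along heading_deg,
        # treating the earth as locally flat.
        meters_per_deg_lat = 111320.0
        d_lat = distance_m * math.cos(math.radians(heading_deg)) / meters_per_deg_lat
        d_lon = (distance_m * math.sin(math.radians(heading_deg))
                 / (meters_per_deg_lat * math.cos(math.radians(lat))))
        return lat + d_lat, lon + d_lon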

[0169] The visual query system sends a request for enhanced location
information (the request including the visual query and the location
information) to at least one visual query search system (1610). As
explained with reference to FIG. 2, in most embodiments at least the
visual query is sent to a plurality of parallel search systems for
simultaneous processing. In some embodiments, the visual query search
system sends the visual query to a location-augmented search system
(112-F shown in FIG. 23) (1612). The location-augmented search system
performs a visual query match search on a corpus of street view images
(previously stored in an image database 2322) within a specified range of
the client device's location (as identified by the location information).
If a matching image is found within this corpus, an associated pinpoint
location (2310 shown in FIG. 23) is identified. In some embodiments, the
pinpoint location 2310 also has an accuracy value 2332 which indicates
the accuracy of the pinpoint location value. The pinpoint location is
used to determine enhanced location information associated with the
visual query. Then the enhanced location information is returned to the
requesting server (e.g., the front end server) of the visual query
system. If no match is found in the corpus of street view images, then no
enhanced location information is determined.

[0170] In response to the aforementioned request (1610), the requesting
server receives enhanced location information (1614). As described above,
the enhanced location information is based on the visual query and the
rough location information provided by the client device's sensors.
Typically, the enhanced location information has a greater accuracy than
the received location information (1616). In some embodiments, the
enhanced location information pinpoints the particular location of the
user within a narrower range than the original range. In some
embodiments, the particular location identified by the enhanced location
information is within a predefined distance, such as 10 or 15 feet, of
the client device's actual location. Optionally (but typically) the
enhanced location information also includes the pose (i.e., the direction
that the user is facing) (1618).

[0171] The visual query system sends a search query to a location-based
search system (112-G shown in FIG. 24) (1620). The location-based search
system uses the location data to identify records 2406 in its location
database 2422 for local listings that are near the location provided in
the search query. If enhanced location information was obtained and
provided to the front end server, the search query will
include the enhanced location information (1622). Furthermore, if pose
information was provided to the front end server, it will also be
included in the search query (1624).

[0172] Referring to FIG. 16B, the location-based search system (112-G
shown in FIG. 24) sends one or more search results to the front end
server (1626). In some embodiments, the search results include one or
more results (e.g., local listings) in accordance with enhanced location
information (1628). In some embodiments, the search results include one
or more results in the direction of the pose (1630).

[0173] Optionally, the visual query system (e.g., the front end server)
creates an interactive results document comprising a bounding box
outlining a respective sub-portion of the visual query and including at
least one user selectable link to at least one of the search results
(1632). The details of bounding boxes were discussed with respect to FIG.
3. Optionally, the bounding box is created by projecting earth
coordinates of a search result onto screen coordinates of the visual
query (1634).
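
The projection of earth coordinates onto screen coordinates (1634) can be sketched as follows. This Python fragment is illustrative only, not the patented implementation: it assumes a pinhole camera aimed along the pose with a hypothetical 60-degree field of view and a flat-Earth approximation, and computes only the horizontal screen coordinate; the vertical extent of a bounding box would be derived analogously from the entity's height and distance.

```python
import math

def earth_to_screen_x(device_lat, device_lng, pose_deg,
                      target_lat, target_lng,
                      image_width_px, horizontal_fov_deg=60.0):
    """Project a search result's earth coordinates onto the horizontal
    screen coordinate of the visual query image. Hypothetical sketch:
    pinhole camera aimed along the pose. Returns None if the target is
    outside the camera's field of view."""
    # Bearing from the device to the target (flat-Earth approximation).
    d_north = (target_lat - device_lat) * 111_320.0
    d_east = ((target_lng - device_lng)
              * 111_320.0 * math.cos(math.radians(device_lat)))
    bearing = math.degrees(math.atan2(d_east, d_north)) % 360.0
    # Angle of the target relative to the camera's optical axis (pose).
    offset = ((bearing - pose_deg + 180.0) % 360.0) - 180.0
    half_fov = horizontal_fov_deg / 2.0
    if abs(offset) > half_fov:
        return None  # outside the field of view
    # Map [-half_fov, +half_fov] onto [0, image_width_px].
    return (offset + half_fov) / horizontal_fov_deg * image_width_px
```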

[0174] The visual query system then sends at least one search result to
the client system (1636). The search results include local listings. For
example, they may include search results for entities such as businesses,
organizations, or points of interest near the physical location of the
client device. The search results may include only entities visible in
the visual query. Alternatively, the search results may include may
include entities not visible in the visual query. In some embodiments, a
respective search result sent to the client device is located in the
direction of the pose (1638). These search results may include both
entities that are visible and entities that are not-visible in the visual
query. In some embodiments, a respective search result includes a
bounding box (1640) that identifies a portion of the visual query
corresponding to the respective search result. FIGS. 17-19 describe
embodiments for selecting particular local listings to send to the client
system.

[0175] In some embodiments, the front end server also sends to the client
device, along with the search results, a street view image determined by
the visual query system to match the visual query (1642).

[0176] FIG. 16C includes an optional method for processing a second visual
query. The second visual query is received from the client system (1644),
typically after the client system has moved from the location at which a
first (i.e., earlier) visual query from the same
client system was processed. Second location information is also received
from the client system (1646). The visual query system (e.g., the front
end server of the visual query system) sends a request to the visual
query search system (specifically the location-augmented search system
112-F, FIG. 23) requesting second enhanced location information based on
the second visual query and the second location information (1648).

[0177] When the request for second enhanced location information is
successful, resulting in receipt of second enhanced location information
having greater accuracy than the second location information received
from the client system, the visual query system sends a second search
query to a location-based search system (112-G, FIG. 24), which includes
the second enhanced location information (1650). One or more search
results in accordance with the second search query are then received
(1652), and at least one search result in accordance with the second
search query is sent to the client system (1654).

[0178] When the request for second enhanced location information is not
successful, the visual query system sends a third search query to the
location based search system, which includes the enhanced location
information from the first query (1656). In this embodiment, the original
enhanced location information is preferred over the second location
information received from the client because the original enhanced
location probably more accurately pinpoints the location of the client
device than the rough location information provided by the client device.
In some embodiments, the user may not have moved at all since the time of
the original query, and may have only rotated. As long as the client
device's speed of movement and/or the amount of time that has elapsed
since the first visual query was received from the client device do not
exceed predefined limits, the original pinpoint location of the client
device remains relatively accurate. In this embodiment, one or more
search results in accordance with the third search query are then
received (1658), and at least one search result in accordance with the
third search query is sent to the client system (1660).
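
A minimal sketch of the "predefined limits" test described above, deciding whether the first query's enhanced (pinpoint) location is still usable for the later query. The numeric limits are hypothetical; the patent only states that the elapsed time and the device's movement speed must not exceed predefined values.

```python
def can_reuse_enhanced_location(elapsed_seconds, speed_mps,
                                max_elapsed_seconds=120.0,
                                max_speed_mps=1.5):
    """True if the original pinpoint location of the client device is
    still considered accurate. The 120-second and 1.5 m/s limits are
    illustrative placeholders for the predefined limits."""
    return (elapsed_seconds <= max_elapsed_seconds
            and speed_mps <= max_speed_mps)
```

If this check passes, the third search query is built with the first query's enhanced location information; otherwise only the rough second location information from the client is available.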

[0179] FIG. 17 is a flow diagram illustrating a frustum method of
selecting search results. In this method, a visual query is received from
a client device (202) and a plurality of initial search results (e.g.,
local listings) are received (1701), for example using the systems and
methods discussed above. The initial search results are then filtered using a
viewing frustum, as discussed next.

[0180] A viewing frustum is a model of the client device's field of view.
In some embodiments, the frustum is constructed based on the location of
the client device and the pose information (1702). In some embodiments,
the pose information is provided (see 1618) as a part of the enhanced
location information. In embodiments where the pose information was not
determined by the location-augmented search system, a rough pose can
sometimes be determined based on information provided from a client
device sensor such as a magnetometer (742 of FIG. 5) (1704).

[0181] The frustum has a length L which is a certain defined distance from
the location of the client device. In some embodiments, the length of the
frustum is a function of the accuracy of the location information. If the
enhanced location information is highly accurate, then the length of the
frustum is within a "short range." In some embodiments, this short range
is less than 100 yards. If the enhanced location is not accurate, or if
the enhanced location information was not found, the length of the
frustum is within a "large range" relative to the short range. In some
embodiments, this large range is more than the short range and less than
500 yards.

[0182] In some embodiments, the viewing frustum is also constructed based
on the current orientation of the device (1706). In some embodiments, the
orientation is determined based on an asymmetrical aspect ratio of the
visual query (1708). Users typically hold an asymmetrical device, a
device whose width and height are not the same length, in one of two
orientations: portrait or landscape. In some embodiments, the orientation
of the device is determined from sensor information from a client device
sensor (e.g., information from accelerometers in the client device)
(1710).

[0183] Once the viewing frustum is constructed, it is used to test whether
or not a search result is within the field of view of the client device.
If a search result location is within the frustum, it is considered to be
in the field of view of the client device (also called being "in view of
the client device"). If a search result is not within the frustum, it is
not considered to be in view of the client device. In some embodiments,
when a plurality of search results is received, the search results are
filtered to exclude search results outside of the viewing frustum (also
called "outside the field of view of the client device") (1712). As long
as there are any search results remaining, at least one search result
within the viewing frustum is sent to the client system (1714) as a
response to the visual query.
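
The frustum construction and filtering (steps 1702-1712) can be sketched as follows, under stated assumptions: a two-dimensional frustum, a hypothetical 60-degree field of view, and short/large ranges of 90 and 450 meters standing in for the sub-100-yard and sub-500-yard ranges described above. All function and field names are illustrative, not taken from the patent.

```python
import math

def bearing_and_distance(lat1, lng1, lat2, lng2):
    """Flat-Earth bearing (degrees from north) and distance (meters)."""
    d_north = (lat2 - lat1) * 111_320.0
    d_east = (lng2 - lng1) * 111_320.0 * math.cos(math.radians(lat1))
    return (math.degrees(math.atan2(d_east, d_north)) % 360.0,
            math.hypot(d_north, d_east))

def in_viewing_frustum(device_lat, device_lng, pose_deg, result,
                       accurate_location, fov_deg=60.0,
                       short_range_m=90.0, large_range_m=450.0):
    """Test whether a search result lies within the client device's
    field of view. The frustum length depends on the accuracy of the
    (enhanced) location information, as described above."""
    length_m = short_range_m if accurate_location else large_range_m
    bearing, dist = bearing_and_distance(device_lat, device_lng,
                                         result["lat"], result["lng"])
    offset = abs(((bearing - pose_deg + 180.0) % 360.0) - 180.0)
    return dist <= length_m and offset <= fov_deg / 2.0

def filter_results(results, device_lat, device_lng, pose_deg, accurate):
    """Exclude search results outside the viewing frustum (step 1712)."""
    return [r for r in results
            if in_viewing_frustum(device_lat, device_lng, pose_deg,
                                  r, accurate)]
```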

[0184] FIG. 18 is a flow diagram illustrating a method of selecting search
results based on prominence and location data. An accuracy value for the
enhanced location information is identified (1802). This accuracy value
is identified at least in part based on the accuracy value 2332 for the
pinpoint location 2310 of the street view record 2306 in the image
database 2322 of the location-augmented search system 112-F (references
from FIG. 23) identified as matching the visual query and the location
information provided to the location-augmented search system. In some
embodiments, the accuracy value is a numeric value that indicates
accuracy. In one example, the accuracy value indicates an estimated or
maximum inaccuracy as measured in predefined units (e.g., meters or
feet). Lower accuracy values in this example indicate greater accuracy.
Thus, an accuracy value of "10" would indicate an estimated accuracy of
10 meters, while a value of "50" would indicate an estimated accuracy of 50
meters. In another example, the accuracy value may indicate one of two or
more predefined levels. For example, a system could have four predefined
distinct accuracy levels, 1 to 4, or A to D. Any suitable designations of
the levels could be used.
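
The two interpretations of the accuracy value can be sketched together. In this hypothetical Python fragment, the numeric value is an estimated inaccuracy in meters (lower is better), mapped onto four predefined levels and onto the threshold test used by the selection method below; the specific cutoff values are illustrative assumptions.

```python
def accuracy_level(inaccuracy_m, cutoffs=(10.0, 25.0, 50.0)):
    """Map an estimated inaccuracy in meters (lower is better) onto four
    predefined levels A-D. The cutoff values are hypothetical."""
    for level, cutoff in zip("ABC", cutoffs):
        if inaccuracy_m <= cutoff:
            return level
    return "D"

def is_accurate(inaccuracy_m, threshold_m=15.0):
    """Threshold test used by the prominence/location selection below.
    An accuracy value "at or above a threshold" corresponds here to an
    estimated inaccuracy at or below a meter cutoff (hypothetical)."""
    return inaccuracy_m <= threshold_m
```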

[0185] A prominence value for a respective search result is also
identified (1804). The prominence value is a relative determination of
the importance of a search result. For example, famous landmarks like the
Eiffel Tower have high prominence values. In another example, restaurants
with high ratings (by customers, or critics, or both) are assigned higher
prominence values than restaurants with relatively low ratings. The
prominence value 2436 is associated with a respective record 2406 in the
location database 2422 of the location-based search system 112-G
(references from FIG. 24) returned as a search result from the
location-based search system.

[0186] An associated position of a respective search result is also
identified (1806). In some embodiments, the position is the physical location
of an entity (e.g., building, business, landmark, etc.), as determined by
the location information 2410 in a respective record 2406 in the location
database 2422 of the location-based search system 112-G (FIG. 24),
returned as a search result from the location-based search system. In
some embodiments, the location information 2410 is a pair of latitude and
longitude values. In some embodiments, the location information also
provides information regarding a point closest to the entity's front door
and a point closest to the street. The way the entity faces can then be
determined by forming a vector between the two points. In some
embodiments, the position is the postal address 2434 of the entity, which
is likewise associated with a respective record 2406 in the location
database 2422 of the location-based search system 112-G (references from
FIG. 24) returned as a search result from the location-based search
system.
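
The vector construction described above, which determines the direction an entity faces from the point nearest its front door and the point nearest the street, can be sketched briefly. This Python fragment is illustrative; a flat-Earth approximation and hypothetical names are assumed.

```python
import math

def facing_direction(front_door, street_point):
    """Direction an entity faces, in degrees from north, computed by
    forming a vector from the point nearest the front door to the point
    nearest the street. Points are (lat, lng) pairs."""
    (lat1, lng1), (lat2, lng2) = front_door, street_point
    d_north = (lat2 - lat1) * 111_320.0
    d_east = (lng2 - lng1) * 111_320.0 * math.cos(math.radians(lat1))
    return math.degrees(math.atan2(d_east, d_north)) % 360.0
```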

[0187] The server performing the method illustrated in FIG. 18 determines
the distance between the enhanced location (of the client device) and the
associated position of a respective search result (1808).
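
The distance determination (1808) could be implemented with a standard great-circle formula; the patent does not prescribe one, so the haversine sketch below is an assumption.

```python
import math

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between the enhanced location of
    the client device and a search result's position (step 1808)."""
    r = 6_371_000.0  # mean Earth radius, meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lng2 - lng1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))
```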

[0188] Then the server determines favored search results in accordance
with the accuracy value of the enhanced location (1810). When enhanced
location information for the client device is accurate (has a high
accuracy value), nearby listings are preferred over prominent listings
that are less close to the client device for inclusion in the search
results. More specifically, the server favors search results near the
enhanced location when the enhanced location has an accuracy value at or
above a threshold (1812). In some embodiments, when enhanced location
information for the client device is accurate, a first set of weighting
factors that favors listings (i.e., search results) based on close
location as opposed to prominence is used. For example, for accurate
enhanced location information a weighting factor of 0.8 is multiplied by
a closeness metric (which corresponds to how close a search result's
location is to the client device's location) and a weighting factor of
0.2 is multiplied by the prominence value of the search result. In some
embodiments, a variable radius of relevant search results is used. A
large radius is used when the location information for the client device
has low accuracy (an accuracy value below a threshold) and a small radius
is used when the location information for the client device has high
accuracy (an accuracy value above a threshold).

[0189] Similarly, when the client device location is not accurate,
prominent local listings are favored over listings calculated to be
closest to the client device by using a second set of weighting factors.
This is because listings calculated to be closest may not actually be
close at all due to the inaccuracy of the client device location value.
The visual query system favors search results with a high prominence
value when the enhanced location is not available or has an accuracy
value below the threshold (1814). When enhanced location information for
the client device has a low accuracy, a second set of weighting factors
that favors listings based on prominence as opposed to location is used.
For example, when the accuracy value of the enhanced location information is below the threshold,
a weighting factor of 0.2 is multiplied by a closeness metric (which
corresponds to how close a search result's location is to the client
device's location) and a weighting factor of 0.8 is multiplied by the
prominence value of the search result. Finally, at least one favored
search result is sent to the client system (1816).
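
The two sets of weighting factors can be sketched in a single scoring function. The 0.8/0.2 weights come from the examples above; the normalized inputs, dict structure, and the top-N cutoff are hypothetical assumptions for illustration.

```python
def score_result(closeness, prominence, location_is_accurate):
    """Weighted score combining a closeness metric and a prominence
    value: 0.8/0.2 when the enhanced location is accurate, 0.2/0.8 when
    it is not. Both inputs are assumed normalized to [0, 1], with higher
    closeness meaning nearer to the client device."""
    if location_is_accurate:
        w_close, w_prom = 0.8, 0.2
    else:
        w_close, w_prom = 0.2, 0.8
    return w_close * closeness + w_prom * prominence

def favored_results(results, location_is_accurate, top_n=5):
    """Rank candidate listings by weighted score and keep the best.
    `results` is a list of dicts with 'closeness' and 'prominence' keys
    (hypothetical structure); top_n is illustrative."""
    return sorted(results,
                  key=lambda r: score_result(r["closeness"],
                                             r["prominence"],
                                             location_is_accurate),
                  reverse=True)[:top_n]
```

Swapping a single pair of weights, rather than switching algorithms, keeps the ranking continuous: a listing that is both close and prominent scores well under either regime.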

[0190] FIG. 19 is a flow diagram illustrating a method of selecting search
results based on relative position and accuracy data. An associated
position of a respective search result is also identified (1806). As
discussed with reference to FIG. 18, in some embodiments, the associated
position is (physical) location information 2410 and in other embodiments
it is the postal address information 2434 associated with a respective
record 2406 in the location database 2422 of the location-based search
system 112-G (references from FIG. 24) returned as a search result from
the location-based search system.

[0191] Similarly, a positional accuracy associated with a respective
search result is also identified (1904). The positional accuracy is the
accuracy of the location value 2432 associated with a respective record 2406 in the
location database 2422 of the location-based search system 112-G
(references from FIG. 24), returned as a search result from the
location-based search system. In some embodiments, the visual query
system selects one or more search results having highest associated
positional accuracy (1906).

[0192] The server performing the method illustrated in FIG. 19 determines
a positional closeness value (sometimes called a closeness metric)
between a respective search result position and the enhanced location
information for the client system (1908). In some embodiments, the server
selects one or more first search results whose positional closeness value
satisfies a positional closeness requirement (1910). In some embodiments,
the positional closeness requirement is an absolute value, such as 100
yards. In other embodiments the positional closeness requirement varies
depending on the accuracy of the enhanced location as discussed with
relation to FIG. 18. In some embodiments, the server selects one or more
first search results that also have a positional accuracy that is equal
to or greater than a threshold (1912).

[0193] In some embodiments, the server selects one or more second search
results in accordance with a requirement that each identified second
search result satisfy a second positional closeness requirement with
respect to at least one of the first search results (1914). In other
words, when the candidate search results include local listings having
accurate information and others with less accurate location (sometimes
herein called inaccurate locations), the final search results include
only A) local listings with accurate location information that are near
the device's location, and B) those local listings having inaccurate
location information that are known to be near the accurately located local
listings in (A). In some embodiments, the inaccurately located listings
are known to be near the accurately located listing by some other means,
such as postal address, street name, or by clustering locations.
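
A two-stage selection along the lines of steps 1910-1914 can be sketched as follows. The candidate structure, the closeness functions, and the three requirement values are all hypothetical; "closeness" is expressed here as a distance in meters, so a requirement is satisfied when the distance does not exceed it.

```python
def select_listings(candidates, closeness_to_device, closeness_between,
                    first_closeness_req_m=90.0,
                    second_closeness_req_m=50.0,
                    accuracy_threshold_m=15.0):
    """Select first results that are accurately located and near the
    enhanced device location, then second results that are near some
    first result. `candidates` are dicts with an 'inaccuracy_m' key
    (estimated location error, lower is better); all names and numeric
    values are illustrative."""
    first = [c for c in candidates
             if c["inaccuracy_m"] <= accuracy_threshold_m
             and closeness_to_device(c) <= first_closeness_req_m]
    second = [c for c in candidates
              if c not in first
              and any(closeness_between(c, f) <= second_closeness_req_m
                      for f in first)]
    return first + second
```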

[0194] In some embodiments, the server excludes from the selected search
results those search results that have a positional accuracy less than a
threshold (1916). This threshold value is analogous to that discussed
above. In some embodiments, the server excludes one or more search
results that also do not satisfy a positional closeness requirement with
respect to at least one of the selected search results that has
positional accuracy equal to or greater than the threshold and that
satisfies a first positional closeness requirement with respect to the
enhanced location information for the client system (1918).

[0195] Finally, at least one selected search result is sent to the client
system (1920).

[0196] It should be noted that, as discussed above in relation to FIGS.
16A-19, in embodiments when the request for enhanced location information
is successful, resulting in receipt of enhanced location information
having greater accuracy than the location information received from the
client system, the visual query system sends a first search query to a
location-based search system. The search query includes the enhanced
location information. The visual query system then receives one or more
search results in accordance with the first search query. However, when
the request for enhanced location information is not successful, the
visual query system sends a second search query to the location-based
search system. The second search query includes the received location
information from the client system. Then the visual query system receives
one or more search results in accordance with the second search query,
and culls them in various ways as discussed above before sending at least
one of the search results to the client system.

[0197] FIG. 20 is a flow diagram illustrating the communications between a
client system 102 and a visual query system (e.g., front end visual query
server system 110 of a visual query system) for creating actionable
search results 1700 with location information. In some embodiments, the
location information is enhanced prior to being used. In these
embodiments, visual query results are based at least in part on the
location of the user at the time of the query.

[0198] Each of the operations shown in FIG. 20 may correspond to
instructions stored in a computer memory or computer readable storage
medium. Specifically, many of the operations correspond to executable
instructions in the local listings selection module 840 of the front end
search system 110 (FIG. 6).

[0199] The client device or system 102 receives an image from the user
(2002). In some embodiments, the image is received from a camera 710
(FIG. 5) in the client device or system 102. The client system also
receives location information (2004) indicating the location of the
client system. The location information may come from a GPS device 707
(FIG. 5) in the client device or system 102. Alternately, or in addition,
the location information may come from cell tower usage information or
local wireless network information. In order to be useful for producing
street-view-assisted results, the location information typically must
satisfy an accuracy criterion. In some embodiments, when the location
information has an accuracy of no worse than A, where A is a predefined
value of 100 meters or less, the accuracy criterion is satisfied. The
client system 102 creates a visual query from the image (2006) and sends
the visual query to the server system (2008). In some embodiments, the
client system 102 also sends the location information to the server
(2010).

[0200] The front end server system 110 receives the visual query (2012)
from the client system. It also receives location information (2014). The
front end server system 110 sends the visual query to at least one search
system implementing a visual query process (2016). In some embodiments,
the visual query is sent to a plurality of parallel search systems. The
search systems return one or more search results (2024). The front end
server system sends the location information to at least one location
augmented search system (2018). The location information received (at
2014) is likely to pinpoint the user within a specified range. In some
embodiments, the location information locates the client system with an
accuracy of 75 feet or better; in some other embodiments (as described
above) the location information has an accuracy of no worse than A, where
A is a predefined value of 100 meters or less.

[0201] The location-augmented search system (112-F shown in FIG. 23)
performs a visual query match search on a corpus of street view images
(previously stored in an image database 2322) within the specified range.
If the image match is found within this corpus, enhanced location
information associated with the matching image is retrieved. In some
embodiments, the enhanced location information pinpoints the particular
location of the user within a narrower range than the original range and
optionally (but typically) also includes the pose (i.e., the direction
that the user is facing). In some embodiments, the particular location
identified by the enhanced location information is within a predefined
distance, such as 10 or 15 feet, of the client device's actual
location. In this embodiment, the front end server system 110 receives
the enhanced location information based on the visual query and the
location information from the location augmented search system (2020).
Then the front end server system 110 sends the enhanced location
information to a location-based query system (112-G shown in FIG. 24)
(2022). The location-based query system 112-G retrieves and returns one
or more search results, which are received by the front end server system
(2024). Optionally, the search results are obtained in accordance with
both the visual query and the enhanced location information (2026).
Alternately, the search results are obtained in accordance with the
enhanced location information, which was retrieved using the original
location information and the visual query (2028).

[0202] It should be noted that the visual query results (received at 2024)
may include results for entities near the pinpointed location, whether or
not these entities are viewable in the visual query image. For example,
the visual query results may include entities obstructed in the original
visual query (e.g., by a passing car or a tree.) In some embodiments, the
visual query results will also include nearby entities such as businesses
or landmarks near the pinpointed address even if these entities are not
in the visual query image at all.

[0203] The front end server system 110 sends one or more search results to
the client system (2030). As explained with reference to FIGS. 16A-19,
there are numerous methods used to determine which search results should
be sent. The client system 102 receives the one or more search results
(2032). Then the client system displays the one or more search results
(2034).

[0204] FIG. 21 illustrates a client system display of an embodiment of a
results list 1500 returned for a visual query 1200 of a building. The
visual query 1200 in this embodiment was processed as a street view
visual query, and thus the received search results were obtained in
accordance with both the visual query and location information provided
by the client system 102. The visual query in this embodiment was taken
in portrait mode. The identified entity for this query is the San
Francisco (SF) Ferry building 2101. A thumbnail 2102 of the street view
image for the San Francisco Ferry building is provided along with the
search results. In the embodiment shown in FIG. 21, the "place match"
visual query search result information 2104 is displayed. The place match
result includes the name of the building (SF Ferry Building), the postal
address (Pier 48), a description about the place, and a star rating. Some
of this information was obtained from the associated information 2408 of
this record in the location-based search system 112-G (FIG. 24). Some of
this information was obtained based on other searches performed by other
visual query search systems 112-A-112-N and the term query server system
118.

[0205] The search results list includes web results 1514 and related place
matches 2110, which are other places identified by
the street view place match system. In some embodiments, the place match
system displays other similar and/or other nearby places to the one
identified as currently being in front of the user. For example, if the
place in front of the user were identified as a Thai restaurant, the
street view place match system may display other Thai restaurants within
one mile of the identified place.

[0206] In the embodiment shown in FIG. 21 the displayed related places
2110 are places that are also popular tourist stops--the California
Academy of Sciences 2112 and the Palace of Fine Arts 2114. These place
matches have high prominence values. In this embodiment these high
prominence results are displayed rather than results near the SF Ferry
Building. In other words, results with high prominence values were favored
over results near the enhanced location. This is probably due to the fact
that an accuracy value for the enhanced location information did not
reach a threshold, i.e., the enhanced location information had a low
accuracy value. If the accuracy value had reached a threshold, rather
than displaying results with high prominence values, the results
displayed would be places geographically next to the identified place,
such as the stores on either side or above the store in the visual query.

[0207] FIG. 22 illustrates a client system display of an embodiment where
a plurality of actionable search result elements 1700 overlay the visual
query 1200. In this embodiment the actionable search result elements
which are returned are for a street view visual query. Actionable search
results are explained in detail in U.S. Provisional Patent Application
No. 61/266,133, filed Dec. 2, 2009, entitled "Actionable Search Results
for Street View Visual Queries," which application is incorporated by
reference herein in its entirety.

[0208] In the embodiment shown in FIG. 22, the front end server system
received enhanced location information with a high accuracy value. As
such, only the closest entity to the enhanced location was provided as a
search result. The location-based search system identified a restaurant
entity called "The City Restaurant" 2201 with a high enough confidence
that it was the only result returned. Then a variety of additional
information about this restaurant entity is provided. The front end
server identified several client side actions corresponding to "The City
Restaurant" entity 2201 and created actionable search result elements for
them. The actionable search result elements include a button 2204 to call
a phone number associated with the restaurant, a button 2206 to read
reviews regarding the restaurant, a button 2208 to get information
regarding the restaurant, a button 2210 for mapping the address
associated with the restaurant, a button 2212 for making reservations at
the restaurant, and a button 2214 for more information such as nearby or
similar restaurants. The actionable result elements in the embodiment
shown in FIG. 22 are displayed overlaying a portion of the visual query
1200 in an actionable search result element display box 2216. In this
embodiment, the display box 2216 is partially transparent to allow the
user to see the original query under the display box 2216. In some
embodiments, the display box 2216 includes a tinted overlay such as red,
blue, green etc. In other embodiments, the display box 2216 grays out the
original query image. The display box 2216 also provides the name of the
identified entity 2218, in this case the restaurant name "The City
Restaurant." The partially transparent display box 2216 embodiment is an
alternative to the results list style view shown in FIG. 21. This
embodiment allows the user to intuitively associate the actionable search
result buttons with the identified entity in the query.

[0209] FIG. 23 is a block diagram illustrating a location augmented
search system 112-F, one of the search systems utilized to process a
visual query, in accordance with some embodiments. The location augmented
search system 112-F includes one
or more processing units (CPU's) 2302, one or more network or other
communications interfaces 2304, memory 2312, and one or more
communication buses 2314 for interconnecting these components. The
communication buses 2314 may include circuitry (sometimes called a
chipset) that interconnects and controls communications between system
components. Memory 2312 includes high-speed random access memory, such as
DRAM, SRAM, DDR RAM or other random access solid state memory devices;
and may include non-volatile memory, such as one or more magnetic disk
storage devices, optical disk storage devices, flash memory devices, or
other non-volatile solid state storage devices. Memory 2312 may
optionally include one or more storage devices remotely located from the
CPU(s) 2302. Memory 2312, or alternately the non-volatile memory
device(s) within memory 2312, comprises a computer readable storage
medium. In some embodiments, memory 2312 or the computer readable storage
medium of memory 2312 stores the following programs, modules and data
structures, or a subset thereof:

[0210] an operating system 2316 that includes procedures for handling
various basic system services and for performing hardware dependent
tasks;

[0211] a network communication module 2318 that is used for connecting
the location augmented search system 112-F to other computers via the one
or more communication network interfaces 2304 (wired or wireless) and one
or more communication networks, such as the Internet, other wide area
networks, local area networks, metropolitan area networks, and so on;

[0212] a search application 2320 which searches a street view index for
relevant images matching the visual query which are located within a
specified range of the client system's location, as specified by location
information associated with the client system, and, if a matching image
is found, returns augmented/enhanced location information, which is more
accurate than the previously available location information for the
client system;

[0213] an image database 2322 that includes street view image records
2306; each street view image record includes an image 2308, pinpoint
location information 2310, and an accuracy value 2332;

[0214] an optional index 2324 for organizing the street view image
records 2306 in the image database 2322;

[0215] an optional results ranking module 2326 (sometimes called a
relevance scoring module) for ranking the results from the search
application; the ranking module may assign a relevancy score to each
result from the search application and, if no results reach a pre-defined
minimum score, may return a null or zero value score to the front end
visual query processing server, indicating that the results from this
server system are not relevant; and

[0216] an annotation module 2328 for receiving annotation information
from an annotation database (116, FIG. 1), determining if any of the
annotation information is relevant to the particular search application,
and incorporating any determined relevant portions of the annotation
information into the respective annotation database 2330.

[0217] FIG. 24 is a block diagram illustrating a location based search
system 112-G in accordance with some embodiments. The location based
search system 112-G, which is used to process location queries, includes
one or more processing units (CPU's) 2402, one or more network or other
communications interfaces 2404, memory 2412, and one or more
communication buses 2414 for interconnecting these components. The
communication buses 2414 may include circuitry (sometimes called a
chipset) that interconnects and controls communications between system
components. Memory 2412 includes high-speed random access memory, such as
DRAM, SRAM, DDR RAM or other random access solid state memory devices;
and may include non-volatile memory, such as one or more magnetic disk
storage devices, optical disk storage devices, flash memory devices, or
other non-volatile solid state storage devices. Memory 2412 may
optionally include one or more storage devices remotely located from the
CPU(s) 2402. Memory 2412, or alternately the non-volatile memory
device(s) within memory 2412, comprises a computer readable storage
medium. In some embodiments, memory 2412 or the computer readable storage
medium of memory 2412 stores the following programs, modules and data
structures, or a subset thereof:

[0218] an operating system 2416 that includes procedures for handling
various basic system services and for performing hardware dependent
tasks;

[0219] a network communication module 2418 that is used for connecting
the location based search system 112-G to other computers via the one or
more communication network interfaces 2404 (wired or wireless) and one or
more communication networks, such as the Internet, other wide area
networks, local area networks, metropolitan area networks, and so on;

[0220] a search application 2420 which searches the location based index
for search results that are located within a specified range of the
enhanced location information provided by the location augmented search
system (112-F) or the rough location information provided by the client
system; in some embodiments all search results within the specified range
are returned, while in other embodiments the returned results are the
closest N results to the enhanced location; in yet other embodiments the
search application returns search results that are topically similar to
the result associated with the enhanced location information (for
example, all restaurants within a certain range of the restaurant
associated with the enhanced location information);

[0221] a location database 2422 which includes records 2406; each record
includes location information 2410, which may include one or more
locations of the entity in the image such as a point near the front door
and a point near the street, information regarding the accuracy of the
location 2432, an optional postal address 2434, a prominence value 2436
indicating the relative importance of the record, and associated other
information 2408 (such as metadata, contact information, reviews, and
images);

[0222] an optional index 2424 for organizing the records 2406 in the
location database 2422;

[0223] an optional results ranking module 2426 (sometimes called a
relevance scoring module) for ranking the results from the search
application; the ranking module may assign a relevancy score to each
result from the search application and, if no results reach a pre-defined
minimum score, may return a null or zero value score to the front end
visual query processing server, indicating that the results from this
server system are not relevant; and

[0224] an annotation module 2428 for receiving annotation information
from an annotation database (116, FIG. 1), determining if any of the
annotation information is relevant to the particular search application,
and incorporating any determined relevant portions of the annotation
information into the respective annotation database 2430.
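
The two record types described in FIGS. 23 and 24 can be sketched as simple data structures. The field names below are hypothetical; only the reference numerals follow the description above.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

@dataclass
class StreetViewImageRecord:
    """Sketch of a street view image record 2306 in the image database
    2322 (FIG. 23)."""
    image: bytes                             # image 2308
    pinpoint_location: Tuple[float, float]   # pinpoint location 2310 (lat, lng)
    accuracy_value: float                    # accuracy value 2332

@dataclass
class LocationRecord:
    """Sketch of a record 2406 in the location database 2422 (FIG. 24)."""
    location: Tuple[float, float]            # location information 2410
    location_accuracy: float                 # accuracy of the location 2432
    prominence: float                        # prominence value 2436
    postal_address: Optional[str] = None     # optional postal address 2434
    other_info: dict = field(default_factory=dict)  # associated information 2408
```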

[0225] Each of the software elements shown in FIGS. 23 and 24 may be
stored in one or more of the previously mentioned memory devices, and
corresponds to a set of instructions for performing a function described
above. The above identified modules or programs (i.e., sets of
instructions) need not be implemented as separate software programs,
procedures or modules, and thus various subsets of these modules may be
combined or otherwise re-arranged in various embodiments. In some
embodiments, memory of the respective system may store a subset of the
modules and data structures identified above. Furthermore, memory of the
respective system may store additional modules and data structures not
described above.

[0226] Although FIGS. 23 and 24 show search systems, these Figures are
intended more as functional descriptions of the various features which
may be present in a set of servers than as a structural schematic of the
embodiments described herein. In practice, and as recognized by those of
ordinary skill in the art, items shown separately could be combined and
some items could be separated. For example, some items shown separately
in FIGS. 23 and 24 could be implemented on single servers and single
items could be implemented by one or more servers. The actual number of
servers used to implement a location-based search system or
location-augmented search system and how features are allocated among
them will vary from one implementation to another, and may depend in part
on the amount of data traffic that the system must handle during peak
usage periods as well as during average usage periods.

[0227] The foregoing description, for purpose of explanation, has been
described with reference to specific embodiments. However, the
illustrative discussions above are not intended to be exhaustive or to
limit the claims to the precise forms disclosed. Many modifications and
variations are possible in view of the above teachings. The embodiments
were chosen and described in order to best explain the principles of the
invention and its practical applications, to thereby enable others
skilled in the art to utilize the invention and various embodiments with
various modifications as are suited to the particular use contemplated.