That's right. In a way, Search is just like that: crawling and indexing content, both good and bad, and trying its best to provide fast and relevant results. But, as with most things, there are trade-offs; in this case, it's usually relevancy vs. speed. Over the past several major iterations of the Jive platform, Jive's engineers have maintained a constant focus on delivering faster and more relevant search results. Most recently, in Jive 5, the search architecture was completely revamped to set the stage for conversations such as this one:

How can we make search relevancy better?

Problem: Search relevancy is often considered a qualitative metric. In order for Jive engineers to properly tune and tweak the search relevancy algorithms that ship with the product, we need use-cases that we can test against, but most importantly, they need a common data-set, which is very difficult to share in most cases.

If only there was a way we could articulate use-cases on a common data-set.

If only feedback could be received prior to product release using a live system, and not mock data.

Solution: The Jive Community. It's a live Jive 5 instance with a large data sample, and houses many of the same use-cases customers see in their own instances.

Instructions on How to Help using the Jive Community:

We are asking customers to share their search relevancy grievances with Jive in this conversation, and asking you to provide the following detail(s), if available.

How are you searching? (@mentions, spotlight search, search page, other)

Also, specifics about the context. Were you searching across containers? A single container? With filters? etc...

Example of the Search Results Returned

For example, providing a screenshot of the Jive Community results (redacted if need be) is extremely helpful.

What is your "result sentiment"? Are the results returned: (fair, unexpected, right on, random)

Please also indicate if this is negative or positive search behavior.

Note: If a search use-case is already listed, please Like it so we can gauge the reach of a use-case.

How will this feedback be used?

For each piece of reproducible feedback that our engineers receive, articulated in terms of the Jive Community, Jive will include the scenario in a suite of regression tests that can be used to validate future relevancy tuning initiatives.

If you have any questions about this effort, please feel free to ask either me or Karl Rumelhart (who is leading this effort from Engineering). We look forward to your feedback.

3.) The name of the intended target is Business and should match on the Subject. Global search, no filters.

4.)

5.) Not ideal. There is nothing else for me to type to get a match. It would be ideal if exact matches were promoted.

Existing Workaround: If you find yourself in a similar situation, you can usually append an _ (underscore) as the next character and this will relay a stop word to the search engine, and it usually matches...but this is somewhat non-intuitive. The idea is to not have to leave the Context to do the search.

Since the upgrade of the JC to 5.0.2, @lungarini now works for every character from @lun through @lungarini.

However, as alluded to by Claire elsewhere, @gary_l through @gary_lungarin doesn't find him; it's only on the last 'i' that he reappears. I suspect this may be to do with "gar" recurring in the last name?

It also still seems critical that you know which separator they use, i.e. dot, underscore or nothing. It would be useful if you didn't have to guess which to use, and if you got result(s) even when using the wrong separator.

I'd like to see the Lucene search configured to weight results differently if the search term is found in the title, tags, or body of the content. Also the number of page views should be taken into account.

So content that has the term in the title and a greater number of page views wins out over content that has several hits of the search term within the body of the content and a lower number of page views.

In our example, a search for 'customer contact list' places a document titled 'customer contact list', with a huge number of views and tagged with customer contact, about 10th in the rankings. It even puts people's bookmarks of this content higher up in the ranking!

I just want to jump in here to emphasize how valuable it is for us to get this feedback. We are going to put considerable effort into search relevance work over the coming months but we need test cases! I know that many folks have examples in their own communities of searches that aren't giving them what they would like. If we can get equivalent examples in the Jive Community -- where we have access to the data set -- we can actually put tests in place where we tweak the search algorithms until they produce the desired results against the actual data set.

So please keep the examples coming!!

By the way, it is also great to get examples of searches that DO give what you want. When we make changes to search to improve in cases where it isn't working as well, we need to make sure that we don't start to do worse in places where it is working great. So if you do a search and get exactly what you want, let us know about that too.

By the way, one of the things that makes this topic so interesting is that there isn't always a clear right answer to what is the 'best' search results. When I search for jiveworld11 in the Jive Community I get

I am not sure that this is the best possible set of results. What do you think should come back first, or at least near the top, if we search for jiveworld11?

For Content Items, I would expect items to rank highest where the entire search string appears in the item's Title. Items where the entire string is in both the item's Title and its Tags would rate higher than where the string is found only in the item Title, and higher still would be items where the string is found in Title, Tags and Body. Where the string is found only in the item's Tags, it would rate lower than where found only within the item's Title, but higher than where found only in the item's Body. Title is more important than Tags, Tags are more important than Body; full matches are more important than partial matches. Where there's a tie, the rank goes to the item that the magic sauce has determined matters most to me.

For Places, Name is more important than Description, Description is more important than Tags.

For People, Name is most important, the Tags field is lowest rank among the Profile fields.
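The tiered ranking described above could be sketched roughly as follows. This is a toy scorer, not Jive's actual algorithm; the field weights and full-match bonus are invented purely to illustrate the "Title > Tags > Body, full match > partial match" ordering:

```python
# Toy relevance scorer for the tiered ranking described above.
# Field weights and the full-match bonus are illustrative only.
FIELD_WEIGHTS = {"title": 100, "tags": 10, "body": 1}
FULL_MATCH_BONUS = 1000  # full matches outrank any partial-match combination

def score(item, query):
    query = query.lower()
    total = 0
    for field, weight in FIELD_WEIGHTS.items():
        text = item.get(field, "").lower()
        if query in text:                            # full search string present
            total += FULL_MATCH_BONUS * weight
        elif any(t in text for t in query.split()):  # partial match only
            total += weight
    return total

items = [
    {"title": "customer contact list", "tags": "customer contact", "body": "..."},
    {"title": "meeting notes", "tags": "", "body": "customer contact list mentioned here"},
]
ranked = sorted(items, key=lambda i: score(i, "customer contact list"), reverse=True)
```

With these weights, a full match in the Title dominates any number of Body hits, which is exactly the behaviour the post asks for.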

Sure. This makes sense as a general framework. The challenge comes when you need to make statements like "full matches are more important than partial matches" precise. This is where it really helps to have examples of searches against real data sets so the system can be tuned and tested. If you could try a few searches here in the JC and spot situations where results that you would like to see high up are actually way down in the list, capture them and we can use that in improving search.

By the way, if you haven't seen it already, here is a document with some info on how search works in 4.5 and 5.0.

I think it is really hard for us to give examples from the JC as many of us either don't do searches frequently here, or already have found workarounds that we don't even consciously recognize any longer.

What might work for you (as I know it has been done in the past) is to see if some of us who have already provided examples of search results would allow you to download and use our databases to test against. Jive did this in the past with us, and it did provide an excellent testbed for the developers and a great result for us. One advantage is that the content make-up can be very different for internal communities than for external. And having more than one DB to test against gives you a better possibility of seeing all the possible use cases, gotchas, etc.

My request is for the search system to actually return the number of search results, and preferably a breakdown of the amount of each content type within the search listing.

This is related to relevancy in that it makes it easier to tell how likely it is that the search has the result I'm looking for. I.e. if I get 25 results found, I know it's worth paging through a couple of pages to see if I can find what I'm looking for. If I get 2000 results, then I know straight away that I should probably add some more terms to refine the search.

By having the content type filters show the result count breakdown I know that if I'm looking for a blog result and there are 10 blog results out of 1000 total, I can refine simply by clicking on blogs and likely find the result I need.

An example:

There were 250 results for your search 'jive'

documents (120)

discussions (25)

events (0)

status (12)

etc

etc

etc

Clicking on the filter name would filter by that content type, a check box would allow you to select multiple content types.

This is fairly standard functionality on most search systems and I've certainly never seen one that doesn't give you the amount of results that were found.
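The requested count breakdown is essentially what search engines call faceting. A minimal sketch with Python's collections.Counter, using made-up hits and content-type names:

```python
from collections import Counter

# Hypothetical search hits; each hit records its content type.
results = (
    [{"type": "document"}] * 120
    + [{"type": "discussion"}] * 25
    + [{"type": "status"}] * 12
)

# Tally hits per content type, then print the breakdown described above.
facets = Counter(hit["type"] for hit in results)
total = sum(facets.values())

print(f"There were {total} results for your search")
for content_type, count in facets.most_common():
    print(f"  {content_type} ({count})")
```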

My request is for the search system to actually return the number of search results, and preferably a breakdown of the amount of each content type within the search listing.

A great question, actually. It would be awesome to show the number of results. This is something that those of us on the search team would really like to provide. But to be open about it, this is still a ways out. To give you some insight into why this is harder than it sounds, the challenge is related to permissions. We know how many items match the query. What is harder is to show how many items match the query that you have permission to see.

I know the background reasons; I just hoped that the architecture issues might have been addressed, or at least planned with some more urgency. This affects paging on all the widgets and browse-all pages etc.

I actually wrote a proof of concept that took the Lucene results and passed those to the DB to do a nice set-based join with the permissions tables. It worked quite well and was pretty fast.

We've also started indexing a flag to say if the content is 'public'. This allows us to do a simple search and know that all the results are valid for a user who isn't logged in, we use this to provide an api style feed to our other websites so they can see if there is related community content.

I've been trying to think of some sort of hash or other encoding you could index in lucene that would convey the permissions somehow. I would expect in most setups any particular user wouldn't have specific access to such a great number of containers. I'll give this some more thought.
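The post-filtering approach described in this exchange (join search hits against permissions, plus the indexed 'public' flag for anonymous users) could be sketched like this. The container/user model here is hypothetical; a real implementation would join against the actual permissions tables:

```python
# Sketch of post-filtering search hits against permissions, along the
# lines of the set-based join described above. The data model is made up.
search_hits = [
    {"doc_id": 1, "container_id": 10, "public": True},
    {"doc_id": 2, "container_id": 20, "public": False},
    {"doc_id": 3, "container_id": 30, "public": False},
]

def visible_hits(hits, permitted_containers, logged_in=True):
    if not logged_in:
        # Anonymous users: the indexed 'public' flag alone decides.
        return [h for h in hits if h["public"]]
    return [h for h in hits
            if h["public"] or h["container_id"] in permitted_containers]

anonymous = visible_hits(search_hits, set(), logged_in=False)  # only public docs
member = visible_hits(search_hits, {20})                       # public + container 20
```

Once hits are filtered this way, an accurate visible-result count (and facet breakdown) falls out for free, which is the hard part Karl describes.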

Another exact match candidate in spotlight, "Admin Essentials Plugin" result #1 (awesome) ... in @mention, not on the page. It has been there for a while now, so not sure why it has been bumped. But exact matches should be #1 result...general feedback. =)

I believe that Jive already has a synonym dictionary that as admins, we are able to modify. Perhaps making this something that more people can easily add to would assist in the search relevancy? For example, there is a group or space admin doing a search. The first search doesn't return the item they want, so they try something different. If at that point, they could mark the word they used to find the document as a synonym to the first word they tried, that might help other people get better results. Or as a sys admin, I could be more proactive about adding to the synonyms because it is in the workflow (front end, easy) instead of me having to bounce over to the Admin Console, remember where the dictionary was, etc.

In order for Jive engineers to properly tune and tweak the search relevancy algorithms that ship with the product, ...

Allow the admins to modify the factors which have influence on relevancy.

I do want only one result page which contains content, people and places. With an option to drill down / cluster into threads, documents, people, ...

Why "only" one page with all results? Look up the phone number of someone. Search for "CTO" or "Matt Tucker" and expect as the first result a link to Matt Tucker. Of course there's no phone number on the profile/summary page but internal instances may show one. Currently one does need to click on "people" to see it. So one would save one click looking up a phone number.

one thing I've noticed/observed has to do with search relevancy being impacted by whether the author is disabled. It appears that if a user is disabled, search relevancy either ignores content matches or applies a different path to relevance. Regardless of a person's status (enabled/disabled), content should be found the same. Don't have a strong use-case on this one, but pretty sure that this can be replicated.

That's an important catch, Ryan, because there are so many times (at least for internal communities) that content authors leave the company but the content is still relevant. And since we can't reassign the content to another person, we need to make sure the relevancy doesn't die when the user is disabled.

In many cases the content is not the exact answer. It may be a partial answer or simply lead one to believe that this person is an SME. Without being able to indicate a new owner, the trail ends. Unless we choose not to support a subject area, we shouldn't reduce relevance; we should allow the user to pick up the trail.

Exactly why the relevance of the content shouldn't die. If you can still see the content, you can see others who may have subsequently edited the document, or commented on it. That is part of the trail.

This is a great thread. It looks like you are constantly improving the search relevance logic. (including things like # of times the search criteria are in the title, content and tags, # of views, rating?, documents higher than blogs?, other?)

Can someone provide me with a "user friendly" explanation on how Jive 5.0 search relevance works?

Karl, Ryan - I was just responding to something else and an idea struck me. What if there was a tool that would allow either a sysadmin or a content creator to check the findability of their document? There are SEO tools that help you check whether you've done a good job optimizing an article for certain keywords. Something similar for content in Jive would be a win-win!

As a content author, I could validate whether or not I was likely to reach my audience.

If the content author didn't bother to do that and there were complaints, the sysadmin could check to see why the content wasn't being found by the audience.

The tool should not just say whether or not the piece of content would be found using the keyword, but offer suggestions as to how to improve the relevance. Because each Jive instance is closed, you could even offer suggestions for improvement based on the existing body of content. For example, if I asked whether X document would be found if I searched for "keyword", the result could say, "This piece of content would be ranked #25 for relevance. Your selected keyword is mentioned in the content body once. Increase the number of mentions to 10 for it to reach the top 5 results."
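The "findability check" idea could start very simply: given a keyword, report where a target document ranks and how often the keyword appears in it. The corpus and the count-based scoring below are purely illustrative, not how Jive actually ranks:

```python
# Toy findability report: where does a document rank for a keyword,
# and how many times does the keyword appear in its body?
def findability_report(docs, target_id, keyword):
    keyword = keyword.lower()
    scored = sorted(docs, key=lambda d: d["body"].lower().count(keyword), reverse=True)
    rank = next(i for i, d in enumerate(scored, start=1) if d["id"] == target_id)
    mentions = next(d for d in docs if d["id"] == target_id)["body"].lower().count(keyword)
    return {"rank": rank, "mentions": mentions}

docs = [
    {"id": "hr-contacts", "body": "Human resources department contacts. Reach hr here."},
    {"id": "hr-policy", "body": "hr policy: contact hr for hr questions."},
]
report = findability_report(docs, "hr-contacts", "hr")
# The contacts doc says "human resources" but "hr" only once, so it ranks below
# the policy doc for the query "hr" -- the situation described in this thread.
```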

"each Jive instance is closed" - What does this mean? There are a lot of public instances.

Adding hints will either confuse the user or lead to silly documents. If the document does contain "example" only one time in the body then it is likely not as relevant as other documents. If users try to push their documents by adding random or useless keywords nothing is won.

Most content authors do not know what search terms the users do enter, so it can be really hard to use the right keywords. If the users search for "paradigm" instead of "example" then the document will likely never be found. Unless the list of synonyms contains both words.

Checking whether the title matches the body and the tags/keywords would be a great help.

I meant that each Jive instance is a relatively fixed body of content. It isn't like trying to provide suggestions based on the entire internet, just your own community. And so the suggestions for my community are going to be different than they are in your community, etc.

And to your point about "users trying to push content by adding random keywords" - in our experience, Jive's search doesn't work that way. So, for example, we have people who are writing a document about their HR department contacts. They use the term "human resources" several times in the document and then call me wanting to know why, when people search for "hr contacts", the document is not found. The tool I describe would help them see that they have only used the term "hr" once. So to optimize the content for a search on "hr", they need to reference that specific term more often within the body of the document itself.

One use case that I see is showing results if a configurable percentage of the terms show up in the title, body, or tags, vs. all of them needing to be present. As a real-world example, if my search string was "configure xauth vpn on router 2055" and some content lived in Jive that contained vpn, 2055, and xauth, I would want that to show up in the results. I see the value in having AND be the default logic as it gets you better results, but it feels like the more terms you search with, the better chances you have to miss relevant content. Our search admin uses a Solr parameter (on our non-Jive website) to set configurable values, like requiring 2 out of 3 or 3 out of 5 terms to be tied to the content before it is thrown into the results. It would be nice if we could tap into that somehow from the admin panel, or give people who are Lucene ninjas access to set things like that and other parameters that can alter the "under the hood" operation of Lucene/Solr within Jive.
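The "N out of M terms" behaviour described here is what Solr calls minimum-should-match (the `mm` parameter on the dismax/edismax query parsers). A rough Python illustration of the rule itself, with naive substring matching standing in for real tokenization:

```python
import math

# Rough illustration of a minimum-should-match rule: a document is kept
# if at least a configurable fraction of the query terms appear in it.
def matches(doc_text, query, min_fraction=0.6):
    terms = query.lower().split()
    required = math.ceil(len(terms) * min_fraction)
    hits = sum(1 for t in terms if t in doc_text.lower())
    return hits >= required

doc = "Notes on xauth and vpn setup for the 2055"
query = "configure xauth vpn on router 2055"
# 6 terms at 60% => at least 4 must match; here xauth, vpn, on, 2055 match,
# so the document is included even though "configure" and "router" are absent.
```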

It would be nice if we could tap into that somehow from the admin panel, or give people who are Lucene ninjas access to set things like that and other parameters that can alter the "under the hood" operation of Lucene/Solr within Jive.

Is this possible to do in any way? I've been requested to improve search results in our Jive instance and am trying to ascertain how much leeway we have. Also, I'd like to know how to tell whether the same changes could be applicable in Jive 6 (we're on Jive 5) or if it has changed so much that any parameter setting from 5 would be rendered useless. Ryan Rutan, any ideas who might be able to answer this?

As you look ahead to Jive 6 and the progress of social search, I honestly believe a lot of these types of optimizations start to become cost-ineffective. There is only so much Lucene/SOLR can do with arbitrary data about content. I strongly believe that the future of search lies with the incorporation of personal context, and while Jive 6 has social search ... I feel as though there is ample road ahead to make it even better.

As for people who might be worth talking to, I'd reach out to Nigel Daley to see if he has anyone on the search team that might have some opinions. Does that help at all?

One of my biggest frustrations (at least in our 4.5.6.3 instance) is the apparent lack of weighting on title/subject. I don't know how many times I've used quick search to try to find a document using words in the title, and have a list of quick search results appear that have no visible relevance to the words I typed. I KNOW the document has "Academy Feb Meeting" as three of the four words in the title. Why doesn't it appear in the quick hit list????

I try and find similar results from this community and post (or maybe it is better in 5.0?)

A one-line document (a placeholder, containing no useful information) in our internal Jive instance was returned first in the search results for a particular term, despite having the lowest star-rating, no "like"s and no tags. Someone added a comment to the document complaining about this, and now the comment is the top search result!

To the user who pointed this out, I responded that the 13-word document contains 4 words that are in the search term and this high concentration may contribute to the document being highly ranked. Also hardly any docs on our site are rated, so ratings probably don't contribute much to rankings. His reply was very interesting and I hope it will help to improve search relevancy:

I don’t think concentration alone should be the measure of quality. If you tell me you’ve got two articles which contain the search term once, one 20 characters long and the other 200, I’m pretty sure the 20 character article is going to be useless.

This algorithm also favours comments over articles. [User cites a second example from our internal Jive site, where] a comment on the article that I am looking for is ranked higher than the original article. Both have one instance of the string "[search term]" (although my article also has an attachment called "[search term]"). I don’t think that is correct.

I think there are a couple things that could be changed to improve the results:

prioritise articles over comments (I’m not sure I ever want to see comments in my results, but I appreciate that some might feel differently)

include absolute article size as another parameter, perhaps with an asymptotic value function so that this only affects small entries (this might be sufficient to solve the point above)

use rating as a search parameter (if you build it, they will come – this could be a good way of ensuring that good content bubbles up to the top of search results).
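The "asymptotic value function" suggested in the second point can be made concrete with a saturating factor such as len/(len + k): it approaches 1.0 for normal-sized entries but stays small for tiny placeholders. The constant k is an arbitrary tuning knob, not anything from Jive:

```python
# Saturating length factor: approaches 1.0 for long entries, stays small
# for very short ones. k = 200 characters is an arbitrary tuning constant.
def length_factor(text, k=200):
    return len(text) / (len(text) + k)

short = "placeholder"     # ~11 chars -> factor ~0.05
long_ = "x" * 2000        # 2000 chars -> factor ~0.91
# Multiplying the base relevance score by this factor means a 13-word
# placeholder can no longer outrank a substantive article, while two
# normal-length articles are left essentially untouched.
```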

4. Spotlight results after typing "search" (without quotes) includes one of the messages in the thread I was looking for (ranks it #1).

Results after typing "search engine" did not include anything from the thread I was looking for.

Results at the end of typing "search engine ranking" included a different message from the thread I was looking for.

5. My reaction to these results? At 2 out of 3 stages I was provided with a link that would have got me to the right place, so on the whole not bad.

How could this search have gone better? Firstly there's no point including messages in results. Better to link to the top of the thread, because people usually need to read the whole thing anyway to understand the context of the message that the search returned. Secondly why did the correct result disappear and then reappear? Some of our users have noticed this tendency and it's unnerving. (On 4.5, just typing a space makes a difference - but on the Jive community it doesn't, so I assume that changed in Jive 5.)

A related and even stranger case is the behaviour of the search box for choosing where to move content to on Jive 4.5. Start typing the name of a space that you know exists, and in the beginning it returns reasonable results that include the space. However if you continue typing the space name and it contains a short word (e.g. "for", "and") there are suddenly no results displayed at all! Finish typing the name, and the correct space reappears. (Does this still happen on Jive 5?)

Thanks for the post. This is exactly the sort of example that is super helpful!

Let me make a couple comments that may be useful in interpreting what could be going on.

First, regarding returning replies or comments rather than the main document: as you can imagine, often a particular search query will return multiple results from the same thread. Rather than returning a ton of closely related results, the system tries to decide which is the "closest" result from the thread and only display that one. Always returning the parent would be a reasonable policy, but currently the highest relevance result is the one selected.

Regarding the issue of correct results appearing and then disappearing: there have been reports of this, largely in the context of at-mention. I believe that what is happening is the following: every time you type a character, a search query is issued. E.g. if you at-mention mike, first there is a query for m*, then for mi*, then for mik*, and so forth. As it happens, shorter queries are actually much more costly for the search infrastructure, so a search for m* can take much longer than a search for mike* and, especially if you are typing fast, the results can come back out of order. Unfortunately, there is a problem with the UI where it doesn't properly account for that possibility, so one of the older queries can overwrite a newer one. This is supposed to be handled better in an upcoming maintenance release. Of course, I don't know if this is necessarily what is going on in your case, but it would be interesting if you could play around with it a bit and see if the phenomenon you experience is consistent with this hypothesis.
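The out-of-order response problem described here has a standard fix: tag each request with an increasing sequence number and drop any response older than the one already rendered. A sketch of the guard (not Jive's actual UI code, which isn't shown in this thread):

```python
# Sketch of a stale-response guard for type-ahead search: each keystroke
# issues a query tagged with a sequence number; a response is applied
# only if it is newer than the last one already shown.
class TypeAhead:
    def __init__(self):
        self.latest_sent = 0
        self.latest_shown = 0
        self.displayed = None

    def send_query(self, text):
        self.latest_sent += 1
        return self.latest_sent, text      # (sequence number, query)

    def on_response(self, seq, results):
        if seq <= self.latest_shown:       # older than what's on screen: drop
            return False
        self.latest_shown = seq
        self.displayed = results
        return True

ui = TypeAhead()
q1, _ = ui.send_query("m")      # slow query for m*
q2, _ = ui.send_query("mike")   # fast query for mike*, returns first
ui.on_response(q2, ["Mike"])
ui.on_response(q1, ["Mary", "Mike"])  # arrives late: ignored, no overwrite
```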

I'm pretty sure the UI and speed are not issues here. I've been intentionally doing these spotlight searches very slowly and pausing between words, in order to see changes in the results list.

I think the issue is that the result I'm looking for is disappearing because it just doesn't rank highly for that particular intermediate search term. If that's true, then the real question is whether the spotlight search can be based on some sort of cumulative ranking system, so that top results for (say) the first word are retained (even if they slip down the list a little) when the second word is typed.

One situation I find difficult is when I'm searching for a top level space like ebusiness, but in the spotlight search I get results like ebusiness product management, ebusiness XYZ project and other groups that have ebusiness and some modifier behind it. But, just plain ebusiness doesn't show up. When I hit enter and look at the full results, I have to go to the places tab and scroll down quite a ways to find it.

It would be fine if the ebusiness product management group didn't show up; I could add the words "product management" in my search to find it. But there's nothing I can do with just "ebusiness." I think groups and spaces like this should appear higher up in the results.

I agree that it can become really difficult to find a place (or document) that has a "root" name.

FWIW, I have found that for places, if I add "group" or "space" to my search, then I can find the place I'm looking for. And while this does work, I would prefer it if it was addressed by the search team. Me teaching my community members to search this way is not a viable solution. Especially since most of them don't even notice that there is a difference between a group and a space.

Thanks for the use case! As usual, Tracy is able to come up with clever work-arounds, but as she says, this is clearly not something we should need to train users around. An exact match should rank the result highest.

The search engine should learn. Or at least remember the last 5 search terms and the last result which was clicked. So if one searches for "Jive 6" and clicks Jive 6 Has Arrived then the next search (likely one other day) should return Jive 6 Has Arrived as the first result.

Of course it would be better if it would learn and remember which result was clicked for a specific search term, both user specific and also for the whole community using different weights.
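The learn-from-clicks idea above could start as simply as remembering which result was clicked for each search term and promoting it on repeat searches. A toy per-user sketch; the storage scheme and result identifiers are invented:

```python
# Toy click-history reranker: remember which result was clicked for a
# query and promote it the next time the same query is issued.
class ClickMemory:
    def __init__(self):
        self.clicks = {}   # search term -> last clicked result id

    def record_click(self, query, result_id):
        self.clicks[query] = result_id

    def rerank(self, query, results):
        favourite = self.clicks.get(query)
        # Stable sort: the remembered favourite moves to the front,
        # everything else keeps its original relevance order.
        return sorted(results, key=lambda r: r != favourite)

memory = ClickMemory()
memory.record_click("Jive 6", "jive-6-has-arrived")
reranked = memory.rerank("Jive 6", ["jive-6-faq", "jive-6-has-arrived", "jive-6-beta"])
```

A community-wide version would keep click counts per (query, result) pair and blend them into the relevance score with a smaller weight than the per-user signal, as the post suggests.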
