the enterprise search and findability blog by Findwise

Google released version 7.0 of the Google Search Appliance (GSA) in early October. Magnus Ebbesson and I attended the Google-hosted pre-sales conference in Zürich, where some of the new functionality was presented along with what the future will bring to the platform. Google is really putting effort into the platform, and it gets stronger with each release. Personally I tend to like hardware and security updates the most, but I have to say that some of the new features are impressive and have great potential. I have had the opportunity to try them out for a while now.

In late November we held a breakfast seminar at the office in Gothenburg where we talked about the GSA in general, with a focus on GSA 7.0 and its new features. My impression is that the translation functionality is very attractive for larger enterprises, while the previews bring a big wow factor in general. The possibility of configuring ACLs for several domains is great too, since many larger enterprises tend to have several domains. The entity extraction is of course interesting and can be very useful, although a processing framework would enhance it even further.

It is also nice to see that Google is improving the hardware. The robustness is a really strong argument for selecting GSA.

It’s impressive to see how many languages the GSA can handle and how quickly it performs the translation. The user still needs basic knowledge of the foreign language, since the query itself is not translated. However, it is reasonably common to have a corporate language which most employees know.

The preview functionality is a very welcome feature. The fact that it can highlight pages within a document is really nice. I have experimented with using it through our Jellyfish API, with some success. Below are two examples of the preview functionality in use.

A few thoughts

At the conference we attended in Zürich, Google mentioned that they are aiming to improve the built-in template in the GSA. The standard template is nice, and makes it possible to set up a decent graphical interface at almost no cost.

My experience, however, is that companies want the frontend integrated with their own systems. Also, we tend to use search for more purposes than the standard usage. Search-driven intranets, where you build intranet sites based on search results, are an example of search being used in a different manner.

A concept that we have introduced at Findwise is search as a service. It means that the search engine is a stand-alone product with APIs that make it easy to send data to it and extract data from it. We have created our own APIs around the GSA to make this possible. An easy way to extract data by filtering is essential.

What I would like to see in the GSA is easier integration for performing searches, such as a REST or SOAP service that makes it easy to create search clients. This would make it easier to integrate functionality, such as security, externally. Basically you tell the client who the current user is and then the client handles the rest. It would also increase maintainability, since new and changed functionality would not require a new implementation of how to parse the XML response.

I would also like to see a bigger focus on documentation of how to use functionality such as previews and translation externally.
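As a rough illustration of what such a client could hide from the caller, here is a minimal Python sketch that turns an XML search response into plain objects. The sample response is hand-written and only imitates the general shape of an XML result list (a `GSP` root with `R` result elements holding `U`, `T` and `S`), which is an assumption here. `SearchHit` and `parse_results` are hypothetical names, not part of any Google or Findwise API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class SearchHit:
    url: str
    title: str
    snippet: str


# Hand-written sample; the element names are an assumption about the format.
SAMPLE_RESPONSE = """<GSP VER="3.2">
  <Q>travel policy</Q>
  <RES SN="1" EN="2">
    <R N="1">
      <U>http://intranet.example.com/hr/travel.html</U>
      <T>Travel policy</T>
      <S>How to book business trips.</S>
    </R>
    <R N="2">
      <U>http://intranet.example.com/hr/expenses.pdf</U>
      <T>Expense report guide</T>
      <S>Rules for travel expenses.</S>
    </R>
  </RES>
</GSP>"""


def parse_results(xml_text: str) -> List[SearchHit]:
    """Convert an XML result list into SearchHit objects."""
    root = ET.fromstring(xml_text)
    hits = []
    for result in root.iter("R"):
        hits.append(
            SearchHit(
                url=result.findtext("U", default=""),
                title=result.findtext("T", default=""),
                snippet=result.findtext("S", default=""),
            )
        )
    return hits


hits = parse_results(SAMPLE_RESPONSE)
for hit in hits:
    print(hit.title, hit.url)
```

With the parsing kept in one place like this, a change in the response format would only touch `parse_results`, not every application that consumes search results.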

Final words

My feeling is that the GSA is getting stronger, and I like the new features in GSA 7.0. Google has made it clear that they are continuously aiming to improve their product, and I am looking forward to future releases. I hope the GSA will take a step closer to the search as a service concept, and the addition of a processing framework would enhance it even further. The future will tell.

Google has released yet another version of the Google Search Appliance (GSA). It is good to see that Google stays active when it comes to improving their enterprise search product! Below is a list of the new features:

Dynamic navigation for secure search

The facet feature, new since 6.8, is still being improved. When filters are created, it is now possible to make sure that they only include secure documents that the user is authorized to see.

Nested metadata queries

In previous Search Appliance releases there were restrictions for nesting meta tags in search queries. In this release many of those restrictions are lifted.
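To make the idea of a nested metadata query concrete, here is a small Python sketch that builds one as a string. It assumes a `requiredfields` query parameter in which `.` means AND and `|` means OR, and it assumes that parentheses can be used for grouping. The post does not spell out the syntax, so this should be checked against the search protocol documentation for the release in use. The helper names and the example fields are made up.

```python
from urllib.parse import urlencode


def field(name: str, value: str) -> str:
    """One metadata condition, e.g. department:sales."""
    return f"{name}:{value}"


def any_of(*parts: str) -> str:
    """OR the parts together (assumed '|' separator)."""
    return "|".join(parts)


def all_of(*parts: str) -> str:
    """AND the parts together (assumed '.' separator)."""
    return ".".join(parts)


def group(expr: str) -> str:
    """Wrap a sub-expression in parentheses (assumed nesting syntax)."""
    return f"({expr})"


# (department is sales OR marketing) AND type is report
expr = all_of(
    group(any_of(field("department", "sales"), field("department", "marketing"))),
    field("type", "report"),
)
query_string = urlencode({"q": "budget", "requiredfields": expr})
print(expr)
print(query_string)
```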

LDAP authentication with Universal Login

You can configure a Universal Login credential group for LDAP authentication.

Index removal and backoff intervals

When the Search Appliance encounters a temporary error while trying to fetch a document during crawl, it retains the document in the crawl queue and index. It schedules a series of retries after certain time intervals, known as “backoff” intervals, before removing the URL from the index.

One example of when this is useful is the processing pipeline that we have implemented for the GSA. The GSA uses an external component to index the content. If that component goes down, the GSA will receive a “404 – page does not exist” when trying to crawl, and this may cause mass removal from the index. With this functionality turned on, that can be avoided.
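The retry behaviour described above can be sketched as a small simulation in Python. This is not how the appliance is implemented. The interval values, the `crawl_decision` function and the `flaky_source` helper are all hypothetical, and the waiting is only simulated.

```python
from typing import Callable, Sequence, Tuple


def crawl_decision(
    fetch: Callable[[], bool], backoff_hours: Sequence[float]
) -> Tuple[str, int]:
    """Try once, then retry once per backoff interval.

    Returns ("keep", attempts) as soon as a fetch succeeds, or
    ("remove", attempts) when every attempt has failed.
    """
    attempts = 1
    if fetch():
        return ("keep", attempts)
    for _hours in backoff_hours:
        # A real crawler would wait `_hours` here; the sketch retries at once.
        attempts += 1
        if fetch():
            return ("keep", attempts)
    return ("remove", attempts)


def flaky_source(failures: int) -> Callable[[], bool]:
    """Simulate a component that fails `failures` times, then recovers."""
    remaining = [failures]

    def fetch() -> bool:
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return fetch


# A short outage is survived; a long one ends in removal from the index.
short_outage = crawl_decision(flaky_source(2), [1, 4, 24])
long_outage = crawl_decision(flaky_source(10), [1, 4, 24])
print(short_outage, long_outage)
```

The point of the sketch is that a document is only dropped when the last retry has failed, so a brief outage of the external component does not empty the index.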

Specify URLs to crawl immediately in feeds

Release 6.12 provides the ability to specify URLs to crawl immediately in a feed by using the crawl-immediately attribute. This is a nice way to prioritise what needs to be indexed quickly.
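As a sketch of what this could look like when generating a feed, the Python snippet below marks selected records with the `crawl-immediately` attribute. Only the attribute name comes from the release notes. The surrounding element names (`gsafeed`, `header`, `datasource`, `feedtype`, `group`, `record`), the feed type value and the `build_feed` function are assumptions for illustration and should be checked against the feeds documentation.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Set


def build_feed(datasource: str, urls: Iterable[str], urgent: Set[str]) -> str:
    """Build a feed document; URLs listed in `urgent` are flagged."""
    feed = ET.Element("gsafeed")
    header = ET.SubElement(feed, "header")
    ET.SubElement(header, "datasource").text = datasource
    # Assumed feed type for records that only point at URLs.
    ET.SubElement(header, "feedtype").text = "metadata-and-url"
    group = ET.SubElement(feed, "group")
    for url in urls:
        attrs = {"url": url, "mimetype": "text/html"}
        if url in urgent:
            attrs["crawl-immediately"] = "true"
        ET.SubElement(group, "record", attrs)
    return ET.tostring(feed, encoding="unicode")


feed_xml = build_feed(
    "intranet",
    ["http://intranet.example.com/news.html", "http://intranet.example.com/archive.html"],
    urgent={"http://intranet.example.com/news.html"},
)
print(feed_xml)
```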

X-robots-tag support

The appliance now supports the X-Robots-Tag HTTP header, which makes it possible to exclude non-HTML documents from the index.

To stay at the forefront of search technology, Findwise has a focus on research, both in the form of larger research projects and through different thesis projects. Mohammad Shadab and I just finished our thesis work at Findwise, where we explored an idea for search user interfaces which we call search-driven portals. User interfaces are mostly based on analysis of a smaller audience, but the final interface is then put in production, where it targets a much wider range of users. The solution is in many cases static and cannot easily be changed or adapted. With search-driven portals, which are portlet-based UIs, users or administrators can adapt the interface to fulfill the needs of different groups. Developers design and develop several searchlets (portlets powered by search technology), where every searchlet provides a specific functionality such as faceted search, a results list or related information. Users can then choose to add the searchlets with functionality that suits them to their page, at a preferred location. From an architectural perspective, searchlets are standalone components that are independent of each other and easy to reuse.
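On the content server side, using the header could look something like the Python sketch below, which decides the response headers for a given path. The excluded extensions and the `response_headers` function are hypothetical. The post does not say which directives the appliance honours, so `noindex` is used here as an assumption.

```python
import mimetypes
import os
from typing import Dict

# Hypothetical choice of non-HTML formats to keep out of the index.
EXCLUDED_EXTENSIONS = {".pdf", ".docx"}


def response_headers(path: str) -> Dict[str, str]:
    """Return the HTTP response headers to send for a document path."""
    content_type, _encoding = mimetypes.guess_type(path)
    headers = {"Content-Type": content_type or "application/octet-stream"}
    extension = os.path.splitext(path)[1].lower()
    if extension in EXCLUDED_EXTENSIONS:
        # Non-HTML files cannot carry a robots meta tag, so use the header.
        headers["X-Robots-Tag"] = "noindex"
    return headers


pdf_headers = response_headers("/docs/pricelist.PDF")
html_headers = response_headers("/docs/index.html")
print(pdf_headers)
print(html_headers)
```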

Such functionality includes faceted search, where the facets serve as filters to narrow a search. These facets might need to be different depending on the role, department or background of the users. Developers can create a set of facets and let the users choose the ones that satisfy their needs. Search-driven portals are a great tool for making sure that sites don’t get flooded with information as new functionality is developed. If a new need evolves, or if the provider comes up with new ideas, the functionality is put into new searchlets which are deployed to the searchlet library. The administrator can broadcast new functionality to users by putting new searchlets on the master page, which affects every user’s own site. However, users can still opt out of such changes by removing the new functionality from their own page.

Search-driven portals open up new ways of working, from both a developer and a user perspective. They are a step away from the one-size-fits-all concept that many sites are supposed to fulfill. Providers such as Findwise can build a large component library which can be customized into packages for different customers. With the help of the searchlet library, web administrators can set up designs for different groups, project managers can set up a layout adjusted to their project, and employees can adjust their site to their own requirements. With search-driven portals, a wider range of user needs can more easily be covered.
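The relationship between the searchlet library, the master page and each user's own page can be sketched as a small data model in Python. This is only an illustration of the idea, not the thesis implementation. The class, its method names and the searchlet names are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Set


class SearchletPortal:
    """Toy model of a searchlet library with a master page and user pages."""

    def __init__(self) -> None:
        self.library: Set[str] = set()
        self.master_page: List[str] = []
        self.added: Dict[str, List[str]] = defaultdict(list)
        self.removed: Dict[str, Set[str]] = defaultdict(set)

    def deploy(self, name: str) -> None:
        """A developer deploys a new searchlet to the library."""
        self.library.add(name)

    def broadcast(self, name: str) -> None:
        """An administrator puts a searchlet on the master page."""
        if name not in self.library:
            raise ValueError(f"unknown searchlet: {name}")
        if name not in self.master_page:
            self.master_page.append(name)

    def add(self, user: str, name: str) -> None:
        """A user adds a searchlet from the library to their own page."""
        if name not in self.library:
            raise ValueError(f"unknown searchlet: {name}")
        self.removed[user].discard(name)
        if name not in self.master_page and name not in self.added[user]:
            self.added[user].append(name)

    def remove(self, user: str, name: str) -> None:
        """A user removes a searchlet, including a broadcast one."""
        if name in self.added[user]:
            self.added[user].remove(name)
        if name in self.master_page:
            self.removed[user].add(name)

    def page_for(self, user: str) -> List[str]:
        """Master page minus the user's removals, plus their additions."""
        page = [n for n in self.master_page if n not in self.removed[user]]
        return page + self.added[user]


portal = SearchletPortal()
for searchlet in ("results", "facets", "related"):
    portal.deploy(searchlet)
portal.broadcast("results")
portal.add("anna", "facets")
portal.broadcast("related")      # new functionality reaches everyone
portal.remove("anna", "related")  # but a user can opt out again
anna_page = portal.page_for("anna")
bob_page = portal.page_for("bob")
print(anna_page, bob_page)
```

Keeping the master page separate from per-user additions and removals is one way to let a broadcast reach every user without overwriting their personal choices.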