Today the focus of my sessions has been content indexing and content processing in both SharePoint 2010 and FAST Search for SharePoint. But first I started off with a session covering the new metadata focus in SharePoint 2010.
Metadata is information about the content and is of key importance for a good search experience. In SharePoint 2010 the focus on this has increased drastically. A new feature is the Term Store, a service that can contain taxonomies, folksonomies, social tags and keywords. Through this feature metadata is made accessible throughout the whole system, at all levels of item creation. Having a structured way of working with metadata will drastically increase the quality of the search results.
Now back to the fun technical geek side of these sessions. In SharePoint 2010 Microsoft has introduced a lot of improvements to the indexing side of search. First off, they have aligned the two versions of search so that they use the same connectors. Both FAST Search and SharePoint use a common set of connectors and a common way of building new ones. With this you can use systems such as BDC to create connectors, even from SharePoint Designer, which is an extremely simple application to use. BDC, which was found in 2007 as well, has been extended with new features like full security support and support for creating connectors in .NET. This makes it easy and streamlined to create new connectors for indexing all kinds of systems.
One of the strengths of FAST Search has always been the document processing. This is a feature that SharePoint search is lacking, and it is probably an important reason why Microsoft bought FAST. In FAST Search for SharePoint they have brought this into SharePoint, where it can be managed in an easy and streamlined way. Processing such as entity extraction, lemmatisation and advanced language detection is now done automatically and can be configured, for example by adding inclusion/exclusion words for entity extraction, straight in the UI (it can be managed through PowerShell as well).
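To give a feeling for what inclusion and exclusion words mean for entity extraction, here is a toy sketch in Python. It is only my own illustration of the idea, not how FAST implements it: the function name `extract_entities` and the plain word lists are assumptions, and the real product combines such lists with its own built-in extractors.

```python
import re


def extract_entities(text, include, exclude=()):
    """Return the inclusion terms found in text, minus excluded ones.

    Toy dictionary matcher: results are ordered by first appearance in text.
    """
    excluded = {term.lower() for term in exclude}
    found = []
    for term in include:
        # An exclusion word always wins over an inclusion word.
        if term.lower() in excluded:
            continue
        # Match the term as a whole word, ignoring case.
        pattern = r"\b" + re.escape(term) + r"\b"
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            found.append((match.start(), term))
    return [term for _, term in sorted(found)]
```

Calling it with an inclusion list of company and place names and an exclusion list containing one of the places returns only the remaining names, in the order they occur in the text.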
But what about all those custom pipeline stages that used to be a large part of an ESP configuration? This no longer works the way it did before. No Python-coded pipeline stages can be added; what you can do now is add an "extensibility pipeline stage". This stage can then be configured to call a .NET application with a set of input properties and a list of returned properties. In this way you can basically do whatever you want with the text content, with the full power of .NET. A nice side effect of this for us developers concerns testing. Creating pipeline stages in the past has always been a hassle, both because it was done in Python and because testing either demanded a hard-to-set-up instance of Eclipse and ESP or meant trying it out live in ESP. In the new system the stages are actually small console applications, so they can easily be tested standalone with good debugging through Visual Studio.
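The shape of such a stage, properties in and properties out, is easy to sketch. The real thing is a .NET console application; the version below is written in Python purely as an illustration, and everything in it is an assumption on my side: the XML element names (`Document`, `Property`), the `body` and `wordcount` property names, and the convention of passing an input file and an output file on the command line.

```python
import sys
import xml.etree.ElementTree as ET


def process_properties(props):
    """Hypothetical custom logic: derive new properties from the input ones."""
    body = props.get("body", "")
    return {"wordcount": str(len(body.split()))}


def run_stage(input_xml):
    """Parse the input properties, run the custom logic, serialize the results."""
    root = ET.fromstring(input_xml)
    props = {p.get("name"): (p.text or "") for p in root.findall("Property")}
    out = ET.Element("Document")
    for name, value in process_properties(props).items():
        child = ET.SubElement(out, "Property", name=name)
        child.text = value
    return ET.tostring(out, encoding="unicode")


def main(argv):
    """Assumed calling convention: <program> <input file> <output file>."""
    if len(argv) != 3:
        print("usage: stage INPUT.xml OUTPUT.xml", file=sys.stderr)
        return 2
    with open(argv[1], encoding="utf-8") as f:
        result = run_stage(f.read())
    with open(argv[2], "w", encoding="utf-8") as f:
        f.write(result)
    return 0
```

Because `run_stage` is an ordinary function working on strings, it can be fed a hand-written input document and checked in a debugger or a unit test without any search installation running, which is exactly the testing benefit described above. To use it as a command-line program you would call `main(sys.argv)` from the usual `if __name__ == "__main__":` guard.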
Tomorrow is actually the last day of this conference. For me that day will be focused on partner events that cover more of the sales perspective on all the new things in SharePoint and FAST.
But now it's time for some relaxing, and then it's time for the Enterprise Search evening by the pool.