Findwise releases Open Pipeline Plugins

Findwise is proud to announce that we have now released our first publicly available plugins for the Open Pipeline crawling and document processing framework. A list of all available plugins can be found on the Open Pipeline Plugins page, and the ones Findwise has created can be downloaded from our Findwise Open Pipeline Plugins page.

OpenPipeline is open source software for crawling, parsing, analyzing and routing documents. It ties together otherwise incomplete solutions for enterprise search and document processing. OpenPipeline provides a common architecture for connectors to data sources, file filters, text analyzers and modules that distribute documents across a network. It includes a job scheduler and a full UI with a point-and-click interface.
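To make the architecture above a little more concrete, here is a minimal sketch of the general idea: a connector feeds documents through a chain of stages (a filter, then an analyzer) before handing them to a sink such as an indexer. This is not the OpenPipeline API. The names `Document`, `connector`, `strip_markup`, `count_tokens` and `run_pipeline` are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class Document:
    """Hypothetical document record passed from stage to stage."""
    uri: str
    text: str
    metadata: Dict[str, object] = field(default_factory=dict)


# A stage takes a document and returns the (possibly modified) document.
Stage = Callable[[Document], Document]


def connector(sources: Dict[str, str]) -> Iterator[Document]:
    """Stand-in for a data source connector: yields one Document per entry."""
    for uri, raw in sources.items():
        yield Document(uri=uri, text=raw)


def strip_markup(doc: Document) -> Document:
    """Stand-in for a file filter: drops angle-bracket tags, collapses spaces."""
    without_tags = re.sub(r"<[^>]+>", " ", doc.text)
    doc.text = " ".join(without_tags.split())
    return doc


def count_tokens(doc: Document) -> Document:
    """Stand-in for a text analyzer: stores a whitespace token count."""
    doc.metadata["token_count"] = len(doc.text.split())
    return doc


def run_pipeline(docs: Iterable[Document],
                 stages: List[Stage],
                 sink: Callable[[Document], None]) -> None:
    """Pass every document through each stage in order, then to the sink."""
    for doc in docs:
        for stage in stages:
            doc = stage(doc)
        sink(doc)


# Usage: collect processed documents in a list instead of a real index.
indexed: List[Document] = []
run_pipeline(
    connector({"file:///a.html": "<p>Hello <b>search</b> world</p>"}),
    [strip_markup, count_tokens],
    indexed.append,
)
```

Keeping every stage behind the same small interface is what makes components reusable across projects: a new connector or analyzer can be added without touching the others.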

Findwise has been using this framework in a number of customer projects with great success. It works particularly well with Apache Solr, not only because it is open source but, most importantly, because it fills a gap in Solr's functionality: an easy-to-use framework for developing document processors and connectors. However, we are not using it for Solr only. A number of plugins for the Google Search Appliance have also been made, and we have started investigating how Open Pipeline can be integrated with the IBM Omnifind search engine as well.

The best thing about this framework is that it is very flexible and customizable but still easy to use and, maybe most importantly for me as a developer, easy to work with and develop against. It has a simple yet sufficiently powerful API that handles everything you need. And because it is an open source framework, any shortcomings and limitations that we find along the way can be investigated in detail, and a better solution can be proposed to the Open Pipeline team for inclusion in future releases.

We have in fact already contributed a great deal to the development of the project by using it, testing it, and reporting bugs and suggesting improvements on its forums. The response from the team has been very good: some of our suggested improvements have already been included and some are on the way in the new 0.8 version. We are also in the process of deepening the collaboration further by signing a contributor agreement so that we can eventually contribute code as well.

So how do our customers benefit from this?

First of all, it lets us develop and deliver search and indexing solutions to our customers more quickly and with better quality. This is because more developers can work with the same framework as a base, so the overall code base is used more, tested more and is thus of better quality. We can also reuse good, well-tested components, so that several customers can share the costs of development and thus get a better service or product for less money, which is of course always a good thing!

Interesting New Search Features

Out on the web there are a large number of small search engines that try to stand out and maybe take some market share from Google. Many of them have interesting search features.

I would like to introduce some of them in order to help others realize that search can (and should) be a bit more than a search bar and a list of hits. A number of these alternative search engines have focused on presenting the search results visually in interesting ways. For example, the search engine quintura uses tag clouds of terms and concepts related to the original query.

A slightly different approach has been taken by mnemomap and webbrain, which present related concepts in a graph instead. Another approach is to show visually how the search results divide into different categories, so that they can be navigated easily and also give a quick overview of the subject. Examples of that can be seen at mooter and kooltorch. Finally, I would also like to mention kartOO, which has, in my opinion, gone one step further and even presents the links to the search results with images and icons.

In conclusion, the ability to visualize search results graphically, so that it is possible to get a quick overview of a particular subject, can prove to be a very important feature in future search solutions. It would not only help users find what they want to know, but also help them get a better and wider understanding of a particular subject, without forcing them to read through a large chunk of (hopefully) relevant text.

The search results and related concepts can be presented graphically instead. That also takes advantage of the fact that people can take in a lot more information through an image than by reading text. Further, it can help users see easily whether they are on the right track and refine the query even before any returned document has been read through, thus saving valuable time, which today is more important than ever.