Recently, both Findwise clients and the Enterprise Search community in general have shown increasing interest in text analytics as a way to get more business value out of their (often large) volumes of unstructured information.
Text analytics merges techniques from linguistics, computer science, machine learning and statistics, and many of the central algorithms in this field are publicly available as open source tools and packages with easily accessible APIs. Customers of commercial Enterprise Search solutions such as Autonomy, IBM OmniFind and Microsoft FAST ESP have long benefited from some form of text analytics (e.g. entity extraction, keyword extraction and document summarization). The open source components have now come a long way in providing free alternatives with similar performance and feature sets.
Every modern enterprise search architecture has some kind of document processing that can be extended with additional stages or APIs (for example the Open Pipeline with Solr, or the pipeline that comes with Microsoft FAST). The opportunity to plug new text analytics stages into existing search implementations is therefore wide open.
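To make the idea of a pluggable stage concrete, here is a minimal sketch in Python. It is not the API of Open Pipeline, Solr or Microsoft FAST; the names `Document`, `Stage`, `run_pipeline`, `normalize_whitespace`, `tag_companies` and the `COMPANIES` list are all hypothetical, and the entity extraction is a naive dictionary lookup standing in for a real toolkit.

```python
from typing import Callable, Dict, List

# Assumption: a document is a plain dict of fields, and a stage is any
# function that takes a document and returns the (possibly enriched) document.
Document = Dict[str, object]
Stage = Callable[[Document], Document]

# Hypothetical gazetteer; a real stage would delegate to an NLP toolkit.
COMPANIES: List[str] = ["Findwise", "Microsoft", "IBM"]


def normalize_whitespace(doc: Document) -> Document:
    """An 'existing' stage: collapse runs of whitespace in the body."""
    doc["body"] = " ".join(str(doc["body"]).split())
    return doc


def tag_companies(doc: Document) -> Document:
    """A 'new' text analytics stage: add a field with the companies found."""
    body = str(doc["body"])
    doc["companies"] = [name for name in COMPANIES if name in body]
    return doc


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Pass the document through each stage in order."""
    for stage in stages:
        doc = stage(doc)
    return doc


doc = run_pipeline(
    {"body": "Findwise  indexes   content from IBM systems."},
    [normalize_whitespace, tag_companies],
)
print(doc["companies"])
```

Because each stage only reads and writes document fields, adding text analytics here amounts to appending one more function to the list; the extracted field can then be indexed like any other metadata.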
Among the most popular applications of text analytics to have emerged lately are customized entity extraction, sentiment analysis and document classification. Each has a set of open source alternatives (such as Balie, OpenNLP and GATE) readily available for customization and integration into your document processing.
Regardless of your industry domain, these techniques open up a wide variety of new ways to interpret content and discover trends in your unstructured textual data. Examples include sentiment analysis to support decision making, trend analysis or the relevance model of search; entity extraction to navigate your content by entities (such as company name or person); metadata tagging to enrich your texts; and finding similar and related content.
How are you taking advantage of modern text analytics?