The Findability blog

the enterprise search and findability blog by Findwise


Author Archives: Daniel Ling

Open Source Tools for Text Analytics

Posted on March 21, 2011 by Daniel Ling

Recently, both Findwise clients and the enterprise search community in general have shown increasing interest in text analytics as a way to get more business value out of their (often large) volumes of unstructured information.

Text analytics merges techniques from linguistics, computer science, machine learning and statistics, and many of the central algorithms in the field are publicly available as open source tools and packages with easily accessible APIs. Customers of commercial enterprise search solutions such as Autonomy, IBM OmniFind and Microsoft FAST ESP have long benefited from some form of text analytics (e.g. entity extraction, keyword extraction and document summarization). The open source components have now come a long way toward providing free alternatives with similar performance and feature sets.

Because every modern enterprise search architecture has some kind of document processing that can be extended with additional stages or APIs (for example, Open Pipeline with Solr, or the pipeline that comes with Microsoft FAST), new text analytics stages can be plugged into existing search implementations, and the field is open for innovation.
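As a rough illustration of what "plugging in a stage" means, the sketch below models a pipeline as a list of functions that each take a document and return an enriched copy. It is not the API of Open Pipeline or Microsoft FAST; the names `Document`, `run_pipeline`, `normalize_whitespace` and `extract_companies`, as well as the tiny company list, are hypothetical and chosen only for this example.

```python
from typing import Callable, Dict, List

# Assumption: a document is a plain dict of field name -> value.
Document = Dict[str, object]
Stage = Callable[[Document], Document]

# Hypothetical gazetteer. A real stage would delegate to a toolkit
# such as OpenNLP or GATE instead of a hard-coded list.
KNOWN_COMPANIES = ["Findwise", "IBM", "Microsoft"]


def normalize_whitespace(doc: Document) -> Document:
    """An 'existing' stage: collapse runs of whitespace in the body."""
    out = dict(doc)
    out["body"] = " ".join(str(doc.get("body", "")).split())
    return out


def extract_companies(doc: Document) -> Document:
    """The 'new' text analytics stage: tag known company names in the body."""
    body = str(doc.get("body", ""))
    out = dict(doc)
    out["companies"] = [name for name in KNOWN_COMPANIES if name in body]
    return out


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Pass the document through each stage in order."""
    for stage in stages:
        doc = stage(doc)
    return doc


if __name__ == "__main__":
    stages = [normalize_whitespace, extract_companies]
    doc = {"id": "1", "body": "IBM  and   Microsoft both ship search products."}
    print(run_pipeline(doc, stages))
```

Adding another analysis step, such as a sentiment or classification stage, then amounts to appending one more function to `stages`, leaving the rest of the pipeline untouched.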

Among the most popular applications of text analytics to emerge lately are customized entity extraction, sentiment analysis and document classification. Each has a set of open source alternatives (such as Balie, OpenNLP and GATE) that can be customized and built into your document processing.

Regardless of your industry domain, these techniques open up a wide variety of new ways to interpret content and discover trends in your unstructured textual data. Sentiment analysis can support the decision making process, trend analysis or the relevance model of search, while entity extraction lets you navigate your content by entities (such as company name or person), enrich your texts with metadata tags, or find similar and related content.
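To make "navigating content by entities" a little more concrete, here is a minimal sketch of how extracted entities could drive a facet in a search interface. It assumes each document already carries a list of entities in a field called `companies`; that field name and the helpers `build_entity_index` and `facet_counts` are hypothetical, and a search engine such as Solr would normally compute facets for you.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_entity_index(docs: Iterable[dict], field: str = "companies") -> Dict[str, List[str]]:
    """Map each extracted entity to the ids of the documents that mention it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for doc in docs:
        for entity in doc.get(field, []):
            index[entity].append(doc["id"])
    return dict(index)


def facet_counts(index: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Return (entity, document count) pairs, most frequent first."""
    pairs = ((entity, len(ids)) for entity, ids in index.items())
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    docs = [
        {"id": "a", "companies": ["IBM", "Microsoft"]},
        {"id": "b", "companies": ["IBM"]},
        {"id": "c"},  # no entities were extracted for this document
    ]
    index = build_entity_index(docs)
    print(facet_counts(index))
    print(index["IBM"])
```

Selecting an entity in the facet then simply narrows the result list to the document ids stored under that entity.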

How are you taking advantage of modern text analytics?

