Using log4j in Tomcat and Solr and How to Make a Customized File Appender

This article shows how to use log4j for both Tomcat and Solr. Beyond that, I will also show you the steps to make your own customized log4j appender and use it in Tomcat and Solr. If you want more information than is found in this blogpost, feel free to visit our website or contact us.

Default Tomcat log mechanism

Tomcat by default uses a customized version of the Java logging API (JULI). The configuration is located at ${tomcat_home}/conf/logging.properties. It follows the standard Java logging configuration syntax, plus a special tweak (properties prefixed with a number) for identifying the logs of different web apps.

An example is below:

handlers = 1catalina.org.apache.juli.FileHandler, 2localhost.org.apache.juli.FileHandler, 3manager.org.apache.juli.FileHandler, 4host-manager.org.apache.juli.FileHandler, java.util.logging.ConsoleHandler
.handlers = 1catalina.org.apache.juli.FileHandler, java.util.logging.ConsoleHandler
1catalina.org.apache.juli.FileHandler.level = FINE
1catalina.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
1catalina.org.apache.juli.FileHandler.prefix = catalina.
2localhost.org.apache.juli.FileHandler.level = FINE
2localhost.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
2localhost.org.apache.juli.FileHandler.prefix = localhost.

Default Solr log mechanism

Solr uses slf4j logging, which is a kind of wrapper around other logging frameworks. By default, Solr uses log4j syntax but wraps the Java logging API (which means that it looks like you are using log4j in the code, but it is actually Java logging underneath). It uses the Tomcat logging.properties as its configuration file. If you want to define your own, you can do so by placing a logging.properties file under ${tomcat_home}/webapps/solr/WEB-INF/classes/.

Switching to Log4j

Log4j is a very popular logging framework, which I believe is mostly due to its simplicity in both configuration and usage. It has richer logging features than Java logging, and it is not difficult to extend.

Log4j for tomcat

  1. Rename/remove ${tomcat_home}/conf/logging.properties
  2. Add log4j.properties in ${tomcat_home}/lib
  3. Add log4j-xxx.jar in ${tomcat_home}/lib
  4. Download tomcat-juli-adapters.jar from extras and put it into ${tomcat_home}/lib
  5. Download tomcat-juli.jar from extras and replace the original version in ${tomcat_home}/bin

(“extras” are additional jar files for special Tomcat setups; they can be found in the bin folder of a Tomcat download location, e.g. http://archive.apache.org/dist/tomcat/tomcat-6/v6.0.33/bin/extras/)
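For reference, a minimal log4j.properties for Tomcat could look something like this (a sketch along the lines of the Tomcat documentation; the file name, log level and pattern are just examples to adjust to your needs):

log4j.rootLogger=INFO, CATALINA
log4j.appender.CATALINA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.CATALINA.File=${catalina.base}/logs/catalina
log4j.appender.CATALINA.Append=true
log4j.appender.CATALINA.Encoding=UTF-8
log4j.appender.CATALINA.DatePattern='.'yyyy-MM-dd'.log'
log4j.appender.CATALINA.layout=org.apache.log4j.PatternLayout
log4j.appender.CATALINA.layout.ConversionPattern=%d [%t] %-5p %c - %m%n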

Log4j for solr

  1. Add log4j.properties in ${tomcat_home}/webapps/solr/WEB-INF/classes/ (create the classes folder if it is not present)
  2. Replace slf4j-jdkxx-xxx.jar with slf4j-log4jxx-xxx.jar in ${tomcat_home}/webapps/solr/WEB-INF/lib (this switches the underlying implementation from java logging to log4j)
  3. Add log4jxxx.jar to ${tomcat_home}/webapps/solr/WEB-INF/lib

Make our own log4j file appender

Log4j has two common types of file appender:

  • DailyRollingFileAppender – rolls over at a certain time interval
  • RollingFileAppender – rolls over at a certain size limit

I also found a nice customized file appender online:

  • CustodianDailyRollingFileAppender

I happened to need a file appender that rolls over at a certain time interval (each day), backs up earlier logs as zip files in a backup folder, and removes logs older than a certain number of days. CustodianDailyRollingFileAppender already has the rollover feature, so I decided to start by making a copy of this class.

Parameters

Besides the default parameters in DailyRollingFileAppender, I need two more parameters:

Outdir – backup directory

maxDaysToKeep – the number of days to keep the log file

You only need to define these two parameters as fields in the new class and add get/set methods for them (no constructor involved). The rest is handled by the log4j framework.
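A sketch of how the new class could start (the class name matches the configuration example further down; the package declaration is omitted):

import org.apache.log4j.DailyRollingFileAppender;

public class YyRollingFileAppender extends DailyRollingFileAppender {

    private String outdir;      // backup directory for zipped logs
    private int maxDaysToKeep;  // number of days to keep log files

    // log4j calls these get/set methods via reflection when it reads
    // properties such as log4j.appender.solrlog.Outdir from the configuration
    public String getOutdir() { return outdir; }
    public void setOutdir(String outdir) { this.outdir = outdir; }

    public int getMaxDaysToKeep() { return maxDaysToKeep; }
    public void setMaxDaysToKeep(int maxDaysToKeep) { this.maxDaysToKeep = maxDaysToKeep; }
}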

Logging entry point

When a log event arrives, the subAppend(…) method is called; inside it, a call to super.subAppend(event) does the actual log writing. So before that call, we can add the backup and cleanup mechanisms.
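Continuing the sketch above, the override could look like this (cleanupOldLogs and backupIfNeeded are hypothetical helper names, outlined in the next two sections):

// inside YyRollingFileAppender; needs import org.apache.log4j.spi.LoggingEvent
@Override
protected void subAppend(LoggingEvent event) {
    cleanupOldLogs();       // delete logs older than maxDaysToKeep
    backupIfNeeded();       // zip rolled-over logs into the backup folder
    super.subAppend(event); // the actual writing of the log event
}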

Clean up old log

Use a file filter to find all log files that start with the base file name, and delete those older than maxDaysToKeep.
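A sketch of such a cleanup helper (assuming java.io.File and java.io.FilenameFilter are imported; error handling omitted):

private void cleanupOldLogs() {
    File logFile = new File(getFile()); // getFile() is inherited from FileAppender
    File dir = logFile.getAbsoluteFile().getParentFile();
    final String baseName = logFile.getName();
    long cutoff = System.currentTimeMillis() - maxDaysToKeep * 24L * 60 * 60 * 1000;

    // rolled-over files all start with the base file name, e.g. solr.log.2011-09-28
    File[] candidates = dir.listFiles(new FilenameFilter() {
        public boolean accept(File d, String name) {
            return name.startsWith(baseName);
        }
    });
    if (candidates == null) {
        return;
    }
    for (File f : candidates) {
        if (f.lastModified() < cutoff) {
            f.delete();
        }
    }
}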

Backup log

Make a separate thread for zipping the log file, and delete the original log file afterwards (I found CyclicBarrier very easy to use for this kind of waiting for a thread to complete its task, and a separate thread is preferable for avoiding file lock/access etc. problems). Start the thread at the point where the current log file needs to be rolled over to backup.
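A sketch of that backup step (a hypothetical helper; imports from java.io, java.util.zip and java.util.concurrent.CyclicBarrier are assumed, and real code needs proper error handling):

private void zipAndDelete(final File logFile) throws Exception {
    new File(outdir).mkdirs(); // make sure the backup folder exists
    final CyclicBarrier barrier = new CyclicBarrier(2); // the caller and the zip thread
    new Thread(new Runnable() {
        public void run() {
            try {
                ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(
                        new File(outdir, logFile.getName() + ".zip")));
                zos.putNextEntry(new ZipEntry(logFile.getName()));
                FileInputStream in = new FileInputStream(logFile);
                byte[] buf = new byte[8192];
                int len;
                while ((len = in.read(buf)) > 0) {
                    zos.write(buf, 0, len);
                }
                in.close();
                zos.close();
                logFile.delete(); // remove the original once the zip is complete
            } catch (Exception e) {
                // in real code: report the failure
            } finally {
                try { barrier.await(); } catch (Exception ignored) { }
            }
        }
    }).start();
    barrier.await(); // wait here until the zip thread has finished its task
}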

Deploy the customized file appender

Let’s say we build a new jar called log4jxxappender.jar. We can then deploy the appender by copying the jar file to ${tomcat_home}/lib and to ${tomcat_home}/webapps/solr/WEB-INF/lib.

Example configuration for Solr:

log4j.rootLogger=INFO, solrlog
log4j.appender.solrlog=com.findwise.xx.log4j.fileappender.YyRollingFileAppender
log4j.appender.solrlog.File=${catalina.home}/logs/solr.log
log4j.appender.solrlog.Append=true
log4j.appender.solrlog.Encoding=UTF-8
log4j.appender.solrlog.DatePattern='.'yyyy-MM-dd
log4j.appender.solrlog.MaxDaysToKeep=10
log4j.appender.solrlog.Outdir=${catalina.base}/logs/backup
log4j.appender.solrlog.layout=org.apache.log4j.PatternLayout
log4j.appender.solrlog.layout.ConversionPattern = %d [%t] %-5p %c - %m%n

Solr.war

The last thing to remember about Solr is to zip the deployment folder ${tomcat_home}/webapps/solr and rename the zip file solr.zip to solr.war. Now you should have a log4j-enabled solr.war file with your customized file appender.

Want more information, have further questions or need help? Stop by our website or contact us!

Phonetic Algorithm: Bryan, Brian, Briane, Bryne, or … what was his name again?

Let the spelling loose …

What do Callie and Kelly have in common (except for the double “l” in the middle)? What about “no” and “know”, or “Caesar’s” and “scissors”, and what about “message” and “massage”? You definitely got it – Callie and Kelly, “no” and “know”, “Caesar’s” and “scissors” sound alike but are spelled quite differently. “Message” and “massage”, on the other hand, differ by only one vowel (“a” vs “e”), but their pronunciation is not at all the same.

It’s a well-known fact for many languages that orthography does not determine the pronunciation of words. English is a classic example. George Bernard Shaw is the attributed author of “ghoti” as an alternative spelling of “fish”. And while phonology often reflects the current state of the development of the language, orthography may often lag centuries behind. And while English is notorious for that phenomenon, it is not the only one. Swedish, French and Portuguese, among others, all have their orthography/pronunciation discrepancies.

Phonetic Algorithms

So how do we represent things that sound similar but are spelled differently? It’s not trivial, but for most cases it is not impossible either. Soundex is probably the first algorithm to tackle this problem. It is an example of the so-called phonetic algorithms, which attempt to give the same encoding to strings that are pronounced in a similar fashion. Soundex was designed for English only and has its limits. DoubleMetaphone (DM) is one of the possible replacements and relatively successful. Designed by Lawrence Philips in the early 1990s, it not only deals with native English names but also takes proper care of the foreign names so omnipresent in the language. And what is more – it can output two possible encodings for a given name (hence the “Double” in the name of the algorithm): an anglicised version and a native (be that Slavic, Germanic, Greek, Spanish, etc.) one.

By relying on DM one can encode all four names in the title of this post as “PRN”. The name George will get two encodings – JRJ and KRK, the second version reflecting a possible German pronunciation of the name. And a name of Polish origin, like Adamowicz, will also get two encodings – ATMTS and ATMFX, depending on whether you pronounce the “cz” as the English “ch” in “church” or as “ts” in “hats”.

The original implementation by Lawrence Philips allowed a string to be encoded only with 4 characters. However, in most subsequent implementations of the algorithm this option is parameterized or just omitted.

Apache Commons Codec has an implementation of DM among others (Soundex, Metaphone, RefinedSoundex, ColognePhonetic, Caverphone, to name just a few), and here is a tiny example with it:

import org.apache.commons.codec.language.DoubleMetaphone;

public class DM {

    public static void main(String[] args) {
        String s = "Adamowicz";
        DoubleMetaphone dm = new DoubleMetaphone();

        // Default encoding length is 4! Let's make it 10.
        dm.setMaxCodeLen(10);

        // Remember, DM can output 2 possible encodings:
        System.out.println("Alternative 1: " + dm.doubleMetaphone(s) +
                "\nAlternative 2: " + dm.doubleMetaphone(s, true));
    }
}

The above code will print out:

Alternative 1: ATMTS
Alternative 2: ATMFX

It is also relatively straightforward to do phonetic search with Solr. You just need to ensure that you add the phonetic analysis to a field which contains names in your schema.xml:
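Something along these lines, using the PhoneticFilterFactory that ships with Solr (the field and type names here are made up for the example):

<fieldType name="phonetic_name" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
  </analyzer>
</fieldType>
<field name="name" type="phonetic_name" indexed="true" stored="true"/>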

Enhancements

While DM performs quite well at first sight, it has its limitations. We should remember that it originated from the English language, and although it aims to tackle a variety of non-native borrowings, most of the rules are English-centric. Suppose you work with any of the Scandinavian languages (Swedish, Danish, Norwegian, Icelandic) and one of the names you want to encode is “Örjan”. You will find that “Orjan” and “Örjan” get different encodings – ARJN vs RJN. Why is that? One look under the hood (the implementation in DoubleMetaphone.java) will give you the answer:

private static final String VOWELS = "AEIOUY";

So the Scandinavian vowels “ö”, “ä”, “å”, “ø” and “æ” are not present. If we just add them, then compile and use the new version of the DM implementation, we get the desired output – ARJN for both “Örjan” and “Orjan”.
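In other words, the local patch can be as small as this one-line change (not an official DM option; you recompile the class yourself):

private static final String VOWELS = "AEIOUYÖÄÅØÆ"; // Scandinavian vowels added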

Finally, if you don’t want to use DM, or maybe it is really not suitable for your task, you can still use the same principles and create your own encoder, relying on regular expressions for example. Suppose you have a list of bogus product names which are just (mis)spelling variations of some well-known names, and you want to search for the original name but get back all the ludicrous variants. Here is one, albeit very naïve, way to do it. Given the following names:

CupHoulder
CappHolder
KeepHolder
MacKleena
MackCliiner
MacqQleanAR
Ma’cKcle’an’ar

and with a bunch of regular expressions you can easily encode them as “cphldR” and “mclnR”.

String[] ar = new String[]{"CupHoulder", "CappHolder", "KeepHolder",
        "MacKleena", "MackCliiner", "MacqQleanAR", "Ma'cKcle'an'ar"};

for (String a : ar) {
    a = a.toLowerCase();
    a = a.replaceAll("[ae]r?$", "R");  // normalize endings like -a, -ar, -er to "R"
    a = a.replaceAll("[aeoiuy']", ""); // strip vowels and apostrophes
    a = a.replaceAll("pp+", "p");      // collapse repeated p's
    a = a.replaceAll("q|k", "c");      // q and k are pronounced like c here
    a = a.replaceAll("cc+", "c");      // collapse repeated c's
    System.out.println(a);
}

You can now easily find all the ludicrous spellings of “CupHolder” and “MacCleaner”.

I hope this blogpost gave you some ideas of how you can use phonetic algorithms and their principles in order to better discover names and entities that sound alike but are spelled unlike. At Findwise we have done a number of enhancements to DM in order to make it work better with Swedish, Danish and Norwegian.

References

You can learn more about Double Metaphone from the following article by the creator of the algorithm:
http://drdobbs.com/cpp/184401251?pgno=2

A German phonetic algorithm is the Kölner Phonetik:
http://de.wikipedia.org/wiki/Kölner_Phonetik

SfinxBis is a Swedish-specific phonetic algorithm based on Soundex:
http://www.swami.se/projekt/sfinxbis.68.html

Search and Accessibility

Västra Götalandsregionen has introduced a new search solution, created by Findwise together with Netrelations, where both search and accessibility are important. We have also blogged about it earlier (see How to create better search – VGR leads the way). One important part of creating this solution was to make an interface that is accessible to everyone.

Today the web offers access to information and interaction for people around the world. But many sites have barriers that make it difficult, and sometimes even impossible, for people with different disabilities to navigate and interact with the site. It is important to design for accessibility – so that no one is excluded because of a disability.

Web accessibility means that people with disabilities can perceive, understand, navigate, interact with and contribute to the web. But web accessibility is not only for people who use screen readers, as is often portrayed. It is also for people with poor eyesight who need to increase the text size, or for people with cognitive disabilities. Web accessibility can benefit people without disabilities too: when using a slow Internet connection, when using a mobile phone to access the web, or when you have a broken arm. Even such a thing as using a web browser without JavaScript because of company policy can be a disability on the web and should be considered when designing websites.

So how do you build accessible websites?

One of the easiest things is to make sure that the XHTML validates. This means that the code is correct, adheres to the latest standard from the W3C (World Wide Web Consortium) and is semantically correct, i.e. that the different parts of the website use the correct HTML ”tags” in the correct context. For example, the most important heading of a page should be marked up with ”h1” and the second most important with ”h2” (among other things, this is important when making websites accessible to people using screen readers).

It is also important that a site can be navigated by keyboard alone, so that people who cannot use a mouse can still access it. Here it is important to test the order in which the different elements of the web page are selected when using the keyboard to navigate through the page. One thing that is often overlooked is that a site is often inaccessible to people with cognitive disabilities because it contains content with complex words, sentences or structure. By making content less complex and more structured, it becomes readable for everyone.

Examples from VGR

In the search application at VGR, interface elements that use JavaScript are only shown if the user has a browser with JavaScript enabled. This removes situations where elements do nothing because JavaScript is turned off; the interface is still usable, but you will not get all the functionality. The VGR search solution also works well with only the keyboard, and there is a handy link that takes the user directly to the results. This way the user can skip unwanted information and navigation.

How is accessibility related to findability?

[Image: Peter Morville’s user experience honeycomb – http://www.flickr.com/photos/morville/4274260576/in/set-72157623208480316/]

Accessibility is important for findability because it is about making search solutions accessible and usable for everyone. The need to find information is no less important if you are blind, if you have a broken arm or if you have dyslexia. If you cannot use a search interface, you cannot find the information you need.

“What you find changes who you become” – Peter Morville

In his book Search Patterns, Peter Morville visualizes this in the ”user experience honeycomb”. As can be seen in the picture, accessibility is as much a part of the user experience as usability or findability, and a search solution will be less usable without any of them.

Find People with Spock

Today, Google is the main source for finding information on the web, regardless of the kind of information you are looking for. Be it company information, diseases or people – Google is used to find everything. While Google does a great job of finding relevant information, it can be good to explore alternatives that concentrate on a more specific target.

In the previous post, Karl blogged about alternatives to Google that provide a different user interface. Earlier, Caroline enlightened us about search engines that lead to new ways of using search. Today I am going to continue on these tracks and tell you a bit about a new challenger, Spock, and my first impressions of using it.

Spock, released last week in beta version, is a search engine for finding people. Interest in finding people, both celebrities and ordinary people, has risen over the past years; just look at the popularity of social networking sites such as LinkedIn and Facebook. By using a search engine dedicated to finding people, you get more relevant hits and more information in each hit. Spock crawls the above-mentioned sites, as well as a bunch of others, to gather information about the people you want to find.

When you begin to use Spock, you instantly see the difference in search results compared to Google. Searching for “Java developer Seattle” in Spock returns a huge list of Java developers based in Seattle; with Google, you get a bunch of job listings. Searching for a famous person like Steve Jobs with Google, you find yourself with thousands of pages about the CEO of Apple. Using Spock, you will learn that there are a lot of other people around the world also named Steve Jobs. With each hit you get more information, such as pictures, related people, links to pages the person is mentioned on, etc.

In true Web 2.0 fashion, Spock uses tags to place people into categories. By exploring these tags, you will find even more people that might be of interest. Users can even register on Spock to add and edit tags and information about people.

Overall, Spock seems like a great search engine to me. The fact that users can contribute to the content (something that has made Wikipedia what it is today), combined with good relevancy and a clean interface, gives it a promising future. It also shows that it is possible to compete with Google and the other giants in the search market by focusing on a specific target and delivering an excellent search experience in that particular area.