Full-text index contents of email attachments and zip files

kyle.hatlestad | Jul 20, 2010 06:54 +0000

I recently found out about new indexing functionality that allows UCM to full-text index content attached to an email message, as well as the contents of a compressed zip file. This means that a Word document attached to a checked-in email message, or the PDF documents inside a checked-in zip file, can now be full-text indexed.

If you are using UCM 11g, this functionality is already built in and configured. If you are using UCM 10g, however, you need an extra patch file. I've made a copy available here. It will most likely be included in future update bundle patches.

To install the patch, create the directory <ucm dir>/classes/intradoc/taskmanager/tasks/ and place the patch file there. For emails with attachments, you don't need to do anything beyond that. For zip files, however, you'll need to change the configuration so that UCM knows to full-text index that file type as well. The easiest way is to add this configuration value, as a single line, to <ucm dir>/config/config.cfg:


TextIndexerFilterFormats=pdf,msword,ms-word,doc*,ms-excel,xls*,ms-powerpoint,powerpoint,ppt*,rtf,xml,msg,zip
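If you would rather script the two file-system steps above, here is a minimal Python sketch. It is not part of UCM: `install_patch`, `enable_zip_indexing`, and the `patch_file` argument are hypothetical names, and the sketch assumes config.cfg holds plain `Name=Value` lines like the one shown. If a TextIndexerFilterFormats entry already exists, it appends zip to that list instead of overwriting it, so any custom formats survive.

```python
import shutil
from pathlib import Path

# Format list from the post; used only when config.cfg has no entry yet.
DEFAULT_FORMATS = ("pdf,msword,ms-word,doc*,ms-excel,xls*,ms-powerpoint,"
                   "powerpoint,ppt*,rtf,xml,msg,zip")
KEY = "TextIndexerFilterFormats"


def install_patch(ucm_dir, patch_file):
    """Hypothetical helper: create the tasks directory and copy the patch into it."""
    tasks_dir = Path(ucm_dir) / "classes" / "intradoc" / "taskmanager" / "tasks"
    tasks_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy returns the path of the copied file.
    return Path(shutil.copy(patch_file, tasks_dir))


def enable_zip_indexing(ucm_dir, default_formats=DEFAULT_FORMATS):
    """Hypothetical helper: make sure 'zip' is in TextIndexerFilterFormats.

    Assumes config.cfg consists of simple Name=Value lines.
    """
    cfg = Path(ucm_dir) / "config" / "config.cfg"
    cfg.parent.mkdir(parents=True, exist_ok=True)
    lines = cfg.read_text().splitlines() if cfg.exists() else []
    for i, line in enumerate(lines):
        name, sep, value = line.partition("=")
        if sep and name.strip() == KEY:
            # Existing entry: keep its formats and append zip if missing.
            formats = [f.strip() for f in value.split(",") if f.strip()]
            if "zip" not in formats:
                formats.append("zip")
            lines[i] = f"{KEY}={','.join(formats)}"
            break
    else:
        # No entry yet: add the full list from the post as a single line.
        lines.append(f"{KEY}={default_formats}")
    cfg.write_text("\n".join(lines) + "\n")
    return cfg
```

Running `enable_zip_indexing` twice leaves the file unchanged the second time. You would still restart the server and, if needed, rebuild the search collection afterwards.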

Make those changes, restart the server, and then rebuild the search collection if you want existing content items to be indexed this way. Otherwise, only newly checked-in items will be indexed this way.