Bayesian Feed Filtering

About

The Bayesian Feed Filtering (BayesFF) project will try to identify, from a set of RSS feeds of journal tables of contents, those articles that are of interest to specific researchers. It will do this by applying the same approach that is used to filter out junk email.

We will develop and investigate the performance of a tool that will aggregate and filter a range of RSS and Atom feeds selected by a user. The algorithm used for the filtering is similar to that used to identify spam in many email filters, except that in this case it will be “trained” to identify items that are interesting and should be highlighted, not those that should be junked.
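To make the idea concrete, here is a minimal sketch of a two-category naive Bayes classifier of the kind commonly used in spam filters, trained here on "interesting" versus "not interesting" text. It is an illustration only and is not the sux0r implementation: the class name `InterestFilter`, the category labels, the tokenizer and the smoothing choices are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InterestFilter:
    """Hypothetical two-category naive Bayes classifier with add-one smoothing."""

    CATEGORIES = ("interesting", "not_interesting")

    def __init__(self):
        # Per-category token counts and number of training documents.
        self.word_counts = {c: Counter() for c in self.CATEGORIES}
        self.doc_counts = Counter()

    def train(self, text, category):
        """Record one training document under the given category."""
        if category not in self.CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.doc_counts[category] += 1
        self.word_counts[category].update(tokenize(text))

    def probability_interesting(self, text):
        """Return the estimated probability that the text is interesting."""
        vocabulary = set()
        for c in self.CATEGORIES:
            vocabulary.update(self.word_counts[c])
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)

        log_scores = {}
        for c in self.CATEGORIES:
            # Smoothed prior, so an untrained category never has probability 0.
            log_p = math.log((self.doc_counts[c] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[c].values())
            # One extra slot in the denominator stands for unseen words.
            denominator = total_words + len(vocabulary) + 1
            for token in tokens:
                log_p += math.log((self.word_counts[c][token] + 1) / denominator)
            log_scores[c] = log_p

        # Convert log scores to a normalised probability without underflow.
        highest = max(log_scores.values())
        weights = {c: math.exp(s - highest) for c, s in log_scores.items()}
        return weights["interesting"] / sum(weights.values())
```

In use, a researcher would mark some items as interesting and others as not, each call to `train` updating the counts, and new feed items would then be ranked by `probability_interesting`. An untrained filter returns 0.5 for any text, since it has no evidence either way.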

An important element of the project is investigating whether the filtering is effective enough to be helpful to users (specifically, in this case, researchers looking at journal tables of contents for interesting newly published papers) and disseminating information about the potential of this approach within the JISC community. We appreciate that the potential applicability of the technique is much wider: it applies to any area where a user might want to monitor alerts from a wide range of sources in the knowledge that many of the items in the feeds will be irrelevant. Anyone who has subscribed to dozens of seemingly relevant feeds, only to find that they are presented with more items than they can scan, is familiar with this problem.

Aims & Objectives

Aims

Objectives

Approach

The demonstrator service will be built, as far as is practicable, out of existing open source software modules, for example the Bayesian filtering routine used by sux0r, and the RSS aggregator and the user interfaces from sux0r and ticTOCs. All software will be developed as open source software, i.e. using open source applications such as Apache, MySQL and PHP, with code hosted on SourceForge or Google Code and available through an open source licence. The API is intended to allow users to interact remotely with the filtering mechanism, i.e. by indicating which items are and are not relevant to their interests. A typical use for the API would be a widget on a site such as iGoogle or Netvibes that displays the items the system suggests as being of interest, and through which the user can indicate any items that were not actually of interest.

We will guide a group of approximately 20 researchers through the use of the system, training the Bayesian filter with information about their interests. RSS feeds for the tables of contents of journals which the researchers are interested in will be sourced from ticTOCs. Ideally, information about which items they find useful will come from those feeds. However, the time scale for the project means that there may not be a sufficient number of interesting items in the journal issues for which table of contents feeds are available during the project. To allow for this, the system may be trained using text from the abstracts of papers that have been identified by the researchers as interesting, e.g. papers they have recently read, written or cited. The Bayesian filter will then be used to select items from subsequent journal TOC feeds, and the researchers will provide feedback through interviews or questionnaires on the success of the filtering. Researchers will be recruited locally, from Heriot-Watt where possible, in order to facilitate easy interaction with them; the project budget includes a sum for a small incentive for researchers to take part in the trial.
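The selection step described above can be sketched as follows: parse a table of contents feed, score each item, and keep those above a threshold. This is a rough illustration under stated assumptions, not the project's code. It assumes a plain RSS 2.0 feed without XML namespaces (real TOC feeds may be RSS 1.0 or Atom and would need namespace handling), and the names `parse_rss_items`, `select_items`, `keyword_score` and `SAMPLE_FEED` are hypothetical. The scorer is deliberately pluggable; the `keyword_score` function is only a stand-in and is not Bayesian.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed used only to exercise the functions below.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example Journal</title>
<item><title>Bayesian methods for feed filtering</title>
<description>Spam-style classifiers applied to alerts.</description></item>
<item><title>Editorial board</title>
<description>List of editors.</description></item>
</channel></rss>"""


def parse_rss_items(rss_xml):
    """Return (title, description) pairs for every <item> in an RSS 2.0 string."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        items.append((title.strip(), description.strip()))
    return items


def select_items(items, score, threshold=0.5):
    """Keep items whose score meets the threshold, highest score first.

    `score` is any callable mapping text to a number between 0 and 1,
    for example the probability produced by a trained Bayesian filter.
    """
    scored = [(score(f"{title} {description}"), title)
              for title, description in items]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


def keyword_score(text):
    """Stand-in scorer (NOT Bayesian): 1.0 if 'bayesian' appears, else 0.0."""
    return 1.0 if "bayesian" in text.lower() else 0.0
```

Passing the scorer in as a function keeps the feed handling separate from the classifier, so the same selection code works whether the filter was trained on feed items or on abstracts of papers the researcher has read, written or cited.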

Outputs

Development

We have created a local installation of the open source software sux0r in order to trial the system with researchers. Sux0r is a platform for blogging, bookmarking, sharing photos and reading RSS feeds. Our installation includes only the RSS reader with Bayesian filtering, in order to simplify the experience for our users. http://icbl.macs.hw.ac.uk/sux0r210/

We have also developed an API for sux0r to allow other applications to include Bayesian Feed Filtering functionality.

Related Blog Posts:

User Trialling

We recruited 20 research staff and students from Heriot-Watt University as volunteers to trial Bayesian Feed Filtering. The trials consisted of five main stages.

Related Blog Posts:

Community Engagement

The project disseminated its findings through blog posts. The customised installation of sux0r was made available for external users to test. The installation will be sustained after the project is completed so that these current users, the volunteers from the trial and potential future users can continue to use the tool.

Related Blog Posts:

Project Management

We adopted the Feature Driven Development approach to project management. The development was broken down into a list of features which could be designed and built rapidly.

Related Blog Posts:

 

Plan and Progress

The project proposal is available on Scribd.

Who did this and who paid them

The project was managed by Phil Barker, of ICBL, Heriot-Watt University. Santiago Chumbe, of ICBL, was responsible for development and Lisa Rogers, also of ICBL, conducted the user trials. The project was funded by the JISC as part of the Rapid Innovation programme.

Maintained by Lisa Rogers
Last modified: 10 December 2009.
© For more information about this website, including terms of use, go to http://www.icbl.hw.ac.uk/aboutweb.html