Information Retrieval Resources

Document Collections
Structured Documents: Structured Documents at Queen Mary, University of London (Focus project)
TREC Web Collections: WT2G, WT10G, DOTGOV, DOTGOV2 and Blog06 collections.
The University of Glasgow recently became the TREC-appointed distributor of the Web crawl test collections used in the TREC Web and Terabyte tracks. If you are experimenting with Information Retrieval systems in a Web or blog context, or are interested in large-scale IR evaluation, these crawls are essential. Because TREC provides queries and relevance assessments for these collections, you can use them to tune and evaluate your system or approach.
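To illustrate how TREC relevance assessments are used for evaluation, here is a minimal sketch that parses qrels lines in the usual four-column TREC layout (topic, iteration, document number, relevance) and computes non-interpolated average precision for one ranked list. The topic number and document ids below are hypothetical toy values; for reported results, the official trec_eval tool should be preferred over a hand-rolled metric.

```python
from collections import defaultdict

def parse_qrels(lines):
    """Parse TREC-style qrels lines: 'topic iteration docno relevance'."""
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic, _iteration, docno, rel = parts
        qrels[topic][docno] = int(rel)
    return qrels

def average_precision(ranking, judgments):
    """Non-interpolated AP: sum of precision at each relevant retrieved document,
    divided by the total number of relevant documents for the topic."""
    num_relevant = sum(1 for r in judgments.values() if r > 0)
    if num_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, docno in enumerate(ranking, start=1):
        if judgments.get(docno, 0) > 0:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_relevant

# Hypothetical toy judgments and system run for one topic.
qrels = parse_qrels([
    "401 0 doc1 1",
    "401 0 doc2 0",
    "401 0 doc3 1",
])
run = ["doc3", "doc2", "doc1"]
print(average_precision(run, qrels["401"]))
```

Here the relevant documents appear at ranks 1 and 3, so AP is (1/1 + 2/3) / 2.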

Journals
CL MIT: Computational Linguistics, The MIT Press
DKE: Data & Knowledge Engineering (DKE), Elsevier
IJCIS: International Journal of Cooperative Information Systems
IJDAR: International Journal on Document Analysis and Recognition
IJDL: International Journal on Digital Libraries, Springer
IJIS: International Journal of Intelligent Systems
IJUFKS: International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (IJUFKS), World Scientific
IPL: Information Processing Letters, Elsevier
IPM: Information Processing & Management
IR: Information Research
JASIST: Journal of the American Society for Information Science and Technology (JASIST), Wiley
JDoc: Journal of Documentation (JDoc), Emerald
JIIS: Journal of Intelligent Information Systems (JIIS), Springer
JIR: Information Retrieval, Springer
KAIS: Knowledge and Information Systems (KAIS), Springer
NLE: Natural Language Engineering, Cambridge University Press
TKDE: Transactions on Knowledge and Data Engineering (TKDE), IEEE
TOIS: Transactions on Information Systems (TOIS), ACM
TWEB: Transactions on the Web (TWEB), ACM

Other
D-Lib Magazine
FAST Search white papers: If you would like further information on search, FAST offers a number of interesting, free white papers at its knowledge center, covering topics such as ‘Demystifying Search’, ‘Search for Tomorrow’ and ‘Search Best Practises’.
Finding Out About: Search Engine Technology from a Cognitive Perspective. Belew, R.K., Cambridge University Press, 2000.
Foundations of Statistical Natural Language Processing. Manning, C. and Schütze, H., MIT Press, 1999.
Information Retrieval. van Rijsbergen, C.J., Butterworths, 1979.
Information Retrieval Interaction. Ingwersen, P., Taylor Graham Publishing, 1992.
Information Retrieval: Data Structures & Algorithms. Frakes, W. and Baeza-Yates, R., Prentice Hall, 1992.
Managing Gigabytes: Compressing and Indexing Documents and Images. Witten, I.H., Moffat, A. and Bell, T.C., Morgan Kaufmann Publishing, 1999.
Modern Information Retrieval. Baeza-Yates, R. and Ribeiro-Neto, B. (eds), Addison-Wesley-Longman Publishers, 1999.
The Turn: Integration of Information Seeking and Retrieval in Context. Ingwersen, P. and Järvelin, K., Springer, 2005.
TREC: Experiment and Evaluation in Information Retrieval. Voorhees, E.M. and Harman, D.K. (eds), The MIT Press, 2005.

Systems and Tools
HySpirit: HySpirit is a software development toolkit for Information Retrieval. Its key concept is the representation of knowledge and its intrinsic uncertainty, based on database modelling and probability theory. This yields a descriptive approach for constructing effective, flexible and scalable retrieval systems in a wide range of applications, such as structured document retrieval (XML, HTML), data warehousing, knowledge-based retrieval and interactive TV.
Lucene: Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially cross-platform applications.
Terrier: Terrier (Terabyte Retriever) is a modular information retrieval system for the rapid development of Web, intranet and desktop search engines. The indexing module is optimised for indexing large-scale document collections, while the retrieval module provides a wide range of weighting approaches and full-text search algorithms, with the aim of offering a public testbed for information retrieval experiments. Terrier has been used successfully for ad hoc retrieval, Web search and cross-language retrieval, in both centralised and distributed settings. A version of Terrier is available for download as open source software, distributed under the Mozilla Public License (MPL). Terrier is written in Java and therefore runs on any platform with a Java virtual machine, allowing easy integration with cross-platform applications.
The Lemur Toolkit for Language Modeling and IR: The Lemur Toolkit is designed to facilitate research in language modeling and information retrieval, where IR is broadly interpreted to include such technologies as ad hoc and distributed retrieval, cross-language IR, summarization, filtering and classification. The toolkit supports indexing of large-scale text databases, the construction of simple language models for documents, queries or subcollections, and the implementation of retrieval systems based on language models as well as a variety of other retrieval models. The system is written in C and C++ and is designed as a research system to run under Unix, although it can also run under Windows.
The Tools We Use for Music IR: A list of tools available to those interested in music-based information retrieval.
WIRE - Web Information Retrieval Environment: The WIRE project is an effort started by the Center for Web Research to create an information retrieval application designed for use on the Web.
Currently, it includes a simple format for storing a collection of web documents, a web crawler, tools for extracting statistics from the collection, and tools for generating reports about the collection.
Xapian: Xapian is an Open Source search engine library, released under the GPL. It is a highly adaptable toolkit that allows developers to easily add advanced indexing and search facilities to their own applications. It supports the probabilistic Information Retrieval model as well as a rich set of Boolean query operators.
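HySpirit's description above combines database modelling with probability theory. As a rough, generic illustration of that idea (this is NOT HySpirit's API or query language), the sketch below attaches a probability to each tuple of a hypothetical (term, document) relation, selects tuples matching a query, and projects onto documents, combining duplicates under an assumed independence rule.

```python
# Generic sketch of probabilistic relational operators; NOT HySpirit's API.
# Each (term, doc_id) tuple carries the probability that it holds (toy values).
term_doc = {
    ("retrieval", "d1"): 0.8,
    ("xml", "d1"): 0.5,
    ("retrieval", "d2"): 0.4,
    ("warehouse", "d3"): 0.7,
}

def select(relation, predicate):
    """Keep tuples satisfying the predicate; their probabilities are unchanged."""
    return {t: p for t, p in relation.items() if predicate(t)}

def project_independent(relation, key):
    """Project tuples onto key(t). Tuples that collapse to the same key are
    treated as independent events: P(a or b) = 1 - (1 - P(a)) * (1 - P(b))."""
    result = {}
    for t, p in relation.items():
        k = key(t)
        result[k] = 1 - (1 - result.get(k, 0.0)) * (1 - p)
    return result

query_terms = {"retrieval", "xml"}
matches = select(term_doc, lambda t: t[0] in query_terms)
doc_probs = project_independent(matches, key=lambda t: t[1])
print(doc_probs)
```

Document d1 matches both query terms, so its probability is 1 - 0.2 * 0.5 = 0.9; the independence assumption is a simplification chosen for this sketch.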
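Full-text search libraries such as Lucene are built around an inverted index that maps each term to the documents containing it. The following is a toy sketch of that data structure, with a crude lowercase tokenizer and a conjunctive (AND) query; it is not Lucene code, and Lucene's real analyzers, index format and query API are far more involved.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a crude analyzer)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_index(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def search_all(index, query):
    """Return ids of documents containing every query term (conjunctive query)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)

docs = {
    1: "Apache Lucene is a text search engine library",
    2: "Terrier is a modular information retrieval system",
    3: "Xapian is a search engine library",
}
index = build_index(docs)
print(search_all(index, "search library"))  # [1, 3]
```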
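Terrier's retrieval module is described above as offering a wide range of weighting approaches. As one example of what a term-weighting model computes, here is a minimal sketch of Okapi BM25 over pre-tokenized toy documents. It uses the non-negative log(1 + ...) idf variant, and the parameter values k1=1.2 and b=0.75 are common choices assumed here; this is not Terrier's implementation, and Terrier's own models may differ.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(terms) for terms in docs.values()) / n_docs
    df = Counter()  # document frequency of each term
    for terms in docs.values():
        df.update(set(terms))
    scores = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        score = 0.0
        for q in query:
            if tf[q] == 0:
                continue  # term absent from this document contributes nothing
            idf = math.log(1 + (n_docs - df[q] + 0.5) / (df[q] + 0.5))
            # Length normalisation: longer-than-average documents are penalised.
            norm = tf[q] + k1 * (1 - b + b * len(terms) / avg_len)
            score += idf * tf[q] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores

docs = {
    "d1": "web search engine web".split(),
    "d2": "blog search".split(),
    "d3": "terabyte track".split(),
}
query = ["web", "search"]
print(bm25_scores(query, docs))
```

The b parameter controls how strongly scores are normalised by document length, while k1 controls how quickly repeated occurrences of a term saturate.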
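The Lemur Toolkit is built around language models for documents and queries. A standard retrieval model in that family is query likelihood with Dirichlet smoothing, where each document's term distribution is smoothed with the collection's. The sketch below implements that formula on toy data; it is not Lemur's API, and the smoothing parameter mu is an assumed value (a small one suits this tiny example).

```python
import math
from collections import Counter

def dirichlet_query_likelihood(query, docs, mu=2000.0):
    """Rank documents by log P(query | document language model), smoothing each
    document model with the collection model via a Dirichlet prior of weight mu."""
    collection = Counter()
    for terms in docs.values():
        collection.update(terms)
    collection_len = sum(collection.values())
    scores = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        score = 0.0
        for q in query:
            p_coll = collection[q] / collection_len
            if p_coll == 0:
                continue  # term unseen in the whole collection: skip it
            # P(q | D) = (tf(q, D) + mu * P(q | C)) / (|D| + mu)
            score += math.log((tf[q] + mu * p_coll) / (len(terms) + mu))
        scores[doc_id] = score
    return scores

docs = {
    "d1": "language model retrieval".split(),
    "d2": "distributed retrieval".split(),
}
scores = dirichlet_query_likelihood(["language", "model"], docs, mu=10.0)
print(scores)
```

Smoothing gives d2 a finite score even though it contains neither query term; d1 still ranks higher because it contains both.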
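WIRE includes a web crawler. The core of most crawlers is a frontier of URLs still to visit plus a set of URLs already seen. The sketch below shows a breadth-first crawl order over a hypothetical in-memory link graph; it does no network access, and it is not WIRE's crawler, which has its own scheduling policy and storage format.

```python
from collections import deque

def crawl_order(seeds, links, max_pages=100):
    """Breadth-first crawl over an in-memory link graph.

    `links` maps a URL to the URLs it links to; a real crawler would fetch
    and parse each page instead of looking it up in a dict."""
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    frontier = deque(seeds)
    seen = set(seeds)
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)  # "visit" the page
        for target in links.get(url, []):
            if target not in seen:  # enqueue each URL at most once
                seen.add(target)
                frontier.append(target)
    return order

# Hypothetical toy link graph.
links = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
order = crawl_order(["a"], links)
print(order)  # ['a', 'b', 'c', 'd']
```

The `seen` set prevents revisiting pages that link back to each other, such as "c" linking to "a" here.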
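Xapian is described above as supporting both the probabilistic retrieval model and Boolean query operators. The sketch below shows how the two can combine: a Boolean expression of the form (A OR B) AND NOT C selects matching documents, which are then ranked by summing the classic Robertson/Sparck Jones term weight computed without relevance information. The index, collection size and function names are hypothetical. This is not Xapian's API, and Xapian's actual weighting schemes are more elaborate.

```python
import math

def rsj_weight(n_docs, df):
    """Robertson/Sparck Jones term weight with no relevance information."""
    return math.log((n_docs - df + 0.5) / (df + 0.5))

def search(index, n_docs, any_of, none_of=()):
    """Match documents containing any term in `any_of` (OR) and no term in
    `none_of` (AND NOT), then rank matches by summed RSJ weights."""
    matches = set()
    for term in any_of:
        matches |= index.get(term, set())
    for term in none_of:
        matches -= index.get(term, set())

    def score(doc_id):
        return sum(rsj_weight(n_docs, len(index[t]))
                   for t in any_of if doc_id in index.get(t, set()))

    # Highest score first; ties broken by document id for stable output.
    return sorted(matches, key=lambda d: (-score(d), d))

# Hypothetical postings: term -> set of document ids, in a 10-document collection.
index = {
    "probabilistic": {1, 2},
    "boolean": {2, 3},
    "spam": {3},
}
ranked = search(index, 10, ["probabilistic", "boolean"], none_of=["spam"])
print(ranked)  # [2, 1]
```

Document 3 is excluded by the NOT clause, and document 2 outranks document 1 because it matches both positive terms.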