Implementing MapReduce over language and literature data over the UK National Grid Service

Sarwar, M.S., Alexander, M., Anderson, J., Green, J. and Sinnott, R.O. (2011) Implementing MapReduce over language and literature data over the UK National Grid Service. In: 7th International Conference on Emerging Technologies (ICET), Islamabad, Pakistan, 5-6 Sep 2011, (doi: 10.1109/ICET.2011.6048475)

Full text not currently available from Enlighten.

Publisher's URL: http://dx.doi.org/10.1109/ICET.2011.6048475

Abstract

Humanities researchers are producing large volumes and heterogeneous varieties of language and literature data collections in digital format. These collections include dictionaries, thesauri, corpora, images, audio and video resources. The increased availability of these datasets, brought about by advances in and adaptations of the Internet and by increased digitisation of humanities data resources, poses new challenges for humanities researchers. Many of these challenges relate to data access and usage and include security, integrity, interoperability, information retrieval, sharing, licensing and copyright. The JISC-funded project Enhancing Repositories for Language and Literature Research (ENROLLER; https://www.enroller.org.uk) is addressing these issues through the development of a targeted e-Research environment. A key component of this effort is support for large-scale analysis of diverse language and literature datasets. To this end, this paper presents the application of the MapReduce algorithm, which supports information retrieval and linguistic analysis on those datasets. In particular, we describe how MapReduce is used to provide advanced bulk search capabilities, exploiting a range of high-performance computing resources including the UK National Grid Service (www.ngs.ac.uk) and ScotGrid (www.scotgrid.ac.uk), to offer a step change in the kinds of research that can be undertaken by this community. We also present performance analysis results based on the application of these systems.
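The abstract does not reproduce the implementation, so the following is only a minimal single-process sketch of how a bulk term search can be phrased as map, shuffle and reduce steps. Everything in it is an assumption made for illustration: the function names (`map_search`, `shuffle`, `reduce_counts`, `bulk_search`), the in-memory dictionary standing in for a corpus, and the whitespace tokenisation. It does not reflect the ENROLLER code or how jobs were distributed over the UK National Grid Service or ScotGrid.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

PUNCTUATION = ".,;:!?\"'()"


def map_search(doc_id: str, text: str, terms: Set[str]) -> Iterator[Tuple[str, str]]:
    """Map step: emit (term, doc_id) for each occurrence of a query term."""
    for token in text.lower().split():
        word = token.strip(PUNCTUATION)
        if word in terms:
            yield word, doc_id


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Shuffle step: group all emitted document ids by term."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_counts(doc_ids: List[str]) -> Dict[str, int]:
    """Reduce step: count occurrences of one term per document."""
    counts: Dict[str, int] = defaultdict(int)
    for doc_id in doc_ids:
        counts[doc_id] += 1
    return dict(counts)


def bulk_search(corpus: Dict[str, str], terms: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Run the three steps in sequence for a batch of query terms."""
    wanted = {t.lower() for t in terms}
    pairs = (
        pair
        for doc_id, text in corpus.items()
        for pair in map_search(doc_id, text, wanted)
    )
    return {term: reduce_counts(ids) for term, ids in shuffle(pairs).items()}


if __name__ == "__main__":
    # Hypothetical toy corpus; not data from the project.
    corpus = {
        "d1": "The cat sat. The cat ran!",
        "d2": "A dog and a cat.",
    }
    print(bulk_search(corpus, ["cat", "dog", "bird"]))
```

In a distributed setting the map calls would run independently on separate partitions of the corpus, which is what makes this formulation suitable for grid or cluster resources; here they simply run in a loop.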

Item Type: Conference Proceedings
Additional Information: Proceedings ISBN: 9781457707698
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Alexander, Professor Marc and Green, Dr Johanna and Anderson, Mrs Jean and Sinnott, Professor Richard and Sarwar, Mr Muhammad
Authors: Sarwar, M.S., Alexander, M., Anderson, J., Green, J., and Sinnott, R.O.
Subjects: P Language and Literature > P Philology. Linguistics
P Language and Literature > PE English
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Z Bibliography. Library Science. Information Resources > ZA Information resources
College/School: College of Arts & Humanities > School of Critical Studies > English Language and Linguistics
University Services > IT Services > E-Science

Project Code: 512051
Award No:
Project Name: Enhancing repositories for language and literature researchers (ENROLLER)
Principal Investigator: Jean Anderson
Funder's Name: Joint Information Systems Committee (JISC)
Funder Ref: IRDEVREP/REPOS
Lead Dept: CRIT - ENGLISH LANGUAGE