Publication: Clustering Keywords to Identify Concepts in Texts: An Analysis of Research Articles in Applied Linguistics
Issued Date
2016
Language
en
ISSN
1513-5934 (Print), 2651-1479 (Online)
Journal Title
rEFLections Journal
Volume
22
Start Page
55
End Page
70
Abstract
Keyword analysis is one of the most widely used methods in corpus linguistics. It generates keywords that indicate the concepts in a text or corpus. However, keyword analysis tools typically present their output as a list, which gives only a rough indication of what the corpus is about, because interpreting it requires the analyst's knowledge of the conceptual associations between the keywords. Common follow-up steps in keyword analysis are therefore to examine concordances, collocational patterns, and other patterns of association between keywords and their contexts. This study focuses instead on the associations within a group of keywords, representing a keyword list as keyword clusters. The keywords were generated by comparing two corpora: the target corpus consisted of research articles in applied linguistics, and the comparative corpus was a collection of research in the pure and applied sciences. The relationships among the top 30 keywords were identified by computing mutual information scores for all possible pairs of keywords within a span of 20, and these scores were used as input for creating keyword clusters. The two representations of the 30 keywords, as a list and as clusters, are presented and discussed.
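Keywords are obtained by comparing word frequencies in a target corpus against a comparative corpus. The abstract does not name the keyness statistic, so the sketch below assumes log-likelihood (G2), a common choice in keyword tools; the function names `log_likelihood` and `top_keywords` are hypothetical, and only positive keywords (relatively more frequent in the target corpus) are kept.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) keyness for a word that occurs a times in a target
    corpus of c tokens and b times in a comparative corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the comparative corpus
    ll = 0.0
    if a > 0:  # treat 0 * log(0) as 0
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def top_keywords(target_tokens, reference_tokens, n=30):
    """Return the n highest-scoring positive keywords as (word, score) pairs."""
    tf, rf = Counter(target_tokens), Counter(reference_tokens)
    c, d = sum(tf.values()), sum(rf.values())
    scored = []
    for word, a in tf.items():
        b = rf.get(word, 0)
        if a / c > b / d:  # relatively more frequent in the target corpus
            scored.append((word, log_likelihood(a, b, c, d)))
    # highest score first; ties broken alphabetically for a stable order
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:n]


if __name__ == "__main__":
    target = "learners use language learners acquire language the the".split()
    reference = "cells use energy the the the cells react".split()
    # 'language' and 'learners' tie for first place; 'acquire' follows
    print(top_keywords(target, reference, n=3))
```

The default `n=30` mirrors the study's top-30 list; tokenization, stop-word handling, and any minimum-frequency filter are left out and would need to match whatever the original tool did.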
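The pairwise association step can be sketched as follows. This assumes the widely used formula MI = log2(O × N / (f(x) × f(y))), where O is the number of times keywords x and y co-occur, N is the corpus size in tokens, and f(·) is a keyword's frequency, and it assumes that "a span of 20" means two keywords co-occur when their positions are at most 20 tokens apart. Other tools define the span per side or add a span factor to the expected frequency, so treat both choices as assumptions. The function name `keyword_mi` is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def keyword_mi(tokens, keywords, span=20):
    """MI score for every pair of keywords that co-occurs at least once.

    Assumed definitions: a co-occurrence is two different keywords at most
    `span` tokens apart; MI = log2(O * N / (f(x) * f(y))).
    """
    keyset = set(keywords)
    n = len(tokens)
    freq = Counter(t for t in tokens if t in keyset)
    cooc = Counter()
    for i, x in enumerate(tokens):
        if x not in keyset:
            continue
        # look only to the right so each pair of positions is counted once
        for j in range(i + 1, min(n, i + span + 1)):
            y = tokens[j]
            if y in keyset and y != x:
                cooc[tuple(sorted((x, y)))] += 1
    scores = {}
    # examine all possible pairs; MI is undefined when O == 0, so skip those
    for x, y in combinations(sorted(keyset), 2):
        o = cooc[(x, y)]
        if o > 0:
            scores[(x, y)] = math.log2(o * n / (freq[x] * freq[y]))
    return scores


if __name__ == "__main__":
    toks = "learners use strategies and learners report strategies".split()
    print(keyword_mi(toks, ["learners", "strategies", "report"], span=3))
```

For 30 keywords this examines 435 pairs; pairs that never co-occur within the span receive no score rather than a default value.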
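The abstract states that pairwise MI scores were the input for building keyword clusters but does not name the clustering procedure. One simple possibility, shown here only as an assumption, is single-linkage clustering cut at a fixed MI threshold: keywords whose score reaches the threshold are linked, and each connected group of links forms a cluster. The function `cluster_keywords`, the default threshold of 3.0 (a value often used as a collocation cut-off), and the scores in the example are all hypothetical.

```python
def cluster_keywords(keywords, mi_scores, threshold=3.0):
    """Single-linkage clusters of keywords at a fixed MI cut-off.

    mi_scores maps (x, y) keyword pairs to MI values. Keywords are linked when
    their score is >= threshold; clusters are the connected components of
    those links. Pairs absent from mi_scores are never linked.
    """
    parent = {k: k for k in keywords}  # union-find forest

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for (x, y), score in mi_scores.items():
        if score >= threshold and x in parent and y in parent:
            parent[find(x)] = find(y)  # merge the two clusters

    groups = {}
    for k in keywords:
        groups.setdefault(find(k), []).append(k)
    # largest clusters first, each listed alphabetically
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


if __name__ == "__main__":
    kws = ["learner", "strategy", "corpus", "concordance", "teacher", "motivation"]
    # hypothetical scores for illustration only, not results from the study
    scores = {("learner", "strategy"): 4.2, ("corpus", "concordance"): 5.1,
              ("learner", "teacher"): 3.5, ("strategy", "corpus"): 1.0}
    print(cluster_keywords(kws, scores))
```

Raising or lowering the threshold trades many small, tight clusters for fewer, looser ones; average- or complete-linkage hierarchical clustering would be reasonable alternatives if the MI scores are first converted to distances.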