Publication: Thai Monitor Corpus: Challenges and Contribution to Thai NLP
Issued Date
2018
Language
en
ISSN
2287-0903
Journal Title
Vacana
Volume
6
Issue
2
Rights
Copyright (c) 2018 Vacana
Title
Thai Monitor Corpus: Challenges and Contribution to Thai NLP
Abstract
Building a corpus is a necessary task for NLP and for other research fields such as linguistics, language teaching, and translation. Only a few Thai corpora have been created and released, and most of them are static and small. They are not designed as monitor corpora, which can grow over time. The concept of a monitor corpus resembles the newer research area of Big Data, which has gained increasing interest in the past few years because of the extensive growth of data available online. In this paper, the differences between a monitor corpus and Big Data are discussed first. Then, the design and framework for developing a Thai Monitor Corpus (TMC) are outlined. To carry out this task, techniques and methods from Big Data research that are suitable for storing texts are selected and summarized. The progress of this work is reported in section 3, and the plan for further development and use of the TMC is sketched. The paper concludes by pointing out the relationship between the two research fields, NLP and Big Data, and by reviewing their contributions to each other.