Automatic Representative News Generation using On-Line Clustering
Abstract
The growing number of online news providers produces a large volume of news every day. This volume makes it hard to consume information efficiently, because many articles carry similar content under different titles. This paper presents a new system for automatically generating representative news using on-line clustering. The system makes the clustering dynamic through two features: centroid update and new cluster creation. Text mining is used to extract the news content. The representative news item of each cluster is the article closest to the cluster centroid, measured by Euclidean distance. For the experimental study, we applied the system to 460 news articles in Bahasa Indonesia. The experiment achieved a precision of 70.9%. The errors are mainly caused by imprecise keyword extraction, which generates only one or two keywords for an article. The distribution of the centroids' keywords also affects the clustering results.
Keywords: News Representation, On-line Clustering, Keyword Aggregation, Text Mining.
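As an illustration of the approach summarized in the abstract, the following is a minimal sketch of on-line clustering with centroid update, new cluster creation, and selection of the article closest to each centroid by Euclidean distance. It is not the authors' implementation. Several details are assumptions made for the example: articles are taken to be already converted into numeric keyword-weight vectors, a new cluster is created when the nearest centroid lies farther than a hypothetical `threshold`, and the centroid is updated as a running mean. The function names are likewise hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def online_cluster(vectors, threshold):
    """Process vectors in arrival order; return (centroids, members).

    Assumption: `threshold` decides when a new cluster is created.
    The abstract does not state the actual creation rule.
    """
    centroids = []  # one centroid per cluster
    members = []    # indices of the vectors assigned to each cluster
    for i, v in enumerate(vectors):
        if centroids:
            dists = [euclidean(v, c) for c in centroids]
            k = min(range(len(dists)), key=dists.__getitem__)
        if not centroids or dists[k] > threshold:
            # New cluster creation: the article starts its own cluster.
            centroids.append(list(v))
            members.append([i])
            continue
        # Centroid update: running mean over the cluster's members (assumed rule).
        members[k].append(i)
        n = len(members[k])
        centroids[k] = [c + (x - c) / n for c, x in zip(centroids[k], v)]
    return centroids, members


def representatives(vectors, centroids, members):
    """For each cluster, pick the index of the member closest to its centroid."""
    return [
        min(idx, key=lambda i: euclidean(vectors[i], c))
        for c, idx in zip(centroids, members)
    ]


# Toy keyword-weight vectors (made-up values, for illustration only).
articles = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.8, 0.2],
    [1.0, 0.1, 0.0],
    [0.0, 0.9, 0.1],
]
centroids, members = online_cluster(articles, threshold=0.5)
reps = representatives(articles, centroids, members)
print(members)  # cluster membership by article index
print(reps)     # index of the representative article per cluster
```

Because articles are processed one at a time, the result of this sketch depends on arrival order and on the chosen threshold; both would need tuning on real data.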