Cluster-Based News Representative Generation with Automatic Incremental Clustering
Abstract
A large volume of news circulates on the Internet every day, amounting to more than two thousand news items per day. Many of these items share the same topic and content, leaving readers caught among different sources that say similar things. This research proposes a new approach that provides a representative news item automatically through the Automatic Incremental Clustering method. The method began with Data Acquisition, Keyword Extraction, and Metadata Aggregation to produce a news metadata matrix, in which each column corresponds to a distinct word and each row to a news item. The news items in the matrix were then grouped by the Automatic Incremental Clustering method according to the word similarities that arose between them, measured with the Euclidean distance; the grouping ran automatically and in real time. For each cluster (topic), the news item closest to the cluster's midpoint (centroid) was selected as its Representative News. This study used 101 news items as experimental data and produced 87 news clusters with a precision ratio of 85.14%.
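The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the method decides automatically when to open a new cluster, so a fixed distance `threshold` stands in for that step as an assumption. The function names and the sample keyword lists are likewise hypothetical.

```python
import math
from collections import Counter


def build_matrix(docs):
    """Build a news metadata matrix: one row per news item,
    one column per distinct keyword, cells hold keyword counts."""
    vocab = sorted({word for doc in docs for word in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[word] for word in vocab])
    return vocab, rows


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def incremental_cluster(rows, threshold):
    """Process rows in arrival order. Each row joins the cluster with the
    nearest centroid if that distance is within `threshold` (an assumed
    stand-in for the paper's automatic criterion); otherwise it starts
    a new cluster. Returns a list of clusters as lists of row indices."""
    clusters = []
    for i, row in enumerate(rows):
        best, best_dist = None, None
        for c, members in enumerate(clusters):
            d = euclidean(row, centroid([rows[m] for m in members]))
            if best_dist is None or d < best_dist:
                best, best_dist = c, d
        if best is not None and best_dist <= threshold:
            clusters[best].append(i)
        else:
            clusters.append([i])
    return clusters


def representative(rows, members):
    """Index of the cluster member closest to the cluster centroid."""
    c = centroid([rows[m] for m in members])
    return min(members, key=lambda m: euclidean(rows[m], c))


# Hypothetical keyword lists, as if produced by keyword extraction.
docs = [
    ["flood", "jakarta", "rain"],
    ["flood", "jakarta", "rain", "river"],
    ["election", "vote", "president"],
    ["flood", "jakarta", "evacuation"],
]
vocab, rows = build_matrix(docs)
clusters = incremental_cluster(rows, threshold=1.6)
for members in clusters:
    print(members, "representative:", representative(rows, members))
```

With these sample inputs the three flood items fall into one cluster and the election item forms its own. Recomputing the centroid on every comparison keeps the sketch short; a real-time system would more likely store a running sum per cluster and update it as items arrive.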
Copyright (c) 2019 EMITTER International Journal of Engineering Technology
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.