Implementasi Algoritma TF-IDF Pada Pengukuran Kesamaan Dokumen (Implementation of the TF-IDF Algorithm for Document Similarity Measurement)

Ryansyah, Adi (2014) Implementasi Algoritma TF-IDF Pada Pengukuran Kesamaan Dokumen. Other thesis, Sekolah Tinggi Teknik Musi.

Files

IF-2014-310007-cover.pdf (54kB)
IF-2014-310007-abstract.pdf (34kB)
IF-2014-310007-tableofcontent.pdf (178kB)
IF-2014-310007-chapter1.pdf (29kB)
IF-2014-310007-chapter2.pdf (3MB, restricted to repository staff only)
IF-2014-310007-chapter3.pdf (4MB, restricted to repository staff only)
IF-2014-310007-chapter4.pdf (23MB, restricted to repository staff only)
IF-2014-310007-conclusion.pdf (5kB)
IF-2014-310007-reference.pdf (33kB)
IF-2014-310007-attachment.pdf (408kB, restricted to repository staff only)
IF-2014-310007-complete.pdf (3MB, restricted to repository staff only)
IF-2014-310007-summary_id.pdf (794kB, restricted to repository staff only)

Abstract

Measuring document similarity is a time-consuming task. The large number of documents, and the large number of pages in each document, make measuring similarity manually complicated and difficult. In this research, a system that automatically measures the similarity between documents is built by implementing TF-IDF. Measurement is carried out by first creating a vector representation of each document being compared; this vector contains the weight of each term in the document. The similarity values are then calculated using cosine similarity. The research followed the waterfall model. The system was designed using the Unified Modeling Language (UML) and implemented in the Java programming language with NetBeans as the IDE. The documents used in this research are thesis reports from Sekolah Tinggi Teknik Musi: ten from the computer science major, eight from the information systems major, and ten from the industrial engineering major. The finished system can compare documents in PDF or Word format, using either all the chapters of a report or only a few selected chapters that are considered significant. Based on the experiments, it can be concluded that TF-IDF requires at least three documents in the document collection being processed. A correlation test shows that, for documents in PDF format, there is a significant correlation between the number of characters in a document and its processing time.
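The weighting and comparison steps described in the abstract can be sketched in Java, the language the system was implemented in. This is a minimal illustrative sketch, not the thesis's code. The class name TfIdfSimilarity, the tokenizer (lower-casing and splitting on non-alphanumeric characters, with no stemming or stop-word removal), raw term counts as tf, and the variant idf(t) = log(N / df(t)) are all assumptions. Each document becomes a term-to-weight map with weight tf × idf, and two documents are compared with cos(a, b) = (a · b) / (|a| |b|).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Illustrative TF-IDF + cosine similarity sketch (hypothetical class, not the thesis code).
 * Assumptions: lower-cased tokens split on non-alphanumeric characters,
 * raw term counts as tf, and idf(t) = log(N / df(t)).
 */
public class TfIdfSimilarity {

    /** Splits text into lower-case tokens on any character that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Builds one sparse TF-IDF vector (term -> weight) per document. */
    static List<Map<String, Double>> tfIdfVectors(List<String> docs) {
        List<Map<String, Integer>> tfs = new ArrayList<>();
        Map<String, Integer> df = new HashMap<>(); // number of documents containing each term
        for (String doc : docs) {
            Map<String, Integer> tf = new HashMap<>();
            for (String tok : tokenize(doc)) tf.merge(tok, 1, Integer::sum);
            for (String term : tf.keySet()) df.merge(term, 1, Integer::sum);
            tfs.add(tf);
        }
        int n = docs.size();
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (Map<String, Integer> tf : tfs) {
            Map<String, Double> w = new HashMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double idf = Math.log((double) n / df.get(e.getKey()));
                w.put(e.getKey(), e.getValue() * idf); // weight = tf * idf
            }
            vectors.add(w);
        }
        return vectors;
    }

    /** Cosine similarity of two sparse vectors; returns 0 if either vector has zero length. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double v : b.values()) nb += v * v;
        if (na == 0 || nb == 0) return 0.0;
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
            "tf idf weights terms in a document collection",
            "cosine similarity compares tf idf vectors",
            "waterfall model for system design");
        List<Map<String, Double>> v = tfIdfVectors(docs);
        for (int i = 0; i < docs.size(); i++)
            for (int j = i + 1; j < docs.size(); j++)
                System.out.printf("sim(d%d, d%d) = %.4f%n", i + 1, j + 1, cosine(v.get(i), v.get(j)));

        // With N = 2, every shared term has df = 2, so idf = log(1) = 0;
        // the only nonzero weights belong to unshared terms, so the dot product is 0.
        List<Map<String, Double>> two = tfIdfVectors(List.of("tf idf vector", "tf idf cosine"));
        System.out.printf("two-document sim = %.4f%n", cosine(two.get(0), two.get(1)));
    }
}
```

Under this assumed idf variant, the last part of main hints at one possible reason for the three-document observation: with N = 2, any term shared by both documents gets idf = log(2/2) = 0, while the remaining nonzero weights belong to terms found in only one document, so the similarity is always 0. Other idf variants (for example, smoothed ones) behave differently.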

Item Type: Thesis (Other)
Additional Information: The complete thesis (skripsi) can be read in the Reference Room of the UKMC Library, Bangau Campus.
Uncontrolled Keywords: documents similarity measure, tf-idf, vector, cosine similarity.
Subjects: T Technology > T Technology (General)
Divisions: Theses - S1 > Informatics Study Program
Depositing User: Perpustakaan Unika Musi Charitas
Date Deposited: 05 Dec 2017 01:28
Last Modified: 26 Mar 2018 05:13
URI: http://eprints.ukmc.ac.id/id/eprint/662
