Using Vectorizer, this program converts each text document in a folder into a vector and compares it with the vectors of the other documents in that folder. Cosine similarity then gives a % similarity between each pair of documents.

tkoppop/Plagiarism-Checker

Plagiarism-Checker

Using Python, I have created a plagiarism checker that tests the similarity between all .txt files in a folder. It uses Vectorizer to assign each file a numerical vector based on the words it contains, then compares those vectors with cosine similarity (the cosine of the angle between two vectors). In other words, each file is turned into a vector, and the cosine similarity measures the angle between two vectors and reports it as the % similarity: the smaller the angle, the more similar the files. The program tries every pair of files and prints the results at the end.
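
The idea above can be sketched in a few lines. This is only an illustration, not the code in app.py: the README does not say which Vectorizer is used, so the sketch assumes plain word counts and the standard library only, and the function names (`vectorize`, `cosine_similarity`, `compare_all`, `load_txt_files`) are hypothetical.

```python
import itertools
import math
import re
from collections import Counter
from pathlib import Path


def vectorize(text):
    """Turn text into a bag-of-words vector: word -> occurrence count.

    Assumption: plain counts stand in for whatever Vectorizer app.py uses.
    """
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no direction to compare
    return dot / (norm_a * norm_b)


def compare_all(documents):
    """Score every unordered pair of documents; `documents` maps name -> text."""
    vectors = {name: vectorize(text) for name, text in documents.items()}
    return [
        (first, second, cosine_similarity(vectors[first], vectors[second]))
        for first, second in itertools.combinations(sorted(vectors), 2)
    ]


def load_txt_files(folder="."):
    """Read every .txt file in `folder` (hypothetical helper, not from app.py)."""
    return {p.name: p.read_text(encoding="utf-8") for p in Path(folder).glob("*.txt")}
```

Calling `compare_all(load_txt_files("."))` returns one `(file, file, score)` tuple per pair, and formatting each score with `f"{score:.1%}"` would print it as a percentage. Because word counts are never negative, the score stays between 0 (no shared words) and 1 (identical word proportions).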

Jan 12, 2021: So far, app.py can only access files within the same folder. I am hoping to integrate it with a Google API so that it can search through that API, pull about 10 links with similar content, and output those similarities. It could also tell the user when links have high similarity and flag certain keywords.
