The naive fact checker based on Wikipedia. Using NLTK
and Wekipedia
Python packages.
Faktenprüfung
Name | Matriculation Number |
---|---|
Maqbool Ur Rahim Khan | 6843364 |
Ali Azmi | 6844310 |
Muhammad Hamza | 6845542 |
- Python3
- Pip3
python3 setup.py
Our setup uses Python3 which is aliased as python3
. If Python 3 command is not available as python3
then setup.py
won't work.
In order to execute the training data set, please run the following command.
Python3 main.py train
In order to execute test
, please run the following command.
python3 main.py test
Please make sure that the spellings of train and test are correct.
First of all, we perform Name Entity (NE)
on each sentence using NLTK
, but NE
done by NLTK
is not perfect. Therefore, we wrote our own rules for NE
in our project.
Consider the following sample sentence taken from the train.tsv
file:
Valve Corporation's foundation place is Kirkland, Washington.
After performing NE
, we will get Valve Corporation's
and Kirkland, Washington
. As these sentences are quite straight forward, so we will assume the first token as Subject and the second as Object. Now, the remaining tokens are place, is
and here you can easily see that after removing the stop word is
, we are left with place
, and so we can assume it to be the Predicate.
Now we take the Subject and search its corresponding Wikipedia page. Once we have the page, we seach our Object in that respective page. If the search result returns true
, we consider our fact as TRUE and if the search result returns false
, we consider our fact as FALSE.
Since the approach is Naive, one can simply question; why don't we use predicate to check our facts?
And the reason is each fact or sentence can be written in multiple ways.
Example:
Quaid-e-Azam was born in Pakistan (true fact)
this sentence can also be written as:
The birth place of Quaid-e-Azam is Pakistan (true fact)
The Predicate born
and birth place
are quite different. It's difficult to know what predicate is available on Wikipedia, there is a possibility of having a synoynym of the same word.
Secondly, the sentences on Wikipedia are not that simple as compared to our data and the distance between the Subject, Object and Predicate can make a sentence evaluation quite unreliable and without using predicate, surprisingly our project gives 65% of precision.
In this case we are simply returning false!!!
Following are the sample run of the trained data set.
- Quaid-e-Azam was born in India
- Ansoo Lake is situated in Sindh province
- Quetta is the capital city of Pakistan
- Microsoft was not founded by Steve Jobs
- University of Paderborn is not located in Munich
- Microsoft does not own the Skype company.