We analyze the following benchmark datasets in this work: FB15k, FB15k-237, WN18, WN18RR, YAGO3-10, Wikidata5M and DBpedia50k.
This repository contains all the work needed to characterize the most commonly used benchmarks for link prediction:
- Analysis of Test Leakage and Sample Selection Bias
- Topological analysis (Visualization images, network metrics)
- Experiments for analyzing bias in prediction results in FB15k, FB15k-237, WN18 and WN18RR
- Mappings for generating N-Triple files and uploading benchmark data to a knowledge graph
- SPARQL queries for analyzing bias patterns and other irregularities
All datasets except Wikidata5M can be found in the `data` folder. Wikidata5M needs to be imported manually due to its large size; it can be downloaded directly from here. The CSV files can then be generated by placing the text files under `data/Wikidata5M/original` and running the script `data/write_csv.py` (a sketch of this conversion is shown below).
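For reference, this is a minimal sketch of such a conversion, assuming the raw Wikidata5M splits are tab-separated (head, relation, tail) files; the exact file names here are assumptions, not necessarily what `data/write_csv.py` expects:

```python
import csv
from pathlib import Path

SRC = Path("data/Wikidata5M/original")
DST = Path("data/Wikidata5M")

for split in ("train", "valid", "test"):
    # File names are assumptions; adjust them to the downloaded archive.
    src_file = SRC / f"wikidata5m_transductive_{split}.txt"
    with open(src_file) as fin, open(DST / f"{split}.csv", "w", newline="") as fout:
        writer = csv.writer(fout)
        for line in fin:
            # Each raw line is assumed to hold one tab-separated triple.
            writer.writerow(line.rstrip("\n").split("\t"))
```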
Statistics regarding entities, relations and triples for each split can be found under `analysis/data/split_statistics`.
- Relation distributions: `analysis/output/relation_distribution`
- Network visualizations: `analysis/output/network_analysis/visualizations`
- Other network characteristics (degree, components, PageRank, communities): `analysis/output/network_analysis/{partition}` (a sketch for recomputing these metrics is shown below)
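As mentioned above, these metrics can be recomputed with standard graph tooling. A minimal sketch using networkx, where the CSV path and column layout are assumptions:

```python
import networkx as nx
import pandas as pd

# Path and column names are assumptions; adapt them to the split at hand.
triples = pd.read_csv("data/FB15k-237/train.csv", names=["head", "relation", "tail"])

# Treat the split as an undirected entity graph.
G = nx.Graph()
G.add_edges_from(zip(triples["head"], triples["tail"]))

# The same family of metrics reported under analysis/output/network_analysis.
degrees = dict(G.degree())
components = list(nx.connected_components(G))
pagerank = nx.pagerank(G)
communities = nx.community.louvain_communities(G, seed=42)

print(f"nodes={G.number_of_nodes()} edges={G.number_of_edges()} "
      f"components={len(components)} communities={len(communities)}")
```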
In this work, seven bias patterns are defined concerning test leakage and sample selection bias; a detection sketch for the test leakage patterns follows the two lists below.
Test leakage patterns:
- Near-duplicate relations
- Near-inverse relations
- Near-symmetric relations
For sample selection bias, we reused patterns defined in the work of Rossi et al.:
- Overrepresented tail answers (referred to as Type 1 Bias by Rossi et al.)
- Overrepresented head answers
- Default tail answers (referred to as Type 2 Bias by Rossi et al.)
- Default head answers
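As referenced above, the test leakage patterns can be detected by comparing the entity pair sets of relations. The following is a minimal sketch over (head, relation, tail) triples; the overlap threshold is an illustrative assumption, not the value used in this repository:

```python
from collections import defaultdict
from itertools import product

def relation_pairs(triples):
    """Map each relation to the set of (head, tail) pairs it connects."""
    pairs = defaultdict(set)
    for head, relation, tail in triples:
        pairs[relation].add((head, tail))
    return pairs

def near_relations(pairs, threshold=0.8):
    """Detect near-duplicate, near-inverse and near-symmetric relations.

    The overlap threshold is an illustrative assumption.
    """
    duplicates, inverses, symmetric = set(), set(), set()
    for r1, r2 in product(pairs, repeat=2):
        p1, p2 = pairs[r1], pairs[r2]
        reversed_p1 = {(t, h) for h, t in p1}
        if r1 != r2 and len(p1 & p2) / len(p1) >= threshold:
            duplicates.add((r1, r2))   # r2 connects (almost) the same pairs as r1
        if r1 != r2 and len(reversed_p1 & p2) / len(p1) >= threshold:
            inverses.add((r1, r2))     # r2 connects the same pairs in reverse
        if r1 == r2 and len(reversed_p1 & p1) / len(p1) >= threshold:
            symmetric.add(r1)          # r frequently holds in both directions
    return duplicates, inverses, symmetric
```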
Using these patterns, we queried the number of bias-affected triples for each split in every dataset with SPARQL. The queries can be found in the folder `sparql/affectedTriples`; a minimal sketch of running one against an endpoint is shown below.
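This is a hedged sketch of issuing such a count query with SPARQLWrapper; the endpoint URL and the query body are assumptions, the actual queries are the files under `sparql/affectedTriples`:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Endpoint URL is an assumption; point it at the store holding the benchmark data.
sparql = SPARQLWrapper("http://localhost:3030/benchmarks/sparql")
sparql.setReturnFormat(JSON)

# Placeholder count query; in practice, load one of the files
# from sparql/affectedTriples instead.
sparql.setQuery("""
    SELECT (COUNT(*) AS ?affected) WHERE {
        ?head ?relation ?tail .
    }
""")

result = sparql.query().convert()
print(result["results"]["bindings"][0]["affected"]["value"])
```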
The following plots provide a good overview of bias in each benchmark:
The following observations can be made:
- FB15k and WN18 suffer from test leakage due to near-inverse relations
- Test leakage can be found in YAGO3-10 through near-duplicate relations (over 63.3% of test triples have a near-duplicate in the training set)
- WN18RR contains a large share of near-symmetric relations (almost 40%), higher than in WN18
- FB15k-237 features many test triples that have a default tail answer
To understand how these bias types affect prediction results, we then tried to explain each correct prediction (based on the Hits@k metric) of a trained TransE model by one or more of our bias types. If no explanation could be found, the correct prediction is assigned to the bucket *unknown*. The Jupyter notebook for bucketing predictions can be found under `experiments/prediction_analysis.ipynb`; its logic is outlined below.
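In outline, the bucketing assigns each correct prediction to the first bias pattern that explains it. A hedged sketch, not the notebook's actual code; every helper name here is hypothetical:

```python
def bucket_prediction(head, relation, tail, patterns):
    """Assign one correct prediction to a bias bucket.

    `patterns` is assumed to expose boolean checks for the bias
    patterns, precomputed on the training split; all helper names
    are hypothetical.
    """
    if patterns.near_duplicate(head, relation, tail):
        return "near-duplicate"
    if patterns.near_inverse(head, relation, tail):
        return "near-inverse"
    if patterns.near_symmetric(head, relation, tail):
        return "near-symmetric"
    if patterns.overrepresented_answer(head, relation, tail):
        return "overrepresented answer"
    if patterns.default_answer(head, relation, tail):
        return "default answer"
    return "unknown"  # no bias pattern explains the correct prediction
```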
The following observations can be made:
- Correct predictions in FB15k and WN18 can almost completely be explained by our defined bias patterns
- In WN18RR, the majority of correct predictions can be explained with the occurrence of near-symmetric relations
- We further notice that this pattern occurs at a ~100% higher rate than in the input data. This behavior reinforces the idea that the model is biased towards learning these patterns.
First, make sure AmpliGraph version 2.0.0 is correctly installed (e.g., via `pip install ampligraph==2.0.0`).
To reproduce all of our experiments, simply start the reproducibility script via `bash reproducibility/reproduce_experiments.sh`.
The script automatically runs the whole pipeline, from calling SPARQL endpoints for the input-level dataset analysis to learning the embeddings and finally generating the plots shown above. A sketch of the embedding-training step follows.
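For orientation, this is a minimal sketch of training and evaluating a TransE model with AmpliGraph 2.0; the hyperparameters below are illustrative placeholders, not the reused ones referenced in the next sentence:

```python
from ampligraph.datasets import load_fb15k_237
from ampligraph.evaluation import hits_at_n_score, mrr_score
from ampligraph.latent_features import ScoringBasedEmbeddingModel

# Illustrative hyperparameters only; the experiments reuse the ones linked below.
X = load_fb15k_237()
model = ScoringBasedEmbeddingModel(k=150, eta=10, scoring_type="TransE")
model.compile(optimizer="adam", loss="multiclass_nll")
model.fit(X["train"], batch_size=10000, epochs=50)

# Filtered ranks over head and tail corruptions, from which Hits@k is derived.
ranks = model.evaluate(
    X["test"],
    use_filter={"train": X["train"], "valid": X["valid"], "test": X["test"]},
    corrupt_side="s,o",
)
print("MRR:", mrr_score(ranks), "Hits@10:", hits_at_n_score(ranks, n=10))
```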
The hyperparameters from here were reused for our experiments.
The Python version used is 3.8.
- Install the RDFizer:

```
python3 -m pip install rdfizer
```

- Generate the N-Triple files:

```
cd mappings
bash generate_triple_files.sh
```