Code to reproduce the results in the paper "Finding Streams in Knowledge Graphs to Support Fact Checking", published in the proceedings of ICDM 2017. A full version of this paper can be found at: https://arxiv.org/abs/1708.07239
git clone https://github.com/shiralkarprashant/knowledgestream
cd knowledgestream
mkdir -p output
Update (April 2025): The specific version of the DBpedia knowledge graph (DBpedia 2016-10), the test datasets, and the relational similarity matrix used in the paper, which were previously accessible at this location, are unfortunately no longer available. However, the following instructions may still be helpful if you are interested in running the code.
You may use this example script to re-create a knowledge graph from one of the latest versions of the raw DBpedia datasets. An old video demonstrating how to use the code is available here. Once assembled, place the data inside the knowledgestream directory. Processing the raw dataset should lead to the creation of three files that collectively represent the knowledge graph: nodes.txt, relations.txt, and abbrev_cup_dbpedia_filtered.nt. You may also try the following script to generate the required graph files (.npy): KG generation script
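If you assemble the graph yourself, the first step is mapping entity and relation URIs to integer ids. Below is a minimal sketch of that step; the .nt file name matches the one above, but the exact on-disk formats expected by kstream are assumptions, so treat this as a starting point rather than the official pipeline.

```python
# Minimal sketch: derive nodes.txt and relations.txt from an N-Triples dump.
# The .nt file name matches the one mentioned above; the exact on-disk
# formats expected by kstream are assumptions.
import re

# Match triples whose subject, predicate, and object are all IRIs.
triple_re = re.compile(r'<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.')

nodes, relations = {}, {}

def get_id(table, key):
    # Assign integer ids in order of first appearance.
    if key not in table:
        table[key] = len(table)
    return table[key]

with open('abbrev_cup_dbpedia_filtered.nt') as f:
    for line in f:
        m = triple_re.match(line)
        if m is None:  # skip literals, blank nodes, comments
            continue
        s, p, o = m.groups()
        get_id(nodes, s)
        get_id(relations, p)
        get_id(nodes, o)

for fname, table in (('nodes.txt', nodes), ('relations.txt', relations)):
    with open(fname, 'w') as out:
        for key, idx in sorted(table.items(), key=lambda kv: kv[1]):
            out.write('%d\t%s\n' % (idx, key))
```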
The following Wikipedia lists contain the true triples (relationships) between entities used in the test datasets in the paper. The false triples (negative examples) were created by "perturbing" the set of objects in each list under a "local closed-world assumption" (LCWA), as sketched after the list below.
- NBA-Team: List of NBA players who have spent their entire career with one franchise
- Oscars: Academy Award for Best Picture
- FLOTUS: List of First Ladies of the United States
- World Capitals: List of national capitals in alphabetical order
- Birthplace-Deathplace: This dataset was created from DBpedia alone. Persons with different birth and death places were identified, and 250 individuals were sampled from five buckets partitioning the birthplace-deathplace distribution. Each person's death place was forged as a false example (or triple) of their birth place, while their actual birth place was taken as a true triple, thereby creating 250 true and 250 false triples.
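To make the LCWA perturbation concrete, the sketch below replaces the true object of each triple with another object observed for the same predicate, while never re-creating a true triple. The data and procedure are illustrative, not the exact script used for the paper.

```python
# Sketch of LCWA-style perturbation: for each true (s, p, o), draw a false
# object from the other objects observed with predicate p, never re-creating
# a true triple. Toy data; not the exact script used for the paper.
import random

true_triples = [
    ('Kobe_Bryant', 'team', 'Los_Angeles_Lakers'),
    ('Tim_Duncan', 'team', 'San_Antonio_Spurs'),
    ('Dirk_Nowitzki', 'team', 'Dallas_Mavericks'),
]

objects = {o for _, _, o in true_triples}
true_set = set(true_triples)

false_triples = []
for s, p, o in true_triples:
    candidates = [c for c in objects if (s, p, c) not in true_set]
    false_triples.append((s, p, random.choice(candidates)))

print(false_triples)
```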
The relational similarity matrix, derived from TF-IDF representations of the relations in the graph, can be created by following Section II.A of the paper.
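As a rough illustration of that construction, the sketch below treats each relation as a "document" whose "terms" are the subject-object pairs it connects, and takes the cosine similarity of the resulting TF-IDF vectors. This is a simplified reading of Section II.A, not the exact procedure from the paper.

```python
# Sketch: relational similarity as cosine similarity of TF-IDF vectors,
# treating each relation as a "document" whose "terms" are the subject-object
# pairs it connects. A simplified reading of Section II.A, with toy triples.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Triples as (subject_id, relation_id, object_id).
triples = [(0, 0, 1), (0, 1, 1), (2, 0, 3), (2, 1, 4), (0, 2, 5)]

n_relations = 1 + max(p for _, p, _ in triples)
docs = ['' for _ in range(n_relations)]
for s, p, o in triples:
    docs[p] += ' %d_%d' % (s, o)  # one "word" per subject-object pair

tfidf = TfidfVectorizer(token_pattern=r'\S+').fit_transform(docs)
sim = cosine_similarity(tfidf)  # (n_relations x n_relations) matrix
print(np.round(sim, 3))
```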
- OS: Linux Ubuntu / Mac OSX 10.12 (Sierra)
- Python: Python 2.7 (we developed and tested using the Anaconda distribution)
- Memory requirements: >4 GB
python setup.py build_ext -if
python setup.py install
Note: For the second command, prefix it with sudo if you need installation rights on the machine.
Note: For the output directory parameter ("-o") of any algorithm in this repository, make sure the directory exists before launching the process; otherwise an AssertionError is thrown due to the non-existent directory.
kstream -m stream -d datasets/sample.csv -o output/
You should see output such as
[19:00:10] Launching stream..
[19:00:10] Dataset: sample.csv
[19:00:10] Output dir: output
[19:00:10] Read data: (5, 7) sample.csv
[19:00:10] Note: Found non-NA records: (5, 7)
Reconstructing graph from data/kg/_undir
=> Loaded: undir_data.npy
=> Loaded: undir_indptr.npy
=> Loaded: undir_indices.npy
=> Loaded: undir_indeg_vec.npy
=> Graph loaded: 11.63 secs.
[19:00:22] Computing KNOWLEDGE-STREAM for 5 triples..
1. Working on (2985653, 599, 3218267) .. [19:00:38] mincostflow: 0.08863, #paths: 5, time: 12.55s.
2. Working on (2734002, 599, 5305646) .. [19:00:52] mincostflow: 0.15648, #paths: 5, time: 12.91s.
3. Working on (5140024, 599, 4567127) .. [19:01:01] mincostflow: 0.16628, #paths: 5, time: 9.20s.
4. Working on (1522148, 599, 1357357) .. [19:01:11] mincostflow: 0.09414, #paths: 5, time: 9.28s.
5. Working on (4319468, 599, 2450828) .. [19:01:19] mincostflow: 0.16025, #paths: 5, time: 8.05s.
[19:01:19] * Saved results: output/out_kstream_sample_2017-08-23.csv
[19:01:19] Mincostflow computation complete. Time taken: 57.64 secs.
and a CSV file is created in the specified output directory, containing a score and a softmaxscore (normalized score) for each triple.
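A sketch of loading the results and recomputing a softmax over the raw scores follows; the column names score and softmaxscore come from the sentence above, while the rest of the CSV layout, and that the normalization is exactly this softmax, are assumptions.

```python
# Sketch: load the results CSV and recompute a softmax over the raw scores.
# Column names 'score' and 'softmaxscore' follow the description above; the
# rest of the CSV layout, and the exact normalization, are assumptions.
import numpy as np
import pandas as pd

df = pd.read_csv('output/out_kstream_sample_2017-08-23.csv')

s = df['score'].values
e = np.exp(s - s.max())  # subtract the max for numerical stability
softmax = e / e.sum()

print(df[['score', 'softmaxscore']].head())
print(softmax)
```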
kstream -m relklinker -d datasets/sample.csv -o output/
You should see output such as
[18:56:43] Launching relklinker..
[18:56:43] Dataset: sample.csv
[18:56:43] Output dir: output
[18:56:43] Read data: (5, 7) sample.csv
[18:56:43] Note: Found non-NA records: (5, 7)
Reconstructing graph from data/kg/_undir
=> Loaded: undir_data.npy
=> Loaded: undir_indptr.npy
=> Loaded: undir_indices.npy
=> Loaded: undir_indeg_vec.npy
=> Graph loaded: 1.24 secs.
[18:56:44] Computing REL-KLINKER for 5 triples..
[18:56:50] time: 3.35s
1. Working on (2985653, 599, 3218267)..[18:56:55] time: 2.90s
2. Working on (2734002, 599, 5305646)..[18:56:57] time: 2.18s
3. Working on (5140024, 599, 4567127)..[18:57:00] time: 2.16s
4. Working on (1522148, 599, 1357357)..[18:57:02] time: 2.15s
5. Working on (4319468, 599, 2450828)..[18:57:03]
[18:57:03] * Saved results: output/out_relklinker_sample_2017-08-23.csv
[18:57:03] Relklinker computation complete. Time taken: 18.68 secs.
and a CSV file is created in the specified output directory, containing a score and a softmaxscore (normalized score) for each triple.
kstream -m klinker -d datasets/sample.csv -o output/
You should see output such as
[16:02:18] Launching klinker..
[16:02:18] Dataset: sample.csv
[16:02:18] Output dir: output
[16:02:18] Read data: (3, 7) sample.csv
[16:02:18] Note: Found non-NA records: (3, 7)
Reconstructing graph from data/kg/_undir
=> Loaded: undir_data.npy
=> Loaded: undir_indptr.npy
=> Loaded: undir_indices.npy
=> Loaded: undir_indeg_vec.npy
=> Graph loaded: 0.54 secs.
[16:02:19] Computing KL for 3 triples..
1. Working on (392035, 599, 2115741).. time: 3.09s
2. Working on (392035, 599, 4119746).. time: 3.82s
3. Working on (392035, 599, 917821).. time: 3.13s
...
[16:02:34] * Saved results: output/out_klinker_sample_2017-08-25.csv
[16:02:34] KL computation complete. Time taken: 15.29 secs.
and a CSV file is created in the specified output directory, containing a score and a softmaxscore (normalized score) for each triple.
Alternatively, code for this method can be found at Knowledge Linker.
kstream -m 'predpath' -d ./datasets/sample.csv -o ./output/
You should see output such as
[13:20:16] Launching predpath..
[13:20:16] Dataset: sample.csv
[13:20:16] Output dir: output
[13:20:16] Read data: (104, 7) sample.csv
[13:20:16] Note: Found non-NA records: (104, 7)
Reconstructing graph from data/kg/_undir
=> Loaded: undir_data.npy
=> Loaded: undir_indptr.npy
=> Loaded: undir_indices.npy
=> Loaded: undir_indeg_vec.npy
=> Graph loaded: 3.82 secs.
=> Removing predicate 599.0 from KG.
=> Path extraction..(this can take a while)
P: +:40, -:33, unique tot:73
Time taken: 160.79s
=> Path selection..
D: +:40, -:33, tot:73
Time taken: 0.07s
=> Model building..
#Features: 73, best-AUROC: 0.92943
Time taken: 0.65s
Time taken: 163.72s
Saved: output/out_predpath_sample_2017-08-25.pkl
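Unlike the stream-based methods, PredPath saves a .pkl file rather than a CSV. Assuming it is a standard Python pickle (its exact contents are not documented here), it can be inspected as follows:

```python
# Sketch: inspect the saved PredPath model. That the file is a standard
# Python pickle, and what it contains, are assumptions.
import pickle

with open('output/out_predpath_sample_2017-08-25.pkl', 'rb') as f:
    model = pickle.load(f)

print(type(model))
print(model)
```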
kstream -m 'pra' -d ./datasets/sample.csv -o ./output/
You should see output such as
[13:31:37] Launching pra..
[13:31:37] Dataset: sample.csv
[13:31:37] Output dir: output
[13:31:37] Read data: (104, 7) sample.csv
[13:31:37] Note: Found non-NA records: (104, 7)
Reconstructing graph from data/kg/_undir
=> Loaded: undir_data.npy
=> Loaded: undir_indptr.npy
=> Loaded: undir_indices.npy
=> Loaded: undir_indeg_vec.npy
=> Graph loaded: 1.87 secs.
=> Removing predicate 599 from KG.
=> Path extraction..
...
#Features: 73
Time taken: 75.98s
=> Path selection..
Selected 73 features
=> Constructing feature matrix..
Time taken at C level: 2.656s
Time taken: 2.87590s
=> Model building..
#Features: 73, best-AUROC: 0.7786
Time taken: 0.59s
Time taken: 81.39s
Saved: output/out_pra_sample_2017-08-25.pkl
Code for this method and a couple of other methods based on the idea of knowledge graph embeddings can be found at KB2E.
All link prediction algorithms can be invoked in the same manner. The method names to pass to -m are: katz, pathent, simrank, adamic_adar, jaccard, and degree_product.
Example: kstream -m 'katz' -d ./datasets/sample.csv -o ./output/
You should see output such as
[15:25:06] Launching katz..
[15:25:06] Dataset: sample.csv
[15:25:06] Output dir: output
[15:25:06] Read data: (9, 7) sample.csv
[15:25:06] Note: Found non-NA records: (9, 7)
Reconstructing graph from data/kg/_undir
=> Loaded: undir_data.npy
=> Loaded: undir_indptr.npy
=> Loaded: undir_indices.npy
=> Loaded: undir_indeg_vec.npy
=> Graph loaded: 8.79 secs.
[15:25:16] Computing KZ for 9 triples..
1. Working on (392035, 599, 2115741).. score: 0.12750, time: 3.74s
2. Working on (392035, 599, 4119746).. score: 0.02813, time: 3.99s
3. Working on (392035, 599, 917821).. score: 0.03825, time: 3.59s
...
* Saved results: output/out_katz_sample_2017-08-25.csv
and a CSV file is created in the specified output directory, containing a score and a softmaxscore (normalized score) for each triple.
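To score one dataset with every link-prediction baseline in a single pass, a small driver like the sketch below may help; it assumes kstream is on your PATH and reuses the example paths above.

```python
# Sketch: run every link-prediction baseline listed above on one dataset.
# Assumes kstream is on your PATH and the example paths used above.
import subprocess

methods = ['katz', 'pathent', 'simrank', 'adamic_adar', 'jaccard',
           'degree_product']

for m in methods:
    subprocess.check_call(
        ['kstream', '-m', m, '-d', 'datasets/sample.csv', '-o', 'output/'])
```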