Werge: Web Reversed Graph Extractor

A set of tools to work on computational propaganda detection

Running script

python web-network-mapper.py -l LINK -n NUMBER_OF_ITERATIONS -p (include if it is propaganda)

Example of propaganda

python web-network-mapper.py -l https://euvsdisinfo.eu/report/despite-promises-to-the-contrary-nato-has-been-expanding-further-eastwards/ -p -n 30

Build a graph

python graph.py -l LINK

Query the DB for the links to a particular URL

DB/query_db.py -l LINK

How we get the info

We use the serapi api.
We download the results from one seed url
For each result
- We filter that is not in our blacklist
- We download its content with curl
- We store its datetime of search from the engine
- We store its datetime of our search
- We ask the links to this page and we loop.

Blacklist

We apply some blacklisting of webpages based on some criteria in order to know which pages should be considered good links.

Packages you need

See requirements.txt

Useful Info

https://towardsdatascience.com/current-google-search-packages-using-python-3-7-a-simple-tutorial-3606e459e0d4

Database structure

Table LINKS

Table links has the following fields:

link_id:
parent_id:
child_id:
date:
source: Twitter, Facebook, Webpage (this is any generic webpage), VK, Reddit, etc.
linktype:

Extraction of Features

The features that we are extracting from each graph are:

Time histograms Generate the features based on time
- For each level:
  - Calculate a histogram of how many links there are by:
    - In the next 48hs after the publication date of the main URL (2 days).
      - Compute the histogram of urls published by minute
    - After more than 48hs of the publication, for the next 120hs (5 days), that is from > 2nd day to <= 7th day.
      - Compute the histogram of urls published by hour
    - After more than 168hs of the publication, for the next 23 days, that is from > 7th day to <= 30th
      - Compute the histogram of urls published by day
      main URL 48hs hs and up to 120hs | By minute | By hour | By day \/ \/ \/
    Days: |----|-----|----|-----|-----|-----|-----|--*--|-----|-----|-----|-----|-----|-----|-----|-----|-----| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17...```
The number of urls published before the source
The total number of urls in each level

Name		Name	Last commit message	Last commit date
Latest commit History 420 Commits
DB		DB
Experimental Graph Concepts		Experimental Graph Concepts
features		features
graphs		graphs
models		models
tests		tests
.gitignore		.gitignore
CITATION.cff		CITATION.cff
Knowledge Graph Model.ipynb		Knowledge Graph Model.ipynb
README.md		README.md
create_features.py		create_features.py
dataset_downloader.py		dataset_downloader.py
distance.py		distance.py
graph.py		graph.py
model.pickle		model.pickle
normal_data.txt		normal_data.txt
propaganda_data.txt		propaganda_data.txt
query_db.py		query_db.py
requirements.txt		requirements.txt
serpapi_utils.py		serpapi_utils.py
twitter_api.py		twitter_api.py
url_classifier.py		url_classifier.py
urls.py		urls.py
utils.py		utils.py
vk_api.py		vk_api.py
web-network-mapper.py		web-network-mapper.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Werge: Web Reversed Graph Extractor

Running script

How we get the info

Blacklist

Packages you need

Useful Info

Database structure

Table LINKS

Extraction of Features

About

Uh oh!

Releases

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Werge: Web Reversed Graph Extractor

Running script

How we get the info

Blacklist

Packages you need

Useful Info

Database structure

Table LINKS

Extraction of Features

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Contributors

Uh oh!

Languages