TriggerScrape is a Python script that can be used, for example, to map out the cluster of Swedish sites containing highly anti-immigrant content.

It does this with the following procedure:

  1. Start at some entry point with many outgoing links
  2. Collect all outgoing links
  3. Randomly choose a subsample of them and visit them
  4. Count how many trigger words are found on those pages
  5. Visit them again with a probability set by the previous step
  6. If the number of trigger words relative to the number of visited links is high, use that site as the next starting point and restart at (1)
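
The steps above can be sketched roughly as follows. This is not the actual TriggerScrape code: it uses only the standard library rather than Grab, and the trigger words, the `fetch` callable, the revisit probability `hits / (hits + 1)`, and the `threshold` are all assumptions made for illustration.

```python
import random
import re
from urllib.parse import urljoin, urlparse

# Hypothetical trigger words; the script's real word list is not given here.
TRIGGER_WORDS = frozenset({"trigger1", "trigger2"})

LINK_RE = re.compile(r'href="([^"]+)"', re.IGNORECASE)
WORD_RE = re.compile(r"\w+")


def outgoing_links(base_url, html):
    """Return sorted absolute http(s) links that point away from base_url's host."""
    host = urlparse(base_url).netloc
    links = set()
    for href in LINK_RE.findall(html):
        url = urljoin(base_url, href)
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https") and parsed.netloc != host:
            links.add(url)
    return sorted(links)


def count_triggers(html, words=TRIGGER_WORDS):
    """Count the word tokens in html that are trigger words."""
    return sum(1 for w in WORD_RE.findall(html.lower()) if w in words)


def crawl(start, fetch, words=TRIGGER_WORDS, rounds=3, sample_size=5,
          threshold=1.0, rng=None):
    """Run the six-step loop; fetch(url) must return HTML text or None."""
    if rng is None:
        rng = random.Random()
    stats = {}  # domain -> [trigger hits, visits]
    current = start
    for _ in range(rounds):
        html = fetch(current)                      # step 1: load the entry point
        if html is None:
            break
        links = outgoing_links(current, html)      # step 2: collect outgoing links
        if not links:
            break
        sample = rng.sample(links, min(sample_size, len(links)))  # step 3
        best_url, best_ratio = None, 0.0
        for url in sample:
            page = fetch(url)
            if page is None:
                continue
            hits = count_triggers(page, words)     # step 4: count trigger words
            entry = stats.setdefault(urlparse(url).netloc, [0, 0])
            entry[0] += hits
            entry[1] += 1
            # Step 5 (assumed form): revisit with probability hits / (hits + 1).
            if rng.random() < hits / (hits + 1):
                again = fetch(url)
                if again is not None:
                    entry[0] += count_triggers(again, words)
                    entry[1] += 1
            ratio = entry[0] / entry[1]
            if ratio > best_ratio:
                best_url, best_ratio = url, ratio
        # Step 6: move to the best-scoring site if its ratio is high enough.
        if best_url is not None and best_ratio >= threshold:
            current = best_url
    table = [(domain, h / v, h, v) for domain, (h, v) in stats.items()]
    return sorted(table, key=lambda row: row[1], reverse=True)
```

Passing `fetch` in as a parameter keeps the loop independent of any particular HTTP library, so the same sketch can be tried against a small in-memory dictionary of pages (`crawl(start, pages.get)`) before pointing it at real sites.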

It looks something like this:

[Figure]

In the end it produces a list such as:

domain                                  ratio              triggered  n_links
http://avpixlat.info                    7.774193548387097  210        31
http://petterssonsblogg.se              4.835680751173709  817        213
http://gruvmor.wordpress.com            3.8                28         10
http://thoralfalfsson.webblogg.se       3.6484375          339        128
http://tobbesmedieblogg.blogspot.se     2.583333333333333  19         12
http://galnegunnarsblogg.wordpress.com  2.388888888888889  250        180
http://samnytt.se                       2.193548387096774  74         62
http://imittsverige.blogspot.se         1.98               49         50

Given enough time, it will map out most of the sites in that cluster.
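
One detail that helps when reading the list: in every sample row above, the ratio column equals 1 + triggered / n_links. Whether the script actually computes it this way is an assumption. The small check below only confirms that the arithmetic holds for the sample rows, and the function names are made up for the example.

```python
# Rows copied from the sample output: domain, ratio, triggered, n_links.
SAMPLE = """\
http://avpixlat.info 7.774193548387097 210 31
http://petterssonsblogg.se 4.835680751173709 817 213
http://gruvmor.wordpress.com 3.8 28 10
http://thoralfalfsson.webblogg.se 3.6484375 339 128
http://tobbesmedieblogg.blogspot.se 2.583333333333333 19 12
http://galnegunnarsblogg.wordpress.com 2.388888888888889 250 180
http://samnytt.se 2.193548387096774 74 62
http://imittsverige.blogspot.se 1.98 49 50
"""


def parse_rows(text):
    """Split whitespace-separated rows into (domain, ratio, triggered, n_links)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        domain, ratio, triggered, n_links = line.split()
        rows.append((domain, float(ratio), int(triggered), int(n_links)))
    return rows


def matches_one_plus_quotient(row, tol=1e-9):
    """Check whether ratio == 1 + triggered / n_links within a small tolerance."""
    _, ratio, triggered, n_links = row
    return abs(ratio - (1 + triggered / n_links)) < tol


rows = parse_rows(SAMPLE)
print(all(matches_one_plus_quotient(row) for row in rows))
```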

The script is built on top of the excellent Python library Grab and can be found on my GitHub if you are interested.