TriggerScrape is a Python script that can, for example, map out the cluster of Swedish sites containing highly anti-immigrant content.

It does this with the following procedure:

  1. Start at some entry point with many outgoing links
  2. Collect all outgoing links
  3. Randomly choose a subsample of them and visit them
  4. Count how many trigger words are found on those links
  5. Visit them again with a probability set by the previous step
  6. If the number of trigger words relative to the number of visited links is high, use that site as the next starting point and restart at (1)
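
One pass through these six steps could be sketched roughly as follows. This is not the actual TriggerScrape code: it uses only the Python standard library instead of Grab, and the names `crawl_step`, `fetch`, `sample_size`, `revisits` and `threshold` are all made up for illustration. It also assumes that "visiting again" means re-fetching the same page, and that the score per domain is simply trigger words divided by visited links.

```python
import random
import re
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageParser(HTMLParser):
    """Collects the href of every <a> tag and all text fragments on a page."""

    def __init__(self):
        super().__init__()
        self.links, self.text = [], []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.links.append(href)

    def handle_data(self, data):
        self.text.append(data)


def parse_page(base_url, html):
    """Return (absolute outgoing links, page text) for one HTML document."""
    parser = PageParser()
    parser.feed(html)
    return [urljoin(base_url, h) for h in parser.links], " ".join(parser.text)


def count_triggers(text, trigger_words):
    """Count how many words in the text are trigger words (case-insensitive)."""
    return sum(1 for w in re.findall(r"\w+", text.lower()) if w in trigger_words)


def fetch_http(url):
    """Hypothetical fetcher for real use; any callable url -> html works."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_step(start_url, fetch, trigger_words,
               sample_size=5, revisits=2, threshold=1.0, rng=None):
    """Run steps 1-6 once; return (per-domain stats, next starting domain or None)."""
    rng = rng or random.Random()
    links, _ = parse_page(start_url, fetch(start_url))        # steps 1-2
    links = sorted(set(links))
    if not links:
        return {}, None
    sample = rng.sample(links, min(sample_size, len(links)))  # step 3
    stats = {}  # domain -> {"triggered": total trigger words, "n_links": visits}

    def visit(url):
        _, text = parse_page(url, fetch(url))
        hits = count_triggers(text, trigger_words)
        entry = stats.setdefault(urlparse(url).netloc,
                                 {"triggered": 0, "n_links": 0})
        entry["triggered"] += hits
        entry["n_links"] += 1
        return hits

    hits = {url: visit(url) for url in sample}                # step 4
    if sum(hits.values()) > 0:                                # step 5
        weights = [hits[u] for u in sample]
        for url in rng.choices(sample, weights=weights, k=revisits):
            visit(url)

    def score(domain):
        return stats[domain]["triggered"] / stats[domain]["n_links"]

    best = max(stats, key=score)                              # step 6
    return stats, (best if score(best) >= threshold else None)
```

To keep going, you would call `crawl_step` again with a URL on the returned domain as the new `start_url`, passing `fetch_http` (or a Grab-based equivalent) as `fetch`.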

It looks something like this:


In the end it produces a list such as:

    domain  ratio              triggered  n_links
            7.774193548387097  210        31
            4.835680751173709  817        213
            3.8                28         10
            3.6484375          339        128
            2.583333333333333  19         12
            2.388888888888889  250        180
            2.193548387096774  74         62
            1.98               49         50
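
How `ratio` is computed is not spelled out here, but the numbers in the list are consistent with `1 + triggered / n_links`. The small check below only verifies that guess against the values shown; the function name `inferred_ratio` is made up, and the formula is an inference from the output, not something taken from the script.

```python
import math

# (ratio, triggered, n_links) exactly as listed in the output above
rows = [
    (7.774193548387097, 210, 31),
    (4.835680751173709, 817, 213),
    (3.8, 28, 10),
    (3.6484375, 339, 128),
    (2.583333333333333, 19, 12),
    (2.388888888888889, 250, 180),
    (2.193548387096774, 74, 62),
    (1.98, 49, 50),
]


def inferred_ratio(triggered, n_links):
    """Guessed formula: one plus trigger words per visited link."""
    return 1 + triggered / n_links


# Collect every row where the guessed formula does not reproduce the ratio.
mismatches = [row for row in rows
              if not math.isclose(row[0], inferred_ratio(row[1], row[2]),
                                  rel_tol=1e-9)]
print(mismatches)
```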

and if you give it enough time, it will map out most of the sites in that cluster.

The script is built on top of the excellent Python library Grab, and can be found on my GitHub if you are interested.