The Global Disinformation Index (GDI) is a UK-based non-profit that aims to disrupt, defund and down-rank disinformation sites. We work collectively with governments, business and civil society. We operate on three core principles: neutrality, independence and transparency.
We automatically collect the content of articles from thousands of news websites. For a 2020 study of the UK media market, our researchers will assess 10 articles from each of 30 news domains. Each article will be sent to them as a text file with every mention of the publisher removed, so that bias toward or against the website does not influence their assessment.
We can collect articles and store them as text files, but these files often include mentions of the publisher. They also sometimes include text from other parts of the webpage, such as ads, that needs to be removed. Your job is to:
1. Remove any mention of the publisher from the article so the reader can’t tell where the article came from.
2. Visit the webpage where the article is published in order to ensure our software captured the full text of the article and the proper headline, author, and date.
3. Remove any text from the body of the file that doesn’t belong in the article or makes it harder to read (like ads, image captions, or "read more" sections).
4. Check that the contents of the file are an actual news article and not some other page on the website (e.g. the "About" page, a page hosting a video and very little text, the homepage, etc.).
5. Check for cases where a paywall blocked our software from collecting the article.
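As an optional aid for steps 1, 4 and 5, a small script can pre-screen each text file before you review it by hand. The sketch below is only an illustration and not part of our software: the function names, the paywall phrases and the word-count threshold are all assumptions, and you would supply the publisher names yourself. It does not replace visiting the webpage or reading the article.

```python
import re

# Hypothetical phrases and threshold; adjust them for the sites you are reviewing.
PAYWALL_HINTS = ("subscribe to continue", "subscribe to read", "already a subscriber")
MIN_WORDS = 150


def find_publisher_mentions(text, publisher_names):
    """Return (line_number, line) pairs for lines that mention any publisher name."""
    if not publisher_names:
        return []
    # Match any of the names literally, ignoring case.
    pattern = re.compile(
        "|".join(re.escape(name) for name in publisher_names), re.IGNORECASE
    )
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


def flag_issues(text, publisher_names):
    """Return a list of flags describing what may need attention in a file."""
    flags = []
    if find_publisher_mentions(text, publisher_names):
        flags.append("publisher_mention")
    lowered = text.lower()
    if any(hint in lowered for hint in PAYWALL_HINTS):
        flags.append("possible_paywall")
    # Very short files are often not full articles (video pages, "About" pages).
    if len(text.split()) < MIN_WORDS:
        flags.append("very_short")
    return flags
```

A file that comes back with no flags still needs a manual read, since a publisher can be identified by a slogan, a columnist's name, or a section title that no name list will catch.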
We will provide a directory with 20 text files per domain, and you will be required to check, anonymize, and approve 10 per domain. If there aren't 10 usable articles for a domain, we will send you a second batch of another 20 articles for each incomplete domain. You'll then save any changes you made to the files and send them back to us.
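If it helps to keep track of progress, the 10-per-domain target can be tallied with a few lines of Python. This is a minimal sketch under assumptions: the record format (a domain paired with an "approved" or "rejected" status) and the function name are hypothetical, and you are free to track your decisions in any other way.

```python
from collections import Counter

REQUIRED_PER_DOMAIN = 10  # approved articles needed for each domain


def domains_needing_second_batch(decisions, all_domains):
    """Return the domains that have fewer approved articles than required.

    decisions: iterable of (domain, status) pairs, status being
               "approved" or "rejected" (a hypothetical record format).
    all_domains: every domain in the batch, so that domains with no
                 approved articles at all are still reported.
    """
    approved = Counter(
        domain for domain, status in decisions if status == "approved"
    )
    return sorted(d for d in all_domains if approved[d] < REQUIRED_PER_DOMAIN)
```

Any domain this returns is one for which a second batch of 20 files would be needed.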