I started working on this project on the 21st of February, 2022. I watched Vladimir Putin's speech, talked to my mom in Odesa and cried a lot, understanding that the war we had been discussing for months was about to start. And then I opened my laptop and started scraping data. I knew I needed to do something. I knew that being a Ukrainian NLP scientist working at a European university positioned me almost uniquely to research propaganda through the lens of computational linguistics, with a full understanding of its contexts and tactics, and that by publishing I could also raise public awareness of the war. After revisiting some trained models that had been gathering dust on my laptop's desktop for a while, I realized I needed to get them out there and let everyone use them: even if they do not work perfectly, they might teach someone a thing or two about Russian propaganda and how it works.
When I found out about Nika's project, I realized that I couldn't pass it up. Being in Germany, I understood that I had to help in any way I could, and this project was and is a great opportunity to participate in the information war. It gave me the chance to learn a lot and build on my existing knowledge. Even if this project is not perfect, we put our hearts and souls into it, and I personally hope that it will help other people understand what propaganda is and how it works.
Many European citizens have become targets of Kremlin propaganda campaigns that aim to minimise public support for Ukraine, foster a climate of mistrust and disunity, and shape elections (Meister, 2022). To address this challenge, we developed “Check News in 1 Click”, the first NLP-empowered pro-Kremlin propaganda detection application available in 7 languages, which gives the lay user feedback on their news and explains manipulative linguistic features and keywords. We conducted a user study, analysed user entries and model behaviour paired with questionnaire answers, and investigated the advantages and disadvantages of the proposed interpretative solution.
We implement binary classification on input vectors consisting of 41 handcrafted linguistic features and 116 keyword counts (normalised by the length of the text in tokens), using the following models: decision tree, linear regression, support vector machine (SVM), and neural networks, with stratified 5-fold cross-validation. For comparison with learned features, we extract embeddings using a multilingual BERT model and train a linear model on them. We performed 3 sets of experiments contrasting the handcrafted and learned features:
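The classical-feature setup above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the feature matrix is a random placeholder standing in for the 41 handcrafted features plus 116 length-normalised keyword counts, and the text's "linear regression" is rendered here as scikit-learn's logistic regression, the usual linear model for a binary target. Hyperparameters and random seeds are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# Placeholder data: in the real system each row would hold the
# 41 handcrafted linguistic features and 116 keyword counts
# (keyword counts divided by the text length in tokens).
rng = np.random.default_rng(0)
X = rng.random((200, 41 + 116))
y = rng.integers(0, 2, size=200)  # binary propaganda / non-propaganda labels

# Stratified 5-fold cross-validation, as described in the text.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "linear (logistic) model": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "neural network": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=500, random_state=0)
    ),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv)
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

For the learned-feature comparison, the same `cross_val_score` loop would be run with `X` replaced by sentence embeddings from a multilingual BERT model and a single linear classifier as the model.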