Mapping the Russian Internet Troll Network on Twitter using a Predictive Model

Sachith Dassanayaka, Ori Swed, Dimitri Volchenkov

Journal of Vibration Testing and System Dynamics

C. Steve Suh (editor), Pawel Olejnik (editor), Xianguo Tuo (editor)

Journal of Vibration Testing and System Dynamics 7(2) (2023) 113--128 | DOI:10.5890/JVTSD.2023.06.001

Sachith Dassanayaka$^{1}$, Ori Swed$^{2}$, Dimitri Volchenkov$^{1}$

$^{1}$ Department of Mathematics and Statistics, Texas Tech University, Lubbock, Texas, 79409-1042, USA

$^{2}$ Department of Sociology, Anthropology, and Social Work, Texas Tech University, Lubbock, Texas, 79409-1012, USA

Abstract

Russian Internet trolls use fake personas to spread disinformation through multiple social media streams. Given the increased frequency of this threat across social media platforms, understanding these operations is paramount in combating their influence. Building on existing scholarship on the inner functions within influence networks on social media, we suggest a new approach to mapping these types of operations. Using Twitter content identified as part of the Russian influence network, we created a predictive model to map the network's operations. We classify account types by their authenticity function for a sub-sample of accounts, introducing logical categories and training a predictive model to identify similar behavior patterns across the network. Our model attains 88\% prediction accuracy on the test set. We validate the model by comparing our data with the 3 million Russian troll tweets dataset; the comparison indicates a 90.7\% similarity between the two datasets. Furthermore, we compare our model's predictions on a Russian tweets dataset with the actual categories and find a 90.5\% correspondence. The prediction and validation results suggest that our predictive model can assist with mapping the actors in such networks.
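
The abstract reports a similarity score between two tweet datasets but does not say how it is computed. As a minimal sketch, assuming cosine similarity over aggregated term counts (an assumption, not a method confirmed by the abstract), a dataset-level comparison could look like the following. The helper names `term_counts` and `cosine_similarity` and the sample tweets are hypothetical and for illustration only.

```python
from collections import Counter
import math


def term_counts(texts):
    """Aggregate lowercase whitespace tokens from a list of tweets into one count vector."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy, made-up tweets; not taken from either dataset (assumption for illustration).
sample_a = ["breaking news from the city", "vote now and share"]
sample_b = ["breaking news tonight", "share this and vote"]

similarity = cosine_similarity(term_counts(sample_a), term_counts(sample_b))
print(f"cosine similarity: {similarity:.3f}")
```

A real comparison would likely need proper tokenization, stop-word handling, and term weighting; this sketch only shows the shape of the computation.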
