Emulating Twitter Users

Rachel Edwards
12 min read · Jan 25, 2021

Abstract

The purpose of this project is to use natural language processing to create a Twitter bot that writes tweets modeling the linguistic patterns of a particular person, and that can be steered toward a chosen subject. The methods rely heavily on the Python Natural Language Toolkit (NLTK) as well as frequent use of the Twitter API. Tweets are created by analyzing someone's speech patterns, building a loose syntactic skeleton made of chunks, and filling in those chunks with the speaker's own words or with words on a topic gleaned from popular Twitter hashtags.

1. Introduction

Launched in July 2006 and based in San Francisco, California, Twitter is considered one of the online social media giants. Of its more than 319 million users sending out over 340 million tweets a day, an estimated 20 million are Twitter bots. A Twitter bot is a program that automatically posts to Twitter: it can tweet, retweet, follow other accounts, and tag people. Some Twitter bots are used for mass advertising, while others, such as the infamous @DeepDrumpf, which creates tweets in the style and image of Donald Trump, exist as a form of entertainment. The goal of our project is to do something similar to the DeepDrumpf bot: to create a tweet bot that emulates a specific person's style of tweets and can produce believable tweets in that linguistic style on a specific subject. The tweets it creates are based on the syntactic structure and semantic word choices that can be gleaned from the tweets on that person's Twitter account, while the vocabulary for the specific topics comes from sentiment analysis. Successfully creating a working Twitter bot that produces tweets in the style of someone else could have many uses in marketing and in manipulating the media. It could give a more realistic, human feel to automated marketing and promotion, and it could let celebrities with Twitter accounts focus on other things while still maintaining an online presence based on them.

2. Related Work

While no research papers have been written specifically on creating person-emulating tweet bots, plenty have been written about natural language processing of Twitter data and what you can do with it. Below are summaries of two tangentially related works involving Twitter and natural language processing.

Precise tweet classification and sentiment analysis:

This report explores how to model the sentiment of short-form social media posts, specifically tweets. The authors want to synthesize information from small amounts of data while providing the maximum amount of context. Their motivating example is a user searching for "diabetes" on Twitter: the user will find tweets containing the keyword diabetes, but not tweets about the disease that omit the keyword. Their model aims to correct this by filtering short texts based on their semantics and categorizing them accordingly. They first split their tweet data into several fields: username, tweetdate, status, tweetid and image. They also remove slang from their data, since words like plz and goooood (please and good) cannot be recognized by Alchemy, the tool they use to determine user sentiment; simply correcting the tweets instead would lose the sentiment those slang spellings carry. They trained and tested their model on 40 thousand tweets from different categories. An initial pass through the Alchemy API found 3,874 tweets about diabetes; adding their model raised that number to 8,638, because the model helps find additional information for categorization. Overall, they saw between a 0.1% and a 55% increase in performance depending on the category. Searching for the keyword diabetes had its results improved by 55%, meaning they found 55% more tweets that used a synonym for diabetes or contained supplemental information for people with diabetes than a vanilla keyword search did.

The DARPA Twitter Bot Challenge:

The DARPA Twitter Bot Challenge set six university teams the task of locating influence bots tweeting pro-vaccination content. To start, the teams had to discern influence bots from other kinds of bots, then narrow those down to the bots tweeting about vaccination, and finally to the bots producing the specific sentiment they were looking for. The data consisted of 7,038 users, 4,095,000 time-stamped tweets, and tuples recording who followed or unfollowed whom. Each team sent guesses to a web server that immediately told them whether they had guessed correctly. The top three teams found that machine learning techniques were not good enough on their own because there was not enough training data. To build a better training set, they analyzed Twitter syntax by looking at the average number of hashtags, whether tweets resembled the output of the natural language generation program ELIZA, the average number of retweets, and whether or not tweets were geo-enabled. They also looked at Twitter semantics, such as the number of posts related to vaccination, positive or negative sentiment strength, and the topics a user most frequently tweeted about. Among the methods they used to make their guesses were natural language processing tools such as latent Dirichlet allocation for topic detection, the AVA sentiment analysis framework, and OASYS, an opinion analysis system for assigning sentiment scores. To locate additional bots, they also used cosine similarity to measure how close potential bots were to known bots. USC guessed 39 out of 39 bots correctly and was the leading team.
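As an illustration of that last step, a cosine-similarity check between a known bot and a candidate account might look like the sketch below. The feature set here is invented for the example and is not the challenge teams' actual feature vector.

```python
# Illustrative sketch only: cosine similarity between per-account feature
# vectors. The features below (average hashtags per tweet, average retweets,
# share of vaccine-related posts) are hypothetical stand-ins.
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

known_bot = [2.1, 0.4, 0.65]   # hypothetical profile of a confirmed influence bot
candidate = [1.9, 0.5, 0.60]   # hypothetical profile of an unlabeled account
print(cosine_similarity(known_bot, candidate))  # values near 1.0 suggest similar behavior
```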

3. Data

To create training data for our model, we gathered about three thousand of the most recent tweets from the singer Pitbull's Twitter account. This involved using an existing Twitter account to create credentials for the Twitter API so it could be used with NLTK, and downloading Twython, the library the NLTK Twitter package depends on. We retrieved just the text of each tweet, without user IDs or dates. As part of our pre-processing, we stripped excess information from the tweets, such as hashtags not embedded in the tweet text, links, and mentions of other users: we wanted to process each tweet as just the actual utterance, without extra material that is not syntactically relevant to its structure. The hashtags and user mentions are stored separately from their tweets to be processed later.

The motivation for this cleanup comes from the way we tag our data. We used the Noah's Ark Twitter tagger hosted by Carnegie Mellon, because it specializes in Twitter data and can accurately tag tokens that the standard Penn Treebank tagger cannot, such as hashtags appearing mid-sentence. Running this tagger on the text-only tweets produced one thousand tagged and tokenized tweets. The tags give us enough information to see what is happening at the word level, which would be fine for simple bigram-generated sentences but is not enough to be as accurate as we want. We considered using a full statistical parse and top-down parse trees to generate sentences based on the syntax of each tweet, but that would give us too much higher-level information about syntactic structure and not enough about the semantics at the word level. With word-level part-of-speech tags providing too little syntactic data and a full parse too much, we settled on a chunker.

We used the NLTK chunker, trained on the CoNLL 2000 corpus. The CoNLL 2000 corpus chunks tagged text into three categories, noun phrases (NP), verb phrases (VP), and prepositional phrases (PP), with all combinations of the three under a sentence (S), and it provided us with pre-chunked training data for chunking our tagged tweets. The chunker assigns the data to these phrase types and also reports an IOB accuracy score, precision, recall, and F-measure for the test data it is given. Chunks provide more syntactic data than is available at the word level, because they consist of small trees or phrases, but they do not carry all the extra information that comes with a full syntactic parse. This makes them a better option for building a skeleton of sorts that models the linguistic patterns of a particular person, without falling back on the simpler and less accurate bigram approach. A rough sketch of this kind of chunker appears below.

To get data for the topic we want our bot to tweet about, we picked a trending hashtag with a large number of tweets attached to it. To find the words that make up that topic we used several built-in NLTK functions, such as frequency distributions and filtering on specific tags. These let us see which words occur most frequently and give a better look at just the content words that represent the topic at hand. To create a useful frequency distribution we had to filter out all the stop words and keep only the tags we wanted.
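Returning briefly to the chunking step: a minimal sketch of a CoNLL-2000-trained chunker, roughly along the lines of the bigram chunker from the NLTK book (an illustration of the approach, not our exact code), might look like this:

```python
import nltk
from nltk.corpus import conll2000

# nltk.download('conll2000')  # needed once

class BigramChunker(nltk.ChunkParserI):
    """Learns a mapping from POS-tag bigrams to IOB chunk tags."""
    def __init__(self, train_sents):
        train_data = [[(pos, iob) for word, pos, iob in nltk.chunk.tree2conlltags(sent)]
                      for sent in train_sents]
        self.tagger = nltk.BigramTagger(train_data)

    def parse(self, tagged_sentence):
        pos_tags = [pos for word, pos in tagged_sentence]
        iob_tags = [iob for pos, iob in self.tagger.tag(pos_tags)]
        conlltags = [(word, pos, iob)
                     for (word, pos), iob in zip(tagged_sentence, iob_tags)]
        return nltk.chunk.conlltags2tree(conlltags)

train_sents = conll2000.chunked_sents('train.txt', chunk_types=['NP', 'VP', 'PP'])
test_sents = conll2000.chunked_sents('test.txt', chunk_types=['NP', 'VP', 'PP'])
chunker = BigramChunker(train_sents)
print(chunker.evaluate(test_sents))   # reports IOB accuracy, precision, recall, F-measure
# chunker.parse(tagged_tweet)         # then chunk one of our tagged tweets
```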
For the word replacement, we only wanted to replace content words, which ended up being verbs, nouns, adjectives, and adverbs, along with their subcategories. We chose the WWE (World Wrestling Entertainment) as our topic. To get data for it, we accessed the Twitter API again and retrieved about a thousand tweets containing the hashtag #WWE to tokenize and sort.
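A rough sketch of how those topic keywords might be pulled out of the tagged #WWE tweets, assuming Penn-style tags and tweets that have already been downloaded and tagged (variable names here are illustrative):

```python
import nltk
from nltk.corpus import stopwords

# nltk.download('stopwords')  # needed once
CONTENT_PREFIXES = ('NN', 'VB', 'JJ', 'RB')   # nouns, verbs, adjectives, adverbs
STOP_WORDS = set(stopwords.words('english'))

def topic_keywords(tagged_tweets):
    """Per-POS frequency distributions of content words, stop words removed."""
    dists = {prefix: nltk.FreqDist() for prefix in CONTENT_PREFIXES}
    for tweet in tagged_tweets:
        for word, tag in tweet:
            if word.lower() in STOP_WORDS or not word.isalpha():
                continue
            for prefix in CONTENT_PREFIXES:
                if tag.startswith(prefix):
                    dists[prefix][word.lower()] += 1
    return dists

# keywords = topic_keywords(tagged_wwe_tweets)   # tagged_wwe_tweets: hypothetical input
# keywords['NN'].most_common(10)                 # the most frequent topic nouns
```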

4. System Description

4.1 Skeleton Generation

As previously stated, we use several public tools to operate our bot. The system takes untagged data and tags it itself to create its own corpus. After creating our tagged data set, we perform bigram statistical analysis on the chunked sentences: the probability of a particular phrase bigram is its count divided by the total number of bigrams. Since phrase combinations form a closed set, we do not perform smoothing, as we can enumerate all the possibilities that will occur. We then split our probabilities into two sets: starting probabilities, the bigrams containing the start symbol, and everything else. This allows the resulting skeletons to vary in length; because bigrams from the middle and end of sentences are kept together, the most probable endings "bubble up" to the top of our lists.
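The counting step might look roughly like the sketch below. It assumes each chunked tweet has already been reduced to its flat sequence of phrase labels (NP, VP, PP, plus our own HT and USR labels), which is a simplification of our actual pipeline.

```python
from collections import Counter

START, END = '<s>', '</s>'   # illustrative start/end markers

def phrase_bigrams(phrase_sequences):
    """Count phrase-label bigrams, split into starting bigrams and the rest.

    phrase_sequences: e.g. [['VP', 'PP', 'NP', 'HT'], ...], one list per tweet.
    """
    starts, middles = Counter(), Counter()
    for seq in phrase_sequences:
        padded = [START] + list(seq) + [END]
        for a, b in zip(padded, padded[1:]):
            (starts if a == START else middles)[(a, b)] += 1
    return starts, middles

def to_probs(counts):
    """Relative frequency of each bigram: count / total count."""
    total = sum(counts.values())
    return {bigram: c / total for bigram, c in counts.items()}

# starts, middles = phrase_bigrams(phrase_label_sequences)  # hypothetical input
# start_probs, middle_probs = to_probs(starts), to_probs(middles)
```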

Table 1: Phrase Bigrams with Counts

Bigram            Count
('S', 'sVP')      1504
('NP', 'PP')      1002
('PP', 'NP')      1838
('S', 'sNP')      1336
('HT', 'HT')      1504
('NP', 'HT')      836
('VP', 'PP')      668
('sNP', 'PP')     668
('sVP', 'NP')     668
('PP', 'HT')      668

We build our sentences using these two distributions. We pick the starting bigram with the largest probability to begin the sentence, remove it from the starting set, and then filter the middle bigrams down to those that begin with the end of our chosen start bigram. We take the maximum of that filtered set, and remove that bigram from the middle set if the filtered set it was pulled from contained more than one entry. We repeat this until a bigram containing the ending symbol is used. We perform this three times for each start bigram; since there are a finite number of possible starts in our results, this gives us nine total skeletons. An interesting quirk of this process is that we do not get exactly the same results every time: because of how Python accesses our lists, order is not guaranteed, so you can get different skeletons depending on which starting bigrams go first, since possibilities are removed from the data set as they are consumed.
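The loop described above might look roughly like this sketch, reusing the hypothetical start_probs and middle_probs dictionaries from the earlier sketch (again, an illustration of the procedure rather than our exact code):

```python
def build_skeleton(start_probs, middle_probs, end='</s>', max_len=10):
    """Greedily grow one phrase skeleton, consuming bigrams as they are used."""
    start = max(start_probs, key=start_probs.get)   # most probable starting bigram
    del start_probs[start]                          # each start bigram is used once
    current = start[1]
    skeleton = [current]
    for _ in range(max_len):                        # guard against cycles like ('HT', 'HT')
        options = {bg: p for bg, p in middle_probs.items() if bg[0] == current}
        if not options:
            break
        best = max(options, key=options.get)
        if len(options) > 1:                        # only consume when alternatives remain
            del middle_probs[best]
        current = best[1]
        if current == end:
            break
        skeleton.append(current)
    return skeleton                                 # e.g. ['VP', 'PP', 'NP', 'HT', 'HT']
```

Because the start and middle sets are mutated as bigrams are consumed, repeated calls produce different skeletons, which is the variability described above.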

Table 2: Resulting Semi-populated Skeletons

VB on NN to HT HT
VBG with NN for HT HT
VB in NNP by HT HT
NN from VB through USR USR
which NN of VB after USR USR
NNP after VBZ after USR USR
HT USR after USR

Once these skeletons are created, we have to repopulate the phrases with words Pitbull would use. To do this, we perform a process similar to the one above, but at the phrase level: we create a list of possibilities for each phrase type and build a frequency distribution over them. We then go through our skeletons, replacing each phrase with a dynamically built piece of sentence. However, to make it easy to alter the topic later, we leave certain parts of speech as placeholders, specifically nouns, verbs, adverbs, and adjectives. We chose these parts of speech qualitatively, as they are most often what changes when speech shifts to a new topic. Once our skeletons have been partially populated at the word level, we move on to filling in the leftover "blanks" using the search data.
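A sketch of this phrase-level population step is below; phrase_dists is a hypothetical mapping from each phrase label to an NLTK FreqDist over the tagged phrases of that type seen in Pitbull's tweets, and Penn-style tags are assumed.

```python
CONTENT_PREFIXES = ('NN', 'VB', 'JJ', 'RB')   # nouns, verbs, adjectives, adverbs

def populate_phrases(skeleton, phrase_dists):
    """Expand phrase labels into words, leaving content-word slots as POS blanks."""
    pieces = []
    for label in skeleton:
        if label in ('HT', 'USR'):                  # hashtags / mentions filled in later
            pieces.append(label)
            continue
        tagged_phrase = phrase_dists[label].max()   # most frequent (word, tag) sequence
        for word, tag in tagged_phrase:
            pieces.append(tag if tag.startswith(CONTENT_PREFIXES) else word)
    return pieces                                   # e.g. ['VB', 'on', 'NN', 'to', 'HT', 'HT']
```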

4.2 Skeleton Population

To populate our semi-fleshed-out skeletons, we first create a tagged data set as we did previously. However, we stop before chunking: we are not concerned with the syntactic structure of the topic we are talking about, only with the keywords that skew speech toward that topic. We create frequency distributions over the nouns, verbs, and adjectives in our search data to find the most likely keywords to use. Once these distributions are generated, we iterate through our skeletons, filling in the POS blanks we left with these keywords, and consuming each keyword from the data set as it is used. So, just as with skeleton generation, which skeleton "goes first" can give you different results even if the skeleton set is the same between runs.
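A sketch of this final fill-in step; keywords is a hypothetical mapping from each placeholder ('NN', 'VB', 'JJ', 'HT', 'USR', and so on) to a frequency-ordered list of candidates pulled from the #WWE search data:

```python
PLACEHOLDERS = ('NN', 'VB', 'JJ', 'RB', 'HT', 'USR')

def fill_blanks(semi_populated, keywords):
    """Replace leftover POS/HT/USR blanks with topic keywords, consuming each keyword."""
    tweet = []
    for piece in semi_populated:
        slot = next((p for p in PLACEHOLDERS if piece.startswith(p)), None)
        if slot and keywords.get(slot):
            tweet.append(keywords[slot].pop(0))     # consume the keyword once it is used
        else:
            tweet.append(piece)                     # already a concrete word
    return ' '.join(tweet)

# fill_blanks(['VB', 'on', 'NN', 'to', 'HT', 'HT'], wwe_keywords)
# -> something along the lines of "be on wwe to #raw #wrestlemania"
```

Because the keyword lists are consumed in place, running the same skeleton twice can yield different tweets, matching the behaviour described above.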

5. Results

After using the NLTK chunker to create syntactic skeletons capturing Pitbull's linguistic style, we ended up with three statistically accurate skeletons and fleshed them out using bigrams to generate our own tweets. These new tweets, already tagged and chunked, became the training data for the NLTK chunker, replacing the far more capable built-in chunker.

Table 3: Results of our self-trained chunker

IOB Accuracy    97.1%
Precision       87.8%
Recall          93.5%
F-Measure       90.6%

From NLTK's built-in evaluate function, we found that our self-trained chunker was still quite accurate on our data: it assigned the correct IOB tag to 97.1% of tokens, 87.8% of the chunks it proposed were correct (so 12.2% were falsely tagged), and it recovered 93.5% of the correct chunks. However, although our tweets are syntactically plausible, coercing these skeletons into sensible English proved much more difficult. Because of our chunker's high scores, we were confident in our ability to create Pitbull-styled tweets about the WWE, but while statistically accurate, the tweets produced unfortunately do not always make a lot of sense semantically.

Table 4: Pitbull's "Tweets" about the WWE

be on wwe to wwe wrestlemania
returning with poster for raw sdlive
get in wrestlemania by mattel romanreigns
match from read through @ringsidec @wwehistory
which pre-order of announce after @jeffhardybrand @wwe
hardy after is after @wweshop @wwerollins
nxt @bellatwins after @catch lutte
wrestling @romanreigns568 after @sashabankswwe
wweshop @ajstylesorg after @fitpregnancy

However, much of this can be chalked up to the data available: many celebrities like Pitbull use their Twitter accounts largely to advertise, and those advertisement tweets already sound robotic, which leads us to create robotic tweets. This could potentially be fixed by supplying different kinds of data, such as Facebook posts or transcribed conversations, instead of relying only on what is found on someone's Twitter account. Even so, using just tweets pulled from Twitter did yield some insight into Pitbull's personal idiolect, visible in the frequent use of #dale within his tweets as well as his general tendency to open statements with verb phrases.

6. Conclusion

In short, emulating a user's speech pattern is feasible on the syntactic level; making semantically coherent messages from these syntactic skeletons is much harder. We faced many challenges during this study. The Twitter API was very limited in usability since we were requesting user-level access; companies and production applications can secure much more bandwidth than we could, so we were limited in how many tweets we could gather to train our bot. The tools we used were also outdated: although the hosting pages for the CMU tagger and the Twitter-for-Python package were frequently updated, the actual repositories for these tools had not been touched in several years. In fact, we had to completely overhaul the source of the CMU tagger to work with Python 3, and the Twitter Python package had incorrect documentation in some cases, leading us to dig through its source to figure out how to use the API. Another thing that could have improved this project is considering the semantic dependencies between word choices and the relationships they create. Ideally, semantics and syntax work together to produce coherent, logical language; we focused more on creating an accurate syntactic representation of speech than a semantically similar one. To create a more lifelike and realistic bot, both parts of language need to be considered equally.

References

Gimpel, Kevin, Nathan Schneider, Brendan O'Connor, Dipanjan Das, Daniel Mills, Jacob Eisenstein, Michael Heilman, Dani Yogatama, Jeffrey Flanigan, and Noah A. Smith. 2011. Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments. Carnegie Mellon University, Pittsburgh, PA 15213.

Bird, Steven, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python. 1st ed. O'Reilly. Print.

Subrahmanian, V.S., Amos Azaria, Skylar Durst, Vadim Kagan, Aram Galstyan, Kristina Lerman, Linhong Zhu, Emilio Ferrara, Alessandro Flammini, and Filippo Menczer. The DARPA Twitter Bot Challenge. Computer, June 2016, Vol. 49(6), pp. 38–46.

Batool, R., A. M. Khattak, J. Maqbool, and S. Lee. Precise Tweet Classification and Sentiment Analysis. 2013 IEEE/ACIS 12th International Conference on Computer and Information Science (ICIS), Niigata, 2013, pp. 461–466. doi: 10.1109/ICIS.2013.6607883.

Written by Rachel Edwards and Erik Andersson
