The rise of ‘social bots’ has reached a tipping point in 2016, with four of the largest technology behemoths, Microsoft, Amazon, Google and Facebook, putting their commercial and technological weight behind bot technology and announcing commercial products. This has been precipitated by the explosion of social media and has given these technologies an incredibly rich ground within which to learn, cause mischief and interact with their human users. The journey to this point has been long and tumultuous with these new technologies causing legal questions throughout their development. Parallel to this, the advent of hyper-scale, cloud-based systems and data analysis, combined with advanced artificial intelligence techniques means the potential for this convergent technology to shake the foundations of our privacy and even legal frameworks needs to be considered.

This paper will examine the history of these social bots, from their humble start as web crawlers and simple, action-orientated algorithms, to the more complicated self-learning bots that roam social networks such as Twitter and Facebook. Furthermore, we will look at the complications that arise from teaching bots from social data and explore whether the algorithmic filtering of content used to increase consumer engagement skews or biases a bot’s ability to interact meaningfully across a range of audiences. Finally, we will look at the legal ramifications of social bots as their technology matures by examining the effects of filter bubbles, data collection, social trust and whether bots can have intent.

Crawlers and Early Bots

Since the 1950s and Alan Turing’s famous ‘Turing Test’, the concept of intelligent machines conversing with humans has driven academic artificial intelligence research. However, the journey from Turing’s original vision to modern social bots has been circuitous: from early web crawlers and automated financial algorithms, through simple bots on social networks, to the latest emerging social bots[1], these automated technologies have rubbed against legal grey areas spanning trespass, intellectual property, impersonation and data privacy.

As early as 1964, scientists were experimenting with conversational computer algorithms. Joseph Weizenbaum, a professor at the MIT Artificial Intelligence Laboratory, created ELIZA, a simple algorithm that used pattern matching techniques to mimic Rogerian psychotherapists. Despite ELIZA’s simplicity, it elicited an extremely strong response from some users much to Weizenbaum’s chagrin.
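The pattern-matching approach can be illustrated with a minimal sketch. The rules, reflections and fallback reply below are hypothetical and are not taken from Weizenbaum's original script; they only show how a keyword pattern plus a pronoun swap can produce a superficially attentive reply.

```python
import re

# Hypothetical rules: (pattern, response template). Illustrative only,
# not Weizenbaum's original ELIZA script.
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]

# Swap first- and second-person words so the echoed fragment reads naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are", "you": "I", "your": "my"}


def reflect(fragment: str) -> str:
    """Lower-case the fragment and swap pronouns word by word."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())


def respond(utterance: str) -> str:
    """Return the reply for the first matching rule, or a neutral prompt."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on."


print(respond("I am worried about my job."))
```

The program has no model of meaning at all: it simply reflects the user's own words back as a question, which makes the strength of the responses it elicited all the more striking.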

Algorithmic chatbots such as ELIZA evolved over time, with similar programs such as Parry, A.L.I.C.E. and ChatterBot providing users with entertainment. However, their limited domain knowledge and syntactic capabilities meant the technology was never widely deployed and it was many decades later before conversational bot technology would be commercialised.

Instead, the first successful bots were web crawlers; the technologies that create the vast indices of information required by search engines to allow billions of people to easily access the trillions of pages of content on the Internet. However, even these seemingly innocuous technologies created significant legal issues in their early years. Companies objected to third party crawlers aggregating and indexing their content. They alleged that these actions deprived them of advertising revenue, diluted their brands and allowed these third parties to pilfer their content (Plitch, 2002).

Early U.S. test cases used a legal doctrine dating to the Middle Ages known as ‘trespass to chattels’, originally conceived to protect physical property, to assert that bots that ignored ‘no trespass’ signs[2] were in violation of the law. In 1996, the Californian appellate court ruled in Thrifty-Tel, Inc. v. Bezenek that electrons were sufficiently physical and tangible to constitute a trespass to chattels and that the electronic signals generated by the hacking activity of two minors caused sufficient interference with Thrifty-Tel’s property (its servers). In 2000, several cases attempted to apply similar reasoning against web crawlers. In eBay, Inc. v. Bidder’s Edge, Inc. the Northern District of California held that the potential harm caused by the bots of Bidder’s Edge (an auction aggregator site) was sufficiently substantial to meet the requirements for trespass to chattels. Conversely, a Central District of California hearing of Ticketmaster Corp. v. Tickets.com, Inc. ruled that bots did not constitute trespass to chattels as “the taking of factual information from a public source was not a trespass” (Quilter, 2002). While many detractors were concerned about the newly malleable nature of the trespass to chattels tort and its potential to stifle free speech (Quilter, 2002 and Fritsch, 2004), other technological advances such as web-based Application Programming Interfaces (APIs) allowed sites to interoperate and aggregate content in a mutually beneficial manner and reduced subsequent lawsuits of this nature.
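As an illustration, and assuming that the ‘no trespass’ signs in question are robots.txt exclusion files, a well-behaved crawler can check such a file before fetching a page. The sketch below uses Python's standard `urllib.robotparser`; the rules, bot name and URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every crawler is asked to stay out of /auctions/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /auctions/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_crawl(ROBOTS_TXT, "ExampleAggregatorBot", "https://example.com/auctions/123"))
print(may_crawl(ROBOTS_TXT, "ExampleAggregatorBot", "https://example.com/about"))
```

Nothing in the file technically prevents access; it is a request that a crawler may honour or ignore, which is why the disputes above turned on legal doctrine.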

Other early automated bots were utilised in the financial markets. Automated, high-frequency trading (HFT) algorithms trade stock extremely quickly and became more popular as real-time data became increasingly prevalent and the cost of low-latency systems dropped. The faster an algorithm could trade on near-real-time information, the earlier it could make money on the data it had received. These bots are designed to make millions of calculations to earn small profits that in aggregate add up to phenomenal sums of money. By way of comparison, in the time it takes Usain Bolt to react to a starter pistol, an HFT system can execute 165,000 separate trades (Harford, 2012). While they were an exciting innovation within financial markets, their uncontrolled and automated nature quickly came under scrutiny.

On May 6th, 2010, the Dow Jones plunged over 9% within minutes, wiping over a trillion dollars from the market in its largest single-day flash crash in history. After a five-year investigation, it became apparent that several HFT algorithms were largely responsible. HFT algorithms were involved in rapidly buying and selling the same stock (known as a “hot potato”), causing a collapse in the price of certain stocks. Some HFT algorithms were designed to exit a market if they detected sudden volatility or market activity that indicated a new phenomenon the algorithm had not been designed for. As these HFTs exited the financial markets, the resultant cash crunch caused further pressure and market drops. As a result, some stocks, such as Apple’s, traded at $100,000 each (a 40,000% increase), whereas Accenture’s stock dropped from over $40 to less than $0.01 within the space of minutes, although the market normalised shortly thereafter.

HFT algorithms traditionally based their trades upon newsfeeds from curated sources that provided around 20,000-50,000 news items a day. However, as Twitter’s popularity soared, some companies experimented with utilising the Twitter ‘firehose’ (a real-time pipe of all tweets), increasing the data points to 400 million a day.

Tweets could be analysed and parsed for pertinent information, and their content could be scored according to the number of retweets or the number of followers of the originating account; this provided the HFT algorithms with a rich, real-time source of new data. However, as with every new technology, there came the potential for abuse, with many realising that they could influence the market by using networks of bots to discuss certain stocks.
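A minimal sketch of this kind of scoring, under stated assumptions, is given below. The weighting formula, the `Tweet` fields and the `$EXMP` ticker are all hypothetical; no trading firm's actual signal is being reproduced. The point is only that a score built from retweets and follower counts can be inflated by a network of accounts that amplify one another.

```python
import math
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    retweets: int
    followers: int


def tweet_weight(tweet: Tweet) -> float:
    """Weight a tweet by its reach; log1p dampens very large counts."""
    return math.log1p(tweet.retweets) + math.log1p(tweet.followers)


def ticker_signal(tweets: list[Tweet], ticker: str) -> float:
    """Sum the weights of every tweet whose text mentions the given cashtag."""
    cashtag = f"${ticker}".lower()
    return sum(tweet_weight(t) for t in tweets if cashtag in t.text.lower())


# One genuine mention of a hypothetical ticker.
organic = [Tweet("Quiet day for $EXMP", retweets=2, followers=150)]
# A hypothetical bot network: fifty accounts retweeting one another.
bot_network = [Tweet("$EXMP to the moon!", retweets=500, followers=20) for _ in range(50)]

print(ticker_signal(organic, "EXMP"))
print(ticker_signal(organic + bot_network, "EXMP"))
```

Because the signal counts amplification rather than truth, the second figure dwarfs the first even though no new information about the company has entered the system.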

Cynk Technologies epitomised this phenomenon. A social media company with zero assets, one employee and no revenue saw its stock skyrocket from $0.06 to more than $21 (an increase of roughly 35,000%), giving it a market valuation in excess of $5 billion. An orchestrated bot campaign on Twitter was designed to mention Cynk extensively; these tweets became self-reinforcing as other bots noticed the trend and started to retweet their content, lending further credence to the stock they were discussing. In concert, these trends were picked up by HFTs and other automated trading bots that sent the stock price soaring (Farooq, Khan and Khalid, 2014).

Similarly, on April 23, 2013, the Syrian Electronic Army hacked the Twitter account of the Associated Press (AP) and posted a tweet claiming that the White House had been subject to a terror attack and President Obama injured. It took minutes for the tweet to go viral, partly perpetuated by bots, and for $136.5 billion to be wiped off the value of the stock markets before recovering after the hoax was discovered.

In both these examples, it is patently obvious that crimes have been committed. The Cynk Technologies case was a technologically elaborate ‘pump and dump’ scheme, in which microcap stocks are artificially inflated through misleading positive statements (“pump”) before the operator sells their overvalued shares (“dump”), sending the price tumbling once again while the scammer pockets the profit. In May 2016, Gregg R. Mulholland pled guilty to money laundering conspiracy for fraudulently manipulating stocks, including Cynk (EDNY No.14-CR-476).

However, while manipulation of the stock market through HFT and Twitter bots can be destructive, bots that change people’s behaviour and opinions are subtler and potentially even more dangerous. Early social media bots generally disseminated misinformation through their own immature technology. It was the self-reinforcing nature of Twitter bots that was so destructive, as they automatically retweeted information without checking its veracity or the credibility of its sources, which were often other Twitter bots. As such, deliberate misdirection, rumour and false accusations quickly gained viral traction due to this bot behaviour and the vicious cycle it produced. In addition to the Obama example, this phenomenon was seen during the Boston Marathon bombings, with one study finding that 29% of tweets in the 10 days after the attack were fake compared to only 20% containing true information (Gupta, Lamba and Kumaraguru, 2013).

More recent examples have shown how Twitter bots have been utilised to manipulate discourse by inflating support for political candidates, polarising discussion, artificially enlarging grassroots campaigns, distorting people’s follower counts or skewing social media analytics which are then adopted within TV ratings and market demographic studies (Ferrara, Varol, Davis, Menczer and Flammini, 2016). As an example, Mexican news outlets are reluctant to report on drug cartels for fear of violent reprisals and as such many citizens turn to Twitter. During the 2012 Mexican presidential elections, all sides used Twitter extensively and deployed bots to tweet messages and hashtags for both promotional and malicious purposes (Orcutt, 2012).

Other bots take a different approach by flooding streams of users with extraneous information. This tactic was deployed during the Arab Spring as the Syrian government tried to spam activists’ accounts to keep important coordination messages suppressed (Finger, 2015). Similarly, the US Central Command (Centcom, a branch of the US military that oversees US armed operations in the Middle East) awarded a contract in 2011 for the development of an “online persona management service” that allowed one controller to operate up to 10 false identities to create online conversations and crowd out unwelcome opinions. Centcom spokesman Commander Bill Speaks stated that the objective was to, “[support] classified blogging activities on foreign-language websites to enable Centcom to counter violent extremist and enemy propaganda outside the US” (Fielding and Cobain, 2011).

The legal ramifications of this automated ‘sock-puppetry’ are untested but Speaks stated that none of the interventions would be in English, as it would be unlawful to “address US audiences”, and that it would not target any US-based websites, in particular Facebook or Twitter. Similarly, within the UK, this sort of impersonation technology could fall foul of the Forgery and Counterfeiting Act 1981 (Fielding and Cobain, 2011).

These laws exist to enshrine the social trust that exists between individuals or between humans and corporations in order to facilitate the connections and transactions that occur between them. It is this social trust that is complicated by bots as they become increasingly social and the informational asymmetries widen (Graeff, 2013).

This dynamism between trust, popularity and influence led four Italian researchers to deploy a bot in a small social network to understand how it would impact these factors. The bot was deployed within aNobii, a niche network for book lovers. Initially, the bot, named lajello, simply visited other members’ profiles every 15 days. This rudimentary action led lajello to become the second most popular member of the social network within 7 months. A second phase of the experiment introduced a machine-learning classifier into the bot’s repertoire, allowing it to recommend other users to follow based upon their book library and existing social links. The researchers discovered that lajello’s influence (or persuasive power) at least doubled, although this phase “outed” lajello as a bot and it was quickly shut down by the site’s system administrators (Aiello, Deplano, Schifanella and Ruffo, 2012).
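The second-phase behaviour can be approximated with a simple similarity heuristic. The sketch below is not the researchers' machine-learning classifier: it substitutes a Jaccard overlap of book libraries plus a small bonus for accounts followed in common, and all user names, books and weights are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two sets as a fraction of their union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user: str,
              libraries: dict[str, set[str]],
              follows: dict[str, set[str]],
              top_n: int = 3) -> list[str]:
    """Suggest accounts the user does not yet follow, ranked by shared books
    and, with a smaller weight, by accounts both users already follow."""
    already = follows.get(user, set())
    candidates = set(libraries) - {user} - already

    def score(other: str) -> float:
        shared_books = jaccard(libraries[user], libraries[other])
        followed_in_common = len(already & follows.get(other, set()))
        return shared_books + 0.1 * followed_in_common  # 0.1 is an arbitrary weight

    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(candidates, key=lambda other: (-score(other), other))[:top_n]


libraries = {
    "ann": {"b1", "b2", "b3"},
    "bob": {"b2", "b3", "b4"},
    "cat": {"b9"},
    "dov": {"b1", "b2", "b3"},
}
follows = {"ann": {"dov"}, "bob": {"dov"}, "cat": set(), "dov": set()}

print(recommend("ann", libraries, follows, top_n=1))
```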

Whether it is Weizenbaum’s secretary asking him to leave the room in order to converse privately with ELIZA in 1964 or lajello’s influential rise within aNobii fifty years later, it is evident that even the most basic conversation elicits a disproportionate feeling of trust and persuasive power within humans, making the examination of truly social bots all the more important.

Social Bots

Early bots were generally undesirable on social media networks: they were the purveyors of spam, scams and political malfeasance, and they caused significant commercial concern to the social networks that relied on advertising dollars, as well as wider economic damage when their activities affected market trading. However, modern “social bots” are a considerably more mature technology, with sufficient intelligence to converse and even act upon user requests.

The largest technological advancements leading to true social bots have come from the mutually reinforcing relationship between social media and machine learning: artificial intelligence generally requires a large corpus of human interactions from which to learn and build its own models of response generation, and social media networks provide such interactions in huge volumes. Twitter, in particular, provides a rich source of data as its self-imposed 140-character limit ensures tweets are syntactically simple and heavily conversational in nature.

For example, recent research (Ritter, Cherry and Dolan, 2011) has looked to utilise Statistical Machine Translation (SMT) technologies to build strong structural relationships between different parties in a conversation. Consider the following exchange:

Person 1: I’m slowly making this soup…and it smells gorgeous!

Person 2: I’ll bet it looks delicious too! Haha!

Here, “it” refers to “the soup” in both Person 1 and Person 2’s contexts. Furthermore, it provides additional background to the adjectives and verbs within both sentences. This earlier research required human intervention to train and score the statistical models on their effectiveness and did not issue responses that were sensitive to the context of the conversation. Later research (Sordoni, Galley, Auli, Brockett, Ji, Mitchell, Nie, Gao and Dolan, 2014) refined this method further by utilising Recurrent Neural Network Language Models (RLMs).

While the technicalities are well beyond the scope of this paper, it is worth noting that Sordoni et al. (2014) highlighted that “[their] architecture is completely data-driven and can easily be trained end-to-end using unstructured data without requiring human annotation, scripting, or automatic parsing” (emphasis authors’). The ramifications of such technology are simple: social bots can now be trained to converse in an increasingly realistic manner without any human intervention.

Unique Legal Concerns of Social Bots

With these new technologies in mind, it is worth considering the unique challenges that social bots pose, particularly in the legal arena of privacy.

As the distinctions between software and the rules and regulations that govern the physical world become increasingly porous, the relevant jurisprudence spans the technical, philosophical, social and legal aspects of social bots.

In 1999, Lawrence Lessig coined the phrase ‘Code is Law’, arguing that software architecture would “serve as an instrument of social control on those that use it” (Graeff, 2013). This approach has several shortcomings, however. Graeff (2013) argues that it is “too reductionist and deterministic – failing to account for the social embeddedness of technologies. Social bots are not coded to invade privacy or not…”. Furthermore, this approach ignores technologies such as RLMs, where a machine learning algorithm has created its own statistical models independently of human intervention. Finally, Lessig himself cited concerns about Code is Law favouring de facto laws over de jure rights, such as privacy (ibid.).

How, for example, would data be protected via data protection legislation within a social bot framework? Information is currently controlled as discrete data points, such as medical backgrounds or personally-identifiable information (PII). Within the EU, online data is regulated through ‘informed consent’, specifically via Article 5(3) of the ePrivacy Directive, and in the UK via the Privacy and Electronic Communications Regulations.

If social bots are extracting information for entry into a connected system, then this would be covered by existing legislation. Imagine the following conversation:

Person 1: Hello! I have a date tonight! Can you recommend a good French restaurant in Kingston-upon-Thames?

Social bot: Of course, what price range and time are you looking for? Are there any dietary requirements?

Person 1: Somewhere between £100-200 for the both of us is fine, around 8pm? She’s vegetarian but says fish is ok too.

Social bot: Thanks! I have found Restaurant X. It is rated 4.2/5 stars, has a romantic atmosphere, a great wine selection and caters well for vegetarians and fish lovers. Would you like me to make a booking for you?

Person 1: Yeah, sounds good!

Social bot: Can I have your name and phone number please.

Person 1: It’s “Person 1” and my phone number is xxx-xxxx-xxxx.

Social bot: A booking has been made in your name for 7:30pm.

An artificial intelligence algorithm extracting the salient information (here, the cuisine, location, price range, time and dietary requirements) would feed this data into a restaurant reviews web service to provide recommendations. It would then use the user’s PII to make the restaurant booking using a separate web service. The extraction and potential storage of this information would be regulated by the Data Protection Act 1998 in the UK, or the corresponding implementation of the Data Protection Directive elsewhere in the EU.
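The extraction step can be sketched as follows. This is a deliberately simple illustration using hand-written regular expressions; a real social bot would rely on a trained language model, and the slot names, patterns and placeholder telephone number are all hypothetical. What it makes visible is the point at which a conversation turns into discrete, regulated data points.

```python
import re

# Hypothetical slot patterns; a production bot would use a trained language
# model rather than hand-written regular expressions.
SLOT_PATTERNS = {
    "cuisine": re.compile(r"\b(French|Italian|Indian|Thai)\b", re.IGNORECASE),
    "location": re.compile(r"\bin ([A-Z][\w-]+)"),
    "budget": re.compile(r"£(\d+)\s*-\s*£?(\d+)"),
    "time": re.compile(r"\b(\d{1,2}(?::\d{2})?\s?(?:am|pm))\b", re.IGNORECASE),
    "diet": re.compile(r"\b(vegetarian|vegan)\b", re.IGNORECASE),
    "phone": re.compile(r"\b(\d{3}-\d{4}-\d{4})\b"),
}

# Slots that identify a person and so would attract data protection rules.
PERSONAL_SLOTS = {"phone"}


def extract_slots(message: str) -> dict[str, str]:
    """Return every slot whose pattern matches somewhere in the message."""
    slots = {}
    for name, pattern in SLOT_PATTERNS.items():
        match = pattern.search(message)
        if match:
            # Multi-group matches (the budget range) are joined with a hyphen.
            slots[name] = "-".join(match.groups())
    return slots


def personal_data(slots: dict[str, str]) -> dict[str, str]:
    """Keep only the slots flagged as personal data."""
    return {name: value for name, value in slots.items() if name in PERSONAL_SLOTS}


request = extract_slots("Can you recommend a good French restaurant in Kingston-upon-Thames?")
details = extract_slots("Somewhere between £100-200 for the both of us is fine, around 8pm? "
                        "She's vegetarian but says fish is ok too.")
contact = extract_slots("My phone number is 000-0000-0000.")

print(request, details, personal_data(contact))
```

Only the final message yields personal data in this sketch, yet the dietary requirement in the second message concerns a third party who never spoke to the bot.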

Similarly, a social bot could ask a user explicitly for consent before using cookies. Beneficially, users could interact with an appropriately trained social bot to gain greater understanding of the privacy implications. Imagine the above conversation continued:

Social bot: If you want to change the reservation, you can come back and talk to me. Are you ok with me putting a cookie on your device to make it simpler?

Person 1: Sure, but can you tell me your privacy policy first?

Social bot: Of course. Cookies are small pieces of data that sit securely in your browser. We only keep the cookie on your computer for 14 days to help you with your reservation. No personal information is stored on it and it will be automatically deleted after that time.

Person 1: Ok, great.

In existing, complex social networks such as Facebook, social bots can draw on a huge amount of additional data to provide users with a more tailored experience:

Social bot: I have a great recommendation for a French restaurant there. In fact, Dan went there last week and rated it 5/5, although he struggled to find parking at that time of night.

By combining technologies such as big data mining and advanced language generation algorithms such as RLMs, social bots can provide users with a hitherto unforeseen level of personalised interaction, information and capability in a seamless fashion. Graeff (2013) warns that this level of personalisation could lead to a vicious cycle, “wherein your, and your friends’, data are used to produce more intense cases of simulated familiarity, empathy and intimacy, leading to greater data revelations.”

This simulated empathy could give rise to users unknowingly divulging increasingly private information:

Person 1: Hello! I’m celebrating tonight! Can you recommend a good French restaurant in Kingston-upon-Thames?

Social bot: Hey “Person 1”, good to speak to you again – how are you? Are you celebrating anything special? :)

Person 1: I’m great, thanks! Celebrating 100 days cancer-free!

This convergence of simulated empathy and the resultant social trust is a potent and volatile ramification as social bots become anthropomorphic. More worryingly, to effectively simulate empathy, social bots must learn to speak and relate to their interlocutors by learning their language cadence, aspirations and even political views. Social networks seemingly provide a perfect environment to do this – however, in an ironic twist, native machine learning algorithms within these platforms may be hampering efforts to create balanced social bots that can interact with a large percentage of the population.

Breaking the Filter Bubble

In 2009, on a relatively obscure corporate blog, Google announced that it was changing how it surfaced search results. Previously, Google’s crawler bots would trawl the Internet for information before indexing it using its revolutionary PageRank system and if two different people searched for the same term, it would yield the same results. This was no longer to be the case.

Google announced it would use a series of ‘signals’ from each user, such as their location or language, to better determine the relevancy of their searches. Prior to this, Google had also expanded its repertoire of services into other areas such as e-mail, groups, news aggregation and blogging. All these services required a user to log in and provide additional information. If a user was logged in when they executed a search, additional content such as previous search terms and demographic data was also included within these signals. While privacy advocates rightly balked at the mass collection of individuals’ data, many heralded it as a step-change in increasing relevancy, as the incumbent technologies’ ability to cope with the trillions of pages the Internet had grown into had diminished.

This strategic move marked a more tectonic shift within online services toward hyper-personalisation: services looked to cater to fickle audiences to engage them for longer and therefore serve more, and progressively more relevant, advertising to them. Services quickly monetised these personalisation algorithms, with Amazon and Netflix reporting that 35% of sales and 75% of content watched, respectively, came from recommendation systems (Nguyen et al., 2014). Users enjoyed this metaphorical return to a Ptolemaic universe, in which all content and information revolved around them in ever-tightening orbits as they handed over greater quantities of personal data points.

Not everyone was enamoured with this personalisation trend. Eli Pariser was a vocal opponent of what he termed the ‘filter bubble’, described as “a self-reinforcing pattern of narrowing exposure that reduces user creativity, learning and connection.” (Nguyen et al., 2014). Pariser argued that humans’ ability to synthesise and simplify new information is the root of our intelligence and that recommender systems trap users in environments of their own creation. While Amazon and Netflix recommendations are far from rupturing our social cohesion, the introduction of similar algorithmic filters into social network platforms such as Facebook and Twitter may have the potential to.

Facebook experimented with algorithmically filtered data in 2009 with the introduction of a seemingly innocuous feature: the ‘Like’ button. Users ‘liked’ posts, photos and other content that they wanted to give positive feedback on (Kincaid, 2009). This suddenly gave Facebook a huge amount of data on the relative popularity of content. As Facebook’s influence soared, it shifted its newsfeed from a chronological list to an algorithmically filtered one, with an item’s likes and comments serving as the primary signal of its relevancy. Facebook soon matured the technology to incorporate engagement with external links, users’ friends’ behaviours and the brands and external websites consumers engaged with to further hone the content.
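The difference between the two orderings can be shown with a small sketch. The scoring formula below (likes plus weighted comments, decayed by age) is a hypothetical stand-in and is not Facebook's actual ranking algorithm; the posts and weights are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    age_hours: float
    likes: int
    comments: int


def chronological(posts: list[Post]) -> list[Post]:
    """Newest first, regardless of how people reacted."""
    return sorted(posts, key=lambda p: p.age_hours)


def engagement_score(post: Post, comment_weight: float = 2.0,
                     half_life_hours: float = 24.0) -> float:
    """Likes plus weighted comments, halved for every half_life_hours of age."""
    decay = 0.5 ** (post.age_hours / half_life_hours)
    return (post.likes + comment_weight * post.comments) * decay


def by_engagement(posts: list[Post]) -> list[Post]:
    """Highest engagement score first."""
    return sorted(posts, key=engagement_score, reverse=True)


feed = [
    Post("Council budget report", age_hours=1, likes=3, comments=0),
    Post("Celebrity gossip", age_hours=12, likes=400, comments=150),
]

print([p.title for p in chronological(feed)])
print([p.title for p in by_engagement(feed)])
```

Under the engagement ordering, the older but more provocative item displaces the newer one, and every additional like it attracts strengthens its position further.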

This is where the ‘relevancy paradox’ arises: to increase relevancy, filters need large corpuses of data to learn from and as these datasets get larger, increasingly sophisticated filters are required. However, the term ‘relevancy’ is subjective: what is relevant to consumers may not be relevant to the platform within which they are consuming the content. Facebook aims to increase user engagement, prolonging the length of time they spend on the site and therefore commanding a larger proportion of digital advertising revenue. Relevancy for Facebook may purely be content that captures users’ attention and this isn’t necessarily a positive:

Our bodies are programmed to consume fat and sugars because they’re rare in nature…In the same way, we’re biologically programmed to be attentive to things that stimulate: content that is gross, violent or sexual and that gossip which is humiliating, embarrassing or offensive. If we’re not careful, we’re going to develop the psychological equivalent of obesity. We’ll find ourselves consuming content that is least beneficial for ourselves or society as a whole. (danah boyd cited in Pariser, 2012)

Not only can the filter bubble lead to an “intellectual obesity”, it may be having wider sociological effects. Borgesius et al. (2016) expressed concern that “in a democratic society people need to come across opinions that differ from their own opinions, to develop themselves fully. Otherwise, people might enter a spiral of attitudinal reinforcement and drift towards more extreme viewpoints.” Furthermore, as a non-traditional gatekeeper of information, social networks are under none of the regulatory requirements traditional media must follow to enable a balanced and diverse discourse on public policy. Finally, all this gatekeeping happens via vastly complex and entirely opaque algorithms while transparency of information remains another key tenet of the digital democracy.

Social bots are not immune to these filter bubbles either. In March 2016, Microsoft Research released an artificial intelligence bot called “Tay”, aimed at exploring the use of social bots for entertainment purposes on Twitter. Within hours, Tay had descended from a naïve teenage persona into a Holocaust-denying and racist PR nightmare. Here the filter bubble had less to do with algorithmic filtering of content than with the demographic of the particular social network, so while Microsoft blamed the disaster on “a coordinated attack by a subset of people [exploiting] a vulnerability in Tay” (Lee, 2016), it was evident that Microsoft had underestimated the often caustic nature of social networks. The same algorithms that allowed Tay to dynamically learn the nuances of human language were used against it to produce incredibly offensive behaviours.

More generally, a social bot trained within a filter bubble could be inadvertently biased towards a particular social stratum, geopolitical view, educational standard or a whole plethora of cultural niches. If a social bot is required to interact with a wide variety of users, it is important that it is taught from a breadth of sources. However, this problem will most likely be mitigated over time as social bots become more tightly embedded within the platforms themselves. Integrated social bots will be trained on the full breadth of data available to them, rather than being trained externally by authenticating as a human user (for example, the bot’s developer) and thereby being subjected to that user’s filter bubble.

Furthermore, as this technology becomes more sophisticated, social media will be less about learning the syntactic elements of language and more about the semantic: how are social bots expected to learn without the extensive domain knowledge that their human counterparts bring to the conversation?

We have seen how early technologies such as web crawlers have quickly evolved into interconnected and complicated social bots that are capable of learning nuanced, conversational language and are less reminiscent of their malevolent spam and scam-based progenitors. We have also seen how filter bubbles on social media networks hamper users and social bots alike. In the second half of this paper, we will look at how future users, social bots, filter bubbles and privacy will interact. Are social bots a hindrance or a help within social networks’ filter bubbles? Are they in danger of creating their own idiosyncratic bubbles? Do we need a more holistic view on privacy as these technologies mature?

Can Social Bots Break the Bubble?

In May 2016, Facebook was accused of suppressing conservative news items from its ‘Trending Topics’ feature. This feature was a highly prominent, human-curated section of every user’s feed that showcased trending news items occurring around the globe. It was alleged that there was a strong liberal bias and that the section was curated in an entirely subjective manner, rather than the automated fashion that many believed to be the case. While Facebook denied any deliberate prejudice, it introduced new political bias sections within its employee training (King, 2016). Combined with users’ own organic filter bubbles, this level of unconscious bias served as a potent reminder of the dangers powerful social networks can hold. Could social bots alleviate these concerns?

Hugh Hancock, a virtual reality pioneer, developed a Twitter bot aimed at adding “social ‘white noise’ that…reflects all viewpoints rather than just a local echo chamber.” His bot simply chooses a news headline at random, obfuscates the link and tweets it (Hancock, 2015). Entirely random content may not be the best way to break the bubble if the content that is surfaced is so irrelevant it is consistently overlooked by the consumer. Instead, it is feasible to imagine a more sophisticated social bot on Facebook that automatically curates and shares content from a variety of news sources, with Facebook’s EdgeRank algorithm ensuring those articles are not suppressed by engagement indicators (or lack thereof).

Social Bots and Data Bubbles

One final point that must be discussed as social bots gain increasing complexity is whether they are in danger of creating their own unique bubbles. The human brain is an immensely powerful pattern recogniser, but hyper-scale, cloud-based platforms have enabled systems capable of detecting and correlating data points in a way humans simply cannot: so-called “big data”.

While data aggregation can highlight new and emerging trends, it can also reinforce historical attitudes. Google holds an immense data set as well as a very complex, personalised advertising engine and has been subject to numerous controversies: search results for ‘beautiful skin’ return images of mostly white women (Sanghani, 2015), searches for names associated with the black community serve ads for arrest records (Bray, 2013) and women are six times less likely to be served an ad for ‘$200k+’ executive jobs (Gibbs, 2015 and Datta, Tschantz and Datta, 2015).

Of course, Google’s algorithms have no innate racist or sexist tendencies, but instead surface latent societal attitudes. The ‘beautiful skin’ results are an aggregation of millions of highly-ranked websites containing those keywords in conjunction with an image of a white woman. However, the lack of transparency on how this clustering happens means that academic studies of racial or gender bias are similarly vague on the correlation between the hundreds of data points and a seemingly discriminatory result. Datta et al. stated that we should “remain concerned if the cause of the discrimination was an algorithm ran by Google and/or the advertiser automatically determined that males are more likely than females to click on the ads in question. The amoral status of an algorithm does not negate its effect on society” (Datta et al., 2015, emphasis added).

Translate these effects into the realm of social bot conversations and the results could be disastrous. Imagine a scenario whereby a social bot is running on LinkedIn, a business-orientated social network. Recruiters might interact with the bot in order to whittle down thousands of applications for popular positions to a more manageable selection. The social bot uses a machine-learning classifier to understand the more nuanced aspects of the company’s hiring policy by reviewing their hire and no-hire decisions as candidates are proposed. Any social bot designed in this manner would likely be taught to disregard photographs, age, sex, racial and religious data points when building the data models to avoid explicit discrimination. But would this avoid all discrimination?

Recruiter: Could you reduce the applicants for this position down to the best 5 please? We have had our budget cut slightly here too, so may not be able to afford the most experienced candidates.

Social bot: Of course. Out of the 73 applicants, I would recommend Abbie, Barbara, Charlotte, Deborah and Evelyn.

Unbeknownst to the recruiter, the pattern matching algorithm has discovered a trend that experienced applicants with particular job backgrounds who have been out of work for 9-18 months are liable to take a job at a below-market salary. It has discovered a latent trend regarding the post-maternity job market without ever knowing the sex of the candidate. While hypothetical, this scenario highlights the dangers caused by data bubbles.
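The mechanism behind this scenario can be made concrete with a minimal sketch. Everything in it is hypothetical: the hiring history, the gap bands, the applicant names and the function names are invented for illustration and do not describe any real recruitment system. The point it shows is only that a model which never sees the sex of a candidate can still learn to favour a proxy for it, here the length of an employment gap.

```python
# Hypothetical illustration: all data and names below are invented.
from collections import defaultdict

# Each past decision: (months_out_of_work, accepted_below_market_offer).
# Sex is deliberately absent from the training data.
HISTORY = [
    (0, False), (1, False), (2, False), (3, False),
    (10, True), (12, True), (14, True), (16, False),
    (24, False), (30, False),
]


def gap_band(months):
    """Bucket an employment gap (in months) into coarse bands."""
    if months < 9:
        return "under 9"
    if months <= 18:
        return "9-18"
    return "over 18"


def learn_acceptance_rates(history):
    """Estimate, per gap band, how often a below-market offer was accepted."""
    counts = defaultdict(lambda: [0, 0])  # band -> [accepted, total]
    for months, accepted in history:
        band = gap_band(months)
        counts[band][0] += int(accepted)
        counts[band][1] += 1
    return {band: acc / total for band, (acc, total) in counts.items()}


def shortlist(applicants, rates, size):
    """Rank applicants by the learned acceptance rate of their gap band."""
    ranked = sorted(
        applicants,
        key=lambda a: rates[gap_band(a["gap_months"])],
        reverse=True,
    )
    return [a["name"] for a in ranked[:size]]


RATES = learn_acceptance_rates(HISTORY)
APPLICANTS = [
    {"name": "Applicant A", "gap_months": 0},
    {"name": "Applicant B", "gap_months": 12},
    {"name": "Applicant C", "gap_months": 15},
    {"name": "Applicant D", "gap_months": 26},
]
print(shortlist(APPLICANTS, RATES, 2))
```

With this invented history, the two applicants with a 9-18 month gap are ranked first, even though no protected attribute was ever supplied to the model.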

The fundamental issue with these data bubbles is that social bots lack the domain knowledge and ‘self-awareness’ to distinguish between emergent trends or latent knowledge within these vast corpora and bubbles that may be more indicative of historical or undesirable trends that should be ignored or even counteracted (such as the maternity example above). It seems that human intervention will be required to negate the potentially damaging nature of data bubbles. Yet this in turn leads to issues regarding designer-led technology.

An important, emergent facet of self-learning systems is that they democratise technology. Previously, the history of technology was littered with examples of inherent biases from its creators, a prime example being Kodak’s film coatings, which were designed to favour lighter skin until the 1980s (Roth, 2009). Self-learning systems that learn from social media are effectively designed by everyone, or at least everyone who uses a particular social media network. However, as we have seen with Tay, the often acerbic nature of social networks means that social bots can easily descend into the offensive (and potentially illegal). So herein lies another paradox. Humans have created increasingly complex social bots designed to filter through the immense quantities of content they create; however, these same bots lack the human ability for amassing domain context (or ‘common-sense’) and require human intervention to regulate their language-learning and data bubble-induced inadequacies.


This paradox of self-learning but human-regulated social bots is giving rise to the first legal ramifications of this technology: liability. In 2015, Jeffry van der Goot, a Dutch programmer, created a Twitter bot that generated random tweets based upon his own tweet history. Shortly after it went live, the bot tweeted “I seriously want to kill people” and van der Goot was brought in for questioning by the police (Hern, 2015). In 2014, a bot known as the “Random Darknet Shopper” was created by a group of Swiss artists to spend $100 per week on the ‘darknet’ market. By the time authorities got involved, it had purchased numerous illicit items such as 10 ecstasy pills, a false Hungarian passport and a fake Louis Vuitton handbag (Grant, 2015). Similarly, in March 2016, Microsoft’s Tay tweeted numerous statements that could be interpreted as hate-crimes as well as denying the Holocaust, a crime in 14 European countries (Price, 2016).

Van der Goot was asked to shut down the bot and complied. Authorities seized the illegal goods purchased by the Darknet Shopper but did not file criminal charges against the artists: although the artists took responsibility for the bot’s actions, the authorities accepted that it fell within the realm of art (Grant, 2015). The police did not investigate any potential crimes that Tay may have committed. While van der Goot designed the bot with no criminal intent, the Swiss artists will have created theirs knowing the likelihood of the bot committing a criminal act. Tay is an example of a social bot not explicitly designed but instead learning in real time from what others were saying to her.

In the US and UK, criminal laws are often dependent on a certain mental state (or mens rea) in the accused. So a bot explicitly designed to do harm or commit a crime would leave its creators largely liable for its actions. However, how does this extend to bots that are not explicitly designed but learn from the actions of others? In Tay’s example, is it seen as the online social equivalent of ‘out of the mouths of babes’? While generic threats of physical abuse, such as the one tweeted by van der Goot’s bot, are generally harmless, other social bots could produce constant and directed verbal abuse.[3]

The purgatory state of modern social bots has them sufficiently advanced to learn without human intervention but not sufficiently advanced to have the self-awareness or domain knowledge to recognise their own liabilities. In the same way that autonomous car manufacturers are accepting liability for their vehicles’ actions, bot creators may be held liable for the actions of their creations. While autonomous vehicles are grounded in the physical world, bots (currently) inhabit the digital—bringing with it a raft of complications around legal jurisdiction and data privacy.

Autonomous vehicles will be bound by the transportation laws of the nation and state they are operating within. Internet jurisdiction is a considerably more muddled field. This was illustrated in 2013, when French courts ruled Twitter had to hand over personal details of users tweeting hate speech. Twitter argued that its servers were situated in the US where, unlike France, hate speech is legal. While Twitter eventually relented, it illustrates how easy it is for jurisdictional issues to occur. The idiosyncrasies of jurisdiction in cyberspace are well beyond the scope of this paper and are certainly not limited to social bots, but social bots do add further jurisdictional complications dependent on the physical location of the bot’s creator, as well as their ability to interact globally and in many languages simultaneously. For example, the aforementioned US military sock puppetry programme deemed it was unlawful to operate the bots in English, thereby addressing US audiences.

As we have already noted, data privacy and social bots are more intrinsically linked. From information that is gathered during conversations and used to execute transactions (flight bookings, online shopping, recommendation engines etc.), to the content that is used to refine and train the language models, to the PII and data social bots may extract from conversations and social connections to provide a more personalised experience, there remains a raft of legislation aimed at protecting the gathering, storage and dissemination of this information.

What is not explicitly provided for is social trust, or the en masse proliferation of social bots that may require a radically different interpretation of privacy in a modern, digital world.

Social Trust and Data Privacy

As social bots become increasingly anthropomorphic, any interlocutors must be made aware they are talking to a social bot and that information may be used in the plethora of ways highlighted above. As a template, the UK Regulation of Investigatory Powers Act (RIPA) governs the recording of telephone calls and explicitly states that when recordings are made available to a third party, the company must inform the other party that the call will be recorded and how the recording will be used. The Data Protection Act 1998 further stipulates provisions for storing the data. Requiring social bots to declare their automated nature, as well as their data protection policies, could provide consumers with sufficient information for informed consent.

Graeff (2013) counters this by arguing that most web services inform online users via discreet, publicly-posted privacy policies, click-through Terms of Service agreements or passive agreement through continued interaction with the service. He warns that “a platform’s porous privacy policy might extend implicit consent to cover data collection by third-party social bots acting on it—and in so doing indemnify the platform from negligence…Combining these existing forms of information asymmetry with the ‘invisible’ quality of anthropomorphic interfaces significantly compromises the ability of users to be sufficiently informed about how and when their data is being stored and used.”

He explores two potential solutions in the form of Privacy by Design and the Do Not Track movement but discounts them as incomplete, as they ignore the context within which potential privacy violations may occur. Instead, Graeff proposes a grander scheme, asserting that, “[c]ompliance with a legal right to privacy is not just about the technicalities of how data is collected and used but about social processes that value informed consent.” He argues that the current status quo of customer relationship management should be reversed and vendor relationship management (VRM) should become the norm; whereby users have control over their personal data, including the right to correct data on record and restrict the use of the data at any time:

The empowerment of the user over what data is stored and how it is used, with the ability to edit data about you, would be highly advantageous in addressing the problems arising from social bot conversations. Social conversations are fundamentally different from the types of communication that happen when inputting data into forms. Sarcasm, wordplay, and colloquial grammar are hard to discern to non-native speakers of a language, let alone artificial intelligence. With Do Not Track standards based on VRM, users could correct the mistakes on their own, otherwise permanent, online record. Next, the ability to later request that collected data not be used or be destroyed would allow social bot creators to collect data in cases where informed consent is unclear, but later give control over any stored personal data to the user.

This is a grandiose goal, although any successful implementation would be hampered by fundamental differences in how privacy is treated between US and EU jurisdictions. In the EU, data protection and privacy legislation is enshrined within a fundamental right to privacy. No such right exists within the US; courts instead rely on the legal test of a reasonable expectation of privacy (Graeff, 2013).

What seems grandiose now may soon become a necessity as social bots evolve in complexity and robotics advances further. A sophisticated social bot embodied within a bipedal robot, complete with facial and voice recognition and the ability to interact with non-Internet users such as children, medical patients, the elderly and the disabled, would demand a new paradigm in privacy legislation. The current Internet of Things (IoT) trend is a precursor to this: how are consumers able to make an informed decision on the data and privacy implications of smart light bulbs, fitness monitors, smoke alarms, thermostats, fridges, washing machines and the plethora of Internet-connected smart devices?

A prime example is the US Health Insurance Portability and Accountability Act (HIPAA). HIPAA has been used to regulate the collection and storage of sensitive medical information of US citizens and includes provisions to ensure data is appropriately de-identified. Crucially, however, HIPAA regulations only apply to specific entities such as health care providers and insurance companies (FTC, 2015). The mobile devices and, more recently, IoT-enabled wearables that generate much of the same medical data fall outside its scope, contrary to consumer expectations.

To mitigate against this in Europe, the European Commission launched the Alliance for the Internet of Things Innovation (AIOTI) in 2015 in an attempt to foster growth in the market while promoting a pan-European approach to the unique security and privacy pitfalls that it may bring (European Commission, 2016). For example, the European Commission Joint Research Centre has recently published two papers detailing a proposed agent-based design for informed consent in IoT, “where access to personal data is regulated through usage control policies, which can be tailored for the specific features of the user and the context.” (Neisse, Baldini, Steri, Miyake, Kiyomoto and Biswas, 2015 and Neisse, Baldini, Steri and Mahieu, 2016).

Any legislation that is born from regulating informed consent around the collection of consumer data from IoT-enabled devices will almost certainly impact upon similar data collected via social bots, whether they are embodied within robotics or are purely digital.

While this approach goes some way to address concerns around true informed consent in a modern, complex digital world, it does little to mitigate the primary concern around data ownership. Both traditional customer and vendor relationship management views of privacy imply datasets of personal information that need to be regulated and audited. There is a growing trend around the disintermediation of industries via technologies such as blockchain. Blockchain is the underlying technology behind Bitcoin, the first widely-recognised cryptocurrency. Blockchain is a much broader technology, allowing any transaction or similar electronic contract to be encoded and sent between different parties in a trusted and publicly auditable fashion, without a plethora of personal information being stored in an opaque manner by any party. Blockchain-based technologies also distribute their data across their networks, allowing anyone to verify and audit the content. This same technology could be deployed to decentralise privacy and protect personal data, allowing users ex post facto control over their information. A blockchain-based privacy mechanism would also benefit the industry, as the research by Zyskind, Nathan and Pentland (2015) concluded (emphasis added):

Users are not required to trust any third-party and are always aware of the data that is being collected about them and how it is used. In addition, the blockchain recognizes the users as the owners of their personal data. Companies, in turn, can focus on utilizing data without being overly concerned about properly securing and compartmentalizing them.

Furthermore, with a decentralized platform, making legal and regulatory decisions about collecting, storing and sharing sensitive data should be simpler. Moreover, laws and regulations could be programmed into the blockchain itself, so that they are enforced automatically. In other situations, the ledger can act as legal evidence for accessing (or storing) data, since it is (computationally) tamper-proof.
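The tamper-evidence property that the quoted passage relies on can be illustrated with a minimal sketch of a hash-chained log of consent decisions. This is an assumption-laden simplification and not the protocol described by Zyskind, Nathan and Pentland: it runs on a single machine, has no consensus mechanism or distribution, and the record fields (`user`, `service`, `action`, `scope`) and function names are hypothetical. It shows only why altering an earlier entry becomes detectable once each entry commits to the hash of the one before it.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, record):
    """Hash a record together with the hash of the previous entry."""
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(ledger, record):
    """Append a record, chaining it to the current tail of the ledger."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({
        "record": record,
        "prev": prev_hash,
        "hash": entry_hash(prev_hash, record),
    })


def verify(ledger):
    """Recompute every hash; any edited record breaks the chain."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical consent history for one user and one bot.
ledger = []
append(ledger, {"user": "alice", "service": "example-bot",
                "action": "grant", "scope": "location"})
append(ledger, {"user": "alice", "service": "example-bot",
                "action": "revoke", "scope": "location"})

# Simulate a party quietly widening the scope of the first grant.
tampered = [dict(entry) for entry in ledger]
tampered[0]["record"] = dict(tampered[0]["record"], scope="contacts")

print(verify(ledger))
print(verify(tampered))
```

In a real blockchain the same check is performed independently by many participants, which is what removes the need to trust any single record keeper; the sketch above omits that part entirely.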

Blockchain remains an incredibly nascent field and its applicability to areas beyond cryptocurrency is exciting. New research is even showing how blockchain technologies can be applied to data-intensive fields such as the recommender systems that early filtering algorithms and modern social bots are so reliant on (Frey, Wörner and Ilic, 2016).

Regulating Social Bots

Despite these promising advances in technology, it is evident that the legal ramifications of social bots are a small part of a larger discussion around data privacy in the modern age. Millions of users will interact with social bots, billions of internet-connected devices will feed data into vast cloud-based systems and social media networks will gather trillions of data points on the vagaries of their users.


Filter bubbles and their effect on netizens’ ability to actively participate in the democratic process should be of eminent concern to regulators globally. The regulation of press, broadcasting and other forms of content dissemination such as user-generated content (Facebook, YouTube, blogs etc.) remains a contentious issue, particularly amongst free speech advocates.

Online trends have seen a rapid rise in the popularity of opinion pieces, activist content, and services such as Vice, Buzzfeed and certain YouTube channels. A social media presence allows individual journalists to rise above their organisation and become the key driver of trust, engagement and loyalty. Despite this argument from commentators, objective reporting and impartial news remain the firm favourites with consumers (Sambrook, 2014). So while social media focus remains on increasing engagement and maximising the impact of content, consumers still hold impartial content in higher regard. As our technological maturity grows, consumers will expect social media to provide quality over impact, while social media networks will be under continued pressure to increase engagement and impact to maximise the monetisation of their user bases. For example, in August 2016, Facebook announced it had developed technologies to automatically suppress ‘clickbait’ news articles (Peysakhovich and Hendrix, 2016).

Social bots and the machine learning algorithms that underpin these sites could provide an effective method of ensuring quality content, giving users access to impartial (or even contradictory) content without necessarily popping the bubble that filters the content within online users’ feeds. A social bot would provide a window within which content could be shown, increasing the serendipitous discovery that Pariser states is so important to the democratic process.

Implementation of such social bots must be cognisant of the ‘cobra effect’—the unintended consequences of a solution making the original problem worse.[4]

Any algorithm designed to create an ‘anti-bubble’ may be subject to the same restrictions; namely, the data bubble and only surfacing counter-pieces rather than true serendipitous content. For example, Ofcom research showed that highly educated consumers were nearly twice as likely to prefer impartial news as those with limited or no education (Sambrook, 2014). Would an anti-bubble social bot be more liable to display news items with impartiality or a strong contradictory view based upon such education trends? The risk remains that anti-bubble algorithms simply create recursive bubbles and further polarise content.

Given the volatile mix of varying press regulations, cultural and socio-economic differences and a need to enable technological innovation, creating any form of regulation requiring social media networks to provide impartiality and break their filter bubbles would be counterintuitive. Consumers, pressure groups and regulators should apply pressure on technology providers to transparently develop such solutions and subsequently self-regulate. This is analogous to the ‘transparency reporting’ many technology providers voluntarily release to educate consumers on the number of government and law enforcement requests for information they receive each year.

Mitigating the other legal issues that social bots will proliferate will require a different, more legislative approach. The issues can be broadly summarised as social trust, data collection and intentionality.

Social Trust

As social bots rise out of the uncanny valley of artificial intelligence, there is a stronger chance they will carry greater elements of social trust with whomever they interact. It will become extremely important that any interlocutors are acutely aware they are talking to a social bot, in order to make an informed decision on their subsequent conversation, as well as understanding how their data may be used.

We have already seen how easily social trust is created in a conversational context, as well as existing regulation that provides a potential framework for mandating informed consumer consent when deploying a social bot.

Data Collection

Likewise, data collection itself is already well regulated but corollary legislation is needed to modernise and bring it in line with technological advances such as data analytics, the Internet of Things and social bots as well as addressing the consequences of consumers’ uninformed or passive consent to privacy policies and terms of service.

In late April 2016, the European Union adopted the General Data Protection Regulation (GDPR). The GDPR attempts to address many of the shortcomings of the 1995 data protection directive. Firstly, as a regulation (as opposed to its predecessor, a directive), it will apply to all EU member states by May 2018 without any local implementation. It also includes several provisions to harmonise and update data protection legislation, such as expanding personal data to include online identifiers like cookies or unique device identifiers and defining consent as follows (Regulation 2016/679 OJ L119/34, emphasis added):

(11) ‘consent’ of the data subject means any freely given, specific, informed and unambiguous indication of the data subject’s wishes by which he or she, by a statement or by a clear affirmative action, signifies agreement to the processing of personal data relating to him or her;

Finally (and controversially), the GDPR applies extraterritorially. While previous legislation focused on data controllers or entities that used equipment based within an EU member state, the GDPR applies to any controller within an EU member state, as well as to any controller offering goods and services to, or monitoring the behaviour of, EU residents.

While it is clear that ‘monitoring behaviour’ is aimed at the online advertising industry, it is plausible that sufficiently complex social bots would fall under the remit of the GDPR. Under the GDPR, social bots that engaged with EU residents would have requirements to ensure consumers provided their unambiguous consent and provided facilities to amend or delete any personal data associated with them, regardless of the location of the bot or social network.

While US legislators are not enacting similar regulation, the combination of GDPR and the newly ratified Privacy Shield means many technology providers will be encouraged to produce systems that conform to the highest privacy requirements.


Intent will remain the most complicated form of jurisprudence required. As our technologies mature, the philosophical foundation of all legal systems will be stretched. Complicated algorithms such as the one that caused the 2010 Flash Crash are already subject to conjecture around the intent of their creation (Stafford and Chon, 2015). Social bots such as the Random Darknet Shopper were created with the intent of buying products on a market flooded with illicit goods but there was no explicit intent by its creators to acquire any of them. Jurisprudence and transhumanism collide as technology advances further and we find ourselves needing to understand whether a social bot could have its own sense of intentionality. Whilst interpreting the intent of a social bot itself when applying the law could be seen as overly anthropomorphic, the inherent ambiguities with intentionality and the march of technology toward near-sentience and autonomy make the subsequent assignment of liability and other legal ramifications of mens rea even harder.

Graeff (2013) proposes that the precautionary principle is invoked in any social bot regulation. He recognises that, “[a]lthough social bots may not seem to invoke the precautionary principle in the same way an environmental health risk might, the unique risks they pose to personal privacy may dramatically increase the harms that follow as a result of existing under-regulation of online privacy protection.”

He proposes this approach in an attempt to counter the potential for reactionary legislation pushed through hastily:

If a moral panic is created…it may lead to over-regulation by Congress, or rushed and poorly considered legislation, which in the case of social bots might mandate too many or the wrong kind of standards on industry and jeopardize coordination and compliance, interoperability, or innovation.

While his reasoning is correct, the call for invoking the precautionary principle seems counter-intuitive. While public or environmental harm is a more absolute measurement, the very definition of privacy remains contentious and differs greatly between national borders.

These differences were acutely highlighted in Vidal-Hall v. Google Inc ([2014] EWHC 13 (QB)), where the plaintiff took issue with Google circumventing a browser’s default cookie settings and using the data subsequently collected for targeted advertising. Google argued that, as there was no pecuniary loss to the plaintiff, there was no case to answer for misusing private information. The UK Court of Appeal dismissed Google’s argument that misuse of private information was not a tort and classified browser-generated information as personal data.

Google initially appealed to the Supreme Court to rule on whether it was liable for damages when a plaintiff had suffered no financial impact but the case was settled out of court in July 2016 (Smith, 2015). Google settled a similar action in the US by paying a $22.5 million fine without admission of liability (FTC, 2012), so while the UK case was orientated toward the misuse of information, its American counterpart focused on Google’s misrepresentation of its privacy policies and advice to consumers.

Notably, it is the very subjectivity that surrounds the definition of ‘harm’ in relation to technology that would yield the exact outcome that Graeff wanted to avoid! If such definitions are malleable, they are subject to popular opinion and are liable to produce overly-cautious and poorly considered regulation. The precautionary principled approach would not mitigate the existing under-legislation within the US and would stifle innovation across the technology sector globally.

An alternative could lie within the proactionary principle:

People’s freedom to innovate technologically is highly valuable, even critical, to humanity. This implies a range of responsibilities for those considering whether and how to develop, deploy, or restrict new technologies. Assess risks and opportunities using an objective, open, and comprehensive, yet simple decision process based on science rather than collective emotional reactions. Account for the costs of restrictions and lost opportunities as fully as direct effects. Favor measures that are proportionate to the probability and magnitude of impacts, and that have the highest payoff relative to their costs. Give a high priority to people’s freedom to learn, innovate, and advance. (More, 2005)

The brainchild of Max More, the proactionary principle was explicitly created to provide an alternative to the precautionary principle with a focus on enabling technological innovation to proceed. Both the precautionary and proactionary principles frame relevant actors as caricatures; either Luddites as the protectors of the common person with innovators blindly pursuing profit, or as scientists driving humanity forward in the face of needless risk-aversion.

Rather than using principles to dictate technology policy, the legislation of intelligent social bots, their intentionality and subsequent jurisprudence may fall within Jasanoff’s “technologies of humility” (2007):

We need disciplined methods to accommodate the partiality of scientific knowledge and to act under irredeemable uncertainty. Let us call these the technologies of humility. These technologies compel us to reflect on the sources of ambiguity, indeterminacy and complexity. Humility instructs us to think harder about how to reframe problems so that their ethical dimensions are brought to light, which new facts to seek and when to resist asking science for clarification. Humility directs us to alleviate known causes of people’s vulnerability to harm, to pay attention to the distribution of risks and benefits, and to reflect on the social factors that promote or discourage learning.

As such, Jasanoff is encouraging policy makers to know when to call for further research with other proponents advocating the flipside, urging philosophers and science policy analysts to know when to call for principled decision making and legislation (Holbrook and Briggle, 2014).

It is difficult to conclude that there can be no conclusion; that there is no definitive roadmap in understanding how the technologies that are evolving from social bots will change our understanding of intentionality, liability and the wider legal implications. In an age of scientific and technological enlightenment, we are creating new intellectual ambiguities as the technologies we built upon binary logic and digital decision-making transcend their deterministic roots. An early-sighted and multidisciplinary approach to technology policy will address Graeff’s concerns regarding ill-considered legislation and ensure that a holistic approach is taken in formulating policy and regulation.


The rise of social bots has had a chequered and meandering past: starting from disconnected and ‘dumb’ bots such as ELIZA, they merged with the technologies that arose from web crawlers and financial trading algorithms, before evolving into complex, conversational social bots capable of learning their own syntax and colloquial ways of interlocution. Parallel to this, the explosive growth of social media has led to concerns around filter bubbles and their effects on consumers.

Social bots provide a robust solution in breaking our filter bubbles, as they can consume and proliferate content that is either truly random (thus negating the filter bubble effect directly) or generated by the same algorithms that filter the content, producing an ‘anti-bubble’ that facilitates the breadth of content required to enable democracy and the serendipitous discovery of varying opinions. More worrying is the potential for ‘data bubbles’: trends and latent issues present within data that have the potential to cause great harm without the domain knowledge required to ignore or counter them. Algorithmic amorality does not negate social impact.

The very same technology that has driven us to be well-informed on other people’s lives has obfuscated our ability to understand the data collected from our own! As technologies such as social media and bots proliferate, the social trust we form during conversation means our propensity to hand over precious data unwittingly will increase exponentially without measures put in place to ensure consumers are well informed.

While Graeff (2013) proposed a combination of shifting toward vendor relationship management and a regulatory approach based upon the precautionary principle, this paper has taken a somewhat broader and perhaps more pragmatic approach. Effective VRM is still a Herculean effort in data processing and management, despite the consumer having a greater role in the ex post facto control of that data. Looking more holistically at the wider technology trends, from the Internet of Things to the exciting developments in a truly decentralised privacy model utilising blockchain technologies, the governance and oversight that must be applied to how social bots gather, store, learn from and aggregate data will be part of larger regulatory frameworks, as well as the democratisation movements that blockchain and similar technologies will enable.

The EU’s recent ratification of the GDPR is a strong first step in modernising data privacy legislation. Rights to access, amend and delete data, extraterritorial applicability and a stronger form of informed consent are already providing the foundations for addressing the privacy implications of new technology, as we have seen with the research into IoT-centric informed consent.

Yet neither regulation nor technology can prepare us for the questions that will start to arise as social bot technology continues to evolve into artificial intelligence or as the Internet of Things moves toward autonomous robotics. Basing technology policy on ‘principles’ runs the risk of caricaturing public pressures, leading to a societal myopia. Instead, utilising Jasanoff’s wholly ungratifying but braver tactic toward these “technologies of humility” requires a multi-disciplinary approach to the questions that will surface.

Nietzsche once said, “Knowledge kills action; action requires the veils of illusion”. This may never ring as true as when we start to deal with the implications of artificial intelligence.

Works Cited

Aiello, L. M., Deplano, M., Schifanella, R. and Ruffo, G. (2012). “People are strange when you’re a Stranger: Impact and Influence of Bots on Social Networks”. In Proc. 6th AAAI International Conference on Weblogs and Social Media. AAAI, 10–17.

Borgesius, F., Trilling, D., Möller, J., Bodó, B., de Vreese, C. and Helberger, N. (2016). “Should we worry about filter bubbles”. Internet Policy Review. Vol. 5(1).

Datta, A., Tschantz, M. and Datta, A. (2015). “Automated Experiments on Ad Privacy Settings”. Proceedings on Privacy Enhancing Technologies 2015; 2015(1):92-112.

European Commission. (2016). “The Alliance for Internet of Things Innovation (AIOTI)”. Published 5th April, 2016. Available at: Last accessed 13th August, 2016.

Farooq, O., Khan, S. and Khalid, S. (2014). “Financial Ethics: A Review of 2010 Flash Crash”. International Journal of Social, Behavioural, Educational, Economic, Business and Industrial Engineering. Vol. 8, No. 6.

Fielding, N. and Cobain, I. (2011). “Revealed: US spy operation that manipulates social media”. The Guardian. Published 17th March, 2011. Available at: Last accessed 12th August, 2016.

Fritch, D. (2005). “Click here for Lawsuit – Trespass to Chattels in Cyberspace”. Journal of Technology Law & Policy. Vol. 9. pp.32-63.

FTC. (2012). “Google Will Pay $22.5 Million to Settle FTC Charges it Misrepresented Privacy Assurances to Users of Apple’s Safari Internet Browser”. FTC Press Release. Published 9th August, 2012. Available at: Last accessed 7th August, 2016.

FTC. (2015). “Internet of Things: Privacy & Security in a Connected World”. FTC Staff Report.

Google. (2016). “Robots meta tag and X-Robots-Tag HTTP header specifications”. Available at: Last accessed 12th August, 2016.

Graeff, E. (2013). “What We Should Do Before the Social Bots Take Over: Online Privacy Protection and the Political Economy of Our Near Future”. Media in Transition 8.

Grant, K. (2015). “Random Darknet Shopper: Exhibition featuring automated dark web purchases opens in London”. The Independent. Published 11th December, 2015. Available at: Last accessed: 9th August, 2016.

Ferrara, E., Varol, O., Davis, C., Menczer, F. and Flammini, A. (2016). “The Rise of Social Bots”. Communications of the ACM. 59(7). pp.96-104.

Finger, L. (2015). “Do Evil – The Business of Social Media Bots”. Forbes. Published 17th February, 2015. Available at: Last accessed 12th August, 2016.

Frey, R., Wörner, D. and Ilic, A. (2016). “Collaborative Filtering on the Blockchain: A Secure Recommender System for e-Commerce”. 22nd Americas Conference on Information Systems (AMCIS).

Fritch, D. (2004). “Click here for lawsuit – Trespass to Chattels in Cyberspace”. Journal of Technology Law and Policy. Vol 9. pp.31-63.

Gupta, A., Lamba, H. and Kumaraguru, P. (2013). “$1.00 per rt #bostonmarathon #prayforboston: Analyzing fake content on Twitter”. In: Proc. Eighth IEEE APWG eCrime Research Summit (eCRS), p.12. IEEE.

Hern, A. (2015). “Randomly generated tweet by bot prompts investigation by Dutch police”. The Guardian. Published 12th February, 2015. Available at: Last accessed: 9th August, 2016.

Jasanoff, S. (2007). “Technologies of Humility”. Nature. 450 (7166): 33–33.

Kincaid, J. (2009). “Facebook Activates ‘Like’ Button; FriendFeed Tires of Sincere Flattery.” TechCrunch. Published 9th February, 2009. Available at: Last accessed: 9th August, 2016.

King, H. (2016). “Facebook will require political bias training for employees”. CNN. Published 23rd June, 2016. Available at: Last accessed 9th August, 2016.

Lee, P. (2016). “Learning from Tay’s Introduction”. Official Microsoft Blog. Published 25th March, 2016. Available at: Last accessed 17th July, 2016.

More, M. (2005). “The Proactionary Principle: v1.2”. Available at: Last accessed 7th August, 2016.

Neisse, R., Baldini, G., Steri, G., Miyake, Y., Kiyomoto, S. and Biswas, A. (2015). “An agent-based framework for Informed Consent in the internet of things”. Internet of Things (WF-IoT), 2015 IEEE 2nd World Forum on, Milan, 2015, pp. 789-794.

Neisse, R., Baldini, G., Steri, G. and Mahieu, V. (2016). “Informed Consent in Internet of Things: the Case Study of Cooperative Intelligent Transport Systems”. 2016 23rd International Conference on Telecommunications (ICT).

Nguyen, T., Hui, P-M., Harper, F., Terveen, L. and Konstan, J. A. (2014). “Exploring the Filter Bubble: The Effect of Using Recommender Systems on Content Diversity.” WWW’14, April 7–11, Seoul, Korea.

Orcutt, M. (2012). “Twitter Mischief Plagues Mexico’s Election”. Technology Review. Published 21st June, 2012. Available at: Last accessed: 19th June, 2016.

Pariser, E. (2012). “The Filter Bubble: What The Internet Is Hiding from You”. New York: Penguin Press.

Peysakhovich, A. and Hendrix, K. (2016). “News Feed FYI: Further Reducing Clickbait in Feed”. Facebook Newsroom. Published 4th August, 2016. Available at: Last accessed 7th August, 2016.

Plitch, P. (2002). “Are Bots Legal?” The Wall Street Journal. Published September 16, 2002. Available at: Last accessed 12th June, 2016.

Price, R. (2016). “Microsoft is deleting its AI chatbot’s incredibly racist tweets”. Business Insider UK. Published 24th March, 2016. Available at: Last accessed 9th August, 2016.

Quilter, L. (2002). “The Continuing Expansion of Cyberspace Trespass to Chattels”. Berkeley Technology Law Journal. Vol.17. pp.421-443.

Ritter, A., Cherry, C. and Dolan, W. (2011). “Data-Driven Response Generation in Social Media”. Empirical Methods in Natural Language Processing (EMNLP).

Roth, L. (2009). “Looking at Shirley, the Ultimate Norm: Colour Balance, Image Technologies, and Cognitive Equity”. Canadian Journal of Communication. Vol 34. pp. 111-136.

Sambrook, R. (2014). “Objectivity and Impartiality for Digital News”. Digital News Report 2014. Reuters Institute for the Study of Journalism. Available at: Last accessed 13th August, 2016.

Sordoni, A., Galley, M., Auli, M., Brockett, C., Ji, Y., Mitchell, M., Nie, J., Gao, J. and Dolan, B. (2015). “A neural network approach to context-sensitive generation of conversational responses”. Human Language Technologies: The 2015 Annual Conference of the North American Chapter of ACL. pp.196-205.

Smith, P. (2015). “Vidal-Hall v Google Goes to the Supreme Court”. Carter-Ruck Blog. Published 12 August, 2015. Updated 1st July, 2016. Available at: Last accessed 7th August, 2016.

Stafford, P. and Chon, G. (2015). “UK trader arrest over 2010 flash crash”. Financial Times. Published 22nd April, 2015. Available at: Last accessed 13th August, 2016.

Zyskind, G., Nathan, O. and Pentland, A. (2015). “Decentralizing Privacy: Using Blockchain to Protect Personal Data”. 2015 IEEE CS Security and Privacy Workshops. (SPW), 180–184. New York, NY: IEEE.

[1] For purposes of clarity, this paper will refer to modern bots capable of conversation as ‘social bots’ and all other automated or ‘artificial intelligence’ agents that operate on social networks will be referenced as simply ‘bots’.

[2] Websites can use several methods to tell bots not to index the site, including a “robots.txt” file containing relevant instructions for web crawlers or (on more modern sites) a META robots tag (Google, 2016).

[3] As an interesting aside, Van der Goot’s bot was actually engaged in a conversation with another bot when it made its ‘threat’. Van der Goot said that, “people [seemed] to be under the misconception that the death threat was aimed AT the other bot, which [was] not the case.” (Hern, 2015).

[4] The name derives from colonial India, where the British attempted to reduce the cobra population in Delhi by offering a bounty. Enterprising citizens started breeding cobras for the bounty, leading to the programme being scrapped and the surplus snakes released, increasing the overall cobra population.

3 thoughts on “Chatbots, social media and the law”

  1. I am currently carrying out a project as part of my coursework on the impact and opportunities presented by the adoption of chatbot technologies for Irish businesses operating across various selected industries (mainly customer service). The impact will be assessed mostly in terms of opportunity, while including, to a lesser extent, a discussion of the threats posed by the adoption of this technology. I found your article very interesting, and am just curious as to what you believe are the main threats here for Irish businesses, specifically in terms of non-compliance with the GDPR?

    1. Hi Holly,

      Thanks. I think chatbots and GDPR are going to be a particular pain-point that will need clarity from Article 29 or a similar body. The concern that I see is that the data that is held could be both explicit and implicit; i.e., a chatbot could be parsing information that is then stored in a database (my preferences/PII etc.) or it could be used for a more general purpose such as better language recognition and modelling. Either way, as chatbots increasingly act more “human” (NOT like a transactional database), the line will get extremely blurred. Even trying to define data controllers and data processors…it will get very messy.

      The other big concern that I have is around “informed consent” as I outline above. I think the current approach that many companies take for the EU Cookie legislation (tacit approval) will be woefully insufficient for chatbots. Similarly, reading a privacy policy or T&Cs will just lead to uninformed consumers. Ensuring that companies are GDPR-compliant when chatbots advance will mean companies will need to follow the “spirit” of the legislation, not just the technicalities.

      Sorry, absolutely NO answers there for you — but happy to talk further. 🙂

  2. Very well written – bots will be the primary interaction channel of the future and hold a lot of promise against current and traditional formats. The challenge is to make the interaction design more intuitive, demystify the tech / engineering side to bring it into wider acceptance by displaying value, and increase utilization for business cases. The 24×7 self-service and intelligent nature of the format is already being used for customer service, conversational commerce, health tech and education. We have started on the journey at Engati, do visit us to give us feedback on – you can also read our collection of blogs on
