It is now impossible to ignore the impact of artificial intelligence (AI) and large language models (LLMs) as they have transformed, and are continuing to transform, industries at an unprecedented pace. 

The fast rate of innovation and adoption of technologies utilising AI and LLMs brings significant challenges, particularly in the realm of intellectual property.

This article addresses some of the key questions surrounding AI and intellectual property rights and, in particular, how copyright law in the United Kingdom can be a useful tool for content creators and owners who need to protect their works and enforce their rights against potential infringements by AI systems.

How do AI and LLMs work?
Many AI systems operate using LLMs, which function by processing large amounts of data in order to identify rules, patterns and structures in specific datasets. The objective of most AI systems designed for human interaction is to generate outputs that mimic human intelligence. Processing data in this way is a fundamental stage in the training of most (if not all) AI systems. LLMs, like the ones behind OpenAI's ChatGPT (such as GPT-4) and those that will follow, are trained on datasets. Without datasets, LLMs would have no data to train on and, therefore, would not be able to identify the rules, patterns and structures that are fundamental to generating accurate and meaningful outputs (not limited to text). All LLMs therefore need access to data as a foundation for the model to start from and build upon.

Where does the data in the datasets come from? This goes to the core of the legal issue. Datasets comprise text available from the internet, articles, forum content, electronic books and magazines, and any other digital content. Datasets are often compiled by ‘scraping’ vast amounts of this digital content from online sources, which is then made available to the LLMs for extensive training. Once trained, the AI system will generate output text that responds to prompts (in the form of questions from users) in a manner that mimics human intelligence.
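
To illustrate the idea of a model learning patterns from a dataset, the following is a deliberately simplified sketch and not a description of how any commercial LLM is built. It assumes a toy "bigram" model that only records which word follows which in its training text; the function names, the two-sentence corpus and the fixed random seed are all hypothetical. Even at this scale, it shows why generated output can stay closely connected to the source material: the model can only emit word sequences it has seen in its dataset.

```python
import random
from collections import defaultdict


def train_bigram_model(corpus):
    """Record, for each word, the words that follow it in the training texts."""
    model = defaultdict(list)
    for text in corpus:
        words = text.split()
        for current, following in zip(words, words[1:]):
            model[current].append(following)
    return dict(model)


def generate(model, start, max_words=10, seed=0):
    """Build output by repeatedly choosing a word seen after the current one."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same text
    output = [start]
    while len(output) < max_words and output[-1] in model:
        output.append(rng.choice(model[output[-1]]))
    return " ".join(output)


# Hypothetical two-sentence "dataset" standing in for scraped content.
corpus = [
    "copyright protects original literary works",
    "copyright owners may license their works",
]
model = train_bigram_model(corpus)
print(generate(model, "copyright"))
```

Every pair of adjacent words in the printed output appears somewhere in the training sentences, which is the (much simplified) sense in which output is a recombination of what the model has ‘learnt’.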


What is the UK law on copyright?
The law in the United Kingdom (predominantly under the Copyright, Designs and Patents Act 1988) grants creators and/or authors of certain types of works exclusive rights over those works, such that only owners (or those they authorise) have the legal right to reproduce, publish, distribute and adapt them (amongst other acts). Any such act carried out by an unauthorised or unlicensed party would (subject to certain defences) amount to infringement and give rise to a claim.

Where is the risk of copyright infringement?
Operating an AI system or being involved in compiling LLMs attracts a certain amount of litigation risk. With the emergence of more litigation and judgments from UK courts adjudicating on the interplay between AI and intellectual property rights, this is an area that should be monitored closely. 

In particular, the operation of an AI system’s LLM raises the following legal issues:

  • Scraping: If the data collection stage involved scraping any online content without explicit permission from the copyright holder, this could amount to an infringement of copyright, namely the unauthorised act of reproduction, distribution and/or adaptation. These, as explained above, are exclusive rights afforded only to the copyright owner, or a licensed party.
  • Unauthorised reproduction: Users interact with an AI system by providing it with prompts. The LLM responds by generating output text that reconstructs data it has previously ‘learnt’, recompiled in a manner that attempts to answer the user’s prompt and in a style that may retain some degree of connection to the original source in the dataset. In the early stages of AI training, output responses may have been identical to large parts of the original data. As datasets have grown, the output may have changed or developed, but it could still arguably amount to an adaptation or a derivative work, which could potentially constitute copyright infringement.
  • Distribution of infringing content: Once the output work has been generated (whether text or graphics), it is often distributed, which further aggravates the infringement, especially if the generated work is used commercially without obtaining a licence and/or giving credit to the true owner/author. This is compounded by the fact that the AI system may re-scrape the AI-generated work as part of the LLM’s continuous learning, resulting in further adaptations and derivatives such that, potentially, any association with the original content becomes impossible to trace. Copyright owners then get little to no value from, and/or incentive for, creating the original work.

How can businesses protect content and data?
With this rapid change in technology and the emergence of generative AI, the value of digital content is likely to change. Understandably, copyright and content owners need to be aware that if content is available publicly, it is more than probable that it has been, or may be, used by AI systems to train LLMs. There are certain steps copyright and content owners can take to protect their intellectual property from potential AI-related infringements.

  • Police and detect: Use digital tools to ringfence your content and to monitor for unauthorised use of copyright-protected works. If unauthorised use is detected, it would be prudent to issue a Letter Before Action or a Cease and Desist letter to put the infringer on notice of the infringement.
  • Terms of use: Ensure that your Terms of Use expressly prohibit the use of any digital content as part of any AI or machine learning process.
  • Commercialise the intellectual property right: If appropriate, put in place a licence granting third parties the right to access, use and/or reproduce copyright-protected material in return for a licence fee, either for a limited period of time or for as long as the LLM is in use.
  • Police and protect: If required, be prepared to enforce your intellectual property rights by issuing a Letter Before Action or a Cease and Desist letter and, where necessary, by bringing court proceedings to obtain an injunction, damages or an order for a proportion of the infringers’ profits.
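
As an illustration of the kind of digital tool that could support the monitoring step above, the following is a minimal sketch that measures how much of a suspect text repeats word sequences from an original work. It is an assumption-laden example only: the function names, the five-word window and the sample texts are hypothetical, and a high score is merely a prompt for further review, not a legal test of infringement.

```python
def word_ngrams(text, n):
    """Return the set of n-word sequences in the text, ignoring case."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(original, candidate, n=5):
    """Share of the candidate's n-word sequences that also occur in the original."""
    candidate_ngrams = word_ngrams(candidate, n)
    if not candidate_ngrams:
        return 0.0  # candidate is shorter than n words, so nothing to compare
    shared = candidate_ngrams & word_ngrams(original, n)
    return len(shared) / len(candidate_ngrams)


# Hypothetical texts: an owner's original passage and a suspect output.
original = "the court held that the defendant had copied a substantial part of the work"
suspect = "it was reported that the defendant had copied a substantial part of the article"
print(f"Overlap: {overlap_ratio(original, suspect):.0%}")
```

A check of this kind only catches near-verbatim reuse. Output that has been paraphrased or adapted would need other forms of comparison, and in all cases a human assessment.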

Protecting intellectual property has never been more crucial. As machines get faster and gain more processing capability, the potential risks and legal implications relating to the creation and use of AI are going to be integral to (i) the operation of any business; and (ii) the creators and owners of content.