Deep learning model trained on vast text
A Large Language Model (LLM) is an artificial intelligence system based on deep neural networks and trained on vast amounts of text data to understand, generate, and manipulate human language. No universally agreed definition exists for what constitutes "large" in this context, though LLMs typically have at least one billion parameters, with leading models containing hundreds of billions or even trillions of parameters.
LLMs represent a subset of foundation models specifically designed for natural language processing tasks. They are built using transformer neural network architecture, which enables them to process entire sequences of text in parallel rather than sequentially, allowing for more efficient training and superior performance on language-related tasks.
The term gained prominence following the release of models such as OpenAI's GPT series, with ChatGPT bringing LLMs to widespread public attention in 2022. Modern LLMs demonstrate capabilities including text generation, language translation, question answering, summarisation, and increasingly, multimodal tasks involving images, audio, and other data types.
LLMs are constructed using transformer neural networks. The original transformer architecture comprises encoder and decoder components built around self-attention mechanisms, although many modern LLMs use decoder-only variants. Self-attention allows the model to capture relationships between words and phrases across entire documents, regardless of their distance from each other in the text.
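The following sketch illustrates the scaled dot-product self-attention operation described above using Python and NumPy rather than any production framework; the sequence length, dimensions, and randomly initialised weight matrices are purely illustrative assumptions, not the weights of any actual model.

```python
# Illustrative sketch of scaled dot-product self-attention (toy dimensions,
# random weights); not the implementation of any particular LLM.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over all positions
    return weights @ V                               # each token attends to every other

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8                      # arbitrary toy sizes
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one contextualised vector per token
```

Because every token attends to every other token in a single matrix operation, the whole sequence is processed in parallel, which is the property that makes transformer training efficient at scale.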
The key distinguishing feature of LLMs is their scale, both in terms of parameters (the weights and connections within the neural network) and training data (typically terabytes of text, comprising trillions of tokens, drawn from diverse sources including web pages, books, and other textual materials). Parameters function as a knowledge repository, with larger parameter counts generally correlating with enhanced capabilities and performance.
LLMs utilise various training methodologies including self-supervised learning on unlabelled text data (typically predicting the next token in a sequence), supervised fine-tuning on specific tasks, and reinforcement learning from human feedback (RLHF) to align model outputs with human preferences and values. This multi-stage training process enables LLMs to perform few-shot and zero-shot learning, adapting to new tasks with minimal or no additional training.
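As an illustration of the zero-shot and few-shot distinction, the following Python sketch poses the same classification task with and without worked examples in the prompt; the prompts are invented and no particular model or API is assumed.

```python
# Hypothetical prompts only; in practice each string would be sent to an LLM.
zero_shot_prompt = (
    "Classify the sentiment of this review as positive or negative.\n"
    "Review: 'The battery died within a week.'\n"
    "Sentiment:"
)

few_shot_prompt = (
    "Classify the sentiment of each review as positive or negative.\n"
    "Review: 'Arrived quickly and works perfectly.'\nSentiment: positive\n"
    "Review: 'Completely useless, returned it the next day.'\nSentiment: negative\n"
    "Review: 'The battery died within a week.'\nSentiment:"
)

# Zero-shot: the model must infer the task from the instruction alone.
# Few-shot: in-context examples demonstrate the task without any change
# to the model's weights.
print(zero_shot_prompt)
print(few_shot_prompt)
```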
The EU AI Act addresses LLMs primarily through its provisions on General-Purpose AI Models (GPAI), which encompass most commercial LLMs. The Act defines a GPAI model as one that "displays significant generality and is capable of competently performing a wide range of distinct tasks regardless of the way the model is placed on the market and that can be integrated into a variety of downstream systems or applications."
LLMs that exceed certain computational thresholds (currently 10^25 floating-point operations during training) are classified as GPAI models "with systemic risk" and face enhanced regulatory obligations including risk assessment, adversarial testing, model evaluations, and incident reporting requirements.
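For a rough sense of scale, the Python sketch below compares hypothetical training runs against the 10^25 FLOP threshold using the widely cited approximation of roughly six floating-point operations per parameter per training token; the approximation and the parameter and token counts are illustrative assumptions, not figures for any specific model.

```python
# Back-of-the-envelope comparison with the EU AI Act's 1e25 FLOP threshold,
# using the common "6 x parameters x training tokens" estimate for dense
# transformer training compute. All model sizes below are hypothetical.
THRESHOLD_FLOPS = 1e25  # compute level at which the Act presumes systemic risk

def training_flops(n_parameters: float, n_tokens: float) -> float:
    return 6 * n_parameters * n_tokens

examples = {
    "7B params, 2T tokens": training_flops(7e9, 2e12),        # ~8.4e22
    "70B params, 15T tokens": training_flops(70e9, 15e12),    # ~6.3e24
    "400B params, 15T tokens": training_flops(400e9, 15e12),  # ~3.6e25
}
for name, flops in examples.items():
    flag = "above" if flops > THRESHOLD_FLOPS else "below"
    print(f"{name}: ~{flops:.1e} FLOPs ({flag} the 1e25 threshold)")
```

On this rough estimate, only the largest of the three hypothetical runs would cross the threshold and attract the enhanced obligations described above.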
The regulatory treatment of LLMs varies significantly across jurisdictions. The Biden Administration's Executive Order on AI specifically addresses the risks posed by large-scale AI models, whilst the UK has adopted a principles-based approach focusing on existing regulators adapting their frameworks to address AI-specific risks.
LLMs have found applications across numerous sectors with significant legal implications. In healthcare, they assist with clinical note analysis and medical research, raising questions about liability for diagnostic errors and patient privacy. Legal applications include document review, contract analysis, and legal research, creating concerns about professional responsibility and the unauthorised practice of law.
The financial sector employs LLMs for fraud detection and customer service, whilst their use in hiring and employment decisions raises discrimination and fairness concerns. Educational applications have sparked debates about academic integrity and the appropriate use of AI assistance in learning and assessment.
LLMs present several categories of legal challenges that courts and regulators are beginning to address. Copyright infringement litigation has emerged as a primary concern, with numerous cases challenging the use of copyrighted materials in training data without permission. Publishers, authors, and content creators have filed lawsuits against major LLM developers, arguing that training on copyrighted works constitutes unauthorised reproduction and distribution.
Academic research has identified significant issues with "legal hallucinations" where LLMs confidently generate false legal information, including non-existent case citations, incorrect legal principles, and fabricated court decisions. This phenomenon poses particular risks in legal applications and raises questions about professional negligence when lawyers rely on LLM-generated content without adequate verification.
Privacy concerns arise from LLMs' potential to memorise and reproduce personal information encountered during training, leading to inadvertent disclosure of sensitive data. The models' ability to infer information about individuals from limited context creates additional privacy risks that existing legal frameworks may not adequately address.
Despite their impressive capabilities, LLMs face several well-documented limitations relevant to legal assessment. Hallucination remains a persistent problem, where models generate plausible-sounding but factually incorrect information. This issue is particularly concerning in professional contexts requiring accuracy and reliability.
LLMs lack true understanding of causation, often struggling with reasoning tasks that require genuine comprehension rather than pattern matching. They cannot access real-time information beyond their training data cutoff and may exhibit biases present in their training materials, potentially perpetuating discriminatory outcomes.
The "black box" nature of LLMs creates explainability challenges, making it difficult to understand how specific outputs are generated. This opacity complicates efforts to identify bias, ensure fairness, and assign accountability for harmful outputs.
As LLMs continue to evolve, legal frameworks must address several emerging challenges. The development of increasingly capable models raises questions about liability allocation between developers, deployers, and users. Professional responsibility rules may require updating to address the appropriate use of LLM assistance in regulated professions.
International coordination on LLM governance remains limited, creating potential conflicts between different regulatory approaches. The rapid pace of technological development challenges traditional legal frameworks designed for slower-moving technologies, suggesting a need for more adaptive regulatory approaches.
AWS, "What is LLM? - Large Language Models Explained" (2025). Center for Security and Emerging Technology, "What Are Generative AI, Large Language Models, and Foundation Models?" (2024). European Union, Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act), Official Journal of the European Union, L 1689, 12 July 2024, Article 3. Henderson, P., et al. "Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models." Journal of Legal Analysis 16, no. 1 (2024): 64-93. Stanford Institute for Human-Centered Artificial Intelligence, "Foundation Models under the EU AI Act" (2024). Various technical and industry sources as cited above.