Towards the Development of a Copyright Risk Checker Tool for Generative Artificial Intelligence Systems
Abstract
1 Introduction
2 Research Background and Motivation
2.1 GAI
Models | Definition | References |
---|---|---|
Machine Learning | Enables computers to learn from data and improve their performance over time. | [12] |
Deep Learning | Subset of machine learning that uses neural networks with three or more layers to learn from large datasets. A Large Language Model is a kind of deep learning model. | [3] |
Diffusion Processing | Generates new data by learning to reverse a gradual noising process, producing outputs (such as images) from noise step by step. | [21] |
Neural Network | Computational learning system that uses a network of functions to understand and translate data inputs into desired outputs. | [12, 18] |
GAI System | GAI Model | Generated Work | Category of Work by Australian Copyright Definition [48, 49] |
---|---|---|---|
ChatGPT, Bard and BingAI | Large Language Model | Language based tasks (question responses, essays, translations, etc.) | Literary works |
GitHub Copilot | Large Language Model | Programming code | Literary works |
DALL-E, Midjourney, and Stable Diffusion | Diffusion Model | Images | Artistic work |
DeepMind and Amazon Polly | Deep Learning | Voice/Speech | Sound recordings |
MusicLM and Soundraw | Deep Learning | Music | Musical works |
Stable Diffusion, Make-A-Video and Veed.io | Diffusion Model | Video | Cinematograph films |
2.2 AI Regulations and Standards
Country | Regulation/Standards | Addressed Copyright Concerns | References |
---|---|---|---|
Australia | Australia signals a commitment to ethical AI through initiatives such as the Responsible and Inclusive AI whitepaper and the AI Assurance Framework, which emphasise transparency and accountability, and through active participation in global AI standardisation via ISO/IEC JTC 1/SC 42, the committee behind the forthcoming ISO/FDIS 42001 standard for AI management systems. Despite these advances, the absence of specific domestic regulation raises concerns about consistent implementation: the initiatives provide essential principles, but enforceable legislation is needed to turn those principles into binding standards and foster a culture of responsible AI practice nationwide. | Transparency, Accountability | [28, 27, 31, 29] |
United Kingdom (UK) | The UK's sector-based regulatory approach, integrating AI principles into existing frameworks, demonstrates adaptability. Notable legislative efforts like the Data Protection and Digital Information Bill showcase the UK's responsiveness to evolving concerns, particularly regarding data privacy. Financial investments in initiatives such as the Foundation Model Taskforce reflect a commitment to ethical AI deployment. Nevertheless, the sector-specific nature of regulations might create complexities and inconsistencies, making it essential to monitor uniformity in application. | Data Privacy, Adaptability | [26, 33, 32] |
United States (US) | The United States operates in a decentralised AI regulatory landscape, lacking a nationwide framework. While non-binding guidelines, such as the 'Blueprint for an AI Bill of Rights,' emphasise ethical practices, the absence of binding laws raises questions about the enforceability and standardisation of AI ethics. Local initiatives, while promising, contribute to a fragmented approach, potentially resulting in varying standards and oversight mechanisms. | Ethical Practices, Fragmentation | [34, 25] |
European Union (EU) | The EU's proposed AI Act adopts a risk-based approach, tailoring regulations to the potential dangers an AI system poses. It categorises AI applications as unacceptable risk (outright prohibited), high risk (stringent requirements), or low-to-minimal risk (lighter requirements). Expected in 2024, the act aims to ensure AI safety and benefits. A notable provision, Recital 107, requires developers of generative AI to transparently document the data used in model training, including copyrighted works. | Safety, Transparency of Data Use | [28, 30, 24] |
2.3 Legal Cases: Generative AI and Copyright Concerns
Case Number | Case Name | Copyright Concerns | Description | Relevant Concern Mapping |
---|---|---|---|---|
1 | Kris Kashtanova's ‘Zarya of the Dawn’ Comic Book (2023) | Delineating AI-generated components | US Copyright Office acknowledged specific human-authored elements but refused protection for individual AI-generated images within the graphic novel [49]. | Transparency (clarifying AI vs. human contributions), Trust (in the copyright system's ability to adapt) |
2 | Telstra v Phone Directories Company (2010) | Human authorship and originality | Landmark Australian case asserting that a work must originate from a human author and possess a degree of creativity for copyright protection [51]. | Infringement (defining what can be copyrighted), Transparency (in the criteria for copyright) |
3 | Metallica v Napster, Inc. (2000) | Liability for secondary infringement of works | Napster provided the means for the peer-to-peer sharing of infringing musical works and sound recordings, raising concerns about secondary infringement liability [50]. | Infringement (liability for facilitating copyright violation) |
4 | IceTV Pty Ltd v Nine Network Australia Pty Ltd (2009) | Protects the specific expression of facts | Australian case where copyright protects the specific expression of facts, not the facts themselves, emphasising the importance of originality [53]. | Transparency (in what constitutes original expression) |
5 | Author's Guild v OpenAI (2023) | Authorisation or licensing requirement | Case dealt with the unauthorised, unlicensed use of works to train GAI models, highlighting the need for proper authorisation or licensing [54]. | Infringement (use without license), Trust (in respecting copyright laws) |
6 | Plaintiffs v Google (2023) | Compliance with copyright licensing in GAI system development | Challenges the legality of Google's AI systems training on copyrighted content without proper licensing, highlighting the need for compliance with intellectual property rights in the development of GAI technologies [57]. | Infringement (lack of licensing), Trust (in legal compliance) |
7 | Doe v GitHub et al. (2022) | Adherence to licensing terms for open-source material | Use of open-source software by AI without adhering to licensing terms such as attribution, infringing copyright owners' rights [55]. | Infringement (ignoring licensing terms), Transparency (in open-source licensing) |
8 | Feist Publications v Rural Telephone Service Company (1991) | Originality and copyright protection of directories and fact-based works | US case that limits the scope of protection in fact-based works, questioning if the work AI is training on is based on facts or original expressions [52]. | Transparency (in the scope of copyright protection) |
9 | Getty Images v Stability AI (2023) | Unauthorised use of copyrighted material in GAI systems | Highlights the legal ramifications of unauthorised use of copyrighted material in GAI systems, emphasising proper licensing and adherence to intellectual property rights [56]. | Infringement (unauthorised use), Trust (in upholding copyright) |
10 | Jason Allen's ‘Théâtre D'opéra Spatial’ Artwork (2023) | Human authorship in AI-generated creative works | US Copyright Office ruled that the work “lacks human authorship” and falls outside the purview of copyright law, which excludes works produced by non-humans [58]. | Transparency (in authorship criteria), Trust (in the system's recognition of human creators) |
3 Research Method
4 The CRC Tool
Number | Component | Description |
---|---|---|
1 | User | The User component signifies the individuals engaging with the system, offering input and interacting with its features. |
2 | Admin | The Admin component signifies the administrators who manage and oversee the system's functionality and user interactions. |
3 | Interface | The Interface component, accessed through a web browser, is the means by which users reach the system, enabling seamless interaction and input submission. |
4 | Tally.so | An easy-to-use online form tool that allows customisation of the relevant questions. |
5 | Research and Analysis | Informs the CRC Tool's adaptation by processing legal cases, regulations, legislation, and academic literature, ensuring accuracy and relevancy in the generated insights. |
Case Number | Case Name | Identified Copyright Concerns | Mapping to Related CRC Tool Questions | Relevant Concern Mapping |
---|---|---|---|---|
1 | Kris Kashtanova's ‘Zarya of the Dawn’ Comic Book (2023) | US Copyright Office acknowledged specific human-authored elements but refused protection for individual AI-generated images within the graphic novel [49]. | To what extent is the AI output generated from sufficiently original and creative content? | Transparency (clarifying AI vs. human contributions), Trust (in the copyright system's ability to adapt) |
2 | Telstra v Phone Directories Company (2010) | Landmark Australian case asserting that a work must originate from a human author and possess a degree of creativity for copyright protection [51]. | How well have you determined the degree of human input and contribution to the AI generation process? | Infringement (defining what can be copyrighted), Transparency (in the criteria for copyright) |
3 | Metallica v Napster, Inc. (2000) | Napster provided the means for the peer-to-peer sharing of infringing musical works and sound recordings, raising concerns about secondary infringement liability [50]. | How effectively have you identified any existing copyrighted works that may have influenced the AI output? | Infringement (liability for facilitating copyright violation) |
4 | IceTV Pty Ltd v Nine Network Australia Pty Ltd (2009) | Australian case where copyright protects the specific expression of facts, not the facts themselves, emphasising the importance of originality [53]. | How well have you determined the degree of human input and contribution to the AI generation process? | Transparency (in what constitutes original expression) |
5 | Author's Guild v OpenAI (2023) | Case dealt with the unauthorised, unlicensed use of works to train GAI models, highlighting the need for proper authorisation or licensing [54]. | To what extent is the AI output generated from sufficiently original and creative content? | Infringement (use without license), Trust (in respecting copyright laws) |
6 | Plaintiffs v Google (2023) | Challenges the legality of Google's AI systems training on copyrighted content without proper licensing, highlighting the need for compliance with intellectual property rights in the development of GAI technologies [57]. | To what extent have you considered the potential moral and ethical implications of your AI-generated content? | Infringement (lack of licensing), Trust (in legal compliance) |
7 | Doe v GitHub et al. (2022) | Use of open-source software by AI without adhering to licensing terms such as attribution, infringing copyright owners' rights [55]. | How effectively have you identified any existing copyrighted works that may have influenced the AI output? | Infringement (ignoring licensing terms), Transparency (in open-source licensing) |
8 | Feist Publications v Rural Telephone Service Company, Inc. (1991) | US case that limits the scope of protection in fact-based works, questioning if the work AI is training on is based on facts or original expressions [52]. | How well have you determined the degree of human input and contribution to the AI generation process? | Transparency (in the scope of copyright protection) |
9 | Getty Images v Stability AI (2023) | Highlights the legal ramifications of unauthorised use of copyrighted material in GAI systems, emphasising proper licensing and adherence to intellectual property rights [56]. | How effectively have you identified any existing copyrighted works that may have influenced the AI output? | Infringement (unauthorised use), Trust (in upholding copyright) |
10 | Jason Allen's ‘Théâtre D'opéra Spatial’ Artwork (2023) | US Copyright Office ruled that the work “lacks human authorship” and falls outside the purview of copyright law, which excludes works produced by non-humans [58]. | How well have you determined the degree of human input and contribution to the AI generation process? | Transparency (in authorship criteria), Trust (in the system's recognition of human creators) |
Risk Level | Definition |
---|---|
High Risk | Systems demand immediate and decisive action to mitigate potential legal consequences [15]. |
Moderate Risk | Systems require strategic enhancements to their processes and legal understanding, balancing urgency with strategic planning [15]. |
Low Risk | Systems benefit from ongoing vigilance and proactive measures to avoid future complications [15]. |
CRC Question | Low Risk | Moderate Risk | High Risk |
---|---|---|---|
To what extent is the AI output generated from sufficiently original and creative content? | Demonstrates high originality and creativity, minimising the risk of copyright concerns. | Shows moderate originality, with potential for minor similarities to existing content. | Displays limited originality, raising potential copyright risks. |
How well have you determined the degree of human input and contribution to the AI generation process? | Clear attribution and understanding of human-AI roles, reducing ambiguity. | Some attribution clarity, but roles may need further clarification. | Ambiguous attribution, leading to uncertainty about human-AI contributions. |
How effectively have you identified any existing copyrighted works that may have influenced the AI output? | Thorough identification of potential influences, minimising copyright risks. | Some influences acknowledged, with potential for overlooked content. | Limited identification of potential influences, raising significant risks. |
How comprehensively have you assessed the risks associated with using copyrighted data for training the AI model? | Comprehensive risk assessment and mitigation strategies in place. | Partial risk assessment with room for further analysis. | Limited risk assessment, posing potential legal and ethical concerns. |
How well are you aware of the potential legal challenges related to AI-generated content? | In-depth awareness of legal challenges, ensuring proactive compliance. | Moderate awareness of legal challenges, with room for improvement. | Limited awareness of legal challenges, raising potential issues. |
To what extent have you considered the potential moral and ethical implications of your AI-generated content? | Thorough consideration of ethical implications, guiding responsible AI development. | Some consideration of ethical implications, with potential for further analysis. | Limited consideration of ethical implications, posing ethical concerns. |
How thoroughly have you reviewed relevant copyright laws and regulations in your jurisdiction? | In-depth review of copyright laws, ensuring compliance and minimising risks. | Moderate review of copyright laws, with potential for further understanding. | Limited review of copyright laws, raising potential legal issues. |
How much are you using AI-generated content in a way that respects the rights of the original creators? | Strong adherence to rights of original creators, respecting intellectual property. | Some adherence, with potential for better alignment with creator rights. | Limited respect for creator rights, posing potential copyright violations. |
How deeply have you considered the potential impact of AI-generated content on existing markets or industries? | Thorough consideration of market impact, with strategies to mitigate disruptions. | Some consideration, with room for further analysis of potential impacts. | Limited consideration of market impact, raising potential challenges. |
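The rubric above can be read as a mapping from answer options to per-question risk levels. The following sketch illustrates one possible scoring scheme, assuming options A, B, and C correspond to the Low, Moderate, and High Risk columns and that the overall level is the most frequent per-question level, with ties resolved toward the higher risk. The paper does not specify the tool's actual aggregation rule, so this is an illustrative assumption only.

```python
# Hypothetical scoring sketch for the CRC Tool rubric.
# Assumed mapping: option A -> Low, B -> Moderate, C -> High risk.
# Aggregation rule (most frequent level, ties broken toward the
# higher risk) is an assumption for illustration.
from collections import Counter

RISK_ORDER = ["Low", "Moderate", "High"]
OPTION_TO_RISK = {"A": "Low", "B": "Moderate", "C": "High"}

def question_risks(answers):
    """Translate answer options ('A'/'B'/'C') into risk levels."""
    return [OPTION_TO_RISK[a] for a in answers]

def overall_risk(answers):
    """Aggregate per-question risks into one overall level."""
    counts = Counter(question_risks(answers))
    # Prefer the most frequent level; on a tie, the higher risk wins.
    return max(RISK_ORDER, key=lambda lvl: (counts[lvl], RISK_ORDER.index(lvl)))

print(overall_risk(["A", "B", "C", "A", "A"]))  # -> Low
```

A threshold rule (for example, any single C answer forcing a High overall rating) would be an equally plausible design; the choice depends on how conservative the tool's authors want the assessment to be.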
Recommendations | Relevant Legal Cases | Relevant AI Regulations |
---|---|---|
Thoroughly Review GAI Outputs | All Cases | Australia: Emphasises transparency and fairness (AI assurance framework) [31]. EU: AI Act's transparency obligations for developers of generative AI models, as outlined in Recital 107 [24]. UK: Data Protection and Digital Information Bill, focusing on data privacy [26]. US: Importance of staying updated due to the fragmented regulatory landscape [34]. |
Seek Professional Legal Advice | Cases where legal complexities and liabilities are highlighted (e.g., Case 3: Metallica v Napster, Inc., Case 5: Author's Guild v OpenAI, Case 7: Doe v GitHub et al., Case 9: Getty Images v Stability AI) | EU: AI Act's transparency obligations emphasise the importance of legal consultation [24]. UK: Importance of legal consultation for AI-generated content [26]. US: Fragmented regulations require legal expertise for compliance [34]. |
Maintain Clear Documentation | Cases involving the need for clear authorship and licensing (e.g., Case 1: Kris Kashtanova's ‘Zarya of the Dawn’, Case 5: Author's Guild v OpenAI, Case 7: Doe v GitHub et al.) | UK: Clarity in documentation aligns with UK's emphasis on data privacy [26]. Australia: Clear records of AI development processes align with transparency principles [27]. |
Stay Updated on Copyright Laws | All Cases | Australia: Staying updated is essential to ensure compliance with evolving laws [27]. EU: AI Act's transparency obligations require ongoing awareness of legal requirements [24]. US: Fragmented regulations necessitate continuous awareness of copyright laws [34]. |
Criterion Type | Description |
---|---|
Fit for Purpose | The CRC Tool accurately identifies copyright concerns in a variety of GAI systems, ensuring applicability and relevance within the ever-evolving GAI and copyright landscape. |
Novelty | The CRC Tool serves as an innovative solution, bridging gaps in existing literature and resources, addressing ambiguities and challenges related to copyright concerns in GAI systems. |
Ease of Use | The CRC Tool is designed to be easily accessible via web and user-friendly, making it a practical solution for addressing the complexities of GAI-related copyright issues. |
5 CRC Tool Evaluation
5.1 The Creation of the Experimental Scenario
Number | Component | Description |
---|---|---|
1 | Source | The source serves as the trigger for each scenario, representing the situation or event that prompts the need for copyright assessment and the use of the CRC Tool. The scenario was inspired by real legal cases to ensure their authenticity and applicability. By drawing from these cases, the source is grounded in actual copyright challenges faced by organisations and individuals in the GAI landscape. |
2 | Stimulus | The stimulus represents the specific issue or incident that arises from the source. In the context of these scenarios, the stimulus is the discovery of unlicensed copyrighted materials within AI-generated content. This stimulus is aligned with the copyright concerns highlighted in the legal cases, ensuring that the scenarios capture the essence of copyright infringements and compliance challenges. |
3 | Artifact | The artifact in the scenario is the CRC Tool. This Tool is employed to assess and evaluate copyright-related risks and to suggest next steps or recommendations for addressing them. By incorporating the CRC Tool as the artifact, the scenario provides a practical and structured means to indicate copyright concerns, mirroring real-world practices of using assessment tools to navigate copyright issues. The CRC Tool does not enforce compliance with specific standards or regulations, nor does it check against regulation databases; it serves to guide the developer in identifying areas that may require further legal scrutiny or action. |
4 | Response | The response component outlines the actions or measures taken in response to the stimulus. In these scenarios, the response includes the completion of the CRC Tool, and any subsequent steps taken based on its assessment/advice. This element reflects the proactive approach of organisations and individuals in addressing copyright challenges, as observed in the legal cases. |
5 | Response Measure | The response measure evaluates the effectiveness of the CRC Tool in identifying relevant or potential risks. It does not assess how well a company or individual addresses copyright concerns post-assessment. The focus is on the tool's capability for risk identification and assessment, in line with the original research question. The CRC Tool's objective is to assist developers in recognising potential copyright issues; it does not verify the originality of content, which remains the developer's responsibility. |
5.2 The Experimental Scenario: CopiCode AI Training Controversy
5.3 Application of the CRC Tool for the CopiCode AI Training Controversy Scenario
CRC Tool Question | CopiCode Response | Explanation |
---|---|---|
To what extent is the AI output generated from sufficiently original and creative content? | Moderate (B) | CopiCode's AI-generated code exhibits moderate originality but may have minor similarities to existing content. This suggests there is room for improvement in terms of originality. |
How well have you determined the degree of human input and contribution to the AI generation process? | Moderate (B) | While there is some attribution clarity, roles may need further clarification, indicating a moderate level of understanding regarding human-AI contributions. |
How effectively have you identified any existing copyrighted works that may have influenced the AI output? | Low (C) | CopiCode has limited identification of potential influences, raising significant copyright risks. This suggests the need for better identification and management of copyrighted content. |
How comprehensively have you assessed the risks associated with using copyrighted data for training the AI model? | High (A) | CopiCode has conducted a comprehensive risk assessment and has mitigation strategies in place, demonstrating a high level of preparedness regarding copyright risks. |
How well are you aware of the potential legal challenges related to AI-generated content? | High (A) | CopiCode shows in-depth awareness of legal challenges, ensuring proactive compliance with legal requirements. |
To what extent have you considered the potential moral and ethical implications of your AI-generated content? | Moderate (B) | There is some consideration of ethical implications, with potential for further analysis. While ethical concerns are acknowledged, they may benefit from more thorough examination. |
How thoroughly have you reviewed relevant copyright laws and regulations in your jurisdiction? | High (A) | CopiCode has conducted an in-depth review of copyright laws, ensuring compliance and minimising risks. |
How much are you using AI-generated content in a way that respects the rights of the original creators? | Moderate (B) | CopiCode demonstrates some adherence to the rights of original creators, with potential for better alignment with creator rights. |
How deeply have you considered the potential impact of AI-generated content on existing markets or industries? | Moderate (B) | There is some consideration of market impact, with room for further analysis of potential impacts. CopiCode has acknowledged the importance of market impact but may need a more detailed assessment. |
How extensively have you sought legal advice or consulted professionals for copyright-related concerns related to AI-generated content? | Moderate (B) | CopiCode has engaged in moderate consultation with legal professionals, indicating a willingness to seek legal insights. |
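Under the same assumed A→Low, B→Moderate, C→High mapping, CopiCode's ten responses in the table above can be tallied to indicate a predominant risk level. This majority reading is an illustrative interpretation, not an aggregation rule stated in the paper.

```python
# Illustrative tally of CopiCode's CRC Tool responses.
# The A/B/C -> risk-level mapping is assumed, not specified by the tool.
from collections import Counter

# CopiCode's responses to the ten CRC Tool questions, in table order.
responses = ["B", "B", "C", "A", "A", "B", "A", "B", "B", "B"]

risk = {"A": "Low", "B": "Moderate", "C": "High"}
counts = Counter(risk[r] for r in responses)
predominant = counts.most_common(1)[0][0]

print(counts)       # Counter({'Moderate': 6, 'Low': 3, 'High': 1})
print(predominant)  # Moderate
```

Read this way, CopiCode's profile is predominantly Moderate, with the single Low (C) answer on identifying copyrighted influences standing out as the area most in need of attention.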
6 Discussion and Conclusion
Footnotes
References
Publisher
Association for Computing Machinery
New York, NY, United States