Is it okay to train AI with third-party data?
Web crawlers download data from the Internet every second. This data is then sorted, refined, labeled – and used for AI training. Surprisingly, German law (Sections 44b and 60d of the German Act on Copyright and Related Rights) offers extensive possibilities to legally use third-party data for AI training. This also applies to paywalled content and databases.
The situation is different if the rights holders have, by way of exception, reserved the right to “use for text and data mining”. For works accessible online, such a reservation of use is only effective if it is made in a machine-readable form. Reservations may appear in the terms and conditions, the imprint, robots.txt or via the TDM Reservation Protocol; whether they are effective depends on the specific design in each case. In practice, however, most websites and media do not declare an effective reservation of use. We show you the gold standard for dealing with intellectual property in the context of AI.
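To illustrate what a machine-readable signal can look like in practice, here is a minimal Python sketch that checks two possible signals: a robots.txt rule that blocks a given crawler, and a `tdm-reservation: 1` HTTP response header as used by the TDM Reservation Protocol (TDMRep). The crawler name `ExampleTDMBot`, the example URL and all function names are hypothetical, and the protocol defines further places to declare a reservation that this sketch ignores. The code only detects technical signals; whether a given signal amounts to an effective reservation of use is a legal question that depends on the specific design.

```python
from urllib.robotparser import RobotFileParser


def robots_blocks(robots_txt: str, user_agent: str, url: str) -> bool:
    """True if the given robots.txt text disallows user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed here
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


def tdmrep_header_reserved(headers: dict[str, str]) -> bool:
    """True if the HTTP response carries a TDMRep 'tdm-reservation: 1' header."""
    # HTTP header names are case-insensitive, so normalise them before the lookup
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    return normalised.get("tdm-reservation") == "1"


def reservation_signals(robots_txt: str, headers: dict[str, str],
                        user_agent: str, url: str) -> list[str]:
    """Collect machine-readable opt-out signals (says nothing about legal effectiveness)."""
    signals = []
    if robots_blocks(robots_txt, user_agent, url):
        signals.append("robots.txt")
    if tdmrep_header_reserved(headers):
        signals.append("tdm-reservation header")
    return signals


if __name__ == "__main__":
    # Hypothetical robots.txt: blocks ExampleTDMBot everywhere, allows all other agents
    robots = "User-agent: ExampleTDMBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    print(reservation_signals(robots, {"TDM-Reservation": "1"},
                              "ExampleTDMBot", "https://example.com/article"))
    # → ['robots.txt', 'tdm-reservation header']
```

A crawler operator might log such signals per URL before data enters a training set, so that the basis for each use can be documented later.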
Who owns AI and prompts?
In principle, training data, the trained AI system (its weights and thresholds) and inputs (prompts) can all be protected by intellectual property rights, provided the specific case meets the legal requirements for protection. In addition, protection under the German Act on the Protection of Trade Secrets (GeschGehG) may apply if appropriate confidentiality measures have been taken.
Many AI providers reserve the right to further use all data provided to them. In effect, you then “pay” for the AI with your training or production data. Especially when cloud tools are used to simplify work, there is a risk that sensitive company data flows, unnoticed, to third-party servers and companies. To protect trade secrets, contractual safeguards in the service relationship (in particular NDAs) are essential. Companies also need adequate AI policies for their employees, because whether IP and trade secrets are effectively protected is also decided at the point of use.
The AI Liability Directive proposed by the European Commission on September 28, 2022 provides for far-reaching information claims for anyone harmed by AI. This creates a risk that trade secrets will be leaked.
Who owns the results of AI?
According to Section 2(2) of the German Act on Copyright and Related Rights (UrhG), copyright protection can only be obtained for the author’s own intellectual creations. If content is predominantly generated by AI, this is usually not the case, and the output is in the public domain. Anyone may therefore freely use and copy it, unless special circumstances apply, such as protection as a trade secret.
Liability case 1: Anyone who promises an exclusive right of use in a contract but delivers AI results that are in the public domain is rendering defective performance. Developers are well advised to use AI in measured doses so that protectable intellectual property is still created, and to document this by means of AI guidelines.
Liability case 2: If the AI reproduces existing works without the user’s knowledge, the AI output may infringe the intellectual property rights of third parties. This is particularly likely with homogeneous training data. To reduce this risk, protective measures should be considered, such as an output control that automatically compares AI output against existing works.
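Such an output control can take many forms. As one illustration, the following Python sketch compares an AI output against a set of reference works using word n-gram (“shingle”) overlap and flags every work whose share of shared n-grams reaches a threshold. The function names, the n-gram length of 5 and the threshold of 0.3 are illustrative assumptions, not a legal standard; a real system would likely need more robust fingerprinting and human review, and a match is only a warning sign, not proof of infringement.

```python
import re


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of lower-cased word n-grams ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(output: str, reference: str, n: int = 5) -> float:
    """Share of the output's n-grams that also occur in the reference work."""
    out = shingles(output, n)
    if not out:
        # Output shorter than n words: nothing to compare
        return 0.0
    return len(out & shingles(reference, n)) / len(out)


def flag_output(output: str, protected_works: dict[str, str],
                threshold: float = 0.3, n: int = 5) -> list[str]:
    """Names of reference works whose overlap with the output reaches the threshold."""
    return [name for name, text in protected_works.items()
            if containment(output, text, n) >= threshold]


if __name__ == "__main__":
    # Hypothetical reference corpus and AI output
    works = {"poem": "the quick brown fox jumps over the lazy dog near the river bank"}
    output = "As he wrote, the quick brown fox jumps over the lazy dog near the river."
    print(flag_output(output, works))  # → ['poem']
```

Measuring containment (shared n-grams relative to the output) rather than symmetric similarity means a short output that copies a passage from a long work is still flagged.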
Aitava helps you to efficiently secure AI results and avoid liability traps. We see ourselves as enablers.