Intellectual Property in the Machine Learning Era

In a philosophical sense, the creative lineage of a work generated by artificial intelligence is difficult to define. This notion is equally complex in the legal framework. Certain specificities of machine learning algorithms raise new questions about the protection of the intellectual property they represent and the productions these models generate.

Recognized even in the Universal Declaration of Human Rights (Article 27), the right to intellectual property underpins our drive to innovate. Traditionally the domain of seasoned lawyers and their complex jargon, it nonetheless safeguards a significant portion of the inventions that surround us. Behind the legal and technical subtleties lies a protective framework for the inventor and, sometimes, the consumer. Although often reduced to the jealous protection of industrial secrets, intellectual property law also regulates more collaborative practices, such as open-source software or the release of creations into the public domain.

Everyone has the right freely to participate in the cultural life of the community, to enjoy the arts and to share in scientific advancement and its benefits.

Universal Declaration of Human Rights (Article 27)

To assert their rights, an inventor or creator has three main mechanisms at their disposal. A trademark (™) protects the identity of a recognizable product or brand. For the consumer, it signals a product's origin and, by extension, its expected quality. Copyright (©) allows a creator to specify the conditions under which their work may be used. It can restrict usage to the creator alone or be more permissive, as with some open-source licenses.

Finally, a patent grants a temporary monopoly on the exploitation of an invention. In exchange, the details of the invention are made public, since the law protects against the unauthorized use of this intellectual property. To obtain this coveted patent, the inventor submits an application. A patent office then assesses the quality and novelty of the invention to decide whether to grant the patent.

The Specificity of a Model Lies Not in Its Constituent Blocks

Technical Drawing of a Robot Arm (US Patent US4806066A)

The patent application requires a particularly detailed description of how the invention works. Technical drawings and system diagrams are common. For a physical machine, the description is often straightforward. For an algorithm, however, things get complicated. What are the specifications? What is the manufacturing process? Under current laws, an algorithm or mathematical formula cannot generally be patented on its own. To be patentable, it must be tied to a specific context, a concrete industrial application.

This distinction becomes meaningful for machine learning models. Training such a model is usually framed as an optimization problem: a quantity, such as the classification error, is minimized, possibly under certain constraints. Adjustable levers, the parameters, are handed to an optimization algorithm, which is tasked with finding the settings that best meet this objective and thus yield the desired model. Like an automatic chef, the algorithm seeks the best possible combination of the available ingredients to produce the desired dish. It is the resulting proportions, the optimal recipe, that can be patented, not the formula used to obtain them.
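As a rough illustration, this training process can be written as an optimization problem. The notation is an assumption introduced here for clarity, not taken from any particular model: \theta denotes the parameters, f_\theta the model, (x_i, y_i) the n training examples, and \ell a loss such as the classification error.

```latex
\theta^{*} = \arg\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f_{\theta}(x_i),\, y_i\bigr)
```

In the terms of the cooking analogy, the right-hand side is the general formula, and \theta^{*} is the optimal recipe obtained for one particular set of ingredients.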

Note, however, that the parameters obtained depend intrinsically on the "ingredients" provided for training. Take the example of the Portrait of Edmond de Belamy: the formula was optimized on a collection of old paintings, and the result was the final artwork. The same formula could instead be given images of cats, yielding a model whose parameters are optimized to produce feline images. These two models are very different, yet they stem from the same general formula. It is the data that made the difference in obtaining the final model.
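The dependence of the parameters on the data can be sketched with a toy example. The code below is a minimal illustration, not the method used for any real generative model: the `train` and `predict` functions and the two tiny datasets are hypothetical, and a one-dimensional logistic classifier stands in for the far larger models discussed here. The same training routine, the "formula", is run on two datasets and returns two different sets of parameters.

```python
import math

def train(points, labels, steps=2000, lr=0.1):
    """Fit a 1-D logistic classifier by gradient descent.

    This routine plays the role of the "formula": it is identical
    whatever data it receives.
    """
    w, b = 0.0, 0.0  # the parameters, i.e. the "recipe" being searched for
    n = len(points)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(points, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted probability of label 1
            grad_w += (p - y) * x / n
            grad_b += (p - y) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(params, x):
    """Return the predicted label (0 or 1) for input x."""
    w, b = params
    return 1 if w * x + b > 0 else 0

# Two hypothetical toy datasets: same inputs, opposite labels.
data_a = ([0.0, 1.0, 2.0, 3.0], [0, 0, 1, 1])  # label 1 for large values
data_b = ([0.0, 1.0, 2.0, 3.0], [1, 1, 0, 0])  # label 1 for small values

model_a = train(*data_a)  # same formula, first set of ingredients
model_b = train(*data_b)  # same formula, second set of ingredients

print("model_a parameters:", model_a)
print("model_b parameters:", model_b)
```

Nothing in `train` distinguishes the two models; only the data passed to it does, which is why the trained parameters, rather than the training routine, carry the specificity of each model.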

Photos of Cats Generated by a GAN (Alexia Jolicoeur-Martineau)

This is the key to intellectual property in the field of machine learning. Companies like Google or Facebook have no problem sharing the equations governing these models because the real value lies in the massive datasets they jealously guard; as long as those remain private, their intellectual property remains intact. Obtaining massive and detailed datasets is critical for these companies, as they are the keystone of effective models. Note that these datasets can only be copyrighted if they are specifically structured, manipulated, or processed.

Given the economic stakes of machine learning, many companies rely on the legal protection of trade secrets. Unlike patents, trade secrets do not require making the functioning of the invention public. This protection is often applied to the datasets collected by these companies, but it raises questions of transparency, ethics, and compatibility with the scientific method, which relies on the open communication of knowledge.

Indeed, it is extremely difficult to know the extent of these companies' data collection practices, which are known to be intensive, or how the collected data is used. While revelations about dubious practices by groups like Facebook have drawn political attention to the issue, only the European Union has adopted regulations limiting the right to trade secrets in the context of personal data processing.

Even if the collection and use of data were more transparent, many tools are still lacking to assess the macroscopic impact of these models and their automated decisions. On a microscopic level too, it remains to be understood exactly how to make these models more robust and how to correctly interpret the way they learn and produce predictions. Work at these two scales will have to proceed together, at both the technical and political levels, to achieve a positive economic and social impact.

Once the Algorithm Is Trained, Who Owns Its Production?

In a previous article, we discussed the difficulty of defining the artistic lineage of a work generated by a machine learning model. This question obviously arises beyond the artistic sphere. While the previous debate largely revolved around the philosophical aspect of creation, for many companies this question is quickly dominated by financial and legal concerns.

In 2011, English photographer David Slater traveled to Indonesia to photograph an endangered species of macaque. To avoid startling these shy monkeys, the photographer set up his camera and let the monkeys play with the shutter. In this way, he obtained "selfies" taken by the animals. Upon his return, he attempted to sell these photos, sparking a controversy over their attribution: can the monkey hold rights over the photos, or are they the work of the photographer? After a long debate, the U.S. Copyright Office ruled that a non-human creator is not a legal person and therefore cannot obtain rights to the creation.

Macaque Selfie (Indonesia, 2011)

This decision clearly raises questions about copyright for computer-generated works, including those produced by artificial intelligence. Software law, which was debated extensively in the 1980s and 1990s, offers a partial answer. Creators must provide licenses that explicitly transfer the usage rights of the developed models to the user. This provision is standard for software. Thus, models, like computer programs, are considered tools for the user, who is then free to claim rights to the creations that result from them. To do so, these creations must meet the same conditions of originality and novelty as any other artistic or technical work.

However, these conditions remain vague: how do we judge the artist's intent? Can a creator claim all the creations that are possible with this tool? How do we quantify the importance of artistic intervention in the final result? What about meta-algorithms that automatically create other algorithms based on a constraint? These are all questions that remain to be discussed to finally obtain a clear legal framework around creative practices that will become increasingly common.