
Can I post code examples with GPL software on Stack Overflow? - https://opensource.stackexchange.com/a/6870 I believe is more applicable.

Or from even well before LLMs - Do I have to worry about copyright issues for code posted on Stack Overflow? https://meta.stackexchange.com/q/12527 (also linked from https://news.ycombinator.com/item?id=25621815 )

It is legal and not a copyright violation to post GPL licensed code on Stack Overflow under the CC-BY-SA 4.0 license for purposes of education (without license attribution).

It is a GPL license / copyright violation to take GPL-licensed code that was posted to Stack Overflow (without license attribution) and use it in your own code under the presumption that it was licensed under CC-BY-SA 4.0.

There is no real difference in terms of copyright violation between copying GPL-licensed code from Stack Overflow (which can be there) and copying code from Copilot. In either case, you, the human doing the copy and paste, are the one doing the infringing and are responsible for ensuring that the provenance of the code you are copying is free from any licensing encumbrances.

It is not a copyright violation for a radio (a machine) to play a song that it received from the airwaves. It is a copyright violation for you, the human, to take that radio and have it be a performance in the park where people can dance to the music on the radio played loudly.

Machines with no agency cannot infringe copyright. If I took a photo of a page of a book with my iPhone, and the iPhone did image to text on it, that's not the iPhone's fault. And I would possibly be within my rights to take that photo. I would be infringing if I then published that image or the text that the iPhone generated.

I believe / have the understanding that copyright can only be violated by an entity with agency - and some entity with agency is the one that ends up publishing or redistributing the work.

To that end, it doesn't matter if the code was written by Copilot, copied from Stack Overflow, or written by a random person on Fiverr (who may or may not have used Copilot). If I publish it, I am the person with agency who infringed copyright.

If we say that "ahh, but you used Copilot - that was an infringement" ... ok, so I copied some unattributed code from Stack Overflow that I believed was CC-BY-SA. Is Stack Overflow responsible for my accidental infringement? If the answer is "no, you - as the person pasting the code into your work - should always be checking the copyright provenance of unknown code you're pasting in" then I believe that same answer should be applicable to all the other situations too.

https://www.synopsys.com/blogs/software-security/stack-overf...

https://opensource.com/law/13/7/fantec-german-foss-complianc...

> The court required Fantec to pay a contractual penalty in the amount of € 5,100 based on the prior settlement agreement. In addition, the court awarded the plaintiff’s expenses in enforcing the GPLv2. (This award is standard under German law and is based on Section 97a (1), 31, 69c no. 3 and 4 of the German Copyright Act which awards costs for a justified warning by a party which is so cautioned.) The court affirmed the culpability of Fantec’s violation by classifying the violation as negligent: the seller of firmware may not rely on suppliers' statements about compliance. The distributor of GPLv2 software must carry out the assessment or commission experts to make the assessment even if they incurred additional costs.

https://fsfe.org/news/2013/news-20130626-01.en.html

> The court decided that FANTEC acted negligently: they would have had to ensure to distribute the software under the conditions of the GPLv2. The court made explicit that it is insufficient for FANTEC to rely on the assurance of license compliance of their suppliers. FANTEC itself is required to ascertain that no rights of third parties are violated.

It is the responsibility of the distributor to comply with the license.

In this light, it doesn't matter what Copilot "claims" about the license of code - the programmer copying the code is responsible for verifying its copyright status and is at fault if they publish that code.



> Machines with no agency cannot infringe copyright. If I took a photo of a page of a book with my iPhone, and the iPhone did image to text on it, that's not the iPhone's fault. And I would possibly be within my rights to take that photo. I would be infringing if I then published that image or the text that the iPhone generated.

> I believe / have the understanding that copyright can only be violated by an entity with agency - and some entity with agency is the one that ends up publishing or redistributing the work.

The big problem with this argument is that the machine is not the one publishing things; OpenAI, the company, is. They have created the entire set of circumstances in which this copying can happen.

Let's consider the Napster case. If the argument is "software can't violate copyright" then what was the RIAA's problem with a mass-scale copying and sharing of their music? Why was Napster able to be sued into nonexistence? They only created the software, after all.

There's precedent here that creators of software can be held liable for the copyright abuses that software leads to or permits.

> It is the responsibility of the distributor to comply with the license.

By all measures, OpenAI is the distributor of the code here. After all, their software is outputting licensed code.


I draw more parallels between OpenAI and Xerox and the copyright crisis about people making copies of material.

https://www.copyright.gov/title37/201/37cfr201-14.html

> The copyright law of the United States (title 17, United States Code) governs the making of photocopies or other reproductions of copyrighted material.

> Under certain conditions specified in the law, libraries and archives are authorized to furnish a photocopy or other reproduction. One of these specific conditions is that the photocopy or reproduction is not to be “used for any purpose other than private study, scholarship, or research.” If a user makes a request for, or later uses, a photocopy or reproduction for purposes in excess of “fair use,” that user may be liable for copyright infringement.

> This institution reserves the right to refuse to accept a copying order if, in its judgment, fulfillment of the order would involve violation of copyright law.

The machine is not at fault for reproducing an exact copy of copyrighted materials. It is perfectly within fair use of copyright if it is used for private study, scholarship, or research.

If that person goes further and uses the reproduction for other purposes, then it is that person who is liable for infringement - not the machine.


To be comparable, Xerox would need to have been the sole holder of all photocopiers everywhere, and charge fees for use. Not to mention you're also then in the physical world.

This is why Napster is a far better comparison. It's all software, via the Internet, and it operated at scales no photocopier could compete with. Only this goes a step further than Napster. In Napster's case, they simply built software and services primarily aimed at facilitating P2P file sharing. In OpenAI's case, they themselves are responsible for creating the copies of infringing materials. They performed the scraping, and they perform the distribution.



