> Also, the source code they were trained on does not exist in the model, it's impossible for the llm to return a code snippet from some other code base.
So? Just because a piece of output data is encrypted or compressed and does not resemble the input, does not mean that the process did not take the input.
We have decades of law that treats zipped files as infringement, lossy compression (MP3s) as infringement, etc.
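To make the compression point concrete, here is a minimal Python sketch. The snippet and variable names are made up for illustration. The zlib output is smaller than the input and doesn't contain its text, yet the input comes back byte for byte. This is lossless compression, so it is only an analogy: it shows that "the output doesn't resemble the input" says little about whether the input went in.

```python
import zlib

# Hypothetical "copyrighted" source, repeated so it compresses well.
source = b"def add(a, b):\n    return a + b\n" * 50

# Compress at the highest level; the result is Huffman-coded bytes.
compressed = zlib.compress(source, 9)

# The compressed form is much smaller and doesn't contain the original text...
print(len(source), len(compressed))
print(b"def add(a, b):" in compressed)

# ...yet the original is fully recoverable from it.
restored = zlib.decompress(compressed)
print(restored == source)
```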
> guess another way to put it is show your code in the output of an llm that isn't being attributed correctly.
Well, a better way of putting it is to answer the question: "Would that model have existed had none of the code used as input existed?"
IOW, can that model be created without first using all that copyrighted code as input?