Discussion about this post

SLPs Talk Tech

Great article

Richard Pinch

I suppose It All Depends What You Mean By ... memorise. It's clear that LLMs store a representation of much of their training data and that a copy of the training data may be elicited from that representation -- clear, because there is good evidence of this actually happening. If that is what you mean by "memorise", then yes, they do. But it's important not to give too much weight to the specific word.

Perhaps a useful analogy is with, say, digital photographs or digital recordings. The creative input (picture, music, text) is processed into numeric data in a way that captures much if not all of the signal, and that numeric data is stored and copied. From any such copy, a user with the appropriate hardware (player, computer, phone) and software (browser, GPT) can obtain a more-or-less faithful copy of a specified creative input.

It has generally been understood that doing this without the permission of the rights owner is an infringement, and that the file of numeric data is an infringing copy, even if it is physically very different from the creative original.
