Jason • May 25, 2023 11:01 AM
This is zeroth-order thinking, probably not novel, and possibly GPT-generated…
How long would it take for GPTs to generate as much text as all humans have ever produced, and to hold basically 50% of the language-generation market share? 75%? 99%?
How would LLMs ‘know’ they are being trained on their own generated text vs. human-created text?
Would LLMs suffer from copy-of-a-copy syndrome, or maybe even a prion-type mad-cow disorder?
Let’s say the term “American Farm” correlates 27% to “corn”, 24% to “soybeans”, and 16% to “wheat”. After many, many GPT cycles, with LLMs and their handlers unable to distinguish the source of the data, would it drift to 78% corn, 18% soybeans, 3% wheat?
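Here’s a toy simulation of that drift (the numbers are hypothetical, and the whole setup assumes each “generation” simply retrains on a finite sample of the previous generation’s output):

```python
import random
from collections import Counter

# Toy "copy of a copy" simulation: the "model" is just a categorical
# distribution over crops associated with "American Farm". Each generation
# retrains on a finite sample of the previous generation's output, so
# sampling noise compounds. Like genetic drift, the probability mass
# eventually piles onto a single category, even though no single step
# is biased toward any particular crop.
dist = {"corn": 0.27, "soybeans": 0.24, "wheat": 0.16, "other": 0.33}
random.seed(1)

for generation in range(1, 501):
    # Draw a finite "training set" from the current model's output...
    sample = random.choices(list(dist), weights=list(dist.values()), k=200)
    counts = Counter(sample)
    # ...and "retrain" by re-estimating frequencies from that sample.
    dist = {crop: counts[crop] / len(sample) for crop in dist}
    if generation % 100 == 0:
        print(generation, {c: round(p, 2) for c, p in dist.items()})
```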
I don’t know whether it will be poisonable; humans will not outpace GPT production for long (maybe that point has already passed). But it may be susceptible to reinforcing its own predictions. Oh wait, it’s just like us!
Post Script • May 25, 2023 11:05 AM
Aren’t they already self-poisoned by being built on undifferentiated slop? They should have to start over with clean, inspectable data sets, curated and properly licensed and paid for, not scraped out of the worst cesspools on the internet and blended in with anything else they can steal.
If you steal indiscriminately, people are going to start taking defensive measures, whether that’s closing off public access to sites to foil scrapers or setting out gift-wrapped boxes of poop.
TimH • May 25, 2023 11:06 AM
My concern is for the times when AI is used for evidential analysis and the defendant asks for the algorithm, as in “confront the accuser”. There isn’t an algorithm. If courts just accept that AI has to be correct and unbiased, and that its output can’t be challenged, then we are stuffed as a society.
Winter • May 25, 2023 11:08 AM
@Jason
Would LLMs suffer from copy-of-a-copy syndrome, or maybe even a prion-type mad-cow disorder?…
Yes to all.
And that is not a joke, as much as I would like it to be.
Anyone who wants to build LLMs will have to start by constructing filters to remove the output of other LLMs from their training data.
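Nobody knows how to do that reliably yet, but the shape of such a filter is easy to sketch. The crude phrase heuristic below is only a stand-in for whatever imperfect detector a builder would actually plug in:

```python
# Minimal sketch of a training-data filter. No reliable detector of LLM
# output exists; this phrase heuristic is a crude, hypothetical stand-in.

LLM_TELLS = (
    "as an ai language model",
    "i cannot fulfill that request",
    "it is important to note that",
)

def looks_machine_generated(text: str) -> bool:
    """Crude placeholder heuristic; a real filter would use a classifier."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in LLM_TELLS)

def filter_corpus(documents):
    """Keep only documents the heuristic does not flag as LLM output."""
    return [doc for doc in documents if not looks_machine_generated(doc)]
```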
Winter • May 25, 2023 2:46 PM
@Clive
“How would such a circuit be built?”
It cannot be done perfectly, or even approximately. But something has to be done to limit training on LLM output.
But think about how much speech a child needs to hear to learn a language, and how much reading it takes to reach a university reading level. That is not even a rounding error compared with what current LLMs need. That amount can easily be assembled from verified human language.
So, construct an LM that can be trained on verified human language, then use it to extract knowledge from written sources that do not have to be human. Just as humans do.
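As a very rough sketch of that two-stage pipeline (every class, method, and threshold here is a made-up placeholder, not a real system):

```python
# Stage 1: a small LM fitted only on verified, provenance-checked human
# text. Stage 2: use it to score further material, human or not.
# Everything below is a hypothetical sketch, not a real library.

class SmallLM:
    """Stand-in for a small LM trained only on verified human text."""

    def __init__(self):
        self.vocab = set()

    def train(self, verified_human_corpus):
        # Dummy "training": remember the human vocabulary. A real system
        # would fit an actual language model on this data.
        for doc in verified_human_corpus:
            self.vocab.update(doc.lower().split())

    def score(self, text):
        # Dummy plausibility score: fraction of words the human-trained
        # model has already seen. A real system would use perplexity.
        words = text.lower().split()
        return sum(w in self.vocab for w in words) / max(len(words), 1)

def build_training_set(verified_human_corpus, unvetted_sources, min_score=0.8):
    lm = SmallLM()
    lm.train(verified_human_corpus)
    # Admit further material only if the human-trained model finds it
    # plausible; whether a human or a machine wrote it no longer matters.
    return [doc for doc in unvetted_sources if lm.score(doc) >= min_score]
```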
Not yet technically possible, but one has to prepare for the future.