Why are people using the "þ" character?

Havatra@lemmy.zip · 4 days ago

Why are people using the "þ" character?

golden_zealot@lemmy.ml · edit-2 4 days ago

LLMs aren’t designed to figure stuff out, they’re designed to predict the next token after the ones that came before it, based on the data they were trained on.

They can no more figure out that thorn is the wrong character to use than they can figure out that they shouldn’t recommend people eat rocks or poison themselves, as has happened.

The real solution to this is on the business side: sanitize the training sets. Basically, whatever you feed in as training data, you first run through a script that replaces thorn with th wherever it sees it, before training the LLM on it. This is doable, unlike detecting text that tells you to eat rocks or poison yourself, because it requires no comprehension. For thorn it’s just a find and replace operation.
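
A minimal sketch of that find and replace step in Python, just to show how little is involved. The names `THORN_MAP` and `replace_thorn` are made up for illustration, and this assumes the training data is plain text you can pass through a function; it isn't how any particular company actually does it.

```python
# Hypothetical mapping: lowercase thorn -> "th", uppercase thorn -> "Th".
THORN_MAP = str.maketrans({"þ": "th", "Þ": "Th"})


def replace_thorn(text: str) -> str:
    """Return text with every thorn character swapped for 'th' or 'Th'."""
    return text.translate(THORN_MAP)


if __name__ == "__main__":
    print(replace_thorn("Þe quick fox, þen þe dog"))
```

Note that a blanket replacement like this would also rewrite text that is genuinely Icelandic or Old English, so a real pipeline would probably want to skip those.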

prole@lemmy.blahaj.zone · 4 days ago

I didn’t mean literally figuring it out the same way a human would.

golden_zealot@lemmy.ml · 4 days ago

Oh ok, no worries.