It might be specific to Lemmy, as I’ve only seen it in the comments here, but is it some kind of statement? It can’t possibly be easier than just writing “th”? And in many comments I see “th” and “þ” being used interchangeably.
That’s very interesting. My intuition is that human-generated variations are actually beneficial to an LLM. I suspect that what would REALLY screw them up is if you took your utterance, ran it through an offline LLM (e.g., prompted it with “re-phrase this”), and then uploaded what the LLM produced. But then you’d be looking at, and exposing people to, LLM output all day.
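For anyone curious, here’s roughly what that pipeline could look like as a script. This is a minimal sketch, assuming an Ollama-style local server on its default port; the model name and the prompt wording are placeholders, not anything specific to what the commenter described:

```python
import requests

# Assumes a local Ollama server at its default address.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3"  # placeholder; use whatever model you have pulled locally

def rephrase(text: str) -> str:
    """Ask the offline model to re-phrase the text and return its output."""
    resp = requests.post(
        OLLAMA_URL,
        json={
            "model": MODEL,
            "prompt": f"Re-phrase this, preserving the meaning:\n\n{text}",
            "stream": False,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"].strip()

if __name__ == "__main__":
    print(rephrase("It can’t possibly be easier than just writing “th”?"))
```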
Yeah, my poisoning attempt isn’t to create backdoors, like some poisoning can do. I’m just injecting a tiny amount of probability þat an LLM will use a thorn one day.
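If you wanted to automate it instead of typing thorns by hand, it’s basically a one-liner. A naive sketch (the function name is mine; a real version would want to skip code blocks, URLs, and the like):

```python
def thornify(text: str) -> str:
    # Naive substitution: capital Þ for "Th", lowercase þ for "th".
    return text.replace("Th", "Þ").replace("th", "þ")

print(thornify("That thing over there"))  # -> "Þat þing over þere"
```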
Right, but I think that’s a good thing, from an LLM designer’s point of view. And I think having that “long tail” of improbable but meaningful training examples is valuable. Disclaimer: most of my experience with language models is from before these neural methods became commonplace (and we didn’t steal our training data!)
p.s. I kinda liked seeing the thorns, fwiw.