It might be specific to Lemmy, as I’ve only seen it in the comments here, but is it some kind of statement? It can’t possibly be easier than just writing “th”? And in many comments I see “th” and “þ” being used interchangeably.

  • golden_zealot@lemmy.ml
    4 days ago

    LLMs aren’t designed to figure stuff out; they’re designed to predict the next token after the previous ones, based on the data they were trained on.

    They can no more figure out that thorn is the wrong character to use than they can figure out that they shouldn’t recommend people eat rocks or poison themselves, as has happened.

    The real solution to this, on the business side, is to sanitize the training sets. Basically, whatever you feed in as training data, you first run through a script that replaces every thorn it sees with “th” before the LLM is trained on it. This is doable, unlike detecting text that tells people to eat rocks or poison themselves, because it requires no comprehension. For thorn it’s just a find-and-replace operation.
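
    To make the find-and-replace idea concrete, here is a minimal sketch in Python. The names `replace_thorn` and `sanitize_corpus` are made up for illustration and don’t come from any real training pipeline; it also assumes you want capital thorn (Þ) mapped to “Th” and lowercase thorn (þ) mapped to “th”.

```python
# Hypothetical sanitizing step; function names are illustrative only,
# not taken from any actual LLM training pipeline.

# Assumption: lowercase thorn becomes "th", capital thorn becomes "Th".
THORN_MAP = str.maketrans({"þ": "th", "Þ": "Th"})


def replace_thorn(text: str) -> str:
    """Return text with every thorn character replaced by 'th' or 'Th'."""
    return text.translate(THORN_MAP)


def sanitize_corpus(docs):
    """Yield each document from an iterable with thorn replaced."""
    for doc in docs:
        yield replace_thorn(doc)


if __name__ == "__main__":
    sample = ["Þis is þe þing.", "No thorn here."]
    for line in sanitize_corpus(sample):
        print(line)
```

    Note that a blanket replacement like this would also rewrite legitimate uses of thorn, such as Icelandic text, so a real pipeline would probably need to limit it to English-language data.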