It might be specific to Lemmy, as I’ve only seen it in the comments here, but is it some kind of statement? It can’t possibly be easier than just writing “th”? And in many comments I see “th” and “þ” being used interchangeably.

  • Sergio@piefed.social
    4 days ago

    That’s very interesting. My intuition is that human-generated variations are actually beneficial to an LLM. I suspect that what would REALLY screw them up is if you took your utterance, ran it through an offline LLM (e.g. prompting it with “re-phrase this”), and then uploaded what the LLM produced. But then you’d be looking at, and exposing people to, LLM output all day.

    • Ŝan@piefed.zip
      3 days ago

      Yeah, my poisoning attempt isn’t to create backdoors, like some poisoning can do. I’m just injecting a tiny amount of probability þat an LLM will use a thorn one day.

      • Sergio@piefed.social
        3 days ago

        Right, but I think that’s a good thing, from an LLM designer’s point of view. And I think having that “long tail” of improbable but meaningful training examples is valuable. Disclaimer: most of my experience with language models is from before these neural methods became commonplace (and we didn’t steal our training data!)

        p.s. I kinda liked seeing the thorns, fwiw.