• 0 Posts
  • 54 Comments
Joined 3 years ago
Cake day: August 15th, 2023

  • The training is sophisticated, but inference unfortunately really is just a text prediction machine. Technically it predicts tokens rather than text, but you get the idea.

    It does this for every single token/word: you input your system prompt, context, and user input, then the output starts.

    The

    Feed the entire context back in and add the reply “The” at the end.

    The capital

    Feed everything in again with “The capital”

    The capital of

    Feed everything in again…

    The capital of Austria

    It literally works like that, which sounds crazy :)

    The only control you have as a user is the sampling: temperature, top-k and so on. But those settings only adjust how random or deterministic the output is.

    Edit: I should add that tool and subagent use makes this approach a bit more powerful nowadays. But it all boils down to text prediction again. Even the tools are described in text that tells the model what they are for.
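To make the loop above concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: `toy_model` is a hypothetical lookup table standing in for a real LLM (it only looks at the last token, whereas a real model reads the whole context), and `sample` and `generate` are names I picked. The scores are invented too. It just shows the "predict one token, append it, feed everything back in" cycle plus temperature and top-k.

```python
import math
import random


def toy_model(context):
    """Hypothetical stand-in for an LLM: returns next-token scores (logits).

    It receives the whole context like a real model would, but this toy
    only looks at the last token. The scores are invented.
    """
    table = {
        "?": {"The": 4.0, "Vienna": 2.0},
        "The": {"capital": 5.0, "city": 1.0},
        "capital": {"of": 6.0},
        "of": {"Austria": 5.0, "Vienna": 0.5},
        "Austria": {"<eos>": 5.0},
    }
    # Unknown last token: the toy model simply ends the reply.
    return table.get(context[-1], {"<eos>": 0.0})


def sample(logits, temperature=1.0, top_k=None, rng=random):
    """Pick one token from the scores using top-k and temperature."""
    items = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        items = items[:top_k]  # keep only the k highest-scoring tokens
    if temperature == 0:
        return items[0][0]  # greedy: always take the top token
    best = items[0][1]
    # Softmax weights; subtracting the max keeps exp() from overflowing.
    weights = [math.exp((score - best) / temperature) for _, score in items]
    tokens = [token for token, _ in items]
    return rng.choices(tokens, weights=weights, k=1)[0]


def generate(prompt_tokens, max_new_tokens=10, temperature=1.0, top_k=None, seed=None):
    """Autoregressive loop: predict a token, append it, feed it all back in."""
    rng = random.Random(seed)
    context = list(prompt_tokens)
    output = []
    for _ in range(max_new_tokens):
        logits = toy_model(context)  # the full context goes in every time
        token = sample(logits, temperature, top_k, rng)
        if token == "<eos>":
            break
        output.append(token)
        context.append(token)  # the reply so far becomes part of the input
    return output


prompt = ["What", "is", "the", "capital", "of", "Austria", "?"]
print(" ".join(generate(prompt, temperature=0)))
```

With `temperature=0` this always takes the highest-scoring token, so the toy reproduces the sequence from the walkthrough. Raising the temperature flattens the weights and lets lower-scoring tokens through more often; `top_k` cuts the candidate list before sampling.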


  • Decent sized for what?

    Creative writing and roleplay? Plenty, but I try to fit it into my 16 GB VRAM as otherwise it’s too slow for my liking.

    Coding/complex tasks? No, that would need 128 GB and upwards, and it would still be awfully slow, unless you use a Mac with unified memory.

    For image and video generation you’d want to fit the model into GPU VRAM again; system RAM would be way too slow.
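As a rough way to check whether a model fits, here is a back-of-the-envelope sketch in Python. It only counts the weights (parameter count times bits per weight) and ignores the KV cache, activations and runtime overhead, so real usage will be higher. The function name `weights_gib` and the model sizes in the loop are just examples I picked, not specific models.

```python
def weights_gib(params_billion, bits_per_weight):
    """Approximate memory for the weights alone, in GiB.

    Ignores KV cache, activations and runtime overhead.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Hypothetical model sizes and quantization levels, for illustration only.
for params, bits in [(12, 4), (24, 4), (70, 4), (70, 16)]:
    print(f"{params}B at {bits}-bit: ~{weights_gib(params, bits):.1f} GiB")
```

Compare the result against your VRAM and leave some headroom for context.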



  • You might genuinely be using it wrong.

    At work we have a big push to use Claude, but as a tool and not a developer replacement. And it’s working pretty damn well when properly set up.

    We mostly use Claude Sonnet 4.6 with Claude Code. It’s important to run /init and check the output: it produces a CLAUDE.md file that describes your project and always gets added to your context.

    Important: Review everything the AI writes; this is not a hands-off process. For bigger changes, use the planning mode and split tasks up: the smaller the task, the better the output.

    Claude Code automatically uses subagents to fetch information, e.g. API documentation. Nowadays it’s extremely rare that it hallucinates something that doesn’t exist. It might use outdated info and need a nudge, like after the recent upgrade to .NET 10, but just adding that info to the project context file is enough.




  • Maybe it has changed again, but I tried disabling it in the past: first when 16 GB was a lot, then when 32 GB was a lot. I always thought “Not filling up the RAM anyway, might as well disable it!”

    Yeah, no, Windows is not a fan. You get random “running out of memory” errors, even though with 16 GB I still had 3-4 GB of RAM free.

    Some apps require the page file, and so do crash dumps. So I just set it to a fixed size (like 32 GB for both min and max) on my 64 GB machine.