• 4 Posts
  • 64 Comments
Joined 1 month ago
Cake day: August 10th, 2025

  • Initial thought: Well… but this is a transparently absurd way to set up an ML system to manage a vending machine. I mean it is a useful data point I guess, but to me it leads to the conclusion “Even though LLMs sound to humans like they know what they’re doing, they don’t. Don’t just stick the whole situation into the LLM input and expect good decisions and strategies to come out of the output; you have to embed it in a more capable and structured system for any good to come of it.”

    Updated thought, after reading a little bit of the paper: Holy Christ on a pancake. Is this architecture what people have been meaning by “AI agents” this whole time I’ve been hearing about them? Yeah this isn’t going to work. What the fuck, of course it goes insane over time. I stand corrected, I guess, this is valid research pointing out the stupidity of basically putting the LLM in the driver’s seat of something even more complicated than the stuff it’s already been shown to fuck up, and hoping that goes okay.

    Edit: Final thought, after reading more of the paper: Okay, now I’m back closer to the original reaction. I’ve done stuff like this before, and this is not how you do it. Have it output JSON. Have some tolerance and retries in the framework code for parsing the JSON. Be more careful with the prompts to make sure that it’s set up for success. Definitely don’t include all the damn history in the context, up to the full wildly-inflated context window, to send it off the rails. Basically, be a lot more careful with how you set it up than this, and put a lot more limits on how much you are asking of the LLM so that it can actually succeed within the little box you’ve put it in. I am not at all surprised that this setup went off the rails in hilarious fashion (and it really is hilarious, you should read it). Anyway that’s what LLMs do. I don’t know if this is because the researchers didn’t know any better, or because they were deliberately setting up the framework around the LLM to produce bad results, or because this stupid approach really is the state of the art right now, but this is not how you do it. I actually am a little bit skeptical about whether you even could set up a framework for a current-generation LLM that would enable it to succeed at an objective and pretty frickin’ complicated task like the one they set it up for here, but regardless, this wasn’t a fair test. If it was meant as a test of “are LLMs capable of AGI all on their own regardless of the setup like humans generally are,” then congratulations, you learned the answer is no. But you could have framed it a little more directly to talk about that being the answer instead of setting up a poorly-designed agent framework to be involved in it.
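
For illustration only, here is a minimal Python sketch of the kind of setup the comment above describes: JSON output, tolerant parsing with retries, and a hard cap on how much history goes into the prompt. Everything named here is an assumption made for the example and not anything from the paper: `call_model` stands in for whatever function sends a prompt to an LLM and returns its text, and the action names are made up.

```python
import json
from typing import Callable

# Hypothetical action set; a real framework would define its own.
ALLOWED_ACTIONS = {"restock", "set_price", "wait"}


def parse_action(raw: str) -> dict:
    """Pull one JSON object out of the model's reply and validate it."""
    # Tolerance: ignore any chatter before the first "{" or after the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found")
    action = json.loads(raw[start:end + 1])  # JSONDecodeError is a ValueError
    if not isinstance(action, dict) or action.get("action") not in ALLOWED_ACTIONS:
        raise ValueError("unknown or missing action")
    return action


def next_action(
    call_model: Callable[[str], str],  # assumed: prompt in, reply text out
    task: str,
    history: list[str],
    max_turns: int = 6,
    max_retries: int = 3,
) -> dict:
    """Ask for one small, structured decision, with bounded context."""
    # Only the most recent turns go in, never the whole transcript.
    recent = history[-max_turns:]
    prompt = task + "\n" + "\n".join(recent) + "\nReply with one JSON object."
    for _ in range(max_retries):
        raw = call_model(prompt)
        try:
            return parse_action(raw)
        except ValueError as err:
            # Retry, telling the model what was wrong with its last reply.
            prompt += f"\nYour last reply was invalid ({err}). Reply with JSON only."
    # Safe default if the model never produces something usable.
    return {"action": "wait"}
```

The point of the sketch is that the framework code, not the LLM, decides what counts as a valid move and what happens when the output is garbage. Whether that is enough for a task this complicated is a separate question.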

  • “Don’t worry,” I said. “It’s just because it’s new. The novelty will wear off. And if it doesn’t, we’ll get rid of it.”

    I feel like this belongs in a horror movie

    Edit: Jumping Christ

    “I’m afraid I’m locking you in a cupboard,” I inform it after it asks if I’m ready for some fun. “Oh no,” it says. “That sounds dark and lonely. But I’ll be here when you open it, ready for snuggles and hugs.”

    Also, spoiler for the article: The kid quickly got bored and moved on from the toy because the toy kind of sucks. She is ahead of some tech CEOs I could name.


  • I think the crisis of Trump is likely to be worse than any crisis in the Western world in the last 50 years. I think the closest analogue is probably the collapse of the USSR. So yes, some of the rich people upped their wealth by orders of magnitude, and honestly you might be right that Zuck might manage to be in that category, but also some of them lost everything or got thrown out windows, or had to survive in reduced capacity within their new walled fortresses in the horrifying new meta. I feel like it’s more likely that the MAGA world will remember Facebook censoring their posts about ivermectin, and not feel like Zuck needs to have a seat at the table, no matter how many ass-kissing sessions he shows up for at the White House.

    For example I feel like breaking up Meta and mandating Truth Social and TikTok as the only new sanctioned social media going forward might be one possible outcome. It’s kind of hard to say and I won’t swear that you’re definitely wrong that he might come out way ahead in the end. I’m just saying that this type of crisis is a very different type of crisis.

  • In retrospect, Seinfeld was a very dark show. Somewhere on YouTube there is an insightful little video essay about how the first few seasons of the show are basically the story of how Elaine, a perfectly decent person, gets drawn into their little circle and over time adopts their awful selfishness and sociopathic behavior to try to fit in. How most of the problems of the show are caused by their selfishness and dishonesty, and often involve significant harm coming to someone else, and they don’t care.

    I can’t even remember which comedian it is, but someone had a bit about how the darkest joke he ever heard was a Seinfeld bit about being at the movie theater and just throwing his drink on the ground at the end for someone else to clean up. Like it’s a small thing, but the guy talking about it was genuinely alarmed by how deeply Seinfeld just doesn’t give a fuck and doesn’t mind if you know it.