• 1 Post
  • 333 Comments
Joined 3 years ago
Cake day: July 4th, 2023

  • Absolutely 100% all of this, though with a lot of other tricks like caveman mode, careful skill files, and helper scripts that help the agent quickly and surgically extract just the useful output, you can substantially reduce token burn and improve its memory.

    As well as carefully having it roll back changes every time a fix doesn’t work, and having it keep a markdown file log of each fix it tried and the results, so it can review everything it tried previously.
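A rough sketch of what that markdown log helper could look like. The function name, the entry layout, and the `attempts.md` filename are all made up for illustration; the only idea taken from the comment is "append each fix tried and its result to a markdown file".

```python
from datetime import datetime, timezone
from pathlib import Path


def log_attempt(log_path: Path, fix: str, result: str, rolled_back: bool) -> None:
    """Append one attempted fix and its outcome to a markdown log file."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    entry = (
        f"## {stamp}\n"
        f"- Fix tried: {fix}\n"
        f"- Result: {result}\n"
        f"- Rolled back: {'yes' if rolled_back else 'no'}\n\n"
    )
    # Append mode keeps earlier attempts, so the agent can re-read the history.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(entry)


if __name__ == "__main__":
    # Hypothetical usage: one failed attempt that was rolled back.
    log_attempt(Path("attempts.md"), "bump retry timeout", "test still fails", True)
```

The agent (or a skill file) would be told to call this after every attempt and to read the file before trying anything new.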


  • Reproducing the bug with an automated test is harder, since it's code you write and run that tests your other code.

    But it lets you run it with 1 click and get a yes/no “is this still broken” output without having to reproduce it by hand each time.

    What's important is that this is in the domain of what LLMs can actually work with: the output of the test is something they can parse and iterate on until it works.

    They execute the command to run the test, check the output, and keep working til the test passes.
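A minimal sketch of that loop's building block, for the ordinary non-kernel case. The function under test (`parse_port`) and the bug it reproduces are hypothetical; the point is only that the whole thing reduces to one yes/no answer an agent can read.

```python
import unittest


def parse_port(value: str) -> int:
    """Hypothetical function under test: parse a TCP port from user input."""
    port = int(value.strip())
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


class ReproducePortBug(unittest.TestCase):
    """Hypothetical bug report: some inputs were handled incorrectly."""

    def test_trailing_whitespace_is_accepted(self):
        self.assertEqual(parse_port("8080 \n"), 8080)

    def test_out_of_range_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_port("70000")


def still_broken() -> bool:
    """Run the reproduction tests and reduce them to a yes/no answer."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ReproducePortBug)
    result = unittest.TestResult()
    suite.run(result)
    return not result.wasSuccessful()


if __name__ == "__main__":
    print("still broken" if still_broken() else "fixed")
```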

    They can add additional tests to help isolate the problem, or strip down the existing test until it's doing the absolute bare minimum steps to reproduce, in order to narrow the scope of what's causing it.

    But when your test involves stuff running in the kernel of an OS, your automated tests need to effectively be code you write that boots up a virtual machine and manipulates and observes that second machine's kernel…

    You can do it, but it's one of the most complicated forms of automated tests to design and run!


  • Yeah, LLMs are gonna spin their wheels hard when it comes to testing anything at the kernel/OS level. If you don't have automated testing with a virtual machine setup that can actually replicate a bug, you 100% just cannot test anything they produce or say.

    As soon as you have the ability to go “Okay we have a failing test, make it pass”, the LLMs get a lot less stupid, because instead of just randomly fumbling around and guessing, they have actual feedback to iterate on and can actually chew on it til they fix the issue or give up.




  • Innovating the boring stuff

    There is no innovating it, you simply have to just do it.

    This is a purely logical requirement, the code is already abstracted to the maximum feasible point.

    You simply have to write the code that connects the output of pipe A to the input of pipe B.

    This is called the Domain Rules or Business Rules, it's the stuff specific to your app's needs that simply can't be abstracted further.

    If we define for example “This endpoint lets you add a person to a room, but a room cannot have more than 8 people” you cannot get around needing to somehow define this business rule in your logic.

    Even in its absolute most abstract form, it's at least a couple lines of code minimum.
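To make the "couple lines minimum" concrete, here is one way the 8-person room rule could look. `MAX_ROOM_SIZE`, `RoomFullError`, and `add_person` are names invented for this sketch, and a plain list stands in for whatever the real room model would be.

```python
MAX_ROOM_SIZE = 8  # the limit from the example rule above


class RoomFullError(Exception):
    """Raised when adding a person would exceed the room limit."""


def add_person(room: list[str], person: str) -> None:
    """Add a person to a room, enforcing the 8-person business rule."""
    if len(room) >= MAX_ROOM_SIZE:
        raise RoomFullError(f"room already has {MAX_ROOM_SIZE} people")
    room.append(person)


if __name__ == "__main__":
    room: list[str] = []
    for number in range(MAX_ROOM_SIZE):
        add_person(room, f"person-{number}")
    try:
        add_person(room, "one-too-many")
    except RoomFullError as error:
        print(f"rejected: {error}")
```

The rule itself is the two-line `if`/`raise`; everything else is scaffolding, which is about as small as it gets.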

    Now, most API endpoints have several rules. And often APIs can have hundreds of endpoints. And often businesses maintain multiple APIs.

    So, 3 lines per rule x ~7 rules per endpoint x ~100 endpoints x ~3 APIs puts you at like 6300 lines of code baseline for defining business rules.

    And then for every. single. rule. You have to write tests that cover both the positive and the negative case for that rule.

    Which puts us at about 2100 rules, multiplied by easily 10 to 12 lines per test.

    So 11 x 2100 = 23,100 lines of code for tests, though it's prolly closer to double that.

    ALL of this is extremely simple and easy to do, it's just a lot of fuckin typing lol.
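The back-of-envelope numbers above, spelled out. Reading the four factors as lines per rule, rules per endpoint, endpoints per API, and number of APIs is an interpretation of the comment, and 11 is just the midpoint of "10 to 12 lines per test".

```python
# Rough estimates from the comment; the variable names are an interpretation.
lines_per_rule = 3
rules_per_endpoint = 7
endpoints_per_api = 100
apis = 3
lines_per_test = 11  # midpoint of "10 to 12 lines per test"

rules = rules_per_endpoint * endpoints_per_api * apis
rule_lines = lines_per_rule * rules
test_lines = lines_per_test * rules
test_lines_doubled = 2 * test_lines  # the "closer to double that" guess

print(f"rules: {rules}")
print(f"lines of rule code: {rule_lines}")
print(f"lines of test code: {test_lines} (up to ~{test_lines_doubled})")
```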

    AI can pump this out in about 1/10th the time I can, prolly closer to 1/20th tbh.





  • by the types of people you maybe don’t want using your code anyways

    …companies? Sure I guess, if you want to angle your career trajectory towards “unemployable” by all means lol.

    Personally anyone doing this I’m going to be more likely to use their code

    I am a tech lead, if any dev under me intentionally added/used a tool to our systems because it had malicious undocumented behaviors of any kind, they would be fired immediately and any company that contacted us for reference would be informed of their behavior.

    To be clear, this is the scenario of

    Me: hey I saw you installed [tool], that thing is flagged by our systems for the maintainers having done malicious undocumented stuff in the past

    Dev: haha yeah that's why I used it

    Me: you are joking right?

    That'd be an instant high level escalation to “strip this person of privs and get them off our system asap, and HR now has to be involved”

    You don't fuckin do shit like that in a real company if you wanna stay employed lol.


  • Most open source maintainers never “license [any] stuff you maintain for big bucks” that is often hard to do and/or goes against the philosophy of open source entirely.

    Uhhh… no this is actually very common. Usually with scaling licenses, “free for use if your company is below [threshold]”, it's super common…

    And I don’t even think this is malicious behaviour as it just nukes the code of this package and nothing else if you are not being careful yourself…

    Are you even reading what you just wrote lol.

    Being “sorta” malicious is still malicious. And companies usually have zero tolerance for that shit.

    If you don’t do version control you are not a good programmer, imo

    You really underestimate how much damage this could do then, lol…


  • They only documented it after all the outcry, which is way too late.

    Documenting it post release still counts as having released undocumented behavior.

    And if it's malicious (which this 100% is), then it doesn’t fuckin matter anyways lol. You now are treated akin to a trojan maintainer by companies. You’ll get flagged as “don’t ever use anything by this person”

    Super great way to get yourself flagged and lose any opportunity in the future for possibly licensing stuff you maintain for big bucks. What company would risk paying money to someone who does childish stuff like that lol


  • How to get yourself blacklisted by large sweeps of the FOSS community:

    Step 1: Include any kind of undocumented subversive behaviour in your thing.

    That’s it, doesn’t matter what the intent is. Simply demonstrating you are willing to include anything remotely subversive without being open about it is usually enough to get blacklisted by a lot of people, because if you did it once… who’s to say you won’t do it again, but possibly worse next time?

    People react extremely coldly any time a FOSS dev throws a sudden undisclosed anything in their tool, let alone one that is actively malicious.

    If my work life is gonna depend on anything FOSS, I ain’t touching anything like that, regardless of intent, with a 200 foot pole lol.

    All it takes is one button click to get notified:





  • What tasks make economic sense to devolve to AI?

    So, the main one I use it for, as it's my job, is software development.

    I offload about 90% to 95% of my workload to AI, almost all of which is “boilerplate” code that sits in the realm of “very easy to do, but very repetitive and time consuming” which is… most of it. That's just the reality of software dev, especially web app dev. A lot of our stuff is just plumbing “this API endpoint calls this backend logic which just maps to this basic database operation”.

    I.e., the POST endpoint to /users/{id} invokes the UpdateUserHandler which takes in an UpdateUserRequest which maps to an UPDATE [dbo].[Users] ... SQL statement… not exactly super complicated stuff, but you do have to actually write the stuff that defines this.
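A stripped-down sketch of that endpoint → handler → SQL plumbing. `UpdateUserHandler` and `UpdateUserRequest` are the names from the comment; everything else is assumed for illustration: the `name` and `email` columns, the `post_user` stand-in for the route, the status codes, and SQLite in place of the SQL Server style `[dbo].[Users]` table.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class UpdateUserRequest:
    user_id: int
    name: str   # hypothetical column
    email: str  # hypothetical column


class UpdateUserHandler:
    """Maps an UpdateUserRequest onto a single UPDATE statement."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def handle(self, request: UpdateUserRequest) -> bool:
        cursor = self.conn.execute(
            "UPDATE Users SET name = ?, email = ? WHERE id = ?",
            (request.name, request.email, request.user_id),
        )
        self.conn.commit()
        return cursor.rowcount == 1  # False when no such user exists


def post_user(conn: sqlite3.Connection, user_id: int, body: dict) -> int:
    """Stand-in for the POST /users/{id} route; returns an HTTP status code."""
    request = UpdateUserRequest(user_id=user_id, name=body["name"], email=body["email"])
    updated = UpdateUserHandler(conn).handle(request)
    return 204 if updated else 404


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    conn.execute("INSERT INTO Users VALUES (1, 'old name', 'old@example.com')")
    print(post_user(conn, 1, {"name": "new name", "email": "new@example.com"}))
```

None of it is hard, but every endpoint needs its own version of these three pieces, which is where the volume comes from.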

    This type of work is trivial for AI, but any given project has its own set of business rules, code rules, syntax rules, formatting rules, etc etc etc.

    A naive approach is just yolo throw an AI agent at it, cross your fingers and pray that it randomly chooses to read the right stuff and happens to succeed at following your code quality and methodologies (it won't).

    The naive agent also will demolish its way through tokens as it reads way more files than it has to, because every single time you do work it's basically starting all over again from ground zero with no context of wtf it's doing. This wastes… so many tokens, because it's gonna sit and read like 20 files just to figure out “what am I supposed to be doing here?” and this in turn fills its context window up so damn full it will start forgetting shit anyways.

    This is where actual tools come into play that make this stop being an issue…

    1. RAG Memory. Instead of blindly searching your codebase, you can index it (typically as embeddings) into a system that's much easier to search semantically. The agent can do a simple search WAY faster and get back pointers saying “Look over here”. It's like creating an index of your codebase itself, so the agent has a usable, optimized “map” of the project.

    2. MCP Tools, which are basically tools that the agent can invoke to do… anything. Normally by default the agent is just given willy nilly access to the terminal and it’ll just fumble through trying to use that to do anything it needs to. This is a great way to fuck stuff up, especially if it also has access to shit it shouldn't (cue “our agent accidentally deleted our database” type shit). MCP tools allow you to build a curated set of stuff it can invoke, so instead of just doing random shit it has prefab commands to run. If you flesh out your MCP Tools well enough you can outright disable its access to the terminal entirely, because it doesn't even need it anymore. No more accidental database deletions, and it uses waaaay fewer tokens fumbling around in the terminal.

    3. Skills, which are special files that allow it to “lazy load” guides on “how to do x/y/z”, which you can break up into bite sized pieces. So instead of a giant AGENTS.md file that uses up half its context, even though the agent doesn't need 90% of what's in it for a given job, it can instead have a big list of “how to do this, how to do that” and it’ll load a skill for a given task it's working on ad hoc, only loading in the instructions relevant to the task at hand. These are huge and critical for reducing token usage even further.

    4. Token reduction skills (caveman being the most popular). There are certain skills people have made that outright change the way the agent behaves, namely the caveman skill. Make agent talk like this. Why more word when less word good. Less word, less tokens. Less tokens, less money. Also faster. Ungabunga. (Caveman mode can give you like 50% to 70% token reductions alone)

    5. General prompting skills. Knowing how to prompt agents genuinely makes a big fuckin difference. The baseline thing you learn asap is to NEVER correct an agent, this does not work well. Instead you should be going back in the timeline and editing your prior prompt to preemptively correct the mistake before it even happens.

    Example, responding with “X is wrong” is bad. Instead, going backwards and editing your prior prompt with “And don't do X btw” is far better.
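To illustrate the "curated set of stuff it can invoke" idea from point 2, here is a bare-bones allowlist in plain Python. This is not the MCP protocol or any real MCP SDK; the registry, the decorator, and both example tools are invented for the sketch. It only shows the principle that the agent can call named prefab tools and nothing else.

```python
from typing import Callable

# Registry of the only operations the agent is allowed to call.
TOOLS: dict[str, Callable[..., str]] = {}


def tool(name: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
    """Decorator that adds a function to the allowlist under a fixed name."""
    def register(func: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = func
        return func
    return register


@tool("count_lines")
def count_lines(text: str) -> str:
    return str(len(text.splitlines()))


@tool("grep_lines")
def grep_lines(text: str, needle: str) -> str:
    # Return only matching lines, so the agent sees less noise.
    return "\n".join(line for line in text.splitlines() if needle in line)


def invoke(tool_name: str, **arguments: str) -> str:
    """Run an allowlisted tool; anything else is refused outright."""
    if tool_name not in TOOLS:
        raise PermissionError(f"tool not allowed: {tool_name}")
    return TOOLS[tool_name](**arguments)


if __name__ == "__main__":
    log = "ok: step 1\nerror: step 2 failed\nok: step 3"
    print(invoke("grep_lines", text=log, needle="error"))
```

A request for something like `drop_database` never reaches a shell at all; it just raises `PermissionError`.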

    This is just a handful of stuff, literally the basics, but hopefully it gives you an idea of how deep the rabbit hole can go. I didn't even touch on stuff like agentic workflows and agent orchestration, which is where shit really starts to pop off…
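And for the skills point (3 in the list above), a toy version of the lazy loading. The layout assumed here, one markdown file per skill with a one-line description on the first line, is made up for the sketch and is not the actual skill file format of any particular agent.

```python
from pathlib import Path


def build_index(skills_dir: Path) -> dict[str, str]:
    """Map each skill name to its first line only (a one-line description)."""
    index: dict[str, str] = {}
    for path in sorted(skills_dir.glob("*.md")):
        with path.open(encoding="utf-8") as handle:
            index[path.stem] = handle.readline().strip()
    return index


def load_skill(skills_dir: Path, name: str) -> str:
    """Read the full guide, only once a task actually needs it."""
    return (skills_dir / f"{name}.md").read_text(encoding="utf-8")


if __name__ == "__main__":
    skills = Path("skills")  # hypothetical directory of skill files
    if skills.is_dir():
        # Only this short index would sit in the agent's context by default.
        for name, description in build_index(skills).items():
            print(f"{name}: {description}")
```

The index costs a line per skill; the full guide only enters the context when `load_skill` is called for the task at hand.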