

You’ll certainly gain some valuable insight, even if it has nothing to do with your question. Which is more than I can say for LLMs.
Seriously, do not use LLMs as a source of authority. They are stochastic machines predicting the next token they emit; if what they say is true, it’s pure chance.
Use them to draft outlines. Use them to summarize meeting notes (and review the summaries). But do not trust them to give you reliable information. You may as well go to a party, find the person who’s taken the most acid, and ask them for an answer.
Thanks! I’ve learned in jobs over the years that there are two good ways to choose names:
“Sci fi starships” is a great one! Lots of source material there; the categories basically fall out by themselves. That’s a great choice.
In the past, I used Gimli’s family tree for server names.
Oh, that’s good.
Middle Earth is a great source for this stuff, b/c Tolkien filled out the world like a historian.
Huh. I thought for sure someone else would be using my scheme.
LAN computers are all Tolkien swords: sting, orcrist, gurthang, glamdring, etc. If I run out of swords, I’ll start adding other weapons: aeglos, the spear; dailir, the arrow. We don’t get a lot of named battle axes, which I always thought weird; I’d think dwarves of all people would forge legendary axes, and certainly name them.
My WiFi and VPN networks are forests in Middle Earth: fangorn, bindbole, dimholt, lothlorien, etc. The only exception is my LAN itself which is… “lan”. Because short.
My cloud VPSes are named after Greek Titans: hyperion, phaethusa, tethys, etc.
Mobile devices have whatever names they come with, because they’re so ephemeral.
Do you need a web app, or would a mobile app do? There are a number of medication trackers, and one specifically for tracking stuff like this called Track & Graph. The DB export is a SQLite DB, which can be SyncThing’ed to a computer and worked with using whatever tools you like.
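Because the export is just SQLite, anything that speaks SQL can read it. A minimal sketch of poking at a synced copy (the filename is a placeholder, and I’m not assuming anything about the app’s table layout, so it just lists what’s in there):

```python
import sqlite3

# "trackandgraph.db" is a placeholder for wherever the synced export lands.
con = sqlite3.connect("trackandgraph.db")

# Discover the tables rather than assuming a schema.
tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

for table in tables:
    count = con.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    print(f"{table}: {count} rows")

con.close()
```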
If I suggested helix + dvalv on your desktop, Markor + Valv on your phone, and SyncThing for syncing… would that be sufficient? It’s basically what I use, but without the dvalv/Valv component because my notes are never hosted outside of my phone or desktop, so I don’t need the encryption.
The missing piece would be a “calendar view”; I’d have to think about that one. Myself, my entries are named by ISO 8601, so sorting by timestamp or name both work. But if you want a traditional grid calendar with, like, colored days for the days with entries, that’d require another program.
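For what it’s worth, a grid view over ISO-named files isn’t much code. A rough sketch, assuming one note file per day whose name starts with the date (the notes directory and the .md extension are placeholders):

```python
import calendar
import datetime
import pathlib

# Placeholder location and extension; adjust to taste.
NOTES_DIR = pathlib.Path("~/notes").expanduser()

# Collect the days that have an entry, based on ISO 8601 filename prefixes.
days_with_entries = set()
for path in NOTES_DIR.glob("*.md"):
    try:
        days_with_entries.add(datetime.date.fromisoformat(path.stem[:10]))
    except ValueError:
        pass  # skip files that aren't date-named

# Print the current month as a grid, marking days that have entries with '*'.
today = datetime.date.today()
print(today.strftime("%B %Y"))
for week in calendar.Calendar().monthdatescalendar(today.year, today.month):
    row = []
    for day in week:
        if day.month != today.month:
            row.append("   ")
        else:
            mark = "*" if day in days_with_entries else " "
            row.append(f"{day.day:2d}{mark}")
    print(" ".join(row))
```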
It sounds to me, though, that you’ve set up your requirements such that you really just want a web app, with client-side encryption. I’m old enough to have learned the value of application-independent data storage; SQLite is about as complex as I’m comfortable with, and flat files in a directory are even better.
Oh, color laser is the way to go, for sure. Refills are expensive, but rare; the biggest problem is if you have to move them, they’re a nightmare. And far heavier than inkjet. But, all things being equal, I’d take a color, duplex laser any day.
You’re not the first person I’ve heard who’s had trouble with Ecotanks. I’ve been very fortunate and have not had any issues. I did learn that you need to print at least once a week or the heads tend to clog; the downside of never replacing the heads with the cartridges, I guess. But now I just have a cron job that prints a test page once a week and it’s fine.
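The job itself is trivial; mine amounts to roughly this sketch (the test-page path is a placeholder, and it assumes CUPS’ lp is available on whatever box runs the schedule):

```python
import subprocess

# Weekly "keep the heads from clogging" job: send a small page to the
# default CUPS printer. Schedule it with cron or a systemd timer.
# The path is a placeholder; any small document works.
subprocess.run(["lp", "/home/user/testpage.pdf"], check=True)
```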
Both Ecotanks and lasers eliminate that “print anxiety”, where you’re afraid to use the device because cartridge costs make each page cost $2.
To paraphrase Quint: “I’ll never replace a cartridge again.”
What do you mean, decouple DNS and domain? The registrar is the authoritative source for DNS - how do you bypass that?
Epson Ecotanks. Liquid ink in, prints out. There’s nothing to lock out.
I have domains at both DomainMonger and NameCheap. If it were trivial, I’d probably move my domains to NameCheap. The web UX is a little better; aside from that, I’ve never had issues with either, nor heard anything particularly bad about either.
But, yeah: +1 on the NameCheap suggestion.
That’s how it works. Wake-on-LAN wakes the computer when its network card sees a specially formed “magic packet” addressed to it. Which is the same thing you’re asking for, right?
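If you want to trigger it yourself, the magic packet is simple enough to send by hand. A minimal sketch (the MAC address is a placeholder):

```python
import socket

def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send a Wake-on-LAN magic packet: 6 bytes of 0xFF followed by the
    target MAC repeated 16 times, as a UDP broadcast."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    packet = b"\xff" * 6 + mac_bytes * 16
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))

wake("aa:bb:cc:dd:ee:ff")  # placeholder MAC; use the sleeping NIC's address
```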
I’ve been using Contabo. German company, several geographic locations for your nodes, reasonably priced.
Restic to Backblaze. B2 support is built into restic, so all you need is an account and credentials.
Most of my home data - servers, PCs - I back up to HD and B2. I have a few VPSes I only back up to B2.
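The moving parts are small. A sketch of what a run looks like, assuming the B2 bucket already exists (bucket name, paths, and credentials are all placeholders; restic picks the B2 credentials and repository password up from the environment):

```python
import os
import subprocess

# Placeholders throughout; in practice keep secrets out of the script
# (e.g. use RESTIC_PASSWORD_FILE and a credentials file).
env = dict(os.environ,
           B2_ACCOUNT_ID="your-key-id",
           B2_ACCOUNT_KEY="your-application-key",
           RESTIC_PASSWORD="your-repo-password")
repo = "b2:my-backup-bucket:host1"  # placeholder bucket and path

# Back up the chosen directories, then prune old snapshots.
subprocess.run(["restic", "-r", repo, "backup", "/home", "/etc"],
               env=env, check=True)
subprocess.run(["restic", "-r", repo, "forget", "--keep-daily", "7", "--prune"],
               env=env, check=True)
```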
I’ve never had a Cyberpower that hasn’t worked just fine with NUT. NUT’s a PITA to configure, but other than that it likes Cyberpower. I have that model - without the 3-R - and it’s great.
I would be extremely surprised if the 3-R version didn’t work. With Cyberpower, I don’t even bother to look up compatibility. I bought 3 EC850LCDs blind, for the router and a couple other servers around the house. They all came up just fine.
In the Verge article, are you talking about the table with the “presumably” qualifier in the column headers? If so, not only is it clear they don’t know what, exactly, is attributable to the costs, but also that they mention “gross pay”, which is AKA “compensation.” When a company refers to compensation, they include all benefits: 401k contributions, the value of health insurance, vacation time, social security, bonuses, and any other benefits. When I was running development organizations, a developer who cost me $180k was probably only taking $90k of that home. The rest of it went to benefits. The rule of thumb was that for every dollar of salary negotiated, I had to budget 1.5-2x that amount. The numbers in the “Presumably: Gross pay” column are very likely cost-to-company, not take-home pay.
I have some serious questions about the data from “h1bdata.info”. It claims one software engineer has a salary of $25,304,885? They’ve got some pretty outlandish salaries in there; a program manager in NY making $2,400,000? I’m sceptical about the source of the data on that website. The vast majority of the salaries for engineers, even in that table, are in the range of $100k-$180k, largely dependent on location, and a far cry from a take-home salary of 500,000€.
Nobody is paying software developers 500.000€. It might cost the company that much, but no developers are making that much. The highest software engineer salaries are still in the US, and the average is $120k. High-end salaries are $160k; you might creep up a little more than that, but that’s also location specific. Silicon Valley salaries might be higher, but then, it costs far more to live in that area.
In any case, the question is ROI. If you have to spend $500,000 to address some sites that are being clever about wasting your scrapers’ time, is that data worth it? Are you going to make your $500k back? And you have to keep spending it, because people keep changing tactics and putting in new mechanisms to ruin your business model. Really, the only time this sort of investment makes sense is when you’re breaking into a bank and are going to get a big pay-out in ransomware or outright theft. Getting the contents of my blog is never going to be worth the investment.
Your assumption is that slowly served content is considered not worth scraping. If that’s the case, then it’s easy enough for people to prevent their content from being scraped: put in sufficient delays. This is an actual method for addressing spam: add a delay to each interaction. Even relatively small delays add up and cost spammers money, especially if you run a large email service and do it at scale.
Make the web a little slower. Add a few seconds to each request, on every web site. Humans might notice, but probably not enough to be a big bother; the impact on data harvesters, though, would be huge.
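It doesn’t take much to do, either. A toy sketch of the idea using nothing but the standard library (the delay, port, and page body are arbitrary; a real site would more likely do this at the reverse proxy):

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

DELAY_SECONDS = 10  # arbitrary; even a few seconds adds up at scraper scale

class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(DELAY_SECONDS)  # make every request cost the client wall-clock time
        body = b"<html><body>Hello, patient visitor.</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("", 8080), SlowHandler).serve_forever()
```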
If you think this isn’t a viable defense, consider how almost every Cloudflare interaction - and an increasingly large number of other sites - includes a time-wasting front page. They usually say something like “making sure you’re human” with a spinning disk, but really all they need to be doing is adding 10 seconds to each request. If a scraper is trying to index only a million pages a day, and each page adds a 10s delay, that’s nearly 2,800 hours of wasted scraper compute time. And they’re trying to scrape far more than a million pages a day; it’s estimated (they don’t reveal the actual number) that Google indexes billions of pages every day.
This is good, though; I’m going to go change the rate limit on my web server; maybe those genius software developers will set a timeout such that they move on before they get any content from my site.
Start it up before you go to bed. If it isn’t indexed when you wake up, it’s just not going to work for you.
Jellyfin is pretty good about preserving the index; you only really pay a cost during that first start-up, or if you shuffle content around on the storage. Otherwise, it only indexes new stuff, which should be barely noticeable.
Ah, that’s where tuning comes in. Look at the logs, take the average timeout, and tune the tarpit to return a minimal HTML payload containing a single, slightly different URL leading back into the tarpit. Or, better yet, JavaScript that loads a single page of tarpit URLs very slowly. Bots have to be able to run JS, or else they’re missing half the content on the web. I’m sure someone has created a JS forkbomb.
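A sketch of that minimal-payload version, stdlib only (the delay and port are arbitrary, and the random path exists only to keep the bot walking in circles):

```python
import random
import string
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

class TarpitHandler(BaseHTTPRequestHandler):
    """Every page is served slowly and links to one more page of the same trap."""

    def do_GET(self):
        time.sleep(15)  # tune this to sit just under the scrapers' observed timeouts
        next_path = "/" + "".join(random.choices(string.ascii_lowercase, k=12))
        body = f'<html><body><a href="{next_path}">more</a></body></html>'.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the tarpit quiet

HTTPServer(("", 8081), TarpitHandler).serve_forever()
```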
Variety is the spice of life. AI botnet blacklists are probably the better solution for web content; you can run ssh on a different port and run a tarpit on the standard port, and it will barely affect you. But for the web, if you’re running a web server you probably want visitors, and tarpits would be harder to set up to catch only bots.
That is a perfect description.