Off-and-on trying out an account over at @[email protected] due to scraping bots bogging down lemmy.today to the point of near-unusability.

  • 56 Posts
  • 2.25K Comments
Joined 2 years ago
Cake day: October 4th, 2023


  • What makes this worse is that git servers are the most pathologically vulnerable to the onslaught of doom from modern internet scrapers because remember, they click on every link on every page.

    The especially disappointing thing is that, for the specific case that Xe was running into, a better-written scraper could just recognize that this is a public git repository and just git clone the thing and get all the useful code without the overhead. Like, it’s not even “this scraper is scraping data that I don’t want it to have”, but “this scraper is too dumb to just scrape the thing efficiently and is blowing both the scraper’s resources and the server’s resources downloading innumerable redundant copies of the data”.

    It’s probably just as well, since the protection is relevant for other websites, and he probably wouldn’t have done it if he hadn’t been getting his git repo hammered, but…

    EDIT: Plus, I bet that the scraper was requesting a ton of files at once from the server, since he said that it was unusable. Like, you have a zillion servers to parallelize requests over. You could write a scraper that requested only one file at a time per server, which is common courtesy, and you’d still be bandwidth-constrained if you’re schlorping up the whole Internet. Xe probably wouldn’t have even noticed.
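    A minimal sketch of that courtesy, assuming an asyncio-based crawler: URLs are grouped by host, each host gets a single sequential worker, and the workers for different hosts run concurrently. `polite_crawl` and the `fetch` callback are hypothetical names, not any real scraper’s API, and a real crawler would also want per-host delays and robots.txt handling.

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlsplit


async def polite_crawl(urls, fetch):
    """Fetch every URL with at most one request in flight per host.

    `fetch` is a caller-supplied coroutine (hypothetical) that takes a URL
    and returns whatever should be recorded for it.
    """
    # One queue per host, preserving the input order within each host.
    by_host = defaultdict(list)
    for url in urls:
        by_host[urlsplit(url).netloc].append(url)

    results = {}

    async def drain(host_urls):
        # Strictly sequential within a host: one file at a time per server.
        for url in host_urls:
            results[url] = await fetch(url)

    # Concurrent across hosts: parallelism comes from the number of servers.
    await asyncio.gather(*(drain(queue) for queue in by_host.values()))
    return results
```

    Total throughput then scales with the number of distinct hosts rather than with how hard any one server gets hit.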


  • https://en.wikipedia.org/wiki/National_Helium_Reserve

    The National Helium Reserve, also known as the Federal Helium Reserve, was a strategic reserve of the United States, which once held over 1 billion cubic meters (about 170,000,000 kg)[a] of helium gas.

    The Bureau of Land Management (BLM) transferred the reserve to the General Services Administration (GSA) as surplus property, but a 2022 auction[10] failed to finalize a sale.[11] On June 22, 2023, the GSA announced a new auction of the facilities and remaining helium.[12] The auction of the last helium assets was due to take place in November, 2023.[13] Though the last of the Cliffside reserve was to be sold by November 2023, more natural gas was discovered at the site than was previously known, and the Bureau of Land Management extended the auction to January 25, 2024 to allow for increased bids.[14] In 2024 the remaining reserve was sold to the highest bidder, Messer Group.[15]

    Arguably not the best timing on that.
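    The quoted conversion of over 1 billion cubic meters to about 170,000,000 kg can be sanity-checked with the ideal gas law, m = PVM/(RT). This sketch assumes the volume is stated at 15 °C and 1 atm, since the excerpt doesn’t give reference conditions; at 0 °C the same volume comes out closer to 179 million kg. `helium_mass_kg` is a hypothetical helper.

```python
# Ideal-gas estimate of helium mass from a gas volume.
# Assumption: volume measured at 15 °C and 1 atm (reference not stated).
R = 8.314462618        # molar gas constant, J/(mol*K)
M_HE = 4.002602e-3     # molar mass of helium, kg/mol
P_ATM = 101_325.0      # one standard atmosphere, Pa


def helium_mass_kg(volume_m3, temp_c=15.0, pressure_pa=P_ATM):
    """Mass of an ideal-gas volume of helium: m = P*V*M / (R*T)."""
    temp_k = temp_c + 273.15
    density = pressure_pa * M_HE / (R * temp_k)  # kg/m^3
    return volume_m3 * density


print(f"{helium_mass_kg(1e9):.3g} kg")       # 1.69e+08 kg at 15 °C
print(f"{helium_mass_kg(1e9, 0.0):.3g} kg")  # 1.79e+08 kg at 0 °C
```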


  • Sure. What that guy is using is actually not the most-interesting diagram style, IMHO, for automatic layout of network maps, if you want large-scale stuff, which is where the automatic layout gets more interesting. I have some scripts floating around somewhere that will generate very large network maps — run a bunch of traceroutes, geolocate IPs, dump the results into an sqlite database, and then generate an automatically laid-out Internet network map. I don’t want to go to the trouble of anonymizing the addresses and locations right now, but if you have a graphviz graph and want to try playing with it, I used:

    goes looking

    Ugh, it’s Python 2, a decade-and-a-half old, and never got ported to Python 3. Lemme gin up an example for the non-hierarchical graphviz stuff:

    graph.dot:

    graph foo {
        a--b
        a--d
        b--c
        d--e
        c--e
        e--f
        b--d
    }
    

    Processed with:

    $ sfdp -Goverlap=prism -Gsep=+5 -Gesep=+4 -Gremincross -Gpack -Gsplines=true -Tpdf -o graph.pdf graph.dot
    

    Generates something like this:

    That’ll take a ton of graphviz edges and lay them out nicely while trying to avoid edge crossings and the like, in a non-hierarchical map. On more complicated maps where it can’t use straight lines, it’ll use splines to curve edges around nodes. You can create massive network maps like this. Note that I last looked at graphviz’s automatic layout stuff about 15 years ago, so it’s possible that they have better layout algorithms now, but this can deal with enormous numbers of nodes and will do reasonable things with them.

    I just grabbed his example because it was the first graphviz network map example that came up on a Web search.
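    The last step of the pipeline described above (traceroute results in an sqlite database, turned into a graph for sfdp) might look roughly like this. It is a sketch, not the original Python 2 scripts: the `hops` table layout and the `hops_to_dot` function are hypothetical, and geolocation is left out.

```python
import sqlite3


def hops_to_dot(conn, graph_name="net"):
    """Turn stored traceroute hops into an undirected graphviz graph.

    Assumes a hypothetical table hops(trace_id, hop_index, addr) with one
    row per responding hop; consecutive rows within a trace become an edge.
    """
    rows = conn.execute(
        "SELECT trace_id, hop_index, addr FROM hops ORDER BY trace_id, hop_index"
    ).fetchall()

    edges = set()
    prev_trace, prev_addr = None, None
    for trace_id, _hop_index, addr in rows:
        if trace_id == prev_trace and addr != prev_addr:
            # Sort each pair so a--b and b--a dedupe to one undirected edge.
            edges.add(tuple(sorted((prev_addr, addr))))
        prev_trace, prev_addr = trace_id, addr

    lines = [f"graph {graph_name} {{"]
    for a, b in sorted(edges):
        # Quote node IDs: IP addresses contain dots (and colons for IPv6).
        lines.append(f'    "{a}" -- "{b}"')
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hops (trace_id INTEGER, hop_index INTEGER, addr TEXT)")
    conn.executemany(
        "INSERT INTO hops VALUES (?, ?, ?)",
        [
            (1, 1, "10.0.0.1"), (1, 2, "192.0.2.1"), (1, 3, "198.51.100.7"),
            (2, 1, "10.0.0.1"), (2, 2, "192.0.2.1"), (2, 3, "203.0.113.9"),
        ],
    )
    print(hops_to_dot(conn), end="")
```

    Saving that output as `graph.dot` and running the same `sfdp` command as above would lay it out. One caveat of this simple approach: a non-responding hop that never made it into the table makes its neighbors look directly connected.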


  • tal@lemmy.today to Technology@lemmy.world: Digg Shut Down Again (22 hours ago)

    We faced an unprecedented bot problem

    When the Digg beta launched, we immediately noticed posts from SEO spammers noting that Digg still carried meaningful Google link authority. Within hours, we got a taste of what we’d only heard rumors about. The internet is now populated, in meaningful part, by sophisticated AI agents and automated accounts. We knew bots were part of the landscape, but we didn’t appreciate the scale, sophistication, or speed at which they’d find us. We banned tens of thousands of accounts. We deployed internal tooling and industry-standard external vendors. None of it was enough. When you can’t trust that the votes, the comments, and the engagement you’re seeing are real, you’ve lost the foundation a community platform is built on.

    This isn’t just a Digg problem. It’s an internet problem. But it hit us harder because trust is the product.

    It’s a social media problem. It’s going to be hard to offer pseudonymity and cheap, freely available accounts while also countering bots that spam the system to manipulate it. That model worked well in an era before very human-like bots became easy to produce.

    It might be possible to build webs of trust with pseudonyms. You can make a new pseudonym, but the influence and visibility gets tied to, for example, what users or curators that you trust trust, so the pseudonym has less weight until it acquires reputation. I do not think that a single global trust “score” will work, because you can always have bot webs of trust.

    Unfortunately, the tools to unmask pseudonyms are also getting better, and throwing away pseudonyms occasionally or using more of them is one of the reasonable counters to unmasking, and that doesn’t play well with relying more on reputation.
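    One way to make “weight depends on whom you trust” concrete is viewer-relative trust: scores are computed outward from the viewer’s own trust edges, so a self-vouching bot ring gets nothing unless someone the viewer trusts vouches for it. This is a minimal sketch of one possible scheme (best product of edge weights along any path, with a per-hop decay), not a description of any existing platform; `personal_trust`, the graph format, and the decay and cutoff values are all assumptions.

```python
import heapq


def personal_trust(graph, viewer, decay=0.5, min_score=0.01):
    """Viewer-relative trust over a directed "who trusts whom" graph.

    graph: {user: {trusted_user: weight}} with weights in (0, 1].
    A user's score is the best product of (weight * decay) along any path
    from `viewer`. Users with no path from the viewer get no score at all.
    """
    scores = {viewer: 1.0}
    heap = [(-1.0, viewer)]  # max-heap via negated scores
    settled = set()
    while heap:
        neg_score, user = heapq.heappop(heap)
        if user in settled:
            continue  # stale entry: a higher score for this user was settled
        settled.add(user)
        for other, weight in graph.get(user, {}).items():
            candidate = -neg_score * weight * decay
            # Scores only shrink along a path, so pruning below min_score is safe.
            if candidate >= min_score and candidate > scores.get(other, 0.0):
                scores[other] = candidate
                heapq.heappush(heap, (-candidate, other))
    return scores


graph = {
    "me": {"alice": 1.0, "bob": 0.8},
    "alice": {"carol": 1.0},
    "bob": {"carol": 0.5, "dave": 1.0},
    # A bot ring vouching for itself; nobody "me" trusts vouches for it.
    "bot1": {"bot2": 1.0, "bot3": 1.0},
    "bot2": {"bot1": 1.0, "bot3": 1.0},
    "bot3": {"bot1": 1.0, "bot2": 1.0},
}
print(personal_trust(graph, "me"))
```

    Running it with a bot as the viewer gives the ring high mutual scores, which is why there is no single global score here: the numbers only mean something relative to whoever is looking.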






  • You have all your devices attached to a console server with a serial port console set up on the serial port, and if they support accessing the BIOS via a serial console, that enabled so that you can access that remotely, right? Either a dedicated hardware console server, or some server on your network with a multiport serial card or a USB to multiport serial adapter or something like that, right? So that if networking fails on one of those other devices, you can fire up minicom or similar on the serial console server and get into the device and fix whatever’s broken?

    Oh, you don’t. Well, that’s probably okay. I mean, you probably won’t lose networking on those devices.


  • You have remote power management set up for the systems in your homelab, right? A server set up that you can reach to power-cycle other servers, so that if they wedge in some unusable state and you can’t be physically there, you can still reboot them? A managed/smart PDU or something like that? Something like one of these guys?

    Oh. You don’t. Well, that’s probably okay. I mean, nothing will probably go wrong and render a device in need of being forcibly rebooted when you’re physically away from home.


  • You have squid or some other forward http proxy set up to share a cache among all the devices on your network set up to access the Web, to minimize duplicate traffic?

    And you have a shared caching DNS server set up locally, something like BIND?

    Oh. You don’t. Well, that’s probably okay. I mean, it probably doesn’t matter that your devices are pulling duplicate copies of data down. Not everyone can have a network that minimizes latency and avoids inefficiency across devices.






  • Should have been more clear about the remote part. The systems operate remotely from me, I can access them via the internet. The users need to use the screen to operate it. This is just a windows 11 computer after all.

    Ah, okay. So then they aren’t getting any video display from the BIOS when the problem comes up. Okay, yeah, then that’s pretty convincing that it’s early in the boot process. So, yeah, OS probably isn’t a factor.

    But 0 packets are getting sent from the computer. I believe those are ARP, but the router page doesnt define it. Just has a table with packets in and packets out. Packets going in will usually have a couple from the router. Always 0 with packets coming out. Theres actually a ping function built into the router, and that doesnt respond at all.

    Okay. So, strictly speaking, “frames” are what one calls things at the Ethernet level, and “packets” at the IP level, so if one assumes that the router page is being technically correct, it’s possible that the computer is still doing things at the Ethernet level (ARP, for instance, isn’t an IP packet). But, yeah, gotcha.

    considers

    Well, let’s see. This is kinda more at the brainstorming level. You said that you stuck the thing in a fridge, so I’m assuming that the user is in a colder-than-normal environment, not a warmer one. I guess you’ve got:

    Temperature. You said that you tried that. If it’s colder than normal specifically around the time that the problem shows up, you might also consider humidity: if it’s high, condensation inside electrical devices can be a problem. Like, especially if this thing is outside and you’re getting (electrically-conductive) dew forming on surfaces in the morning or something like that, that can wreak havoc on electrical devices. I don’t know what the best way to diagnose that would be. A hygrometer will tell you the relative humidity. If it’s in the open, maybe leave a small space heater aimed at the system, which should lower the relative humidity of the air around it (warmer air holding the same amount of moisture has lower relative humidity), and see if the issue goes away. If it’s in an enclosed space, maybe run a dehumidifier.

    You’ve got the possibility of problems coming in from your external electrical lines — you have at least serial, power, and networking going into that machine. You might try, for troubleshooting purposes, if it arises again, having the user pull all of the lines connected to external stuff other than power, including the Ethernet cable and that serial thing, and seeing if it becomes impossible to reproduce then.

    You said that some motor was involved. If you’re talking some kind of industrial setup with other machinery around, I guess something could theoretically be emitting some kind of strong electrical field that creates problems. I’ve never heard of a situation where a PC won’t work because of that, but I’d imagine that it’s possible.

    I recall hearing about a situation at a company I was working at where our support people had problems with vibration in a customer’s environment affecting the device; they couldn’t reproduce the problem back at the company. That took them some time to work out.

    Some of that’s pretty exotic, but if you’re just looking for potential leads to consider, that’s all that immediately comes to mind.
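    On the humidity angle: condensation forms on surfaces at or below the air’s dew point, which can be estimated from a thermometer and hygrometer reading with the Magnus approximation. This sketch assumes one commonly used set of Magnus coefficients (b = 17.62, c = 243.12 °C, reasonable for ordinary temperatures over liquid water); `dew_point_c` and `condensation_risk` are hypothetical helpers. A cold machine sitting in warmer, humid air is the case to watch for.

```python
import math

# Magnus-formula coefficients (one common parameterization).
B = 17.62
C = 243.12  # °C


def dew_point_c(temp_c, rel_humidity_pct):
    """Approximate dew point in °C from air temperature and relative humidity."""
    gamma = math.log(rel_humidity_pct / 100.0) + B * temp_c / (C + temp_c)
    return C * gamma / (B - gamma)


def condensation_risk(surface_temp_c, air_temp_c, rel_humidity_pct):
    """True if a surface is at or below the dew point of the surrounding air."""
    return surface_temp_c <= dew_point_c(air_temp_c, rel_humidity_pct)


print(round(dew_point_c(20.0, 50.0), 1))  # 9.3
```

    So, for example, a board that has cooled to 5 °C overnight is likely to collect dew once 20 °C air at 50% relative humidity reaches it.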



  • This particular port drives a DC motor. Power is injected to drive the motor.

    On one hand, that does sound suspicious. It’s an external factor and it’s unusual.

    But on the other hand, you said that having it unplugged doesn’t consistently make the thing work, which doesn’t really mesh with what I’d expect if it were the cause.

    You can get optical isolators for serial ports. Those basically run the signal into an LED that shines on a light sensor (a phototransistor or photodiode) on the other side, which keeps the two electrical systems isolated; that will eliminate ground loops. I haven’t looked into them for serial ports, but I was looking at hooking up a breadboard to USB at one point, and there it seems to be kinda a best practice for USB device development work, to keep any mistakes from damaging a connected computer system.

    https://www.amazon.com/rs232-isolator/s?k=rs232+isolator

    In your case, it sounds like you are also using the serial port as a power supply, so you might want something that can provide external power:

    https://www.amazon.com/External-powered-Repeater-Mini-size-PhotoElectric-Full-line/dp/B00GI9GRMC

    It might also be possible to use an isolation transformer, which would be simpler and possibly cheaper. That would, I believe, provide ground isolation without providing protection against more exotic things, like a short in your external device frying a serial controller. I’ve used isolation transformers for coax TV to avoid ground loops, so I figured that something similar might work for serial. But RS-232 is DC-coupled, and a long run of identical bits is effectively DC, which a plain transformer won’t pass; that may be why, when I search for “RS-232 isolation”, everything I see seems to be opto-electric.

    If you can change the serial port being used, maybe also use a different serial port interface, like a USB serial interface.


  • Hmm. First, I’m a little fuzzy on the symptoms. To clear that up:

    You say “that operate remotely”, which sounds like it’s a headless server, but you also say “there is no issues with the monitor”. Do you mean that they attached a monitor to a video output and that they could see video display?

    You have:

    We’re unable to access the bios when the computer stops working.

    Like, is this via a video output, serial port, or some kind of dedicated hardware management system?

    As to troubleshooting, when you say “no packets”, and since it sounds like you have access to and familiarity with the router, you mean not even stuff like ARP? Like, it’s not “I’m not pinging anything”, but “no Ethernet frames have reached the router on that port?”

    I’m…a little confused about you saying that the NIC is negotiating speed, because I’d have thought that on typical systems, a NIC wouldn’t be doing that until the OS is up and tells the NIC to become active. But it sounds from your symptoms like whatever is happening is happening pretty early in the boot process, before the OS or even bootloader is doing anything. Do you have the BIOS set up to make use of the NIC in some way, like using DHCP or BOOTP or something to do network booting?

    EDIT: Or wake-on-LAN?

    EDIT2: Because if you do, and don’t actually need wake-on-LAN or network booting functionality, I’d think that I’d try turning it off to see whether the issue might vanish.
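    For context on the wake-on-LAN angle: with WoL enabled, the NIC stays powered while the machine is off and keeps a link up so it can watch for a “magic packet”, which could explain link and speed negotiation with no OS running. The magic packet format is simple: six 0xFF bytes followed by the target MAC address repeated 16 times, 102 bytes in all. Below is a minimal sketch for sending one, e.g. to check whether WoL is actually live on that box. The function names are hypothetical, and the broadcast address and UDP port 9 are common defaults, not requirements.

```python
import re
import socket


def wol_magic_packet(mac):
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC 16 times."""
    hex_digits = re.sub(r"[^0-9A-Fa-f]", "", mac)  # drop ':' or '-' separators
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)
    return b"\xff" * 6 + mac_bytes * 16


def send_wol(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet over UDP on the local network segment."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(wol_magic_packet(mac), (broadcast, port))
```

    If the machine wakes when powered off and sent a packet for its (placeholder here) MAC address, WoL is enabled in firmware, and disabling it there is a cheap experiment.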