#freedesktop on 2025-08-15 — irc logs at oftc.catirclogs.org

2024-07-16 04:52 ChanServ changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org

00:00 alpernebbi has quit [Ping timeout: 480 seconds]

00:04 Traneptora_ has quit [Ping timeout: 480 seconds]

00:11 alpernebbi has joined #freedesktop

00:52 scrumplex has joined #freedesktop

00:56 scrumplex_ has quit [Ping timeout: 480 seconds]

02:24 JanC is now known as Guest24195

02:24 JanC has joined #freedesktop

02:28 Guest24195 has quit [Ping timeout: 480 seconds]

02:53 JanC is now known as Guest24196

02:53 JanC has joined #freedesktop

02:56 Guest24196 has quit [Ping timeout: 480 seconds]

03:00 snetry has joined #freedesktop

03:06 sentry has quit [Ping timeout: 480 seconds]

03:19 swatish2 has joined #freedesktop

03:28 Zathras_11 has joined #freedesktop

03:29 swatish2 has quit [Ping timeout: 480 seconds]

03:30 Traneptora has joined #freedesktop

03:32 Zathras has quit [Ping timeout: 480 seconds]

03:47 swatish2 has joined #freedesktop

03:56 swatish2 has quit [Ping timeout: 480 seconds]

04:31 AbleBacon has quit [Remote host closed the connection]

05:23 Kayden has joined #freedesktop

05:46 JanC has quit [Ping timeout: 480 seconds]

05:48 tzimmermann has joined #freedesktop

06:07 sima has joined #freedesktop

06:17 haaninjo has joined #freedesktop

06:21 swatish2 has joined #freedesktop

06:22 alarumbe has quit []

06:57 todi1 has joined #freedesktop

07:02 todi has quit [Ping timeout: 480 seconds]

07:16 karolherbst has quit [Ping timeout: 480 seconds]

07:26 karolherbst has joined #freedesktop

07:43 ximion has quit [Remote host closed the connection]

07:53 swatish2 has quit [Ping timeout: 480 seconds]

08:47 JanC has joined #freedesktop

08:49 kasper93_ has joined #freedesktop

08:49 kasper93 is now known as Guest24211

08:49 kasper93_ is now known as kasper93

08:57 Guest24211 has quit [Ping timeout: 480 seconds]

09:02 kasper93 has quit [Ping timeout: 480 seconds]

09:06 kasper93 has joined #freedesktop

09:20 kasper93_ has joined #freedesktop

09:20 kasper93 is now known as Guest24213

09:20 kasper93_ is now known as kasper93

09:22 kasper93_ has joined #freedesktop

09:22 kasper93 is now known as Guest24214

09:22 kasper93_ is now known as kasper93

09:24 kasper93_ has joined #freedesktop

09:24 kasper93 is now known as Guest24215

09:24 kasper93_ is now known as kasper93

09:26 Guest24213 has quit [Ping timeout: 480 seconds]

09:29 Guest24214 has quit [Ping timeout: 480 seconds]

09:31 Guest24215 has quit [Ping timeout: 480 seconds]

09:47 <karolherbst> daniels: I was thinking about the smtp credentials thing again and... couldn't we just do it via ssh port forwarding? Like we just create a tunnel to the server and then send the email through that, so it looks like it comes from our servers anyway

09:47 <karolherbst> uhm..

09:47 <karolherbst> or isn't it how it works 🙃

10:02 <Mithrandir> you could do that, or make /usr/sbin/sendmail on the sending host just ssh to wherever and call sendmail on the other side.

10:10 <karolherbst> yeah but sendmail is a terrible interface

10:11 <karolherbst> and the point was to not use it in the first place

10:13 <pinchartl> karolherbst: just for my information, what's the issue with leaking the sender's IP ?

10:13 <karolherbst> pinchartl: protecting the coc member you specifically sent out the email?

10:13 <karolherbst> like if it's an IP in $country, the person might figure who it was and just target them specifically

10:14 <karolherbst> or if it's an IP to a company network

10:14 <karolherbst> or...

10:14 <pinchartl> so it's to be able to send mails that originate from a group, without identifying the person who pressed the button. got it

10:14 <karolherbst> exactly

10:15 <karolherbst> but sending emails via ssh on the cli on the server sucks for many reasons

10:17 <Mithrandir> karolherbst: your email client is likely able to call sendmail on the host you're running it, was my point.

10:18 <karolherbst> uhhhhh I kinda want a reliable solution tbh

10:19 <karolherbst> like I suspect "email client using sendmail" is de facto dead code and untested

10:20 jsa1 has joined #freedesktop

10:27 <Mithrandir> I obviously can't speak for every client under the sun, but I'd assume _lots_ of people use that for most of the clients.

10:37 <karolherbst> huh? why would they if they have imap and smtp?

10:37 <karolherbst> like yeah, git sendmail is used, but besides that?

10:38 <karolherbst> and even that uses smtp

10:39 <karolherbst> though I think that uses sendmail under the hood

10:40 <karolherbst> heh.. actually it also does smtp natively

10:40 <karolherbst> yeah.. so if you specify smtpserver with git send-email it does not use sendmail

10:42 <karolherbst> but anyway, I don't understand why it's controversial wanting to use smtp, given that's what most people do and is used by clients, where sendmail is actually not

10:42 <karolherbst> like thunderbird doesn't seem to support sendmail either

10:45 <karolherbst> Evolution seems to have terrible integration where you need to build the argument list yourself for anything not being basic "send to one recipient" situations...

10:51 <karolherbst> anyway.. what needs to be done to make smtp work?

10:59 vkareh has joined #freedesktop

11:10 <daniels> karolherbst: wanting to use SMTP isn't controversial, but it doesn't mean it's really easy to do and people have the time on their hands to go figure out how to make it work

11:10 <karolherbst> sure, I understand that, and I'd be up for helping out with that

11:30 <karolherbst> daniels: anyway, what's like the main thing that needs to be done to wire it all up for smtp? Creating the account? Anything else?

11:34 <daniels> karolherbst: right, you can use saslpasswd2 on gabe to create an account to be able to use SMTP from - what needs to be done is figuring out how it is that you don't immediately leak the Received header

11:34 <karolherbst> daniels: yeah.. that's something I can play around with. The I had was to use ssh port forwarding and see if that helps

11:34 <karolherbst> so from gabe it looks like it's coming from a local account

11:35 <daniels> sure, go for it

11:43 <karolherbst> daniels: I don't think I have enough permissions for the account creation part

11:45 <Mithrandir> just do ssh -L 1234:localhost:25 gabe and then use localhost:1234 as the smtp server, no extra accounts needed, and nothing should leak.

11:47 <karolherbst> Mithrandir: but I need to do it with the account I want to send from, right?

11:48 <karolherbst> but yeah.. let me try that

11:50 <Mithrandir> no, any account should be fine

11:51 <karolherbst> ahh, okay

11:51 <karolherbst> that makes it easier

11:54 guludo has joined #freedesktop

12:00 <karolherbst> Mithrandir: I'm getting "channel 3: open failed: administratively prohibited: open failed"

12:00 <Mithrandir> maybe we need to allow your user to do port forwarding

12:02 <karolherbst> Mithrandir: maybe it would be easier to do it on the conduct account and then we can manage ssh keys there, so it's not tied to individuals? Though for testing would be good enough if I can do it with my own account

12:03 <Mithrandir> or socat tcp-listen:1234,reuseaddr,fork EXEC:'ssh gabe socat - TCP4-CONNECT:localhost:25' maybe

12:06 <karolherbst> "E EXEC: wrong number of parameters (3 instead of 1): usage: EXEC:<command-line>"

12:07 <karolherbst> uhm...

12:07 <karolherbst> let me check first if it's not a local mess up :D

12:08 <karolherbst> nah, seems fine

12:09 kasper93 has quit [Ping timeout: 480 seconds]

12:11 kasper93 has joined #freedesktop

12:15 <Mithrandir> socat TCP-LISTEN:1234,reuseaddr,fork "EXEC:ssh gabe.freedesktop.org nc localhost 25" works for me

12:18 <karolherbst> Mithrandir: that worked, thanks!

12:20 <karolherbst> "client-ip=131.252.210.177" sounds about right

12:29 jsa1 has quit [Ping timeout: 480 seconds]

12:48 jsa1 has joined #freedesktop

13:48 AbleBacon has joined #freedesktop

14:05 jsa2 has joined #freedesktop

14:08 jsa1 has quit [Ping timeout: 480 seconds]

14:58 alarumbe has joined #freedesktop

15:44 ximion has joined #freedesktop

15:58 noodlez1232 has quit [Remote host closed the connection]

16:00 noodlez1232 has joined #freedesktop

16:09 jsa1 has joined #freedesktop

16:10 jsa2 has quit [Ping timeout: 480 seconds]

17:02 tzimmermann has quit [Quit: Leaving]

17:40 jsa1 has quit [Ping timeout: 480 seconds]

18:05 xe has joined #freedesktop

18:14 sentry has joined #freedesktop

18:16 snetry has quit [Ping timeout: 480 seconds]

18:22 ___nick___ has joined #freedesktop

18:30 guludo has quit [Ping timeout: 480 seconds]

18:39 guludo has joined #freedesktop

19:43 snetry has joined #freedesktop

19:48 sentry has quit [Ping timeout: 480 seconds]

20:03 scrumplex_ has joined #freedesktop

20:03 ___nick___ has quit [Remote host closed the connection]

20:04 i-garrison has quit []

20:07 scrumplex has quit [Ping timeout: 480 seconds]

20:09 ybogdano has quit [Remote host closed the connection]

20:09 ybogdano has joined #freedesktop

20:26 ___nick___ has joined #freedesktop

20:42 vkareh has quit [Quit: WeeChat 4.7.0]

20:43 Caterpillar has quit [Remote host closed the connection]

20:43 Caterpillar has joined #freedesktop

20:44 ___nick___ has quit []

20:46 <DragoonAethis> Some really bad news: https://social.anoxinon.de/@Codeberg/115033790447125787

20:46 ___nick___ has joined #freedesktop

20:47 ___nick___ has quit []

20:49 ___nick___ has joined #freedesktop

20:49 i-garrison has joined #freedesktop

21:08 haaninjo has quit [Quit: Ex-Chat]

21:25 andy-turner has joined #freedesktop

21:25 <Consolatis> did the crawlers finally figure out how to modify their user agent?

21:25 <karolherbst> nah, they just start solving the math problem apparently

21:25 <dwfreed> ^ exactly that

21:27 ___nick___ has quit [Remote host closed the connection]

21:28 andy-turner has quit []

21:36 <Consolatis> well, that was to be expected at some point. The fun starts with completely bypassing Anubis specifically at which point it will only bother legit users while not interfering with any crawlers

21:56 <pinchartl> time to make nvidia accountable for destroying the web ?

22:04 <Consolatis> well, or in case of gitlab.fdo to require a login to access CPU / database intensive pages which can (or should not) be cached

22:15 <pinchartl> those are not mutually exclusive options

22:21 <karolherbst> if there is one thing I'm sure of, then it's logins are doing nothing against bots

22:21 <karolherbst> most of the accounts on our gitlab are bots anyway

22:22 <Consolatis> it defeats non-targeted attacks

22:22 <karolherbst> not really

22:22 <karolherbst> you think those sign up by hand?

22:23 <karolherbst> since anubis got deployed we are seeing a lot less sign-ups as well

22:23 <Consolatis> if a scaper registers an account it is (by my definition) targeted

22:23 <karolherbst> well you can automate it

22:23 <karolherbst> and detecting gitlab isn't even hard

22:24 <Consolatis> if it really turns out to be an issue even with account requirement then one could also start tracking requests per hour per account + IPs used within $timespan per account and react accordingly

22:24 <karolherbst> well then they'll create 10000 accounts across 10000 IPs

22:25 alanc has quit [Remote host closed the connection]

22:25 <pinchartl> 10000 ? if only it was that little

22:25 <karolherbst> they don't even use their own machine for the scrapping

22:25 <pinchartl> some scrapers use "vpn" networks that claim 100M residential IPs

22:26 <karolherbst> all the mitigations you'd come up with, they'll already found a way to get around

22:26 alanc has joined #freedesktop

22:26 <Consolatis> doesn't gitlab.fdo require a capture for account creation?

22:26 <Consolatis> captcha*

22:26 <karolherbst> bots solve captches better than humans

22:26 <pinchartl> captchas have long been defeated

22:26 <karolherbst> "all the mitigations you'd come up with, they'll already found a way to get around"

22:26 <karolherbst> I meant it

22:27 <Consolatis> if that would be the case.. then why are we still using those captchas and similar annoyances? to make human experience worse?

22:27 <karolherbst> yep

22:27 guludo has quit [Ping timeout: 480 seconds]

22:27 <karolherbst> or to train AI

22:28 <karolherbst> it's even worse.. some also sell you solving captchas services

22:28 <karolherbst> super cheap

22:28 <karolherbst> sometimes it's machines

22:28 <karolherbst> sometimes it's people crammed in sweatshops

22:30 <karolherbst> soo one might even say it's unethical to deploy captchas even

22:30 jsa1 has joined #freedesktop

22:31 <pinchartl> captchas are still useful to address the low-hanging fruits

22:31 <pinchartl> but certainly not the professional bots

22:31 <karolherbst> yeah but compared to the AI crawlers they don't really matter anymore

22:31 <Consolatis> well, lets see how long it takes the crawlers to bypass Anubis rather than just solving it (be it via custom implementation or a 2nd more expensive tier of crawlers using actual headless browsers)

22:32 <karolherbst> well the point of aanubis was, that it forces you to deploy something capable of doing real JS

22:32 <karolherbst> which most crawlers just didn't do

22:32 <Consolatis> there are other ways to do that, no computation required

22:32 <karolherbst> the entire point is just to make use of the fact that those crawlers are developed really badly

22:33 ybogdano has quit [Quit: The Lounge - https://thelounge.chat]

22:33 <karolherbst> Consolatis: I mean you are free to come up with another alternative that works, I'm sure everybody would want to use it

22:34 <Consolatis> whenever a larger group uses the same "alternative" it becomes a big player and thus a target to attack. site specific behavior seems way more scalable

22:35 <karolherbst> not gonna patch gitlab

22:35 <karolherbst> but sure, every website on earth could come up with their own mitigation and then nobody else would have time to do anything else anymore

22:36 <karolherbst> only to get defeated anyway

22:36 ybogdano has joined #freedesktop

22:36 <karolherbst> anyway, I assure you that all the basic stuff doesn't work these days

22:37 sima has quit [Ping timeout: 480 seconds]

22:43 <Consolatis> it certainly helps if software is written in away which takes crawlers into account and minimizes CPU and database queries; delivering static content is cheap. but I assume > "not gonna patch gitlab" is the main issue here

22:46 <Consolatis> as an example, an MR only updates on certain changes like git pushes, comments, label changes and so on. Instead of a giant query (or a bunch of simpler ones via XHR), each change can update static content (redis / actual file / ..) which then gets delivered to any client until the next change

22:46 <Consolatis> so it scales with the amount of MR changes rather than with the amount of users

22:49 <Consolatis> then there are things like git blame which personally I would simply disable completely, there is git itself for that

22:52 <karolherbst> I think you kinda miss the point here

22:52 <karolherbst> it doesn't matter if something is cheap or not

22:52 <karolherbst> if every endpoint is cheap, they'll just crawl even more

22:53 jturney has quit [Ping timeout: 480 seconds]

22:53 <Ermine> cheap * millions of them = not so cheap

22:54 <Consolatis> why would a single AI crawler crawl the same page twice? it would be against its own goal to crawl as much of the web as possible

22:54 <karolherbst> it's like with roads, or processing power. The more capacity you have, the more usage you'll see

22:55 <Consolatis> I don't buy this argument

22:55 <karolherbst> Consolatis: :D

22:55 <karolherbst> they crawl the same page millions of times

22:55 <karolherbst> they don't care

22:55 <karolherbst> they crawl as often and as much as they want

22:56 <karolherbst> don't try to apply your logic to them

22:56 <karolherbst> you'll only lose the argument

22:57 <karolherbst> if you tihnk it's too stupid to do it, they do it anyway

22:58 <Consolatis> in that case its still better to have more requests of the same crawler which in total is still not really impacting the important resources (outside of bandwidth) rather than have less requests which bring everything to a stop

22:59 <karolherbst> they'll just crawl even more, seriously

22:59 <karolherbst> they don't care if they put systems under full load, they'll just throw more crawlers at it anyway

23:00 <karolherbst> it's not just gitlab that gets crawled to death

23:00 <karolherbst> but also pages like lwn

23:00 <Consolatis> well, let them in that case. I think this is extremely hypothetical though. AI crawler operators have a goal, requesting the same page of and over again doesn't make sense for them and if that happens its a bug. They are not simply "evil" and want to annoy people but rather want to archive something

23:01 <karolherbst> "hyptothetical"? Because you don't believe it's already happening, which it is

23:01 <karolherbst> with your logic any website with only cheap resources shouldn't be crawled to death

23:01 <karolherbst> but they are

23:02 <karolherbst> you should just throw all your assumptions away

23:02 <karolherbst> "crawler operators have a goal", yes, crawl every website as often as possible, as much as possible. That's the goal

23:03 <karolherbst> they might not be evil, but they don't care if they dos your webpage

23:03 <Consolatis> i think their goal is more like "crawl as much of the internet as possible"

23:03 <karolherbst> no

23:03 <karolherbst> "as much and as often"

23:03 <Consolatis> which contradicts "crawl a single website as often as possible"

23:03 <karolherbst> crawling the same page 1000000 times? yes, they do that

23:04 <karolherbst> you can pretend they don't, but...

23:04 <Consolatis> do you have any links for that claim?

23:04 <karolherbst> again.. don't apply your logic, you'll be wrong

23:05 <karolherbst> I've talked with admins about it

23:05 <karolherbst> like, I'm not making any assumptions here

23:05 <karolherbst> I'm just telling what people see

23:07 <Consolatis> so admins told you a single crawler (same set of IPs?) requested the same URLs over and over again?

23:07 <Consolatis> e.g. a million times

23:09 sentry has joined #freedesktop

23:10 <karolherbst> they hide behind a residential IP botnet...

23:10 <Consolatis> right. so how do you know its the same crawler then?

23:10 <karolherbst> fingerprinting

23:11 <Consolatis> that just detects the software stack / family of crawler (if at all)

23:12 <Consolatis> but its people paying for their use that have a goal, that is the metric

23:12 <karolherbst> not in the mood of simply contining this argument if all you want to be is right. Maybe just accept that your assumptions simply don't hold true and that "resource optimizing endpoints" won't help at all, because low overhead pages die under the weight just as much as gitlabs

23:12 <karolherbst> I don't even understand why it matters to you that much if your assumptions are true or not, because the end result is, that everybody gets crawled to death regardless of what they run

23:13 <karolherbst> but if you don't want to listen, then I'm not going to waste my time any further explaining it

23:13 snetry has quit [Ping timeout: 480 seconds]

23:14 <DragoonAethis> Consolatis: I don't have links, but I do have Patchwork access logs, and yes, the same pages were hit over and over again

23:15 <DragoonAethis> It's not redownloading the same page in a loop, it's more like multiple companies independently scraping the same (expensive to generate server-side) content, and then scheduling a crawl of all visible links, then refreshing their scrapes once every few days

23:17 <Consolatis> karolherbst: I am just a bit annoyed by pointing fingers to a known "bad guy" and then ignoring the actual technical reasons for massive resource usage and resulting slow downs and not even considering fixing those because it won't help anyway because it would just make the bad guys do more bad guy stuff.

23:17 <Consolatis> DragoonAethis: that makes sense, thanks for the insights

23:18 <Consolatis> so in that case if it would be static content rather than expensive to generate, the resource exhaustion issue would mostly be solved by trading it for some disk (or memory) space

23:19 <Consolatis> e.g. it would not increase the amount of requests for the same URL whatsoever just because its now slightly faster from the crawler side

23:38 jsa1 has quit [Ping timeout: 480 seconds]