ChanServ changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org
AbleBacon has joined #freedesktop
alanc has quit [Remote host closed the connection]
alanc has joined #freedesktop
sentry has joined #freedesktop
snetry has quit [Ping timeout: 480 seconds]
swfrd_ has quit []
AbleBacon has quit [Read error: Connection reset by peer]
Zathras has joined #freedesktop
Zathras_11 has quit [Ping timeout: 480 seconds]
ximion has quit [Remote host closed the connection]
alarumbe has quit []
ity has quit [Ping timeout: 480 seconds]
mergen has joined #freedesktop
a-l-e has joined #freedesktop
kode54 has quit [Quit: The Lounge - https://thelounge.chat]
kode54 has joined #freedesktop
mergen has quit [Quit: Leaving]
sima has joined #freedesktop
haaninjo has joined #freedesktop
jsa1 has joined #freedesktop
<__tim> paused one of the windows runners fwiw, since it struggles to download a new image from the registry for some reason (unclear why, just hangs)
<__tim> (gst-htz-5)
alarumbe has joined #freedesktop
jsa1 has quit [Ping timeout: 480 seconds]
ximion has joined #freedesktop
snetry has joined #freedesktop
sentry has quit [Ping timeout: 480 seconds]
Mary has quit [Quit: .]
Mary has joined #freedesktop
Mary has quit []
Mary has joined #freedesktop
<__tim> unpausing gst-htz-5-windows again, reboot seems to have sorted it out
<__tim> hrm, getting 502 Bad Gateway from gitlab nginx all of a sudden
<__tim> changed to '502 Waiting for Gitlab to boot' and now it's back
<__tim> and now it's gone again
<__tim> ah well
Mortal has joined #freedesktop
<Mortal> hi all, I'm having problems accessing gitlab.freedesktop.org (GitLab HTTP 502) and lists.freedesktop.org (connection timeout) - can anyone point me to a "server status" page or similar where I can read more about the situation?
<glehmann> I don't think such a page exists
<glehmann> but given that it's still down maybe we should ping bentiss and daniels, but it's Sunday so don't expect an immediate solution
<daniels> sounds bad, yeah
<daniels> I’m at an airport but will look when I can
<bentiss> I just happen to be in front of my computer
<bentiss> trying a rollout restart on the webservice pods
<bentiss> I see some weird requests made to the servers...
<bentiss> I'm going to try the "let's reboot everything"
<__tim> It kept going away and then switching to "gitlab is rebooting", then it would be back for a second and then disappear again, fwiw
<bentiss> the "gitlab is rebooting" message is because I restarted redis
<__tim> Before you showed up I meant
<bentiss> all workers have been rebooted, not much better unfortunately
soreau has joined #freedesktop
<soreau> is https://gitlab.freedesktop.org/ currently down?
<bentiss> yep, not sure what happened, still debugging
<soreau> aha
<bentiss> ok, I'm going to take gitlab down to see if it's a load problem or not
andy-turner has joined #freedesktop
<bentiss> looks like we have a load issue. I've disabled fastly to serve gitlab, and now the webservice pods are all happier
andy-turner has quit [Quit: Leaving]
twrk has joined #freedesktop
<twrk> I'm not sure where else to ask, and I was just curious if anyone else has seen that gitlab.freedesktop.org serves a "welcome to nginx" page, lol
<twrk> oh wait
<twrk> not anymore
<twrk> guess it's fine now
<twrk> that was all. bye, haha :)
twrk has left #freedesktop [#freedesktop]
<bentiss> sigh... the webservice pods are falling one after the other
<bentiss> it held for 15 minutes :(
<Mortal> I'm sorry to have initiated this incident response action on a weekend - I appreciate the effort, but I'd also say feel free to postpone it if today is inconvenient
<__tim> gitlab.gnome.org seems to be having issues too
<bentiss> Mortal: it's definitely not you, but it's rather worrying TBH
<__tim> can't be a coincidence
<bentiss> I wonder if some actors are issuing bad requests
<soreau> must be AI hackers :P
<bentiss> or just haters
<bentiss> or AI crawlers trying every single f*** graphql query
<__tim> wouldn't that require them to bypass anubis though? or are the requests made direct now?
<bentiss> there was a new gitlab minor update
<bentiss> __tim: when I blocked everything from fastly, the server went back up, but as soon as I reenable the flood gate to the gitlab pods we are getting hammered
<__tim> ah I see
<__tim> gnome admin says 'It's OOMing as it reaches 4G of RAM per webservice pod'
<bentiss> and IIRC I saw that some websites using fastly were still crawled by smarter AI crawlers
<bentiss> the servers here are not too bad in terms of memory
<bentiss> anyway, deploying the 18.3.2 version as we speak
<__tim> might hit some configured limit?
<bentiss> few CVEs fixed there, so I wonder if someone just realized they could weaponize those
<Mortal> yeah, looks like 18.3.2 fixes a DoS issue that allows unauthenticated users to render the GitLab instance unusable
<soreau> while true; do git clone https://gitlab.freedesktop.org/foo/bar.git $RANDOM & done
<bentiss> __tim: I might be wrong, but I don't think I've set limits on the various pods
<bentiss> (from a quick look)
<bentiss> that didn't help either :(
<bentiss> KDE doesn't seem impacted
<bentiss> I wonder if I can only allow authenticated users while this is happening
<bentiss> (plus the endpoint to authenticate of course)
<Mortal> does gitlab.fd.o currently have "user and ip rate limits" configured? I found it while browsing the gitlab docs: https://docs.gitlab.com/administration/settings/user_and_ip_rate_limits/
<bentiss> right now we don't even have access to the admin console (in web)
<bentiss> currently deploying a counter measure: only those already authenticated should be able to access once the pods have recovered (hopefully)
<Mortal> sounds good - from the docs it looks like it might be possible to configure those rate limits through the Rails console, but it's probably better to try to do it through the Web admin console
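For reference, the unauthenticated throttles discussed above can also be set through GitLab's application settings REST API rather than the Rails console. A rough Rust sketch, assuming the reqwest crate, an admin token in GITLAB_ADMIN_TOKEN, and that the API is reachable at all (which it wasn't at this point in the incident); the `throttle_unauthenticated_*` names are the historical setting names and may differ by GitLab version, and the numbers are placeholders:

```rust
// Assumed Cargo dependency: reqwest = { version = "0.12", features = ["blocking"] }
use std::collections::HashMap;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let token = std::env::var("GITLAB_ADMIN_TOKEN")?; // hypothetical admin token
    let url = "https://gitlab.freedesktop.org/api/v4/application/settings";

    // Setting names from GitLab's application settings API (assumed to still
    // be accepted in this form); the values are placeholders, not advice.
    let mut settings = HashMap::new();
    settings.insert("throttle_unauthenticated_enabled", "true");
    settings.insert("throttle_unauthenticated_requests_per_period", "300");
    settings.insert("throttle_unauthenticated_period_in_seconds", "60");

    let resp = reqwest::blocking::Client::new()
        .put(url)
        .header("PRIVATE-TOKEN", token)
        .form(&settings)
        .send()?;

    println!("application settings update: HTTP {}", resp.status());
    Ok(())
}
```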
<bentiss> that doesn't seem to have any impact at all :(
<bentiss> actually bug on my side, trying again
<bentiss> that definitely helped
AbleBacon has joined #freedesktop
<bentiss> ok, now also allowing /users/sign_in for users who are logged out
<Mortal> I can confirm - I got the nginx page, but /users/sign_in works, and I could log in and now I can browse gitlab
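A minimal sketch of the kind of edge gate bentiss describes (the real Fastly Compute code isn't shown in the log): let /users/sign_in through, pass requests that carry what looks like a GitLab session cookie, and answer everything else at the edge. The backend name and the `_gitlab_session` cookie check are assumptions.

```rust
use fastly::http::StatusCode;
use fastly::{Error, Request, Response};

// Hypothetical backend name for the GitLab webservice pods.
const BACKEND_GITLAB: &str = "gitlab_webservice";

#[fastly::main]
fn main(req: Request) -> Result<Response, Error> {
    // Let logged-out users reach the sign-in page itself.
    let is_sign_in = req.get_path().starts_with("/users/sign_in");

    // Very rough "already authenticated" check: the presence of GitLab's
    // Rails session cookie. This does not validate the session.
    let has_session = req
        .get_header("cookie")
        .and_then(|h| h.to_str().ok())
        .map(|c| c.contains("_gitlab_session="))
        .unwrap_or(false);

    if has_session || is_sign_in {
        return Ok(req.send(BACKEND_GITLAB)?);
    }

    // Everyone else is answered at the edge and never reaches the pods.
    Ok(Response::from_status(StatusCode::FORBIDDEN)
        .with_body_text_plain("GitLab is temporarily limited to signed-in users.\n"))
}
```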
<bentiss> \o/
<bentiss> though for me the CSS and JS are not loaded properly :)
<Mortal> luckily the sign-in page works even without JS... :)
<bentiss> yeah, but it must be some fastly cache happening, so I tried to allow slightly more requests
lazka has joined #freedesktop
lemoniter has joined #freedesktop
<bentiss> and now redirecting all requests to users/sign_in when they are not authenticated
<bentiss> last step will be to add a new banner explaining the situation and then have dinner
<Mortal> bravo!
<bentiss> damn... all webservice pods went down again
<bentiss> I've removed the redirect for now
<bentiss> I don't have any other ideas...
<bentiss> I think I know what happened: the full website redirect means that whatever was crashing the webservice is now hammering /users/sign_in instead of the original url ;(
<bentiss> thus re-opening the flood gates
<bentiss> so I'm blocking /users/sign_in for now
lemoniter has quit [Quit: lemoniter]
<bentiss> and once again, that seems to have done the trick
<bentiss> but now I need to find out the valid redirects from the bad ones :(
Rayyan has joined #freedesktop
hikiko_ has quit [Ping timeout: 480 seconds]
<Venemo> hey guys, is the gitlab dead? seems to lead to an empty nginx page
hikiko has joined #freedesktop
brodie_ has joined #freedesktop
<Mortal> Venemo: go to https://gitlab.freedesktop.org/users/sign_in and log in - unauthenticated users are blocked atm
<Mortal> oops sorry, outdated info - please ignore
<bentiss> yeah, properly dead ATM, even my last attempt is failing
<bentiss> I guess we need to wait a bit
* bentiss goes to have dinner, bbl
<Venemo> okay, thanks
<Mortal> I'm heading to bed - once again thanks for the big effort on this incident, fingers crossed that it'll be done soon ...
Mortal has quit [Quit: leaving]
<bentiss> does anybody have news from gitlab.gnome? they seem to be behaving better
<bentiss> interesting... before the incident we were serving ~1GB per hour in request bodies. When it started, we were suddenly serving 30 to 45 GB per hour
<bentiss> actually no: *from* client, not *to*
<bentiss> so we have someone sending a huge amount of data to us and this fills up all the slots
haaninjo has quit [Quit: Ex-Chat]
<karolherbst> pain..
<karolherbst> do we know who?
<karolherbst> or rather where it's coming from
<karolherbst> well I got into the news again for _those_ reasons, and maybe it's just some trolls thinking they are funny?
<bentiss> karolherbst: don't know. gnome was impacted roughly at the same time, but they seem to have recovered
<karolherbst> ... what a weird coincidence
<bentiss> yeah
nirbheek_ has joined #freedesktop
<nirbheek_> Is ssh access also not working? I can't seem to fetch git repos
<karolherbst> I wish abuse reports on the internet would actually work šŸ™ƒ
<karolherbst> nirbheek_: servers are being hammered with massive uploads apparently
<nirbheek_> Yes, I read the backlog :)
<bentiss> nirbheek_: ssh access should be up now, if I cut all incoming connections from fastly we are good
<bentiss> (not guaranteeing it won't fall in the next few minutes as I'm making tests)
<nirbheek_> Works now, thank you! Sorry you have to deal with this on a weekend :(
<__tim> I asked gnome admins if they did anything special to mitigate, but not sure if they are still around.
<bentiss> __tim: thanks!
<karolherbst> at which endpoints are those uploads directed?
<bentiss> that's the question
<karolherbst> mhhh
<bentiss> so far, I can think of git and POST/PATCH, all three of them can easily trigger a bomb
<bentiss> I'm going to try to open the flood gate on git first
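For context, git-over-HTTPS ("smart HTTP") traffic is easy to spot by path, which is roughly how an edge rule can open the gate for git while keeping other POSTs blocked. A sketch of such a classifier, assuming a Fastly Compute service in Rust; the endpoint suffixes come from the git smart HTTP protocol, not from the actual rules running in front of gitlab.fd.o:

```rust
use fastly::http::Method;
use fastly::Request;

/// Rough classifier for git smart-HTTP traffic. Clones and fetches start with
/// `GET .../info/refs?service=git-upload-pack` and then POST to
/// `.../git-upload-pack`; pushes POST to `.../git-receive-pack`.
fn looks_like_git(req: &Request) -> bool {
    let path = req.get_path();
    if req.get_method() == &Method::GET {
        return path.ends_with("/info/refs");
    }
    if req.get_method() == &Method::POST {
        return path.ends_with("/git-upload-pack") || path.ends_with("/git-receive-pack");
    }
    false
}
```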
<xe> hi, i am reading
<karolherbst> xe: I wonder if Anubis could actually detect certain patterns of DDOS.. like, dunno how exactly it looks at the HTTP level, but I could imagine that if a lot of connections with a certain pattern suddenly emerge (high data uploads?), anubis could start nuking all those connections?
<karolherbst> dunno how big corpo bot protection works...
<xe> well, usually they have two layers
<xe> a per-request layer and an aggregator layer
<xe> i haven't had the time to finish the aggregator layer
<xe> (or spec it out)
<bentiss> plus my setup here with fastly is not really making things easier
<bentiss> xe: and hi!
<xe> bentiss: i mean, gnome also getting hit makes me think the problem is either gitlab or anubis
<xe> and i only understand one of those better
<bentiss> xe: very unlikely to be anubis because I'm bypassing it as soon as the cookie is valid
<xe> ah
<xe> hmm
<bentiss> fastly is checking the cookie, and if it's valid -> direct to gitlab
<bentiss> but the logs from fastly showed a spike in uploads to the servers, which could be related to some git-bomb (i.e. uploading a ton of random data)
<karolherbst> I wonder if you can upload before being authenticated? Though I would trust gitlab enough to not allow that kind of attack?
<bentiss> problem is git through https
<xe> that hits rails doesn't it
<karolherbst> mhh guess we don't have anubis on the git+https path...
<bentiss> so, I've re-enabled (1) GETs which look like git and (2) anything but GET, and I'm seeing the pods crashing
<bentiss> karolherbst: no, I bypass that entirely
<xe> but yeah that kind of anomaly detection is my white whale atm
<bentiss> I think the main issue is that our nginx config allows for very large uploads because we have some CI jobs which upload large artifacts
<karolherbst> right...
<karolherbst> and also git lfs exist
<bentiss> so instead of nginx just cutting the connection, it eats it all, and we get -ECONN errors
<bentiss> trying to disable git through https from fastly
<xe> you should contact gitlab and/or the arch linux devops team
<bentiss> well... gitlab is not very fast at answering our requests
<xe> I'll see if i have backroom contacts
<bentiss> well, the thing also is that it's 11PM here, so not sure I'll be awake for a long time
<karolherbst> yeah.. hopefully those people get bored and not annoy us over the week, but who knows
<xe> bentiss: does nginx have the concept of connection classes and semaphores?
<bentiss> xe: honestly no ideas :(
<xe> fair
<xe> but yeah, this is kind of a shit scenario
<xe> the only thing I can think of helping would be something conceptually similar to ircd I:lines
<bentiss> honestly, you're talking gibberish to me, so all I can do is vehemently agree :)
<soreau> bentiss: can't you rate limit per client uploading, or is that already happening?
<bentiss> ok, the nice thing is that even with git blocked the servers still go down, so it might not be that elaborate, just big POST requests
<xe> bentiss: set the max post body size to 4Mi
<xe> i think nginx can detect and reject that
<bentiss> xe: yeah, but I'd rather not touch the nginx conf
<xe> but if the client is using streaming uploads, it may not work, idk
<xe> bentiss: we are approaching a god is dead scenario in general tbh
<bentiss> I hope fastly can do it (it has the entire packet at its disposal)
<xe> I'd hope so too
<pinchartl> xe: is this when we decide on what the next god will be?
<xe> I've been wanting to implement anomaly detection, but it's not been in the cards yet lol
<xe> (it's my white whale)
<bentiss> and AFAICT, the pods are in a better state now that POST and PATCH (anything but GET) have been disabled
<xe> that may be enough to go to sleep over
sima has quit [Ping timeout: 480 seconds]
<bentiss> well, I'd need to reenable the Web UI :)
<xe> i'm pretty sure comment posting and issue posting is broken, but that may be an acceptable compromise
<xe> ah
<xe> :)
<bentiss> cause right now only the internal CI would be working plus git over ssh
<bentiss> https://docs.rs/fastly/latest/fastly/struct.Request.html#method.get_content_length looks like I might have this, but that's assuming it's provided by the client :(
<xe> fun fact about http
<xe> :)
<bentiss> well, I can also pump the body in fastly compute and then resend it instead of streaming it, but it will fail most of the time (I probably don't have to put any rule in place, fastly would probably drop the connection after a few MB)
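A sketch of the Content-Length check bentiss is considering, assuming a Fastly Compute service in Rust and a hypothetical `gitlab_webservice` backend. As noted above, it only helps when the client actually declares a length; streaming/chunked uploads show up as None and would need the buffer-and-resend approach instead. The 4 MiB cap follows xe's suggestion and would break legitimate large CI artifact uploads unless those take a separate path.

```rust
use fastly::http::StatusCode;
use fastly::{Error, Request, Response};

// Placeholder cap, following the 4Mi suggestion above; CI artifact uploads
// would need a more generous, separately authenticated route.
const MAX_BODY_BYTES: usize = 4 * 1024 * 1024;

#[fastly::main]
fn main(req: Request) -> Result<Response, Error> {
    // get_content_length() is only Some(..) when the client declares a length;
    // chunked uploads come back as None and slip past this check.
    if let Some(len) = req.get_content_length() {
        if len > MAX_BODY_BYTES {
            return Ok(Response::from_status(StatusCode::PAYLOAD_TOO_LARGE)
                .with_body_text_plain("request body too large\n"));
        }
    }
    Ok(req.send("gitlab_webservice")?)
}
```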
sentry has joined #freedesktop
snetry has quit [Ping timeout: 480 seconds]
<bentiss> heh... we are getting hammered with `POST /api/graphql HTTP/1.1" 405 157 "https://gitlab.freedesktop.org/drm/amd/-/issues/662"` like multiple times per seconds
<xe> oh fun
<xe> yeah, i think you may just need to leave gitlab off overnight? :S
<bentiss> well, I'd like to see how much I can re-enable
<bentiss> re-opening the GET requests like normal
<Venemo> somebody must really hate that bug
<bentiss> yeah :)
<Venemo> I'm crossing my fingers for you
<Venemo> good night
<bentiss> thx
<__tim> doesn't look like GNOME has figured out a fix either though, at least it's down for me
<bentiss> yeah, same here
<bentiss> so far, here, blocking POST requests makes the situation better, though unusable :)
<xe> i'll poke barthalion
<bentiss> so it would seem (but trying that hypothesis right now) that it's a graphql injection which creates the DoS, so allowing everything but graphql and see how that behaves
<bentiss> yep, definitely something related to graphql, and that issue I linked above keeps hammering anubis (because I redirect the unknown requests there :-P )
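What "allowing everything but graphql" could look like at the edge, as a hedged sketch (again assuming Fastly Compute in Rust and a hypothetical `gitlab_webservice` backend, not the real ruleset); GraphQL POSTs rejected with 405 would match the `POST /api/graphql ... 405` lines quoted earlier:

```rust
use fastly::http::{Method, StatusCode};
use fastly::{Error, Request, Response};

#[fastly::main]
fn main(req: Request) -> Result<Response, Error> {
    // Block only the GraphQL endpoint; everything else still reaches GitLab.
    let is_graphql = req.get_path().starts_with("/api/graphql");
    if is_graphql && req.get_method() == &Method::POST {
        return Ok(Response::from_status(StatusCode::METHOD_NOT_ALLOWED)
            .with_body_text_plain("GraphQL is temporarily disabled\n"));
    }
    Ok(req.send("gitlab_webservice")?)
}
```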
<karolherbst> heh
<karolherbst> what gitlab version are we on?
<bentiss> next logical step is to check the body size in that request, but I'd need to read the nginx doc
<bentiss> karolherbst: gitlab 18.3.2 (did the upgrade earlier in the evening in the hope it would fix the issue)
<karolherbst> mhh I see
<bentiss> fun thing: the hammering for the issue above comes from an IP address registered at... AMD :)
<karolherbst> :)
<karolherbst> maybe we should poke people at AMD lol
<bentiss> so it could be just a bug and firefox going completely rogue
<bentiss> "who has a tab opened on https://gitlab.freedesktop.org/drm/amd/-/issues/662????"
<bentiss> I'm sure that will help a lot :)
<karolherbst> thing is.. the issues don't load at all for me, is that on purpose?
<bentiss> but actually, I can filter the IP
<karolherbst> yeah
<bentiss> yeah, all graphql queries are banned
<karolherbst> somebody will scream
<bentiss> or the whole company
<karolherbst> either way, somebody will reach out :D
<bentiss> hopefully
<bentiss> alright, let's filter out that IP
<karolherbst> classic "we've noticed strange network requests coming from your network, we've banned it until you reach out to us to resolve this issue :)"
<xe> bentiss: can you get me a copy of the DoS request body?
<bentiss> xe: currently no... but I can try to extract it
<xe> bentiss: <3
<bentiss> \o/ that seems to be working
<karolherbst> nice
<bentiss> but it's also a little bit scary that *one* IP can entirely mess up a gitlab server
<xe> lol i think you banned someone at AMD
<bentiss> yeah, that's what I said above
<bentiss> which also makes me think it might not be intentional
<bentiss> or if it is, we might have an easier way of putting pressure on that person
<karolherbst> I don't know which possibility I want to wish is the actual one
<__tim> amazing
<karolherbst> I wonder to how many offices/machines a single IP maps
<karolherbst> at AMD
<xe> i mean, worst case
<xe> banpage all of AS33619
<xe> ask them to explain their folly
<karolherbst> heh
<karolherbst> is gitlab down again šŸ™ƒ
<bentiss> indeed
<bentiss> let me re disable graphql
Haven0320-2 has quit [Ping timeout: 480 seconds]
<xe> how many qps are you getting to the graphql endpoint?
<bentiss> don't know yet
<karolherbst> I wonder if it's now a different IP?
<karolherbst> or something unrelated?
orowith2os[m] has joined #freedesktop
<bentiss> xe: something in the 10 qps range from what I can see in the nginx logs behind anubis
<bentiss> so peanuts
<xe> yeah, i'm thinking this is a heckin layer 7 DoS
<bentiss> damn, I need to put the banner back on as well
<airlied> agd5f: ^
tuttza0x1 has joined #freedesktop
<bentiss> so it seems there is another hammering on https://gitlab.freedesktop.org/drm/amd/-/issues/3911
<bentiss> but ipv6 this time
* soreau hands bentiss some coffee
* bentiss doesn't drink coffee :) (plus at this hour, I would never sleep after)
<karolherbst> maybe the same thing, just falls back to ipv4? šŸ™ƒ
<karolherbst> ehh
<karolherbst> ipv6
<bentiss> not AMD this time around
<karolherbst> mhhhh
<karolherbst> soooo
<karolherbst> IPv4 be VPN and v6 is the actual residential IP but fucked up their script? šŸ™ƒ
<karolherbst> or just applications doing application stuff, but that's _kinda_ suspicious now
<soreau> is there a way to rate limit each client so each can only make so many queries per time period?
<karolherbst> I mean, if it's directed (malicious or not) from a single machine, then banning the IPs would work for now
<bentiss> well, anything is possible... but maybe not at 1 AM :)
<karolherbst> the 4 -> 6 switch isn't that weird given that most company VPNs run on v4 only
<orowith2os[m]> are y'all in contact with GNOME folks? Their gitlab instance is having troubles too.
<karolherbst> bentiss: yeah.. I guess the best you can do at this point is to ban the IP and see if that helps
<xe> I'm trying to orowith2os[m] lol
<karolherbst> I wonder if they used the same IPs šŸ™ƒ
<orowith2os[m]> xe: through Matrix channels, email, or?
<xe> orowith2os[m]: discord dm, mastodon, etc
<orowith2os[m]> kk
<orowith2os[m]> do also try #infrastructure:gnome.org if you haven't.
<orowith2os[m]> I don't know if the channel name got sent across the bridge correctly.
<xe> it would be easier if they just used IRC :)
<orowith2os[m]> Maybe, but it would also bring up other issues. IIRC they disabled their IRC bridges because of spam.
<orowith2os[m]> I can't complain about not having to set up a proxy to store messages while I'm offline.
<orowith2os[m]> either way, do try Matrix too.
<orowith2os[m]> or if you'd like, I can bug them myself, if you have a message you want me to send to them.
<bentiss> on my side, re-enabling graphql but banning that ipv6 endlessly attempting to pull https://gitlab.freedesktop.org/drm/amd/-/issues/3911
<xe> ban the /48
<bentiss> it seemed pretty consistent :) (and not sure I have the brain to change my rust function to match on the mask instead of the ip)
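The mask match xe suggests is a small change in Rust: compare the first 48 bits (three 16-bit segments) of the client address against the banned prefix instead of the full address. A sketch with a placeholder documentation prefix rather than the real culprit; in a Fastly Compute handler the client address would come from `Request::get_client_ip_addr()`, which returns Option<IpAddr>.

```rust
use std::net::{IpAddr, Ipv6Addr};

/// True when `client` falls inside the /48 that `banned_prefix` belongs to.
fn in_banned_ipv6_48(client: IpAddr, banned_prefix: Ipv6Addr) -> bool {
    match client {
        IpAddr::V6(v6) => {
            // A /48 covers the first three of the eight 16-bit segments.
            let (c, b) = (v6.segments(), banned_prefix.segments());
            c[..3] == b[..3]
        }
        IpAddr::V4(_) => false,
    }
}

fn main() {
    // Placeholder addresses for illustration only (documentation range).
    let banned: Ipv6Addr = "2001:db8:1234::".parse().unwrap();
    let client: IpAddr = "2001:db8:1234:42::7".parse().unwrap();
    assert!(in_banned_ipv6_48(client, banned));
}
```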
<karolherbst> well as long as it works for now
<xe> one person should not be able to have so much power
<bentiss> maybe I can try axing based on the referrer as well
<karolherbst> ohh btw, you can search for the IP address in gitlab
<karolherbst> users I mean
<karolherbst> might point to an account
<orowith2os[m]> curious: are any other OSS git instances being hit too?
<bentiss> karolherbst: dm-ed you the 2 culprits
<orowith2os[m]> other than fd.o and gnome.
<bentiss> kde seemed fine
<orowith2os[m]> pmos is too... let me rack my brain and try to see what else is out there.
<orowith2os[m]> (pmos is fine, to clarify.)
<bentiss> and gitlab is failing again
<bentiss> re-re-re-disabling graphql
<xe> leave it off overnight
<bentiss> xe: sorry, not sure I'll be able to get the full request, I don't see any sane way to extract it from nginx
<xe> it's all good :)
<xe> I figured that would be the case
<xe> worst case I'll have something that can extract it ready by when you wake up
<karolherbst> is gitlab down again šŸ™ƒ
<bentiss> well, it should be easy enough to spawn a custom web server that just spills out every single POST request, but again, there are better times to do such things
<bentiss> karolherbst: yeah, see my comment above
<karolherbst> ahh
<bentiss> it takes a few minutes to propagate and re-enable
<karolherbst> okay
<xe> bentiss: i've been working on something kinda horrible for developing better filtering logic in anubis
<bentiss> "good" now (without graphql, meaning no POST to the server from the UI)
<bentiss> anyway, going afk, sorry it's not fixed, but maybe we'll all have a better idea tomorrow
<bentiss> (sorry .au folks)
<xe> I'll keep you updated if I get in contact with anyone from GNOME
<bentiss> thanks!
<xe> for now
<xe> go sleep
<xe> I have no power here, but I am using all of that no power to tell you to sleep