ChanServ changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org
balrog has quit [Ping timeout: 480 seconds]
konstantin has quit [Remote host closed the connection]
konstantin has joined #freedesktop
scrumplex has joined #freedesktop
scrumplex_ has quit [Ping timeout: 480 seconds]
alarumbe has quit []
guludo has quit [Ping timeout: 480 seconds]
snetry has joined #freedesktop
sentry has quit [Ping timeout: 480 seconds]
sentry has joined #freedesktop
snetry has quit [Ping timeout: 480 seconds]
olivial has quit [Read error: Connection reset by peer]
olivial has joined #freedesktop
c137 has quit [Ping timeout: 480 seconds]
MrCooper_ has joined #freedesktop
infernixx has joined #freedesktop
MrCooper has quit [Ping timeout: 480 seconds]
infernix has quit [Ping timeout: 480 seconds]
balrog has joined #freedesktop
alpernebbi has quit [Ping timeout: 480 seconds]
alpernebbi has joined #freedesktop
MrCooper__ has joined #freedesktop
MrCooper_ has quit [Ping timeout: 480 seconds]
D-HUND has joined #freedesktop
debdog has quit [Ping timeout: 480 seconds]
GNUmoon has quit [Ping timeout: 480 seconds]
GNUmoon has joined #freedesktop
D-HUND is now known as debdog
MrCooper_ has joined #freedesktop
MrCooper__ has quit [Ping timeout: 480 seconds]
rudi_s is now known as Guest16345
rudi_s has joined #freedesktop
Guest16345 has quit [Remote host closed the connection]
ximion has quit [Remote host closed the connection]
olivial has quit [Read error: Connection reset by peer]
olivial has joined #freedesktop
jsa1 has joined #freedesktop
pjakobsson has quit []
swatish2 has joined #freedesktop
tzimmermann has joined #freedesktop
sima has joined #freedesktop
<eric_engestrom> bentiss, daniels, mupuf: the x86 shared runners went *poof*?
<eric_engestrom> https://gitlab.freedesktop.org/mesa/mesa/-/jobs/76698592 for instance, says "There are no active runners online"
<eric_engestrom> (with or without the priority tag, btw, it's not just nightly that's stuck)
Eighth_Doctor has quit []
tomeu has quit [Quit: Bridge terminating on SIGTERM]
dcbaker has quit []
jasuarez has quit [Quit: Bridge terminating on SIGTERM]
swick[m] has quit [Quit: Bridge terminating on SIGTERM]
kusma has quit [Quit: Bridge terminating on SIGTERM]
nirbheek_ has quit [Quit: Bridge terminating on SIGTERM]
sergi has quit [Quit: Bridge terminating on SIGTERM]
geobang[m]1 has quit [Quit: Bridge terminating on SIGTERM]
Trevinho has quit [Quit: Bridge terminating on SIGTERM]
alatiera[m] has quit [Quit: Bridge terminating on SIGTERM]
ErikReider[m] has quit [Quit: Bridge terminating on SIGTERM]
thaytan[m] has quit [Quit: Bridge terminating on SIGTERM]
DanLee[m] has quit []
dabrain34[m] has quit []
cadubentzen[m] has quit [Quit: Bridge terminating on SIGTERM]
MathieuBridon[m] has quit [Remote host closed the connection]
mimimyh[m] has quit [Read error: Connection reset by peer]
ndufresne[m] has quit [Write error: connection closed]
talion_809[m] has quit [Read error: Connection reset by peer]
sberz[m] has quit [Read error: Connection reset by peer]
little932[m] has quit [Write error: connection closed]
wontfix[m] has quit [Write error: connection closed]
jenatali has quit [Write error: connection closed]
ojuschugh1[m] has quit [Write error: connection closed]
mj-talbot[m] has quit [Write error: connection closed]
Salamancalasa[m] has quit [Remote host closed the connection]
colinmarc has quit [Write error: connection closed]
heftig has quit [Remote host closed the connection]
Ian[m] has quit [Read error: Connection reset by peer]
Hantz[m] has quit [Remote host closed the connection]
hellfire7734club[m] has quit [Remote host closed the connection]
gallo[m] has quit [Remote host closed the connection]
ich2022[m] has quit [Remote host closed the connection]
Berenguer1931[m] has quit [Write error: connection closed]
Auyer[m] has quit [Read error: Connection reset by peer]
valentine has quit [Remote host closed the connection]
bilboed has quit [Write error: connection closed]
<bentiss> eric_engestrom: looking into it
<bentiss> eric_engestrom: I see a storm of libcamera jobs
<bentiss> WTF???? https://gitlab.freedesktop.org/camera/libcamera/-/pipelines?page=7&scope=all -> seems like the libcamera folks enabled a patchwork bridge and requested CI for all of them
<bentiss> disabled project runners on this project
<bentiss> pinchartl: ^^ FYI, I'm raising an issue
<bentiss> damn, there is no issues enabled on this project
<bentiss> pinchartl: FWIW, I can see that you've got 32 pages of pipelines that need to be run -> https://gitlab.freedesktop.org/camera/libcamera/-/pipelines?page=33&scope=all this is not fair use of our resources
<bentiss> eric_engestrom: so the issue is that, because of the libcamera storm, the runners weren't able to pick/request a single priority:low job and got marked as offline. Once the current jobs terminate, they'll be able to recontact gitlab and should be marked as ready
<bentiss> but first, we need to clear the queue :(
<bentiss> pinchartl: one way I would accept that is if your patchwork jobs were tagged with priority:low, so it's not impacting the rest of the instance. Please forward the info to Kieran
<bentiss> yay, found where I could put the issue :)
<eric_engestrom> bentiss: ack, thanks for investigating!
<eric_engestrom> I guess with that many pipelines in the libcamera queue, letting them run is not an option, so they need to all be cancelled?
<bentiss> yeah, but I'm not doing that for them
<svuorela> bentiss: if you can give me a simple list of things to do and the rights to do it, I can do the cancelling.
<bentiss> svuorela: I don't see how it would be faster for me to look into the docs than you TBH
<svuorela> bentiss: oh. I thought it was because it needed 33 pages of clicky clicky you didn't feel like doing it.
<bentiss> svuorela: that, but I think there should be either a graphql request that can be done or an API call which would definitely be faster than clicky clicky on all 495 pipelines still pending
<bentiss> (but if you feel like spending your day on that, feel free to click on all of them one by one :-P )
<svuorela> project = Project.find_by_full_path('<project_path>')
<svuorela> Ci::Pipeline.where(project_id: project.id).where(status: 'pending').each {|p| p.cancel if p.stuck?}
<bentiss> svuorela: do you have gitlab rails console access?
<svuorela> but just remove the "if p.stuck?"
<svuorela> I don't
<svuorela> I'm just a normal user
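[editor's note] The Rails console snippet above needs admin access to the instance. The "API call" route bentiss mentions could look roughly like the sketch below, which lists a project's pending pipelines through the GitLab REST API and cancels them one by one. It is an illustration only, not something anyone ran in this conversation: the helper names (`gitlab_call`, `cancel_pending_pipelines`), the token handling, and the page size are assumptions, and the token would need enough rights on the project to cancel its pipelines.

```python
import json
import urllib.parse
import urllib.request


def gitlab_call(base_url, token, method, path, params=None):
    """Send one authenticated GitLab REST API request and decode the JSON reply."""
    url = f"{base_url}/api/v4{path}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, method=method,
                                 headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def cancel_pending_pipelines(call, project_path):
    """Cancel every pending pipeline of a project and return the cancelled IDs.

    `call(method, path, params)` is any callable performing the request,
    for example functools.partial(gitlab_call, base_url, token).
    """
    # The API accepts the URL-encoded project path in place of a numeric ID.
    project = urllib.parse.quote(project_path, safe="")

    # Collect all IDs first so that cancelling does not shift the pagination.
    pending, page = [], 1
    while True:
        batch = call("GET", f"/projects/{project}/pipelines",
                     {"status": "pending", "per_page": 100, "page": page})
        if not batch:
            break
        pending.extend(p["id"] for p in batch)
        page += 1

    for pipeline_id in pending:
        call("POST", f"/projects/{project}/pipelines/{pipeline_id}/cancel", None)
    return pending
```

Passing the request function in as `call` keeps the cancelling logic separate from the HTTP details, so it can be dry-run against a stub before being pointed at a real instance.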
<bentiss> svuorela: are you even part of libcamera?
<svuorela> I'm not
<svuorela> I was just volunteering to get shared CI resources unstuck
<bentiss> svuorela: then I really appreciate the effort, but I think someone needs to clean his/her own mess
<svuorela> I'm just stuck waiting behind it
<bentiss> svuorela: they are, I disabled shared runners on this project, meaning that the normal queue restarted. But we are 8 hours behind, so it can take a while before we get back to normal
<svuorela> aha.
<svuorela> great.
<svuorela> I will just sit back and wait then.
<bentiss> for now, there are a lot of libinput jobs taking off, and that's expected because whot worked over the night (his day in .au)
<bentiss> svuorela: sounds like a plan
<svuorela> (and queue more jobs up ...)
<bentiss> heh
andy-turner has joined #freedesktop
karolherbst has quit [Quit: Ping timeout (120 seconds)]
sentry has quit [Quit: left OFTC]
karolherbst has joined #freedesktop
sentry has joined #freedesktop
tonitch has quit []
tonitch has joined #freedesktop
mvlad has joined #freedesktop
mripard_ has joined #freedesktop
mripard has quit [Ping timeout: 480 seconds]
<pinchartl> bentiss: I don't know what happened, I'm not aware of any change on our side. I'll figure it out
<pinchartl> sorry for the impact :-(
dcbaker has joined #freedesktop
<bentiss> pinchartl: that's fine. Just don't re-enable the shared runners without cleaning up the current pipelines
<pinchartl> absolutely
<bentiss> FWIW, backlog seems to be 1h and 30 minutes of waiting now
<pinchartl> I'm talking with Kieran at the moment
<bentiss> good :)
<pinchartl> we'll make sure not only to clean up, but to make sure it will never happen again
<bentiss> looks like at least there were some timeouts, and the jobs are now all marked as failed
<bentiss> someone got a shitload of failed pipeline gitlab emails :)
<whot> is that an imperial or metric shitload?
<bentiss> :)
<bentiss> whot: that reminds me that I should really get libinput to make marge pipelines pick up priority:high tags instead of none
<kbingham> ayeeee
<kbingham> bentiss, sorry - that script has been running for a long time ... not sure what broke ... it's only supposed to send 'new' patches ... not the entire patchwork history :-(
<pinchartl> kbingham: let's figure out what happened first, and then put measures in place to make sure it won't happen again
<bentiss> kbingham: looks like the cache got reset -> patchwork/5182 was triggered 8 hours ago when it already ran on page 50 https://gitlab.freedesktop.org/camera/libcamera/-/pipelines?page=50&scope=all
bochecha[m] has joined #freedesktop
<bentiss> pinchartl, kbingham: one way of preventing the DDoS is to ensure your patchwork jobs are making use of the tag `priority:low`, this way, they'll get picked up when there is room
<bentiss> but given that we now somehow vet every user, it would be nice if the patchwork bridge had some guards to not run pipelines from random users as well
<bentiss> FWIW, I think the queue is now cleared \o/
<pinchartl> bentiss: I'd prefer having to trigger the patchwork integration manually really
<bentiss> pinchartl: if you have a process where a developer manually triggers the pipeline, that's even better :)
bochecha[m] is now known as MathieuBridon[m]
<bentiss> could be automated if one of the maintainers gives a rev-by or something like that that patchwork recognizes
<kbingham> can confirm I have a metric shitload of failure emails .. :S
<bentiss> pinchartl, kbingham: I was a little bit annoyed a couple of hours ago, but don't take this personally, we all make mistakes, and the situation is now back under control. So take your time, and if you need shared runners, you can probably re-enable them, or I can spin up dedicated VMs for you in the meantime
<kbingham> bentiss, What I really need to do here (as well, instead?) is set up a gitlab runner so libcamera builds are done here without consuming shared compute. I have a build PC here - just don't know how to link it in yet.
<kbingham> Manual triggering is what I started with and it was a real pain.
<pinchartl> kbingham: I'm sure bentiss would love if we brought our shared runners :-)
<pinchartl> s/shared/own/
MrCooper_ is now known as MrCooper
<kbingham> 2 or 3 times a day I just ran the script ... and after a while it was pointless me pretending to be 'cron' - so I set up a systemd timer on that about 2 months ago I think.
<kbingham> So it's been fine until *some recent event* ...
<bentiss> kbingham: unless I messed up my time zones, this seems to have happened at midnight UTC, so some cron job, reboot or something happened
<bentiss> for shared runners, you can easily add some to your group (usually you just install gitlab-runner or run it under a podman/docker container). If you want you can also add one to the entire instance, but then you'll need an admin to give you a token
<pinchartl> bentiss: is there an easy way to route pipelines to specific runners based on their priority (or another option we can set when pushing) ?
<bentiss> kbingham: and for manually running the script, I completely understand. But you probably need your script to check if the branch already exists before pushing, that should prevent the DDoS
<bentiss> pinchartl: add the tag `priority:low` or `priority:low-aarch64` (there are variants for kvm as well)
<kbingham> bentiss, The problem is the branches /didn't/ exist - so it was successfully supplying everything that had never been run.
<kbingham> https://paste.debian.net/1375652/ (for the curious about my script)
<kbingham> git -C libcamera/ branch -a | \
<kbingham> grep remotes/gl.fdo/patchwork | \
<kbingham> sed 's#.*gl.fdo/patchwork/##g' | sort -hr \
<kbingham> | uniq | head -n1
<pinchartl> bentiss: I meant configuring the pipeline to route jobs to our runners for low-priority jobs and to the shared runners for normal jobs for instance (so we can start testing our own runners without blocking everything)
<kbingham> That was supposed to identify the 'newest' series already tested and *only* test series newer than that ... but something broke - and it seemed to go back to the beginning :(
<bentiss> pinchartl: if you register your runners with a specific tag like `patchwork` (and probably allow them to run untagged jobs as well), every patchwork job will only be run on your runners, while normal ones will either be picked by the shared ones or your own if they have capacity
<bentiss> kbingham: something was really off: https://gitlab.freedesktop.org/camera/libcamera/-/commit/0069b9ceb1e03d5887ac614e35d79602b003ff27/pipelines shows 432 pipelines created for that single commit
<kbingham> 2+ hours of log entries of the script attempting to apply patches from patchwork - create a branch and push it to gitlab :-(
<pinchartl> kbingham: you had nothing planned for today, right ? :-)
<kbingham> bentiss, Those will be all the patches that were already merged - so I can 'improve' the script so it detects if there was nothing applied ...
<kbingham> but yes - this was awful :_( - testing on prod is never great ...
<bentiss> kbingham: yeah I finally figured it out as well, but I guess that's not on your side to fix, more on the gitlab side (upstream gitlab)
<kbingham> The script has already run 2 more times 'successfully' since 'the event' without regenerating - so the issue was a glitch ... but I'll still disable the job for now.
<bentiss> kbingham: I have some doubts on the last_tested() function, the fact that it strips the output with `head -n 1` means that if there is a weird branch appearing, then you probably lose the current index
<kbingham> bentiss, indeed.
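[editor's note] One way to make the "last tested series" lookup less fragile than `sort | uniq | head -n1` is to accept only branch names that are exactly `patchwork/<number>` and take the numeric maximum, so a stray or oddly named branch cannot become the index. The sketch below is an assumption about how that could be done, not kbingham's actual `last_tested()`; the remote name `gl.fdo` comes from the pipeline quoted above, and the function expects the lines printed by `git branch -a`.

```python
import re

# Matches e.g. "  remotes/gl.fdo/patchwork/5182" and nothing with a suffix.
PATCHWORK_BRANCH = re.compile(r"^[*+\s]*remotes/gl\.fdo/patchwork/(\d+)\s*$")


def last_tested(branch_lines):
    """Return the highest patchwork series ID among the branches, or None.

    Lines that are not exactly remotes/gl.fdo/patchwork/<digits> are ignored,
    so unrelated or malformed branch names never influence the result.
    """
    ids = []
    for line in branch_lines:
        match = PATCHWORK_BRANCH.match(line)
        if match:
            ids.append(int(match.group(1)))
    return max(ids) if ids else None
```

Returning `None` when no branch matches lets the caller stop instead of silently starting again from series 0.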
<bentiss> also, is there any reason to `git-libcamera branch -D $BRANCH`? if the remote keeps the various patchwork/* branches, keeping them locally wouldn't add more space, and so you can then detect that the patch doesn't apply locally
<kbingham> Found when it happened in the logs but not a lot of insight ... ;-( https://paste.debian.net/1375656/ ..
<kbingham> https://paste.debian.net/1375657/ (without clipping)
<kbingham> I have to go to the dentist ... so I'll resume this investigation after ...
<kbingham> meanwhile: `systemctl --user stop libcamera-ci.timer`
<bentiss> heh, thanks.
* kbingham shudders : https://i.imgur.com/BNZsZfD.png ... I've got ... cleanup to do ...
swatish2 has quit [Ping timeout: 480 seconds]
<bentiss> kbingham: [when you get back from dentist] the problem also is that you do a `git-libcamera push gl.fdo -f $BRANCH` in your script, meaning that you do not trust the integrity of gitlab. I would remove the `-f`, and add a `|| true` (or put an error), which means that you'll ensure your script never messes with upstream branches
<bentiss> for the rare cases you need to force push, you can manually remove the branches on gitlab and on git.libcamera.org IMO
tomeu has joined #freedesktop
alatiera[m] has joined #freedesktop
bilboed has joined #freedesktop
swatish2 has joined #freedesktop
<kbingham> So ... the storm (aside from the ultimate blame being my script i.e. 'me' not validating parameters sufficiently) was that my dns broke - the script launched ... started running "May 19 21:28:40 Monstersaurus regular.sh[1095810]: Testing between 5184 and" (note the unchecked / failed target number) and then proceeded to run seq 0...5184 instead of seq 5184...5184 ;_( ... now while the dns was broken the script was happily churning
<kbingham> through and /failing/ to do any work - but at some point an hour later I fixed the DNS ... which then opened the flood gates and the background job started actually pushing jobs ...
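[editor's note] The failure described above is a range check that never happened: one end of the range was empty, and the loop fell back to starting from 0. A minimal guard, sketched here under assumptions (the name `series_range`, the inclusive bounds, and the `max_batch` limit of 20 are all hypothetical), refuses to produce a range when either bound is missing or non-numeric, and also refuses ranges far larger than a normal run would ever need.

```python
def series_range(first, last, max_batch=20):
    """Return the series IDs from first to last (inclusive) or raise ValueError.

    Both bounds must be present and purely numeric, and the range may not
    exceed max_batch entries, so a bad lookup cannot replay the whole history.
    """
    def parse(label, value):
        text = "" if value is None else str(value).strip()
        if not (text.isascii() and text.isdigit()):
            raise ValueError(f"{label} series id is missing or not numeric: {value!r}")
        return int(text)

    start = parse("first", first)
    end = parse("last", last)
    count = end - start + 1
    if count > max_batch:
        raise ValueError(f"refusing to submit {count} series at once (limit {max_batch})")
    return list(range(start, end + 1))
```

With this in place, `series_range("5184", "")` raises instead of expanding, and `series_range("0", "5184")` is rejected by the batch limit even if both values parse.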
olivial has quit [Read error: Connection reset by peer]
olivial has joined #freedesktop
<emersion> i'm trying to rename a project, but i'm running into this weird error:
<emersion> > Cannot rename project, the container registry path rename validation failed: Not Found
alarumbe has joined #freedesktop
swatish2 has quit [Ping timeout: 480 seconds]
swatish2 has joined #freedesktop
guludo has joined #freedesktop
swatish2 has quit [Remote host closed the connection]
snetry has joined #freedesktop
sentry has quit [Ping timeout: 480 seconds]
AbleBacon has quit [Remote host closed the connection]
swatish2 has joined #freedesktop
c137 has joined #freedesktop
andy-turner has quit []
swatish2 has quit [Ping timeout: 480 seconds]
ximion has joined #freedesktop
enyc has joined #freedesktop
sima has quit [Remote host closed the connection]
<bentiss> FWIW, gitaly-2 is running out of space. That means pushes to drm repos are not working properly :(
haaninjo has joined #freedesktop
swatish2 has joined #freedesktop
balrog has quit []
balrog has joined #freedesktop
dianders has joined #freedesktop
<bentiss> I've downloaded more disk for gitaly-2 (sorry, I had to do this joke), and we are back in business. Though we should probably have someone check on the usage of gitaly-2 and split the data into the other gitaly pods, or create a new pod, or consider that we just need to pay for more storage for this one
swatish2 has quit [Ping timeout: 480 seconds]
swatish2 has joined #freedesktop
daniels has quit [Read error: Network is unreachable]
jnoorman has quit [Remote host closed the connection]
dianders has quit [Remote host closed the connection]
jnoorman has joined #freedesktop
zmike has quit [Remote host closed the connection]
daniels has joined #freedesktop
dianders has joined #freedesktop
zmike has joined #freedesktop
jsa1 has quit [Ping timeout: 480 seconds]
c137 has quit [Ping timeout: 480 seconds]
cascardo has quit [Remote host closed the connection]
sentry has joined #freedesktop
ids1024 has joined #freedesktop
cascardo has joined #freedesktop
snetry has quit [Ping timeout: 480 seconds]
swatish2 has quit [Ping timeout: 480 seconds]
c137 has joined #freedesktop
sima has joined #freedesktop
sima is now known as Guest16401
sima has joined #freedesktop
jsa1 has joined #freedesktop
tzimmermann has quit [Quit: Leaving]
airlied_ has joined #freedesktop
airlied has quit [Ping timeout: 480 seconds]
Traneptora has joined #freedesktop
c137 has quit [Remote host closed the connection]
AbleBacon has joined #freedesktop
soreau has quit [Ping timeout: 480 seconds]
jsa1 has quit [Ping timeout: 480 seconds]
soreau has joined #freedesktop
FAQ_ has joined #freedesktop
alanc has quit [Remote host closed the connection]
alanc has joined #freedesktop
<zmike> can someone help me understand these errors? https://gitlab.freedesktop.org/mesa/mesa/-/jobs/76751030
<zmike> they don't seem to correspond to anything in the cited file
<zmike> oh nvm wrong branch
f_ is now known as Guest16412
olivial has quit [Read error: Connection reset by peer]
olivial has joined #freedesktop
misyl_ has quit [Remote host closed the connection]
mupuf has quit [Read error: Connection reset by peer]
mupuf has joined #freedesktop
agomez has joined #freedesktop
dj-death_ has joined #freedesktop
dj-death has quit [Read error: Connection reset by peer]
tanty has quit [Remote host closed the connection]
airlied_ is now known as airlied
misyl has joined #freedesktop
FAQ_ has quit []
snetry has joined #freedesktop
mvlad has quit [Remote host closed the connection]
sentry has quit [Ping timeout: 480 seconds]
agomez is now known as tanty
jsa1 has joined #freedesktop
dj-death_ has left #freedesktop [#freedesktop]
haaninjo has quit [Quit: Ex-Chat]
sima has quit [Ping timeout: 480 seconds]
Guest16401 has quit [Ping timeout: 480 seconds]
sentry has joined #freedesktop
snetry has quit [Ping timeout: 480 seconds]
guludo has quit [Quit: WeeChat 4.6.3]
jsa1 has quit [Ping timeout: 480 seconds]
Consolatis_ has joined #freedesktop
Consolatis is now known as Guest16438
Consolatis_ is now known as Consolatis
<whot> we all know that saving time for the bug reporters is the most important thing for any open source project...
Guest16438 has quit [Ping timeout: 480 seconds]