ChanServ changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org
alanc has quit [Remote host closed the connection]
alanc has joined #freedesktop
snetry has joined #freedesktop
sentry has quit [Ping timeout: 480 seconds]
alarumbe has quit []
cptaffe has quit [Ping timeout: 480 seconds]
pzanoni has quit [Ping timeout: 480 seconds]
Zathras has joined #freedesktop
Zathras_11 has quit [Ping timeout: 480 seconds]
eluks has quit [Remote host closed the connection]
eluks has joined #freedesktop
AbleBacon has quit [Quit: I am like MacArthur; I shall return.]
sima has quit [Ping timeout: 480 seconds]
swatish2 has joined #freedesktop
prb12[m] has joined #freedesktop
prb12[m] has left #freedesktop [#freedesktop]
jsa1 has joined #freedesktop
Kayden has joined #freedesktop
<daniels>
eric_engestrom: is there a reason every ci-tron job pulls from the gitlab repo with a fixed SHA instead of just having a local clone of the repo at that SHA, or having it built into the trigger container?
tzimmermann has joined #freedesktop
<eric_engestrom>
daniels: (assuming you mean `ci-tron-job-dut-v1.yml.j2`) I'm not sure how your "local clone" suggestion would work, but it's not built in so that it can be modified without making a new release; that lets users tweak it and makes testing easy for us devs
<eric_engestrom>
the "fixed" sha is fixed by the user (mesa ci) btw
<eric_engestrom>
(CI_TRON_JOB_TEMPLATE_COMMIT in the root yaml)
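[editor's note: a minimal sketch of the client-side fetch being discussed. The project path `gfx-ci/ci-tron` and the helper name are assumptions for illustration; `CI_TRON_JOB_TEMPLATE_COMMIT` and the template filename come from the conversation above.]

```python
# Sketch: fetching a job template pinned to a fixed SHA, as each
# ci-tron job currently does. The project path below is hypothetical.
import os
import urllib.parse

def template_raw_url(base: str, project: str, commit: str, path: str) -> str:
    """Build the GitLab raw-file URL for `path` at a pinned `commit`."""
    return f"{base}/{project}/-/raw/{commit}/{urllib.parse.quote(path)}"

# The user (mesa ci) fixes the SHA via CI_TRON_JOB_TEMPLATE_COMMIT
# in the root yaml; "deadbeef" is a placeholder fallback.
commit = os.environ.get("CI_TRON_JOB_TEMPLATE_COMMIT", "deadbeef")
url = template_raw_url("https://gitlab.freedesktop.org",
                       "gfx-ci/ci-tron",  # hypothetical project path
                       commit,
                       "ci-tron-job-dut-v1.yml.j2")
print(url)
```

Every pipeline resolving this URL is one hit on the gitlab frontend per job, which is the load concern raised below.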
<daniels>
sure, but also in the context of load - e.g. if ci-tron scales to doing 1000 pre-merge jobs, then every pipeline would mean 1000 hits to pull a single file from a repo, which is ... not negligible load
<daniels>
the local-tweaking thing makes sense, but how about embedding the version that's usually used in the container image, and allowing the user to override it?
noodlez1232 has quit [Remote host closed the connection]
noodlez1232 has joined #freedesktop
<eric_engestrom>
the problem is that "the version" is not really a thing, as updates of the farm and of the ci config are decoupled
<eric_engestrom>
any commit can be "the version", even if we exclude user changes
<mupuf>
daniels: we probably could keep local caches if this is a concern, but wouldn't we hit the CDN anyway rather than the gitlab server?
<eric_engestrom>
mupuf: what do you think of doing a local clone of the ci-tron repo on the farm, and pulling the file from there when the commit is upstream? that could work
<eric_engestrom>
ignore that; cache is much better 🙈
<mupuf>
In any case, I am quite concerned about the reliability of testing in the presence of gitlab issues... but I guess we still need to pull artifacts from it anyway for testing
<mupuf>
Rather than client-side, I guess we could load the template server-side using the http library I've made, so that we get caching. The concern is of course security.
<mupuf>
another option would be to vendor the template in mesa, but that would mean we need to start cloning the mesa tree again for all the CI-tron jobs, unlike now where we do not depend on a single file from the tree
<daniels>
mupuf: I'm honestly not sure if fastly caches those or not
<eric_engestrom>
re: cloning mesa, there's the ill-named "python artifacts" tarball
<mupuf>
eric_engestrom: oh, right, that was what I had in mind
<daniels>
when I made that tarball, it literally did contain only python scripts which were artifacted
airlied_ has joined #freedesktop
<mupuf>
daniels: that was a good idea, and still is
<daniels>
but yeah, that was the straight-shot thing which let us pull the scripts as quickly as possible whilst letting people change whatever local stuff they want ... if it needs to be renamed, that's fine, just slam in a commit to change it to whatever name you think is best
<mupuf>
but I tried to reduce our load on the fdo infra by just going with a single file. This way we still kept Mesa in complete control of the environment (as bentiss strongly suggested)
airlied has quit [Ping timeout: 480 seconds]
<eric_engestrom>
daniels: sorry, I didn't mean it as a jab, just that it should be renamed now because the name no longer reflects what it actually does
<daniels>
sure, I'm totally happy with any name at all
<mupuf>
I think the most logical thing to do would be to pull the template server-side, this way we would get all the deduplication for free
<mupuf>
and... we can have whatever policy we want on 50X errors (like reusing from cache without revalidation)
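[editor's note: a minimal sketch of the reuse-from-cache-on-50X policy mupuf describes, not ci-tron's actual implementation. `fetch` stands in for the real HTTP layer, and all names here are illustrative.]

```python
# Sketch: server-side template cache that serves a stale copy when the
# upstream fetch fails (e.g. gitlab returning 503), so transient gitlab
# issues don't fail jobs. All names are hypothetical.
class TemplateCache:
    def __init__(self, fetch):
        self._fetch = fetch   # callable(url) -> body; may raise on 50X
        self._cache = {}      # url -> last successfully fetched body

    def get(self, url):
        try:
            body = self._fetch(url)
        except Exception:
            # Upstream error: reuse the cached copy without revalidation.
            if url in self._cache:
                return self._cache[url]
            raise
        self._cache[url] = body
        return body
```

This is essentially HTTP's `stale-if-error` semantics done by hand; loading the template once server-side also deduplicates the fetch across all jobs in a pipeline.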
<eric_engestrom>
mupuf: agreed, and IIRC that was my initial intention, not sure why it ended up being pulled client-side
<daniels>
mupuf: oh yeah, makes a lot of sense, I was just thinking of OOB storage - if it's the gl.fd.o service that's unreliable, then bypassing it and going straight to something more optimised for straight-line file storage (rather than a complex mechanism of stuff to serve git repos) might help?
<eric_engestrom>
mupuf: I'll look into passing the url and pulling server-side
<mupuf>
daniels: agreed, artifacts have not been nearly as problematic
<mupuf>
moving to server side would also reduce the concern for CI-tron farms using a non-fdo GitLab. So far, we recommend people vendor the template in their forge for this very reason.
<mupuf>
as for the suggestion to embed the template in the trigger container, that's not a bad idea. It would just force us to make a new release every time we change the template, unlike now where we only make new releases when changing the code of the client
<valentine>
python-artifacts doesn't exist anymore in Mesa, I included it in the LAVA container which was the only user
mvlad has joined #freedesktop
ximion has quit [Remote host closed the connection]
fantom has quit [Ping timeout: 480 seconds]
elibrokeit has quit [Read error: Connection reset by peer]
elibrokeit has joined #freedesktop
fantom has joined #freedesktop
<dj-death>
half of the requests to gitlab end up in 503 here
<dj-death>
is that expected?
<dj-death>
it started yesterday
swatish21 has joined #freedesktop
<daniels>
yeah, that's been a thing :(
<__tim>
do we know if it's fastly/anubis related or if gitlab is throwing it?
<__tim>
I think it must be fastly/anubis, since I'm not seeing these issues on runners that bypass it
swatish2 has quit [Ping timeout: 480 seconds]
olspookishmagus has quit [Ping timeout: 480 seconds]
alarumbe has joined #freedesktop
swatish2 has joined #freedesktop
swatish21 has quit [Ping timeout: 480 seconds]
<daniels>
you mean ones that address htz directly?
swatish2 has quit [Remote host closed the connection]
<__tim>
yes
<__tim>
(which was needed to make our largeish uploads work, would hit a timeout otherwise)
swatish2 has joined #freedesktop
olspookishmagus has joined #freedesktop
<daniels>
yeah, it seems to be on the fastly/anubis side, but tbh I'm not sure how to get more data out of it - I just see internal_error and queue_timed_out (limit: instances) paired whenever we have a failing request
<daniels>
which I assume means that we don't have enough server capacity to service all the incoming anubis requests? I'm not sure tbh
<karolherbst>
so lists are down again?
swatish2 has quit [Ping timeout: 480 seconds]
<daniels>
something is very wrong with apache; looking into it
<daniels>
ok, banned a couple more spammers who were hammering mailman
scrumplex_ has joined #freedesktop
scrumplex has quit [Ping timeout: 480 seconds]
sima has joined #freedesktop
swatish2 has joined #freedesktop
vkareh has joined #freedesktop
guludo has joined #freedesktop
<zmike>
okay, it seems to me something is either deeply wrong with the zink-anv jobs in ci or I don't know how ci jobs work
<zmike>
because I'm now on the second class of vvl errors which show up in the tgl/adl jobs which I cannot repro on any systems locally
<zmike>
and one of them occurs just while running a single test
<zmike>
so I should clearly be able to trigger this
<zmike>
and yet I cannot
<zmike>
I'm not sure what the solution here is, but if it isn't something that I or anyone can repro locally then it isn't adding value
<__tim>
is it possible to whitelist runner IPs in anubis or would that not help?
<__tim>
(or are some of the runners automatically created on demand and IPs keep changing?)
<__tim>
or in fastly; not sure how it's layered with fastly
<YelsinSepulveda[m]>
<YelsinSepulveda[m]> "Hello o/..." <- Friendly reminder - looking for roadmap and ways to support this.
swatish2 has quit [Ping timeout: 480 seconds]
vkareh has quit [Quit: WeeChat 4.7.1]
vkareh has joined #freedesktop
<bentiss>
__tim: if the runner is bypassing fastly it's also bypassing anubis
<__tim>
yes
<__tim>
The problem (503 errors) is with the runners that don't :)
<bentiss>
discourse.gstreamer.org already has an exception
<__tim>
ah, nice
<__tim>
not sure if it will help with the 503s though if those are caused by too many concurrent requests
ximion has joined #freedesktop
<bentiss>
we can try
<__tim>
also not sure if it will help with the large artefact uploads, but doesn't hurt to try I suppose (there's a timeout somewhere that effectively limits the amount of data and I'm not sure where that timeout comes from exactly)
<__tim>
Thanks
<bentiss>
large artifacts would probably not be solved, because there are some restrictions on the fastly side AFAIU
<bentiss>
I think I managed to stream the request directly to the servers in those cases, but not 100% sure
<__tim>
for the fdo runners, is there any reason not to just add a hostname override to the gitlab-runner config and make it talk to gitlab directly? (are there caching benefits when going through fastly?)
<bentiss>
__tim: that's already what is done for the fdo-htz ones I think
<bentiss>
Maybe I forgot it for the last one (trying to ssh on it)
<__tim>
right, maybe it's just the placeholder ones?
<bentiss>
placeholders are different beast indeed :(
<__tim>
but I can change the tags for those of course :)
swatish2 has joined #freedesktop
<bentiss>
FWIW, runner-3 didn't have the /etc/hosts override, but the runner pods did, so I guess it was working the same (just that the runner itself was pinging/getting jobs through fastly)
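[editor's note: a sketch of the /etc/hosts override being discussed, under the assumption that it simply pins the gitlab hostname to a backend address so runner traffic bypasses fastly/anubis. The IP below is a documentation placeholder (TEST-NET-3), not the real backend.]

```
# Hypothetical /etc/hosts entry on a runner host (or in runner pods):
# resolve gitlab directly instead of via the fastly/anubis frontend,
# avoiding frontend 503s and upload timeouts. Placeholder IP only.
203.0.113.10  gitlab.freedesktop.org
```

With this in place, clones, job polling, and artifact uploads from that host go straight to the gitlab server, which is why runners configured this way don't see the 503s.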
tlwoerner_ has joined #freedesktop
tlwoerner has quit [Ping timeout: 480 seconds]
swatish21 has joined #freedesktop
swatish2 has quit [Ping timeout: 480 seconds]
sentry has joined #freedesktop
usc is now known as psychon
snetry has quit [Ping timeout: 480 seconds]
luc64627490 has quit [Remote host closed the connection]
<bentiss>
__tim: merged, rebuilt and deployed. Should be up in a few minutes while it gets propagated
<__tim>
Thanks a lot!
AbleBacon has joined #freedesktop
jsa1 has quit [Ping timeout: 480 seconds]
vkareh has quit [Quit: WeeChat 4.7.1]
mvlad has quit [Remote host closed the connection]
<Lyude>
Is there anything up with the patchwork instance right now? patchwork.freedesktop.org doesn't seem to want to load properly, but the gitlab seems like it's loading fine
<Lyude>
oh there it goes
<Lyude>
see, the thing about freedesktop services is I just need to join the channel and say "it's not working", then they start working