ChanServ changed the topic of #freedesktop to: https://www.freedesktop.org infrastructure and online services || for questions about freedesktop.org projects, please see each project's contact || for discussions about specifications, please use https://gitlab.freedesktop.org/xdg or xdg@lists.freedesktop.org
alanc has quit [Remote host closed the connection]
alanc has joined #freedesktop
snetry has joined #freedesktop
sentry has quit [Ping timeout: 480 seconds]
alarumbe has quit []
cptaffe has quit [Ping timeout: 480 seconds]
pzanoni has quit [Ping timeout: 480 seconds]
Zathras has joined #freedesktop
Zathras_11 has quit [Ping timeout: 480 seconds]
eluks has quit [Remote host closed the connection]
eluks has joined #freedesktop
AbleBacon has quit [Quit: I am like MacArthur; I shall return.]
sima has quit [Ping timeout: 480 seconds]
swatish2 has joined #freedesktop
prb12[m] has joined #freedesktop
prb12[m] has left #freedesktop [#freedesktop]
jsa1 has joined #freedesktop
Kayden has joined #freedesktop
<daniels>
eric_engestrom: is there a reason every ci-tron job pulls from the gitlab repo with a fixed SHA instead of just having a local clone of the repo at that SHA, or having it built into the trigger container?
tzimmermann has joined #freedesktop
<eric_engestrom>
daniels: (assuming you mean `ci-tron-job-dut-v1.yml.j2`) I'm not sure how your "local clone" suggestion would work, but it's not built in so that it can be modified without making a new release; that lets users tweak it and makes testing easy for us devs
<eric_engestrom>
the "fixed" sha is fixed by the user (mesa ci) btw
<eric_engestrom>
(CI_TRON_JOB_TEMPLATE_COMMIT in the root yaml)
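[editor's note: a minimal sketch of the client-side fetch being discussed. The project path `gfx-ci/ci-tron` and the helper name are assumptions for illustration; `CI_TRON_JOB_TEMPLATE_COMMIT` and the template filename come from the conversation above.]

```python
# Sketch: fetching a job template pinned to a fixed SHA, as each
# ci-tron job currently does. The project path below is hypothetical.
import os
import urllib.parse

def template_raw_url(base: str, project: str, commit: str, path: str) -> str:
    """Build the GitLab raw-file URL for `path` at a pinned `commit`."""
    return f"{base}/{project}/-/raw/{commit}/{urllib.parse.quote(path)}"

# The user (mesa ci) fixes the SHA via CI_TRON_JOB_TEMPLATE_COMMIT
# in the root yaml; "deadbeef" is a placeholder fallback.
commit = os.environ.get("CI_TRON_JOB_TEMPLATE_COMMIT", "deadbeef")
url = template_raw_url("https://gitlab.freedesktop.org",
                       "gfx-ci/ci-tron",  # hypothetical project path
                       commit,
                       "ci-tron-job-dut-v1.yml.j2")
print(url)
```

Every pipeline resolving this URL is one hit on the gitlab frontend per job, which is the load concern raised below.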
<daniels>
sure, but also in the context of load - e.g. if ci-tron scales to doing 1000 pre-merge jobs, then every pipeline would mean 1000 hits to pull a single file from a repo, which is ... not negligible load
<daniels>
the local-tweaking thing makes sense, but how about embedding the version that's usually used in the container image, and allowing the user to override it?
noodlez1232 has quit [Remote host closed the connection]
noodlez1232 has joined #freedesktop
<eric_engestrom>
the problem is that "the version" is not really a thing, as updates of the farm and of the ci config are decoupled
<eric_engestrom>
any commit can be "the version", even if we exclude user changes
<mupuf>
daniels: we probably could keep local caches if this is a concern, but wouldn't we hit the CDN anyway rather than the gitlab server?
<eric_engestrom>
mupuf: what do you think of doing a local clone of the ci-tron repo on the farm, and pulling the file from there when the commit is upstream? that could work
<eric_engestrom>
ignore that; cache is much better 🙈
<mupuf>
In any case, I am quite concerned about the reliability of testing in the presence of gitlab issues... but I guess we still need to pull artifacts from it anyway for testing
<mupuf>
Rather than client-side, I guess we could load the template server-side using the http library I've made, so that we get caching. The concern is of course security.
<mupuf>
another option would be to vendor the template in mesa, but that would mean we need to start cloning the mesa tree again for all the CI-tron jobs, unlike now where we do not depend on a single file from the tree
<daniels>
mupuf: I'm honestly not sure if fastly caches those or not
<eric_engestrom>
re: cloning mesa, there's the ill-named "python artifacts" tarball
<mupuf>
eric_engestrom: oh, right, that was what I had in mind
<daniels>
when I made that tarball, it literally did contain only python scripts which were artifacted
airlied_ has joined #freedesktop
<mupuf>
daniels: that was a good idea, and still is
<daniels>
but yeah, that was the straight-shot thing which let us pull the scripts as quickly as possible whilst letting people change whatever local stuff they want ... if it needs to be renamed, that's fine, just slam in a commit to change it to whatever name you think is best
<mupuf>
but I tried to reduce our load on the fdo infra by just going with a single file. This way we still kept Mesa in complete control of the environment (as bentiss strongly suggested)
airlied has quit [Ping timeout: 480 seconds]
<eric_engestrom>
daniels: sorry, I didn't mean it as a jab, just that it should be renamed now because the name no longer reflects what it actually does
<daniels>
sure, I'm totally happy with any name at all
<mupuf>
I think the most logical thing to do would be to pull the template server-side, this way we would get all the deduplication for free
<mupuf>
and... we can have whatever policy we want on 50X errors (like reusing from cache without revalidation)
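[editor's note: a minimal sketch of the reuse-from-cache-on-50X policy mupuf describes, not ci-tron's actual implementation. `fetch` stands in for the real HTTP layer, and all names here are illustrative.]

```python
# Sketch: server-side template cache that serves a stale copy when the
# upstream fetch fails (e.g. gitlab returning 503), so transient gitlab
# issues don't fail jobs. All names are hypothetical.
class TemplateCache:
    def __init__(self, fetch):
        self._fetch = fetch   # callable(url) -> body; may raise on 50X
        self._cache = {}      # url -> last successfully fetched body

    def get(self, url):
        try:
            body = self._fetch(url)
        except Exception:
            # Upstream error: reuse the cached copy without revalidation.
            if url in self._cache:
                return self._cache[url]
            raise
        self._cache[url] = body
        return body
```

This is essentially HTTP's `stale-if-error` semantics done by hand; loading the template once server-side also deduplicates the fetch across all jobs in a pipeline.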
<eric_engestrom>
mupuf: agreed, and IIRC that was my initial intention, not sure why it ended up being pulled client-side
<daniels>
mupuf: oh yeah, makes a lot of sense, I was just thinking of OOB storage - if it's the gl.fd.o service that's unreliable, then bypassing it and going straight to something more optimised for straight-line file storage (rather than a complex mechanism of stuff to serve git repos) might help?
<eric_engestrom>
mupuf: I'll look into passing the url and pulling server-side
<mupuf>
daniels: agreed, artifacts have not been nearly as problematic
<mupuf>
moving to server side would also reduce the concern for CI-tron farms using a non-fdo GitLab. So far, we recommend people vendor the template in their forge for this very reason.
<mupuf>
as for the suggestion to embed the template in the trigger container, that's not a bad idea. It would just force us to make a new release every time we change the template, unlike now where we only make new releases when changing the code of the client
<valentine>
python-artifacts doesn't exist anymore in Mesa, I included it in the LAVA container which was the only user
mvlad has joined #freedesktop
ximion has quit [Remote host closed the connection]
fantom has quit [Ping timeout: 480 seconds]
elibrokeit has quit [Read error: Connection reset by peer]
elibrokeit has joined #freedesktop
fantom has joined #freedesktop
<dj-death>
half of the requests to gitlab end up in 503 here
<dj-death>
is that expected?
<dj-death>
it started yesterday
swatish21 has joined #freedesktop
<daniels>
yeah, that's been a thing :(
<__tim>
do we know if it's fastly/anubis related or if gitlab is throwing it?
<__tim>
I think it must be fastly/anubis, since I'm not seeing these issues on runners that bypass it
swatish2 has quit [Ping timeout: 480 seconds]
olspookishmagus has quit [Ping timeout: 480 seconds]
alarumbe has joined #freedesktop
swatish2 has joined #freedesktop
swatish21 has quit [Ping timeout: 480 seconds]
<daniels>
you mean ones that address htz directly?
swatish2 has quit [Remote host closed the connection]
<__tim>
yes
<__tim>
(which was needed to make our largeish uploads work, would hit a timeout otherwise)
swatish2 has joined #freedesktop
olspookishmagus has joined #freedesktop
<daniels>
yeah, it seems to be on the fastly/anubis side, but tbh I'm not sure how to get more data out of it - I just see internal_error and queue_timed_out (limit: instances) paired whenever we have a failing request
<daniels>
which I assume means that we don't have enough server capacity to service all the incoming anubis requests? I'm not sure tbh
<karolherbst>
so lists are down again?
swatish2 has quit [Ping timeout: 480 seconds]
<daniels>
something is very wrong with apache; looking into it
<daniels>
ok, banned a couple more spammers who were hammering mailman
scrumplex_ has joined #freedesktop
scrumplex has quit [Ping timeout: 480 seconds]
sima has joined #freedesktop
swatish2 has joined #freedesktop
vkareh has joined #freedesktop
guludo has joined #freedesktop
<zmike>
okay, it seems to me something is either deeply wrong with the zink-anv jobs in ci or I don't know how ci jobs work
<zmike>
because I'm now on the second class of vvl errors which show up in the tgl/adl jobs which I cannot repro on any systems locally
<zmike>
and one of them occurs just while running a single test
<zmike>
so I should clearly be able to trigger this
<zmike>
and yet I cannot
<zmike>
I'm not sure what the solution here is, but if it isn't something that I or anyone can repro locally then it isn't adding value
<__tim>
is it possible to whitelist runner IPs in anubis or would that not help?
<__tim>
(or are some of the runners automatically created on demand and IPs keep changing?)
<__tim>
or in fastly; not sure how it's layered with fastly
<YelsinSepulveda[m]>
<YelsinSepulveda[m]> "Hello o/..." <- Friendly reminder - looking for roadmap and ways to support this.
swatish2 has quit [Ping timeout: 480 seconds]
vkareh has quit [Quit: WeeChat 4.7.1]
vkareh has joined #freedesktop
<bentiss>
__tim: if the runner is bypassing fastly it's also bypassing anubis
<__tim>
yes
<__tim>
The problem (503 errors) is with the runners that don't :)
<bentiss>
discourse.gstreamer.org already has an exception
<__tim>
ah, nice
<__tim>
not sure if it will help with the 503s though if those are caused by too many concurrent requests
ximion has joined #freedesktop
<bentiss>
we can try
<__tim>
also not sure if it will help with the large artefact uploads, but doesn't hurt to try I suppose (there's a timeout somewhere that effectively limits the amount of data and I'm not sure where that timeout comes from exactly)
<__tim>
Thanks
<bentiss>
large artifacts would probably not be solved, because there are some restrictions on the fastly side AFAIU
<bentiss>
I think I managed to stream the request directly to the servers in those cases, but not 100% sure
<__tim>
for the fdo runners, is there any reason not to just add a hostname override to the gitlab-runner config and make it talk to gitlab directly? (are there caching benefits when going through fastly?)
<bentiss>
__tim: that's already what is done for the fdo-htz ones I think
<bentiss>
Maybe I forgot it for the last one (trying to ssh on it)
<__tim>
right, maybe it's just the placeholder ones?
<bentiss>
placeholders are different beast indeed :(
<__tim>
but I can change the tags for those of course :)
swatish2 has joined #freedesktop
<bentiss>
FWIW, runner-3 didn't have the /etc/hosts override, but the runner pods did, so I guess it was working the same (just that the runner itself was pinging/getting jobs through fastly)
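[editor's note: a sketch of the /etc/hosts override being discussed, under the assumption that it simply pins the gitlab hostname to a backend address so runner traffic bypasses fastly/anubis. The IP below is a documentation placeholder (TEST-NET-3), not the real backend.]

```
# Hypothetical /etc/hosts entry on a runner host (or in runner pods):
# resolve gitlab directly instead of via the fastly/anubis frontend,
# avoiding frontend 503s and upload timeouts. Placeholder IP only.
203.0.113.10  gitlab.freedesktop.org
```

With this in place, clones, job polling, and artifact uploads from that host go straight to the gitlab server, which is why runners configured this way don't see the 503s.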
tlwoerner_ has joined #freedesktop
tlwoerner has quit [Ping timeout: 480 seconds]
swatish21 has joined #freedesktop
swatish2 has quit [Ping timeout: 480 seconds]
sentry has joined #freedesktop
usc is now known as psychon
snetry has quit [Ping timeout: 480 seconds]
luc64627490 has quit [Remote host closed the connection]
<bentiss>
__tim: merged, rebuilt and deployed. Should be up in a few minutes while it gets propagated
<__tim>
Thanks a lot!
AbleBacon has joined #freedesktop
jsa1 has quit [Ping timeout: 480 seconds]
vkareh has quit [Quit: WeeChat 4.7.1]
mvlad has quit [Remote host closed the connection]
<Lyude>
Is there anything up with the patchwork instance right now? patchwork.freedesktop.org doesn't seem to want to load properly, but the gitlab seems like it's loading fine
<Lyude>
oh there it goes
<Lyude>
see, the thing about freedesktop services is I just need to join the channel and say "it's not working", then they start working