vliaskov has quit [Read error: Connection reset by peer]
vliaskov has joined #dri-devel
vliaskov has quit [Remote host closed the connection]
bolson has quit [Ping timeout: 480 seconds]
kts has quit [Ping timeout: 480 seconds]
kts has joined #dri-devel
itoral has quit [Remote host closed the connection]
itoral has joined #dri-devel
jsa1 has joined #dri-devel
tobiasjakobi has joined #dri-devel
tobiasjakobi has quit [Remote host closed the connection]
jsa2 has quit [Ping timeout: 480 seconds]
kts has quit [Ping timeout: 480 seconds]
Nasina has quit [Read error: Connection reset by peer]
Nasina has joined #dri-devel
kode546 has joined #dri-devel
kode54 has quit [Ping timeout: 480 seconds]
itoral has quit [Quit: Leaving]
SquareWinter68 has quit []
SquareWinter68 has joined #dri-devel
SquareWinter68 has quit []
SquareWinter68 has joined #dri-devel
MandiTwo has joined #dri-devel
caitcatdev has joined #dri-devel
Sid127 has joined #dri-devel
Nasina has quit [Read error: Connection reset by peer]
Nasina has joined #dri-devel
<MandiTwo>
Hi! I'm having an issue with Mesa since it was split into Mesa and Mesa Amber. I have an Intel HD Graphics 4600 in my notebook, and since the split every compositor I've tried, no matter if X or Wayland, refuses to start any graphical application using it. Is there another driver I could use? I think crocus is the successor, right?
<karolherbst>
yeah, you'll need to use crocus. Wasn't it enabled/installed?
feaneron has joined #dri-devel
Nasina has quit [Remote host closed the connection]
<MandiTwo>
karolherbst: it is enabled and installed, but the compositors fail to start using it. I can make sway work on Mesa 25 by using pixman, but I never got beyond that
kode5460 has joined #dri-devel
Nasina has joined #dri-devel
guludo has joined #dri-devel
Nasina has quit [Read error: Connection reset by peer]
kode546 has quit [Ping timeout: 480 seconds]
mripard_ has joined #dri-devel
kode5460 is now known as kode54
mripard has quit [Ping timeout: 480 seconds]
vliaskov has joined #dri-devel
Company has joined #dri-devel
feaneron has quit [Quit: feaneron]
Company has quit [Remote host closed the connection]
<dj-death>
jnoorman: about !35252, is adding BASE to load/store_ssbo something people agree on?
<dj-death>
jnoorman: can I pull that in and drop my intel intrinsic variants? :)
JRepin has quit []
JRepin has joined #dri-devel
kode54 has quit [Ping timeout: 480 seconds]
Company has joined #dri-devel
bbhtt has quit [Ping timeout: 480 seconds]
haaninjo has joined #dri-devel
Nasina has joined #dri-devel
Jeremy_Rand_Talos has quit [Remote host closed the connection]
Jeremy_Rand_Talos has joined #dri-devel
nerdopolis has joined #dri-devel
mehdi-djait3397165695212282475 has joined #dri-devel
Nasina has quit [Read error: Connection reset by peer]
mehdi-djait3397165695212282475 has quit []
Nasina has joined #dri-devel
jsa1 has quit [Ping timeout: 480 seconds]
Nasina has quit [Read error: Connection reset by peer]
jsa1 has joined #dri-devel
dolphin has quit [Quit: Leaving]
nerdopolis has quit [Ping timeout: 480 seconds]
jsa1 has quit [Ping timeout: 480 seconds]
<jnoorman>
dj-death: not sure yet! I pinged in !34344, so let's see if people agree.
MandiTwo has quit [Remote host closed the connection]
<dj-death>
jnoorman: thanks!
<dj-death>
jnoorman: on Intel we could add the BASE for load_ubo as well
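For context, a minimal sketch of what consuming such a BASE index could look like in a backend, assuming load_ssbo does grow a constant BASE index as proposed in !35252 (the helper name ssbo_address below is hypothetical): the constant base would simply be folded into the address math next to the dynamic offset source.

    /* Hypothetical backend helper, assuming load_ssbo gains a BASE index as
     * discussed in !35252: fold the constant base into the address alongside
     * the dynamic byte-offset source. */
    #include "nir.h"
    #include "nir_builder.h"

    static nir_def *
    ssbo_address(nir_builder *b, nir_intrinsic_instr *intr)
    {
       nir_def *offset = intr->src[1].ssa;       /* dynamic byte offset of load_ssbo */
       unsigned base = nir_intrinsic_base(intr); /* the proposed constant BASE index */
       return nir_iadd_imm(b, offset, base);     /* final address = offset + base */
    }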
odrling has quit [Remote host closed the connection]
odrling has joined #dri-devel
jsa1 has joined #dri-devel
Nasina has joined #dri-devel
karolherbst3 has joined #dri-devel
karolherbst has quit [Read error: Connection reset by peer]
karolherbst3 has quit []
karolherbst has joined #dri-devel
feaneron has joined #dri-devel
Nasina has quit [Ping timeout: 480 seconds]
hikiko_ has joined #dri-devel
hikiko has quit [Ping timeout: 480 seconds]
rasterman has quit [Quit: Gettin' stinky!]
hikiko has joined #dri-devel
feaneron has quit [Quit: feaneron]
hikiko_ has quit [Ping timeout: 480 seconds]
kusma has joined #dri-devel
fab has quit [Quit: fab]
Grimler_ has joined #dri-devel
kzd has joined #dri-devel
asrivats__ has joined #dri-devel
guru_ has joined #dri-devel
<stsquad>
I've noticed that my newer vkmark test images fail with a host libvirglrenderer 1.1.0-2 - should mesa be introducing non-backwards-compatible changes to venus?
oneforall2 has quit [Ping timeout: 480 seconds]
<stsquad>
it looks like the commands failing are VIRTIO_GPU_CMD_RESOURCE_UNREF, VIRTIO_GPU_CMD_RESOURCE_CREATE_BLOB, VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB and VIRTIO_GPU_CMD_RESOURCE_UNMAP_BLOB
<digetx>
the venus protocol maintains compatibility and should work with libvirglrenderer 1.1.0; I'll test with an older virglrenderer
<digetx>
I assume you have tested with the latest libvirglrenderer and it works?
<stsquad>
digetx: the test images used by ./pyvenv/bin/meson test --setup thorough func-aarch64-aarch64_virt_gpu still work with the system virglrenderer - the newer image does indeed work with a hand-built libvirglrenderer (07982b48d1967a - Uprev Mesa to 65e18a84944b559419aceaf2083936cf68ac3e79)
nashpa has joined #dri-devel
<stsquad>
digetx: the mesa in the new test images is 25.0.6
<digetx>
haven't seen that problem, but I'm using the latest virglrenderer all the time
<stsquad>
digetx: the old images were Mesa 25.0 with vkmark 2025.01 - the other change is that I had to build vkmark at HEAD
<stsquad>
the image is built with gcc-14 so hopefully this isn't the trigger
feaneron has joined #dri-devel
<robclark>
karolherbst: enqueueReadBuffer seems to be really bad.. and it's all memcpy within rusticl. Is that something you're aware of?
<karolherbst>
yeah
<karolherbst>
I need to special case situations where there is just one device
<karolherbst>
because this code is supposed to transparently migrate the content of buffers in multi-device contexts, and there isn't really a sane way to map resources if you have more than one device
<karolherbst>
the other issue is that directly mapping pipe_resources is a bit of a pain, because in CL it's inherently an asynchronous operation
<karolherbst>
meaning you get the result pointer very early, and then at some point the result will be available
<karolherbst>
like you will have a mapped resource while it's used later
<karolherbst>
and some drivers don't really like this
<robclark>
hmm, at least iGPU would be happy enough w/ persistent mappings
<karolherbst>
yeah...
<karolherbst>
I was considering being able to use different impls for certain things internally
jsa2 has joined #dri-devel
<karolherbst>
so I can properly abstract between those use cases
<karolherbst>
persistent mappings are fine, but then it kinda broke with dGPU drivers and zink
<karolherbst>
robclark: the thing is.. I don't need a persistent mapping, I just need to be able to tell a driver to map at a specific virtual address :) the mapping can go away in the meantime as long as the address stays reserved
<karolherbst>
then I wouldn't even need the copy, because then I can just map the resource where the current content is, even in multi-device contexts
<karolherbst>
anyway.. many ideas, not enough time
<karolherbst>
the current code at least works 🙃
davispuh has joined #dri-devel
jsa1 has quit [Ping timeout: 480 seconds]
<robclark>
karolherbst: extending transfer_map for this wouldn't be all that hard
<robclark>
as long as rusticl takes care of reserving the VA range
<karolherbst>
no it probably wouldn't, it's just a lot of work
<karolherbst>
could make it opt-in by drivers
dsimic is now known as Guest17173
dsimic has joined #dri-devel
<karolherbst>
but then I need a solution for drivers not having it implemented yet
<karolherbst>
though I think all drivers do the mapping with mmap internally, and they could get a target address
<robclark>
is it something zink can do on vk? If so, why bother with a backwards-compat path
<robclark>
right
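Purely as illustration of the idea being kicked around here, a hypothetical sketch (nothing below exists in gallium today; map_resource_at is an invented hook): the frontend reserves a VA range itself with mmap(PROT_NONE) and asks the driver to place the CPU mapping of the resource exactly there, so the pointer handed out earlier stays valid even if the underlying mapping comes and goes.

    /* Hypothetical sketch only: 'map_resource_at' is an invented hook and is
     * not part of pipe_context. The frontend reserves the address range and
     * the driver maps the resource's content at that address on demand. */
    #include <assert.h>
    #include <sys/mman.h>
    #include "pipe/p_context.h"
    #include "pipe/p_defines.h"

    static void *
    map_at_reserved_va(struct pipe_context *ctx, struct pipe_resource *res,
                       size_t size)
    {
       /* reserve the address range without committing any memory */
       void *reserved = mmap(NULL, size, PROT_NONE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

       /* invented hook: make the content of 'res' visible at 'reserved' */
       void *map = ctx->map_resource_at(ctx, res, reserved,
                                        PIPE_MAP_READ | PIPE_MAP_WRITE);
       assert(map == reserved);
       return map;
    }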
<stsquad>
digetx: is it possible to run venus with a pure llvmpipe backend? I'm just working out if there is a better way to do the buildroot testcase for vkmark so it doesn't rely on host GPU hardware
<karolherbst>
robclark: no idea :)
Guest17173 has quit [Ping timeout: 480 seconds]
<karolherbst>
robclark: but there is also clEnqueueReadBuffer, which doesn't have to bother with this nonsense, and I do hope that nothing perf critical relies on mapBuffer to be fast :D
<robclark>
clEnqueueReadBuffer is specifically what I was looking at right now.. just looking at things where clpeak is significantly worse than the closed CL driver
<robclark>
(idk, maybe there are better benchmarks than clpeak.. but gotta start somewhere)
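For context, a clpeak-style read-back bandwidth test boils down to roughly the loop sketched below (simplified, not the benchmark's actual code): a blocking clEnqueueReadBuffer into host memory, repeated, so every extra copy inside the CL implementation shows up directly as lost bandwidth.

    /* Simplified sketch of a transfer-bandwidth read-back loop. */
    #include <CL/cl.h>

    static void
    read_loop(cl_command_queue q, cl_mem buf, void *host, size_t size)
    {
       for (int i = 0; i < 100; i++) {
          /* blocking read: returns once 'size' bytes have landed in 'host' */
          clEnqueueReadBuffer(q, buf, CL_TRUE, 0, size, host, 0, NULL, NULL);
       }
    }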
<karolherbst>
yeah fair
<karolherbst>
I'd start with the alu perf tho
<karolherbst>
unless that's already all faster :D
<karolherbst>
mhh...
<karolherbst>
robclark: soo.. clEnqueueReadBuffer being slow is a bit of a driver problem. Rusticl only maps the resource and copies it to the destination
<karolherbst>
_however_
<robclark>
alu is surprisingly better than I expected, even with all the extra load/store_global pointer math
<karolherbst>
I was considering whether e.g. the target address could be imported via resource_from_user_memory, and then doing the copy on the GPU :)
<karolherbst>
but that will require drivers to map arbitrary addresses
<digetx>
stsquad: venus is tested with lavapipe in fdo CI; the CI is headless though, so I assume vkmark should just work with lavapipe
<karolherbst>
so if freedreno fails resource_from_user_memory with anything not page aligned, that might be a good place to start
<robclark>
karolherbst: yeah, userpointer isn't supported on kernel side yet, and would be a heavier lift
<karolherbst>
ahhh
<karolherbst>
yeah...
<karolherbst>
it's going to be required for good perf tho
<karolherbst>
long-term
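A rough sketch of the resource_from_user_memory idea mentioned above (not rusticl's actual code): wrap the application's destination pointer in a pipe_resource and let the GPU do the read-back copy. The parameters are placeholders, and a page-aligned destination is assumed, since drivers typically reject unaligned user memory.

    /* Sketch only: import the destination pointer and copy on the GPU instead
     * of mapping + memcpy on the CPU. Assumes the driver accepts the memory. */
    #include "pipe/p_context.h"
    #include "pipe/p_screen.h"
    #include "pipe/p_state.h"
    #include "util/u_box.h"
    #include "util/u_inlines.h"

    static void
    readback_via_gpu(struct pipe_screen *screen, struct pipe_context *ctx,
                     struct pipe_resource *src_res, unsigned src_offset,
                     void *dst_ptr, unsigned size)
    {
       struct pipe_resource templ = {
          .target = PIPE_BUFFER,
          .format = PIPE_FORMAT_R8_UINT,
          .width0 = size,
          .height0 = 1,
          .depth0 = 1,
          .array_size = 1,
          .usage = PIPE_USAGE_STAGING,
       };
       /* wrap the (page-aligned) destination pointer in a pipe_resource */
       struct pipe_resource *dst =
          screen->resource_from_user_memory(screen, &templ, dst_ptr);
       if (!dst)
          return; /* fall back to map + memcpy */

       struct pipe_box box;
       u_box_1d(src_offset, size, &box);
       /* copy the buffer range straight into the application's memory */
       ctx->resource_copy_region(ctx, dst, 0, 0, 0, 0, src_res, 0, &box);
       ctx->flush(ctx, NULL, 0); /* content is visible in dst_ptr once this syncs */
       pipe_resource_reference(&dst, NULL);
    }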
jsa2 has quit [Ping timeout: 480 seconds]
<karolherbst>
there are also places where rusticl shadow buffers, because mapping from user memory is required in CL
<digetx>
stsquad: I think I was running vkmark with lavapipe a couple of months ago; let me know if it doesn't work
<karolherbst>
and required in the most insane way: you can't fail the API request :)
<karolherbst>
and the API sets up 0 requirements for the application
<karolherbst>
but yeah.. besides that... it also depends on how resource_map is implemented in drivers
<karolherbst>
like if the pipe_transfer already shadow-buffers the mapping + copy, and rusticl also does a copy to the final destination, then we copy a bit too often :)
tzimmermann has quit [Quit: Leaving]
<karolherbst>
but I really just would like the GPU to do the copy, which would then also get rid of stalls in the queue
<karolherbst>
I think some drivers already shadow buffer it if the resource is busy
<robclark>
in this case, I don't _think_ we should be hitting the shadow copy path, but only just starting to look at it... perf says it is all memcpy in rusticl::core::memory::Buffer::read
<robclark>
idk if it helps, but we could do quasi-SVM (i.e. GPU-allocated, but set the GPU VA and CPU VA to the same)
phasta has quit [Ping timeout: 480 seconds]
<karolherbst>
well..
<karolherbst>
the API is supposed to copy from the GPU to application memory
<karolherbst>
so at the very least you can't get rid of the main copy there unless you manage to move it to the GPU, which then requires userptrs :)
<karolherbst>
robclark: yeah, so the thing is that CL supports multi-device contexts, and memory objects allocated in those are transparently migrated between the devices
<karolherbst>
so we'd need to do it on all drivers
<karolherbst>
which I'm doing with proper SVM already anyway
<karolherbst>
and the interfaces exist for rusticl to manage the virtual GPU addresses instead
<robclark>
this feels bad enough that it's got to be like multiple extra memcpy's ;-)
<karolherbst>
with this interface I got SVM working across drivers/vendors, which is neat
<karolherbst>
robclark: how big is the perf difference reported by clpeak?
<robclark>
12x
<karolherbst>
oof
<karolherbst>
does the prop stack support userptr?
<robclark>
idk, it is windows
<robclark>
so no clue
<karolherbst>
mhhh
<karolherbst>
I have no idea how to make clEnqueueReadBuffer faster without userptr then
bolson has joined #dri-devel
<robclark>
but cpu should be able to saturate memory bandwidth, and I can use cached-coherent gpu buffers
<karolherbst>
it's just a plain copy from mapped to application memory
<robclark>
ok, well I'll poke around more
<karolherbst>
is the hw capable of doing such copies?
<karolherbst>
or just GPU -> GPU memory ones?
<karolherbst>
could be that "ptr::copy" is doing something that is slow or so...
<robclark>
I mean it is iGPU so everything is visible to cpu and gpu
<karolherbst>
or resource_map doing something weird, but if it's all the `ptr::copy` one.. then no idea honestly
feaneron has quit [Quit: feaneron]
<karolherbst>
could also be something else, like stalling and that's why the number is so low, but also no good idea how to figure that out
<robclark>
this test doesn't even seem to launch any grids, it's just a loop of clEnqueueReadBuffer()
<karolherbst>
yeah
<karolherbst>
maybe it's just memcpy not being optimized for aarch64 or something silly
<karolherbst>
which I doubt, but...
<karolherbst>
it's properly vectorized and stuff, no?
<karolherbst>
on x86 it shows up as memcpy_avx2 or whatever in profiles
<karolherbst>
robclark: uhm.. are you doing an optimized release build with all the perf things enabled? A debug build might give you bad perf there
<robclark>
hmm, it is a good question why I get __memcpy_generic().. glibc should have an optimized memcpy/memset/etc for this core.. but I don't expect that is a 12x thing
<karolherbst>
mhhh
<karolherbst>
who knows, could be 12x, could be 2x
<karolherbst>
could also be -O0 making things slow
<robclark>
results are basically the same between debug and release mesa builds
<karolherbst>
but yeah.. it's probably a CPU only test and if the CPU side is slow, then yeah...
<karolherbst>
I see
<karolherbst>
but yeah.. if your core has SVE it shouldn't use generic
<robclark>
ahh, it gets quite a bit faster if I actually set FD_BO_CACHED_COHERENT
<robclark>
:-P
<karolherbst>
:D
<karolherbst>
nice
<robclark>
this is a hack tho, rusticl should be setting some bits to tell me that it wants coherent ;-)
JRepin has quit []
<karolherbst>
yeah...
JRepin has joined #dri-devel
<karolherbst>
I just focused on getting things working for now, and if there are nice flags important for perf, I can figure out how to set them
<karolherbst>
coherent resources are just a bit of a pain in mesa, because some dGPU drivers place them in system RAM
<karolherbst>
or map_coherent needs the coherent flag at resource creation time
<karolherbst>
coherent also requires persistent
<karolherbst>
but maybe it's fine if I only do it for devices with caps.uma set to true
<karolherbst>
but then again, I have no idea what the implications are of enabling coherent + persistent for every resource on uma systems
<robclark>
bit higher cost on gpu I guess, much lower cost for cpu reads
<karolherbst>
could also change the gallium semantics: if PIPE_MAP_COHERENT is used without PIPE_MAP_PERSISTENT, PIPE_RESOURCE_FLAG_MAP_PERSISTENT, or PIPE_RESOURCE_FLAG_MAP_COHERENT, it's best effort and drivers might ignore the flag
JRepin has quit []
JRepin has joined #dri-devel
<karolherbst>
what matters more is whether creating the resource with PIPE_RESOURCE_FLAG_MAP_PERSISTENT and PIPE_RESOURCE_FLAG_MAP_COHERENT adds additional costs
<karolherbst>
(besides dGPU drivers placing them in staging/system memory)
<karolherbst>
the one important aspect is that I don't really need a coherent mapping; the content is synced at sync points, so a barrier should do the trick to guarantee the content is visible. What matters more is that the copy is fast
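For reference, a minimal sketch of the gallium side of the persistent+coherent path being discussed, assuming a UMA device and a plain buffer (the helper name and sizes are placeholders): requesting the resource flags at creation time is what makes the matching PIPE_MAP_* flags guaranteed rather than best-effort.

    /* Minimal sketch, assuming caps.uma and a plain buffer: create with the
     * persistent+coherent resource flags so a persistent/coherent mapping is
     * honoured rather than best-effort. */
    #include "pipe/p_context.h"
    #include "pipe/p_screen.h"
    #include "pipe/p_state.h"
    #include "util/u_inlines.h"

    static void *
    map_coherent_buffer(struct pipe_screen *screen, struct pipe_context *ctx,
                        unsigned size, struct pipe_resource **out_res,
                        struct pipe_transfer **out_xfer)
    {
       struct pipe_resource templ = {
          .target = PIPE_BUFFER,
          .format = PIPE_FORMAT_R8_UINT,
          .width0 = size,
          .height0 = 1,
          .depth0 = 1,
          .array_size = 1,
          .bind = PIPE_BIND_GLOBAL,
          .flags = PIPE_RESOURCE_FLAG_MAP_PERSISTENT |
                   PIPE_RESOURCE_FLAG_MAP_COHERENT,
       };
       *out_res = screen->resource_create(screen, &templ);

       /* the mapping can now stay alive while the GPU keeps using the buffer */
       return pipe_buffer_map(ctx, *out_res,
                              PIPE_MAP_READ | PIPE_MAP_WRITE |
                              PIPE_MAP_PERSISTENT | PIPE_MAP_COHERENT,
                              out_xfer);
    }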
oneforall2 has joined #dri-devel
<stsquad>
digetx: can you point me at the CI job so I can crib the command line ;-)
guru_ has quit [Ping timeout: 480 seconds]
CME has quit [Ping timeout: 480 seconds]
testaccount has joined #dri-devel
JRepin has quit []
JRepin has joined #dri-devel
testaccount has quit []
davispuh has quit [Ping timeout: 480 seconds]
<digetx>
stsquad: set `VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/lvp_icd.x86_64.json` as an env var and run qemu; I checked that vkmark works with that
<zmike>
mareko: I'm looking at st_nir_unlower_io_to_vars; is the idea just that you unset nir_io_has_intrinsics to use it?
feaneron has joined #dri-devel
feaneron has quit []
imre is now known as Guest17176
imre has joined #dri-devel
JRepin has quit []
JRepin has joined #dri-devel
asrivats__ has quit [Ping timeout: 480 seconds]
mripard_ has quit []
Guest17176 has quit []
rasterman has joined #dri-devel
MandiTwo has joined #dri-devel
anholt has quit [Ping timeout: 480 seconds]
karolherbst2 has joined #dri-devel
karolherbst has quit [Remote host closed the connection]