ChanServ changed the topic of #dri-devel to: <ajax> nothing involved with X should ever be unable to find a bar
<zmike>
jenatali: any ballpark on a timeline (haha) to get all that fixed? I'd like to get this merged ASAP
<jenatali>
zmike: I made progress today, need to sync (haha) with Sil tomorrow to finish it
ced117_ has quit [Remote host closed the connection]
ced117 has joined #dri-devel
<zmike>
jenatali: alright cool, just signal (haha) me whenever
glennk has quit [Ping timeout: 480 seconds]
ity has joined #dri-devel
ADS_Sr has joined #dri-devel
guludo has quit [Ping timeout: 480 seconds]
zzyiwei has joined #dri-devel
<zzyiwei>
Hi, I have a question regarding ANV_IMAGE_MEMORY_BINDING_PRIVATE. Since that's a private bo to work around non-ccs modifiers, would it be better to relocate it to the bound device memory instead? Then all the special rejections for image aliasing could be dropped, given the VkDeviceMemory would then contain both the main binding and the private binding.
sally_ is now known as sally
lemonzest1 has joined #dri-devel
lemonzest has quit [Ping timeout: 480 seconds]
nerdopolis has quit [Ping timeout: 480 seconds]
lemonzest1 has quit []
lemonzest has joined #dri-devel
Duke`` has joined #dri-devel
Mangix has quit [Read error: Connection reset by peer]
davispuh has quit [Ping timeout: 480 seconds]
Mangix has joined #dri-devel
glennk has joined #dri-devel
Company has joined #dri-devel
kzd has quit [Ping timeout: 480 seconds]
<Lynne>
...so glsl defines f16vec4, nice
<Lynne>
and also a -hf suffix so you can have native 16-bit floats (e.g. 0.0hf)
<Lynne>
...but no scalar 16-bit float format
<Lynne>
making everything mentioned incredibly useful indeed
<Lynne>
right, one more for the list of crimes to charge everyone involved with glsl's evolution (I refuse to call it design, no intelligent thought was involved)
<Lynne>
oh, it's float16_t, and you have to enable the GLSL-era AMD half-float extension
<Lynne>
which was released after f16vec4 was defined, and isn't on Khronos' official GLSL extension list
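(Pulling those pieces together, a minimal sketch of what Lynne is describing — assuming GL_AMD_gpu_shader_half_float is the extension in question; compiler support varies:)

```glsl
#version 450
// f16vec4 and the hf literal suffix exist, but the scalar type
// comes from the old AMD extension rather than core GLSL.
#extension GL_AMD_gpu_shader_half_float : enable

void main()
{
    float16_t s = 0.25hf;                  // scalar 16-bit float
    f16vec4   v = f16vec4(s, s, s, 1.0hf); // vector counterpart
}
```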
<eric_engestrom>
dcbaker: ack; I think we were all happy with the solution we went with in the end though, so while it would be better if meson provided that functionality itself, I think it's not a problem if it takes a long time to get there :)
<sima>
pepp, did you test this on older kernels without F_DUPFD_QUERY already?
<sima>
I think the fallback errno should be EINVAL, everything else is unexpected and I guess you should assert on those
<sima>
and if errno == 0 you can rely on the result
<sima>
or I'm misreading the kernel code
Guest21212 has quit []
tzimmermann has quit [Quit: Leaving]
<pepp>
sima: EBADF can also be returned if the fd is invalid IIRC
nchery has quit [Read error: Connection reset by peer]
nchery has joined #dri-devel
<sima>
pepp, yeah but that's a pretty bad programming mistake
<sima>
like from a quick read all the variants are fairly undefined for when you do that, so it feels a bit silly to special-case that one
<sima>
plus I do think you need a fallback for errno==EINVAL
<sima>
anyway going to drop a comment
<sima>
oh I misread your code
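(For reference, a sketch of the probing pattern under discussion, with a hypothetical helper name — F_DUPFD_QUERY needs Linux >= 6.10, and the constant is F_LINUX_SPECIFIC_BASE + 3 in linux/fcntl.h:)

```c
#include <errno.h>
#include <fcntl.h>

#ifndef F_DUPFD_QUERY
#define F_DUPFD_QUERY 1027   /* F_LINUX_SPECIFIC_BASE + 3; verify vs linux/fcntl.h */
#endif

/* Hypothetical helper: returns 1 if both fds share one open file
 * description, 0 if not, -1 if the kernel predates F_DUPFD_QUERY
 * and the caller needs a fallback path (e.g. kcmp). */
static int same_file_description(int fd1, int fd2)
{
   int ret = fcntl(fd1, F_DUPFD_QUERY, fd2);
   if (ret >= 0)
      return ret;            /* kernel gave a definitive answer */
   if (errno == EINVAL)
      return -1;             /* old kernel: fall back */
   /* EBADF means an invalid fd, i.e. a caller bug, per pepp above;
    * anything else is unexpected, as sima notes. */
   return -1;
}
```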
sravn has quit []
fab has quit [Quit: fab]
sally has quit []
asrivats_ has joined #dri-devel
sally has joined #dri-devel
<karolherbst>
zmike, mareko: one of you might know this: what's the "proper" way of compiling multiple shaders in parallel from a frontend pov? Do we have proper APIs for that, or should I just create multiple contexts and compile through those? Or do we already have a wrapper doing it with a queue + threads?
<zmike>
it depends how parallel you want it
sravn has joined #dri-devel
<zmike>
if shareable shaders is supported you can use multiple pipe_context objects
<zmike>
but if you want single context then you have a couple options
<zmike>
check out download_texture_compute in st_pbo_compute.c for state of the art
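(In gallium terms, the multi-context option looks roughly like the hypothetical helper below — PIPE_CAP_SHAREABLE_SHADERS is the cap zmike means, queried here through the classic get_param interface:)

```c
#include "pipe/p_defines.h"
#include "pipe/p_screen.h"

/* Hypothetical setup: one pipe_context per worker thread, which is
 * only safe when the driver reports shareable shaders. */
static bool setup_compile_contexts(struct pipe_screen *screen,
                                   struct pipe_context **ctx, unsigned n)
{
   if (!screen->get_param(screen, PIPE_CAP_SHAREABLE_SHADERS))
      return false;   /* stuck with the single-context options */

   for (unsigned i = 0; i < n; i++)
      ctx[i] = screen->context_create(screen, NULL, 0);
   return true;
}
```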
<karolherbst>
in CL I don't have a queue available when compiling things, so atm I use a global helper context for creating compute state objects
<karolherbst>
but some applications need like multiple minutes until compilation is all done, so I kinda want to parallelize this
lplc has quit [Remote host closed the connection]
<zmike>
yeah so copy what I did in compute pbo if you really want to max it out
lplc has joined #dri-devel
<karolherbst>
ahh we have "driver_thread_add_job"
<zmike>
that's the super hammer
<zmike>
which effectively lets you add work directly to the driver's shader compiler thread
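(A sketch of what using that hook could look like, with the callback signature paraphrased from p_screen.h — verify against your tree before copying:)

```c
#include "pipe/p_screen.h"
#include "util/u_queue.h"

struct compile_job {
   void *ir;   /* whatever the frontend wants compiled */
};

/* util_queue-style callback: job pointer, screen global data, thread index */
static void compile_exec(void *job, void *gdata, int thread_index)
{
   struct compile_job *cj = job;
   /* ... run the expensive compilation here ... */
   (void)cj; (void)gdata; (void)thread_index;
}

static void queue_compile(struct pipe_screen *screen,
                          struct compile_job *cj,
                          struct util_queue_fence *fence)
{
   util_queue_fence_init(fence);   /* fences start out signaled */
   if (screen->driver_thread_add_job) {
      /* the driver's compiler threads run compile_exec and signal
       * the fence when done */
      screen->driver_thread_add_job(screen, cj, fence, compile_exec,
                                    NULL, sizeof(*cj));
   } else {
      /* hook not implemented: everything stays synchronous */
      compile_exec(cj, NULL, 0);
   }
}
```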
<karolherbst>
mhhh
<pepp>
sima: thx for the feedback, I'll tweak the code a bit
<karolherbst>
guess I need to play around with those things a little then
kts has joined #dri-devel
<karolherbst>
luckily CL allows async compilation, so a queue + a fence thing kinda matches that pretty well
<karolherbst>
though I don't need the fence really
<karolherbst>
I think.. maybe I do to be safe
<zmike>
the GL scenario that prompted it was extremely latency sensitive, so I'd imagine it should be able to do whatever you need
<karolherbst>
mhhh, though only zink and radeonsi support it, so I need a fallback myself... maybe I just wrap `util_queue`
<zmike>
well there's the base parallel shader compile stuff
<zmike>
which is also used in the compute pbo logic
<karolherbst>
yeah.. not caring about latency at all, just want to turn synchronous compilation into async, as the CL API permits
<karolherbst>
mhhh
<karolherbst>
though I think I have to rethink compilation in a broader sense, because I also have parts which are driver agnostic... or I just use all the driver threads and load balance between them or something...
<zmike>
what if the shaders compiled themselves?
<karolherbst>
heh
<karolherbst>
the expensive part is the OpenCL C to SPIR-V compilation... and that's driver independent anyway
<karolherbst>
but I already have all the screens, and the threads are already created, so might as well just use them for random stuff 🙃
<karolherbst>
though so far I only want to use them for compiling things
<karolherbst>
but even "set_max_shader_compiler_threads" isn't supported by all drivers...
<zmike>
nothing is supported by all drivers
<karolherbst>
but looks like if driver_thread_add_job isn't provided it's all synchronous
<alyssa>
zmike: shame people keep writing gl drivers instead of using zink smh
* alyssa
runs
<zmike>
alyssa: GET BACK HERE!!!!
<zmike>
damn kids
<karolherbst>
anyway.. I don't mind creating my own threads..
<alyssa>
what can i say, gallium is a nicer api than vk
<alyssa>
(and vk is nicer than gl, obviously)
iive has joined #dri-devel
<zmike>
karolherbst: pls don't NIH the wheel
<karolherbst>
yeah.. I should just use util_queue
<zmike>
I meant just use existing api
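(Wrapping util_queue as the fallback would be on the order of the sketch below — signatures as recalled from util/u_queue.h, and the thread/job counts are made up:)

```c
#include <stddef.h>
#include "util/u_queue.h"

struct cl_build_job {
   struct util_queue_fence fence;
   void *source;   /* OpenCL C in */
   void *binary;   /* compiled result out */
};

static struct util_queue build_queue;

static void build_exec(void *job, void *gdata, int thread_index)
{
   struct cl_build_job *j = job;
   /* ... OpenCL C -> SPIR-V -> NIR -> driver finalization ... */
   (void)j; (void)gdata; (void)thread_index;
}

static void build_queue_init(void)
{
   /* name, max queued jobs, threads, flags, global data */
   util_queue_init(&build_queue, "clbuild", 32, 4,
                   UTIL_QUEUE_INIT_RESIZE_IF_FULL, NULL);
}

static void build_async(struct cl_build_job *j)
{
   util_queue_fence_init(&j->fence);
   util_queue_add_job(&build_queue, j, &j->fence, build_exec, NULL,
                      sizeof(*j));
}

/* clBuildProgram-style blocking callers just do
 * util_queue_fence_wait(&j->fence) when they need the binary. */
```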
epoch101 has quit []
<karolherbst>
but then it's slow on other drivers :D
<zmike>
so tell those drivers to fix their shit
<karolherbst>
mhh
<zmike>
be a real frontend
<karolherbst>
guess adding a driver compilation thread isn't hard
<karolherbst>
even if the drivers don't use it themselves
<zmike>
if drivers want to be slow that's their problem, not yours imo
<karolherbst>
well.. iris already has a thread, but doesn't implement driver_thread_add_job
<jenatali>
zmike: This fence stuff is worse than I thought :( it's gonna take me a bit longer to untangle. Things like pipe_fence_handles not being refcounted correctly too
<jenatali>
Bleh
<zmike>
:/
<karolherbst>
so maybe I fix iris and move on
* zmike
whispers delete iris
<karolherbst>
it's 5 loc at most
jkrzyszt_ has quit []
lynxeye has quit [Quit: Leaving.]
Peuc_ has joined #dri-devel
jsa1 has quit [Ping timeout: 480 seconds]
Peuc has quit [Ping timeout: 480 seconds]
unerlige1 has left #dri-devel [#dri-devel]
unerlige has joined #dri-devel
<alyssa>
karolherbst: iris? it's like 25,000 lines!
<alyssa>
Kayden: be like "alyssa you're not helping"
<alyssa>
:P
epoch101 has joined #dri-devel
<zmike>
I don't think he can hear you over the sound of infinite meetings
sally has quit []
kts has quit [Quit: Konversation terminated!]
sally has joined #dri-devel
<Kayden>
in fact I can :P
<idr>
Lol
<mareko>
karolherbst: pipe_context::create_compute_state (and create_xs_state) just pushes that compilation onto an async thread, and the next bind or draw waits for it
<mareko>
in radeonsi
<mareko>
so it's already parallel in that driver
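(Outside any particular driver, the shape of that pattern is roughly the following — hypothetical names, not radeonsi's actual code:)

```c
#include <stdlib.h>
#include "util/u_queue.h"

struct async_cso {
   struct util_queue_fence ready;
   void *nir;      /* input IR */
   void *binary;   /* filled in by the compile thread */
};

static void cso_compile_exec(void *job, void *gdata, int thread_index);

/* create_compute_state(): kick off the compile, return immediately */
static void *create_state_async(struct util_queue *q, void *nir)
{
   struct async_cso *s = calloc(1, sizeof(*s));
   s->nir = nir;
   util_queue_fence_init(&s->ready);
   util_queue_add_job(q, s, &s->ready, cso_compile_exec, NULL, sizeof(*s));
   return s;
}

/* bind / first draw: only blocks if the compile hasn't finished yet */
static void bind_state_wait(struct async_cso *s)
{
   util_queue_fence_wait(&s->ready);
   /* ... program s->binary into hardware state ... */
}
```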
<karolherbst>
right... I'm not too concerned about creation of the CSOs itself, but there is a lot more I do: OpenCL C to LLVM, LLVM to SPIR-V, SPIR-V to nir + a bunch of passes
<karolherbst>
and driver finalization
kasper93 has quit []
<karolherbst>
and I'd like to parallelize the entire thing
<mareko>
any driver that enables TC must also be able to accept create_(shader)_state from any thread
<karolherbst>
right
<karolherbst>
so what I'm wondering is whether I should roll my own compilation queue handling or just use driver_thread_add_job to push jobs to driver threads
<mareko>
that should work
<karolherbst>
but I also wanted to look into reusing clang instances, because atm we recreate them over and over again... there is a bit for me to figure out, but if I can just use driver_thread_add_job to schedule such jobs, that would make it a lot easier for me
<mareko>
I don't recommend messing with set_max_shader_compiler_threads
<karolherbst>
it's only for that one GL extension, right?
kasper93 has joined #dri-devel
<mareko>
yes
<mareko>
the radeonsi default is that 3/4 of the CPU cores are dedicated to driver_thread_add_job, and there is a reason for it
<mareko>
as the core count gets lower, the ratio decreases
<karolherbst>
mareko: right.. so the issue with that is that CL is natively multi-device, so if you have, let's say, 4 AMD devices, you also have 4 screens, each allocating threads for 3/4 of the CPU cores
cascardo_ has quit [Ping timeout: 480 seconds]
<karolherbst>
which might be fine regardless
<mareko>
let the kernel deal with it
<karolherbst>
but it complicates things a little if you don't want to spend all CPU cores on compiling things
<karolherbst>
right
<mareko>
radeonsi also has another queue using idle priority for low priority shader compilation (using 1/3 CPU cores)
<mareko>
there is an amazing synergy between TC and the shader compile queues
<mareko>
as draws get enqueued in TC, shaders created between draws get scheduled on compiler threads; TC is deep enough to hold ~1000 draws, so if we get 1000 new shaders between all draws, we compile them all in parallel because they are scheduled to compiler threads immediately while draws are waiting in TC
<karolherbst>
mhhh I see
<mareko>
if we get lots of shaders in 1 frame, we basically compile them in parallel even if they are compiled sequentially by GL and between different draws
<karolherbst>
luckily I don't have any of those issues really. though I was considering using TC at some point, I'm not sure there's much benefit with compute-only workloads
<zmike>
mareko: removing the first point size write is valid though, since maintenance5 makes 1.0 the default
<zmike>
or
<zmike>
hm
<zmike>
no, I think that should be valid
<zmike>
radv would need to use the default value for that case
<jenatali>
Right, you'd need to treat emit as a barrier, wouldn't you?
<zmike>
if you're passing vkcts with this then I'd think it means it's missing coverage
jsa1 has joined #dri-devel
guludo has quit [Ping timeout: 480 seconds]
guludo has joined #dri-devel
hazard_hitman has quit [Remote host closed the connection]
hazard_hitman has joined #dri-devel
<Sachiel>
are you sure that's correct? Looking at ANV, we check if the shader writes pointSize and if so tell the HW the point size will come from the shader; otherwise it comes from HW state. But I don't think there's anything handling the "some paths will write pointSize and others won't" case
<Sachiel>
If the maintenance5 feature is enabled and a PointSize decorated variable is written to, all execution paths must write to a PointSize decorated variable
<Sachiel>
If the pipeline is being created with a Geometry Execution Model, uses the OutputPoints Execution Mode, and the shaderTessellationAndGeometryPointSize feature is enabled, a PointSize decorated variable must be written to for every vertex emitted if the maintenance5 feature is not enabled
<Sachiel>
makes it sound like for geometry shaders you can omit it sometimes? Can't tell if that's contradicting the previous VU or I'm just misunderstanding things
epoch101_ has joined #dri-devel
<mareko>
it's confusing
<mareko>
if you don't write it in all execution paths, it should be 1, right?
<zmike>
I think those are different cases
<zmike>
the first one is saying you can't do like if (x) pointsize = y; else {}
<zmike>
the second one is enforcing the "no default point size exists" idea
<zmike>
but it could be clearer
<zmike>
and I'd guess there's no CTS coverage either
<mareko>
either way, pointsize must be set for every emit in radv or not at all
<zmike>
sounds like a radv bug according to the current spec
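(Read that way, a pre-maintenance5 geometry shader has to repeat the write for every emit, along the lines of this sketch — assumes the shaderTessellationAndGeometryPointSize feature is enabled:)

```glsl
#version 450
layout(points) in;
layout(points, max_vertices = 2) out;

void main()
{
    // gl_PointSize must be (re)written before *each* EmitVertex();
    // skipping it on any path is what trips radv per the discussion.
    gl_Position  = gl_in[0].gl_Position;
    gl_PointSize = 1.0;
    EmitVertex();

    gl_Position  = gl_in[0].gl_Position + vec4(0.1, 0.0, 0.0, 0.0);
    gl_PointSize = 1.0;
    EmitVertex();

    EndPrimitive();
}
```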
epoch101 has quit [Ping timeout: 480 seconds]
digetx has quit [Quit: No Ping reply in 180 seconds.]
<jenatali>
zmike: I'm going to stage a branch with fixes/cleanups to video stuff, and get that landed ASAP. I think the thing that makes sense is to rebase your branch on that, where the fixups to split the fence value stuff out will be more obvious
<jenatali>
I think it makes sense to wait to rebase your branch until that lands, but let me know if you want me to do it sooner
<zmike>
alrighty
<zmike>
thanks for prioritizing
<jenatali>
This found some... real bad stuff
<zmike>
haha I bet
<jenatali>
Mainly just refcounting gone missing
<zmike>
imo just ship zink+dozen
<jenatali>
Yeah but dzn's missing a ton of stuff and I don't have the prioritization in my schedule to make it work :(
<zmike>
oof
<jenatali>
Especially around video, getting vk video to map nicely would be a lot
<zmike>
vk video is a lot
<jenatali>
Video is a lot
<zmike>
amen
edolnx has joined #dri-devel
cascardo_ has joined #dri-devel
cascardo has quit [Ping timeout: 480 seconds]
Duke`` has quit [Ping timeout: 480 seconds]
illwieckz__ has quit []
woskova has quit [Ping timeout: 480 seconds]
woskova has joined #dri-devel
Nasina has quit [Read error: Connection reset by peer]
Company has quit [Quit: Leaving]
Nasina has joined #dri-devel
oneforall2 has quit [Remote host closed the connection]
oneforall2 has joined #dri-devel
guludo has quit [Ping timeout: 480 seconds]
oneforall2 has quit [Remote host closed the connection]
oneforall2 has joined #dri-devel
Nasina has quit [Ping timeout: 480 seconds]
apinheiro has quit [Quit: Leaving]
oneforall2 has quit [Remote host closed the connection]
oneforall2 has joined #dri-devel
oneforall2 has quit [Remote host closed the connection]
oneforall2 has joined #dri-devel
illwieckz has joined #dri-devel
Nasina has joined #dri-devel
pcercuei has quit [Quit: dodo]
haaninjo has quit [Quit: Ex-Chat]
Nasina has quit [Read error: Connection reset by peer]