<karolherbst>
FULL_PROFILE needs 128 sampler views... but 128 for image views is out of spec...
<karolherbst>
but I was considering using bindless to support full profile on more devices
<zmike>
I think only radeonsi and zink support bindless
<karolherbst>
and nouveau
<zmike>
whoveau?
<karolherbst>
:P
<alyssa>
asahi deliberately doesn't support bindless since there are perf tradeoffs
<robclark>
I don't really support bindless.. I use bindless internally within driver, it would be kinda awkward to support both that and gallium bindless
<alyssa>
i figure "bindless for vk, no bindless for gl" is good enough for everyone who isn't named mike
<robclark>
(but I could support 128.. it would just be a bit annoying)
<alyssa>
and anyone named mike can use zink
<karolherbst>
robclark: I'm more confused why it's using 128 image_views ...
* robclark too
<karolherbst>
I could just reject such a kernel...
<karolherbst>
aand maybe should
<robclark>
well, it is because it adds that many kernel args
<karolherbst>
yeah sure, but that's illegal
<robclark>
yeah, you aren't checking the limits and then doing unsafe { set_shader_images() } which goes boom
<karolherbst>
yeah...
<karolherbst>
I should check if the kernel created actually fits within CL_DEVICE_MAX_WRITE_IMAGE_ARGS
<robclark>
right
<karolherbst>
looks like that's CL_OUT_OF_RESOURCES for clEnqueueNDRangeKernel
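The limit check discussed above could look roughly like the following. This is a minimal sketch, not rusticl's actual code: `ImageAccess`, `DeviceLimits`, and `check_image_args` are hypothetical names, and the limits stand in for whatever the device reports for CL_DEVICE_MAX_READ_IMAGE_ARGS and CL_DEVICE_MAX_WRITE_IMAGE_ARGS. Only the numeric value of CL_OUT_OF_RESOURCES (-5) comes from the OpenCL headers.

```rust
// Hypothetical sketch of validating image argument counts before launch.
// None of these types exist in rusticl under these names.

/// Value of CL_OUT_OF_RESOURCES in the OpenCL headers.
pub const CL_OUT_OF_RESOURCES: i32 = -5;

/// Access qualifier of an image kernel argument (read-write images omitted).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum ImageAccess {
    ReadOnly,
    WriteOnly,
}

/// Assumed stand-in for the per-device image argument limits.
pub struct DeviceLimits {
    pub max_read_image_args: usize,
    pub max_write_image_args: usize,
}

/// Returns Err(CL_OUT_OF_RESOURCES) if the kernel declares more read-only or
/// write-only image arguments than the device allows, so the caller can bail
/// out before any images are bound.
pub fn check_image_args(args: &[ImageAccess], limits: &DeviceLimits) -> Result<(), i32> {
    let reads = args.iter().filter(|a| **a == ImageAccess::ReadOnly).count();
    let writes = args.iter().filter(|a| **a == ImageAccess::WriteOnly).count();

    if reads > limits.max_read_image_args || writes > limits.max_write_image_args {
        return Err(CL_OUT_OF_RESOURCES);
    }
    Ok(())
}
```

Running such a check on the enqueue path would match the error code mentioned above and keep an oversized kernel from ever reaching set_shader_images().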
<karolherbst>
kinda weird place for it...
<karolherbst>
should probably just fail to compile
<karolherbst>
but not sure if that's actually legal
<karolherbst>
alternatively I could do indirect image operations...
<karolherbst>
but not sure what's the perf trade off with those
<robclark>
it defn makes it as far as set_shader_images() before anything gets rejected
<karolherbst>
yeah, but my point is, it's an application bug anyway
<karolherbst>
though I could try to support kernels where applications use the same image_view
<karolherbst>
in multiple kernel args
<karolherbst>
but it's still a problem that the application uses too many image args and it seems the spec only allows an error for that when the kernel gets launched, so that's a bit annoying
<karolherbst>
sure, but not much we can do about that
<karolherbst>
though I guess it just requires 128 and won't run and probably will never fix it
<karolherbst>
but also...
<karolherbst>
this code...
<karolherbst>
🙃
<robclark>
yeah, sorry, I probably owe you a drink to help forget that shader :-P
<robclark>
it's basically a giant demux
<karolherbst>
so what can we really do about that one...
<karolherbst>
could support 128 write images :)
<karolherbst>
there are a couple of options here.. as I said: bindless would be one, but could also turn it into indirect accesses and load the index as a kernel parameter
<robclark>
yeah, although it makes state changes more expensive.. maybe I'll do some different path for compute contexts.. idk yet
<karolherbst>
if it's like 128 times the same image, then I'd only need to bind once and reuse the same index
<karolherbst>
or well.. it loads the same index 128 times
<robclark>
indirect would be better, I think
<karolherbst>
but.. it's an additional context pull, but not sure how much it matters given that like.. kernels touch VRAM anyway
<karolherbst>
could only do indirects either if drivers don't care or if it goes out of limits
<karolherbst>
indirects shouldn't be a major problem. I never got rid of the space I allocate for them in the kernel input buffer, so we already have a place to store the index..
<karolherbst>
more concerned about the deduplication
<karolherbst>
I'll think about it and maybe I come up with a good solution
<karolherbst>
robclark: are there any costs with binding the same image 32 times?
<karolherbst>
like.. is it more or less expensive than having 32 indirects
<robclark>
I've not measured it but I wouldn't expect so
<karolherbst>
okay
<karolherbst>
well then I only need to focus on the case where the kernel has more args than "max_shader_images"
<robclark>
anyways, I'm still wrapping my head around what tensorflow is doing.. I mean it seems like this case it should de-duplicate inputs to get a simpler kernel
<karolherbst>
and then I turn to indirects and fail the invocation if it actually binds more images than max_shader_images
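The fallback described here (deduplicate bound images, hand each kernel argument an index, and fail only if the unique images exceed the limit) might be sketched as follows. Everything in it is an assumption made for illustration: `ImageId`, `Bindings`, and `assign_slots` are hypothetical names, and the real driver interface would look different.

```rust
use std::collections::HashMap;

/// Hypothetical handle standing in for a bound image view.
pub type ImageId = u64;

/// Result of mapping kernel image arguments onto hardware image slots.
pub struct Bindings {
    /// Unique images to bind, in slot order.
    pub slots: Vec<ImageId>,
    /// For each kernel image argument, the slot index the kernel would load
    /// from its input buffer when accessing the image indirectly.
    pub arg_indices: Vec<u32>,
}

/// Deduplicates the images passed as kernel arguments. Returns None when the
/// number of distinct images exceeds `max_shader_images`, which is the case
/// where the invocation would have to fail.
pub fn assign_slots(args: &[ImageId], max_shader_images: usize) -> Option<Bindings> {
    let mut slot_of: HashMap<ImageId, u32> = HashMap::new();
    let mut slots = Vec::new();
    let mut arg_indices = Vec::with_capacity(args.len());

    for &img in args {
        let idx = match slot_of.get(&img) {
            // Image already bound: reuse its slot.
            Some(&i) => i,
            None => {
                if slots.len() == max_shader_images {
                    return None; // more distinct images than slots
                }
                let i = slots.len() as u32;
                slots.push(img);
                slot_of.insert(img, i);
                i
            }
        };
        arg_indices.push(idx);
    }
    Some(Bindings { slots, arg_indices })
}
```

With 128 arguments that all refer to the same image, this binds a single slot and stores index 0 for every argument; with 128 distinct images and a limit of 64, it reports failure.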
<karolherbst>
yeah.. but maybe sometimes it's different images?
<karolherbst>
who knows
<karolherbst>
maybe it's just bad code
<karolherbst>
maybe making it work with this one just means it will fail later.. who knows
<robclark>
yeah, that is what I'm trying to figure out ;-)
<alyssa>
robclark: "I paid for 128 images, and I'm gonna *use* 128 images!"
<robclark>
you get what you paid for :-P