swung0x4_ has quit [Remote host closed the connection]
Swung0x48 has joined #dri-devel
fab has quit [Quit: fab]
fab has joined #dri-devel
vliaskovitis has quit [Remote host closed the connection]
Swung0x48 has quit [Remote host closed the connection]
Swung0x48 has joined #dri-devel
vliaskovitis has joined #dri-devel
<linkmauve>
I have exactly that in the works, I can already decode the first frame of some H.264 streams using either ffmpeg or gstreamer’s Vulkan video decoders, tested on the Rockchip rk3588 and on the AllWinner A64 atm.
<linkmauve>
> 22:39:29 K900> I wonder if it makes sense to specify video-only Vulkan devices and then have something bridge that to V4L2
urja has quit [Read error: Connection reset by peer]
<K900>
Very cool
<linkmauve>
> 22:46:10 K900> Is there literally any relevant hardware with stateful V4L2 codecs though
<linkmauve>
Besides Qualcomm, Apple also relies on a coprocessor with its own stateful firmware AFAIK.
urja has joined #dri-devel
<K900>
3588 is actually the hardware I'm personally most interested in :)
<K900>
(given that's what's running on my NAS)
<linkmauve>
K900, it isn’t public yet, once I have a working prototype I intend to find funding to continue the maintenance, but I have almost no experience finding companies interested in that kind of work.
olivial has quit [Ping timeout: 480 seconds]
<linkmauve>
I guess Rockchip could be interested in that, maybe some of the board vendors as well.
<K900>
Yeah unfortunately not really something I can help with, but maybe Collabora folk would be interested?
<K900>
Or Radxa
<linkmauve>
Yeah them too, they surely have customers who could be interested in Vulkan video.
kts has joined #dri-devel
<K900>
Collabora is working with Radxa on 3588 bringup
<K900>
Well, is contracted by Radxa is probably more accurate
<linkmauve>
But first things first, decoding more than just a keyframe and dealing properly with DPB slots so that I can decode a full video. :)
<K900>
We have a Radxa guy in NixOS spaces, I can get you in touch if you're ever interested
<linkmauve>
I indeed am!
<HdkR>
I'm a customer interesting in Qualcomm Vulkan video :P
<HdkR>
interested*
<linkmauve>
HdkR, Vulkan video is stateless, you would have to remap it to a stateful hardware.
<K900>
Uh oh
<K900>
I just realized something
<K900>
They're on matrix.org which is very dead right now
<HdkR>
linkmauve: Definitely, but it doesn't stop me from being interested in it
<linkmauve>
Thankfully I’m on XMPP. :)
<linkmauve>
HdkR, haha. :D
<linkmauve>
HdkR, I’ve heard Wine is starting to use Vulkan video for decoding, this will be useful in FEX-emu hopefully. ^^
<HdkR>
It would be nice yea. V4L2 and x86 applications don't play
rasterman has joined #dri-devel
kts has quit [Quit: Konversation terminated!]
jkrzyszt has joined #dri-devel
Swung0x48 has quit [Remote host closed the connection]
adavy has quit [Ping timeout: 480 seconds]
Swung0x48 has joined #dri-devel
Swung0x48 has quit [Remote host closed the connection]
swung0x4_ has joined #dri-devel
adavy has joined #dri-devel
<tzimmermann>
vsyrjala, hi. about the deadlock in vblank timer. i've been exploring various ideas to fix the deadlock. the easiest solution schedules a worker that runs drm_crtc_vblank_handle() outside the hrtimer code. is it allowed to run drm_vblank_handle() outside of the vblank IRQ?
alarumbe has joined #dri-devel
sima has joined #dri-devel
enunes has quit [Ping timeout: 480 seconds]
<karolherbst>
Vulkan question.. if I dispatch a compute shader with VkPipelineShaderStageRequiredSubgroupSizeCreateInfo::requiredSubgroupSize == 8, does the SubgroupSize SPIR-V builtin also have to return 8? Because I see anv returning 32 and I'm wondering if it's my fault or not
apinheiro has joined #dri-devel
swung0x4_ has quit [Remote host closed the connection]
Swung0x48 has joined #dri-devel
Swung0x48 has quit [Remote host closed the connection]
swung0x4_ has joined #dri-devel
kts has joined #dri-devel
coldfeet has quit [Quit: Lost terminal]
<glehmann>
driver bug
<glehmann>
> If the pipeline was created with a chained VkPipelineShaderStageRequiredSubgroupSizeCreateInfo structure, or the shader object was created with a chained VkShaderRequiredSubgroupSizeCreateInfoEXT structure, the SubgroupSize decorated variable will match requiredSubgroupSize.
<glehmann>
as long as 8 is a supported subgroup size, and the stage is in requiredSubgroupSizeStages ofc
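The two conditions glehmann mentions can be sketched as a small check. This is only an illustrative model in Python, not driver or Mesa code: the function name and the example device limits are hypothetical, while `minSubgroupSize`, `maxSubgroupSize` and `requiredSubgroupSizeStages` are the fields of `VkPhysicalDeviceSubgroupSizeControlProperties` that the parameters stand in for.

```python
# Value of VK_SHADER_STAGE_COMPUTE_BIT in the Vulkan headers.
VK_SHADER_STAGE_COMPUTE_BIT = 0x00000020


def required_subgroup_size_is_usable(size, stage_bit, min_size, max_size,
                                     required_stages):
    """Hypothetical helper: may `size` be requested for this shader stage?

    min_size, max_size and required_stages stand in for minSubgroupSize,
    maxSubgroupSize and requiredSubgroupSizeStages as reported by the device.
    """
    # The requested size has to be a power of two.
    is_power_of_two = size > 0 and (size & (size - 1)) == 0
    # It has to lie within the range the device supports.
    in_range = min_size <= size <= max_size
    # The stage has to be one where a required size can be set at all.
    stage_supported = (required_stages & stage_bit) != 0
    return is_power_of_two and in_range and stage_supported
```

With made-up limits of 8 to 32 and compute listed in the supported stages, asking for 8 passes the check. In that case the quoted spec text applies and the SubgroupSize builtin should read 8 as well.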
lynxeye has joined #dri-devel
<dj-death>
karolherbst: that should be tested by CTS and anv would fail that test...
<dj-death>
we should get that when Anv calls vk_pipeline_shader_stage_to_nir()
swung0x4_ has quit [Remote host closed the connection]
Swung0x48 has joined #dri-devel
olivial has joined #dri-devel
davispuh has joined #dri-devel
Caterpillar has quit [Quit: Konversation terminated!]
hansg has joined #dri-devel
<karolherbst>
okay.. yeah, I'll debug later, just wanted to know if my assumptions are correct or not
<karolherbst>
thanks for the info!
kts has quit [Quit: Konversation terminated!]
<dj-death>
np
Swung0x48 has quit []
Swung0x48 has joined #dri-devel
Swung0x48 has quit []
<karolherbst>
uhh.. it was zink calling "nir_lower_subgroups" with a subgroup_size set :'(
kts has joined #dri-devel
enunes has joined #dri-devel
<karolherbst>
dj-death: what are the perf characteristics in regards to subgroup sizes on anv btw. Is it always beneficial to use 8 over 16 over 32 if it fits, or are there situations where SIMD16 might be faster than SIMD8?
<dj-death>
it's complicated :)
<karolherbst>
I know that iris defaults to 16, just wondering
<dj-death>
as you go up in size, the register space gets halved
<dj-death>
so there is that...
<dj-death>
more spilling because you run out of space
<karolherbst>
right... sadly with vulkan I can't let the driver decide, and for CL subgroup support I have to know the subgroup size the shader runs at, so I always have to pick a size
<dj-death>
it's more difficult to allocate because messages to load/store data require bigger chunks of contiguous registers
<dj-death>
SIMD16 is usually faster than SIMD8
<dj-death>
but for SIMD32 it's not always the case
<karolherbst>
mhhh
<dj-death>
assuming no spilling
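A rough way to picture the "register space gets halved" point is to count how many 32-bit values each lane can keep live. The sketch below assumes a register file of 128 registers of 32 bytes per thread; both constants are assumptions made for illustration (they are not stated in the discussion and differ between hardware generations), and the model ignores registers reserved for payloads or messages.

```python
GRF_COUNT = 128  # assumed number of registers available to one thread
GRF_BYTES = 32   # assumed size of one register in bytes


def values_per_lane(simd_width, bytes_per_value=4):
    """Approximate count of live values per lane before spilling starts."""
    # One value across all lanes occupies simd_width * bytes_per_value bytes,
    # rounded down to whole registers, with a minimum of one register.
    regs_per_value = max(1, simd_width * bytes_per_value // GRF_BYTES)
    return GRF_COUNT // regs_per_value


for width in (8, 16, 32):
    print(f"SIMD{width}: about {values_per_lane(width)} values per lane")
```

Under these assumptions the budget drops from 128 to 64 to 32 values per lane as the width doubles, which is why wider variants spill sooner.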
<karolherbst>
I wish Vulkan could give me a preferred subgroup size or something
<dj-death>
even we can't until we compile the shader
<karolherbst>
so the "best" generic plan would be to create them all and compare metrics the vulkan runtime gives me on those pipelines?
<karolherbst>
does vulkan even give enough information there
<dj-death>
yeah
<dj-death>
it's what we use for shader-db
<dj-death>
check the spilling first
<karolherbst>
is it driver agnostic?
<dj-death>
then cycle count
<dj-death>
karolherbst: of course not :)
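The "check the spilling first, then cycle count" ordering can be sketched as a lexicographic comparison over per-variant statistics. Everything here is hypothetical: the `VariantStats` type, the field names and the numbers are invented for illustration. In practice the values would presumably come from driver-specific pipeline statistics (for example via VK_KHR_pipeline_executable_properties), whose names and meaning vary per driver, as dj-death notes.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    subgroup_size: int  # requiredSubgroupSize the pipeline was built with
    spills: int         # driver-reported spill count (name is an assumption)
    cycles: int         # driver-reported cycle estimate (name is an assumption)


def pick_variant(variants):
    """Prefer the variant with the fewest spills, then the lowest cycle count."""
    return min(variants, key=lambda v: (v.spills, v.cycles))


# Made-up statistics for three compiled variants of one compute shader.
candidates = [
    VariantStats(subgroup_size=8, spills=0, cycles=1200),
    VariantStats(subgroup_size=16, spills=0, cycles=900),
    VariantStats(subgroup_size=32, spills=12, cycles=700),
]
best = pick_variant(candidates)
print(best.subgroup_size)
```

With these invented numbers the SIMD32 variant loses because it spills, and SIMD16 wins over SIMD8 on cycle count. The cost is the one raised in the discussion: every candidate size has to be compiled before the comparison can happen.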
<karolherbst>
:')
<karolherbst>
not sure I like the plan to make it per driver in zink
<dj-death>
maybe you can make an extension to request all shader variants
<karolherbst>
I have a benchmark here where SIMD8 seems to be faster than SIMD16
<dj-death>
and have a dispatch parameter to tell which one to use
<karolherbst>
mhhh
<dj-death>
yeah it's possible
<dj-death>
but compiling all variants will suck compile-time wise
<karolherbst>
maybe I go with lowest first for compute
<karolherbst>
they have a cl_intel_required_subgroup_size extension tho
<dj-death>
does that do anything different from the vulkan extension?
kts has quit [Quit: Konversation terminated!]
<karolherbst>
not really
<karolherbst>
lets you declare a subgroup size in the kernel
<karolherbst>
core CL is "give me the subgroup size for this workgroup size"
<karolherbst>
so you can query how the runtime would behave
<karolherbst>
but it doesn't let you pick
<karolherbst>
sadly the CTS is using those queries and tests that what the runtime returns matches what the shader returns
<karolherbst>
and the only way to model that in vulkan is to set the subgroup size and let zink pick
<karolherbst>
but picking the optimal size is hard :)
shalem has joined #dri-devel
hansg has quit [Read error: Connection reset by peer]
<karolherbst>
ohh looks like on xe intel's CL runtime stopped always using 8
<dj-death>
yeah, always 8 sounds surprising to me
shalem has quit [Remote host closed the connection]
<dj-death>
SIMD16 performs better in most cases
shalem has joined #dri-devel
<karolherbst>
even for compute?
<dj-death>
everywhere
<dj-death>
pre Xe2, you can only do SIMD16 on FS/CS
<dj-death>
everything else is SIMD8
<karolherbst>
kernels in CL tend to be a lot more branchy, so I can see why a smaller subgroup size might have fewer perf issues in regards to divergence
hansg_ has joined #dri-devel
<karolherbst>
I see
shalem has quit [Read error: Connection reset by peer]
<karolherbst>
yeah okay in another benchmark SIMD16 is a bit faster
<karolherbst>
_anyway_ it only matters on hardware with multiple subgroup sizes, so maybe I can add vendor specific selection algos...
<karolherbst>
so only intel and AMD so far
<karolherbst>
correctness first anyway
coldfeet has joined #dri-devel
hansg_ has quit []
hansg has joined #dri-devel
Company has joined #dri-devel
sguddati1 has quit [Ping timeout: 480 seconds]
azerov has quit []
sguddati has joined #dri-devel
azerov has joined #dri-devel
guludo has joined #dri-devel
marcf has quit [Ping timeout: 480 seconds]
marcf has joined #dri-devel
nerdopolis has joined #dri-devel
feaneron has joined #dri-devel
fab has quit [Ping timeout: 480 seconds]
SquareWinter68_ has quit [Ping timeout: 480 seconds]
<glehmann>
and I now figured out why, I mistakenly enable it as soon as any lower_doubles_options is set
<glehmann>
but it still shouldn't crash with the unwanted lowering...
<glehmann>
I guess I need to figure out how to run this locally
epoch101 has quit []
<glehmann>
"let's clean up some nir subgroup stuff", I should have known it wouldn't be so easy
sally has quit []
sally has joined #dri-devel
epoch101 has joined #dri-devel
sally has quit []
sally has joined #dri-devel
mndrx has joined #dri-devel
mndrx has quit []
mndrx has joined #dri-devel
JRepin has joined #dri-devel
epoch101 has quit []
epoch101 has joined #dri-devel
sally has quit []
sally has joined #dri-devel
ccallawa has quit [Quit: WeeChat 4.1.1]
sally has quit []
sally has joined #dri-devel
<mndrx>
Hi, I'm new here and had a question about the meaning of the output when running with LIBGL_DEBUG=verbose. Should I ask here or somewhere else?
<mndrx>
when running `LIBGL_DEBUG=verbose glxgears` it first says: "using driver i915 for 4" (running on Intel HD Graphics 4600 btw) and then: "pci id for fd 4: 8086:0416, driver crocus"
djbw has joined #dri-devel
kts has quit [Quit: Konversation terminated!]
<mndrx>
does the 1st refer to the kernel driver and the second the Mesa one or both, cause that kinda confuses me.
sally has quit []
sally has joined #dri-devel
<mndrx>
I think it is using the crocus driver, because it is a Haswell iGPU, but then it also says i915, which makes me think it is using the i915g driver, so I just wanted to know how to interpret the output.
tobiasjakobi has joined #dri-devel
tobiasjakobi has quit []
epoch101 has quit []
<pendingchaos>
"i915" is likely printed by loader_get_kernel_driver_name(), which gets that name from libdrm
<pendingchaos>
which would make it the kernel driver
epoch101 has joined #dri-devel
<K900>
i915 is the kernel driver
<mndrx>
Aha, so my initial thought was right, anyway, many thanks for answering.
<bluetail>
Is testing W7500 amdgpu ttm power management (kernel) issues via QEMU passthrough realistic, or does it miss too much hardware state? Each time rebooting the full machine to try yet another kernel param for my w7500 gpu is tedious.
kts has joined #dri-devel
mndrx has quit [Ping timeout: 480 seconds]
helmhotz has quit [Ping timeout: 480 seconds]
kts has quit [Quit: Konversation terminated!]
airlied_ is now known as airlied
kts has joined #dri-devel
kts has quit [Quit: Konversation terminated!]
sgerhold has quit [Quit: Ping timeout (120 seconds)]