ChanServ changed the topic of #etnaviv to: #etnaviv - the home of the reverse-engineered Vivante GPU driver - Logs https://oftc.irclog.whitequark.org/etnaviv
<marex> austriancoder: is the kernel driver at least partly able to decode the command stream and print it in non-hex form ?
<marex> hmmm, I tried to create a minimal command stream, but it is still like 50 opcodes
lynxeye has quit [Quit: Leaving.]
erle has joined #etnaviv
mvlad has quit [Remote host closed the connection]
<marex> austriancoder: is it possible the flop reset patches from Gert are not implemented correctly ?
<austriancoder> marex: the kernel driver has no cmd stream decoding - yeah the kernel adds quite some commands.
<austriancoder> marex: did you see a stuck dma address in debugfs?
<marex> austriancoder: sec
<marex> I even see it in register readout
<austriancoder> marex: I am in contact with someone who runs kmscube under FreeBSD/arm64 on a stm32mp2 with mesa. this person needed the PPU flop reset stuff - so it should be okay.
<marex> austriancoder: look at https://paste.debian.net/plainh/b60bdfa9
<marex> this part close to the end
<marex> [ 364.996556] cmd 000001e0: 48000000 00001001 0801502e 00000000
<marex> [ 365.002336] cmd 000001f0: 08010e01 00000040 380000c8 00000000
<marex> [ 365.008010] cmd 00000200: 40000002 000021f8 00000000 00000000
<marex> [ 365.013785] cmd 00000210: 00000000 00000000 00000000 00000000
<marex> ...
<marex> [ 369.128569] gpu_read[186] READ reg=0664 data=000021f8
<marex> [ 369.133692] gpu_read[186] READ reg=0668 data=40000002
<marex> [ 369.138778] gpu_read[186] READ reg=066c data=000021f8
<marex> those are VIVS_FE_DMA_ADDRESS / VIVS_FE_DMA_LOW / VIVS_FE_DMA_HIGH respectively
<marex> so I would say, the GPU did suck in the whole command stream ... but why did it time out, that is beyond me
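The tail of the dump above can be read by hand from the top five bits of each command word. A minimal sketch, assuming the FE opcode numbering used by the etnaviv command-stream headers (the table and helper name are illustrative, not part of any tool mentioned here):

```python
# FE opcode = top 5 bits of the command word. Values as used by the etnaviv
# command-stream headers; treat the table as an assumption for other cores.
FE_OPCODES = {1: "LOAD_STATE", 2: "END", 3: "NOP", 7: "WAIT", 8: "LINK", 9: "STALL"}

def fe_opcode(word):
    """Return the opcode name encoded in bits 31:27 of a command word."""
    op = word >> 27
    return FE_OPCODES.get(op, "UNKNOWN(%d)" % op)

# (buffer offset, command word, operand word) taken from the dump lines above
tail = [
    (0x1f0, 0x08010e01, 0x00000040),
    (0x1f8, 0x380000c8, 0x00000000),
    (0x200, 0x40000002, 0x000021f8),
]
decoded = [(off, fe_opcode(cmd), arg) for off, cmd, arg in tail]
for off, name, arg in decoded:
    print("0x%03x %-10s operand=0x%08x" % (off, name, arg))

# The LINK operand equals the VIVS_FE_DMA_ADDRESS readout (reg 0x0664).
link_target = decoded[-1][2]
fe_dma_address = 0x000021f8
print("FE parked at LINK target:", link_target == fe_dma_address)
```

If the command buffer is mapped at 0x2000 (an inference from offset 0x1f8 versus address 0x21f8, not something stated in the log), the LINK points back at the WAIT, which would mean the FE is idling in its wait-link loop after fetching everything.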
<marex> I also think the register settings are now comparable between blob and etnaviv , so the problem must be in the command stream somewhere
<austriancoder> no interrupt -> job timeout
<marex> that is what I am struggling with for the last two or three weeks, yes, I figured that much
<marex> blob driver does generate interrupt
<marex> I made blob driver print register IO and buffers it exchanges with kernel
<austriancoder> marex: do you have a link to the used kernel fork?
<marex> austriancoder: is there any tool which I can use to decode the command stream into readable form ? I have the command streams in hex so far
<marex> austriancoder: which kernel fork , the one I use is local stuff with many patches , because the stm32mp2 upstream support is catastrophically bad
<marex> austriancoder: freebsd person is also running etnaviv, on freebsd ?
<austriancoder> there is a tool: ./dump_separate_cmdbuf.py -b
<austriancoder> marex: yes.. runs etnaviv in the userspace
<marex> austriancoder: and the flop reset is in mesa ?
<marex> austriancoder: or in kernel ?
<marex> austriancoder: because Gert had some weird flop reset branch in mesa too
<austriancoder> marex: in the freebsd kernel
<austriancoder> I am asking for a link to his work
<marex> austriancoder: thank you
<erle> austriancoder do you have a working stencil shadow demo for reform 2 vivante gpu maybe?
<erle> austriancoder can i just try mesa demos or is there some catch?
<marex> austriancoder: all the tools are python2 dependent, aren't they ?
<austriancoder> erle: good question .. I am "just" running conformance test suites and piglit.
<austriancoder> marex: they are .. but podman helps here ^
<erle> austriancoder well the big issue i have is somehow irrlicht stencil shadows don't seem to work but idk why
<austriancoder> erle: from a quick look there might be some problems - like dEQP-GLES3.functional.texture.shadow.2d.linear.not_equal_depth24_stencil8 is failing
<austriancoder> erle: can I easily reproduce this issue?
<erle> austriancoder yes, compile https://codeberg.org/libregaming/minetest and run it with stencil shadow enabled
<marex> austriancoder: is this supposed to happen ?
<marex> oe@test:~/etna_viv/tools$ head -n 2 ~/cmd.txt
<marex> [ 364.341277] cmd 00000000: 08010e13 00000002 08010e02 00000701
<marex> [ 364.347050] cmd 00000010: 48000000 00000701 0804d800 00001000
<marex> oe@test:~/etna_viv/tools$ python2.7 ./dump_separate_cmdbuf.py ~/cmd.txt
<marex> {
<marex> }
<austriancoder> marex: na.. hmm.. you might need to add support for your format or convert the text representation to binary
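The "convert the text representation to binary" route could look roughly like the sketch below. It assumes the dmesg line layout shown in cmd.txt and little-endian 32-bit words; whether dump_separate_cmdbuf.py accepts such a raw blob as-is is not confirmed here, and the function names are made up for illustration.

```python
import re
import struct

# Matches lines such as
# "[ 364.341277] cmd 00000000: 08010e13 00000002 08010e02 00000701"
CMD_LINE = re.compile(r"cmd\s+[0-9a-fA-F]{8}:((?:\s+[0-9a-fA-F]{8})+)")

def dmesg_to_words(text):
    """Collect the 32-bit command words from 'cmd <offset>: ...' lines."""
    words = []
    for line in text.splitlines():
        m = CMD_LINE.search(line)
        if m:
            words.extend(int(w, 16) for w in m.group(1).split())
    return words

def words_to_bytes(words):
    """Pack words as little-endian u32 (assumption: matches the ARM host)."""
    return struct.pack("<%dI" % len(words), *words)

sample = """[ 364.341277] cmd 00000000: 08010e13 00000002 08010e02 00000701
[ 364.347050] cmd 00000010: 48000000 00000701 0804d800 00001000"""

words = dmesg_to_words(sample)
blob = words_to_bytes(words)
print(len(words), "words,", len(blob), "bytes")
```

Lines without a `cmd <offset>:` prefix (timestamps only, register readouts) are skipped, so the whole dmesg capture can be fed in unfiltered.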
<marex> oe@test:~/etna_viv/tools$ python2.7 ./dump_separate_cmdbuf.py -g ~/cmd.txt
<marex> [0x000000000] 0x08010e13, /* LOAD_STATE (1) Base: 0x0384C Size: 1 Fixp: 0 */
<marex> {
<marex> [0x000000004] 0x00000002, /* GL.API_MODE := OPENCL */
<marex> [0x000000008] 0x08010e02, /* LOAD_STATE (1) Base: 0x03808 Size: 1 Fixp: 0 */
<marex> -g switch seems to help
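The "Base / Size / Fixp" fields in that output come straight out of the LOAD_STATE header word. A small sketch of the bit layout, checked only against the two headers printed above (the function name is hypothetical):

```python
def decode_load_state(word):
    """Split a LOAD_STATE header into (base address, count, fixp)."""
    if word >> 27 != 1:
        raise ValueError("not a LOAD_STATE header: 0x%08x" % word)
    fixp = (word >> 26) & 0x1       # fixed-point conversion flag
    count = (word >> 16) & 0x3ff    # number of state words that follow
    base = (word & 0xffff) << 2     # state offset is stored in 32-bit words
    return base, count, fixp

for header in (0x08010e13, 0x08010e02):
    base, count, fixp = decode_load_state(header)
    print("LOAD_STATE Base: 0x%05X Size: %d Fixp: %d" % (base, count, fixp))
```

This reproduces the 0x0384C and 0x03808 bases from the tool output, which is a quick way to sanity-check hex dumps when the tool is not at hand.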
<austriancoder> great
<marex> austriancoder: but now what ... now I compare the command buffers with blob driver and look for differences ?
<marex> because I think that is the only thing left
<marex> register settings seem to be aligned
<marex> austriancoder: the blob driver produces this kind of output ...
<marex> [ 136.851864] #[function: set mmu]
<marex> [ 136.855029] @[memory 0xfa813000 0x00000048
<marex> [ 136.859198] 0x0801006B 0xFFFE0000 0x08010E12 0x01490000
<marex> [ 136.864601] 0x08010E1E 0x010A0000 0x08010E1E 0x010A0001
<marex> [ 136.870090] 0x08010E1E 0x010A0002 0x08010E1E 0x010A0003
<marex> [ 136.875485] 0x08010E02 0x00000701 0x48000000 0x00000701
<marex> [ 136.880869] 0x10000000 0x00000001
<marex> [ 136.880883]
<marex> [ 136.885861] ] -- memory
<marex> have you seen that before ?
<marex> the blob KMD anyway
<austriancoder> marex: I often look into the galcore kernel drivers from the nxp kernel fork - from time to time there are nice findings. but i never compile such a kernel and run it on any device. for RE devices, I never touch it.
<marex> austriancoder: this comes from galcore
<marex> the kernel driver that is
<marex> its some debug print
<austriancoder> nice
<marex> austriancoder: hmmm ... but wait
<marex> I am decoding the command stream, and I see
<marex> [0x0000001f4] 0x00000040, /* GL.EVENT := EVENT_ID=0x0,FROM_FE=0,FROM_PE=1,FROM_BLT=0,SOURCE=0x0 */
<marex> that should raise interrupt, right ?
<austriancoder> jup
<marex> and I know it was executed, because the DMA registers say the GPU is past that point, right ?
<austriancoder> no.. you know that FE has read it
<marex> austriancoder: can I have FE raise an interrupt ?
<marex> austriancoder: also .. FE has read it, so what is the next step ? PE did not execute it ?
<austriancoder> marex: cat /sys/kernel/debug/dri/128/gpu and you should see what inside the GPU is busy
<austriancoder> FROM_FE=1,FROM_PE=0
<austriancoder> and you should get something
erle has quit [Quit: K-lined]
<marex> austriancoder: so yes, if I use event FROM_FE, I get an IRQ
<marex> austriancoder: does that mean FE works, but some other block in the GPU is stuck/disabled/... ?
<austriancoder> great - yes
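The two event variants tried here differ only in which pipeline stage is asked to signal. A sketch of the GL.EVENT value, where FROM_PE = 0x40 matches the decoded 0x00000040 earlier in the log, while the FROM_FE and FROM_BLT bit positions are assumptions taken from etnaviv's state headers:

```python
EVENT_ID_MASK = 0x1f
FROM_FE = 0x20   # assumption: bit 5, event fires once the FE has parsed it
FROM_PE = 0x40   # bit 6, matches "GL.EVENT := ... FROM_PE=1" for 0x00000040
FROM_BLT = 0x80  # assumption: bit 7

def gl_event(event_id, source):
    """Build the value written to GL.EVENT (state 0x3804)."""
    return (event_id & EVENT_ID_MASK) | source

def describe(value):
    """Render the flags roughly the way the dump tool prints them."""
    return "EVENT_ID=0x%x,FROM_FE=%d,FROM_PE=%d,FROM_BLT=%d" % (
        value & EVENT_ID_MASK,
        bool(value & FROM_FE), bool(value & FROM_PE), bool(value & FROM_BLT))

print(describe(gl_event(0, FROM_PE)))  # the failing case: no IRQ arrives
print(describe(gl_event(0, FROM_FE)))  # the experiment: IRQ arrives
```

An IRQ with FROM_FE only shows that the front end parsed the event; it says nothing about whether the stages behind it finished.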
<austriancoder> marex: is the PPU flop reset cmd stream working?
<marex> austriancoder: I do not know :)
<marex> austriancoder: I was wondering about the same thing
<austriancoder> marex: should be a compute shader where the cmd stream is generated and executed during gpu poweron/reset
<marex> austriancoder: Gert had some flop reset stuff for mesa in their git branch
<marex> then they moved it to the kernel
<austriancoder> marex: correct - the kernel is the better place for it
<marex> lemme check if I can extract the same flop reset buffer from the blob driver
<marex> it should be in the debug output
<marex> then I can compare I suppose
<marex> austriancoder: lemme understand this first ... the vivante GPU executes a command stream ring buffer, right ?
<marex> austriancoder: there is a chunk of memory, and every time someone sends the GPU a new command stream, it gets appended to this ring buffer somehow ?
<marex> austriancoder: it isnt a linked list in memory, it is a continuous run of commands, right ?
<austriancoder> it's a ring buffer and when the end is reached there is a wait-link command pair that executes the wait forever
<austriancoder> when you submit a cmd the last step is to patch the link-part of the wait-link to "jump" to the new command stream, which has a wait-link at the end
<marex> austriancoder: so it does behave more like a linked list ?
<austriancoder> cache maintenance etc (kernel) | CMD (user) | event (kernel) | wait (kernel) | link back to the wait (kernel)
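The wait-link scheme described above can be modelled in a few lines. This is a toy simulation that follows the description in this chat, not the etnaviv kernel code: the class, the tuple encoding, and the flat list standing in for GPU memory are all made up, and the wrap-around of a real ring buffer is left out.

```python
WAIT, LINK, CMD, EVENT = "WAIT", "LINK", "CMD", "EVENT"

class ToyRing:
    def __init__(self):
        # Idle state: WAIT at index 0, LINK jumping back to that WAIT.
        self.mem = [(WAIT, None), (LINK, 0)]
        self.wait_idx = 0

    def submit(self, user_cmds):
        start = len(self.mem)
        self.mem.append((CMD, "cache maintenance"))    # kernel
        self.mem.extend((CMD, c) for c in user_cmds)   # user
        self.mem.append((EVENT, "job done"))           # kernel
        new_wait = len(self.mem)
        self.mem.append((WAIT, None))                  # kernel
        self.mem.append((LINK, new_wait))              # kernel, back to WAIT
        # Last step: patch the old LINK so the FE leaves the old idle loop.
        self.mem[self.wait_idx + 1] = (LINK, start)
        self.wait_idx = new_wait

    def run(self, pc=0, max_steps=100):
        """Walk like the FE; stop on reaching a wait-link that loops on itself."""
        trace = []
        for _ in range(max_steps):
            op, arg = self.mem[pc]
            if op == LINK:
                if arg == pc - 1:          # LINK back to its own WAIT: idle
                    return trace, pc - 1
                pc = arg
                continue
            if op != WAIT:
                trace.append((op, arg))
            pc += 1
        raise RuntimeError("FE did not reach an idle wait-link")

ring = ToyRing()
idle_trace, idle_at = ring.run()
ring.submit(["draw cube"])
trace, parked_at = ring.run()
print(trace, "parked at", parked_at)
```

Because each submission is reached through a patched LINK, the buffer does behave like a chain of jumps even though it lives in one contiguous allocation.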
<marex> austriancoder: looking at the flop reset, hmmm
<marex> CMD_LOAD_STATES_START(cmdbuf, VIVS_SH_HALTI5_UNIFORMS(0), 4);
<marex> OUT(cmdbuf, buffer_base + input_offset);
<marex> this part points to a buffer at 0xfa81f000 in blob driver (reserved memory area for the GPU)
<marex> this part points to a buffer at 0x1000 in etnaviv (odd?)
<marex> maybe the buffer base is bogus ?
<austriancoder> that should be fine
<austriancoder> you can add your event-from-fe state at the beginning of the flop reset and see if you get something
<marex> austriancoder: that event-from-fe is at the end of the command stream now, and I do get something
<marex> doesnt that mean the whole command stream went through, including the flop reset ?
<austriancoder> looks so
<austriancoder> now it's time to look into the galcore kernel sources and see if there is some special handling regarding the event state
<marex> austriancoder: like some polling after flop reset ?
<marex> * The flops can be reset with PPU, NN and TP programs.
<marex> * PPU:
<marex> * Requirements:
<marex> * 1. DP inst with all bins enabled.
<marex> * 2. Load inst which has at least two shader group,
<marex> * and every thread should load from different 64-byte address.
<marex> * 3. Stroe inst which has at least 6 threads, whose addresses are
<marex> * from different 64-byte address and flush.
<marex> * Case:
<marex> * * InImage: 64x6 = {1}, unsigned int8
<marex> * * OutImage: 64x6, unsigned int8
<marex> * * OutImage = InImage + InImage
<marex> maybe ?
<marex> ... and flush ?
<austriancoder> what would be interesting is the used gceHW_FE_TYPE
<marex> what is that ?
<marex> I am not seeing that in either the driver or command stream dump
<austriancoder> gckEVENT_Submit(..)
<austriancoder> I need to leave now .. bed is calling.
<marex> you lost me
<marex> OK, thanks for the help
<marex> is there anything I should specifically look for next ?
<austriancoder> marex: in the kernel driver there should be a function that adds the event command .. and it depends on the FE type used. In the imx fork the function is called gckEvent(..).
<marex> Command->feType == gcvHW_FE_MULTI_CHANNEL ... this ?
<austriancoder> Wait.. it's called gckEvent_Submit(..)
<austriancoder> sorry...
<marex> austriancoder: well ...
<marex> austriancoder: I got the kmscube cube rendered I think
<marex> austriancoder: but it misbehaves, badly ...
<marex> austriancoder: I am using the FE to generate interrupt, instead of PE
<marex> so maybe the whole rendering happens, just ... the interrupt is missing ?
<austriancoder> You are telling everybody that the render job is done when read by FE ;)
<marex> austriancoder: but when I run kmscube, I see the actual cube every once in a while
<marex> austriancoder: which ... should come from the GPU , right ?
<marex> austriancoder: I mean, what else would assemble it
<austriancoder> You trigger an event at the beginning of the GPU pipeline, which still is running. What you want is an event from the end of the pipeline aka PE
<marex> austriancoder: so even if the event is at the end of the command stream, it is effectively triggered at the beginning of the pipeline, and something later in the pipeline gets ... stuck ?
<austriancoder> look how the galcore kernel driver emits the event to the command stream. Maybe it uses a not-yet supported FE type
<austriancoder> if you see something like a cube.. from time to time.. I think the pipeline runs till the end.
<marex> it does have some FE type async and multi-channel
<austriancoder> none of them is supported
<marex> I see cube, and it is even slowly rotating
<austriancoder> we can chat in some hours again.. me is leaving the pc
<marex> thanks, good night