So that means the bulk of the wait is in the thread not releasing quickly. I'm still seeing 6-10ms between reads, compared with about 2ms with the fast capture. Attached is yet another patch. I don't have access to a PLC right now, so this patch is untested. It does carry a risk of high CPU usage, so please check that to make sure I'm not creating other issues.
Are any of your values non-BOOL but of the same type, such as DINT? If so, is it possible to move those into an array, then subscribe to the array? That can really reduce the update time because the driver will group them into a single read. So let's say 40 of your subscriptions were DINTs. If they were moved into an array in the PLC, and the HMI subscribed to that array instead, the driver would put those into 2 read packets. Instead of taking 200+ms to read the DINT values, it would only take about 15ms.
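To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It is not driver code; the function names are made up for illustration, and the per-read and per-packet times (5ms and 7.5ms) are assumptions backed out of the 200ms and 15ms figures above, so your actual timings will vary.

```python
# Rough estimate only; not part of the driver.
# Assumed timings, derived from the figures quoted in this post:
#   200 ms / 40 individual reads = ~5 ms per read
#   15 ms / 2 array packets      = ~7.5 ms per packet

def individual_read_ms(tag_count, ms_per_read):
    """Total time when every tag is read in its own request."""
    return tag_count * ms_per_read

def array_read_ms(packet_count, ms_per_packet):
    """Total time when the tags are read as one array split into packets."""
    return packet_count * ms_per_packet

individual = individual_read_ms(40, 5.0)  # 40 separate DINT subscriptions
grouped = array_read_ms(2, 7.5)           # same 40 DINTs as one array, 2 packets

print(f"individual: {individual} ms, array: {grouped} ms")
print(f"time saved per update cycle: {individual - grouped} ms")
```

The point is just that the total scales with the number of requests, not the number of values, so anything that cuts the request count helps.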