This is a follow-up to my previous post, where I was trying to fix bug #1042864 in Chromium: keystrokes happening on native dialogs, like open and save dialogs, were not reported to the screen reader.
After learning how accessibility tools (ATs) register listeners for key events, I found out the problem was not actually there; I had to investigate how events arrive from the X11 server to the browser, and how they are forwarded to the ATs.
Events arrive from the X server
If you are running Chromium on Linux with the X11 backend (most likely, as it is the default), the Chromium browser process receives key press events from the X server. Then, it finds out whether the target of those events is one of its browser windows, and sends them to the proper Window object to be processed.
These are the classes involved in the first part of this process:
The interface PlatformEventSource represents an undetermined source of events coming from the platform, and a PlatformEventDispatcher is any object in the browser capable of managing those events, dispatching them to the actual webpage or UI element. These two classes are related: the PlatformEventSource keeps a list of dispatchers it will forward the event to, if they can manage it (CanDispatchEvent).
The X11EventSource class implements PlatformEventSource; it contains the code that manages events coming from an X11 server specifically. It additionally keeps a list of XEventDispatcher objects. XEventDispatcher is an interface for managing X11 Event objects independently, and it is not an implementation of PlatformEventDispatcher.
The X11Window class is the central piece, implementing both the PlatformEventDispatcher and the XEventDispatcher interfaces, in addition to the XWindow class. It has all the means required to find out if it can dispatch an event, and do it.
The main event processing loop looks like this:
- An event arrives at X11EventSource.
- X11EventSource loops through its list of XEventDispatcher objects, and calls CheckCanDispatchNextPlatformEvent for each of them.
- The X11Window implementing that function checks whether the XWindow ID of the event target matches the ID of the XWindow represented by that object, and saves the XEvent object if it does.
- X11EventSource calls DispatchEvent as implemented by its parent class PlatformEventSource.
- The PlatformEventSource loops through its list of PlatformEventDispatchers and calls CanDispatchEvent on each one of them.
- The X11Window object, which had previously run CheckCanDispatchNextPlatformEvent, just verifies whether the XEvent object was saved then, and considers that a confirmation that it can dispatch the event.
- When one of the dispatchers answers positively, it receives the event for processing in a call to DispatchEvent, which is implemented in X11Window.
- If it’s a keyboard event, X11Window takes the steps required to send it to any ATs listening for it, which had previously registered via ATK.
- When X11Window finishes processing the event, it returns POST_DISPATCH_STOP_PROPAGATION, telling PlatformEventSource to stop looping through the rest of the dispatchers.
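The loop above can be sketched in a few lines of C++. This is only a simplified model, not Chromium code: `WindowDispatcher`, `EventSource`, `Event` and `ProcessEvent` are hypothetical stand-ins, the two dispatcher lists are merged into one, and "sending to ATs" is reduced to a counter.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using WindowId = std::uint32_t;

// Hypothetical stand-in for an X event: just a target window and a type flag.
struct Event {
  WindowId target;
  bool is_key_event;
};

enum class PostDispatchAction { kNone, kStopPropagation };

// Plays the role of X11Window: it is asked twice about every event.
class WindowDispatcher {
 public:
  explicit WindowDispatcher(WindowId id) : id_(id) {}

  // First pass: save the event if it targets this window.
  void CheckCanDispatchNextPlatformEvent(const Event& event) {
    has_pending_ = (event.target == id_);
    if (has_pending_) pending_ = event;
  }

  // Second pass: a saved event is the confirmation that we can dispatch.
  bool CanDispatchEvent() const { return has_pending_; }

  PostDispatchAction DispatchEvent() {
    assert(has_pending_);  // Only valid after a positive first pass.
    if (pending_.is_key_event) ++key_events_sent_to_ats_;
    ++dispatched_;
    has_pending_ = false;
    return PostDispatchAction::kStopPropagation;
  }

  int dispatched() const { return dispatched_; }
  int key_events_sent_to_ats() const { return key_events_sent_to_ats_; }

 private:
  WindowId id_;
  bool has_pending_ = false;
  Event pending_{0, false};
  int dispatched_ = 0;
  int key_events_sent_to_ats_ = 0;
};

// Plays the role of X11EventSource plus its PlatformEventSource parent.
class EventSource {
 public:
  void AddDispatcher(WindowDispatcher* dispatcher) {
    dispatchers_.push_back(dispatcher);
  }

  // Returns true if some dispatcher took the event.
  bool ProcessEvent(const Event& event) {
    for (WindowDispatcher* d : dispatchers_)
      d->CheckCanDispatchNextPlatformEvent(event);
    for (WindowDispatcher* d : dispatchers_) {
      if (!d->CanDispatchEvent()) continue;
      if (d->DispatchEvent() == PostDispatchAction::kStopPropagation)
        return true;
    }
    return false;  // No window matched: the event is dropped.
  }

 private:
  std::vector<WindowDispatcher*> dispatchers_;
};
```

Note the last line of `ProcessEvent`: an event whose target matches no registered window simply falls through, which is what happens to events addressed to windows Chromium does not wrap.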
This is a sequence diagram summarizing this process:
Events leave to the ATs
As explained in the previous post, ATs can register callbacks for key press events, which ultimately call AtkUtilClass::add_key_event_listener. AtkUtilClass is a struct of function pointers; the actual implementation is provided by Chromium in the AtkUtilAuraLinux class, which keeps a list of those callbacks.
When an X11Window object encounters an event that is targeting its own X Window, and it is a keyboard event, it calls X11ExtensionDelegate::OnAtkEvent(), which is actually implemented by the class DesktopWindowTreeHostLinux. The event is ultimately handed to the AtkUtilAuraLinux class, which runs HandleAtkEvent(). That function loops through any listeners that have been registered and runs them.
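To make the registration side more concrete, here is a minimal sketch of a struct of function pointers backed by a listener registry. It does not use the real ATK headers: `UtilClass`, `KeySnoopFunc`, `KeyEvent`, `KeyListenerRegistry` and `InstallKeyListenerHooks` are hypothetical names, only shaped after ATK's add/remove key listener hooks, and the real AtkUtilAuraLinux code is certainly more involved.

```cpp
#include <cassert>
#include <map>

// Hypothetical, reduced key event.
struct KeyEvent {
  unsigned keyval;
  bool pressed;
};

// Listener callback; a nonzero return means the listener consumed the event.
using KeySnoopFunc = int (*)(const KeyEvent* event, void* user_data);

// Stand-in for the struct of function pointers that ATs end up calling.
struct UtilClass {
  unsigned (*add_key_event_listener)(KeySnoopFunc listener, void* data);
  void (*remove_key_event_listener)(unsigned listener_id);
};

// Stand-in for the class that keeps the list of registered callbacks.
class KeyListenerRegistry {
 public:
  static KeyListenerRegistry& Get() {
    static KeyListenerRegistry instance;
    return instance;
  }

  unsigned Add(KeySnoopFunc func, void* data) {
    assert(func != nullptr);
    Entry entry = {func, data};
    listeners_[++last_id_] = entry;
    return last_id_;
  }

  void Remove(unsigned id) { listeners_.erase(id); }

  // Runs every registered listener; true if any of them consumed the event.
  bool HandleKeyEvent(const KeyEvent& event) const {
    bool consumed = false;
    for (const auto& item : listeners_) {
      if (item.second.func(&event, item.second.data) != 0) consumed = true;
    }
    return consumed;
  }

 private:
  struct Entry {
    KeySnoopFunc func;
    void* data;
  };
  std::map<unsigned, Entry> listeners_;
  unsigned last_id_ = 0;
};

// Points the function pointers at the registry above.
void InstallKeyListenerHooks(UtilClass* util_class) {
  util_class->add_key_event_listener = [](KeySnoopFunc func, void* data) {
    return KeyListenerRegistry::Get().Add(func, data);
  };
  util_class->remove_key_event_listener = [](unsigned id) {
    KeyListenerRegistry::Get().Remove(id);
  };
}
```

An AT only ever sees the function pointers; whoever filled in the struct decides where the listeners are stored, and therefore who is able to run them later.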
Native dialogs are different
Native dialogs are stand-alone windows in the X server, different from the browser window that called them, and the browser process doesn’t wrap them in an X11Window object. It is considered unnecessary, because the windows for native dialogs talk to the X server and receive events from it directly.
They do belong to the browser process, though, which means that the browser still receives events targeting the dialog windows. Those events go through all the steps described above only to be discarded, because no X11Window object in the browser matches the ID of the event’s target window.
Another consequence of dialog windows belonging to the browser process is that the AtkUtilClass struct points to Chromium’s own implementation, and here comes the problem… The dialog is expected to manage its own events through GTK+ code, including the GTK+ implementation of AtkUtilClass, but Chromium overrode it. The key press listeners that ATs registered are kept in Chromium code, so the dialog cannot notify them.
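A toy model may help to see why the listeners become unreachable. Everything here is an illustrative assumption: `g_toolkit_registry` stands for whatever list the GTK+ side would consult, `g_browser_registry` for Chromium's list, and `OverrideUtilClass` for the moment Chromium replaces the function pointer.

```cpp
#include <cassert>
#include <vector>

using Listener = void (*)(int keyval, void* data);

struct Entry {
  Listener fn;
  void* data;
};

// One listener list per implementation.
struct Registry {
  std::vector<Entry> listeners;

  void Notify(int keyval) const {
    for (const Entry& e : listeners) {
      assert(e.fn != nullptr);
      e.fn(keyval, e.data);
    }
  }
};

Registry g_toolkit_registry;  // Consulted by the dialog's toolkit code.
Registry g_browser_registry;  // Consulted by the browser's event code.

void ToolkitAddListener(Listener fn, void* data) {
  g_toolkit_registry.listeners.push_back(Entry{fn, data});
}

void BrowserAddListener(Listener fn, void* data) {
  g_browser_registry.listeners.push_back(Entry{fn, data});
}

// There is a single struct per process, shared by browser and dialog code.
struct UtilClass {
  void (*add_key_event_listener)(Listener fn, void* data);
};

UtilClass g_util_class = {&ToolkitAddListener};

// The browser replaces the toolkit's implementation with its own.
void OverrideUtilClass() {
  g_util_class.add_key_event_listener = &BrowserAddListener;
}
```

After the override, every listener an AT registers lands in the browser's list. When the dialog's toolkit code later notifies its own list, that list is empty and the AT hears nothing.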
Finally, fixing the problem
Chromium does receive the keyboard events targeted at the dialog windows, but it does nothing with them because the target of those events is not a browser window. That does give us, though, a foothold for building a solution.
To fix the problem, I made Chromium X Windows manage the keyboard events addressed to the native dialogs in addition to their own. For that, I took advantage of the “transient” property, which indicates a dependency of one window on another: the dialog window had been set as transient for the browser window. In my first approach, I modified X11Window::CheckCanDispatchNextPlatformEvent() to verify whether the target of the event was a transient window of the browser X Window, and in that case it would hand the event to X11ExtensionDelegate to be sent to ATs, following the code path previously explained. Processing stopped at this point; otherwise the browser window would have received key presses directed to the dialog.
The approach had one performance problem: I was calling the X server to check that property for every keystroke, and that call implied synchronous IPC. This was unacceptable! But it could be worked around: we could instead notify the corresponding internal X11Window object about the existence of this transient window when the dialog is created. This requires no IPC at all; we just store one new property in the X11Window object that can be checked locally when keyboard events are processed.
This is a link to the review process of the patch, if you are interested in its history. To sum up, in the final solution:
- Chromium creates the native dialog and calls XWindow::SetTransientWindow, setting that property in the corresponding browser X Window.
- When Chromium receives a keyboard event, it is captured by the X11Window object whose transient window property has been set before.
- X11ExtensionDelegate::OnAtkEvent() is called for that event, then no more processing of this event happens in Chromium.
- The native dialog code will also receive the event and manage the keystroke accordingly.
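The final solution can be sketched as follows. This is a simplified model under my own assumptions, not the actual patch: `BrowserWindow`, `HandleEvent` and `Outcome` are hypothetical names, and forwarding to ATs is again reduced to a counter. Only `SetTransientWindow` borrows its name from the real code.

```cpp
#include <cassert>
#include <cstdint>

using WindowId = std::uint32_t;
constexpr WindowId kNoWindow = 0;

struct Event {
  WindowId target;
  bool is_key_event;
};

enum class Outcome { kIgnored, kSentToAtsOnly, kDispatchedToWindow };

// Plays the role of the X11Window object wrapping a browser window.
class BrowserWindow {
 public:
  explicit BrowserWindow(WindowId id) : id_(id) {}

  // Called once, when the dialog is created. The ID is stored locally, so
  // later checks need no round trip to the X server.
  void SetTransientWindow(WindowId dialog) {
    assert(dialog != id_);
    transient_window_ = dialog;
  }

  Outcome HandleEvent(const Event& event) {
    if (event.target == id_) {
      // Regular path: our own event, key events are also reported to ATs.
      if (event.is_key_event) ++events_sent_to_ats_;
      return Outcome::kDispatchedToWindow;
    }
    if (event.is_key_event && transient_window_ != kNoWindow &&
        event.target == transient_window_) {
      // Key press on the dialog: report it to ATs and stop there, so the
      // browser window never handles a keystroke meant for the dialog.
      ++events_sent_to_ats_;
      return Outcome::kSentToAtsOnly;
    }
    return Outcome::kIgnored;
  }

  int events_sent_to_ats() const { return events_sent_to_ats_; }

 private:
  WindowId id_;
  WindowId transient_window_ = kNoWindow;
  int events_sent_to_ats_ = 0;
};
```

The check in `HandleEvent` is a plain integer comparison against a stored value, which is the whole point of the second approach: the cost per keystroke no longer includes any IPC.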
I hope you enjoyed this trip through Chromium event processing code. If you want to use the diagrams in this post, you may find their Dia source files in this link. Happy hacking!