The step in question states: (bold emphasis and parentheticals are mine)
Let control be the first descendant element of subject (the dialog), in tree order, that is not inert and has the autofocus attribute specified.
If there isn't one, then let control be the first non-inert descendant element of subject, in tree order.
If there isn't one of those either, then let control be subject.
In other words, if neither an autofocused element nor a natively focusable element is found, then control becomes the dialog itself.
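The three quoted steps can be sketched as a small fallback chain. This is only an illustration under stated assumptions, not the browser's implementation: `SketchElement`, `descendantsInTreeOrder`, and `pickControl` are hypothetical names, the tree is a plain object model rather than the real DOM, and inertness is treated as a per-element flag (in a real document it also applies to everything inside the inert element).

```typescript
// Hypothetical stand-in for a DOM element; not the real DOM API.
interface SketchElement {
  name: string;
  inert?: boolean;
  autofocus?: boolean;
  children?: SketchElement[];
}

// Collects descendants in tree order (depth-first, pre-order), excluding the root.
function descendantsInTreeOrder(root: SketchElement): SketchElement[] {
  const out: SketchElement[] = [];
  for (const child of root.children || []) {
    out.push(child);
    for (const d of descendantsInTreeOrder(child)) out.push(d);
  }
  return out;
}

// Mirrors the quoted steps: autofocus first, then first non-inert
// descendant, then the subject (the dialog) itself.
function pickControl(subject: SketchElement): SketchElement {
  // Simplification: only the element's own inert flag is checked.
  const candidates = descendantsInTreeOrder(subject).filter((el) => !el.inert);
  for (const el of candidates) {
    if (el.autofocus) return el; // step 1: first non-inert autofocus descendant
  }
  // step 2: first non-inert descendant; step 3: fall back to the subject
  return candidates.length > 0 ? candidates[0] : subject;
}
```

For example, a dialog containing a heading followed by an autofocused input yields the input, the same dialog without the autofocus attribute yields the heading, and a dialog with no descendants yields the dialog itself.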
Without a screen reader active, there is no visible focus placement. Querying document.activeElement after the modal dialog is opened returns the body element.
Android Chrome with TalkBack sends focus to the web view.
macOS Chrome with VoiceOver focuses the web view.
Chrome with JAWS 2019 places the virtual cursor within the dialog, but "modal dialog" is not announced.
Chrome with NVDA 2018.4.1 sends the virtual cursor to the document root. Pressing the down arrow announces "dialog", but the virtual cursor fails to enter the dialog and instead just announces "blank".