Motion, Animations, Video, Audio

There are four main ways to make audio and video accessible:

  • Captions: on-screen text, synchronized with the media, that conveys dialogue and relevant sounds for people who are deaf or hard of hearing.
  • Transcripts: a text version of the spoken dialogue, along with descriptions of any relevant visual details in each scene. A transcript is meant to be read in place of listening to or watching the media file.
  • Descriptive audio: a version of the media file in which a narrator explains relevant visual details (like unspoken events and actions) for people who are blind or have low vision.
  • Sign language interpretation: footage of an interpreter, synchronized with the media file (or shown within the video frame), for people who understand sign language.

Other things to consider:

  • Clear audio: for people who are hard of hearing, loud background noise makes it difficult to understand what is being said.
  • Seizure and motion sensitivity prevention: flashing content can trigger seizures in people with photosensitive epilepsy, and people with vestibular disorders may become disoriented, dizzy, or nauseated when animated or video content contains substantial movement.
  • Avoiding auto-play conflicts with screen readers: media that plays automatically can talk over a user’s screen reader, making it difficult to hear what the screen reader is saying.
  • Accessible media player: the media player itself must expose the names, roles, values, and states of its controls.

When captions are required:

  • Live multimedia (audio plus video) events that include narration or dialogue need synchronized captions.
  • Prerecorded multimedia (audio plus video) files must contain synchronized captions.
  • Live audio-only content that features narration or dialogue should be paired with synchronized text captions.

When transcripts are required:

  • A written transcript must accompany multimedia (audio plus video) content.
  • Prerecorded audio-only content needs to be paired with an easy-to-access written transcript.
  • A written transcript describing visual details must be offered for video-only content.

Transcripts and captions must meet the following requirements:

  • Captions for scripted content must be verbatim.
  • Transcripts and captions for live or unscripted content must also be verbatim, though exceptions are made for filler words and stuttering mannerisms like “uh,” since omitting them makes transcripts and captions more legible and comprehensible.
  • The transcript must describe important visual events.
  • Relevant background noises need to be described in transcripts and captions, ideally in (parentheses) or [brackets].
  • Words spoken off-camera must be included in transcripts and captions.
  • The speaker needs to be identified in transcripts and captions.
  • Punctuation must be used in transcripts and captions to express emphasis, when applicable, rather than adding explanatory text. Exceptions are made for instances where phonetic spelling is relevant to the context of the scene.
  • Transcripts and captions cannot purposefully reveal undisclosed content details before a predetermined time.
  • Music must be credited with the artist and song title in transcripts and captions. Exceptions are made when doing so would conflict with the content shown.
  • Content-relevant lyrics to songs must be added to transcripts and captions.
  • When dialogue is hard to understand or inaudible, transcripts and captions should acknowledge this in neutral terminology.
  • Coarse language should not be omitted from transcripts and captions. Where appropriate, it can be muted or bleeped out based on the preferences of the audience.
  • Transcripts and captions should indicate when dialogue is whispered or mouthed silently.
  • Captions should convey sounds by describing the sounds themselves, as opposed to describing what is producing them.
  • Captions should not run longer than a few lines per scene.
  • Where possible, caption line breaks should fall between phrases rather than in the middle of a phrase. Mixed-case type is encouraged for captions.
  • A caption’s default font is sans-serif.
  • A caption should not contain any more than 32 characters per line.
  • Captions are encouraged to stay on screen for at least one second, taking into consideration the number of words used. When feasible, allow at least 0.3 seconds of screen time per word.
  • Captions should not cover up any relevant visual details, people’s faces, or on-screen text of any kind.
  • Captions must be precisely in sync with the audio. Exceptions are made if exact synchronization makes captions harder to read.
  • A black background with white wording is the default color combination for a caption.
  • A caption’s default background and font colors need a contrast ratio of at least 3:1 (provided the font size is at least 18 point).
  • A caption’s default minimum font size is 22pt.
  • A caption’s default weight is regular/normal (as opposed to bold).
  • Color shouldn’t be the sole method of conveying meaning in captions.
  • All caps or italics can be used in a caption to emphasize a point when punctuation isn’t enough to convey the overall meaning.
  • In a caption, quotation marks (or alternatively, underlines and italics, if the format supports them), as well as title-case capitalization, are encouraged for marking titles (such as for films or books) where applicable.
  • The last caption frame should be removed from the screen during lengthy silent intervals.
  • Captions containing capitalized words are encouraged to stay on screen for at least 1.5 seconds.
  • Silence should be described in situations where the visual content implies relevant dialogue or noise.
  • Caption frames can be given extra on-screen time in scenes with a substantial amount of visual activity.
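
The line-length and timing guidance above can be checked mechanically. The following is a minimal Python sketch, not a definitive implementation: the function names and constants are hypothetical, and the numbers are taken from the checklist (32 characters per line, at least one second on screen, roughly 0.3 seconds per word). Note that textwrap breaks lines at whitespace only, so choosing breaks that fall between phrases still needs human review.

```python
import textwrap

# Values taken from the checklist above; the names are illustrative.
MAX_CHARS_PER_LINE = 32
MIN_CAPTION_SECONDS = 1.0
MIN_SECONDS_PER_WORD = 0.3


def wrap_caption(text: str) -> list[str]:
    """Split caption text into lines of at most MAX_CHARS_PER_LINE characters.

    Lines are broken at whitespace; a single word longer than the limit
    is split across lines.
    """
    return textwrap.wrap(text, width=MAX_CHARS_PER_LINE)


def minimum_duration(text: str) -> float:
    """Return the shortest on-screen time (in seconds) suggested for a caption."""
    word_count = len(text.split())
    return max(MIN_CAPTION_SECONDS, word_count * MIN_SECONDS_PER_WORD)


if __name__ == "__main__":
    caption = "Captions should stay readable on every screen size."
    for line in wrap_caption(caption):
        print(f"{len(line):2d} | {line}")
    print("minimum seconds on screen:", minimum_duration(caption))
```

A caption tool could run checks like these before publishing and flag any cue whose display time is shorter than the suggested minimum.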

Visual personalization of captions

  • Users should have the ability to personalize a caption’s visual presentation.

File format of captions

  • Offering captions in several file formats is encouraged.
  • WebVTT should be one of the file formats available.
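
As an illustration of the WebVTT format, here is a small Python sketch that writes caption cues to WebVTT text. The function names and the sample cues are hypothetical. The sketch assumes cue text contains no blank lines and no "-->" sequence, both of which WebVTT disallows inside cue text. The sample cues name the speaker with a WebVTT voice tag and put a sound description in brackets, in line with the checklist.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def build_webvtt(cues: list[tuple[float, float, str]]) -> str:
    """Build WebVTT file content from (start, end, text) cues.

    The file starts with the "WEBVTT" header, and blocks are separated
    by blank lines.
    """
    blocks = ["WEBVTT"]
    for start, end, text in cues:
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{timing}\n{text}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample_cues = [
        (0.0, 2.5, "<v Narrator>Welcome to the tour."),
        (2.5, 4.0, "[door creaks]"),
    ]
    print(build_webvtt(sample_cues))
```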

Audio descriptions (when required)

  • All multimedia (audio plus video) that is pre-taped needs to have descriptive audio.
  • Pre-taped video-only content needs to have descriptive audio.
  • Descriptive audio can be offered for live multimedia (video plus audio) content.
  • Descriptive audio can be offered for live video-exclusive content.

Extended descriptive audio

  • Where the pauses in the foreground audio are too short to fit the necessary descriptive audio, extended descriptive audio can be offered for multimedia (audio plus video) content that is pre-taped.

Sign Language Interpretation

Establishing when sign language is necessary

  • Sign language interpretation can be offered for pre-taped audio-exclusive content.
  • Sign language interpretation can be offered for multimedia (audio plus video) content that is live.
  • Sign language interpretation can be offered for audio-exclusive content that is live.

Media Player Functionality

Keyboard availability

  • All of a media player’s functions need to be accessible to keyboard users and screen reader users.
  • A media player’s controls need to expose the proper names, roles, and values to screen readers.

Descriptive audio, transcripts, and captions

  • Media players should give users access to descriptive audio, transcripts, and captions.


  • Media players should offer personalization of captions.
  • Media players are encouraged to retain user preferences.
  • Full-screen video is an option media players should provide.

Background Audio

Background noise in media

  • Background noise in pre-taped audio-only and pre-taped multimedia content should be kept to a minimum (at least 20 dB lower than the foreground audio, with exceptions made for sporadic sounds lasting 2 seconds or less) or removed during speech or narration; otherwise, an option to switch off background noise must be available.
  • Background noise in live audio-only and live multimedia content should likewise be kept to a minimum (at least 20 dB lower than the foreground audio, with exceptions made for sporadic sounds lasting 2 seconds or less) or removed during speech or narration; otherwise, an option to switch off background noise must be available.
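
To make the 20 dB figure concrete: a level difference in decibels between two signals is 20 times the base-10 logarithm of the ratio of their RMS amplitudes, so 20 dB corresponds to background audio with one tenth the amplitude of the foreground. The Python sketch below assumes the foreground and background tracks are available as separate lists of samples, which is often not the case for a finished mix. The function names are hypothetical, and the allowance for short sporadic sounds is not modeled.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square amplitude of a non-empty list of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_difference_db(foreground: list[float], background: list[float]) -> float:
    """How many decibels the foreground sits above the background."""
    background_rms = rms(background)
    if background_rms == 0:
        return math.inf  # a silent background is infinitely quieter
    foreground_rms = rms(foreground)
    if foreground_rms == 0:
        return -math.inf  # a silent foreground is below any background
    return 20 * math.log10(foreground_rms / background_rms)


def background_is_quiet_enough(
    foreground: list[float],
    background: list[float],
    required_db: float = 20.0,  # threshold from the checklist
) -> bool:
    """True when the background is at least required_db below the foreground."""
    return level_difference_db(foreground, background) >= required_db


if __name__ == "__main__":
    speech = [0.5, -0.5] * 100
    music = [0.04, -0.04] * 100
    print(round(level_difference_db(speech, music), 1), "dB")
    print(background_is_quiet_enough(speech, music))
```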

Web page background audio

  • A mechanism must be provided to pause, stop, or mute audio that plays automatically on a page for longer than a few seconds, or to adjust its volume.

Flashing content

  • A page should not feature content that flashes more than three times per second. Exceptions are made if the flashing area is sufficiently small or low in contrast to stay below the basic flash thresholds.
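
The flash-rate rule can be expressed as a sliding-window count. The Python sketch below assumes the times at which flashes occur (in seconds) have already been extracted from the content, which is the hard part in practice. It also assumes a tolerance of three flashes in any one-second window, following the wording of WCAG 2.3.1; pass a lower max_flashes for a stricter rule. The function name is hypothetical, and the exceptions for small or low-contrast flashes are not modeled.

```python
def exceeds_flash_limit(
    flash_times: list[float],
    max_flashes: int = 3,   # assumed tolerance per window
    window: float = 1.0,    # window length in seconds
) -> bool:
    """Return True if any window of the given length holds more than max_flashes."""
    times = sorted(flash_times)
    start = 0
    for end, current in enumerate(times):
        # Drop flashes that are a full window (or more) older than the current one.
        while current - times[start] >= window:
            start += 1
        if end - start + 1 > max_flashes:
            return True
    return False


if __name__ == "__main__":
    print(exceeds_flash_limit([0.0, 0.2, 0.4, 0.6]))  # four flashes within 0.6 s
    print(exceeds_flash_limit([0.0, 0.5, 1.0, 1.5]))  # two flashes per second
```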

Motion parallax effects and animations

  • Parallax effects should be used sparingly: limit how many are used overall, the amount of parallax within each effect, and the size of the area affected.
  • All features and content inside parallax scrolling content should be keyboard accessible.
  • Text against a moving background needs a contrast ratio of at least 3 to 1 for bold or large text, or at least 4.5 to 1 for small text.
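
The 3 to 1 and 4.5 to 1 figures refer to the WCAG contrast ratio, which is computed from the relative luminance of the two colors. The Python sketch below follows the WCAG 2.x formula for 8-bit sRGB colors; the function names are hypothetical. For a moving background, the check only covers one background color at a time, so one possible approach (an assumption, not something the checklist prescribes) is to test the text color against the lightest and darkest colors that pass behind it.

```python
RGB = tuple[int, int, int]


def relative_luminance(color: RGB) -> float:
    """Relative luminance of an 8-bit sRGB color, per the WCAG 2.x definition."""
    def linearize(channel: int) -> float:
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(ch) for ch in color)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(first: RGB, second: RGB) -> float:
    """Contrast ratio between two colors, from 1.0 (identical) up to 21.0."""
    lum_a, lum_b = relative_luminance(first), relative_luminance(second)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def meets_contrast(text: RGB, background: RGB, large_or_bold: bool) -> bool:
    """Apply the 3 to 1 (large or bold) or 4.5 to 1 (small text) threshold."""
    required = 3.0 if large_or_bold else 4.5
    return contrast_ratio(text, background) >= required


if __name__ == "__main__":
    white, black, grey = (255, 255, 255), (0, 0, 0), (119, 119, 119)
    print(round(contrast_ratio(white, black), 2))
    print(meets_contrast(grey, white, large_or_bold=False))
    print(meets_contrast(grey, white, large_or_bold=True))
```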

Background animations and videos

  • Important content must not be conveyed through background animations and videos. Exceptions are made if users can control playback and have access to the necessary descriptive audio, transcripts, or captions.
  • A mechanism must be provided to hide, stop, or pause any background animations or videos that start playing automatically and last at least 5 seconds.
  • A mechanism should be available to pause and restart background animations and videos.
  • Text over background animations or videos needs a contrast ratio of at least 3 to 1 for bold and large text, or 4.5 to 1 for small text.
  • Motion inside of background animations and videos should be limited.
  • Background animations and videos are discouraged from utilizing sounds.


  • Users should have the ability to hide, stop, or pause media content that starts playing by default and lasts for a minimum of 5 seconds.