
Writing a 30-Second Commercial: The Audio/Video (A/V) Format Explained

Thirty seconds. Approximately 75 words of dialogue. Every image and word must earn its place. Why commercial scripts use a different format—and how to master it.

ScreenWeaver Editorial Team
March 17, 2026

[Figure: A/V script format with video and audio columns]

Thirty seconds. That's 750 frames at 25fps. Approximately 75 words of dialogue if you want the audience to actually absorb what's being said. Less if there's music. Less if there's action. Less if you want moments to breathe.

Most screenwriters think in ninety-minute increments. A commercial writer thinks in thirty-second increments, and at the top of the market each of those seconds can cost six figures to produce and to air. There is no room for indulgence. Every image, every word, every beat must earn its place.

The format used for commercials isn't the screenplay format you know. It's called A/V—Audio/Video—and it looks different because it serves a different purpose. A screenplay describes what an audience experiences linearly. An A/V script separates picture and sound into parallel tracks, allowing creatives, clients, and production teams to evaluate each element independently.

If you've never written A/V, it feels strange at first. Where's the dramatic prose? Where's the white space? But once you understand the logic, you'll see that A/V is elegant—a format designed for the specific constraints of advertising, where time is money and ambiguity is unacceptable.


What A/V Format Actually Looks Like

A standard A/V script divides the page into two columns:

Left column: VIDEO. This describes everything the audience sees. Camera angles, shot compositions, on-screen text, actor actions, visual effects. Written in present tense, usually in capital letters or standard sentence case depending on agency style.

Right column: AUDIO. This describes everything the audience hears. Dialogue, voiceover, music cues, sound effects. Often includes timing notes.

The two columns run in parallel. The reader can trace picture and sound simultaneously, understanding exactly how they sync.

A sample A/V script might look like this:


| VIDEO | AUDIO |
| --- | --- |
| OPEN on a kitchen. Morning light. A WOMAN (30s) pours coffee. | (SFX: Coffee pouring) |
| She takes a sip. Smiles. | (MUSIC: Warm acoustic guitar begins) |
| CUT TO: Same woman at her desk. Laptop open. She types confidently. | VO: "Every morning deserves a great start." |
| SUPER: "Sunrise Coffee. Wake up to better." | VO: "Sunrise Coffee. Wake up to better." |
| Logo appears. Product shot beside the logo. | (MUSIC: Button) |

Notice several things: the visuals are spare but specific. The audio cues are labeled SFX (sound effects), MUSIC, and VO (voiceover), while on-screen text is marked SUPER (superimposed text) in the VIDEO column. Everything is written to fit thirty seconds.
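To make the parallel-track idea concrete, the sample script can be modeled as rows that pair one VIDEO entry with one AUDIO entry. The sketch below is only an illustration: the `Row` class, the `render` function, and the 36-character column width are hypothetical choices for this example, not an industry tool or standard.

```python
from dataclasses import dataclass
import textwrap


@dataclass
class Row:
    """One beat of the spot: what is seen, and what is heard at that moment."""
    video: str
    audio: str


# The sample script from this article, one Row per line of the table.
SCRIPT = [
    Row("OPEN on a kitchen. Morning light. A WOMAN (30s) pours coffee.",
        "(SFX: Coffee pouring)"),
    Row("She takes a sip. Smiles.",
        "(MUSIC: Warm acoustic guitar begins)"),
    Row("CUT TO: Same woman at her desk. Laptop open. She types confidently.",
        'VO: "Every morning deserves a great start."'),
    Row('SUPER: "Sunrise Coffee. Wake up to better."',
        'VO: "Sunrise Coffee. Wake up to better."'),
    Row("Logo appears. Product shot beside the logo.",
        "(MUSIC: Button)"),
]


def render(rows, width=36):
    """Return the script as two plain-text columns, VIDEO left and AUDIO right."""
    lines = [f"{'VIDEO':<{width}} | AUDIO", "-" * width + "-+-" + "-" * width]
    for row in rows:
        left = textwrap.wrap(row.video, width) or [""]
        right = textwrap.wrap(row.audio, width) or [""]
        # Pad the shorter column so picture and sound stay side by side.
        for i in range(max(len(left), len(right))):
            left_cell = left[i] if i < len(left) else ""
            right_cell = right[i] if i < len(right) else ""
            lines.append(f"{left_cell:<{width}} | {right_cell}".rstrip())
        lines.append("")  # blank line between beats
    return "\n".join(lines)


print(render(SCRIPT))
```

Keeping each row as a pair is the point: a note about one image never requires touching the words beside it, and vice versa.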


Why A/V Instead of Screenplay Format?

Commercials serve a different master than films. The client—the brand paying for the spot—needs to approve every element before production. They don't read screenplays fluently; they need to see exactly what's happening on screen and exactly what's being said.

A/V format provides that clarity. The client can scan the VIDEO column and see every image. They can scan the AUDIO column and see every word. Nothing is buried in prose. Nothing is ambiguous.

This also serves production. The director, DP, editor, and sound designer can each focus on their column. The editor can cut picture to the VIDEO column while the sound mixer mixes to the AUDIO column. Synchronization is built into the document.

A/V format isn't less creative than screenplay format. It's more transparent—which is exactly what advertising requires.


The Anatomy of a 30-Second Spot

Thirty seconds is both very short and precisely structured. Most commercials follow a recognizable pattern:

0:00–0:05 — The Hook

You have five seconds to stop the viewer from skipping or looking away. This is usually a striking image, a provocative question, or a moment of immediate relevance. "You" language often appears here: "You've tried everything..."

0:05–0:20 — The Body

The core message. This is where the product or service is introduced, demonstrated, or explained. Visuals show the product in action or show the benefit. Voiceover delivers the value proposition.

0:20–0:25 — The Pivot

The emotional turn. This is where the message becomes personal or aspirational. "Imagine a morning where..." or "Now you can..."

0:25–0:30 — The Tag

The brand moment. Logo, tagline, call to action. This is what the viewer is meant to remember. Often accompanied by a musical "button"—a short, memorable sting.

Not every commercial follows this structure exactly, but most do. The constraint of thirty seconds imposes discipline.
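The four-part pattern above can be written down as data, which makes it easy to check that the beats cover the full thirty seconds and to look up which beat a given timestamp falls in. This is a minimal sketch; the names `BEATS`, `beat_at`, and `check_contiguous` are made up for the example.

```python
# Beat boundaries in seconds, taken from the structure described above.
BEATS = [
    ("Hook", 0, 5),
    ("Body", 5, 20),
    ("Pivot", 20, 25),
    ("Tag", 25, 30),
]


def beat_at(second):
    """Return the name of the beat that contains the given second of the spot."""
    for name, start, end in BEATS:
        if start <= second < end:
            return name
    raise ValueError(f"{second} is outside the 0-30 second spot")


def check_contiguous(beats, total=30):
    """Confirm the beats start at 0, leave no gaps or overlaps, and end at total."""
    cursor = 0
    for _name, start, end in beats:
        if start != cursor or end <= start:
            return False
        cursor = end
    return cursor == total
```

For instance, `beat_at(15)` returns `"Body"`, so a note about "the visual at 0:15" points at the product message rather than the hook or the tag.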


Writing the VIDEO Column

The VIDEO column describes images, not cinematography. You're not directing—you're conveying what the viewer sees so that everyone (client, director, editor) shares a vision.

Be specific but not over-technical. Write "CU on her hands as she opens the box" rather than "50mm lens, shallow depth of field, dolly push." The director makes technical choices; you make visual choices.

Use standard abbreviations. CU (close-up), WS (wide shot), MS (medium shot), ECU (extreme close-up), POV (point of view). These are understood across the industry.

Indicate supers. Any on-screen text is a SUPER (superimposition). Write it exactly as it should appear: "SUPER: '50% more effective.'"

Describe action in present tense. "She opens the door. Sees the surprise. Her face lights up."

Include transitions. CUT TO, DISSOLVE TO, FADE TO BLACK. These help the reader understand pacing.


Writing the AUDIO Column

The AUDIO column contains everything heard: dialogue, VO (voiceover), SFX (sound effects), and MUSIC cues.

Voiceover is king. Most commercials rely on voiceover to deliver the message. Write VO dialogue tight: every word counts. Read it aloud. Time it. If you write a hundred words for a thirty-second spot, it won't fit.

Dialogue is rare but impactful. If characters speak on screen (not VO), mark it clearly with the character name. "WOMAN: 'Finally, a coffee that doesn't taste like regret.'"

Music cues are tonal directions. You don't need to specify a song; you specify a feel. "MUSIC: Upbeat indie pop, building energy." The music supervisor handles licensing.

Sound effects add texture. "SFX: Door slam. Coffee pouring. Keyboard typing." These are listed as they sync with visuals.

The button matters. Most commercial music ends with a "button"—a short musical punctuation that signals the end. Indicate it: "MUSIC: Button."
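Because only VO and on-screen dialogue count against the word budget, it can help to separate spoken cues from SFX and MUSIC before counting. The sketch below assumes cues are written as `LABEL: text`, optionally wrapped in parentheses, as in the sample script; `parse_cue` and `spoken_words` are hypothetical helper names.

```python
import re

# Cue labels that describe sound but are never spoken aloud.
NON_SPOKEN = {"SFX", "MUSIC"}

# A word is a run of letters or digits, optionally with an inner apostrophe.
WORD = re.compile(r"[A-Za-z0-9]+(?:['’][A-Za-z]+)?")


def parse_cue(cue):
    """Split 'LABEL: text' into (label, text), tolerating wrapping parentheses."""
    label, _, text = cue.strip().strip("()").partition(":")
    return label.strip().upper(), text.strip()


def spoken_words(cues):
    """Count words in VO and character dialogue, skipping SFX and MUSIC cues."""
    total = 0
    for cue in cues:
        label, text = parse_cue(cue)
        if label not in NON_SPOKEN:
            total += len(WORD.findall(text))
    return total


SAMPLE_CUES = [
    "(SFX: Coffee pouring)",
    "(MUSIC: Warm acoustic guitar begins)",
    'VO: "Every morning deserves a great start."',
    'VO: "Sunrise Coffee. Wake up to better."',
    "(MUSIC: Button)",
]

print(spoken_words(SAMPLE_CUES))  # prints 12
```

The sample spot speaks only twelve words, which leaves plenty of room for the music and the images to work.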


[Figure: A/V script with timing annotations showing a 30-second breakdown]

Word Count and Timing: The Math You Need to Know

Average speaking pace for commercials: 2.5–3 words per second for comfortable delivery. Faster feels rushed; slower feels languid.

30 seconds of VO: Maximum 75–90 words. Realistically, 60–75 if there's music and pauses.

15 seconds of VO: Maximum 35–45 words.

Music and SFX reduce word budget. If your spot opens with five seconds of music-only, you've lost twelve to fifteen words at that pace. Budget accordingly.

Supers compete with VO. If there's text on screen while someone's talking, the viewer has to split attention. Either simplify the super or pause the VO.
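The arithmetic behind these figures is just seconds multiplied by speaking pace. A small sketch, assuming the 2.5 to 3 words-per-second range quoted above; the function names and the `silent_seconds` parameter are illustrative, not part of any standard tool.

```python
import math


def word_budget(spot_seconds, silent_seconds=0.0, pace_range=(2.5, 3.0)):
    """Return (low, high) word counts for the speakable part of a spot.

    silent_seconds is time given over to music-only or SFX-only moments.
    """
    speakable = max(spot_seconds - silent_seconds, 0)
    low_pace, high_pace = pace_range
    return math.floor(speakable * low_pace), math.floor(speakable * high_pace)


def seconds_needed(words, pace=3.0):
    """Return how long a given word count takes to read at the given pace."""
    return words / pace


print(word_budget(30))                    # (75, 90)
print(word_budget(30, silent_seconds=5))  # (62, 75)
print(word_budget(15))                    # (37, 45)
```

Run the other way, `seconds_needed(100)` comes to a little over 33 seconds even at the fast end of the range, which is why a hundred-word script cannot fit a thirty-second spot without rushing.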

Here's a sample word budget for a 30-second spot:

| Section | Duration | Word Budget |
| --- | --- | --- |
| Hook (visual-driven, minimal VO) | 5 sec | 10 words |
| Body (product message) | 15 sec | 40 words |
| Pivot (emotional turn) | 5 sec | 15 words |
| Tag (logo + tagline) | 5 sec | 10 words |
| Total | 30 sec | 75 words |
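A budget like this is easy to check against a draft. The sketch below stores the table as a dictionary and reports which sections run long; the `over_budget` function and the draft text are hypothetical, written only to show the idea.

```python
# Section -> (duration in seconds, word budget), copied from the table above.
BUDGET = {
    "Hook": (5, 10),
    "Body": (15, 40),
    "Pivot": (5, 15),
    "Tag": (5, 10),
}


def over_budget(draft, budget=BUDGET):
    """Return {section: words_over} for each section whose VO exceeds its budget."""
    overs = {}
    for section, text in draft.items():
        _duration, allowed = budget[section]
        count = len(text.split())
        if count > allowed:
            overs[section] = count - allowed
    return overs


# A made-up draft: the hook runs long, the tag is within budget.
draft = {
    "Hook": "You have tried every coffee on the shelf and none of them worked",
    "Tag": "Sunrise Coffee. Wake up to better.",
}

print(over_budget(draft))  # {'Hook': 3}
```

Counting per section rather than per script shows where the cut has to come from, not just that one is needed.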

Three Scenarios: Different Commercial Types

Scenario A: Product Demo

A cleaning product that removes stains. The spot shows before/after demonstrations.

VIDEO column: Heavy on product shots. CU on stain. Product applied. Stain disappears. Before/after split screen.

AUDIO column: VO explains the product benefit. "Watch the toughest stains vanish in seconds." Music is upbeat but not distracting.

Key challenge: Making the demo visually interesting without feeling like an infomercial.


Scenario B: Emotional Brand Spot

A car brand selling not the car but the feeling—freedom, adventure, family connection.

VIDEO column: Cinematic imagery. A family driving through mountains. Kids laughing in the backseat. Sunset. Minimal product shots until the tag.

AUDIO column: VO is sparse or absent. Music does the emotional work. If there's VO, it's philosophical: "The road doesn't care where you've been. Only where you're going."

Key challenge: Making the brand feel earned. If the emotion doesn't connect to the product, the spot fails.


Scenario C: Humor/Disruption

A snack brand using absurdist humor to stand out. The concept is that the snack is so good, it makes people act ridiculous.

VIDEO column: Setup appears normal—office, meeting. Someone opens the snack. Chaos. People diving for it. Exaggerated reactions.

AUDIO column: Dialogue is comedic. "Is that... is that a FlavorBlast?" SFX for absurd punctuation—boing, record scratch. Music shifts from corporate to silly.

Key challenge: Landing the joke in thirty seconds. Setup must be instant; payoff must be clear.


The "Trench Warfare" Section: What Goes Wrong

Failure Mode #1: Too Many Words

The writer packs 100 words into thirty seconds. The VO sounds rushed. The message is lost.

How to Fix It: Time your script. Read aloud with a stopwatch. If it doesn't fit, cut words—not speed.

Failure Mode #2: Unclear Visual Logic

The VIDEO column jumps between unrelated images. The viewer can't follow the story.

How to Fix It: Every image should connect to the previous one. Cause and effect. Question and answer. Progression, not randomness.

Failure Mode #3: Buried Call to Action

The spot is charming, but the viewer doesn't know what to do. No website, no product name, no reason to act.

How to Fix It: The tag exists for a reason. Make the CTA clear: logo, tagline, URL or action prompt.

Failure Mode #4: Mismatched Tone

The visuals are warm and emotional; the VO is corporate and stiff. The spot feels disjointed.

How to Fix It: Write VIDEO and AUDIO as a unified experience. They should feel like the same commercial.

Failure Mode #5: Ignoring Client Mandatories

The client has mandatory legal copy ("Offer valid through December 31"). The writer ignores it. The script gets rejected.

How to Fix It: Get mandatories upfront. Build them into the word budget. They're non-negotiable.
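Some of these failure modes can be caught mechanically before a script ever reaches the client. The sketch below covers three of them (too many words, a missing brand on screen, and missing mandatories) under simple assumptions: spoken and on-screen text arrive as plain strings, and the 75-word default comes from the budget in this article. The `lint_script` function is a hypothetical helper, and it says nothing about visual logic or tone, which still need a human read.

```python
def lint_script(spoken_text, on_screen_text, mandatories, brand, max_words=75):
    """Return a list of warnings; an empty list means nothing was flagged."""
    warnings = []

    # Failure Mode #1: more spoken words than the spot can hold.
    words = len(spoken_text.split())
    if words > max_words:
        warnings.append(f"Too many words: {words} spoken, budget is {max_words}")

    # Failure Mode #5: required client copy must appear somewhere in the script.
    combined = f"{spoken_text} {on_screen_text}".lower()
    for line in mandatories:
        if line.lower() not in combined:
            warnings.append(f"Missing mandatory: {line}")

    # Failure Mode #3: the viewer should at least see the brand name.
    if brand.lower() not in on_screen_text.lower():
        warnings.append(f"Brand '{brand}' never appears on screen")

    return warnings


vo = "Every morning deserves a great start. Sunrise Coffee. Wake up to better."
supers = "Sunrise Coffee. Wake up to better."
print(lint_script(vo, supers, ["Offer valid through December 31"], "Sunrise Coffee"))
# ['Missing mandatory: Offer valid through December 31']
```

A check like this is a safety net, not a substitute for reading the script aloud with a stopwatch.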


[Figure: A rejected script and a revised script for the same commercial, side by side]

Client Rounds: How the Script Evolves

Commercial scripts go through multiple revision rounds. Here's a typical process:

Round 1: Internal creative review. The agency creative team refines the script before the client sees it.

Round 2: Client presentation. The client (brand team) reviews. They give notes—often about messaging, tone, legal, or brand alignment.

Round 3: Revisions. The writer addresses client notes. Some notes are creative ("Make it funnier"); some are mandatory ("Add the disclaimer").

Round 4: Legal review. Legal and compliance review all claims. "50% more effective" needs substantiation. "The best coffee" might need to become "A better coffee."

Round 5: Pre-production script. The final approved script goes to production. Any further changes require new approvals.

A/V format helps this process because changes are easy to track. A client can say "I don't like the visual at 0:15" and everyone knows exactly what they mean.


The Perspective: Constraints Are Creative

Thirty seconds is not a limitation. It's a discipline.

Every great commercial writer learns to love the constraint. Because when you can't sprawl, you must choose. Every word is deliberate. Every image is intentional. The result, when it works, is a piece of communication that lands instantly and stays in memory.

A/V format supports this discipline. By separating picture and sound, it forces you to justify each element. Does this image earn its place? Does this word earn its place? If not, cut it.

Screenwriters who move into commercials often find it clarifying. The habits they develop—economy, precision, visual thinking—make their longer-form work better too.

Thirty seconds isn't small. It's concentrated. And concentration, in any medium, is craft.

[YOUTUBE VIDEO: A commercial writer walking through an A/V script from concept to final production, showing how the document evolved through client rounds and what changed between script and finished spot.]


Further reading:

  • For guidance on writing sizzle reels and rip-o-matics (which share A/V format conventions), see rip-o-matics and sizzle reels to sell tone.
  • If you're writing longer-form branded content, our pitch deck template covers how to structure visual presentations.
  • The Association of National Advertisers has resources on commercial production standards at ana.net.


About the Author

The ScreenWeaver Editorial Team is composed of veteran filmmakers, screenwriters, and technologists working to bridge the gap between imagination and production.