SMIL 2.0 became a W3C Recommendation in August 2001. It introduced a modular language structure that made it easier to integrate SMIL semantics into other XML-based languages. Basic animation and timing modules were integrated into Scalable Vector Graphics (SVG), and the SMIL modules formed a basis for Timed Text. The modular structure also made it possible to define the standard SMIL language profile and the XHTML+SMIL language profile with common syntax and standard semantics.

SMIL 2.1 became a W3C Recommendation in December 2005. SMIL 2.1 includes a small number of extensions based on practical experience gathered from using SMIL in the Multimedia Messaging Service (MMS) on mobile phones.

A SMIL document is similar in structure to an HTML document in that it is typically divided into an optional <head> section and a required <body> section. The <head> section contains layout and metadata information. The <body> section contains the timing information and is generally composed of combinations of three main tags: sequential ("<seq>", simple playlists), parallel ("<par>", multi-zone/multi-layer playback) and exclusive ("<excl>", event-triggered interrupts). SMIL refers to media objects by URL, which allows them to be shared between presentations and stored on different servers for load balancing. The language can also associate different media objects with different bandwidth requirements.
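The <head>/<body> split and the nesting of time containers can be illustrated with a short Python sketch that parses a small SMIL document using the standard library. The document itself is hypothetical: the region ids and the example.com media URLs are invented for illustration, and the SMIL namespace declaration is left out to keep the parsing simple.

```python
import xml.etree.ElementTree as ET

# Hypothetical SMIL document: region ids and media URLs are made up,
# and the namespace declaration is omitted for simplicity.
SMIL_SOURCE = """\
<smil>
  <head>
    <layout>
      <root-layout width="640" height="480"/>
      <region id="main" left="0" top="0" width="640" height="400"/>
      <region id="caption" left="0" top="400" width="640" height="80"/>
    </layout>
  </head>
  <body>
    <seq>
      <video src="http://example.com/intro.mp4" region="main"/>
      <par>
        <video src="http://example.com/talk.mp4" region="main"/>
        <text src="http://example.com/talk-captions.txt" region="caption"/>
      </par>
    </seq>
  </body>
</smil>
"""

# The three main time containers described in the text.
TIME_CONTAINERS = {"seq", "par", "excl"}


def outline(element, depth=0):
    """Return indented lines describing time containers and media objects."""
    lines = []
    for child in element:
        indent = "  " * depth
        if child.tag in TIME_CONTAINERS:
            lines.append(indent + child.tag)
            lines.extend(outline(child, depth + 1))
        else:
            lines.append(f"{indent}{child.tag} -> {child.get('src')}")
    return lines


root = ET.fromstring(SMIL_SOURCE)
head = root.find("head")  # layout and metadata
body = root.find("body")  # timing structure

# Layout regions declared in <head>.
regions = [region.get("id") for region in head.iter("region")]

# Nested timing structure declared in <body>.
playback_outline = outline(body)

# Media objects are referenced by URL rather than embedded.
media_urls = [el.get("src") for el in body.iter() if el.get("src")]

print("\n".join(playback_outline))
```

In this sketch the <seq> plays the introductory video first and then the <par> block, in which the second video and its caption text are presented at the same time in separate regions.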

SMIL files take either a .smi or a .smil file extension. However, SAMI files and Macintosh self-mounting images also use .smi, which makes the file type ambiguous at first glance. As a result, SMIL files commonly use the .smil extension to avoid confusion.
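One way a tool might cope with the ambiguous .smi extension is to look at the file's content instead of its name. The following is a rough heuristic sketch, not an established detection method: the function name looks_like_smil is hypothetical, and it assumes the file has already been decoded to text and that a SMIL document's first element is <smil>.

```python
import re


def looks_like_smil(text: str) -> bool:
    """Heuristic: report whether the first element tag found is <smil>.

    The pattern only matches tags that start with a letter, so an XML
    declaration (<?xml ...?>), a comment or a DOCTYPE is skipped.
    """
    match = re.search(r"<([A-Za-z][\w:.-]*)", text)
    return bool(match) and match.group(1).lower() == "smil"


print(looks_like_smil('<?xml version="1.0"?>\n<smil><body/></smil>'))
print(looks_like_smil("<SAMI><HEAD></HEAD></SAMI>"))
```

A check like this would not help with binary disk images, which would need to be ruled out before the content is decoded as text.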

RSS and Atom are web syndication methods, with RSS being the more popular of the two for podcasts. SMIL is potentially useful alongside them as a script or playlist that ties sequential pieces of multimedia together and can then be syndicated through RSS or Atom.[4][5] In addition, combining multimedia-laden .smil files with RSS or Atom syndication could make audio podcasts accessible to deaf users through Timed Text closed captions,[6] and can also turn multimedia into hypermedia that links to other audio and video content.[7]

VoiceXML can be combined with SMIL to provide a sequential reading of several pre-provided pages or slides in a voice browser, while combining SMIL with MusicXML would allow for the creation of infinitely recombinable sequences of music sheets. Combining SMIL+VoiceXML or SMIL+MusicXML with RSS or Atom could be useful for creating an audible pseudo-podcast with embedded hyperlinks, while combining SMIL+SVG with VoiceXML and/or MusicXML could be useful for creating an automatically audio-enabled vector graphics animation with embedded hyperlinks.