HTML as a document model

[Image: A signed document with a pen sitting on top of it]

When you understand things, there's no more magic.

Tim Berners-Lee, inventor of the World Wide Web

In my last post, I spoke about how web content is made with HTML and how to write your first lines of HTML code. I also mentioned that HTML is modeled after real-life documents.

In this post, I'd like to go a little deeper on HTML being a document model.

The purpose of the web

The original purpose of the web was to share scientific papers, reports and documentation among researchers over the internet[1]. To do that, its creator, Tim Berners-Lee, created HTML, a language for creating digital documents on the web.

This decision comes with a couple of downsides (for instance, it's harder to make web applications than it is to make web documents), but it also has advantages, the main one being that anyone who understands documents can learn HTML fairly quickly, as it is just a digital representation of a document.

So how do HTML documents map to real-life documents?

The document parallels

Let's go through each of the individual parts of a document and see how that connects to HTML.

The document itself

Of course, what makes a document is what is in the document, but a document's content has to be written on something. In our world today, that medium is paper.

In HTML, the "paper" that we write upon is the root element called <html>. Therefore, the content of every HTML document is wrapped in <html>:

<html></html>
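
In a complete page, the root element is usually preceded by a doctype declaration, which tells the browser to render the page in standards mode, and it often carries a lang attribute. A minimal sketch, assuming the page is written in English (the "en" value is only an illustration):

```html
<!DOCTYPE html>
<!-- lang="en" is an assumption: use the language your page is written in -->
<html lang="en"></html>
```

The other examples in this post leave these two details out to keep the focus on the structure.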

The document head

The document head usually holds information that is not part of the document's content but tells you more about the document.

A perfect example of this is a letter with the address at the top. The address is not needed to understand the letter, but it is helpful in getting the document to where it needs to be, and provides more information for those who are interested.

This parallel exists in HTML as the <head> element. As explained in the last paragraph, the <head> element contains metadata (or "data about data"). It usually appears at the (...you guessed it...) head of the document i.e. the top:

<html>
    <head></head>
</html>

When the HTML page is rendered by the browser, information in the head is not shown to the site visitor at all, since it is not really considered useful to the visitor. I see it as a literal head; no one can read the thoughts inside your head[2].

The only data that could possibly be seen from the <head> element is the <title>. Adding a title to <head> makes the title appear in the browser's tab title area:

<html>
    <head>
        <title>My Webpage</title>
    </head>
</html>
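
As a rough sketch of what else tends to live in the <head> without ever being shown on the page, here are two common <meta> elements. The description text is made up for illustration:

```html
<html>
    <head>
        <!-- Tells the browser which character encoding the page uses -->
        <meta charset="utf-8">
        <!-- A short summary that search engines may display; the text is a placeholder -->
        <meta name="description" content="A page about my favorite things.">
        <title>My Webpage</title>
    </head>
</html>
```

Of these three elements, only the <title> ends up somewhere the visitor can see it.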

The document body

Next is your document's content. What do you want to tell the world? This is where the <body> element comes in:

<html>
    <head>
        <title>My Webpage</title>
    </head>
    <body></body>
</html>

Everything in the <body> is visible to the site visitor[3].

Here is where you could divide your document into the header, the main content, and the footer:

<html>
    <head>
        <title>My Webpage</title>
    </head>
    <body>
        <header></header>
        <main></main>
        <footer></footer>
    </body>
</html>

Note that this structure depends on what the document is. For example, if there is nothing to present in the <footer>, you could leave it out.

The document header

In HTML, <header> is different from <head>. One major distinction is that <header>s are always visible (unless you explicitly hide them) while the <head> is never visible.

The <header> represents what is at the top of your document and is also relevant to the document. Examples of elements that may go inside the <header> are navigation bars (i.e. <nav>) and the top heading of the document (i.e. <h1>).

The <nav> element helps you to navigate the document; it contains links (or anchors, i.e. <a>) that let you quickly access different parts of the document (or possibly take you to another document entirely). These links also act as a table of contents, revealing the topics covered by the document.

The <h1> element is the heading of your document, giving a short title to the content. Unlike <title>, <h1> is shown on the page, usually at the top.

Updating our document[4]:

<html>
    <head>
        <title>My Webpage</title>
    </head>
    <body>
        <header>
            <nav>
                <ul>
                    <li><a href="#">Home</a></li>
                    <li><a href="#">About</a></li>
                    <li><a href="#">Contact</a></li>
                </ul>
            </nav>
            <h1>Welcome to my site</h1>
        </header>
        <main></main>
        <footer></footer>
    </body>
</html>
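
The href="#" values above are placeholders. To make a link jump to a part of the same document, one common approach is to give the target element an id and point the link at it with a # followed by that id. A minimal sketch, where the ids "about" and "contact" are names chosen only for this illustration:

```html
<nav>
    <ul>
        <!-- Each href matches the id of an element further down the page -->
        <li><a href="#about">About</a></li>
        <li><a href="#contact">Contact</a></li>
    </ul>
</nav>

<h2 id="about">About</h2>
<p>Clicking the "About" link scrolls to this heading.</p>

<h2 id="contact">Contact</h2>
<p>Clicking the "Contact" link scrolls to this heading.</p>
```

To link to another document entirely, the href would hold that document's address instead.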

The document's main content

The document's main content (i.e. <main>) comes right after the header. This contains the "meat and potatoes" of the site. In HTML, this portion contains paragraphs (i.e. <p>), subheadings (i.e. <h2>, <h3> etc.), bold text (i.e. <strong>), italicized text (i.e. <em>), highlighted text (i.e. <mark>) and any other tool used in writing to present information.

Adding this to our document:

<html>
    <head>
        <title>My Webpage</title>
    </head>
    <body>
        <header>
            <nav>
                <ul>
                    <li><a href="#">Home</a></li>
                    <li><a href="#">About</a></li>
                    <li><a href="#">Contact</a></li>
                </ul>
            </nav>
            <h1>Welcome to my site</h1>
        </header>
        <main>
            <p>This is where I start to <strong>talk</strong> about what <em>I</em> want <mark>to talk about</mark>.</p>

            <h2>More things I would like to say</h2>
            <p>This is another topic.</p>

            <h2>Even more things I would like to say</h2>
            <p>This is another topic, again.</p>
        </main>
        <footer></footer>
    </body>
</html>

The document footer

The footer (i.e. <footer>) comes last. This is where you leave the last parts of your document like links, copyright notices, privacy policies and other information that is not immediately relevant, but relevant nonetheless.

<html>
    <head>
        <title>My Webpage</title>
    </head>
    <body>
        <header>
            <nav>
                <ul>
                    <li><a href="#">Home</a></li>
                    <li><a href="#">About</a></li>
                    <li><a href="#">Contact</a></li>
                </ul>
            </nav>
            <h1>Welcome to my site</h1>
        </header>
        <main>
            <p>This is where I start to <strong>talk</strong> about what <em>I</em> want <mark>to talk about</mark>.</p>

            <h2>More things I would like to say</h2>
            <p>This is another topic.</p>

            <h2>Even more things I would like to say</h2>
            <p>This is another topic, again.</p>
        </main>
        <footer>
            <h2>Welcome again to my site</h2>
            <p>Don't let the door hit you on the way out.</p>
            <p>&copy; My site forever</p>

            <h3>Other Links</h3>
            <ul>
                <li><a href="#">Photos</a></li>
                <li><a href="#">Videos</a></li>
                <li><a href="#">Marketplace</a></li>
            </ul>
        </footer>
    </body>
</html>

Bottom line

I hope this post gave you more of an idea of what I mean by "HTML is a document model"[5]. It mimics documents in so many ways. Understanding this lowers the barrier to entry for web programming.


  1. The "internet" and the "web" are actually different. 

  2. However, if you would really like to see it, you can view it through DevTools. 

  3. Unless, of course, it's explicitly hidden. 

  4. Note that <ul> stands for unordered list and is used when presenting (...you guessed it...) unordered lists (i.e. lists with no numbering). There is also <ol> for ordered lists. Both lists contain list items (i.e. <li>). 

  5. Of course the example HTML code used in this post is oversimplified, but it drives the point home. 

If you would like to reply to or comment on this blog post, feel free to email me at efe@mmhq.me.