HTML is the markup language that we use to write web pages. It’s understood by standard web browsers, as well as dozens of other types of "user agents", including mobile phones, search engine spiders and aural browsers.

HTML consists of two types of things:

  • Tags
  • Text content

A few tags can be content of their own (like images, Flash movies, or metadata), but most HTML tags are used to apply structure to content.

Semantic HTML, or "semantically-correct HTML", is HTML where the tags used to structure content are selected and applied appropriately to the meaning of the content.

So, for example, if you want your HTML to be semantically correct:

  • A <p></p> paragraph tag pair should only be used to indicate a paragraph (which is a structural concept). It should never be used to apply space to a web page. Never, ever, use a series of <p> tags to create space!
  • The HTML tags <b></b> (for bold) and <i></i> (for italic) should not be used to add emphasis, because on their own they describe formatting rather than the meaning or structure of the content. Instead, use <strong></strong> (strong importance) and <em></em> (emphasis), which most browsers render as bold and italic by default (though they don’t have to), while adding meaning to the structure of the content. (HTML5 has since given <b> and <i> narrow meanings of their own, but they are still not the right choice for emphasis.)
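
As a rough illustration of the two rules above, here is a minimal Python sketch of a markup check that flags empty paragraphs used as spacers and presentational <b> and <i> tags. The class name SpacerAndStyleLint, the warning wording and the sample markup are all made up for this example. It relies only on the standard library's html.parser and is not a full validator.

```python
from html.parser import HTMLParser


class SpacerAndStyleLint(HTMLParser):
    """Hypothetical checker: flags empty <p> spacers and <b>/<i> tags."""

    PRESENTATIONAL = {"b", "i"}

    def __init__(self):
        super().__init__()
        self.warnings = []
        self._in_p = False
        self._p_text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._p_text = ""
        elif tag in self.PRESENTATIONAL:
            self.warnings.append(f"<{tag}> is presentational; consider <strong> or <em>")

    def handle_data(self, data):
        if self._in_p:
            self._p_text += data

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            # &nbsp; is decoded to "\xa0", which str.strip() treats as whitespace
            if not self._p_text.strip():
                self.warnings.append("empty <p> used as a spacer; use CSS margins instead")
            self._in_p = False


def lint(markup):
    checker = SpacerAndStyleLint()
    checker.feed(markup)
    checker.close()
    return checker.warnings


sample = "<p>Intro text.</p><p></p><p>&nbsp;</p><p>This is <b>important</b>.</p>"
for warning in lint(sample):
    print(warning)
```

For the sample markup this reports the two spacer paragraphs and the <b> tag. Note the simplification: a paragraph that holds only an image would also be reported as empty, so treat the output as a prompt to look, not as a verdict.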

Always separate style from content

HTML tags should never be used to apply presentation – that’s the job of CSS (Cascading Style Sheets). See http://webdesignfromscratch.com/how-html-css-js-work-together.cfm to learn more about how HTML, CSS and JavaScript fit together in web pages. (Note that ideal production practice also moves all JavaScript functions and event handlers out of the markup!)

Why semantically correct HTML is better

Writing semantic HTML brings a wide range of benefits:

  • Ease of use
  • Accessibility
  • Search Engine Optimisation
  • Repurposing

Ease of use

First of all, semantic HTML is clean HTML. It’s much easier to read and edit markup that’s not littered with extra tags and inline styling. Clean markup also saves time and money when other people have to interact with it – say, a web developer who has to implement your page template in a content management system or any other web application.

A corollary benefit is that your HTML files are also smaller, so they load more quickly.

Accessibility

Unless you’ve had to work with HTML markup through media other than your web browser, it may not be obvious that your web pages have a life outside the browser window, but they very often do. Web pages can be consumed by humans and machines in lots of different ways!

When you separate visual aspects (i.e. style) from the actual meaning of a document, you end up with a document that always means the same thing. The way it’s presented or consumed can vary. One common technique web designers use is to apply different style sheets for different media. For example, you can apply a certain stylesheet only when a document is printed to paper, another one when it’s viewed on screen, and yet another when it’s accessed by a text-to-speech aural browser.

A text-to-speech (TTS) reader also understands the tags <strong> and <em>, but it treats text marked up with those tags very differently from the way a visual browser does. Rather than changing contrast or text style, the TTS reader can adjust vocal tone or volume, which conveys the same meaning through a different medium.
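
To make that concrete, here is a small Python sketch of how a non-visual user agent might translate the same tags into spoken delivery instead of bold or italic text. The cue names ("louder", "stressed", "normal") and the SpeechCues class are invented for illustration. Real text-to-speech software differs in how, and whether, it voices emphasis.

```python
from html.parser import HTMLParser


class SpeechCues(HTMLParser):
    """Hypothetical aural renderer: maps semantic tags to delivery cues."""

    CUES = {"strong": "louder", "em": "stressed"}  # assumed mapping

    def __init__(self):
        super().__init__()
        self.segments = []         # list of (text, cue) pairs
        self._stack = ["normal"]   # the current cue is the top of the stack

    def handle_starttag(self, tag, attrs):
        if tag in self.CUES:
            self._stack.append(self.CUES[tag])

    def handle_endtag(self, tag):
        if tag in self.CUES and len(self._stack) > 1:
            self._stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.segments.append((data.strip(), self._stack[-1]))


def to_speech_cues(markup):
    renderer = SpeechCues()
    renderer.feed(markup)
    renderer.close()
    return renderer.segments


print(to_speech_cues("<p>Do <em>not</em> press the <strong>red button</strong>.</p>"))
```

The markup never says "bold" or "italic", so the same document can be rendered visually by a browser and aurally by a program like this one without any change to the HTML.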

Search Engine Optimisation

Search engine spiders and crawlers, like Googlebot, represent another genus of user agents. They also consume web page content, in an attempt to discern the meaning within.

When a crawler finds a web page, it stores its assessment of what the page is about in an indexed database, to use when matching people’s search queries. The big question is: how do search engines match search terms to known pages to create a prioritised list?

Of course, they all do it a bit differently, but one of the keys to Search Engine Optimisation is plain old common sense. If you were a search engine, how would you do it? If you work through the problems a search engine faces, a few things soon become clear, and they are often most easily expressed with the prefix “all other things being equal…”.

Let’s say you have two web pages, each with exactly the same text content (10 kilobytes).

One of the pages has an additional 5KB of HTML markup, neatly annotating the semantic meaning in the content.

The second page has 30KB of additional markup, with inline styles, lots of nested <div> tags, and decorative imagery.

Now, the more graphically intense page might look better to human visitors (might!), but if each page contains the search term “bluebottle” 5 times, which would you (pretending to be a search engine) judge to be more relevant to someone searching for “bluebottle”?

Clearly, it’s the first, more lightweight page, for a few possible reasons:

  1. The keyword density of the lightweight page is greater. It features the search term five times in a 15KB page (10KB of text plus 5KB of markup), whereas the second page features it five times in 40KB. Whatever the additional markup is for (the search engine might not be able to tell), it doesn’t seem to be about “bluebottle”.
  2. Each occurrence of the search term is likely to sit closer to the start of the document in the lightweight page than in the 40KB page, because less markup comes before it. All other things being equal, the earlier a search term appears within a document, the more likely it is that the document is about that term, or that the term is prominent in the document’s content.
  3. Assuming that the first document is neatly marked up with semantically correct HTML, it’s more likely that the search term will be placed inside a higher-value tag (such as a heading, or link) than in a more graphical page (which might use an image as a link, perhaps without a proper alt attribute).
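
The arithmetic behind the first point can be worked through in a few lines of Python. This is only a sketch of the thought experiment: the page sizes and the count of five occurrences come from the example above, while the function name and the idea of measuring density as occurrences per kilobyte are simplifying assumptions, not a description of how any real search engine ranks pages.

```python
def keyword_density(occurrences, text_kb, markup_kb):
    """Occurrences of the search term per kilobyte of total page size."""
    return occurrences / (text_kb + markup_kb)


# Both pages carry the same 10KB of text and mention "bluebottle" 5 times.
lightweight = keyword_density(occurrences=5, text_kb=10, markup_kb=5)
heavyweight = keyword_density(occurrences=5, text_kb=10, markup_kb=30)

print(f"lightweight page: {lightweight:.3f} occurrences per KB")  # 5 / 15
print(f"heavyweight page: {heavyweight:.3f} occurrences per KB")  # 5 / 40
print(f"ratio: {lightweight / heavyweight:.2f}x")
```

On this crude measure the lightweight page is a little over two and a half times as dense in the search term, even though the visible text of the two pages is identical.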

Repurposing

When your markup (content, with meaning) is separated from your styles (style sheets for different media), obviously the content can be understood more easily by all user agents. That means not only user agents you already know about, but ones you don’t yet know about (like automated crawlers that create custom RSS news feeds on a certain topic, or image- or video-specific search engines), as well as others that have not yet been invented!

The last couple of years have seen mixing and mashing content emerge as a major feature of new web sites and applications. This can happen without the knowledge of the original site owner, but in most cases this freedom of content to move around the web, adapting to various media, is beneficial to the original creator.

Often in these situations, the content taken from a web page is formatted differently on the new remixed page, which makes it all the more important to remove any style content from the markup itself. (Note that inline styles, applied directly within HTML tags, normally override styles implemented through separate stylesheets, and so they would have to be stripped out programmatically.)
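
As an idea of what that stripping involves, here is a minimal Python sketch that re-emits a fragment of markup with every style attribute removed. The InlineStyleStripper class and the sample fragment are hypothetical. The sketch is meant for simple content fragments only: it drops comments and doctype declarations and does not handle <script> or <style> elements properly.

```python
from html import escape
from html.parser import HTMLParser


class InlineStyleStripper(HTMLParser):
    """Hypothetical filter: re-emits markup without any style attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def _open_tag(self, tag, attrs, closing=">"):
        kept = []
        for name, value in attrs:
            if name == "style":
                continue  # drop the hard-coded presentation
            if value is None:
                kept.append(f" {name}")  # valueless attribute, e.g. "disabled"
            else:
                kept.append(f' {name}="{escape(value)}"')
        attr_text = "".join(kept)
        self.parts.append(f"<{tag}{attr_text}{closing}")

    def handle_starttag(self, tag, attrs):
        self._open_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._open_tag(tag, attrs, closing=" />")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(escape(data, quote=False))


def strip_inline_styles(markup):
    stripper = InlineStyleStripper()
    stripper.feed(markup)
    stripper.close()
    return "".join(stripper.parts)


styled = (
    '<p style="color: red; margin-top: 40px">Read the '
    '<a href="/guide" style="font-weight: bold">guide</a>.</p>'
)
print(strip_inline_styles(styled))
```

Markup that keeps its presentation in separate stylesheets needs none of this work, which is the practical argument for leaving inline styles out in the first place.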

Clearly, it’s easier to grab and re-use content from any source, and apply it to any medium, when it does not contain any hard-coded style information, and also when it does contain semantic markup that can help a computer program understand the meaning and structure of the content.
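
The second half of that point, that semantic markup helps a program understand structure, can be sketched in Python too. The OutlineExtractor class and both sample fragments below are made up for illustration. The program simply collects heading tags into an outline, which works only when the headings were marked up as headings.

```python
from html.parser import HTMLParser


class OutlineExtractor(HTMLParser):
    """Hypothetical reader: collects headings as (level, text) pairs."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


def extract_outline(markup):
    reader = OutlineExtractor()
    reader.feed(markup)
    reader.close()
    return reader.outline


semantic = "<h1>Bluebottles</h1><p>An overview.</p><h2>Details</h2><p>More text.</p>"
unsemantic = '<div class="big">Bluebottles</div><div>An overview.</div>'

print(extract_outline(semantic))    # headings found
print(extract_outline(unsemantic))  # nothing to find: the divs carry no meaning
```

Both fragments could be styled to look the same in a browser, but only the first gives a program anything to work with.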

Read more in our “Guide to Semantic HTML” e-book (£5.00)

Ben Hunt has published an e-book which also provides:

  • Comprehensive list of HTML tags, which to avoid and which ones to use, each with their semantically appropriate uses
  • Tips for writing better semantic HTML, including a neat DHTML trick for automatically wrapping HTML tags in new tags
  • A thorough worked example, in which Ben works through a fresh design and creates a semantically-correct HTML page, explaining the decision-making process at each step

2 Comments

  1. Mike Birch says:

    There is a case for using i and b elements, as well as strong and em:
    http://www.456bereastreet.com/archive/200711/posh_plain_old_semantic_html/

© Scratchmedia Limited, 2006-2010
Floor 3, 111 Buckingham Palace Road, London, SW1W 0WQ, UK
+44 (0)207 1600 989