HTML

Lecture Notes for CS 142
Fall 2010
John Ousterhout

  • Traditional approach to GUIs: pixel-centric (start with operations on individual pixels, build up to applications).
  • In contrast, the Web uses a document-centric approach:
    • "Display the following document"
    • No general-purpose pixel-level access
    • To enable applications, the Web provides a few special features (forms) plus the ability to modify the structure of the document on the fly using JavaScript
  • Web documents are formatted using HTML: HyperText Markup Language

Markup languages

  • Suppose this is the desired result:

    Introduction

    There are several good reasons for taking CS142: Web Applications:

      • You will learn a variety of interesting concepts.
      • It may inspire you to change the way software is developed.
      • It will give you the tools to become fabulously wealthy.

  • Start with the document's raw text:
    Introduction
    There are several good reasons for taking
    CS142: Web Applications:
    You will learn a variety of interesting concepts.
    It may inspire you to change the way software is developed.
    It will give you the tools to become fabulously wealthy.
    
  • Add markup tags, which provide additional information about the text:
    • Formatting information (<i> for italic)
    • Meaning of the text:
      • <h1> means top-level heading
      • <p> means paragraph
      • <ul><li> for unordered (bulleted) list
    • Additional content to display (e.g., <img>)
    <h2>Introduction</h2>
    <p>
    There are several good reasons for taking
    <i>CS142: Web Applications</i>:
    </p>
    <ul>
    <li>
    You will learn a variety of interesting concepts.
    </li>
    <li>
    It may inspire you to change the way software is developed.
    </li>
    <li>
    It will give you the tools to become fabulously wealthy.
    </li>
    </ul>
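  • The example above uses formatting and meaning tags but no additional content. As a minimal sketch of that third kind of markup, here is the start of the same fragment with an image added after the heading. The file name cs142-logo.png and the alt text are hypothetical, chosen only for illustration:
    ```html
    <h2>Introduction</h2>
    <p>
    <!-- Hypothetical image file; any image URL would work here -->
    <img src="cs142-logo.png" alt="CS142 logo" />
    </p>
    <p>
    There are several good reasons for taking
    <i>CS142: Web Applications</i>:
    </p>
    ```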
    

HTML

  • HTML: markup language for Web documents
  • In this class we will use XHTML, a reformulation of HTML that is more structured and regular (it is based on the XML markup language).
  • Trivial "hello world" Web page:
    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
      <head>
        <title>Hello World</title>
      </head>
      <body>
        <p>Hello world!</p>
      </body>
    </html>
    
  • Basic syntax rules for XHTML:
    • Document: hierarchical collection of elements, starting with <html>
    • Element: start tag, contents, end tag
    • Every element must have an explicit start and end (but can use <foo /> as shorthand for <foo></foo>).
    • Start tags can contain attributes:
      <img src="face.jpg" />
      <input type="text" value="94301" name="zip" />
      <div class="header">
      
    • To display a literal <, >, &, or " in a document, use entities:
      &lt;    Displays <
      &gt;    Displays >
      &amp;   Displays &
      &quot;  Displays "
      &nbsp;  Nonbreaking space
      Many other entities are defined for special characters.
    • Whitespace is not significant except in a few cases (for example, inside <pre>, where spaces and newlines are preserved).
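    • The rules above can be sketched in one small fragment: empty elements written with the <foo /> shorthand, quoted attribute values, and entities for literal special characters. The sentence text and the alt attribute are illustrative assumptions, not part of the original notes:
      ```html
      <p>
        <!-- Entities: this line displays as  What if x < 0 && y > 0?  -->
        What if x &lt; 0 &amp;&amp; y &gt; 0?<br />
        <!-- This line displays as  She said "hello".  -->
        She said &quot;hello&quot;.
      </p>
      <p>
        <!-- Empty elements closed with the shorthand form -->
        <img src="face.jpg" alt="A face" />
        <input type="text" value="94301" name="zip" />
      </p>
      ```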
  • XHTML document structure:
    • <?xml> and <!DOCTYPE> lines: indicate that this is an XHTML document, conforming to version 1.0 of the standard; use these lines verbatim in all the web pages you create for this class.
    • <html>: outermost element containing the document
    • <head>: header section containing miscellaneous things such as page title, CSS stylesheets, etc.
    • <body>: the main body of the document
    • XHTML defines numerous other tags for use inside <head> and <body>
  • How is HTML different from XHTML?
    • Supports the same tags, same features, but allows quirkier syntax
    • Can skip some end tags, such as </p> and </li>; <br> is written with no end tag at all
    • Not all attributes have to have values:
      <select multiple>
      
    • Elements can overlap:
      <p><b>first</p><p>second</b> third</p>
      
    • Early browsers tried to "do the right thing" even in the face of incorrect HTML:
      • Ignore unknown tags
      • Carry on even with obvious syntax errors such as missing <body> or </html>
      • Infer the position of missing close tags
      • Guess that some < characters are literal, as in "What if x < 0?"
    • Not obvious how to interpret some documents (and browsers differed)
    • Users came to depend on browser quirks, so browsers couldn't change
  • XHTML uses a cleaner syntax, and the header lines declare the document type so browsers don't have to support quirks
  • XHTML is required for all projects in this class
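  • As a rough sketch of what converting quirky HTML to XHTML looks like, here are the two HTML examples from above rewritten with explicit attribute values and properly nested elements. The name attribute and the <option> element are hypothetical additions, and the split of the overlapping <b> is only one plausible reading of the original (as noted above, browsers differed):
    ```html
    <!-- Quirky HTML from the notes:
           <select multiple>
           <p><b>first</p><p>second</b> third</p>
    -->

    <!-- Every attribute gets a value -->
    <select name="choices" multiple="multiple">
      <option>one</option>
    </select>

    <!-- Elements nest instead of overlapping: <b> is closed and reopened -->
    <p><b>first</b></p>
    <p><b>second</b> third</p>
    ```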

HTML Tags to Learn

  • Only a small set of HTML tags is commonly used in Web applications. Here are the tags you will need to know for this class; to learn about these tags, go to http://www.w3schools.com/html/default.asp:
    <p> New paragraph
    <br> Force a line break within the same paragraph
    <h1>, <h2>, ... Headings
    <b>, <i> Boldface and italic
    <pre> Typically used for code: displayed in a fixed-width font, and whitespace is significant (spaces and newlines are preserved)
    <title> Used in the <head> section to specify a title for the page, which will appear in the title bar for the browser window
    <img> Images
    <table>, <tr>, <td> Tables
    <ul>, <li> Unordered list (with bullets)
    <ol>, <li> Ordered list (numbered)
    <dl>, <dt>, <dd> Definition list
    <a href="..."> Hyperlink to another Web page
    <div> Used for grouping related elements, where the group occupies entire lines (there's a forced line break before and after the <div>)
    <span> Used for grouping related elements, where the group is within a single line (no forced line breaks)
    <frame>, <iframe> Rectangular region with its own separate URL; allows you to create pages that combine content from several different sources (mashups)
    <form>, <input>, <textarea>, <select>, ... Used to create forms where users can input data
    <script> Used to add JavaScript to a page for dynamic behaviors
    <link> Used in the <head> section to include CSS stylesheets
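  • To see several of these tags working together, here is a small sample page. It is only a sketch: the file names style.css and app.js, the form action /lookup, the class names, and all of the text are hypothetical placeholders:
    ```html
    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
      <head>
        <title>Tag Sampler</title>
        <!-- Hypothetical stylesheet and script files -->
        <link rel="stylesheet" type="text/css" href="style.css" />
        <script type="text/javascript" src="app.js"></script>
      </head>
      <body>
        <!-- <div> groups whole lines; <span> groups text within a line -->
        <div class="header">
          <h1>Tag Sampler</h1>
        </div>
        <p>
          A <b>bold</b> word, an <i>italic</i> word, and a
          <span class="note">span inside the line</span>.<br />
          See <a href="http://www.w3schools.com/html/default.asp">the tag reference</a>.
        </p>
        <ol>
          <li>First step</li>
          <li>Second step</li>
        </ol>
        <table>
          <tr><td>Field</td><td>Value</td></tr>
          <tr><td>zip</td><td>94301</td></tr>
        </table>
        <!-- Form controls sit inside a block element within <form> -->
        <form action="/lookup" method="get">
          <p>
            <input type="text" name="zip" value="94301" />
            <input type="submit" value="Look up" />
          </p>
        </form>
      </body>
    </html>
    ```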