Literary Studies and the Digital Library: Exercise 2, Simple XML and PHP

Matthew L. Jockers
Stanford University

Part One: a simple xhtml file

Unlike some languages, HTML can be pretty darn sloppy, and the browsers that process HTML are, therefore, pretty darn forgiving of sloppy code, which is why even badly coded web pages will show up on the web. With that said, it's still best (for a number of good reasons) to write "clean" code--even though poorly formed HTML will often parse and look OK in a browser. For this course we will need to understand "XML" (eXtensible Markup Language) and, to help us get started, we will learn a bit about "XHTML," which is a specialized type of XML. XHTML, unlike regular HTML, has to be "well-formed." We'll talk more about what "well-formed" means as the class progresses, but for now think of it the way you would think about the grammar of a sentence. In order to be well-formed, a sentence cannot be, for example, a fragment. It has to adhere to certain rules in order to be called a "sentence."

SIDE NOTE

In some respects, writing good code can be likened to writing clear prose. In addition to being grammatically correct, good prose is "elegant." One way of conveying a message is frequently more elegant than another. Samuel Taylor Coleridge spoke of good poetry as being made up of the best possible words in the best possible order; we could say the same for good code, and in writing good code, you should strive to "say" the most with as little as possible. In writing we'd call this an "economy of words."

END SIDE NOTE

In this exercise, you are going to write some text and then have it display on a web page. Simple? Well, yes, for the most part, but I'll complicate things a bit by asking you to validate your code--more on that in a minute. . .

In the xhtml file that you create, the first thing you need to do is to alert the parser (in this case a web browser) as to what sort of file it is going to be processing. You do this in what is called the "Document Prologue," which here is made up of an XML declaration and a "Doctype Declaration." Web browsers know to look for certain "reserved" characters and then act on that information. XHTML, HTML and XML all use angle brackets to enclose "tags"; the information in a tag is passed to the browser as a sort of "instruction."

So, you will begin your file with several bits of preliminary code that lay the foundation. First, you'll provide an XML declaration to notify the parser (browser) that the file is an XML document, and you'll include information in this declaration about the "character set" you are using.

SIDE NOTE

An XML declaration is not required in all XML documents; however XHTML document authors are strongly encouraged to use XML declarations in all their documents. Such a declaration is required when the character encoding of the document is other than the default UTF-8 or UTF-16 and no encoding was determined by a higher-level protocol. See--http://www.w3.org/TR/xhtml1/

END SIDE NOTE

Here is what the XML declaration looks like (notice that I have included some comments using the method described in exercise 1):

START xml CODE

                
<?xml version="1.0" encoding="UTF-8"?>
<!--Your Name, Exercise 2 last modified mm/dd/yyyy
my first xml file--> 

            

END xml CODE

Next, we add some more specific (and more important) information to the document prologue, information that says that we are going to be coding a file in XHTML:

START XHTML CODE

                
<?xml version="1.0" encoding="UTF-8"?>
<!--Your Name, Exercise 2 last modified mm/dd/yyyy
my first xml file-->
<!DOCTYPE html SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 

            

END XHTML CODE

For now we won't get bogged down in the details of what this code means. Do note though that it identifies the document type as "html" and includes a URI ("link") to a file called "xhtml1-transitional.dtd". A DTD is something you will learn more about later, but for now, think of it as a set of rules for how the corresponding XML file (or in this case xhtml file) may be constructed.

Now we have finished the prologue, and we are ready to insert our "root" element. The root element in this case is "html" or more specifically "<html>." Elements (sometimes called "tags") provide the main structure for our file, the skeleton upon which all our content is attached. They provide "markers" for the parser (browser) which translate into instructions.

To indicate that our file is an HTML file we next add the "<html>" tag. In addition to signaling to the browser that this is the start of an html document, we need to signal to the processor when the file ends. End tags in html are similar to start tags except for the use of a slash: "</html>."

If you haven't already done so, open oXygen and choose "New" from the file menu. Select "html" from the list, and you'll discover that oXygen auto-magically creates most of the code you need for this exercise. That's because oXygen is set up by default to write xhtml that conforms to the transitional dtd--cool! You'll want to add the XML declaration even though it is not required:

START XHTML CODE

                
<?xml version="1.0" encoding="UTF-8"?>
<!--Your Name, Exercise 2 last modified mm/dd/yyyy 
my first xml file-->
<!DOCTYPE html SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title></title>
    </head>
    <body>
    </body>
</html>

            

END XHTML CODE

Note that because we are writing XHTML (not simple, sloppy HTML), we are going to have what's called an "attribute" as part of our html element. The attribute "xmlns" that we have here (oXygen added it automatically) is called a "namespace" attribute, and it provides a simple method for qualifying the elements used in XML documents by associating them with namespaces identified by a URI. We'll learn more about this later, but you can have a look at This Page for more information.

Once we have the html open and close tags, we can insert a wide variety of other tags inside ("nested" within) these html tags; these nested tags give the processor (in this case the web browser) further information about what to display on the screen. There is, of course, a great deal of complex "stuff" that can occur inside these opening and closing tags; we'll keep it simple. Since we are going to write "valid" xhtml and are going to validate it using the w3c's transitional dtd, we are required to have both a "head" and a "body" tag inside of our document root, "html." So, after the html opening tag, we'll find a "head" tag and then a nested "title" element inside of "head". The contents of the title element are what get displayed in the title bar of your web browser. Next we need a "body" tag into which we will place our primary content:

START XHTML CODE

                
<?xml version="1.0" encoding="UTF-8"?>
<!--Your Name, Exercise 2 last modified mm/dd/yyyy
my first xml file-->
<!DOCTYPE html SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title> My First xHtml exercise </title>
    </head>
    <body>
        <p>Hello World</p>
    </body>
</html>

            

END XHTML CODE

Enter the code above into your editor, then save the file as "exercise2.1.html" using oXygen's "Save to URL" feature under the file menu. Save a copy via SFTP to an exercise2 subdirectory (you will need to create one) of your http://www.stanford.edu/~sunetid/cgi-bin/digHum directory. Now go to http://www.stanford.edu/~sunetid/cgi-bin/digHum/exercise2/exercise2.1.html and you should see the result in your browser window. You can then use the "view source" feature in your browser to see the "code" you wrote.

Next you might want to do something a bit more exciting, so add to the code you already wrote so that you have a second bit of text, as follows:

START XHTML CODE

                
<p>Hello World</p>
<p>My Name is Matt</p>

            

END XHTML CODE

The "p" tag works like a paragraph return. Save this revised file and refresh your browser window to see two lines saying "Hello World" and "My Name is ...".

In the world of XML we typically call these tags "elements". <html> is an element, as is <p>. We'll use the terms "tag" and "element" to mean the same thing. Elements and html tags, as noted above, can have "attributes". Attributes provide additional information that the processor may or may not know what to do with--why this is the case is a discussion for another day. For now, let's assume that our processor knows what to do. We are going to add an "align" attribute to the "p" tag so that our second line goes to the center of the browser window. Edit your text as follows and then save it again to your cgi space. Now open it in a browser (or hit "reload" if it is already open) to see how it looks:

START XHTML CODE

                
<?xml version="1.0" encoding="UTF-8"?>
<!--Your Name, Exercise 2 last modified mm/dd/yyyy
my first xml file-->
<!DOCTYPE html SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!--root element-->
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title> My First xHtml exercise </title>
    </head>
    <body>
        <p>Hello World</p>
        <p align="center">My Name is Matt</p>
    </body>
    <!--end of file-->
</html>

            

END XHTML CODE

As was noted above, html can be sloppy, but xhtml can't be. oXygen has a variety of built-in features to ensure that we write well-formed code. But you may also test your code via the w3c web service. To see if it is valid, add the following code to your file before the closing "body" tag:

START XHTML CODE

                
               

<a href="http://validator.w3.org/check?uri=referer"> <img style="border:0;width:88px;height:31px" src="http://www.w3.org/Icons/valid-xhtml10" alt="Valid XHTML 1.0!" /> </a>

END XHTML CODE

Now when you save this file to the server and call it up in your web browser, it will have a W3C xhtml icon which you can click on to check the validity of your code. You should follow that link and review the report.

That's the end of part one. You have created a simple xhtml file. To reinforce what you have learned here, you should have some fun and play with some of the other tags and attributes that are available in html. Save a copy of your exercise2.1.html file as exercise2.2.html and modify it using some other html elements. There are thousands of html tutorials available on the web--just do a Google search and you'll find all sorts. Have a look around; you might want to try http://www.cwru.edu/help/introHTML/toc.html

Part Two: php

Unlike html, which uses <!-- and --> to enclose comments, php uses the slash asterisk...asterisk slash, /* */, to enclose comments. If you just have a short comment, you can use a double slash, //, to comment out a single line; here is an example:

START php CODE

                
//this is a short comment
/*this is a much longer comment
that might go on for several lines*/

            

END php CODE

In part two we are going to write some simple php code to output a line of text to our browser. In this case, the processing is done on the server and the result is then passed to the user's browser, a process called "server-side processing." With HTML, processing occurs in the user's browser (client-based processing); the browser interprets the code. With php, the processing is done by the server (in this case leland, where ITSS has installed the php software). There are great advantages to server-side processing, as you'll soon discover.

Just as with HTML, we begin a php file by alerting the processor as to what kind of file we are sending it. There are several ways to begin a php page, and depending on the php installation on the server, one style might work better than another. You can begin the file with either of the following styles:

START php CODE

                
<?
//the short style; it ends with the closing tag on the next line
?>
<?php
//the more formal style; it ends in the same way
?>

            

END php CODE

For our purposes either style will work. So, begin by creating a new file in your oXygen editor. When the window pops up asking you what type of file to create, choose "php." Next add the php opening and closing tags to the new file. Since other processors (unlike html browsers) can be very picky about the "well-formedness" of your code, it's a good idea to always create your end tag right after you create your opening tag. If you get caught up in the fun of coding, you may forget to insert the end tag, and then your file may not parse, and you'll get an error. So, begin your file as I have done below:

START php CODE

                
<?php

?> 

            

END php CODE

Now that we have the open and close tag, we want to pass the parser some instructions that will then be output to a browser. The first thing we are going to do is create a type of container (called a "variable") where we can store some information. To create a variable in php, we use a dollar sign followed by a variable name. Variable names must begin with a letter or an underscore; after that they can contain letters, numbers and underscores, but no spaces. If you want a space, you can use an underscore instead. In the next example, I create a variable and, by using the "=" sign, put something in it; in this case, I put the words (or what is called a "string" in programming lingo) "Hello World" into the variable called "$my_first_variable." Below the variable assignment, I include a bit of commented information to explain what I have done:

START php CODE

                
<?php

/***Yourname, Exercise 2, part two, last updated mm/dd/yyyy****/

$my_first_variable = "Hello World";

/*English Translation: I have put the string 
'Hello World' into the variable titled $my_first_variable*/

?>

            

END php CODE
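
The naming rules are easier to remember once you have seen a few names that work and a few that don't. What follows is only a sketch for illustration: the variable names and the Coleridge strings are my own inventions and are not part of the exercise.

START php CODE

<?php

/*these names are all legal: each begins with a letter or an
underscore and contains no spaces*/
$title = "Kubla Khan";
$author_name = "Samuel Taylor Coleridge";
$_first_line = "In Xanadu did Kubla Khan";
$line2 = "A stately pleasure-dome decree";

/*these names would stop the file from parsing, so I have
left them inside a comment:
$2nd_line = "begins with a number";
$author name = "contains a space";
*/

?>

END php CODE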

Notice that I put a semicolon ";" at the end of the assignment. This is another php convention; it signals the end of a statement, that is, of one complete instruction. Not everything in php ends this way, and we'll learn more about that later, but for now, remember to end all your php instructions with a semicolon. Now that I have a variable and some content, I can do all sorts of "stuff" with it. For example, I can "print" it. Don't get confused by the word "print"; I don't mean print in the sense of "to the printer in the other room." Print simply means "show" it. See the last line of code below:

START php CODE

                
<?php

/***Yourname, Exercise 2, part two, last updated mm/dd/yyyy****/
$my_first_variable = "Hello World";
/*English Translation: I have put the string 
'Hello World' into the variable titled $my_first_variable*/
print $my_first_variable;

?>

            

END php CODE

If you copy the code above and save it to your exercise2 directory on the server (using the "Save to URL" option in the oXygen file menu) as "exercise2.3.php", you can then open (or "load") the file in your browser by entering
http://www.stanford.edu/~sunetid/cgi-bin/digHum/exercise2/exercise2.3.php
The result in your browser should be the words "Hello World".
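
If you use the "view source" feature in your browser on this page, you should see only the words and none of the php code: the server ran the code and passed along just the result. Here is a minimal sketch of the same idea, offered as an illustration rather than as part of the exercise. It assumes php's built-in phpversion() function, which reports the version of php installed on the server; the variable name $server_php_version is one I made up for the example.

START php CODE

<?php

/*ask the server which version of php it is running and
store the answer in a variable*/
$server_php_version = phpversion();

/*only the resulting text is passed to the browser; the
browser never sees this code*/
print $server_php_version;

?>

END php CODE

Your browser has no php software of its own, so whatever version number shows up can only have come from the server.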

To make things a bit more fun, let's create a second variable called "my_second_variable" and put a second "string" into it. We'll then instruct the processor to "print" it too. See below:

START php CODE

                
<?php

/***Yourname, Exercise 2, part two, last updated mm/dd/yyyy****/

$my_first_variable = "Hello World";
/*English Translation: I have put the string 
'Hello World' into the variable titled $my_first_variable*/

$my_second_variable = "I am writing php code.";
print $my_first_variable;
print $my_second_variable;
?>

            

END php CODE

The result should look like this: "Hello WorldI am writing php code."

Note that the two lines are run together; that's because the strings are treated literally--i.e. there is no whitespace after the word "World" or before the word "I," and so no whitespace gets printed. There are at least three ways you could fix this problem. Two of them involve adding whitespace to one or the other of the two strings we already have. What is a third possibility? How would you fix it? Figure it out and fix the file--please include a comment explaining how you did it.
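
Without giving the answer away, one way to convince yourself that strings really are treated literally is to count their characters. This is a minimal sketch, assuming php's built-in strlen() function, which returns the length of a string; the variable name $with_space is invented for the example.

START php CODE

<?php

/*"Hello World" on its own has 11 characters; here I have
typed one extra space after the word "World"*/
$with_space = "Hello World ";

/*strlen() counts every character, spaces included*/
print strlen($with_space);

?>

END php CODE

This should print 12 rather than 11, because the space at the end is as much a part of the string as any letter.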

You now know how to write simple, but valid, xhtml and simple php. Understanding that php can be used to return strings to the browser exactly as you create them, can you write some php code to return the two variables above and have them centered in the browser and have the whole thing be valid xhtml? Do this final exercise and save the file to your directory as "exercise2.4.php".

Challenge Exercise: Go to the string functions page of the php manual and poke around at what's on the menu. There are a variety of functions in php that can be used to manipulate strings. With a string already inside of a variable (e.g. $my_second_variable) you can apply a function to that variable and then print the result. See if you can figure out how to convert the contents of $my_second_variable into uppercase letters. If you succeed with that, save a new file as "exercise2.5" that performs the conversion and then feel free to have a go at using a few of the other string functions. HINT: clicking on the function names in the php manual will take you to some very useful descriptions and examples of how to use the functions.
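
To show the general pattern without spoiling the challenge, here is a minimal sketch that uses a different function from the string functions page. It assumes php's built-in strrev() function, which reverses a string; the variable name $reversed is invented for the example.

START php CODE

<?php

$my_first_variable = "Hello World";

/*apply the function to the variable and keep the result
in a new variable*/
$reversed = strrev($my_first_variable);

/*print the result; the original variable is unchanged*/
print $reversed;

?>

END php CODE

This should print "dlroW olleH". The function you need for the challenge is applied in just the same way.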