Extract
This page tests the ability to syntactically extract URLs from a page. All of the URLs are in absolute form. The HTML uses various combinations of whitespace, quoting styles, comments, and other quirks.
test dir
test dir without slash
a.html
no quotes
name tag
whitespace and name
whitespace
doofusmode
spaces
Oops!
ftp
mailto
address (noURL)
"http://www.stanford.edu/should_NOT_print2.html"
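A minimal sketch of the kind of syntactic extraction this page exercises might look like the following. It assumes Python's standard-library `html.parser`, which lowercases tag and attribute names, strips attribute quotes, and tolerates whitespace around `=`. Commented-out markup is never reported as a tag, and bare URL text outside an `href` (like the quoted string above) is ignored. The `LinkExtractor` class, the `extract_urls` helper, and the `example.com` sample markup are hypothetical illustrations, not this page's actual contents or any particular crawler's implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collect absolute href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased and values arrive unquoted,
        # so <A HREF='...'>, href="...", and href=... are all handled alike.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                url = value.strip()
                # Keep only absolute URLs, i.e. those that carry a scheme.
                if urlsplit(url).scheme:
                    self.urls.append(url)
    # Comments go to handle_comment (not overridden), so links inside them are skipped.


def extract_urls(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.urls


# Hypothetical sample covering quoting, whitespace, comments, and non-http schemes.
sample = '''
<a href="http://www.example.com/dir/">test dir</a>
<a href=http://www.example.com/a.html>no quotes</a>
<A  HREF = 'http://www.example.com/b.html' >whitespace</A>
<!-- <a href="http://www.example.com/hidden.html">commented out</a> -->
<a href="ftp://ftp.example.com/pub/">ftp</a>
<a href="mailto:someone@example.com">mailto</a>
<a name="anchor">address (noURL)</a>
"http://www.example.com/plain_text.html"
'''

print(extract_urls(sample))
```

Only `href` values with a scheme are reported, so the commented-out link, the `name`-only anchor, and the bare quoted URL are all left out.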