How to implement a SPAM control countermeasure that remains accessible using the Stanford Web Application Toolkit

From Web Services Wiki

Problem

You want to implement a SPAM control countermeasure in PHP that does not rely on CAPTCHAs, which often inconvenience and alienate legitimate users.

Solution

SWAT provides a class called StanfordForm that helps prevent several types of spambots from abusing your forms. The class includes a basic, general-purpose error handling mechanism and makes most decisions automatically for ease of use. Because this tool relies heavily on sessions, we do not recommend using it without MySQL-based sessions enabled. With the default file-based sessions, forms expire at random, which annoys and frustrates users.

Detecting SPAM

StanfordForm filters submissions using honeypots and timeouts. Honeypots are generated automatically when you call the function get_antispam_code. The code returned by this function must be output within your form. Upon submission, if any honeypot field has been modified or omitted, the form is marked as SPAM. Forms submitted too quickly or too late are also rejected, but instead of discarding the submission, StanfordForm redisplays the form so that potentially legitimate users may resend the data with a fresh form.
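
The decision rules above can be sketched as a small function. This is only an illustration of the described behavior, not StanfordForm's actual implementation: the function name classify_submission, its parameters, and its return values are all made up for this example, and treating a limit of zero as "disabled" follows the NO_TIMEOUTS description on this page.

```php
<?php
// Hypothetical helper, NOT part of StanfordForm. It only mirrors the
// decision rules described in the text.
function classify_submission(
    bool $honeypots_intact,
    int $elapsed_seconds,
    int $min_seconds = 3,       // default minimum given on this page
    int $max_seconds = 86400    // default maximum (24 hours)
): string {
    // A modified or omitted honeypot field marks the submission as SPAM.
    if (!$honeypots_intact) {
        return 'spam';
    }
    // Submitted too quickly: redisplay the form instead of discarding the data.
    if ($min_seconds > 0 && $elapsed_seconds < $min_seconds) {
        return 'too_fast';
    }
    // Submitted too late: the form is stale and is redisplayed as well.
    if ($max_seconds > 0 && $elapsed_seconds > $max_seconds) {
        return 'expired';
    }
    return 'ok';
}

// A human who needed 12 seconds and left the honeypots alone is accepted.
echo classify_submission(true, 12), "\n";
```

Only the 'spam' result leads to a silent rejection; 'too_fast' and 'expired' both lead back to a fresh form.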

Skeleton code

We suggest copying the following skeleton code and building your form and PHP response on top of it. Simply create a new StanfordForm and then fill in the given blocks of code. There are four components:

  • should_allow_submission: Handle the submission (e.g. do error checking, then save to DB, send via e-mail, etc.)
  • should_display_success: Display a success message (e.g. "Thank you!")
  • should_display_errors: Display error messages (e.g. "Invalid e-mail address" or "Form submitted too quickly")
  • should_display_form: Display the form (either for the first time or redisplay when there are errors)

The order of the four code blocks must stay exactly as shown in the code below, for two reasons: error checking performed in should_allow_submission may affect the output of should_display_errors, and errors need to be displayed above the form, not below it.

// Include StanfordForm
include_once("stanford.form.php");
 
// Create a new StanfordForm
$form = new StanfordForm();
 
if($form->should_allow_submission() == true) {
 
  // In this block, the form has been submitted and has not been detected as SPAM
  // Here, you should perform your own error and sanity checking on each of the form fields as you normally would
  // When you encounter an error (e.g. invalid e-mail address), use $form->add_error_message("Descriptive error message") and 
  //  the error message will be shown atop a redisplayed form
 
  // Example:
  // if($_POST['email'] == '') {
  //   $form->add_error_message("E-mail address is empty");
  // }
 
  // When there are no errors, allow the submission (save to DB, send via e-mail, etc)
 
  // if($form->has_errors() == false) { /* Accept submission */ }
}
 
if($form->should_display_success() == true) {
 
  // Display a success message
  // This block is called when:
  //  a) The form has been submitted and there are no errors, or
  //  b) A spambot has been detected (displaying success tricks the bot into thinking it has exploited your form)
 
}
 
if($form->should_display_errors() == true) {
 
  // Display any errors which occurred as an HTML list using the $form->display_errors() method
  // Possible errors include:  form timed out, form submitted too quickly, 
  //  or custom errors set by the programmer using $form->add_error_message("Message")
  // Alternatively, get the list using $form->get_errors() and display manually
 
  $form->display_errors();
 
}
 
if($form->should_display_form() == true) {
 
  // Display the form
  // IMPORTANT: Output the code obtained from $form->get_antispam_code() within the <form> tags, e.g.:
 
  echo "<p>Please fill out the following fields:</p>";
  echo "<form action='index.php' method='post'>";
  echo $form->get_antispam_code();
 
  // Output the rest of your form here
 
  echo "</form>";
 
}

Customizing form timeouts

By default, the minimum time to submit a form is 3 seconds and the maximum time is 24 hours. You may wish to change these values. Use the methods set_min_submission_time and set_timeout to manually set the timeouts.

// Set minimum submission time to 5 seconds
$form->set_min_submission_time(5);
 
// Set maximum submission time to 1 hour
$form->set_timeout(60*60);

StanfordForm also has a few predefined detection modes to make the process easier.

  • TOLERANT: 2 seconds min, 3 days max
  • MODERATE: 3 seconds min, 24 hours max (default)
  • STRICT: 5 seconds min, 1 hour max
  • NO_TIMEOUTS: timeouts are disabled (both set to zero)

Set any of these modes as follows:

$form->set_detection_mode(StanfordForm::STRICT);

Please note that the maximum allowed time to submit the form is limited by the lifetime of the PHP session. Once the session expires, so does the form (even when timeouts are disabled).
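
To see how the session lifetime caps the form timeout, the following sketch compares the two values. It assumes that your session handler honors PHP's session.gc_maxlifetime setting, which may not hold for a custom MySQL-based handler. The helper effective_form_lifetime is hypothetical and not part of StanfordForm.

```php
<?php
// Hypothetical helper, NOT part of StanfordForm.
// Returns the number of seconds a form can really stay valid.
function effective_form_lifetime(int $form_timeout, int $session_lifetime): int
{
    // A timeout of zero means timeouts are disabled, so only the
    // session lifetime limits the form.
    if ($form_timeout <= 0) {
        return $session_lifetime;
    }
    return min($form_timeout, $session_lifetime);
}

// session.gc_maxlifetime is PHP's setting for how long idle session data is kept.
$session_lifetime = (int) ini_get('session.gc_maxlifetime');

// A 24-hour form timeout cannot outlive the session.
echo effective_form_lifetime(24 * 60 * 60, $session_lifetime), " seconds\n";
```

If the printed value is smaller than the timeout you configured, consider raising the session lifetime or lowering your expectations for long forms.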

A full example

The following example illustrates many of the features provided by StanfordForm. It is a simple form containing two fields: first name and last name. When the form is submitted in less than 2 seconds, it is redisplayed with a message telling the user that they submitted the form too quickly. After 24 hours, the form expires and is redisplayed in the same way. If either field is left blank, an error message asks the user to correct the mistake. On success, a custom thank-you message is displayed. When any of the automatically generated invisible honeypot fields is modified or omitted, the submission is rejected, but a thank-you message is still displayed to mislead the spambot.

<?php
 
// Include StanfordForm
include_once("stanford.form.php");
 
// Create a new StanfordForm
$form = new StanfordForm();
 
// Since this is a short form (first name and last name), set the minimum submission time to 2 seconds instead of 3.
$form->set_min_submission_time(2);
 
 
if($form->should_allow_submission() == true) {
 
  // Error checking (?? '' avoids an undefined-index warning if a field is missing)
  if(($_POST['first_name'] ?? '') == '') {
    $form->add_error_message("Please enter a first name.");
  }
 
  if(($_POST['last_name'] ?? '') == '') {
    $form->add_error_message("Please enter a last name.");
  }
 
  // Save the form input
  if($form->has_errors() == false) {
    // Save to DB, send via e-mail, log to file, etc.
  }
 
}
 
if($form->should_display_success() == true) {
 
  // Display success/thank you message
  echo "<h3>Saved form input successfully!</h3>";
  echo "<p>Thanks, ", htmlspecialchars($_POST['first_name'] ?? ''), ", we hope you enjoy our site.</p>";
 
}
 
if($form->should_display_errors() == true) {
 
  // Display list of error messages
  $form->display_errors();
 
}
 
if($form->should_display_form() == true) {
 
  // Get the anti-spam code:
  $html_code = $form->get_antispam_code();
 
  // Display the form
  ?>
 
  <p>Please enter your name.</p>
  <form action="index.php" method="post">
 
  <?php /* Important: Output the anti-spam code obtained from StanfordForm */ ?>
  <?php echo $html_code; ?>
 
  <p>
    <label for="first_name">First name:</label><br/>
    <input type="text" name="first_name" id="first_name" value="<?=htmlspecialchars($_POST['first_name'] ?? '')?>" />
  </p>
 
  <p>
    <label for="last_name">Last name:</label><br/>
    <input type="text" name="last_name" id="last_name" value="<?=htmlspecialchars($_POST['last_name'] ?? '')?>" />
  </p>
 
  <input type="submit" name="submit" value="Submit" />
 
  </form>
 
  <?php
}
?>

Discussion

CAPTCHAs

A CAPTCHA is a type of Turing test designed to distinguish between humans and computers. CAPTCHAs are often used in forms to protect against spam. A common example of such a test requires that the user type the letters and numbers of a distorted image at the bottom of a form. They are effective in preventing spam, but they are not foolproof. In addition, CAPTCHAs irritate users, thus hindering user interaction, and they reduce the accessibility of a website. Screen readers used by the blind are unable to recognize image-based CAPTCHAs, and though audio-based solutions have been proposed, they are often impractical. We suggest using a more accessible solution to spam-filtering, such as the methods described in this recipe.

Honeypots

Honeypots are a sort of trap designed to trick malicious users and bots into revealing themselves. In this recipe, we are using text fields hidden by CSS which cannot be seen on most web browsers but are visible to less-sophisticated bots which do not understand stylesheets. If we assume that the fields are invisible to valid users and if we also display a warning on browsers that do not support CSS, then we may safely mark any message with a modified honeypot field as spam.
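
As a rough illustration, a honeypot field of this kind could be generated as follows. The markup produced by get_antispam_code may well differ; the function render_honeypot, the field name, the off-screen CSS, and the wording of the warning are all assumptions made for this sketch.

```php
<?php
// Hypothetical helper, NOT the markup StanfordForm actually generates.
function render_honeypot(string $field_name): string
{
    $name = htmlspecialchars($field_name, ENT_QUOTES);

    // The wrapper is moved off-screen by CSS, so visual browsers hide it.
    // The label doubles as the warning for anyone whose browser ignores CSS.
    return "<div style=\"position:absolute; left:-9999px;\">"
         . "<label for=\"$name\">Please leave this field empty:</label>"
         . "<input type=\"text\" name=\"$name\" id=\"$name\" value=\"\" />"
         . "</div>";
}

// "website_url" is a made-up name chosen to look attractive to a bot.
echo render_honeypot('website_url'), "\n";
```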

Types of spambots and countermeasures

Playback bots

A playback bot is often aided by a human who finds a form and records the names of its fields. This information is stored in a database along with the URL of the form-handling script. The spambot may then send requests to the vulnerable script without ever visiting the form, which results in an automated process that easily sends e-mail messages through the server on which the script resides.

Defeating playback bots

We can defend against playback bots by enforcing valid sessions and timeouts. When the form is loaded, a random form ID is generated. The ID is stored on the server (in a session) and on the client (in a hidden form field), and it maps the form to the rest of the data associated with it such as the current time, the time of expiration, and honeypot fields. When the form is submitted, if the server does not recognize the ID, the form has expired and the submission is rejected. The expiration time is also checked against the current time, and stale forms are not accepted. This method makes typical playback bots which simply post data to the script useless and presents difficulties to more sophisticated systems.
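
The form ID mechanism described above might look roughly like this. It is a minimal sketch, not StanfordForm's internal code: the function names are hypothetical, a plain array ($store) stands in for $_SESSION so the example runs without starting a session, and invalidating the ID after its first use is an extra assumption that the description above does not state.

```php
<?php
// Hypothetical helpers, NOT StanfordForm internals.

// Register a new form and return the random ID to embed in a hidden field.
function register_form(array &$store, int $now, int $timeout): string
{
    $form_id = bin2hex(random_bytes(16));   // 32 hexadecimal characters
    $store[$form_id] = ['created' => $now, 'expires' => $now + $timeout];
    return $form_id;
}

// Accept a submission only if the ID is known and the form is not stale.
function is_form_valid(array &$store, string $form_id, int $now): bool
{
    if (!isset($store[$form_id])) {
        return false;                        // unknown ID: replayed or expired
    }
    $expires = $store[$form_id]['expires'];
    unset($store[$form_id]);                 // assumption: each ID works once
    return $now <= $expires;
}

$session = [];                               // stand-in for $_SESSION
$id = register_form($session, time(), 60 * 60);

var_dump(is_form_valid($session, $id, time()));   // first submission
var_dump(is_form_valid($session, $id, time()));   // same ID played back
```

A playback bot that posts directly to the script has no ID the server recognizes, so its request fails the first check.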

Form-filling bots

Form-filling bots work by filling out different combinations of fields in a form until a successful response is returned.

Defeating form-filling bots

It is possible to trick form-filling bots by using honeypots. A honeypot is a text field hidden by CSS that bots can see but humans cannot. If one of the hidden text fields is modified, we may safely assume that a bot filled out the form. In the case where CSS is disabled or a blind person is using a screen reader, we must warn people not to edit the honeypots. When we do detect a bot using this method, it is important not only to discard the message, but also to display a success message so that the bot thinks it has found a winning combination. In addition to using honeypots, we may enforce a minimum time that must pass before a form can be submitted (a few seconds or more, depending on the size of the form), as bots are much faster than humans.
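
The honeypot check itself can be sketched in a few lines. The helper honeypots_intact and the field names below are hypothetical; StanfordForm performs its own check inside should_allow_submission, and its details may differ.

```php
<?php
// Hypothetical helper, NOT part of StanfordForm.
// Returns true only if every honeypot field was submitted and left empty.
function honeypots_intact(array $post, array $honeypot_names): bool
{
    foreach ($honeypot_names as $name) {
        // An omitted field is treated as being as suspicious as a modified one.
        if (!isset($post[$name]) || $post[$name] !== '') {
            return false;
        }
    }
    return true;
}

$honeypots = ['website_url', 'fax_number'];          // made-up field names
$bot_post  = [
    'first_name'  => 'Bot',
    'website_url' => 'http://example.com',           // the bot filled the trap
    'fax_number'  => '',
];

if (!honeypots_intact($bot_post, $honeypots)) {
    // Discard the data, but answer with the normal success message anyway.
    echo "Thank you!\n";
}
```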

Can I customize the output of display_errors?

Yes. Errors are displayed using the following template:

<div class='{error_css_class}'>
  <p>{error_heading}</p>
  <ul>
    <li>Error 1</li>
    <li>Error 2</li>
    <li>Error 3</li>
  </ul>
</div>

There are two functions you may use to customize the output: set_error_heading and set_error_css_class.

// Customize error output
$form->set_error_heading("Oops!  The following errors prevented your form submission from being accepted:");
$form->set_error_css_class("my_css_class");
 
// Display errors
$form->display_errors();

You may then use a stylesheet to customize colors, fonts, margins, etc.
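
If the two setters are not flexible enough, the skeleton code mentions $form->get_errors() as a way to display the list manually. The sketch below reproduces the template shown above in plain PHP. It assumes that get_errors() returns an array of plain-text strings, which this page does not confirm, and render_error_list is a hypothetical helper rather than a StanfordForm method.

```php
<?php
// Hypothetical helper that mirrors the error template; NOT part of StanfordForm.
function render_error_list(array $errors, string $heading, string $css_class): string
{
    $html  = "<div class='" . htmlspecialchars($css_class, ENT_QUOTES) . "'>\n";
    $html .= "  <p>" . htmlspecialchars($heading) . "</p>\n";
    $html .= "  <ul>\n";
    foreach ($errors as $error) {
        // Escape each message in case it echoes user input.
        $html .= "    <li>" . htmlspecialchars($error) . "</li>\n";
    }
    $html .= "  </ul>\n</div>\n";
    return $html;
}

// With StanfordForm the first argument might be $form->get_errors(),
// assuming that method returns an array of strings.
echo render_error_list(['Please enter a first name.'], 'Oops!', 'my_css_class');
```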
