Tweeter Project Introduction

Lecture Notes for CS 190
Spring 2015
John Ousterhout

What Tweeter Does

  • Twitter-like Web service: Tweeter
    • Users identified with 64-bit unique integers
    • Friendships:
      • If you are my friend, then I am your follower
    • Status updates (tweets)
      • Associated with the user who creates them
      • Visible to followers of that user
      • Each tweet has a 64-bit unique identifier
      • Tweet ids indicate order: higher id => later tweet
  • Applications access Tweeter using HTTP and JSON:
    • HTTP protocol for requests and responses
    • Requests described with URLs, query values, form data
    • Structured data returned in JSON format
    • Web services interface only: no HTML output
  • Each request described with a URL that specifies an operation and parameters, such as:
    http://localhost:8080/friendships/create?my_id=100&user_id=200
    
    Fields of URL:
    • Scheme (http:): identifies protocol used to fetch the content.
      • http: is the most common scheme and is the only one used in this class.
    • Host name (//host.company.com or //localhost): name of the machine running the desired server.
    • Server's port number (8080): allows multiple servers to run on the same machine. Normal Web servers usually run on port 80 (the default).
    • Hierarchical portion (/friendships/create): identifies a particular request, such as create a new friendship.
    • Query info (?my_id=100&user_id=200): provides parameters for the request
    • URL encoding:
      • If a query value contains any character other than A-Z, a-z, 0-9, or any of -_.~ it must be represented as %xx, where xx is the hexadecimal value of the character.
        • " " becomes %20
        • "&" becomes %26, etc.
        • Example:
          /statuses/update?status=Stuck%20in%20traffic
          
        • When extracting information from URLs, you must reverse the escaping.
  • Requests are sent to servers using HTTP: HyperText Transfer Protocol
  • Simple request-response protocol, sent using TCP/IP sockets.
  • Sample request (see slide):
    • First line contains method, URL (hierarchical portion and query), version number
      • GET method: read information from server. Should have no side effects.
      • POST method: uploads data from the browser to the server (typically form data), returns information from the server. Likely to have side effects.
      • There are several other methods defined besides these two, but they are not needed for this project.
    • Headers: name-value pairs providing various information that may be useful to the server.
    • A request can also contain data following the headers
      • GET method doesn't have any data
      • POST method may include additional parameters in body, in the same format as in URLs (my_id=100&user_id=200)
  • Sample response (see slide):
    • First line contains protocol version number, numerical status code, textual explanation.
    • Headers have same general format as for requests
    • Blank line separates headers from response data.
    • Response body will be in JSON format for this class.
  • Tweeter responses are in JSON format (JavaScript Object Notation):
    • Commonly used to transmit structured data in Web applications
    • String format that describes a JavaScript literal value:
      • Simple values: strings, numbers
      • Arrays
      • Objects (collections of named values)
    • See http://json.org/ for details
  • URLs/operations (see slide)
    • Users are identified by 64-bit integers
    • Each tweet has a 64-bit unique identifier that you must assign, in order
    • Create and delete follower relationships:
      • /friendships/create
      • /friendships/destroy
    • Query follower relationships
      • /friends/ids.json
      • /followers/ids.json
    • Create tweets:
      • /statuses/update
    • Retrieve recent tweets:
      • /statuses/user_timeline.json
      • /statuses/home_timeline.json

Implementing Tweeter

  • Most important goal: clean, simple, obvious code
  • Must be well commented
  • Write in Java (suggest using Eclipse)
  • Teams of two
  • Must build your own mechanisms for implementing the HTTP protocol and for generating JSON.
    • Do not use existing Java libraries
    • Can use basic Java classes such as Socket, InputStreamReader, and OutputStreamReader.
    • If in doubt about what existing classes you may use, ask me.
  • Durability
    • Must store user info and tweets in files
    • Must recover information from files if the server crashes and restarts
    • See project writeup for more details.
    • Tweets are never deleted.
  • Consider a few performance issues in your design:
    • Overall, application must have "reasonable" performance.
      • Must handle millions of users
      • Must handle millions of tweets per day.
    • Users may have large numbers of followers (millions), but unlikely for users to have more than a few hundred friends.
    • Important to cache frequently used information in memory:
      • Assume 100-200GB of DRAM
      • Can keep all user info in memory all the time
      • Can keep millions of tweets in memory (but not all tweets for all time)
    • Entire application runs on a single server.
  • Escaping: special characters must be encoded, such as "&" or "=" in query values or "\" or in JSON strings.
    • HTTP requests (query values, form data): URL encoding (already described).
    • JSON strings: you must escape special characters (backslash, double-quote, control characters such as newline and tab, any character with a value less than 32, and the character with value 0x3f)
      • E.g., replace "\" with "\\"
      • E.g., replace character code 1 with "\x01"