This is a very simple, quick-and-dirty attempt to use the NYT Article Search API from within R.
It retrieves the publishing dates of articles that contain a query string and plots the number of articles over time, like this:

[Figure: NYT Articles containing "Health Care Reform"]

The How-To:

There are currently a few limitations on the NYT Article Search API:

  1. Results are returned in JSON format only; XML may become available in the future. To handle JSON we use the RJSONIO package from Omegahat, which needs to be installed from source.
  2. At most 10 records are returned per query.
  3. Queries are limited to 10 per second and 5,000 per day per API key.
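These limits determine how many requests a run needs. As a rough sketch (the helper `plan_queries` and its arguments are hypothetical, not part of the API), the number of pages, their offsets, and a lower bound on the run time can be worked out before sending anything:

```r
# Hypothetical helper: plan the paged requests needed for `records` results,
# given the limits listed above (10 records per query, 10 queries/sec,
# 5,000 queries/day per key).
plan_queries <- function(records, per_query = 10, per_sec = 10, per_day = 5000) {
  n <- ceiling(records / per_query)          # one request per page of results
  if (n > per_day) stop("more queries than the daily limit allows")
  list(offsets = seq_len(n) - 1L,            # page offsets 0, 1, ..., n - 1
       n_queries = n,
       min_seconds = n / per_sec)            # lower bound on run time at the rate limit
}

plan <- plan_queries(500)
plan$n_queries    # 50
plan$min_seconds  # 5
```

For 500 records that is 50 queries and at least five seconds of waiting at the rate limit, well within the daily quota.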

Running the code below requires registering for a (free) API key.
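Each page of results is fetched with a single URL that combines the query, the page offset, the requested fields, and your key. A minimal sketch of a URL builder, assuming the v1 endpoint and parameters used in the code below (the function name `nyt_url` is hypothetical):

```r
# Hypothetical helper: build the request URL for one page of results.
# Spaces in the query are replaced by "+", as the API expects here.
nyt_url <- function(query, offset, api_key) {
  paste0("http://api.nytimes.com/svc/search/v1/article?format=json",
         "&query=", gsub(" ", "+", query, fixed = TRUE),
         "&offset=", offset,
         "&fields=date",
         "&api-key=", api_key)
}

nyt_url("health care reform", 0, "XXXX")
# "http://api.nytimes.com/svc/search/v1/article?format=json&query=health+care+reform&offset=0&fields=date&api-key=XXXX"
```

Building the URL in one place means the query can be written with ordinary spaces and the offset is the only thing that changes between requests.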

R:
```r
# RJSONIO needs to be installed from source:
# http://www.omegahat.org/RJSONIO/RJSONIO_0.2-3.tar.gz
# then load:
library(RJSONIO)

### set parameters ###
api <- "XXXX"  ###### <<< API key goes here!!

q <- "health+care+reform"  # query string, use + instead of spaces
records <- 500             # total number of records to return, note limitations above

# one offset per page of (at most) 10 records
os <- 0:(ceiling(records / 10) - 1)

# read all pages of results, one request per offset
dat <- character(0)
for (offset in os) {
  # build the request URL for this offset
  uri <- paste("http://api.nytimes.com/svc/search/v1/article?format=json&query=", q,
               "&offset=", offset, "&fields=date&api-key=", api, sep = "")
  raw.data <- readLines(uri, warn = FALSE)         # fetch the JSON response
  res <- fromJSON(paste(raw.data, collapse = ""))  # parse it
  dat <- c(dat, unlist(res$results))               # append the dates (YYYYMMDD strings)
  Sys.sleep(0.1)                                   # stay under 10 queries/sec
}

# convert the YYYYMMDD strings to Date objects
dat.conv <- as.Date(dat, format = "%Y%m%d")

# all days between the earliest and the latest article
dat.all <- seq(min(dat.conv), max(dat.conv), by = "day")

# count articles per day; days without articles get a count of 0
freqs <- as.vector(table(factor(format(dat.conv), levels = format(dat.all))))

plot(freqs, type = "l", xaxt = "n",
     main = paste("Search term(s):", gsub("+", " ", q, fixed = TRUE)),
     ylab = "# of articles", xlab = "date")
axis(1, seq_along(freqs), format(dat.all))
lines(lowess(freqs, f = .2), col = 2)  # smoothed trend in red
```
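The fiddliest step is giving days without any articles a count of zero. A small offline check, using made-up dates rather than real API output, shows one way to do it: count the dates against the full range of days, so that `table()` keeps the empty days.

```r
# Toy input: publishing dates in the API's YYYYMMDD form (made up for illustration)
dat <- c("20091102", "20091102", "20091104", "20091105", "20091105", "20091105")

dat.conv <- as.Date(dat, format = "%Y%m%d")
dat.all  <- seq(min(dat.conv), max(dat.conv), by = "day")  # 2009-11-02 .. 2009-11-05

# Using every day in the range as factor levels makes table() report 0
# for days with no articles instead of dropping them.
freqs <- as.vector(table(factor(format(dat.conv), levels = format(dat.all))))
freqs
# [1] 2 0 1 3
```

Matching on the date itself, rather than on position, keeps each count attached to the right day no matter how many days are missing.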
