Code/Data for the paper: Baselines and Bigrams: Simple, Good Sentiment and Topic Classification
Code: scr_nb_acl12.zip
Data: data_nb_acl12 (404.4MB)
Very brief overview of the idea
The paper: compareacl.pdf
Slides of my talk given at ACL 2012: pptx, pdf
Follow up:
At least a couple more ACL/EMNLP 2012 papers also published results on these datasets that are no better than a linear classifier with bigrams, which highlights the importance of establishing these baselines.
Instructions
Data, not including the huge IMDB dataset: data_nb_acl12 (108.5MB)
- Make sure liblinear is in the path, or modify the first line of master.m
- The directory structure should be your_folder/scr and your_folder/data
- That is, put the data directory alongside the code directory
- Run master to produce the results from the paper
- Results and details are logged in resultslog.txt and details.txt
- A table with all results will be printed to the screen after master completes
- The misc folder contains various data processing code and other research code that has not been fully cleaned up
- The data folder contains datasets collected by others; please cite the original sources if you work with them
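
The directory layout described in the instructions can be checked before starting master. The following is a minimal sketch in Python, not part of the released code: the helper name `missing_paths` is hypothetical, and it only assumes that `scr` and `data` should sit side by side under one folder, as stated above.

```python
from pathlib import Path


def missing_paths(root):
    """Return the expected sub-directories that are absent under root.

    Assumption: the code lives in <root>/scr and the data in <root>/data,
    as the instructions describe. Nothing inside those folders is inspected.
    """
    root = Path(root)
    expected = [root / "scr", root / "data"]
    return [str(p.relative_to(root)) for p in expected if not p.is_dir()]


if __name__ == "__main__":
    import sys

    # "your_folder" is the placeholder name used in the instructions.
    folder = sys.argv[1] if len(sys.argv) > 1 else "your_folder"
    missing = missing_paths(folder)
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Layout looks right; master can be run next.")
```

An empty list means both directories are in place. The check does not verify that liblinear is on the path, which still has to be done separately.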