# Text Categorizer

## Prerequisites
- An up-to-date Java installation (OpenJDK 16)
- sbt installed and available on your `PATH`
- An up-to-date Python 3 installation
- PRAW, the Python Reddit API Wrapper, installed with `$ pip install praw`
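
To confirm the prerequisites are in place before building, a small Python check along these lines can help. It is a sketch, not part of the project: the function name `check_prerequisites` is hypothetical, and it only checks that the `java` and `sbt` launchers are on the `PATH` and that PRAW is importable. It does not verify that Java is version 16.

```python
import importlib.util
import shutil
import sys


def check_prerequisites():
    """Report which of the listed prerequisites appear to be available (hypothetical helper)."""
    return {
        # The java and sbt launchers must be findable on PATH
        "java": shutil.which("java") is not None,
        "sbt": shutil.which("sbt") is not None,
        # This script itself must be running under Python 3
        "python3": sys.version_info.major == 3,
        # PRAW counts as installed if its module spec can be located
        "praw": importlib.util.find_spec("praw") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{name:8} {'ok' if ok else 'MISSING'}")
```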

## Downloading Reddit submissions
- Generate an API token at [https://www.reddit.com/prefs/apps](https://www.reddit.com/prefs/apps)
- Insert the token into `download.py`
- `$ python3 ./downloads.py -s -c askreddit iama all`
- The `-s` flag tells the script to fetch the submission body, while the `-c` flag tells it to download the top comments.
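
The download step could look roughly like the following sketch. It is not the contents of `download.py`: it assumes PRAW's script-app authentication with a client ID and secret, treats the positional arguments (`askreddit iama all`) as subreddit names, and reads "top comments" as the top-level comments of each top submission. The long flag names, the helpers `parse_args`, `collect_texts` and `download`, the `limit` of 25 submissions and the placeholder credentials are all hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse arguments shaped like `-s -c askreddit iama all` (assumed layout)."""
    parser = argparse.ArgumentParser(description="Download Reddit text samples (sketch).")
    parser.add_argument("-s", "--submissions", action="store_true",
                        help="fetch the submission body (selftext)")
    parser.add_argument("-c", "--comments", action="store_true",
                        help="fetch the top-level comments")
    parser.add_argument("subreddits", nargs="+", help="subreddit names, e.g. askreddit")
    return parser.parse_args(argv)


def collect_texts(submission, want_body, want_comments):
    """Return the text pieces of one PRAW submission (or any object with the same attributes)."""
    texts = []
    if want_body and submission.selftext:
        texts.append(submission.selftext)
    if want_comments:
        # replace_more(limit=0) drops "load more comments" stubs, leaving only loaded comments
        submission.comments.replace_more(limit=0)
        # Iterating the comment forest yields the top-level comments
        texts.extend(comment.body for comment in submission.comments)
    return texts


def download(args, limit=25):
    # Imported lazily so the helpers above work even without PRAW installed
    import praw

    # Hypothetical placeholders: use the credentials of your app from reddit.com/prefs/apps
    reddit = praw.Reddit(client_id="YOUR_CLIENT_ID",
                         client_secret="YOUR_CLIENT_SECRET",
                         user_agent="text-categorizer sample downloader")
    for name in args.subreddits:
        for submission in reddit.subreddit(name).top(limit=limit):
            yield name, collect_texts(submission, args.submissions, args.comments)
```

With PRAW installed and real credentials filled in, `for name, texts in download(parse_args()): print(name, len(texts))` would mirror the command above. Writing the texts to disk is left out because the sample file layout the categorizer expects is not described here.
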
## Usage
- `$ sbt "run /path/to/text-samples/ askreddit iama all"`
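
The quotes matter: sbt treats each of its own command-line arguments as a separate sbt command, so `run` and its arguments must reach sbt as a single string. If you want to drive the categorizer from Python, for example right after downloading, a hypothetical wrapper could build the command like this. The function names are illustrative, and paths containing spaces are not handled.

```python
import subprocess


def sbt_run_command(sample_dir, categories):
    """Build the argv for `sbt "run <sample_dir> <category> ..."` (hypothetical helper)."""
    # Join everything after `run` into one string so sbt sees a single `run ...` command
    run_args = " ".join([sample_dir, *categories])
    return ["sbt", f"run {run_args}"]


def run_categorizer(sample_dir, categories):
    # check=True raises CalledProcessError if sbt exits with a non-zero status
    return subprocess.run(sbt_run_command(sample_dir, categories), check=True)
```

For example, `run_categorizer("/path/to/text-samples/", ["askreddit", "iama", "all"])` is equivalent to the shell command above.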

Thanks to [https://www.wordfrequency.info/coca.asp](https://www.wordfrequency.info/coca.asp) for
providing the corpus data.