Reddit Crawler
A Python service that tracks posts on chosen subreddits as they are shared.
The Reddit Crawler is a Python application that crawls posts from a specified subreddit and stores them in a SQLite database. It also provides an API server to retrieve and filter the crawled posts.
Features
- Crawls posts from a specified subreddit using the Reddit API.
- Stores the crawled posts in a SQLite database.
- Provides an API server to retrieve and filter the stored posts.
- Includes error handling and logging.
- Includes unit tests for the individual modules.
Missions
- Ability to log in.
- Storage of crawled posts in the database.
- Real-time monitoring of posts.
- Serving of posts through the API.
- Testing of all written code.
- Dockerizing the application.
Requirements
- Python 3.9 or higher
- SQLite database
Installation
Clone the repository:
git clone <repository_url>
Change into the project directory:
cd <repository_name>
Install the requirements:
pip3 install -r requirements.txt
Create the database and apply the schema:
sqlite3 reddit_posts.db < schema.sql
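If the sqlite3 command-line tool is not available, the same step can be done from Python with the standard library. This is a minimal sketch, not part of the project: the helper name apply_schema is hypothetical, and it assumes schema.sql contains plain SQL statements.

```python
import sqlite3
from pathlib import Path


def apply_schema(db_path: str, schema_path: str) -> None:
    """Execute every statement in the schema file against the database.

    Hypothetical helper: it mirrors `sqlite3 reddit_posts.db < schema.sql`.
    The database file is created if it does not exist yet.
    """
    schema_sql = Path(schema_path).read_text(encoding="utf-8")
    conn = sqlite3.connect(db_path)
    try:
        # executescript runs multiple semicolon-separated statements.
        conn.executescript(schema_sql)
        conn.commit()
    finally:
        conn.close()


# Example call, using the file names from the command above:
# apply_schema("reddit_posts.db", "schema.sql")
```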
Usage
Initialize the SQLite database and start crawling posts:
python3 main.py
Run the API server:
python3 api_server.py
Access the API at http://localhost:5001/posts
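Once the API server is running, the endpoint can also be queried from a script. The sketch below uses only the standard library. It assumes the endpoint returns a JSON body, which this README does not state, and the function name fetch_posts is hypothetical.

```python
import json
from urllib.request import urlopen

# Endpoint from the Usage section above.
API_URL = "http://localhost:5001/posts"


def fetch_posts(url: str = API_URL, timeout: float = 5.0):
    """Request the posts endpoint and decode the response.

    Assumption: the server answers with JSON. If it does not,
    json.loads raises a ValueError.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (requires the API server to be running):
# posts = fetch_posts()
```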
Docker Build
docker build -t [APP_NAME] .
docker run -p [host_port]:[container_port] [APP_NAME]
Configuration
- Reddit API credentials:
Set the CLIENT_ID, CLIENT_SECRET, and USER_AGENT variables in reddit_crawler.py to your Reddit API credentials.
Otherwise, crawling fails with the error: Error occurred during crawling: received 401 HTTP response
- Database configuration:
To store the database elsewhere, change the reddit_posts.db file path in database/database_handler.py.
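A missing credential only surfaces as a 401 response once crawling starts, so a check before the first request can give a clearer message. The following is a sketch under stated assumptions: the variable names come from the Configuration section, the values shown are empty placeholders, and the helper missing_credentials is hypothetical rather than part of reddit_crawler.py.

```python
# Placeholder values; replace them with your own Reddit API credentials.
CLIENT_ID = ""
CLIENT_SECRET = ""
USER_AGENT = ""


def missing_credentials(settings: dict) -> list:
    """Return the names of settings that are empty or whitespace only."""
    return [name for name, value in settings.items() if not str(value).strip()]


def check_credentials() -> None:
    """Raise a descriptive error before any request is sent."""
    missing = missing_credentials({
        "CLIENT_ID": CLIENT_ID,
        "CLIENT_SECRET": CLIENT_SECRET,
        "USER_AGENT": USER_AGENT,
    })
    if missing:
        raise ValueError("Missing Reddit API settings: " + ", ".join(missing))
```

Calling check_credentials() at startup would stop the crawler early with the names of the unset variables. It cannot detect credentials that are filled in but wrong; those still produce the 401 error.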
Testing
To run the unit tests, use the following command:
pytest
Filtering and Sorting
Filter for posts containing a specific keyword in the title:
/posts?keyword=python
To filter posts within a specific date range, include the start_date and end_date parameters:
/posts?start_date=2023-01-01&end_date=2023-06-30
Filter for posts with a minimum or maximum number of upvotes:
/posts?min_upvotes=10&max_upvotes=100
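The query strings above can be assembled programmatically rather than by hand. This sketch uses the parameter names shown in the examples. The helper build_posts_url is hypothetical, and whether the server accepts several filters in one request is an assumption that this README does not confirm.

```python
from urllib.parse import urlencode

# Endpoint from the Usage section.
BASE_URL = "http://localhost:5001/posts"


def build_posts_url(base_url: str = BASE_URL, **filters) -> str:
    """Build a /posts URL from keyword arguments.

    Filters set to None are skipped, and values are URL-encoded.
    """
    params = {name: value for name, value in filters.items() if value is not None}
    if not params:
        return base_url
    return f"{base_url}?{urlencode(params)}"


print(build_posts_url(keyword="python"))
print(build_posts_url(start_date="2023-01-01", end_date="2023-06-30"))
print(build_posts_url(min_upvotes=10, max_upvotes=100))
```

Using urlencode keeps keywords containing spaces or special characters valid in the URL.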