Reddit Crawler

The Reddit Crawler is a Python application that crawls posts from a specified subreddit and stores them in a SQLite database. It also provides an API server to retrieve and filter the crawled posts.

Features

  • Crawls posts from a specified subreddit using the Reddit API.
  • Stores the crawled posts in a SQLite database.
  • Provides an API server to retrieve and filter the stored posts.
  • Includes error handling and logging.
  • Includes unit tests for its modules.

Goals

  • Ability to log in.
  • Storage of crawled posts in the database.
  • Real-time monitoring of posts.
  • Serving of posts through the API.
  • Testing of all written code.
  • Dockerizing the application.

Requirements

  • Python 3.9 or higher
  • SQLite database

Installation

  1. Clone the repository:

    git clone <repository_url>
    
  2. Change into the project directory:

    cd <repository_name>
    
  3. Install the dependencies:

    pip3 install -r requirements.txt
    

Create the database and apply the schema by running the following command:

sqlite3 reddit_posts.db < schema.sql
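If the sqlite3 command-line shell is not installed, the same schema can be applied with Python's standard library. A minimal sketch, assuming schema.sql sits in the current directory; apply_schema is a hypothetical helper, not part of the project:

```python
import sqlite3
from pathlib import Path


def apply_schema(db_path: str, schema_path: str = "schema.sql") -> None:
    """Run every statement in schema_path against db_path."""
    schema_sql = Path(schema_path).read_text(encoding="utf-8")
    conn = sqlite3.connect(db_path)  # creates the database file if it does not exist
    try:
        conn.executescript(schema_sql)  # runs all semicolon-separated statements
        conn.commit()
    finally:
        conn.close()
```

For example, apply_schema("reddit_posts.db") should have the same effect as the shell command above.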

Usage

Initialize the SQLite database and start crawling posts:

python3 main.py
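The storage step of a crawl can be pictured roughly as follows. This is a sketch only: the posts columns and the save_posts helper are hypothetical stand-ins for whatever schema.sql and the crawler actually define. Using the Reddit post ID as the primary key together with INSERT OR IGNORE lets repeated crawls skip posts that are already stored:

```python
import sqlite3
from typing import Iterable

# Hypothetical table layout; the real columns are defined in schema.sql.
CREATE_POSTS = """
CREATE TABLE IF NOT EXISTS posts (
    id          TEXT PRIMARY KEY,   -- Reddit post ID, e.g. "abc123"
    title       TEXT NOT NULL,
    upvotes     INTEGER NOT NULL,
    created_utc INTEGER NOT NULL    -- Unix timestamp in seconds
)
"""


def save_posts(conn: sqlite3.Connection, posts: Iterable[dict]) -> int:
    """Insert posts, skipping any whose id is already stored. Returns rows inserted."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO posts (id, title, upvotes, created_utc) "
        "VALUES (:id, :title, :upvotes, :created_utc)",
        posts,
    )
    conn.commit()
    # Ignored duplicates do not count as changes, so this is the number of new rows.
    return conn.total_changes - before
```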

Run the API server:

python3 api_server.py

Access the API at http://localhost:5001/posts
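The endpoint can also be queried from Python. A minimal client sketch, assuming the server responds with JSON; build_posts_url and fetch_posts are hypothetical helper names, and the filter names are the query parameters described later in this README:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "http://localhost:5001/posts"


def build_posts_url(base_url: str = API_URL, **filters) -> str:
    """Append every filter that is not None as a URL-encoded query string."""
    params = {k: v for k, v in filters.items() if v is not None}
    return f"{base_url}?{urlencode(params)}" if params else base_url


def fetch_posts(**filters):
    """GET the posts endpoint and decode the JSON body (assumes a JSON response)."""
    with urlopen(build_posts_url(**filters), timeout=10) as resp:
        return json.load(resp)
```

For example, fetch_posts(keyword="python", min_upvotes=10) requests http://localhost:5001/posts?keyword=python&min_upvotes=10.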

Docker Build

docker build -t [APP_NAME] .
docker run -p [host_port]:[container_port] [APP_NAME]

Configuration

  • Reddit API credentials:

Set the CLIENT_ID, CLIENT_SECRET, and USER_AGENT variables in reddit_crawler.py to your Reddit API credentials.

Otherwise, crawling fails with the following error: Error occurred during crawling: received 401 HTTP response
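A 401 means Reddit rejected the credentials. Whatever client library the crawler uses, Reddit exchanges app credentials for an OAuth token via a POST to https://www.reddit.com/api/v1/access_token, authenticated with HTTP Basic auth (client ID as username, client secret as password) and sent with a descriptive User-Agent. The sketch below only builds that request, so its contents can be inspected without contacting Reddit. build_token_request is a hypothetical helper, and choosing the password grant when a username and password are supplied is an assumption based on the login goal listed above:

```python
import base64
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Reddit's OAuth2 token endpoint.
TOKEN_URL = "https://www.reddit.com/api/v1/access_token"


def build_token_request(
    client_id: str,
    client_secret: str,
    user_agent: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> Request:
    """Build, but do not send, the POST that exchanges app credentials for a token."""
    if username and password:
        # Password grant: act as a specific Reddit account (script-type apps).
        form = {"grant_type": "password", "username": username, "password": password}
    else:
        # Application-only access without a user context.
        form = {"grant_type": "client_credentials"}
    # HTTP Basic auth header value: base64("CLIENT_ID:CLIENT_SECRET").
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode("ascii")
    return Request(
        TOKEN_URL,
        data=urlencode(form).encode(),
        headers={"Authorization": f"Basic {basic}", "User-Agent": user_agent},
        method="POST",
    )
```

Sending this request with a wrong client ID or secret typically produces the 401 shown above.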

  • Database configuration:

To store the database elsewhere, change the reddit_posts.db file path in database/database_handler.py.
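If editing the source is inconvenient (for example inside a Docker container), the path could instead be read from an environment variable. A sketch only: REDDIT_DB_PATH and get_db_path are hypothetical names that the project does not currently use:

```python
import os
from pathlib import Path

DEFAULT_DB_PATH = "reddit_posts.db"


def get_db_path() -> Path:
    # REDDIT_DB_PATH is a hypothetical variable; fall back to the default file name.
    return Path(os.environ.get("REDDIT_DB_PATH", DEFAULT_DB_PATH))
```

With Docker, the variable could then be passed as docker run -e REDDIT_DB_PATH=/data/reddit_posts.db ...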

Testing

To run the unit tests, use the following command:

pytest
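pytest collects files named test_*.py (or *_test.py) and runs every function whose name starts with test_. Below is a sketch of storage tests in that style, assuming a hypothetical posts table. Each test builds its own in-memory database, so tests share no state and never touch reddit_posts.db:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Fresh in-memory database with a hypothetical posts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id TEXT PRIMARY KEY, title TEXT, upvotes INTEGER)")
    return conn


def test_duplicate_post_is_ignored():
    conn = make_db()
    row = ("abc123", "Learning Python", 42)
    for _ in range(2):  # simulate crawling the same post twice
        conn.execute("INSERT OR IGNORE INTO posts VALUES (?, ?, ?)", row)
    assert conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0] == 1


def test_keyword_filter_matches_title():
    conn = make_db()
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?)",
        [("a", "Python tips", 10), ("b", "Rust news", 20)],
    )
    # LIKE is case-insensitive for ASCII, so "python" matches "Python tips".
    rows = conn.execute("SELECT id FROM posts WHERE title LIKE ?", ("%python%",)).fetchall()
    assert rows == [("a",)]
```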

Filtering and Sorting

To filter posts whose title contains a specific keyword:

/posts?keyword=python

To filter posts within a specific date range, include the start_date and end_date parameters:

/posts?start_date=2023-01-01&end_date=2023-06-30

To filter posts by a minimum or maximum number of upvotes:

/posts?min_upvotes=10&max_upvotes=100
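On the server side, these parameters can be combined into a single parameterized SQL query. A sketch, assuming a hypothetical posts table with title, upvotes, and created_utc (Unix seconds) columns. build_posts_query is not the project's actual function, and newest-first ordering is an assumption:

```python
from typing import Any


def build_posts_query(params: dict[str, str]) -> tuple[str, list[Any]]:
    """Translate /posts query parameters into SQL plus bound arguments."""
    clauses: list[str] = []
    args: list[Any] = []
    if params.get("keyword"):
        # SQLite's LIKE is case-insensitive for ASCII letters by default.
        clauses.append("title LIKE ?")
        args.append(f"%{params['keyword']}%")
    if params.get("start_date"):
        # Convert the stored Unix timestamp to 'YYYY-MM-DD' before comparing.
        clauses.append("date(created_utc, 'unixepoch') >= ?")
        args.append(params["start_date"])
    if params.get("end_date"):
        clauses.append("date(created_utc, 'unixepoch') <= ?")  # inclusive end date
        args.append(params["end_date"])
    if params.get("min_upvotes"):
        clauses.append("upvotes >= ?")
        args.append(int(params["min_upvotes"]))
    if params.get("max_upvotes"):
        clauses.append("upvotes <= ?")
        args.append(int(params["max_upvotes"]))
    sql = "SELECT id, title, upvotes, created_utc FROM posts"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY created_utc DESC"  # newest first (assumed ordering)
    return sql, args
```

Parameters can be combined, e.g. /posts?keyword=python&min_upvotes=10. Binding values through ? placeholders, rather than formatting them into the SQL string, keeps user input from being interpreted as SQL.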