Reddit Crawler

The Reddit Crawler is a Python application that crawls posts from a specified subreddit and stores them in a SQLite database. It also provides an API server to retrieve and filter the crawled posts.

Features


  • Crawls posts from a specified subreddit using the Reddit API.
  • Stores the crawled posts in a SQLite database.
  • Provides an API server to retrieve and filter the stored posts.
  • Supports error handling and logging mechanisms.
  • Implements unit tests for different modules.


  • Ability to log in.
  • Storage of crawled posts in the database.
  • Real-time monitoring of posts.
  • Serving of posts through the API.
  • Testing of all written code.
  • Dockerizing the application.

Requirements


  • Python 3.9 or higher
  • SQLite database

Installation


  1. Clone the repository:

    git clone <repository_url>
  2. Change into the project directory:

    cd <repository_name>
  3. Install the requirements:

    pip3 install -r requirements.txt

Run the following command to create the database and apply the schema:

sqlite3 reddit_posts.db < schema.sql
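Equivalently, the schema can be applied from Python with the standard sqlite3 module. The column list below is an illustrative guess at what schema.sql contains — the real file is authoritative:

```python
import sqlite3

# Illustrative schema only -- the actual definition lives in schema.sql.
SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    author TEXT,
    upvotes INTEGER DEFAULT 0,
    created_utc REAL
);
"""

def init_db(path="reddit_posts.db"):
    """Create the database file (if missing) and apply the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.commit()
    return conn

conn = init_db(":memory:")  # in-memory here for demonstration; use the file path in practice
```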


Usage

Initialize the SQLite database and start crawling posts:
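This README does not name the crawler's entry point; assuming it is crawler.py (check the repository), the invocation would be:

```shell
python3 crawler.py
```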


Run the API server:
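Again assuming a conventional file name (api_server.py is a guess — use whatever script in the repository starts the server):

```shell
python3 api_server.py
```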


Access the API at http://localhost:5001/posts
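With the server running, the endpoint can be queried from the command line, e.g. with curl:

```shell
curl http://localhost:5001/posts
```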

Docker Build

docker build -t [APP_NAME] .
docker run -p [host_port]:[container_port] [APP_NAME]
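For example, with the placeholders filled in (the image name here is an arbitrary choice; the container port must match whatever port the API server listens on — 5001 above):

```shell
docker build -t reddit-crawler .
docker run -p 5001:5001 reddit-crawler
```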


Configuration

  • Reddit API credentials:

Set the CLIENT_ID, CLIENT_SECRET, and USER_AGENT variables to your Reddit API credentials.

Otherwise, crawling will fail with an error: Error occurred during crawling: received 401 HTTP response
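The variable names come from the section above; the module name and placeholder values below are illustrative — substitute the credentials of the Reddit app you registered:

```python
# config.py (hypothetical module name) -- replace the placeholder strings
# with the credentials of your registered Reddit API application.
CLIENT_ID = "your_client_id"
CLIENT_SECRET = "your_client_secret"
USER_AGENT = "reddit-crawler/0.1 by u/your_username"
```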

  • Database configuration:

Modify the reddit_posts.db file path in database/ if desired.


Testing

To run the unit tests, use the following command:
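Assuming the tests follow the standard unittest layout (the repository may use pytest instead — adjust accordingly):

```shell
python3 -m unittest discover
```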


Filtering and Sorting

Filter for posts containing a specific keyword in the title:


To restrict results to a specific date range, include the start_date and end_date parameters:

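For example (the YYYY-MM-DD date format is an assumption — the server may expect a different format):

```shell
curl "http://localhost:5001/posts?start_date=2024-01-01&end_date=2024-01-31"
```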

To filter posts by a minimum or maximum number of upvotes:
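Server-side, such filters typically translate into a parameterized SQL query. A minimal sketch against the posts table (assuming it has title and upvotes columns; the function and parameter names are illustrative, not the project's actual API):

```python
import sqlite3

def filter_posts(conn, keyword=None, min_upvotes=None):
    """Return posts matching the given filters, most-upvoted first."""
    query = "SELECT title, upvotes FROM posts WHERE 1=1"
    params = []
    if keyword is not None:
        query += " AND title LIKE ?"   # substring match on the title
        params.append(f"%{keyword}%")
    if min_upvotes is not None:
        query += " AND upvotes >= ?"
        params.append(min_upvotes)
    query += " ORDER BY upvotes DESC"
    return conn.execute(query, params).fetchall()

# Demo against an in-memory database with a toy two-column schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (title TEXT, upvotes INTEGER)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [("Intro to Python", 50), ("Rust tips", 10), ("Python tricks", 120)],
)
print(filter_posts(conn, keyword="Python", min_upvotes=20))
# → [('Python tricks', 120), ('Intro to Python', 50)]
```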