Reddit Crawler

A Python service that allows instant tracking of posts shared on desired subreddits.

https://gitlab.com/c4pt-mqs/reddit-crawler

The Reddit Crawler is a Python application that crawls posts from a specified subreddit and stores them in a SQLite database. It also provides an API server to retrieve and filter the crawled posts.

Features

Crawls posts from a specified subreddit using the Reddit API.
Stores the crawled posts in a SQLite database.
Provides an API server to retrieve and filter the stored posts.
Includes error handling and logging.
Implements unit tests for different modules.

Missions

Ability to login.
Storage of crawled posts in the database.
Real-time monitoring of posts.
Serving of posts through the API.
Testing of all written code.
Dockerizing the application.

Requirements

Python 3.9 or higher
SQLite database

Installation

Clone the repository:
```
git clone <repository_url>
```
Change into the repository directory:
```
cd <repository_name>
```
Install the requirements:
```
pip3 install -r requirements.txt
```

Create the database and apply the schema by running the following command:

sqlite3 reddit_posts.db < schema.sql
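
The contents of schema.sql are not shown here. As a rough illustration only, the same step could be done from Python with the standard sqlite3 module. The posts table and its columns below are assumptions made for this sketch, not the project's actual schema, and apply_schema is a hypothetical helper.

```python
import sqlite3

# Hypothetical schema: the real schema.sql may define different tables and columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    author TEXT,
    upvotes INTEGER DEFAULT 0,
    created_at TEXT NOT NULL
);
"""


def apply_schema(db_path: str = "reddit_posts.db") -> sqlite3.Connection:
    """Create the database file if it does not exist and apply the schema."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)  # runs every statement in the script
    conn.commit()
    return conn
```

Because the statement uses CREATE TABLE IF NOT EXISTS, running the sketch twice against the same file leaves existing data in place.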

Usage

Initialize the SQLite database and start crawling posts:

python3 main.py

Run the API server:

python3 api_server.py

Access the API at http://localhost:5001/posts
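
A minimal client sketch for calling the endpoint from Python, using only the standard library. It assumes the server responds with JSON, which this README does not state; build_posts_url and fetch_posts are hypothetical helper names, and the query parameters are the ones listed under Filter and Sorting.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5001/posts"


def build_posts_url(**filters) -> str:
    """Return the /posts URL with every non-None filter as a query parameter."""
    params = {key: value for key, value in filters.items() if value is not None}
    return f"{BASE_URL}?{urlencode(params)}" if params else BASE_URL


def fetch_posts(**filters):
    """Fetch posts from a running API server (assumes a JSON response body)."""
    with urlopen(build_posts_url(**filters), timeout=10) as response:
        return json.load(response)
```

With the API server running, fetch_posts(keyword="python", min_upvotes=10) would request /posts?keyword=python&min_upvotes=10.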

Docker Build

docker build -t [APP_NAME] .
docker run -p [host_port]:[container_port] [APP_NAME]

Configuration

Reddit API credentials:

Set the CLIENT_ID, CLIENT_SECRET, and USER_AGENT variables in reddit_crawler.py to your Reddit API credentials.

If the credentials are missing or invalid, crawling fails with the error: Error occurred during crawling: received 401 HTTP response
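
One way to catch missing credentials before the crawler contacts Reddit is to validate them up front. The sketch below reads the three values from environment variables instead of hard-coding them in reddit_crawler.py; that is an assumption for illustration and not how the project is described above, and load_credentials is a hypothetical helper.

```python
import os

REQUIRED = ("CLIENT_ID", "CLIENT_SECRET", "USER_AGENT")


def load_credentials(env=os.environ) -> dict:
    """Return the Reddit API credentials, or raise if any value is unset or empty."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Reddit API credentials: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}
```

A check like this only confirms that the values are present. Credentials that are present but wrong would still produce the 401 response.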

Database configuration:

Modify the reddit_posts.db file path in database/database_handler.py if desired.

Testing

To run the unit tests, use the following command:

pytest

Filtering and Sorting

Filter for posts containing a specific keyword in the title:

/posts?keyword=python

To return only posts within a specific date range, include the start_date and end_date parameters:

/posts?start_date=2023-01-01&end_date=2023-06-30

Filter posts by a minimum or maximum number of upvotes:

/posts?min_upvotes=10&max_upvotes=100
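
To illustrate how these query parameters might map onto the SQLite database, here is a hedged sketch that builds a parameterized WHERE clause. The posts table, its column names (title, upvotes, created_at), and the helpers build_query and query_posts are all assumptions for this example; the project's actual API server may work differently.

```python
import sqlite3


def build_query(keyword=None, start_date=None, end_date=None,
                min_upvotes=None, max_upvotes=None):
    """Translate optional filters into a SQL string and its bound parameters."""
    clauses, params = [], []
    if keyword is not None:
        clauses.append("title LIKE ?")        # substring match on the title
        params.append(f"%{keyword}%")
    if start_date is not None:
        clauses.append("created_at >= ?")     # assumes YYYY-MM-DD strings
        params.append(start_date)
    if end_date is not None:
        clauses.append("created_at <= ?")
        params.append(end_date)
    if min_upvotes is not None:
        clauses.append("upvotes >= ?")
        params.append(int(min_upvotes))
    if max_upvotes is not None:
        clauses.append("upvotes <= ?")
        params.append(int(max_upvotes))
    sql = "SELECT id, title, upvotes, created_at FROM posts"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def query_posts(conn: sqlite3.Connection, **filters):
    """Run the filtered query and return all matching rows."""
    sql, params = build_query(**filters)
    return conn.execute(sql, params).fetchall()
```

Binding values with ? placeholders rather than formatting them into the SQL string keeps user-supplied query parameters from being interpreted as SQL.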
