Phishder

This project sets up a Dockerized FastAPI application that fetches phishing feed data from various sources (Phishtank, OpenPhish, and PhishStats) and stores it in a PostgreSQL database. The application provides endpoints to access the fetched feed data.

Features

This project provides the following features:

  • Dockerized FastAPI Application: The application is containerized with Docker, so it can be deployed the same way across environments.
  • Phishing Feed Data Fetching: The application fetches phishing feed data from three sources: Phishtank, OpenPhish, and PhishStats.
  • PostgreSQL Database: The fetched feed data is stored in a PostgreSQL database for persistence and retrieval.
  • API Endpoints: The application provides API endpoints to access the fetched feed data, one each for Phishtank, OpenPhish, and PhishStats.
  • Asynchronous Crawling: The application uses asynchronous processing to crawl and fetch data from the different sources, improving performance and responsiveness.
  • Configuration Management: The application's settings are managed in a settings.toml file, so behavior can be changed without editing code.
  • Automated Data Updates: The application automatically updates the feed data at a regular interval (every 10 seconds) to keep it current.
  • Data Schema Validation: The application uses Pydantic schemas to validate and enforce the structure of the fetched feed data.
  • Threading for Data Crawling: The feed crawling process runs in a background thread while the FastAPI server handles incoming requests.
  • Modular Structure: The application's components are separated into distinct directories for better organization and maintainability.
  • FastAPI Framework: The application is built on FastAPI, which provides asynchronous web APIs with automatic validation and documentation support.
  • Requests to External APIs: The application uses the requests library to make HTTP requests to the external feed sources.
  • BeautifulSoup for HTML Parsing: For the OpenPhish data source, the application uses BeautifulSoup to parse the HTML response and extract the feed data.
  • Error Handling: The application handles failures in external API requests and database operations gracefully.
  • Unit Testing: The project includes a tests directory with unit tests that verify critical components.

Missions

  • Use Poetry for dependency management in the project.
  • Utilize Docker Compose to manage the application and PostgreSQL database as separate containers.
  • Implement a FastAPI application with endpoints for CRUD (Create, Read, Update, Delete) operations.
  • Utilize SQLAlchemy as the ORM (Object-Relational Mapping) tool for interacting with the PostgreSQL database.
  • Avoid using while True loops in the application.
  • Refrain from hardcoding any credentials in the project.
  • Store global variables and configurations in a separate config.py file or consider using dynaconf for environment management.
  • Implement typing in the project to enhance code readability and maintainability.

Requirements

Technologies and tools to be used

Installation & Usage

  1. Clone the repository:

    git clone https://gitlab.com/c4pt-mqs/phishder.git
    
  2. Change into the project directory:

    cd phishder
    
  3. Install the requirements (on Poetry 1.2 or later, --without dev replaces the deprecated --no-dev flag):

    poetry install --no-dev
    
  4. Run the app:

    python3 main.py
    

Docker Build

    docker-compose up --build -d

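Once the containers are up and running, you can access the FastAPI application at http://localhost:1122. The address 0.0.0.0 is a bind address (listen on all interfaces) and may not be reachable from a browser on every platform.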

API Endpoints

The following endpoints are available:

  1. GET /
    • Description: Root endpoint to check if the server is running.
    • Response: {"message": "Server is running!"}
  2. GET /phishtank_feed
    • Description: Fetches Phishtank feed data.
    • Query Parameters:
      • skip (optional): The number of records to skip (default is 0).
      • limit (optional): The maximum number of records to return (default is 10).
    • Response: An array of objects containing Phishtank feed data.
  3. GET /openphish_feed
    • Description: Fetches OpenPhish feed data.
    • Query Parameters:
      • skip (optional): The number of records to skip (default is 0).
      • limit (optional): The maximum number of records to return (default is 10).
    • Response: An array of objects containing OpenPhish feed data.
  4. GET /phishstats_feed
    • Description: Fetches PhishStats feed data.
    • Query Parameters:
      • skip (optional): The number of records to skip (default is 0).
      • limit (optional): The maximum number of records to return (default is 10).
    • Response: An array of objects containing PhishStats feed data.
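
As an illustration of how the skip and limit parameters combine, here is a small client sketch using only the standard library. The helper names build_feed_url and fetch_feed are hypothetical, the base URL assumes the default port from the Docker section, and the non-negative check is this sketch's own choice rather than documented server behavior.

```python
import json
from typing import Any
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the server is reachable locally on the port from the Docker section.
BASE_URL = "http://localhost:1122"


def build_feed_url(feed: str, skip: int = 0, limit: int = 10,
                   base_url: str = BASE_URL) -> str:
    """Return the URL for a feed endpoint, e.g. feed='phishtank_feed'."""
    if skip < 0 or limit < 0:
        raise ValueError("skip and limit must be non-negative")
    query = urlencode({"skip": skip, "limit": limit})
    return f"{base_url}/{feed}?{query}"


def fetch_feed(feed: str, skip: int = 0, limit: int = 10) -> list[dict[str, Any]]:
    """GET a feed endpoint and decode the JSON array it returns."""
    with urlopen(build_feed_url(feed, skip, limit), timeout=10) as response:
        return json.load(response)


# Usage (requires a running server, so it is not executed here):
#   second_page = fetch_feed("openphish_feed", skip=10, limit=10)
```

Paging through a feed then amounts to increasing skip by limit on each call until an empty array comes back.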

Configuration

The application's settings are defined in the settings.toml file. You can modify this file to change the following configurations:

  • HOST: The host on which the FastAPI server runs.
  • PORT: The port on which the FastAPI server listens.
  • DATABASE_URL: The URL of the PostgreSQL database used to store feed data.
  • OPENPHISH_URL: The URL of the OpenPhish feed data source.
  • PHISHSTATS_URL: The URL of the PhishStats feed data source.
  • PHISHTANK_URL: The URL of the Phishtank feed data source.

Testing

Run the unit tests from the project root:

    pytest tests/

Data Sources

This application fetches phishing feed data from the following sources: