Phishder
Phishder fetches phishing feeds from several sources and stores them in a database.
This project sets up a Dockerized FastAPI application that fetches phishing feed data from various sources (Phishtank, OpenPhish, and PhishStats) and stores it in a PostgreSQL database. The application provides endpoints to access the fetched feed data.
Table of Contents
- Phishder
- Table of Contents
- Features
- Missions
- Requirements
- Installation & Usage
- Docker Build
- API Endpoints
- Configuration
- Testing
- Data Sources
Features
This project provides the following features:
- Dockerized FastAPI Application: The application is containerized with Docker, so it can be deployed and run consistently across environments.
- Phishing Feed Data Fetching: The application fetches phishing feed data from three sources: Phishtank, OpenPhish, and PhishStats.
- PostgreSQL Database: Fetched feed data is stored in a PostgreSQL database, so it persists and can be queried easily.
- API Endpoints: The application exposes one read endpoint per source (Phishtank, OpenPhish, and PhishStats) to access the stored feed data.
- Asynchronous Crawling: The application uses asynchronous processing to crawl the different sources, improving performance and responsiveness.
- Configuration Management: Settings are kept in a settings.toml file, so the application can be configured without code changes.
- Automated Data Updates: The feed data is refreshed automatically every 10 seconds, keeping it up to date.
- Data Schema Validation: Pydantic schemas validate and enforce the structure of the fetched feed data.
- Threading for Data Crawling: Feed crawling runs in a background thread while the FastAPI server handles incoming requests.
- Modular Structure: The project's components are split into separate directories for better organization and maintainability.
- FastAPI Framework: The application is built on FastAPI, which provides high-performance asynchronous web APIs with automatic request validation and interactive documentation.
- Requests to External APIs: The application uses the requests library to make HTTP requests to the external feed sources.
- BeautifulSoup for HTML Parsing: For OpenPhish, the application parses the HTML response with BeautifulSoup to extract the feed entries.
- Error Handling: The application handles errors from external API requests and database operations gracefully.
- Unit Testing: The tests directory contains unit tests that verify critical components.
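The "Automated Data Updates" and "Threading for Data Crawling" features, together with the mission of avoiding `while True` loops, can be reconciled with a self-rescheduling timer. The sketch below is a minimal illustration under that assumption, not the project's actual code: the `PeriodicTask` class is hypothetical, and each run simply schedules the next one with the standard library's `threading.Timer`.

```python
import logging
import threading
from typing import Callable, Optional


class PeriodicTask:
    """Call `func` every `interval` seconds on a background thread.

    Each run schedules the next one with threading.Timer, so there is no
    `while True` loop; stop() prevents any further runs from being scheduled.
    """

    def __init__(self, interval: float, func: Callable[[], None]) -> None:
        self.interval = interval
        self.func = func
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()
        self._stopped = False

    def _schedule(self) -> None:
        with self._lock:
            if self._stopped:
                return
            self._timer = threading.Timer(self.interval, self._run)
            self._timer.daemon = True  # do not block interpreter shutdown
            self._timer.start()

    def _run(self) -> None:
        try:
            self.func()
        except Exception:
            # Log and carry on so one failed crawl does not stop later updates.
            logging.exception("periodic task failed")
        self._schedule()

    def start(self) -> None:
        self._schedule()

    def stop(self) -> None:
        with self._lock:
            self._stopped = True
            if self._timer is not None:
                self._timer.cancel()
```

At startup, something like `PeriodicTask(10, crawl_feeds).start()` (with `crawl_feeds` standing in for the project's crawl function) would give a 10-second refresh, and `stop()` can be called on shutdown. A run already in progress when `stop()` is called finishes but does not schedule another.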
Missions
- Use Poetry for dependency management in the project.
- Utilize Docker Compose to manage the application and PostgreSQL database as separate containers.
- Implement a FastAPI application with endpoints for CRUD (Create, Read, Update, Delete) operations.
- Utilize SQLAlchemy as the ORM (Object-Relational Mapping) tool for interacting with the PostgreSQL database.
- Avoid using while True loops in the application.
- Refrain from hardcoding any credentials in the project.
- Keep global variables and configuration in a separate config.py file, or use dynaconf for environment management.
- Use type hints throughout the project to improve readability and maintainability.
Requirements
Technologies and tools to be used
- Poetry (https://python-poetry.org/docs/)
- pytest (https://docs.pytest.org/)
- Docker Compose (https://docs.docker.com/compose/)
- FastAPI (https://fastapi.tiangolo.com/)
- SQLAlchemy ORM (https://docs.sqlalchemy.org/en/20/orm/)
- PostgreSQL (https://www.postgresql.org/docs/)
Installation & Usage
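Clone the repository:
git clone https://gitlab.com/c4pt-mqs/phishder.git
Change into the project directory:
cd phishder
Install the runtime dependencies (Poetry 1.2 and later deprecate --no-dev in favor of --only main):
poetry install --only main
Run the app inside the Poetry environment (the PostgreSQL database configured in settings.toml must be reachable):
poetry run python main.py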
Docker Build
docker-compose up --build -d
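Once the containers are up and running, the API is available at http://localhost:1122 (the server binds to 0.0.0.0:1122, which is a listen address rather than one to browse to). FastAPI also serves interactive documentation at http://localhost:1122/docs by default.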
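Once the stack is running, the endpoints can be queried from any HTTP client. A small standard-library client sketch, assuming the host port 1122 mapping described above (`BASE_URL`, `feed_url`, and `get_json` are hypothetical helpers, not part of the project):

```python
import json
import urllib.request
from typing import Any
from urllib.parse import urlencode

# Assumed base URL: docker-compose exposes the API on host port 1122.
BASE_URL = "http://localhost:1122"


def feed_url(endpoint: str, skip: int = 0, limit: int = 10, base_url: str = BASE_URL) -> str:
    """Build the URL for a feed endpoint such as "/phishtank_feed"."""
    query = urlencode({"skip": skip, "limit": limit})
    return f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}?{query}"


def get_json(url: str, timeout: float = 5.0) -> Any:
    """GET `url` and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `get_json(BASE_URL + "/")` should return the root endpoint's status message, and `get_json(feed_url("/phishtank_feed", limit=5))` requests the first five Phishtank records.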
API Endpoints
The following endpoints are available:
- GET /
  - Description: Root endpoint to check if the server is running.
  - Response: {"message": "Server is running!"}
- GET /phishtank_feed
  - Description: Fetches Phishtank feed data.
  - Query Parameters:
    - skip (optional): The number of records to skip (default is 0).
    - limit (optional): The maximum number of records to return (default is 10).
  - Response: An array of objects containing Phishtank feed data.
- GET /openphish_feed
  - Description: Fetches OpenPhish feed data.
  - Query Parameters:
    - skip (optional): The number of records to skip (default is 0).
    - limit (optional): The maximum number of records to return (default is 10).
  - Response: An array of objects containing OpenPhish feed data.
- GET /phishstats_feed
  - Description: Fetches PhishStats feed data.
  - Query Parameters:
    - skip (optional): The number of records to skip (default is 0).
    - limit (optional): The maximum number of records to return (default is 10).
  - Response: An array of objects containing PhishStats feed data.
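The skip and limit parameters describe offset pagination. As a rough illustration, assuming each endpoint maps onto a query like the one below, here is a sketch that uses the standard library's sqlite3 as a stand-in for PostgreSQL; the `phishtank_feed` table, its columns, and both functions are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def demo_connection(n: int) -> sqlite3.Connection:
    """In-memory stand-in for a feed table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE phishtank_feed (id INTEGER PRIMARY KEY, url TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO phishtank_feed (id, url) VALUES (?, ?)",
        [(i, f"http://example.invalid/{i}") for i in range(1, n + 1)],
    )
    return conn


def read_feed(conn: sqlite3.Connection, skip: int = 0, limit: int = 10) -> List[Tuple[int, str]]:
    """Return up to `limit` rows after skipping the first `skip` rows."""
    if skip < 0 or limit < 0:
        raise ValueError("skip and limit must be non-negative")
    # ORDER BY makes the row order deterministic; without it, SQL does not
    # guarantee which rows end up on which page.
    cur = conn.execute(
        "SELECT id, url FROM phishtank_feed ORDER BY id LIMIT ? OFFSET ?",
        (limit, skip),
    )
    return cur.fetchall()
```

With 25 stored rows, `read_feed(conn, skip=10, limit=10)` returns the rows with ids 11 to 20. In SQLAlchemy 2.0 the same query would typically be written as `select(PhishtankFeed).order_by(PhishtankFeed.id).offset(skip).limit(limit)`, where `PhishtankFeed` is a hypothetical mapped model.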
Configuration
The application's settings are defined in the settings.toml file. You can modify this file to change the following settings:
- HOST: The host on which the FastAPI server runs.
- PORT: The port on which the FastAPI server listens.
- DATABASE_URL: The URL of the PostgreSQL database used to store feed data.
- OPENPHISH_URL: The URL of the OpenPhish feed data source.
- PHISHSTATS_URL: The URL of the PhishStats feed data source.
- PHISHTANK_URL: The URL of the Phishtank feed data source.
Testing
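Install the full dependency set (the runtime-only install above skips development dependencies), then run the tests:
poetry install
poetry run pytest tests/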
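pytest collects files named test_*.py or *_test.py and runs the test_* functions inside them. The following sketch shows what such a unit test might look like. `parse_phishtank` is a hypothetical helper defined inline so the example runs on its own, and its field names (phish_id, url, target) are modelled on the Phishtank JSON feed but should be treated as assumptions.

```python
from typing import Any, Dict, List


def parse_phishtank(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the fields to store and drop entries without a URL (hypothetical)."""
    records = []
    for entry in entries:
        url = entry.get("url")
        if not url:
            continue
        records.append({"phish_id": entry.get("phish_id"), "url": url, "target": entry.get("target")})
    return records


def test_parse_phishtank_keeps_selected_fields() -> None:
    raw = [{"phish_id": 123, "url": "http://example.invalid/login", "target": "Other", "verified": "yes"}]
    assert parse_phishtank(raw) == [
        {"phish_id": 123, "url": "http://example.invalid/login", "target": "Other"}
    ]


def test_parse_phishtank_skips_entries_without_url() -> None:
    assert parse_phishtank([{"phish_id": 1}, {"phish_id": 2, "url": ""}]) == []
```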
Data Sources
This application fetches phishing feed data from the following sources:
- Phishtank: https://data.phishtank.com/data/online-valid.json
- OpenPhish: https://openphish.com/
- PhishStats: https://phishstats.info:2096/api/phishing
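The Phishtank and PhishStats URLs above both serve JSON, while OpenPhish is parsed from HTML. The project makes these calls with the requests library. As a dependency-free sketch of the same idea using the standard library's urllib, a fetch can degrade to `None` on network or decoding errors so that one failing source does not stop the others. The `FEEDS` mapping, `fetch_json`, and the User-Agent string are hypothetical.

```python
import json
import logging
import urllib.request
from typing import Any, Optional

# URLs listed in this README; the mapping itself is illustrative.
FEEDS = {
    "phishtank": "https://data.phishtank.com/data/online-valid.json",
    "phishstats": "https://phishstats.info:2096/api/phishing",
}


def fetch_json(url: str, timeout: float = 30.0) -> Optional[Any]:
    """Download and decode a JSON feed; return None instead of raising on failure."""
    req = urllib.request.Request(url, headers={"User-Agent": "phishder-example/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError) as exc:
        # OSError covers URLError, HTTPError and timeouts; ValueError covers bad JSON.
        logging.warning("failed to fetch %s: %s", url, exc)
        return None
```

A crawler could then loop over `FEEDS.items()`, skip any source whose result is `None`, and store the rest.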