Opcharist

Opcharist is a FastAPI-based application designed to analyze text extracted from images for sensitive data. It supports both image URLs and file uploads, utilizes Redis for caching, and is containerized using Docker Compose for easy deployment.

Features

  • Sensitive Data Analysis: Detects various types of sensitive information within text extracted from images.
  • OCR Integration: Uses Tesseract OCR engine for text extraction.
  • Caching: Implements Redis to cache analysis results, improving performance.
  • Containerization: Docker Compose setup for streamlined deployment.
  • Dependency Management: Utilizes Poetry for managing Python dependencies.
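
The caching feature can be pictured with a minimal sketch like the one below. It is not taken from the project's code: the `opcharist:` key prefix, the SHA-256 key scheme, the one-hour TTL, and the `cached_analysis` helper are all assumptions made for illustration. `DictCache` is an in-memory stand-in that exposes the same `get`/`setex` calls as a redis-py client, so the example runs without a Redis server.

```python
import hashlib
import json
from typing import Callable, Optional


class DictCache:
    """In-memory stand-in mimicking the get/setex calls of a redis-py client."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._store.get(key)

    def setex(self, key: str, ttl_seconds: int, value: str) -> None:
        # The TTL is ignored here; a real Redis server would expire the key.
        self._store[key] = value


def cached_analysis(
    cache,
    image_bytes: bytes,
    analyze: Callable[[bytes], dict],
    ttl_seconds: int = 3600,  # hypothetical default, not from the project
) -> dict:
    """Return the cached result for these bytes, computing it on a miss."""
    # Hypothetical key scheme: prefix plus SHA-256 digest of the image bytes.
    key = "opcharist:" + hashlib.sha256(image_bytes).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    result = analyze(image_bytes)
    cache.setex(key, ttl_seconds, json.dumps(result))
    return result
```

With a real deployment, a `redis.Redis(...)` client could presumably be passed in place of `DictCache`, since only `get` and `setex` are used.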

Technologies Used

  • Python 3.11
  • FastAPI
  • Docker Compose
  • Poetry
  • Redis (for caching)
  • Tesseract OCR (for text extraction)

Installation

Prerequisites

  • Docker
  • Docker Compose
  • Poetry

Steps

  1. Clone the repository:

    git clone https://gitlab.com/c4pt-mqs/opcharist.git
    cd opcharist
    
  2. Install dependencies using Poetry:

    poetry install
    
  3. Run the application:

    poetry run uvicorn main:app --reload
    

    The application will be accessible at http://localhost:8000.

Using Docker Compose

To run the application using Docker Compose:

    docker-compose up --build -d

This will start the FastAPI application along with a Redis container.

API Endpoints

  • GET /: Health check endpoint to verify the server is running.
  • POST /analyze/: Analyze text extracted from an image URL.
  • POST /upload/: Upload an image file and analyze the extracted text.
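
As a rough client-side illustration of the `POST /analyze/` endpoint, the sketch below builds and sends a request using only the Python standard library. The request body shape is not documented here, so the JSON field name `"url"` is an assumption, as are the helper names `build_analyze_request` and `send`. The base URL is the local address given in the installation steps.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local address from the installation steps


def build_analyze_request(image_url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /analyze/ request. The JSON field name "url" is an assumption."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/analyze/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> tuple[int, dict]:
    """Send the request and return (status code, parsed JSON body or {}).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a
    400 Bad Request surfaces as an exception rather than a return value.
    """
    with urllib.request.urlopen(request) as response:
        raw = response.read()
        return response.status, (json.loads(raw) if raw else {})
```

Building the request separately from sending it makes it easy to inspect the payload before the server is running.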

Expected HTTP Status Codes

  • HTTP 200 - Successful: Operation completed successfully.
  • HTTP 204 - No Content: Image was read, but no content was found.
  • HTTP 400 - Bad Request: Image could not be read due to an invalid format.
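
A client might branch on these status codes roughly as follows. This is only a sketch: `summarize_response` is a hypothetical helper, and it assumes the body has already been parsed into a dictionary (empty for a 204 response).

```python
def summarize_response(status_code: int, body: dict) -> str:
    """Turn a status code and parsed JSON body into a one-line summary."""
    if status_code == 200:
        count = len(body.get("findings", []))
        return f"analysis succeeded with {count} finding(s)"
    if status_code == 204:
        return "image was read, but no content was found"
    if status_code == 400:
        # The error body carries its message in the "status" field.
        return "image could not be read: " + body.get("status", "bad request")
    return f"unexpected status code {status_code}"
```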

Sensitive Data Types Detected

  • PHONE_NUMBER
  • ID_NUMBER
  • CREDIT_CARD_NUMBER
  • PLATE
  • DATE
  • EMAIL
  • DOMAIN
  • URL
  • HASH
  • COMBOLIST
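
To give a feel for how a few of these types could be detected, here is a minimal sketch covering EMAIL, COMBOLIST, and CREDIT_CARD_NUMBER. The regular expressions and the use of the Luhn checksum are assumptions chosen for illustration; the project's actual detection rules may differ, and names such as `find_sensitive` are hypothetical.

```python
import re

# Illustrative patterns only; not the project's actual rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
COMBO_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}:\S+")
CARD_RE = re.compile(r"\b\d{13,19}\b")


def luhn_valid(number: str) -> bool:
    """Check a digit string against the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_sensitive(text: str) -> list[dict]:
    """Return findings as {"value", "type"} dictionaries."""
    findings = []
    for match in CARD_RE.finditer(text):
        if luhn_valid(match.group()):
            findings.append({"value": match.group(), "type": "CREDIT_CARD_NUMBER"})
    for match in EMAIL_RE.finditer(text):
        findings.append({"value": match.group(), "type": "EMAIL"})
    for match in COMBO_RE.finditer(text):
        findings.append({"value": match.group(), "type": "COMBOLIST"})
    return findings
```

Filtering card candidates through the Luhn checksum is a common way to reduce false positives from arbitrary long digit runs.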

Example Responses

HTTP 200 - Successful

{
  "content": "Herkese selam! Bugün sizinle cc paylaşımı yapacağım. Artık burada daha aktif olmaya karar verdim. Daha fazla içeriğe sahip olmak için beni mutlaka takip edin. Paylaştığım cc'yi direkt kullanabilirsiniz, checklenmişti. 6011779370011770 Daha fazla içerik için telegram kanalımı da takip edebilirsiniz https://t.me/foo Özelden ulaşmak isteyenler için mail adresim foo_bar06@gmail.com",
  "status": "successful",
  "findings": [
    {
      "value": "6011779370011770",
      "type": "CREDIT_CARD_NUMBER"
    },
    {
      "value": "https://t.me/foo",
      "type": "URL"
    }
  ]
}

HTTP 200 - Successful

{
  "content": "Hidden Content foo@bar.com:barbaz foo.bar@baz.com:foobar 3.49 py",
  "status": "successful",
  "findings": [
    {
      "value": "foo@bar.com",
      "type": "EMAIL"
    },
    {
      "value": "foo.bar@baz.com",
      "type": "EMAIL"
    },
    {
      "value": "foo@bar.com:barbaz",
      "type": "COMBOLIST"
    },
    {
      "value": "foo.bar@baz.com:foobar",
      "type": "COMBOLIST"
    }
  ]
}

HTTP 200 - Successful

{
  "content": "05/06/2024 hazır olun arkadaşlar foo.com için DDOS vuruyoruz, bekliyoruz.",
  "status": "successful",
  "findings": [
    {
      "value": "2024-06-05 00:00:00",
      "type": "DATE"
    },
    {
      "value": "foo.com",
      "type": "DOMAIN"
    }
  ]
}
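
Since every successful response shares the same `findings` structure, a consumer could group values by type with a small helper like the one sketched here. `group_findings` is a hypothetical name, and `sample` reuses the `status` and `findings` fields of the response directly above.

```python
from collections import defaultdict


def group_findings(response_body: dict) -> dict[str, list[str]]:
    """Map each finding type to the list of values reported for it."""
    grouped = defaultdict(list)
    for finding in response_body.get("findings", []):
        grouped[finding["type"]].append(finding["value"])
    return dict(grouped)


# Fields copied from the example response above.
sample = {
    "status": "successful",
    "findings": [
        {"value": "2024-06-05 00:00:00", "type": "DATE"},
        {"value": "foo.com", "type": "DOMAIN"},
    ],
}

grouped = group_findings(sample)
```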

HTTP 204 - No Content

{}

HTTP 400 - Bad Request

{
  "status": "bad request. wrong file format."
}

License

This project is licensed under the GNU Affero General Public License (AGPL).