Compare commits
1 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| 730a5e7c69 |
@ -1,112 +0,0 @@
|
||||
# File Manager Enhancement Plan

This document outlines the plan to enhance the `my-app/utils/file_manager.py` script based on user feedback.

**Goals:**

1. Add support for loading configuration from a `config.yaml` file.
2. Implement a new action (`--move-cold`) to move inactive ("cold") files from fast storage back to slow storage based on modification time.
3. Add an `--interactive` flag to prompt for confirmation before moving files.
4. Implement a new action (`--generate-stats`) to create a JSON file containing storage statistics (file counts, sizes by age) for both source and target directories.
5. Calculate and log the total size of files being moved by the `--move-cold` action.

**Detailed Plan:**

1. **Configuration File (`config.yaml`):**
    * **Goal:** Allow users to define common settings in a YAML file.
    * **Implementation:**
        * Define structure for `config.yaml` (e.g., `~/.config/file_manager/config.yaml` or specified via `--config`).
        * Use `PyYAML` library (requires `pip install PyYAML`).
        * Modify `parse_arguments` to load settings, allowing command-line overrides.
        * Add `--config` argument.

2. **Move Cold Files Back (`--move-cold` action):**
    * **Goal:** Move files from fast (target) to slow (source) storage if inactive.
    * **Implementation:**
        * Add action: `--move-cold`.
        * Add argument: `--stale-days` (default 30, uses modification time `st_mtime`).
        * New function `find_stale_files(directory, days)`: Scans `target_dir` based on `st_mtime`.
        * New function `move_files_cold(relative_file_list, source_dir, target_dir, dry_run, interactive)`:
            * Similar to `move_files`.
            * Moves files from `target_dir` to `source_dir` using `rsync`.
            * Handles paths relative to `target_dir`.
            * Calculates and logs total size of files to be moved before `rsync`.
            * Incorporates interactive confirmation.

3. **Interactive Confirmation (`--interactive` flag):**
    * **Goal:** Add a safety check before moving files.
    * **Implementation:**
        * Add global flag: `--interactive`.
        * Modify `move_files` and `move_files_cold`:
            * If `--interactive` and not `--dry-run`:
                * Log files/count.
                * Use `input()` for user confirmation (`yes/no`).
                * Proceed only on "yes".

4. **Enhanced Reporting/Stats File (`--generate-stats` action):**
    * **Goal:** Create a persistent JSON file with storage statistics.
    * **Implementation:**
        * Add action: `--generate-stats`.
        * Add argument: `--stats-file` (overrides config).
        * New function `analyze_directory(directory)`:
            * Walks directory, calculates total count/size, count/size by modification time brackets.
            * Returns data as a dictionary.
        * Modify `main` or create orchestrator for `--generate-stats`:
            * Call `analyze_directory` for source and target.
            * Combine results with a timestamp.
            * Write dictionary to `stats_file` using `json`.
        * **(Optional):** Modify `--summarize-unused` to potentially use the stats file.
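
The cold-file steps in items 2 and 3 could be sketched roughly as follows. This is only an illustration under assumptions: `find_stale_files` follows the signature named in the plan, while `total_size` and `confirm_move` are hypothetical helper names, and the `rsync` call itself is left out.

```python
import os
import time
from pathlib import Path
from typing import Callable, List


def find_stale_files(directory: str, days: int) -> List[str]:
    """Return paths (relative to `directory`) whose st_mtime is older than `days` days."""
    cutoff = time.time() - days * 86400
    root = Path(directory)
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if path.stat().st_mtime < cutoff:
                    stale.append(str(path.relative_to(root)))
            except OSError:
                # File vanished or is unreadable; skip it rather than abort the scan.
                continue
    return sorted(stale)


def total_size(directory: str, relative_files: List[str]) -> int:
    """Hypothetical helper: sum the sizes in bytes of files that still exist."""
    total = 0
    for rel in relative_files:
        try:
            total += (Path(directory) / rel).stat().st_size
        except OSError:
            continue
    return total


def confirm_move(count: int, size_bytes: int, ask: Callable[[str], str] = input) -> bool:
    """Hypothetical helper: proceed only on an explicit "yes"."""
    answer = ask(f"Move {count} files ({size_bytes} bytes)? (yes/no): ")
    return answer.strip().lower() == "yes"
```

Passing the prompt function as `ask` keeps the confirmation testable without a terminal; in the script it would default to `input()` as the plan states.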
|
||||
|
||||
**Workflow Visualization (Mermaid):**

```mermaid
graph TD
    Start --> ReadConfig{"Read config.yaml (Optional)"}
    ReadConfig --> ParseArgs[Parse Command Line Args]
    ParseArgs --> ValidateArgs{"Validate Args & Config"}
    ValidateArgs --> ActionRouter{Route based on Action}

    ActionRouter -->|"--generate-stats"| AnalyzeSrc[Analyze Source Dir]
    AnalyzeSrc --> AnalyzeTgt[Analyze Target Dir]
    AnalyzeTgt --> WriteStatsFile[Write stats.json]
    WriteStatsFile --> End

    ActionRouter -->|"--move"| FindRecent["Find Recent Files (Source)"]
    FindRecent --> CheckInteractiveHot{Interactive?}
    CheckInteractiveHot -->|Yes| ConfirmHot("Confirm Move Hot?")
    CheckInteractiveHot -->|No| ExecuteMoveHot["Execute rsync Hot (Source->Target)"]
    ConfirmHot -->|Yes| ExecuteMoveHot
    ConfirmHot -->|No| AbortHot(Abort Hot Move)
    AbortHot --> End
    ExecuteMoveHot --> End

    ActionRouter -->|"--move-cold"| FindStale["Find Stale Files (Target)"]
    FindStale --> CalculateColdSize[Calculate Total Size of Cold Files]
    CalculateColdSize --> CheckInteractiveCold{Interactive?}
    CheckInteractiveCold -->|Yes| ConfirmCold("Confirm Move Cold?")
    CheckInteractiveCold -->|No| ExecuteMoveCold["Execute rsync Cold (Target->Source)"]
    ConfirmCold -->|Yes| ExecuteMoveCold
    ConfirmCold -->|No| AbortCold(Abort Cold Move)
    AbortCold --> End
    ExecuteMoveCold --> End

    ActionRouter -->|"--count"| FindRecentForCount["Find Recent Files (Source)"]
    FindRecentForCount --> CountFiles[Log Count]
    CountFiles --> End

    ActionRouter -->|"--summarize-unused"| SummarizeUnused["Summarize Unused (Target)"]
    SummarizeUnused --> LogSummary[Log Summary]
    LogSummary --> End

    ActionRouter -->|"No Action/Error"| ShowHelp["Show Help / Error"]
    ShowHelp --> End
```
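
The "Route based on Action" step in the diagram could be sketched with `argparse` as below. This is a minimal sketch, not the script's actual parser: the flag names come from the plan and diagram, but `build_parser`, `route`, and the shared `action` destination are assumptions made for illustration.

```python
import argparse
from typing import List, Optional

ACTION_FLAGS = ("--move", "--move-cold", "--count", "--summarize-unused", "--generate-stats")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser where at most one action flag may be given."""
    parser = argparse.ArgumentParser(prog="file_manager")
    actions = parser.add_mutually_exclusive_group()
    for flag in ACTION_FLAGS:
        # Every action flag stores its own name into the shared `action` attribute.
        actions.add_argument(flag, dest="action", action="store_const", const=flag.lstrip("-"))
    parser.add_argument("--interactive", action="store_true")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--stale-days", type=int, default=30)
    return parser


def route(argv: Optional[List[str]] = None) -> str:
    """Return the selected action name, or "help" when no action was given."""
    args = build_parser().parse_args(argv)
    return args.action if args.action is not None else "help"
```

Putting the actions in a mutually exclusive group lets `argparse` reject combinations such as `--move --move-cold` before any files are touched.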
|
||||
|
||||
**Summary of Changes:**

* New dependencies: `PyYAML`.
* New command-line arguments: `--move-cold`, `--stale-days`, `--interactive`, `--generate-stats`, `--stats-file`, `--config`.
* New functions: `find_stale_files`, `move_files_cold`, `analyze_directory`.
* Modifications to existing functions: `parse_arguments`, `move_files`, `main`.
* Introduction of `config.yaml` for settings.
* Introduction of a JSON stats file for persistent reporting.
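
One possible shape for `analyze_directory` and the JSON stats file is sketched below. The age brackets, the dictionary keys, and the `write_stats` helper are assumptions chosen for illustration; the plan only says that counts and sizes are grouped by modification-time brackets and written with a timestamp.

```python
import json
import os
import time
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Hypothetical age brackets as (label, upper bound in days); the plan leaves these open.
AGE_BRACKETS = [("0-7d", 7), ("8-30d", 30), ("31-90d", 90)]
OLDEST_BRACKET = "90d+"


def analyze_directory(directory: str, now: Optional[float] = None) -> Dict[str, Any]:
    """Walk `directory` and total file count/size overall and per mtime bracket."""
    now = time.time() if now is None else now
    labels = [label for label, _ in AGE_BRACKETS] + [OLDEST_BRACKET]
    stats: Dict[str, Any] = {
        "total_count": 0,
        "total_size": 0,
        "by_age": {label: {"count": 0, "size": 0} for label in labels},
    }
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable
            age_days = (now - st.st_mtime) / 86400
            label = next((lb for lb, limit in AGE_BRACKETS if age_days <= limit), OLDEST_BRACKET)
            stats["total_count"] += 1
            stats["total_size"] += st.st_size
            stats["by_age"][label]["count"] += 1
            stats["by_age"][label]["size"] += st.st_size
    return stats


def write_stats(source_dir: str, target_dir: str, stats_file: str) -> Dict[str, Any]:
    """Hypothetical orchestrator: analyze both directories and write one JSON report."""
    report = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "source": analyze_directory(source_dir),
        "target": analyze_directory(target_dir),
    }
    with open(stats_file, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2)
    return report
```

Keeping every bracket present in the output, even when empty, means consumers of the stats file never have to handle missing keys.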
|
||||
@ -1,160 +0,0 @@
|
||||
# OpenRouter Integration Refactoring Plan

## Goal

Refactor the code in `resume_analysis.py` to eliminate all dependencies on the OpenAI API and use the OpenRouter API exclusively, while improving the current implementation of the OpenRouter connection.

## Change Flow Diagram

```mermaid
graph TD
    A[Current implementation] --> B["Phase 1: Remove OpenAI dependencies"]
    B --> C["Phase 2: Refactor the OpenRouter client"]
    C --> D["Phase 3: Optimize response handling"]
    D --> E["Phase 4: Tests and validation"]

    subgraph "Phase 1: Remove OpenAI dependencies"
        B1[Remove OpenAI imports]
        B2[Remove OpenAI configuration variables]
        B3[Remove client selection logic]
    end

    subgraph "Phase 2: Refactor the OpenRouter client"
        C1[Create a dedicated OpenRouterClient class]
        C2[Implement correct header configuration]
        C3[Add support for different models]
    end

    subgraph "Phase 3: Optimize response handling"
        D1[Unify the response format]
        D2[Implement better error handling]
        D3[Add response validation]
    end

    subgraph "Phase 4: Tests and validation"
        E1[Unit tests]
        E2[Integration tests]
        E3[Document the changes]
    end
```
|
||||
|
||||
## Detailed Implementation

### 1. Dedicated OpenRouterClient class

```python
from typing import Optional

import requests


class OpenRouterClient:
    """Thin wrapper around the OpenRouter HTTP API using a shared session."""

    def __init__(self, api_key: str, model_name: str):
        self.api_key = api_key
        self.model_name = model_name
        self.base_url = "https://openrouter.ai/api/v1"
        self.session = requests.Session()
        # Headers are set once on the session and reused for every request.
        self.session.headers.update({
            "Authorization": f"Bearer {api_key}",
            "HTTP-Referer": "https://github.com/OpenRouterTeam/openrouter-examples",
            "X-Title": "CV Analysis Tool",
        })

    def create_chat_completion(self, messages: list, max_tokens: Optional[int] = None) -> dict:
        """POST a chat completion request and return the decoded JSON body."""
        endpoint = f"{self.base_url}/chat/completions"
        payload = {
            "model": self.model_name,
            "messages": messages,
        }
        if max_tokens is not None:
            # Include max_tokens only when set, so no explicit null is sent.
            payload["max_tokens"] = max_tokens

        response = self.session.post(endpoint, json=payload)
        response.raise_for_status()
        return response.json()

    def get_available_models(self) -> dict:
        """GET the model list and return the decoded JSON body."""
        endpoint = f"{self.base_url}/models"
        response = self.session.get(endpoint)
        response.raise_for_status()
        return response.json()
```
|
||||
|
||||
### 2. Configuration and initialization

```python
import logging
import os

logger = logging.getLogger(__name__)

# Assumption: both values come from the environment; the plan does not show
# where OPENROUTER_API_KEY and OPENROUTER_MODEL_NAME are defined.
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")
OPENROUTER_MODEL_NAME = os.getenv("OPENROUTER_MODEL_NAME")


def initialize_openrouter_client():
    if not OPENROUTER_API_KEY:
        raise ValueError("OPENROUTER_API_KEY is required")

    client = OpenRouterClient(
        api_key=OPENROUTER_API_KEY,
        model_name=OPENROUTER_MODEL_NAME
    )

    # Verify connection and model availability
    try:
        # The models endpoint returns an object whose "data" key holds the list,
        # so iterate over that list rather than over the top-level dict.
        models = client.get_available_models().get("data", [])
        if not any(model["id"] == OPENROUTER_MODEL_NAME for model in models):
            raise ValueError(f"Model {OPENROUTER_MODEL_NAME} not available")
        logger.debug("Successfully connected to OpenRouter. %d models available", len(models))
        return client
    except Exception as e:
        logger.error(f"Failed to initialize OpenRouter client: {e}")
        raise
```
|
||||
|
||||
### 3. Response handling

```python
class OpenRouterResponse:
    """Normalized view of a raw chat completion response."""

    def __init__(self, raw_response: dict):
        self.raw_response = raw_response
        self.choices = self._parse_choices()
        self.usage = self._parse_usage()
        self.model = raw_response.get("model")

    def _parse_choices(self):
        # Keep only the fields the rest of the code relies on.
        choices = self.raw_response.get("choices", [])
        return [
            {
                "message": choice.get("message", {}),
                "finish_reason": choice.get("finish_reason"),
                "index": choice.get("index")
            }
            for choice in choices
        ]

    def _parse_usage(self):
        # Missing token counts default to 0.
        usage = self.raw_response.get("usage", {})
        return {
            "prompt_tokens": usage.get("prompt_tokens", 0),
            "completion_tokens": usage.get("completion_tokens", 0),
            "total_tokens": usage.get("total_tokens", 0)
        }
```
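
The "response validation" step from Phase 3 is not spelled out in the plan. One way it might look is sketched below; `InvalidResponseError` and `validate_chat_response` are hypothetical names, and the checks assume the `choices[0].message.content` shape used by the response-handling code above.

```python
from typing import Any, Dict


class InvalidResponseError(ValueError):
    """Hypothetical error for responses that lack usable content."""


def validate_chat_response(raw: Dict[str, Any]) -> str:
    """Return the first choice's text content, or raise InvalidResponseError."""
    if not isinstance(raw, dict):
        raise InvalidResponseError("response is not a JSON object")
    choices = raw.get("choices")
    if not isinstance(choices, list) or not choices:
        raise InvalidResponseError("response has no choices")
    first = choices[0]
    if not isinstance(first, dict):
        raise InvalidResponseError("first choice is not an object")
    message = first.get("message") or {}
    content = message.get("content") if isinstance(message, dict) else None
    if not isinstance(content, str) or not content.strip():
        raise InvalidResponseError("first choice has no text content")
    return content
```

Failing early with a dedicated error type keeps malformed responses from surfacing later as an unrelated `KeyError` or `TypeError`.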
|
||||
|
||||
### 4. Error handling

```python
from typing import Optional

import requests


class OpenRouterError(Exception):
    """Error raised for failed OpenRouter requests."""

    def __init__(self, message: str, status_code: Optional[int] = None, response: Optional[dict] = None):
        super().__init__(message)
        self.status_code = status_code
        self.response = response


def handle_openrouter_error(error: Exception) -> OpenRouterError:
    """Convert an exception into an OpenRouterError, keeping API details when available."""
    if isinstance(error, requests.exceptions.RequestException):
        if error.response is not None:
            try:
                error_data = error.response.json()
                message = error_data.get("error", {}).get("message", str(error))
                return OpenRouterError(
                    message=message,
                    status_code=error.response.status_code,
                    response=error_data
                )
            except ValueError:
                # The body was not valid JSON; fall through to the generic error.
                pass
    return OpenRouterError(str(error))
```
|
||||
|
||||
## Next steps

1. Implement the classes and functions above
2. Remove all OpenAI dependencies
3. Update the existing code to use the new client
4. Add unit and integration tests
5. Update the documentation
|
||||
@ -7,7 +7,9 @@
|
||||
"build": "next build --no-lint",
|
||||
"start": "next start",
|
||||
"lint": "next lint",
|
||||
"debug": "NODE_DEBUG=next node server.js"
|
||||
"debug": "NODE_DEBUG=next node server.js",
|
||||
"test": "pytest utils/tests/test_resume_analysis.py",
|
||||
"count_documents": "mongosh mongodb://127.0.0.1:27017/cv_summary_db --eval 'db.cv_processing_collection.countDocuments()'"
|
||||
},
|
||||
"dependencies": {
|
||||
"@ai-sdk/google": "^1.1.17",
|
||||
|
||||
Binary file not shown.
BIN
my-app/utils/__pycache__/resume_analysis.cpython-312.pyc
Normal file
@ -1,5 +0,0 @@
|
||||
source_dir: /mnt/archive_nfs
|
||||
target_dir: /mnt/local_ssd
|
||||
recent_days: 2
|
||||
stale_days: 45
|
||||
stats_file: /home/user/logs/file_manager_stats.json
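
A minimal sketch of how settings like these might be loaded and then overridden from the command line, as the plan describes for `parse_arguments`. The function names, the precedence order (built-in default, then config file, then CLI), and the `--source-dir`, `--target-dir` and `--recent-days` flags are assumptions; only `--stale-days` (default 30) and `--stats-file` are named in the plan.

```python
import argparse
from typing import Any, Dict, List, Optional


def load_config(path: str) -> Dict[str, Any]:
    """Read a YAML mapping from `path`; requires PyYAML (pip install PyYAML)."""
    import yaml  # imported lazily so merge_settings works without PyYAML installed

    with open(path, encoding="utf-8") as fh:
        data = yaml.safe_load(fh) or {}
    if not isinstance(data, dict):
        raise ValueError(f"{path} must contain a YAML mapping")
    return data


def merge_settings(config: Dict[str, Any], argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Combine defaults, config-file values and CLI overrides, in that order."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--source-dir")                # hypothetical flag
    parser.add_argument("--target-dir")                # hypothetical flag
    parser.add_argument("--recent-days", type=int)     # hypothetical flag
    parser.add_argument("--stale-days", type=int)
    parser.add_argument("--stats-file")
    args = parser.parse_args(argv)

    settings: Dict[str, Any] = {"stale_days": 30}  # default stated in the plan
    settings.update(config)
    # Only flags the user actually passed (non-None) override the config file.
    settings.update({key: value for key, value in vars(args).items() if value is not None})
    return settings
```

Leaving the `argparse` defaults as `None` is what makes it possible to tell "flag not given" apart from "flag given", so the config file is not silently overwritten.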
|
||||
87
my-app/utils/default_openai.txt
Normal file
@ -0,0 +1,87 @@
|
||||
```json
|
||||
{
|
||||
"sections": {
|
||||
"Summary": {
|
||||
"score": 8,
|
||||
"suggestions": [
|
||||
"Consider adding specific achievements or metrics to highlight impact.",
|
||||
"Simplify language for clearer understanding."
|
||||
],
|
||||
"summary": "The summary provides a clear overview of the candidate's experience and roles in business analysis and IT management but can be improved by adding specific achievements to quantify their contributions.",
|
||||
"keywords": {
|
||||
"analityk": 3,
|
||||
"doświadczenie": 2,
|
||||
"architekt": 1,
|
||||
"manager": 1
|
||||
}
|
||||
},
|
||||
"Work Experience": {
|
||||
"score": 9,
|
||||
"suggestions": [],
|
||||
"summary": "The work experience section is detailed, presenting clear job roles, responsibilities, and contributions. It utilizes strong action verbs but could be enhanced with quantifiable results in some roles.",
|
||||
"keywords": {
|
||||
"analiz": 5,
|
||||
"biznesowy": 4,
|
||||
"systemowy": 4,
|
||||
"projekt": 4,
|
||||
"współpraca": 3,
|
||||
"wymagania": 2
|
||||
}
|
||||
},
|
||||
"Education": {
|
||||
"score": 8,
|
||||
"suggestions": [
|
||||
"Specify the graduation status for higher education.",
|
||||
"Consider listing any honors or relevant coursework."
|
||||
],
|
||||
"summary": "The education section is comprehensive, including degrees and specialized training, but it lacks mention of graduation status and could highlight additional relevant coursework.",
|
||||
"keywords": {
|
||||
"Politechnika": 2,
|
||||
"CISCO": 1,
|
||||
"Magisterskie": 1,
|
||||
"Inżynierskie": 1
|
||||
}
|
||||
},
|
||||
"Skills": {
|
||||
"score": 7,
|
||||
"suggestions": [
|
||||
"Categorize skills into technical and soft skills for clarity.",
|
||||
"Add more specific technologies or methodologies relevant to the roles applied for."
|
||||
],
|
||||
"summary": "The skills section is minimal and lacks depth. Categorizing skills can improve clarity and relevance, and including specific technologies or methodologies would strengthen the section.",
|
||||
"keywords": {
|
||||
"szkoleń": 4,
|
||||
"certyfikaty": 2,
|
||||
"prawo jazdy": 1
|
||||
}
|
||||
},
|
||||
"Certifications": {
|
||||
"score": 9,
|
||||
"suggestions": [],
|
||||
"summary": "The certifications section is strong, detailing relevant training and certifications that add credibility to the candidate's qualifications.",
|
||||
"keywords": {
|
||||
"certyfikat": 1,
|
||||
"szkolenie": 9
|
||||
}
|
||||
},
|
||||
"Projects": {
|
||||
"score": 6,
|
||||
"suggestions": [
|
||||
"Create a separate section for key projects with descriptions and outcomes.",
|
||||
"Highlight individual contributions to collaborative projects."
|
||||
],
|
||||
"summary": "The projects are mentioned informally within work experience; however, creating a dedicated section would better emphasize significant projects and achievements.",
|
||||
"keywords": {
|
||||
"projekt": 4,
|
||||
"wymagania": 2
|
||||
}
|
||||
}
|
||||
},
|
||||
"openai_stats": {
|
||||
"input_tokens": 2585,
|
||||
"output_tokens": 677,
|
||||
"total_tokens": 3262,
|
||||
"cost": 0.01308
|
||||
}
|
||||
}
|
||||
```
|
||||
@ -0,0 +1,19 @@
|
||||
{
|
||||
"choices": [
|
||||
{
|
||||
"message": {
|
||||
"content": "```json\n{\n \"sections\": {\n \"Summary\": {\n \"score\": 8,\n \"suggestions\": [\n \"Consider adding specific achievements or metrics to illustrate impact.\",\n \"Make the summary more concise by focusing on key strengths.\"\n ],\n \"summary\": \"The summary provides a brief overview of experience and roles but lacks specific accomplishments and is slightly verbose.\",\n \"keywords\": { \"analityk\": 3, \"doświadczenie\": 2, \"systemowy\": 2, \"technologicznych\": 1, \"menedżer\": 1 }\n },\n \"Work Experience\": {\n \"score\": 9,\n \"suggestions\": [],\n \"summary\": \"The work experience section is detailed and relevant, showcasing roles and responsibilities effectively, with clear job titles and dates.\",\n \"keywords\": { \"analityk\": 4, \"systemów\": 4, \"IT\": 6, \"projekty\": 4, \"współpraca\": 3 }\n },\n \"Education\": {\n \"score\": 7,\n \"suggestions\": [\n \"Provide dates for all educational entries for consistency.\",\n \"Consider adding any relevant coursework or projects to enhance completeness.\"\n ],\n \"summary\": \"The education section lists qualifications but lacks specific dates for every entry and does not include additional relevant details.\",\n \"keywords\": { \"studia\": 3, \"Politechnika\": 3, \"certyfikaty\": 1, \"sieci\": 1 }\n },\n \"Skills\": {\n \"score\": 8,\n \"suggestions\": [\n \"Group skills into categories (e.g., technical skills, soft skills) for clarity.\",\n \"Add specific software or tools to demonstrate technical expertise.\"\n ],\n \"summary\": \"The skills section summarizes capabilities but could benefit from organization and inclusion of specific skills relevant to jobs being applied for.\",\n \"keywords\": { \"techniczne\": 1, \"wiedza\": 1, \"umiejętności\": 1 }\n },\n \"Certifications\": {\n \"score\": 8,\n \"suggestions\": [\n \"Organize certifications in chronological order or by relevance.\",\n \"Include the dates of certifications for better context.\"\n ],\n \"summary\": \"The certifications are 
relevant but could be polished by adding organization and dates to enhance clarity.\",\n \"keywords\": { \"certyfikat\": 2, \"szkolenie\": 6, \"ITIL\": 2 }\n },\n \"Projects\": {\n \"score\": 6,\n \"suggestions\": [\n \"Provide more detail on individual projects, focusing on specific roles and outcomes.\",\n \"Include dates for project completion to establish a timeline.\"\n ],\n \"summary\": \"The projects section is present but lacks depth regarding specific responsibilities or results, making it less impactful.\",\n \"keywords\": { \"projekt\": 3, \"systemy\": 2, \"migrować\": 1 }\n }\n },\n \"openai_stats\": {\n \"input_tokens\": 1424,\n \"output_tokens\": 668,\n \"total_tokens\": 2092,\n \"cost\": 0.002092\n }\n}\n```",
|
||||
"role": "assistant"
|
||||
},
|
||||
"finish_reason": "stop",
|
||||
"index": 0
|
||||
}
|
||||
],
|
||||
"usage": {
|
||||
"prompt_tokens": 2668,
|
||||
"completion_tokens": 679,
|
||||
"total_tokens": 3347
|
||||
},
|
||||
"cost": 0,
|
||||
"model": "gpt-4o-mini-2024-07-18"
|
||||
}
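
In the sample response above, `message.content` is not bare JSON: the model wrapped it in a Markdown `json` code fence. A consumer would need to strip that fence before parsing. The sketch below shows one way to do that; `extract_json_payload` is a hypothetical helper name, not something taken from `resume_analysis.py`.

```python
import json
import re
from typing import Any

# Matches an optional ```json ... ``` wrapper around the whole string.
_FENCE_RE = re.compile(r"^\s*```(?:json)?\s*\n(.*?)\n?\s*```\s*$", re.DOTALL)


def extract_json_payload(content: str) -> Any:
    """Parse JSON from model output, with or without a Markdown code fence."""
    match = _FENCE_RE.match(content)
    body = match.group(1) if match else content
    return json.loads(body)
```

Input that is not valid JSON still raises `json.JSONDecodeError`, so callers can treat a malformed model reply as an error rather than receiving partial data.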
|
||||
@ -0,0 +1,19 @@
|
||||
{
|
||||
"choices": [
|
||||
{
|
||||
"message": {
|
||||
"content": "```json\n{\n \"sections\": {\n \"Summary\": {\n \"score\": 8,\n \"suggestions\": [\n \"Consider elaborating on specific achievements or key projects to highlight impact.\",\n \"Include more quantifiable metrics to showcase successful outcomes.\"\n ],\n \"summary\": \"The summary provides a clear overview of the candidate's professional background and experience in business analysis and system architecture. It indicates substantial experience but lacks specific examples of accomplishments.\",\n \"keywords\": {\n \"Analityk\": 4,\n \"biznesowy\": 2,\n \"systemowy\": 2,\n \"doświadczenie\": 1,\n \"technologicznych\": 1\n }\n },\n \"Work Experience\": {\n \"score\": 9,\n \"suggestions\": [],\n \"summary\": \"The work experience section is comprehensive, detailing various roles and responsibilities across multiple companies. It demonstrates a strong background in the IT sector with clear responsibilities and contributions but could benefit from more quantifiable outcomes.\",\n \"keywords\": {\n \"analityk\": 6,\n \"systemów\": 5,\n \"projekt\": 4,\n \"współpraca\": 3,\n \"technologii\": 3,\n \"wymagań\": 2,\n \"usług\": 2\n }\n },\n \"Education\": {\n \"score\": 8,\n \"suggestions\": [\n \"Specify the dates for when the education was completed.\",\n \"Only include institutions that are directly relevant to the position being applied for.\"\n ],\n \"summary\": \"The education section lists relevant degrees and institutions, highlighting a solid academic background in technology and information systems. Adding completion dates could enhance clarity.\",\n \"keywords\": {\n \"studia\": 3,\n \"Politechnika\": 2,\n \"informatycznych\": 2,\n \"CISCO\": 1,\n \"specjalność\": 1\n }\n },\n \"Skills\": {\n \"score\": 7,\n \"suggestions\": [\n \"Add more technical skills that are specifically relevant to the industry.\",\n \"Provide a clearer structure, possibly categorizing hard and soft skills.\"\n ],\n \"summary\": \"The skills section is notably brief. 
It lists language proficiency but lacks a comprehensive enumeration of technical and soft skills essential for the role of a business analyst.\",\n \"keywords\": {\n \"angielski\": 1,\n \"niemiecki\": 1\n }\n },\n \"Certifications\": {\n \"score\": 9,\n \"suggestions\": [],\n \"summary\": \"The certifications are presented clearly, showing a variety of relevant courses and certifications. This indicates a commitment to professional development and continuous learning.\",\n \"keywords\": {\n \"certyfikat\": 2,\n \"szkolenie\": 8,\n \"ITIL\": 2,\n \"IBM\": 3\n }\n },\n \"Projects\": {\n \"score\": 6,\n \"suggestions\": [\n \"Include specific projects with concise descriptions and impacts.\",\n \"List projects in a structured format, summarizing outcomes and key learnings.\"\n ],\n \"summary\": \"The projects section is not explicitly defined and lacks specifics. While detailed experience is found in work experience, this section would benefit from a clear presentation of significant projects and their outcomes.\",\n \"keywords\": {}\n }\n },\n \"openai_stats\": {\n \"input_tokens\": 1244,\n \"output_tokens\": 646,\n \"total_tokens\": 1890,\n \"cost\": 0.002\n }\n}\n```",
|
||||
"role": "assistant"
|
||||
},
|
||||
"finish_reason": "stop",
|
||||
"index": 0
|
||||
}
|
||||
],
|
||||
"usage": {
|
||||
"prompt_tokens": 2668,
|
||||
"completion_tokens": 760,
|
||||
"total_tokens": 3428
|
||||
},
|
||||
"cost": 0.0,
|
||||
"model": "gpt-4o-mini-2024-07-18"
|
||||
}
|
||||
@ -0,0 +1,19 @@
|
||||
{
|
||||
"choices": [
|
||||
{
|
||||
"message": {
|
||||
"content": "```json\n{\n \"sections\": {\n \"Summary\": {\n \"score\": 8,\n \"suggestions\": [\n \"Consider adding specific achievements or metrics to quantify your impact.\",\n \"Refine language to be more concise and impactful.\"\n ],\n \"summary\": \"The summary provides a clear professional profile highlighting experience in business analysis and technology. However, it lacks specific achievements.\",\n \"keywords\": {\n \"Analityk\": 3,\n \"biznesowy\": 3,\n \"systemowy\": 3,\n \"doświadczenie\": 2,\n \"technologicznych\": 1\n }\n },\n \"Work Experience\": {\n \"score\": 9,\n \"suggestions\": [],\n \"summary\": \"The work experience section is comprehensive, detailing roles and responsibilities with an emphasis on contributions to projects. The use of bullet points enhances readability.\",\n \"keywords\": {\n \"analityk\": 4,\n \"programów\": 3,\n \"systemów\": 4,\n \"projektów\": 4,\n \"współpraca\": 3\n }\n },\n \"Education\": {\n \"score\": 8,\n \"suggestions\": [\n \"Specify the completion dates for each education entry.\",\n \"Include any honors or relevant courses to enhance detail.\"\n ],\n \"summary\": \"The education section lists relevant degrees and certifications, but lacks completion dates and additional achievements.\",\n \"keywords\": {\n \"studia\": 3,\n \"Politechnika\": 2,\n \"CISCO\": 1,\n \"certyfikat\": 1\n }\n },\n \"Skills\": {\n \"score\": 7,\n \"suggestions\": [\n \"List specific technical skills or tools you are proficient in.\",\n \"Group skills into categories for improved clarity.\"\n ],\n \"summary\": \"The skills section is minimal and lacks specificity. 
Adding more detailed skills related to business analysis and technology would be beneficial.\",\n \"keywords\": {\n \"analityka\": 1,\n \"systemowy\": 1\n }\n },\n \"Certifications\": {\n \"score\": 9,\n \"suggestions\": [],\n \"summary\": \"The certifications section is well-detailed and relevant, showcasing important qualifications for the field.\",\n \"keywords\": {\n \"certyfikat\": 1,\n \"szkolenie\": 5\n }\n },\n \"Projects\": {\n \"score\": 6,\n \"suggestions\": [\n \"Add specific project names and outcomes to illustrate contributions.\",\n \"Include metrics or results achieved in projects.\"\n ],\n \"summary\": \"The projects section is lacking, as it does not list projects explicitly or specify contributions. More detail could improve understanding of expertise.\",\n \"keywords\": {\n \"projekt\": 1,\n \"analiz\": 1\n }\n }\n },\n \"openai_stats\": {\n \"input_tokens\": 1318,\n \"output_tokens\": 509,\n \"total_tokens\": 1827,\n \"cost\": 0.002053\n }\n}\n```",
|
||||
"role": "assistant"
|
||||
},
|
||||
"finish_reason": "stop",
|
||||
"index": 0
|
||||
}
|
||||
],
|
||||
"usage": {
|
||||
"prompt_tokens": 2668,
|
||||
"completion_tokens": 658,
|
||||
"total_tokens": 3326
|
||||
},
|
||||
"cost": 0,
|
||||
"model": "gpt-4o-mini-2024-07-18"
|
||||
}
|
||||
@ -0,0 +1,19 @@
|
||||
{
|
||||
"choices": [
|
||||
{
|
||||
"message": {
|
||||
"content": "```json\n{\n \"sections\": {\n \"Summary\": {\n \"score\": 8,\n \"suggestions\": [\"Add specific metrics to quantify achievements.\", \"Clarify the type of industries and roles you are most experienced in.\"],\n \"summary\": \"The summary provides a brief professional profile, emphasizing business and system analysis experience. However, it lacks specific metrics or examples of achievements.\",\n \"keywords\": {\n \"analityk\": 3,\n \"doświadczenie\": 2,\n \"systemowy\": 2,\n \"architekt\": 1,\n \"manager\": 1\n }\n },\n \"Work Experience\": {\n \"score\": 9,\n \"suggestions\": [],\n \"summary\": \"The work experience section is comprehensive, detailing roles, responsibilities, and projects. Each role is clearly delineated, showcasing relevant experience and contributions.\",\n \"keywords\": {\n \"analityk\": 5,\n \"system\": 4,\n \"projekt\": 4,\n \"zespół\": 2,\n \"usługi\": 3,\n \"współpraca\": 2\n }\n },\n \"Education\": {\n \"score\": 8,\n \"suggestions\": [\"Add graduation dates for each educational experience.\", \"Clearly specify the fields of study.\"],\n \"summary\": \"The education section provides various qualifications, but it could benefit from specific graduation dates and clarification of study fields.\",\n \"keywords\": {\n \"Politechnika\": 2,\n \"studia\": 3,\n \"CISCO\": 1,\n \"magisterskie\": 1,\n \"inżynierskie\": 1\n }\n },\n \"Skills\": {\n \"score\": 7,\n \"suggestions\": [\"List both hard and soft skills explicitly.\", \"Include any technical skills relevant to the roles applied for.\"],\n \"summary\": \"The skills section needs improvement; it lacks a clear list of both hard and soft skills that could enhance the individual's candidacy.\",\n \"keywords\": {\n \"CRM\": 2,\n \"analiza\": 2,\n \"zrozumienie\": 1,\n \"systemowy\": 1,\n \"projektowanie\": 1\n }\n },\n \"Certifications\": {\n \"score\": 9,\n \"suggestions\": [],\n \"summary\": \"The certifications section is strong with relevant certifications listed, demonstrating 
a commitment to professional development.\",\n \"keywords\": {\n \"certyfikat\": 2,\n \"ITIL\": 2,\n \"szkolenie\": 5,\n \"IBM\": 3\n }\n },\n \"Projects\": {\n \"score\": 7,\n \"suggestions\": [\"Add more details about specific projects (e.g., outcomes, skills used).\", \"Highlight any leadership roles in projects.\"],\n \"summary\": \"The projects section is present but lacks depth; it could highlight key achievements and the impact of each project.\",\n \"keywords\": {\n \"projekt\": 4,\n \"systemowy\": 2,\n \"analiza\": 1,\n \"zespół\": 2\n }\n }\n },\n \"openai_stats\": {\n \"input_tokens\": 1291,\n \"output_tokens\": 566,\n \"total_tokens\": 1857,\n \"cost\": 0.004\n }\n}\n```",
|
||||
"role": "assistant"
|
||||
},
|
||||
"finish_reason": "stop",
|
||||
"index": 0
|
||||
}
|
||||
],
|
||||
"usage": {
|
||||
"prompt_tokens": 2668,
|
||||
"completion_tokens": 720,
|
||||
"total_tokens": 3388
|
||||
},
|
||||
"cost": 0.0,
|
||||
"model": "gpt-4o-mini-2024-07-18"
|
||||
}
|
||||
@ -0,0 +1,19 @@
|
||||
{
|
||||
"choices": [
|
||||
{
|
||||
"message": {
|
||||
"content": "```json\n{\n \"sections\": {\n \"Summary\": {\n \"score\": 8,\n \"suggestions\": [\n \"Make the summary more concise by focusing on key skills and achievements.\",\n \"Add specific examples of business analysis and architecture achievements.\"\n ],\n \"summary\": \"Strong professional summary indicating a solid background in business and system analysis with over 10 years of relevant experience, but lacks specific accomplishments.\",\n \"keywords\": {\n \"business analyst\": 1,\n \"system architect\": 1,\n \"manager\": 1,\n \"experience\": 1\n }\n },\n \"Work Experience\": {\n \"score\": 9,\n \"suggestions\": [],\n \"summary\": \"Detailed work experience in various roles with a focus on business analysis and IT management. Effective descriptions of responsibilities and contributions, although some job roles could highlight specific achievements more clearly.\",\n \"keywords\": {\n \"business analysis\": 5,\n \"system\": 6,\n \"IT\": 4,\n \"project\": 2,\n \"analysis\": 3,\n \"documentation\": 2\n }\n },\n \"Education\": {\n \"score\": 8,\n \"suggestions\": [\n \"Specify graduation dates for each educational qualification.\",\n \"Include any honors or distinctions received during studies.\"\n ],\n \"summary\": \"Solid educational background with relevant degrees and certifications in technology and electronics, but lacks detail on specific achievements or honors.\",\n \"keywords\": {\n \"degree\": 3,\n \"education\": 2,\n \"network associate\": 1\n }\n },\n \"Skills\": {\n \"score\": 7,\n \"suggestions\": [\n \"Expand on the range of technical and soft skills relevant to the positions sought.\",\n \"Organize skills into categories (e.g., Technical, Analytical, Interpersonal) for better clarity.\"\n ],\n \"summary\": \"Skills listed are somewhat general; better categorization and specificity could improve overall relevance.\",\n \"keywords\": {\n \"skills\": 1,\n \"analysis\": 2,\n \"communication\": 1\n }\n },\n \"Certifications\": {\n \"score\": 9,\n 
\"suggestions\": [],\n \"summary\": \"The section is well-structured and lists relevant certifications clearly, showcasing continuous professional development.\",\n \"keywords\": {\n \"certification\": 1,\n \"ITIL\": 1,\n \"CISCO\": 1\n }\n },\n \"Projects\": {\n \"score\": 6,\n \"suggestions\": [\n \"Provide more specific details about project outcomes or impacts.\",\n \"Highlight personal contributions or leadership roles in notable projects.\"\n ],\n \"summary\": \"Projects are mentioned but lack depth regarding impact and individual contributions. More concrete successes would strengthen the narrative.\",\n \"keywords\": {\n \"project\": 3,\n \"migration\": 1,\n \"implementation\": 1\n }\n }\n },\n \"openai_stats\": {\n \"input_tokens\": 2155,\n \"output_tokens\": 722,\n \"total_tokens\": 2877,\n \"cost\": 0.002877\n }\n}\n```",
|
||||
"role": "assistant"
|
||||
},
|
||||
"finish_reason": "stop",
|
||||
"index": 0
|
||||
}
|
||||
],
|
||||
"usage": {
|
||||
"prompt_tokens": 2668,
|
||||
"completion_tokens": 668,
|
||||
"total_tokens": 3336
|
||||
},
|
||||
"cost": 0.0,
|
||||
"model": "gpt-4o-mini-2024-07-18"
|
||||
}
|
||||
@ -0,0 +1,19 @@
{
  "choices": [
    {
      "message": {
        "content": "```json\n{\n \"sections\": {\n \"Summary\": {\n \"score\": 8,\n \"suggestions\": [\"Consider including specific achievements or metrics to highlight your impact.\", \"Make the language more concise and powerful.\"],\n \"summary\": \"The summary provides a clear overview of the candidate's role and experience but lacks specific accomplishments that could strengthen it.\",\n \"keywords\": { \"Analityk\": 2, \"doświadczenie\": 1, \"manager\": 1, \"architekt\": 1 }\n },\n \"Work Experience\": {\n \"score\": 9,\n \"suggestions\": [],\n \"summary\": \"The work experience section is detailed and comprehensive, showcasing a strong career progression and relevant expertise in various roles.\",\n \"keywords\": { \"IT\": 6, \"analityk\": 5, \"systemów\": 5, \"projekt\": 5, \"współpraca\": 4, \"klientów\": 3, \"usług\": 3 }\n },\n \"Education\": {\n \"score\": 9,\n \"suggestions\": [],\n \"summary\": \"Education section is informative and highlights relevant degrees and certifications, showcasing the candidate's academic background.\",\n \"keywords\": { \"studia\": 3, \"Politechnika Warszawska\": 2, \"CISCO\": 1, \"Magister\": 1, \"Inżynierskie\": 1 }\n },\n \"Skills\": {\n \"score\": 7,\n \"suggestions\": [\"List skills specifically related to the positions applied for.\", \"Consider organizing skills into relevant categories.\"],\n \"summary\": \"Skills section is not explicitly defined, making it difficult to quickly assess the candidate's qualifications. Specific skills and categories would add clarity.\",\n \"keywords\": { \"analiza\": 2, \"systemy\": 1, \"współpraca\": 1, \"usługi\": 1 }\n },\n \"Certifications\": {\n \"score\": 8,\n \"suggestions\": [\"Add the date for each certification obtained for better clarity.\", \"Consider grouping certifications by relevance.\"],\n \"summary\": \"The certifications section lists various relevant training and qualifications but would benefit from more organization and specificity.\",\n \"keywords\": { \"certyfikat\": 1, \"szkolenie\": 1, \"ITIL\": 2 }\n },\n \"Projects\": {\n \"score\": 7,\n \"suggestions\": [\"Include specific project names and outcomes to enhance detail.\", \"Highlight individual contributions more clearly.\"],\n \"summary\": \"The projects section provides some context but lacks clear delineation of specific projects or the candidate's individual contributions and results.\",\n \"keywords\": { \"projekt\": 3, \"współpraca\": 2, \"systemy\": 1 }\n }\n },\n \"openai_stats\": {\n \"input_tokens\": 1526,\n \"output_tokens\": 469,\n \"total_tokens\": 1995,\n \"cost\": 0.09975\n }\n}\n```",
        "role": "assistant"
      },
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "usage": {
    "prompt_tokens": 2668,
    "completion_tokens": 647,
    "total_tokens": 3315
  },
  "cost": 0.0,
  "model": "gpt-4o-mini-2024-07-18"
}
@ -0,0 +1,19 @@
{
  "choices": [
    {
      "message": {
        "content": "```json\n{\n \"sections\": {\n \"Summary\": {\n \"score\": 8,\n \"suggestions\": [\n \"Consider adding specific achievements or metrics to highlight impact.\",\n \"Simplify language for clearer understanding.\"\n ],\n \"summary\": \"The summary provides a clear overview of the candidate's experience and roles in business analysis and IT management but can be improved by adding specific achievements to quantify their contributions.\",\n \"keywords\": {\n \"analityk\": 3,\n \"doświadczenie\": 2,\n \"architekt\": 1,\n \"manager\": 1\n }\n },\n \"Work Experience\": {\n \"score\": 9,\n \"suggestions\": [],\n \"summary\": \"The work experience section is detailed, presenting clear job roles, responsibilities, and contributions. It utilizes strong action verbs but could be enhanced with quantifiable results in some roles.\",\n \"keywords\": {\n \"analiz\": 5,\n \"biznesowy\": 4,\n \"systemowy\": 4,\n \"projekt\": 4,\n \"współpraca\": 3,\n \"wymagania\": 2\n }\n },\n \"Education\": {\n \"score\": 8,\n \"suggestions\": [\n \"Specify the graduation status for higher education.\",\n \"Consider listing any honors or relevant coursework.\"\n ],\n \"summary\": \"The education section is comprehensive, including degrees and specialized training, but it lacks mention of graduation status and could highlight additional relevant coursework.\",\n \"keywords\": {\n \"Politechnika\": 2,\n \"CISCO\": 1,\n \"Magisterskie\": 1,\n \"Inżynierskie\": 1\n }\n },\n \"Skills\": {\n \"score\": 7,\n \"suggestions\": [\n \"Categorize skills into technical and soft skills for clarity.\",\n \"Add more specific technologies or methodologies relevant to the roles applied for.\"\n ],\n \"summary\": \"The skills section is minimal and lacks depth. Categorizing skills can improve clarity and relevance, and including specific technologies or methodologies would strengthen the section.\",\n \"keywords\": {\n \"szkoleń\": 4,\n \"certyfikaty\": 2,\n \"prawo jazdy\": 1\n }\n },\n \"Certifications\": {\n \"score\": 9,\n \"suggestions\": [],\n \"summary\": \"The certifications section is strong, detailing relevant training and certifications that add credibility to the candidate's qualifications.\",\n \"keywords\": {\n \"certyfikat\": 1,\n \"szkolenie\": 9\n }\n },\n \"Projects\": {\n \"score\": 6,\n \"suggestions\": [\n \"Create a separate section for key projects with descriptions and outcomes.\",\n \"Highlight individual contributions to collaborative projects.\"\n ],\n \"summary\": \"The projects are mentioned informally within work experience; however, creating a dedicated section would better emphasize significant projects and achievements.\",\n \"keywords\": {\n \"projekt\": 4,\n \"wymagania\": 2\n }\n }\n },\n \"openai_stats\": {\n \"input_tokens\": 2585,\n \"output_tokens\": 677,\n \"total_tokens\": 3262,\n \"cost\": 0.01308\n }\n}\n```",
        "role": "assistant"
      },
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "usage": {
    "prompt_tokens": 2668,
    "completion_tokens": 706,
    "total_tokens": 3374
  },
  "cost": 0.0,
  "model": "gpt-4o-mini-2024-07-18"
}
@ -0,0 +1,19 @@
{
  "choices": [
    {
      "message": {
        "content": "```json\n{\n \"sections\": {\n \"Summary\": {\n \"score\": 8,\n \"suggestions\": [\n \"Add specific accomplishments or metrics to demonstrate impact\",\n \"Consider using bullet points for easier readability\"\n ],\n \"summary\": \"The summary provides a clear overview of the candidate's experience and roles as a business analyst, architect, and manager. However, it lacks specific details on achievements or metrics that could enhance its impact.\",\n \"keywords\": {\n \"Analityk biznesowy\": 1,\n \"systemowy\": 1,\n \"architekt\": 1,\n \"manager\": 1,\n \"doświadczenie\": 1\n }\n },\n \"Work Experience\": {\n \"score\": 9,\n \"suggestions\": [],\n \"summary\": \"This section provides comprehensive details about the candidate's relevant work experience, including roles, responsibilities, and achievements. It is well-structured and effectively highlights the candidate’s expertise.\",\n \"keywords\": {\n \"analityk\": 5,\n \"systemowy\": 2,\n \"kierownik\": 2,\n \"dzieło\": 2,\n \"projekt\": 3,\n \"współpraca\": 2,\n \"systemy\": 3,\n \"dokumentacja\": 2\n }\n },\n \"Education\": {\n \"score\": 8,\n \"suggestions\": [\n \"Include graduation years for better context\",\n \"Consider adding any honors or relevant coursework\"\n ],\n \"summary\": \"The education section lists relevant degrees and certifications, but lacks graduation dates and specifics about honors which could strengthen the presentation.\",\n \"keywords\": {\n \"Magisterskie\": 1,\n \"Inżynierskie\": 1,\n \"Politechnika\": 2,\n \"CISCO\": 1,\n \"specjalność\": 3\n }\n },\n \"Skills\": {\n \"score\": 7,\n \"suggestions\": [\n \"Add more specific technical and soft skills\",\n \"Group skills into categories for clarity\"\n ],\n \"summary\": \"The skills section is brief and could benefit from more detail. Including specific technical skills, soft skills, and grouping them would enhance this section’s effectiveness.\",\n \"keywords\": {}\n },\n \"Certifications\": {\n \"score\": 9,\n \"suggestions\": [],\n \"summary\": \"The certifications section is well-detailed, showcasing a range of relevant training and certifications that support the candidate's qualifications. No improvements needed.\",\n \"keywords\": {\n \"certyfikat\": 3,\n \"szkolenie\": 6,\n \"ITIL\": 2\n }\n },\n \"Projects\": {\n \"score\": 8,\n \"suggestions\": [\n \"Provide more detailed descriptions of key projects\",\n \"Highlight any specific outcomes or results achieved\"\n ],\n \"summary\": \"The projects section includes relevant experiences but would be improved by elaborating on the specifics of projects and their outcomes, including metrics or achievements.\",\n \"keywords\": {\n \"projekt\": 4,\n \"analiza\": 2,\n \"współpraca\": 1\n }\n }\n },\n \"openai_stats\": {\n \"input_tokens\": 1695,\n \"output_tokens\": 712,\n \"total_tokens\": 2407,\n \"cost\": 0.0035\n }\n}\n```",
        "role": "assistant"
      },
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "usage": {
    "prompt_tokens": 2668,
    "completion_tokens": 729,
    "total_tokens": 3397
  },
  "cost": 0,
  "model": "gpt-4o-mini-2024-07-18"
}
@ -1,819 +0,0 @@
#!/usr/bin/env python3

import argparse
import os
import subprocess
import sys
import time
import logging
import json  # Added for stats file
from datetime import datetime, timedelta
from pathlib import Path  # Added for easier path handling

# --- Dependencies ---
# Requires PyYAML: pip install PyYAML
try:
    import yaml
except ImportError:
    print("Error: PyYAML library not found. Please install it using: pip install PyYAML", file=sys.stderr)
    sys.exit(1)


# --- Configuration ---
# These act as fallback defaults if not specified in config file or command line
DEFAULT_SOURCE_DIR = "/mnt/slow_storage"
DEFAULT_TARGET_DIR = "/mnt/fast_storage"
DEFAULT_RECENT_DAYS = 1
DEFAULT_STALE_DAYS = 30  # Default for moving cold files back
DEFAULT_STATS_FILE = None  # Default: Don't generate stats unless requested
DEFAULT_MIN_SIZE = "0"  # Default: No minimum size filter
DEFAULT_CONFIG_PATH = Path.home() / ".config" / "file_manager" / "config.yaml"

# --- Logging Setup ---
def setup_logging():
    """Configures basic logging."""
    logging.basicConfig(
        level=logging.INFO,
        format="[%(asctime)s] [%(levelname)s] %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )

# --- Helper Function ---
def format_bytes(size):
    """Converts bytes to a human-readable string (B, KB, MB, GB)."""
    if size is None:
        return "N/A"
    if size < 1024:
        return f"{size} B"
    elif size < 1024**2:
        return f"{size / 1024:.2f} KB"
    elif size < 1024**3:
        return f"{size / 1024**2:.2f} MB"
    else:
        return f"{size / 1024**3:.2f} GB"

# --- Helper Function: Parse Size String ---
def parse_size_string(size_str):
    """Converts a size string (e.g., '10G', '500M', '10k') to bytes."""
    size_str = str(size_str).strip().upper()
    if not size_str:
        return 0
    if size_str == '0':
        return 0

    units = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
    unit = "B"  # Default unit

    # Check last character for unit
    if size_str[-1] in units:
        unit = size_str[-1]
        numeric_part = size_str[:-1]
    else:
        numeric_part = size_str

    if not numeric_part.replace('.', '', 1).isdigit():  # Allow float for parsing e.g. 1.5G
        raise ValueError(f"Invalid numeric part in size string: '{numeric_part}'")

    try:
        value = float(numeric_part)
    except ValueError:
        raise ValueError(f"Cannot convert numeric part to float: '{numeric_part}'")

    return int(value * units[unit])


# --- Configuration Loading ---
def load_config(config_path):
    """Loads configuration from a YAML file."""
    config = {}
    resolved_path = Path(config_path).resolve()  # Resolve potential symlinks/relative paths
    if resolved_path.is_file():
        try:
            with open(resolved_path, 'r') as f:
                config = yaml.safe_load(f)
            if config is None:  # Handle empty file case
                config = {}
            if not isinstance(config, dict):  # Callers use config.get(), so the top level must be a mapping
                logging.warning(f"Config file {resolved_path} is not a YAML mapping. Using defaults.")
                return {}
            logging.info(f"Loaded configuration from: {resolved_path}")
        except yaml.YAMLError as e:
            logging.warning(f"Error parsing config file {resolved_path}: {e}. Using defaults.")
        except OSError as e:
            logging.warning(f"Error reading config file {resolved_path}: {e}. Using defaults.")
    else:
        # It's okay if the default config doesn't exist, don't log warning unless user specified one
        if str(resolved_path) != str(DEFAULT_CONFIG_PATH.resolve()):
            logging.warning(f"Specified config file not found at {resolved_path}. Using defaults/CLI args.")
        else:
            logging.info(f"Default config file not found at {resolved_path}. Using defaults/CLI args.")
    return config

# --- Argument Parsing ---
def parse_arguments():
    """Parses command line arguments, considering config file defaults."""

    # Initial minimal parse to find config path *before* defining all args
    pre_parser = argparse.ArgumentParser(add_help=False)
    pre_parser.add_argument('--config', default=str(DEFAULT_CONFIG_PATH), help=f'Path to YAML configuration file (Default: {DEFAULT_CONFIG_PATH}).')
    pre_args, _ = pre_parser.parse_known_args()

    # Load config based on pre-parsed path
    config = load_config(pre_args.config)

    # Get defaults from config or fallback constants
    cfg_source_dir = config.get('source_dir', DEFAULT_SOURCE_DIR)
    cfg_target_dir = config.get('target_dir', DEFAULT_TARGET_DIR)
    cfg_recent_days = config.get('recent_days', DEFAULT_RECENT_DAYS)
    cfg_stale_days = config.get('stale_days', DEFAULT_STALE_DAYS)
    cfg_stats_file = config.get('stats_file', DEFAULT_STATS_FILE)
    cfg_min_size = config.get('min_size', DEFAULT_MIN_SIZE)

    # Main parser using loaded config defaults
    parser = argparse.ArgumentParser(
        description="Manages files between storage tiers based on access/modification time, generates stats, and summarizes.",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=f"""Examples:
  # Move hot files (accessed < {cfg_recent_days}d ago) from {cfg_source_dir} to {cfg_target_dir}
  {sys.argv[0]} --move

  # Move cold files (modified > {cfg_stale_days}d ago) from {cfg_target_dir} to {cfg_source_dir} (interactive)
  {sys.argv[0]} --move-cold --interactive

  # Simulate moving hot files with custom settings
  {sys.argv[0]} --move --recent-days 3 --source-dir /data/archive --target-dir /data/hot --dry-run

  # Count potential hot files larger than 100MB to move
  {sys.argv[0]} --count --min-size 100M
  {sys.argv[0]} --count

  # Summarize unused files in target directory
  {sys.argv[0]} --summarize-unused

  # Generate storage statistics report
  {sys.argv[0]} --generate-stats --stats-file /var/log/file_manager_stats.json

  # Use a specific configuration file
  {sys.argv[0]} --config /path/to/my_config.yaml --move
"""
    )

    action_group = parser.add_argument_group('Actions (at least one required)')
    action_group.add_argument('--move', action='store_true', help='Move recently accessed ("hot") files from source to target.')
    action_group.add_argument('--move-cold', action='store_true', help='Move old unmodified ("cold") files from target back to source.')
    action_group.add_argument('--count', action='store_true', help='Count hot files in source that would be moved (based on access time).')
    action_group.add_argument('--summarize-unused', action='store_true', help='Analyze target directory for unused files based on modification time.')
    action_group.add_argument('--generate-stats', action='store_true', help='Generate a JSON stats report for source and target directories.')

    config_group = parser.add_argument_group('Configuration Options (Overrides config file)')
    config_group.add_argument('--config', default=str(DEFAULT_CONFIG_PATH), help=f'Path to YAML configuration file (Default: {DEFAULT_CONFIG_PATH}).')  # Re-add for help text
    config_group.add_argument('--source-dir', default=cfg_source_dir, help=f'Source directory (Default: "{cfg_source_dir}").')
    config_group.add_argument('--target-dir', default=cfg_target_dir, help=f'Target directory (Default: "{cfg_target_dir}").')
    config_group.add_argument('--recent-days', type=int, default=cfg_recent_days, help=f'Define "recent" access in days for --move/--count (Default: {cfg_recent_days}).')
    config_group.add_argument('--stale-days', type=int, default=cfg_stale_days, help=f'Define "stale" modification in days for --move-cold (Default: {cfg_stale_days}).')
    config_group.add_argument('--stats-file', default=cfg_stats_file, help=f'Output file for --generate-stats (Default: {"None" if cfg_stats_file is None else cfg_stats_file}).')
    config_group.add_argument('--min-size', default=cfg_min_size, help=f'Minimum file size to consider for move actions (e.g., 100M, 1G, 0 to disable). (Default: {cfg_min_size})')

    behavior_group = parser.add_argument_group('Behavior Modifiers')
    behavior_group.add_argument('--dry-run', action='store_true', help='Simulate move actions without actual changes.')
    behavior_group.add_argument('--interactive', action='store_true', help='Prompt for confirmation before executing move actions (ignored if --dry-run).')


    # If no arguments were given (just script name), print help
    if len(sys.argv) == 1:
        parser.print_help(sys.stderr)
        sys.exit(1)

    args = parser.parse_args()

    # Validate that at least one action is selected
    action_selected = args.move or args.move_cold or args.count or args.summarize_unused or args.generate_stats
    if not action_selected:
        parser.error("At least one action flag (--move, --move-cold, --count, --summarize-unused, --generate-stats) is required.")

    # Validate days arguments
    if args.recent_days <= 0:
        parser.error("--recent-days must be a positive integer.")
    if args.stale_days <= 0:
        parser.error("--stale-days must be a positive integer.")

    # Validate stats file if action is selected
    if args.generate_stats and not args.stats_file:
        parser.error("--stats-file must be specified when using --generate-stats (or set in config file).")

    # Validate and parse min_size
    try:
        args.min_size_bytes = parse_size_string(args.min_size)
        if args.min_size_bytes < 0:
            parser.error("--min-size cannot be negative.")
    except ValueError as e:
        parser.error(f"Invalid --min-size value: {e}")

    return args

# --- Core Logic Functions ---

def find_recent_files(source_dir, days, min_size_bytes):
    """Finds files accessed within the last 'days' in the source directory."""
    size_filter_msg = f" and size >= {format_bytes(min_size_bytes)}" if min_size_bytes > 0 else ""
    logging.info(f"Scanning '{source_dir}' for files accessed within the last {days} day(s){size_filter_msg}...")
    recent_files = []
    cutoff_time = time.time() - (days * 86400)  # 86400 seconds in a day
    try:
        for root, _, files in os.walk(source_dir):
            for filename in files:
                filepath = os.path.join(root, filename)
                try:
                    # Check if it's a file and not a broken symlink etc.
                    if not os.path.isfile(filepath) or os.path.islink(filepath):
                        continue
                    stat_result = os.stat(filepath)
                    # Check access time AND minimum size
                    if stat_result.st_atime > cutoff_time and stat_result.st_size >= min_size_bytes:
                        # Get path relative to source_dir for rsync --files-from
                        relative_path = os.path.relpath(filepath, source_dir)
                        recent_files.append(relative_path)
                except FileNotFoundError:
                    logging.warning(f"File not found during scan, skipping: {filepath}")
                    continue  # File might have been deleted during scan
                except OSError as e:
                    logging.warning(f"Cannot access file stats, skipping: {filepath} ({e})")
                    continue
    except FileNotFoundError:
        logging.error(f"Source directory '{source_dir}' not found during scan.")
        return None  # Indicate error
    except Exception as e:
        logging.error(f"An unexpected error occurred during 'recent' file scan: {e}")
        return None

    logging.info(f"Found {len(recent_files)} files matching the 'recent' criteria.")
    return recent_files

# --- New Function: Find Stale Files ---
def find_stale_files(target_dir, days, min_size_bytes):
    """Finds files modified more than 'days' ago in the target directory."""
    size_filter_msg = f" and size >= {format_bytes(min_size_bytes)}" if min_size_bytes > 0 else ""
    logging.info(f"Scanning '{target_dir}' for files modified more than {days} day(s) ago{size_filter_msg}...")
    stale_files = []
    # Files whose mtime is *before* this cutoff count as stale
    cutoff_time = time.time() - (days * 86400)  # 86400 seconds in a day
    try:
        for root, _, files in os.walk(target_dir):
            for filename in files:
                filepath = os.path.join(root, filename)
                try:
                    # Check if it's a file and not a broken symlink etc.
                    if not os.path.isfile(filepath) or os.path.islink(filepath):
                        continue
                    stat_result = os.stat(filepath)
                    # Check modification time AND minimum size
                    if stat_result.st_mtime < cutoff_time and stat_result.st_size >= min_size_bytes:
                        # Get path relative to target_dir for rsync --files-from
                        relative_path = os.path.relpath(filepath, target_dir)
                        stale_files.append(relative_path)
                except FileNotFoundError:
                    logging.warning(f"File not found during stale scan, skipping: {filepath}")
                    continue  # File might have been deleted during scan
                except OSError as e:
                    logging.warning(f"Cannot access file stats during stale scan, skipping: {filepath} ({e})")
                    continue
    except FileNotFoundError:
        logging.error(f"Target directory '{target_dir}' not found during stale scan.")
        return None  # Indicate error
    except Exception as e:
        logging.error(f"An unexpected error occurred during 'stale' file scan: {e}")
        return None

    logging.info(f"Found {len(stale_files)} files matching the 'stale' criteria (modified > {days} days ago).")
    return stale_files


def move_files(relative_file_list, source_dir, target_dir, dry_run, interactive):  # Added interactive
    """Moves files using rsync (hot files: source -> target)."""
    if not relative_file_list:
        logging.info("No 'hot' files found to move.")
        return True  # Nothing to do, considered success

    action_desc = "move hot files"
    simulating = dry_run
    num_files = len(relative_file_list)

    logging.info(f"--- {'Simulating ' if simulating else ''}{action_desc.capitalize()} ---")
    logging.info(f"Source Base: {source_dir}")
    logging.info(f"Target Base: {target_dir}")
    logging.info(f"Files to process: {num_files}")
    logging.info("--------------------")

    # Interactive prompt
    if interactive and not simulating:
        try:
            confirm = input(f"Proceed with moving {num_files} hot files from '{source_dir}' to '{target_dir}'? (yes/no): ").lower().strip()
            if confirm != 'yes':
                logging.warning("Move operation cancelled by user.")
                return False  # Indicate cancellation
        except EOFError:  # Handle non-interactive environments gracefully
            logging.warning("Cannot prompt in non-interactive mode. Aborting move.")
            return False


    rsync_cmd = ['rsync', '-avP', '--relative', '--info=progress2']  # archive, verbose, progress/partial, relative paths

    if simulating:
        rsync_cmd.append('--dry-run')
    else:
        rsync_cmd.append('--remove-source-files')

    # Use --files-from=- with source as '.' because paths are relative to source_dir
    # Target directory is the destination for the relative structure
    rsync_cmd.extend(['--files-from=-', '.', target_dir])

    # Prepare file list for stdin (newline separated)
    files_input = "\n".join(relative_file_list).encode('utf-8')

    try:
        logging.info(f"Executing rsync command: {' '.join(rsync_cmd)}")
        # Run rsync in the source directory context
        process = subprocess.run(
            rsync_cmd,
            input=files_input,
            capture_output=True,
            # text=True, # Removed: Input is bytes, output will be bytes
            check=False,  # Don't raise exception on non-zero exit
            cwd=source_dir  # Execute rsync from the source directory
        )

        # Decode output/error streams
        stdout_str = process.stdout.decode('utf-8', errors='replace') if process.stdout else ""
        stderr_str = process.stderr.decode('utf-8', errors='replace') if process.stderr else ""

        if stdout_str:
            logging.info("rsync output:\n" + stdout_str)
        if stderr_str:
            # rsync often prints stats to stderr, log as info unless exit code is bad
            log_level = logging.WARNING if process.returncode != 0 else logging.INFO
            logging.log(log_level, "rsync stderr:\n" + stderr_str)

        if process.returncode == 0:
            logging.info(f"rsync {'simulation' if simulating else action_desc} completed successfully.")
            logging.info("--------------------")
            return True
        else:
            logging.error(f"rsync {'simulation' if simulating else action_desc} failed with exit code {process.returncode}.")
            logging.info("--------------------")
            return False

    except FileNotFoundError:
        logging.error("Error: 'rsync' command not found. Please ensure rsync is installed and in your PATH.")
        return False
    except Exception as e:
        logging.error(f"An unexpected error occurred during rsync execution for hot files: {e}")
        return False

# --- New Function: Move Cold Files ---
def move_files_cold(relative_file_list, source_dir, target_dir, dry_run, interactive):
    """Moves files using rsync (cold files: target -> source)."""
    if not relative_file_list:
        logging.info("No 'cold' files found to move back.")
        return True  # Nothing to do, considered success

    action_desc = "move cold files back"
    simulating = dry_run
    num_files = len(relative_file_list)
    total_size = 0

    # Calculate total size before prompt/move
    logging.info("Calculating total size of cold files...")
    for rel_path in relative_file_list:
        try:
            full_path = os.path.join(target_dir, rel_path)
            if os.path.isfile(full_path):  # Check again in case it vanished
                total_size += os.path.getsize(full_path)
        except OSError as e:
            logging.warning(f"Could not get size for {rel_path}: {e}")


    logging.info(f"--- {'Simulating ' if simulating else ''}{action_desc.capitalize()} ---")
    logging.info(f"Source (of cold files): {target_dir}")
    logging.info(f"Destination (archive): {source_dir}")
    logging.info(f"Files to process: {num_files}")
    logging.info(f"Total size: {format_bytes(total_size)}")
    logging.info("--------------------")

    # Interactive prompt
    if interactive and not simulating:
        try:
            confirm = input(f"Proceed with moving {num_files} cold files ({format_bytes(total_size)}) from '{target_dir}' to '{source_dir}'? (yes/no): ").lower().strip()
            if confirm != 'yes':
                logging.warning("Move operation cancelled by user.")
                return False  # Indicate cancellation
        except EOFError:  # Handle non-interactive environments gracefully
            logging.warning("Cannot prompt in non-interactive mode. Aborting move.")
            return False

    # Note: We run rsync from the TARGET directory now
    rsync_cmd = ['rsync', '-avP', '--relative']  # archive, verbose, progress/partial, relative paths

    if simulating:
        rsync_cmd.append('--dry-run')
    else:
        rsync_cmd.append('--remove-source-files')  # Remove from TARGET after successful transfer

    # Use --files-from=- with source as '.' (relative to target_dir)
    # Target directory is the destination (source_dir in this context)
    rsync_cmd.extend(['--files-from=-', '.', source_dir])

    # Prepare file list for stdin (newline separated)
    files_input = "\n".join(relative_file_list).encode('utf-8')

    try:
        logging.info(f"Executing rsync command: {' '.join(rsync_cmd)}")
        # Run rsync in the TARGET directory context
        process = subprocess.run(
            rsync_cmd,
            input=files_input,
            capture_output=True,
            # text=True, # Removed: Input is bytes, output will be bytes
            check=False,  # Don't raise exception on non-zero exit
            cwd=target_dir  # <<< Execute rsync from the TARGET directory
        )

        # Decode output/error streams
        stdout_str = process.stdout.decode('utf-8', errors='replace') if process.stdout else ""
        stderr_str = process.stderr.decode('utf-8', errors='replace') if process.stderr else ""

        if stdout_str:
            logging.info("rsync output:\n" + stdout_str)
        if stderr_str:
            log_level = logging.WARNING if process.returncode != 0 else logging.INFO
            logging.log(log_level, "rsync stderr:\n" + stderr_str)

        if process.returncode == 0:
            logging.info(f"rsync {'simulation' if simulating else action_desc} completed successfully.")
            logging.info("--------------------")
            return True
        else:
            logging.error(f"rsync {'simulation' if simulating else action_desc} failed with exit code {process.returncode}.")
            logging.info("--------------------")
            return False

    except FileNotFoundError:
        logging.error("Error: 'rsync' command not found. Please ensure rsync is installed and in your PATH.")
        return False
    except Exception as e:
        logging.error(f"An unexpected error occurred during rsync execution for cold files: {e}")
        return False


def count_files(file_list):
    """Logs the count of files found."""
    logging.info("--- Counting Hot Move Candidates ---")
    if file_list is None:
        logging.warning("File list is not available (likely due to earlier error).")
    else:
        logging.info(f"Found {len(file_list)} potential hot files to move based on access time.")
    logging.info("----------------------------")

def summarize_unused(target_dir):
|
||||
"""Summarizes unused files in the target directory based on modification time."""
|
||||
logging.info("--- Summarizing Unused Files in Target ---")
|
||||
logging.info(f"Target Directory: {target_dir}")
|
||||
logging.info("Criteria: Based on modification time (-mtime)")
|
||||
logging.info("------------------------------------------")
|
||||
|
||||
periods_days = [1, 3, 7, 14, 30]
|
||||
now = time.time()
|
||||
period_cutoffs = {days: now - (days * 86400) for days in periods_days}
|
||||
# Add a bucket for > 30 days
|
||||
size_by_period = {days: 0 for days in periods_days + ['30+']}
|
||||
count_by_period = {days: 0 for days in periods_days + ['30+']} # Also count files
|
||||
|
||||
file_count = 0
|
||||
total_processed_size = 0
|
||||
|
||||
try:
|
||||
for root, _, files in os.walk(target_dir):
|
||||
for filename in files:
|
||||
filepath = os.path.join(root, filename)
|
||||
try:
|
||||
# Check if it's a file and not a broken symlink etc.
|
||||
if not os.path.isfile(filepath) or os.path.islink(filepath):
|
||||
continue
|
||||
stat_result = os.stat(filepath)
|
||||
mtime = stat_result.st_mtime
|
||||
fsize = stat_result.st_size
|
||||
file_count += 1
|
||||
total_processed_size += fsize
|
||||
|
||||
# Check against periods in descending order of age (longest first)
|
||||
period_assigned = False
|
||||
if mtime < period_cutoffs[30]:
|
||||
size_by_period['30+'] += fsize
|
||||
count_by_period['30+'] += 1
|
||||
period_assigned = True
|
||||
elif mtime < period_cutoffs[14]:
|
||||
size_by_period[30] += fsize
|
||||
count_by_period[30] += 1
|
||||
period_assigned = True
|
||||
elif mtime < period_cutoffs[7]:
|
||||
size_by_period[14] += fsize
|
||||
count_by_period[14] += 1
|
||||
period_assigned = True
|
||||
elif mtime < period_cutoffs[3]:
|
||||
size_by_period[7] += fsize
|
||||
count_by_period[7] += 1
|
||||
period_assigned = True
|
||||
elif mtime < period_cutoffs[1]:
|
||||
size_by_period[3] += fsize
|
||||
count_by_period[3] += 1
|
||||
period_assigned = True
|
||||
# else: # Modified within the last day - doesn't count for these summaries
|
||||
|
||||
except FileNotFoundError:
|
||||
logging.warning(f"File not found during summary, skipping: {filepath}")
|
||||
continue
|
||||
except OSError as e:
|
||||
logging.warning(f"Cannot access file stats during summary, skipping: {filepath} ({e})")
|
||||
continue
|
||||
|
||||
logging.info(f"Scanned {file_count} files, total size: {format_bytes(total_processed_size)}")
|
||||
|
||||
# Calculate cumulative sizes and counts
|
||||
cumulative_size = {days: 0 for days in periods_days + ['30+']}
|
||||
cumulative_count = {days: 0 for days in periods_days + ['30+']}
|
||||
|
||||
# Iterate backwards through sorted periods for cumulative calculation
|
||||
# These keys represent the *upper bound* of the age bucket (e.g., key '30' means 14 < age <= 30 days)
|
||||
# The cumulative value for key 'X' means "total size/count of files older than X days"
|
||||
sorted_periods_desc = ['30+'] + sorted(periods_days, reverse=True) # e.g., ['30+', 30, 14, 7, 3, 1]
|
||||
last_period_size = 0
|
||||
last_period_count = 0
|
||||
temp_cumulative_size = {}
|
||||
temp_cumulative_count = {}
|
||||
|
||||
for period_key in sorted_periods_desc:
|
||||
current_size = size_by_period[period_key]
|
||||
current_count = count_by_period[period_key]
|
||||
temp_cumulative_size[period_key] = current_size + last_period_size
|
||||
temp_cumulative_count[period_key] = current_count + last_period_count
|
||||
last_period_size = temp_cumulative_size[period_key]
|
||||
last_period_count = temp_cumulative_count[period_key]
|
||||
|
||||
# Map temporary cumulative values to the correct "older than X days" meaning
|
||||
# cumulative_size[1] should be size of files older than 1 day (i.e. temp_cumulative_size[3])
|
||||
cumulative_size[1] = temp_cumulative_size.get(3, 0)
|
||||
cumulative_count[1] = temp_cumulative_count.get(3, 0)
|
||||
cumulative_size[3] = temp_cumulative_size.get(7, 0)
|
||||
cumulative_count[3] = temp_cumulative_count.get(7, 0)
|
||||
cumulative_size[7] = temp_cumulative_size.get(14, 0)
|
||||
cumulative_count[7] = temp_cumulative_count.get(14, 0)
|
||||
cumulative_size[14] = temp_cumulative_size.get(30, 0)
|
||||
cumulative_count[14] = temp_cumulative_count.get(30, 0)
|
||||
cumulative_size[30] = temp_cumulative_size.get('30+', 0)
|
||||
cumulative_count[30] = temp_cumulative_count.get('30+', 0)
|
||||
cumulative_size['30+'] = temp_cumulative_size.get('30+', 0) # Redundant but harmless
|
||||
cumulative_count['30+'] = temp_cumulative_count.get('30+', 0)
|
||||
|
||||
|
||||
logging.info("Cumulative stats for files NOT modified for more than:")
|
||||
# Display in ascending order of days for clarity
|
||||
logging.info(f" > 1 day: {format_bytes(cumulative_size[1])} ({cumulative_count[1]} files)")
|
||||
logging.info(f" > 3 days: {format_bytes(cumulative_size[3])} ({cumulative_count[3]} files)")
|
||||
logging.info(f" > 7 days: {format_bytes(cumulative_size[7])} ({cumulative_count[7]} files)")
|
||||
logging.info(f" > 14 days:{format_bytes(cumulative_size[14])} ({cumulative_count[14]} files)")
|
||||
logging.info(f" > 30 days:{format_bytes(cumulative_size[30])} ({cumulative_count[30]} files)")
|
||||
|
||||
|
||||
except FileNotFoundError:
|
||||
logging.error(f"Target directory '{target_dir}' not found for summary.")
|
||||
except Exception as e:
|
||||
logging.error(f"An unexpected error occurred during unused file summary: {e}")
|
||||
|
||||
logging.info("------------------------------------------")
|
||||
|
||||
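The bucketing and the hand-written shift into "older than X days" keys in `summarize_unused` can be expressed more compactly. The following is a minimal sketch, not the script's actual implementation: the function name `cumulative_older_than` and the `(mtime, size)` input shape are assumptions made for illustration. It uses `bisect` to count how many thresholds a file's age strictly exceeds, then accumulates from the oldest bucket down.

```python
import bisect
import time

PERIODS_DAYS = [1, 3, 7, 14, 30]  # same thresholds as periods_days in the script


def cumulative_older_than(files, now=None, periods=PERIODS_DAYS):
    """Return {days: (total_size, file_count)} for files older than each threshold.

    `files` is an iterable of (mtime, size) pairs (hypothetical input shape).
    """
    now = time.time() if now is None else now
    # Slot i holds files whose age strictly exceeds exactly i thresholds.
    sizes = [0] * (len(periods) + 1)
    counts = [0] * (len(periods) + 1)
    for mtime, size in files:
        age_days = (now - mtime) / 86400
        idx = bisect.bisect_left(periods, age_days)  # thresholds strictly below the age
        sizes[idx] += size
        counts[idx] += 1

    # Accumulate from the oldest slot down so each key means "older than X days".
    result = {}
    running_size = running_count = 0
    for k in range(len(periods) - 1, -1, -1):
        running_size += sizes[k + 1]
        running_count += counts[k + 1]
        result[periods[k]] = (running_size, running_count)
    return result
```

A file that is exactly 30 days old lands in the "older than 14 days" total but not in "older than 30 days", which matches the strict `mtime < cutoff` comparisons in the original function.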
# --- New Function: Analyze Directory for Stats ---
|
||||
def analyze_directory(directory):
|
||||
"""Analyzes a directory and returns statistics."""
|
||||
logging.info(f"Analyzing directory for statistics: {directory}")
|
||||
stats = {
|
||||
'total_files': 0,
|
||||
'total_size': 0,
|
||||
'size_by_mod_time_days': { # Keys are the upper bound of each age bucket in days (key '1' means age <= 1 day)
|
||||
'1': {'count': 0, 'size': 0}, # <= 1 day old
|
||||
'3': {'count': 0, 'size': 0}, # > 1 day, <= 3 days old
|
||||
'7': {'count': 0, 'size': 0}, # > 3 days, <= 7 days old
|
||||
'14': {'count': 0, 'size': 0},# > 7 days, <= 14 days old
|
||||
'30': {'count': 0, 'size': 0}, # > 14 days, <= 30 days old
|
||||
'over_30': {'count': 0, 'size': 0} # > 30 days old
|
||||
},
|
||||
'error_count': 0,
|
||||
}
|
||||
periods_days = [1, 3, 7, 14, 30]
|
||||
now = time.time()
|
||||
# Cutoffs: if mtime < cutoff[X], file is older than X days
|
||||
period_cutoffs = {days: now - (days * 86400) for days in periods_days}
|
||||
|
||||
try:
|
||||
for root, _, files in os.walk(directory):
|
||||
for filename in files:
|
||||
filepath = os.path.join(root, filename)
|
||||
try:
|
||||
if not os.path.isfile(filepath) or os.path.islink(filepath):
|
||||
continue
|
||||
stat_result = os.stat(filepath)
|
||||
mtime = stat_result.st_mtime
|
||||
fsize = stat_result.st_size
|
||||
|
||||
stats['total_files'] += 1
|
||||
stats['total_size'] += fsize
|
||||
|
||||
# Assign to age buckets based on modification time (oldest first)
|
||||
if mtime < period_cutoffs[30]:
|
||||
stats['size_by_mod_time_days']['over_30']['count'] += 1
|
||||
stats['size_by_mod_time_days']['over_30']['size'] += fsize
|
||||
elif mtime < period_cutoffs[14]:
|
||||
stats['size_by_mod_time_days']['30']['count'] += 1
|
||||
stats['size_by_mod_time_days']['30']['size'] += fsize
|
||||
elif mtime < period_cutoffs[7]:
|
||||
stats['size_by_mod_time_days']['14']['count'] += 1
|
||||
stats['size_by_mod_time_days']['14']['size'] += fsize
|
||||
elif mtime < period_cutoffs[3]:
|
||||
stats['size_by_mod_time_days']['7']['count'] += 1
|
||||
stats['size_by_mod_time_days']['7']['size'] += fsize
|
||||
elif mtime < period_cutoffs[1]:
|
||||
stats['size_by_mod_time_days']['3']['count'] += 1
|
||||
stats['size_by_mod_time_days']['3']['size'] += fsize
|
||||
else: # Modified within the last day
|
||||
stats['size_by_mod_time_days']['1']['count'] += 1
|
||||
stats['size_by_mod_time_days']['1']['size'] += fsize
|
||||
|
||||
except FileNotFoundError:
|
||||
logging.warning(f"File not found during stats analysis, skipping: {filepath}")
|
||||
stats['error_count'] += 1
|
||||
continue
|
||||
except OSError as e:
|
||||
logging.warning(f"Cannot access file stats during stats analysis, skipping: {filepath} ({e})")
|
||||
stats['error_count'] += 1
|
||||
continue
|
||||
|
||||
logging.info(f"Analysis complete for {directory}: Found {stats['total_files']} files, total size {format_bytes(stats['total_size'])}.")
|
||||
if stats['error_count'] > 0:
|
||||
logging.warning(f"Encountered {stats['error_count']} errors during analysis of {directory}.")
|
||||
return stats
|
||||
|
||||
except FileNotFoundError:
|
||||
logging.error(f"Directory '{directory}' not found for statistics analysis.")
|
||||
return None # Indicate error
|
||||
except Exception as e:
|
||||
logging.error(f"An unexpected error occurred during statistics analysis of {directory}: {e}")
|
||||
return None
|
||||
|
||||
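`analyze_directory` checks each path with `os.path.isfile`, `os.path.islink`, and then `os.stat`, which means several filesystem calls per file. A possible alternative, sketched here under the assumption that only regular, non-symlink files should be counted, is a single `os.lstat` call combined with `stat.S_ISREG`. The helper name `scan_regular_files` is hypothetical and not part of the script.

```python
import os
import stat


def scan_regular_files(directory):
    """Return (file_count, total_size, error_count) for regular files under `directory`."""
    file_count = total_size = error_count = 0
    for root, _, files in os.walk(directory):
        for filename in files:
            filepath = os.path.join(root, filename)
            try:
                # lstat does not follow symlinks, so a symlink is reported as a
                # link and fails the S_ISREG check below.
                st = os.lstat(filepath)
            except OSError:
                # Covers files that vanish mid-scan and permission problems.
                error_count += 1
                continue
            if not stat.S_ISREG(st.st_mode):
                continue
            file_count += 1
            total_size += st.st_size
    return file_count, total_size, error_count
```

Because `FileNotFoundError` is a subclass of `OSError`, one `except` clause is enough when both cases are handled the same way.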
# --- New Function: Generate Stats Report ---
|
||||
def generate_stats(args):
|
||||
"""Generates a JSON statistics report for source and target directories."""
|
||||
logging.info("--- Generating Statistics Report ---")
|
||||
report = {
|
||||
'report_generated_utc': datetime.utcnow().isoformat() + 'Z',
|
||||
'source_directory': args.source_dir,
|
||||
'target_directory': args.target_dir,
|
||||
'source_stats': None,
|
||||
'target_stats': None,
|
||||
}
|
||||
success = True
|
||||
|
||||
# Analyze source directory if it exists
|
||||
if os.path.isdir(args.source_dir):
|
||||
logging.info(f"Analyzing source directory: {args.source_dir}")
|
||||
source_stats = analyze_directory(args.source_dir)
|
||||
if source_stats is None:
|
||||
logging.error(f"Failed to analyze source directory: {args.source_dir}")
|
||||
success = False # Mark as partial failure, but continue
|
||||
report['source_stats'] = source_stats
|
||||
else:
|
||||
logging.warning(f"Source directory '{args.source_dir}' not found, skipping analysis.")
|
||||
report['source_stats'] = {'error': 'Directory not found'}
|
||||
|
||||
|
||||
# Analyze target directory if it exists
|
||||
if os.path.isdir(args.target_dir):
|
||||
logging.info(f"Analyzing target directory: {args.target_dir}")
|
||||
target_stats = analyze_directory(args.target_dir)
|
||||
if target_stats is None:
|
||||
logging.error(f"Failed to analyze target directory: {args.target_dir}")
|
||||
success = False # Mark as partial failure
|
||||
report['target_stats'] = target_stats
|
||||
else:
|
||||
logging.warning(f"Target directory '{args.target_dir}' not found, skipping analysis.")
|
||||
report['target_stats'] = {'error': 'Directory not found'}
|
||||
|
||||
|
||||
if not success:
|
||||
logging.warning("Stats generation encountered errors analyzing one or both directories.")
|
||||
# Continue to write partial report
|
||||
|
||||
# Write the report to the specified file
|
||||
stats_file_path = Path(args.stats_file)
|
||||
try:
|
||||
# Create parent directories if they don't exist
|
||||
stats_file_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
with open(stats_file_path, 'w') as f:
|
||||
json.dump(report, f, indent=4)
|
||||
logging.info(f"Successfully wrote statistics report to: {stats_file_path}")
|
||||
return success # Return True if both analyses succeeded, False otherwise
|
||||
except OSError as e:
|
||||
logging.error(f"Error writing statistics report to {stats_file_path}: {e}")
|
||||
return False
|
||||
except Exception as e:
|
||||
logging.error(f"An unexpected error occurred while writing stats report: {e}")
|
||||
return False
|
||||
|
||||
|
||||
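`generate_stats` builds its timestamp with `datetime.utcnow()`, which returns a naive datetime and is deprecated as of Python 3.12. The sketch below shows one way the report-writing step could use a timezone-aware timestamp instead. The function name `write_report` is an assumption for illustration; the report key `report_generated_utc` is taken from the code above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_report(report: dict, stats_file: str) -> Path:
    """Write `report` as JSON, stamping it with an aware UTC timestamp."""
    path = Path(stats_file)
    # Create parent directories if they don't exist, as the original does.
    path.parent.mkdir(parents=True, exist_ok=True)
    stamped = dict(report)  # shallow copy so the caller's dict is not mutated
    # An aware datetime serializes with an explicit "+00:00" offset.
    stamped["report_generated_utc"] = datetime.now(timezone.utc).isoformat()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(stamped, f, indent=4)
    return path
```

Consumers that expect the trailing `Z` form would need the offset rewritten; whether that matters depends on who reads the report.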
# --- Main Execution ---
|
||||
def main():
|
||||
"""Main function to orchestrate the script."""
|
||||
setup_logging()
|
||||
args = parse_arguments() # Now handles config loading
|
||||
|
||||
# --- Directory Validation ---
|
||||
# Check source if needed
|
||||
source_ok = True
|
||||
if (args.move or args.count or args.generate_stats or args.move_cold): # move_cold needs source as destination
|
||||
if not os.path.isdir(args.source_dir):
|
||||
logging.error(f"Source directory '{args.source_dir}' not found or is not a directory.")
|
||||
source_ok = False
|
||||
else:
|
||||
logging.debug(f"Source directory validated: {args.source_dir}")
|
||||
|
||||
# Check target if needed
|
||||
target_ok = True
|
||||
if (args.move or args.summarize_unused or args.generate_stats or args.move_cold): # move_cold needs target as source
|
||||
if not os.path.isdir(args.target_dir):
|
||||
logging.error(f"Target directory '{args.target_dir}' not found or is not a directory.")
|
||||
target_ok = False
|
||||
else:
|
||||
logging.debug(f"Target directory validated: {args.target_dir}")
|
||||
|
||||
# Exit if essential directories are missing for the requested actions that *require* them
|
||||
if not source_ok and (args.move or args.count):
|
||||
logging.error("Aborting: Source directory required for --move or --count is invalid.")
|
||||
sys.exit(1)
|
||||
if not target_ok and (args.summarize_unused):
|
||||
logging.error("Aborting: Target directory required for --summarize-unused is invalid.")
|
||||
sys.exit(1)
|
||||
if (not source_ok or not target_ok) and args.move_cold:
|
||||
logging.error("Aborting: --move-cold requires both source and target directories, and at least one is invalid.")
|
||||
sys.exit(1)
|
||||
# Note: generate_stats handles missing dirs internally
|
||||
|
||||
# --- Action Execution ---
|
||||
exit_code = 0 # Track if any operation fails
|
||||
|
||||
# --- Find files first if needed by multiple actions ---
|
||||
hot_files_to_process = None
|
||||
if args.move or args.count:
|
||||
# We already checked source_ok above for these actions
|
||||
hot_files_to_process = find_recent_files(args.source_dir, args.recent_days, args.min_size_bytes)
|
||||
if hot_files_to_process is None:
|
||||
logging.error("Aborting due to error finding recent 'hot' files.")
|
||||
sys.exit(1) # Abort if find failed
|
||||
|
||||
cold_files_to_process = None
|
||||
if args.move_cold:
|
||||
# We already checked target_ok above for this action
|
||||
cold_files_to_process = find_stale_files(args.target_dir, args.stale_days, args.min_size_bytes)
|
||||
if cold_files_to_process is None:
|
||||
logging.error("Aborting due to error finding 'cold' files.")
|
||||
sys.exit(1) # Abort if find failed
|
||||
|
||||
|
||||
# --- Execute Actions ---
|
||||
if args.count:
|
||||
count_files(hot_files_to_process) # Counts hot files
|
||||
|
||||
if args.move:
|
||||
# We already checked source_ok and target_ok for this action
|
||||
move_success = move_files(hot_files_to_process, args.source_dir, args.target_dir, args.dry_run, args.interactive)
|
||||
if not move_success and not args.dry_run:
|
||||
logging.error("Move 'hot' files operation failed or was cancelled.")
|
||||
exit_code = 1 # Mark failure
|
||||
|
||||
if args.move_cold:
|
||||
# We already checked source_ok and target_ok for this action
|
||||
move_cold_success = move_files_cold(cold_files_to_process, args.source_dir, args.target_dir, args.dry_run, args.interactive)
|
||||
if not move_cold_success and not args.dry_run:
|
||||
logging.error("Move 'cold' files operation failed or was cancelled.")
|
||||
exit_code = 1 # Mark failure
|
||||
|
||||
if args.summarize_unused:
|
||||
# We already checked target_ok for this action
|
||||
summarize_unused(args.target_dir)
|
||||
|
||||
if args.generate_stats:
|
||||
# generate_stats handles its own directory checks internally now
|
||||
stats_success = generate_stats(args)
|
||||
if not stats_success:
|
||||
# generate_stats already logged errors
|
||||
exit_code = 1
|
||||
|
||||
|
||||
logging.info("Script finished.")
|
||||
sys.exit(exit_code) # Exit with 0 on success, 1 on failure
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
@ -1,186 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
import logging
|
||||
import requests
|
||||
from typing import Optional, Dict, List, Any
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
class OpenRouterError(Exception):
|
||||
"""Custom exception for OpenRouter API errors."""
|
||||
def __init__(self, message: str, status_code: Optional[int] = None, response: Optional[dict] = None):
|
||||
super().__init__(message)
|
||||
self.status_code = status_code
|
||||
self.response = response
|
||||
|
||||
class OpenRouterResponse:
|
||||
"""Wrapper for OpenRouter API responses."""
|
||||
def __init__(self, raw_response: dict):
|
||||
self.raw_response = raw_response
|
||||
self.choices = self._parse_choices()
|
||||
self.usage = self._parse_usage()
|
||||
self.model = raw_response.get("model")
|
||||
|
||||
def _parse_choices(self) -> List[Dict[str, Any]]:
|
||||
choices = self.raw_response.get("choices", [])
|
||||
return [
|
||||
{
|
||||
"message": choice.get("message", {}),
|
||||
"finish_reason": choice.get("finish_reason"),
|
||||
"index": choice.get("index")
|
||||
}
|
||||
for choice in choices
|
||||
]
|
||||
|
||||
def _parse_usage(self) -> Dict[str, int]:
|
||||
usage = self.raw_response.get("usage", {})
|
||||
return {
|
||||
"prompt_tokens": usage.get("prompt_tokens", 0),
|
||||
"completion_tokens": usage.get("completion_tokens", 0),
|
||||
"total_tokens": usage.get("total_tokens", 0)
|
||||
}
|
||||
|
||||
class OpenRouterClient:
|
||||
"""Client for interacting with the OpenRouter API."""
|
||||
def __init__(self, api_key: str, model_name: str):
|
||||
if not api_key:
|
||||
raise ValueError("OpenRouter API key is required")
|
||||
if not model_name:
|
||||
raise ValueError("Model name is required")
|
||||
|
||||
self.api_key = api_key
|
||||
self.model_name = model_name
|
||||
self.base_url = "https://openrouter.ai/api/v1"
|
||||
self.session = requests.Session()
|
||||
self.session.headers.update({
|
||||
"Authorization": f"Bearer {api_key}",
|
||||
"HTTP-Referer": "https://github.com/OpenRouterTeam/openrouter-examples",
|
||||
"X-Title": "CV Analysis Tool",
|
||||
"Content-Type": "application/json"
|
||||
})
|
||||
|
||||
def create_chat_completion(
|
||||
self,
|
||||
messages: List[Dict[str, str]],
|
||||
max_tokens: Optional[int] = None
|
||||
) -> OpenRouterResponse:
|
||||
"""
|
||||
Create a chat completion using the OpenRouter API.
|
||||
|
||||
Args:
|
||||
messages: List of message dictionaries with 'role' and 'content' keys
|
||||
max_tokens: Maximum number of tokens to generate
|
||||
|
||||
Returns:
|
||||
OpenRouterResponse object containing the API response
|
||||
|
||||
Raises:
|
||||
OpenRouterError: If the API request fails
|
||||
"""
|
||||
endpoint = f"{self.base_url}/chat/completions"
|
||||
payload = {
|
||||
"model": self.model_name,
|
||||
"messages": messages
|
||||
}
|
||||
|
||||
if max_tokens is not None:
|
||||
payload["max_tokens"] = max_tokens
|
||||
|
||||
try:
|
||||
response = self.session.post(endpoint, json=payload)
|
||||
response.raise_for_status()
|
||||
return OpenRouterResponse(response.json())
|
||||
except requests.exceptions.RequestException as e:
|
||||
raise self._handle_request_error(e)
|
||||
|
||||
def get_available_models(self) -> Dict[str, Any]:
|
||||
"""
|
||||
Get list of available models from OpenRouter API.
|
||||
|
||||
Returns:
|
||||
Response dictionary whose "data" key holds the list of model information dictionaries
|
||||
|
||||
Raises:
|
||||
OpenRouterError: If the API request fails
|
||||
"""
|
||||
endpoint = f"{self.base_url}/models"
|
||||
|
||||
try:
|
||||
logger.debug(f"Fetching available models from: {endpoint}")
|
||||
response = self.session.get(endpoint)
|
||||
response.raise_for_status()
|
||||
|
||||
data = response.json()
|
||||
logger.debug(f"Raw API response: {data}")
|
||||
|
||||
if not isinstance(data, dict) or "data" not in data:
|
||||
raise OpenRouterError(
|
||||
message="Invalid response format from OpenRouter API",
|
||||
response=data
|
||||
)
|
||||
|
||||
return data
|
||||
except requests.exceptions.RequestException as e:
|
||||
raise self._handle_request_error(e)
|
||||
|
||||
def verify_model_availability(self) -> bool:
|
||||
"""
|
||||
Verify if the configured model is available.
|
||||
|
||||
Returns:
|
||||
True if model is available, False otherwise
|
||||
"""
|
||||
try:
|
||||
response = self.get_available_models()
|
||||
# The OpenRouter API returns the list of models in the format:
|
||||
# {"data": [{"id": "model_name", ...}, ...]}
|
||||
models = response.get("data", [])
|
||||
logger.debug(f"Available models: {[model.get('id') for model in models]}")
|
||||
return any(model.get("id") == self.model_name for model in models)
|
||||
except OpenRouterError as e:
|
||||
logger.error(f"Failed to verify model availability: {e}")
|
||||
return False
|
||||
except Exception as e:
|
||||
logger.error(f"Unexpected error while verifying model availability: {e}")
|
||||
return False
|
||||
|
||||
def _handle_request_error(self, error: requests.exceptions.RequestException) -> OpenRouterError:
|
||||
"""Convert requests exceptions to OpenRouterError."""
|
||||
if error.response is not None:
|
||||
try:
|
||||
error_data = error.response.json()
|
||||
message = error_data.get("error", {}).get("message", str(error))
|
||||
return OpenRouterError(
|
||||
message=message,
|
||||
status_code=error.response.status_code,
|
||||
response=error_data
|
||||
)
|
||||
except ValueError:
|
||||
pass
|
||||
return OpenRouterError(str(error))
|
||||
|
||||
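`_handle_request_error` reads the message with `error_data.get("error", {}).get("message", ...)`, which assumes the JSON body is a dict and that its `"error"` value is also a dict. If either assumption fails, the resulting `AttributeError` is not caught by `except ValueError`. The following is a minimal sketch of a more defensive extraction. The helper name `extract_error_message` and the payload shapes it tolerates are assumptions for illustration, not a statement of what the API actually returns.

```python
from typing import Any


def extract_error_message(payload: Any, fallback: str) -> str:
    """Pull a human-readable message out of an error payload, or return `fallback`."""
    if not isinstance(payload, dict):
        # e.g. a JSON list or string body
        return fallback
    error = payload.get("error")
    if isinstance(error, dict):
        message = error.get("message")
        return message if isinstance(message, str) and message else fallback
    if isinstance(error, str) and error:
        # Some services put the message directly under "error".
        return error
    return fallback
```

In the client, `fallback` would be `str(error)`, matching the default the original code already uses.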
def initialize_openrouter_client(api_key: str, model_name: str) -> OpenRouterClient:
|
||||
"""
|
||||
Initialize and verify OpenRouter client.
|
||||
|
||||
Args:
|
||||
api_key: OpenRouter API key
|
||||
model_name: Name of the model to use
|
||||
|
||||
Returns:
|
||||
Initialized OpenRouterClient
|
||||
|
||||
Raises:
|
||||
ValueError: If client initialization or verification fails
|
||||
"""
|
||||
try:
|
||||
client = OpenRouterClient(api_key=api_key, model_name=model_name)
|
||||
|
||||
# Verify connection and model availability
|
||||
if not client.verify_model_availability():
|
||||
raise ValueError(f"Model {model_name} not available")
|
||||
|
||||
logger.debug(f"Successfully initialized OpenRouter client with model: {model_name}")
|
||||
return client
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to initialize OpenRouter client: {e}")
|
||||
raise
|
||||
@ -6,31 +6,20 @@ import json
|
||||
import logging
|
||||
from datetime import datetime, timezone
|
||||
import uuid
|
||||
from typing import Optional, Any, Dict
|
||||
from typing import Optional, Any
|
||||
import time
|
||||
|
||||
from dotenv import load_dotenv
|
||||
import pymongo
|
||||
import openai
|
||||
from pdfminer.high_level import extract_text
|
||||
|
||||
from openrouter_client import initialize_openrouter_client, OpenRouterError, OpenRouterResponse
|
||||
|
||||
# Load environment variables
|
||||
load_dotenv()
|
||||
|
||||
# Configuration
|
||||
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")
|
||||
if not OPENROUTER_API_KEY:
|
||||
# Use logger here if possible, but it might not be configured yet.
|
||||
# Consider raising the error later or logging after basicConfig.
|
||||
print("ERROR: OPENROUTER_API_KEY environment variable is required", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
OPENROUTER_MODEL_NAME = os.getenv("OPENROUTER_MODEL_NAME")
|
||||
if not OPENROUTER_MODEL_NAME:
|
||||
print("ERROR: OPENROUTER_MODEL_NAME environment variable is required", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
|
||||
MODEL_NAME = os.getenv("MODEL_NAME")
|
||||
MAX_TOKENS = int(os.getenv("MAX_TOKENS", 500))
|
||||
USE_MOCKUP = os.getenv("USE_MOCKUP", "false").lower() == "true"
|
||||
MOCKUP_FILE_PATH = os.getenv("MOCKUP_FILE_PATH")
|
||||
@ -39,177 +28,109 @@ MONGODB_DATABASE = os.getenv("MONGODB_DATABASE")
|
||||
|
||||
MONGO_COLLECTION_NAME = "cv_processing_collection"
|
||||
|
||||
# Initialize OpenAI client
|
||||
openai.api_key = OPENAI_API_KEY
|
||||
|
||||
# Logging setup
|
||||
LOG_LEVEL = os.getenv("LOG_LEVEL", "DEBUG").upper()
|
||||
|
||||
logging.basicConfig(
|
||||
level=LOG_LEVEL,
|
||||
format="[%(asctime)s] [%(name)s] [%(levelname)s] %(message)s",
|
||||
datefmt="%Y-%m-%dT%H:%M:%S%z",
|
||||
format='[%(asctime)s] [%(name)s] [%(levelname)s] %(message)s',
|
||||
datefmt='%Y-%m-%dT%H:%M:%S%z'
|
||||
)
|
||||
logger = logging.getLogger(__name__) # Define logger earlier
|
||||
|
||||
# Global variable to hold the client instance
|
||||
_opernrouter_client_instance = None
|
||||
|
||||
def get_opernrouter_client():
|
||||
"""
|
||||
Initializes and returns the OpenRouter client instance (lazy initialization).
|
||||
Ensures the client is initialized only once.
|
||||
"""
|
||||
global _opernrouter_client_instance
|
||||
if _opernrouter_client_instance is None:
|
||||
logger.info("Initializing OpenRouter client for the first time...")
|
||||
logger.debug(f"Using model: {OPENROUTER_MODEL_NAME}")
|
||||
logger.debug("API Key present and valid format: %s", bool(OPENROUTER_API_KEY and OPENROUTER_API_KEY.startswith("sk-or-v1-")))
|
||||
try:
|
||||
_opernrouter_client_instance = initialize_openrouter_client(
|
||||
api_key=OPENROUTER_API_KEY,
|
||||
model_name=OPENROUTER_MODEL_NAME
|
||||
)
|
||||
logger.info(f"Successfully initialized OpenRouter client with model: {OPENROUTER_MODEL_NAME}")
|
||||
except ValueError as e:
|
||||
logger.error(f"Configuration error during client initialization: {e}")
|
||||
# Re-raise or handle appropriately, maybe return None or raise specific error
|
||||
raise # Re-raise the ValueError to be caught higher up if needed
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to initialize OpenRouter client: {e}", exc_info=True)
|
||||
# Re-raise or handle appropriately
|
||||
raise # Re-raise the exception
|
||||
else:
|
||||
logger.debug("Returning existing OpenRouter client instance.")
|
||||
return _opernrouter_client_instance
|
||||
|
||||
|
||||
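The lazy initialization in `get_opernrouter_client` relies on a module-level global. An alternative worth considering is `functools.lru_cache`, sketched below with a stand-in class. `DummyClient`, `get_client`, and `init_calls` are hypothetical names used only for this example; the real function would call `initialize_openrouter_client` instead.

```python
from functools import lru_cache

init_calls = []  # records each real initialization, for demonstration only


class DummyClient:
    """Stand-in for the real client (hypothetical)."""

    def __init__(self, model_name: str):
        self.model_name = model_name


@lru_cache(maxsize=None)
def get_client() -> DummyClient:
    """Build the client on first call and return the cached instance afterwards."""
    init_calls.append("init")
    return DummyClient("example/model")
```

`lru_cache` does not cache exceptions, so a failed initialization is retried on the next call, which is the same behavior as the global-variable version. `get_client.cache_clear()` resets the cached instance, which can be convenient in tests.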
def get_mongo_collection():
|
||||
"""Initialize and return MongoDB collection."""
|
||||
# Consider lazy initialization for MongoDB as well if beneficial
|
||||
mongo_client = pymongo.MongoClient(MONGODB_URI)
|
||||
db = mongo_client[MONGODB_DATABASE]
|
||||
return db[MONGO_COLLECTION_NAME]
|
||||
|
||||
|
||||
def parse_arguments():
|
||||
"""Parses command line arguments."""
|
||||
parser = argparse.ArgumentParser(
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
description="""This tool analyzes resumes using the OpenRouter API. Parameters are required to run the analysis.
|
||||
|
||||
Required Environment Variables:
|
||||
- OPENROUTER_API_KEY: Your OpenRouter API key
|
||||
- OPENROUTER_MODEL_NAME: OpenRouter model to use (e.g. google/gemma-7b-it)
|
||||
- MONGODB_URI: MongoDB connection string (optional for mockup mode)
|
||||
- MAX_TOKENS: Maximum tokens for response (default: 500)""",
|
||||
usage="resume_analysis.py [-h] [-f FILE] [-m]",
|
||||
epilog="""Examples:
|
||||
Analyze a resume: resume_analysis.py -f my_resume.pdf
|
||||
Test with mockup data: resume_analysis.py -f test.pdf -m
|
||||
|
||||
Note: Make sure your OpenRouter API key and model name are properly configured in the .env file.""",
|
||||
)
|
||||
parser.add_argument(
|
||||
"-f", "--file", help="Path to the resume file to analyze (PDF or text)"
|
||||
)
|
||||
parser.add_argument(
|
||||
"-m", "--mockup", action="store_true", help="Use mockup response instead of calling LLM API"
|
||||
)
|
||||
if len(sys.argv) == 1:
|
||||
parser.print_help()
|
||||
return None
|
||||
return parser.parse_args()
|
||||
|
||||
|
||||
def load_resume_text(args):
|
||||
"""Loads resume text from a file or uses mockup text."""
|
||||
use_mockup = args.mockup
|
||||
if use_mockup:
|
||||
resume_text = "Mockup resume text"
|
||||
else:
|
||||
if not os.path.exists(args.file):
|
||||
logger.error(f"File not found: {args.file}")
|
||||
sys.exit(1)
|
||||
|
||||
start_file_read_time = time.time()
|
||||
if args.file.lower().endswith(".pdf"):
|
||||
logger.debug(f"Using pdfminer to extract text from PDF: {args.file}")
|
||||
resume_text = extract_text(args.file)
|
||||
else:
|
||||
with open(
|
||||
args.file, "r", encoding="utf-8"
|
||||
) as f: # Explicitly specify utf-8 encoding for text files
|
||||
resume_text = f.read()
|
||||
file_read_time = time.time() - start_file_read_time
|
||||
logger.debug(f"File read time: {file_read_time:.2f} seconds")
|
||||
return resume_text
|
||||
|
||||
|
||||
def analyze_resume_with_llm(resume_text, use_mockup):
|
||||
"""Analyzes resume text using OpenRouter API."""
|
||||
start_time = time.time()
|
||||
response = call_llm_api(resume_text, use_mockup)
|
||||
llm_api_time = time.time() - start_time
|
||||
logger.debug(f"LLM API call time: {llm_api_time:.2f} seconds")
|
||||
return response
|
||||
|
||||
|
||||
def store_llm_response(response, use_mockup, input_file_path):
|
||||
"""Writes raw LLM response to a file."""
|
||||
write_llm_response(response, use_mockup, input_file_path)
|
||||
|
||||
|
||||
def save_processing_data(resume_text, summary, response, args, processing_id, use_mockup, cv_collection):
|
||||
"""Saves processing data to MongoDB."""
|
||||
insert_processing_data(
|
||||
resume_text,
|
||||
summary,
|
||||
response,
|
||||
args,
|
||||
processing_id,
|
||||
use_mockup,
|
||||
cv_collection,
|
||||
)
|
||||
|
||||
|
||||
def get_cv_summary_from_response(response):
|
||||
"""Extracts CV summary from LLM response."""
|
||||
if response and hasattr(response, "choices"):
|
||||
message_content = response.choices[0]['message']['content']
|
||||
try:
|
||||
summary = json.loads(message_content)
|
||||
except json.JSONDecodeError as e:
|
||||
logger.error(f"Failed to parse LLM response: {e}")
|
||||
summary = {"error": "Invalid JSON response from LLM"}
|
||||
else:
|
||||
summary = {"error": "No response from LLM"}
|
||||
return summary
|
||||
|
||||
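`get_cv_summary_from_response` indexes `response.choices[0]['message']['content']` directly, so an empty `choices` list raises `IndexError` and a message without `content` raises `KeyError`. The sketch below shows one defensive variant under the assumption that `response` is either `None` or an object exposing a `choices` list of dicts, as `OpenRouterResponse` does. The function name `summary_from_response` and the extra error string are illustrative, not part of the original code.

```python
import json
from typing import Any


def summary_from_response(response: Any) -> Any:
    """Extract the parsed JSON summary, or an error dict if it cannot be read."""
    choices = getattr(response, "choices", None)
    if not choices:
        # Covers None, objects without .choices, and an empty list.
        return {"error": "No response from LLM"}
    content = (choices[0].get("message") or {}).get("content")
    if not isinstance(content, str):
        return {"error": "No message content in LLM response"}
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        return {"error": "Invalid JSON response from LLM"}
```

Note that a plain dict, such as the one returned by `load_mockup_response`, has no `choices` attribute and therefore falls into the first error branch, both here and in the original function.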
logger = logging.getLogger(__name__)
|
||||
|
||||
def main():
|
||||
"""Main function to process the resume."""
|
||||
args = parse_arguments()
|
||||
if args is None:
|
||||
return
|
||||
use_mockup = args.mockup # Determine whether to use the mockup based on the -m flag
|
||||
parser = argparse.ArgumentParser(
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
description="""This tool analyzes resumes using OpenAI's API. Parameters are required to run the analysis.
|
||||
|
||||
Required Environment Variables:
|
||||
- OPENAI_API_KEY: Your OpenAI API key
|
||||
- MODEL_NAME: OpenAI model to use (e.g. gpt-3.5-turbo)
|
||||
- MONGODB_URI: MongoDB connection string (optional for mockup mode)""",
|
||||
usage="resume_analysis.py [-h] [-f FILE] [-m]",
|
||||
epilog="""Examples:
|
||||
Analyze a resume: resume_analysis.py -f my_resume.txt
|
||||
Test with mockup data: resume_analysis.py -f test.txt -m"""
|
||||
)
|
||||
parser.add_argument('-f', '--file', help='Path to the resume file to analyze (TXT)')
|
||||
parser.add_argument('-p', '--pdf', help='Path to the resume file to analyze (PDF)')
|
||||
parser.add_argument('-m', '--mockup', action='store_true', help='Use mockup response instead of calling OpenAI API')
|
||||
|
||||
# If no arguments provided, show help and exit
|
||||
if len(sys.argv) == 1:
|
||||
parser.print_help()
|
||||
sys.exit(1)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
# Determine whether to use mockup based on the -m flag, overriding USE_MOCKUP
|
||||
use_mockup = args.mockup
|
||||
|
||||
# Load the resume text from the provided file or use mockup
|
||||
if use_mockup:
|
||||
resume_text = "Mockup resume text"
|
||||
else:
|
||||
if args.pdf:
|
||||
if not os.path.exists(args.pdf):
|
||||
logger.error(f"PDF file not found: {args.pdf}")
|
||||
sys.exit(1)
|
||||
|
||||
start_file_read_time = time.time()
|
||||
try:
|
||||
resume_text = extract_text(args.pdf)
|
||||
except Exception as e:
|
||||
logger.error(f"Error extracting text from PDF: {e}", exc_info=True)
|
||||
sys.exit(1)
|
||||
file_read_time = time.time() - start_file_read_time
|
||||
logger.debug(f"PDF file read time: {file_read_time:.2f} seconds")
|
||||
# Save extracted text to file
|
||||
pdf_filename = os.path.splitext(os.path.basename(args.pdf))[0]
|
||||
text_file_path = os.path.join(os.path.dirname(args.pdf), f"{pdf_filename}_text.txt")
|
||||
with open(text_file_path, "w", encoding="utf-8") as text_file:
|
||||
text_file.write(resume_text)
|
||||
logger.debug(f"Extracted text saved to: {text_file_path}")
|
||||
elif args.file:
|
||||
if not os.path.exists(args.file):
|
||||
logger.error(f"File not found: {args.file}")
|
||||
sys.exit(1)
|
||||
|
||||
start_file_read_time = time.time()
|
||||
with open(args.file, 'r', encoding='latin-1') as f:
|
||||
resume_text = f.read()
|
||||
file_read_time = time.time() - start_file_read_time
|
||||
logger.debug(f"File read time: {file_read_time:.2f} seconds")
|
||||
else:
|
||||
parser.print_help()
|
||||
sys.exit(1)
|
||||
|
||||
# Call the OpenAI API with the resume text
|
||||
start_time = time.time()
|
||||
try:
|
||||
resume_text = load_resume_text(args)
|
||||
except FileNotFoundError as e:
|
||||
logger.error(f"File error: {e}")
|
||||
sys.exit(1)
|
||||
response = call_openai_api(resume_text, use_mockup)
|
||||
openai_api_time = time.time() - start_time
|
||||
logger.debug(f"OpenAI API call time: {openai_api_time:.2f} seconds")
|
||||
except Exception as e:
|
||||
logger.error(f"Error loading resume text: {e}")
|
||||
sys.exit(1)
|
||||
|
||||
response = analyze_resume_with_llm(resume_text, use_mockup)
|
||||
store_llm_response(response, use_mockup, args.file)
|
||||
|
||||
logger.error(f"Error during OpenAI API call: {e}", exc_info=True)
|
||||
response = None
|
||||
# Initialize MongoDB collection only when needed
|
||||
cv_collection = get_mongo_collection()
|
||||
processing_id = str(uuid.uuid4())
|
||||
summary = get_cv_summary_from_response(response)
|
||||
save_processing_data(resume_text, summary, response, args, processing_id, use_mockup, cv_collection)
|
||||
|
||||
logger.info(f"Resume analysis completed. Processing ID: {processing_id}")
|
||||
|
||||
# Measure MongoDB insertion time
|
||||
start_mongo_time = time.time()
|
||||
cost = insert_processing_data(resume_text, {}, response, args, str(uuid.uuid4()), use_mockup, cv_collection)
|
||||
mongo_insert_time = time.time() - start_mongo_time
|
||||
logger.debug(f"MongoDB insert time: {mongo_insert_time:.2f} seconds")
|
||||
write_openai_response(response, use_mockup, args.file, cost)
|
||||
|
||||
def load_mockup_response(mockup_file_path: str) -> dict:
|
||||
"""Load mockup response from a JSON file."""
|
||||
@ -218,190 +139,154 @@ def load_mockup_response(mockup_file_path: str) -> dict:
|
||||
raise FileNotFoundError(f"Mockup file not found at: {mockup_file_path}")
|
||||
with open(mockup_file_path, "r") as f:
|
||||
response = json.load(f)
|
||||
response.setdefault(
|
||||
"llm_stats", {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
|
||||
)
|
||||
#response.setdefault("openai_stats", {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0})
|
||||
return response
|
||||
|
||||
|
||||
def call_llm_api(text: str, use_mockup: bool) -> Optional[OpenRouterResponse]:
|
||||
"""Call OpenRouter API to analyze resume text."""
|
||||
if use_mockup:
|
||||
logger.debug("Using mockup response.")
|
||||
return load_mockup_response(MOCKUP_FILE_PATH)
|
||||
|
||||
prompt_path = os.path.join(os.path.dirname(__file__), "prompt.txt")
|
||||
logger.debug(f"Loading system prompt from: {prompt_path}")
|
||||
|
||||
def call_openai_api(text: str, use_mockup: bool) -> Optional[Any]:
|
||||
"""Call OpenAI API to analyze resume text."""
|
||||
logger.debug("Calling OpenAI API.")
|
||||
try:
|
||||
# Load system prompt
|
||||
if not os.path.exists(prompt_path):
|
||||
raise FileNotFoundError(f"System prompt file not found: {prompt_path}")
|
||||
|
||||
with open(prompt_path, "r") as prompt_file:
|
||||
if use_mockup:
|
||||
return load_mockup_response(os.path.join(os.path.dirname(__file__), 'tests', 'mockup_response.json'))
|
||||
|
||||
with open(os.path.join(os.path.dirname(__file__), "prompt.txt"), "r") as prompt_file:
|
||||
system_content = prompt_file.read()
|
||||
|
||||
if not system_content.strip():
|
||||
raise ValueError("System prompt file is empty")
|
||||
|
||||
# Prepare messages
|
||||
messages = [
|
||||
{"role": "system", "content": system_content},
|
||||
{"role": "user", "content": text}
|
||||
]
|
||||
|
||||
logger.debug("Prepared messages for API call:")
|
||||
logger.debug(f"System message length: {len(system_content)} chars")
|
||||
logger.debug(f"User message length: {len(text)} chars")
|
||||
|
||||
# Call OpenRouter API
|
||||
logger.info(f"Calling OpenRouter API with model: {OPENROUTER_MODEL_NAME}")
|
||||
logger.debug(f"Max tokens set to: {MAX_TOKENS}")
|
||||
|
||||
# Get the client instance (initializes on first call)
|
||||
try:
|
||||
client = get_opernrouter_client()
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to get OpenRouter client: {e}")
|
||||
return None # Cannot proceed without a client
|
||||
|
||||
response = client.create_chat_completion(
|
||||
messages=messages,
|
||||
response = openai.chat.completions.create(
|
||||
model=MODEL_NAME,
|
||||
messages=[
|
||||
{"role": "system", "content": system_content},
|
||||
{"role": "user", "content": text}
|
||||
],
|
||||
max_tokens=MAX_TOKENS
|
||||
)
|
||||
|
||||
# Validate response
|
||||
if not response.choices:
|
||||
logger.warning("API response contains no choices")
|
||||
return None
|
||||
|
||||
# Log response details
|
||||
logger.info("Successfully received API response")
|
||||
logger.debug(f"Response model: {response.model}")
|
||||
logger.debug(f"Token usage: {response.usage}")
|
||||
logger.debug(f"Number of choices: {len(response.choices)}")
|
||||
|
||||
logger.debug(f"OpenAI API response: {response}")
|
||||
return response
|
||||
|
||||
except FileNotFoundError as e:
|
||||
logger.error(f"File error: {e}")
|
||||
return None
|
||||
except OpenRouterError as e:
|
||||
logger.error(f"OpenRouter API error: {e}", exc_info=True)
|
||||
if hasattr(e, 'response'):
|
||||
logger.error(f"Error response: {e.response}")
|
||||
return None
|
||||
except Exception as e:
|
||||
logger.error(f"Unexpected error during API call: {e}", exc_info=True)
|
||||
logger.error(f"Error during OpenAI API call: {e}", exc_info=True)
|
||||
return None
|
||||
|
||||
|
||||
def write_llm_response(
|
||||
response: Optional[OpenRouterResponse], use_mockup: bool, input_file_path: str = None
|
||||
) -> None:
|
||||
"""Write raw LLM response to a file."""
|
||||
def write_openai_response(response: Any, use_mockup: bool, input_file_path: str = None, cost: float = 0) -> None:
|
||||
"""Write raw OpenAI response to a file."""
|
||||
if use_mockup:
|
||||
logger.debug("Using mockup response; no LLM message to write.")
|
||||
logger.debug("Using mockup response; no OpenAI message to write.")
|
||||
return
|
||||
|
||||
if response is None:
|
||||
logger.warning("No response to write")
|
||||
return
|
||||
if response and response.choices:
|
||||
message_content = response.choices[0].message.content
|
||||
logger.debug(f"Raw OpenAI message content: {message_content}")
|
||||
|
||||
if not response.choices:
|
||||
logger.warning("No choices in LLM response")
|
||||
logger.debug(f"Response object: {response.raw_response}")
|
||||
return
|
||||
if input_file_path:
|
||||
output_dir = os.path.dirname(input_file_path)
|
||||
base_filename = os.path.splitext(os.path.basename(input_file_path))[0]
|
||||
else:
|
||||
logger.warning("Input file path not provided. Using default output directory and filename.")
|
||||
output_dir = os.path.join(os.path.dirname(__file__)) # Default to script's directory
|
||||
base_filename = "default" # Default filename
|
||||
|
||||
try:
|
||||
# Get output directory and base filename
|
||||
output_dir = os.path.dirname(input_file_path) if input_file_path else "."
|
||||
base_filename = (
|
||||
os.path.splitext(os.path.basename(input_file_path))[0]
|
||||
if input_file_path
|
||||
else "default"
|
||||
)
|
||||
|
||||
# Generate unique file path
|
||||
processing_id = str(uuid.uuid4())
|
||||
now = datetime.now()
|
||||
timestamp_str = now.strftime("%Y%m%d_%H%M%S")
|
||||
file_path = os.path.join(
|
||||
output_dir, f"{base_filename}_llm_response_{timestamp_str}_{processing_id}"
|
||||
) + ".json"
|
||||
file_path = os.path.join(output_dir, f"{base_filename}_openai_response_{processing_id}") + ".json"
|
||||
openai_file_path = os.path.join(output_dir, f"{base_filename}_openai.txt")
|
||||
|
||||
# Prepare serializable response
|
||||
serializable_response = {
|
||||
"choices": response.choices,
|
||||
"usage": response.usage,
|
||||
"model": response.model,
|
||||
"raw_response": response.raw_response
|
||||
}
|
||||
try:
|
||||
message_content = response.choices[0].message.content if response and response.choices else "No content"
|
||||
with open(openai_file_path, "w", encoding="utf-8") as openai_file:
|
||||
openai_file.write(message_content)
|
||||
logger.debug(f"OpenAI response written to {openai_file_path}")
|
||||
|
||||
# Write response to file
|
||||
with open(file_path, "w") as f:
|
||||
json.dump(serializable_response, f, indent=2)
|
||||
logger.debug(f"LLM response written to {file_path}")
|
||||
|
||||
except IOError as e:
|
||||
logger.error(f"Failed to write LLM response to file: {e}")
|
||||
except Exception as e:
|
||||
logger.error(f"Unexpected error while writing response: {e}", exc_info=True)
|
||||
serializable_response = {
|
||||
"choices": [
|
||||
{
|
||||
"message": {
|
||||
"content": choice.message.content,
|
||||
"role": choice.message.role
|
||||
},
|
||||
"finish_reason": choice.finish_reason,
|
||||
"index": choice.index
|
||||
} for choice in response.choices
|
||||
],
|
||||
"usage": {
|
||||
"prompt_tokens": response.usage.prompt_tokens,
|
||||
"completion_tokens": response.usage.completion_tokens,
|
||||
"total_tokens": response.usage.total_tokens
|
||||
},
|
||||
"cost": cost, # Include cost in the output JSON
|
||||
"model": response.model
|
||||
}
|
||||
with open(file_path, "w") as f:
|
||||
json.dump(serializable_response, f, indent=2, ensure_ascii=False)
|
||||
logger.debug(f"OpenAI response written to {file_path}")
|
||||
|
||||
except IOError as e:
|
||||
logger.error(f"Failed to write OpenAI response to file: {e}")
|
||||
else:
|
||||
logger.warning("No choices in OpenAI response to extract message from.")
|
||||
logger.debug(f"Response object: {response}")
|
||||
|
||||
|
||||
def insert_processing_data(
|
||||
text_content: str,
|
||||
summary: dict,
|
||||
response: Optional[OpenRouterResponse],
|
||||
args: argparse.Namespace,
|
||||
processing_id: str,
|
||||
use_mockup: bool,
|
||||
cv_collection,
|
||||
) -> None:
|
||||
def insert_processing_data(text_content: str, summary: dict, response: Any, args: argparse.Namespace, processing_id: str, use_mockup: bool, cv_collection) -> float:
|
||||
"""Insert processing data into MongoDB."""
|
||||
if use_mockup:
|
||||
logger.debug("Using mockup; skipping MongoDB insertion.")
|
||||
return
|
||||
logger.debug("Inserting processing data into MongoDB.")
|
||||
cost = 0.0 # Initialize cost to 0.0
|
||||
if not use_mockup:
|
||||
if response and response.choices:
|
||||
message_content = response.choices[0].message.content
|
||||
openai_stats = {} # Initialize openai_stats
|
||||
try:
|
||||
# Attempt to decode JSON, handling potential decode errors
|
||||
openai_stats_content = json.loads(message_content.encode('utf-8').decode('unicode_escape'))
|
||||
openai_stats = openai_stats_content.get("openai_stats", {})
|
||||
cost = openai_stats.get("cost", 0.0)
|
||||
except json.JSONDecodeError as e:
|
||||
logger.error(f"JSONDecodeError in message_content: {e}", exc_info=True)
|
||||
cost = 0.0
|
||||
except AttributeError as e:
|
||||
logger.error(f"AttributeError accessing openai_stats: {e}", exc_info=True)
|
||||
cost = 0.0
|
||||
except Exception as e:
|
||||
logger.error(f"Unexpected error extracting cost: {e}", exc_info=True)
|
||||
cost = 0.0
|
||||
|
||||
logger.debug("Preparing processing data for MongoDB insertion.")
|
||||
|
||||
# Initialize default values
|
||||
usage_data = {
|
||||
"input_tokens": 0,
|
||||
"output_tokens": 0,
|
||||
"total_tokens": 0
|
||||
}
|
||||
|
||||
# Extract usage data if available
|
||||
if response and response.usage:
|
||||
usage_data = {
|
||||
"input_tokens": response.usage.get("prompt_tokens", 0),
|
||||
"output_tokens": response.usage.get("completion_tokens", 0),
|
||||
"total_tokens": response.usage.get("total_tokens", 0)
|
||||
except AttributeError as e:
|
||||
logger.error(f"AttributeError when accessing openai_stats or cost: {e}", exc_info=True)
|
||||
cost = 0.0
|
||||
|
||||
try:
|
||||
usage = response.usage
|
||||
input_tokens = usage.prompt_tokens
|
||||
output_tokens = usage.completion_tokens
|
||||
total_tokens = usage.total_tokens
|
||||
except Exception as e:
|
||||
logger.error(f"Error extracting usage data: {e}", exc_info=True)
|
||||
input_tokens = output_tokens = total_tokens = 0
|
||||
|
||||
else:
|
||||
logger.error("Invalid response format or missing usage data.")
|
||||
input_tokens = output_tokens = total_tokens = 0
|
||||
cost = 0.0
|
||||
openai_stats = {}
|
||||
usage = {}
|
||||
|
||||
|
||||
processing_data = {
|
||||
"processing_id": processing_id,
|
||||
"timestamp": datetime.now(timezone.utc).isoformat(),
|
||||
"text_content": text_content,
|
||||
"summary": summary,
|
||||
"usage_prompt_tokens": input_tokens, # Renamed to avoid collision
|
||||
"usage_completion_tokens": output_tokens, # Renamed to avoid collision
|
||||
"usage_total_tokens": total_tokens, # Renamed to avoid collision
|
||||
"cost": cost
|
||||
}
|
||||
|
||||
# Prepare processing data
|
||||
processing_data = {
|
||||
"processing_id": processing_id,
|
||||
"timestamp": datetime.now(timezone.utc).isoformat(),
|
||||
"text_content": text_content,
|
||||
"summary": summary,
|
||||
"model": response.model if response else None,
|
||||
**usage_data,
|
||||
"raw_response": response.raw_response if response else None
|
||||
}
|
||||
|
||||
# Insert into MongoDB
|
||||
try:
|
||||
cv_collection.insert_one(processing_data)
|
||||
logger.debug(f"Successfully inserted processing data for ID: {processing_id}")
|
||||
logger.debug(f"Token usage - Input: {usage_data['input_tokens']}, "
|
||||
f"Output: {usage_data['output_tokens']}, "
|
||||
f"Total: {usage_data['total_tokens']}")
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to insert processing data into MongoDB: {e}", exc_info=True)
|
||||
|
||||
try:
|
||||
cv_collection.insert_one(processing_data)
|
||||
logger.debug(f"Inserted processing data for ID: {processing_id}")
|
||||
return cost # Return the cost
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to insert processing data into MongoDB: {e}", exc_info=True)
|
||||
else:
|
||||
logger.debug("Using mockup; skipping MongoDB insertion.")
|
||||
return cost # Return 0 for mockup mode
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
|
||||
Binary file not shown.
174
my-app/utils/tests/test_resume_analysis.py
Normal file
@ -0,0 +1,174 @@
import os
import sys
import pytest
from unittest.mock import patch, MagicMock
import json
import logging
import argparse # Import argparse
from dotenv import load_dotenv

# Add the project root to the sys path to allow imports from the main package
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))

from resume_analysis import (
    call_openai_api,
    insert_processing_data,
    load_mockup_response,
    main,
    get_mongo_collection
)

# Load environment variables for testing
load_dotenv()

# Constants for Mocking
MOCKUP_FILE_PATH = os.path.join(os.path.dirname(__file__), 'mockup_response.json')
TEST_RESUME_PATH = os.path.join(os.path.dirname(__file__), 'test_resume.txt')

# Create a logger
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

# Create a handler and set the formatter
ch = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
ch.setFormatter(formatter)

# Add the handler to the logger
logger.addHandler(ch)

# Mockup response data
MOCKUP_RESPONSE_DATA = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "model": "gpt-3.5-turbo-0301",
    "usage": {
        "prompt_tokens": 100,
        "completion_tokens": 200,
        "total_tokens": 300
    },
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": '{"openai_stats": {"prompt_tokens": 100, "completion_tokens": 200, "total_tokens": 300}}'
            },
            "finish_reason": "stop",
            "index": 0
        }
    ]
}

# Fixtures
@pytest.fixture
def mock_openai_response():
    mock_response = MagicMock()
    mock_response.id = "chatcmpl-123"
    mock_response.object = "chat.completion"
    mock_response.created = 1677652288
    mock_response.model = "gpt-3.5-turbo-0301"
    mock_response.usage = MagicMock(prompt_tokens=100, completion_tokens=200, total_tokens=300)
    mock_response.choices = [MagicMock(message=MagicMock(role="assistant", content='{"openai_stats": {"prompt_tokens": 100, "completion_tokens": 200, "total_tokens": 300}}'), finish_reason="stop", index=0)]
    return mock_response

@pytest.fixture
def test_resume_file():
    # Create a dummy resume file for testing
    with open(TEST_RESUME_PATH, 'w') as f:
        f.write("This is a test resume.")
    yield TEST_RESUME_PATH
    os.remove(TEST_RESUME_PATH)

@pytest.fixture
def mock_mongo_collection():
    # Mock MongoDB collection for testing
    class MockMongoCollection:
        def __init__(self):
            self.inserted_data = None

        def insert_one(self, data):
            self.inserted_data = data

    return MockMongoCollection()

# Unit Tests
def test_load_mockup_response():
    # Create a mockup response file
    with open(MOCKUP_FILE_PATH, 'w') as f:
        json.dump(MOCKUP_RESPONSE_DATA, f)

    response = load_mockup_response(MOCKUP_FILE_PATH)
    assert response == MOCKUP_RESPONSE_DATA
    os.remove(MOCKUP_FILE_PATH)

def test_load_mockup_response_file_not_found():
    with pytest.raises(FileNotFoundError):
        load_mockup_response("non_existent_file.json")

@patch("resume_analysis.openai.chat.completions.create")
def test_call_openai_api_success(mock_openai_chat_completions_create, mock_openai_response):
    mock_openai_chat_completions_create.return_value = mock_openai_response
    response = call_openai_api("test resume text", False)
    assert response == mock_openai_response

@patch("resume_analysis.openai.chat.completions.create")
def test_call_openai_api_failure(mock_openai_chat_completions_create):
    mock_openai_chat_completions_create.side_effect = Exception("API error")
    response = call_openai_api("test resume text", False)
    assert response is None

def test_call_openai_api_mockup_mode():
    # Create a mockup response file
    with open(MOCKUP_FILE_PATH, 'w') as f:
        json.dump(MOCKUP_RESPONSE_DATA, f)

    response = call_openai_api("test resume text", True)
    assert response == MOCKUP_RESPONSE_DATA
    os.remove(MOCKUP_FILE_PATH)

def test_insert_processing_data_success(mock_openai_response, mock_mongo_collection):
    args = argparse.Namespace(file="test.pdf")
    cost = insert_processing_data("test resume text", {}, mock_openai_response, args, "test_id", False, mock_mongo_collection)
    assert mock_mongo_collection.inserted_data is not None
    assert cost == 0

def test_insert_processing_data_mockup_mode(mock_mongo_collection):
    args = argparse.Namespace(file="test.pdf")
    cost = insert_processing_data("test resume text", {}, MOCKUP_RESPONSE_DATA, args, "test_id", True, mock_mongo_collection)
    assert mock_mongo_collection.inserted_data is None
    assert cost == 0

@patch("resume_analysis.get_mongo_collection")
def test_main_success(mock_get_mongo_collection, test_resume_file, mock_openai_response):
    mock_get_mongo_collection.return_value.insert_one.return_value = None
    with patch("resume_analysis.call_openai_api") as mock_call_openai_api:
        mock_call_openai_api.return_value = mock_openai_response
        with patch("resume_analysis.write_openai_response") as mock_write_openai_response:
            sys.argv = ["resume_analysis.py", "-f", test_resume_file]
            main()
            assert mock_call_openai_api.called
            assert mock_write_openai_response.called

@patch("resume_analysis.get_mongo_collection")
def test_main_mockup_mode(mock_get_mongo_collection, test_resume_file, mock_openai_response):
    mock_get_mongo_collection.return_value.insert_one.return_value = None
    with patch("resume_analysis.call_openai_api") as mock_call_openai_api:
        mock_call_openai_api.return_value = mock_openai_response
        with patch("resume_analysis.write_openai_response") as mock_write_openai_response:
            sys.argv = ["resume_analysis.py", "-f", test_resume_file, "-m"]
            main()
            assert mock_call_openai_api.called
            assert mock_write_openai_response.called

def test_main_file_not_found():
    with pytest.raises(SystemExit) as pytest_wrapped_e:
        sys.argv = ["resume_analysis.py", "-f", "non_existent_file.pdf"]
        main()
    assert pytest_wrapped_e.type == SystemExit
    assert pytest_wrapped_e.value.code == 1

def test_get_mongo_collection():
    # Test that the function returns a valid MongoDB collection object
    collection = get_mongo_collection()
    assert collection is not None
32
plan.md
Normal file
@ -0,0 +1,32 @@
# Plan for Modifying resume_analysis.py

## Objective

Modify the `my-app/utils/resume_analysis.py` script to save the extracted text from a PDF file and the OpenAI response to separate text files, with filenames derived from the original PDF's basename.

## Steps

1.  **Examine `resume_analysis.py`:** Read the file to understand the existing PDF processing logic and how the OpenAI response is handled.
2.  **Clarify Naming Convention:** Confirm the exact naming convention for the output files.
3.  **Implement Changes:** Modify the script to:
    *   Extract the PDF's basename.
    *   Save the extracted text to a file named `basename_text.txt` in the same directory as the PDF.
    *   Save the OpenAI response to a file named `basename_openai.txt` in the same directory.
4.  **Test:** Ensure that the changes work correctly for different PDF files and that the output files are created with the correct content and naming.
5.  **Create a Plan File:** Create a markdown file with the plan.
6.  **Switch Mode:** Switch to code mode to implement the changes.
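
The naming convention in step 3 could be sketched roughly as below. This is an illustration only, not the script's actual code: `derive_output_paths` is a hypothetical helper name, and the sketch assumes both output files sit next to the PDF and use the `_text.txt` and `_openai.txt` suffixes.

```python
import os
from typing import Tuple


def derive_output_paths(pdf_path: str) -> Tuple[str, str]:
    """Return (text_path, openai_path) for a given PDF path.

    Hypothetical helper for illustration; not part of resume_analysis.py.
    """
    # Directory that holds the PDF; empty string for a bare filename.
    output_dir = os.path.dirname(pdf_path)
    # Filename without directory and without the ".pdf" extension.
    basename = os.path.splitext(os.path.basename(pdf_path))[0]
    text_path = os.path.join(output_dir, f"{basename}_text.txt")
    openai_path = os.path.join(output_dir, f"{basename}_openai.txt")
    return text_path, openai_path
```

With POSIX-style paths, `cvs/jane_doe.pdf` would map to `cvs/jane_doe_text.txt` and `cvs/jane_doe_openai.txt`. Keeping the derivation in one place would make the convention from step 2 easy to test on its own.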

## Mermaid Diagram

```mermaid
graph LR
    A[Start] --> B{Examine resume_analysis.py};
    B --> C{Clarify Naming Convention};
    C --> D{Modify Script};
    D --> E{Extract PDF Basename};
    E --> F{Save Extracted Text};
    F --> G{Save OpenAI Response};
    G --> H{Test Changes};
    H --> I{Create Plan File};
    I --> J{Switch to Code Mode};
    J --> K[End];