# EBoek.info Scraper

A modern PyQt5 GUI application for scraping EBoek.info, with two scraping modes, real-time progress monitoring and secure storage of login credentials.

## ✨ Features

- **Two scraping modes**: All Comics and Latest Comics
- **User-friendly GUI** with real-time progress
- **Secure credential storage** in a JSON config
- **Cross-platform** support (Windows/macOS)
- **Background threading**: the GUI stays responsive
- **Graceful cancellation** during operations

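Background threading and graceful cancellation usually come down to one pattern: the work runs on a separate thread and checks a shared stop flag between units of work. The project wraps its scraper in a PyQt5 thread (`core/scraper_thread.py`); this minimal sketch swaps in the standard library's `threading` module so it runs without a GUI, and the `scrape_pages` worker and its `on_progress` callback are hypothetical names.

```python
import threading
import time
from typing import Callable, List


def scrape_pages(
    pages: List[int],
    stop_event: threading.Event,
    on_progress: Callable[[int, int], None],
    results: List[int],
) -> None:
    """Hypothetical worker: processes pages until done or cancelled."""
    for done, page in enumerate(pages, start=1):
        # Check the stop flag between pages, never in the middle of one.
        if stop_event.is_set():
            return
        time.sleep(0.01)  # stand-in for fetching and parsing one page
        results.append(page)
        on_progress(done, len(pages))  # a GUI would update its progress bar here


if __name__ == "__main__":
    stop = threading.Event()
    collected: List[int] = []
    worker = threading.Thread(
        target=scrape_pages,
        args=(list(range(1, 101)), stop, lambda d, t: None, collected),
        daemon=True,
    )
    worker.start()  # the main (GUI) thread stays free while this runs
    time.sleep(0.05)
    stop.set()      # the user pressed "Cancel"
    worker.join()   # the worker exits after finishing its current page
    print(f"Stopped after {len(collected)} of 100 pages")
```

In PyQt5 the same idea is usually expressed as a `QThread` subclass that emits signals for progress instead of calling a callback, because Qt widgets may only be updated from the GUI thread.
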
## 📋 Requirements

- **Python 3.8+**
- **Google Chrome** browser
- An **EBoek.info** account

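A short script can confirm the Python version and whether the packages from the manual install command are importable before you start. This is a standalone sketch, not part of the project; the module list assumes the import names `selenium`, `urllib3` and `PyQt5`, and it does not check for Google Chrome, whose install location differs per platform.

```python
import importlib.util
import sys
from typing import Dict

# Minimum Python version listed in the requirements.
MIN_PYTHON = (3, 8)
# Import names of the packages from the manual install command (assumed).
REQUIRED_MODULES = ("selenium", "urllib3", "PyQt5")


def check_environment() -> Dict[str, bool]:
    """Report whether the Python version and each required package are present."""
    report = {"python": sys.version_info[:2] >= MIN_PYTHON}
    for name in REQUIRED_MODULES:
        # find_spec returns None when a top-level module cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item:10} {'OK' if ok else 'MISSING'}")
```

Any `MISSING` entry names the package that still needs to be installed.
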
## 🚀 Installation

### Windows

Double-click `install_and_run.bat`.

### macOS / Linux

```bash
chmod +x install_and_run.sh
./install_and_run.sh
```

### Manual

```bash
pip install selenium urllib3 PyQt5
python3 gui_main.py
```

## 🎯 Usage

1. **Start the application**: `python3 gui_main.py`
2. **Enter your credentials**: click "Change Credentials"
3. **Choose a scraping mode**: All Comics or Latest Comics
4. **Set the page range**: start and end page
5. **Start scraping**: click "Start Scraping"

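Step 4 takes a start and end page, which should be checked before any scraping begins. The project keeps its helpers in `utils/`; the function below is a hypothetical sketch of such a check, and the `max_pages` cap is an assumption rather than a documented limit.

```python
from typing import Tuple


def validate_page_range(start: int, end: int, max_pages: int = 500) -> Tuple[int, int]:
    """Hypothetical check for the start/end page fields; returns the range if valid."""
    if start < 1:
        raise ValueError("Start page must be 1 or higher")
    if end < start:
        raise ValueError("End page must not be lower than the start page")
    if end - start + 1 > max_pages:
        # Assumed safety cap so a typo cannot queue thousands of pages.
        raise ValueError(f"Range spans more than {max_pages} pages")
    return start, end


if __name__ == "__main__":
    print(validate_page_range(1, 2))  # a small first run of two pages
```

Raising `ValueError` lets the GUI catch the error and show its message next to the input fields.
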
## 📊 Scraping Modes

### Mode 0: All Comics

- **URL pattern**: `stripverhalen-alle/page/X/`
- **Structure**: traditional blog layout
- **Selector**: `h2.post-title a`

### Mode 1: Latest Comics

- **URL pattern**: `laatste?_page=X&ref=dw`
- **Structure**: grid layout with containers
- **Selector**: `.pt-cv-wrapper .pt-cv-ifield h5.pt-cv-title a`

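The two modes differ only in their listing URL and link selector, so they can be captured as data. This sketch is hypothetical (the real logic lives in `core/scraper.py` and may be organised differently): `BASE_URL` assumes the site root is `https://eboek.info/`, the `X` in the patterns becomes `{page}`, and `collect_links` accepts any Selenium WebDriver, such as Chrome.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumption: the site root; only the relative URL patterns come from this README.
BASE_URL = "https://eboek.info/"


@dataclass(frozen=True)
class ScrapeMode:
    name: str
    url_pattern: str  # "{page}" is replaced by the page number
    selector: str     # CSS selector for the comic links on a listing page


MODES: Dict[int, ScrapeMode] = {
    0: ScrapeMode("All Comics", "stripverhalen-alle/page/{page}/", "h2.post-title a"),
    1: ScrapeMode(
        "Latest Comics",
        "laatste?_page={page}&ref=dw",
        ".pt-cv-wrapper .pt-cv-ifield h5.pt-cv-title a",
    ),
}


def page_url(mode: int, page: int) -> str:
    """Build the listing URL for one page in the given mode."""
    return BASE_URL + MODES[mode].url_pattern.format(page=page)


def collect_links(driver, mode: int, page: int) -> List[str]:
    """Open one listing page with a Selenium WebDriver and return the link targets."""
    driver.get(page_url(mode, page))
    # "css selector" is the value of Selenium's By.CSS_SELECTOR constant.
    elements = driver.find_elements("css selector", MODES[mode].selector)
    hrefs = (el.get_attribute("href") for el in elements)
    return [href for href in hrefs if href]  # skip anchors without an href
```

Keeping the modes as data means a third layout would only need a new `MODES` entry, not a new code path.
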
## 🗂️ Project Structure

```
├── gui_main.py            # GUI application entry point
├── install_and_run.bat    # Windows installer
├── install_and_run.sh     # macOS/Linux installer
├── requirements.txt       # Dependencies
├── core/                  # Scraping logic
│   ├── scraper.py         # Dual-mode scraper
│   ├── scraper_thread.py  # Threading wrapper
│   └── credentials.py     # Config management
├── gui/                   # GUI components
│   ├── main_window.py     # Main interface
│   ├── login_dialog.py    # Credential input
│   └── progress_dialog.py # Progress monitoring
├── tests/                 # Test scripts
└── utils/                 # Helper functions
```

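The credentials are kept in a JSON config managed by `core/credentials.py`. A minimal sketch of such load/save helpers follows; the path `~/.eboek_scraper/config.json`, the `username`/`password` keys and the function names are assumptions, not the project's actual format. Because plain JSON is readable by anyone who can open the file, the sketch restricts the file to its owner after writing.

```python
import json
import os
import stat
from pathlib import Path
from typing import Dict, Optional

# Assumed location; the project's real config path may differ.
CONFIG_PATH = Path.home() / ".eboek_scraper" / "config.json"


def save_credentials(username: str, password: str, path: Path = CONFIG_PATH) -> None:
    """Write credentials to a JSON file and restrict it to the current user."""
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"username": username, "password": password}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    # Owner read/write only (0o600); on Windows chmod can only toggle read-only.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def load_credentials(path: Path = CONFIG_PATH) -> Optional[Dict[str, str]]:
    """Return the stored credentials, or None if no usable config exists yet."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    if not isinstance(data, dict) or not all(k in data for k in ("username", "password")):
        return None
    return {"username": data["username"], "password": data["password"]}
```

Returning `None` for a missing or broken file lets the GUI fall back to asking for credentials through the login dialog.
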
## 🔧 Troubleshooting

**GUI won't start**: check that PyQt5 is installed correctly.

**Login problems**: test your credentials through the GUI.

**Download issues**: check the `~/Downloads` folder.

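For download issues, it helps to see which files actually arrived in `~/Downloads` recently. This standalone helper is not part of the app; it assumes downloads land in the default folder named above, and the 60-minute window is an arbitrary default.

```python
import time
from pathlib import Path
from typing import List, Tuple


def recent_downloads(folder: Path = Path.home() / "Downloads",
                     within_minutes: int = 60) -> List[Tuple[str, int]]:
    """List (file name, size in bytes) for files modified in the last N minutes, newest first."""
    if not folder.is_dir():
        return []
    cutoff = time.time() - within_minutes * 60
    files = [p for p in folder.iterdir() if p.is_file() and p.stat().st_mtime >= cutoff]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [(p.name, p.stat().st_size) for p in files]


if __name__ == "__main__":
    for name, size in recent_downloads():
        print(f"{size:>12,}  {name}")
```

Chrome gives unfinished downloads a `.crdownload` extension, so such a file in the list points to a download that is still running or was interrupted.
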
## 💡 Tips

- Start with 1-2 pages to test the functionality
- Use headless mode for the best speed
- Follow the progress in the progress dialog

---

**Good luck with scraping! 🚀**