Louis Mylle ea4cab15c3 feat: Add installation scripts for Windows and Unix-based systems
- Created `install_and_run.bat` for Windows installation and setup.
- Created `install_and_run.sh` for Unix-based systems installation and setup.
- Removed `main.py` as it is no longer needed.
- Updated `requirements.txt` to specify package versions and added PyQt5.
- Deleted `start.bat` as it is redundant.
- Added unit tests for core functionality and scraping modes.
- Implemented input validation utilities in `utils/validators.py`.
- Added support for dual scraping modes in the scraper.
2026-01-10 14:45:00 +01:00

# EBoek.info Scraper
A modern PyQt5 GUI application for scraping EBoek.info, with dual scraping modes, real-time progress monitoring, and secure storage of login credentials.
## ✨ Features
- **Two scraping modes**: All Comics and Latest Comics
- **User-friendly GUI** with real-time progress
- **Secure credential storage** in a JSON config
- **Cross-platform** support (Windows, macOS, Linux)
- **Background threading**: the GUI stays responsive
- **Graceful cancellation** during operations
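
To illustrate how background threading and graceful cancellation can work together, here is a minimal sketch using only the standard library. It is not the project's actual code (which presumably wraps the scraper in a PyQt5 thread); the names `CancellableWorker`, `on_progress`, and `demo` are hypothetical, and `time.sleep` stands in for the real scraping work.

```python
import threading
import time
from typing import Callable, List


class CancellableWorker(threading.Thread):
    """Hypothetical worker that processes pages in the background until cancelled."""

    def __init__(self, pages: List[int], on_progress: Callable[[int, int], None]):
        super().__init__(daemon=True)
        self._pages = pages
        self._on_progress = on_progress
        self._cancel = threading.Event()
        self.processed: List[int] = []

    def cancel(self) -> None:
        # Only request a stop; the page currently in progress is finished first.
        self._cancel.set()

    def run(self) -> None:
        total = len(self._pages)
        for index, page in enumerate(self._pages, start=1):
            if self._cancel.is_set():
                break  # leave the loop cleanly instead of killing the thread
            time.sleep(0.01)  # stand-in for scraping one page
            self.processed.append(page)
            self._on_progress(index, total)


def demo() -> List[int]:
    """Start a worker for pages 1-10 and cancel it after the third page."""
    worker = None

    def report(done: int, total: int) -> None:
        if done == 3:
            worker.cancel()

    worker = CancellableWorker(list(range(1, 11)), report)
    worker.start()
    worker.join()
    return worker.processed
```

The cancel flag is checked between pages, so a cancellation never interrupts a page halfway through. In a GUI, `on_progress` would typically forward the values to the progress display rather than decide when to cancel.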
## 📋 Requirements
- **Python 3.8+**
- **Google Chrome** browser
- **EBoek.info** account
## 🚀 Installation
### Windows
Double-click `install_and_run.bat`
### macOS / Linux
```bash
chmod +x install_and_run.sh
./install_and_run.sh
```
### Manual
```bash
pip install selenium urllib3 PyQt5
python3 gui_main.py
```
## 🎯 Usage
1. **Start the application**: `python3 gui_main.py`
2. **Enter credentials**: Click "Change Credentials"
3. **Choose a scraping mode**: All Comics or Latest Comics
4. **Set the page range**: Start/end page
5. **Start scraping**: Click "Start Scraping"
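
The page range from step 4 is worth validating before a scrape starts. The sketch below shows one way to do that; the function name `validate_page_range` and the `max_pages` limit are assumptions for illustration and may not match what `utils/validators.py` actually contains.

```python
from typing import Tuple


def validate_page_range(start: str, end: str, max_pages: int = 1000) -> Tuple[int, int]:
    """Parse and check a start/end page pair as typed into a form (hypothetical helper)."""
    try:
        first, last = int(start), int(end)
    except ValueError as exc:
        raise ValueError("Page numbers must be whole numbers") from exc
    if first < 1:
        raise ValueError("Start page must be 1 or higher")
    if last < first:
        raise ValueError("End page must not be lower than start page")
    if last - first + 1 > max_pages:  # max_pages is an assumed safety limit
        raise ValueError(f"Range may cover at most {max_pages} pages")
    return first, last
```

For example, `validate_page_range("1", "2")` returns `(1, 2)`, while `validate_page_range("3", "1")` raises a `ValueError` that the GUI could show to the user.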
## 📊 Scraping Modes
### Mode 0: All Comics
- **URL pattern**: `stripverhalen-alle/page/X/`
- **Structure**: Traditional blog layout
- **Selector**: `h2.post-title a`
### Mode 1: Latest Comics
- **URL pattern**: `laatste?_page=X&ref=dw`
- **Structure**: Grid layout with containers
- **Selector**: `.pt-cv-wrapper .pt-cv-ifield h5.pt-cv-title a`
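
The two modes differ only in their URL pattern and CSS selector, so they can be described as data. The sketch below is a hypothetical illustration, not the code in `core/scraper.py`: the `BASE_URL` value, `ModeConfig`, `MODES`, and `page_urls` are all assumed names, and only the patterns and selectors come from the lists above.

```python
from dataclasses import dataclass
from typing import Dict, List

BASE_URL = "https://eboek.info/"  # assumption: the site root the patterns are relative to


@dataclass(frozen=True)
class ModeConfig:
    name: str
    url_template: str  # "{page}" replaces the X in the documented pattern
    css_selector: str  # selector for the comic links on a listing page


MODES: Dict[int, ModeConfig] = {
    0: ModeConfig(
        name="All Comics",
        url_template="stripverhalen-alle/page/{page}/",
        css_selector="h2.post-title a",
    ),
    1: ModeConfig(
        name="Latest Comics",
        url_template="laatste?_page={page}&ref=dw",
        css_selector=".pt-cv-wrapper .pt-cv-ifield h5.pt-cv-title a",
    ),
}


def page_urls(mode: int, start: int, end: int) -> List[str]:
    """Build the listing-page URLs for an inclusive page range."""
    config = MODES[mode]
    return [BASE_URL + config.url_template.format(page=p) for p in range(start, end + 1)]
```

Calling `page_urls(0, 1, 2)` yields the first two All Comics listing pages; the matching selector is available as `MODES[0].css_selector`.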
## 🗂️ Project Structure
```
├── gui_main.py # GUI applicatie entry point
├── install_and_run.bat # Windows installer
├── install_and_run.sh # macOS/Linux installer
├── requirements.txt # Dependencies
├── core/ # Scraping logic
│ ├── scraper.py # Dual-mode scraper
│ ├── scraper_thread.py # Threading wrapper
│ └── credentials.py # Config management
├── gui/ # GUI components
│ ├── main_window.py # Main interface
│ ├── login_dialog.py # Credential input
│ └── progress_dialog.py # Progress monitoring
├── tests/ # Test scripts
└── utils/ # Helper functions
```
## 🔧 Troubleshooting
- **GUI does not start**: Check the PyQt5 installation
- **Login problems**: Test the credentials via the GUI
- **Download issues**: Check the `~/Downloads` folder
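
The first and last checks can be scripted. The following is a small diagnostic sketch, not part of the project; the helper names `missing_modules` and `downloads_dir_exists` are hypothetical, and the package list mirrors the manual install command.

```python
import importlib.util
from pathlib import Path
from typing import Iterable, List, Optional


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def downloads_dir_exists(home: Optional[Path] = None) -> bool:
    """Check whether a Downloads folder exists under the given (or current user's) home."""
    return ((home or Path.home()) / "Downloads").is_dir()


if __name__ == "__main__":
    missing = missing_modules(["PyQt5", "selenium", "urllib3"])
    print("Missing packages:", ", ".join(missing) if missing else "none")
    print("Downloads folder found:", downloads_dir_exists())
```

Run it with the same interpreter that starts `gui_main.py`; a package installed for a different Python version will show up as missing.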
## 💡 Tips
- Start with 1-2 pages to test the functionality
- Use headless mode for optimal speed
- Monitor the progress in the progress dialog
---
**Good luck scraping! 🚀**