|
bdd50d2267
|
Refactor xpath query into a separate function.
|
2015-10-01 14:18:02 +02:00 |
|
|
dfd0ec343e
|
Implement parsing of scraplang.
|
2015-10-01 01:32:27 +02:00 |
|
|
05af365c58
|
Move command line parsing code to a new file.
|
2015-09-30 01:13:48 +02:00 |
|
|
c69252604c
|
Default to static tidy-html5, but let the user configure this.
|
2015-09-28 23:44:11 +02:00 |
|
|
8e517e5de9
|
Parse options through boost program_options.
|
2015-09-28 21:48:46 +02:00 |
|
|
4f85fa01a9
|
Update libtidy and curlcpp.
|
2015-09-28 15:30:09 +02:00 |
|
|
3bfea89568
|
Drop tidy from the repo and import it as submodule.
|
2015-03-01 03:17:47 +01:00 |
|
|
0e077a4930
|
Refactor HTML retrieval and cleaning into a separate file.
This version should also be able to retrieve data from HTTPS URLs.
|
2014-06-07 22:07:13 +02:00 |
|
|
cb00e484fa
|
Working example.
Invoke it with, e.g.:
./scraper http://www.dilbert.com '//div[@class='\''STR_Image'\'']/a/img/@src'
|
2014-06-07 20:44:43 +02:00 |
|
|
aa015ddd6a
|
Working example.
Tested with:
./scraper //meta[@name]
Note that libtidy adds a <meta name="generator"> tag.
|
2014-06-07 01:15:06 +02:00 |
|
|
e2d74fd092
|
Trying to use libtidy but it throws.
|
2014-06-06 22:22:12 +02:00 |
|
|
f213ce5411
|
First import
|
2014-06-06 20:24:24 +02:00 |
|