8 commits

Author SHA1 Message Date
8e517e5de9 Parse options through boost program_options. 2015-09-28 21:48:46 +02:00
4f85fa01a9 Update libtidy and curlcpp. 2015-09-28 15:30:09 +02:00
3bfea89568 Drop tidy from the repo and import it as submodule. 2015-03-01 03:17:47 +01:00
0e077a4930 Refactoring to put HTML retrieval & cleaning into a separate file.
This version should also be capable of retrieving data from HTTPS URLs.
2014-06-07 22:07:13 +02:00
cb00e484fa Working example.
Invoke it with, e.g.:
./scraper http://www.dilbert.com '//div[@class='\''STR_Image'\'']/a/img/@src'
2014-06-07 20:44:43 +02:00
aa015ddd6a Working example.
Tested with:
./scraper //meta[@name]
Note that libtidy adds a meta name=generator tag.
2014-06-07 01:15:06 +02:00
e2d74fd092 Trying to use libtidy but it throws. 2014-06-06 22:22:12 +02:00
f213ce5411 First import 2014-06-06 20:24:24 +02:00