76f403b3ce
Extract read_all() functions into a separate file.
2018-02-08 00:54:17 +00:00
6dffe9b848
Writing the code to go from tree to mustache dictionary.
2018-01-17 23:24:35 +00:00
fcb25ed456
WIP: reworking the AST interpreter.
2018-01-13 18:16:11 +00:00
f0e7a1d136
Trying to get scraplang implemented
...
Lots of changes made on the train, with little
time to tidy them up.
Use C++17 (for std::optional)
Clean up the CMake script a bit
Get rid of unused stuff
Skeleton implementation of some classes for scraplang
2018-01-10 20:25:19 +00:00
41b0f59039
Bump version to 0.2.1b
2015-10-01 15:32:30 +02:00
bdd50d2267
Refactor xpath query into a separate function.
2015-10-01 14:18:02 +02:00
dfd0ec343e
Implement parsing of scraplang.
2015-10-01 01:32:27 +02:00
05af365c58
Move command line parsing code to a new file.
2015-09-30 01:13:48 +02:00
c69252604c
Default to static tidy-html5, but let the user configure this.
2015-09-28 23:44:11 +02:00
8e517e5de9
Parse options through boost program_options.
2015-09-28 21:48:46 +02:00
4f85fa01a9
Update libtidy and curlcpp.
2015-09-28 15:30:09 +02:00
3bfea89568
Drop tidy from the repo and import it as submodule.
2015-03-01 03:17:47 +01:00
0e077a4930
Refactoring to put HTML retrieval and cleaning into a separate file.
...
This version should also be capable of retrieving data from HTTPS URLs.
2014-06-07 22:07:13 +02:00
cb00e484fa
Working example.
...
Invoke it with, e.g.:
./scraper http://www.dilbert.com '//div[@class='\''STR_Image'\'']/a/img/@src'
2014-06-07 20:44:43 +02:00
aa015ddd6a
Working example.
...
Tested with:
./scraper //meta[@name]
Note that libtidy adds a meta name=generator tag.
2014-06-07 01:15:06 +02:00
e2d74fd092
Trying to use libtidy but it throws.
2014-06-06 22:22:12 +02:00
f213ce5411
First import
2014-06-06 20:24:24 +02:00