41bb315b02
Store the same tree for ApplyBlocks too.
2018-01-17 01:29:46 +00:00
2fd4daf52c
Keep FromBlock data in tree form.
2018-01-16 19:50:35 +00:00
26b912d66c
Allow dots in scraplang identifiers.
2018-01-16 10:42:25 +00:00
3572803f66
Allow nesting of structs.
2018-01-16 10:36:53 +00:00
fcb25ed456
WiP reworking the AST interpreter.
2018-01-13 18:16:11 +00:00
29f8fe299e
Still trying to get the AST interpreted without much luck.
...
I may be going down the wrong path so I'm committing and
scrapping everything to try a different approach.
2018-01-13 02:03:01 +00:00
f0e7a1d136
Trying to get scraplang implemented
...
Lots of changes I made on the train and had little
time to tidy up.
Use C++17 (for std::optional)
Clean up the CMake script a bit
Get rid of unused stuff
Skeleton implementation of some classes for scraplang
2018-01-10 20:25:19 +00:00
c31d317d51
Print a notice in the --help view in debug builds.
2015-10-01 15:37:09 +02:00
41b0f59039
Bump version to 0.2.1b
2015-10-01 15:32:30 +02:00
5d0a895978
Attach GPLv3.
2015-10-01 15:31:58 +02:00
bdd50d2267
Refactor xpath query into a separate function.
2015-10-01 14:18:02 +02:00
c9db1d8ba3
Wrap the unique_ptr so that dtor is called from the cpp.
...
This makes it unnecessary to include scrapast.hpp in whatever
cpp takes ownership of the unique_ptr. Without this, though,
the destructor of unique_ptr would force you to include scrapast.hpp.
2015-10-01 01:45:42 +02:00
dfd0ec343e
Implement parsing of scraplang.
2015-10-01 01:32:27 +02:00
7dfd1f4a38
Update tidy (fixes the build with tidy as a submodule)
2015-09-30 16:33:52 +02:00
bf3b85498b
Add an option to customize the user agent at runtime.
2015-09-30 01:27:28 +02:00
c947eab83f
Show some readable message when being passed an unknown option.
2015-09-30 01:14:47 +02:00
05af365c58
Move command line parsing code to a new file.
2015-09-30 01:13:48 +02:00
c304ffbbf0
Don't detect if it's a tty - only read from stdin when url is -
2015-09-29 21:04:28 +02:00
db1311839d
Check fstats instead of using isatty().
2015-09-29 17:40:01 +02:00
c69252604c
Default to static tidy-html5, but let the user configure this.
2015-09-28 23:44:11 +02:00
49aa62815a
Allow piping.
...
At the moment you still need to specify some parameter for the url, even if
it's not needed. The good news is that the value doesn't have to
be a valid URL, so any string will do.
2015-09-28 23:37:42 +02:00
943e760ffd
Add dump parameters.
...
Allows dumping both the raw and the cleaned-up HTML.
2015-09-28 23:24:23 +02:00
00150938dd
Fix the html cleaning code that was not really cleaning.
2015-09-28 22:59:09 +02:00
8e517e5de9
Parse options through boost program_options.
2015-09-28 21:48:46 +02:00
4f85fa01a9
Update libtidy and curlcpp.
2015-09-28 15:30:09 +02:00
44992458ac
Quick dirty fix to avoid invalid characters in scripts.
...
Note that with this change scripts are stripped away, so
you won't find any <script></script> pair in the html.
Also print some more detailed info about errors.
2015-03-01 05:03:12 +01:00
3bfea89568
Drop tidy from the repo and import it as submodule.
2015-03-01 03:17:47 +01:00
0e077a4930
Refactoring to put html retrieval & cleaning into a separate file.
...
This version should also be capable of retrieving data from https urls.
2014-06-07 22:07:13 +02:00
cb00e484fa
Working example.
...
Invoke it with, e.g.:
./scraper http://www.dilbert.com '//div[@class='\''STR_Image'\'']/a/img/@src'
2014-06-07 20:44:43 +02:00
aa015ddd6a
Working example.
...
Tested with:
./scraper //meta[@name]
Note that libtidy adds a meta name=generator tag.
2014-06-07 01:15:06 +02:00
c9de3d3389
Updating to tidy-html5.
...
See http://w3c.github.io/tidy-html5/
and https://github.com/w3c/tidy-html5 .
2014-06-06 22:33:27 +02:00
e2d74fd092
Trying to use libtidy but it throws.
2014-06-06 22:22:12 +02:00
56f0736d1a
Move headers into tidy/ subdirectory.
2014-06-06 21:34:01 +02:00
3182e098bb
Import of libtidy with custom cmake file.
2014-06-06 21:18:30 +02:00
f213ce5411
First import
2014-06-06 20:24:24 +02:00