duckscraper

Author	SHA1	Message	Date
King_DuckZ	bdd50d2267	Refactor xpath query into a separate function.	2015-10-01 14:18:02 +02:00
King_DuckZ	dfd0ec343e	Implement parsing of scraplang.	2015-10-01 01:32:27 +02:00
King_DuckZ	05af365c58	Move command line parsing code to a new file.	2015-09-30 01:13:48 +02:00
King_DuckZ	c69252604c	Default to static tidy-html5, but let the user configure this.	2015-09-28 23:44:11 +02:00
King_DuckZ	8e517e5de9	Parse options through boost program_options.	2015-09-28 21:48:46 +02:00
King_DuckZ	4f85fa01a9	Update libtidy and curlcpp.	2015-09-28 15:30:09 +02:00
King_DuckZ	3bfea89568	Drop tidy from the repo and import it as submodule.	2015-03-01 03:17:47 +01:00
King_DuckZ	0e077a4930	Refactoring to put html retrieval & cleaning into a separate file. This version should also be capable of retrieving data from https urls.	2014-06-07 22:07:13 +02:00
King_DuckZ	cb00e484fa	Working example. Invoke it with e.g.: ./scraper http://www.dilbert.com '//div[@class='\''STR_Image'\'']/a/img/@src'	2014-06-07 20:44:43 +02:00
King_DuckZ	aa015ddd6a	Working example. Tested with: ./scraper //meta[@name] Note that libtidy adds a meta name=generator tag.	2014-06-07 01:15:06 +02:00
King_DuckZ	e2d74fd092	Trying to use libtidy but it throws.	2014-06-06 22:22:12 +02:00
King_DuckZ	f213ce5411	First import	2014-06-06 20:24:24 +02:00