mirror of https://github.com/KingDuckZ/dindexer.git synced 2024-11-25 00:53:43 +00:00

Update README and add posts about dindexer to the repo.

King_DuckZ 2016-04-30 21:16:58 +02:00
parent e636c1dc13
commit eede8e3236
4 changed files with 72 additions and 1 deletions


@@ -5,7 +5,7 @@
[![Build Status](https://drone.io/bitbucket.org/King_DuckZ/dindexer/status.png)](https://drone.io/bitbucket.org/King_DuckZ/dindexer/latest)
#### Latest release ###
-Latest release is __0.1.4__. However there are several known problems with that release. Please use the latest version from master instead.
+Latest stable release is __0.1.5b__.
#### Flattr ####
[![Flattr this git repo](http://api.flattr.com/button/flattr-badge-large.png)](https://flattr.com/submit/auto?user_id=King_DuckZ&url=https%3A%2F%2Fbitbucket.org%2FKing_DuckZ%2Fdindexer&title=dindexer&language=en_GB&tags=bitbucket&category=software)


@@ -0,0 +1,11 @@
# My new project: dindexer #
It's only been a few months since I began working on #dindexer, about three and a half. When I started the project I thought it was going to be much quicker to write. In fact I started it because I needed to figure out which files on my hard disks had already been burned onto DVDs, and thus could be safely deleted to make room for new stuff, and which needed to go through K3B first. Unfortunately I'm still not at the point where I can do exactly what I needed, but I'm getting closer and closer to it.
My use case is this: I have a lot of files - old videos, holiday pictures, tar archives, source code, humble bundle games. Some of those files have already made their way onto a DVD and into the box in the storage room, while others still exist only on my hard disk. How can I tell them apart? Re-burning everything could be a solution (and in fact that's what I've always done), so I can be sure everything is on at least one DVD. But the downside is that my already large collection of backups would grow much faster. In the end, duplicate files only add to the confusion.
Ideally, I should be able to simply put everything in a directory (or maybe in a K3B savefile), point some tool to that directory and have it filter out things that already exist on some DVD. That's the idea behind dindexer.
dindexer is a #c++ command-line #tool for #linux systems. There is no GUI yet, although I'm hoping to have one at some point.
I'll be posting updates on my project as development goes forward, so stay tuned! In the meantime please try out the latest version of [dindexer on bitbucket](https://bitbucket.org/King_DuckZ/dindexer) and let me hear your feedback!

docs/posts/02_untitled.md Normal file

@@ -0,0 +1,17 @@
The first piece of functionality I implemented in #dindexer is the part that looks for files and directories and calculates the hash. That's indeed the heart of the project, and almost everything else is built around that central idea. It all happens in the `hash_dir()` function. And “all” is definitely too much. Let me explain: in spite of the name, many things are done in that function, such as traversing the directory tree (the traversal logic itself is in there) and getting the mime type of files. That's no surprise, since the project is very young and it's still moving away from being a prototype. During the past months I've been adding new values to the DB and new functionality to the program, and more than once that central function was the quick way to add the new things. Now I'm at the point where dindexer as a concept is working for me, and so I'm planning to keep the development going and add new features (which I will discuss here in the future).
This looks like the right moment to refactor that code, and I had to think about how best to do it. I should mention at this point that I'm considering the possibility of using more than one hashing algorithm for each item being indexed in order to minimize the collision probability, but this is a larger topic and I'm not so sure about the whole idea anymore, so let's leave this discussion for another time. Anyway, with that in mind my first idea was to implement the different operations that `hash_dir()` is currently doing as jobs, in order to take advantage of multi-core systems. Running several hashing algorithms in parallel sounded like a good idea, but a friend of mine talked me out of that, so it's going to stay single-threaded and single-hash for now.
The next idea I had was to split the disk scanning process into tasks, and have a manager executing whatever tasks you registered with it. That's very convenient because it will trim a lot of crap out of `main()` and will also let me easily add or remove jobs from the manager (for example in case there are command line switches that enable or disable parts of the scanning process). My very first approach was just as described: a manager plus a base task class with a virtual `run_task()` method.
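To make that first approach concrete, here is a minimal sketch of a manager running registered tasks through a common base class. Only `run_task()` comes from the text above; `Task`, `TaskManager`, `LoggingTask`, `add_task()`, `run_all()` and `run_example_scan()` are hypothetical names made up for illustration, and the sketch assumes a C++14 compiler for `std::make_unique`.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class: every scanning step implements run_task().
class Task {
public:
    virtual ~Task() = default;
    virtual void run_task() = 0;
};

// Hypothetical manager: runs whatever tasks were registered, in order.
class TaskManager {
public:
    void add_task(std::unique_ptr<Task> task) {
        m_tasks.push_back(std::move(task));
    }

    void run_all() {
        for (auto& task : m_tasks) {
            task->run_task();
        }
    }

private:
    std::vector<std::unique_ptr<Task>> m_tasks;
};

// Illustrative task that only records its name in a shared log when run.
class LoggingTask : public Task {
public:
    LoggingTask(std::string name, std::vector<std::string>& log)
        : m_name(std::move(name)), m_log(log) {}

    void run_task() override { m_log.push_back(m_name); }

private:
    std::string m_name;
    std::vector<std::string>& m_log;
};

// Registers tasks the way a command line switch might enable or disable
// one of them, then returns the order in which they actually ran.
std::vector<std::string> run_example_scan(bool with_mime) {
    std::vector<std::string> log;
    TaskManager manager;
    manager.add_task(std::make_unique<LoggingTask>("list_files", log));
    if (with_mime) {
        manager.add_task(std::make_unique<LoggingTask>("mime", log));
    }
    manager.add_task(std::make_unique<LoggingTask>("hash", log));
    manager.run_all();
    return log;
}
```

Note that nothing in this design stops you from registering the hashing task without the file listing one, and a single `void run_task()` signature gives the tasks no typed way to hand their results to each other.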
Still, there is a bit of dependency management involved. For example hashing files needs me to have a list of files to go through in the first place, and detecting the content type of a disk needs both that same list plus the media type. And those are hard dependencies, so it's not like you can just skip one task and still expect everything to work. My way of keeping the task-based approach while still having some way to enforce compulsory dependencies is to give up on the task manager and the common base class approach and just have a completely different class for each task. This turns out to be very convenient since each task is producing something different (a list of files, an enum, a list of hashes...), and I can have each task require the tasks it depends on at construction time, so I get build errors if some key dependency is missing. How about the manager class? That's also not needed anymore. Once I've instantiated the last object in the task chain, the one that returns the full list of data to be sent to the DB, I'll just have to call its `get_or_create()` method and it will walk up the chain and collect all the bits and pieces it needs to do its own part.
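A rough sketch of that idea, under stated assumptions: `get_or_create()` is the only name taken from the text, while `FileList`, `ListFilesTask`, `HashTask` and `calls()` are hypothetical. The file list is hard-coded and `std::hash` on the path stands in for a real hash of the file contents, purely to keep the example short.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// All names here are hypothetical stand-ins, not dindexer's real types.
using FileList = std::vector<std::string>;

// First task in the chain: produces the list of files. A real task would
// walk the directory tree; this one just returns what it was given.
class ListFilesTask {
public:
    explicit ListFilesTask(FileList files) : m_files(std::move(files)) {}

    const FileList& get_or_create() {
        ++m_calls;
        return m_files;
    }

    // How many times the list was requested (only here for illustration).
    int calls() const { return m_calls; }

private:
    FileList m_files;
    int m_calls = 0;
};

// Depends on ListFilesTask. There is no default constructor, so code that
// forgets the dependency does not compile.
class HashTask {
public:
    explicit HashTask(ListFilesTask& files) : m_files(files) {}

    // Pulls the file list from the dependency on first use and caches the
    // result, so later calls do no extra work.
    const std::vector<std::size_t>& get_or_create() {
        if (!m_done) {
            for (const auto& path : m_files.get_or_create()) {
                // Placeholder: hashes the path, not the file contents.
                m_hashes.push_back(std::hash<std::string>{}(path));
            }
            m_done = true;
        }
        return m_hashes;
    }

private:
    ListFilesTask& m_files;
    std::vector<std::size_t> m_hashes;
    bool m_done = false;
};
```

With this shape, constructing a `HashTask` without a `ListFilesTask` is a build error, and calling `get_or_create()` on the last object is enough to trigger everything upstream, which is why no manager is needed.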
That's still not the entire story: some parts of the new tasks could still benefit from being put into a common base class, and I still need to be able to swap tasks for unit testing. For example let's say I want to test the content-detection functions, which depend on having a list of files and a given media type. For the sake of clarity, let's say you want to test if video DVDs are being detected fine. You will need a list of files containing a VIDEO_TS and an AUDIO_TS directory, plus some VOB, IFO etc. And you need the media type to be a DVD. The base class I came up with is templated over the return type of its `get_or_create()` method, so by declaring the constructor in `scantask::ContentType` as `ContentType ( Base<FileList>&, Base<MediaTypes>& )` I leave the way open to replacing the tasks above in the dependency tree.
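Here is a minimal sketch of how such a templated base class makes the upstream tasks swappable in a test. The `Base<T>` template, `get_or_create()` and the `ContentType` constructor signature follow the description above; the `scantask` namespace is left out for brevity, and the enum values, the `Fixed` test double and the "a DVD with a `VIDEO_TS` entry is a video DVD" rule are assumptions made up for this example, not dindexer's actual detection logic.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; dindexer's real definitions may differ.
using FileList = std::vector<std::string>;
enum class MediaTypes { CDRom, DVD, Other };
enum class ContentTypes { Generic, VideoDVD };

// Common interface, templated over what get_or_create() returns.
template <typename T>
class Base {
public:
    virtual ~Base() = default;
    virtual const T& get_or_create() = 0;
};

// Test double: hands back a fixed value instead of scanning a real disk.
template <typename T>
class Fixed : public Base<T> {
public:
    explicit Fixed(T value) : m_value(std::move(value)) {}
    const T& get_or_create() override { return m_value; }

private:
    T m_value;
};

// Only knows its dependencies through Base<...>, so it cannot tell a real
// scanning task from a Fixed one.
class ContentType : public Base<ContentTypes> {
public:
    ContentType(Base<FileList>& files, Base<MediaTypes>& media)
        : m_files(files), m_media(media) {}

    const ContentTypes& get_or_create() override {
        m_result = ContentTypes::Generic;
        // Simplified, assumed rule: a DVD with a VIDEO_TS entry.
        if (m_media.get_or_create() == MediaTypes::DVD) {
            for (const auto& path : m_files.get_or_create()) {
                if (path == "VIDEO_TS") {
                    m_result = ContentTypes::VideoDVD;
                }
            }
        }
        return m_result;
    }

private:
    Base<FileList>& m_files;
    Base<MediaTypes>& m_media;
    ContentTypes m_result = ContentTypes::Generic;
};
```

A unit test can then build a `Fixed<FileList>` holding `VIDEO_TS`, `AUDIO_TS` and a few VOB/IFO paths, pair it with a `Fixed<MediaTypes>` set to `MediaTypes::DVD`, and check what `ContentType::get_or_create()` reports, all without touching a disk.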
At this point the changes I'm working on are on a separate branch. Feel free to look at the *hashdir_refactoring* branch if you want to see the work-in-progress!
As usual, you can find [dindexer on bitbucket.org](https://bitbucket.org/King_DuckZ/dindexer).
\#opensource #linux #dindexer #cpp


@@ -0,0 +1,43 @@
# v0.1.5b released #
## Release notes ##
It's been a while since the last version number change, and as you might already know from reading the README file, v0.1.4b had a major flaw and shouldn't be used at all. I've been exceptionally busy between that release and now, hence the long wait. dindexer, however, has not been put aside during this time! Here is what has changed in this release:
*Features and usability*
* Add new "navigate" command
* Fix behaviour of main dindexer when no actions are found
* Add more builtin info that can be viewed with `dindexer -b`
* Identify the type of data on a disk (eg: video dvd)
* Allow searching by hash in `locate`
* Add scripts and functionality to the code to enable bash autocompletion for actions
* Let dindexer give you a hint if you misspell an action
* Main dindexer program now understands the `--version` option
*Bug fixes*
* Fix wrong hash for directories
* Fix scan hanging after listing directories
* Fix mimetype retrieval
* Buildfix on ARM 32 and 64 bit
*Code improvements*
* Cleanup for building on gcc 5 and clang
* "install" and "test" targets in cmake
* Add some unit tests
* Scan actions are now done in tasks that can be composed together
* Many improvements in the code
* Improve the build system (cmake scripts)
## Final notes ##
As you might guess from the "b" in the version number this is still an early release, and there is still a lot more I want to do.
One of the things I should take care of as soon as possible is coming up with a nicer name for this project. Many people seem to think of dindexer as a realtime file indexing program (like Baloo for example), and I believe this is due to the *indexer* part in the name. I'd like to address this before the project goes much further, so any suggestions are welcome! Feel free to post them here or send a private message to **King_DuckZ** on IRC freenode.
It is also very likely that the main repository will be moved to github, but you can count on the bitbucket page to be updated regularly for the foreseeable future.
Happy scanning with the new release, and let me have your comments and ideas! :)
\#dindexer #opensource #linux #cpp