par2 check during download

nobody

par2 check during download

Post by nobody » 05 Apr 2008, 12:09

Would it be possible to download the small par2 file first and use it to verify
every file directly after it is downloaded? Par2-verifying all files after everything
has been downloaded takes a very long time and causes a lot of disk load (especially
for large stuff, like over 8 GB). I think the file might still be in buffers
somewhere shortly after the download, so the par2 check would actually be much faster
anyway. When data blocks are missing from the files, nzbget could instantly
un-pause the paused par2 files (at the end of the queue), or at least keep track
of the number of missing data blocks.

hugbug

RE: par2 check during download

Post by hugbug » 05 Apr 2008, 13:59

Libpar (and par2cmdline) cannot save and reload the par-check state (like quickpar does). If we want to hold the state, we must keep the parrepair-object alive (this is what nzbget does while waiting for extra par-blocks). If the download queue is rearranged (for example, if a new download is added to the top of the queue), we need either to create another parrepair-object for the new nzb-collection (and let the first object keep its memory) or to stop the first object and lose its par-check state.
This makes the "on the fly" thing not easy, but if we make it adjustable (something like an option for "max kept objects") it might work.

With libpar we cannot verify just one file. The parrepair-object takes a par-file as a parameter and verifies all files belonging to the par-set that it can find in a directory. Then it can verify extra files we pass to it. That's why we cannot use an approach like "create parrepair-object, verify one downloaded file, destroy parrepair-object", or use one parrepair-object for files from different par-sets. We must keep one parrepair-object for each par-set.
In this case we do not need to start the verification again if our parrepair-object for this nzb-collection (actually for the par-set) was kept alive.

The problem occurs only if files from different nzb-collections (or from different par-sets of one collection) are mixed in the download queue (remember, nzbget allows you to control, I mean "move", individual files in the download queue, not just nzb-collections). If we want to verify a file that does not belong to an existing parrepair-object, we need to create a new parrepair-object, or we can cancel the verify process in the existing parrepair-object (and lose all verify info for that par-set).

But if we suppose that the download queue is not edited often, we could achieve good results with a limited number of parrepair-objects (we need to be able to configure the max count).
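To make the "limited number of parrepair-objects" idea concrete, here is a rough sketch in Python (not nzbget code). It keeps one checker per par-set and drops the least recently used one when the configured maximum is reached. The names `ParCheckerPool`, `make_checker` and `max_objects` are made up for illustration, and the eviction policy is only one possible choice.

```python
from collections import OrderedDict


class ParCheckerPool:
    """Keeps at most max_objects live checker objects, one per par-set."""

    def __init__(self, max_objects, make_checker):
        if max_objects < 1:
            raise ValueError("max_objects must be at least 1")
        self.max_objects = max_objects
        self.make_checker = make_checker  # hypothetical factory: par_set -> checker
        self._live = OrderedDict()        # par_set -> checker, oldest first

    def checker_for(self, par_set):
        """Return the live checker for par_set, creating one if needed."""
        if par_set in self._live:
            self._live.move_to_end(par_set)  # mark as most recently used
            return self._live[par_set]
        if len(self._live) >= self.max_objects:
            # Evict the least recently used checker; its verify state is lost.
            self._live.popitem(last=False)
        checker = self.make_checker(par_set)
        self._live[par_set] = checker
        return checker

    def live_par_sets(self):
        """Par-sets that currently have a live checker, oldest first."""
        return list(self._live)
```

With a maximum of 2, files arriving for par-sets a, b, a, c would evict the checker for b, so a later file from b would have to be verified from scratch.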

--------------------------------

Summary: this is possible but requires a significant rework of the postprocessor module. I will keep it on the TODO list.

nobody

RE: par2 check during download

Post by nobody » 05 Apr 2008, 14:43

Thanks for the response.

I didn't know that libpar had such a limitation. I often use 'cfv' to check
individual files using the checksums in a par2 archive, but I guess you can't use
that since it's just a Python script.

As a workaround I messed around a little with a post processing script. It checks if
downdir has a _brokenlog.txt in it and runs a quick 'cfv -m' which checks for missing
files but also reports if the file size doesn't match. If needed it then un-pauses
some PAR2 files using nzbget --edit U <ids>. I am not completely sure if there is much
need to do a full crc check if the yDecode didn't report any errors? Doesn't yEnc
store checksums as well? Anyway, after un-pausing those extra par files from the
post processing script, the script exits, but nzbget does not execute the script again
after it finishes downloading the extra pars, so neither nzbget nor the post processing
script actually repairs anything now. Is there a way to get nzbget to execute the post
script again after it finished getting the extra pars?
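For reference, the decision part of that workaround looks roughly like the sketch below (Python here, simplified, without the cfv call). The function name `unpause_command` is made up, and passing several ids as one comma-separated argument is an assumption about the `--edit` syntax.

```python
import os


def unpause_command(down_dir, par_ids):
    """Build the nzbget command that un-pauses par_ids, or None if not needed."""
    broken_log = os.path.join(down_dir, "_brokenlog.txt")
    if not os.path.exists(broken_log) or not par_ids:
        return None
    # Assumption: ids are passed as one comma-separated argument.
    id_list = ",".join(str(i) for i in par_ids)
    return ["nzbget", "--edit", "U", id_list]
```

The returned list can be handed to subprocess.call() when it is not None.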

- Emiel

hugbug

RE: par2 check during download

Post by hugbug » 08 Apr 2008, 11:55

1.
>Is there a way to get nzbget to execute the post
>script again after it finished getting the extra pars?

nzbget does not execute the script again on purpose, to prevent the script from messing things up :)

I can add an option to allow this.


2. As for using the checksums from par2, it is an interesting idea. I need to investigate the possibilities (how to do this without libpar2).

hugbug

RE: par2 check during download

Post by hugbug » 08 Apr 2008, 11:56

>I am not completely sure if there is much need to do a full
>crc check if the yDecode didn't report any errors?
>Doesn't yEnc store checksums as well?

The checksum is optional in yEnc. In most cases it is provided, but not always.
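A small Python sketch of what "optional" means for a checker (not nzbget's decoder): the CRC is read from the `=yend` trailer line if it is there, and the result is "unknown" when it is not. The function names are made up, and preferring `pcrc32` (the per-part CRC) over `crc32` for part data is my reading of the yEnc format.

```python
import re
import zlib


def trailer_crc(trailer_line):
    """Return the CRC from a '=yend' line as an int, or None if absent."""
    for key in ("pcrc32", "crc32"):
        match = re.search(r"\b" + key + r"=([0-9a-fA-F]{1,8})\b", trailer_line)
        if match:
            return int(match.group(1), 16)
    return None


def check_decoded(data, trailer_line):
    """True/False if the trailer carries a CRC, None if there is nothing to compare."""
    expected = trailer_crc(trailer_line)
    if expected is None:
        return None  # no checksum posted; only a par2 check can tell
    return (zlib.crc32(data) & 0xFFFFFFFF) == expected
```

So a clean yDecode only proves the data is good when check_decoded() returns True, not when it returns None.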

nobody

RE: par2 check during download

Post by nobody » 11 Apr 2008, 23:53

Ok, if I understand this properly, nzbget verifies the files in a collection after download, and then downloads extra pars without having to redo the verification bit. This means there is no speed advantage in knowing how many extra pars we need to download upfront, right? So for broken releases, the speed is already good, but for unbroken releases it's not. So there is no real need to calculate the exact number of missing blocks and a simple checksum check will be good enough. If you can keep track of the number of successfully verified files and this number is equal to the number of files in the par2 archive, you can then skip the final verification step and go straight to the post-script. I quickly wrote a par2 parser that you can use exactly for this purpose in nzbget - if you want to.

par2chk.h / par2chk.cpp, the interface is this:
int par2chk_verify(char *parfile, char *diskfile); // return 1 if OK
int par2chk_numfiles(char *parfile); // return number of files in par2

It uses the par2cmdline stuff for the MD5 checksum calculation (turns out that is twice as fast as the GNULIB version on my pc).
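For anyone who wants to see the idea without reading the C++ code, here is a rough Python sketch of the same kind of check (it is not par2chk itself). It reads the File Description packets of a par2 file, following my reading of the PAR 2.0 packet layout, and compares a file's length and MD5 with the recorded values. The function names are made up, and the sketch does not check the packets' own MD5, so a damaged par2 file is not detected.

```python
import hashlib
import struct

MAGIC = b"PAR2\0PKT"
FILE_DESC = b"PAR 2.0\0FileDesc"
HEADER = struct.Struct("<8sQ16s16s16s")  # magic, length, packet MD5, set id, type


def file_descriptions(par_bytes):
    """Yield (name, length, md5) for each File Description packet found."""
    pos = 0
    while True:
        pos = par_bytes.find(MAGIC, pos)
        if pos < 0 or pos + HEADER.size > len(par_bytes):
            return
        _, length, _, _, ptype = HEADER.unpack_from(par_bytes, pos)
        end = pos + length
        if length < HEADER.size or end > len(par_bytes):
            pos += len(MAGIC)  # damaged packet; resync on the next magic
            continue
        body = par_bytes[pos + HEADER.size:end]
        if ptype == FILE_DESC and len(body) >= 56:
            # Body: file id (16), full MD5 (16), MD5 of first 16k (16), length (8), name.
            (size,) = struct.unpack_from("<Q", body, 48)
            name = body[56:].rstrip(b"\0").decode("utf-8", "replace")
            yield name, size, body[16:32]
        pos = end


def expected_files(par_bytes):
    """Map file name -> (length, md5); repeated packets collapse to one entry."""
    return {name: (size, md5) for name, size, md5 in file_descriptions(par_bytes)}


def verify_file(par_bytes, name, data):
    """True if data matches the length and MD5 recorded for name."""
    entry = expected_files(par_bytes).get(name)
    return entry is not None and entry == (len(data), hashlib.md5(data).digest())
```

If the number of files that passed verify_file() equals len(expected_files(par_bytes)), the final verification step could be skipped.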

I uploaded the thing to http://apollo.spacelabs.nl/~emiel/par2chk.tar.gz and you can use it however you want, or not at all.. but it would be nice if you can make nzbget even faster :) oh I didn't implement the par2-self check to check if the par2 file itself is good.. you can probably add that if you want to use it.

hugbug

RE: par2 check during download

Post by hugbug » 12 Apr 2008, 08:50

Brilliant. I will use your code (already saved to disk).

Have you run any speed tests? Is calculating the MD5 faster than verifying the same file with par2cmdline?

For broken releases the files will be verified twice: once by par2chk and then by libpar2. This does not make the process slower, but it increases CPU usage. I think this is not a problem for fast desktop PCs, only for slow NAS devices and routers.
Anyway, for slow CPUs "par2chk" can be made an option that can be turned off.

nobody

RE: par2 check during download

Post by nobody » 12 Apr 2008, 19:35

Yeah, the best method would be to do what you described in your first reply in this thread... but since there are some issues with implementing that this will be a nice feature for some of us for now. I added the par2 self check, I fixed a possible problem with filename matching and I removed a few mem leaks. You can get the updated version in the same place as before.

About the speed test, the main advantage will be that the files will be verified in a separate thread during the download instead of afterwards, and it uses the libpar2/par2cmdline code for the checksum calculation. I don't really know how to verify a single file by par2cmdline, but I can delete all other rars and par2s and test that way. (~30 MByte file)

> time ./test "Fruits Basket.par2" "Fruits Basket.part001.rar"
file 'Fruits Basket.part001.rar' crc OK
./test "Fruits Basket.par2" "Fruits Basket.part001.rar" 0.15s user 0.01s system 98% cpu 0.167 total

> time par2 v -qq "Fruits Basket.par2"
par2 v -qq "Fruits Basket.par2" 0.86s user 0.01s system 99% cpu 0.875 total

> time cfv "Fruits Basket.part001.rar"
Fruits Basket.par2: 1 files, 1 OK. 0.121 seconds, 242847.3K/s
cfv "Fruits Basket.part001.rar" 0.14s user 0.02s system 98% cpu 0.167 total

Anyway, don't forget to get the updated version.

dalrun

RE: par2 check during download

Post by dalrun » 22 Jun 2008, 02:09

>> Is there a way to get nzbget to execute the post
>> script again after it finished getting the extra pars?

> I can add an option to allow this.

I'd be very interested in this (a new edit switch?) or a command that returns success after the additional pars have been downloaded (dummy me thinking the Edit command would return the Collection...downloaded line). My sleeping for blocks x 30sec solution is pretty lame.

> We must keep one parrepair-object for each par-set.
> In this case we do not need to start verify again

Are you saying that you can download pars after starting the check and that the new files will be included in the check? I've been restarting the check because I thought that par2 only looked for files at start-up.

hugbug

RE: par2 check during download

Post by hugbug » 22 Jun 2008, 11:18

>> I can add an option to allow this.

>I'd be very interested in this (a new edit switch?)

I meant a config-file-option, that would disable the existing "was the postprocessing already done for this nzb-file?"-check, performed after the last unpaused file (of some nzb-file) is completed.


>Are you saying that you can download pars after starting the check
>and that the new files will be included in the check?

Yes, it works that way in nzbget's internal par-checker.


>I've been restarting the check because I thought that par2
>only looked for files at start-up.

The external par2 utility (par2cmdline) has no way to wait for additional par2 files. It simply exits with an error code if there are not enough par-blocks, so it has to be restarted after new par-blocks have been downloaded.

That's why nzbget's internal par-checker is better than doing the par-check from a postprocess-script.
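The restart loop that a postprocess-script ends up needing with the external utility can be sketched like this (Python, illustration only). `run_par2` and `fetch_more_pars` are hypothetical callables supplied by the script, and treating exit code 0 as "success" is an assumption about par2cmdline.

```python
def repair_until_done(run_par2, fetch_more_pars, max_rounds=5):
    """Re-run an external par2 check until it succeeds or we give up.

    run_par2() -> exit code of one par2 run (0 is taken to mean success).
    fetch_more_pars() -> True if more par2 files were downloaded, else False.
    """
    for _ in range(max_rounds):
        if run_par2() == 0:
            return True
        # The process has exited and kept no state,
        # so the next round verifies every file again.
        if not fetch_more_pars():
            return False
    return False
```

Each round pays the full verification cost again, which is exactly what the internal par-checker avoids by keeping its object alive.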
