Queue is NZB's or par-sets?

Discuss newly added features or request new features.
Yil
Posts: 49
Joined: 26 May 2014, 02:44

Queue is NZB's or par-sets?

Post by Yil » 25 Jun 2014, 13:29

There are some real benefits to having NZBGet understand which files belong to which par-set. I may be able to work around not knowing this until late in the game, but it would be much more useful to know it ahead of time. If I pre-process an NZB to sort it into par-sets, one side effect would be splitting a combined/merged NZB back into its original parts. Given that we have such info, should it be user-visible by default?

Consider the case of a set of 13 episodes, each with subs and a sample, for a total of 39 par-sets all in one NZB. Should that be 1, 13, or 39 entries in the queue? I don't know if it's possible, but maybe it could even show up as 1 entry that you could expand with a "+" sign or something to reveal the par-sets as sub-tasks of the NZB. Or it could show up as 1 entry until you hit a new "split" button, at which point it replaces the 1 NZB with 39. I suppose you could select them all and hit merge to go back to just 1 if you wanted.

I'm still pretty new to NZBGet: is there any actual benefit to merging NZBs besides taking up less space in the queue? I believe I tried adding pars that were in a separate NZB to one that had only the files and the main .par2, and it didn't use the extra pars to fix it. Did I miss a step or something? I even added the files paused and then merged before I started downloading.

Of course, knowing ahead of time which files belong to which par does offer some benefits. You could skip downloading stuff you didn't need in order to repair the things you did want, i.e. skip the "sample" and its pars, since the sample file itself isn't referenced except in a par that only recovers it. The health of the download would also be better known: knowing the block size when you get an article-not-found or a corrupted article allows NZBGet to determine how many blocks are affected, etc. History could be updated to include the individual titles of the episodes.
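As a rough sketch of that block-counting idea (the function and its arguments are hypothetical, not NZBGet internals): a lost article is a byte range within a file, and it may straddle par2 block boundaries, so the damage in recovery blocks is every block the range touches.

```python
def blocks_affected(article_offset, article_size, block_size):
    """Estimate how many par2 recovery blocks a lost article costs.

    The lost article covers bytes [article_offset, article_offset +
    article_size) within the file; we count every block of size
    `block_size` that this range overlaps. Hypothetical helper,
    not actual NZBGet code.
    """
    first_block = article_offset // block_size
    last_block = (article_offset + article_size - 1) // block_size
    return last_block - first_block + 1
```

Note that a single lost article usually costs more than `article_size / block_size` blocks, because the range rarely lines up with block boundaries.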

In the 39-in-1 case, if a particular par-set fails, how is that best handled? Right now, what gets deleted? How would you easily determine which one failed? Split the failed sets out into separate history/failure entries and leave the good ones as a single entry? Lots of things to consider, and I'm curious what people think should happen.

prinz2311
Posts: 466
Joined: 08 Dec 2012, 00:03

Re: Queue is NZB's or par-sets?

Post by prinz2311 » 25 Jun 2014, 13:45

I currently split and rename such NZBs manually (in Alt.binz) before adding them to NZBGet.

hugbug
Developer & Admin
Posts: 7645
Joined: 09 Sep 2008, 11:58
Location: Germany

Re: Queue is NZB's or par-sets?

Post by hugbug » 25 Jun 2014, 14:10

  1. If any of the par-sets fails, the nzb-job is marked as a failure and no files are extracted;
  2. It's generally impossible (or too hard) to determine whether par-sets belong together (video, sample and subtitles) or are independent (separate episodes);
  3. Season packs are sometimes obfuscated in a way that makes it impossible to determine how files should be grouped, e.g. when all episodes have the same file name. That's a really hard case which NZBGet can nevertheless handle successfully, assuming all par-sets are repairable;
  4. There is a split button in NZBGet to manually split an nzb into several jobs;
  5. Because of 1), it's better to split nzbs with multiple par-sets if the individual par-sets have separate value for the user;
  6. AFAIK SAB has a feature to automatically split enqueued nzbs if it detects multiple par-sets. A nice feature, but see 2). Still, it should work in the most common cases.

prinz2311
Posts: 466
Joined: 08 Dec 2012, 00:03

Re: Queue is NZB's or par-sets?

Post by prinz2311 » 25 Jun 2014, 14:23

Best would have been if the nzb spec had been extended to support subsets within one nzb, so that such season NZBs would already carry logical par2 sub-sets. This would also have helped if someone only needed one or more specific episodes instead of the complete season. But that is more or less off topic here...

hugbug
Developer & Admin
Posts: 7645
Joined: 09 Sep 2008, 11:58
Location: Germany

Re: Queue is NZB's or par-sets?

Post by hugbug » 25 Jun 2014, 17:41

Nzb-files are usually created automatically by analyzing article headers. The fact that multiple par-sets were packed into one nzb-file means the analyzing tool (indexer) treated them as one post and was not able to split it.

prinz2311
Posts: 466
Joined: 08 Dec 2012, 00:03

Re: Queue is NZB's or par-sets?

Post by prinz2311 » 25 Jun 2014, 17:55

Most are posted on IRC with a bot to download them. They are created by the poster via a posting script, so they could be created the way I said. Some Indexers use these nzb's from IRC...

Yil
Posts: 49
Joined: 26 May 2014, 02:44

Re: Queue is NZB's or par-sets?

Post by Yil » 26 Jun 2014, 13:30

I thought about this a while and came up with something I think should work pretty well.
  1. Split by poster and treat each separately. I don't believe any NZB indexer would automatically group a bunch of NZBs together if they were posted by different people. The only way this would happen is if a user manually selected several things to download and got a combined NZB back, or if they used the merge NZB feature in NZBGet; either way, it's almost surely separate things.
  2. Parse each file into a basename and an extension, where the latter can be empty. Note the sequence number if one is available.
  3. Sort by basename. For each basename, is there a single .par2 file and only one set of vol#+#.par2 files (or no volume pars)? If so, this is almost surely a single entity, and we move these files into a new NZB using the basename as the new name. As a safety check we could examine the sequence numbers, if available, and see that the maximum number found is roughly the number of files present and that we don't have too many duplicates. Double-check the timestamps on the posts to see if they are too far apart (treating files and pars separately). If the double-checks fail, still create the NZB, but pause the duplicates whose timestamps/groups don't match the par-set.
  4. Split by the groups posted to and treat each separately. Again, it seems very unlikely that some files from a par-set would be explicitly uploaded to one set of groups and the rest to another, and we've just handled the easy case where the pars were posted separately from the files but shared the same basename.
  5. Focus on par-sets and group them based upon their .par2 and vol#+#.par2 names. Is anybody renaming the volume par files? I don't think anyone does, because any damage to the single .par2 file would invalidate the whole thing for most people (though you could check the files individually to find one that belongs to the par-set and kick-start recovery that way).
  6. If there are no par-sets we are done and this can be handled however a no-par download normally would be.
  7. Temporarily give up. Tag a note onto the entry saying that auto-split is delayed until downloading starts (if we're totally paused or can't connect), or that we are processing, please wait.
  8. Go grab one .par2 file from each set as a high-priority download (trying the .par2 first, then the smallest volume pars, until a valid entry is found). Only test the first part of files, to avoid large downloads which do us no good at this point.
  9. Parse the .par2 file and see if the filenames in its list match the names of files left in the NZB. If we can match all the files, it's a set and we create a new NZB, named either with the basename shared by the majority of the files in the par (if it's a set of split files) or with the original name of the NZB we are splitting plus a -<num> suffix (a random collection of files, like, say, an ebook collection).
  10. If files were renamed after par creation, we have no choice but to download files one by one and figure out which set they belong to. When a complete set of files is found, we spin that off as a new NZB and move it into a new directory so it can start post-processing.
  11. Leave whatever is left over in the original NZB and pray it's complete.
The first 3 rules probably handle almost all of the cases, and the tougher processing should hopefully be unnecessary.
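Rules 1-3 can be sketched in a few lines. This is only an illustration under simplifying assumptions (filenames follow the usual `name.vol01+02.par2` convention; `split_nzb`, its input format, and the `__unsplit__` bucket are all made up for the example):

```python
import re
from collections import defaultdict

# Matches "base.par2" and "base.vol01+02.par2", capturing the shared base.
PAR2_RE = re.compile(r'^(?P<base>.+?)(?:\.vol\d+\+\d+)?\.par2$', re.IGNORECASE)

def basename_of(filename):
    """Strip par2 suffixes so data files and their pars share one key."""
    m = PAR2_RE.match(filename)
    return m.group('base') if m else filename.rsplit('.', 1)[0]

def split_nzb(entries):
    """entries: list of (poster, filename) tuples from one NZB.

    Rule 1: split by poster. Rules 2-3: within each poster, group by
    basename; a group that owns exactly one main .par2 is treated as a
    single par-set and becomes its own NZB. Anything ambiguous is left
    in an '__unsplit__' bucket for the later, tougher rules.
    """
    by_poster = defaultdict(list)
    for poster, filename in entries:
        by_poster[poster].append(filename)

    result = {}  # new nzb name -> list of member files
    for poster, files in by_poster.items():
        groups = defaultdict(list)
        for f in files:
            groups[basename_of(f)].append(f)
        for base, members in groups.items():
            main_pars = [f for f in members
                         if f.lower().endswith('.par2')
                         and '.vol' not in f.lower()]
            if len(main_pars) == 1:      # almost surely one par-set
                result[base] = members
            else:                        # duplicates or obfuscation
                result.setdefault('__unsplit__', []).extend(members)
    return result
```

The sequence-number and timestamp safety checks from rule 3 are omitted here; they would gate the `len(main_pars) == 1` branch.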

Did I miss something?

Edit update: Not sure how to handle the case where the pars were posted by someone else. Perhaps run rule #3 immediately with a strict rule about failing on timestamp/duplicate detection, and then run the rest of the rules, including #3 again, but in a permissive mode allowing mismatches.
Last edited by Yil on 26 Jun 2014, 13:44, edited 1 time in total.

Yil
Posts: 49
Joined: 26 May 2014, 02:44

Re: Queue is NZB's or par-sets?

Post by Yil » 26 Jun 2014, 13:34

Actually hugbug, nzbclub tries to be helpful: it notices when a bunch of posts have similar enough names and puts them together. That works rather well for a season of TV, but it means you get a big NZB with tons of par-sets in it.

prinz2311
Posts: 466
Joined: 08 Dec 2012, 00:03

Re: Queue is NZB's or par-sets?

Post by prinz2311 » 27 Jun 2014, 11:04

There is one problem:

With many obfuscated season packs, multiple episodes (par-sets) have files with the exact same filename, so you can't know which par-set they belong to. (A human can in most cases, from the message titles rather than the filenames.)

Yil
Posts: 49
Joined: 26 May 2014, 02:44

Re: Queue is NZB's or par-sets?

Post by Yil » 27 Jun 2014, 13:54

Each episode uses the same basename for the files? That is obviously a really bad situation to be in. How does NZBGet handle that right now? I think I remember .duplicate being appended to names, so I presume it can download all the parts with lots of dups, and then post-processing could rename them, assuming the pars actually have unique names. If the pars also use the same name for everything, I think things get really messed up right now though... In cases where you know you are dealing with this style of obfuscated name, it's probably best not to create merged NZBs in the first place :)

The good news is that the above rules should handle even this case. Rule #3 will recognize that there are too many files with the same name and reject splitting. This can happen if you managed to create an NZB with duplicates (think original + repost) or are just unlucky and have non-unique names for everything. So you end up delaying the split until you get the pars and start downloading. You just grab the files in chronological order (since they are in theory from the same poster to the same groups, it's not unreasonable to hope they are in sequence) and use the crc/md5 hashes on blocks in the file to place them with the appropriate par. Once you have a complete set of files, you split that off and handle it separately. Since the name is known not to be unique, you could choose to peek inside the rar and use what's there as the name, or append it to the basename of the files, to make it easier to recognize what is what. Slightly complicating things is the case where the par-sets use different block sizes. It's not a big deal, but you'll need to compute crc's carefully to prevent extra work, especially in the case where the first block in the file is corrupt and later blocks get even more staggered.
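One hash the par2 format already provides for exactly this matching job is the MD5 of each file's first 16 KiB, stored in every File Description packet. A minimal sketch of using it to assign a downloaded (possibly renamed) file to a par-set; `assign_to_parset` and the `parsets` input shape are invented for the example, and extracting the hashes from the .par2 itself is not shown:

```python
import hashlib

def md5_16k(path):
    """MD5 of the first 16 KiB of a file - the same quantity a par2
    File Description packet stores, so a renamed file can still be
    matched against the par-set that describes it."""
    with open(path, 'rb') as f:
        return hashlib.md5(f.read(16 * 1024)).hexdigest()

def assign_to_parset(path, parsets):
    """parsets: dict mapping par-set name -> set of 16k-hash hex strings
    pulled from that set's .par2. Returns the matching set's name,
    or None if the file belongs to none of them."""
    h = md5_16k(path)
    for name, hashes in parsets.items():
        if h in hashes:
            return name
    return None
```

This only needs the first article of each file on disk, which fits the "test the first part of files" idea in rule #8.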

One tricky situation I came up with is when someone has an NZB for something with pars, but it also includes a second set of pars for the same thing without any files. We can handle this case once we get the pars into par-set groups and have one .par2 from each downloaded: just examine them for files that exactly match based upon the expected md5 of the files in the par2. This idea of comparing md5 file hashes from the par2 could also handle other tricky situations, such as reposts merged with the original post, etc. All we need to do is insert some post-processing to figure out what is going on after we have gotten our hands on a par2 from each set. We'd still have to wait until the same-named files are actually downloaded, though, to verify what we expect.
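The redundancy check itself is tiny once the full-file MD5s have been pulled out of each set's File Description packets (that extraction is assumed here, and `redundant_parset` is a made-up name):

```python
def redundant_parset(md5s_a, md5s_b):
    """md5s_a / md5s_b: sets of full-file MD5 hex strings taken from the
    File Description packets of two par2 sets found in the same NZB.
    If one set's payload is contained in the other's, the smaller
    par-set describes the same files and adds no new content - it's a
    second set of pars for the same thing, or a merged repost."""
    return md5s_a <= md5s_b or md5s_b <= md5s_a
```

Comparing by MD5 rather than by filename means this still works when the repost renamed the files.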

Another problem to handle, not exactly related to this, is the stupid par2 set that includes the pars themselves as part of the set. I presume people who manage to do this somehow create a par2 set and then create another par2 on top of that. Bleck. This should be easy to spot during par2 processing by looking for .par2 files listed in the set itself. In that case, reject the .par2 and use the header from one of the vol#+#.par2 files instead, which won't have that issue.
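Spotting that case means walking the packets of the .par2 and checking the File Description names. A sketch against the documented par2 packet layout (8-byte magic, 8-byte little-endian length, 16-byte packet MD5, 16-byte recovery-set ID, 16-byte type, then the body); both function names are invented, and the parser deliberately ignores padding subtleties:

```python
import struct

MAGIC = b'PAR2\x00PKT'
FILEDESC = b'PAR 2.0\x00FileDesc'   # packet type for File Description

def filenames_in_par2(data):
    """Yield the file names from the File Description packets of a raw
    .par2 byte string. The body of such a packet starts with 16 bytes
    of file ID, 16 + 16 bytes of MD5 hashes, and 8 bytes of length
    before the null-padded name."""
    pos = 0
    while (idx := data.find(MAGIC, pos)) != -1:
        length = struct.unpack_from('<Q', data, idx + 8)[0]
        ptype = data[idx + 48:idx + 64]
        if ptype == FILEDESC:
            body = data[idx + 64:idx + length]
            name = body[56:]                 # skip IDs, hashes, length
            yield name.rstrip(b'\x00').decode('ascii', 'replace')
        pos = idx + max(length, 8)           # guard against bogus lengths

def is_self_referencing(data):
    """True if the par2 set protects .par2 files - the broken layout
    described above, where a second par2 was built on top of the first."""
    return any(n.lower().endswith('.par2') for n in filenames_in_par2(data))
```

Since the vol#+#.par2 files carry the same File Description packets, the same walk over one of them gives the clean file list to fall back on.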
