
Duplicate file names

Posted: 23 Oct 2018, 10:23
by quantum
I have two RSS feeds that sometimes return file names that are identical except for capitalisation. How would one de-duplicate these?
Thank you for a great product, and thanks in advance!

Re: Duplicate file names

Posted: 24 Oct 2018, 11:04
by hugbug
Letter case should be ignored (since v20). Are you sure there are no other differences? For proper duplicate handling, duplicate keys should be used instead of filenames. Please see the RSS documentation for details - https://nzbget.net/rss.

Re: Duplicate file names

Posted: 26 Oct 2018, 12:20
by quantum
Thanks, I have read the RSS docs. The files have no key other than the file names. The names look identical except for capitalisation. How can I add the entire file name in upper case as a key?

Re: Duplicate file names

Posted: 26 Oct 2018, 16:12
by hugbug
You don't have to. The titles (filenames) are compared case-insensitively. Why do you think there is a problem? Are both nzbs downloaded? Only one should be downloaded and the other one should remain in history as a duplicate.

Re: Duplicate file names

Posted: 29 Oct 2018, 09:43
by quantum
My two RSS feeds from two NZB indexers are returning two files that mostly differ only by capitalisation. Sometimes they also differ by having . instead of spaces, or .1 at the end of the file name. Both end up in the download queue and neither is marked as a duplicate, so I have to delete the duplicate files manually. I'm not sure whether there are some weird unprintable characters in the names causing the issue.

Duplicate handling seems to work for files added via a single RSS feed.

Would a hook that lets me write my own duplicate-checking script work?
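
To check the unprintable-characters theory, the two titles can be compared character by character outside NZBGet. This is only a rough sketch: the function names are made up for illustration, and the titles have to be copied in by hand from the saved feeds.

```python
import unicodedata


def describe_hidden_chars(title):
    """List (index, code point, Unicode name) for every character
    that is not printable ASCII."""
    found = []
    for i, ch in enumerate(title):
        if not (32 <= ord(ch) < 127):
            name = unicodedata.name(ch, "UNNAMED")
            found.append((i, f"U+{ord(ch):04X}", name))
    return found


def first_difference(a, b):
    """Return the index where two titles first differ when case is
    ignored, or None if they are equal apart from case."""
    a, b = a.casefold(), b.casefold()
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One title may simply be longer, e.g. a trailing ".1"
    return None if len(a) == len(b) else min(len(a), len(b))
```

If first_difference returns None for a pair that still ends up in the queue twice, the titles really do differ only by case and the cause is somewhere else.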

Re: Duplicate file names

Posted: 29 Oct 2018, 10:29
by hugbug
If you are talking about movies and TV shows, then using a proper indexer is the only thing needed. Good indexers identify movies and TV shows and properly associate every RSS item with IMDb or the TV sites. When this information is provided in the RSS feed, NZBGet uses it automatically and it doesn't matter how good or bad the titles are.

Inspect the content of the RSS feeds (by saving them from a web browser into a file) for the fields imdbid, tvdbid, tvmazeid. If the fields are not present at all, you may need to add extra parameters to the RSS feed URL. Ask your indexer's support. If the fields are present but empty, ask their support again to fix this. If the indexers don't provide the fields and there are no extra parameters for that, find better indexers. Most indexers use newznab software (or derivatives) and this is a standard feature.
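
Checking a saved feed by eye gets tedious, so here is a rough sketch that lists which id attributes each item carries. It assumes the feed uses newznab-style attr elements with name and value attributes. The exact attribute names differ between indexers, so the default list in ids_present is a guess to adjust, and both function names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def item_attrs(feed_xml):
    """Return [(title, {attr_name: value})] for every <item> in the feed.

    feed_xml may be str or bytes; bytes are safer when the file starts
    with an XML declaration that names an encoding."""
    root = ET.fromstring(feed_xml)
    result = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        attrs = {}
        for el in item:
            # Assumption: ids arrive as <newznab:attr name=".." value=".."/>
            if local_name(el.tag) == "attr" and el.get("name"):
                attrs[el.get("name")] = el.get("value", "")
        result.append((title, attrs))
    return result


def ids_present(attrs, wanted=("imdb", "imdbid", "tvdbid", "tvmazeid")):
    """Return the wanted names that are present with a non-empty value."""
    return [name for name in wanted if attrs.get(name)]
```

For a feed saved to disk, something like item_attrs(open("feed.xml", "rb").read()) should do. Items where ids_present returns an empty list are the ones to raise with the indexer.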

In case you download other types of content where indexers do not provide appropriate id matching, you can indeed transform the RSS feed titles for a better duplicate check. That can be achieved in a feed script. As an example, see [FeedScript] ImdbWatchlist - RSS with IMDb integration.
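
The title transformation itself could look roughly like the sketch below. It covers only the differences mentioned in this thread (case, dots instead of spaces, a trailing .1, invisible characters) and is not taken from ImdbWatchlist. How NZBGet hands the feed to a feed script and expects it back is not shown here and has to be taken from the feed script documentation; normalize_title and rewrite_titles are made-up names.

```python
import re
import unicodedata
import xml.etree.ElementTree as ET


def normalize_title(title):
    """Reduce a title to a form in which near-duplicates compare equal."""
    # NFKC folds compatibility forms, e.g. a no-break space becomes a space
    t = unicodedata.normalize("NFKC", title)
    # Turn any whitespace run into one space before removing control chars
    t = re.sub(r"\s+", " ", t)
    # Drop control and format characters such as zero-width spaces
    t = "".join(ch for ch in t if unicodedata.category(ch) not in ("Cc", "Cf"))
    # Assumption: a trailing ".<digit>" is a duplicate counter, not part of
    # the name. This would also eat a legitimate ending such as "5.1".
    t = re.sub(r"\.\d$", "", t.strip())
    # Treat dots and underscores as word separators
    t = re.sub(r"[._\s]+", " ", t)
    return t.strip().casefold()


def rewrite_titles(feed_xml):
    """Return the feed as a string with every item title normalized."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        title = item.find("title")
        if title is not None and title.text:
            title.text = normalize_title(title.text)
    # ElementTree may rename namespace prefixes (ns0, ns1, ...) on output;
    # the XML stays equivalent, but call ET.register_namespace first if the
    # original prefixes matter.
    return ET.tostring(root, encoding="unicode")
```

With this, "Some.Show.S01E01.720p.1" and "some show S01E01 720p" both come out as "some show s01e01 720p". Keep in mind that the rewritten title may also become the name shown in the queue, so this trades readable names for better matching.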

Re: Duplicate file names

Posted: 31 Oct 2018, 12:56
by quantum
Thanks, I'll give that a go.