Duplicate file names

quantum
Posts: 4
Joined: 23 Oct 2018, 10:16

Duplicate file names

Post by quantum » 23 Oct 2018, 10:23

I have two RSS feeds that sometimes return the same file name, differing only in capitalisation. How can I de-duplicate these?
Thank you for a great product, and thanks in advance!

hugbug
Developer & Admin
Posts: 7645
Joined: 09 Sep 2008, 11:58
Location: Germany

Re: Duplicate file names

Post by hugbug » 24 Oct 2018, 11:04

Letter case should be ignored (since v20). Are you sure there are no other differences? For proper duplicate handling, duplicate keys should be used instead of filenames. Please see the RSS documentation for details - https://nzbget.net/rss.


Re: Duplicate file names

Post by quantum » 26 Oct 2018, 12:20

Thanks, I have read the RSS documentation. The files have no key other than the file name, and the names look identical except for capitalisation. How can I add the entire file name, in upper case, as a key?


Re: Duplicate file names

Post by hugbug » 26 Oct 2018, 16:12

You don't have to. The titles (filenames) are compared case-insensitively. Why do you think there is a problem? Are both NZBs downloaded? Only one should be downloaded; the other should remain in history as a duplicate.
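
A rough model of the behaviour described above may help. This is a hypothetical sketch, not NZBGet's actual code: the first item with a given identity is kept and later ones are treated as duplicates, where the identity is a duplicate key if the item has one and otherwise the title compared case-insensitively. `FeedItem`, `identity` and `split_duplicates` are made-up names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedItem:
    title: str
    dupe_key: Optional[str] = None  # e.g. an ID-based key, if the indexer supplies one


def identity(item: FeedItem) -> str:
    """Comparison key in this simplified model.

    Prefer an explicit duplicate key; otherwise fall back to the title,
    compared case-insensitively via str.casefold().
    """
    if item.dupe_key:
        return "key:" + item.dupe_key
    return "title:" + item.title.casefold()


def split_duplicates(items: list[FeedItem]) -> tuple[list[FeedItem], list[FeedItem]]:
    """Return (to_download, duplicates); the first item per identity is kept."""
    seen: set[str] = set()
    to_download: list[FeedItem] = []
    duplicates: list[FeedItem] = []
    for item in items:
        key = identity(item)
        if key in seen:
            duplicates.append(item)
        else:
            seen.add(key)
            to_download.append(item)
    return to_download, duplicates


if __name__ == "__main__":
    feed = [
        FeedItem("Some.Show.S01E02.720p"),
        FeedItem("some.show.s01e02.720p"),  # differs only in letter case
        FeedItem("Some Show S01E02 720p"),  # spaces instead of dots
    ]
    keep, dupes = split_duplicates(feed)
    print([i.title for i in keep])
    print([i.title for i in dupes])
```

In this model the lower-case copy is flagged as a duplicate, but the space-separated title passes as a new item: a case-insensitive title comparison cannot catch differences other than letter case.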


Re: Duplicate file names

Post by quantum » 29 Oct 2018, 09:43

My two RSS feeds come from two NZB indexers, and they return two files that mostly differ only in capitalisation. Sometimes they also differ by having dots (.) instead of spaces, or a .1 at the end of the file name. Both end up in the download queue and the second one is not marked as a duplicate, so I have to delete the duplicates manually. I'm not sure whether there are some odd unprintable characters in the names causing the issue.

Duplicate handling does seem to work for files added via a single RSS feed.

Would a hook that lets me write my own duplicate-checking script work?
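
The variations described in this post (letter case, dots instead of spaces, a trailing .1, possibly invisible characters) can be folded into a single comparison key. The sketch below is a hypothetical helper, not an NZBGet feature; `normalise_title` and its rules are assumptions. In particular, treating a final one-digit ".N" as a repost counter will also strip a genuine final digit, such as a sequel number.

```python
import re
import unicodedata

# Assumption: a trailing ".1", ".2", ... (one digit) is a repost counter added
# by an indexer, not part of the real name. This also strips a genuine final digit.
_REPOST_SUFFIX = re.compile(r"\.\d$")
# Dots, underscores and any whitespace are all treated as one word separator.
_SEPARATORS = re.compile(r"[._\s]+")


def _visible(ch: str) -> bool:
    # Keep whitespace (turned into plain spaces later); drop other invisible
    # characters such as control codes and zero-width spaces (categories "C*").
    return ch.isspace() or not unicodedata.category(ch).startswith("C")


def normalise_title(title: str) -> str:
    """Hypothetical duplicate key for a release title."""
    title = unicodedata.normalize("NFKC", title)  # fold compatibility forms
    title = "".join(ch for ch in title if _visible(ch)).strip()
    title = _REPOST_SUFFIX.sub("", title)
    title = _SEPARATORS.sub(" ", title).strip()
    return title.casefold()


if __name__ == "__main__":
    titles = [
        "Some.Show.S01E02.720p",
        "some show s01e02 720p",
        "Some.Show.S01E02.720p.1",
        "Some.Show.S01E02.720p\u200b",  # trailing zero-width space
    ]
    # All four variants collapse to the same key.
    print({normalise_title(t) for t in titles})
```

Folding dots, underscores and whitespace together is deliberate here, because those are the separators that differed between the two feeds; stricter or looser rules are a trade-off between missed duplicates and false matches.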


Re: Duplicate file names

Post by hugbug » 29 Oct 2018, 10:29

If you are talking about movies and TV shows, then using a proper indexer is all that's needed. Good indexers identify movies and TV shows and associate every RSS item with IMDb or TV sites. When this information is provided in the RSS feed, NZBGet uses it automatically, and it doesn't matter how good or bad the titles are.

Inspect the content of the RSS feeds (by saving them from the web browser into a file) for the fields imdbid, tvdbid and tvmazeid. If the fields are not present at all, you may need to add extra parameters to the RSS feed URL; ask your indexer's support. If the fields are present but empty, ask their support to fix this. If the indexers don't provide these fields and there are no extra parameters for that, find better indexers. Most indexers use newznab software (or derivatives), and this is a standard feature there.
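
For a feed saved to a file, a short script can list which items carry these IDs. This is a sketch under assumptions: it expects newznab-style `<newznab:attr name="..." value="..."/>` elements inside each `<item>`, and since attribute names may vary between indexers it checks both "imdb" and "imdbid". `id_fields` and `WANTED` are hypothetical names; run it with the saved feed file as its only argument.

```python
import sys
import xml.etree.ElementTree as ET

# Assumption: IDs appear as newznab-style <attr name="..." value="..."/> elements.
# Names vary between indexers, so both spellings of the IMDb field are checked.
WANTED = ("imdb", "imdbid", "tvdbid", "tvmazeid")


def local_name(tag: str) -> str:
    # ElementTree writes namespaced tags as "{uri}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


def id_fields(path: str, wanted=WANTED) -> list[tuple[str, dict[str, str]]]:
    """For each <item>, return its title and any wanted ID attributes found."""
    root = ET.parse(path).getroot()
    report = []
    for item in root.iter("item"):
        ids: dict[str, str] = {}
        for el in item.iter():
            if local_name(el.tag) == "attr" and el.get("name") in wanted:
                # An empty string here means the field exists but has no value.
                ids[el.get("name")] = el.get("value", "")
        report.append((item.findtext("title", default=""), ids))
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    for title, ids in id_fields(sys.argv[1]):
        print(title, ids or "no ID fields")
```

Items printed with "no ID fields" correspond to the "not present at all" case above; an ID printed with an empty value corresponds to "present but empty".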

If you download other types of content, where indexers do not provide appropriate ID matching, you can indeed transform the RSS feed titles for a better duplicate check. That can be done in a feed script. For an example, see [FeedScript] ImdbWatchlist - RSS with IMDb integration.
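
The title-transformation step itself might look like the sketch below. It assumes the feed content is available as an RSS file that the script may rewrite in place; how NZBGet passes that file to a feed script, and what else the script has to do, is not shown here, so check the ImdbWatchlist script or NZBGet's extension-script documentation. `rewrite_titles` and `simple_key` are hypothetical names, and `simple_key` is only a minimal normaliser to swap for stricter rules.

```python
import re
import sys
import xml.etree.ElementTree as ET

_SEPARATORS = re.compile(r"[._\s]+")


def simple_key(title: str) -> str:
    # Hypothetical normaliser: fold case and treat dots, underscores and
    # whitespace as one separator.
    return _SEPARATORS.sub(" ", title).strip().casefold()


def rewrite_titles(path: str, normalise=simple_key) -> int:
    """Rewrite every <item><title> in the RSS file at `path`; return how many changed."""
    # Register the file's namespace prefixes (e.g. "newznab") so ElementTree
    # writes them back unchanged instead of generated "ns0"-style prefixes.
    for _event, (prefix, uri) in ET.iterparse(path, events=("start-ns",)):
        ET.register_namespace(prefix, uri)
    tree = ET.parse(path)
    changed = 0
    # Only item titles are touched; the channel's own <title> is left alone.
    for item in tree.getroot().iter("item"):
        title = item.find("title")
        if title is not None and title.text:
            new = normalise(title.text)
            if new != title.text:
                title.text = new
                changed += 1
    tree.write(path, encoding="utf-8", xml_declaration=True)
    return changed


if __name__ == "__main__" and len(sys.argv) > 1:
    print(rewrite_titles(sys.argv[1]), "titles rewritten")
```

Note that rewriting titles also changes the names under which these items appear, so a normaliser that folds too aggressively makes them harder to read as well as more likely to merge unrelated releases.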


Re: Duplicate file names

Post by quantum » 31 Oct 2018, 12:56

Thanks, I'll give that a go.
