RSS Dupe Help?

douche
Posts: 57
Joined: 08 Feb 2014, 23:50

Re: RSS Dupe Help?

Post by douche » 13 Jul 2018, 23:55

hugbug wrote:
13 Jul 2018, 21:48
douche wrote:
13 Jul 2018, 20:20
Regardless, wouldn't it make sense for NZBGet to also dupecheck regardless of the spacer/delimiter used?
It could help a little, but it wouldn't solve the problem, because nzb names contain a lot of extra words (video quality, etc.). That's why NZBGet has the concept of dupekeys.

Indexers (the proper ones) use additional information to identify titles. They don't rely solely on the nzb name; they download NFOs and parse them, etc. They then provide movie identification via the imdbid field.

If you can code, you can try to identify titles from nzb names and build proper dupekeys by writing an RSS feed extension script. We can discuss this further if you're going to write such a script.
In my case, the nzb names of the releases I'm getting dupes of are typically exactly the same except for the spacer/delimiter, but I suppose I'm in the minority on that one.


Unfortunately, I'm not versed in coding beyond batch scripting and some PowerShell.
All releases (movies in this case) follow the pattern *name* (spacer/delimiter) *year* (spacer/delimiter) *everything else*, and that last part isn't necessarily needed to match to IMDb.
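The *name* / *year* / *everything else* pattern described above could be pulled apart with a regular expression. This is a minimal sketch in Python 3, assuming the year is a four-digit 19xx/20xx token set off by spaces, dots, underscores or hyphens (optionally in parentheses); `RELEASE_RE` and `split_release_name` are hypothetical names. A title that itself contains a year-like number would be split too early.

```python
import re

# Assumed layout: <name><delimiters><year><delimiter or end><everything else>
# The lazy name group stops at the first delimited 19xx/20xx token.
RELEASE_RE = re.compile(
    r"^(?P<name>.+?)[ ._\-]+\(?(?P<year>(?:19|20)\d{2})\)?(?:[ ._\-]|$)"
)

def split_release_name(release):
    """Return (name, year) from a release name, or None if no year is found."""
    m = RELEASE_RE.match(release)
    if not m:
        return None
    # Turn any mix of spacers/delimiters into single spaces for a clean search title
    name = re.sub(r"[ ._\-]+", " ", m.group("name")).strip()
    return name, int(m.group("year"))

if __name__ == "__main__":
    print(split_release_name("Some_Movie_2014_1080p_BluRay"))
```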
Using the OMDb API (free; you just need to provide a valid API key), a script could match the name (with or without spacers/delimiters, which doesn't seem to matter for this API) and the year in an API request such as this: http://www.omdbapi.com/?apikey=<mykey>& ... zer&y=2014
This returns either XML or JSON, which includes the field:

Code:

JSON:
"imdbID": "tt0455944"

XML:
imdbID="tt0455944"
So it is possible to get the imdbID via this API, using information from the nzb name.
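A sketch of that OMDb lookup in Python 3, assuming OMDb's `t` (title) parameter alongside the `apikey` and `y` parameters visible in the URL above, and the default JSON response. The helper names are hypothetical. Parsing is kept separate from the HTTP call so it can be tried on a saved response without an API key.

```python
import json
import urllib.request
from urllib.parse import urlencode

OMDB_URL = "http://www.omdbapi.com/"

def build_omdb_url(api_key, title, year):
    """Title search (t) narrowed by release year (y); api_key is your own OMDb key."""
    return OMDB_URL + "?" + urlencode({"apikey": api_key, "t": title, "y": year})

def extract_imdb_id(response_text):
    """Return the imdbID from an OMDb JSON response, or None if nothing matched."""
    data = json.loads(response_text)
    # OMDb reports failures as {"Response": "False", "Error": "..."}
    if data.get("Response") != "True":
        return None
    return data.get("imdbID")

def lookup_imdb_id(api_key, title, year):
    """Network call: fetch the JSON response and extract the imdbID."""
    with urllib.request.urlopen(build_omdb_url(api_key, title, year), timeout=10) as resp:
        return extract_imdb_id(resp.read().decode("utf-8"))
```

Usage would be along the lines of `lookup_imdb_id("<mykey>", "Some Movie", 2014)`, fed with the name and year taken from the release name.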

However, I have absolutely no clue how to put all this together in a script. I'm willing to try, but I'm unfamiliar with most scripting languages.
I guess I could post a request for this if you think that's the best course of action.

Thanks

hugbug
Developer & Admin
Posts: 7645
Joined: 09 Sep 2008, 11:58
Location: Germany

Re: RSS Dupe Help?

Post by hugbug » 14 Jul 2018, 00:21

A dupekey can be any string. You don't need to find the imdbid for an RSS item. All you need is to generate a unique string that is always the same for items of the same movie title. The movie title itself is suitable for that purpose as long as it's stripped of extra words (quality, release group, etc.).
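Following that idea, a dupekey could be derived from the release name alone. A minimal Python sketch, assuming (hypothetically) that the title is everything before the first delimited 19xx/20xx year: it lowercases the title, drops the delimiters and appends the year, so differently delimited names of the same release map to the same key. Names without a recognizable year fall back to the whole normalized name, quality words included. `YEAR_RE` and `make_dupekey` are illustrative names, not part of NZBGet.

```python
import re

# Hypothetical rule: the title ends at the first delimited 19xx/20xx token
YEAR_RE = re.compile(r"[ ._\-]\(?((?:19|20)\d{2})\)?(?:[ ._\-]|$)")

def make_dupekey(release):
    """Build a delimiter-independent key such as 'some-movie-2014'."""
    m = YEAR_RE.search(release)
    title = release[:m.start()] if m else release
    year = m.group(1) if m else None
    # Split on any run of spacers/delimiters and drop empty pieces
    words = [w for w in re.split(r"[ ._\-]+", title.lower()) if w]
    if year:
        words.append(year)
    return "-".join(words)
```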

You can start by writing a feed script that replaces underscores and spaces in RSS item names with dots. At this stage you won't generate dupekeys yet, but this simple modification of the RSS feed content should help with your primary concern.
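The title normalization itself is tiny. A minimal Python sketch; `dot_title` is a hypothetical name, and collapsing runs of dots (for example from double spaces) is an extra step not mentioned in the thread.

```python
import re

def dot_title(title):
    """Map spaces and underscores to dots so differently delimited names compare equal."""
    dotted = title.replace(" ", ".").replace("_", ".")
    # Extra step (assumption): collapse runs such as '..' left by double spaces
    return re.sub(r"\.{2,}", ".", dotted)
```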

Feed scripts work like this: NZBGet receives the feed from the feed provider (indexer) and saves it to disk as an xml-file. NZBGet then executes your feed script (written in any language) and passes the path to the xml-file, plus a few other pieces of information, via environment variables. NZBGet waits for your script to terminate, then reads the xml-file back from disk and processes it through the RSS feed filter.
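The lifecycle described above (path passed in `NZBFP_FILENAME`, file rewritten in place, exit code reported back) could look like this minimal Python 3 skeleton. The exit codes 93/94 and the variable name are the ones used in the script later in this thread; `transform` and `run` are hypothetical placeholders. The script only acts when `NZBFP_FILENAME` is set, so the file can also be imported for testing.

```python
import os
import sys

# Exit codes as used in the thread's feed script
FEEDSCRIPT_SUCCESS = 93
FEEDSCRIPT_ERROR = 94

def transform(xml_text):
    """Placeholder: a real script would modify the titles here."""
    return xml_text

def run(feed_path):
    """Read the saved feed, modify it, write it back in place; return the exit code."""
    try:
        with open(feed_path, encoding="utf-8") as f:
            xml_text = f.read()
        # NZBGet reads this same file again after the script exits
        with open(feed_path, "w", encoding="utf-8") as f:
            f.write(transform(xml_text))
        return FEEDSCRIPT_SUCCESS
    except (OSError, UnicodeError) as e:
        print("[ERROR] %s" % e)
        return FEEDSCRIPT_ERROR

if __name__ == "__main__" and "NZBFP_FILENAME" in os.environ:
    sys.exit(run(os.environ["NZBFP_FILENAME"]))
```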

Your feed script can read the xml-file, make any modifications and write it back. The simplest modification would be replacing characters in titles. A more advanced modification would be adding new fields, which you can then use in the RSS feed filter (to generate dupekeys, for example), but save that for later.
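For the "replace characters in titles" case, a text-level regular expression is one option. This Python 3 sketch (hypothetical function names) rewrites only the text between `<title>` and `</title>`, so attributes, namespaces and indentation elsewhere in the file stay as they were. Re-serializing with `xml.etree.ElementTree` instead would rename namespace prefixes unless they are registered first. Note that the channel's own `<title>` is rewritten too.

```python
import re

# Title contents, even if they span line breaks (DOTALL)
TITLE_RE = re.compile(r"(<title>)(.*?)(</title>)", re.DOTALL)

def dot_titles(xml_text):
    """Replace spaces and underscores with dots inside every <title>...</title>."""
    def repl(m):
        return m.group(1) + m.group(2).replace(" ", ".").replace("_", ".") + m.group(3)
    return TITLE_RE.sub(repl, xml_text)

def rewrite_feed_file(path):
    """Read the feed file, rewrite the titles and write it back in place."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(dot_titles(text))
```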

Here is an example feed script you can use as a starting point: ImdbWatchlist - RSS with IMDb integration. It's written in Python, which is beginner-friendly.

Alternatively, you could probably achieve the same goal with regex processing of the xml-file in a PowerShell script.

douche
Posts: 57
Joined: 08 Feb 2014, 23:50

Re: RSS Dupe Help?

Post by douche » 14 Jul 2018, 03:12

OK, so I took a quick stab at reworking the script you linked, with some Google-fu. I think I got rid of the ImdbWatchlist-specific stuff and added the lines that would need to be there...

Code:

### NZBGET FEED TITLE RENAME SCRIPT                                        ###
##############################################################################
#Verbose=no

import sys
from os.path import dirname
sys.path.append(dirname(__file__) + '/lib')
import os
import re
import urllib2
import traceback

# Exit codes used by NZBGet
FEEDSCRIPT_SUCCESS=93
FEEDSCRIPT_ERROR=94


# Init script config options
verbose=os.environ['NZBPO_VERBOSE'] == 'yes'

# Init script context
rssfeed_file=os.environ['NZBFP_FILENAME']
out_file=os.environ.get('NZBPO_FEEDOUTFILE')

errors = False

def load_rssfeed(rssfeed_file):
    """ load rss-feed from file """
    data = open(rssfeed_file).read()
    return data

def save_rssfeed(rssfeed_file, data):
    """ save rss-feed back into file """
    open(rssfeed_file, 'w').write(data)

def filter_feed(feed):
    """ build new rss feed containing titles with only '.' spacers/delimiters """
    if verbose:
        print('Renaming')
    new_feed = ''
    for line in feed.splitlines():
        if line.find('</title>'):
			????line???? = line.replace(' ', '.')
			????line???? = line.replace('_', '.')
		else:
			""" do nothing """
    return new_feed


except Exception as e:
    errors = True
    # deleting the feed-xml-file to avoid enqueueing of non-filtered feed
    os.remove(rssfeed_file)
    print('[ERROR] %s' % e)
    traceback.print_exc()

if errors:
    sys.exit(FEEDSCRIPT_ERROR)
else:
    sys.exit(FEEDSCRIPT_SUCCESS)
However, as I'm a complete n00b at Python, I'm unsure about these three lines:

Code:

        if line.find('</title>'):
			????line???? = line.replace(' ', '.')
			????line???? = line.replace('_', '.')
If I interpreted your original code correctly:
The first line should find the line with '</title>' in it.
The second line should replace any '(space)' with '(dot/period)', but I'm unsure what should go in place of '????line????'. Could it just be 'line'?
The third line should replace any '(underscore)' with '(dot/period)', with the same question about '????line????': could it just be 'line'?

Am I on the right track?

Thanks again
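On the `????line????` question: it is just `line`. Python strings are immutable, so `str.replace` returns a new string that has to be assigned back. There is a second pitfall in the draft's condition: `str.find` returns the index of the match, or -1 when there is none, and -1 is truthy, so `if line.find('</title>'):` is true for lines without a title and false when `</title>` sits at position 0. A small Python demonstration with hypothetical helper names:

```python
def has_title_find(line):
    # The draft's check: truthy for -1 ("not found") and for any index > 0
    return bool(line.find("</title>"))

def has_title_in(line):
    # Membership test: True only when the substring actually occurs
    return "</title>" in line

def dotted(line):
    # str.replace returns a new string, so the result is assigned back to 'line'
    line = line.replace(" ", ".")
    line = line.replace("_", ".")
    return line
```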

hugbug
Developer & Admin
Posts: 7645
Joined: 09 Sep 2008, 11:58
Location: Germany

Re: RSS Dupe Help?

Post by hugbug » 14 Jul 2018, 15:48

During development, it's better to test the script outside of NZBGet. To do that, use your browser to download the feed and save its content to a file. Then modify your script to read from this file and save into another file. This lets you run the script after every modification and check the result by inspecting the file the script saved.
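A minimal Python 3 sketch of that workflow, assuming the feed was saved by the browser to a local file: it reads one file, applies the title rewrite and writes a second file you can inspect. The rewrite here is a stand-in (a regex limited to `<title>` text); `process` and the file names in the usage line are hypothetical.

```python
import re
import sys

def filter_feed(feed_text):
    """Stand-in for the function under test: dots instead of spaces/underscores in titles."""
    return re.sub(
        r"(?s)<title>(.*?)</title>",
        lambda m: "<title>" + m.group(1).replace(" ", ".").replace("_", ".") + "</title>",
        feed_text,
    )

def process(in_path, out_path):
    """Read the saved feed, filter it, write the result to a separate file."""
    with open(in_path, encoding="utf-8") as f:
        feed_text = f.read()
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(filter_feed(feed_text))

if __name__ == "__main__" and len(sys.argv) == 3:
    process(sys.argv[1], sys.argv[2])
```

Usage: `python feed_test.py saved_feed.xml result.xml`, then compare the two files. The input file is never modified, so the same saved feed can be reused after every change.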
douche wrote:
14 Jul 2018, 03:12
first line should find the line with '</title>' in it
second line should replace any '(space)' with '(dot/period)'
Yes, if the title content is on the same line as "</title>".
????line???? = line.replace(' ', '.')
Yes, seems OK to me.

But you removed way too much. The original script constructs the new feed content in the variable "new_feed". Your script is missing this line:

Code:

new_feed += line + "\n"
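Why that line matters: the output is rebuilt from scratch, so any line that is not appended is silently dropped from the feed NZBGet reads back. A Python sketch of the same accumulation, with a hypothetical `rebuild` helper that takes the per-line change as a function; it appends every line exactly once, changed or not.

```python
def rebuild(feed_text, change_line):
    """Apply change_line to each line and keep every line in the output."""
    new_lines = []
    for line in feed_text.splitlines():
        # Every line is appended, whether or not change_line modified it
        new_lines.append(change_line(line) + "\n")
    # Same text as repeated new_feed += line + "\n", joined once at the end
    return "".join(new_lines)
```

With a single append placed after the `if`/`else` (as in `rebuild`), one instance of the append line is enough; appending in both branches gives the same result.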

douche
Posts: 57
Joined: 08 Feb 2014, 23:50

Re: RSS Dupe Help?

Post by douche » 14 Jul 2018, 20:10

So, if I'm correct, there would need to be two instances of

Code:

new_feed += line + "\n"
in order to keep both edited and unedited lines, yes?

I also realized I didn't have anything writing to "out_file", so I added (a variation of your original script):

Code:

try:
    save_rssfeed(out_file, new_feed)
Which means the complete(?) script would be:

Code:

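#!/usr/bin/env python
##############################################################################
### NZBGET FEED SCRIPT                                                     ###

# Replaces spaces and underscores in RSS item titles with dots.

##############################################################################
### OPTIONS                                                                ###

# Print extra log messages (yes, no).
#Verbose=no

### NZBGET FEED SCRIPT                                                     ###
##############################################################################

import os
import re
import sys
import traceback

# Exit codes used by NZBGet
FEEDSCRIPT_SUCCESS = 93
FEEDSCRIPT_ERROR = 94

# Init script config options (defaults to 'no' so the script also runs outside NZBGet)
verbose = os.environ.get('NZBPO_VERBOSE', 'no') == 'yes'

# Init script context: NZBGet passes the path of the saved feed xml-file
rssfeed_file = os.environ['NZBFP_FILENAME']
# Optional extra copy of the result (FeedOutFile is not declared in the header above)
out_file = os.environ.get('NZBPO_FEEDOUTFILE')

errors = False

def load_rssfeed(rssfeed_file):
    """ load rss-feed from file """
    with open(rssfeed_file) as f:
        return f.read()

def save_rssfeed(rssfeed_file, data):
    """ save rss-feed back into file """
    with open(rssfeed_file, 'w') as f:
        f.write(data)

def dot_title(match):
    """ replace spaces and underscores inside one <title>...</title> with dots """
    return '<title>' + match.group(1).replace(' ', '.').replace('_', '.') + '</title>'

def filter_feed(feed):
    """ build new rss feed containing titles with only '.' spacers/delimiters """
    if verbose:
        print('Renaming')
    new_feed = ''
    for line in feed.splitlines():
        # line.find() returns -1 (truthy) when nothing is found, so test with 'in'
        if '</title>' in line:
            # change only the title text, not indentation or other tags on the line
            line = re.sub(r'<title>(.*?)</title>', dot_title, line)
            new_feed += line + "\n"
        else:
            # keep every other line unchanged
            new_feed += line + "\n"
    return new_feed

try:
    new_feed = filter_feed(load_rssfeed(rssfeed_file))
    # NZBGet reads the feed back from the same file once the script exits
    save_rssfeed(rssfeed_file, new_feed)
    if out_file:
        save_rssfeed(out_file, new_feed)

except Exception as e:
    errors = True
    # deleting the feed-xml-file to avoid enqueueing of non-filtered feed
    os.remove(rssfeed_file)
    print('[ERROR] %s' % e)
    traceback.print_exc()

if errors:
    sys.exit(FEEDSCRIPT_ERROR)
else:
    sys.exit(FEEDSCRIPT_SUCCESS)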
I haven't the faintest idea how to test this locally, outside of NZBGet.

I'm well outside my comfort zone here, haha.
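One way to test the finished script outside NZBGet is to run it the way NZBGet does: set `NZBFP_FILENAME` to a saved feed file and `NZBPO_VERBOSE` to the option value, start the script, then check the exit code (93 success, 94 error) and the rewritten file. A Python 3 sketch with a hypothetical `run_like_nzbget` helper; it launches the script with the same interpreter that runs the helper (`sys.executable`). Since the feed script rewrites the file (or deletes it on error), point it at a copy of the saved feed.

```python
import os
import subprocess
import sys

def run_like_nzbget(script_path, feed_path, verbose=False):
    """Run a feed script with the environment variables it reads; return its exit code."""
    env = dict(os.environ)
    env["NZBFP_FILENAME"] = feed_path                  # path of the saved feed xml-file
    env["NZBPO_VERBOSE"] = "yes" if verbose else "no"  # value of the Verbose option
    result = subprocess.run([sys.executable, script_path], env=env)
    return result.returncode

if __name__ == "__main__" and len(sys.argv) == 3:
    code = run_like_nzbget(sys.argv[1], sys.argv[2], verbose=True)
    print("exit code:", code, "(93 = success, 94 = error)")
```

Usage: `python run_like_nzbget.py feed_title_rename.py copy_of_feed.xml` (both file names are placeholders), then open `copy_of_feed.xml` to see the renamed titles.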
