[PP-Script / Scan-Script] Automatically Fetch Subtitles

l2g
Posts: 228
Joined: 27 Jun 2014, 22:13
Contact:

Post by l2g » 19 Jul 2014, 20:10

Subliminal Post-Process and Scheduler Script for NZBGet & SABnzbd
Subliminal.py will automatically fetch subtitles for the TV shows and movies you ask it to. Keep in mind that videos that have just aired will most likely not have subtitles available yet. For these cases, you can use this same script to configure an NZBGet Scheduled Task. With NZBGet you can additionally have this script poll the directories your content usually arrives in, so it keeps an eye open for the subtitles when (or if) they become available.

This script is effectively a wrapper for subliminal (Source/Docs) written by Antoine Bertin; specifically, a heavily modified version of subliminal v0.7.4. I provide all of the patches I made to be as transparent as possible and to honor Antoine's MIT License. It also uses pynzbget, which I wrote to simplify NZBGet script development.

Subliminal NZBGet Script Details
Author: Chris Caron <lead2gold@gmail.com>
Current Version: 1.0.0
License: GPLv3
Source: GitHub / Direct Download Link

Updates:
  • Oct 29th, 2017 Update (v1.0.0):
    Subliminal has been stable for long enough that I believe it's safe to officially label it as such. Therefore I bumped the major version for this release. The major changes are:
    • Fully compliant with SABnzbd
    • Portuguese/Brazilian Language codes supported now
    • A lot of unicode bulletproofing (specifically for Podnapisi and Addic7ed)
    • TV Show Year guessing improvements (which increases likelihood of finding subtitle matches)
  • Jul 25th, 2017 Update (v0.10.0):
    New enhancements and bugfixes
    • .idx/.sub files treated as subtitles already present now
    • can scan library for subtitles using NZBGet via configuration screen
    • Slightly better English embedded scanning support added
  • May 1st, 2016 Update (v0.9.9):
    Just a mini-release
    • Restored Addic7ed login/pass functionality but left it as optional (not mandatory)
  • Feb 6th, 2016 Update (v0.9.8):
    • Significant rewrite of how encoding is handled. English/French users of Subliminal won't really notice this, but those who use other languages such as Polish, Turkish, Russian, etc. will notice a major change.
      • Prior to this release, some downloaded subtitles would lose the special characters used by their languages. The subliminal core was rewritten to use a more precise algorithm for detecting the encoding (thanks to f3bruary for all his work with testing).
      • Subtitle languages can now be specified by a variety of identifiers (which can also be mixed and matched). Previously, --language (-l) only took input like en or fr (for English and French respectively). It is now smart enough to handle all languages and their Terminology (T) and Bibliographic (B) reference identifiers. This also allowed me to enhance the language handling of the CLI (and NZBGet interface) further by accepting the following as input and processing it all the same way:
        • --language nl
        • --language Dutch
        • --language dut
        • --language nld
        Combinations (even mixed and matched) now work too, such as: --language Dutch,en,fra
    • Eliminated ambiguity of the --age (-a) switch, which defaults to 24 hours on the CLI. Usually this age check is overridden by specifying the --force (-f) switch. However, it makes sense to also disable the age check if it's set to zero (0).
      Note: As a result of this update, the --force (-f) switch becomes somewhat redundant, but it will be kept around for backwards compatibility with previous releases.
    • Updated the version of urllib3, applying all of its new security fixes (mostly the elimination of SSLv3 and other insecurities).
  • Oct 21st, 2015 Update (v0.9.7):
    • Merged the 3 different Subliminal option sets into 1 common shared configuration screen, which greatly simplifies things and makes it more consistent with other scripts.
    • Added a new --cross-reference (-x) feature for situations where you've already fetched the subtitle on your own. You can specify more than one cross-reference path, each identifying a local location that may or may not already contain a subtitle file for the video in question. If one is found matching the video being scanned, it is used instead and no online fetch takes place. The matched subtitle is automatically moved next to the video and renamed to fit the video's name (in the event they differ from one another).
    • Added better support for systems that enforce strict directory permissions on NZBGet. In this release, all subtitles are now pre-fetched from the internet into the temporary directory instead of the directory NZBGet places us in by default (during processing). In some cases this directory was not granted write and execute permissions, causing the subtitle fetch to fail. This is no longer the case.
    • Added support for Windows Network Paths; I didn't realize until now that this script never supported the old \\Network\Path syntax in the comma separated list of media paths. Well... now it does.
  • May 30th, 2015 Update (v0.9.6):
    • Podnapisi's website was partially changed once more, breaking it within Subliminal. The plugin was updated to work correctly again.
  • Apr 26th, 2015 Update (v0.9.5):
    • Podnapisi's website was redesigned again, breaking it within Subliminal. The entire plugin had to be rewritten to work correctly again.
    • You no longer need the --scandir (-S) switch to specify paths to scan from the command line. You can just specify as many files and directories as you want as separate arguments. The --scandir (-S) switch remains available to avoid breaking things for people who have already adapted to it.
    • If no parameters are specified from the CLI, the help menu is automatically displayed (this should avoid confusion for some).
    • Guessit Library updated to v0.10.3 and babelfish library updated to v0.5.4 for better compatibility.
  • Mar 19th, 2015 Update (v0.9.4):
    • No Karma branch removed and all of its features pulled back into this release. At this time there is no reason to maintain 2 different branches. It's much easier to work with one! :)
    • xml.etree.cElementTree (XML parsing) support; Mac users will no longer receive the error that the NZB file is corrupted (when it really wasn't). This improved parsing works great when pulling your content from NewzNab sites!
    • New CLI option -n (--encoding), also available from NZBGet as SystemEncoding, which allows you to better handle unicode characters for most languages (French, Spanish, Dutch, German, Ukrainian, etc.). It defaults to UTF-8 but supports: UTF-16, ISO8859-1 (Latin-1), and ISO8859-2 (Latin-2).
    • New CLI option -e (--force-encoding) also available from NZBGet as ForceEncoding which allows you to convert the encoding detected in the subtitles downloaded to a specific consistent type. It defaults to None (leave the files in the format they were retrieved); but supports: UTF-8, UTF-16, ISO8859-1 (Latin-1), and ISO8859-2 (Latin-2).
    • Improved logging and more cleanup of what is a debug message vs what is just an info message.
  • Dec 10th, 2014 Update (v0.9.3):
    • Added more bulletproofing in the handling of filesystems that do not provide meta data during file scans (GitHub Issue #3)
    • New CLI option -c (--minscore), also available from NZBGet as MinScore, which exposes a switch that subliminal already offered. It sets the score threshold below which a matched subtitle is considered no good and discarded. It defaults to 20. (GitHub Issue #4)
    • Improved logging and more cleanup of what is a debug message vs what is just an info message.
    • New CLI option -k (--skip-embedded), also available from NZBGet as SkipEmbedded. It is only applicable during Advanced Mode scans, in which the video itself is analysed for the codecs it uses (to help populate guessit results). With this switch set, embedded subtitles are ignored if detected and subtitles are still retrieved from the designated providers. If this switch is set to No and embedded subtitles are found, the script is smart enough to avoid polling the internet for more.
  • Dec 1st, 2014 Update (v0.9.2):
    • Better logging (not as verbose when not in debug mode); easier to understand too.
    • Fixed bug with score calculations when using ImpairedFirst or StandardFirst options.
    • Fixed bug where it was possible to search again for subtitles that were already successfully retrieved.
  • Nov 11th, 2014 Update (v0.9.1):
    • More underlying structural changes to Subliminal; all changes officially pushed upstream (pull request #404). I rewrote a huge section of the tests (making them backwards compatible with Python v2.6). All tests pass of course! :)
    • TVSubtitles.net improvements to match TV Shows more often (for those using this provider).
    • added __MACOSX to the excluded Meta Directories (for speed) when scanning for videos.
    • core improvements to subliminal allowing it to handle duplicate subtitle downloads better. Note: Although I fixed this already in an earlier version, I broke subliminal itself for those using the tool (outside of this nzb script). This patch was rewritten to allow it to work with both worlds.
  • Nov 9th, 2014 Update (v0.9.0):
    • guessit updated to v9.3 (and all of its dependencies)
    • Unicode support for videos whose filenames contain extended characters.
    • --force (-f) switch when not specified now correctly ignores older files (making it behave as the ScanScript processing does).
    • PostProcess updates the date/time stamp of the file allowing for the ScanScript to correctly expire the checking for them after the max age is reached. This feature negates the need to have ResetDateTime.py which was required previously.
    • No longer scan OSX Meta Directories (for speed) (.DS_Store and .AppleDouble).
    • No longer scans videos that are way too small to possibly have subtitles associated with them (<150MB); this threshold is configurable for those who wish to change it.
    • Added better TV Show/Movie detection if the category can't be determined.
    • Depending on whether we're dealing with a TV Show or Movie, a different subset of providers can be utilized.
  • Sept 14th, 2014 Update (v0.8.0): podnapisi restructured to work using new layout. Minor bugfixes with multiple subtitle settings.
  • Sept 1st, 2014 Update (v0.7.0): some bugfixes and subliminal core stability improvements.
  • Aug 23rd, 2014 Update (v0.6.0): New script modes: ImpairedOnly, StandardOnly, BestScore, ImpairedFirst, StandardFirst.
  • Aug 11th, 2014 Update (v0.5.0): added basic/advanced mode to emulate the filename/hash option that existed in the older script.
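To illustrate the v0.9.8 language handling described in the update notes above, here is a minimal sketch of how mixed identifiers such as Dutch, nl, dut and nld could be folded into one code. The table LANGUAGE_ALIASES and the function normalize_languages are hypothetical names invented for this example and cover only three languages; this is not the script's actual implementation, and it is written for a current Python 3 interpreter rather than the Python 2 the script itself requires.

```python
# Hypothetical, deliberately tiny alias table: 2-letter code -> accepted identifiers.
# Each entry lists the 2-letter code, the ISO 639-2 T/B codes, and the English name.
LANGUAGE_ALIASES = {
    "en": ("en", "eng", "english"),
    "fr": ("fr", "fra", "fre", "french"),
    "nl": ("nl", "nld", "dut", "dutch"),
}

# Invert the table once so every identifier maps straight to its 2-letter code.
_LOOKUP = {
    alias: code
    for code, aliases in LANGUAGE_ALIASES.items()
    for alias in aliases
}


def normalize_languages(value):
    """Turn a comma separated string into 2-letter codes, keeping order, dropping duplicates."""
    result = []
    for token in value.split(","):
        key = token.strip().lower()
        if not key:
            continue  # tolerate stray commas
        if key not in _LOOKUP:
            raise ValueError("unknown language identifier: %r" % token)
        code = _LOOKUP[key]
        if code not in result:
            result.append(code)
    return result


print(normalize_languages("Dutch,en,fra"))  # ['nl', 'en', 'fr']
```

Resolving everything to one canonical code up front means the rest of a script only ever has to compare a single form of each language.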
Installation
Although I provide extra content (to be as transparent as possible), you really only need two items from the repository in order to install the program into NZBGet:
  • Subliminal.py
  • Subliminal (the directory)
Copy both of these into your nzbget/scripts directory and you should be good to go.

Please note that this script is not compatible with Python v3 at this time. You must be using Python v2.6 or higher.

Features:
  • Post Processing remains available to you, now with some extra features (hearing impaired, overwrite, and single mode).
  • Scheduling Support is available now as well. You can configure NZBGet to poll directories at regular intervals for subtitles in the event they weren't available during Post Processing. There is a variable for this entitled MaxAge which tells Subliminal to stop searching for subs after a certain period elapses (relative to when the file was retrieved). This prevents constantly hitting the internet for content that is simply never going to be available to you.
  • CLI (Command Line Interface) allows you to run Subliminal.py manually from the command line. Run it with -h to see the available options. A simple command might be:
    • Subliminal.py -f -S /a/path/to/your/show.mkv which simply scans the file specified and retrieves the subtitles for it if they exist.
    • Subliminal.py -f -S /a/path/to/your/show/directory which recursively scans a directory and fetches the subtitles for all videos it detects.
    • Subliminal.py -f -S /a/path/to/your/show/directory,/another/path/to/your/show/directory,/a/path/to/your/show/file.mp4 which lets you mix files and directories and scan everything in one command line.
    You'll note that I used the -f switch; this is only necessary if the content you're fetching wasn't recently obtained. As a safeguard (used by the scheduler), only new content is looked at, to avoid constantly thrashing over content that has nothing to fetch.
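The MaxAge behaviour described in the feature list can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name is_searchable is made up, the file's modification time is assumed to stand in for "when the file was retrieved", and the example targets Python 3 rather than the Python 2 the script requires. It also reflects the changelog note that an age of zero disables the check, just as the force switch does.

```python
import os
import time


def is_searchable(path, max_age_hours=24, force=False, now=None):
    """Return True if `path` is young enough to be worth searching subtitles for.

    Assumption: the file's modification time marks when it was retrieved.
    """
    # Forcing, or an age of zero (or less), disables the age check entirely.
    if force or max_age_hours <= 0:
        return True
    now = time.time() if now is None else now
    age_hours = (now - os.path.getmtime(path)) / 3600.0
    return age_hours <= max_age_hours
```

A scheduled run would call something like this for every video it finds and skip the ones that return False, so old content stops generating provider lookups.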
CLI Help Menu

Code:

Usage: Subliminal.py [options]

Options:
  -h, --help            show this help message and exit
  -S DIR, --scandir=DIR
                        The directory to scan against. Note: that by setting
                        this variable, it is implied that you are running this
                        from the command line.
  -a AGE, --maxage=AGE  The maximum age a file can be to be considered
                        searchable. This value is represented in hours. The
                        default value is 24 hours.
  -n ENCODING, --encoding=ENCODING
                        The system encoding to use (utf-8, ISO-8859-1, etc).
                        The default value is 'UTF-8'.
  -l LANG, --language=LANG
                        The language the fetch the subtitles in (en, fr, etc).
                        The default value is 'en'.
  -p PROVIDER1,PROVIDER2,etc, --providers=PROVIDER1,PROVIDER2,etc
                        Specify a list of providers (use commas as delimiters)
                        to identify the providers you wish to use. The
                        following will be used by default: 'opensubtitles,tvsu
                        btitles,podnapisi,addic7ed,thesubdb'
  -s, --single          Download content without the language code in the
                        subtitle filename.
  -b, --basic           Do not attempt to parse additional information from
                        the video file. Running in a basic mode is much faster
                        but can make it more difficult to determine the
                        correct subtitle if more then one is matched.
  -x PATH1,PATH2,etc, --cross-reference=PATH1,PATH2,etc
                        Specify an optional list of directories to scan for
                        subs first before checking on the internet. This is
                        for directories containing subs (.srt files) that you
                        have already downloaded ahead of time.
  -z SIZE_IN_MB, --minsize=SIZE_IN_MB
                        Specify the minimum size a video must be to be worthy
                        of of checking for subtiles. This value is interpreted
                        in MB (Megabytes) and defaults to 150 MB.
  -c MINSCORE, --minscore=MINSCORE
                        When scoring multiple matched subtitles for a video,
                        this value identifies the threshold to assume the
                        subtitle is no good and should be thrown away when
                        being compared against others. It currently defaults
                        to 20.
  -k, --skip-embedded   If embedded subtitles were detected, choose not to use
                        them and continue to search for the subtitles hosted
                        by the identified provider(s).
  -e ENCODING, --force-encoding=ENCODING
                        Optionally specify the subtitle's file encoding toa
                        specific type (utf-8, ISO-8859-1, etc). If none is
                        specified then the file is left as is.
  -f, --force           Force a download reguardless of the file age. This
                        switch negates any value specified by the --age (-a)
                        switch.
  -o, --overwrite       Overwrite a subtitle in the event one is already
                        present.
  -m MODE, --fetch-mode=MODE
                        Identify the fetch mode you wish to invoke, the
                        options are: 'ImpairedOnly', 'StandardOnly',
                        'BestScore', 'StandardFirst', 'ImpairedFirst'.  The
                        default value is: BestScore
  -L FILE, --logfile=FILE
                        Send output to the specified logfile instead of
                        stdout.
  -D, --debug           Debug Mode
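To make the interplay of --minscore and --fetch-mode in the help text above more concrete, here is a small sketch of one possible reading of those options. Everything in it is an assumption for illustration: the Candidate record, the pick_subtitle function, the choice of treating a score equal to the threshold as acceptable, and the interpretation that the "Only" modes filter while the "First" modes merely prefer one kind and fall back to the other. The real script works with subliminal's own objects and may rank differently.

```python
from collections import namedtuple

# Hypothetical record used only for this example.
Candidate = namedtuple("Candidate", "name score hearing_impaired")


def pick_subtitle(candidates, mode="BestScore", minscore=20):
    """Pick one candidate (or None) under an assumed reading of the fetch modes."""
    # Throw away anything scoring below the threshold.
    usable = [c for c in candidates if c.score >= minscore]

    # The "Only" modes are treated as hard filters.
    if mode == "ImpairedOnly":
        usable = [c for c in usable if c.hearing_impaired]
    elif mode == "StandardOnly":
        usable = [c for c in usable if not c.hearing_impaired]
    if not usable:
        return None

    def rank(candidate):
        # The "First" modes prefer one kind, then fall back to score.
        if mode == "ImpairedFirst":
            return (candidate.hearing_impaired, candidate.score)
        if mode == "StandardFirst":
            return (not candidate.hearing_impaired, candidate.score)
        return (candidate.score,)

    return max(usable, key=rank)
```

For example, given one standard subtitle scoring 55 and one hearing impaired subtitle scoring 40, this sketch returns the first under BestScore and the second under ImpairedFirst.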
Last edited by l2g on 29 Oct 2017, 19:14, edited 34 times in total.

hugbug
Developer & Admin
Posts: 7645
Joined: 09 Sep 2008, 11:58
Location: Germany

Re: [PP-Script] Subliminal - Subtitles, faster than your tho

Post by hugbug » 19 Jul 2014, 20:23

That's the purpose of VideoSort: to move files. Therefore after it is executed the original directory (NZBPP_DIRECTORY) doesn't have any video files or does not even exist (if cleanup is enabled in VideoSort).

The solution is to run Subliminal before VideoSort and enable Satellite handling in VideoSort.
l2g wrote:The patches also allow subliminal to work with guessit v0.7. Eventually i'll port it up to 0.8
If you have time, maybe you could port the Subliminal pp-script to a newer version of the Subliminal library? The API was changed and the library can't just be replaced.

Re: [PP-Script] Subliminal - Subtitles, faster than your tho

Post by l2g » 19 Jul 2014, 21:53

Thanks for your speedy reply hugbug.
I added .srt to the satellite portion of the VideoSort.py script and put it at the end. I'll give it a shot and only whine more if it fails for me! (heh).

As per your request, I just merged all my (already patched) versions of subliminal (v0.7.4), guessit (v0.7.1), urllib3 (v1.9), & requests (v2.3), which should allow even people with Python v2.4 to use it. It will definitely work for 2.5 and 2.6. I also use a newer version of babelfish than some (v0.5) which is not backwards compatible with v0.4. Since I forward ported my copy of subliminal to work with these too, I additionally included this in your libs/* directory. Hopefully subliminal will source it and not whatever is global to the system.

Here is the first (untested) version for you.

Once you're satisfied, just let me know so I can take it down and let you maintain the hosting of it with your own (proper) versioning applied.

Re: [PP-Script] Subliminal - Subtitles, faster than your tho

Post by hugbug » 20 Jul 2014, 20:54

Thanks for trying, but this obviously doesn't work due to incompatible changes introduced in subliminal 0.7.
There were also a few libs missing, but that was easy to fix: dogpile.cache, dogpile.core, pysrt.

The pp-script needs to be adjusted for the new library version.

Attached is the version with added missing libs.

NOTE for users: this is a development version, which doesn't work, use version linked in the first post.
Attachment: subliminal-ppscript-0.7.4-0-001.zip (956.79 KiB)

Re: [PP-Script] Subliminal - Subtitles, faster than your tho

Post by l2g » 21 Jul 2014, 02:43

On my CentOS copy I'm just using the globally installed versions of those. I apologize for missing them! Good catch!

Attached (Subliminal.py, "Updated Subliminal.py", 13.67 KiB) is the final piece of the puzzle (the core PP script file itself). It seems to work for me; I had to massage it a bit more today but testing has gone well.

Hopefully it will help you out too! I refactored the code a bit and merged it with some of the nice functionality in the VideoSort.py PP script. As a result, there is some duplication of functions now; perhaps a future common library would be better? The changes I made will allow the new version of Subliminal.py to handle obfuscated files. As you mentioned, you need to ensure the whole thing is executed before VideoSort.py.

Would it be bad practice to have VideoSort.py write to a common file/SQLite db after it has finished its processing? It would simply store the new path and filename, using the time and nzb file as its key for others to access. It would allow other PP scripts to check this database in the event that os.environ['NZBPP_DIRECTORY'] is ever missing. It would also allow other scripts to continue to process content without having to deal with previously handled obfuscation. It would also eliminate some of the script prioritization. Just a thought.
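A rough sketch of what such a shared SQLite registry could look like is shown here. Nothing in it exists in VideoSort.py or Subliminal.py; the table name moved_files, the column names and the three helper functions are all hypothetical, chosen only to show the "keyed by nzb file and time" idea from the paragraph above. It uses only the standard sqlite3 module and is written for Python 3.

```python
import sqlite3
import time


def open_registry(db_path):
    """Open (and if needed create) the hypothetical shared registry."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS moved_files ("
        " nzb_name TEXT NOT NULL,"
        " recorded_at REAL NOT NULL,"
        " new_path TEXT NOT NULL,"
        " PRIMARY KEY (nzb_name, recorded_at))"
    )
    return conn


def record_move(conn, nzb_name, new_path, recorded_at=None):
    """Called by the script that moves files, once it has finished."""
    recorded_at = time.time() if recorded_at is None else recorded_at
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO moved_files VALUES (?, ?, ?)",
            (nzb_name, recorded_at, new_path),
        )


def latest_path(conn, nzb_name):
    """Called by later scripts to find where the content ended up."""
    row = conn.execute(
        "SELECT new_path FROM moved_files"
        " WHERE nzb_name = ? ORDER BY recorded_at DESC LIMIT 1",
        (nzb_name,),
    ).fetchone()
    return row[0] if row else None
```

A later script could fall back to latest_path() whenever the directory it was handed no longer exists, instead of depending on script ordering.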

kloaknet
Posts: 337
Joined: 23 Jul 2014, 08:52

Re: [PP-Script] Subliminal - Subtitles, faster than your tho

Post by kloaknet » 23 Jul 2014, 11:02

This is great guys! I was trying to get it working for 0.7.4, but with my limited Python knowledge it's not that easy :)

Working on the script made me think about the fact that when a file is released, the subtitles often aren't there yet. They more likely appear within 12 hours. Delaying the script by XX hours might be an option, but I guess it might be even better to run the script as a scheduled script (it must be there for a reason since v13 ;) )? So it checks a typical (set of) folder(s) every day to see if subs are missing?
Last edited by kloaknet on 26 Jul 2014, 06:17, edited 1 time in total.

Re: [PP-Script] Subliminal - Subtitles, faster than your tho

Post by hugbug » 23 Jul 2014, 11:40

l2g wrote:Attached is the final piece of the puzzle
Thanks, I haven't got time to try it yet.
kloaknet wrote:So it checks every day a typical (set of) folder if subs are missing?
That's exactly what I thought ;)
Hope l2g or someone else could add that functionality. I have way too many other things to work on at the moment.

Re: [PP-Script] Subliminal - Subtitles, faster than your tho

Post by l2g » 24 Jul 2014, 13:04

I added one small code change to the very end of the script. After all the greasy work was done, and just before it was about to return OKAY, it tried to print what it downloaded using variables that don't exist anymore (they were only in the previous version of subliminal). See here: Subliminal.py, "update #2" (13.64 KiB).
Everything works; it does download subtitles for older shows (as kloaknet pointed out, new ones will fail to fetch for obvious reasons). Even obfuscated content still gets checked for subtitles. But it doesn't seem like the satellite part works correctly with these new changes.

I presume it's because when you download a file entitled:
abcdefghijk.mkv

This tool will fetch its subs for you as:
The.Proper.Name.srt

And when VideoSort.py kicks in... it's only looking for abcdefghijk.srt entries? This script may need to be modified further to rename the subtitle files to match the video file.
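The renaming idea can be sketched in a few lines. This is an illustration only: subtitle_name_for is a made-up helper, not something from Subliminal.py, and the optional language suffix is an assumption based on the script's single mode (which drops the language code from the filename). It is written for Python 3.

```python
import os


def subtitle_name_for(video_path, subtitle_path, language=None):
    """Return where a subtitle should live so it sits next to, and is named like, the video."""
    video_dir, video_file = os.path.split(video_path)
    stem = os.path.splitext(video_file)[0]
    # Keep the subtitle's own extension, e.g. ".srt".
    ext = os.path.splitext(subtitle_path)[1]
    if language:
        stem = "%s.%s" % (stem, language)
    return os.path.join(video_dir, stem + ext)


# The obfuscated video keeps its name, and the subtitle is made to follow it.
print(subtitle_name_for("/downloads/abcdefghijk.mkv", "/tmp/The.Proper.Name.srt"))
```

With the subtitle named after the obfuscated video, a satellite-file handler that matches on the video's base name would then pick it up and rename both together.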

The only other issue I notice with the latest version of subliminal (it could have been present in an earlier version too) is that it doesn't support CD1 and CD2 entries. Some subtitles span 2 files, as the downloaded mkv/avi/mp4 files do. But subliminal chokes if it finds a zip file containing 2 srt files. This isn't a big deal since most content is just 1 file anyway, but it's worth noting. It's a bug for the upstream guys who wrote subliminal to tackle, I guess.

kloaknet: I'm new to NZBGet development and don't understand all of its features quite yet. So I'm not sure how the schedules work (nor have I looked into it yet), but you could probably just do something like this:

Code:

# Read find's output NUL-delimited so paths containing spaces survive (bash)
find /directory/your/downloads/got/moved/to -print0 |
while IFS= read -r -d '' DIR; do
    # Check for missing Subs
    subliminal -l en -- "$DIR"
done
Heck, this might even work:

Code:

subliminal -l en -- $(find /directory/your/downloads/got/moved/to)
You could add a crontab like:

Code:

cat << _EOF > /etc/cron.d/get_subs
# Every 6 hours check for subtitles
0 */6 * * * youruserid  find /directory/your/downloads/got/moved/to -maxdepth 1 -mindepth 1 -type f -regex '.*\.\(mkv\|mp4\|avi\)$' -exec subliminal -l en --  {} \;
_EOF

Re: [PP-Script] Subliminal - Subtitles, faster than your tho

Post by hugbug » 24 Jul 2014, 13:40

Many thanks for your work.
l2g wrote:And when Video Sort kicks in... it's only looking for abcdefghijk.srt entries. This script may need to be modified further to rename the files 'alike' others.
That's a good idea. Even if VideoSort is not used, the subtitle files should be named like the video file, since most video players require that.
l2g wrote:I'm not sure how the schedules work
Just as pp-scripts are executed by NZBGet, there is a scheduler which executes scheduler-scripts. If you add the scheduler script signature to the pp-script, it can be used as a scheduler-script too (Extension scripts).

A script which can work in two modes needs to determine whether it is called as a pp-script or as a scheduler-script. Because different parameters (env. vars) are passed to different kinds of scripts, checking for a specific parameter helps:

Code:

import os
# NZBPP_DIRECTORY is only passed to post-processing scripts
if 'NZBPP_DIRECTORY' in os.environ:
  print('called as pp-script')
In scheduler mode the subliminal script will obviously work differently than in pp-script mode. Instead of looking for subs for a specific title, it should go through a specified directory (recursively) and fetch subtitles for all video files which don't have any.
The script will need a configuration option to set the directory to search. If it is empty, it could search the standard destination directory instead (NZBOP_DESTDIR). The deobfuscation part is probably not needed in scheduler mode.
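Put together, the scheduler-mode behaviour described here might look roughly like the sketch below. It is only an outline under assumptions: the function names are invented, the extension list is borrowed from the cron example earlier in the thread, anything called without NZBPP_DIRECTORY is simply treated as scheduler mode, and "has subtitles" is reduced to "a same-named .srt sits next to the video" (so a subtitle carrying a language code in its name would not be noticed). It is written for Python 3.

```python
import os

# Assumed set of video extensions, taken from the cron example in this thread.
VIDEO_EXTENSIONS = (".mkv", ".mp4", ".avi")


def detect_mode(environ):
    """Simplification: no NZBPP_DIRECTORY means we were not called as a pp-script."""
    return "pp-script" if "NZBPP_DIRECTORY" in environ else "scheduler"


def scan_root(environ, configured_dir=""):
    """Use the configured directory, falling back to NZBGet's destination directory."""
    return configured_dir or environ.get("NZBOP_DESTDIR", "")


def videos_missing_subs(root):
    """Yield every video under `root` that has no same-named .srt beside it."""
    for dirpath, _dirnames, filenames in os.walk(root):
        names = set(filenames)
        for name in sorted(filenames):
            stem, ext = os.path.splitext(name)
            if ext.lower() in VIDEO_EXTENSIONS and stem + ".srt" not in names:
                yield os.path.join(dirpath, name)
```

In scheduler mode a script could then loop over videos_missing_subs(scan_root(os.environ)) and hand each path to its existing subtitle lookup.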

The advantage for users is that they can very easily set up the script for scheduling via the web interface without dealing with cron.

Re: [PP-Script] Subliminal - Subtitles, faster than your tho

Post by l2g » 03 Aug 2014, 21:24

hugbug wrote:Many thanks for your work.
No problem. I merged a complete rewrite of Subliminal.py with the new pynzbget backend I wrote. I can't upload anything larger than 256KB to this forum, so I'll host the link here (subliminal-ppscript-0.7.4-0-002.zip) for now.

I tried to package it as you had it before with the addition of all my subliminal backports (discussed earlier) as well as a few new recent ones.
hugbug wrote:Just like pp-scripts are executed by NZBGet there is a scheduler which executes scheduler-scripts. If you add scheduler script signature to the pp-script it can be used as a scheduler-script too (Extension scripts).
Can you give me just a few quick points about scheduler scripts? I didn't see too much on the Extension link you provided, other than that I can define them in the HASH tag entries at the top. Do the environment variables use a prefix like NZBSC_, or do they use NZBPP_? Or do I just find them in NZBOP_ along with all of the defined variables? Or are they passed in as arguments?

I understand your instructions involving the checking of NZBOP_DESTDIR to determine what mode I'm in. Does this also require one to set the HASH TAG entries at the top too? i.e.:

Code:

### NZBGET SCHEDULER SCRIPT,POST-PROCCESS
The only problem I see with merging a post-process script with a scheduler script is (well, like you said) that some options (like deobfuscation) don't play a role in this case, yet they may still be an option for the post-process part. Is there a way to separate the options... or should I just write 2 separate scripts?
hugbug wrote:The advantage for users is that they can very easy setup the script for schedule via web-interface without dealing with cron.
Understood... I'll see what I can put together :)
