OpenWrt Forum Archive

Topic: RSS downloader

The content of this topic has been archived between 29 Mar 2018 and 4 May 2018. Unfortunately there are posts – most likely complete pages – missing.

I have a Netgear WNDR3700v2 and have set it up with Transmission with a directory watcher for downloading.
Now all I need is an RSS downloader for torrent files and I'll have an automated torrent downloader.

I have searched like crazy for a simple way of grabbing torrent files from an RSS feed, but every way I have found requires me to compile several programs or install 10MB+ of Perl stuff (I only have 8MB left in my NVRAM).

So I'm left with asking for help:
Has anyone found a good and simple way to download from an RSS feed, with or without rules for the download? (My feed contains almost only stuff I want to download.)
Even just instructions for compiling something might help me (if possible on my Win7 machine, or on the WNDR3700).

RSSDler looked nice, but then I have to compile, and since I'm running a Windows computer it got too complicated (I can do it if I must, but...).

I would very much like to drop the uTorrent instance (set up to download from an RSS feed) on my Windows machine.

Thanks in advance, Khenke.

(Last edited by Khenke on 10 Aug 2011, 18:37)

I found a (very crude) way that works for me.
I installed Python (opkg install python) and used this code with feedparser (http://www.feedparser.org/). I plan to run it from cron.
And I didn't even need a directory checker :)

#!/usr/bin/python

# RSS feeds to poll

rssfeeds = ['http://www.site.com/rss', 'http://www.site2.com/rss']

# Code

import subprocess

import feedparser

for src in rssfeeds:
    rss = feedparser.parse(src)
    for entry in rss.entries:
        # Undo HTML-escaped ampersands left in the link
        link = entry['link'].replace('&amp;', '&')
        # Hand the torrent URL to the running Transmission daemon;
        # passing a list avoids shell quoting problems with odd URLs
        subprocess.call(['transmission-remote', '-a', link])
#        print(entry['title'])
#        print(link)

But if someone knows of a better way I'm still interested :)

(Last edited by Khenke on 12 Aug 2011, 17:31)

I've just published https://github.com/alekseyt/leech

This is a small shell script that runs wget + xsltproc + grep + wget to download files from RSS feeds. It can filter feeds and download only files that match regexps (you can use .* to download everything). No Python/Perl/etc. required (only xsltproc), and no need to compile anything.

It comes with an OpenWRT package, and adding it to cron is very simple:
0/30 * * * * CONFIG_DIR=/etc/leech DOWNLOADS_DIR=/your/dir leech
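
For anyone who wants to see the fetch, extract and filter idea in one place, here is a rough Python sketch. It is not leech's code (leech is a shell script built on xsltproc and grep); the function name `select_links` and the sample feed are made up for illustration, and it assumes a plain RSS 2.0 feed with <title> and <link> in each <item>.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real run would download this text first.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <item><title>Show A 720p</title><link>http://example.com/a.torrent</link></item>
  <item><title>Show B 480p</title><link>http://example.com/b.torrent</link></item>
</channel></rss>"""


def select_links(feed_xml, patterns):
    """Return the links of RSS items whose title matches any regexp."""
    compiled = [re.compile(p) for p in patterns]
    links = []
    for item in ET.fromstring(feed_xml).iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if link and any(rx.search(title) for rx in compiled):
            links.append(link)
    return links


print(select_links(SAMPLE_FEED, [r"720p"]))  # only the 720p item
print(select_links(SAMPLE_FEED, [r".*"]))    # .* keeps everything
```

The selected links would then go to a second download step, or straight to `transmission-remote -a` as in the Python script earlier in the thread.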

If anyone is interested, I've uploaded a new version (0.2) of leech: https://github.com/alekseyt/leech/downloads

What was introduced:

* leech won't download files twice, because it keeps a history in the .leech.db file. To protect your privacy, only the MD5 sum of the source URL is recorded in the DB. This file won't grow indefinitely; it is cleaned up after the time defined by the next feature.

* An expiration time was introduced to deal with feeds that return the whole list of files instead of 1 or 2 weeks of history. By default, leech won't download files older than 1 day; this can be adjusted in $CONFIG_DIR/default. This value is also used to clean the downloads database (downloads older than that period are removed from the DB).
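
To illustrate the two features above, here is a small Python sketch of a hashed download history with expiry. It only shows the idea: the real .leech.db format is not described in this thread, so the in-memory dict, the function names and the `ONE_DAY` constant are all assumptions.

```python
import hashlib
import time

ONE_DAY = 24 * 60 * 60  # assumed stand-in for the 1-day default


def url_key(url):
    """MD5 hex digest of the URL, so the URL itself is never stored."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def should_download(history, url, now):
    """Record the URL and return True only the first time it is seen."""
    key = url_key(url)
    if key in history:
        return False
    history[key] = now
    return True


def prune(history, now, max_age):
    """Return a copy without entries recorded more than max_age seconds ago."""
    return {k: t for k, t in history.items() if now - t <= max_age}


history = {}
now = time.time()
print(should_download(history, "http://example.com/a.torrent", now))  # first sighting
print(should_download(history, "http://example.com/a.torrent", now))  # repeat
history = prune(history, now + 2 * ONE_DAY, ONE_DAY)
print(len(history))  # entries left two days later
```

Using one period for both "too old to download" and "old enough to forget" is what keeps this safe: once an entry is pruned, the item it belonged to is already past the expiration time and won't be fetched again.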

If you're interested, please also take a look into known issues and troubleshooting sections:
* https://github.com/alekseyt/leech#known-issues
* https://github.com/alekseyt/leech#troubleshooting

Good, useful to me

leech 0.3 is available: https://github.com/alekseyt/leech/downloads

More features, fewer bugs:

[+] new option HISTORY that will keep track of .leech.db length instead of EXPIRATION
[+] new option TIMEOUT for interrupting stuck downloads
[+] new option RETRY to retry failed downloads

[-] fixed handling of TMP - will default to DOWNLOADS_DIR as described
[-] fixed bug with leech.lunch left after failed feed download
[-] fixed bug with failed download being stored in .leech.db
[-] correct handling of pubDate parsing error

If you're interested, please pay attention to known issues (https://github.com/alekseyt/leech#known-issues) and troubleshooting (https://github.com/alekseyt/leech#troubleshooting)
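
The TIMEOUT and RETRY options from the changelog above boil down to a pattern like the following Python sketch. This is not how leech does it (leech is a shell script); the function names and default values here are placeholders chosen for the example.

```python
import urllib.request


def fetch(url, timeout):
    """Download url, giving up if the connection stalls longer than timeout seconds."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_retry(url, timeout=30, retries=3, fetcher=fetch):
    """Try the download up to `retries` times and re-raise the last error."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for _ in range(retries):
        try:
            return fetcher(url, timeout)
        except OSError as err:  # URLError and socket timeouts derive from OSError
            last_error = err
    raise last_error
```

The `fetcher` argument is only there so the retry loop can be tried out without a network connection, for example by passing a function that fails twice and then succeeds.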

leech 0.5 is out: https://github.com/alekseyt/leech/downloads

It now uses cURL instead of wget, mainly because OpenWRT doesn't supply a fully functional wget by default. Since wget is essential for OpenWRT's normal operation, you might break your box while upgrading wget if you do it wrong. Both transmission and rtorrent already depend on libcurl, so leech's dependencies should be partly in place already.

This is actually a transitional release, and 0.6 is coming soon (not soon as in "very soon") to address the problem that inotify is broken in backfire 2.6 (or is it broken just for me?), so transmission's directory watching doesn't work there. Most likely leech will get a download hook, as someone suggested iirc, and will just run `transmission-remote -a` for a URL.

If you have any ideas or suggestions, I would be glad to hear from you by email: aleksey.tulinov@gmail.com

@aleksey_t

Can you add UCI C API configuration support to leech? Btw. UCI can also be compiled as a standalone app :-)

@written_direcon

I'm not familiar with development for UCI (I've only used it as a part of OpenWRT). I'll take a look, but no promises, and definitely not before leech 1.0. While configuration compatibility is a goal for 0.1-1.0, something might change.

What benefits do you see in UCI? leech is designed to need as little configuration as possible, and the defaults should be fine for most people. The values in `default` are only a fallback if something goes wrong, so you can tweak leech if you really need to. The only parts that need real configuration are the list of RSS feeds (straightforward) and the matching rules (not so straightforward). I'm working on the latter as well: I don't really like all the .* and \[ in the configuration, and it would be more convenient to have wildcards (*, ?, etc.) instead of regular expressions. That's not really hard to implement, but it's at the end of the roadmap; .* is not so hard to type after all.
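
As a rough illustration of the wildcard idea, Python's standard fnmatch module can translate shell-style wildcards into a regular expression. This is only a sketch of the concept, not something leech does; `wildcard_to_regex` is a name invented here, and note that fnmatch still treats [ as the start of a character set, so a literal bracket has to be written as [[].

```python
import fnmatch
import re


def wildcard_to_regex(pattern):
    """Compile a shell-style wildcard (*, ?, [seq]) into a regex object."""
    return re.compile(fnmatch.translate(pattern), re.IGNORECASE)


rule = wildcard_to_regex("my show*720p*")
print(bool(rule.match("My Show S01E02 720p HDTV")))
print(bool(rule.match("Other Show S01E02 720p")))
```

The translated pattern is anchored at the end, so the wildcard has to cover the whole title; that is why the example rule ends with a trailing *.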

I've compiled my own image (r33312). I searched all of menuconfig and the only thing similar to xsltproc was libxslt, which I selected.
Now I get this error:

/mnt/usb_disk/rss/leech/sbin/leech: line 245: xsltproc: not found

I have also been trying to install xsltproc via opkg for days without success:
opkg_download: Failed to download http://downloads.openwrt.org/snapshots/ … ckages.gz, wget returned 8.
How do I install xsltproc when compiling my own image?

I also get another error, but just for one feed:

 Downloading feed: url: ******* Failed: 6 

(Last edited by us on 8 Oct 2012, 21:59)

I solved the problem with xsltproc by changing the default opkg.conf to http://downloads.openwrt.org/attitude_a … c/packages

I also found the reason for the second error, which was a typo in a URL.

Here is a simple workaround that may work in some cases for feeds with iso-8859-X encodings:

leech line 167:

    case $RET in
        0)
            sed -i 's/encoding=\"iso-8859-[0-9]*\"/encoding=\"UTF-8\"/I' "$LUNCH"
            echo "OK"
            ;;

(Last edited by us on 18 Oct 2012, 11:40)

This sounds interesting. There's a program called "Automatic" for Linux Based NAS Devices. Not sure if that would work well here.

@us

The ISO-8859-X encoding issue is covered here: https://github.com/alekseyt/leech#leech … -eg-cp1251 . I'm afraid the sed workaround would work for ASCII only; anything outside ASCII wouldn't match the download rules.
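
To show why relabelling alone is not enough: the sed line changes what the XML declaration claims, but the bytes of the feed stay in the old encoding. A real conversion has to decode and re-encode the text as well. The Python sketch below does both; it is not the approach from the leech README linked above, and `to_utf8` is a name invented for this example.

```python
import re


def to_utf8(raw, fallback="utf-8"):
    """Re-encode an XML document as UTF-8 and update its declaration."""
    # The declaration is ASCII, so it can be searched for in the raw bytes.
    m = re.search(rb'encoding=["\']([A-Za-z0-9._-]+)["\']', raw[:200])
    source = m.group(1).decode("ascii") if m else fallback
    text = raw.decode(source)
    # Rewrite only the first declaration so it matches the new bytes.
    text = re.sub(r'encoding=(["\'])[A-Za-z0-9._-]+\1',
                  r'encoding=\g<1>UTF-8\g<1>', text, count=1)
    return text.encode("utf-8")


raw = '<?xml version="1.0" encoding="iso-8859-1"?><t>caf\u00e9</t>'.encode("iso-8859-1")
print(to_utf8(raw).decode("utf-8"))
```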

By the way, you can diagnose feeds/torrent fetching issues with cURL error codes (http://curl.haxx.se/libcurl/c/libcurl-errors.html). For instance,

Failed: 6 

means

CURLE_COULDNT_RESOLVE_HOST (6)
    Couldn't resolve host. The given remote host was not resolved
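
If you look these codes up often, a tiny lookup table saves a trip to the libcurl page. The Python sketch below covers a handful of common codes only, and it assumes the log line contains "Failed: N" as in the output quoted in this thread; the `explain` helper is made up for the example.

```python
import re

# A few curl exit codes and their libcurl names (see the libcurl error list).
CURL_ERRORS = {
    3: "CURLE_URL_MALFORMAT",
    6: "CURLE_COULDNT_RESOLVE_HOST",
    7: "CURLE_COULDNT_CONNECT",
    22: "CURLE_HTTP_RETURNED_ERROR",
    28: "CURLE_OPERATION_TIMEDOUT",
}


def explain(line):
    """Turn a 'Failed: N' log line into a libcurl error name, or None."""
    m = re.search(r"Failed:\s*(\d+)", line)
    if m is None:
        return None
    code = int(m.group(1))
    return CURL_ERRORS.get(code, "unknown curl error %d" % code)


print(explain("Downloading feed: url: ******* Failed: 6"))
```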

@ryandigweed

Looks like a nice RSS downloading daemon. However, there are some things that I like better in leech; the first is that leech is not a daemon :) i.e. no need to reload configuration or anything. But, hey, thanks for pointing it out, I'm looking into it right now :)

leech 0.6 is out: https://github.com/alekseyt/leech/downloads

[+] introduced default CONFIG_DIR=/etc/leech
[+] introduced download recipes: default, transmission
[+] introduced wild-downloads (default:WILD_DOWNLOADS)

[*] changes in leech-match-test interface, run w/o arguments to see usage
[*] leech-transmission is supposed to workaround backfire's inotify issue
[*] invalid HTTPS certificates are now ignored (https://github.com/alekseyt/leec
[*] faster leech
[*] no need to put empty lines at the end of configuration files anymore

This one is faster (less CPU and memory consumed). You can now also use CONFIG_DIR/wild-downloads as a simplified interface for download rules: just type what you want to get there, as in

my favorite show 720p
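
How leech turns such a line into a matching rule is not spelled out in this thread. One plausible reading, sketched in Python below, is "all the words, in this order, with anything in between, ignoring case". Treat that rule and the `wild_to_regex` name as assumptions made for illustration.

```python
import re


def wild_to_regex(line):
    """Build a case-insensitive regex that requires the words in order."""
    # Escape each word so characters like + or [ are taken literally.
    words = [re.escape(w) for w in line.split()]
    return re.compile(".*".join(words), re.IGNORECASE)


rule = wild_to_regex("my favorite show 720p")
print(bool(rule.search("My.Favorite.Show.S02E03.720p.HDTV")))
print(bool(rule.search("My.Favorite.Show.S02E03.480p.HDTV")))
```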

If you're on backfire 2.6 and Transmission crashes with watch dir enabled, configure leech to use leech-transmission recipe (see /etc/leech/default) - it will add torrents directly to transmission-daemon.

leech 0.8 is out: https://github.com/alekseyt/leech/downloads

Mostly bugfix release, but there is also new feature available:

0.8

[+] introduced QUIET_PERIOD that will hold files downloading 
    for some period of time (default:QUIET_PERIOD=)

[*] fixed issue with escaping of not-alphabet character in wild-downloads

You might use QUIET_PERIOD if you want to download a torrent some time after it appeared on the tracker.
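
A quiet period can be pictured as a simple age check on each feed item. The Python sketch below assumes the age is measured from the item's pubDate and that the period is given in seconds; neither detail is confirmed in this thread, and `is_ready` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_ready(pub_date, quiet_seconds, now=None):
    """True once the item has been public for at least quiet_seconds."""
    published = parsedate_to_datetime(pub_date)  # parses RFC 822 style dates
    if published.tzinfo is None:
        # No usable zone in the date: assume UTC
        published = published.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now - published >= timedelta(seconds=quiet_seconds)


one_hour_later = datetime(2013, 5, 8, 19, 0, tzinfo=timezone.utc)
print(is_ready("Wed, 08 May 2013 18:00:00 +0000", 1800, one_hour_later))
print(is_ready("Wed, 08 May 2013 18:00:00 +0000", 7200, one_hour_later))
```

Items that are not ready yet are simply skipped on this run and picked up by a later cron run.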

leech 0.10 is out. This release brings support for feeds having links to torrents inside of <enclosure> tag instead of <link>. https://bitbucket.org/alekseyt/leech/downloads

0.10

[+] support for RSS feeds with torrents in <enclosure> instead of <link>
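
For those unfamiliar with the difference: some feeds put the page URL in <link> and the actual torrent in the url attribute of <enclosure>. A Python sketch of picking the right one could look like this. Preferring <enclosure> over <link> is an assumption made here, not a statement about how leech decides.

```python
import xml.etree.ElementTree as ET


def torrent_url(item):
    """Prefer the <enclosure url="..."> attribute, fall back to <link>."""
    enclosure = item.find("enclosure")
    if enclosure is not None and enclosure.get("url"):
        return enclosure.get("url")
    return item.findtext("link")


# Hypothetical feed item with both elements present
item = ET.fromstring(
    '<item><title>Show</title>'
    '<link>http://example.com/details/1</link>'
    '<enclosure url="http://example.com/1.torrent" '
    'type="application/x-bittorrent" length="12345"/></item>'
)
print(torrent_url(item))
```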

Hey aleksey, are there any plans for a web ui (e.g. in luci or just a regular web page)?

@aleksey_t

I'm now running DD-WRT since I had some stability problems with OpenWrt.
And I want to drop my Python script in favor of your leech.

The problem is that I can't get it to work.
First I had to run the leech command as root from cron, then it couldn't find curl so I added a path.
But then I just get "curl: can't load library 'libcurl.so.4'".

I have tried adding all kinds of path variables to the leech file, but with no success.

So if you know what could be going wrong and how to fix it I would be VERY grateful, since for now I have to run it manually every day.

Thanks for a great script!

EDIT:

I can now answer this on my own.
After A LOT of testing I have found a solution. I'm running an rss.sh script from cron that contains the following:

#!/bin/sh
exec /bin/sh --login -c '
source /mnt/sda_part1/root/.profile
CONFIG_DIR="/opt/etc/leech" PERSISTENCE="/mnt/sda_part1/torrents/rss" DOWNLOADS_DIR="/mnt/sda_part1/Stuff/Rss" /opt/usr/sbin/leech >/tmp/leech.log 2>&1'

(Last edited by Khenke on 9 May 2013, 18:47)

@Khenke

Frankly, I probably can't answer that question without looking at your setup. It looks to me like your cron is running in a limited environment, but I have never experienced this on OpenWRT with `crontab -e`.

Did you install leech from the .ipk file or manually? Which cron file are you editing?

You could try to look at this Stackoverflow question: http://stackoverflow.com/questions/2388 … rect-paths

leech 0.11 is out: https://bitbucket.org/alekseyt/leech/downloads

This release adds features and fixes bugs, but it also removes some features. So if you want to upgrade, please pay attention to the changelog:

0.11

[+] leech will only handle simple comment lines starting with #
    all other lines are not considered to be comments
[+] removed TMP option from config (default:TMP=)
[+] reverse matching support (default:REVERSE_DOWNLOADS=)

[*] lunch download will follow redirects
[*] temporary files will be created with $(mktemp -t)
[*] redone leech-wild-magic
[*] some other under the hood changes

Reverse matching, introduced in this release, is partly covered here: https://bitbucket.org/alekseyt/leech#ma … g-filters. You can also refer to /etc/leech/reverse-downloads. It is basically the same as /etc/leech/downloads, but it is used to exclude files from downloads.
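
In other words, an item is fetched only if it matches a download rule and matches no reverse rule. A minimal Python sketch of that logic follows; the function name, the argument names and the sample rules are invented for the example, and leech itself does this in shell.

```python
import re


def wanted(title, downloads, reverse_downloads=()):
    """Keep a title if it matches a download rule and no reverse rule."""
    if any(re.search(p, title) for p in reverse_downloads):
        return False
    return any(re.search(p, title) for p in downloads)


downloads = [r"My\.Show.*720p"]
reverse = [r"HDTV"]
print(wanted("My.Show.S01E01.720p.WEB", downloads, reverse))
print(wanted("My.Show.S01E01.720p.HDTV", downloads, reverse))
```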

root@system:/tmp# opkg install leech_0.11-1_all.ipk
Installing leech (0.11-1) to root...
Collected errors:
* satisfy_dependencies_for: Cannot satisfy the following dependencies for leech:
*     curl *     xsltproc *
* opkg_install_cmd: Cannot install package leech.

I keep getting this error, despite "Package libcurl (7.29.0-1) installed in root is up to date."

What should I do?
