/tech/ - Tech

Technology.

(62.41 KB 768x768 rss.png)
Proper RSS setup thread Comrade 08/26/2020 (Wed) 16:48:31 No. 4423
Replacing registration on major, datamining media-hosting sites with a secure, sleek, easy RSS setup
Bunker/tech/ alpha draft

RSS SETUP (tor + QuiteRSS)
>Install: quiterss and tor
>Stylize: Options->Fonts and Colors->Colors: stylize to your own liking
>Secure QuiteRSS: Tools->Options->Browser: untick "embedded browser" and leave the "external browser" empty; under 'Content' untick everything
>Torify: Tools->Opt.->Network Connections: SOCKS5, proxy server: localhost, port: 9050
>Go to the desired Youtube channel page (or visit it via https://invidious.snopyta.org , which may be taken down soon)
>Right-click and pick View Page Source (for invidious the unique channel ID is already copy-pastable from the URL; it's the last string of text after "/channel/")
>For Youtube, on the View Page Source page, ctrl+f "/channel/" and pick out the unique channel ID
>Insert the unique channel ID (which looks like 'aAi1SEieasu124dSdsa45Ddas') into the end of the following RSS-ready link: https://www.youtube.com/feeds/videos.xml?channel_id=ENTER_UNIQUE_ID_HERE
(For Soundcloud see the bottom of this post.)
>Paste the completed channel link into your RSS reader.

MEDIA PLAYER STREAMING (torsocks + SMPlayer + mpv)
>Install: torsocks, youtube-dl, smplayer and mpv
>Stylize: Open SMPlayer->Options->Preferences->Interface: stylize to your own liking
>Torify part 1: SMPlayer->Opt.->Pref.->Network->Proxy: Enable proxy, Type (bottom of list): SOCKS5, Host: localhost, Port: 9050
>Torify part 2: SMPlayer->Opt.->Pref.->Network: support for video sites: mpv + youtube-dl

<Final product (pictures for demo)
>Select a recent RSS feed notification from one of the channels you're syndicated to
>Click "copy URL"
>Select SMPlayer->CTRL+U->double-enter
Voilà

<Extra:
>Guide for Soundcloud:
>"View page source" on the given channel page
>Ctrl+f "soundcloud://users:"
>The unique string is the "USERID" for the following RSS-ready link: https://feeds.soundcloud.com/users/soundcloud:users:USERID/sounds.rss
>Enter the completed link into the RSS reader

Meta:
Note: This is the best I've tinkered out on my own, but I suspect some of these steps are overly complicated, which is why I labelled this an 'alpha draft' -- let's cooperate and see if there are even better ways of doing this while still retaining the security perks. For one, I feel like there must be an easier technique for producing the RSS link than what I'm doing. Secondly, maybe there's also a more integrated media player than SMPlayer+mpv that still allows for torification of youtube streams (via youtube-dl).
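For the scripting-inclined, here is a minimal Python sketch of the channel-ID step above (an illustration, not part of the tested setup): fetch the channel page, grep out the 24-character channel ID and build the RSS-ready link. The user agent string is a placeholder, and the request is not torified -- in the setup above you would run it behind torsocks.

import re
import urllib.request as urq

def feed_url (channel_page_url):
    req = urq.Request (channel_page_url,
                       headers = {"User-Agent": "Mozilla/5.0 Firefox/68.0"})
    html = urq.urlopen (req, timeout = 30).read ().decode ("utf-8", "replace")
    # channel IDs are 24 chars of [A-Za-z0-9_-], appearing as /channel/<ID>
    m = re.search (r'/channel/([A-Za-z0-9_-]{24})', html)
    if m is None:
        raise ValueError ("no channel id found in " + channel_page_url)
    return "https://www.youtube.com/feeds/videos.xml?channel_id=" + m.group (1)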
>>4423
Brainlet here, two questions:
1) Can you please explain the security part better?
2) Once I've created such an RSS feed, how can I make it usable by more people?
>>4423
Is there a way to do this with the suggested videos on your home page, so that you never actually have to go to youtube again?
jesus christ this is all gui faggotry
(1.07 MB 1280x1043 elfeed.png)
>>4423
Damn, I never thought about using RSS over Tor, pretty neat. But how slow is this for loading long lists of feeds? And what if a site blocks Tor and you never notice, and just stop receiving feeds from that site?
Personally, I just use elfeed in Emacs. It's great to be able to take notes on news articles and such using org-mode, and also to watch videos in mpv via youtube-dl. Pic related; notice I can follow this board over RSS, although the links don't work properly.
>>4427
No, recommendations only work if you have an account through which they can track you. And in any case, if you're still receiving youtube recommendations, then you're still not free from youtube. After switching to watching youtube over RSS, I no longer spend hours upon hours watching shitty recommended videos. I only get what matters most now. It's pretty nice.
>>4430
Well, the algorithm helps me discover new things. It's not so much about wanting the recommendations in and of themselves, but I need a way to discover new feeds. This is really based, tho.
>>4429
<Autism
You don't have to use a GUI. Any RSS reader will work. I use newsboat.
>>4435
That's understandable, but how often do you honestly find good content through youtube itself? Maybe your experience has been different from mine, but all the good channels I watch I've discovered from other places, whether recs from friends, imageboards, mentions by other channels, and so on. It's more word of mouth than anything else. This is the main reason I haven't missed YouTube after ditching it completely. Maybe you could give it a try; I believe there are benefits to doing so.
>>4427
If you learn how to scrape web pages this becomes trivial in most cases. Sometimes you need to reverse-engineer how session IDs are generated and shit like that, but that's extremely rare and it's not too much work anyway. I automate most of my web browsing with tor+scraping so I don't have to deal with bullshit web design and tracking. Anything can be scraped, and javascript completely avoided even on javascript-dependent sites. When Tor is blocked I just bypass it by adding a free web proxy between Tor and the server (it's dumb, but it works with much less effort than proper Tor+VPN solutions). I use libcurl, which is really great at fine-tuning connections, so I can exactly emulate using Tor Browser at the highest security level (my scripts pass the amiunique and panopticlick tests).
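For illustration, the basic torified fetch in that kind of setup might look roughly like this pycurl (libcurl bindings) sketch -- the header values here are placeholders, not an actual Tor Browser fingerprint emulation:

import io
import pycurl

def fetch (url):
    buf = io.BytesIO ()
    c = pycurl.Curl ()
    c.setopt (c.URL, url)
    # socks5h:// makes libcurl resolve hostnames through the proxy,
    # so DNS queries don't leak outside Tor
    c.setopt (c.PROXY, "socks5h://127.0.0.1:9050")
    # placeholder headers; a real setup would mirror Tor Browser exactly
    c.setopt (c.HTTPHEADER, ["User-Agent: Mozilla/5.0 Firefox/68.0",
                             "Accept-Language: en-US,en;q=0.5"])
    c.setopt (c.WRITEDATA, buf)
    c.setopt (c.TIMEOUT, 30)
    c.perform ()
    c.close ()
    return buf.getvalue ()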
(1.50 MB 1280x1280 ARPANET-centralization-2015.png)
(161.39 KB 800x450 cancers-2015.jpg)
(678.97 KB 1200x1121 top-100-websites-2019.jpg)
>>4426
>1) Can you please explain the security part better?
The security measures involve disabling vectors of exploitation such as the embedded browser, which, due to its context within a small RSS reader, has a lot of code but not that many eyes on it (which browser security requires). This leads me to conclude that it's better to disable it. Torifying your RSS syndication is a great way of limiting third-party fingerprinting of your unique person online (as that's what your very specific set of subscriptions amounts to).
>2) Once I've created such an RSS feed, how can I make it usable by more people?
That's what I'm trying to develop collectively ITT.
>>4429
>jesus christ this is all gui faggotry
For ease of use by the masses, not the 1337 lifestyle echelon.
>>4430
>how slow is this for loading long lists of feeds?
I'm not sure what you mean? (the following takes me 9 sec) QuiteRSS: ctrl+a -> copy link(s) (of, let's say, the Cockshott feed) -> SMPlayer: ctrl+l (playlist) -> Add (URLs) -> ctrl+v -> [enter] on the first in the list (for potentially hours of content autoplaying without issues).
>And what if a site blocks Tor and you never notice and just stop receiving feeds from that site?
I have never experienced this. I don't think it's common for podcast-hosting sites to block tor (it isn't for youtube or soundcloud either).
Images attached to demonstrate why this is so important.
Any programmers able to make an easy-to-use program out of all of this, with configs pre-made? It could be the first Bunker/tech/ software project.
>>4430
>Personally, I just use elfeed in Emacs
BASED
>>4447
>>how slow is this for loading long lists of feeds?
>I'm not sure what you mean?
I was asking about speed because the Tor network is generally slower than clearnet browsing. Perhaps at scale, after lots of feeds are added, this setup could take a long time to pull from all feeds.
>(the following takes me 9 sec)
Hmmm, I don't have any benchmarks, but this does seem somewhat slow. I think my setup pulls an individual feed a lot faster than that, and can pull a long list of feeds in about the same time. I understand your goal here is to maximize privacy by keeping separate feed-pulling and browsing profiles using Tor, but the practicality and general comfort of browsing still matter. Could you try this setup with a crapton of feeds added, and measure how long it takes? See below.
I have quickly thrown together a list of Youtube feeds based on the Beardtube list, using some Editor MACros. I do not know if all of these are working, because some URLs are different from others (/c/ or /user/ instead of /channel/, not sure what that means), but most should be fine. You can easily remove the channel names using awk, if that makes it easier for you. The list can be found in this paste (privatebin):
https://bin.disroot.org/?f37a0a294739c937#E8Esiw75nTBWPN1KzmAe1LR7NGcBQAA3rNMbTuKo1xke
Overall I gotta say RSS is very comfy, and a pretty good way to get people to see why a decentralized internet is important. And you know, making that list of feeds gave me an idea... Should we turn the Bunkerchan Approved RSS list into a git repo to which users can contribute? That way we could distribute the feeds in multiple formats, like plain text, OPML, JSON, and a nicely formatted README.md on the front page including the full list, so people can browse more easily. The matter of how to distribute the feeds was brought up often, and I do not think a simple pastebin would suffice, and a static file makes it harder to contribute and add more feeds.
>>4449
Emacschads ASSEMBLE!
>>4447
>Torifying your RSS syndication is a great way of limiting third-party fingerprinting of your unique person online (as that's what your very specific set of subscriptions amounts to).
Not true. Torifying sans Tor Browser only hides your IP from the server and the website from your ISP, but it does nothing for your fingerprint. These existing RSS readers will use their own user agents and other HTTP headers, which will make you unique. The only real solution currently, and this is what I have done myself, is to write your own RSS reader that exactly emulates Tor Browser in all its behavior, so that the fingerprint is exactly the same as if you were just another Tor Browser user. It is not hard to achieve this for your own private use, but it's hard to pass it on to the masses: whereas it takes only a couple of minutes to write yet another scraper script, it takes a huge amount of work to then turn each of these scripts into a different stable GUI program. If more people could code you'd just create a single library with a good API, and then people could use it for whatever they want.
>>4454
You misunderstood what I was describing. Loading the feeds takes 2-3 seconds. Tor really isn't that slow anymore; I'm using a VPN on top of it and I can stream 720p videos from the Invidious Snopyta onion (so seven hops in combination) without the stream needing to buffer. What I was describing in the previous post was just the time it took me to set up a given playlist.
>>4449
>>4454
>>Personally, I just use elfeed in Emacs
>BASED
>Emacschads ASSEMBLE!
Wouldn't it be more appropriate, considering our political context, to make an easily usable, secure-out-of-the-box RSS+streaming setup that is usable by common folk? Maybe you missed this, but that is what >>4413 (A Bunkerchan approved Youtube feed) and many people in >>>/leftypol/747700 (/Beardtube/ Growing Thread) are in need of.
>>talking about rss over tor
>JavaScript is required for PrivateBin to work. Sorry for the inconvenience.
>>4455
This is more what I had in mind for this thread.
>These existing RSS readers will use their own user agents and other HTTP headers, which will make you unique.
Are you saying QuiteRSS uses individually unique IDs, or that it's simply different from the Tor Browser's (which is to be expected)?
>The only real solution currently, and this is what I have done myself, is to write your own RSS reader that exactly emulates Tor Browser in all its behavior, so that the fingerprint is exactly the same as if you were just another Tor Browser user.
>It is not hard to achieve this for your own private use, but it's hard to pass it on to the masses
We could try to do this here, comrade. Where else, really? We are a primary demographic in need of such a thing. We should do it.
>>4456 I don't care about your tankie cringe
>>4456
>Loading the feeds takes 2-3 seconds.
>I'm using a VPN on top of it and I can stream 720p videos from the Invidious Snopyta onion (so seven hops in combination) without the stream needing to buffer
That is very impressive, actually. Seems my concerns about slowness were unfounded.
>Wouldn't it be more appropriate, considering our political context, to make an easily usable, secure-out-of-the-box RSS+streaming setup that is usable by common folk?
I agree, hence my asking about the speed and usability of your proposed setup. It's also why I mentioned making the git repo thingy, which would make contribution, distribution and presentation of the feeds easier. Not sure what the issue was here.
>>4457
Huh. I think privatebin decrypts the contents of the paste in-browser, so that the server has no access to it, which requires JS. Not sure if it is LibreJS-compatible. Do you have any suggestions, though? Pastebin is not known for being private.
>>4458
I believe there are RSS reader extensions for web browsers. It shouldn't be too hard to take one of those and adapt it to these specific needs... right? I have no technical knowledge of this stuff, but I understand Tor Browser is based on Firefox and can take Firefox extensions. Wouldn't this allow one to have the same fingerprint/useragent as Tor Browser itself when using this reader?
>>4460
>Wouldn't this allow one to have the same fingerprint/useragent as Tor Browser itself when using this reader?
From what I've heard, the Tor Project does not recommend installing additional add-ons to the Tor Browser, as this apparently definitely alters your fingerprint. Yet, for some reason, TailsOS at least feels okay enough to pre-install their package of the Tor Browser with an adblock add-on, so I'm not entirely sure how severe the alteration of the fingerprint actually is, or whether it depends on which add-on one is talking about and/or how it's configured.
>>4459
Lmao, first time in my life I've been called a tankie. Good luck out there, young lib-anon.
>>4457
Technically a Javascript "requirement" is not really a problem most of the time; it can mean that you have to hunt down the URL only once, or that there's no permanent URL and you have to dynamically generate it on each visit. But you don't need a browser with a javascript engine for that, it can all be automated. It would still require an RSS reader that supports some kind of extensions, though.
>>4458
It is a different fingerprint from Tor Browser's. Might not seem a huge problem, but Tor anonymity relies on uniformity, i.e. ideally everybody looks like the same person. Any change to the current TB fingerprint and you're a different person. Different RSS readers, different versions of those readers, and it probably becomes quite easy to single you out. Tbf I'm only autistic about this because people need to know how fingerprinting technically works. Otherwise you get FUD that blames the Tor network itself for users' own mistakes. And to be honest TB itself is not purely uniform either. For example, browser devs still haven't found a way to completely bypass viewport fingerprinting (the size of the visible area, which depends on browser window size). And if you don't use a browser you simply don't have a viewport, which is in itself just another piece of fingerprint.
>We could try to do this here, comrade.
I can try cleaning up my scraping library that emulates the Tor Browser fingerprint, but it's in a chaotic pre-alpha state with no documentation. I have no experience with UI development; my scripts just shit out text into the terminal and that's it. IMO it would make more sense to work on the TorBirdy add-on for Thunderbird rather than reinvent the wheel. IIRC Thunderbird also works as an RSS reader. TorBirdy is still in beta and the last version is two years old, so it probably doesn't match the current Tor Browser fingerprint either, and lacks a bunch of other improvements. Thunderbird has support for custom add-ons though, so shit like javascript requirements might be bypassed as well. It also already has a large user base.
https://trac.torproject.org/projects/tor/wiki/torbirdy
>>4460
>I believe there are RSS reader extensions for web browsers.
Yeah, this is another approach. But they probably lack some nice quality-of-life features that proper RSS readers have.
>>4461
It depends on the add-ons. The problem with adblockers is that they load and display an existing page differently than non-adblock users, so it can be detected whether you use one or not, and which specific filters you use -- based on which resources you did or didn't load and whether certain elements on the page are present or not. On the other hand, if an RSS add-on simply fetched only RSS feeds in the background and then displayed the list in a separate tab... then I don't think it could cause any harm. As far as I know websites can't directly detect which add-ons you have installed; they can just make assumptions based on how their own website was modified. Another harmful example is a zoom add-on.
>>4456
As one of the anons that made the thread, I 100% confirm that.
>>4462
>Tbf I'm only autistic about this because people need to know how fingerprinting technically works.
No, I totally agree.
>IMO it would make more sense to work on the TorBirdy add-on for Thunderbird rather than reinvent the wheel.
TorBirdy has without a doubt been the go-to solution for this problem previously, but a couple of months(/years?) back something happened with Thunderbird's development that made TorBirdy development halt and become close to impossible. I think it had something to do with a change in core functionality; I will try to find the links after this post. TorBirdy as of this time apparently leaks due to this (i.e. you upgraded Thunderbird a couple of months back and bam, it's been leaking ever since). This situation contributed greatly to my creation of this thread.
P.S. The TailsOS team talked of some stopgap regression patch for their operating system ("temporarily"/perpetually), but as you said, TorBirdy itself hasn't been able to be patched since all the Thunderbird changes that broke it.
Here is a quick hack for the merge step after retrieval.

import calendar
import sys
import time
import xml.dom.minidom as mini

class C:
    TIMEFMT = "%Y-%m-%dT%H:%M:%S+00:00"
    EMPTYFEED = """\
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns:yt="http://www.youtube.com/xml/schemas/2015" xmlns:media="http://search.yahoo.com/mrss/" xmlns="http://www.w3.org/2005/Atom">
 <title>merged feed</title>
 <author>
  <name>comrade</name>
 </author>
</feed>"""

# concatenate the text child nodes of an element
def nodetext (node):
    out = []
    for n in node.childNodes:
        if n.nodeType == n.TEXT_NODE:
            out.append (n.data)
    return ''.join (out)

# feed timestamp -> seconds since the epoch
def parsetime (s):
    t = time.strptime (s, C.TIMEFMT)
    return calendar.timegm (t)

# prefer 'updated' over 'published'; 0 if neither is present
def entrytime (entry):
    nodes = entry.getElementsByTagName ("updated")
    if nodes:
        return parsetime (nodetext (nodes [0]).strip ())
    nodes = entry.getElementsByTagName ("published")
    if nodes:
        return parsetime (nodetext (nodes [0]).strip ())
    return 0

# gather the entries of all feeds and append them to an empty
# skeleton feed, newest first
def merge_docs (docs):
    merged = mini.parseString (C.EMPTYFEED)
    mfeed = merged.documentElement
    all = []
    for d in docs:
        entries = d.documentElement.getElementsByTagName ("entry")
        all.extend (entries)
    _et = entrytime
    keys = [(_et (e), k) for k, e in enumerate (all)]
    keys.sort (reverse = True)
    for sec, k in keys:
        mfeed.appendChild (all [k])
    return merged

def main_merge (paths):
    docs = [mini.parse (p) for p in paths]
    merged = merge_docs (docs)
    out = "merged.xml"
    with open (out, "w") as f:
        merged.writexml (f, indent = "", addindent = "", newl = "", encoding = "UTF-8")
    merged.unlink ()
    for d in docs:
        d.unlink ()

def main ():
    main_merge (sys.argv [1:])

if __name__ == "__main__":
    main ()

This can also be done with the likes of bs4, but it was done this way to avoid dependencies outside the standard library.
Taking the first two feeds from >>>/leftypol/747700 as
https://www.youtube.com/feeds/videos.xml?channel_id=UCSm1_XO-zvR0ToSJYMljmPA -> ~/videos1.xml
https://www.youtube.com/feeds/videos.xml?channel_id=UCl1bDSVi34xE65YzreytVxQ -> ~/videos2.xml
it can be run as

$ python3 rsstool.py ~/videos1.xml ~/videos2.xml

Output goes into merged.xml. Here's an example: http://0x0.st/iEqS.xml
Any number of files can be passed as arguments, but keep it sane.
>orange
Since the bunkerchan code tags are broken, here's the source:
https://paste.textboard.org/18ce1407/raw
>>4447 Lmfao, xnxx is one of the largest sites on the internet?
I thought "updated" was something material instead of fluff like comments. Here's the merge in "published" order:
https://paste.textboard.org/0392188f/raw
And the example:
http://0x0.st/iEc8.xml
>>4470
I looked some more into why TorBirdy development stopped, and apparently Thunderbird 68+ breaks it because the add-on can't set the necessary preferences anymore.
https://trac.torproject.org/projects/tor/ticket/31341
The work by the Tails team seems very promising though; they made some patches to Thunderbird, one of which has already been accepted by Mozilla.
https://gitlab.tails.boum.org/tails/tails/-/issues/17281#note_1130
Unfortunately the "YouTube RSS Feeds server" doesn't seem to support If-Modified-Since and 304 responses, and sends the entire feed again with 200.

$ python3
[...]
>>> import itertools
>>> import urllib.request as urq
>>> def showresp (resp):
...     print ("->", resp.status, resp.reason)
...     print ("---[ response headers ]---")
...     print ("\n".join (itertools.starmap ("{}: {}".format, resp.headers.items ())))
...
>>> headers = {"User-Agent": "Mozilla/5.0 Firefox/66.0"}
>>> req = urq.Request ("https://www.youtube.com/feeds/videos.xml?channel_id=UCSm1_XO-zvR0ToSJYMljmPA", headers = headers)
>>> resp = urq.urlopen (req, timeout = 30)
>>> showresp (resp)
-> 200 OK
---[ response headers ]---
Content-Type: text/xml; charset=UTF-8
Date: Fri, 28 Aug 2020 23:27:16 GMT
Expires: Fri, 28 Aug 2020 23:42:16 GMT
Cache-Control: public, max-age=900
Server: YouTube RSS Feeds server
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Alt-Svc: h3-29=":443"; ma=2592000,h3-27=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
Accept-Ranges: none
Vary: Accept-Encoding
Connection: close
Transfer-Encoding: chunked
>>> b = resp.read ()
>>> len (b)
17048
>>> headers = {"User-Agent": "Mozilla/5.0 Firefox/66.0", "If-Modified-Since": "Fri, 28 Aug 2020 23:27:16 GMT"}
>>> req = urq.Request ("https://www.youtube.com/feeds/videos.xml?channel_id=UCSm1_XO-zvR0ToSJYMljmPA", headers = headers)
>>> resp = urq.urlopen (req, timeout = 30)
>>> showresp (resp)
-> 200 OK
---[ response headers ]---
Content-Type: text/xml; charset=UTF-8
Date: Fri, 28 Aug 2020 23:27:16 GMT
Expires: Fri, 28 Aug 2020 23:42:16 GMT
Server: YouTube RSS Feeds server
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Cache-Control: public, max-age=900
Age: 74
Alt-Svc: h3-29=":443"; ma=2592000,h3-27=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
Accept-Ranges: none
Vary: Accept-Encoding
Connection: close
Transfer-Encoding: chunked
>>> b = resp.read ()
>>> len (b)
17048

It doesn't seem to support resuming downloads either, but feeds should be small.
>>4490
Try invidious RSS feeds. Even if it also ignores If-Modified-Since, you should use invidious just to avoid direct connections to YouTube. The URL is https://invidio.us/feed/channel/channel_id
>>4491
Thanks for the link. There doesn't seem to be a difference in the lack of If-Modified-Since support, but there is no Expires header this time:

>>> headers = {"User-Agent": "Mozilla/5.0 Firefox/66.0"}
>>> req = urq.Request ("https://invidio.us/feed/channel/UCSm1_XO-zvR0ToSJYMljmPA", headers = headers)
>>> resp = urq.urlopen (req, timeout = 30)
>>> showresp (resp)
-> 200 OK
---[ response headers ]---
Server: nginx
Date: Sat, 29 Aug 2020 10:03:35 GMT
Content-Type: application/atom+xml
Connection: close
X-Frame-Options: sameorigin
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Content-Security-Policy: default-src 'none'; script-src 'self'; style-src 'self' 'unsafe-inline'; img-src 'self' data:; font-src 'self' data:; connect-src 'self'; manifest-src 'self'; media-src 'self' blob: https://*.googlevideo.com:443
Referrer-Policy: same-origin
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
>>> b = resp.read ()
>>> len (b)
19689
>>> headers = {"User-Agent": "Mozilla/5.0 Firefox/66.0", "If-Modified-Since": "Sat, 29 Aug 2020 10:03:35 GMT"}
>>> req = urq.Request ("https://invidio.us/feed/channel/UCSm1_XO-zvR0ToSJYMljmPA", headers = headers)
>>> resp = urq.urlopen (req, timeout = 30)
>>> showresp (resp)
-> 200 OK
---[ response headers ]---
Server: nginx
Date: Sat, 29 Aug 2020 10:05:49 GMT
Content-Type: application/atom+xml
Connection: close
X-Frame-Options: sameorigin
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Content-Security-Policy: default-src 'none'; script-src 'self'; style-src 'self' 'unsafe-inline'; img-src 'self' data:; font-src 'self' data:; connect-src 'self'; manifest-src 'self'; media-src 'self' blob: https://*.googlevideo.com:443
Referrer-Policy: same-origin
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
>>> b = resp.read ()
>>> len (b)
19689

The feed is structurally modified in some harmless ways, like moving media:community out of media:group. But there is also loss of information, because 'updated' and media:starRating are removed. These do not seem important enough to outweigh the benefit of having two sources for parallel download.
>>4500
The difference is also that in the invidious one all YouTube URLs are replaced with invidio.us ones. So you can't really mix and match; one will link you to a YouTube video page and the other to the invidio.us page for the same video. As for If-Modified-Since... Just update the feed periodically regardless and check for new items against the old copy. Each individual feed is not large; it only becomes a problem if you follow a gazillion of them. In that case maybe some scheduling is necessary, where you spread the updates across the whole time interval.
>>4502
>The difference is also that in the invidious one all YouTube URLs are replaced with invidio.us ones.
That one is obvious enough that I didn't think it was worth dwelling on, since it's part of the point of that site.
>So you can't really mix and match; one will link you to a YouTube video page and the other to the invidio.us page for the same video.
The links seem convertible in both directions. It is the loss of 'updated' and media:starRating that cannot be reversed.
>it only becomes a problem if you follow a gazillion of them
Yeah, about that:

$ wget -q -O - 'https://bunkerchan.xyz/leftypol/res/747700.html' | sed -e 's/&#95;/_/g' | grep -E -o -e 'https://www[.]youtube[.]com/channel/[[:alnum:]_-]{24}' | sort -u | wc
     61      61    3477

So 304 support would have been useful.
>>4503
>So 304 support would have been useful.
Then spread them out. It can be as dumb as this:

for feed in feeds:
    update (feed)
    time.sleep (interval / len (feeds))

If you set interval to 1 hour it will sleep for roughly 1 minute between each feed's update, given that you have 61 feeds. To ensure the sleep time is at least some value:

time.sleep (max (interval / len (feeds), min_sleep))

And nothing stops you from using If-Modified-Since for servers that support it. I assume you're not interested only in YouTube channels. If you're really intent on not fetching the whole feed needlessly then you could fetch just until the necessary chunk that tells you the date. I don't know if urllib gives you that much control, but it's possible with pycurl, where you can drive the transfer manually.
>>4504
>And nothing stops you from using If-Modified-Since for servers that support it. I assume you're not interested only in YouTube channels.
I'm not sufficiently into videos+rss that I would know of a feed server with If-Modified-Since support that I could test on.
>you could fetch just until the necessary chunk that tells you the date. I don't know if urllib gives you that much control
I like this idea. urllib.request has partial reads, just not on the headers. But the "YouTube RSS Feeds server" supports HEAD requests. Unfortunately it updates the date every 15 minutes on unchanged feeds and doesn't send Last-Modified.

$ wget --user-agent="Mozilla/5.0 Firefox/66.0" --server-response --method=HEAD 'https://www.youtube.com/feeds/videos.xml?channel_id=UCSm1_XO-zvR0ToSJYMljmPA'
Spider mode enabled. Check if remote file exists.
[...]
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Content-Type: text/xml; charset=UTF-8
  Date: Sat, 29 Aug 2020 19:50:38 GMT
  Expires: Sat, 29 Aug 2020 20:05:38 GMT
  Cache-Control: public, max-age=900
  Server: YouTube RSS Feeds server
  X-XSS-Protection: 0
  X-Frame-Options: SAMEORIGIN
  Transfer-Encoding: chunked
  Alt-Svc: h3-29=":443"; ma=2592000,h3-27=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
  Accept-Ranges: none
  Vary: Accept-Encoding
Length: unspecified [text/xml]
Remote file exists.

>it's possible with pycurl, where you can drive the transfer manually
All the curls are great, but I'd rather stick to the standard library for now.
>>4505
You could still read into the feed body itself and compare the first <entry><id> value against the first entry in the old copy. There doesn't seem to be too much fluff before the first entry, but I'm looking only at an invidious feed of a single channel.
The only other approach I can think of atm is inspecting the various requests that YouTube's website makes in the background. You might find some useful URL that returns a short response from which you can determine whether the channel was updated. This is a long shot, but you never know. JS-bloated websites make a shit ton of redundant requests in the background that become very useful to web scrapers. It's like dumpster diving.
I don't know how far you're into the project, but tbh I'd leave hacky and unreliable optimizations like these for later. Even with ~60 YouTube feeds it's not necessary IMO, and I'd rather add some delay between each feed update so that they're not all triggered immediately one after the other. You would probably do this anyway, even if optimized.
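A rough sketch of that first-entry check, using the same minidom approach as the merge hack above (hypothetical helper; it assumes the feed lists entries newest-first):

import xml.dom.minidom as mini

def first_entry_id (doc):
    entries = doc.documentElement.getElementsByTagName ("entry")
    if not entries:
        return None
    ids = entries [0].getElementsByTagName ("id")
    return ids [0].firstChild.data.strip () if ids else None

# True if the freshly fetched bytes lead with a different entry
# than the old copy on disk
def first_entry_changed (new_bytes, old_path):
    new_doc = mini.parseString (new_bytes)
    old_doc = mini.parse (old_path)
    changed = first_entry_id (new_doc) != first_entry_id (old_doc)
    new_doc.unlink ()
    old_doc.unlink ()
    return changed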
Here is a simple version with sequential downloads.
http://0x0.st/iEpx.py
It takes a file of yt feed urls, one per line, but other types will be added. There are some settings to customize in the C class. The previous merge >>4478 reversed the order of equal-time entries; even though there were no equal-time entries, this has been corrected.
For testing, 10 random feeds can be obtained from >>>/leftypol/747700 with:

$ wget -q -O - 'https://bunkerchan.xyz/leftypol/res/747700.html' | sed -e 's/&#95;/_/g' | grep -E -o -e 'https://www[.]youtube[.]com/channel/[[:alnum:]_-]{24}' | sort -u -R | head -n 10 | sed -r -e 's#^.*/([^/]{24})$#https://www.youtube.com/feeds/videos.xml?channel_id=\1#' > feeds.txt

Once a feed list file is available it can be retrieved and merged with:

$ python3 rsstool.py getlistmerge feeds.txt

The result is in merged.xml by default, which can be given to an rss client. A sample run:

$ python3 rsstool.py getlistmerge feeds.txt
get https://www.youtube.com/feeds/videos.xml?channel_id=UCkebf5OI9FU6kr77VhUFzTg
OK 4906 -> 27851 bytes for https://www.youtube.com/feeds/videos.xml?channel_id=UCkebf5OI9FU6kr77VhUFzTg
get https://www.youtube.com/feeds/videos.xml?channel_id=UC5GYwuvmAD_VyV6w5aFnnUw
OK 5575 -> 29801 bytes for https://www.youtube.com/feeds/videos.xml?channel_id=UC5GYwuvmAD_VyV6w5aFnnUw
get https://www.youtube.com/feeds/videos.xml?channel_id=UClJF8jKPt0E6Za-LYj70agQ
OK 1633 -> 5687 bytes for https://www.youtube.com/feeds/videos.xml?channel_id=UClJF8jKPt0E6Za-LYj70agQ
get https://www.youtube.com/feeds/videos.xml?channel_id=UCl_A_42M6kvjH8Gr-rwfCUw
OK 5025 -> 13897 bytes for https://www.youtube.com/feeds/videos.xml?channel_id=UCl_A_42M6kvjH8Gr-rwfCUw
get https://www.youtube.com/feeds/videos.xml?channel_id=UCYJh43ubWNEjOkU3HcUL-pQ
OK 17119 -> 52169 bytes for https://www.youtube.com/feeds/videos.xml?channel_id=UCYJh43ubWNEjOkU3HcUL-pQ
get https://www.youtube.com/feeds/videos.xml?channel_id=UCd1Ze_UknxhxpK9Lvi5rjYw
OK 5021 -> 22987 bytes for https://www.youtube.com/feeds/videos.xml?channel_id=UCd1Ze_UknxhxpK9Lvi5rjYw
get https://www.youtube.com/feeds/videos.xml?channel_id=UCYM7I0m-I9EVB-5gaBqiqbg
OK 2785 -> 18224 bytes for https://www.youtube.com/feeds/videos.xml?channel_id=UCYM7I0m-I9EVB-5gaBqiqbg
get https://www.youtube.com/feeds/videos.xml?channel_id=UCEBbylt9Rax3nOP_hyPnMPA
OK 4588 -> 26706 bytes for https://www.youtube.com/feeds/videos.xml?channel_id=UCEBbylt9Rax3nOP_hyPnMPA
get https://www.youtube.com/feeds/videos.xml?channel_id=UCCvdjsJtifsZoShjcAAHZpA
OK 6589 -> 32738 bytes for https://www.youtube.com/feeds/videos.xml?channel_id=UCCvdjsJtifsZoShjcAAHZpA
get https://www.youtube.com/feeds/videos.xml?channel_id=UCcgATilLYyjhDhZnjvrKBCw
OK 5046 -> 22736 bytes for https://www.youtube.com/feeds/videos.xml?channel_id=UCcgATilLYyjhDhZnjvrKBCw

>>4506
>You could still read into the feed body itself and compare the first <entry><id> value against the first entry in the old copy.
That relies on sorted feeds, an assumption I haven't used yet. And it would indeed cross the line into "hacky and unreliable optimizations". My goal was to help the server avoid sending the response body at all; that's why I tried HEAD requests after your previous suggestion.
>The only other approach I can think of atm is inspecting the various requests that YouTube's website makes in the background.
OK, but for that I'd have to visit yt with scripts enabled, which I'm not inclined to do.
>I'd leave hacky and unreliable optimizations like these for later.
Complexity will only be added gradually, of course.
But if 304 support were available it wouldn't be a "hacky and unreliable optimization"; it would be standard HTTP behavior, and it would make sense to incorporate it early on.
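For reference, a conditional GET in the urllib style used above is just one extra header plus catching the 304 (a sketch; note that urlopen raises HTTPError on a 304 response):

import urllib.error
import urllib.request as urq

# returns the new feed bytes, or None if the server answered 304 Not Modified
def conditional_get (url, last_modified):
    headers = {"User-Agent": "Mozilla/5.0 Firefox/66.0",
               "If-Modified-Since": last_modified}
    req = urq.Request (url, headers = headers)
    try:
        return urq.urlopen (req, timeout = 30).read ()
    except urllib.error.HTTPError as e:
        if e.code == 304:
            return None
        raise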
Woah, can someone translate the programming work above into layman's terms? What's being worked on rn?
Added modification support for the merged feed output file.
http://0x0.st/iEkN.py
If a previous merged feed exists and the new content would match, the file is not modified.

$ python3 rsstool.py mergelist feeds.txt
unchanged merged.xml

The result is the same after a retrieval step that yields component feeds with unchanged content. This means that a cooperating client and server can detect the no-modification condition early. As a demonstration, the python SimpleHTTP server can be run in another terminal with:

$ python3 -m http.server 8000 --bind 127.0.0.1

The merged feed is downloaded to a different directory:

~/Downloads$ wget --server-response 'http://127.0.0.1:8000/merged.xml'
--2020-09-01 11:36:38-- http://127.0.0.1:8000/merged.xml
Connecting to 127.0.0.1:8000... connected.
HTTP request sent, awaiting response...
  HTTP/1.0 200 OK
  Server: SimpleHTTP/0.6 Python/3.6.9
  Date: Tue, 01 Sep 2020 11:36:38 GMT
  Content-type: application/xml
  Content-Length: 138688
  Last-Modified: Tue, 01 Sep 2020 10:29:54 GMT
Length: 138688 (135K) [application/xml]
Saving to: ‘merged.xml’

merged.xml 100%[=======>] 135,44K --.-KB/s in 0s

2020-09-01 11:36:38 (332 MB/s) - ‘merged.xml’ saved [138688/138688]

A no-modification merge:

$ python3 rsstool.py mergelist feeds.txt
unchanged merged.xml

Wget is told to use If-Modified-Since:

~/Downloads$ wget --server-response --timestamping 'http://127.0.0.1:8000/merged.xml'
--2020-09-01 11:39:38-- http://127.0.0.1:8000/merged.xml
Connecting to 127.0.0.1:8000... connected.
HTTP request sent, awaiting response...
  HTTP/1.0 200 OK
  Server: SimpleHTTP/0.6 Python/3.6.9
  Date: Tue, 01 Sep 2020 11:39:38 GMT
  Content-type: application/xml
  Content-Length: 138688
  Last-Modified: Tue, 01 Sep 2020 10:29:54 GMT
Server ignored If-Modified-Since header for file ‘merged.xml’.
You might want to add --no-if-modified-since option.

The file is not downloaded due to Last-Modified, but SimpleHTTP does not support If-Modified-Since, so it tried to send a response body as well. This can be fixed by telling wget to use HEAD requests instead, to which SimpleHTTP will respond with Last-Modified, unlike the "YouTube RSS Feeds server" >>4505.

~/Downloads$ wget --server-response --timestamping --no-if-modified-since 'http://127.0.0.1:8000/merged.xml'
--2020-09-01 11:45:15-- http://127.0.0.1:8000/merged.xml
Connecting to 127.0.0.1:8000... connected.
HTTP request sent, awaiting response...
  HTTP/1.0 200 OK
  Server: SimpleHTTP/0.6 Python/3.6.9
  Date: Tue, 01 Sep 2020 11:45:15 GMT
  Content-type: application/xml
  Content-Length: 138688
  Last-Modified: Tue, 01 Sep 2020 10:29:54 GMT
Length: 138688 (135K) [application/xml]
Server file no newer than local file ‘merged.xml’ -- not retrieving.

This time there is no download and no response body is sent. The view from SimpleHTTP:

Serving HTTP on 127.0.0.1 port 8000 (http://127.0.0.1:8000/) ...
127.0.0.1 - - [01/Sep/2020 11:36:38] "GET /merged.xml HTTP/1.1" 200 -
127.0.0.1 - - [01/Sep/2020 11:39:38] "GET /merged.xml HTTP/1.1" 200 -
127.0.0.1 - - [01/Sep/2020 11:45:15] "HEAD /merged.xml HTTP/1.1" 200 -

This will not make much of a speed difference on the localhost interface, but it will on a slow connection. It may also help an rss client decide more efficiently that the merged feed has not changed.
Added download error statistics to tell at a glance whether all component feeds downloaded successfully, and removed the invidio.us line parser since they now serve a html page on feed urls.
http://0x0.st/iE50.py
A sample run:

$ python3 rsstool.py getlistmerge feeds.txt
get https://www.youtube.com/feeds/videos.xml?channel_id=UCkebf5OI9FU6kr77VhUFzTg
OK 4909 -> 27851 bytes for https://www.youtube.com/feeds/videos.xml?channel_id=UCkebf5OI9FU6kr77VhUFzTg
get https://www.youtube.com/feeds/videos.xml?channel_id=UCSm1_XO-zvR0ToSJYMljmPA
OK 2603 -> 17048 bytes for https://www.youtube.com/feeds/videos.xml?channel_id=UCSm1_XO-zvR0ToSJYMljmPA
0 errors, 2 ok, 2 total
changed merged.xml

And with the network disabled:

$ python3 rsstool.py getlist feeds.txt
get https://www.youtube.com/feeds/videos.xml?channel_id=UCkebf5OI9FU6kr77VhUFzTg
[error] <urlopen error [Errno -2] Name or service not known> for https://www.youtube.com/feeds/videos.xml?channel_id=UCkebf5OI9FU6kr77VhUFzTg
get https://www.youtube.com/feeds/videos.xml?channel_id=UCSm1_XO-zvR0ToSJYMljmPA
[error] <urlopen error [Errno -2] Name or service not known> for https://www.youtube.com/feeds/videos.xml?channel_id=UCSm1_XO-zvR0ToSJYMljmPA
2 errors, 0 ok, 2 total
Added the missing timeout error as well as generic support for invidio instances. Instances are declared in INVIDIO_MAP as mappings from domain to some unique id string. The first two from the list at https://invidio.us/ have been added as a demonstration. An empty INVIDIO_MAP is valid and disables support.
http://0x0.st/iEC-.py
A feed list file with yt and invidio instances:

$ cat feeds.txt
https://www.youtube.com/feeds/videos.xml?channel_id=UCkebf5OI9FU6kr77VhUFzTg
https://www.youtube.com/channel/UCSm1_XO-zvR0ToSJYMljmPA
https://invidious.snopyta.org/feed/channel/UC5GYwuvmAD_VyV6w5aFnnUw
https://invidious.snopyta.org/channel/UCl1bDSVi34xE65YzreytVxQ
https://invidious.ggc-project.de/feed/channel/UClJF8jKPt0E6Za-LYj70agQ
https://invidious.ggc-project.de/channel/UCl_A_42M6kvjH8Gr-rwfCUw

A sample run:

$ python3 rsstool.py getlistmerge feeds.txt
get https://www.youtube.com/feeds/videos.xml?channel_id=UCkebf5OI9FU6kr77VhUFzTg
get https://invidious.snopyta.org/feed/channel/UC5GYwuvmAD_VyV6w5aFnnUw
get https://invidious.ggc-project.de/feed/channel/UClJF8jKPt0E6Za-LYj70agQ
OK 4909 -> 27851 bytes for https://www.youtube.com/feeds/videos.xml?channel_id=UCkebf5OI9FU6kr77VhUFzTg
get https://www.youtube.com/feeds/videos.xml?channel_id=UCSm1_XO-zvR0ToSJYMljmPA
OK 2605 -> 17048 bytes for https://www.youtube.com/feeds/videos.xml?channel_id=UCSm1_XO-zvR0ToSJYMljmPA
OK 5608 -> 43594 bytes for https://invidious.snopyta.org/feed/channel/UC5GYwuvmAD_VyV6w5aFnnUw
OK 1620 -> 7454 bytes for https://invidious.ggc-project.de/feed/channel/UClJF8jKPt0E6Za-LYj70agQ
get https://invidious.snopyta.org/feed/channel/UCl1bDSVi34xE65YzreytVxQ
get https://invidious.ggc-project.de/feed/channel/UCl_A_42M6kvjH8Gr-rwfCUw
OK 5943 -> 54531 bytes for https://invidious.snopyta.org/feed/channel/UCl1bDSVi34xE65YzreytVxQ
OK 5074 -> 22824 bytes for https://invidious.ggc-project.de/feed/channel/UCl_A_42M6kvjH8Gr-rwfCUw
0 errors, 6 ok, 6 total
new merged.xml

A run with an unreasonably low REQUEST_TIMEOUT to show timeout handling:

$ python3 rsstool.py getlistmerge feeds.txt
get https://www.youtube.com/feeds/videos.xml?channel_id=UCkebf5OI9FU6kr77VhUFzTg
get https://invidious.snopyta.org/feed/channel/UC5GYwuvmAD_VyV6w5aFnnUw
get https://invidious.ggc-project.de/feed/channel/UClJF8jKPt0E6Za-LYj70agQ
OK 4909 -> 27851 bytes for https://www.youtube.com/feeds/videos.xml?channel_id=UCkebf5OI9FU6kr77VhUFzTg
OK 1620 -> 7454 bytes for https://invidious.ggc-project.de/feed/channel/UClJF8jKPt0E6Za-LYj70agQ
get https://www.youtube.com/feeds/videos.xml?channel_id=UCSm1_XO-zvR0ToSJYMljmPA
OK 2605 -> 17048 bytes for https://www.youtube.com/feeds/videos.xml?channel_id=UCSm1_XO-zvR0ToSJYMljmPA
[error] The read operation timed out for https://invidious.snopyta.org/feed/channel/UC5GYwuvmAD_VyV6w5aFnnUw
get https://invidious.ggc-project.de/feed/channel/UCl_A_42M6kvjH8Gr-rwfCUw
OK 5074 -> 22824 bytes for https://invidious.ggc-project.de/feed/channel/UCl_A_42M6kvjH8Gr-rwfCUw
get https://invidious.snopyta.org/feed/channel/UCl1bDSVi34xE65YzreytVxQ
[error] The read operation timed out for https://invidious.snopyta.org/feed/channel/UCl1bDSVi34xE65YzreytVxQ
2 errors, 4 ok, 6 total
unchanged merged.xml
>>4543
I'm joining this anon in asking for a rundown of the progress made in layman's terms; seems like quite a good deal, but idk.
Added download time deltas to get an idea about which hosts are under heavier or lighter load. Set INVIDIO_MAP to those instances from https://invidio.us/ that are currently updated enough to be able to show images on the front page.
http://0x0.st/i6ZS.py
A feed list file with 12 random feeds from >>>/leftypol/747700 for the current INVIDIO_MAP:

$ cat feeds.txt
https://www.youtube.com/feeds/videos.xml?channel_id=UCd1Ze_UknxhxpK9Lvi5rjYw
https://www.youtube.com/channel/UCSm1_XO-zvR0ToSJYMljmPA
https://invidious.snopyta.org/feed/channel/UCl_A_42M6kvjH8Gr-rwfCUw
https://invidious.snopyta.org/channel/UCqilwNYrajK0djOx3sIh01A
https://invidious.13ad.de/feed/channel/UCHgnQzZY7T9TxhI40BmKJwQ
https://invidious.13ad.de/channel/UCSrad2ah3GKKDLK3_j0hogg
https://invidious.fdn.fr/feed/channel/UCQPPvpSCGKNeoTQZUCnOgng
https://invidious.fdn.fr/channel/UCEBbylt9Rax3nOP_hyPnMPA
https://invidious.toot.koeln/feed/channel/UCs8mbJ-M142ZskR5VR0gBig
https://invidious.toot.koeln/channel/UC__UuPAX7TvF3hGYb5ciVpQ
https://yt.iswleuven.be/feed/channel/UC6DbLEHgTj6VK7LvtzoGSIw
https://yt.iswleuven.be/channel/UC3-LD6DAgLsKSuowdrDOxRg

A sample run:

$ python3 rsstool.py getlistmerge feeds.txt
10:17:35 get https://www.youtube.com/feeds/videos.xml?channel_id=UCd1Ze_UknxhxpK9Lvi5rjYw
10:17:35 get https://invidious.snopyta.org/feed/channel/UCl_A_42M6kvjH8Gr-rwfCUw
10:17:35 get https://invidious.13ad.de/feed/channel/UCHgnQzZY7T9TxhI40BmKJwQ
10:17:35 get https://invidious.fdn.fr/feed/channel/UCQPPvpSCGKNeoTQZUCnOgng
10:17:35 +0.2s OK 5022 -> 22987 bytes for https://www.youtube.com/feeds/videos.xml?channel_id=UCd1Ze_UknxhxpK9Lvi5rjYw
10:17:35 +0.5s OK 2961 -> 21690 bytes for https://invidious.fdn.fr/feed/channel/UCQPPvpSCGKNeoTQZUCnOgng
10:17:35 +0.9s OK 5238 -> 36088 bytes for https://invidious.13ad.de/feed/channel/UCHgnQzZY7T9TxhI40BmKJwQ
10:17:37 +2.1s OK 5092 -> 23491 bytes for https://invidious.snopyta.org/feed/channel/UCl_A_42M6kvjH8Gr-rwfCUw
10:17:37 get https://www.youtube.com/feeds/videos.xml?channel_id=UCSm1_XO-zvR0ToSJYMljmPA
10:17:37 get https://invidious.fdn.fr/feed/channel/UCEBbylt9Rax3nOP_hyPnMPA
10:17:37 +0.4s OK 2603 -> 17048 bytes for https://www.youtube.com/feeds/videos.xml?channel_id=UCSm1_XO-zvR0ToSJYMljmPA
10:17:37 get https://invidious.toot.koeln/feed/channel/UCs8mbJ-M142ZskR5VR0gBig
10:17:37 get https://invidious.13ad.de/feed/channel/UCSrad2ah3GKKDLK3_j0hogg
10:17:38 +0.5s OK 4652 -> 38876 bytes for https://invidious.fdn.fr/feed/channel/UCEBbylt9Rax3nOP_hyPnMPA
10:17:38 get https://yt.iswleuven.be/feed/channel/UC6DbLEHgTj6VK7LvtzoGSIw
10:17:38 +0.8s OK 5198 -> 32010 bytes for https://invidious.toot.koeln/feed/channel/UCs8mbJ-M142ZskR5VR0gBig
10:17:38 +0.6s OK 2511 -> 24768 bytes for https://invidious.13ad.de/feed/channel/UCSrad2ah3GKKDLK3_j0hogg
10:17:38 +0.9s OK 18263 -> 90607 bytes for https://yt.iswleuven.be/feed/channel/UC6DbLEHgTj6VK7LvtzoGSIw
10:17:39 get https://invidious.snopyta.org/feed/channel/UCqilwNYrajK0djOx3sIh01A
10:17:40 get https://invidious.toot.koeln/feed/channel/UC__UuPAX7TvF3hGYb5ciVpQ
10:17:40 get https://yt.iswleuven.be/feed/channel/UC3-LD6DAgLsKSuowdrDOxRg
10:17:41 +0.7s OK 3133 -> 27662 bytes for https://invidious.toot.koeln/feed/channel/UC__UuPAX7TvF3hGYb5ciVpQ
10:17:41 +0.5s OK 1387 -> 6996 bytes for https://yt.iswleuven.be/feed/channel/UC3-LD6DAgLsKSuowdrDOxRg
10:17:41 +2.4s OK 4437 -> 27300 bytes for https://invidious.snopyta.org/feed/channel/UCqilwNYrajK0djOx3sIh01A
0 errors, 12 ok, 12 total
new merged.xml

Fill in C.USER_AGENT with your browser's user agent string.
One way to get it is: Help -> Troubleshooting Information -> Application Basics -> User Agent. Better yet find an Apple device user agent string on the net and use that. https://html.duckduckgo.com/html?q=iphone%20user%20agent%20string
Added preliminary support for downloading feed IDs from a group of hosts, rather than specific feeds from specific hosts. Added autoflushing controlled by C.WORKER_FLUSH for use with tee.
http://0x0.st/i6w4.py
The allowed invidio hosts are declared in C.INVIDIO_DOMAINS as a plain list. Whether youtube is used as an additional host is controlled by C.YTFEED_ENABLED. The new input file format has one feed ID per line. A previous feeds.txt >>4615 can be converted with:

$ sed -r -e 's/^.+([[:alnum:]_-]{24})$/\1/' feeds.txt > feedids.txt

The available hosts get scores based on their response times, and the host selection is dynamically adjusted based on the scores of the current run. Also remember to set C.USER_AGENT to some Apple device user agent string.
https://html.duckduckgo.com/html?q=iphone%20user%20agent%20string
A sample feed ID file >>4615:

$ cat feedids.txt
UCd1Ze_UknxhxpK9Lvi5rjYw
UCSm1_XO-zvR0ToSJYMljmPA
UCl_A_42M6kvjH8Gr-rwfCUw
UCqilwNYrajK0djOx3sIh01A
UCHgnQzZY7T9TxhI40BmKJwQ
UCSrad2ah3GKKDLK3_j0hogg
UCQPPvpSCGKNeoTQZUCnOgng
UCEBbylt9Rax3nOP_hyPnMPA
UCs8mbJ-M142ZskR5VR0gBig
UC__UuPAX7TvF3hGYb5ciVpQ
UC6DbLEHgTj6VK7LvtzoGSIw
UC3-LD6DAgLsKSuowdrDOxRg

A sample run:

$ python3 rsstool.py getidlist feedids.txt | tee log.txt
00:17:49 worker 0 GET item 0 UCd1Ze_UknxhxpK9Lvi5rjYw try 1 host 0 www.youtube.com
00:17:49 worker 1 GET item 1 UCSm1_XO-zvR0ToSJYMljmPA try 1 host 1 invidious.snopyta.org
00:17:49 worker 2 GET item 2 UCl_A_42M6kvjH8Gr-rwfCUw try 1 host 2 invidious.13ad.de
00:17:49 worker 3 GET item 3 UCqilwNYrajK0djOx3sIh01A try 1 host 3 invidious.fdn.fr
00:17:50 +0.3s worker 0 OK 5026 -> 22987 bytes for https://www.youtube.com/feeds/videos.xml?channel_id=UCd1Ze_UknxhxpK9Lvi5rjYw
00:17:50 worker 0 GET item 4 UCHgnQzZY7T9TxhI40BmKJwQ try 1 host 4 invidious.toot.koeln
00:17:50 +0.4s worker 0 ERROR HTTP Error 500: Internal Server Error for https://invidious.toot.koeln/feed/channel/UCHgnQzZY7T9TxhI40BmKJwQ
00:17:50 worker 0 GET item 4 UCHgnQzZY7T9TxhI40BmKJwQ try 2 host 5 invidiou.site
00:17:50 +0.8s worker 2 OK 5090 -> 23376 bytes for https://invidious.13ad.de/feed/channel/UCl_A_42M6kvjH8Gr-rwfCUw
00:17:50 worker 2 GET item 5 UCSrad2ah3GKKDLK3_j0hogg try 1 host 6 yt.iswleuven.be
00:17:50 +0.8s worker 3 OK 4827 -> 28190 bytes for https://invidious.fdn.fr/feed/channel/UCqilwNYrajK0djOx3sIh01A
00:17:51 +0.4s worker 2 ERROR HTTP Error 500: Internal Server Error for https://yt.iswleuven.be/feed/channel/UCSrad2ah3GKKDLK3_j0hogg
00:17:52 +1.5s worker 0 OK 5557 -> 37550 bytes for https://invidiou.site/feed/channel/UCHgnQzZY7T9TxhI40BmKJwQ
00:17:52 worker 3 GET item 6 UCQPPvpSCGKNeoTQZUCnOgng try 1 host 0 www.youtube.com
00:17:52 worker 2 GET item 5 UCSrad2ah3GKKDLK3_j0hogg try 2 host 2 invidious.13ad.de
00:17:52 +0.5s worker 3 OK 2982 -> 18114 bytes for https://www.youtube.com/feeds/videos.xml?channel_id=UCQPPvpSCGKNeoTQZUCnOgng
00:17:52 worker 0 GET item 7 UCEBbylt9Rax3nOP_hyPnMPA try 1 host 3 invidious.fdn.fr
00:17:53 +0.5s worker 0 OK 4650 -> 38876 bytes for https://invidious.fdn.fr/feed/channel/UCEBbylt9Rax3nOP_hyPnMPA
00:17:53 +0.8s worker 2 OK 2514 -> 24768 bytes for https://invidious.13ad.de/feed/channel/UCSrad2ah3GKKDLK3_j0hogg
00:17:53 +4.0s worker 1 OK 2464 -> 20548 bytes for https://invidious.snopyta.org/feed/channel/UCSm1_XO-zvR0ToSJYMljmPA
00:17:54 worker 2 GET item 10 UC6DbLEHgTj6VK7LvtzoGSIw try 1 host 5 invidiou.site
00:17:54 worker 3 GET item 8 UCs8mbJ-M142ZskR5VR0gBig try 1 host 0 www.youtube.com
00:17:55 +0.4s worker 3 OK 5219 -> 23296 bytes for https://www.youtube.com/feeds/videos.xml?channel_id=UCs8mbJ-M142ZskR5VR0gBig
00:17:55 worker 0 GET item 9 UC__UuPAX7TvF3hGYb5ciVpQ try 1 host 3 invidious.fdn.fr
00:17:55 worker 1 GET item 11 UC3-LD6DAgLsKSuowdrDOxRg try 1 host 2 invidious.13ad.de
00:17:55 +0.6s worker 0 OK 3124 -> 27038 bytes for https://invidious.fdn.fr/feed/channel/UC__UuPAX7TvF3hGYb5ciVpQ
00:17:55 +1.7s worker 2 OK 18291 -> 92505 bytes for https://invidiou.site/feed/channel/UC6DbLEHgTj6VK7LvtzoGSIw
00:17:56 +0.8s worker 1 OK 1408 -> 7548 bytes for https://invidious.13ad.de/feed/channel/UC3-LD6DAgLsKSuowdrDOxRg
0 failed, 12 ok, 12 total, 2 extra

In this run both koeln and iswleuven yielded quick 500 errors, so two extra requests were used to retrieve those feeds from other hosts.
The reason this is still preliminary is that merging feeds retrieved in this way results in artificially modified merged feeds, due to the url substitutions and other feed changes performed by the invidio instances, so to restore meaningful modification support some sort of normalization step is needed.
There is a slight variation in the feeds from invidio instances. Most rewrite yt urls to their domain:

<entry>
 <id>yt:video:cFpPWZRbW4Y</id>
 <yt:videoId>cFpPWZRbW4Y</yt:videoId>
 <yt:channelId>UC6DbLEHgTj6VK7LvtzoGSIw</yt:channelId>
 <title>Update: I'm still alive</title>
 <link rel="alternate" href="http://invidious.13ad.de/watch?v=cFpPWZRbW4Y"/>
 <author>
  <name>Ray Ramses</name>
  <uri>http://invidious.13ad.de/channel/UC6DbLEHgTj6VK7LvtzoGSIw</uri>
 </author>
 <content type="xhtml">
  <div xmlns="http://www.w3.org/1999/xhtml">
   <a href="http://invidious.13ad.de/watch?v=cFpPWZRbW4Y">
    <img src="http://invidious.13ad.de/vi/cFpPWZRbW4Y/mqdefault.jpg"/>
   </a>
   <p style="word-break:break-word;white-space:pre-wrap">...</p>
  </div>
 </content>
 <published>2020-07-04T19:19:00+00:00</published>
 <media:group>
  <media:title>Update: I'm still alive</media:title>
  <media:thumbnail url="http://invidious.13ad.de/vi/cFpPWZRbW4Y/mqdefault.jpg" width="320" height="180"/>
  <media:description>...</media:description>
 </media:group>
 <media:community>
  <media:statistics views="923"/>
 </media:community>
</entry>

but some, like iswleuven, leave the urls relative:

<entry>
 <id>yt:video:ywMZ1vPOswQ</id>
 <yt:videoId>ywMZ1vPOswQ</yt:videoId>
 <yt:channelId>UC3-LD6DAgLsKSuowdrDOxRg</yt:channelId>
 <title>What is Commodity Production?</title>
 <link rel="alternate" href="/watch?v=ywMZ1vPOswQ"/>
 <author>
  <name>Proletarian U</name>
  <uri>/channel/UC3-LD6DAgLsKSuowdrDOxRg</uri>
 </author>
 <content type="xhtml">
  <div xmlns="http://www.w3.org/1999/xhtml">
   <a href="/watch?v=ywMZ1vPOswQ">
    <img src="/vi/ywMZ1vPOswQ/mqdefault.jpg"/>
   </a>
   <p style="word-break:break-word;white-space:pre-wrap">...</p>
  </div>
 </content>
 <published>2020-08-31T12:43:18+00:00</published>
 <media:group>
  <media:title>What is Commodity Production?</media:title>
  <media:thumbnail url="/vi/ywMZ1vPOswQ/mqdefault.jpg" width="320" height="180"/>
  <media:description>...</media:description>
 </media:group>
 <media:community>
  <media:statistics views="51"/>
 </media:community>
</entry>

Entries retrieved directly from youtube have 'updated' and media:starRating >>4500, use a one-space indent and place media:community and media:content inside media:group.

<entry>
 <id>yt:video:qtrR8IaOWvs</id>
 <yt:videoId>qtrR8IaOWvs</yt:videoId>
 <yt:channelId>UCs8mbJ-M142ZskR5VR0gBig</yt:channelId>
 <title>Capitalism makes sh!t products | Planned obsolescence and the inadequacy of market incentives.</title>
 <link rel="alternate" href="https://www.youtube.com/watch?v=qtrR8IaOWvs"/>
 <author>
  <name>YUGOPNIK</name>
  <uri>https://www.youtube.com/channel/UCs8mbJ-M142ZskR5VR0gBig</uri>
 </author>
 <published>2020-08-19T17:30:01+00:00</published>
 <updated>2020-08-21T05:14:54+00:00</updated>
 <media:group>
  <media:title>Capitalism makes sh!t products | Planned obsolescence and the inadequacy of market incentives.</media:title>
  <media:content url="https://www.youtube.com/v/qtrR8IaOWvs?version=3" type="application/x-shockwave-flash" width="640" height="390"/>
  <media:thumbnail url="https://i2.ytimg.com/vi/qtrR8IaOWvs/hqdefault.jpg" width="480" height="360"/>
  <media:description>...</media:description>
  <media:community>
   <media:starRating count="1609" average="4.85" min="1" max="5"/>
   <media:statistics views="12770"/>
  </media:community>
 </media:group>
</entry>
As preparation for the normalization step >>4674, the modification detection has been switched from comparing bytes >>4555 to comparing feed entry IDs. The koeln instance is temporarily commented out due to 503 Maintenance.
http://0x0.st/i65O.py
This means that things like indent changes >>4677 and increases in the 'views' of media:statistics will be ignored. Now if the same feed ID list is retrieved a second time and the component feeds have no new entries, the merged output is not modified, even though host selection for feed retrieval changes dynamically based on the response times >>4674 in the current run.

$ python3 rsstool.py getidlist feedids.txt | tee log.txt
[...]
$ python3 rsstool.py merge $(ls feeds2 | grep -E -e '^.{24}[.]xml$' | sed -e 's/^/feeds2\//')
new merged.xml
$ python3 rsstool.py getidlist feedids.txt | tee log.txt
[...]
$ python3 rsstool.py merge $(ls feeds2 | grep -E -e '^.{24}[.]xml$' | sed -e 's/^/feeds2\//')
unchanged merged.xml

However this merged output is not yet suitable for consumption by an rss client until normalization is added.
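Comparison by entry IDs might look roughly like this (an illustrative sketch, not the actual rsstool.py code):

import xml.dom.minidom as mini

# the set of <entry><id> values is invariant under indentation changes,
# view-count bumps and host url rewriting
def entry_ids (path):
    doc = mini.parse (path)
    out = set ()
    for entry in doc.documentElement.getElementsByTagName ("entry"):
        ids = entry.getElementsByTagName ("id")
        if ids:
            out.add (ids [0].firstChild.data.strip ())
    doc.unlink ()
    return out

def unchanged (old_path, new_path):
    return entry_ids (old_path) == entry_ids (new_path)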
Added http.client.HTTPException, which urllib.request.urlopen can also raise. Switched the merge modification detection to (id, url) pairs, but this may change again.
http://0x0.st/iIHb.py

11:27:59 worker 1 GET item 1 UCSm1_XO-zvR0ToSJYMljmPA try 1 host 1 invidious.13ad.de
11:28:00 +0.5s worker 1 ERROR Remote end closed connection without response for https://invidious.13ad.de/feed/channel/UCSm1_XO-zvR0ToSJYMljmPA
Is it possible to load twitch streams in newsboat?
Only newsboat is bunkerchan approved.
Anons, brainlet here: could you give me an update on the state of this project in layman's terms?
Added the normalization step to convert both ways between yt and invidio feeds. Added a host score summary to rate hosts, in addition to the time deltas >>4615 already printed by the worker threads.
http://0x0.st/iImn.py
The score summary columns are the number of download attempts, the total host score and the per-download host score. The latter is the sort column, and a lower score is better. The operation selectors are g=get, n=normalize, m=merge. Their order is ignored; they only control which steps of the get-normalize-merge pipeline are enabled. To get the feeds, normalize to invidious.snopyta.org and merge:

$ python3 rsstool.py ids gnm feedids.txt invidious.snopyta.org | tee log.txt
12:07:30 worker 3 GET item 3 UCqilwNYrajK0djOx3sIh01A try 1 host 3 invidious.fdn.fr
[...]
12:07:38 +7.0s worker 0 OK 2513 -> 25158 bytes for https://invidious.snopyta.org/feed/channel/UCSrad2ah3GKKDLK3_j0hogg
0 failed, 12 ok, 12 total, 3 extra
score 3   1.2   0.4 www.youtube.com
score 3   2.4   0.8 invidious.fdn.fr
score 2   1.7   0.9 yt.iswleuven.be
score 2   3.3   1.7 invidiou.site
score 1   1.8   1.8 invidious.site
score 1   7.0   7.0 invidious.snopyta.org
score 1 300.4 300.4 invidious.13ad.de
score 1 300.5 300.5 invidious.ggc-project.de
score 1 301.6 301.6 invidious.xyz
normalize feeds2/UCd1Ze_UknxhxpK9Lvi5rjYw.xml
[...]
normalize feeds2/UC3-LD6DAgLsKSuowdrDOxRg.xml
new merged2.xml

To renormalize existing feeds to www.youtube.com and merge:

$ python3 rsstool.py ids nm feedids.txt www.youtube.com
normalize feeds2/UCd1Ze_UknxhxpK9Lvi5rjYw.xml
[...]
normalize feeds2/UC3-LD6DAgLsKSuowdrDOxRg.xml
changed merged2.xml

Normalization makes the feeds look like they came from the specified domain, while all the domains are still used for parallel retrieval >>4674. Renormalization can be run as many times as desired. Remember to set C.USER_AGENT to some Apple device user agent string.
https://html.duckduckgo.com/html?q=iphone%20user%20agent%20string
The merged output is in merged2.xml by default and can be passed to an rss client. This mode now replaces the old getlistmerge >>4538.
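For those asking what the scores mean: roughly, each host accumulates its download times, failures are penalized heavily, and the host with the lowest score per attempt is preferred for the next request. A hypothetical illustration (not the rsstool.py internals; the penalty constant is a guess based on the ~300 values shown for the failed hosts above):

ERROR_PENALTY = 300.0  # failed hosts end up around this value per attempt

class HostScore:
    def __init__ (self):
        self.attempts = 0
        self.total = 0.0
    def record (self, seconds, ok):
        self.attempts += 1
        self.total += seconds if ok else ERROR_PENALTY
    def per_download (self):
        return self.total / self.attempts if self.attempts else 0.0

# pick the host with the best (lowest) per-download score
def pick_host (scores):
    return min (scores, key = lambda h: scores [h].per_download ())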
Added xsl stylesheet support, controlled by C.ADD_STYLESHEET, and a simple demo stylesheet to view the merged feed in a browser without needing an rss client.
http://0x0.st/iI4b.tgz
To use another xsl stylesheet, move rss.xml out of the way and replace it with the other one. To get, normalize and merge the sample feed ids, run:

$ python3 rsstool.py ids gnm demo-ids.txt invidious.snopyta.org | tee log.txt

Drag merged2.xml onto a browser tab to see a summary of the merged feed. If you get 'merged feed comrade' and word soup instead, your browser has not applied the stylesheet. Check dev tools -> console to see whether CORS, noscript or similar blocked the xsl. If you see 'CORS request not HTTP' you can toggle privacy.file_unique_origin in about:config at your option, then reload the feed tab.
https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS/Errors/CORSRequestNotHttp
Remember to set C.USER_AGENT to some Apple device user agent string.
https://html.duckduckgo.com/html?q=iphone%20user%20agent%20string
Added expandable media:description blocks to the xslt view of the merged feed. Added lazy loading of images to the normalization step to keep the browser from spamming the host with image requests.
http://0x0.st/iIEB.tgz
