Our VOD servers are being crawled by Googlebot (User Agent “Googlebot-Video/1.0”).
We have many terabytes of data there and we want to avoid that traffic.
I would like to serve a robots.txt file to stop Googlebot from crawling them.
Wowza runs on its own subdomain so the robots.txt would need to be hosted from Wowza itself.
Is there a good practice for that?
Or does Wowza support setting the X-Robots-Tag in the HTTP response headers?
Hi @Elio Wahlen, robots.txt is normally served by a web server, and WSE is not really a web server. You can, however, add custom headers to your HLS chunklist.
https://www.wowza.com/docs/how-to-add-custom-playlist-headers-to-apple-hls-manifests
Thanks for this good and helpful answer.
Is there also a way to do this for MPEG-DASH and HDS (I know HDS is dying, but we still need to support it for a while for our customers)?
I just saw in our logs that Google was using the HDS manifest to access the VOD streams…
Thanks. I had the same issue. This helped me a lot. @arquiteto
Dear @Rose Power-Wowza Community Manager
Unfortunately I now realize that you got me wrong.
The chunklist headers are not equivalent to the HTTP headers that I meant. Specifically, I am talking about the X-Robots-Tag HTTP response header, which seems to be standard.
Please see here: https://developers.google.com/search/reference/robots_meta_tag
Is there any way to send custom HTTP response headers across all HTTP-based connections?
If not, what do you recommend? Would reverse-proxying, e.g. with nginx, be the way to go? What would that look like?
I can imagine there are a lot of users who don’t want Googlebot and possibly other crawlers to download all their video content, for various reasons (traffic, performance, legal circumstances, etc.). It would be wise to have a plan here.
Best wishes, Elio
Apologies @Elio Wahlen. You can add custom HTTP headers to HLS/DASH/HDS responses by adding the
httpUserHTTPHeaders
property to your application (this is similar to how the Access-Control-Allow-Origin CORS HTTP header is set).
Here’s an example of how this can be added:
https://www.wowza.com/docs/how-to-stream-from-an-android-device-to-the-google-chromecast-device
The value for the property in your case would be:
X-Robots-Tag: noarchive
(or whichever directives you need). The value is a pipe-delimited list, so you can add multiple headers.
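For anyone following along, here is a minimal sketch of how such a pipe-delimited value could be assembled. The helper name build_header_property is hypothetical and not part of Wowza; the only things taken from this thread are the two header names and the "|" delimiter. The resulting string is what would go into the property's value.

```python
def build_header_property(headers):
    """Join (name, value) pairs into one pipe-delimited string.

    Hypothetical helper for illustration only; it just formats text
    and does not talk to Wowza Streaming Engine.
    """
    parts = []
    for name, value in headers:
        # The pipe is the list delimiter, so it cannot appear inside a header.
        if "|" in name or "|" in value:
            raise ValueError("header text must not contain the '|' delimiter")
        parts.append(f"{name}: {value}")
    return "|".join(parts)


property_value = build_header_property([
    ("X-Robots-Tag", "noarchive"),
    ("Access-Control-Allow-Origin", "*"),
])
print(property_value)
# prints: X-Robots-Tag: noarchive|Access-Control-Allow-Origin: *
```

A request to one of the manifests after restarting the application should then show whether the headers are actually being returned.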
If you prefer to host a robots.txt file, you would need a custom HTTPProvider that handles requests for robots.txt; this is similar to how WSE handles
http://:1935/crossdomain.xml
https://www.wowza.com/docs/how-to-create-an-http-provider
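The robots.txt body that such a provider returns can be checked locally before anything is wired into WSE. Below is a minimal sketch, assuming the goal is to block only Googlebot-Video. It uses Python's standard urllib.robotparser module; the host name and stream path are placeholders, not values from this thread. This parser only approximates how Google's own crawler matches user agents, so treat it as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Candidate robots.txt body: disallow everything for Googlebot-Video only.
ROBOTS_TXT = """\
User-agent: Googlebot-Video
Disallow: /
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


def is_blocked(user_agent, url):
    """Return True if the parsed rules forbid this user agent from fetching url."""
    return not rules.can_fetch(user_agent, url)


# Placeholder URL; substitute one of your own manifest URLs.
sample_url = "http://example.com/vod/sample.mp4/playlist.m3u8"

print(is_blocked("Googlebot-Video/1.0", sample_url))  # prints: True
print(is_blocked("Mozilla/5.0", sample_url))          # prints: False
```

Because the file has no "User-agent: *" section, every other client remains allowed, which is why the second check prints False.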
@Rose Power-Wowza Community Manager thanks very much.
httpUserHTTPHeaders works nicely!
I think it would help people to have a general tutorial or manual entry about adding these very useful custom HTTP headers. For now, the information seems to be hidden in two more specialized tutorials.
All the best, Elio
Fabulous idea! Thank you so much for the feedback.