Original poster |
Posted on 2012-5-22 00:18:48
Actually, the script isn't that much manual work, and it wouldn't put much strain on the server. The best way to do it is to scan every file when it is uploaded or edited. That way, no matter what they tried, you would block them at the source, and you could even delete the file before it goes live. (A lot of free hosts already have this kind of software in place for malicious files, such as blocking .exe files, .rar files, and many other keyword-based protections against torrent scripts and such.)
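As a rough sketch of what that upload-time check could look like: `on_upload` is a hypothetical hook name (I'm assuming the host calls something like it whenever a file is uploaded or edited), and the extension list is just an example.

```python
import os

# Hypothetical blocklist for illustration; a real host would maintain its own.
BLOCKED_EXTENSIONS = {".exe", ".rar"}


def is_blocked_extension(filename):
    """Return True if the file's extension is on the blocklist (case-insensitive)."""
    _, ext = os.path.splitext(filename)
    return ext.lower() in BLOCKED_EXTENSIONS


def on_upload(path):
    """Hypothetical upload/edit hook.

    Deletes a blocked file before it goes live.
    Returns True if the file is accepted, False if it is rejected.
    """
    if is_blocked_extension(path):
        if os.path.exists(path):
            os.remove(path)
        return False
    return True
```

Because the check only runs once per upload or edit, the cost is tiny compared to crawling the whole server on a schedule.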
What I would do is have it scan for movie names, TV shows, and other warez-related material... You can easily set up a database by scraping lists, which would take an hour or two at most. After that, you would just be watching over the lists as they pile up.
From there, you make it so that if it finds a keyword, it then scans the rest of the page for any filehost or video streaming sites. Again, it is easy to build that database by scraping or even manually entering the sites.
If the file raises both red flags, do not allow it on your server.
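Here is a minimal sketch of that two-flag check. The keyword and domain lists are made-up placeholders standing in for the scraped database, and all function names are my own, so treat it as an illustration rather than a finished filter.

```python
import re

# Hypothetical sample data; in practice these would come from the scraped lists.
TITLE_KEYWORDS = {"example movie", "example show"}
FILEHOST_DOMAINS = {"filehost.example", "stream.example"}

# Captures the host part of any http(s) link in the page text.
URL_HOST_RE = re.compile(r"https?://([^/\s\"'<>]+)", re.IGNORECASE)


def has_title_keyword(text):
    """Red flag 1: the page mentions a known movie/TV title."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in TITLE_KEYWORDS)


def linked_hosts(text):
    """Return the lowercased hostnames (port stripped) of all links in the text."""
    return {host.lower().split(":")[0] for host in URL_HOST_RE.findall(text)}


def has_filehost_link(text):
    """Red flag 2: the page links to a known filehost or streaming site."""
    return any(
        host == domain or host.endswith("." + domain)
        for host in linked_hosts(text)
        for domain in FILEHOST_DOMAINS
    )


def should_reject(text):
    """Reject only when both red flags are present."""
    # The link scan runs only if the cheaper keyword check matched first.
    return has_title_keyword(text) and has_filehost_link(text)
```

A page that only names a title, or only links to a filehost, passes. It takes both together to get the file rejected.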
There you go. A nice simple way to eliminate any copyrighted material at the source.
Any more coding issues you think it may have?
This is not a perfect solution, as it could block people who are simply pointing to a trailer. But you could easily allow certain streaming sites such as YouTube, as they will delete almost all the copyrighted content within a day anyway. Again, there are ways to program an algorithm to allow trailers and such, but that's getting even more advanced.
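The allowlist idea could be sketched like this, assuming the scraped streaming list happens to include youtube.com and a separate allowlist overrides it. The domain lists and function names are hypothetical.

```python
import re

# Hypothetical lists for illustration only.
FILEHOST_DOMAINS = {"filehost.example", "youtube.com"}
ALLOWED_DOMAINS = {"youtube.com"}  # trusted sites that override the list above

# Captures the host part of any http(s) link in the page text.
URL_HOST_RE = re.compile(r"https?://([^/\s\"'<>]+)", re.IGNORECASE)


def matches_domain(host, domains):
    """True if host equals, or is a subdomain of, any domain in the set."""
    return any(host == d or host.endswith("." + d) for d in domains)


def flagged_hosts(text):
    """Return linked hosts that are on the filehost list and not allowlisted."""
    hosts = {h.lower().split(":")[0] for h in URL_HOST_RE.findall(text)}
    return {
        h for h in hosts
        if matches_domain(h, FILEHOST_DOMAINS)
        and not matches_domain(h, ALLOWED_DOMAINS)
    }
```

With this, a page linking only to a trailer on an allowlisted site produces no flagged hosts, while a link to any other listed host still counts as a red flag.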