Re: [BULK] Re: [WebDNA] set HTTP-Status Code from webdna

This WebDNA talk-list message is from 2016.


What about using [referrer] to allow your customers to navigate your website but disallow bookmarking and outside links? You could also use [session] to limit the navigation to X minutes or Y pages, even for bots, then "kick" the visitor out.

- chris

> On Mar 24, 2016, at 20:30, Brian Burton wrote:
>
> Backstory: the site in question is a replacement-parts business and has hundreds of thousands of pages of cross-reference material, all stored in databases and generated as needed. Competitors and dealers that carry competitors' brand parts seem to think that copying our cross-reference is easier than creating their own (it would be), so code was written to block this.
>
> YES, I KNOW that if they are determined, they will find a way around my blockades (I've seen quite a few variations on this: Tor, AWS, other VPNs...)
>
> Solution: looking at the stats for the average use of the website, we found that 95% of the site traffic visited 14 pages or fewer. So...
> I have a visitors.db. The system logs all page requests tracked by IP address, and after a set amount (more than 14 pages, but still a pretty low number) starts showing visitors a nice Page Limit Exceeded page instead of what they were crawling through. After an unreasonable number of pages I just 404 them out to save server time and bandwidth. The count resets at midnight, because I'm far too lazy to track 24 hours since the first or last page request (per IP). In some cases, when I'm feeling particularly mischievous, once a bot is detected I start feeding them fake info :D
>
> Here's the Visitors.db header (not sure if it will help, but it is what it is):
> VID  IPadd  ipperm  ipname  visitdate  pagecount  starttime  endtime  domain  firstpage  lastpage  browtype  lastsku  partner  linkin  page9  page8  page7  page6  page5  page4  page3  page2  page1
>
> All the code that does the tracking and counting and map/reduction to store stats and stuff is proprietary (sorry), but I'll see what (if anything) I can share a bit later, and try to write it up as a blog post or something.
>
> -Brian B. Burton
>
>> On Mar 24, 2016, at 11:41 AM, Jym Duane wrote:
>>
>> Curious how to determine... non-Google/Bing/Yahoo bots and others attempting to crawl/copy the entire site?
>>
>> On 3/24/2016 9:28 AM, Brian Burton wrote:
>>> Noah,
>>>
>>> Similar to you, and wanting to use pretty URLs, I built something similar, but did it a different way.
>>> _All_ page requests are caught by a url-rewrite rule and get sent to dispatch.tpl.
>>> Dispatch.tpl has hundreds of rules that decide what page to show, and uses includes to do it.
>>> (This keeps everything in-house to WebDNA so I don't have to go mucking about in WebDNA here, and Apache there, and Linux somewhere else, etc...)
>>>
>>> Three special circumstances came up that needed special code to send out proper HTTP status codes:
>>>
>>> [function name=301public]
>>> [text]eol=[unurl]%0D%0A[/unurl][/text]
>>> [returnraw]HTTP/1.1 301 Moved Permanently[eol]Location: http://www.example.com[link][eol][eol][/returnraw]
>>> [/function]
>>>
>>> [function name=404hard]
>>> [text]eol=[unurl]%0D%0A[/unurl][/text]
>>> [returnraw]HTTP/1.0 404 Not Found[eol]Status: 404 Not Found[eol]Content-type: text/html[eol][eol]<html>[eol]<body>[eol]<h1>404 Not Found</h1>[eol]The page that you have requested ([thisurl]) could not be found.[eol]</body>[eol]</html>[/returnraw]
>>> [/function]
>>>
>>> [function name=404soft]
>>> [text]eol=[unurl]%0D%0A[/unurl][/text]
>>> [returnraw]HTTP/1.0 404 Not Found[eol]Status: 404 Not Found[eol]Content-type: text/html[eol][eol][include file=/404pretty.tpl][/returnraw]
>>> [/function]
>>>
>>> Hope this helps
>>> -Brian B. Burton
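To make Chris's [referrer] idea concrete, here is a minimal sketch of the gate (example.com stands in for the real domain; [referrer] is supplied by the client, so it can be blank or forged, and this is only a soft barrier):

[!] hypothetical referrer gate: interior pages may only be reached from links on our own site; the home page stays open so there is no redirect loop [/!]
[hideif [thisurl]=/]
  [hideif [referrer]^example.com]
    [redirect http://www.example.com/]
  [/hideif]
[/hideif]

A [session]-based variant would kick the visitor the same way, but keyed off an expiring session variable instead of the referrer header.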

    
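Brian's tracking code is proprietary, but the per-IP counter he describes can be sketched in a few lines of WebDNA. The field names (IPadd, pagecount, visitdate) come from his header above and the 14-page threshold is his; pagelimit.tpl and the 100-page "unreasonable" cutoff are invented for illustration:

[!] one record per IP in visitors.db; the count resets at midnight by comparing visitdate to today [/!]
[text]newcount=1[/text]
[search db=visitors.db&eqIPADDdatarq=[ipaddress]]
  [showif [numfound]=0]
    [append db=visitors.db]IPadd=[ipaddress]&pagecount=1&visitdate=[date][/append]
  [/showif]
  [founditems]
    [showif [visitdate]=[date]][text]newcount=[math][pagecount]+1[/math][/text][/showif]
    [replace db=visitors.db&eqIPADDdatarq=[ipaddress]]pagecount=[newcount]&visitdate=[date][/replace]
  [/founditems]
[/search]
[!] past an unreasonable count, 404 them out using the [404hard] function quoted above [/!]
[showif [newcount]>100][404hard][/showif]
[showif [newcount]>14][include file=pagelimit.tpl][/showif]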
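Jym's question never gets a direct answer in the thread. One common first cut (not from the thread) is to allow-list the major crawlers by user-agent string, which WebDNA exposes as [browsername]; bear in mind that the genuine Googlebot, Bingbot, and Slurp can only be confirmed with a reverse-DNS lookup, since any scraper can claim these strings:

[!] rough allow-list: self-identified major crawlers bypass the page limit, everyone else gets counted [/!]
[text]knownbot=F[/text]
[showif [browsername]^Googlebot][text]knownbot=T[/text][/showif]
[showif [browsername]^bingbot][text]knownbot=T[/text][/showif]
[showif [browsername]^Slurp][text]knownbot=T[/text][/showif]
[hideif [knownbot]=T]
  [!] run the per-IP page counting sketch above for everyone else [/!]
[/hideif]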
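The dispatch.tpl pattern in Brian's earlier message boils down to matching [thisurl] against a rule list and including the matching template. A stripped-down sketch (the /parts/ rule and the template names are invented; [404soft] is his function quoted above):

[!] dispatch.tpl sketch: every rewritten request lands here and [thisurl] picks the template [/!]
[text]matched=F[/text]
[showif [thisurl]^/parts/][text]matched=T[/text][include file=parts.tpl][/showif]
[showif [thisurl]=/][text]matched=T[/text][include file=home.tpl][/showif]
[!] ...hundreds more rules like these... [/!]
[showif [matched]=F][404soft][/showif]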

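Once defined, the three status-code functions are called like any other tag, with call-time parameters becoming local variables inside the function, which is how [link] reaches 301public. A usage sketch with hypothetical paths:

[!] retire an old URL with a real 301 so crawlers update their indexes [/!]
[showif [thisurl]=/old-catalog.html][301public link=/new-catalog.html][/showif]
[!] hard 404, with no pretty page, for obvious probe traffic [/!]
[showif [thisurl]^/wp-admin][404hard][/showif]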