Re: [BULK] Re: [WebDNA] set HTTP-Status Code from webdna

This WebDNA talk-list message is from 2016.
It keeps the original formatting.
numero = 112675
interpreted = N
What about using [referrer] to allow your customers to navigate your website but disallow bookmarking and outside links? You could also use [session] to limit the navigation to X minutes or Y pages, even for bots, then "kick" the visitor out.

- chris

> On Mar 24, 2016, at 20:30, Brian Burton wrote:
>
> Backstory: the site in question is a replacement-parts business and has hundreds of thousands of pages of cross-reference material, all stored in databases and generated as needed. Competitors and dealers that carry competitors' brand parts seem to think that copying our cross-reference is easier than creating their own (it would be), so code was written to block this.
>
> YES, I KNOW that if they are determined, they will find a way around my blockades (I've seen quite a few variations on this: Tor, AWS, other VPNs...)
>
> Solution: looking at the stats for the average use of the website, we found that 95% of the site traffic visited 14 pages or fewer. So...
> I have a visitors.db. The system logs all page requests tracked by IP address, and after a set amount (more than 14 pages, but still a pretty low number) starts showing visitors a nice "Page Limit Exceeded" page instead of what they were crawling through. After an unreasonable number of pages I just 404 them out to save server time and bandwidth. The count resets at midnight, because I'm far too lazy to track 24 hours since the first or last page request (per IP). In some cases, when I'm feeling particularly mischievous, once a bot is detected I start feeding them fake info :D
>
> Here's the Visitors.db header (not sure if it will help, but it is what it is):
> VID  IPadd  ipperm  ipname  visitdate  pagecount  starttime  endtime  domain  firstpage  lastpage  browtype  lastsku  partner  linkin  page9  page8  page7  page6  page5  page4  page3  page2  page1
>
> All the code that does the tracking, counting, and map/reduce to store stats and stuff is proprietary (sorry), but I'll see what (if anything) I can share a bit later, and try to write it up as a blog post or something.
>
> -Brian B. Burton
>
>> On Mar 24, 2016, at 11:41 AM, Jym Duane wrote:
>>
>> Curious how to determine... non-Google/Bing/Yahoo bots and others attempting to crawl/copy the entire site?
>>
>> On 3/24/2016 9:28 AM, Brian Burton wrote:
>>> Noah,
>>>
>>> Similar to you, and wanting to use pretty URLs, I built something similar but did it a different way.
>>> _All_ page requests are caught by a url-rewrite rule and get sent to dispatch.tpl.
>>> Dispatch.tpl has hundreds of rules that decide what page to show, and uses includes to do it.
>>> (This keeps everything in-house to WebDNA so I don't have to go mucking about in WebDNA here, Apache there, Linux somewhere else, etc...)
>>>
>>> Three special circumstances came up that needed special code to send out proper HTTP status codes:
>>>
>>> [function name=301public]
>>> [text]eol=[unurl]%0D%0A[/unurl][/text]
>>> [returnraw]HTTP/1.1 301 Moved Permanently[eol]Location: http://www.example.com[link][eol][eol][/returnraw]
>>> [/function]
>>>
>>> [function name=404hard]
>>> [text]eol=[unurl]%0D%0A[/unurl][/text]
>>> [returnraw]HTTP/1.0 404 Not Found[eol]Status: 404 Not Found[eol]Content-type: text/html[eol][eol]<html><body>[eol][eol]<h1>404 Not Found</h1>[eol]The page that you have requested ([thisurl]) could not be found.[eol][eol]</body></html>[/returnraw]
>>> [/function]
>>>
>>> [function name=404soft]
>>> [text]eol=[unurl]%0D%0A[/unurl][/text]
>>> [returnraw]HTTP/1.0 404 Not Found[eol]Status: 404 Not Found[eol]Content-type: text/html[eol][eol][include file=/404pretty.tpl][/returnraw]
>>> [/function]
>>>
>>> Hope this helps
>>> -Brian B. Burton
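As a rough illustration of the [referrer] idea above: a minimal sketch of a gate that refuses requests arriving from outside the site, assuming a WebDNA build where ^ in [showif]/[hideif] means "contains". The domain and the fallback link are placeholders, and 301public is the function from Brian's post.

[!] Hypothetical referrer gate; example.com and /index.tpl are placeholders. [/!]
[hideif [referrer]^example.com]
    [!] No in-site referrer: a bookmark, an outside link, or a bot. [/!]
    [301public link=/index.tpl]
[/hideif]

Entry pages would need to be excluded from such a gate, since a legitimate first visit carries no referrer at all.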
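Brian's tracking code is proprietary, but the counting core he describes can be sketched with WebDNA's standard [search], [append], and [replace] database tags. Everything here is an assumption made for illustration: a counter.db with only two fields (IPADD and PAGECOUNT), and soft/hard thresholds of 15 and 50 pages rather than his real numbers.

[!] Hypothetical per-IP page counter, run once per page request. [/!]
[search db=counter.db&eqIPADDdatarq=[ipaddress]&max=1]
    [showif [numfound]=0]
        [!] First request from this address: start a count. [/!]
        [append db=counter.db]IPADD=[ipaddress]&PAGECOUNT=1[/append]
    [/showif]
    [founditems]
        [text]newcount=[math][pagecount]+1[/math][/text]
        [replace db=counter.db&eqIPADDdatarq=[ipaddress]]PAGECOUNT=[newcount][/replace]
        [!] Hard limit: 404 them out with Brian's 404hard function. [/!]
        [showif [newcount]>50][404hard][/showif]
        [hideif [newcount]>50]
            [!] Soft limit: show a "Page Limit Exceeded" page instead. [/!]
            [showif [newcount]>15][include file=/pagelimit.tpl][/showif]
        [/hideif]
    [/founditems]
[/search]

The midnight reset Brian mentions could then be a nightly scheduled task that simply empties counter.db.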
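To show how the three status-code functions might be wired into a dispatch.tpl of the kind Brian describes, here is a hypothetical fragment. The pages.db lookup table, its PATH and TEMPLATE fields, and the /old-catalog/ rule are invented for illustration.

[!] Hypothetical dispatch.tpl routing fragment. [/!]
[showif [thisurl]^/old-catalog/]
    [!] Retired section: answer with a permanent redirect. [/!]
    [301public link=/catalog/]
[/showif]
[search db=pages.db&eqPATHdatarq=[url][thisurl][/url]&max=1]
    [showif [numfound]=0]
        [!] Unknown URL: serve the friendly 404 page with a real 404 status. [/!]
        [404soft]
    [/showif]
    [founditems][include file=[template]][/founditems]
[/search]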

---------------------------------------------------------
This message is sent to you because you are subscribed to the mailing list.
To unsubscribe, E-mail to:
archives: http://mail.webdna.us/list/talk@webdna.us
Bug Reporting: support@webdna.us

    
