Re: [BULK] Re: [WebDNA] set HTTP-Status Code from webdna

This WebDNA talk-list message is from 2016. It keeps the original formatting.

From: christophe.billiottet@webdna.us
numero = 112675
interpreted = N
What about using [referrer] to allow your customers to navigate your website but disallow bookmarking and outside links? You could also use [session] to limit the navigation to X minutes or Y pages, even for bots, then "kick" the visitor out.

- chris
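A minimal sketch of the [referrer] check chris describes, assuming the site's own hostname is www.example.com (a placeholder) and that bouncing to the home page is an acceptable "kick"; [session] could bound visit length or page count the same way:

    [!] bounce any request whose referrer is not our own site [/!]
    [hideif [referrer]^www.example.com]
      [redirect /index.tpl]
    [/hideif]

Since [hideif] suppresses its contents when the comparison is true, the redirect only fires when the referrer does not contain www.example.com. One caveat: direct visits and bookmarks arrive with an empty referrer, so entry pages would have to be excluded from this check.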
> On Mar 24, 2016, at 20:30, Brian Burton wrote:
>
> Backstory: the site in question is a replacement-parts business and has hundreds of thousands of pages of cross-reference material, all stored in databases and generated as needed. Competitors and dealers that carry competitors' brand parts seem to think that copying our cross-reference is easier than creating their own (it would be), so code was written to block this.
>
> YES, I KNOW that if they are determined, they will find a way around my blockades (I've seen quite a few variations on this: Tor, AWS, other VPNs...)
>
> Solution: looking at the stats for the average use of the website, we found that 95% of the site traffic visited 14 pages or fewer. So...
> I have a visitors.db. The system logs all page requests tracked by IP address, and after a set amount (more than 14 pages, but still a pretty low number) starts showing visitors a nice Page Limit Exceeded page instead of what they were crawling through. After an unreasonable number of pages I just 404 them out to save server time and bandwidth. The count resets at midnight, because I'm far too lazy to track 24 hours since the first or last page request (per IP). In some cases, when I'm feeling particularly mischievous, once a bot is detected I start feeding them fake info :D
>
> Here's the Visitors.db header (not sure if it will help, but it is what it is):
> VID  IPadd  ipperm  ipname  visitdate  pagecount  starttime  endtime  domain  firstpage  lastpage  browtype  lastsku  partner  linkin  page9  page8  page7  page6  page5  page4  page3  page2  page1
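A minimal sketch of a per-IP counter over that header (Brian's real code is proprietary, as he notes below); [cart] stands in as a unique-ID generator, /pagelimit.tpl is a hypothetical "Page Limit Exceeded" template, and the 14/100 thresholds echo his description:

    [!] log this request and enforce the page limits; a sketch, not the real code [/!]
    [search db=visitors.db&eqIPadddatarq=[ipaddress]]
      [showif [numfound]=0]
        [append db=visitors.db]VID=[cart]&IPadd=[ipaddress]&visitdate=[date]&pagecount=1[/append]
      [/showif]
      [showif [numfound]>0]
        [founditems]
          [text]newcount=[math][pagecount]+1[/math][/text]
          [replace db=visitors.db&eqVIDdatarq=[VID]]pagecount=[newcount][/replace]
          [showif [newcount]>100][404hard][/showif]  [!] unreasonable: cheap hard 404 [/!]
          [hideif [newcount]>100]
            [showif [newcount]>14][include file=/pagelimit.tpl][/showif]
          [/hideif]
        [/founditems]
      [/showif]
    [/search]
    [!] the midnight pagecount reset would run elsewhere, e.g. as a scheduled task [/!]

[404hard] here is the function Brian defines further down the thread.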
>
> All the code that does the tracking and counting and map/reduction to store stats and stuff is proprietary (sorry), but I'll see what (if anything) I can share a bit later, and try to write it up as a blog post or something.
>
> -Brian B. Burton
>
>> On Mar 24, 2016, at 11:41 AM, Jym Duane wrote:
>>
>> curious how to determine... non-Google/Bing/Yahoo bots and others attempting to crawl/copy the entire site?
>>
>> On 3/24/2016 9:28 AM, Brian Burton wrote:
>>> Noah,
>>>
>>> Similar to you, and wanting to use pretty URLs, I built something similar, but did it a different way.
>>> _All_ page requests are caught by a url-rewrite rule and get sent to dispatch.tpl.
>>> Dispatch.tpl has hundreds of rules that decide what page to show, and uses includes to do it.
>>> (This keeps everything in-house to WebDNA so I don't have to go mucking about in WebDNA here, and Apache there, and Linux somewhere else, etc...)
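A minimal sketch of what a couple of those dispatch rules could look like, assuming the rewritten request path is readable via [thisurl] and with placeholder template paths:

    [!] dispatch.tpl sketch: route the requested path to an include [/!]
    [text]handled=F[/text]
    [showif [thisurl]^/crossref/][text]handled=T[/text][include file=/inc/crossref.tpl][/showif]
    [showif [thisurl]^/parts/][text]handled=T[/text][include file=/inc/parts.tpl][/showif]
    [showif [handled]=F][404soft][/showif]  [!] nothing matched: send a real 404 [/!]

[404soft] is defined below; keeping all routing in one template is what lets the status-code functions be applied consistently.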
>>>
>>> Three special circumstances came up that needed special code to send out proper HTTP status codes:
>>>
>>> [function name=301public]
>>> [text]eol=[unurl]%0D%0A[/unurl][/text]
>>> [returnraw]HTTP/1.1 301 Moved Permanently[eol]Location: http://www.example.com[link][eol][eol][/returnraw]
>>> [/function]
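The [unurl]%0D%0A[/unurl] line decodes to a literal CRLF, so [eol] carries the line break HTTP headers require, and [returnraw] ships the hand-built response to the client verbatim. A call from dispatch.tpl might look like this ([link] becomes a variable inside the function; the paths are placeholders):

    [showif [thisurl]=/oldcatalog.html][301public link=/catalog.tpl][/showif]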
>>>
>>> [function name=404hard]
>>> [text]eol=[unurl]%0D%0A[/unurl][/text]
>>> [returnraw]HTTP/1.0 404 Not Found[eol]Status: 404 Not Found[eol]Content-type: text/html[eol][eol]404 Not Found[eol]The page that you have requested ([thisurl]) could not be found.[eol][eol][/returnraw]
>>> [/function]
>>>
>>> [function name=404soft]
>>> [text]eol=[unurl]%0D%0A[/unurl][/text]
>>> [returnraw]HTTP/1.0 404 Not Found[eol]Status: 404 Not Found[eol]Content-type: text/html[eol][eol][include file=/404pretty.tpl][/returnraw]
>>> [/function]
>>>
>>> Hope this helps
>>> -Brian B. Burton
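The difference between the two 404s is only the body: [404hard] sends a bare-bones page to save bandwidth on heavy crawlers, while [404soft] serves the full /404pretty.tpl template for ordinary visitors who hit a bad URL. Choosing between them in dispatch.tpl might look like this, where [handled] and [newcount] come from the sketches above:

    [showif [handled]=F]
      [showif [newcount]>100][404hard][/showif]
      [hideif [newcount]>100][404soft][/hideif]
    [/showif]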
---------------------------------------------------------
This message is sent to you because you are subscribed to the mailing list talk@webdna.us.
Archives: http://mail.webdna.us/list/talk@webdna.us
Bug Reporting: support@webdna.us
