Re: [BULK] Re: [WebDNA] set HTTP-Status Code from webdna

This WebDNA talk-list message is from 2016.


What about using [referrer] to allow your customers to navigate your website but disallow bookmarking and outside links? You could also use [session] to limit the navigation to X minutes or Y pages, even for bots, then "kick" the visitor out.

- chris

> On Mar 24, 2016, at 20:30, Brian Burton wrote:
>
> Backstory: the site in question is a replacement-parts business and has hundreds of thousands of pages of cross-reference material, all stored in databases and generated as needed. Competitors and dealers that carry competitors' brand parts seem to think that copying our cross-reference is easier than creating their own (it would be), so code was written to block this.
>
> YES, I KNOW that if they are determined, they will find a way around my blockades (I've seen quite a few variations on this: Tor, AWS, other VPNs...)
>
> Solution: looking at the stats for the average use of the website, we found that 95% of the site traffic visited 14 pages or fewer. So...
> I have a visitors.db. The system logs all page requests tracked by IP address, and after a set amount (more than 14 pages, but still a pretty low number) starts showing visitors a nice Page Limit Exceeded page instead of what they were crawling through. After an unreasonable number of pages I just 404 them out to save server time and bandwidth. The count resets at midnight, because I'm far too lazy to track 24 hours since the first or last page request (per IP). In some cases, when I'm feeling particularly mischievous, once a bot is detected I start feeding them fake info :D
>
> Here's the Visitors.db header (not sure if it will help, but it is what it is):
> VID  IPadd  ipperm  ipname  visitdate  pagecount  starttime  endtime  domain  firstpage  lastpage  browtype  lastsku  partner  linkin  page9  page8  page7  page6  page5  page4  page3  page2  page1
>
> All the code that does the tracking and counting and map/reduction to store stats and stuff is proprietary (sorry), but I'll see what (if anything) I can share a bit later, and try to write it up as a blog post or something.
>
> -Brian B. Burton
>
>> On Mar 24, 2016, at 11:41 AM, Jym Duane wrote:
>>
>> Curious how to determine... non-Google/Bing/Yahoo bots and others attempting to crawl/copy the entire site?
>>
>> On 3/24/2016 9:28 AM, Brian Burton wrote:
>>> Noah,
>>>
>>> Similar to you, and wanting to use pretty URLs, I built something similar, but did it a different way.
>>> _All_ page requests are caught by a url-rewrite rule and get sent to dispatch.tpl.
>>> Dispatch.tpl has hundreds of rules that decide what page to show, and uses includes to do it.
>>> (This keeps everything in-house to WebDNA so I don't have to go mucking about in WebDNA here, and Apache there, and Linux somewhere else, etc...)
>>>
>>> Three special circumstances came up that needed special code to send out proper HTTP status codes:
>>>
>>> [function name=301public]
>>> [text]eol=[unurl]%0D%0A[/unurl][/text]
>>> [returnraw]HTTP/1.1 301 Moved Permanently[eol]Location: http://www.example.com[link][eol][eol][/returnraw]
>>> [/function]
>>>
>>> [function name=404hard]
>>> [text]eol=[unurl]%0D%0A[/unurl][/text]
>>> [returnraw]HTTP/1.0 404 Not Found[eol]Status: 404 Not Found[eol]Content-type: text/html[eol][eol]<html>[eol]<body>[eol]<h1>404 Not Found</h1>[eol]The page that you have requested ([thisurl]) could not be found.[eol]</body>[eol]</html>[/returnraw]
>>> [/function]
>>>
>>> [function name=404soft]
>>> [text]eol=[unurl]%0D%0A[/unurl][/text]
>>> [returnraw]HTTP/1.0 404 Not Found[eol]Status: 404 Not Found[eol]Content-type: text/html[eol][eol][include file=/404pretty.tpl][/returnraw]
>>> [/function]
>>>
>>> Hope this helps
>>> -Brian B. Burton
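To make Chris's [referrer] idea concrete, here is a minimal sketch of the gate (example.com stands in for the real domain; [referrer] is supplied by the client, so it can be blank or forged, and this is only a soft barrier):

[!] hypothetical referrer gate: interior pages may only be reached from links on our own site; the home page stays open so there is no redirect loop [/!]
[hideif [thisurl]=/]
  [hideif [referrer]^example.com]
    [redirect http://www.example.com/]
  [/hideif]
[/hideif]

A [session]-based variant would kick the visitor the same way, but keyed off an expiring session variable instead of the referrer header.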

    
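Brian's tracking code is proprietary, but the per-IP counter he describes can be sketched in a few lines of WebDNA. The field names (IPadd, pagecount, visitdate) come from his header above and the 14-page threshold is his; pagelimit.tpl and the 100-page "unreasonable" cutoff are invented for illustration:

[!] one record per IP in visitors.db; the count resets at midnight by comparing visitdate to today [/!]
[text]newcount=1[/text]
[search db=visitors.db&eqIPADDdatarq=[ipaddress]]
  [showif [numfound]=0]
    [append db=visitors.db]IPadd=[ipaddress]&pagecount=1&visitdate=[date][/append]
  [/showif]
  [founditems]
    [showif [visitdate]=[date]][text]newcount=[math][pagecount]+1[/math][/text][/showif]
    [replace db=visitors.db&eqIPADDdatarq=[ipaddress]]pagecount=[newcount]&visitdate=[date][/replace]
  [/founditems]
[/search]
[!] past an unreasonable count, 404 them out using the [404hard] function quoted above [/!]
[showif [newcount]>100][404hard][/showif]
[showif [newcount]>14][include file=pagelimit.tpl][/showif]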
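Jym's question never gets a direct answer in the thread. One common first cut (not from the thread) is to allow-list the major crawlers by user-agent string, which WebDNA exposes as [browsername]; bear in mind that the genuine Googlebot, Bingbot, and Slurp can only be confirmed with a reverse-DNS lookup, since any scraper can claim these strings:

[!] rough allow-list: self-identified major crawlers bypass the page limit, everyone else gets counted [/!]
[text]knownbot=F[/text]
[showif [browsername]^Googlebot][text]knownbot=T[/text][/showif]
[showif [browsername]^bingbot][text]knownbot=T[/text][/showif]
[showif [browsername]^Slurp][text]knownbot=T[/text][/showif]
[hideif [knownbot]=T]
  [!] run the per-IP page counting sketch above for everyone else [/!]
[/hideif]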
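The dispatch.tpl pattern in Brian's earlier message boils down to matching [thisurl] against a rule list and including the matching template. A stripped-down sketch (the /parts/ rule and the template names are invented; [404soft] is his function quoted above):

[!] dispatch.tpl sketch: every rewritten request lands here and [thisurl] picks the template [/!]
[text]matched=F[/text]
[showif [thisurl]^/parts/][text]matched=T[/text][include file=parts.tpl][/showif]
[showif [thisurl]=/][text]matched=T[/text][include file=home.tpl][/showif]
[!] ...hundreds more rules like these... [/!]
[showif [matched]=F][404soft][/showif]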

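Once defined, the three status-code functions are called like any other tag, with call-time parameters becoming local variables inside the function, which is how [link] reaches 301public. A usage sketch with hypothetical paths:

[!] retire an old URL with a real 301 so crawlers update their indexes [/!]
[showif [thisurl]=/old-catalog.html][301public link=/new-catalog.html][/showif]
[!] hard 404, with no pretty page, for obvious probe traffic [/!]
[showif [thisurl]^/wp-admin][404hard][/showif]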