On Mar 24, 2016, at 11:41 AM, Jym Duane <jym@purposemedia.com> wrote:
Curious how to determine... non google/bing/yahoo bots and others attempting to crawl/copy the entire site?
On 3/24/2016 9:28 AM, Brian Burton wrote:
Noah,
Similar to you, and wanting to use pretty URLs, I built something similar, but did it a different way. _All_ page requests are caught by a url-rewrite rule and get sent to dispatch.tpl. Dispatch.tpl has hundreds of rules that decide what page to show, and uses includes to do it. (This keeps everything in-house to WebDNA, so I don't have to go mucking about in WebDNA here, and Apache there, and Linux somewhere else, etc...)
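To make the dispatch idea concrete, here is a minimal sketch of a first-match-wins rule table. It is written in Python rather than WebDNA, purely so it can be run standalone, and it is not the actual dispatch.tpl: the URL patterns, the template names, and the `dispatch` helper are all hypothetical.

```python
import re

# Hypothetical rule table: (URL pattern, template to include).
# Rules are checked in order and the first match wins.
RULES = [
    (re.compile(r"^/$"), "home.tpl"),
    (re.compile(r"^/products/([\w-]+)$"), "product.tpl"),
    (re.compile(r"^/about$"), "about.tpl"),
]


def dispatch(path):
    """Return (template, captured_groups) for a path, or (None, ()) if no rule matches."""
    for pattern, template in RULES:
        match = pattern.match(path)
        if match:
            return template, match.groups()
    # No rule matched: the caller would send one of the 404 responses here.
    return None, ()
```

For example, `dispatch("/products/blue-widget")` selects `product.tpl` with `blue-widget` as the captured slug, while an unknown path falls through to the no-match case.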
Three special circumstances came up that needed special code to send out proper HTTP status codes:
<!-- for page URLs that have permanently moved (WebDNA sends out a 302 temporarily moved code on a redirect) -->
[function name=301public][text]eol=[unurl]%0D%0A[/unurl][/text][returnraw]HTTP/1.1 301 Moved Permanently[eol]Location: http://www.example.com[link][eol][eol][/returnraw][/function]
<!-- I send this to non google/bing/yahoo bots and others attempting to crawl/copy the entire site -->
[function name=404hard][text]eol=[unurl]%0D%0A[/unurl][/text][returnraw]HTTP/1.0 404 Not Found[eol]Status: 404 Not Found[eol]Content-type: text/html[eol][eol]<html>[eol]<body>[eol]<h1>404 Not Found</h1>[eol]The page that you have requested ([thisurl]) could not be found.[eol]</body>[eol]</html>[/returnraw][/function]
<!-- and finally a pretty 404 page for humans -->
[function name=404soft][text]eol=[unurl]%0D%0A[/unurl][/text][returnraw]HTTP/1.0 404 Not Found[eol]Status: 404 Not Found[eol]Content-type: text/html[eol][eol][include file=/404pretty.tpl][/returnraw][/function]
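For readers who want to see exactly what text those functions emit, the sketch below rebuilds the same raw responses in Python. It is an illustration only, not WebDNA: the function names `moved_permanently` and `not_found` and the `host` parameter are made up here, and only the header lines are taken from the functions above.

```python
EOL = "\r\n"  # the same CR+LF pair that [unurl]%0D%0A[/unurl] produces


def moved_permanently(link, host="http://www.example.com"):
    """Raw 301 response: status line, Location header, then a blank line."""
    return f"HTTP/1.1 301 Moved Permanently{EOL}Location: {host}{link}{EOL}{EOL}"


def not_found(body):
    """Raw 404 response: status line and headers, a blank line, then the HTML body."""
    return (
        f"HTTP/1.0 404 Not Found{EOL}"
        f"Status: 404 Not Found{EOL}"
        f"Content-type: text/html{EOL}"
        f"{EOL}"
        f"{body}"
    )
```

The point to notice is the empty line (two consecutive CR+LF pairs) that separates the headers from the body; a 301 has no body, so it simply ends with that blank line.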
Hope this helps
-Brian B. Burton