What is WebDNA

WebDNA is a scripting language and database system designed for building web applications quickly.

WebDNA and BioType

BioType is a biometric keystroke-dynamics service. It will be part of WebDNA 8.5.

Download WebDNA

Download the WebDNA freeware, try it, and register later if you want.

WebDNA resources

The list of all WebDNA instructions.
Re: [BULK] Re: [WebDNA] set HTTP-Status Code from webdna

This WebDNA talk-list message is from 2016. It keeps the original formatting.
From: Tom Duke
Hi all,

Thought I would add my approach to 'pretty' urls using mod_rewrite rather than routing through an error document.

Basically everything except images and folders/files that I specify are routed to 'parser.tmpl'. That template then parses the URL and you can then search databases, include files etc.

Here's a sample htaccess file with all the mod_rewrite stuff and some other things that people might find useful.

- Tom

PS. This is a great resource on what can be done using the htaccess file:
https://github.com/h5bp/html5-boilerplate/blob/master/dist/.htaccess
# Better website experience for IE
Header set X-UA-Compatible "IE=edge"
<FilesMatch "\.(appcache|crx|css|eot|gif|htc|ico|jpe?g|js|m4a|m4v|manifest|mp4|oex|oga|ogg|ogv|otf|pdf|png|safariextz|svgz?|ttf|vcf|webapp|webm|webp|woff|xml|xpi)$">
  Header unset X-UA-Compatible
</FilesMatch>

DirectoryIndex index.html index.tmpl

# Proper MIME types for all files
AddType application/javascript         js
AddType application/json               json
AddType video/mp4                      mp4 m4v f4v f4p
AddType video/x-flv                    flv
AddType application/font-woff          woff
AddType application/vnd.ms-fontobject  eot
AddType application/x-font-ttf         ttc ttf
AddType font/opentype                  otf
AddType image/svg+xml                  svg svgz
AddEncoding gzip                       svgz
AddType application/x-shockwave-flash  swf
AddType application/xml                atom rdf rss xml
AddType image/x-icon                   ico
AddType text/vtt                       vtt
AddType text/x-component               htc
AddType text/x-vcard                   vcf
AddType text/csv                       csv

# UTF-8 encoding
AddDefaultCharset utf-8
AddCharset utf-8 .atom .css .js .json .rss .vtt .webapp .xml

# Security - Block access to directories without a default document
Options -Indexes

# Block access to backup and source files
<FilesMatch "(^#.*#|\.(bak|config|dist|fla|inc|ini|log|psd|sh|sql|sw[op])|~)$">
  Order allow,deny
  Deny from all
  Satisfy All
</FilesMatch>

# Rewrite engine
RewriteEngine On

# Redirect to Main 'www' Domain
RewriteCond %{HTTP_HOST} ^yourdomain\.com [NC]
RewriteRule ^(.*)$ http://www.yourdomain.com/$1 [R=301,NC,L]

# Exclude these directories and files from rewrite
RewriteRule ^(admin|otherdirectories|parser\.tmpl|robots\.txt)($|/) - [L]

# Exclude images from rewrite
RewriteCond %{REQUEST_URI} !\.(gif|jp?g|png|css|ico) [NC]

# Route everything else through parser.tmpl
RewriteRule . /parser.tmpl?requestedurl=%{REQUEST_URI}&query=%{QUERY_STRING}&serverport=%{SERVER_PORT} [L]
==============================================
Digital Revolutionaries
1st Floor, Castleriver House
14-15 Parliament Street
Temple Bar, Dublin 2
Ireland
----------------------------------------------
[t]: + 353 1 4403907
[e]: <mailto:tom@revolutionaries.ie>
[w]: <http://www.revolutionaries.ie/>
==============================================
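Tom's parser.tmpl itself is not shown in the thread. As a rough, minimal sketch of the idea, a dispatcher along those lines could look like the following, where pages.db (with path and template fields), home.inc and the reuse of 404pretty.tpl are hypothetical names invented for the example; the tag usage follows the WebDNA examples quoted later in this message:

[!] parser.tmpl (sketch): requestedurl, query and serverport arrive from the RewriteRule above [/!]
[showif [requestedurl]=/]
  [include file=home.inc]
[/showif]
[hideif [requestedurl]=/]
  [!] look the pretty URL up in a database that maps paths to templates [/!]
  [search db=pages.db&eqpathdatarq=[requestedurl]]
    [founditems][include file=[template]][/founditems]
    [showif [numfound]=0][include file=404pretty.tpl][/showif]
  [/search]
[/hideif]

Keeping the path-to-template mapping in a .db file means new pretty URLs can be added without touching the htaccess rules.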

On 24 March 2016 at 17:50, <christophe.billiottet@webdna.us> wrote:

What about using [referrer] to allow your customers to navigate your website but disallow bookmarking and outside links? You could also use [session] to limit the navigation to X minutes or Y pages, even for bots, then "kick" the visitor out.

- chris
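A bare-bones sketch of the [referrer] idea might look like this; www.example.com and start.tmpl are placeholders, and the X-minutes/Y-pages limit chris mentions would still need [session] or a database on top of it:

[!] bounce requests whose referrer is not one of our own pages (the entry page itself would need to be excluded from this check) [/!]
[hideif [referrer]^www.example.com]
  [redirect http://www.example.com/start.tmpl]
[/hideif]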




> On Mar 24, 2016, at 20:30, Brian Burton <brian@burtons.com> wrote:
>
> Backstory: the site in question is a replacement-parts business and has hundreds of thousands of pages of cross-reference material, all stored in databases and generated as needed. Competitors and dealers that carry competitors' brand parts seem to think that copying our cross reference is easier than creating their own (it would be), so code was written to block this.
>
> YES, I KNOW that if they are determined, they will find a way around my blockades (I've seen quite a few variations on this: tor, AWS, other VPNs...)
>
> Solution: looking at the stats for the average use of the website, we found that 95% of the site traffic visited 14 pages or less. So...
> I have a visitors.db. The system logs all page requests tracked by IP address, and after a set amount (more than 14 pages, but still a pretty low number) starts showing visitors a nice Page Limit Exceeded page instead of what they were crawling through. After an unreasonable number of pages I just 404 them out to save server time and bandwidth. The count resets at midnight, because I'm far too lazy to track 24 hours since the first or last page request (per IP.) In some cases, when I'm feeling particularly mischievous, once a bot is detected I start feeding them fake info :D
>
> Here's the Visitors.db header: (not sure if it will help, but it is what it is)
> VID  IPadd  ipperm  ipname  visitdate  pagecount  starttime  endtime  domain  firstpage  lastpage  browtype  lastsku  partner  linkin  page9  page8  page7  page6  page5  page4  page3  page2  page1
>
>
> All the code that does the tracking and counting and map/reduction to store stats and stuff is proprietary (sorry) but I'll see what (if anything) I can share a bit later, and try to write it up as a blog post or something.
>
> -Brian B. Burton
>
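Brian's actual tracking code is proprietary, but a stripped-down sketch of the per-IP page counter he describes could look roughly like this, reusing the IPadd, pagecount and visitdate fields from the Visitors.db header above; the threshold of 20 and blocked.inc are invented for the example, and the midnight reset is left out:

[!] count this request against the visitor's IP (sketch) [/!]
[search db=visitors.db&eqIPadddatarq=[ipaddress]]
  [showif [numfound]=0]
    [append db=visitors.db]IPadd=[ipaddress]&pagecount=1&visitdate=[date][/append]
  [/showif]
  [founditems]
    [replace db=visitors.db&eqIPadddatarq=[ipaddress]]pagecount=[math][pagecount]+1[/math][/replace]
    [!] already past the limit before this hit: show the Page Limit Exceeded page instead of the requested one [/!]
    [showif [pagecount]>20][include file=blocked.inc][/showif]
  [/founditems]
[/search]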
>> On Mar 24, 2016, at 11:41 AM, Jym Duane <jym@purposemedia.com> wrote:
>>
>> curious how to determine...non google/bing/yahoo bots and other attempting to crawl/copy the entire site?
>>
>>
>>
>> On 3/24/2016 9:28 AM, Brian Burton wrote:
>>> Noah,
>>>
>>> Similar to you, and wanting to use pretty URLs I built something similar, but did it a different way.
>>> _All_ page requests are caught by a url-rewrite rule and get sent to dispatch.tpl
>>> Dispatch.tpl has hundreds of rules that decide what page to show, and uses includes to do it.
>>> (this keeps everything in-house to webdna so i don't have to go mucking about in webdna here, and apache there, and linux somewhere else, and etc...)
>>>
>>> Three special circumstances came up that needed special code to send out proper HTTP status codes:
>>>
>>> <!-- for page URLs that have permanently moved (webdna sends out a 302 temporarily moved code on a redirect) -->
>>> [function name=301public]
>>> [text]eol=[unurl]%0D%0A[/unurl][/text]
>>> [returnraw]HTTP/1.1 301 Moved Permanently[eol]Location: http://www.example.com[link][eol][eol][/returnraw]
>>> [/function]
>>>
>>> <!-- I send this to non google/bing/yahoo bots and others attempting to crawl/copy the entire site -->
>>> [function name=404hard]
>>> [text]eol=[unurl]%0D%0A[/unurl][/text]
>>> [returnraw]HTTP/1.0 404 Not Found[eol]Status: 404 Not Found[eol]Content-type: text/html[eol][eol]<html>[eol]<body>[eol]<h1>404 Not Found</h1>[eol]The page that you have requested ([thisurl]) could not be found.[eol]</body>[eol]</html>[/returnraw]
>>> [/function]
>>>
>>> <!-- and finally a pretty 404 page for humans -->
>>> [function name=404soft]
>>> [text]eol=[unurl]%0D%0A[/unurl][/text]
>>> [returnraw]HTTP/1.0 404 Not Found[eol]Status: 404 Not Found[eol]Content-type: text/html[eol][eol][include file=/404pretty.tpl][/returnraw]
>>> [/function]
>>>
>>> Hope this helps
>>> -Brian B. Burton
>
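Since these are ordinary [function] definitions, dispatch.tpl can call them like tags and pass parameters by name. A hypothetical pair of rules, with the /old-catalog/ path and the page-count test invented for the example, might be:

[!] dispatch.tpl (sketch): a URL that moved permanently [/!]
[showif [thisurl]^/old-catalog/]
  [301public link=/catalog/]
[/showif]
[!] suspected scraper, as described earlier in the thread: hard 404 [/!]
[showif [pagecount]>200]
  [404hard]
[/showif]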

---------------------------------------------------------
This message is sent to you because you are subscribed to
the mailing list <talk@webdna.us>.
To unsubscribe, E-mail to: <talk-leave@webdna.us>
archives: http://mail.webdna.us/list/talk@webdna.us
Bug Reporting: support@webdna.us
---------------------------------------------------------


Top Articles:

Talk List

The WebDNA community talk-list is the best place to get help: several hundred highly proficient programmers with excellent knowledge of WebDNA and a generous spirit will share all the tips and tricks you can imagine...

Related Readings:

WebCommerce: Folder organization ? (1997)
Help name our technology! I found it (1997)
WebCat2b15MacPlugin - showing [math] (1997)
Emailer (WebCat2) (1997)
[WebDNA] current thinking on architecture of mass email scripts? (2011)
WebCatalog can't find database (1997)
Date/Time format problems (1997)
ooops...WebCatalog [FoundItems] Problem - LONG - (1997)
request for string functions (1998)
Help formatting search results w/ table (1997)
Download URL & access on the fly ? (1997)
[WebDNA] Stupid question about CentOS v4 and WebDNA v6 (2008)
Part Html part WebDNA (1997)
Fun with dates (1997)
Database not found in Include (2002)
[WebDNA] WebDNA7 site randomly dropping tags (2011)
WCS Newbie question (1997)
Orderfile context problem (1998)
[WebDNA] Emailer and Comcast.net (2008)
Country & Ship-to address & other fields ? (1997)