[WebDNA] More thoughts about [middle]

This WebDNA talk-list message is from 2015. It keeps the original formatting.
Message number: 112035
Chris,

This is an improvement over my previous suggestion for improving the [middle] context ... or at least that's how I see it:

One way to give middle the ability to extract similar individual tags from an HTML page might be something like this:

startAfter
continueUntil
repeatUntil
variableName

My thought here is that middle would start after the first matching "startAfter" value, then it would continue from there until it finds the next "continueUntil" value ...

Then it would keep REPEATING the same "startAfter" and "continueUntil" procedure -- from the last place it found a match -- thus finding more matches until it has repeated "repeatUntil" times (which could be a positive whole number value or [end]) ...

And every time it finds a match it would set a text variable to the value found between the latest "startAfter" and "continueUntil" values.

In other words, doing this:

[middle startafter=[!] [/!]&continueUntil=>[!] [/!]&repeatUntil=[end][!] [/!]&variableName=imagePath]
here's a span text here

this is a paragraph

text here
this is a div
text here
[/middle]

... would result in setting these text variables:

imagePath1 = /images/abc.gif
imagePath2 = def.png
imagePath3 = ghi.gif
imagePath4 = thumbnails/jkl.gif
imagePath5 = mno.png
imagePath6 = pqr.jpg
imagePath7 = stu.gif
imagePath8 = vwx.gif
imagePath9 = /logos/yz.jpg

In this case there would be no results displayed inside the middle context because the found values have been set as text variables. But if the "variableName" parameter were not used, those values would instead be displayed inside the middle context rather than set as text vars.

Something like "matchCase=T" might be a nice option too in case we need to find exact lettercase matches.

To me this is probably the best way to improve [middle] because it actually gives us the ability to find and extract similar HTML tags that are repeated in a page.

Last week I wanted to write a script that checks my client's website to see if the images referenced in each of his web pages actually exist, but I stopped when I realized that WebDNA does not have a simple way to parse the HTML and extract the img tags. However, having the above capability would make an image-checking script a no-brainer. :)

Regards,
Kenneth Grome
WebDNA Solutions
http://www.webdnasolutions.com
Web Database Systems and Linux Server Management


On 01/23/2015 10:29 AM, David Bastedo wrote:
> Thanks Ken & Tom,
>
> as soon as I understood what Ken was saying, I knew what I want to
> do is impossible
>
> I literally want to pluck open graph or other meta data off of a
> page, no matter where it is by just using its tag and an end point.
>
> If I know what tags I am looking for explicitly - I could put them
> in a table and loop through looking for whatever I wanted, then I
> could define the end - working "forward" from the opening of the
> tag "og: title" for example, and end at the close of the tag "/"
> and be able to pull out dynamically any meta tag I could possibly
> think of.... or want.
>
> That would be pretty straightforward and very powerful.
> I can accomplish this task by creating a one-off relationship
> between a page and its tags - say for twitter - it's an easy way to
> grab an image - but it's not dynamic. I want to do this for any type
> of page.
>
> d.
>
> On Fri, Jan 23, 2015 at 7:35 AM, Tom Duke wrote:
>
> David,
>
> Hi - you won't be able to achieve what you are trying to do
> with [middle]. You might be able to hack something together
> using [grep] or [listwords]. Though Stackoverflow is full of
> articles outlining why regex should not be used to parse HTML.
> (http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454)
>
> Your example shows why a proper HTML parser within WebDNA
> would be really useful. For example if you paste your code
> into this page:
>
> http://try.jsoup.org
>
> and type "meta" into the CSS Query box you'll see how an HTML
> parser does the job.
>
> - Tom
>
> ==============================================
> Digital Revolutionaries
> 1st Floor, Castleriver House
> 14-15 Parliament Street
> Temple Bar, Dublin 2
> Ireland
> ----------------------------------------------
> [t]: + 353 1 4403907
> [e]:
> [w]:
> ==============================================
>
> On 23 January 2015 at 00:11, David Bastedo wrote:
>
> To your point, I never switched out your test variable
> properly
> To my point, I hate when you are right.
> I get the same results.
>
> However, as opposed to blaming me for not understanding
> how the friggin thing works, the docs aren't very clear
> and after seeing your example I now understand "backwards"
> for the reality that it is.
>
> There is no hope in hell of doing what I want with middle.
>
> Your first example is not as good as your second example
> to illustrate the concept. Thank you for taking the time
> with the second example, it illustrates backwards much more
> effectively.
>
> d.
>
> ---------------------------------------------------------
> This message is sent to you because you are subscribed to
> the mailing list __. To unsubscribe, E-mail to: __
> archives: http://mail.webdna.us/list/talk@webdna.us Bug
> Reporting: support@webdna.us
>
> --
> David Bastedo
>
> Ten Plus One Communications Inc.
> http://www.10plus1.com
> 416.277.4499
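
The repeating start/continue behaviour Ken proposes for [middle] can be sketched outside WebDNA. The Python below is only an illustration under stated assumptions, not WebDNA code and not an existing [middle] feature: the function name repeat_middle, its parameter names, and SAMPLE_HTML are hypothetical; the `<img src="` and `"` delimiters are guesses, because the archived message lost its HTML markup; and the case-insensitive default is inferred from the mention of "matchCase=T". The delimiters are matched as escaped literal text rather than as an HTML grammar, so Tom's caution about parsing HTML with regex still applies to anything more ambitious.

```python
import re
from itertools import islice


def repeat_middle(text, start_after, continue_until,
                  repeat_until=None, variable_name=None, match_case=False):
    """Collect every value found between start_after and continue_until.

    repeat_until=None stands in for the proposal's [end]; a positive
    integer caps the number of matches. (Hypothetical helper.)
    """
    # Assumption: searches ignore letter case unless match_case is set.
    flags = re.DOTALL if match_case else re.DOTALL | re.IGNORECASE
    # Both delimiters are escaped, so they are treated as plain text.
    pattern = re.compile(
        re.escape(start_after) + "(.*?)" + re.escape(continue_until), flags)
    # finditer resumes after the end of the previous match, which mirrors
    # "from the last place it found a match" in the proposal.
    found = [m.group(1) for m in islice(pattern.finditer(text), repeat_until)]
    if variable_name is None:
        # No variableName: return the values themselves.
        return found
    # With variableName: number the values imagePath1, imagePath2, ...
    return {f"{variable_name}{n}": value
            for n, value in enumerate(found, start=1)}


# Hypothetical page fragment; the image paths are taken from Ken's list.
SAMPLE_HTML = """
<span><img src="/images/abc.gif">here's a span</span>
<p><img src="def.png">this is a paragraph</p>
<div><IMG SRC="ghi.gif">this is a div</div>
"""

image_vars = repeat_middle(SAMPLE_HTML, '<img src="', '"',
                           variable_name="imagePath")
for name, value in image_vars.items():
    print(name, "=", value)
```

Leaving out variable_name returns the matched values as a list, which corresponds to the case where the proposal would display them inside the context instead of setting text variables.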

    
