Follow Links
di3
29th June 2006, 08:10 PM
I have a question about how to follow links.
I have a text file, links.txt:
<link>http://domain/more.php?id=226_0_1_0_M</link>
<link>http://domain/more.php?id=225_0_1_0_M</link>
<link>http://domain/more.php?id=224_0_1_0_M</link>
Parser Code:
<Logger Browser>
Global
Level Error
</Logger>
<Section>
Name Process
Define $goto {$url}
<Action ContentURL>
URL {$goto}
RemoveNewLine
TagsToStrip br,nobr,b
</Action>
<Pattern>
RegExp <h3>{$title:text}</h3>
</Pattern>
<Action Print>
Text {$title}
</Action>
</Section>
<Section>
Name File
#load file with header
<Action ContentFile>
FileName links.txt
</Action>
<Section While>
#NoContext
<Pattern>
RegExp <link>{$url:text}</link>
</Pattern>
<Action Print>
Text {$url}
</Action>
<Section>
Section Process
</Section>
</Section>
</Section>
Main File
When I run this, it gives me only 1 result, not 3.
vzeman
5th July 2006, 10:31 AM
Hi.
The problem is that you replaced the content loaded from the links file with the content loaded from the first link, so the loop finds no further links to match.
If you want to iterate through all the links, move the Process section into a separate file (e.g. name it process.w) and call it with an <Action Eval> tag.
This will give you the results you want.
If you still have trouble writing this script, please write to support@qualityunit.com and we will help you.
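For readers who want to see the idea outside the parser's own script language, here is a minimal Python sketch of the same pattern. It is not the parser's API; `follow_links` and the `fetch` callable are hypothetical names chosen for illustration. The point it shows is that the list of links is kept in its own variable, so loading a page never overwrites it.

```python
import re

# Patterns mirroring the ones used in the thread's scripts.
LINK_RE = re.compile(r"<link>(.*?)</link>")
TITLE_RE = re.compile(r"<h3>(.*?)</h3>", re.S)


def follow_links(links_text, fetch):
    """Return a list of (url, [titles]) for every <link> entry in links_text.

    fetch is a hypothetical callable that takes a URL and returns the
    page HTML as a string (for example, a thin wrapper around urllib).
    """
    results = []
    # The links are extracted once, up front, and kept separately.
    for url in LINK_RE.findall(links_text):
        # Each page is loaded into its own variable, so the link list
        # above is never replaced by page content.
        page = fetch(url)
        titles = [t.strip() for t in TITLE_RE.findall(page)]
        results.append((url, titles))
    return results


if __name__ == "__main__":
    links = (
        "<link>http://domain/more.php?id=226_0_1_0_M</link>\n"
        "<link>http://domain/more.php?id=225_0_1_0_M</link>\n"
        "<link>http://domain/more.php?id=224_0_1_0_M</link>\n"
    )
    # A stand-in fetcher so the sketch runs without network access.
    for url, titles in follow_links(links, lambda u: "<h3>Title for " + u + "</h3>"):
        print(url, titles)
```

Passing the fetcher in as a parameter keeps the loop easy to try with canned pages before pointing it at a real site.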
di3
10th July 2006, 02:12 PM
Hi! Thanks for your help.
But it is not working.
My first file:
<Section>
Name File
<Action ContentFile>
FileName links.txt
</Action>
<Section While>
<Pattern>
RegExp <link>{$url:text}</link>
</Pattern>
<Action Eval>
File process.w
</Action>
</Section>
</Section>
Main File
and the second (process.w):
<Section>
#Printing URL Variable
<Action Print>
Text {$url}
</Action>
#Loading Content
<Action ContentUrl>
URL {$url}
</Action>
<Section>
<Pattern>
RegExp <h3>{$my:text}</h3>
</Pattern>
<Action Print>
Text {$my}
</Action>
</Section>
</Section>
It doesn't open the URL, and the Logger doesn't show any error; it just stops.
If I remove:
#Loading Content
<Action ContentUrl>
URL {$url}
</Action>
then it works (it prints all 3 URLs).
jperdoch
15th July 2006, 06:08 PM
Hi,
I tried the script you wrote, and it worked for me (with Main Process and Name Process added to process.w, of course).
Try this:
<Section>
Name Process
#Printing URL Variable
<Action Print>
Text {$url}
</Action>
#Loading Content
<Action ContentUrl>
URL {$url}
</Action>
<Section>
<Pattern>
RegExp <h3>{$my:text}</h3>
</Pattern>
<Action Print>
Text {$my}
</Action>
</Section>
</Section>
Main Process
If you still have trouble writing this script, please write to support@qualityunit.com directly and we will help you.
di3
15th July 2006, 06:35 PM
Yes, thanks, it worked (when I tried it on another server :D).
Thanks for the help! This is the best software/script I have ever tried! :)
mattcampbell
2nd November 2006, 12:54 PM
Can anyone provide an example of extracting links from a website (as opposed to a text file, as above), then following all the extracted links and collecting data from each of those pages?
For example, following the links on a product page, where each link leads to a page showing the details of a different product.
Matt
jperdoch
5th November 2006, 12:09 AM
The script above extracts links from a website, follows them, and extracts data from each target page. It's the script you are looking for.
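As a rough illustration of the flow Matt describes, here is a minimal Python sketch that collects links from a listing page and then visits each one. It is not the parser's script language or API; `LinkCollector`, `crawl_listing`, and the `fetch` callable are hypothetical names, and the `<h3>` pattern is simply reused from the scripts in this thread.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

TITLE_RE = re.compile(r"<h3>(.*?)</h3>", re.S)


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, href))


def crawl_listing(listing_url, fetch):
    """Fetch the listing page, follow each link, and return {url: [titles]}.

    fetch is a hypothetical callable that takes a URL and returns HTML.
    """
    collector = LinkCollector(listing_url)
    collector.feed(fetch(listing_url))
    results = {}
    for url in collector.links:
        # Each target page is parsed on its own; the link list is untouched.
        results[url] = [t.strip() for t in TITLE_RE.findall(fetch(url))]
    return results
```

In practice the link list would usually be filtered (for example, to product URLs only) before following it, since a real page also links to navigation and external sites.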