A Brief Look at Automated Scraping Programs and Database Import

80酷酷网    80kuku.com

Scraping programs have become popular online lately, and plenty of people are hawking them for sale. Many who don't understand the technology look at these programs with envy. In fact, if you know a little ASP and understand how an automated scraper works, you will find that this kind of automation is quite simple to implement.

How it works and why it helps: the XMLHTTP component from Microsoft's XML library requests pages from another website; the program then extracts or replaces pieces of the returned page in bulk, assigns them to variables, and stores them one by one in a database. The main advantage is that you no longer have to enter large amounts of information by hand: you can point the program at a particular site, specify what to extract, and import it in bulk, saving time and effort. Unlike a plain ASP "thief" program (小偷程序), which pulls content from the target site each time a page is viewed, a scraper no longer depends on the target site once the data is in the database.

A simple example:

<%
' Fetch the target page through the XMLHTTP component.
Function GetURL(url)
    Dim Retrieval, body
    Set Retrieval = CreateObject("Microsoft.XMLHTTP")
    With Retrieval
        .Open "GET", url, False
        .Send
        body = bytes2bstr(.responseBody)
    End With
    Set Retrieval = Nothing
    ' Sanity check: a result shorter than 100 characters is treated as a failed fetch.
    If Len(body) < 100 Then
        Response.Write "Failed to fetch remote file <a href=" & url & " target=_blank>" & url & "</a>."
        Response.End
    End If
    GetURL = body
End Function

' Convert the binary response to a string; otherwise Chinese text comes out garbled.
' Assumes a double-byte encoding such as GB2312 and a matching server locale.
Function bytes2bstr(vin)
    Dim strreturn, i, thischarcode, nextcharcode
    strreturn = ""
    For i = 1 To LenB(vin)
        thischarcode = AscB(MidB(vin, i, 1))
        If thischarcode < &H80 Then
            strreturn = strreturn & Chr(thischarcode)
        Else
            ' Lead byte of a two-byte character: combine it with the next byte.
            nextcharcode = AscB(MidB(vin, i + 1, 1))
            strreturn = strreturn & Chr(CLng(thischarcode) * &H100 + CInt(nextcharcode))
            i = i + 1
        End If
    Next
    bytes2bstr = strreturn
End Function

' Return the text between the first occurrence of Start and the next occurrence of Last.
Function GetKey(HTML, Start, Last)
    Dim filearray, filearray2
    GetKey = ""
    filearray = Split(HTML, Start)
    ' If Start is missing, Split returns a single element and filearray(1) would raise an error.
    If UBound(filearray) < 1 Then Exit Function
    filearray2 = Split(filearray(1), Last)
    GetKey = filearray2(0)
End Function

Dim SoftId, Url, Html, Title

' ID of the page to fetch
SoftId = Request("Id")
Url = "http://www3.skycn.com/soft/" & SoftId & ".html"
Html = GetURL(Url)

' Example: extract the software name from a Skycn (天空软件) page
Title = GetKey(Html, "<font color='#004FC6' size='3'>", "</font></b></td></tr>")

' Open the database and store the result
Dim connstr, conn, rs, sql
connstr = "DBQ=" & Server.MapPath("db1.mdb") & ";DefaultDir=;DRIVER={Microsoft Access Driver (*.mdb)};"
Set conn = Server.CreateObject("ADODB.Connection")
conn.Open connstr
Set rs = Server.CreateObject("ADODB.Recordset")
' Double any single quotes in Title so it cannot break out of the SQL string.
sql = "select [column_name] from [table_name] where [column_name]='" & Replace(Title, "'", "''") & "'"
rs.Open sql, conn, 3, 3

' Insert only when no matching record exists yet.
If rs.EOF And rs.BOF Then
    rs.AddNew
    rs("column_name") = Title
    rs.Update
End If

rs.Close
Set rs = Nothing
conn.Close
Set conn = Nothing

Response.Write "Scraping complete!"
%>
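
For readers without an ASP environment at hand, the decode and extract steps performed by bytes2bstr and GetKey can be sketched in Python. This is an illustrative translation, not the article's code: the names `decode_page` and `get_key`, the GBK encoding, and the sample markup are all assumptions made for the example.

```python
from typing import Optional


def decode_page(raw: bytes, encoding: str = "gbk") -> str:
    """Decode a raw response body; plays the role of bytes2bstr.

    The encoding is an assumption: the target page is taken to be
    GB2312/GBK. Undecodable bytes are replaced rather than raising.
    """
    return raw.decode(encoding, errors="replace")


def get_key(html: str, start: str, last: str) -> Optional[str]:
    """Return the text between the first `start` and the next `last`.

    Returns None when either marker is missing, so the caller can tell
    a failed match apart from an empty one.
    """
    begin = html.find(start)
    if begin == -1:
        return None
    begin += len(start)
    end = html.find(last, begin)
    if end == -1:
        return None
    return html[begin:end]


# Made-up sample bytes standing in for a fetched page body.
sample = "<b><font color='#004FC6' size='3'>示例软件 1.0</font></b></td></tr>".encode("gbk")
page = decode_page(sample)
title = get_key(page, "<font color='#004FC6' size='3'>", "</font></b></td></tr>")
print(title)
```

In a real run the bytes would come from an HTTP request, for example `urllib.request.urlopen(url).read()`, instead of the hard-coded sample. Marker-based extraction like this breaks whenever the target site changes its markup, so the `None` result is worth checking before anything is written to the database.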

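The check-then-insert step can be sketched the same way. This version uses Python's built-in sqlite3 module with an in-memory database rather than Access, and the table name `software` and column name `title` are hypothetical stand-ins for the placeholders in the listing. Passing the title as a query parameter means the SQL is never built by string concatenation.

```python
import sqlite3


def store_title(conn: sqlite3.Connection, title: str) -> bool:
    """Insert title unless it is already stored; return True if inserted."""
    row = conn.execute(
        "SELECT 1 FROM software WHERE title = ?", (title,)
    ).fetchone()
    if row is not None:
        return False
    conn.execute("INSERT INTO software (title) VALUES (?)", (title,))
    conn.commit()
    return True


# Hypothetical schema used only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE software (title TEXT)")

first = store_title(conn, "O'Reilly Reader")   # the quote is handled by the parameter
second = store_title(conn, "O'Reilly Reader")  # duplicate, so it is skipped
count = conn.execute("SELECT COUNT(*) FROM software").fetchone()[0]
print(first, second, count)
```

When several scraper runs may write at once, a unique index on the column is a sturdier guard against duplicates than the select-then-insert check alone.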