PHP Data Collection Methods

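Data collection with PHP is nothing new. It comes in two forms: the "thief" program and the collector program. A thief program shows the collected data directly to the user, while a collector program stores the collected data in a database and then serves it to users through its own code.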

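The main steps of a PHP thief program are:

1. Get the data source (a URL, for example http://www.chhua.com)

2. Filter it with a regular expression

3. Present the result to the user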
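
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the HTML string stands in for a page that would normally be fetched with file_get_contents, and the pattern and variable names are made up for the example.

```php
<?php
// Step 1: get the data source. This hard-coded string is a hypothetical
// stand-in for file_get_contents('http://www.chhua.com').
$html = '<ul><li><a href="/a.html">First post</a></li>'
      . '<li><a href="/b.html">Second post</a></li></ul>';

// Step 2: filter with a regular expression (capture the href and the link text).
$pattern = '/<a href="([^"]+)">([^<]+)<\/a>/';
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

// Step 3: present the result to the user, escaping it for HTML output.
$output = '';
foreach ($matches as $m) {
    $output .= htmlspecialchars($m[2]) . ': ' . htmlspecialchars($m[1]) . "\n";
}
echo $output;
```

Nothing is stored here: each request fetches, filters, and prints, which is what distinguishes a thief program from a collector.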

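The steps of a PHP collector program are:

1. Get the data source (a URL, for example http://www.chhua.com)

2. Filter it with a regular expression

3. Store the result in a database

4. A user sends a request

5. Present the stored data to the user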
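
A minimal sketch of the collector flow might look like this. It assumes the pdo_sqlite extension is available and uses an in-memory SQLite database so the example stays self-contained; the table name items, the sample HTML, and the pattern are all hypothetical.

```php
<?php
// Step 1: get the data source. Hypothetical stand-in for
// file_get_contents('http://www.chhua.com').
$html = '<h2>First title</h2><p>text</p><h2>Second title</h2>';

// Step 2: filter with a regular expression; $matches[1] holds the captured titles.
preg_match_all('/<h2>([^<]+)<\/h2>/', $html, $matches);

// Step 3: store the matches in the database using a prepared statement.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)');
$insert = $db->prepare('INSERT INTO items (title) VALUES (?)');
foreach ($matches[1] as $title) {
    $insert->execute(array($title));
}

// Steps 4 and 5: when a user request arrives, read from the database
// (not from the remote site) and display the rows.
$titles = $db->query('SELECT title FROM items ORDER BY id')
             ->fetchAll(PDO::FETCH_COLUMN);
foreach ($titles as $title) {
    echo htmlspecialchars($title), "\n";
}
```

In a real collector, steps 1 to 3 would usually run in a separate script (for instance on a schedule), and steps 4 and 5 would live in the page that users visit.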

OK, below is a simple class I wrote in rough form, mainly to illustrate the steps of data collection. It is for reference only:

class caiji {
	private $reg;               // regular expression
	private $url;               // data source
	private $dataArr = array(); // result array

	public function __construct($reg, $url) {
		$this->reg = $reg;
		$this->url = $url;
		$this->caijStar();
	}

	private function caijStar() { // collection method
		$conn = file_get_contents($this->url);
		if ($conn === false) {
			return; // fetch failed; leave the result array empty
		}
		preg_match_all($this->reg, $conn, $this->dataArr);
	}

	public function getArr() { // accessor for the collected data
		return $this->dataArr;
	}
}
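
The array that getArr() returns is filled by preg_match_all, so its layout follows that function's flags. The class passes no flag, which means the default PREG_PATTERN_ORDER applies. The sketch below shows that layout next to the PREG_SET_ORDER alternative; the sample input and pattern are hypothetical and chosen only to make the shapes visible.

```php
<?php
// Hypothetical sample input and pattern with two capture groups.
$page = '<a href="/a.html">A</a> <a href="/b.html">B</a>';
$reg  = '/<a href="([^"]+)">([^<]+)<\/a>/';

// Default flag (PREG_PATTERN_ORDER): results are grouped by capture group.
// $byGroup[0] holds the full matches, $byGroup[1] the hrefs, $byGroup[2] the texts.
preg_match_all($reg, $page, $byGroup);

// PREG_SET_ORDER: one entry per match instead.
// $byMatch[0] is array(full match, href, text) for the first link.
preg_match_all($reg, $page, $byMatch, PREG_SET_ORDER);

print_r($byGroup[1]);
print_r($byMatch[0]);
```

With the class as written, a caller reading links would therefore look at index 1 of the returned array for the first capture group rather than iterating match by match.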