A Step-by-Step Guide to Writing a Crawler in Go, Part 1 (Prerequisites: Fetching Web Pages)

Category: Go · Published: 7 years ago


This article covers the basics of fetching web pages.

1. Fetching page content

We use http.Get() to fetch the content of a web page. It is roughly the Go counterpart of PHP's file_get_contents.

url := "https://hz.zu.anjuke.com/"
response, err := http.Get(url)

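As you can see, http.Get returns a pointer to the response (*http.Response) along with an error. Check err first: when it is non-nil, the response should be ignored. What we want from the response is its body, which we can read with io.ReadAll (on Go versions older than 1.16, ioutil.ReadAll):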

defer response.Body.Close() // schedule Close first so the body is released on every return path
body, err := io.ReadAll(response.Body)

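Note: you must call Close on response.Body yourself. Otherwise the underlying network connection is not released: it keeps tying up memory and file descriptors and cannot be reused for later requests. The official documentation says: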

// The http Client and Transport guarantee that Body is always
// non-nil, even on responses without a body or responses with
// a zero-length body. It is the caller's responsibility to
// close Body.

At this point we have the complete response body as a byte slice.

2. Complete example

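package main

import (
	"fmt"
	"io"
	"net/http"
)

// GetContents fetches the given URL and returns the response body as a string.
// A status code other than 200 OK is reported as an error.
func GetContents(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	// Close the body on every return path to release the connection.
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("get content failed, status code: %d", resp.StatusCode)
	}

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		// Propagate read errors instead of silently returning an empty string.
		return "", err
	}
	return string(body), nil
}

func main() {
	url := "https://hz.zu.anjuke.com/"
	contents, err := GetContents(url)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Print, not Printf: page content is not a format string and may contain '%'.
	fmt.Print(contents)
}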

Source code: GitHub
