Information Gathering: Step Two

The second step of information gathering: collecting URLs.

For a script kiddie with no development background, this step can seriously hurt bug-hunting efficiency. So stand on the shoulders of giants: take the tools that experienced researchers have already built and glue them together until they do what you need.

waybackurls

A tool written in Go that queries the Wayback Machine API and pulls the historical pages (URLs) recorded for a domain.

echo testphp.vulnweb.com | waybackurls

echo www.abc.com | waybackurls | grep "\.js" | sort -u

Gau

Overview

getallurls (gau) fetches known URLs from AlienVault’s Open Threat Exchange, the Wayback Machine, and Common Crawl for any given domain. Inspired by Tomnomnom’s waybackurls.

Much like waybackurls, it pulls known URLs for a domain from public sources.

echo testphp.vulnweb.com | gau
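
waybackurls and gau draw from overlapping but not identical sources, so it can be worth running both per domain and merging the results. A minimal sketch in the same os.system style used later in this post (the domain and output file names are only examples):

import os

# Collect URLs for one domain with both tools, then merge and deduplicate.
# The domain and file names below are placeholders; adjust to your target.
domain = 'testphp.vulnweb.com'
os.system('echo {0} | waybackurls > {0}_all.txt'.format(domain))
os.system('echo {0} | gau >> {0}_all.txt'.format(domain))
os.system('sort -u {0}_all.txt -o {0}_all.txt'.format(domain))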

hakrawler

echo testphp.vulnweb.com | hakrawler -all -depth 5 -plain

crawlergo

crawlergo is a dynamic crawler that uses headless Chrome to collect URL entry points. Written in Golang and built on chromedp with some custom modifications, it drives the CDP protocol to hook the key points of a page, fill and submit forms flexibly, and trigger events as completely as possible, so that it collects as many of the entry points a site exposes as it can. At the same time, an intelligent URL deduplication module filters out most pseudo-static URLs while still keeping the key entry links, which greatly reduces duplicated work.

The crawl results can be pushed to a passive scanning server.

./crawlergo -c /root/tools/craw/chrome-linux/chrome -t 20 --push-to-proxy "http://127.0.0.1:7777/" http://testphp.vulnweb.com
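
To cover a whole list of subdomains from step one, the same command can be looped from Python, in the same style as the scripts later in this post. A minimal sketch (the subdomains.txt file name is an assumption; the flags and paths are copied from the single-target command above, and a passive scanner is assumed to already be listening on 127.0.0.1:7777):

import os

# Batch-run crawlergo against every host in subdomains.txt (placeholder file name)
# and push everything it finds to the passive scanner on 127.0.0.1:7777.
with open('subdomains.txt') as f:
    for line in f:
        host = line.strip()
        if not host:
            continue
        cmd = ('./crawlergo -c /root/tools/craw/chrome-linux/chrome -t 20 '
               '--push-to-proxy "http://127.0.0.1:7777/" "http://{0}"'.format(host))
        print(cmd)
        os.system(cmd)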

awvs

Here AWVS is mainly used for its crawler: each crawled target is configured to route its traffic through the passive scanner's proxy address. A few of the AWVS API endpoints are enough to add targets in batches, set a proxy on every target, start the scans, and finally delete the targets.

The script below is based on an existing AWVS batch add/delete task script, with a proxy-setting function added.

import json
import queue
import requests
import time

requests.packages.urllib3.disable_warnings()
# AWVS 12 API


class AwvsScan(object):
    def __init__(self):
        self.scanner = 'https://127.0.0.1:3443'
        self.api = '1986ad8c0a5b3df4d7028d5f3c06e936ced2583c73c594961'
        self.ScanMode = '11111111-1111-1111-1111-111111111117'
        self.headers = {'X-Auth': self.api, 'content-type': 'application/json'}
        self.targets_id = queue.Queue()  # [address, target_id] pairs
        self.scan_id = queue.Queue()
        self.site = queue.Queue()  # websites to add

    # Main menu: add targets, set proxies, start scans, or delete everything
    def main(self):
        print('=' * 80)
        print("""1. Add targets from awvs.txt\n2. Set proxy\n3. Start scans\n4. Delete all targets""")
        print('=' * 80)
        choice = input(">")
        if choice == '1':
            self.targets()
        if choice == '4':
            self.del_targets()
        if choice == '2':
            self.proxy()
        if choice == '3':
            self.scans()
        self.main()

    # Read the websites into the queue
    def openfile(self):
        with open('awvs.txt') as cent:
            for web_site in cent:
                web_site = web_site.strip('\n\r')
                self.site.put(web_site)

    # Add scan targets
    def targets(self):
        self.openfile()
        while not self.site.empty():
            website = self.site.get()
            try:
                data = {'address': website,
                        'description': 'awvs-auto4',
                        'criticality': '10'}
                response = requests.post(self.scanner + '/api/v1/targets', data=json.dumps(data),
                                         headers=self.headers, verify=False)
                cent = json.loads(response.content)
                target_id = cent['target_id']
                # keep the same [address, target_id] format as get_targets_id()
                self.targets_id.put([website, target_id])
            except Exception:
                print('Target is not website! {}'.format(website))

    # Set the passive-scan proxy on every target
    def proxy(self):
        self.get_targets_id()
        while not self.targets_id.empty():
            data = {"proxy":
                    {"enabled": True,
                     "address": "127.0.0.1",
                     "protocol": "http",
                     "port": 7777
                     }
                    }
            path = '/api/v1/targets/{}/configuration'.format(self.targets_id.get()[1])
            url = self.scanner + path
            # print(url)
            response = requests.patch(url, data=json.dumps(data), headers=self.headers,
                                      allow_redirects=False, verify=False)

    def scans(self):
        # self.targets()
        self.get_targets_id()
        while not self.targets_id.empty():
            data = {'target_id': self.targets_id.get()[1],
                    'profile_id': self.ScanMode,
                    'schedule': {'disable': False, 'start_date': None, 'time_sensitive': False}}
            response = requests.post(self.scanner + '/api/v1/scans', data=json.dumps(data),
                                     headers=self.headers, allow_redirects=False, verify=False)
            time.sleep(180)
            if response.status_code == 201:
                cent = response.headers['Location'].replace('/api/v1/scans/', '')
                print(cent)

    def get_targets_id(self):
        response = requests.get(self.scanner + "/api/v1/targets", headers=self.headers, verify=False)
        content = json.loads(response.content)
        for cent in content['targets']:
            self.targets_id.put([cent['address'], cent['target_id']])

    def del_targets(self):
        while True:
            self.get_targets_id()
            if self.targets_id.qsize() == 0:
                break
            else:
                while not self.targets_id.empty():
                    targets_info = self.targets_id.get()
                    response = requests.delete(self.scanner + "/api/v1/targets/" + targets_info[1],
                                               headers=self.headers, verify=False)
                    if response.status_code == 204:
                        print('delete targets {}'.format(targets_info[0]))


if __name__ == '__main__':
    Scan = AwvsScan()
    Scan.main()
URL deduplication

The crawled URLs contain a lot of duplicates, so deduplication has to be considered. For example, the two URLs below differ only in the parameter value and should count as one:

www.abc.com/index.php?id=1111
www.abc.com/index.php?id=2222
[root@vultr ~]# echo testphp.vulnweb.com | waybackurls | wc -l
446
# grep: keep only URLs that have a query string
[root@vultr ~]# echo testphp.vulnweb.com | waybackurls | grep '?' | wc -l
116

# awk -F '=' '{print $1}': split on '=', drop the value, keep only the path and parameter name
[root@vultr ~]# echo testphp.vulnweb.com | waybackurls | grep '?' | awk -F '=' '{print $1}' | wc -l
118

# sort -u: deduplicate the paths
[root@vultr ~]# echo testphp.vulnweb.com | waybackurls | grep '?' | awk -F '=' '{print $1}' | sort -u | wc -l
38

# sed 's/$/&=2/g': give every parameter the value 2
[root@vultr ~]# echo testphp.vulnweb.com | waybackurls | grep '?' | awk -F '=' '{print $1}' | sort -u | sed 's/$/&=2/g' | wc -l
38

# some of the resulting URLs (partial list)
http://testphp.vulnweb.com:80/productlist.php?=2
http://testphp.vulnweb.com:80/product.php?=2
http://testphp.vulnweb.com:80/product.php?pic=2
http://testphp.vulnweb.com:80/redir.php?r=2
http://testphp.vulnweb.com:80/search.php?=2
http://testphp.vulnweb.com:80/search.php?test=2
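
The awk/sed pipeline above throws away the parameter values entirely. A slightly more careful variant is to deduplicate by host, path and parameter names while keeping one representative URL per pattern; a minimal Python sketch (the function and file names are arbitrary):

from urllib.parse import urlparse, parse_qsl

def dedup_by_pattern(urls):
    # Keep one URL per (host, path, sorted parameter names) pattern,
    # so ?id=1111 and ?id=2222 collapse into a single entry.
    seen = set()
    kept = []
    for url in urls:
        url = url.strip()
        if not url:
            continue
        parsed = urlparse(url)
        names = tuple(sorted(name for name, _ in parse_qsl(parsed.query, keep_blank_values=True)))
        key = (parsed.netloc, parsed.path, names)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept

if __name__ == '__main__':
    # qurl.txt: one crawled URL per line (placeholder file name)
    with open('qurl.txt') as f:
        for u in dedup_by_pattern(f):
            print(u)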
xray

Use xray to batch-scan the collected URLs.

import time
import sys
import os

# For each domain: collect its URLs with waybackurls, deduplicate them,
# keep only parameterized paths, then scan the result with xray.
# Input: a txt file with one domain per line; the domain is also used for the output file names.
if len(sys.argv) == 1:
    msg = """
    Find XSS with subdomain of craw xray
    Usage: xss.py subdomain.txt
    """
    print(msg)
    sys.exit(0)

file = sys.argv[1]
# url = sys.argv[2]
with open(file, 'r') as f:
    for i in f.readlines():
        url = i.strip('\n')
        print(url)

        # 1. Pull historical URLs for the domain
        d = 'echo {0} | waybackurls > {1}_url1.txt'.format(url, url)
        os.system(d)

        # 2. Deduplicate
        a = 'cat {0}_url1.txt | sort -u > {1}_qurl.txt'.format(url, url)
        os.system(a)

        # 3. Keep parameterized URLs, strip values, dedupe paths, append '=2' as a test value
        g = "cat {0}_qurl.txt | grep '?' | awk -F '=' '{{print $1}}' | sort -u | sed 's/$/&=2/g' > {1}_qc.txt".format(url, url)
        print(g)
        os.system(g)

        # 4. Scan the cleaned URL list with xray
        x = '/root/tools/xary/xray_linux_amd64 webscan --url-file {0}_qc.txt --html-output {1}.html'.format(url, url)
        try:
            os.system(x)
            time.sleep(60)
        except:
            pass
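
The crawlergo and AWVS sections above push their traffic to 127.0.0.1:7777, which is expected to be a passive scanner. With xray that listener can be started in webscan listen mode; a one-line wrapper in the same style (the binary path is copied from the script above, and the output file name is arbitrary):

import os

# Run xray as the passive listener that crawlergo/AWVS push their traffic to.
os.system('/root/tools/xary/xray_linux_amd64 webscan '
          '--listen 127.0.0.1:7777 --html-output proxy_scan.html')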
Summary

The above mainly covers crawling links from domains in an unauthenticated state and then feeding them to a passive scanner for testing. The shortcomings: when scanning many different domains in batch there is no way to add cookies, and even after deduplication a lot of the crawled URLs are invalid, which hurts efficiency. I still need to improve my own development skills and look into how to crawl more precise URLs.

Further reading

Dynamic crawler practice for vulnerability scanning (漏扫动态爬虫实践)

Web vulnerability scanner (Web漏洞扫描器)

SuperSpider: building a powerful crawler (SuperSpider——打造功能强大的爬虫利器)