Information gathering, step two: collecting URLs.
For a script kiddie with no development background, this step has a big impact on bug-hunting efficiency. So stand on the shoulders of giants: take the tools that skilled developers have already built and wire them together to get the result you want.
waybackurls is a script written in Go; it mainly calls the Wayback Machine API to pull a site's historical pages from the internet.
echo testphp.vulnweb.com | waybackurls
echo www.abc.com | waybackurls | grep "\.js" | sort -u
Introduction
getallurls (gau) fetches known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, and Common Crawl for any given domain. Inspired by Tomnomnom's waybackurls.
Much like waybackurls, it collects link addresses from across the internet.
echo testphp.vulnweb.com | gau
hakrawler, another crawler written in Go, can be used in the same way:

echo testphp.vulnweb.com | hakrawler -all -depth 5 -plain
crawlergo is a dynamic crawler that uses headless Chrome to collect URL entry points. Written in Go and built on chromedp with some customization, it drives the CDP protocol to hook key points across the page, fill and submit forms flexibly, and trigger events completely, collecting as many of a site's exposed entry points as possible. At the same time, its smart URL deduplication module filters out most pseudo-static URLs while still making sure key entry links are not missed, greatly reducing duplicate work.
The crawl results can be pushed to a passive scanning server:
./crawlergo -c /root/tools/craw/chrome-linux/chrome -t 20 --push-to-proxy "http://127.0.0.1:7777/" http://testphp.vulnweb.com/
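crawlergo can also emit its results as JSON (-o json), which makes it easy to drive from a script instead of a proxy. The sketch below follows the usage example in crawlergo's README: run crawlergo as a subprocess and parse the request list out of the JSON printed after the --[Mission Complete]-- separator. The Chrome path reuses the one above, and the target is just an example.

import json
import subprocess


def crawl(target):
    # -o json prints the collected requests as one JSON document on stdout
    cmd = ["./crawlergo", "-c", "/root/tools/craw/chrome-linux/chrome",
           "-o", "json", target]
    rsp = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    output, _ = rsp.communicate()
    # The JSON result follows the "--[Mission Complete]--" separator
    result = json.loads(output.decode().split("--[Mission Complete]--")[1])
    return result["req_list"]


if __name__ == '__main__':
    for req in crawl("http://testphp.vulnweb.com/"):
        print(req["method"], req["url"])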
This approach mainly uses AWVS's crawler, routing the crawled links through a passive-scanning proxy address. A few AWVS API calls are enough to batch-add targets, set a proxy on each target, start the scans, and finally delete the targets.
Based on this AWVS batch add/delete scan task script, with a proxy function added:
import json
import queue
import time

import requests

requests.packages.urllib3.disable_warnings()


class AwvsScan(object):
    def __init__(self):
        self.scanner = 'https://127.0.0.1:3443'
        self.api = '1986ad8c0a5b3df4d7028d5f3c06e936ced2583c73c594961'
        self.ScanMode = '11111111-1111-1111-1111-111111111117'
        self.headers = {'X-Auth': self.api, 'content-type': 'application/json'}
        self.targets_id = queue.Queue()
        self.scan_id = queue.Queue()
        self.site = queue.Queue()

    def main(self):
        print('=' * 80)
        print("""1. Add targets from awvs.txt\n2. Set proxy\n3. Start scans\n4. Delete all targets""")
        print('=' * 80)
        choice = input(">")
        if choice == '1':
            self.targets()
        if choice == '2':
            self.proxy()
        if choice == '3':
            self.scans()
        if choice == '4':
            self.del_targets()
        self.main()

    def openfile(self):
        # Read targets from awvs.txt, one per line
        with open('awvs.txt') as cent:
            for web_site in cent:
                self.site.put(web_site.strip('\n\r'))

    def targets(self):
        self.openfile()
        while not self.site.empty():
            website = self.site.get()
            try:
                data = {'address': website, 'description': 'awvs-auto4', 'criticality': '10'}
                response = requests.post(self.scanner + '/api/v1/targets',
                                         data=json.dumps(data), headers=self.headers, verify=False)
                cent = json.loads(response.content)
                # Store as [address, target_id] to match get_targets_id()
                self.targets_id.put([website, cent['target_id']])
            except Exception:
                print('Target is not website! {}'.format(website))

    def proxy(self):
        # Route every target's traffic through the local passive-scan proxy
        self.get_targets_id()
        while not self.targets_id.empty():
            data = {"proxy": {"enabled": True, "address": "127.0.0.1",
                              "protocol": "http", "port": 7777}}
            path = '/api/v1/targets/{}/configuration'.format(self.targets_id.get()[1])
            requests.patch(self.scanner + path, data=json.dumps(data),
                           headers=self.headers, allow_redirects=False, verify=False)

    def scans(self):
        self.get_targets_id()
        while not self.targets_id.empty():
            data = {'target_id': self.targets_id.get()[1], 'profile_id': self.ScanMode,
                    'schedule': {'disable': False, 'start_date': None, 'time_sensitive': False}}
            response = requests.post(self.scanner + '/api/v1/scans', data=json.dumps(data),
                                     headers=self.headers, allow_redirects=False, verify=False)
            time.sleep(180)
            if response.status_code == 201:
                print(response.headers['Location'].replace('/api/v1/scans/', ''))

    def get_targets_id(self):
        response = requests.get(self.scanner + "/api/v1/targets", headers=self.headers, verify=False)
        content = json.loads(response.content)
        for cent in content['targets']:
            self.targets_id.put([cent['address'], cent['target_id']])

    def del_targets(self):
        while True:
            self.get_targets_id()
            if self.targets_id.qsize() == 0:
                break
            while not self.targets_id.empty():
                targets_info = self.targets_id.get()
                response = requests.delete(self.scanner + "/api/v1/targets/" + targets_info[1],
                                           headers=self.headers, verify=False)
                if response.status_code == 204:
                    print('delete targets {}'.format(targets_info[0]))


if __name__ == '__main__':
    Scan = AwvsScan()
    Scan.main()
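To run it, put one target per line in an awvs.txt next to the script, point scanner and api at your own AWVS instance, then launch it (the file name awvs_scan.py here is just a choice for this example) and pick an action from the menu:

python3 awvs_scan.py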
URL deduplication

The crawled URLs contain many duplicates, so deduplication has to be considered. For example, the following two URLs differ only in their parameter value and should count as one:
www.abc.com/index.php?id=1111
www.abc.com/index.php?id=2222
Working through it with testphp.vulnweb.com:

[root@vultr ~]# echo testphp.vulnweb.com | waybackurls | wc -l
446

# grep: keep only the URLs that carry a query string
[root@vultr ~]# echo testphp.vulnweb.com | waybackurls | grep '?' | wc -l
116

# awk -F '=' '{print $1}': split on '=' and drop the parameter value, keeping only the path and parameter name
[root@vultr ~]# echo testphp.vulnweb.com | waybackurls | grep '?' | awk -F '=' '{print $1}' | wc -l
118

# sort -u: deduplicate the paths
[root@vultr ~]# echo testphp.vulnweb.com | waybackurls | grep '?' | awk -F '=' '{print $1}' | sort -u | wc -l
38

# sed 's/$/&=2/g': append the value 2 to every parameter
[root@vultr ~]# echo testphp.vulnweb.com | waybackurls | grep '?' | awk -F '=' '{print $1}' | sort -u | sed 's/$/&=2/g' | wc -l
38

# A sample of the final URLs:
http://testphp.vulnweb.com:80/productlist.php?=2
http://testphp.vulnweb.com:80/product.php?=2
http://testphp.vulnweb.com:80/product.php?pic=2
http://testphp.vulnweb.com:80/redir.php?r=2
http://testphp.vulnweb.com:80/search.php?=2
http://testphp.vulnweb.com:80/search.php?test=2
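If you would rather do this cleanup inside a script than in a shell pipeline, the same idea is easy to express in Python. A minimal sketch (the file name dedup.py and the stdin-based interface are choices made for this example): it keeps one URL per unique combination of host, path, and parameter names, so ?id=1111 and ?id=2222 collapse into a single entry.

import sys
from urllib.parse import parse_qsl, urlsplit


def dedup(urls):
    seen = set()
    for url in urls:
        parts = urlsplit(url)
        # Key on host + path + sorted parameter names; parameter values
        # are deliberately ignored so pseudo-duplicates collapse.
        names = tuple(sorted(k for k, _ in parse_qsl(parts.query, keep_blank_values=True)))
        key = (parts.netloc, parts.path, names)
        if key not in seen:
            seen.add(key)
            yield url


if __name__ == '__main__':
    for url in dedup(line.strip() for line in sys.stdin if '?' in line):
        print(url)

Used the same way as the pipeline above:

echo testphp.vulnweb.com | waybackurls | python3 dedup.py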
Use xray to batch-scan the crawled URLs:
import time
import sys
import os

if len(sys.argv) == 1:
    msg = """
    Crawl historical URLs for each domain and batch-scan them with xray
    Usage: xss.py subdomain.txt
    """
    print(msg)
    sys.exit(0)

file = sys.argv[1]
with open(file, 'r') as f:
    for i in f.readlines():
        url = i.strip('\n')
        print(url)
        # Pull the domain's historical URLs with waybackurls
        d = 'echo {0} | waybackurls > {1}_url1.txt'.format(url, url)
        os.system(d)
        # First round of deduplication
        a = 'cat {0}_url1.txt | sort -u > {1}_qurl.txt'.format(url, url)
        os.system(a)
        # Keep query URLs, drop parameter values, dedupe, then append =2 as a placeholder value
        g = "cat {0}_qurl.txt | grep '?' | awk -F '=' '{{print $1}}' | sort -u | sed 's/$/&=2/g' > {1}_qc.txt".format(url, url)
        print(g)
        os.system(g)
        # Hand the cleaned URL list to xray for batch scanning
        x = '/root/tools/xary/xray_linux_amd64 webscan --url-file {0}_qc.txt --html-output {1}.html'.format(url, url)
        try:
            os.system(x)
            time.sleep(60)
        except:
            pass
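Save the script as xss.py (the name its own usage message assumes) and feed it a file with one domain per line; each domain ends up with a matching xray HTML report:

python3 xss.py subdomain.txt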
Summary

The above mainly covers crawling links from domains without being logged in, then testing them with passive scanning. The weak points: when batch-scanning different domains there is no way to attach cookies, and even after deduplication many of the scanned URLs are invalid, which drags down efficiency. I still need to improve my own development skills and look into how to crawl more precise URLs.
Further reading

漏扫动态爬虫实践
Web漏洞扫描器
SuperSpider——打造功能强大的爬虫利器