HttpClient + Jsoup: Simulated Login and HTML Parsing
- Uploaded: 2023-04-19
HttpClient + Jsoup: simulated login, HTML parsing, and information filtering (Guangdong University of Technology library)
QQ: 375061590
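I have recently been building an all-in-one campus Android client. The main idea is to consolidate information from the school's various websites onto a single platform for students to browse.
The approach is as follows:
Take the Guangdong University of Technology library website as an example.
Goal:
Log in to the library with a personal account and retrieve that account's borrowing records.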
Login URL: http://222.200.98.171:81/login.aspx
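Chrome's developer tools are used here (press F12 in the browser to open them).
Open the source of the login page. Below is the form tag from that source.
HTML code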
<form name="aspnetForm" method="post" action="login.aspx?ReturnUrl=%2fuser%2fuserinfo.aspx" onsubmit="javascript:return WebForm_OnSubmit();" id="aspnetForm">
<div>
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwULLTE0MjY3MDAxNzcPZBYCZg9kFgoCAQ8PFgIeCEltYWdlVXJsBRt+XGltYWdlc1xoZWFkZXJvcGFjNGdpZi5naWZkZAICDw8WAh4EVGV4dAUt5bm/5Lic5bel5Lia5aSn5a2m5Zu+5Lmm6aaG5Lmm55uu5qOA57Si57O757ufZGQCAw8PFgIfAQUcMjAxM+W5tDAz5pyIMDXml6UgIOaYn+acn+S6jGRkAgQPZBYEZg9kFgQCAQ8WAh4LXyFJdGVtQ291bnQCCBYSAgEPZBYCZg8VAwtzZWFyY2guYXNweAAM55uu5b2V5qOA57SiZAICD2QWAmYPFQMTcGVyaV9uYXZfY2xhc3MuYXNweAAM5YiG57G75a+86IiqZAIDD2QWAmYPFQMOYm9va19yYW5rLmFzcHgADOivu+S5puaMh+W8lWQCBA9kFgJmDxUDCXhzdGIuYXNweAAM5paw5Lmm6YCa5oqlZAIFD2QWAmYPFQMUcmVhZGVycmVjb21tZW5kLmFzcHgADOivu+iAheiNkOi0rWQCBg9kFgJmDxUDE292ZXJkdWVib29rc19mLmFzcHgADOaPkOmGkuacjeWKoWQCBw9kFgJmDxUDEnVzZXIvdXNlcmluZm8uYXNweAAP5oiR55qE5Zu+5Lmm6aaGZAIID2QWAmYPFQMbaHR0cDovL2xpYnJhcnkuZ2R1dC5lZHUuY24vAA/lm77kuabppobpppbpobVkAgkPZBYCAgEPFgIeB1Zpc2libGVoZAIDDxYCHwJmZAIBD2QWBAIDD2QWBAIBDw9kFgIeDGF1dG9jb21wbGV0ZQUDb2ZmZAIHDw8WAh8BZWRkAgUPZBYGAgEPEGRkFgFmZAIDDxBkZBYBZmQCBQ8PZBYCHwQFA29mZmQCBQ8PFgIfAQWlAUNvcHlyaWdodCAmY29weTsyMDA4LTIwMDkuIFNVTENNSVMgT1BBQyA0LjAxIG9mIFNoZW56aGVuIFVuaXZlcnNpdHkgTGlicmFyeS4gIEFsbCByaWdodHMgcmVzZXJ2ZWQuPGJyIC8+54mI5p2D5omA5pyJ77ya5rex5Zyz5aSn5a2m5Zu+5Lmm6aaGIEUtbWFpbDpzenVsaWJAc3p1LmVkdS5jbmRkZL5QuJMrEZz+0UxuTVpXZ/EaY5A4" />
</div>
<script type="text/javascript">
//<![CDATA[
var theForm = document.forms['aspnetForm'];
if (!theForm) {
    theForm = document.aspnetForm;
}
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}
//]]>
</script>
<script src="/WebResource.axd?d=kbLQnwjf5uNQN4GcWRC5kD1rIySOzkR3uLyKE5xUO0j4Fa2lQPZwQlk_qYaspRXtlojncSBfRJNkA00qXOMQqsKd8WY1&t=634751988274393221" type="text/javascript"></script>
<script src="/WebResource.axd?d=nsbO6ZJty6_6fuRufFNYnRiJ-xEoD0xQr70NX6g0v64gngATPLSnyyt7jyZkELLW6THXmh92_m0Y5TyvhES_-JroQeU1&t=634751988274393221" type="text/javascript"></script>
<script type="text/javascript">
//<![CDATA[
function WebForm_OnSubmit() {
    if (typeof(ValidatorOnSubmit) == "function" && ValidatorOnSubmit() == false) return false;
    return true;
}
//]]>
</script>
<div>
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWBQKa7ezdCwKOmK5RApX9wcYGAsP9wL8JAqW86pcIaBhXmFYzd5pGDTk/afln2TfArPw=" />
</div>
<input name="ctl00$ContentPlaceHolder1$txtlogintype" type="hidden" id="ctl00_ContentPlaceHolder1_txtlogintype" value="0" />
<div id="Login" class="clearFix">
  <div class="LoginTitle">
    登录我的图书馆
  </div>
  <div class="LeftLogin">
    <div class="LoginDiv">
      <div class="loginContent">
        <div class="loginInfo">
          <span class="leftInfo">图书证号:</span>
          <span class="rightInfo">
            <input name="ctl00$ContentPlaceHolder1$txtUsername_Lib" type="text" id="ctl00_ContentPlaceHolder1_txtUsername_Lib" class="txtInput" autocomplete="off" /><span id="ctl00_ContentPlaceHolder1_rfv_UserName_Lib" style="color:Red;display:none;">请输入证号</span>
          </span>
        </div>
        <div class="loginInfo">
          <span class="leftInfo">密 码:</span>
          <span class="rightInfo">
            <input name="ctl00$ContentPlaceHolder1$txtPas_Lib" type="password" id="ctl00_ContentPlaceHolder1_txtPas_Lib" class="txtInput" /><span id="ctl00_ContentPlaceHolder1_rfv_Password_Lib" style="color:Red;display:none;">请输入密码</span>
          </span>
        </div>
        <div>
          <span id="ctl00_ContentPlaceHolder1_lblErr_Lib"></span>
        </div>
        <div class="loginInfo">
          <input type="submit" name="ctl00$ContentPlaceHolder1$btnLogin_Lib" value="登录" onclick="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;ctl00$ContentPlaceHolder1$btnLogin_Lib&quot;, &quot;&quot;, true, &quot;&quot;, &quot;&quot;, false, false))" id="ctl00_ContentPlaceHolder1_btnLogin_Lib" class="btn" />
          <input type="button" value="清空" onclick="rset()" class="btn" />
        </div>
      </div>
    </div>
  </div>
  <div class="RightDescription">
    <img src="images/pin.gif" /><br />
    1.如果您使用的是公共电脑,请在使用完毕后,务必退出登录,以保安全。
    <br />
    2.首次登录,请先<a href="changepas.aspx">修改初始密码</a>。
  </div>
</div>
<script type="text/javascript">
//<![CDATA[
var Page_Validators = new Array(document.getElementById("ctl00_ContentPlaceHolder1_rfv_UserName_Lib"), document.getElementById("ctl00_ContentPlaceHolder1_rfv_Password_Lib"));
//]]>
</script>
<script type="text/javascript">
//<![CDATA[
var ctl00_ContentPlaceHolder1_rfv_UserName_Lib = document.all ? document.all["ctl00_ContentPlaceHolder1_rfv_UserName_Lib"] : document.getElementById("ctl00_ContentPlaceHolder1_rfv_UserName_Lib");
ctl00_ContentPlaceHolder1_rfv_UserName_Lib.controltovalidate = "ctl00_ContentPlaceHolder1_txtUsername_Lib";
ctl00_ContentPlaceHolder1_rfv_UserName_Lib.focusOnError = "t";
ctl00_ContentPlaceHolder1_rfv_UserName_Lib.errormessage = "请输入证号";
ctl00_ContentPlaceHolder1_rfv_UserName_Lib.display = "Dynamic";
ctl00_ContentPlaceHolder1_rfv_UserName_Lib.evaluationfunction = "RequiredFieldValidatorEvaluateIsValid";
ctl00_ContentPlaceHolder1_rfv_UserName_Lib.initialvalue = "";
var ctl00_ContentPlaceHolder1_rfv_Password_Lib = document.all ? document.all["ctl00_ContentPlaceHolder1_rfv_Password_Lib"] : document.getElementById("ctl00_ContentPlaceHolder1_rfv_Password_Lib");
ctl00_ContentPlaceHolder1_rfv_Password_Lib.controltovalidate = "ctl00_ContentPlaceHolder1_txtPas_Lib";
ctl00_ContentPlaceHolder1_rfv_Password_Lib.focusOnError = "t";
ctl00_ContentPlaceHolder1_rfv_Password_Lib.errormessage = "请输入密码";
ctl00_ContentPlaceHolder1_rfv_Password_Lib.display = "Dynamic";
ctl00_ContentPlaceHolder1_rfv_Password_Lib.evaluationfunction = "RequiredFieldValidatorEvaluateIsValid";
ctl00_ContentPlaceHolder1_rfv_Password_Lib.initialvalue = "";
//]]>
</script>
<script type="text/javascript">
//<![CDATA[
var Page_ValidationActive = false;
if (typeof(ValidatorOnLoad) == "function") {
    ValidatorOnLoad();
}
function ValidatorOnSubmit() {
    if (Page_ValidationActive) {
        return ValidatorCommonOnSubmit();
    }
    else {
        return true;
    }
}
//]]>
</script>
</form>
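There is a lot of markup here. From it we need to extract the form fields required for login. Both input and select tags carry login form content, but this form contains only input tags, so extracting those is enough. The code is as follows:
The two methods are initLoginParmas(String userName, String passWord) and getLoginFormData(String url).
Java code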
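import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Set;

import org.apache.http.NameValuePair;
import org.apache.http.ParseException; // assumed: the source does not show which ParseException is meant
import org.apache.http.message.BasicNameValuePair;

/**
 * Initializes the login parameters.
 *
 * @param userName library card number
 * @param passWord password
 * @return the form fields as name/value pairs
 * @throws ParseException
 * @throws IOException
 */
public static List<NameValuePair> initLoginParmas(String userName,
        String passWord) throws ParseException, IOException {
    List<NameValuePair> parmasList = new ArrayList<NameValuePair>();
    // Start from the input fields scraped from the login page.
    HashMap<String, String> parmasMap = getLoginFormData(LoginUrl);
    Set<String> keySet = parmasMap.keySet();
    // Fill in the user name and password fields, matched by field name.
    for (String temp : keySet) {
        if (temp.contains("Username")) {
            parmasMap.put(temp, userName);
        } else if (temp.contains("txtPas")) {
            parmasMap.put(temp, passWord);
        }
    }
    Set<String> keySet2 = parmasMap.keySet();
    System.out.println("Form fields:");
    for (String temp : keySet2) {
        System.out.println(temp + " = " + parmasMap.get(temp));
    }
    // Convert the map into the name/value list that HttpClient expects.
    for (String temp : keySet2) {
        parmasList.add(new BasicNameValuePair(temp, parmasMap.get(temp)));
    }
    // System.out.println("initParams\n" + parmasMap);
    return parmasList;
}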
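As a rough illustration of what happens to these name/value pairs next: when a form like this one is posted, the pairs travel as an application/x-www-form-urlencoded body. In Apache HttpClient 4.x that job is normally handled by UrlEncodedFormEntity. The sketch below is not the article's code and uses only the JDK so that it runs on its own. The class name FormBodySketch, the method encode, and the sample credentials are assumptions made for this example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of the original article.
class FormBodySketch {

    // Joins name=value pairs with '&', percent-encoding names and values.
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("__EVENTTARGET", "");
        fields.put("ctl00$ContentPlaceHolder1$txtUsername_Lib", "demoUser"); // made-up value
        fields.put("ctl00$ContentPlaceHolder1$txtPas_Lib", "p@ss word");     // made-up value
        System.out.println(encode(fields));
    }
}
```

The ASP.NET field names contain a dollar sign, which is escaped as %24 in the encoded body, and hidden fields with empty values (such as __EVENTTARGET) are still sent as name= with nothing after the equals sign.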
Java code
/**
 * Gets the input fields of the login form.
 *
 * @param url
 * @return
 * @throws IOException
 * @throws ParseException
 */
public static HashMap<String, String> getLoginFormData(String url)
        throws ParseException, IOException {
    Document document = Jsoup.parse(getHtml(url));
    Elements element1 = document.getElemen
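The listing breaks off here in the source, so the rest of getLoginFormData is not available. To make the idea concrete, here is a minimal sketch of the step described earlier: collecting every named input of the form into a name-to-value map. It is not the author's code. To stay dependency-free it uses java.util.regex as a stand-in for Jsoup, where the same selection would typically be written with a CSS selector such as input[name]. InputFieldSketch, collectInputs, and attr are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the Jsoup-based extraction; illustration only.
class InputFieldSketch {

    // Matches a whole <input ...> tag. Assumes no '>' inside attribute values.
    private static final Pattern INPUT_TAG =
            Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Returns the double-quoted value of one attribute, or "" if it is absent.
    static String attr(String tag, String name) {
        Matcher m = Pattern.compile("\\b" + name + "\\s*=\\s*\"([^\"]*)\"",
                Pattern.CASE_INSENSITIVE).matcher(tag);
        return m.find() ? m.group(1) : "";
    }

    // Collects name -> value for every input that has a name attribute.
    static Map<String, String> collectInputs(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = INPUT_TAG.matcher(html);
        while (m.find()) {
            String tag = m.group();
            String name = attr(tag, "name");
            if (!name.isEmpty()) {
                fields.put(name, attr(tag, "value"));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String html = "<form id=\"aspnetForm\">"
                + "<input type=\"hidden\" name=\"__EVENTTARGET\" id=\"__EVENTTARGET\" value=\"\" />"
                + "<input name=\"ctl00$ContentPlaceHolder1$txtlogintype\" type=\"hidden\" value=\"0\" />"
                + "<input type=\"button\" value=\"Reset\" class=\"btn\" />"
                + "</form>";
        System.out.println(collectInputs(html));
    }
}
```

Inputs without a name attribute, such as the plain button in the sample, are skipped because a browser would not submit them either. Regular expressions are fragile on real-world HTML, so a proper parser like Jsoup remains the better choice outside a sketch.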