7.2. re - Regular expression operations (Python v2.7.5 documentation)
This module provides regular expression matching operations similar to those found in Perl. Both patterns and strings to be searched can be Unicode strings as well as 8-bit strings.
Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python's usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal.
The solution is to use Python's raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
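A minimal sketch of the two spellings, using only the standard re module (the variable names are illustrative). It checks that the doubled literal '\\\\' and the raw literal r'\\' produce the same two-character pattern, and that this pattern finds the backslash in a Windows-style path. It is written to run under both Python 2.7 and Python 3.

```python
import re

# Ordinary literal: each regex backslash must itself be escaped.
plain = '\\\\'   # the two characters \\ , a regex matching one backslash
# Raw literal: the same two characters, written without doubling.
raw = r'\\'

print(plain == raw)              # True
print(len(r"\n"), len("\n"))     # raw "\n" has 2 characters, "\n" has 1

text = 'C:\\temp'                # the string C:\temp
print(re.search(raw, text).start())   # 2, the index of the backslash
```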
It is important to note that most regular expression operations are available both as module-level functions and as methods of compiled regular expression objects. The functions are shortcuts that don't require you to compile a regex object first, but they miss some fine-tuning parameters.
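The difference can be sketched as follows (illustrative names and sample text): the module-level re.search() compiles the pattern for you, while the search() method of a compiled pattern object also accepts a starting position, one of the fine-tuning parameters the shortcut lacks.

```python
import re

sample = 'abc 123 def 456'

# Module-level shortcut: no explicit compile step.
m1 = re.search(r'\d+', sample)

# Compiled pattern object: its search() method takes an optional pos argument.
pattern = re.compile(r'\d+')
m2 = pattern.search(sample)
m3 = pattern.search(sample, 8)   # begin scanning at index 8

print(m1.group())   # 123
print(m2.group())   # 123
print(m3.group())   # 456
```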
See also
∙Mastering Regular Expressions
∙Book on regular expressions by Jeffrey Friedl, published by O'Reilly. The second edition of the book no longer covers Python at all, but the first edition covered writing good regular expression patterns in great detail.
7.2.1. Regular Expression Syntax
A regular expression (or RE) specifies a set of strings that matches it; the functions in this module let you check if a particular string matches a given regular expression (or if a given regular expression matches a particular string, which comes down to the same thing).
Regular expressions can be concatenated to form new regular expressions; if A and B are both regular expressions, then AB is also a regular expression. In general, if a string p matches A and another string q matches B, the string pq will match AB. This holds unless A or B contain low precedence operations; boundary conditions between A and B; or have numbered group references. Thus, complex expressions can easily be constructed from simpler primitive expressions like the ones described here. For details of the theory and implementation of regular expressions, consult the Friedl book referenced above, or almost any textbook about compiler construction.
A brief explanation of the format of regular expressions follows. For further information and a gentler presentation, consult the Regular Expression HOWTO.
Regular expressions can contain both special and ordinary characters. Most ordinary characters, like 'A', 'a', or '0', are the simplest regular expressions; they simply match themselves. You can concatenate ordinary characters, so last matches the string 'last'. (In the rest of this section, we'll write REs in this special style, usually without quotes, and strings to be matched 'in single quotes'.)
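A small sketch of ordinary characters and of the concatenation rule. The helper matches_whole() is hypothetical, not part of re; it anchors the pattern with \Z so that only a match of the entire string counts. The last line shows the low-precedence exception: a|b followed by c gives a|bc, which means "a" or "bc" and so does not match 'ac'.

```python
import re

def matches_whole(pattern, s):
    """Hypothetical helper: True if `pattern` matches all of `s`."""
    # The non-capturing group keeps a '|' inside `pattern` from
    # escaping the \Z end-of-string anchor.
    return re.match('(?:' + pattern + r')\Z', s) is not None

# Ordinary characters match themselves.
print(matches_whole('last', 'last'))               # True

# 'abb' matches ab*, 'cc' matches c+, so 'abbcc' matches ab*c+.
print(matches_whole('ab*' + 'c+', 'abb' + 'cc'))   # True

# 'a' matches a|b and 'c' matches c, but 'ac' does not match a|bc.
print(matches_whole('a|b' + 'c', 'a' + 'c'))       # False
```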
Some characters, like '|' or '(', are special. Special characters either stand for classes of ordinary characters, or affect how the regular expressions around them are interpreted. Regular expression pattern strings may not contain null bytes, but can specify the null byte using the \number notation, e.g., '\x00'.
The special characters are:
∙'.'
∙(Dot.) In the default mode, this matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline.
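A quick illustration of the DOTALL flag on an arbitrary two-line string:

```python
import re

text = 'first\nsecond'

# Default mode: '.' stops at the newline.
print(re.match(r'.*', text).group())                     # first

# DOTALL mode: '.' also consumes the newline.
print(re.match(r'.*', text, re.DOTALL).group() == text)  # True
```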
∙'^'
∙(Caret.) Matches the start of the string, and in MULTILINE mode also matches immediately after each newline.
∙'$'
∙Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline. foo matches both 'foo' and 'foobar', while the regular expression foo$ matches only 'foo'. More interestingly, searching for foo.$ in 'foo1\nfoo2\n' matches 'foo2' normally, but 'foo1' in MULTILINE mode; searching for a single $ in 'foo\n' will find two (empty) matches: one just before the newline, and one at the end of the string.
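The '^' and '$' examples above can be checked directly. This sketch uses finditer() to report the positions of the two empty matches of a lone $ in 'foo\n'.

```python
import re

# foo matches the start of 'foobar'; foo$ needs the string to end there.
print(bool(re.match(r'foo', 'foobar')))    # True
print(bool(re.match(r'foo$', 'foobar')))   # False

text = 'foo1\nfoo2\n'
# Normally $ matches only at the end or before the final newline.
print(re.search(r'foo.$', text).group())                # foo2
# In MULTILINE mode $ also matches before every newline.
print(re.search(r'foo.$', text, re.MULTILINE).group())  # foo1

# A lone $ in 'foo\n' matches before the newline and at the very end.
print([m.start() for m in re.finditer(r'$', 'foo\n')])  # [3, 4]

# In MULTILINE mode ^ also matches right after each newline.
print(re.findall(r'^foo\d', text, re.MULTILINE))        # ['foo1', 'foo2']
```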
∙'*'
∙Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible. ab* will match 'a', 'ab', or 'a' followed by any number of 'b's.
∙'+'
∙Causes the resulting RE to match 1 or more repetitions of the preceding RE. ab+ will match 'a' followed by any non-zero number of 'b's; it will not match just 'a'.
∙'?'
∙Causes the resulting RE to match 0 or 1 repetitions of the preceding RE. ab? will match either 'a' or 'ab'.
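A sketch comparing the three repetition operators on a few sample strings. matches_whole() is a hypothetical helper (not part of re) that requires the pattern to match the entire string.

```python
import re

def matches_whole(pattern, s):
    """Hypothetical helper: True if `pattern` matches all of `s`."""
    return re.match('(?:' + pattern + r')\Z', s) is not None

# ab*: 'a' followed by zero or more 'b's
print([matches_whole('ab*', s) for s in ['a', 'ab', 'abbb', 'b']])  # [True, True, True, False]
# ab+: 'a' followed by at least one 'b'
print([matches_whole('ab+', s) for s in ['a', 'ab', 'abbb']])       # [False, True, True]
# ab?: 'a' optionally followed by a single 'b'
print([matches_whole('ab?', s) for s in ['a', 'ab', 'abb']])        # [True, True, False]
```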
∙*?, +?, ??
∙The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn't desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding '?' after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.
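A minimal sketch contrasting greedy and non-greedy matching on the HTML-like string '<H1>title</H1>', followed by the +? and ?? forms:

```python
import re

html = '<H1>title</H1>'

print(re.match(r'<.*>', html).group())    # <H1>title</H1>  (greedy)
print(re.match(r'<.*?>', html).group())   # <H1>            (non-greedy)

# +? takes one repetition if it can; ?? prefers zero.
print(re.match(r'a+?', 'aaa').group())    # a
print(re.match(r'ab??', 'ab').group())    # a
```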
∙{m}
∙Specifies that exactly m copies of the previous RE should be matched; fewer matches cause the entire RE not to match. For example, a{6} will match exactly six 'a' characters, but not five.
∙{m,n}
∙Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as many repetitions as possible. For example, a{3,5} will match from 3 to 5 'a' characters. Omitting m specifies a lower bound of zero, and omitting n specifies an infinite upper bound. As an example, a{4,}b will match aaaab or a thousand 'a' characters followed by a b, but not aaab. The comma may not be omitted or the modifier would be confused with the previously described form.
∙{m,n}?
∙Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as few repetitions as possible. This is the non-greedy version of the previous qualifier. For example, on the 6-character string 'aaaaaa', a{3,5} will match 5 'a' characters, while a{3,5}? will only match 3 characters.
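The repetition-count examples above, written out as a small sketch:

```python
import re

s = 'aaaaaa'   # six 'a' characters

print(re.match(r'a{6}', s).group())        # aaaaaa
print(re.match(r'a{6}', 'aaaaa'))          # None, only five
print(re.match(r'a{3,5}', s).group())      # aaaaa  (greedy: 5)
print(re.match(r'a{3,5}?', s).group())     # aaa    (non-greedy: 3)

# Omitted bounds: a{4,}b needs at least four 'a's; a{,2} allows zero to two.
print(bool(re.match(r'a{4,}b', 'aaaab')))  # True
print(bool(re.match(r'a{4,}b', 'aaab')))   # False
print(re.match(r'a{,2}', 'aaa').group())   # aa
```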
∙'\'
∙Either escapes special characters (permitting you to match characters like '*', '?', and so forth), or signals a special sequence; special sequences are discussed below. If you're not using a raw string to express the pattern, remember that Python also uses the backslash as an escape sequence in string literals; if the escape sequence isn't recognized by Python's parser, the backslash and subsequent character are included in the resulting string. However, if Python would recognize the resulting sequence, the backslash should be repeated twice. This is complicated and hard to understand, so it's highly recommended that you use raw strings for all but the simplest expressions.
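A sketch of both uses of the backslash: escaping a special character, and the doubling needed when a pattern is written as an ordinary (non-raw) literal. Here '\b' is a sequence Python itself recognizes (a backspace character), so the regex word boundary must be written '\\b' or r'\b'. The last line uses re.escape(), which backslash-escapes special characters in arbitrary text so it can be matched literally.

```python
import re

# A backslash removes the special meaning of '*'.
print(re.match(r'a\*', 'a*').group())   # a*
print(re.match(r'a*', 'a*').group())    # a   ('*' acts as a repetition)

# '\b' in an ordinary literal is a backspace, not a word boundary.
print(re.search('\bword', 'a word'))            # None
print(re.search('\\bword', 'a word').group())   # word
print(re.search(r'\bword', 'a word').group())   # word

# Escape arbitrary text so every character is matched literally.
print(bool(re.match(re.escape('1+1=2'), '1+1=2')))   # True
```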
∙[]
∙Used to indicate a set of characters. In a set:
∙Characters can be listed individually, e.g. [amk] will match 'a', 'm', or 'k'.
∙Ranges of characters can be indicated by giving two characters and separating them by a '-', for example [a-z] will match any lowercase ASCII letter, [0-5][0-9] will match all the two-digit numbers from 00 to 59, and [0-9A-Fa-f] will match any hexadecimal digit. If - is escaped (e.g. [a\-z]) or if it's placed as the first or last character (e.g. [a-]), it will match a literal '-'.
∙Special characters lose their special meaning inside sets. For example, [(+*)] will match any of the literal characters '(', '+', '*', or ')'.
∙Character classes such as \w or \S (defined below) are also accepted inside a set, although the characters they match depend on whether LOCALE or UNICODE mode is in force.
∙Characters that are not within a range can be matched by complementing the set. If the first character of the set is '^', all the characters that are not in the set will be matched. For example, [^5] will match any character except '5', and [^^] will match any character except '^'. ^ has no special meaning if it's not the first character in the set.
∙To match a literal ']' inside a set, precede it with a backslash, or place it at the beginning of the set. For example, both [()[\]{}] and []()[{}] will match a parenthesis.
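A sketch exercising each of the set rules above on small, arbitrary sample strings:

```python
import re

# Individual characters and ranges
print(re.findall(r'[amk]', 'make'))               # ['m', 'a', 'k']
print(bool(re.match(r'[0-5][0-9]\Z', '59')))      # True
print(bool(re.match(r'[0-5][0-9]\Z', '60')))      # False
print(re.findall(r'[0-9A-Fa-f]+', 'ff 1A zz'))    # ['ff', '1A']
print(re.findall(r'[a-]', 'a-b'))                 # ['a', '-']

# Special characters are literal inside a set
print(re.findall(r'[(+*)]', '(1+2)*3'))           # ['(', '+', ')', '*']

# A character class inside a set, with a trailing literal '-'
print(re.findall(r'[\w-]+', 'foo-bar baz'))       # ['foo-bar', 'baz']

# Complemented sets
print(re.findall(r'[^5]', '556'))                 # ['6']
print(re.findall(r'[^^]', '^a^'))                 # ['a']

# Literal ']' via backslash or first position
print(re.findall(r'[()[\]{}]', 'f(x)[0]'))        # ['(', ')', '[', ']']
print(re.findall(r'[]()[{}]', 'f(x)[0]'))         # ['(', ')', '[', ']']
```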
∙'|'
∙A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. An arbitrary number of REs can be separated by the '|' in this way. This can be used inside groups (see below) as well. As the target string is scanned, REs separated by '|' are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match. In other words, the '|' operator is never greedy. To match a literal '|', use \|, or enclose it inside a character class, as in [|].
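A minimal sketch of left-to-right alternation: with ab|abcd the first branch wins, even though the second would match more of 'abcd'. The sample strings are arbitrary.

```python
import re

print(re.match(r'ab|abcd', 'abcd').group())      # ab
print(re.match(r'abcd|ab', 'abcd').group())      # abcd

# Alternation inside a group
print(re.search(r'gr(a|e)y', 'a grey cat').group())   # grey

# A literal '|' via \| or [|]
print(re.split(r'\|', 'a|b|c'))   # ['a', 'b', 'c']
print(re.split(r'[|]', 'a|b|c'))  # ['a', 'b', 'c']
```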
∙(...)
∙Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(] [)].
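A sketch of retrieving group contents after a match and of the \number backreference; the sample strings are arbitrary.

```python
import re

m = re.match(r'(\w+) (\w+)', 'Isaac Newton, physicist')
print(m.group(1))   # Isaac
print(m.group(2))   # Newton
print(m.groups())   # ('Isaac', 'Newton')

# \1 matches whatever group 1 matched: find a doubled word.
print(re.search(r'\b(\w+) \1\b', 'this is is a test').group())   # is is

# Literal parentheses via \( and \)
print(re.findall(r'\(\d+\)', 'f(1) g(22)'))   # ['(1)', '(22)']
```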
∙(?...)
∙This is an extension notation (a '?' following a '(' is not meaningful otherwise). The first character after the '?' determines what the meaning and further syntax of the construct is. Extensions usually do not create a new group; (?P<name>...) is the only exception to this rule.
∙(?iLmsux)
∙(One or more letters from the set 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags: re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode dependent), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to inclu