Oracle Text Usage Summary

I. Introduction to Oracle Text

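Oracle has supported full-text retrieval since version 7.3: users can run text-based queries through the Oracle server's ConText option, using techniques such as wildcard search, fuzzy matching, relevance ranking, proximity search, term weighting, and meaning-based query expansion. The feature was called ConText in Oracle 8.0.x, interMedia Text in Oracle8i, and Oracle Text in Oracle9i.

Oracle Text is part of both the Standard and Enterprise editions of 9i. Oracle9i provides full-text retrieval as a built-in feature, so it is installed automatically when a database instance is created.

Oracle Text gives Oracle9i powerful text retrieval and intelligent text management capabilities.

With Oracle Text, you can use standard SQL tools to build new text-based development tools or to extend existing applications, easily and effectively.

Application developers can take advantage of Oracle Text search in any Oracle database application that uses text. Uses range from a searchable comment field in an existing application, to large document management systems that involve multiple document formats (doc, excel, txt, pdf, and so on) and complex search criteria, to XML applications that search text data from the Internet and from file systems.

Oracle Text supports basic full-text search for most of the languages that the Oracle database supports.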

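To use Oracle Text, you must either have the CTXAPP role or be the CTXSYS user.

Oracle Text provides the CTXSYS user for system administrators and the CTXAPP role for application developers.

The CTXSYS user can perform the following tasks:

Start the Oracle Text server and perform all tasks of the CTXAPP role.

Users with the CTXAPP role can perform the following tasks:

Create indexes; manage the Oracle Text data dictionary, including creating and dropping preferences; run Oracle Text queries; and use the Oracle Text PL/SQL packages.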

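II. Oracle Text Indexes

a. Index overview:

Indexing breaks text into many tokens, which are usually individual words separated by spaces.

An Oracle Text application can be understood as a "load data -> configure the index -> index the data -> run queries -> maintain the index" process.

There are three index types: CONTEXT, CTXCAT, and CTXRULE.

Each is described briefly below: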

| Index Type | Application Type | Query Operator |
| --- | --- | --- |
| CONTEXT | Use this index to build a text retrieval application when your text consists of large coherent documents. You can index documents of different formats such as MS Word, HTML, XML, or plain text. With a context index, you can customize your index in a variety of ways. | CONTAINS |
| CTXCAT | Use this index type to index small text fragments such as item names, prices and descriptions that are stored across columns. With this index, query performance is improved for mixed queries. | CATSEARCH |
| CTXRULE | Use a CTXRULE index to build a document classification application. The CTXRULE index is an index created on a table of queries, where each query has a classification. Single documents (plain text, HTML, or XML) can be classified using the MATCHES operator. | MATCHES |

The CONTEXT index is the most commonly used; it is queried with the general-purpose CONTAINS operator.
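The CTXRULE description above (an index over a table of queries, each carrying a classification, with single documents matched against them) can be illustrated with a small Python sketch. This is not how Oracle implements MATCHES: the `RULES` table, the `classify` function, and the simplification that a query matches when all of its words occur in the document are assumptions made purely for illustration.

```python
import re

# Hypothetical "table of queries": each row pairs a query with a classification.
RULES = [
    ("oracle text", "database"),
    ("index", "search"),
    ("football", "sports"),
]


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def classify(document, rules=RULES):
    """Return the classification of every rule whose query words all occur in the document."""
    words = set(tokenize(document))
    matched = []
    for query, category in rules:
        # Simplifying assumption: a query matches when all of its words are present.
        if set(tokenize(query)) <= words and category not in matched:
            matched.append(category)
    return matched


print(classify("Oracle Text builds an inverted index over document text"))
# prints ['database', 'search']
```

The point of the sketch is the direction of the lookup: the stored rows are queries, and the incoming document is what gets tested against them.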

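b. The CONTEXT index

An Oracle Text CONTEXT index is an inverted index: each token maps to the locations in the text where that token occurs.

After the index is built, you can find the tables that Oracle generates automatically (assuming the index is named myindex):

DR$myindex$I, DR$myindex$K, DR$myindex$R, and DR$myindex$N. The I table is the most important: it stores the token records that Oracle produces after analyzing the documents, including each token's positions, occurrence count, hash value, and so on.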
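To make the inverted-index idea concrete, here is a minimal Python sketch that maps each token to the documents and word positions where it occurs. It only illustrates the concept: the function names, the in-memory dictionary layout, and the regular-expression tokenizer are assumptions and do not reflect the actual structure of the DR$myindex$I table.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lower-case word tokens (a stand-in for a real lexer)."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(documents):
    """Map each token to {doc_id: [word positions]} for a {doc_id: text} dict."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, token in enumerate(tokenize(text)):
            index[token].setdefault(doc_id, []).append(position)
    return index


def contains(index, word):
    """Return the sorted ids of the documents that contain the word."""
    return sorted(index.get(word.lower(), {}))


docs = {
    1: "Oracle Text indexes text",
    2: "An inverted index maps tokens to documents",
}
index = build_inverted_index(docs)
print(index["text"])             # prints {1: [1, 3]}
print(contains(index, "index"))  # prints [2]
```

Because the lookup starts from the token rather than from the document, answering "which documents contain this word" is a single dictionary access instead of a scan over every document.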

A CONTEXT index is configured through the following parameters:

Datastore Types, Filter Types, Lexer Types, Wordlist Type, Storage Types, Section Group Types, Stoplists, System-Defined Preferences, and System Parameters. The purpose of each setting is:

| Preference Class | Answers the Question |
| --- | --- |
| Datastore | How are your documents stored? |
| Filter | How can the documents be converted to plain text? |
| Lexer | What language is being indexed? |
| Wordlist | How should stem and fuzzy queries be expanded? |
| Storage | How should the index tables be stored? |
| Stop List | What words or themes are not to be indexed? |
| Section Group | Is querying within sections enabled, and how are the document sections defined? |

The values available for each parameter, together with their meaning and purpose, are described briefly below:

1. Datastore Types

| Datastore Type | Use When |
| --- | --- |
| DIRECT_DATASTORE | Data is stored internally in the text column. Each row is indexed as a single document. |
| MULTI_COLUMN_DATASTORE | Data is stored in a text table in more than one column. Columns are concatenated to create a virtual document, one per row. |
| DETAIL_DATASTORE | Data is stored internally in the text column. Document consists of one or more rows stored in a text column in a detail table, with header information stored in a master table. |
| FILE_DATASTORE | Data is stored externally in operating system files. Filenames are stored in the text column, one per row. |
| NESTED_DATASTORE | Data is stored in a nested table. |
| URL_DATASTORE | Data is stored externally in files located on an intranet or the Internet. Uniform Resource Locators (URLs) are stored in the text column. |
| USER_DATASTORE | Documents are synthesized at index time by a user-defined stored procedure. |
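Two of the datastore descriptions above assemble a document from several pieces: MULTI_COLUMN_DATASTORE concatenates columns into one virtual document per row, and DETAIL_DATASTORE builds a document from one or more detail rows. The Python sketch below shows both assembly steps on in-memory data. The function names, the space and newline separators, and the `(master_id, line_no, text)` row shape are hypothetical; the exact format Oracle produces is not described in this document.

```python
def multi_column_documents(rows, columns):
    """Build one virtual document per row by joining the chosen columns."""
    return [" ".join(str(row[col]) for col in columns) for row in rows]


def detail_documents(detail_rows):
    """Group (master_id, line_no, text) rows by master id and join them in line order."""
    grouped = {}
    for master_id, _line_no, text in sorted(detail_rows):
        grouped.setdefault(master_id, []).append(text)
    return {master_id: "\n".join(lines) for master_id, lines in grouped.items()}


rows = [
    {"title": "Oracle Text", "body": "Full-text search"},
    {"title": "CTXCAT", "body": "Catalog index"},
]
print(multi_column_documents(rows, ["title", "body"]))
# prints ['Oracle Text Full-text search', 'CTXCAT Catalog index']

detail_rows = [(1, 2, "second line"), (1, 1, "first line"), (2, 1, "only line")]
print(detail_documents(detail_rows))
# prints {1: 'first line\nsecond line', 2: 'only line'}
```

In both cases the indexer sees a single document even though the text is spread over several columns or rows.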

2. Filter Types

| Filter Preference Type | Description |
| --- | --- |
| CHARSET_FILTER | Character set converting filter |
| INSO_FILTER | Inso filter for filtering formatted documents |
| NULL_FILTER | No filtering required. Use for indexing plain text, HTML, or XML documents |
| USER_FILTER | User-defined external filter to be used for custom filtering |
| PROCEDURE_FILTER | User-defined stored procedure filter to be used for custom filtering. |

3. Lexer Types

| Type | Description |
| --- | --- |
| BASIC_LEXER | Lexer for extracting tokens from text in languages, such as English and most western European languages, that use white space delimited words. |
| MULTI_LEXER | Lexer for indexing tables containing documents of different languages |
| CHINESE_VGRAM_LEXER | Lexer for extracting tokens from Chinese text |
| JAPANESE_VGRAM_LEXER | Lexer for extracting tokens from Japanese text. |
| JAPANESE_LEXER | Lexer for extracting tokens from Japanese text. |
| KOREAN_LEXER | Lexer for extracting tokens from Korean text. |
| KOREAN_MORPH_LEXER | Lexer for extracting tokens from Korean text (recommended). |

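basic_lexer targets English. chinese_vgram_lexer is a dedicated Chinese lexer that supports all Chinese character sets. chinese_lexer is a newer Chinese lexer, described as supporting only the utf8 character set (although it also supports the zhs16gbk character set).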
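The reason Chinese needs its own lexer is that the text has no spaces between words, so whitespace splitting yields one giant token. The Python sketch below contrasts whitespace tokenization with overlapping two-character tokens. Oracle's actual V-gram algorithm is not described in this document, so fixed-length bigrams are used here only as an assumed stand-in, and both function names are hypothetical.

```python
def whitespace_tokens(text):
    """Split on whitespace, which suits languages with space-delimited words."""
    return text.split()


def bigram_tokens(text):
    """Emit overlapping two-character tokens for text without word delimiters."""
    chars = [ch for ch in text if not ch.isspace()]
    if len(chars) < 2:
        # A single character (or nothing) cannot form a bigram.
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


print(whitespace_tokens("Oracle Text search"))  # prints ['Oracle', 'Text', 'search']
print(whitespace_tokens("全文检索"))             # prints ['全文检索']
print(bigram_tokens("全文检索"))                 # prints ['全文', '文检', '检索']
```

With overlapping tokens, a query for 检索 can match the document even though the indexer never knew where the word boundaries were.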

4. Wordlist Type

Use the wordlist preference to enable query options such as stemming and fuzzy matching for your language. You can also use the wordlist preference to enable substring and prefix indexing, which improves performance for wildcard queries with CONTAINS and CATSEARCH.

To create a wordlist preference, you must use BASIC_WORDLIST, which is the only type available.

| Attribute | Attribute Values |
| --- | --- |
| stemmer | Specify which language stemmer to use. You can specify one of the following: NULL (no stemming), ENGLISH (English inflectional), DERIVATIONAL (English derivational), DUTCH, FRENCH, GERMAN, ITALIAN, SPANISH, AUTO (automatic language detection for stemming) |
| fuzzy_match | Specify which fuzzy matching cluster to use. You can specify one of the following: GENERIC, JAPANESE_VGRAM, KOREAN, CHINESE_VGRAM, ENGLISH, DUTCH, FRENCH, GERMAN, ITALIAN, SPANISH, OCR, AUTO (automatic language detection for stemming) |
| fuzzy_score | Specify a default lower limit of fuzzy score. Specify a number between 0 and 80. Text with scores below this number is not returned. Default is 60. |
| fuzzy_numresults | Specify the maximum number of fuzzy expansions. Use a number between 0 and 5,000. Default is 100. |
| substring_index | Specify TRUE for Oracle to create a substring index. A substring index improves left-truncated and double-truncated wildcard queries such as %ing or %benz%. Default is FALSE. |
| prefix_index | Specify YES to enable prefix indexing. Prefix indexing improves performance for right-truncated wildcard searches such as TO%. Defaults to NO. |
| prefix_length_min | Specify the minimum length of indexed prefixes. Defaults to 1. |
| prefix_length_max | Specify the maximum length of indexed prefixes. Defaults to 64. |
| wildcard_maxterms | Specify the maximum number of terms in a wildcard expansion. Use a number between 1 and 15,000. Default is 5,000. |
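The prefix-indexing attributes above are easier to picture with a small example. The Python sketch below precomputes, for every token, the prefixes whose length falls between a minimum and a maximum, so that a right-truncated pattern such as TO% becomes a single lookup. It is a conceptual illustration only: the function names are hypothetical, the defaults merely mirror the table, and truncating the expansion at `max_terms` is an assumption of the sketch, since how Oracle behaves when the limit is exceeded is not covered here.

```python
from collections import defaultdict


def build_prefix_index(tokens, min_len=1, max_len=64):
    """Map each prefix with length in [min_len, max_len] to the tokens starting with it."""
    prefix_index = defaultdict(set)
    for token in tokens:
        for length in range(min_len, min(max_len, len(token)) + 1):
            prefix_index[token[:length]].add(token)
    return prefix_index


def expand_right_truncated(prefix_index, pattern, max_terms=5000):
    """Expand a pattern such as 'TO%' into matching tokens, capped at max_terms."""
    if not pattern.endswith("%"):
        raise ValueError("expected a right-truncated pattern such as 'TO%'")
    # Prefixes outside the indexed length range are simply not found in this sketch.
    matches = sorted(prefix_index.get(pattern[:-1], set()))
    return matches[:max_terms]


tokens = ["TOKEN", "TOP", "TEXT", "TOOL"]
prefix_index = build_prefix_index(tokens, min_len=2, max_len=3)
print(expand_right_truncated(prefix_index, "TO%"))  # prints ['TOKEN', 'TOOL', 'TOP']
print(expand_right_truncated(prefix_index, "T%"))   # prints []
```

The trade-off is the usual one for precomputed indexes: more storage per token in exchange for faster wildcard expansion, which is why the minimum and maximum prefix lengths are configurable.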

5. Storage Types

Use the storage preference to specify tablespace and creation parameters for tables associated with a Text index. The system provides a single storage type called BASIC_STORAGE:

| Type | Description |
| --- | --- |
| BASIC_STORAGE | Indexing type used to specify the tablespace and creation parameters for the database tables and indexes that constitute a Text index. |

BASIC_STORAGE has the following attributes:

| Attribute | Attribute Value |
| --- | --- |
| i_table_clause | Parameter clause for dr$indexname$I table creation. Specify storage and tablespace clauses to add to the end of the internal CREATE TABLE statement. The I table is the index data table. |
| k_table_clause | Parameter clause for dr$indexname$K table creation. Specify storage and tablespace clauses to add to the end of the internal CREATE TABLE statement. The K table is the keymap table. |
| r_table_clause | Parameter clause for dr$indexname$R table creation. Specify storage and tablespace clauses to add to the end of the internal CREATE TABLE statement. The R table is the rowid table. The default clause is: 'LOB(DATA) STORE AS (CACHE)' |
| n_table_clause | Parameter clause for dr$indexname$N table creation. Specify storage and tablespace clauses to add to the end of the internal CREATE TABLE statement. The N table is the negative list table. |
| i_index_clause | Parameter clause for dr$indexname$X index creation. Specify storage and tablespace clauses to add to the end of the internal CREATE INDEX statement. The default clause is: 'COMPRESS 2' |
| p_table_clause | Parameter clause for the substring index if you have enabled SUBSTRING_INDEX in the BASIC_WORDLIST. Specify storage and tablespace clauses to add to the end of the internal CREATE INDEX statement. The P table is an index-organized table so the storage clause you specify must be appropriate to this type of table. |

6. Section Group Types

| Section Group Preference | Description |
| --- | --- |
| NULL_SECTION_GROUP | Use this group type when you define no sections or when you define only SENTENCE or PARAGRAPH sections. This is the default. |
| BASIC_SECTION_GROUP | Use this group type for defining sections where the start and end tags are of the form `<A>` and `</A>`. |
| HTML_SECTION_GROUP | Use this group type for indexing HTML documents and for defining sections in HTML documents. |
| XML_SECTION_GROUP | Use this group type for indexing XML documents and for defining sections in XML documents. |
| AUTO_SECTION_GROUP | Use this group type to automatically create a zone section for each start-tag/end-tag pair in an XML document. The section names derived from XML tags are case sensitive as in XML. Attribute sections are created automatically for XML tags that have attributes. Attribute sections are named in the form attribute@tag. Stop sections, empty tags, processing instructions, and comments are not indexed. The following limitations apply to automatic section groups: (1) You cannot add zone, field, or special sections to an automatic section group. (2) Automatic sectioning does not index XML document types (root elements). However, you can define stop sections with document type. (3) The length of the indexed tags, including prefix and namespace, cannot exceed 64 characters. Tags longer than this are not indexed. |

PATH_SECTION_GROUP

Use this group type t