CN105868177A - Universal formula search method - Google Patents
- Publication number
- CN105868177A (application CN201610171766.5A)
- Authority
- CN
- China
- Prior art keywords
- mathematical
- document
- mathematical formulae
- index
- checked
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a universal formula search method. The method comprises the following steps: establishing a universal formula search engine; running multiple web crawler processes in a searcher; extracting, by an indexer, the mathematical formulas in documents from an original webpage database; performing, by a querier, a mathematical formula query using a mathematical formula index and a mathematical symbol dictionary database; and returning, by the querier, the documents in the original webpage database that contain the queried formula to the user and displaying them on the search results interface. The beneficial effects are that researchers and teaching staff are given a fast and accurate formula search scheme, scientific research is given theoretical support, and teaching and science popularization are given a broad and accurate way to find material. In addition, document databases are given a dedicated mathematical formula search interface that expands their range of application and business, and a paid-download interface that increases their revenue.
Description
Technical field
The invention belongs to the field of search engine technology and relates to a universal formula search method.
Background technology
There are generally two approaches to mathematical formula search.
The first is an evolutionary approach: an existing text search system is extended with search functions adapted to mathematics. Working from a string representation of each formula, it searches formulas with conventional text-retrieval methods. Because it builds on mature text search systems, the implementation effort is small.
The second is to create an entirely new mathematical formula search system that thoroughly collects and indexes mathematical material and searches using the structure of the formulas' content representation. Because it starts from scratch, this approach takes more time and effort. It requires not only the careful use and integration of various computer algebra and symbolic-manipulation techniques, but also the development of novel indexing and search techniques, and research in this area has barely begun. Expression-parsing techniques for computer algebra systems and compilers, however, have already been developed and implemented; they can and should be used.
The first approach is used by DLMF (Digital Library of Mathematical Functions) and the ActiveMath system. Formulas are converted to text form and then indexed. Search strings resemble LaTeX commands and are converted to character strings before the search is executed. This allows mathematical material to be searched alongside plain text, but it cannot provide a powerful formula search function.
A similar method uses an XML-based XQuery search engine. Both methods share the advantage of relying on existing technology, but neither provides a search method fully oriented to mathematical formulas.
The second approach is taken by the MBase system, which uses the pattern matching of a programming language to find OMDoc-encoded mathematical terms in a knowledge base [24]. The search engine of the HELM system indexes structural metadata extracted from formulas represented in Content MathML in order to provide efficient retrieval. The rationale is that the metadata approximates the formula structure and can serve as a filter over a large term database. However, because the complete formula structure is lost, semantic equivalence cannot be guaranteed.
Summary of the invention
The technical problem to be solved by the invention is to provide a universal formula search method that can precisely search web documents by mathematical formula.
To solve this technical problem, the invention adopts the following technical scheme: a universal formula search method comprising the following steps:
(1) Establish a universal formula search engine. The universal formula search engine comprises:
a searcher, which roams the Internet to discover and collect mathematical formulas;
an indexer, which builds the mathematical formula index;
a querier, which converts a query statement into a query task, hands it to the indexer, completes the query, and returns the result to the user;
(2) Establish a mathematical symbol dictionary database and assign an ID number to each kind of mathematical symbol as its mathematical symbol dictionary ID;
(3) The searcher runs multiple web crawler processes. Each web crawler collects web pages from the network and determines whether a document in the page contains a mathematical formula; if it does, the document is downloaded, compressed, and stored in the original webpage database;
(4) The indexer extracts the mathematical formulas from the documents in the original webpage database and, using the mathematical symbol dictionary database, builds a mathematical formula index for them;
(5) After receiving a query request, the querier performs a mathematical formula query using the mathematical formula index and the mathematical symbol dictionary database and obtains the matched formulas;
(6) The querier returns to the user the documents in the original webpage database that contain the matched formulas and displays them on the search results interface.
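Step (3) can be illustrated with a minimal Python sketch. The patent does not say how a crawler decides that a page contains a formula, so the patterns below (MathML tags and TeX delimiters) are assumed heuristics, and `MATH_PATTERNS`, `contains_formula`, `store_if_math` and the in-memory `page_db` are hypothetical names standing in for the crawler logic and the original webpage database.

```python
import re
import zlib

# Assumed heuristics: markers that suggest a page carries mathematical content.
MATH_PATTERNS = [
    re.compile(r"<math[\s>]", re.IGNORECASE),        # MathML element
    re.compile(r"\\\(.+?\\\)|\\\[.+?\\\]", re.S),    # \( ... \) or \[ ... \]
    re.compile(r"\$[^$\n]+\$"),                      # $ ... $ inline TeX
]


def contains_formula(html: str) -> bool:
    """Return True if any of the assumed math markers appears in the page."""
    return any(p.search(html) for p in MATH_PATTERNS)


def store_if_math(url: str, html: str, page_db: dict) -> bool:
    """Compress and store the page only when it appears to contain a formula."""
    if not contains_formula(html):
        return False
    page_db[url] = zlib.compress(html.encode("utf-8"))
    return True
```

A production crawler would need stricter detection, since a pattern such as `$ ... $` also matches prices; the sketch only shows the filter-then-compress-then-store order described in step (3).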
The mathematical formula index is built as follows:
a. Convert the mathematical formulas in the document to text-string form;
b. Tokenize each text-string formula, decomposing it into mathematical symbols while recording the position of each symbol within the text-string formula;
c. Using the mathematical symbol dictionary database, convert each mathematical symbol to the document mathematical symbol ID that corresponds to its mathematical symbol dictionary ID;
d. Build a document mathematical symbol index for each mathematical symbol according to its document mathematical symbol ID;
e. Build the document mathematical symbol index table from the document mathematical symbol indexes so established.
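The index-building steps above can be sketched as a small inverted index in Python. This is an illustration under stated assumptions, not the patent's implementation: the dictionary contents in `SYMBOL_IDS`, the tokenizer regex, and the posting layout `(doc_id, formula_no, position)` are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical symbol dictionary: symbol -> dictionary ID.
SYMBOL_IDS = {"a": 1, "b": 2, "c": 3, "x": 4, "^": 5, "+": 6, "=": 7, "0": 8, "2": 9}

# LaTeX command, single letter, number, or any other non-space character.
TOKEN = re.compile(r"\\[A-Za-z]+|[A-Za-z]|\d+|\S")


def tokenize(latex):
    """Step b: split a formula into (symbol, position) pairs, skipping grouping braces."""
    return [(m.group(), m.start())
            for m in TOKEN.finditer(latex)
            if m.group() not in ("{", "}")]


def build_index(docs):
    """Steps c to e: map symbols to IDs and collect postings per symbol ID.

    docs maps a document ID to the list of text-form formulas found in it.
    """
    index = defaultdict(list)
    for doc_id, formulas in docs.items():
        for formula_no, latex in enumerate(formulas):
            for symbol, pos in tokenize(latex):
                symbol_id = SYMBOL_IDS.get(symbol)
                if symbol_id is not None:  # symbols missing from the dictionary are skipped
                    index[symbol_id].append((doc_id, formula_no, pos))
    return dict(index)
```

Keeping the position in each posting is what later allows a query to compare the arrangement of symbols, not just their presence.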
The mathematical formula query proceeds as follows:
a. The querier receives the mathematical formula to be queried;
b. Determine whether the query formula is in text format; if it is not, convert it to a text-format query formula;
c. Tokenize the text-format query formula, decomposing it into query mathematical symbols while recording the position of each query symbol within the query formula;
d. Using the mathematical symbol dictionary database, convert each query mathematical symbol to the query mathematical symbol ID that corresponds to its mathematical symbol dictionary ID;
e. Look up the document mathematical symbol IDs in the index table to obtain the query results that match the query mathematical symbol IDs;
f. Apply the KNN algorithm to the query results to obtain the matched formulas;
g. Return to the user the documents in the original webpage database that contain the matched formulas.
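Step f can be sketched as a nearest-neighbour selection over the candidate formulas found in step e. The patent names KNN but does not specify the feature representation, so the sketch assumes each formula is represented by the counts of its symbol IDs; `euclidean` and `knn_formulas` are hypothetical helper names.

```python
import heapq
import math
from collections import Counter


def euclidean(u, v):
    """Euclidean distance between two sparse count vectors (dict-like)."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0) - v.get(k, 0)) ** 2 for k in keys))


def knn_formulas(query_ids, candidates, k):
    """Return the keys of the k candidates nearest to the query.

    query_ids:  list of symbol IDs of the query formula
    candidates: {formula_key: list of symbol IDs} from the index lookup
    """
    q = Counter(query_ids)
    scored = ((euclidean(q, Counter(ids)), key) for key, ids in candidates.items())
    return [key for _, key in heapq.nsmallest(k, scored)]
```

With this representation, a candidate whose symbols are identical to the query has distance zero and is always returned first.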
The mathematical formula index includes a presentation-oriented Presentation index and a semantics-oriented Content index.
The input modes by which the querier receives the query formula include structured-document input and image-acquisition input. Structured-document input comprises LaTeX structured-document input, Word equation editor structured-document input and PDF structured-document input. Image-acquisition input comprises inserting a picture, taking a screenshot, camera capture, scanner capture and document-camera capture.
The documents returned to the user include derivations of the formula and documents related to the formula. The structured-document format of the returned documents is PPT, Word, LaTeX or PDF.
The documents returned to the user are prioritized by similarity, citation frequency and public rating, then sorted by priority and displayed on the search results interface.
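One way to combine the three ranking signals is a weighted score, sketched below in Python. The patent names the signals but not how they are combined, so the weights, the normalization, the 0 to 5 rating scale and the names `Hit` and `rank` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    similarity: float  # assumed to lie in 0..1
    citations: int     # citation frequency
    rating: float      # public rating, assumed to lie in 0..5


def rank(hits, w_sim=0.6, w_cit=0.25, w_rate=0.15):
    """Sort hits by a weighted sum of the three signals (weights are hypothetical)."""
    max_cit = max((h.citations for h in hits), default=0) or 1  # avoid division by zero

    def score(h):
        return (w_sim * h.similarity
                + w_cit * h.citations / max_cit
                + w_rate * h.rating / 5.0)

    return sorted(hits, key=score, reverse=True)
```

A strict lexicographic ordering (similarity first, then citation frequency, then rating) would be an equally valid reading of the text.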
The search engine includes interfaces to CNKI (China National Knowledge Infrastructure), Wanfang Data and Longyuan Journals respectively, provides a paid-download function for the retrieved documents, and supports paid downloads from CNKI, Wanfang and Longyuan respectively.
The beneficial effects of the invention are as follows. The invention gives researchers and teaching staff a fast and accurate mathematical formula search scheme, gives scientific research theoretical support, and gives teaching and science popularization a broad and accurate way to collect material. The invention also gives the major document databases a dedicated mathematical formula search interface, which expands their range of application and business, and a paid-download interface, which increases their revenue. In summary, the invention offers considerable economic and social benefits.
Brief description of the drawings
Fig. 1 is a flowchart of extracting mathematical formulas from web pages and converting them.
Fig. 2 is a flowchart of building the mathematical symbol index.
Fig. 3 is a flowchart of the mathematical formula query.
Detailed description of the invention
The invention is further described below with reference to Figs. 1-3 and specific embodiments.
Embodiment one:
Build the index library by indexing the common formulas and their corresponding IDs from the original mathematical dictionary. Prepare the original data, preprocess it, and set the parameter K. The user enters a formula in the browser, and the back end uses tokenization to convert it to LaTeX form: ax^{2}+bx+c=0. Run the searcher, which runs multiple crawler programs that search web data in parallel. Pages whose content contains ax^{2}+bx+c=0 are downloaded, compressed, and stored in the original webpage database. Using tokenization, the content is decomposed into ax^{2}, bx and c, which are indexed. The index table built from the web page content is combined with the original mathematical dictionary to generate the candidate formulas to be output. The KNN algorithm is then applied: set the parameter K and maintain a priority queue of size K, ordered by Euclidean distance from largest to smallest, to store the training tuples (that is, the candidate formulas to be output). The LaTeX form of the original formula from the web page serves as the test tuple. K tuples are chosen at random from the training tuples as the initial nearest neighbours; the distance from the test tuple to each of these K tuples is computed, and the training tuple labels and distances are stored in the priority queue. When the traversal is complete, the majority class of the K tuples in the priority queue is computed and taken as the class of the test tuple. After the test tuples have been processed, the error rate is computed; training is repeated with different values of K, and the K with the lowest error rate is kept and converted to a similarity. Finally, the top 30% of the results by similarity are returned to the user.
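The KNN step of this embodiment can be sketched with a bounded priority queue in Python. This is a sketch under assumptions: formulas are assumed to have already been turned into numeric feature vectors (the embodiment does not say how), the queue is seeded with the first K training tuples rather than K random ones so that the result is deterministic, and `knn_classify` is a hypothetical name.

```python
import heapq
import math
from collections import Counter


def knn_classify(test_vec, training, k):
    """Return the majority label among the k training tuples nearest to test_vec.

    training is a list of (vector, label) pairs.
    """
    # Max-heap emulated with negated distances: heap[0] is the farthest kept neighbour.
    heap = []
    for i, (vec, label) in enumerate(training):
        d = math.dist(test_vec, vec)  # Euclidean distance
        if len(heap) < k:
            heapq.heappush(heap, (-d, i, label))
        elif d < -heap[0][0]:
            # Closer than the current farthest neighbour: replace it.
            heapq.heapreplace(heap, (-d, i, label))
    votes = Counter(label for _, _, label in heap)
    return votes.most_common(1)[0][0]
```

Choosing K by repeating the classification on held-out tuples and keeping the value with the lowest error rate, as the embodiment describes, would wrap this function in a loop over candidate K values.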
Embodiment two:
Build the index library by indexing the common formulas and their corresponding IDs from the original mathematical dictionary. Prepare the original data, preprocess it, and set the parameter K. The user enters a formula in the browser, and the back end uses tokenization to convert it to LaTeX form: \sin\left(3x+\frac{\pi}{6}\right). Run the searcher, which runs multiple crawler programs that search web data in parallel. Pages whose content contains \sin\left(3x+\frac{\pi}{6}\right) are downloaded, compressed, and stored in the original webpage database. Using tokenization, the content is decomposed into \sin, \left(, 3x, \frac{\pi}{6} and \right), which are indexed. The index table built from the web page content is combined with the original mathematical dictionary to generate the candidate formulas to be output. The KNN algorithm is then applied: set the parameter K and maintain a priority queue of size K, ordered by Euclidean distance from largest to smallest, to store the training tuples (that is, the candidate formulas to be output). The LaTeX form of the original formula from the web page serves as the test tuple. K tuples are chosen at random from the training tuples as the initial nearest neighbours; the distance from the test tuple to each of these K tuples is computed, and the training tuple labels and distances are stored in the priority queue. When the traversal is complete, the majority class of the K tuples in the priority queue is computed and taken as the class of the test tuple. After the test tuples have been processed, the error rate is computed; training is repeated with different values of K, and the K with the lowest error rate is kept and converted to a similarity. Finally, the top 30% of the results by similarity are returned to the user.
Embodiment three:
Build the index library by indexing the common formulas and their corresponding IDs from the original mathematical dictionary. Prepare the original data, preprocess it, and set the parameter K. The user enters a formula in the browser, and the back end uses tokenization to convert it to LaTeX form: \lim_{n\rightarrow\infty}\left(1+\frac{1}{n}\right)^{n}. Run the searcher, which runs multiple crawler programs that search web data in parallel. Pages whose content contains \lim_{n\rightarrow\infty}\left(1+\frac{1}{n}\right)^{n} are downloaded, compressed, and stored in the original webpage database. Using tokenization, the content is decomposed into \lim_{n\rightarrow\infty} and \left(1+\frac{1}{n}\right)^{n}, which are indexed. The index table built from the web page content is combined with the original mathematical dictionary to generate the candidate formulas to be output. The KNN algorithm is then applied: set the parameter K and maintain a priority queue of size K, ordered by Euclidean distance from largest to smallest, to store the training tuples (that is, the candidate formulas to be output). The LaTeX form of the original formula from the web page serves as the test tuple. K tuples are chosen at random from the training tuples as the initial nearest neighbours; the distance from the test tuple to each of these K tuples is computed, and the training tuple labels and distances are stored in the priority queue. When the traversal is complete, the majority class of the K tuples in the priority queue is computed and taken as the class of the test tuple. After the test tuples have been processed, the error rate is computed; training is repeated with different values of K, and the K with the lowest error rate is kept and converted to a similarity. Finally, the top 30% of the results by similarity are returned to the user.
The invention solves the problem of entering a mathematical formula into the search box. When a user needs to search for a formula, the algorithm must offer several input modes, including structured-document input and image-acquisition input. Structured-document input can offer the currently common LaTeX, Word equation editor and PDF structured-document modes, together with an interface for other structured-document modes. Image-acquisition input supports inserting a picture and taking a screenshot, and can also offer input from a camera, scanner or document camera.
The invention solves the problem of precise online (and offline) search for mathematical formulas. After entering the formula with one of the input modes, the user clicks "Search". The algorithm then searches for derivations of the formula and for documents related to it, including structured documents in common formats such as PPT, Word, LaTeX and PDF, and also retrieves website material related to the formula.
The retrieved results are prioritized by similarity, citation frequency, public rating and similar factors, then sorted by priority and displayed on the search results interface.
The search engine algorithm includes interfaces to the major document databases in order to provide paid downloading of the retrieved documents, and supports paid downloads from each major retrieval service.
The invention uses a linear discriminant analysis algorithm and a principal component analysis algorithm to match formula images by similarity, and outputs data according to the degree of similarity. The invention uses JavaScript to develop an in-browser function that automatically generates LaTeX code from the formula filled into a form. For LaTeX code, the invention uses a cosine algorithm based on the vector space model to match by similarity and output data. The invention independently develops a distributed full-text retrieval system for full-text search of structured documents such as PDF.
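The cosine matching of LaTeX code can be illustrated with a short Python sketch. It assumes that each LaTeX string is turned into a vector of token counts; the patent does not describe the vectorization, so the `TOKEN` regex and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# LaTeX command, single letter, number, or other symbol (braces and spaces ignored).
TOKEN = re.compile(r"\\[A-Za-z]+|[A-Za-z]|\d+|[^\s{}]")


def latex_vector(latex):
    """Bag-of-tokens vector for a LaTeX string."""
    return Counter(TOKEN.findall(latex))


def cosine_similarity(a, b):
    """Cosine of the angle between the token-count vectors of two LaTeX strings."""
    u, v = latex_vector(a), latex_vector(b)
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0
```

Because a bag of tokens discards order, two formulas built from the same symbols in a different arrangement score 1.0; the recorded symbol positions would be needed to tell them apart.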
The searcher runs multiple web crawler processes and is responsible for crawling web documents that contain mathematical content. The indexer is responsible for building the mathematical index. The querier converts a query statement into a query task, hands it to the indexer, completes the query, and returns the result to the user. A computer algebra system assists index building and query processing and can carry out the necessary computation.
First, the web crawlers collect web pages from the network and determine whether each page contains mathematical formula content; if it does, the document is downloaded, compressed, and stored in the original webpage database. Next, mathematical information such as formulas is extracted from the original pages and converted: formulas in their various formats are converted to LaTeX. The indexer then indexes the formulas. To support both semantics-based and presentation-based formula queries, the indexer builds a presentation-oriented Presentation index and a semantics-oriented Content index, one for each query mode.
After receiving a query request, the querier parses the query statement, looks up the matching mathematical material in the index, and produces the query results, which are scored and sorted before being returned to the user.
The first problem the mathematical symbol index must solve is the design of the mathematical symbol dictionary: the various mathematical symbols must be classified and added to the dictionary. Mathematical symbols fall mainly into variables, numbers, operators, mathematical functions, keywords and so on. Each mathematical symbol is assigned its own ID number.
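A minimal Python sketch of such a dictionary is given below. The category names follow the text; the class and method names (`Kind`, `SymbolDictionary`, `add`, `lookup`) and the sequential ID scheme starting at 1 are assumptions for illustration.

```python
from enum import Enum
from itertools import count


class Kind(Enum):
    VARIABLE = "variable"
    NUMBER = "number"
    OPERATOR = "operator"
    FUNCTION = "function"
    KEYWORD = "keyword"


class SymbolDictionary:
    """Assigns one stable ID to each distinct mathematical symbol."""

    def __init__(self):
        self._ids = {}
        self._kinds = {}
        self._next_id = count(1)  # hypothetical scheme: sequential IDs from 1

    def add(self, symbol, kind):
        """Register a symbol and return its ID; re-adding returns the existing ID."""
        if symbol not in self._ids:
            self._ids[symbol] = next(self._next_id)
            self._kinds[symbol] = kind
        return self._ids[symbol]

    def lookup(self, symbol):
        """Return the symbol's ID, or None if it is not in the dictionary."""
        return self._ids.get(symbol)

    def kind_of(self, symbol):
        """Return the symbol's category, or None if it is not in the dictionary."""
        return self._kinds.get(symbol)
```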
Building the index mainly comprises the following steps: a. Convert the mathematical formulas to text, turning formulas in different formats into text-string form. b. Tokenize each formula, decomposing it into a sequence of individual mathematical symbols. c. Convert each mathematical symbol to its corresponding ID according to the mathematical symbol dictionary. d. Index each mathematical symbol. Tokenization is carried out by a mathematical formula parser, which performs lexical and syntactic analysis on the formula, decomposes it into individual mathematical symbols, and records the position of each symbol within the formula.
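The lexical part of such a parser can be sketched in Python with a single regular expression whose named groups double as symbol categories. The category assignment (for example, which LaTeX commands count as functions) and the name `lex` are assumptions; a full parser would also perform the syntactic analysis mentioned above, which this sketch omits.

```python
import re

# Alternatives are tried in order; the name of the matching group is the category.
LEXER = re.compile(r"""
    (?P<function>\\(?:sin|cos|tan|log|ln|exp)(?![A-Za-z]))  # assumed function list
  | (?P<keyword>\\[A-Za-z]+)                                # any other LaTeX command
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<variable>[A-Za-z])
  | (?P<operator>[^\s{}])                                   # braces and spaces are skipped
""", re.VERBOSE)


def lex(latex):
    """Return (category, symbol, position) triples for a LaTeX formula."""
    return [(m.lastgroup, m.group(), m.start()) for m in LEXER.finditer(latex)]
```

For `\sin\left(3x+\frac{\pi}{6}\right)` this yields `\sin` as a function at position 0, `\left` as a keyword at position 4, and `3` and `x` as a number and a variable at positions 10 and 11.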
The mathematical formula query process mainly comprises the following steps: a. Accept the mathematical formula to be queried. b. Convert to text: if the query formula is not in text format, convert it to text format. c. Tokenize the formula, decomposing it into mathematical symbols and recording the combination relationships between the symbols. d. Convert each mathematical symbol to its ID number according to the mathematical symbol dictionary. e. Look up each mathematical symbol separately in the index table; these lookups can run in parallel. f. Combine the lookup results to obtain the final result.
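Steps e and f can be sketched in Python with a thread pool for the parallel lookups. The text does not define the combining operation, so the sketch assumes the simplest one, intersecting the sets of documents that contain each symbol; the posting layout `(doc_id, position)` and the names `lookup` and `query` are likewise hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def lookup(index, symbol_id):
    """Step e: documents whose formulas contain the given symbol ID."""
    return {doc_id for doc_id, _pos in index.get(symbol_id, [])}


def query(index, symbol_ids):
    """Steps e and f: look up each symbol in parallel, then intersect the results."""
    unique = list(dict.fromkeys(symbol_ids))  # drop repeated IDs, keep order
    if not unique:
        return set()
    with ThreadPoolExecutor() as pool:
        postings = list(pool.map(lambda sid: lookup(index, sid), unique))
    return set.intersection(*postings)
```

With an in-memory dictionary the threads bring no speed-up; the pool only shows where per-symbol lookups against a real index store could overlap.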
The indexing and query flow of the mathematical symbol index is essentially the same as that of text search, with additional special handling for mathematical formulas, in particular formula tokenization, which is implemented by extending Lucene.
The embodiments described above are only preferred embodiments of the invention and are not an exhaustive list of its possible embodiments. Any obvious change made by a person of ordinary skill in the art without departing from the principle and spirit of the invention shall be regarded as falling within the scope of the claims of the invention.
Claims (8)
1. A universal formula search method, characterized in that it comprises the following steps:
(1) Establish a universal formula search engine. The universal formula search engine comprises:
a searcher, which roams the Internet to discover and collect mathematical formulas;
an indexer, which builds the mathematical formula index;
a querier, which converts a query statement into a query task, hands it to the indexer, completes the query, and returns the result to the user;
(2) Establish a mathematical symbol dictionary database and assign an ID number to each kind of mathematical symbol as its mathematical symbol dictionary ID;
(3) The searcher runs multiple web crawler processes. Each web crawler collects web pages from the network and determines whether a document in the page contains a mathematical formula; if it does, the document is downloaded, compressed, and stored in the original webpage database;
(4) The indexer extracts the mathematical formulas from the documents in the original webpage database and, using the mathematical symbol dictionary database, builds a mathematical formula index for them;
(5) After receiving a query request, the querier performs a mathematical formula query using the mathematical formula index and the mathematical symbol dictionary database and obtains the matched formulas;
(6) The querier returns to the user the documents in the original webpage database that contain the matched formulas and displays them on the search results interface.
2. The universal formula search method according to claim 1, characterized in that the mathematical formula index is built as follows:
A. converting the mathematical formula in the document into a mathematical formula in text-string form;
B. segmenting the text-string mathematical formula into mathematical symbols, while recording the position of each mathematical symbol within the text-string mathematical formula;
C. using the mathematical symbol dictionary database, converting each mathematical symbol into a document mathematical symbol ID corresponding to its mathematical symbol dictionary ID;
D. building a document mathematical symbol index for the mathematical symbols according to the document mathematical symbol IDs;
F. building a document mathematical symbol index table from the document mathematical symbol index thus built.
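The index-building procedure of claim 2 resembles a positional inverted index, and a minimal sketch under that reading is given below. The tokenizer pattern, the choice of ordinal token positions, the assignment of fresh IDs to symbols missing from the dictionary, and the names `tokenize`, `symbol_id` and `build_index` are assumptions; the claims do not fix any of them.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: a LaTeX command, a number, or any single non-space character.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|\d+(?:\.\d+)?|\S")


def tokenize(formula: str):
    """Split a text-string formula into (symbol, position) pairs.

    Position is the ordinal of the symbol within the formula.
    """
    return [(m.group(0), i) for i, m in enumerate(TOKEN_RE.finditer(formula))]


def symbol_id(symbol: str, dictionary: dict) -> int:
    """Look up a symbol's dictionary ID.

    Assumption: unknown symbols get a fresh ID here; the claimed method
    builds the dictionary beforehand.
    """
    if symbol not in dictionary:
        dictionary[symbol] = len(dictionary) + 1
    return dictionary[symbol]


def build_index(formulas: dict, dictionary: dict) -> dict:
    """Map each symbol ID to the (formula_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for formula_id, text in formulas.items():
        for symbol, position in tokenize(text):
            index[symbol_id(symbol, dictionary)].append((formula_id, position))
    return dict(index)
```

Keeping the position alongside each posting is what later allows a query to distinguish formulas that share symbols but arrange them differently.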
3. The universal formula search method according to claim 2, characterized in that the mathematical formula query proceeds as follows:
A. the querier receives the mathematical formula to be queried;
B. judging whether the mathematical formula to be queried is in text format; if it is not, converting it into a text-format mathematical formula to be queried;
C. segmenting the text-format mathematical formula to be queried into mathematical symbols to be queried, while recording the position of each such symbol within the formula to be queried;
D. converting the mathematical symbols to be queried, according to the mathematical symbol dictionary database, into to-be-queried mathematical symbol IDs corresponding to the mathematical symbol dictionary IDs;
E. looking up the document mathematical symbol IDs in the index table, and obtaining the query results that match the to-be-queried mathematical symbol IDs;
F. applying the KNN algorithm to the query results to obtain the matched mathematical formulas;
G. returning to the user the documents in the raw page database that contain the matched mathematical formulas.
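The lookup and KNN steps of claim 3 might look roughly like the sketch below. The claim names the KNN algorithm but gives neither the feature representation nor the distance measure, so the bag-of-symbol-ID vectors, the cosine similarity, the default `k`, and the names `cosine` and `knn_query` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of symbol IDs."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_query(query_ids, indexed, k=3):
    """Return the k indexed formulas nearest to the query.

    query_ids: symbol IDs of the formula to be queried.
    indexed:   {formula_id: list of symbol IDs} for the indexed formulas.
    """
    query_vec = Counter(query_ids)
    # Lookup step: keep only formulas sharing at least one symbol ID with the query.
    candidates = {
        fid: Counter(ids)
        for fid, ids in indexed.items()
        if query_vec.keys() & set(ids)
    }
    # KNN step: rank candidates by similarity (ties broken by ID) and keep the top k.
    scored = sorted(
        ((cosine(query_vec, vec), fid) for fid, vec in candidates.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [(fid, score) for score, fid in scored[:k]]
```

This sketch ignores the recorded symbol positions; a fuller version could include them in the feature vector so that formulas with the same symbols in a different order score lower.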
4. The universal formula search method according to claim 1, characterized in that the mathematical formula index includes a presentation-oriented Presentation index and a semantics-oriented Content index.
5. The universal formula search method according to claim 1, characterized in that the modes by which the querier receives the mathematical formula to be queried include a structured document input mode and an image acquisition input mode; the structured document input mode is a LaTeX structured document input mode, a Word equation editor structured document input mode, and a PDF structured document input mode; the image acquisition input mode includes inserting a picture, taking a screenshot, camera capture, scanner capture, and document camera capture.
6. The universal formula search method according to claim 1, characterized in that the documents returned to the user include derivations of the formula and documents related to the formula; the structured document format of the documents returned to the user is PPT, Word, LaTeX or PDF.
7. The universal formula search method according to claim 1, characterized in that the documents returned to the user are assigned priorities according to similarity, citation frequency and public rating, then sorted by priority and displayed on the search interface.
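One possible reading of the prioritisation in claim 7 is a lexicographic sort: similarity first, then citation frequency, then public rating. The claim does not say how the three criteria are combined, so this ordering, the `Hit` record and the `rank` function are assumptions; a weighted score would be an equally plausible reading.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A returned document with the three ranking criteria named in the claim."""
    doc_id: str
    similarity: float   # similarity between the matched formula and the query
    citations: int      # citation frequency of the document
    rating: float       # public rating of the document


def rank(hits):
    """Sort hits by similarity, then citations, then rating, all descending."""
    return sorted(hits, key=lambda h: (-h.similarity, -h.citations, -h.rating))
```

For example, two documents with equal similarity would be ordered by citation count, and the rating would only matter when both of the earlier criteria tie.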
8. The universal formula search method according to claim 1, characterized in that it includes interfaces to CNKI (China National Knowledge Infrastructure), Wanfang Data and Longyuan Journals respectively, and provides a paid download function for the retrieved documents, supporting paid downloads from CNKI, Wanfang Data and Longyuan Journals respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610171766.5A CN105868177A (en) | 2016-03-24 | 2016-03-24 | Universal formula search method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610171766.5A CN105868177A (en) | 2016-03-24 | 2016-03-24 | Universal formula search method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105868177A (en) | 2016-08-17 |
Family
ID=56625332
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610171766.5A Pending CN105868177A (en) | 2016-03-24 | 2016-03-24 | Universal formula search method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105868177A (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107153640A (en) * | 2017-05-08 | 2017-09-12 | 成都准星云学科技有限公司 | A kind of segmenting method towards elementary mathematics field |
CN107463553A (en) * | 2017-09-12 | 2017-12-12 | 复旦大学 | For the text semantic extraction, expression and modeling method and system of elementary mathematics topic |
CN107885870A (en) * | 2017-11-24 | 2018-04-06 | 北京神州泰岳软件股份有限公司 | A kind of service profile formulas Extraction method and device |
CN108133168A (en) * | 2016-12-01 | 2018-06-08 | 北京新唐思创教育科技有限公司 | Formula searching method and device in text recognition |
CN108304383A (en) * | 2018-01-29 | 2018-07-20 | 北京神州泰岳软件股份有限公司 | The formula info extracting method and device of service profile |
CN108319724A (en) * | 2018-02-28 | 2018-07-24 | 北京仁和汇智信息技术有限公司 | A kind of Homepage Publishing method and device with formula file |
CN108399156A (en) * | 2018-02-28 | 2018-08-14 | 北京仁和汇智信息技术有限公司 | The composition method and device of formula in a kind of pdf document |
CN110888993A (en) * | 2018-08-20 | 2020-03-17 | 珠海金山办公软件有限公司 | Composite document retrieval method and device and electronic equipment |
CN111078724A (en) * | 2019-12-11 | 2020-04-28 | 中国建设银行股份有限公司 | Method, device and equipment for searching test questions in learning system and storage medium |
CN111597393A (en) * | 2020-04-14 | 2020-08-28 | 北京金山云网络技术有限公司 | Theorem search method, device, equipment and storage medium |
CN112613279A (en) * | 2020-12-24 | 2021-04-06 | 北京乐学帮网络技术有限公司 | File conversion method and device, computer device and readable storage medium |
CN116108326A (en) * | 2023-04-12 | 2023-05-12 | 山东工程职业技术大学 | Mathematic tool software control method, device, equipment and storage medium |
CN116483943A (en) * | 2023-06-21 | 2023-07-25 | 山东网安安全技术有限公司 | Full text retrieval method and full text retrieval system |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101110077A (en) * | 2007-08-24 | 2008-01-23 | 新诺亚舟科技(深圳)有限公司 | Method for implementing associated searching on handhold learning terminal |
Non-Patent Citations (4)
Title |
---|
MICHAEL KOHLHASE et al.: "A search engine for mathematical formulae", 《ARTIFICIAL INTELLIGENCE AND SYMBOLIC COMPUTATION-8TH INTERNATIONAL CONFERENCE》 * |
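LIU ZHIWEI: "Research on a mathematical index engine", 《China Master's Theses Full-text Database, Information Science and Technology》 * |
CUI LINWEI et al.: "Nutch-based extraction of Web mathematical formulas", 《Journal of Guangxi Normal University (Natural Science Edition)》 * |
YAN HUILI: "LaTeX mathematical formulas based on the Lucene framework", 《China Master's Theses Full-text Database, Information Science and Technology》 * |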
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108133168A (en) * | 2016-12-01 | 2018-06-08 | 北京新唐思创教育科技有限公司 | Formula searching method and device in text recognition |
CN108133168B (en) * | 2016-12-01 | 2021-04-30 | 北京新唐思创教育科技有限公司 | Formula searching method and device in text recognition |
CN107153640A (en) * | 2017-05-08 | 2017-09-12 | 成都准星云学科技有限公司 | A kind of segmenting method towards elementary mathematics field |
CN107463553B (en) * | 2017-09-12 | 2021-03-30 | 复旦大学 | Text semantic extraction, representation and modeling method and system for elementary mathematic problems |
CN107463553A (en) * | 2017-09-12 | 2017-12-12 | 复旦大学 | For the text semantic extraction, expression and modeling method and system of elementary mathematics topic |
CN107885870A (en) * | 2017-11-24 | 2018-04-06 | 北京神州泰岳软件股份有限公司 | A kind of service profile formulas Extraction method and device |
CN108304383A (en) * | 2018-01-29 | 2018-07-20 | 北京神州泰岳软件股份有限公司 | The formula info extracting method and device of service profile |
CN108304383B (en) * | 2018-01-29 | 2019-06-25 | 北京神州泰岳软件股份有限公司 | The formula info extracting method and device of service profile |
CN108319724A (en) * | 2018-02-28 | 2018-07-24 | 北京仁和汇智信息技术有限公司 | A kind of Homepage Publishing method and device with formula file |
CN108399156A (en) * | 2018-02-28 | 2018-08-14 | 北京仁和汇智信息技术有限公司 | The composition method and device of formula in a kind of pdf document |
CN110888993A (en) * | 2018-08-20 | 2020-03-17 | 珠海金山办公软件有限公司 | Composite document retrieval method and device and electronic equipment |
CN111078724A (en) * | 2019-12-11 | 2020-04-28 | 中国建设银行股份有限公司 | Method, device and equipment for searching test questions in learning system and storage medium |
CN111597393A (en) * | 2020-04-14 | 2020-08-28 | 北京金山云网络技术有限公司 | Theorem search method, device, equipment and storage medium |
CN112613279A (en) * | 2020-12-24 | 2021-04-06 | 北京乐学帮网络技术有限公司 | File conversion method and device, computer device and readable storage medium |
CN116108326A (en) * | 2023-04-12 | 2023-05-12 | 山东工程职业技术大学 | Mathematic tool software control method, device, equipment and storage medium |
CN116483943A (en) * | 2023-06-21 | 2023-07-25 | 山东网安安全技术有限公司 | Full text retrieval method and full text retrieval system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105868177A (en) | Universal formula search method | |
CN110399457B (en) | Intelligent question answering method and system | |
Tuarob et al. | Automatic tag recommendation for metadata annotation using probabilistic topic modeling | |
CN108280114B (en) | Deep learning-based user literature reading interest analysis method | |
Hienert et al. | Digital library research in action–supporting information retrieval in sowiport | |
US9323834B2 (en) | Semantic and contextual searching of knowledge repositories | |
CN106777043A (en) | A kind of academic resources acquisition methods based on LDA | |
CN113190687B (en) | Knowledge graph determining method and device, computer equipment and storage medium | |
CN115757689A (en) | Information query system, method and equipment | |
CN111061828B (en) | Digital library knowledge retrieval method and device | |
CN111190920B (en) | Data interaction query method and system based on natural language | |
US10810181B2 (en) | Refining structured data indexes | |
Saini et al. | Review on web content mining techniques | |
CN112015907A (en) | Method and device for quickly constructing discipline knowledge graph and storage medium | |
Spitz et al. | EVELIN: Exploration of event and entity links in implicit networks | |
Wu et al. | Searching online book documents and analyzing book citations | |
RU2473119C1 (en) | Method and system for semantic search of electronic documents | |
CN114117242A (en) | Data query method and device, computer equipment and storage medium | |
Aparna et al. | ANNOTATING SEARCH RESULTS FROM WEB DATABASE USING IN-TEXT PREFIX/SUFFIX ANNOTATOR | |
KR101476225B1 (en) | Method for Indexing Natural Language And Mathematical Formula, Apparatus And Computer-Readable Recording Medium with Program Therefor | |
Priyadarshini et al. | Semantic retrieval of relevant sources for large scale virtual documents | |
Nghiem et al. | Which one is better: presentation-based or content-based math search? | |
Chala et al. | A Framework for Enriching Job Vacancies and Job Descriptions Through Bidirectional Matching. | |
CN116523041A (en) | Knowledge graph construction method, retrieval method and system for equipment field and electronic equipment | |
Blaz̆ek et al. | Video hunter at VBS 2017 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20160817 |