Latest Automatic document indexing in open office Project Report, Requirement Analysis 2011
Existing system: The current indexing feature in OpenOffice requires the writer to mark each index word manually. The resulting index lets a reader find the particular data they want, but manual indexing becomes tedious when the document grows very large.
Proposed system: Automatic document indexing in OpenOffice reduces the manual effort the user spends on indexing a document. The first step is to set aside the words that should never be indexed, such as articles and prepositions. Each word in the document is then compared with this list of excluded words. If the word matches one of the excluded words, it is skipped and the next word is taken. If it matches none of them, it is treated as an index word.
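The exclusion step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name `candidate_index_words` and the short `EXCLUDED_WORDS` list are hypothetical, and a real excluded-word list would be much longer and chosen by the project.

```python
import re

# Hypothetical, deliberately small exclusion list (articles, prepositions, etc.).
EXCLUDED_WORDS = {"a", "an", "the", "in", "on", "of", "to", "and", "is"}


def candidate_index_words(text):
    """Return the words of `text` that are not excluded, in document order."""
    # Lower-case the text and keep alphabetic runs only, so "The" matches "the".
    words = re.findall(r"[a-z]+", text.lower())
    # A word that matches an excluded word is skipped; any other word is kept.
    return [word for word in words if word not in EXCLUDED_WORDS]


print(candidate_index_words("The index of a document"))
```

Storing the excluded words in a set keeps each comparison cheap, which matters because every word of the document is checked against the list.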
We are going to use two methods:
- Indexing the words in sequence.
- Indexing the words for the whole document once.
In the first method, a word is taken and compared with the excluded words. If it matches, it is skipped and the next word is compared with the excluded words. If it does not match, it is indexed and its page number is noted. The next word is then taken, and the same process continues through the whole document.
In the second method, a word is likewise taken and compared with the excluded words. If it matches, the next word is taken. If it does not match, it is indexed and its page number is noted. The same word is then compared with the other words in the document, and wherever it matches, the page number of that occurrence is noted as well. The same procedure is followed for the whole document.
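A minimal Python sketch of the two methods follows. It assumes the document is already available as a list of page texts, which the report does not specify, and the names `index_in_sequence`, `index_whole_document` and `EXCLUDED_WORDS` are hypothetical. For the second method the sketch collects all page numbers of a word in one pass with a dictionary, instead of rescanning the document for every word; the resulting index is the same.

```python
# Hypothetical, deliberately small exclusion list.
EXCLUDED_WORDS = {"a", "an", "the", "in", "on", "of", "to", "and", "is"}


def index_in_sequence(pages):
    """Method 1: emit one (word, page number) entry per non-excluded word."""
    entries = []
    for page_no, text in enumerate(pages, start=1):
        for word in text.lower().split():
            if word in EXCLUDED_WORDS:
                continue  # excluded word: move on to the next word
            entries.append((word, page_no))
    return entries


def index_whole_document(pages):
    """Method 2: map each non-excluded word to every page it occurs on."""
    index = {}
    for page_no, text in enumerate(pages, start=1):
        for word in text.lower().split():
            if word in EXCLUDED_WORDS:
                continue
            page_list = index.setdefault(word, [])
            if page_no not in page_list:  # note each page only once per word
                page_list.append(page_no)
    return index


pages = ["the writer indexes words", "words in the document"]
print(index_in_sequence(pages))
print(index_whole_document(pages))
```

The first function yields a separate entry for each occurrence, while the second groups all page numbers under a single entry per word, which is closer to the index printed at the back of a book.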
Environment to be used:
The target environment is OpenOffice Writer, whose native document format is ODT. We are going to develop the project for the ODF format; the documents can later be converted to DOCX as well.
Users:
Technical users (programmers).
Non-technical users (the author of a book or other document).