The following are entry points into the Tibetan eText Repository.
- Browse eTexts
- Global Search eTexts
- Advanced Search eTexts
- Intra-Work search
For an overview, you can browse the latest list of eTexts. There are three main browsing entry points.
- Faceted Browse – this list shows you all the eTexts in the repository, organized by facets.
- Namsel OCR eTexts – this list shows you all the OCR eTexts in the repository.
- Input eTexts – this list shows you all the Input eTexts in the repository.
The default global search includes the eText Repository. You can enter the global search via the search box in the banner or on the home page. The banner search box is a global search on all collections. On the home page you can select the collections for more targeted querying.
Advanced Search (eTexts)
Advanced Search (eTexts) is an entry point built specifically to search the Tibetan eText Repository.
In this entry point, you can type a search term into the text box and initiate a search. There are several notable features.
Under Collections you can see there are four columns. Each column represents a virtual collection to search on.
- Organizations – This is a list of all the organizations we currently have eTexts for. Under each organization is a list of works by that organization.
- Works – This is a list of all the works in the repository sorted alphabetically by title.
- Topics – This is a list of all topics which are used to classify eTexts.
- Authors – This is a list of all authors who had some role in creating the eTexts.
Selecting an organization allows you to search that organization's entire collection of OCR texts.
To limit your search to a single work, select that work under the organization.
You can also search across virtual collections, such as genres of literature, by selecting from the Topics list.
When you initiate a search, you get back a list of results. Each result item contains a highlighted text match containing the string you searched for.
Beneath the match highlight is a list of items that reference the bibliographic context of the match.
The following types of information are available:
- Link to exact image where the match occurs
- Link to the work record
- An expand context button that allows you to see the preceding and following pages where the match occurs.
- The Work:Volume:Page:Pubinfo line.
In the case of Input eTexts, you will see one of two phrases:
- "Scans exist for this input eText but the scan-to-page correspondence is unknown." – this means that there are scans for the input eText but the exact page correspondence to the scan is not known.
- "No scans associated with this input eText" – this means that the eText does not have corresponding scans.
You can export the results of any query in the Advanced Search (eTexts) interface using the Export Query button. The export is an XML file of the results that you can save offline and use to keep track of your searches. The XML file lists the work, volume, page and text of each result, and whether or not there is an exact page match (see Text-Scan Page Correspondence below).
You can always generate the link again to get the latest results:
You can always search within a single work by going to the work record and looking for the Search in eTexts tab.
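As a rough illustration of how an exported query file could be processed offline, here is a minimal Python sketch. The layout of the export is not documented on this page, so the element names (`results`, `result`, `work`, `volume`, `page`, `text`), the `exactPageMatch` attribute and the work identifiers below are all hypothetical placeholders; adjust them to match the file you actually download.

```python
import xml.etree.ElementTree as ET

# Hypothetical export layout: the real element and attribute names may differ.
SAMPLE_EXPORT = """\
<results query="example">
  <result exactPageMatch="true">
    <work>W0001</work>
    <volume>1</volume>
    <page>12</page>
    <text>first matching passage</text>
  </result>
  <result exactPageMatch="false">
    <work>W0002</work>
    <volume>3</volume>
    <page></page>
    <text>second matching passage</text>
  </result>
</results>
"""


def read_export(xml_text):
    """Return one dict per result found in the (assumed) export layout."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.iter("result"):
        rows.append({
            "work": result.findtext("work", default=""),
            "volume": result.findtext("volume", default=""),
            "page": result.findtext("page", default=""),
            "text": result.findtext("text", default=""),
            # True only when the result links to an exact scanned page.
            "exact_page_match": result.get("exactPageMatch") == "true",
        })
    return rows


rows = read_export(SAMPLE_EXPORT)
for row in rows:
    # Print "-" where no page is recorded for the match.
    print(row["work"], row["volume"], row["page"] or "-", row["exact_page_match"])
```

Keeping each result as a plain dictionary makes it easy to compare a saved export with a later one when you rerun the same query.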
Types of eTexts
There are two types of eTexts.
- OCR eTexts
- Input eTexts
OCR eTexts are generated by the Namsel OCR program, in collaboration with the University of California, Berkeley. The list is available here: http://tbrc.org/#!etexts/ocr
Input eTexts are received from Tibetan authors, publishers and monasteries in a variety of formats, including TibetDoc and Sambhota. These files are converted into Unicode Tibetan TEI-XML for use in the Tibetan eText Repository.
Text-Scan Page Correspondence
The text-scan page correspondence is a key facet of the eText Repository and refers to whether or not the eText has a link to the exact scanned page. More specifically, page correspondence denotes whether there is semantic information (expressed as XML markup) in the eText that designates the page of the corresponding scanned source. OCR eTexts always have an exact text-scan page correspondence. Input eTexts, however, have none, because the page information is lost when those documents are converted into Tibetan Unicode TEI-XML.
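To make the idea of page markup concrete, the following Python sketch checks whether a TEI-XML document carries any page markers. It assumes that pages are marked with the standard TEI page-beginning element `<pb>` and an `n` attribute holding the page number; this page does not say which markup the repository actually uses, so treat that as an assumption. The two sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented samples. Assumption: pages are marked with TEI <pb n="..."/>.
OCR_SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>'
    '<pb n="1"/>page one text<pb n="2"/>page two text'
    '</p></body></text></TEI>'
)
INPUT_SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>'
    'continuous text with no page markers'
    '</p></body></text></TEI>'
)


def page_markers(tei_text):
    """Return the n values of every <pb> element, in document order."""
    root = ET.fromstring(tei_text)
    markers = []
    for elem in root.iter():
        # Drop the "{namespace}" prefix so the check works with or without one.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "pb":
            markers.append(elem.get("n", ""))
    return markers


def has_page_correspondence(tei_text):
    """True when the document contains at least one page marker."""
    return len(page_markers(tei_text)) > 0


print(has_page_correspondence(OCR_SAMPLE))
print(has_page_correspondence(INPUT_SAMPLE))
```

Under these assumptions, an OCR eText would report page markers, while an input eText converted without page information would report none.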