Catalogs

Catalogs are simple tools used to support searching. A catalog manages a collection of indexes and arranges for objects to be indexed with its contained indexes.
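
As a rough illustration of that delegation, here is a minimal sketch in plain Python. It does not use the zope.app.catalog classes: SketchCatalog and WordIndex are hypothetical names, and the index_doc and apply methods are assumptions made for this example.

```python
class WordIndex(object):
    """Hypothetical index: maps each lowercase word to a set of integer ids."""

    def __init__(self):
        self.postings = {}

    def index_doc(self, docid, text):
        for word in text.lower().split():
            self.postings.setdefault(word, set()).add(docid)

    def apply(self, word):
        # Return a copy so callers cannot modify the stored postings.
        return set(self.postings.get(word.lower(), ()))


class SketchCatalog(dict):
    """Hypothetical catalog: a mapping of index names to index objects."""

    def index_doc(self, docid, text):
        # The catalog stores nothing itself; it hands each object
        # to every index it contains.
        for index in self.values():
            index.index_doc(docid, text)


catalog = SketchCatalog()
catalog['words'] = WordIndex()
catalog.index_doc(1, 'Now is the time')
catalog.index_doc(2, 'Zope 3 now')
found = sorted(catalog['words'].apply('now'))
```

In this sketch the catalog holds no search data of its own; each contained index decides what to store.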

TODO: Filters
Catalogs should provide the option to filter the objects they catalog. This would facilitate the use of separate catalogs for separate purposes. It should be possible to specify a collection of types (interfaces) to be cataloged and a filtering expression. Perhaps another option would be the ability to specify a named filter adapter.
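
One way such a filter might look is sketched below. This is only an assumption about a feature that does not exist yet: FilteredCatalog, its types and expression arguments, and the Page and Image classes are all hypothetical, and plain Python classes with isinstance stand in for interfaces.

```python
class FilteredCatalog(object):
    """Hypothetical catalog that only indexes objects passing a filter."""

    def __init__(self, types=(), expression=None):
        self.types = tuple(types)      # classes standing in for interfaces
        self.expression = expression   # optional callable returning a bool
        self.indexed = {}

    def accepts(self, obj):
        if self.types and not isinstance(obj, self.types):
            return False
        if self.expression is not None and not self.expression(obj):
            return False
        return True

    def index_doc(self, docid, obj):
        # Objects rejected by the filter are silently ignored.
        if self.accepts(obj):
            self.indexed[docid] = obj
            return True
        return False


class Page(object):
    def __init__(self, published):
        self.published = published


class Image(object):
    pass


pages_only = FilteredCatalog(types=[Page],
                             expression=lambda obj: obj.published)
results = [pages_only.index_doc(1, Page(True)),
           pages_only.index_doc(2, Page(False)),
           pages_only.index_doc(3, Image())]
```

Here only the published page is cataloged; the unpublished page fails the expression and the image fails the type check.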

Catalogs use a unique-id tool to assign short (integer) ids to objects. Before creating a catalog, you must create an intid tool:

>>> print http(r"""
... POST /++etc++site/default/@@+/action.html HTTP/1.1
... Authorization: Basic bWdyOm1ncnB3
... Content-Length: 78
... Content-Type: application/x-www-form-urlencoded
... Referer: http://localhost:8081/++etc++site/default/@@+
...
... type_name=BrowserAdd__zope.app.intid.IntIds&id=&add=+Add+""",
... handle_errors=False)
HTTP/1.1 303 ...

And register it:

>>> print http(r"""
... POST /++etc++site/default/IntIds/addRegistration.html HTTP/1.1
... Authorization: Basic bWdyOm1ncnB3
... Referer: http://localhost:8081/++etc++site/default/IntIds/
... Content-Type: multipart/form-data; boundary=----------CedQTrEQIEPbgfYhvcITAhQi2aJdgu3tYfJ0WYQmkpLQTt6OTOpd5GJ
...
... ------------CedQTrEQIEPbgfYhvcITAhQi2aJdgu3tYfJ0WYQmkpLQTt6OTOpd5GJ
... Content-Disposition: form-data; name="field.name"
...
...
... ------------CedQTrEQIEPbgfYhvcITAhQi2aJdgu3tYfJ0WYQmkpLQTt6OTOpd5GJ
... Content-Disposition: form-data; name="field.provided"
...
... zope.app.intid.interfaces.IIntIds
... ------------CedQTrEQIEPbgfYhvcITAhQi2aJdgu3tYfJ0WYQmkpLQTt6OTOpd5GJ
... Content-Disposition: form-data; name="field.provided-empty-marker"
...
... 1
... ------------CedQTrEQIEPbgfYhvcITAhQi2aJdgu3tYfJ0WYQmkpLQTt6OTOpd5GJ
... Content-Disposition: form-data; name="field.comment"
...
...
... ------------CedQTrEQIEPbgfYhvcITAhQi2aJdgu3tYfJ0WYQmkpLQTt6OTOpd5GJ
... Content-Disposition: form-data; name="field.actions.register"
...
... Register
... ------------CedQTrEQIEPbgfYhvcITAhQi2aJdgu3tYfJ0WYQmkpLQTt6OTOpd5GJ--
... """, handle_errors=False)
HTTP/1.1 303 ...
...

Moving short-id management outside of catalogs makes it possible to join searches across multiple catalogs and indexing tools (e.g. relationship indexes).
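
The sketch below shows why a shared id space helps. It is an illustration only: SketchIntIds, its register and get_object methods, and the example keys are hypothetical and are not the zope.app.intid API.

```python
class SketchIntIds(object):
    """Hypothetical unique-id tool: assigns small integers to hashable keys."""

    def __init__(self):
        self._ids = {}
        self._keys = {}

    def register(self, key):
        # Registering the same key twice returns the same id.
        if key not in self._ids:
            uid = len(self._ids) + 1
            self._ids[key] = uid
            self._keys[uid] = key
        return self._ids[key]

    def get_object(self, uid):
        return self._keys[uid]


intids = SketchIntIds()

# Results from two independent tools that both use the same id tool.
text_hits = {intids.register('page-a'), intids.register('page-b')}
relation_hits = {intids.register('page-b'), intids.register('page-c')}

# Because the ids mean the same thing everywhere, a join is a set operation.
joined = sorted(intids.get_object(uid) for uid in text_hits & relation_hits)
```

If each catalog assigned its own ids, the two result sets could not be intersected without first mapping them back to objects.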

TODO: Filters?
Maybe unique-id tools should be filtered as well. However, this would limit the value of unique-id tools for providing cross-catalog/cross-index merging. At least, the domain of a unique-id tool would be broader than the domain of a catalog. The value of filtering in the unique-id tool is that it limits the amount of work that needs to be done by catalogs. One obvious application is to provide separate domains for ordinary and meta content. If we did this, then we'd need to be able to select, and perhaps alter, the unique-id tool used by a catalog.

Once we have a unique-id tool, we can add a catalog:

>>> print http(r"""
... POST /++etc++site/default/@@+/action.html HTTP/1.1
... Authorization: Basic bWdyOm1ncnB3
... Content-Length: 77
... Content-Type: application/x-www-form-urlencoded
... Referer: http://localhost:8081/++etc++site/default/@@+
...
... type_name=BrowserAdd__zope.app.catalog.catalog.Catalog&id=&add=+Add+""")
HTTP/1.1 303 ...

And register it:

>>> print http(r"""
... POST /++etc++site/default/Catalog/addRegistration.html HTTP/1.1
... Authorization: Basic bWdyOm1ncnB3
... Referer: http://localhost:8081/++etc++site/default/Catalog/
... Content-Type: multipart/form-data; boundary=----------61t9UJyoacebBevQVdNrlvXP6T9Ik3Xo4RyXkwJJWvuhao65RTuAPRa
...
... ------------61t9UJyoacebBevQVdNrlvXP6T9Ik3Xo4RyXkwJJWvuhao65RTuAPRa
... Content-Disposition: form-data; name="field.name"
...
...
... ------------61t9UJyoacebBevQVdNrlvXP6T9Ik3Xo4RyXkwJJWvuhao65RTuAPRa
... Content-Disposition: form-data; name="field.provided"
...
... zope.app.catalog.interfaces.ICatalog
... ------------61t9UJyoacebBevQVdNrlvXP6T9Ik3Xo4RyXkwJJWvuhao65RTuAPRa
... Content-Disposition: form-data; name="field.provided-empty-marker"
...
... 1
... ------------61t9UJyoacebBevQVdNrlvXP6T9Ik3Xo4RyXkwJJWvuhao65RTuAPRa
... Content-Disposition: form-data; name="field.comment"
...
...
... ------------61t9UJyoacebBevQVdNrlvXP6T9Ik3Xo4RyXkwJJWvuhao65RTuAPRa
... Content-Disposition: form-data; name="field.actions.register"
...
... Register
... ------------61t9UJyoacebBevQVdNrlvXP6T9Ik3Xo4RyXkwJJWvuhao65RTuAPRa--
... """)
HTTP/1.1 303 ...

Once we have a catalog, we can add indexes to it. Before we add an index, let's add a templated page. When adding indexes, existing objects are indexed, so the document we add now will appear in the index:

>>> print http(r"""
... POST /+/zope.app.zptpage.ZPTPage%3D HTTP/1.1
... Authorization: Basic bWdyOm1ncnB3
... Content-Length: 780
... Content-Type: multipart/form-data; boundary=---------------------------1425445234777458421417366789
... Referer: http://localhost:8081/+/zope.app.zptpage.ZPTPage=
...
... -----------------------------1425445234777458421417366789
... Content-Disposition: form-data; name="field.source"
...
... <html>
... <body>
... Now is the time, for all good dudes to come to the aid of their country.
... </body>
... </html>
... -----------------------------1425445234777458421417366789
... Content-Disposition: form-data; name="field.expand.used"
...
...
... -----------------------------1425445234777458421417366789
... Content-Disposition: form-data; name="field.evaluateInlineCode.used"
...
...
... -----------------------------1425445234777458421417366789
... Content-Disposition: form-data; name="UPDATE_SUBMIT"
...
... Add
... -----------------------------1425445234777458421417366789
... Content-Disposition: form-data; name="add_input_name"
...
... dudes
... -----------------------------1425445234777458421417366789--
... """)
HTTP/1.1 303 ...

Perhaps the most common type of index to be added is a text index. Most indexes require the specification of an interface, an attribute, and an indication of whether the attribute must be called.

TODO: Simplify the UI for selecting interfaces and attributes

There are a number of ways the UI for this could be made more user friendly:

  • If the user selects an interface, we could then provide a select list of possible attributes and we could determine the callability. Perhaps selection of an interface should be required.
  • An index should have a way to specify default values. In particular, text indexes usually use ISearchableText and getSearchableText.

For text indexes, one generally uses zope.index.text.interfaces.ISearchableText, getSearchableText and True.
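
To show how those three settings could work together, here is a minimal sketch in plain Python. It is not the real index implementation: SketchAttributeIndex, the searchable adapter function and the Page class are hypothetical, and a plain function stands in for adapting an object to an interface.

```python
class SketchAttributeIndex(object):
    """Hypothetical index configured with an adapter, a name and a call flag."""

    def __init__(self, adapter, field_name, field_callable=False):
        self.adapter = adapter          # stand-in for adapting to an interface
        self.field_name = field_name
        self.field_callable = field_callable
        self.values = {}

    def index_doc(self, docid, obj):
        adapted = self.adapter(obj)
        if adapted is None:
            return                      # object cannot provide the interface
        value = getattr(adapted, self.field_name, None)
        if value is None:
            return
        if self.field_callable:
            value = value()             # the attribute is a method: call it
        self.values[docid] = value


class Page(object):
    def __init__(self, source):
        self.source = source

    def getSearchableText(self):
        return self.source


def searchable(obj):
    """Stand-in adapter: accept only objects that have getSearchableText."""
    return obj if hasattr(obj, 'getSearchableText') else None


index = SketchAttributeIndex(searchable, 'getSearchableText', True)
index.index_doc(1, Page('Now is the time'))
index.index_doc(2, object())            # skipped: no searchable text
```

Objects that cannot be adapted are skipped rather than treated as errors, so one index can be offered every object in the site.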

>>> print http(r"""
... POST /++etc++site/default/Catalog/+/AddTextIndex%3D HTTP/1.1
... Authorization: Basic bWdyOm1ncnB3
... Content-Length: 1008
... Content-Type: multipart/form-data; boundary=---------------------------12609588153518590761493918424
... Referer: http://localhost:8081/++etc++site/default/Catalog/+/AddTextIndex=
...
... -----------------------------12609588153518590761493918424
... Content-Disposition: form-data; name="field.interface"
...
... zope.index.text.interfaces.ISearchableText
... -----------------------------12609588153518590761493918424
... Content-Disposition: form-data; name="field.interface-empty-marker"
...
... 1
... -----------------------------12609588153518590761493918424
... Content-Disposition: form-data; name="field.field_name"
...
... getSearchableText
... -----------------------------12609588153518590761493918424
... Content-Disposition: form-data; name="field.field_callable.used"
...
...
... -----------------------------12609588153518590761493918424
... Content-Disposition: form-data; name="field.field_callable"
...
... on
... -----------------------------12609588153518590761493918424
... Content-Disposition: form-data; name="UPDATE_SUBMIT"
...
... Add
... -----------------------------12609588153518590761493918424
... Content-Disposition: form-data; name="add_input_name"
...
...
... -----------------------------12609588153518590761493918424--
... """, handle_errors=False)
HTTP/1.1 303 ...

We can visit the advanced tab of the catalog to get some index statistics. Doing so, we see that we have a single document and that the total word count is 8. The word count is only 8 because some stop words have been eliminated.
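
To make the effect of stop words concrete, here is a small sketch. The stop list is hypothetical and the tokenizer is a plain regular expression, so it is not expected to reproduce the counts reported by the real text index. It also assumes that the statistic counts distinct words.

```python
import re

# Hypothetical stop list; the list used by the real text index may differ.
STOP_WORDS = frozenset(['a', 'is', 'of', 'the', 'to'])


def distinct_indexed_words(text):
    """Return the set of distinct words left after removing stop words."""
    words = re.findall(r'[a-z0-9]+', text.lower())
    return set(words) - STOP_WORDS


sentence = 'The cat is next to the hat of the cat'
total_tokens = len(sentence.split())
kept = distinct_indexed_words(sentence)
```

The sentence has ten tokens, but only three distinct words survive, which is why a word count can be much smaller than the length of the text.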

>>> print http(r"""
... GET /++etc++site/default/Catalog/@@advanced.html HTTP/1.1
... Authorization: Basic bWdyOm1ncnB3
... Referer: http://localhost:8081/++etc++site/default/Catalog/@@contents.html
... """)
HTTP/1.1 200 ...
...
<table class="listing" summary="Indexes">
   <tr><th>Index</th>
       <th>Document Count</th>
       <th>Word Count</th>
   </tr>
   <tr>
       <td>TextIndex</td>
       <td>1</td>
       <td>8</td>
    </tr>
</table>
...

Now let's add some more pages:

>>> print http(r"""
... POST /+/zope.app.zptpage.ZPTPage%3D HTTP/1.1
... Authorization: Basic bWdyOm1ncnB3
... Content-Length: 754
... Content-Type: multipart/form-data; boundary=---------------------------1213614620286666602740364725
... Referer: http://localhost:8081/+/zope.app.zptpage.ZPTPage=
...
... -----------------------------1213614620286666602740364725
... Content-Disposition: form-data; name="field.source"
...
... <html>
... <body>
... Dudes, we really need to switch to Zope 3 now.
... </body>
... </html>
... -----------------------------1213614620286666602740364725
... Content-Disposition: form-data; name="field.expand.used"
...
...
... -----------------------------1213614620286666602740364725
... Content-Disposition: form-data; name="field.evaluateInlineCode.used"
...
...
... -----------------------------1213614620286666602740364725
... Content-Disposition: form-data; name="UPDATE_SUBMIT"
...
... Add
... -----------------------------1213614620286666602740364725
... Content-Disposition: form-data; name="add_input_name"
...
... zope3
... -----------------------------1213614620286666602740364725--
... """)
HTTP/1.1 303 ...
>>> print http(r"""
... POST /+/zope.app.zptpage.ZPTPage%3D HTTP/1.1
... Authorization: Basic bWdyOm1ncnB3
... Content-Length: 838
... Content-Type: multipart/form-data; boundary=---------------------------491825988706308579952614349
... Referer: http://localhost:8081/+/zope.app.zptpage.ZPTPage=
...
... -----------------------------491825988706308579952614349
... Content-Disposition: form-data; name="field.source"
...
... <html>
... <body>
... <p>Writing tests as doctests makes them much more understandable.</p>
... <p>Python 2.4 has major enhancements to the doctest module.</p>
... </body>
... </html>
... -----------------------------491825988706308579952614349
... Content-Disposition: form-data; name="field.expand.used"
...
...
... -----------------------------491825988706308579952614349
... Content-Disposition: form-data; name="field.evaluateInlineCode.used"
...
...
... -----------------------------491825988706308579952614349
... Content-Disposition: form-data; name="UPDATE_SUBMIT"
...
... Add
... -----------------------------491825988706308579952614349
... Content-Disposition: form-data; name="add_input_name"
...
... doctest
... -----------------------------491825988706308579952614349--
... """)
HTTP/1.1 303 ...

Now, if we visit the catalog advanced tab, we can see that the 3 documents have been indexed and that the word count has increased to 30:

>>> print http(r"""
... GET /++etc++site/default/Catalog/@@advanced.html HTTP/1.1
... Authorization: Basic bWdyOm1ncnB3
... Referer: http://localhost:8081/++etc++site/default/Catalog/@@contents.html
... """)
HTTP/1.1 200 ...
...
<table class="listing" summary="Indexes">
   <tr><th>Index</th>
       <th>Document Count</th>
       <th>Word Count</th>
   </tr>
   <tr>
       <td>TextIndex</td>
       <td>3</td>
       <td>30</td>
    </tr>
</table>
...

Now that we have a catalog with some documents indexed, we can search it. The catalog is really meant to be used from Python:

>>> root = getRootFolder()

We'll make our root folder the site (this would normally be done by the publisher):

>>> from zope.app.component.hooks import setSite
>>> setSite(root)

Now, we'll get the catalog:

>>> from zope.app import zapi
>>> from zope.app.catalog.interfaces import ICatalog
>>> catalog = zapi.getUtility(ICatalog)

And search it to find the names of all of the documents that contain the word 'now':

>>> results = catalog.searchResults(TextIndex='now')
>>> [result.__name__ for result in results]
[u'dudes', u'zope3']

TODO
This stuff needs a lot of work. The indexing interfaces, despite being rather elaborate, are still a bit too simple. There really should be more provision for combining results. In particular, the catalog should have a search interface that returns ranked docids rather than documents.

You don't have to use the search algorithm built into the catalog. You can implement your own search algorithms and use them with a catalog's indexes.
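
As one possible shape for such an algorithm, the sketch below queries indexes directly and returns ranked docids instead of documents. Everything in it is an assumption for illustration: ScoredWordIndex, ranked_search and the idea that apply returns a mapping of docid to score are hypothetical, not the catalog's actual interfaces.

```python
class ScoredWordIndex(object):
    """Hypothetical index whose apply() returns {docid: score}."""

    def __init__(self):
        self._counts = {}

    def index_doc(self, docid, text):
        for word in text.lower().split():
            per_doc = self._counts.setdefault(word, {})
            per_doc[docid] = per_doc.get(docid, 0) + 1

    def apply(self, word):
        # The score is simply how often the word occurs in the document.
        return dict(self._counts.get(word.lower(), {}))


def ranked_search(indexes, queries):
    """Return (docid, score) pairs, best first, for ids matching every query."""
    combined = None
    for name, query in queries.items():
        hits = indexes[name].apply(query)
        if combined is None:
            combined = hits
        else:
            # Keep only ids present in both result sets and add their scores.
            combined = dict((docid, combined[docid] + score)
                            for docid, score in hits.items()
                            if docid in combined)
    if not combined:
        return []
    # Highest score first; ties are broken by the smaller docid.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


indexes = {'title': ScoredWordIndex(), 'body': ScoredWordIndex()}
indexes['title'].index_doc(1, 'zope news')
indexes['title'].index_doc(2, 'zope zope tips')
indexes['body'].index_doc(1, 'catalog catalog catalog')
indexes['body'].index_doc(2, 'catalog')
indexes['body'].index_doc(3, 'catalog')

ranking = ranked_search(indexes, {'title': 'zope', 'body': 'catalog'})
```

Returning docids with scores leaves it to the caller to decide how many objects to load, and lets results from several tools be merged before any object is fetched.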