Tuesday, March 20, 2012

cidump documentation?

I'm trying to use the new cidump tool in SQL Server 2005 Beta 2 to get a
word list from a full-text catalog, and I can't find good documentation.
I've looked everywhere. Have you seen it? Could the SQL documentation team be
so kind as to provide us with something here about this new feature?
Here is how I dump the catalog I created in
C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData\XIN:
C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn>cidump -dump "C:\Program Files\Microsoft SQL Server"\MSSQL.1\MSSQL\FTData\XIN\MssearchCatalogDir -dir k
Note how I use the subdirectory called MSSearchCatalogDir.
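If you want to script the same call, here is a minimal Python sketch. It only assembles the command line shown above. The helper name build_dump_command and the default install path are assumptions for illustration, not part of cidump itself.

```python
from pathlib import PureWindowsPath

# Assumed install location, copied from the command above; adjust for your server.
INSTANCE_ROOT = PureWindowsPath(r"C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL")


def build_dump_command(catalog_name, dir_format="k", instance_root=INSTANCE_ROOT):
    """Return the argument list for `cidump -dump <catalog_path> -dir <format>`."""
    cidump = instance_root / "Binn" / "cidump.exe"
    # The catalog path has to point at the MSSearchCatalogDir subdirectory,
    # not at the catalog folder itself.
    catalog_path = instance_root / "FTData" / catalog_name / "MSSearchCatalogDir"
    return [str(cidump), "-dump", str(catalog_path), "-dir", dir_format]
```

On the server, the list returned by build_dump_command("XIN") can be passed to subprocess.run as-is, which avoids hand-quoting the spaces in the path.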
For more documentation, did you check the tool's own help text?
Use cidump.exe for:
- Dumping the content of the catalog or a specified index
cidump -dump <catalog_path> [options]
- Checking the integrity of the catalog or a specified index
cidump -check <catalog_path> [options]
- Computing statistics on the content of the catalog or a specified index
cidump -stats <catalog_path> [options]
=== Common options: =========================================================
-i <IndexId> - Apply the operation on this index.
If this option is not used, all the indexes are processed.
-e <IndexId> - Exclude this index.
Use it multiple times for excluding more than one index.
-u <filepath> - Dump to a UNICODE file.
-not_read_only - Open CiStorage not read only
For when the index table streams are not in sync
=== Specific options to be used with -dump: =================================
Select what to dump:
-x [<format>] - Dump the index.
<format> can be: k - keys only (default)
kw - keys and wids (document IDs)
kwo - keys, wids, and occurrences
kwc - keys and WidCounts per key
sbr - the SortByRank index for each key
-dir - Dump the index directory.
-widset [<format>] - Dump the widset files.
<format> can be:
wid - iterate the wids in the widset (default)
wc - widcounts only
hdr - widset header only
To determine in which index a given wid is fresh use:
cidump -dump <path> -widset -w <wid>
Select the keys to dump:
-k <startKey> [<endKey>] - Dump only the keys in a given range.
-kn <keyCount> - Stop after displaying a number of keys.
-w <wid> - Dump only the keys that contain the given wid.
-p <PropID> - Dump only keys with the given PropID.
Select the wids to dump:
-wr <startWid> [<endWid>] - Dump only the wids in a given range.
-wn <widCount> - Stop after displaying a number of wids per each key.
Select the format of the data:
-d - Display all numbers in decimal.
(default is hex for IDs and offsets, decimal for counts and sizes).
-h - Display all numbers in hexadecimal.
-b - Display the internal representation of the keys (bytes in hex).
=== Notes:
* IndexIds, wids and PropIds can be entered as hex numbers preceded by "0x".
* The keys can be entered as strings or as a sequence of bytes in hex, between
quotes and parentheses: abc is the same as "(00 00 61 00 62 00 63)"
Run cidump -?key to get help on the input format for keys.
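The two notes above can be sketched in Python. This is a guess at the byte layout, not documented behavior: it assumes a key is one leading prefix byte (00) followed by big-endian UTF-16 code units, which reproduces the "abc" example but is not confirmed by the help text. The function names are made up for illustration.

```python
def key_to_hex(key, prefix=b"\x00"):
    """Render a key string as a parenthesized sequence of hex bytes.

    Assumption: one prefix byte, then big-endian UTF-16 code units.
    This matches the help text's "abc" example and nothing more.
    """
    raw = prefix + key.encode("utf-16-be")
    return "(" + " ".join(f"{b:02X}" for b in raw) + ")"


def id_to_arg(displayed_id):
    """Turn an ID as displayed in hex (e.g. 0001002A) into the 0x form."""
    return f"0x{int(displayed_id, 16):X}"
```

With these, key_to_hex("abc") gives the byte string from the note, and id_to_arg("0001002A") gives the 0x1002A form that the help text passes to -i.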
=== Samples:
Check the integrity of all the indexes:
cidump -check c:\catalog
Display global statistics for the index 0001002A:
cidump -stats c:\catalog -g -i 0x1002A
Dump from all the indexes the keys in the range aaa - bbb.
Use the format that also shows the widcounts (number of docs with that key):
cidump -dump c:\catalog -x kwc -k aaa bbb
Dump the keys that contain the wid 9001:
cidump -dump c:\catalog -x -w 9001
Dump the first 10 wids from the sorted by rank index for the key "tokenone":
cidump -dump c:\catalog -x sbr -k tokenone -kn 1 -wn 10
Dump the directory:
cidump -dump s:\encpath\encarta -dir kbo
===
Run cidump -? to display advanced options.
Advanced options:
=== Specific options for dump: ==============================================
-x [<format>] - Dump the index
<format> can be also:
ks - statistics per key
kw+ - keys and wids and wid metadata (all but occurrences)
kp - list of keys with position in index
kwp - list of keys and wids with position in index
kph - dump phrases that contain a given key.
Requires -phk and -phc options.
ph - dump all phrases in more than a given number of docs.
Requires -phc option.
-dir [<format>] - Dump the index directory
<format> can be:
kbo - keys and BitOffset in index (default)
kp - list of keys with position in the directory file
-phk <key> - Used only with "-x kph" dump format.
Dump phrases that contain the given <key>. Use also -phc.
-phc <minWidCount> - Used only with "-x ph" and "-x kph" dump formats.
Dump all the 2-3 word phrases that occur in at least <minWidCount>
documents in the indexed corpus.
-kwc <minWidCount> <maxWidCount> - Display only keys in a widcount range.
-alr - Display the allocated ranges for the master index.
-fbs - Force binary search when dumping widsets.
-rec <index> <type> <maxWid> <R/W> - Dump a standalone index.
Don't use the Index Table.
<index> - the index id (e.g. 0x1001C)
<type> - 0 - master index; 1 - shadow index.
<maxWid> - the maximum workid in the index.
<R/W> - 0 - complete index; 1 - incomplete index (write mode)
=== Specific options for check: ============================================
-k <start_key> [<end_key>] - Process only the area of the index
for the given key range.
=== Specific options for statistics: =======================================
Computing default statistics requires a full scan of the index file.
Additional options:
-o - display default statistics and also occurrence distribution.
-wc - display widcount distribution (iterates through keys only).
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
|||Cybergoo,
The SQL Server 2005 Beta 2 and Beta 3 Books Online (BOL) have yet to be
updated on this most useful SQL FTS utility. In the meantime, you should
use cidump /? to get the syntax as well as examples of use:
-- on my Win2003 server...
f:
cd \MSSQL90\MSSQL.1\MSSQL\Binn
cidump /?
-- edited output:
Use cidump.exe for:
- Dumping the content of the catalog or a specified index
cidump -dump <catalog_path> [options]
- Checking the integrity of the catalog or a specified index
cidump -check <catalog_path> [options]
- Computing statistics on the content of the catalog or a specified index
cidump -stats <catalog_path> [options]
...
Display global statistics for the index 0001002A:
cidump -stats c:\catalog -g -i 0x1002A
cidump /?
Advanced options:
-x [<format>] - Dump the index
<format> can be also:
ks - statistics per key
kw+ - keys and wids and wid metadata (all but occurrences)
kp - list of keys with position in index
kwp - list of keys and wids with position in index
kph - dump phrases that contain a given key.
Requires -phk and -phc options.
ph - dump all phrases in more than a given number of docs.
Requires -phc option.
-dir [<format>] - Dump the index directory
<format> can be:
kbo - keys and BitOffset in index (default)
kp - list of keys with position in the directory file
-phk <key> - Used only with "-x kph" dump format.
Dump phrases that contain the given <key>. Use also -phc.
-phc <minWidCount> - Used only with "-x ph" and "-x kph" dump formats.
Dump all the 2-3 word phrases that occur in at least <minWidCount>
documents in the indexed corpus.
-kwc <minWidCount> <maxWidCount> - Display only keys in a widcount range.
-alr - Display the allocated ranges for the master index.
-fbs - Force binary search when dumping widsets.
-rec <index> <type> <maxWid> <R/W> - Dump a standalone index.
Don't use the Index Table.
<index> - the index id (e.g. 0x1001C)
<type> - 0 - master index; 1 - shadow index.
<maxWid> - the maximum workid in the index.
<R/W> - 0 - complete index; 1 - incomplete index (write mode)
Enjoy!
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/
|||Thanks, I'm aware of the help text from the /? switch. What's needed is
better documentation of the dump output, so that it can be parsed. For example,
in some cases there's a dot (.) at the beginning of an index entry line.
The help text is mute regarding what it means; it focuses on the
different switches, not the syntax of the dump.
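Until the dump syntax is documented, a parser can at least keep the leading dot as a flag instead of guessing what it means. A minimal Python sketch of that idea follows. The function name is made up, and nothing here assumes anything about the rest of the line layout, which is not known.

```python
def split_dot_marker(line):
    """Return (has_dot, text) for one line of dump output.

    The meaning of a leading dot is undocumented, so it is only
    recorded here, never interpreted.
    """
    text = line.strip()
    if text.startswith("."):
        # Drop the marker and any spacing after it, keep the rest untouched.
        return True, text[1:].lstrip()
    return False, text
```

Keeping the flag next to the untouched remainder of the line means nothing is lost if the dot's meaning turns up later.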
|||You're welcome, Cybergoo,
Yep, it is... Actually the SQL Server 2005 (June CTP / IDW15 / Beta3)
version is *mute* on a lot of FTS-related topics, IMHO.
Hopefully, the next CTP version will be more *verbose*, as the FTS-related
topics leave a lot to be desired... Look for entries at my blog on this
topic in the near future!
Thanks,
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/