Showing posts with label catalog. Show all posts

Tuesday, March 20, 2012

cidump tool documentation?

I'm trying to use the cidump tool of SQL Server 2005 Beta 2 to dump a word
list from a full-text catalog and I can't find good documentation. There are
few people online who claim that this is possible. I've looked everywhere, but
there's little to no info about this feature. Can the SQL documentation team
be so kind as to provide us with something to work with here?
|||Here is how I dump the catalog I created in
C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData\XIN:
C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn>cidump -dump "C:\Program Files\Microsoft SQL Server"\MSSQL.1\MSSQL\FTData\XIN\MssearchCatalogDir -dir k
Note how I use the subdirectory called MSSearchCatalogDir.
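The partial quoting in the command above is only there because the catalog path contains spaces. As a minimal sketch, assuming cidump is driven from a Python script (the script and its variable names are illustrative, not part of the original post), passing the path as a single list element leaves the quoting to the standard library:

```python
import subprocess
from pathlib import PureWindowsPath

# Catalog location taken from the post above; adjust for your own instance.
ft_data = PureWindowsPath(r"C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData")
catalog_dir = ft_data / "XIN" / "MssearchCatalogDir"

# Same switches as the command in the post: -dump <path> -dir k
args = ["cidump", "-dump", str(catalog_dir), "-dir", "k"]

# list2cmdline (a helper inside the subprocess module) shows how the
# argument list would be quoted on a Windows command line.
command_line = subprocess.list2cmdline(args)
print(command_line)

# On a machine where cidump.exe is on the PATH, the list could be run with:
# subprocess.run(args, check=True)
```

The whole path ends up inside one pair of quotes, which is equivalent to the partially quoted form typed by hand above.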
For more documentation did you check this doc?
Use cidump.exe for:
- Dumping the content of the catalog or a specified index
cidump -dump <catalog_path> [options]
- Checking the integrity of the catalog or a specified index
cidump -check <catalog_path> [options]
- Computing statistics on the content of the catalog or a specified index
cidump -stats <catalog_path> [options]
=== Common options: =========================================================
-i <IndexId> - Apply the operation on this index.
    If this option is not used, all the indexes are processed.
-e <IndexId> - Exclude this index.
    Use it multiple times to exclude more than one index.
-u <filepath> - Dump to a UNICODE file.
-not_read_only - Open CiStorage not read only.
    For when the index table streams are not in sync.
=== Specific options to be used with -dump: =================================
Select what to dump:
-x [<format>] - Dump the index.
    <format> can be:
    k   - keys only (default)
    kw  - keys and wids (document IDs)
    kwo - keys, wids, and occurrences
    kwc - keys and WidCounts per key
    sbr - the SortByRank index for each key
-dir - Dump the index directory.
-widset [<format>] - Dump the widset files.
    <format> can be:
    wid - iterate the wids in the widset (default)
    wc  - widcounts only
    hdr - widset header only
    To determine in which index a given wid is fresh use:
    cidump -dump <path> -widset -w <wid>
Select the keys to dump:
-k <startKey> [<endKey>] - Dump only the keys in a given range.
-kn <keyCount> - Stop after displaying a number of keys.
-w <wid> - Dump only the keys that contain the given wid.
-p <PropID> - Dump only keys with the given PropID.
Select the wids to dump:
-wr <startWid> [<endWid>] - Dump only the wids in a given range.
-wn <widCount> - Stop after displaying a number of wids for each key.
Select the format of the data:
-d - Display all numbers in decimal
    (default is hex for IDs and offsets, decimal for counts and sizes).
-h - Display all numbers in hexadecimal.
-b - Display the internal representation of the keys (bytes in hex).
=== Notes:
* IndexIds, wids and PropIds can be entered as hex numbers preceded by "0x".
* The keys can be entered as strings or as a sequence of bytes in hex, between
  quotes and parentheses: abc is the same as "(00 00 61 00 62 00 63)".
  Run cidump -?key to get help on the input format for keys.
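The two notes above can be illustrated with a small Python sketch. The byte layout is inferred from the single example in the help text (one leading 00 byte, then each character as a two-byte big-endian value). That reading is an assumption, and `cidump -?key` remains the authority on the key format. The function names are hypothetical.

```python
def key_to_hex_form(key: str) -> str:
    """Render a key in the quoted, parenthesized hex form shown in the note.

    Assumption: one leading 00 byte, then the key encoded as UTF-16
    big-endian, which reproduces the help text's example for "abc".
    """
    raw = b"\x00" + key.encode("utf-16-be")
    return "(" + " ".join(f"{b:02X}" for b in raw) + ")"


def parse_id(text: str) -> int:
    """Parse an IndexId, wid or PropID: 0x-prefixed hex, otherwise assumed decimal."""
    return int(text, 16) if text.lower().startswith("0x") else int(text)


print(key_to_hex_form("abc"))        # (00 00 61 00 62 00 63)
print(f"{parse_id('0x1002A'):08X}")  # 0001002A
```

The second line of output shows why the sample further down can refer to "index 0001002A" while passing `-i 0x1002A`: both spell the same number.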
=== Samples:
Check the integrity of all the indexes:
    cidump -check c:\catalog
Display global statistics for the index 0001002A:
    cidump -stats c:\catalog -g -i 0x1002A
Dump from all the indexes the keys in the range aaa - bbb.
Use the format that also shows the widcounts (number of docs with that key):
    cidump -dump c:\catalog -x kwc -k aaa bbb
Dump the keys that contain the wid 9001:
    cidump -dump c:\catalog -x -w 9001
Dump the first 10 wids from the sorted-by-rank index for the key "tokenone":
    cidump -dump c:\catalog -x sbr -k tokenone -kn 1 -wn 10
Dump the directory:
    cidump -dump s:\encpath\encarta -dir kbo
===
Run cidump -? to display advanced options.
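As a rough sketch of how the samples above compose, the following Python helper assembles a `-dump` argument list from a few of the documented switches (-x, -k, -kn, -wn). The function and parameter names are made up for illustration. Only the switches themselves come from the help text.

```python
from typing import List, Optional, Tuple


def dump_args(catalog: str,
              index_format: Optional[str] = None,
              key_range: Optional[Tuple[str, ...]] = None,
              max_keys: Optional[int] = None,
              max_wids: Optional[int] = None) -> List[str]:
    """Build an argument list for `cidump -dump` from a few documented switches."""
    args = ["cidump", "-dump", catalog]
    if index_format is not None:
        args += ["-x", index_format]      # k, kw, kwo, kwc or sbr
    if key_range:
        args += ["-k", *key_range]        # <startKey> [<endKey>]
    if max_keys is not None:
        args += ["-kn", str(max_keys)]    # stop after this many keys
    if max_wids is not None:
        args += ["-wn", str(max_wids)]    # stop after this many wids per key
    return args


# Keys in the range aaa - bbb, with widcounts:
print(" ".join(dump_args(r"c:\catalog", "kwc", ("aaa", "bbb"))))
# First 10 wids from the sorted-by-rank index for the key "tokenone":
print(" ".join(dump_args(r"c:\catalog", "sbr", ("tokenone",), max_keys=1, max_wids=10)))
```

Both printed lines match the corresponding sample commands, which makes the helper easy to check against the help text before extending it.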
Advanced options:
=== Specific options for dump: ==============================================
-x [<format>] - Dump the index.
    <format> can also be:
    ks  - statistics per key
    kw+ - keys and wids and wid metadata (all but occurrences)
    kp  - list of keys with position in index
    kwp - list of keys and wids with position in index
    kph - dump phrases that contain a given key.
          Requires -phk and -phc options.
    ph  - dump all phrases in more than a given nr of docs.
          Requires -phc option.
-dir [<format>] - Dump the index directory.
    <format> can be:
    kbo - keys and BitOffset in index (default)
    kp  - list of keys with position in the directory file
-phk <key> - Used only with the "-x kph" dump format.
    Dump phrases that contain the given <key>. Use also -phc.
-phc <minWidCount> - Used only with the "-x ph" and "-x kph" dump formats.
    Dump all the 2-3 word phrases that occur in at least <minWidCount>
    documents in the indexed corpus.
-kwc <minWidCount> <maxWidCount> - Display only keys in a widcount range.
-alr - Display the allocated ranges for the master index.
-fbs - Force binary search when dumping widsets.
-rec <index> <type> <maxWid> <R/W> - Dump a standalone index.
    Don't use the Index Table.
    <index>  - the index id (e.g. 0x1001C)
    <type>   - 0 - master index; 1 - shadow index.
    <maxWid> - the maximum workid in the index.
    <R/W>    - 0 - complete index; 1 - incomplete index (write mode)
=== Specific options for check: ============================================
-k <start_key> [<end_key>] - Process only the area of the index
    for the given key range.
=== Specific options for statistics: =======================================
Computing default statistics requires a full scan of the index file.
Additional options:
-o - display default statistics and also occurrence distribution.
-wc - display widcount distribution (iterates through keys only).
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

cidump documentation?

I'm trying to use the new cidump tool of SQL Server 2005 Beta 2 to get a
word list from a full-text catalog and I can't find good documentation.
I've looked everywhere. Have you seen it? Can the SQL documentation team be
so kind as to provide us with something here about this new feature?
|||Cybergoo,
The SQL Server 2005 Beta2 as well as Beta3 Books Online (BOL) have yet to be
updated on this most useful SQL FTS utility. In the meantime, you should
use cidump /? to get the syntax as well as examples of use:
-- on my Win2003 server...
f:
cd \MSSQL90\MSSQL.1\MSSQL\Binn
cidump /?
-- edited output:
Use cidump.exe for:
- Dumping the content of the catalog or a specified index
cidump -dump <catalog_path> [options]
- Checking the integrity of the catalog or a specified index
cidump -check <catalog_path> [options]
- Computing statistics on the content of the catalog or a specified index
cidump -stats <catalog_path> [options]
...
Display global statistics for the index 0001002A:
cidump -stats c:\catalog -g -i 0x1002A
cidump /?
Advanced options:
-x [<format>] - Dump the index.
    <format> can also be:
    ks  - statistics per key
    kw+ - keys and wids and wid metadata (all but occurrences)
    kp  - list of keys with position in index
    kwp - list of keys and wids with position in index
    kph - dump phrases that contain a given key.
          Requires -phk and -phc options.
    ph  - dump all phrases in more than a given nr of docs.
          Requires -phc option.
-dir [<format>] - Dump the index directory.
    <format> can be:
    kbo - keys and BitOffset in index (default)
    kp  - list of keys with position in the directory file
-phk <key> - Used only with the "-x kph" dump format.
    Dump phrases that contain the given <key>. Use also -phc.
-phc <minWidCount> - Used only with the "-x ph" and "-x kph" dump formats.
    Dump all the 2-3 word phrases that occur in at least <minWidCount>
    documents in the indexed corpus.
-kwc <minWidCount> <maxWidCount> - Display only keys in a widcount range.
-alr - Display the allocated ranges for the master index.
-fbs - Force binary search when dumping widsets.
-rec <index> <type> <maxWid> <R/W> - Dump a standalone index.
    Don't use the Index Table.
    <index>  - the index id (e.g. 0x1001C)
    <type>   - 0 - master index; 1 - shadow index.
    <maxWid> - the maximum workid in the index.
    <R/W>    - 0 - complete index; 1 - incomplete index (write mode)
Enjoy!
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/
|||Thanks, I'm aware of the help text of the /? switch. Better
documentation of the dump output is needed in order to parse it. For example,
in some cases there's a dot (.) at the beginning of an index entry line.
The help text is mute about what that means; it focuses on the
different switches, not the syntax of the dump.
|||You're welcome, Cybergoo,
Yep, it is... Actually the SQL Server 2005 (June CTP / IDW15 / Beta3)
version is *mute* on a lot of FTS-related topics, IMHO.
Hopefully, the next CTP version will be more *verbose*, as the FTS-related
topics leave a lot to be desired... Look for entries at my blog on this
topic in the near future!
Thanks,
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/

Friday, February 24, 2012

Checking the integrity of the FT catalog/index

What would be an efficient way to check the integrity of an FT
catalog or index in SQL 2000? Besides monitoring the event logs, I'm
planning to write a script that picks a string of text from a text
column of a random row in a table that has FT indexes, then runs
an FT search on those words to make sure the catalog is OK.
SQL 2005 has cidump; does SQL 2000 have an equivalent utility?
Thanks,
Hai
|||Hai,
The MSSearch service does its own internal integrity checking by design, so
little to no additional checking is normally required. However, I would
recommend that you monitor the free space, memory and CPU usage via the
"Microsoft Search" Performance counters. The following blog entry has links
to the more common SQL FTS issues.
SQL Server 2000 Full-Text Search Resources and Links
http://spaces.msn.com/members/jtkane/Blog/cns!1pWDBCiDX1uvH5ATJmNCVLPQ!305.entry
323739 "INF: SQL Server 2000 Full-Text Search Deployment White Paper" has
more information about the MSSearch perfmon counters.
Regards,
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/
|||Thanks John!
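
The spot check Hai describes at the top of this thread could be sketched roughly as follows. This Python snippet only builds the pieces: it picks a few words from a sampled text value and composes a T-SQL CONTAINS query. The table and column names, the word-length filter and the helper names are all hypothetical, and running the query is left to whatever database driver is already in use. Noise words and rows that have not been indexed yet can make such a probe miss even when the catalog is healthy.

```python
import re
from typing import List


def pick_probe_words(text: str, count: int = 3, min_len: int = 5) -> List[str]:
    """Pick the first few distinct, reasonably long words from a sampled text value.

    Short words are skipped on the assumption that they are more likely
    to be noise words that full-text search ignores.
    """
    words: List[str] = []
    for word in re.findall(r"[A-Za-z]+", text):
        lowered = word.lower()
        if len(lowered) >= min_len and lowered not in words:
            words.append(lowered)
        if len(words) == count:
            break
    return words


def build_contains_query(table: str, column: str, words: List[str]) -> str:
    """Compose a T-SQL query counting rows whose full-text index matches all words."""
    condition = " AND ".join(f'"{w}"' for w in words)
    return f"SELECT COUNT(*) FROM {table} WHERE CONTAINS({column}, '{condition}')"


sample = "Quarterly revenue figures for the northern region"
probe = pick_probe_words(sample)
print(probe)
print(build_contains_query("dbo.Documents", "Body", probe))
```

If the sampled row has been indexed, the count returned by the query should be at least 1, since that row itself contains every probe word.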