4 Tips for index cleaning for SharePoint Search

How to clean only one Content Source, and how to clean the index if you have FAST Search for SharePoint 2010

by Agnes Molnar on 3/31/2014

Date Revised: 4/16/2014


If you’ve ever had fun with SharePoint Search, most likely you’ve seen (or even used) Index Reset. This is very useful if you want to clear everything from your SharePoint index, but sometimes it’s not good enough, for example if you

  1. don’t want to clean the full index but only one Content Source
  2. have FAST Search for SharePoint 2010
  3. both :)

Tip 1: Cleaning up only one Content Source

Sometimes you have a lot of content crawled but need to clear only one Content Source. In this case, clearing everything might be very painful: imagine clearing millions of documents, then re-crawling everything that should not have been cleared in the first place.

Instead, why not clean one Content Source only? It’s much easier than you might think. Here are the steps.

  1. Open your existing Content Source
  2. Make sure no crawl is running on this Content Source. The status of the Content Source has to be Idle. If not, Stop the current crawl and wait until it gets done
  3. Remove all Start Addresses from your Content Source. Don’t forget to note them before clearing!
  4. Wait until the index gets cleaned up (see Tip 4 for details on this)
  5. Add back the Start Addresses (URLs) to your Content Source, and Save your settings.
  6. Enjoy!

With this method, you’ll be able to clear only one Content Source.
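Step 2 can also be scripted. The following is a minimal sketch under stated assumptions, not a definitive implementation: Wait-ContentSourceIdle is a hypothetical helper name, and it assumes the content source object exposes a CrawlStatus property and a StopCrawl() method, as the SharePoint 2010 search administration object model does. The content source is passed in as a script block so that it is re-read on every poll.

```powershell
# Hypothetical helper (not a built-in cmdlet): stops a running crawl if needed
# and waits until the Content Source reports Idle.
function Wait-ContentSourceIdle {
    param(
        [Parameter(Mandatory = $true)] [scriptblock] $GetSource,  # returns the content source
        [int] $PollSeconds = 10,   # assumed polling interval
        [int] $MaxPolls = 360      # assumed upper bound (about one hour)
    )

    $source = & $GetSource
    if ("$($source.CrawlStatus)" -ne 'Idle') {
        # Assumption: asking a non-idle source to stop is acceptable here.
        $null = $source.StopCrawl()
    }

    for ($i = 0; $i -lt $MaxPolls; $i++) {
        $source = & $GetSource   # re-read so the status is fresh
        if ("$($source.CrawlStatus)" -eq 'Idle') { return $source }
        if ($PollSeconds -gt 0) { Start-Sleep -Seconds $PollSeconds }
    }
    throw "Content Source did not become Idle after $MaxPolls polls."
}

# Possible usage in the SharePoint 2010 Management Shell (names are examples):
# $source = Wait-ContentSourceIdle -GetSource {
#     Get-SPEnterpriseSearchCrawlContentSource -Identity "MyContentSource" -SearchApplication "FAST Content SSA"
# }
```

The function returns the content source once it is Idle, so the result can be used directly for removing the Start Addresses.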

Of course, you can use either the UI of Search Service Application (SSA) in Central Administration or PowerShell. The logic is the same. Here is a simple PowerShell script for removing the Start Addresses:

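# Names below are examples; use your own SSA and Content Source names
$contentSSA = "FAST Content SSA"
$sourceName = "MyContentSource"

$source = Get-SPEnterpriseSearchCrawlContentSource -Identity $sourceName -SearchApplication $contentSSA
# Keep the URLs so they can be added back later
$URLs = $source.StartAddresses | ForEach-Object { $_.OriginalString }

$source.StartAddresses.Clear()
$source.Update()   # persist the change to the Content Source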

Then, as soon as you’re sure the index has been cleaned up (see Tip 4), you can add back the Start Addresses by using this command:

ForEach ($address in $URLs) { $source.StartAddresses.Add($address) }
$source.Update()   # persist the restored Start Addresses
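The $URLs variable lives only as long as the PowerShell session does, and the clean-up can take a while. As a precaution, the addresses could also be written to a file before clearing. This is a minimal sketch: the function names and the file path are hypothetical, and only the built-in Set-Content and Get-Content cmdlets are used.

```powershell
# Hypothetical helpers for keeping a copy of the Start Addresses on disk.
function Save-StartAddresses {
    param(
        [string[]] $Addresses,   # e.g. the $URLs collected before clearing
        [string] $Path           # file to write, one URL per line
    )
    Set-Content -Path $Path -Value $Addresses -Encoding UTF8
}

function Read-StartAddresses {
    param([string] $Path)
    # Skip blank lines so stray whitespace never becomes a Start Address
    Get-Content -Path $Path | Where-Object { $_.Trim() -ne '' }
}

# Possible usage (the path is an example):
# Save-StartAddresses -Addresses $URLs -Path "C:\Temp\MyContentSource-urls.txt"
# $URLs = @(Read-StartAddresses -Path "C:\Temp\MyContentSource-urls.txt")
```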

Tip 2: Index Reset in FAST Search for SharePoint

You most likely know Index Reset on the SSA UI shown in Figure 1:

Figure 1: Search Service Application UI

Well, if you’re using FAST Search for SharePoint 2010 (FS4SP), it’s not enough. You need the following steps if you want to do a real Index Reset:

  1. Perform an Index Reset on the SSA (see Figure 1).
  2. Open FS4SP PowerShell Management on the FAST Server, as a FAST Admin.
  3. Run the following command:

Clear-FASTSearchContentCollection -Name <yourContentCollection>

The full list of available parameters can be found in the documentation for Clear-FASTSearchContentCollection.

This deletes all items from the content collection, without removing the collection itself.

Tip 3: Cleaning up only one Content Source in FAST Search for SharePoint

The steps are the same as for SharePoint Search; see Tip 1.

Tip 4: Checking the status of your Index

In Step 4 (under Cleaning up only one Content Source), I mentioned that you should wait until the index gets cleaned up, and it always takes time. The first place where you can go to find out how long it will take is the SSA. You'll find a number that is a very good indicator. Figure 2 shows this number circled in red.

Figure 2: SSA number to determine how long clean-up will take

With FS4SP, you should use PowerShell again, after running the Clear-FASTSearchContentCollection command:

  1. Open FS4SP PowerShell Management on the FAST Server, as a FAST Admin.
  2. Run the following command:

Get-FASTSearchContentCollection -Name <yourContentCollection>

The result contains several pieces of information, including DocumentCount, shown in Figure 3.

Figure 3: Document Count

How does this help you check the clean-up process? You have two options:

First option: If you know how many items need to be cleaned, just check the DocumentCount before you clean the Content Source, and keep checking regularly afterwards. If the value of DocumentCount is around the value you’re expecting AND not decreasing anymore, you’re done.

Second option: If you don’t know how many items will be cleared, just check the value of DocumentCount regularly, say, every five minutes. If this value stays unchanged for a while (e.g., for fifteen minutes or so), you’re done.

As soon as you’re done, you can add back the Start Addresses to your Content Source, as mentioned above.
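The second option can be automated with a small polling loop. This is a sketch under stated assumptions rather than a definitive implementation: Wait-DocumentCountStable is a hypothetical helper name, the count is supplied by a script block (for FS4SP, one that reads DocumentCount from Get-FASTSearchContentCollection), and the defaults mirror the five-minute checks and roughly fifteen minutes of no change mentioned above. The MaxChecks limit is an added safety assumption.

```powershell
# Hypothetical helper: polls a document count until it stops changing.
function Wait-DocumentCountStable {
    param(
        [Parameter(Mandatory = $true)] [scriptblock] $GetCount,  # returns the current count
        [int] $IntervalSeconds = 300,  # check every five minutes
        [int] $StableChecks = 3,       # three unchanged checks, about fifteen minutes
        [int] $MaxChecks = 288         # assumed safety limit (about 24 hours)
    )

    $last = & $GetCount
    $unchanged = 0

    for ($i = 0; $i -lt $MaxChecks; $i++) {
        if ($IntervalSeconds -gt 0) { Start-Sleep -Seconds $IntervalSeconds }

        $current = & $GetCount
        if ($current -eq $last) { $unchanged++ } else { $unchanged = 0 }
        $last = $current

        # Done once the count has stayed the same for enough consecutive checks
        if ($unchanged -ge $StableChecks) { return $current }
    }
    throw "DocumentCount did not stabilize after $MaxChecks checks."
}

# Possible usage in the FS4SP shell (replace the placeholder with your collection name):
# Wait-DocumentCountStable -GetCount {
#     (Get-FASTSearchContentCollection -Name "<yourContentCollection>").DocumentCount
# }
```

The function returns the final count, which can be compared against the expected value from the first option before the Start Addresses are added back.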


Topic: Tutorial
