Event-Driven Crawl Scheduling for SharePoint Search and FAST Search for SharePoint

PowerShell Scripts to customize scheduling

by Agnes Molnar on 3/31/2014

Article Details

Date Revised:
4/16/2014

Applies to:
FAST Search, Microsoft SharePoint, PowerShell, Search, SharePoint 2010, spx, Windows PowerShell

http://itunity.com/c/bJ


Recently, I've been working for a customer that has some interesting requirements. They have several content sources and want to crawl them one after another. Scheduling the incremental crawls for fixed times is not a good solution, because the duration of their incremental crawls is very unpredictable. For example, an incremental crawl of the same content source took minutes at one time, and then 1.5 hours the next. And of course, they didn't want any idle time.

The problem is that you can't define rules for this type of requirement from the SharePoint UI. So the solution was PowerShell.

First, you need to be able to start the crawl. Let's concentrate only on the incremental crawl for now. Here's the PowerShell script:

$SSA = Get-SPEnterpriseSearchServiceApplication -Identity "Search Service Application"

$ContentSourceName = "My Content Source"

$ContentSource = $SSA | Get-SPEnterpriseSearchCrawlContentSource -Identity $ContentSourceName

$ContentSource.StartIncrementalCrawl()

It's an easy one, isn't it?

The next step is checking the status of this content source. You need this for several reasons. For example, you might want to start the crawl only if it's in Idle status, or you might want to display the current status of the crawl every minute, etc. Here's the PowerShell command you need:

$ContentSource.CrawlStatus
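As a minimal sketch of the "start only if idle" idea, the check and the start call can be wrapped together. The helper name Start-CrawlIfIdle is hypothetical (it is not a SharePoint cmdlet), and it assumes it is handed an object like the $ContentSource retrieved above, with a CrawlStatus property and a StartIncrementalCrawl() method.

```powershell
# Hypothetical helper (not part of SharePoint): requests an incremental crawl
# only when the content source reports the Idle status.
# $ContentSource is assumed to be the object returned by
# Get-SPEnterpriseSearchCrawlContentSource.
function Start-CrawlIfIdle {
    param($ContentSource)

    if ($ContentSource.CrawlStatus -eq "Idle") {
        # Discard any output so the function returns only the boolean below
        $null = $ContentSource.StartIncrementalCrawl()
        return $true     # a crawl was requested
    }

    return $false        # a crawl is already running, paused, or stopping
}
```

With the variables from the first script in place, calling Start-CrawlIfIdle $ContentSource should return $true when a crawl was requested and $false otherwise.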

What values can it have? Here's a list of crawl statuses:

  • Idle
  • CrawlStarting
  • CrawlingIncremental / CrawlingFull
  • CrawlPausing
  • Paused
  • CrawlResuming
  • CrawlCompleting
  • CrawlStopping
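When waiting for a crawl to finish, it can help to group these statuses into "still working" and "everything else". The sketch below is one possible grouping. It mirrors the loop condition used in the Crawl function later in this article; the function name Test-CrawlInProgress is an assumption made for illustration.

```powershell
# Hypothetical helper: returns $true while a crawl is starting, running, or completing.
# Paused, pausing, resuming, and stopping statuses are deliberately treated as
# "not in progress" here; adjust the list if your scheduling logic needs otherwise.
function Test-CrawlInProgress {
    param([string]$CrawlStatus)

    $activeStatuses = "CrawlStarting", "CrawlingIncremental", "CrawlingFull", "CrawlCompleting"
    return ($activeStatuses -contains $CrawlStatus)
}
```

A call such as Test-CrawlInProgress $ContentSource.CrawlStatus could then replace a long chain of -or comparisons.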

OK. You can now check the status and start a crawl accordingly. How do you make it event-driven? Here's the logical sequence you have to follow:

  1. Start the crawl of a content source
  2. Wait till it's finished
  3. Take the next content source and repeat steps 1 and 2 until you're finished with each content source
  4. Repeat this sequence
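The sequence above can be sketched as a small driver function. This is only an illustration under stated assumptions: Invoke-CrawlSequence, its parameters, and the content source names in the usage note are hypothetical, and the script block passed in is expected to block until the crawl of the given content source has finished (as the Crawl function below does).

```powershell
# Hypothetical driver: runs a blocking crawl action for each content source in turn,
# then repeats the whole sequence for the requested number of rounds.
function Invoke-CrawlSequence {
    param(
        [string[]]$ContentSourceNames,   # content sources, in crawl order
        [scriptblock]$CrawlAction,       # must return only when the crawl is done
        [int]$Rounds = 1                 # how many times to repeat the sequence
    )

    for ($round = 1; $round -le $Rounds; $round++) {
        foreach ($name in $ContentSourceNames) {
            # Steps 1 and 2: start the crawl and wait until it is finished
            & $CrawlAction $name
        }
    }
}
```

For example, Invoke-CrawlSequence -ContentSourceNames "Intranet", "File Shares" -CrawlAction { param($n) Crawl $n } would crawl the two (made-up) content sources one after another, with no idle time between them.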

If you want clean code, the first step is to wrap this logic in a function. Here's my first one:

function Crawl {
    param([string]$ContentSourceName)

    # Look up the content source ($SSA must already be set, as shown earlier)
    $ContentSource = $SSA | Get-SPEnterpriseSearchCrawlContentSource -Identity $ContentSourceName
    $CrawlStarted = Get-Date

    # Start a crawl only if the content source is idle
    if ($ContentSource.CrawlStatus -eq "Idle") {
        $ContentSource.StartIncrementalCrawl()
        Start-Sleep -Seconds 1
        Write-Host "$ContentSourceName - Crawl Starting..."

        do {
            Start-Sleep -Seconds 60    # Display the crawl status every 60 seconds
            $Now = Get-Date
            $Duration = $Now.Subtract($CrawlStarted)    # Duration of the current crawl
            $Speed = $ContentSource.SuccessCount / $Duration.TotalSeconds    # Speed of the current crawl, docs/sec
            # Status line: name, status, time, success/warning/error counts, speed
            Write-Host ("{0} - {1} {2} - {3}/{4}/{5} ({6:N2} doc/sec)" -f $ContentSourceName, $ContentSource.CrawlStatus, $Now, $ContentSource.SuccessCount, $ContentSource.WarningCount, $ContentSource.ErrorCount, $Speed)
        } while (($ContentSource.CrawlStatus -eq "CrawlStarting") -or
                 ($ContentSource.CrawlStatus -eq "CrawlCompleting") -or
                 ($ContentSource.CrawlStatus -eq "CrawlingIncremental") -or
                 ($ContentSource.CrawlStatus -eq "CrawlingFull"))

        Write-Host "$ContentSourceName - Crawling Finished"
        Write-Host ""
    }
}

This is how you can call this function:

Crawl "My Content Source"

Some additional steps you might need:

  • If you want to run this script once a day (you need only daily incrementals, but want them done as quickly as possible), just schedule this script as a Windows task.
  • If you want to run this script during the day only (perhaps you need to release the resources for other jobs at night), you can add “start in the morning” and “stop in the evening” logic. I’ve made a simple example in my blog post a few months ago.
  • If you want to run this sequence all day long, you might insert this logic into an infinite loop. (But be careful, sometimes you’ll need to run full crawl and then you have to stop running this script.)
  • You can insert some other steps into this script too. If you want to do something (logging, sending some alerts, etc.) when the crawl starts / stops, just do that here. It’ll be your custom event handler on the crawl events.
  • You can even write the output of this script to a file, so that you’ll have your own crawl log.
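Two of these ideas, the daytime-only window and a custom crawl log, can be sketched with a pair of small helpers. Both function names, the default hours (6 to 20), and the log line format are assumptions chosen for illustration; they are not taken from the blog post mentioned above.

```powershell
# Hypothetical helper: $true when the given time falls inside the allowed crawl window.
# The default window of 06:00 to 20:00 is only an example value.
function Test-InCrawlWindow {
    param(
        [datetime]$Time = (Get-Date),
        [int]$StartHour = 6,
        [int]$EndHour = 20
    )

    return (($Time.Hour -ge $StartHour) -and ($Time.Hour -lt $EndHour))
}

# Hypothetical helper: appends a timestamped line to a log file of your choice.
function Write-CrawlLog {
    param(
        [string]$Message,
        [string]$Path
    )

    $line = "{0:yyyy-MM-dd HH:mm:ss}  {1}" -f (Get-Date), $Message
    Add-Content -Path $Path -Value $line
}
```

A loop such as while (Test-InCrawlWindow) { Crawl "My Content Source" } would then keep crawling only during the day, and Write-CrawlLog could be called wherever the Crawl function currently writes to the host.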

The above scripts work fine with both SharePoint Search and FAST Search for SharePoint.

Enjoy!

