Export the SharePoint 2010 Search Crawl Log

Needed to do this today: I’d set up a FAST Search crawl of a public-facing website, and with a lot of errors coming back I needed to do some serious analysis of the issues. Of course, there is no “Export to Excel” tab, which would have been great. After hunting around, I decided that the CodePlex project written for SharePoint 2007 probably wouldn’t work (though I never actually confirmed this), as I’m assuming things like DLL versions will be a bit off; being CodePlex, though, you could of course recompile it against the latest DLLs.

Better still, “how about a bit of PowerShell?” I thought, seeing as I’m getting over my fear of that beastie, and sure enough there is an excellent blog post from the SharePoint escalation team which gives a great example of how to export the crawl log. I took the liberty of pinching that code and altering it slightly to output a CSV text file, so it goes straight into Excel nicely. Here’s the code:

#============================================================================
# PowerShell script to pull all the crawl log entries and export them,
# grouped by errorId.
#============================================================================
[IO.Directory]::SetCurrentDirectory((Convert-Path (Get-Location -PSProvider FileSystem)))

# Replace the name below with the exact name of the SSA that you browse to
# for viewing the crawl log. With FAST you have multiple Search SSAs, so
# specify the name of the SSA that you use to view the crawl log data.
$ssa = Get-SPEnterpriseSearchServiceApplication | Where-Object {$_.Name -eq "UoN FAST Search Connector Service Application"}

# This should list only one SSA object.
$ssa

# Create a LogViewer object associated with that SSA.
$logViewer = New-Object Microsoft.Office.Server.Search.Administration.LogViewer $ssa

# Get a list of all errors/warnings in the crawl log.
$ErrorList = $logViewer.GetAllStatusMessages() | Select ErrorId

# Build the output file name and create the file if it doesn't already exist.
$date = (Get-Date).ToString('yyyyMMdd')
$logFile = "$date-crawllog.csv"
if (!(Test-Path $logFile))
{
    $logFile = New-Item -Type File $logFile
}

# Loop through each type of error and pull that data.
foreach ($errorId in $ErrorList)
{
    $crawlLogFilters = New-Object Microsoft.Office.Server.Search.Administration.CrawlLogFilters
    # Filter based on the error id.
    $crawlLogFilters.AddFilter("MessageId", $errorId.errorId)
    "Pulling data for Message ID : " + $errorId.errorId

    # GetCurrentCrawlLogData returns one page of rows (50 by default) in a
    # DataTable, and sets $nextStart (passed by reference) to the index of
    # the next page, or -1 when there are no more rows.
    $nextStart = 0
    $urls = $logViewer.GetCurrentCrawlLogData($crawlLogFilters, ([ref] $nextStart))
    while ($true)
    {
        for ($i = 0; $i -lt $urls.Rows.Count; $i++)
        {
            # Just output the URL, the error message (if any) and the error
            # level. Quote the first two fields, as they may contain commas.
            [System.String]::Format('"{0}","{1}",{2}', $urls.Rows[$i].DisplayUrl, $urls.Rows[$i].ErrorMsg, $urls.Rows[$i].ErrorLevel) | Out-File $logFile -Append
        }
        if ($nextStart -eq -1) { break }
        # Ask for the next page of results.
        $crawlLogFilters.AddFilter("StartAt", $nextStart)
        $nextStart = 0
        $urls = $logViewer.GetCurrentCrawlLogData($crawlLogFilters, ([ref] $nextStart))
    }
}
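
If you save that as, say, Export-CrawlLog.ps1 (a file name I’ve picked purely for illustration) and run it from the SharePoint 2010 Management Shell, the CSV can go straight into Excel, or back into PowerShell for a quick look:

.\Export-CrawlLog.ps1
# The export has no header row, so supply one on import; the date in the
# file name will be whatever day you ran the script.
Import-Csv .\20110513-crawllog.csv -Header Url,ErrorMsg,ErrorLevel | Out-GridView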

Note that the LogViewer class is deprecated, but I’m assuming something will replace it.
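
If the deprecation bothers you, the same namespace also has a CrawlLog class whose GetCrawledUrls method returns a DataTable and takes the filters as plain parameters, including a date range and a content source id. A minimal sketch, on the assumption that the signature (getCountOnly, maxRows, urlQueryString, isLike, contentSourceID, errorLevel, errorID, startDateTime, endDateTime) matches your build, so do check before relying on it:

$crawlLog = New-Object Microsoft.Office.Server.Search.Administration.CrawlLog $ssa
# Passing -1 for contentSourceID, errorLevel and errorID means "don't filter
# on that value"; the MinValue/MaxValue dates leave the date range wide open.
$rows = $crawlLog.GetCrawledUrls($false, 10000, "", $false, -1, -1, -1, [System.DateTime]::MinValue, [System.DateTime]::MaxValue)
$rows.Rows.Count

Hope this helps somebody.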

Cheers

Dave Mc



7 Responses to Export the SharePoint 2010 Search Crawl Log

  1. Pingback: Blog del CIIN

  2. Pingback: SharePoint 2010: Recopilatorio de enlaces interesantes (XXIII)! « Pasión por la tecnología…

  3. brad says:

    Hi Dave, I found your script helpful, so thanks for the post! A couple of questions.

    1) Is there any way to export crawl logs for a particular date range? We are crawling some legacy (non-Microsoft) systems that don’t always bring back what we expect in search, i.e. some things are not included, and some things are included that we would expect to be ignored by crawl rules. So we are not just interested in errors but in everything that was crawled. We tend to do small tests with rules and compare what we expect against what was actually crawled, so exporting on a date range would be ideal.

    2) How does SharePoint determine how far back to keep crawl logs? You can apply filters, but say I filter on a content source (if that is possible) and also an error id: how does SharePoint know how far back to interrogate the logs, i.e. just the last crawl, or all of them, etc.? What does “current” mean in GetCurrentCrawlLogData()?

    Any help appreciated.
    Brad

    • Hi Brad, all good questions. I don’t off-hand know of a way to filter by date, and I’m unaware of how to specify the length of time a crawl log is kept. I know that the logs are reset when the index itself is reset, but you probably know that yourself :-). If I’m looking for specific content amongst the crawl logs, not just errors, I export to a text file and then use Notepad++ (http://notepad-plus-plus.org/) as my search tool; it has a fantastic search capability and I highly recommend it.
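
      That said, it might be worth experimenting with the date overload that CrawlLogFilters seems to expose. A minimal, untested sketch, assuming an AddFilter(DateTime, DateTime) overload exists in your build:

      $filters = New-Object Microsoft.Office.Server.Search.Administration.CrawlLogFilters
      # Assumption: this overload restricts results to the start/end window.
      $filters.AddFilter((Get-Date).AddDays(-7), (Get-Date))
      $nextStart = 0
      $urls = $logViewer.GetCurrentCrawlLogData($filters, ([ref] $nextStart))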

  4. Ben says:

    Hi Dave,
    Very helpful for me. Just a question: I am using SharePoint Server 2010 and have three target data sources to crawl. Your script seems to have exported the crawl log for “Local SharePoint sites” only; how do I specify the data source in the script? One other observation: in the export file, it seems that “warning” log items are counted as “crawled” with status 0.
    Could you please help me?
    Ben
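
    • Hi Ben, I haven’t tried this myself, so treat it as an untested sketch rather than a confirmed answer: the GetCrawledUrls alternative mentioned at the end of the post takes a contentSourceID parameter, and the standard Get-SPEnterpriseSearchCrawlContentSource cmdlet will list the ids to pass in:

      # List the content sources (and their Ids) defined against the SSA.
      Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa | Select-Object Id, Name
      # Then pass the Id of the source you want as the contentSourceID
      # argument (the 2 below is just a placeholder id).
      $crawlLog = New-Object Microsoft.Office.Server.Search.Administration.CrawlLog $ssa
      $rows = $crawlLog.GetCrawledUrls($false, 10000, "", $false, 2, -1, -1, [System.DateTime]::MinValue, [System.DateTime]::MaxValue)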

  5. Pingback: Track the Output of SharePoint Fast Search Crawl Logs : Beyond Search
