Storage locations for files gathered by the Crawl Component in SharePoint 2013

When gathering files from a content source, the SharePoint 2013 Crawl Component can be a very I/O-intensive process: it locally writes all of the files it gathers from content repositories to its temporary file paths, where they are then read by the Content Processing Component during document parsing. This post explains where the Crawl Components write temporary files, which can help in planning and performance troubleshooting (e.g. why does disk performance of my C:\ drive get so bad, or worse, why does the drive fill up, when I start a large crawl?).

By default, all Search data files will be written within the Installation Path

  • The Data Directory (by default, a sub-directory of the Installation Path) specifies the path for all Search data files including those used by I/O intensive components (Crawl, Analytics, and Index Components)
    • The Data Directory can only be configured at the time of installation (i.e. it can only be changed by uninstalling and re-installing SharePoint on the given server)
      • From the Installation Wizard, choose the “File Location” tab as seen below
      • IMPORTANT: Before uninstalling SharePoint, first modify your Search topology by removing any Search components from the applicable server. Once SharePoint is re-installed, you can once again deploy the components back to this server.
    • The defined path can be viewed in the registry:

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\15.0\Search\Setup\DataDirectory

    • Advanced Note: The path for the Index files (by default, written to the Data Directory) can be configured separately when provisioning an Index Component via PowerShell using the “RootDirectory” parameter

[Figure: installAndDataPath]
(As a side note: the graphic is only intended to display the default locations specified at install time. It is recommended to change these to a file path other than C:\ drive)

For the Crawl Component:

  • When crawling [gathering] an item, the filter daemon (mssdmn.exe – a child process of the Crawl Component that actually interfaces with an end content repository using a Search Connector/Protocol Handler) will download any applicable file blobs to the SSA’s “TempPath” (e.g. an HTML file, a Word document, a PowerPoint presentation, etc.)
    • In the graphic below, this is step 2a
    • The defined path can be viewed either:
      • In the registry (of a Crawl server)

        HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\15.0\Search\Global\Gathering Manager\TempPath

      • Or as a property of the SSA:

        $SSA = Get-SPEnterpriseSearchServiceApplication

        $SSA.TempPath

  • When the filter daemon completes the gathering of an item, it is returned to the Gathering Manager (mssearch.exe – responsible for orchestrating a crawl of a given item) and the applicable blob is moved to the “GathererDataPath”, which is a path relative to the DataDirectory mentioned above.
    • In the graphic below, this occurs in step 2b
    • The defined path can be viewed in the registry (of a Crawl server):

      HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\15.0\Search\Components\-GUID-of-theSSA-crawl-0\GathererDataPath

  • The GathererDataPath is mapped as a network share (used by the Content Processing Components)
    • The shared path can be viewed in the registry (of a Crawl server):

      HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\15.0\Search\Components\-GUID-of-theSSA-crawl-0\GathererDataShare
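
The registry locations listed above can be collected in one place. The following is a minimal Python sketch, not SharePoint tooling: it assumes that the last segment of each path shown above is a value name under the preceding key, and the function names (search_path_values, read_search_paths) and the component_key argument (the "<GUID>-crawl-0" key name) are hypothetical. The reader uses the standard winreg module and therefore only works on a Windows crawl server.

```python
# Key shared by all of the registry paths quoted in this post.
SEARCH_ROOT = r"SOFTWARE\Microsoft\Office Server\15.0\Search"


def search_path_values(component_key):
    """Map each setting to (registry key under HKLM, value name).

    component_key is the name of the crawl component's key under
    ...\\Search\\Components, e.g. "<GUID>-crawl-0" (hypothetical argument).
    """
    component = SEARCH_ROOT + "\\Components\\" + component_key
    return {
        "DataDirectory": (SEARCH_ROOT + r"\Setup", "DataDirectory"),
        "TempPath": (SEARCH_ROOT + r"\Global\Gathering Manager", "TempPath"),
        "GathererDataPath": (component, "GathererDataPath"),
        "GathererDataShare": (component, "GathererDataShare"),
    }


def read_search_paths(component_key):
    """Read the values on a Windows crawl server; missing entries become None."""
    import winreg  # imported here because the module exists only on Windows

    result = {}
    for name, (key, value) in search_path_values(component_key).items():
        try:
            with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key) as handle:
                result[name] = winreg.QueryValueEx(handle, value)[0]
        except OSError:
            result[name] = None
    return result
```

Comparing the drive letters of the returned paths is a quick way to see whether TempPath and GathererDataPath still live on C:\.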

[Figure: crawlFlow]
Usage by the Content Processing Components:

  • When the item is fed from the Crawler to the Content Processing Component (step 3 above), the item is only logically submitted to the CPC in a serialized payload of properties that represent that particular item. Any related blob remains on the Crawler and is retrieved by a later stage in the processing flow
    • For SharePoint list items, there would typically not be a blob (unless the list item had an attachment)
    • For a document in a SharePoint library, the blob would represent the item’s associated file (such as a Word document)
  • During the Document Parsing stage in the processing flow (e.g. during step 4 above), the item’s blob will be retrieved from the Crawl Component via the GathererDataShare
  • When the Crawl Component receives a callback (success or failure) from the CPC (e.g. in step 6b above after an item has been processed), the temporary blob is then deleted from the GathererDataPath
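
To make the disk activity concrete, the blob's life cycle described above (download to TempPath, move to GathererDataPath, read by the CPC, delete on callback) can be modelled with ordinary file operations. This is only an illustrative Python sketch using throwaway temporary directories; it is not how mssdmn.exe or mssearch.exe are implemented, and all function names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def gather_item(temp_path, name, content):
    """Step 2a: the filter daemon writes the downloaded blob to TempPath."""
    blob = temp_path / name
    blob.write_bytes(content)
    return blob


def hand_off(blob, gatherer_data_path):
    """Step 2b: the blob is moved from TempPath into GathererDataPath."""
    target = gatherer_data_path / blob.name
    shutil.move(str(blob), str(target))
    return target


def on_callback(blob):
    """Step 6b: after the CPC callback, the temporary blob is deleted."""
    blob.unlink()


def simulate():
    """Run one item through the life cycle and record what is on disk."""
    with tempfile.TemporaryDirectory() as root:
        temp_path = Path(root) / "TempPath"
        gatherer_data_path = Path(root) / "GathererDataPath"
        temp_path.mkdir()
        gatherer_data_path.mkdir()

        states = []
        blob = gather_item(temp_path, "0xe3cf8.aspx", b"<html></html>")
        states.append(("gathered", blob.exists()))

        shared = hand_off(blob, gatherer_data_path)
        states.append(("moved", shared.exists() and not blob.exists()))

        parsed = shared.read_bytes()  # step 4: the CPC reads via the share
        on_callback(shared)
        states.append(("deleted", not shared.exists()))
        return states, parsed
```

The point of the model is that every gathered file is written once, moved once, read once, and deleted once on the crawl server's disks, which is why the choice of drive for these paths matters during a large crawl.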

[Figure: gathererDataShare]
An example path to an item with DocID 933112 would look like the following:

file://crawlSrv/gthrsvc_7ecdbb10-3c86-4298-ab09-04f61aaeb636-crawl-0//f8/0xe3cf8_1.aspx   

#0xe3cf8 hex = 933112 decimal

Where:

  • crawlSrv is a server running a crawl component
  • gthrsvc_-GUID-of-theSearchAdminWebServiceApp--crawl-0 is the name of the crawl component
    • This GUID can be identified using the following PowerShell:

      $SSA = Get-SPEnterpriseSearchServiceApplication

      $searchAdminWeb = Get-SPServiceApplication -Name $SSA.id

      $searchAdminWeb.id

      7ecdbb10-3c86-4298-ab09-04f61aaeb636

  • And the file is renamed to the hex value of the DocID
    • For example: 0xe3cf8 hex = 933112 decimal
    • Which we can see in ULS, such as:
      • From the Crawl Component (in this case, running on server “faceman”):

        mssearch.exe     SharePoint Server Search Crawler:Content Plugin      af7zf VerboseEx

        CTSDocument: FeedingDocument: properties : strDocID = ssic://933112 key = path values =\\FACEMAN\gthrsvc_7ecdbb10-3c86-4298-ab09-04f61aaeb636-crawl-0\\f8\0xe3cf8.aspx 

      • From the Content Processing Component:

        NodeRunnerContent2-834ebb1f-009    Search    Document Parsing      ai3ef VerboseEx

        AttachDocParser – Parsing: ‘file://faceman/gthrsvc_7ecdbb10-3c86-4298-ab09-04f61aaeb636-crawl-0//f8/0xe3cf8.aspx’
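
When correlating ULS entries with files on the gatherer share, the DocID-to-file-name mapping described above can be scripted. The following is a minimal Python sketch; the function names are hypothetical, and treating the "_1" in the example path as an optional numeric suffix is an assumption based only on the samples shown in this post.

```python
import re

# Matches the blob file name at the end of a share path, e.g.
# "0xe3cf8.aspx" or "0xe3cf8_1.aspx" (the "_<n>" suffix is treated
# as optional; its meaning is not documented here).
_BLOB_NAME = re.compile(r"(0x[0-9a-fA-F]+)(?:_\d+)?\.[^./\\]+$")


def docid_to_blob_stem(doc_id):
    """Return the hex form of a DocID as used in the blob file name."""
    return hex(doc_id)


def blob_path_to_docid(path):
    """Extract the decimal DocID from a file:// or UNC gatherer share path."""
    match = _BLOB_NAME.search(path)
    if match is None:
        raise ValueError("no hex DocID found in path: " + path)
    return int(match.group(1), 16)
```

For example, docid_to_blob_stem(933112) gives '0xe3cf8', and passing either of the ULS paths above to blob_path_to_docid gives 933112, the value that appears as ssic://933112 in the crawler's log entry.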


SharePoint Health Analyzer rules reference SharePoint 2013

Crawl error Processing this item failed because of an unknown error when trying to parse its contents sharepoint

While troubleshooting search, I came across the following crawling error in the Crawl log of a SharePoint 2013 environment.

Processing this item failed because of an unknown error when trying to parse its contents. (Error parsing document ‘http://********.*****.com/Project/abcd/Q_M/ABX/SitePages/Homepage.aspx’. Sandbox worker pool is closed.; ; SearchID = *******************)

To fix this, you can try the following steps:

  • Open “Local Policies”
  • Click on “User Rights Assignment”
  • Make sure that the search service account has the following rights:
    • Replace a process level token
    • Adjust memory quotas for a process
    • Impersonate a client after authentication

Please make sure that these policies don’t get changed afterwards (for example, by a Group Policy).

After implementing the above changes, clear the configuration cache.
After clearing the cache, start a full crawl and the errors should be gone.
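
To verify the three rights without clicking through the console, one option is to export the local policy with `secedit /export /cfg <file> /areas USER_RIGHTS` and inspect its [Privilege Rights] section. The following Python sketch parses the text of such an export; the function names are hypothetical, it expects the account exactly as it appears in the export (usually a SID), and it does not resolve group membership, so a right granted through a group will be reported as missing.

```python
# Privilege constants for the three rights named above.
REQUIRED_RIGHTS = {
    "SeAssignPrimaryTokenPrivilege": "Replace a process level token",
    "SeIncreaseQuotaPrivilege": "Adjust memory quotas for a process",
    "SeImpersonatePrivilege": "Impersonate a client after authentication",
}


def parse_privilege_rights(inf_text):
    """Return {privilege constant: set of holders} from a secedit export."""
    rights = {}
    in_section = False
    for raw in inf_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Track whether we are inside the [Privilege Rights] section.
            in_section = line.lower() == "[privilege rights]"
            continue
        if in_section and "=" in line:
            name, _, holders = line.partition("=")
            rights[name.strip()] = {
                holder.strip().lstrip("*")  # SIDs are exported as "*S-1-..."
                for holder in holders.split(",")
                if holder.strip()
            }
    return rights


def missing_rights(inf_text, account):
    """List the required rights that the account does not hold directly."""
    rights = parse_privilege_rights(inf_text)
    return [
        label
        for constant, label in REQUIRED_RIGHTS.items()
        if account not in rights.get(constant, set())
    ]
```

Note that secedit typically writes the export as UTF-16, so the file needs to be read with that encoding before its text is passed to missing_rights. An empty list means all three rights are assigned directly to the account.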

Retrieve account password PowerShell

Start SharePoint Service Application Proxy using PowerShell

If your Usage and Health Data Collection Proxy is in a stopped state, here is a quick bit of PowerShell to get it started:

$sap = Get-SPServiceApplicationProxy | Where-Object {$_.TypeName -eq "Usage and Health Data Collection Proxy"}
$sap.Provision()

The above can easily be adapted to allow you to start any Service Application Proxy

Slow SharePoint improve performance without upgrading hardware

Here is what you can do if your SharePoint is sometimes very slow, for example:

On the first load of a site
Sometimes during the day, when a search query takes about a minute to return results

Take a look at this article: http://support.microsoft.com/kb/2625048

Implementing both solutions it describes will massively improve perceived performance (site response times).

Disabling the CRL check is only necessary if the SharePoint server does not have internet connectivity. If the server should reach the internet instead, proxy settings must be configured for the server itself (http://technet.microsoft.com/de-de/library/bb430772(v=exchg.141).aspx), and your proxy must of course allow traffic from the server.

SharePoint shortcut URL and hidden list

Users and Permissions:
People and Groups: _layouts/people.aspx
Site Collection Admins: _layouts/mngsiteadmin.aspx
Advanced Permissions: _layouts/user.aspx
Master Pages: _Layouts/ChangeSiteMasterPage.aspx
Look and Feel:
Quick Launch settings page: /_layouts/quiklnch.aspx
Title, Desc, and Icon: _layouts/prjsetng.aspx
Navigation: _layouts/AreaNavigationSettings.aspx
Page Layout and Site Templates: _Layouts/AreaTemplateSettings.aspx
Welcome Page: _Layouts/AreaWelcomePage.aspx
Tree View: _layouts/navoptions.aspx
Top Nav Bar: _layouts/topnav.aspx
Site Theme: _layouts/themeweb.aspx
Reset to Site Definition: _layouts/reghost.aspx
Searchable Columns: _Layouts/NoCrawlSettings.aspx
Site Content Types: _layouts/mngctype.aspx
Galleries
Site Columns: _layouts/mngfield.aspx
Site Templates: _catalogs/wt/Forms/Common.aspx
List Templates: _catalogs/lt/Forms/AllItems.aspx
Filter toolbar for Lists and libraries: ?Filter=1
Web Parts: _catalogs/wp/Forms/AllItems.aspx
Workflows: _layouts/wrkmng.aspx
Workflow history hidden list: /lists/Workflow History
Master Pages and Page Layouts: _catalogs/masterpage/Forms/AllItems.aspx
Regional Settings: _layouts/regionalsetng.aspx
Site Administration
Recreate default site sp groups: _layouts/15/permsetup.aspx
Recycle Bin: _layouts/RecycleBin.aspx
Site Libraries and Lists: _layouts/mcontent.aspx
Site Usage Report: _layouts/usageDetails.aspx
User Alerts: _layouts/sitesubs.aspx
RSS: _layouts/siterss.aspx
Search Visibility: _layouts/srchvis.aspx
Sites and Workspaces: _layouts/mngsubwebs.aspx
Site Features: _layouts/ManageFeatures.aspx
Delete This Site: _layouts/deleteweb.aspx
Site Output Cache: _Layouts/areacachesettings.aspx
Content and Structure: _Layouts/sitemanager.aspx
Content and Structure Logs: _Layouts/SiteManager.aspx?lro=all
Search Settings: _layouts/enhancedSearch.aspx
Site Collection Administration
Search Scopes: _layouts/viewscopes.aspx?mode=site
Search Keywords: _layouts/listkeywords.aspx
Recycle Bin: _layouts/AdminRecycleBin.aspx
Site Collection Features: _layouts/ManageFeatures.aspx?Scope=Site
Site Hierarchy: _layouts/vsubwebs.aspx
Site hierarchy page (lists of sub sites): /_layouts/1033/vsubwebs.aspx
Portal Site Connection: _layouts/portal.aspx
Site Collection Audit Settings: _layouts/AuditSettings.aspx
Site Collection Policies: _layouts/Policylist.aspx
Site Collection Cache Profiles: Cache%20Profiles/AllItems.aspx
Site Collection Output Cache: _Layouts/sitecachesettings.aspx
Site Collection Object Cache: _Layouts/objectcachesettings.aspx
Variations: _Layouts/VariationSettings.aspx
Variation Labels: _Layouts/VariationLabels.aspx
Translatable Columns: _Layouts/TranslatableSettings.aspx
Variation Logs: _Layouts/VariationLogs.aspx
Site Settings: _layouts/settings.aspx
Delete user from Site collection (on-premises): /_layouts/15/people.aspx?MembershipGroupId=0
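
Since the shortcuts above are written inconsistently (some with a leading slash, some without, and a few as query strings), a small helper can normalize them when building links. This is a minimal Python sketch; shortcut_url is a hypothetical name, and intranet.example.com in the usage note is a placeholder host, not a real site.

```python
def shortcut_url(base_url, shortcut):
    """Combine a site (or page) URL with one of the shortcuts listed above.

    Shortcuts that start with "?" (such as "?Filter=1") are appended
    directly to the URL; all others are treated as paths relative to
    the site, regardless of whether they start with a slash.
    """
    base = base_url.rstrip("/")
    if shortcut.startswith("?"):
        return base + shortcut
    return base + "/" + shortcut.lstrip("/")
```

For example, shortcut_url("https://intranet.example.com/sites/team/", "/_layouts/quiklnch.aspx") yields "https://intranet.example.com/sites/team/_layouts/quiklnch.aspx". Plain string joining is used on purpose: urllib.parse.urljoin would treat a leading slash as relative to the host root and drop the "/sites/team" part.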

Load the Document tab of the ribbon initially:
?InitialTabId=Ribbon.Document

Display list in grid view. ‘True’ is case sensitive:
?ShowInGrid=True

Sandboxed Solution Gallery:
/_catalogs/solutions/Forms/AllItems.aspx

Site usage page:
/_layouts/usage.aspx

View all site content page (Site content):
/_layouts/viewlsts.aspx

Get the version of the SharePoint server (Patch level):
/_vti_pvt/Service.cnf

Web Part Maintenance Page:
?Contents=1

Show Page in Dialog View:
?isdlg=1

Application page for registering SharePoint apps
/_layouts/15/appregnew.aspx

Save Site as a template
/_layouts/savetmpl.aspx

Sign in as a different user
/_layouts/closeConnection.aspx?loginasanotheruser=true

Enable SharePoint designer
/_layouts/SharePointDesignerSettings.aspx

Quick Deploy List
Quick%20Deploy%20Items/AllItems.aspx

Open Page in Edit Mode
?ToolPaneView=2

Taxonomy Hidden List (MMS)
Lists/TaxonomyHiddenList/AllItems.aspx

User Information List:
_catalogs/users
_catalogs/users/simple.aspx

Force displaying the user profile in the site collection:
/_layouts/userdisp.aspx?id={UserID}&Force=True

Site hierarchy page (lists of sub sites)
/_layouts/vsubwebs.aspx
/_layouts/1033/vsubwebs.aspx

Add Web Parts Pane: ?ToolPaneView=2 : Add to the end of the page URL; will only work if the page is already checked out
Create: [area]/_layouts/spscreate.aspx
Create: /_layouts/create.aspx

Create list in a different portal area:

/_layouts/new.aspx?NewPageFilename=YourTemplateName.stp&ListTemplate=100&ListBaseType=0

When you save a template in a portal area and try to create a new list in a different portal area, the template will not show on the Create page. Use this URL to force it to show.

Documents and Lists: /_layouts/viewlsts.aspx

List Template Gallery: /_catalogs/lt

Manage Audiences: /_layouts/Audience_Main.aspx

Manage Cross Site Groups: /_layouts/mygrps.aspx

Manage List Template Gallery: /_catalogs/lt/Forms/AllItems.aspx

Manage My Alerts: /_layouts/MySubs.aspx

Manage People: /_layouts/people.aspx

Manage Site Collection Administrators: /_layouts/mngsiteadmin.aspx

Manage Site Collection Users:
/_layouts/siteusrs.aspx : To access you must be an admin on the server or a site collection admin for the site.

Manage Site Groups: /_layouts/role.aspx

Manage Site Template Gallery: /_catalogs/wt/Forms/AllItems.aspx

Manage Site Template Gallery: /_catalogs/wt/Forms/Common.aspx

Manage Sites and Workspaces: /_layouts/mngsubwebs.aspx

Manage User Alerts: /_layouts/AlertsAdmin.aspx

Manage User Alerts: /_layouts/SiteSubs.aspx

Manage User Permissions: /_layouts/user.aspx

Manage Web Part Gallery: /_catalogs/wp/Forms/AllItems.aspx

Master Page Gallery: /_catalogs/masterpage : Also includes Page Layouts

Modify Navigation: /_layouts/AreaNavigationSettings.aspx

Recycle Bin: /_layouts/AdminRecycleBin.aspx

Save as site template: /_layouts/savetmpl.aspx

Site Column Gallery: /_layouts/mngfield.aspx

Site Content and Structure Manager: /_layouts/sitemanager.aspx

Site Content Types: /_layouts/mngctype.aspx

Site Settings: /_layouts/settings.aspx

Site Settings: /_layouts/default.aspx

Site Template Gallery: /_catalogs/wt

Site Theme: /_layouts/themeweb.aspx

Site usage details: /_layouts/UsageDetails.aspx

Site Usage Summary: /_layouts/SpUsageWeb.aspx

Site Usage Summary: /_layouts/Usage.aspx

Sites Registry: /SiteDirectory/Lists/Sites/Summary.aspx

Top-level Site Administration: /_layouts/webadmin.aspx

User Information: /_layouts/userinfo.aspx

Web Part Gallery: /_catalogs/wp

Web Part Page Maintenance: ?contents=1 : Add to the end of the page URL
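
Several of the shortcuts above are query-string flags that are added to the end of a page URL (?Contents=1, ?isdlg=1, ?ToolPaneView=2, ?ShowInGrid=True). When the page URL already has a query string, the flag has to be merged rather than concatenated. The following is a minimal Python sketch using only the standard library; add_page_flag is a hypothetical name and the host in the usage note is a placeholder.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_page_flag(page_url, flag):
    """Add a flag such as "?Contents=1" to a page URL.

    Existing query parameters are kept; a parameter with exactly the
    same name as the flag is replaced so it is not duplicated.
    """
    name, _, value = flag.lstrip("?").partition("=")
    parts = urlsplit(page_url)
    query = [
        (key, val)
        for key, val in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    query.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, add_page_flag("https://intranet.example.com/Lists/Tasks/AllItems.aspx?View=abc", "?Filter=1") returns the same URL with "?View=abc&Filter=1" as its query string. The flag's value is passed through unchanged, which matters for ShowInGrid, where ‘True’ is case sensitive.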