Sunday, November 22, 2009

Content type features revisited

In one of my previous postings I discussed a custom solution that updates feature-created content types and pushes the changes down to the destination lists. Since we already have the public beta of SharePoint 2010, I was curious how this problem is handled in the new version. The first thing I noticed was the new Overwrite attribute in the content type element definition. I expected some really good news from it, but after some further investigation my conclusions are a bit mixed. So, before getting to the new stuff about content types in SharePoint 2010 (which is not that much, actually), let me take a step back and dive deep into the intricacies of content type definitions, so that we can see all the possible issues with them: the problem with propagating (pushing down) changes to the destination lists, and the changes introduced in SharePoint 2010. As I mentioned already, the changes in 2010 are not that big and they are not the main focus of this article. Let me start with some terminology. It is not officially sanctioned, but strict definitions will help in the explanations that follow:

Some terminology

First thing – the type of content types depending on their scope:

  1. site content type – this is merely a schema definition with no direct relation to the data – you see these in the content type gallery of the site
  2. list content type – this is a content type in a list – its schema has direct representation in the list fields

Second – the type of site content types depending on the method of creating it:

  1. feature content type – this is a site content type created with a feature that has a ContentType element
  2. object model (OM) content type – this is a site content type created using the SharePoint object model – namely the SPContentType and SPContentTypeCollection classes

Third – two groups of feature content types depending on the source of their schema:

  1. ghosted content type – its schema is read directly from the elements XML file in the feature that created the content type
  2. unghosted content type – the schema is read from the content database – a ghosted content type can get unghosted if something in its schema gets changed using the SharePoint OM.

And one more term: phantom field links – these are elements in the FieldLinks collection of a list content type that do not have corresponding list fields (i.e. they are not provisioned) – I will explain later how you can end up with them.

Content type inheritance

And several words about the content type inheritance – this is something essential in respect to pushing down site content type updates to list content types. I won’t get into details here about how content type inheritance can be defined in the feature elements file using the ID attribute of the ContentType element – you can check the corresponding SharePoint SDK documentation on that.

First off – inheritance may be not even the correct word for this type of relation, at least if you compare it to class inheritance in programming languages. So, for content types – depending on the scope of the content type there’re some restrictions: only site content types can be inherited (this means that you can’t have a list content type inheriting another list content type). And list content types although bearing the same names as the site content types used to create them are actually inheritors of these site content types – so basically all rules of “inheritance” apply for them too. The inheritance relation for content types is actually rather loose. The most obvious case to see the content type “inheritance” in action is when you create a new child content type (using a feature or the object model alike) – the result is basically that the whole schema definition of the parent content type is copied to the child content type (for the feature creation case – the schema specified for the child type is then merged with the schema inherited from the parent). When it comes to updates it becomes a little bit stranger – you have two modes here – propagating updates – meaning that the changes in the parent content type will be applied to its descendants and non-propagating changes – which won’t be applied to the descendants. What’s more – every element in the schema of the child content type is modifiable including the inherited parts which means that they are nothing more than a separate and independent from the parent’s definition copy in the child content type schema. You can imagine that using non-propagating updates and modifying both the parent and child content types these can get totally different. So much for inheritance here.

And about the propagating updates: these are actually more like the publisher-subscriber relation in a database replication model than class-like inheritance. The main rule here is that only immediate changes are propagated to the descendant content types. By immediate changes I mean the changes that you make with the object model to an SPContentType object after you get it from the parent web and before you call its Update method with the push-down flag set. This means that earlier changes made to the parent content type with non-propagating updates will never get propagated afterwards.

A small example: consider two content types, A with field links a, b and c, and its child B with the same field links. If we remove field a from A, add field d to it and update it without the propagation option, the fields of B will still be a, b and c. If we then remove field b and add field e to A and update with pushing down the changes, the fields of B will be a, c and e (while A will have c, d and e). So you see that the propagation doesn’t work en masse: you can’t force an exact copy of the parent content type onto the child, and this holds not only when you’ve modified the child content type, but also when you’ve modified the parent with non-propagating updates. In the above example, if you want to force the removal of field a and the addition of d in B, you have to use this rather awkward workaround: add a again and remove d from A with a non-propagating update, then remove a and add d again with a propagating update. Actually, with this kind of “trick” you can remove any field in the descendant content types, even ones that were never present in A (meaning ones that the descendants never “inherited” from A when they were created).
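The A/B example above can be sketched with the object model roughly like this. It is only an illustration: the class and method names are mine, and it assumes that the fields are site columns defined on the same web and are addressed by their internal names.

```csharp
using System;
using Microsoft.SharePoint;

public static class ContentTypePropagationSketch
{
    // Hypothetical helper: removes one field link from a site content type,
    // adds another one, and saves with or without pushing the change down.
    public static void SwapFieldLink(SPWeb web, string contentTypeName,
        string fieldToRemove, string fieldToAdd, bool pushDown)
    {
        SPContentType ct = web.ContentTypes[contentTypeName];
        if (ct == null)
            throw new ArgumentException("Content type not found: " + contentTypeName);

        // Both changes happen between getting the content type and calling
        // Update, so they count as "immediate changes" for this update.
        ct.FieldLinks.Delete(fieldToRemove);

        // Assumption: the field is a site column on this web.
        SPField field = web.Fields.GetFieldByInternalName(fieldToAdd);
        ct.FieldLinks.Add(new SPFieldLink(field));

        // true = propagate to descendant content types, false = parent only.
        ct.Update(pushDown);
    }
}
```

Calling SwapFieldLink(web, "A", "a", "d", false) and then SwapFieldLink(web, "A", "b", "e", true) corresponds to the two steps of the example: only the second pair of changes reaches B.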

Another example of the “looseness” of the content type “inheritance” is that there’re cases in which you can get the parent content type deleted and this won’t affect the child content type the least – when you check the Parent property of the child type you will see the closest existing ancestor of the type after the deleted former parent (if none more derived exists you will see the standard Item content type).

So what’s the verdict? Pushing down updates of site content types (this applies only to site content types, since list content types cannot be inherited) is a rather good feature, but it can’t ensure in all cases a correct propagation of the changes in the parent content type to its descendants (correct in the sense of getting the exact fields, or a subset of the fields, in the descendants). Even if you use the brute force approach of first deleting all fields in the parent content type and then adding them again, there will be cases when the child content type ends up with different fields in its schema. And this holds for the scenarios in which the creating content type feature stays unmodified and you use another feature or tool to modify the content types and propagate the changes. When it comes to updating (reapplying) the creating feature it may get even worse. I will now explain why:

Activating the content type feature

This is actually about activating the content type creating feature repeatedly with the force flag set. In SharePoint 2010 there is the new Overwrite attribute for the ContentType feature element, which mitigates some of these issues, and I will describe it later. Here I will explain the common issues of reactivating a content type feature with modified content type definitions on both SharePoint 2007 and SharePoint 2010. First, let me mention a big warning about this in MSDN: the recommendation is (was?) not to update the original content type definition files but to use new ones if necessary (I guess this was the reason to introduce the Overwrite attribute in the first place).

Ok, so what happens to the ghosted site content types when you modify the creating feature and activate it again using the force parameter? Well, they simply get updated (the same actually happens if you just modify the elements file and check the definitions after recycling the application pool; the difference is that with the feature reactivation the new content type definitions also get provisioned on the site). As you may expect, the unghosted site content types (i.e. feature content types that were subsequently modified with the object model) do not get modified in this case. And what happens to the list content types inheriting the feature site content types? At first glance, nothing: they seem exactly the same as before the update. But when you check the SPContentType object and specifically its FieldLinks collection you see some strange things: all new fields that were added to the content type definitions also appear here, no matter whether the parent site content type is ghosted or unghosted. These are the field links that I called phantom field links in the terminology section. The problem with them is that they exist in the list content type definition but there are no matching list fields. On the surface nothing seems wrong, at least in the SharePoint UI. But when you try to add such a field to the content type (and consequently to the list), nothing happens, and it can be quite frustrating if you don’t know the reason: the system just detects that the field link is already in the content type definition (though you don’t see it in the UI since it’s not provisioned) and refuses to add it again. To resolve the problem you will need to add the field to the list directly or perform some complex acrobatics with adding and removing fields in the parent site content type and propagating the changes to the descendant.
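Since phantom field links are invisible in the UI, a small diagnostic routine can help to spot them. Here is a minimal sketch; the class and method names are hypothetical, and it assumes that comparing the field link name against the list fields (by internal name) is enough to tell provisioned links from phantom ones.

```csharp
using System.Collections.Generic;
using Microsoft.SharePoint;

public static class PhantomFieldLinkSketch
{
    // Hypothetical helper: returns "content type: field link" entries for
    // field links that have no matching field in the list.
    public static List<string> FindPhantomFieldLinks(SPList list)
    {
        List<string> phantoms = new List<string>();

        foreach (SPContentType ct in list.ContentTypes)
        {
            foreach (SPFieldLink link in ct.FieldLinks)
            {
                // The link exists in the content type schema, but no list
                // field with that name was ever provisioned.
                if (!list.Fields.ContainsField(link.Name))
                {
                    phantoms.Add(ct.Name + ": " + link.Name);
                }
            }
        }

        return phantoms;
    }
}
```

An empty result for a list means that every field link of its content types is backed by a real list field.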

The case with the phantom field links is quite worrying: you have your list content types (no matter whether they were created from ghosted or unghosted site content types, or whether they themselves were further modified) and a simple change to the original creating feature manifest file changes their schema. Why the SharePoint designers introduced this schema “merging” logic is beyond my knowledge. Now the MSDN recommendation against modifying the feature element manifest seems quite justified (the Overwrite attribute quite nicely fixes the phantom field links issue and the problem with updating at least the site content types, see below).

Deactivating the content type feature

I think there is some interesting information about the deactivation of the content type features. First, you have two options here, to deactivate with or without the force parameter, and yes, there is a difference. When you don’t specify the force parameter, only the content types without inheritors (including inheriting list content types) get deleted. When you specify the force parameter, all feature site content types get deleted. And what about the inheriting list content types? They remain intact, just their Parent property returns a higher ancestor or ultimately the standard Item content type. So, as I mentioned in the “inheritance” paragraph, the inheritance concept for content types is pretty loose. Actually, when you activate the feature again and the site content types are created once again, the parent relation is “fixed” to point to the original parent. And what happens with the phantom field links? Well, it seems that they are quite persistent and won’t disappear unless you uninstall the creating feature from the farm, which is quite nasty stuff. As you see, by deactivating with the force parameter and activating the feature again you can get your site content types updated (actually overwritten, since you will lose all your modifications to the unghosted site content types), so at least this is some alternative to the Overwrite attribute for SharePoint 2007.
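The forced deactivate/activate cycle can also be run from code. The following is just a sketch under the assumption that you pass in the feature collection of the scope where the feature lives (for example site.Features for a site collection scoped feature); the helper name and the feature ID are placeholders.

```csharp
using System;
using Microsoft.SharePoint;

public static class ContentTypeFeatureSketch
{
    // Hypothetical helper: deactivates and reactivates a feature, both with
    // the force flag. As described above, this overwrites the feature site
    // content types, so OM modifications to unghosted ones are lost.
    public static void ReactivateWithForce(SPFeatureCollection features, Guid featureId)
    {
        // force = true: all feature site content types get deleted,
        // even the ones that have inheritors.
        features.Remove(featureId, true);

        // Activating again recreates the site content types from the
        // current element manifest.
        features.Add(featureId, true);
    }
}
```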

The Overwrite ContentType element attribute in SharePoint 2010

With all the aforementioned problems with feature site content types you may want a facility that creates a content type using the object model, so that it never relies on element manifest files in the file system. Well, this is what the Overwrite attribute actually does. A content type created with the Overwrite attribute can still be regarded as a feature content type, since it gets removed when the feature is deactivated, but in all other respects it behaves as an OM content type (this is also the case with content types created with a sandboxed solution). In case you are interested in creating your own content types programmatically: the SPContentType class in SharePoint 2010 now has a public constructor in which you can specify an exact SPContentTypeId, whereas in SharePoint 2007 the content type ID of new content types created with the object model is auto-generated.
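Here is a minimal sketch of creating a site content type with an explicit ID through the SharePoint 2010 object model. The helper name is hypothetical, and the ID you pass in is assumed to be a valid content type ID that derives from an existing parent (for example the ID of Item followed by "00" and a GUID).

```csharp
using Microsoft.SharePoint;

public static class OmContentTypeSketch
{
    // Hypothetical helper: creates a site content type with a fixed ID,
    // or returns the existing one if it is already in the web.
    public static SPContentType EnsureContentType(SPWeb web, string id, string name)
    {
        SPContentTypeId contentTypeId = new SPContentTypeId(id);

        SPContentType existing = web.ContentTypes[contentTypeId];
        if (existing != null)
        {
            return existing;
        }

        // SharePoint 2010 only: the constructor accepts the exact ID.
        SPContentType contentType =
            new SPContentType(contentTypeId, web.ContentTypes, name);

        return web.ContentTypes.Add(contentType);
    }
}
```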

So, the new attribute allows you to update feature site content types without deactivating the creating feature, no matter whether the target content types were unghosted or not. Actually, when you create a content type with the Overwrite attribute it will never be ghosted in the first place, but you may have a ghosted one if the content type was initially created without the Overwrite attribute and you add the attribute later, when you want to update the feature. And another piece of good news: the issue with the phantom field links in the inheriting list content types is also fixed. They are gone not only for content types initially created with the Overwrite attribute but also for ones subsequently updated with that attribute.

The conclusion about the new Overwrite attribute in SharePoint 2010 is that it does a great job for what it was intended – updating (overwriting) the site content types. It would have been a good idea if there were an option to have these updates propagated to the inheriting list content types but I think Microsoft had a good reason not to include that.

The problem with the update propagation actually remains, and updating the site content types alone makes it even a little bit harder. This is because you end up with the site content type and the inheriting list content type being different (with the site content type independently modified by the updated feature), so the inheritance propagation with its immediate changes mechanism will hardly be useful to get the job done. One possible but ugly solution that I already briefly mentioned is to delete all field links of the site content type and then create them again (possibly in a feature receiver). Since this will fail to handle fields removed from the modified content type definition (they will remain in the inheriting types), a design rule may be applied to all possible updates: never delete fields from the content type definition, but mark them with the Hidden attribute instead.

And several words about my content type updates propagation solution – it will still be applicable for SharePoint 2010 but will still have all complexities of a custom solution for a group of issues for which we still won’t have a complete out-of-the-box solution.

Friday, November 20, 2009

SharePoint Search and HTML Meta tags

Using the standard HTML meta tags in your pages (including SharePoint ones) – e.g. Title, Keywords, Description, Robots, etc is quite common. Apart from these you can use custom meta tags (specifying custom values in the name attribute) to define (or extend) your custom meta data for the page. These can be crawled by search engines including the SharePoint search engine and can be used for various filtering and sorting purposes when using the search functionality. Normally in SharePoint the list item (publishing pages are list items) meta data is stored in the list items’ fields but there are cases in which storing meta data in HTML meta tags may have some advantages.

First off, HTML meta tags can be created relatively easily: you can place a custom control in your master page (or pages) which outputs the meta tags in the HTML of your pages. An even better solution is to use a delegate control. So, basically, with a single control you ensure that all your pages contain your set of custom meta tags. A particularly good scenario for meta tags created in this way is when you need them for meta properties whose values can be easily generated or calculated based on other meta data (like list fields) or certain criteria. For example, these can be the URLs of the pages or the type of the parent web. For such calculable meta properties you can imagine the overhead of creating auxiliary list fields and maintaining them in all your page libraries, compared to the neat approach of a custom web control that generates all the meta data in a single piece of code.

A recent example of using meta tags with SharePoint search that I had was a solution where I needed to display the rollup image of every page in the search results. The problem with the publishing rollup image (actually with the publishing image field type) is that it stores its value as HTML markup (a single HTML img element with several attributes) and the SharePoint crawler simply ignores all HTML markup. The result is that your image field contains the source of the image file but the search service just can’t get it for you. A possible solution that I have seen for this problem is to create an additional field that holds just the path of the image file, and to update it with a list item event receiver whenever the rollup image field changes. The thing is that implementing and maintaining this is not that easy and does not look very economical.

Here is a small code snippet to demonstrate a simple way to generate the meta tag HTML for the standard rollup image source attribute:

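        // Part of a custom web control; litMeta is presumably a Literal control.
        // Needs Microsoft.SharePoint, Microsoft.SharePoint.Publishing and
        // Microsoft.SharePoint.Publishing.Fields.
        protected override void OnLoad(EventArgs e)
        {
            base.OnLoad(e);

            // Output the rollup image URL as a custom meta tag for the crawler.
            string image = GetItemRollupImage();
            this.litMeta.Text = string.Format("<meta name=\"myrollupimage\" content=\"{0}\" />", Server.HtmlEncode(image));
        }

        // Returns the rollup image URL of the current page, or an empty string if there is none.
        protected string GetItemRollupImage()
        {
            if (SPContext.Current.ListItem != null)
            {
                ImageFieldValue imgValue = SPContext.Current.ListItem[FieldId.RollupImage] as ImageFieldValue;
                if (imgValue != null) return imgValue.ImageUrl;
            }
            return string.Empty;
        }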

The crawled properties that the SharePoint search crawler creates for the custom HTML meta tags are in the Web crawled properties category. They have the same names as the custom meta tags, only in uppercase. These can be created with code too: the ID of the propset for the meta tag crawled properties is d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 (this is actually the ID of the meta tag subcategory of the Web crawled properties). The last step before you can use the meta tag crawled properties is to create managed properties mapped to them.
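Creating the crawled property and the mapped managed property in code may look roughly like the sketch below, which uses the MOSS 2007 search administration object model. Treat it as an outline under several assumptions: the helper name and the managed property name are made up, the variant type 31 (text) is my guess for meta tag values, and the code does no checks for already existing properties.

```csharp
using System;
using Microsoft.Office.Server.Search.Administration;
using Microsoft.SharePoint;

public static class MetaTagSearchSchemaSketch
{
    // Propset ID of the meta tag crawled properties (from the text above).
    private static readonly Guid MetaTagPropset =
        new Guid("d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1");

    // Hypothetical helper: creates the crawled property for a custom meta
    // tag and a text managed property mapped to it.
    public static void MapMetaTag(SPSite site, string metaTagName, string managedName)
    {
        Schema schema = new Schema(SearchContext.GetContext(site));

        // Meta tag crawled properties live in the Web category, in uppercase.
        Category web = schema.AllCategories["Web"];
        CrawledProperty crawled = web.CreateCrawledProperty(
            metaTagName.ToUpperInvariant(), false, MetaTagPropset, 31); // 31: assumed text type

        ManagedProperty managed =
            schema.AllManagedProperties.Create(managedName, ManagedDataType.Text);

        MappingCollection mappings = new MappingCollection();
        mappings.Add(new Mapping(
            crawled.Propset, crawled.Name, crawled.VariantType, managed.PID));
        managed.SetMappings(mappings);
    }
}
```

For the snippet above the call would be something like MapMetaTag(site, "myrollupimage", "RollupImage"), followed by a full crawl so that the new managed property gets populated.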