The search session was presented by two guys: one was a self-professed search expert and the other loved the technology. Both were consultants.
They outlined some common challenges with search including:
- missing or poor metadata
- duplicate info
- siloed information
- repositories with unknown content
- incorrect security and retention
One of the key messages was that good stewards focus on deduplication of content. For search to identify two docs as duplicates, their titles must match and the hashes of their body content must be equal. If two docs are found to be duplicates, the search returns one result and notes that a duplicate was found.
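To make the duplicate rule concrete, here is a minimal sketch of it in Python. It is not the search product's actual implementation: the `Doc` and `Result` types, the SHA-256 hash, and the case-insensitive title comparison are all my own assumptions for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Hypothetical document record; the real index schema is not known.
    url: str
    title: str
    body: str


@dataclass
class Result:
    doc: Doc
    # URLs of other docs that were collapsed into this result.
    duplicates: list = field(default_factory=list)


def dedupe_key(doc: Doc) -> tuple:
    # Assumption: titles are compared ignoring case and surrounding whitespace.
    body_hash = hashlib.sha256(doc.body.encode("utf-8")).hexdigest()
    return (doc.title.strip().lower(), body_hash)


def collapse_duplicates(docs: list) -> list:
    # Keep the first doc seen for each (title, body hash) pair and
    # record later matches as duplicates of it.
    results = {}
    for doc in docs:
        key = dedupe_key(doc)
        if key in results:
            results[key].duplicates.append(doc.url)
        else:
            results[key] = Result(doc)
    return list(results.values())
```

Two docs with the same title but different bodies (or the same body under different titles) stay as separate results, which matches the rule as the presenters described it.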
The presenters gave a live demo of entity extraction, which reads all the content sources and all the words in those documents. You can build a hierarchical navigation structure by telling the search which terms to look for. They also mentioned the hierarchical classifier, but didn't discuss it in detail.
The most important takeaway for me came out of this session: to address the fileshares that have existed for years, you can define the entity structure in a text file and then tell the search to crawl the fileshare, and it extracts the content into the structured buckets. Then you can navigate the content and organize it, applying metadata and retention classifications by bucket.
Another takeaway for me is to understand more about taxonomies. Another is that by typing # in the search box you get everything.
Also, you should assume that the taxonomy is always wrong when the site is launched, and remember that content creators and content consumers can have different taxonomies. You need to identify when to use a presentation taxonomy.
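The fileshare idea can be sketched roughly as follows. This is only an illustration under my own assumptions: the `Bucket: term, term` file format, the function names, and the plain substring matching are all hypothetical, and the real product's entity definition format was not shown in enough detail to reproduce.

```python
from pathlib import Path


def parse_entities(definition: str) -> dict:
    # Hypothetical format: one bucket per line, "Bucket: term1, term2".
    # Blank lines and lines starting with "#" are ignored.
    entities = {}
    for line in definition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        bucket, _, terms = line.partition(":")
        entities[bucket.strip()] = [
            t.strip().lower() for t in terms.split(",") if t.strip()
        ]
    return entities


def buckets_for_text(text: str, entities: dict) -> list:
    # Naive case-insensitive substring match; a real extractor would
    # tokenize and handle word boundaries.
    lowered = text.lower()
    return [
        bucket
        for bucket, terms in entities.items()
        if any(term in lowered for term in terms)
    ]


def crawl(root: str, entities: dict) -> dict:
    # Walk the fileshare and file each text document under every
    # bucket whose terms it mentions. Only .txt files are read here.
    buckets = {name: [] for name in entities}
    for path in Path(root).rglob("*.txt"):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for bucket in buckets_for_text(text, entities):
            buckets[bucket].append(path)
    return buckets
```

Once the files are grouped like this, metadata or a retention classification could be applied per bucket rather than per file, which is the part that made this approach appealing to me.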
Top tips for search
- Crawl as much high-value content as you can. This prevents users from confusing poor ranking with failure to index.
- encourage natural hierarchies: nest less important sites, discourage flat structures.
- use natural language for metadata
- supply metadata
- get rid of junk
When to use a folder vs. a document set: use folders when you don't need hierarchies, and use doc sets when you can. Doc sets cut down the work needed to get good metadata and also facilitate workflow and retention.
Also, the guys in this session were excited about the Content Organizer and said one of its benefits was that users don't have to know where to store things anymore. Unfortunately for Microsoft, I had spoken to some consultants from Finland who had attended a session where the product manager for the Content Organizer said not to use it in an enterprise environment.