Improving the Search User Experience: Migrating from GSA to Fusion at the EPA
Lucidworks Fusion 3.x and 4.x
U.S. Environmental Protection Agency
Notice: This presentation reflects the view of the author.

Google Search Appliance
Used for 8 years
Abrupt end of product and no replacement
Inexpensive and easy to maintain
Not configurable
Not much customization

Decided to Move Forward
Decided to move to a modern search engine
More future proof
Use AI capabilities
Government mandate to use open source products (Lucidworks Fusion is value-added Solr)
Lots of connectors, including SharePoint and Drupal

Open Source Options
Set up two demos with limited content
Apache Solr
Elastic (Lucene-based, AWS & .NET implementation)
EPA content is mostly unstructured, a strength of Apache Solr
Clear that Apache Solr was a better fit

We Had Problems
Steep learning curve
EPA preference to use internal servers
Had a choice of Fusion 4.0 or 3.1
Picked the older version because of stability
Still had a known bug that wasn't fixed

Why Lucidworks
Lucidworks provided a framework
Didn't start from the very beginning
Had tools built in
Required less technical expertise to manage the index than other choices

Current Search Engine History
Intranet: February to July 2018, Fusion 3.8
Public Search: July 2018 to Jan 5, 2019, Fusion 4.1
More features
Spell check
Autocomplete
FAQs
Faceted search

Testing Fusion for Release
Relevance
Precision: the search engine returned more relevant than irrelevant documents (quality of results)
Recall: the search engine returned all relevant documents (completeness of results)

Testing Fusion for Release
Query test set
Top queries
Internal query reports and Google Analytics reports
Some best bets (key match) from the GSA
Also used some queries from the long tail of queries
Conducted two types of tests
Relevance testing: depth of best document
Precision testing: evaluation of top results

Relevance Testing
Automated test with quantitative results we could compare to the GSA
Had already identified the best documents/URLs for top queries (the best bets from the GSA)

Relevance Testing - Different Boost Configurations
Ran iterations of the test with different boost configurations
Title
Description
Keywords
Mimetype boosts: HTML pages, web area home pages
Signals
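
The depth of best document test described above can be sketched roughly as follows: for each test query with a known best bet, find the rank at which that URL appears in the results, then compare the average depth across boost configurations. This is only an illustration. The URLs, the configuration names, the `PENALTY` value, and the helper functions are all hypothetical; the deck does not show the actual test harness.

```python
from statistics import mean

def depth_of_best(results, best_url):
    """Return the 1-based rank of best_url in results, or None if it is absent."""
    for rank, url in enumerate(results, start=1):
        if url == best_url:
            return rank
    return None

def mean_depth(run, best_bets, penalty):
    """Average depth over all test queries; a missing best bet counts as `penalty`."""
    depths = []
    for query, best_url in best_bets.items():
        depth = depth_of_best(run.get(query, []), best_url)
        depths.append(penalty if depth is None else depth)
    return mean(depths)

# Hypothetical best bets (query -> best URL) carried over from the old engine.
best_bets = {
    "radon": "https://example.gov/radon",
    "lead": "https://example.gov/lead",
}

# Hypothetical ranked results for each boost configuration under test.
runs = {
    "title_boost": {
        "radon": ["https://example.gov/radon", "https://example.gov/a"],
        "lead": ["https://example.gov/b", "https://example.gov/lead"],
    },
    "no_boost": {
        "radon": ["https://example.gov/a", "https://example.gov/b", "https://example.gov/radon"],
        "lead": ["https://example.gov/b"],
    },
}

PENALTY = 11  # assumed depth charged when the best bet is not retrieved at all
scores = {name: mean_depth(run, best_bets, PENALTY) for name, run in runs.items()}
best_config = min(scores, key=scores.get)  # lower average depth is better
```

A lower average depth means the best document sits closer to the top of the results, which gives a single number per configuration that can be compared against the same measurement taken on the GSA.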

Relevance Testing Results

Precision Testing
After identifying the best boost configurations based on the depth of best document test results, we ran a couple of iterations of precision testing
Non-automated

Precision Testing
Used a scaled-down version of the depth of best document test set
Librarians analyzed the top 5 results
Rated results according to relevance: relevant, near relevant, irrelevant, misplaced
Assigned a number according to how many fell into each category
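
The manual rating step above lends itself to a simple tally. The sketch below counts how many of the top 5 results fell into each of the four categories and also computes a strict precision at 5. The sample ratings are hypothetical, and counting only "relevant" toward precision is an assumption; the deck does not say how the category counts were combined.

```python
from collections import Counter

CATEGORIES = ("relevant", "near relevant", "irrelevant", "misplaced")

def tally(ratings):
    """Count how many rated results fell into each category."""
    unknown = set(ratings) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown rating(s): {sorted(unknown)}")
    counts = Counter(ratings)
    return {category: counts.get(category, 0) for category in CATEGORIES}

def precision_at_k(ratings, k=5):
    """Share of the top-k results rated strictly 'relevant' (an assumed scoring rule)."""
    top = ratings[:k]
    return sum(r == "relevant" for r in top) / len(top) if top else 0.0

# Hypothetical librarian ratings for the top 5 results of two test queries.
rated = {
    "query A": ["relevant", "relevant", "near relevant", "irrelevant", "relevant"],
    "query B": ["misplaced", "relevant", "irrelevant", "irrelevant", "near relevant"],
}

summary = {query: tally(ratings) for query, ratings in rated.items()}
p_at_5 = {query: precision_at_k(ratings) for query, ratings in rated.items()}
```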

Precision Testing Results

Limited Stakeholder Testing
Stakeholder testing
Collected and evaluated general comments/impressions of search results using queries of the testers' choice
Limited testing due to release date

Entity Extraction - Why?
Multiple platforms
Different representations for the same semantic value
Entity extraction combines
Explicit metadata coded in document metatags
Derived metadata from context/location
Used for
Faceted search
Results filtering

Sample Explicit Metadata
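
The slide's own sample is not reproduced in this text. As a rough illustration of what "explicit metadata coded in document metatags" could look like to an extractor, the sketch below collects name/content pairs from `<meta>` tags using Python's standard `html.parser`. The sample HTML is hypothetical; only the field names `DC.title` and `WebArea` appear elsewhere in the deck, and nothing here reflects the EPA's actual extraction code.

```python
from html.parser import HTMLParser

class MetaTagParser(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <meta ... /> tags via handle_startendtag.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta.setdefault(name, []).append(content)

def extract_meta(html):
    """Return a dict mapping each meta name to the list of its content values."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta

# Hypothetical page head; the values are made up for illustration.
SAMPLE = """<html><head>
<meta name="DC.title" content="Sample page title">
<meta name="WebArea" content="Impaired Waters" />
</head><body></body></html>"""

explicit = extract_meta(SAMPLE)
```

Values gathered this way could then feed the facets and filters mentioned above, alongside metadata derived from context or location.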

Sample Derived Metadata Rules
Rules to exclude kids' content from professional and regulatory content
exclusive DC.title all these words kids specialcollection_s Professional
exclusive URL all these words kids specialcollection_s Professional

Entity Extraction with GSA
Regular expression (pattern) based
Two entity dictionary flavors: metadata and URL
Joined by metadictionaries
Glaring deficiencies
No NOT operator
No AND/OR or group syntax
Required a reindex when a dictionary changed

Fusion Entity Extraction - How?
Wrote our own
Regular expression based (PCRE)
Support for exclusions (NOT), groups, AND/OR, multiple comparison operators
Examples
exclusive URL phrase www3.epa.gov/recyclecity specialcollection_s Professional
Inclusive WebArea Impaired Waters specialcollection_s Water

Entity Extraction - Next
Look for Fusion entity extraction tool
Completely rethink how we look at organizing search results

Filtering Search Results Using Metadata
EPA has hundreds of websites
Content owners can code HTML search forms to filter their results
Example search form code
Creates this query: (WebArea=Impaired Water OR WebArea=Waters of the US) AND DocType=pdf

Best Bets - Why
Google Search Appliance terminology
Librarian-curated best URLs for queries
Fusion
Landing pages didn't satisfy the need
Signals tend to train Fusion to boost these documents over time
Periodic review
The Best Bet is removed if the document is in the top 5 without it

Public Search Released Jan 5, 2019
Still adding our features
Future goals
Improve search with the new tools and less custom code
Use AI to improve search
Use connectors to add more EPA content into the index

Improving the User Experience
We review our combined Google Analytics/ForeSee survey data quarterly
We review our top 50 search terms monthly

Summary
It was a hard transition
We are happy with the product and the advantages that it will give us in the future

Questions?
Contact
Judy Dew
Information Management Specialist
USEPA, Office of Mission Support
[email protected]