Web Search for Diagnoses: Making the Case for Specialized Search Engines

When collaborating with physicians, one soon realizes that the web is an important resource for medical information. Google Search and PubMed are arguably the most popular web interfaces for physicians, although specialized resources are also widely used.[1] Google indexes (collects, parses, and stores) web data more thoroughly than any other search engine, and PubMed provides the search interface to the largest database of medical abstracts in the world. So if the medical information the physician is looking for is available online, one would expect at least one of these tools to have indexed it and to be able to retrieve it. Unfortunately, this turns out to be only partly true when the tools are used to diagnose rare diseases.

In our recent study,[2] we queried these tools with a list of symptoms and patient information for cases in which the final diagnosis was known. Documents associated with the correct diagnosis turned up among the first 20 Google Search results in only about one-third of the cases. Meanwhile, FindZebra, our specialized search engine, retrieved relevant documents in about two-thirds of the cases. In the following, we discuss the shortcomings of Google Search for the task of searching for rare-disease diagnostic hypotheses, and the ingredients that make FindZebra more useful (in a statistical sense) for this task.

In everyday life, we all experience Google's effectiveness at finding what we are looking for to such a degree that we take it for granted. The details of the Google Search ranking algorithm are not public, as this algorithm is central to Google's business. However, it is known that it uses personalized information beyond the query, adjusts for page popularity (using PageRank), and has around 200 adjustable parameters that are optimized through large-scale experimentation with users' queries (that is, by monitoring whether users, on average, click on a link ranked closer to or further from the top after a parameter adjustment).

That a specialized search engine tailored to a specific application domain may still be superior can be explained by the following two points. First, Google Search optimizes the average retrieval performance (that is, how close to the top the chosen link appears). If a person conducting a search works within a field that generates a relatively small volume of queries, such as rare diseases, then Google's ranking optimization might result in worse retrieval performance for that topic. For example, ranking according to page popularity risks retrieving popular documents that only weakly match the query. Second, if one can focus the search on websites with high-quality content on the specialized topic, then one can eliminate the noise coming from the overwhelming number of irrelevant documents indexed by Google.

Can we tell which of these two reasons is more important? Yes, because Google's advanced search option allows the user to specify which domains to search. For FindZebra, we selected ten sites with highly curated information on rare diseases that together cover more than 90% of Orphanet's list of about 7,000 rare diseases. This yields a total of roughly 33,400 documents (see Table 1 for details). By restricting Google Search to the same domains used for FindZebra, relevant documents are retrieved in the top 20 search results in only about one-third of the cases.[3] Since both engines then draw on the same sources, this suggests that the ranking, rather than the choice of sites, is the more important factor.

The ranking algorithm used in FindZebra is the one from Indri, which, roughly speaking, ranks a document according to how frequently the query terms occur in that document. It matches indexed medical resources, such as web pages and documents, against the query terms and returns a ranked list of the documents that best match the query. The match between query terms and documents is computed using a query likelihood model, which estimates the probability that the query was randomly sampled from a document's language model.