P211: Automated Similarity Search for Enhanced Content Retrieval in Scientific Communications





Poster Presenter

      Joseph Laudano

      • Medical Director
      • IQVIA
        United States

Objectives

To reduce subjectivity and enhance information retrieval by automating the creation of search strategies for the development of Scientific Communications or publication manuscripts.

Method

We automated the development of search strategies so that they retrieve more relevant documents than a Boolean search strategy (BSS), increasing the information yield available to Medical Affairs Professionals preparing scientific communications.
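As a minimal illustration of why a strategy joined only by "or" retrieves at least as much as one that requires "and", the sketch below evaluates the BSS expression given in the Results against plain text. The matcher and the function names (`matches`, `bss`, `or_only`) are hypothetical. Treating "*" as a word-prefix wildcard is an assumption, and the query syntax of the IQVIA Insightmeme database is not described in this abstract.

```python
import re

# Microbiome block of the BSS, copied from the Results section.
MICROBIOME_TERMS = ["microbiome", "dysbiosis", "microbiota",
                    "alterations in the gut", "alterations in the flora"]


def matches(term: str, text: str) -> bool:
    """Case-insensitive word/phrase match. A trailing '*' is treated as a
    word-prefix wildcard (assumed semantics: Parkinson* also hits Parkinsonian)."""
    if term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def bss(text: str) -> bool:
    """(microbiome OR dysbiosis OR ...) AND (Parkinson*): both blocks must match."""
    return (any(matches(t, text) for t in MICROBIOME_TERMS)
            and matches("Parkinson*", text))


def or_only(text: str, terms: list[str]) -> bool:
    """An expression joined only by OR: any single term is enough."""
    return any(matches(t, text) for t in terms)
```

Any abstract that satisfies the AND expression also satisfies an OR expression containing the same terms, so, assuming the OR expression includes the BSS terms, its result set is a superset of the BSS result set.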

Results

We tested the hypothesis that our method, Similarity Search Retrieval (SSR), would retrieve relevant documents that the BSS method did not. For this test case, we selected the topic “gut microbiome in Parkinson’s disease”. The test corpus was the IQVIA Insightmeme database (2012 to 2022), which contains over 26 million conference and PubMed abstracts. A BSS with high precision and low recall was created by inserting the “and” operator between the microbiome terms and the wildcard form of Parkinson: ("microbiome" or "dysbiosis" or "microbiota" or "alterations in the gut" or "alterations in the flora") and ("Parkinson*").

To create the SSR, the abstracts of the first and last authors who occurred most frequently in the BSS results were parsed into individual terms. Seventy-three terms with the highest TF-IDF scores were classified as microbiome-related or neurologically related and were assembled to form the SSR. Another 35 terms that were not specific to either topic were excluded from the SSR to prevent the retrieval of irrelevant documents. The SSR used the TF-IDF scores as term weights when calculating document rank with the Jaccard index. Because the SSR terms were connected only by the “or” operator, retrieval increased relative to the BSS.

SSR retrieved a total of 3,430 abstracts, including all 1,619 obtained via BSS. Of the 1,811 additional abstracts found by SSR, 172 focused on the role of the microbiome in Parkinson’s disease, owing to additional microbiome terms identified via the SSR. The remaining 1,639 focused on the role of the microbiome in neurologic diseases other than Parkinson’s disease, validating our hypothesis that SSR retrieved additional relevant documents compared with the BSS method.
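The term-selection step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: term frequency is pooled over the model abstracts, and inverse document frequency is taken from a background corpus with smoothing, log((1 + N) / (1 + df)) + 1. The abstract does not specify the TF-IDF variant, the tokenization, or how the non-specific terms were identified. Here the hypothetical `excluded` set stands in for that manual classification step, and all function names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercased single-word tokens; phrases and stemming are not handled."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_terms(model_texts: list[str], corpus_texts: list[str]) -> dict[str, float]:
    """Score each term of the model abstracts.

    TF:  term count pooled over the model abstracts / total model tokens.
    IDF: log((1 + N) / (1 + df)) + 1 over the background corpus (assumed variant).
    """
    tf = Counter(tok for text in model_texts for tok in tokenize(text))
    total = sum(tf.values())
    df = Counter()
    for text in corpus_texts:
        df.update(set(tokenize(text)))  # document frequency: count each doc once
    n_docs = len(corpus_texts)
    return {term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()}


def build_ssr(scores: dict[str, float], k: int,
              excluded: set[str]) -> tuple[dict[str, float], str]:
    """Keep the k best-scoring terms not flagged as non-specific and join
    them with OR only. Returns (term weights, query string)."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties: alphabetical
    kept = [t for t in ranked if t not in excluded][:k]
    weights = {t: scores[t] for t in kept}
    query = " OR ".join(f'"{t}"' for t in kept)
    return weights, query
```

In the study's setting, `k` would correspond to the 73 retained terms and `excluded` to the 35 non-specific terms. The returned weights are what a TF-IDF-weighted ranking step would consume.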

Conclusion

The literature research and review process behind the various types of scientific communications developed by Medical Affairs Professionals can be labor-intensive, subjective, and incomplete, lacking precision and document ranking for analysis. To improve this task, we developed and tested SSR, a statistical, linguistic approach for automatically generating search expressions that retrieve relevant documents ranked by similarity to a model set of documents. In this study, the model set of documents was obtained with a simple Boolean expression, and the retrieved content served as the source of terminology for building the SSR search strategy. A key finding of our study was that the content retrieved by a BSS can serve as an additional source of search terms for SSR, broadening retrieval in both scope and quantity. Because search terms are discovered and selected automatically, only very simple Boolean queries need to be written, reducing time and effort. This would be expected to improve the efficiency of content development for scientific communications, publication manuscripts, white papers, and similar documents, illustrating the central virtue of this methodology.
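The Results describe ranking documents by a Jaccard index that uses TF-IDF scores as term weights. The exact formula is not given, so the following sketch assumes a weighted Jaccard index: the summed weight of terms shared by a document and the SSR, divided by the summed weight of all terms in either. Document terms that are not in the SSR receive a hypothetical `default_weight`, which should be on the same scale as the SSR weights. Documents with no SSR term are skipped, mirroring "or"-only retrieval. All names are illustrative.

```python
import re


def terms_of(text: str) -> set[str]:
    """Lowercased single-word terms of a document (simplified tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def weighted_jaccard(doc_terms: set[str], weights: dict[str, float],
                     default_weight: float = 1.0) -> float:
    """Weighted Jaccard index between a document's terms and the SSR terms.

    SSR terms contribute their weight (e.g. TF-IDF); terms found only in the
    document contribute the assumed default_weight.
    """
    ssr_terms = set(weights)

    def w(term: str) -> float:
        return weights.get(term, default_weight)

    shared = doc_terms & ssr_terms
    union = doc_terms | ssr_terms
    denominator = sum(w(t) for t in union)
    return sum(w(t) for t in shared) / denominator if denominator else 0.0


def rank(docs: dict[str, str], weights: dict[str, float],
         default_weight: float = 1.0) -> list[tuple[str, float]]:
    """Retrieve documents containing any SSR term ("or" semantics) and sort
    them by descending weighted Jaccard score, breaking ties by document ID."""
    ssr_terms = set(weights)
    results = []
    for doc_id, text in docs.items():
        terms = terms_of(text)
        if terms & ssr_terms:
            results.append((doc_id, weighted_jaccard(terms, weights, default_weight)))
    results.sort(key=lambda item: (-item[1], item[0]))
    return results
```

Because off-topic document terms enlarge the union in the denominator, a Jaccard-style score of this kind can rank focused abstracts above broad ones that mention only a few SSR terms.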
