COVID-19 Syntax Analysis

The syntax analysis included 14057 abstracts from 26980 published articles (LitCovid data, last updated on 2020-06-01). The most common publication types were Journal Article (60.8%), Letter (17.09%), Editorial (6.84%), Review (6.51%) and Comment (2.03%).

Graphic 1. Daily (red) and cumulative (green) publications about COVID-19.


The United States (43.59%), United Kingdom (16.63%), China (11.25%), Italy (5.71%) and Spain (5.35%) were the main sources of scientific literature. About 82.53% of the analyzed articles came from these five countries.

Map 1. COVID-19 literature source.


PlatCOVID performed four descriptive syntax analyses on these abstracts:
(1) Word atomization of all abstracts
(2) Categorization based on word atomization
(3) Word atomization of each category
(4) Sentence atomization and human literature curation



Word Atomization Overview

The atomization process found 75368 words/terms. After 7899 common words were excluded, 67469 words remained. The table below (Box 1) shows the top 10 terms. All words are available in the supplementary information.


Box 1. Top 10 words cited in abstracts of the COVID-19 literature.

Word Frequency
disease 11738
pandemic 10117
health 9418
infection 7760
clinical 7598
respiratory 6821
severe 6767
care 6453
during 6242
risk 5311
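
A frequency table like Box 1 can be produced by tokenizing every abstract, dropping common words and counting what remains. The following is a minimal Python sketch of that idea, not the PlatCOVID implementation: the function names, the tokenizing rule and the short `COMMON_WORDS` list are assumptions made for illustration, since the source does not list the 7899 words it excluded.

```python
import re
from collections import Counter

# Hypothetical stop-word list (assumption): the source excluded 7899 common
# words but does not say which ones.
COMMON_WORDS = {"the", "of", "and", "in", "a", "to", "is", "with", "for"}


def tokenize(text):
    """Lowercase the text and return word tokens, keeping hyphenated terms
    such as 'covid-19' or 'sars-cov-2' as a single token."""
    return re.findall(r"[a-z]+(?:-[a-z0-9]+)*", text.lower())


def top_terms(abstracts, n=10, common_words=COMMON_WORDS):
    """Count tokens over all abstracts, skipping common words, and return
    the n most frequent (term, count) pairs."""
    counts = Counter()
    for abstract in abstracts:
        counts.update(t for t in tokenize(abstract) if t not in common_words)
    return counts.most_common(n)
```

Calling `top_terms(abstracts, n=10)` on a list of abstract strings returns pairs in the same word/frequency layout as Box 1.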


Our analysis suggests that, so far, the scientific focus has been on summarizing the main clinical symptoms of COVID-19 (terms: respiratory, clinical, severe, acute, pneumonia, syndrome, symptoms, fever, chest and lung). It also suggests that many articles set out to describe the spread of the virus (terms: novel, severe, virus, outbreak, epidemic and spread). Other scientific efforts addressed the transmission, prevention, treatment, health care management and diagnosis of SARS-CoV-2 and COVID-19.



Categorization Process: The 5 Classes of Scientific Interest

Based on the global word tokenization/atomization of the abstracts, we classified the COVID-19 studies into five categories: (1) clinical, signs and symptoms, (2) epidemiology, (3) transmission, (4) treatment and (5) diagnosis (Fluxogram 1). The categorization process used the MeSH and DeCS term lists.

Fluxogram 1. Workflow of categorization.
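
One simple way to picture a term-based categorization of this kind is to assign an abstract to every category whose term list shares at least one word with it. The Python sketch below illustrates that assumption only: the matching rule and the seed terms in `CATEGORY_TERMS` are hypothetical, because the source states that MeSH and DeCS terms were used but does not give the lists or the exact matching procedure.

```python
import re

# Hypothetical seed terms per category (assumption): the source drew its
# terms from MeSH and DeCS, but the actual lists are not given.
CATEGORY_TERMS = {
    "clinical, signs and symptoms": {"symptoms", "fever", "clinical"},
    "epidemiology": {"epidemiological", "epidemiology", "incidence"},
    "transmission": {"transmission", "spread"},
    "treatment": {"treatment", "therapy"},
    "diagnosis": {"diagnosis", "diagnostic"},
}


def categorize(abstract, category_terms=CATEGORY_TERMS):
    """Return the set of categories whose terms occur in the abstract.

    An abstract may match several categories, or none at all.
    """
    tokens = set(re.findall(r"[a-z]+", abstract.lower()))
    return {name for name, terms in category_terms.items() if tokens & terms}
```

Because the result is a set, one abstract can carry several labels at once, which is what allows some articles to fall into more than one category.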

28 articles fit all five categories. They can be accessed through the following PMIDs: 32278065, 32112886, 32447742, 32397688, 32362969, 32347772, 32317810, 32499983, 32271601, 32228809, 32220177, 32185921, 32183901, 32145185, 32086938, 32442720, 32442265, 32357503, 32300673, 32591667, 32584236, 32565599, 32534188, 32532933, 32506768, 32498762, 32475877, 32297723.

Venn 1. Categorizations of abstracts.
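
The counts behind a Venn diagram of this kind, and the list of articles that fall into every category, can be derived from per-article label sets. The Python sketch below is illustrative only: the function names and the `ALL_CATEGORIES` tuple are assumptions, and any article identifiers passed in are the caller's own data.

```python
from collections import Counter

# Category names as used in this report; the tuple itself is an assumption
# introduced for this sketch.
ALL_CATEGORIES = (
    "clinical, signs and symptoms",
    "epidemiology",
    "transmission",
    "treatment",
    "diagnosis",
)


def venn_regions(labels_by_article):
    """Count articles per exact combination of categories.

    Each distinct combination corresponds to one region of the Venn diagram.
    Articles without any label are ignored.
    """
    return Counter(
        frozenset(labels) for labels in labels_by_article.values() if labels
    )


def in_all_categories(labels_by_article, all_categories=ALL_CATEGORIES):
    """Return the sorted identifiers of articles labelled with every category."""
    required = set(all_categories)
    return sorted(
        article
        for article, labels in labels_by_article.items()
        if required <= set(labels)
    )
```

Here `labels_by_article` is assumed to map an article identifier (for example a PMID) to the set of categories assigned to it.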



Word Atomization of Categories

Next, we performed word atomization on the abstracts of each category. The full word atomization report is available for each category.

Box 2. Top 10 words/terms from the atomization of each category.

Diagnosis (n)      Treatment (n)      Epidemiology (n)       Transmission (n)    Signs (n)
disease (776)      treatment (1935)   disease (377)          transmission (991)  disease (3192)
diagnosis (700)    disease (1660)     clinical (308)         disease (616)       clinical (2864)
clinical (649)     clinical (1366)    health (264)           infection (520)     pandemic (2599)
infection (566)    pandemic (1256)    infection (241)        health (519)        health (2288)
pandemic (548)     severe (1116)      epidemiological (236)  pandemic (455)      infection (2118)
respiratory (460)  infection (1069)   pandemic (235)         respiratory (400)   severe (1954)
treatment (458)    care (958)         respiratory (203)      during (390)        respiratory (1838)
severe (444)       respiratory (920)  severe (197)           virus (382)         during (1713)
study (415)        health (879)       study (188)            risk (380)          care (1668)
during (406)       during (812)       risk (145)             study (282)         study (1663)

Sentence Tokenization

Finally, we performed sentence tokenization on the abstracts of each category.
First, we collected the last 4 sentences of each abstract, assumed to be the conclusion of the work, using pubmed.mineR. Around 56228 conclusion sentences were obtained.
Second, we used a tokenizer to extract the context sentences for each of the category terms used previously. 1923, 3839, 410, 1806 and 5406 sentences were retrieved for diagnosis, treatment, epidemiology, transmission and clinical, signs and symptoms, respectively. Articles with no context sentence were excluded.
Third, we began the human curation process (Fluxogram 2):

Fluxogram 2. Human curation process from PlatCOVID based on 5 categories.
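
The first two steps of the sentence tokenization (taking the last 4 sentences of an abstract, then keeping the sentences that mention a category term) can be sketched in Python as follows. This is not the pubmed.mineR API, which is an R package. The function names and the naive punctuation-based sentence splitter are assumptions made for illustration, and a real pipeline would likely need a more robust splitter for abbreviations and decimal numbers.

```python
import re


def split_sentences(abstract):
    """Naive splitter (assumption): break after '.', '!' or '?' when it is
    followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", abstract.strip())
    return [p for p in parts if p]


def conclusion_sentences(abstract, n=4):
    """Return the last n sentences, treated here as the conclusion."""
    return split_sentences(abstract)[-n:]


def context_sentences(sentences, terms):
    """Keep only the sentences that contain at least one of the terms."""
    wanted = {t.lower() for t in terms}
    return [
        s for s in sentences
        if wanted & set(re.findall(r"[a-z]+", s.lower()))
    ]
```

An abstract for which `context_sentences` returns an empty list would correspond to an article with no context sentence, which the process described above excludes.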