Title: | Extracts Covid-19 and other demographic metrics regarding U.S.A and Italy |
---|---|
Description: | Package with functions to scrape data regarding the COVID-19 epidemic in the U.S.A. and Italy, as well as datasets with related indexes. |
Authors: | Claudio Zanettini |
Maintainer: | claudio_zanettini <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2024-10-27 04:04:43 UTC |
Source: | https://github.com/marchionniLab/covid19census |
Extracts and translates time series from the git repository of the Protezione Civile and combines them with other statistics related to the Italian population.
getit_all()
Data regarding COVID-19 come from the repository of the Protezione Civile and are updated daily. Age and sex of the population (2019), first aid and medical guard visits (2018), smoking status (2018), prevalence of chronic conditions (2018), annual household income (2017), household crowding index (2018) and body-mass index were collected by ISTAT. Prevalence of types of cancer (2016), influenza vaccination coverage (2019) and the number of hospital beds per 1000 people (2017) were obtained from the Ministero della Salute. Note that cancer prevalence was calculated using regional population estimates for 2019. Data on particulate matter 2.5 (2017) come from the Istituto Superiore per la Protezione Ambientale.
A data frame with the following 64 variables:
date of data
state
region abbreviation
full name of region
lat
long
influenza vaccination coverage in the general population
influenza vaccination coverage in people age 65 or older
case-mortality rate for that region and that date (deaths/total_cases * 100)
number of COVID-19 positive cases detected
number of deaths
number of tests performed
number of people hospitalized with symptoms, that day
number of people in intensive care units, that day
hospitalized_with_symptoms + intensive_care_unit
number of people COVID-19 positive in home quarantine, that day
total currently positives: hospitalized_with_symptoms + intensive_care_unit + home_quarantine
change in the number of positive cases: total_positives that day - total_positives preceding day
number of new positive cases: total_cases that day - total_cases preceding day
recovered - released from hospital
number of people tested
number of people per square meter living in the same house
total population
household crowding index (number of components of household per square meter)
population density per square kilometer
percent of females age 65 years old or more
percent of males age 65 years old or more
percent of population with that chronic condition
percent of population with that type of cancer. Info regarding Trento and Bolzano were not present.
percent of people underweight, normal weight, overweight or obese. These percentages are calculated over the total population even though the measure was taken only on people aged 18 or older, which is why they do not sum to 100
number of people using first aid in the 3 months preceding the survey
number of people using medical guard in 3 months preceding the survey
inpatient hospital beds per 1000 people in acute care
inpatient hospital beds per 1000 people in long-term care
inpatient hospital beds per 1000 people in rehabilitation
inpatient hospital beds per 1000 people, total
median net annual household income, in euros
emission of pm2.5 in tons per region, mean values 2000 to 2016
For details regarding the methodology of specific datasets, see it_bweight, it_cancer, it_chronic, it_dem, it_firstaid, it_fl, it_fl65, it_hospbed, it_house, it_pm2.5
Extracts and translates time series from the git repository of the Protezione Civile
getit_covid()
Caveats and problems related to the calculation of some variables by the Protezione Civile were raised by the GIMBE Foundation. Unfortunately, the page is in Italian... happy reading (buona lettura)!
A data frame with the following 19 variables:
date, in ISO 8601 format
state
region abbreviation
full name of region
lat
long
case-mortality rate for that region and that date (deaths/total_cases * 100)
number of COVID-19 positive cases detected
number of deaths
number of tests performed
number of people hospitalized with symptoms, that day
number of people in intensive care units, that day
hospitalized_with_symptoms + intensive_care_unit
number of people COVID-19 positive in home quarantine, that day
total currently positives: hospitalized_with_symptoms + intensive_care_unit + home_quarantine
change in the number of positive cases: total_positives that day - total_positives preceding day
number of new positive cases: total_cases that day - total_cases preceding day
recovered - released from hospital
number of people tested
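The derived variables above (case-mortality rate, totals, and day-over-day changes) follow directly from the raw counts. The sketch below recomputes them in base R on a small, made-up data frame; the column names mirror the descriptions above but are assumptions, not necessarily the names returned by getit_covid(), and this is not the package's own code.

```r
# Toy data for two regions over three days (values are made up).
# Column names mirror the descriptions above but are assumptions.
covid <- data.frame(
  region = rep(c("Lombardia", "Veneto"), each = 3),
  date = rep(as.Date("2020-03-01") + 0:2, times = 2),
  total_cases = c(100, 150, 230, 40, 55, 80),
  deaths = c(5, 9, 14, 1, 2, 4),
  hospitalized_with_symptoms = c(30, 40, 55, 10, 12, 15),
  intensive_care_unit = c(5, 8, 12, 2, 3, 4),
  home_quarantine = c(55, 80, 120, 25, 35, 52)
)

# Rows must be in date order within each region before differencing
covid <- covid[order(covid$region, covid$date), ]

# Case-mortality rate, in percent
covid$cmr <- covid$deaths / covid$total_cases * 100

# Hospitalized total and total currently positive
covid$total_hospitalized <- covid$hospitalized_with_symptoms + covid$intensive_care_unit
covid$total_positives <- covid$total_hospitalized + covid$home_quarantine

# Difference with the preceding day, computed separately for each region;
# the first day of each region has no predecessor and gets NA
day_diff <- function(x, group) ave(x, group, FUN = function(v) c(NA, diff(v)))
covid$new_positives <- day_diff(covid$total_cases, covid$region)
covid$change_positives <- day_diff(covid$total_positives, covid$region)
```

Grouping by region before differencing matters: without it, the last day of one region would be subtracted from the first day of the next.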
Extracts COVID-19 data and joins them with other demographic metrics at the county level, and with tests and hospitalizations from the COVID Tracking Project
getus_all(repo = "jhu")
repo | repository of COVID-19 data, one of |
For details regarding some specific datasets refer to: Subject Definitions of the American Community Survey, Medicare and Medicaid Medical Services Technical Documentation, COVIDExposureIndices
A data frame. Data regarding household composition, population sex, age, race, ancestry and poverty levels
were scraped from the 2018 American Community Survey (ACS). Poverty was defined at the family level and not the household level in
the ACS. Medical conditions, tobacco use, cancer, and data on the number of medical and emergency visits
were obtained from the 2017 Mapping Medicare Disparities. From the related documentation listed in the source: "Prevalence rates are calculated
by searching for certain diagnosis codes in Medicare beneficiaries’ claims. The admission rate by admission type is the frequency of
a specific type of inpatient admission per 1,000 inpatient admissions in a year."
The number of hospital beds per county was calculated from the 2020 Homeland Infrastructure Foundation-Level Data.
Levels of particulate matter 2.5 in micrograms per cubic meter (2000-2016) and seasonal temperature (2000-2016) were reported by the Atmospheric Composition Analysis Group and
aggregated by Ista Zahn and Ben Sabath.
The following list of variables is divided into the sections COVID-19 VARS, HOUSEHOLDS MARITAL STATUS AND COMPOSITION, HOUSEHOLDS EDUCATION DEGREES,
ANCESTRY, COMPUTER OR INTERNET, POPULATION AND SEX, POPULATION AND RACE, MEDICAL AND VACCINES, POVERTY, ACTIVITY, POLLUTION AND TEMPERATURE, STATE LEVEL TESTS AND HOSPITALIZATIONS.
Note that data on tests and hospitalizations are at the state level!
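Because the COVID Tracking Project values are per state, joining them onto county rows repeats the same state value for every county in that state. The base-R sketch below uses hypothetical toy data and column names (only the state-level nature of the test data comes from the text above) to show why such columns should not be summed across counties.

```r
# Hypothetical toy data: two counties in one state, one in another
county <- data.frame(
  date   = as.Date("2020-04-01"),
  county = c("County A", "County B", "County C"),
  state  = c("State X", "State X", "State Y"),
  cases  = c(100, 60, 40)
)
# State-level test counts (toy values)
tests <- data.frame(
  date     = as.Date("2020-04-01"),
  state    = c("State X", "State Y"),
  positive = c(500, 200)
)

# Left join: each county row inherits its state's value
joined <- merge(county, tests, by = c("state", "date"), all.x = TRUE)

# Summing a state-level column over counties double-counts it ...
naive_total <- sum(joined$positive)            # 500 + 500 + 200
# ... so keep one value per state and date before aggregating
state_level <- unique(joined[, c("state", "date", "positive")])
correct_total <- sum(state_level$positive)     # 500 + 200
```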
date, formatted as ISO 8601
county
state
Federal Information Processing Standard (FIPS) code, a unique numeric identifier of a county. Unknown FIPS codes are coded as 00000. Note that in the nyt repository many deaths and confirmed cases are not categorized at the county level
urban or rural (see census)
—————
confirmed COVID-19 cases (cumulative over dates)
number of deaths attributed to COVID-19
case-mortality rate (deaths / confirmed cases * 100)
—————
total number of households (a household occupies a housing unit) in that county. People not living in households are classified as living in group quarters
percent of households that are defined as family. A family consists of a householder and one or more other people living in the same household who are related to the householder by birth, marriage, or adoption
percent of families with at least one child <= 18 years old
percent of families consisting of married couples
percent of families consisting of married couples with at least one child 18 years old or younger
percent of families with a male householder and no spouse of the householder present
percent of families with a male householder, no spouse of the householder present, and at least one child under 18 years old
percent of families with a female householder
percent of families with a female householder and at least one child under 18 years old
percent of non-family households. A family consists of a householder and one or more other people living in the same household who are related to the householder by birth, marriage, or adoption
percent of non-family households with householder living alone
percent of non-family households with householder living alone, age 65 years and older
percent of non-family households with one or more people under 18 years
percent of non-family households with one or more people 65 years and older
total number of people that responded to the question regarding relationship
households including a person married to and living with the householder
households including a son or daughter by birth, a stepchild, or adopted child of the householder
percent households including other relatives
percent households including foster children, not related to the householder by birth, marriage, or adoption
percent of unmarried-partner households, i.e. households other than a “married-couple household” that include a householder and an “unmarried partner”
total males that responded to the marital status question
percent males never married
percent males married
percent of males separated
percent of males widowed
percent of males divorced
percent of females never married
percent of females married
percent of females separated
percent of females widowed
percent of females divorced
—————
total people enrolled in school
percent in preschool
percent in kindergarten
percent in elementary
percent in high school
percent in college
total number of people 25 years old or more that responded to the question regarding education (?)
percent that went up to 9th grade
percent that went from 9th to 12th grade, without a diploma
percent with a high school diploma
percent with some college
percent that obtained an associate degree
percent with a bachelor's degree
percent with a graduate or professional degree
percent with a bachelor's degree or higher
—————
total population
percent estimated specific ancestry (27)
—————
total that own or use a computer
percent that own or use a computer
percent that have access to the internet
—————
total population
total male
total female
total population by age bin and sex
percent population by age bin and sex
median age in years
median age in years of males
median age in years of females
males per 100 females
the age dependency ratio is derived by dividing the combined under-18 and 65-and-over populations by the 18-to-64 population and multiplying the result by 100
the old-age dependency ratio is derived by dividing the population 65 years and over by the 18-to-64 population and multiplying by 100
the child dependency ratio is calculated dividing the population under 18 years by the 18-to-64 population, and multiplying the result by 100
—————
total white
total Black or African American
total native
total asian
total Hawaiian and Pacific Islander
other races
two or more races
total hispanic or latino
—————
percentage of fee-for-service (FFS) Medicare enrollees that had an annual flu vaccination.
total number of hospital beds
percent of Medicare beneficiaries with at least one chronic condition
percent of Medicare beneficiaries with acute myocardial infarction
percent of Medicare beneficiaries with Alzheimer’s Disease, Related Disorders, or Senile Dementia
percent of Medicare beneficiaries with asthma
percent of Medicare beneficiaries with Atrial Fibrillation
percent of Medicare beneficiaries with Breast Cancer
percent of Medicare beneficiaries with Colorectal Cancer
percent of Medicare beneficiaries with Lung Cancer
percent of Medicare beneficiaries with Cancer (breast, colorectal, lung, and/or prostate)
percent of Medicare beneficiaries with Chronic Obstructive Pulmonary Disease (COPD)
percent of Medicare beneficiaries with Chronic Kidney Disease
percent of Medicare beneficiaries with Depression
percent of Medicare beneficiaries with Diabetes
percent of Medicare beneficiaries with Hypertension
percent of Medicare beneficiaries with Ischemic Heart Disease
percent of Medicare beneficiaries with Obesity
percent of Medicare beneficiaries with Osteoporosis
percent of Medicare beneficiaries with Rheumatoid Arthritis
percent of Medicare beneficiaries with Schizophrenia/Other Psychotic Disorders
percent of Medicare beneficiaries with Stroke/Transient Ischemic Attack
urgent care admission rate
number of annual wellness visits
elective admission rate
ER admission rate
other admission rates
percent with pneumococcal vaccination
—————
number of people evaluated for poverty
total people that met the definition of below poverty level
percent people that met the definition of below poverty level
total people evaluated in that age bin
total people that met the definition of below poverty level in that age bin
percent people that met the definition of below poverty level in that age bin
total people evaluated for poverty in that sex
total people that met the definition of below poverty level in that sex
percent people that met the definition of below poverty level in that sex
total people evaluated for poverty in that race
total people that met the definition of below poverty level in that race
percent people that met the definition of below poverty level in that race
median household income
—————
activity index
—————
PM2.5 in micrograms per cubic meter
mean temperature in summer, K
mean humidity in summer, %
mean temperature in winter, K
mean humidity in winter, %
—————
total cumulative positive test results
total cumulative negative test results
tests that have been submitted to a lab but no results have been reported yet
current people hospitalized
cumulative people hospitalized
current people in ICU
cumulative people in ICU
current people using ventilator
cumulative people using ventilator
total people recovered
increase in deaths from day before
increase in hospitalization from day before
increase in negative results from day before
increase in positive results from day before
increase from the day before
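The three dependency ratios described in the POPULATION AND SEX section can be computed from population counts as follows. The function and argument names are hypothetical and the counts are made up; the sketch only restates the definitions above. Note that, by construction, the age dependency ratio equals the sum of the old-age and child ratios.

```r
# Dependency ratios as defined above, computed from three population counts.
# Works element-wise, so it can be applied to whole columns (one value per county).
dependency_ratios <- function(under18, age18_64, over65) {
  data.frame(
    age_dependency     = (under18 + over65) / age18_64 * 100,
    old_age_dependency = over65 / age18_64 * 100,
    child_dependency   = under18 / age18_64 * 100
  )
}

# Two hypothetical counties
ratios <- dependency_ratios(
  under18  = c(2000, 500),
  age18_64 = c(6000, 2000),
  over65   = c(1500, 700)
)
```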
Center for Medicare and Medicaid Services, Homeland Infrastructure Foundation-Level Data, American Community Survey tables, Mapping Medicare Disparities, COVIDExposureIndices, Atmospheric Composition Analysis Group
getus_covid, getus_tests, getus_dex
Extracts time series from the git repository of the NYT or of JHU
getus_covid(repo = "jhu")
repo | repository of COVID-19 data, one of |
cases represents the number of confirmed cases, while cmr is the case-mortality rate (deaths / confirmed cases * 100).
A good description of the pitfalls and caveats associated with the case-mortality rate metric is given by Our World in Data.
a dataframe
dat <- getus_covid(repo = "jhu")
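Because cases is cumulative, daily new cases have to be derived by differencing within each county. A base-R sketch on a toy data frame: the fips, date and deaths column names are assumptions (only cases and cmr are named above), and the cmr formula is the one stated above, except that counties with zero confirmed cases are left as NA instead of NaN.

```r
# Toy county-level data; "00000" is the code for an unknown county.
# fips, date and deaths are assumed column names; cases is cumulative.
us <- data.frame(
  fips   = rep(c("12345", "00000"), each = 3),
  date   = rep(as.Date("2020-04-01") + 0:2, times = 2),
  cases  = c(0, 10, 25, 3, 3, 7),
  deaths = c(0, 1, 2, 0, 1, 1)
)
us <- us[order(us$fips, us$date), ]

# Case-mortality rate in percent; NA where there are no confirmed cases yet
us$cmr <- ifelse(us$cases > 0, us$deaths / us$cases * 100, NA_real_)

# Daily new cases: difference of the cumulative count within each county
us$new_cases <- ave(us$cases, us$fips, FUN = function(v) c(NA, diff(v)))
```

Revisions in the source repositories can make a cumulative series drop from one day to the next, which would show up here as a negative daily value.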
Extracts DEX (device exposure index) from the git repository of the COVID-19 exposure indices
getus_dex()
The main metric is dex_a. In the repository, the authors
explain: In the context of the ongoing pandemic, the DEX measure may be biased if devices sheltering-in-place
are not in the sample due to lack of movement. We report adjusted DEX values to help address this selection bias.
DEX-adjusted is computed assuming that the number of devices has not declined since the early-2020 peak
and that unobserved devices did not visit any commercial venues. The dataset is updated by the maintainers every weekend.
a dataframe
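The adjustment quoted above can be made explicit. If n_t devices are observed on day t with mean exposure DEX_t, and the N_peak - n_t unobserved devices are assumed to visit no commercial venues (so they contribute zero exposure), then the mean over all N_peak devices is DEX_t * n_t / N_peak. The sketch below implements that arithmetic under this reading of the stated assumption; the function and argument names are hypothetical, and it is not the repository's code.

```r
# Adjusted DEX under the stated assumption: unobserved devices (peak minus
# currently observed) contribute zero exposure, so the average is rescaled
# from the observed devices to the peak device count.
dex_adjusted <- function(dex, n_devices, n_peak) {
  stopifnot(all(n_devices <= n_peak))  # assumption: device count never exceeds the peak
  dex * n_devices / n_peak
}

# Hypothetical values: a day at the peak and a day with 20% fewer devices
dex_adjusted(dex = c(80, 40), n_devices = c(1000, 800), n_peak = 1000)
```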
Extracts information on tests, hospitalizations and other metrics at the state level maintained by the COVID Tracking Project
getus_tests()
A description of the variables can be found on the COVID Tracking Project website and, when possible, was used verbatim for the descriptions below
date, in ISO 8601 format
state name
abbreviation
total cumulative positive test results
total cumulative negative test results
tests that have been submitted to a lab but no results have been reported yet
current people hospitalized
cumulative people hospitalized
current people in ICU
cumulative people in ICU
current people using ventilator
cumulative people using ventilator
total people recovered
unique ID changed every time the data updates
date of the time we last visited their website
number of deaths
increase in deaths from day before
increase in hospitalization from day before
increase in negative results from day before
increase in positive results from day before
increase from the day before
Other details regarding the score system used are reported on the maintainers' webpage.
Note by the COVID Tracking Project authors on the use of some of these variables:
States are currently reporting two fundamentally unlike statistics: current hospital/ICU admissions and cumulative hospitalizations/ICU admissions.
Across the country, this reporting is also sparse.
In short: it is impossible to assemble anything resembling the real statistics for hospitalizations,
ICU admissions, or ventilator usage across the United States. As a result, we will no longer provide
national-level summary hospitalizations, ICU admissions, or ventilator usage statistics on our site.
a dataframe with 15 variables
Body mass index in regions of Italy, in the general population. Data were collected in 2018 and indicate the absolute number of people underweight, normal weight, overweight or obese.
data(it_bweight)
An object of class tbl_df (inherits from tbl, data.frame) with 21 rows and 5 columns.
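As noted for getit_all, body-mass categories are expressed as a percentage of the whole regional population even though only adults were measured, so the four categories do not add up to 100. A small base-R sketch with hypothetical counts and column names (not the actual it_bweight columns) shows the effect:

```r
# Hypothetical absolute counts per BMI category (adults only) and total population
bw <- data.frame(
  region       = c("Region A", "Region B"),
  underweight  = c(30, 12),
  normalweight = c(420, 180),
  overweight   = c(290, 120),
  obese        = c(90, 40),
  population   = c(1000, 430)
)
bmi_cols <- c("underweight", "normalweight", "overweight", "obese")

# Divide every category column by the total population (row-wise) and scale to percent
pct <- bw[bmi_cols] / bw$population * 100

# Totals stay below 100 because children are in the denominator but not in the counts
totals <- unname(rowSums(pct))
```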
Number of cancer patients in each region by type. Data were collected in 2016 and indicate the absolute number of people diagnosed with cancer. Data for P.A. Trento and P.A. Bolzano are missing (but data for Trentino Alto Adige are available)
data(it_cancer)
An object of class data.frame with 21 rows and 10 columns.
a tibble
Number of people suffering from chronic conditions by region and type. Data were collected in 2018 and indicate the absolute number of people.
data(it_chronic)
An object of class tbl_df (inherits from tbl, data.frame) with 21 rows and 14 columns.
a tibble
Percent of population by region, sex and age. Data were collected in 2019 and indicate absolute numbers of people, in long format.
data(it_dem)
An object of class tbl_df (inherits from tbl, data.frame) with 4242 rows and 9 columns.
Methodology: the Istituto Superiore di Sanità provides biweekly information on mortality in different age groups for patients positive for COVID-19 at this link
a tibble
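Since it_dem is in long format, aggregate shares such as the percent of people aged 65 or more (used in getit_all) require summing over age. The base-R sketch below works on a synthetic long table; the column names (region, sex, age, n) and the denominator (the population of the same region and sex) are assumptions. Note that 4242 = 21 × 2 × 101, which would be consistent with single-year ages from 0 to 100, but that is only an inference from the row count.

```r
# Synthetic long-format table: one row per region, sex and single-year age
dem <- expand.grid(
  age = 0:100,
  sex = c("female", "male"),
  region = c("Region A", "Region B"),
  stringsAsFactors = FALSE
)
dem$n <- 10  # flat toy count: 10 people at every age

# People aged 65+ and total people, per region and sex
old <- aggregate(n ~ region + sex, data = dem[dem$age >= 65, ], FUN = sum)
tot <- aggregate(n ~ region + sex, data = dem, FUN = sum)
names(old)[names(old) == "n"] <- "n65"
names(tot)[names(tot) == "n"] <- "n_total"

# Percent aged 65+ within each region and sex (assumed denominator)
share65 <- merge(old, tot, by = c("region", "sex"))
share65$pct65 <- share65$n65 / share65$n_total * 100
```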
Number of people using first aid or medical guard in the 3 months preceding the survey. Collected in 2018.
data(it_firstaid)
An object of class tbl_df (inherits from tbl, data.frame) with 21 rows and 3 columns.
a tibble
Influenza vaccination coverage in Italy in the general population from 1999 to 2019. Data are percentages of the regional population.
data(it_fl)
An object of class data.frame with 21 rows and 21 columns.
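it_fl stores one column per year. For plotting or joining by year, a long layout is often more convenient. The base-R sketch below assumes a region column plus year columns named by the year; these names and the coverage values are hypothetical.

```r
# Hypothetical wide table: one row per region, one column per year
fl <- data.frame(
  region = c("Region A", "Region B"),
  `1999` = c(10.1, 12.3),
  `2000` = c(11.0, 13.2),
  check.names = FALSE
)

year_cols <- setdiff(names(fl), "region")

# Stack the year columns: unlist() walks column by column, so regions repeat
# within each year and years repeat once per region
fl_long <- data.frame(
  region   = rep(fl$region, times = length(year_cols)),
  year     = as.integer(rep(year_cols, each = nrow(fl))),
  coverage = unlist(fl[year_cols], use.names = FALSE)
)
```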
Influenza vaccination coverage in Italy for the 2018-2019 season for the population aged 65 or more, from 1999 to 2019. Data are percentages of the regional population.
data(it_fl65)
An object of class data.frame with 22 rows and 21 columns.
a tibble with following columns:
region
percent of population age 65 or more that received influenza vaccination
percent of general population that received influenza vaccination
Inpatient hospital beds per 1000 people. Collected in 2017
data(it_hospbed)
An object of class tbl_df (inherits from tbl, data.frame) with 21 rows and 5 columns.
a tibble in wide format in which bed_acute, bed_long, bed_rehab and bed_tot refer to acute care, long-term care, rehabilitation and total beds, respectively
Household crowding index from 2014 to 2018 in each region
data(it_house)
An object of class tbl_df (inherits from tbl, data.frame) with 105 rows and 3 columns.
a tibble in which phouse is the number of household members per square meter
Median net annual household income (including imputed rents, in euros). Collected in 2017
data(it_netinc)
An object of class tbl_df (inherits from tbl, data.frame) with 21 rows and 2 columns.
a tibble
Emission of pm2.5 in tons per region from 1990 to 2017
data("it_pm2.5")
An object of class tbl_df (inherits from tbl, data.frame) with 21 rows and 2 columns.
a tibble
Istituto Superiore per la Protezione Ambientale
Area of each region in square meters. Used to calculate population density per region. Scraped from good old Wikipedia.
data(it_regions)
An object of class data.frame with 21 rows and 2 columns.
a tibble
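getit_all reports population density per square kilometer, derived from population and region area. Since it_regions is described as holding areas in square meters, the density needs a unit conversion (1 km^2 = 1e6 m^2); if the area column turns out to be in square kilometers already, drop the division by 1e6. Column names and values in this sketch are hypothetical.

```r
# Hypothetical regional areas in square meters and populations
regions <- data.frame(
  region  = c("Region A", "Region B"),
  area_m2 = c(2.0e10, 5.0e9)
)
population <- c(4e6, 5e5)

# 1 km^2 = 1e6 m^2, so convert the area before dividing
density_km2 <- population / (regions$area_m2 / 1e6)
```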
Number of people aged 14 years and over who self-identify as smokers, non-smokers, or past smokers, by region and type. Data were collected in 2018 and are absolute numbers of people.
data(it_smoking)
An object of class tbl_df (inherits from tbl, data.frame) with 21 rows and 4 columns.
a tibble
Several metrics regarding household composition from the American Community Survey of 2018
data(us_acm_househ)
An object of class tbl_df (inherits from tbl, data.frame) with 3142 rows and 82 columns.
a tibble
American Community Survey tables
Sex and age composition of the county population from the American Community Survey of 2018
data(us_dem)
An object of class tbl_df (inherits from tbl, data.frame) with 3220 rows and 120 columns.
a tibble
American Community Survey tables
Percentage of fee-for-service (FFS) Medicare enrollees that had an annual flu vaccination. Collected in 2019.
data(us_fl65)
An object of class tbl_df (inherits from tbl, data.frame) with 3220 rows and 4 columns.
Center for Medicare and Medicaid Services and NORC at the University of Chicago.
a tibble with fl_65 indicating the percentage of fee-for-service (FFS) Medicare enrollees that had an annual flu vaccination
Beds of each hospital, by county (2019).
data(us_hospbeds)
An object of class grouped_df (inherits from tbl_df, tbl, data.frame) with 2545 rows and 3 columns.
a tibble
Homeland Infrastructure Foundation-Level Data
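The total number of hospital beds per county in getus_all is described as calculated from Homeland Infrastructure Foundation-Level Data hospital records. Summing beds over the hospitals in each county could look like the base-R sketch below; the fips, hospital and beds column names and values are hypothetical. Because us_hospbeds is a grouped_df, dplyr::ungroup() can be used to drop the grouping before further processing.

```r
# Hypothetical hospital-level records: one row per hospital
hospitals <- data.frame(
  fips     = c("01001", "01001", "01003"),
  hospital = c("Hospital 1", "Hospital 2", "Hospital 3"),
  beds     = c(300, 150, 500)
)

# Total beds per county
county_beds <- aggregate(beds ~ fips, data = hospitals, FUN = sum)
```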
Prevalence of many medical and chronic conditions, 2019. From relative documentation listed below: "Prevalence rates are calculated by searching for certain diagnosis codes in Medicare beneficiaries’ claims. The prevalence rate of a condition for a specific sub-population (e.g., all beneficiaries in a county) is the proportion of beneficiaries who are found to have the condition. The admission rate by admission type is the frequency of a specific type of inpatient admission per 1,000 inpatient admissions in a year."
data(us_mmd)
An object of class data.frame with 3235 rows and 33 columns.
Details regarding the use of the web tool can be found in the related documentation. It includes the prevalence of Alzheimer's disease, chronic kidney disease, obesity, depression, chronic obstructive pulmonary disease, arthritis, diabetes, osteoporosis, asthma, atrial fibrillation, ischemic heart disease, myocardial infarction, hypertension and several types of cancer, as well as emergency and medical admissions, annual visits, pneumococcal vaccination and tobacco use.
a tibble
See getus_all for more details regarding the variables.
Median household income, 2018
data(us_netinc)
An object of class tbl_df (inherits from tbl, data.frame) with 3220 rows and 4 columns.
Subject Definitions of the American Community Survey
a tibble
American Community Survey tables
PM2.5 concentration in micrograms per cubic meter per county, from 2000 to 2016
data(us_pm2.5)
An object of class tbl_df (inherits from tbl, data.frame) with 3176 rows and 2 columns.
a tibble
Atmospheric Composition Analysis Group, wxwk1993 processed data
Households living below the poverty level, divided by age and race, and calculated as absolute values or percentages. American Community Survey of 2018
data(us_poverty)
An object of class tbl_df (inherits from tbl, data.frame) with 3220 rows and 63 columns.
Subject Definitions of the American Community Survey
a tibble
American Community Survey tables
Estimated population of each county by race. American Community Survey of 2018
data(us_race)
An object of class tbl_df (inherits from tbl, data.frame) with 3220 rows and 11 columns.
Subject Definitions of the American Community Survey
a tibble
American Community Survey tables
Seasonal temperature and humidity
data(us_season)
An object of class tbl_df (inherits from tbl, data.frame) with 3233 rows and 5 columns.
a tibble
Atmospheric Composition Analysis Group, wxwk1993 processed data
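The getus_all variable list reports the winter temperature in kelvin. Converting kelvin to degrees Celsius is a fixed offset, so a one-line helper suffices; the function name is hypothetical.

```r
# Convert kelvin to degrees Celsius (0 degrees C = 273.15 K)
kelvin_to_celsius <- function(k) k - 273.15

kelvin_to_celsius(c(273.15, 300))
```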