COVID-19 SAS Studio Project No.2: Visualising global trends in Johns Hopkins University data

This is a project to read the daily Johns Hopkins COVID-19 data and visualise the national infection and fatality trends using Base SAS and SAS/STAT:

Download the GitHub Desktop software from https://desktop.github.com/ and install it on the computer where you will be running SAS Studio or SAS University Edition. For instructions on how to install SAS University Edition on your own computer, please read my blog post “Are you learning about SAS?”.
Clone the Johns Hopkins COVID-19 data repository at https://github.com/CSSEGISandData/COVID-19, and then Pull the latest data, using GitHub Desktop. This will reduce the time needed to download the latest data each time you run the SAS Studio project, as a quick Pull in GitHub Desktop is all that is required each time.
Download my SAS Studio CPF project file (John-Hopkins-GitHub-data.cpf). The download is a zipped CPF file, which will be updated occasionally as submitted improvements are accepted. Please check the comments here for details when updates are added.
Open the CPF project file in SAS Studio (requires Base SAS and SAS/STAT) or SAS University Edition (making certain you have first created one or more Shared Folders pointing to where your GitHub files and CPF project file are stored).
Update the “run first” program to include your GitHub file folder in the &_dir macro variable assignment. The CSV files we will be using can be found in the /csse_covid_19_data/csse_covid_19_daily_reports folder.
Submit each program in the order given below (or submit all of the programs in the project’s flow together):

- (1) “run first” assigns the location of the data to the &_dir macro variable.
- (2) “Read CSV files” creates the SAS data sets in WORK by reading all of the CSV files in the csse_covid_19_daily_reports folder. It then summarises the records by Country_Region to remove the finer detail in the daily reports.
- (3) “Calculate regression lines” generates the regression lines for confirmed cases between 100 and 10,000, and deaths between 10 and 1,000, to include on the graphs. The regression lines are exponential, to match the initial growth of confirmed cases, so they appear as straight lines in the semi-log plots. This makes any “flattening” of the observed curves easier to identify.
- (4) “Semi-log plots of confirmed vs deaths” generates the graphs for countries with more than 1,000 confirmed cases or more than 100 deaths.
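The idea behind step (3) can be illustrated outside SAS. The following is a minimal sketch in Python, not the project's SAS code: it fits a straight line to log10(count) against day number, using only the points whose counts fall inside a chosen range, and then extrapolates the fitted exponential. The function names and the synthetic series are assumptions made for illustration only.

```python
import math

def fit_exponential(days, counts, lower, upper):
    """Least-squares fit of log10(count) = intercept + slope * day,
    using only the points with lower <= count <= upper."""
    pts = [(d, math.log10(c)) for d, c in zip(days, counts) if lower <= c <= upper]
    if len(pts) < 2:
        raise ValueError("need at least two points inside the range")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, day):
    """Value of the fitted exponential on a given day."""
    return 10 ** (intercept + slope * day)

# Synthetic (made-up) series: 30% daily growth up to day 18,
# then the growth slows to 5% per day.
days = list(range(30))
counts = [
    100 * 1.3 ** d if d <= 18 else 100 * 1.3 ** 18 * 1.05 ** (d - 18)
    for d in days
]

# Fit only the early growth, here counts between 100 and 10,000.
intercept, slope = fit_exponential(days, counts, 100, 10_000)

# A later observation falling below the extrapolated line indicates flattening.
flattening = counts[25] < predict(intercept, slope, 25)
```

On a semi-log plot the fitted line is straight with gradient `slope`, so an observed series bending below it is the “flattening” described above.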

Some questions for you to answer:

- (a) Where could my “Read CSV files” program be improved?
- (b) Why is the US graph split at around 20Mar2020? Is this a problem with the data or my program?
- (c) Are all of the cases being included?

This project is open to SAS programmers and to researchers. Follow the above instructions yourself, and then see if you can improve my SAS code by answering the questions.

Please send your saved SAS Studio flow containing your improved versions of the SAS programs to phil@hollandnumerics.org.uk. Anyone providing improvements that can be incorporated will be added to the credits for this project.

My first COVID-19 SAS project for SAS Studio/SAS University Edition can be found at “Can you help? Supporting Coronavirus Research by searching research papers with SAS”.

If you are still looking for SAS training, then please go to my blog post “SAS training for home-workers: Keeping your mind active and your skills current” for some more training options.

COVID-19 can be defeated, and, working together, we can make a difference!

3 thoughts on “COVID-19 SAS Studio Project No.2: Visualising global trends in Johns Hopkins University data”

Philip Holland says:

24Apr2020 at 1000
Recent data updates have highlighted some further changes required to the programs:
- Some country_region values are different for the same country, for example Mainland China and China, so a format (and a standardized variable?) may be required.
- Death_rate is currently calculated as confirmed/deaths (confirmed per death), but a more useful calculation would be 1000*deaths/confirmed (deaths per 1,000 confirmed), as it would give an indication of the effectiveness of hospital treatment.
Philip Holland says:

28Apr2020 at 1356
The SAS Studio project file has been updated, and will contain the following changes when next downloaded:
- The name has been corrected to Johns-Hopkins-GitHub-data.cpf.
- The country_region values have been standardized using the $_country format in “run first”, but the original data values can still be found in _country_region.
- The calculation of death_rate is now 1000*deaths/confirmed, i.e. deaths per 1,000 confirmed cases.
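The two changes above can be sketched together. This is a minimal Python illustration, not the project's SAS code: a lookup table plays the role that the $_country format plays in “run first”, and the death rate is computed as deaths per 1,000 confirmed cases. The mapping contents, function name and figures are assumptions made up for the example, and the rows are taken to come from a single daily report.

```python
from collections import defaultdict

# Hypothetical lookup standing in for a format: raw value -> standard name.
COUNTRY_MAP = {"Mainland China": "China"}

def summarise(rows):
    """Sum confirmed cases and deaths by standardised country name and
    add death_rate = 1000 * deaths / confirmed."""
    totals = defaultdict(lambda: {"confirmed": 0, "deaths": 0})
    for raw_country, confirmed, deaths in rows:
        # Names not in the lookup are kept unchanged.
        country = COUNTRY_MAP.get(raw_country, raw_country)
        totals[country]["confirmed"] += confirmed
        totals[country]["deaths"] += deaths
    result = {}
    for country, t in totals.items():
        # Avoid dividing by zero when no cases have been confirmed.
        rate = 1000 * t["deaths"] / t["confirmed"] if t["confirmed"] else None
        result[country] = {**t, "death_rate": rate}
    return result

# Made-up rows: (raw country_region value, confirmed, deaths).
rows = [
    ("Mainland China", 500, 10),
    ("China", 1500, 30),
    ("Italy", 2000, 100),
]
summary = summarise(rows)
```

Both spellings of China are combined into one record, which would not happen if the raw values were summarised directly.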
Philip Holland says:

18May2020 at 1155

The SAS Studio project file now contains a new branch, which calculates the doubling time of confirmed cases of COVID-19 by country. This is based on the SAS code in the SAS Blogs post by Rick Wicklin: “Estimates of doubling time for exponential growth”.
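For readers unfamiliar with doubling times, here is a minimal Python sketch of one common estimate. It is neither the project's SAS code nor the code from the blog post mentioned above, and it may differ from both in detail. It assumes growth is roughly exponential over the chosen days, fits a straight line to the natural log of the counts, and divides ln(2) by the fitted slope. The function name and the synthetic series are assumptions.

```python
import math

def doubling_time(days, counts):
    """Estimate the doubling time (in days) from a least-squares fit of
    ln(count) against day number. Returns infinity if counts are not growing."""
    logs = [math.log(c) for c in counts]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    slope = sxy / sxx  # growth rate per day on the natural-log scale
    if slope <= 0:
        return math.inf
    return math.log(2) / slope

# Synthetic (made-up) series that doubles every 3 days.
days = [0, 1, 2, 3, 4, 5, 6]
counts = [100 * 2 ** (d / 3) for d in days]
estimate = doubling_time(days, counts)
```

In practice the fit would normally be restricted to a recent window of days, since the doubling time lengthens as the curve flattens.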

Published by Philip Holland
