COVID-19 SAS Studio Project No.2: Visualising global trends in Johns Hopkins University data

This is a project to read the daily Johns Hopkins COVID-19 data and visualise the national infection and fatality trends using Base SAS and SAS/STAT:

  1. Download the GitHub Desktop software from https://desktop.github.com/ and install it on the computer where you will be running SAS Studio or SAS University Edition. For instructions on how to install SAS University Edition on your own computer, please read my blog post “Are you learning about SAS?”.
  2. Clone the Johns Hopkins COVID-19 data repository at https://github.com/CSSEGISandData/COVID-19 using GitHub Desktop, and then Pull the latest data. This reduces the time needed to get the latest data each time you run the SAS Studio project, because a simple and quick Pull in GitHub Desktop is all that is required each time.
  3. Download my SAS Studio CPF project file (John-Hopkins-GitHub-data.cpf), which is supplied as a zipped CPF file and will be updated occasionally as submitted improvements are accepted. Please check for comments here when updates are added.
  4. Open the CPF project file in SAS Studio (requires Base SAS and SAS/STAT) or SAS University Edition. If you are using SAS University Edition, make certain you have first created one or more Shared Folders pointing to where your GitHub files and CPF project file are stored.
  5. Update the “run first” program to include your GitHub file folder in the &_dir macro variable assignment. The CSV files we will be using can be found in the /csse_covid_19_data/csse_covid_19_daily_reports folder.
  6. Submit each program in the order given below (or submit all of the programs in the project’s flow together):
    • (1) “run first” assigns the location of the data to the &_dir macro variable.
    • (2) “Read CSV files” creates the SAS data sets in WORK by reading all of the CSV files in the csse_covid_19_daily_reports folder. It then summarises the records by Country_Region to remove the finer detail in the daily reports.
    • (3) “Calculate regression lines” generates the regression lines for confirmed cases between 100 and 10,000, and deaths between 10 and 1,000, to include on the graphs. The regression lines are exponential curves that match the initial growth of confirmed cases, so they appear as straight lines in the semi-log plots, which makes any “flattening” of the actual curves easier to identify.
    • (4) “Semi-log plots of confirmed vs deaths” generates the graphs for countries that have had more than 1,000 confirmed COVID-19 cases or more than 100 deaths.
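
To show the idea behind programs (3) and (4), here is a minimal sketch using synthetic data. It is not the code in the CPF project. The data set names (WORK.DEMO, WORK.FITTED, WORK.PLOTDATA), the variable names, the made-up country and its growth rates are all assumptions for illustration only. It fits a straight line to log10(confirmed) against day, using only the days where confirmed is between 100 and 10,000. A straight line on the log10 scale, log10(y) = a + b*day, is the exponential curve y = 10**a * 10**(b*day) on the original scale, which is why the trend looks straight on a semi-log plot.

```sas
/* Hypothetical illustration with synthetic data: not the project's code */
data work.demo;
  country = "Exampleland";                 /* made-up country name */
  do day = 0 to 40;
    /* exponential growth for 25 days, then slower growth ("flattening") */
    if day <= 25 then confirmed = round(50 * 10 ** (0.1 * day));
    else confirmed = round(50 * 10 ** 2.5 * 1.03 ** (day - 25));
    /* only the 100 to 10,000 range contributes to the fitted line */
    if 100 <= confirmed <= 10000 then log_fit = log10(confirmed);
    else log_fit = .;
    output;
  end;
run;

/* Rows with a missing response are left out of the fit, but PROC REG  */
/* still writes a predicted value for them, which extends the line     */
proc reg data=work.demo noprint;
  by country;
  model log_fit = day;
  output out=work.fitted p=log_pred;
run;
quit;

/* Back-transform the fitted line to the original scale */
data work.plotdata;
  set work.fitted;
  trend = 10 ** log_pred;
run;

/* Semi-log plot: linear day axis, log10 axis for the counts */
proc sgplot data=work.plotdata;
  series x=day y=confirmed / legendlabel="Confirmed (synthetic)";
  series x=day y=trend / lineattrs=(pattern=dash)
                         legendlabel="Exponential trend";
  yaxis type=log logbase=10 label="Confirmed cases (log scale)";
  xaxis label="Day";
run;
```

In the resulting graph the dashed trend is a straight line, and the synthetic confirmed series falls below it after day 25, which is the “flattening” you would be looking for. With real data covering several countries, the same pattern should work once the data set has been sorted by the BY variable.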

Some questions for you to answer:

    • (a) Where could my “Read CSV files” program be improved?
    • (b) Why is the US graph split at around 20Mar2020? Is this a problem with the data or my program?
    • (c) Are all of the cases being included?

This project is open to SAS programmers and to researchers. Follow the above instructions yourself, and then see if you can improve my SAS code by answering the questions.

Please send your saved SAS Studio flow containing your improved versions of the SAS programs to phil@hollandnumerics.org.uk. Anyone providing improvements that can be incorporated will be added to the credits for this project.

My first COVID-19 SAS project for SAS Studio/SAS University Edition can be found at “Can you help? Supporting Coronavirus Research by searching research papers with SAS”.

If you are still looking for SAS training, then please go to my blog post “SAS training for home-workers: Keeping your mind active and your skills current” for some more training options.

COVID-19 can be defeated, and, working together, we can make a difference!