I regularly visit LinkedIn and read the discussions, where they exist. What amazes me are the responses to job posts which just have “Interested”, a name, or an email address. Those who post this type of response are only boosting LinkedIn’s count of “engagements”, not their chances of landing that job.
No-one will give you a job. You have to earn it!
It is true that doing well in a job interview is usually critical. However, in order to do well in an interview, you first have to be offered one. A good CV/resume will help, but, even before you send that to an employer, there are ways to improve your chances of being offered an interview:
Are you already known to the employer? Do you know anyone that already works there?
Are you actively participating in LinkedIn or other groups? By participating I don’t mean posting “Interested”, but asking insightful questions or offering helpful answers, which demonstrate your subject knowledge.
Have you presented at conferences or webinars? Presenting papers can be difficult at first, but by researching your topics thoroughly in advance, and incorporating your research into your presentations, you will gain confidence and show that you can explain your subject knowledge clearly to others. You will also find the questions asked afterwards easier to answer, and they could lead you to research a new topic for a future presentation.
Getting a job should never be easy. It requires you to put in that extra little bit of effort, so that you can stand out amongst the candidates, and the employers see the potential in you and also see the benefits you can bring to their company.
Kaggle are running a competition to develop a Python or R application to filter the vast collection of medical research papers that are being published every day.
The CORD-19 dataset represents the most extensive machine-readable coronavirus literature collection available for data mining to date. This allows the worldwide AI research community the opportunity to apply text and data mining approaches to find answers to questions within, and connect insights across, this content in support of the ongoing COVID-19 response efforts worldwide. There is a growing urgency for these approaches because of the rapid increase in coronavirus literature, making it difficult for the medical community to keep up.
Many of these questions are suitable for text mining, and they are encouraging researchers to develop text mining tools to provide insights on these questions.
This dataset was created by the Allen Institute for AI in partnership with the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, and the National Library of Medicine – National Institutes of Health, in coordination with The White House Office of Science and Technology Policy.
I am not a Python or R programmer, but a SAS programmer, so I decided to make use of the freely available dataset and try to develop a simple data mining application in SAS instead, which I would like to publish for the benefit of the fight against COVID-19. I have now created a basic framework, which I am opening up to the SAS-programming community to test, improve and enhance. It is provided as a saved SAS Studio flow (*.cpf), which can be imported into a Single-User SAS Studio installation, or into a SAS University Edition installed on a PC (see my blog post “Are you learning about SAS?” for details about how to install this version of SAS), as these installations can directly access files on your own computer.
The SAS programs in the SAS Studio flow are as follows:
run first:
Assigns the location of the downloaded Kaggle dataset into &_dir.
This macro variable will need to be edited to match the location of the folder where you have downloaded and extracted the CORD-19 dataset. SAS University Edition users will also need to assign a shared folder pointing to this location.
Includes %create_json_extract_script used in the Read json xxx programs below to read the JSON files containing the selected research papers into SAS, and then print out the contents of the paper.
If more non-printable characters are present that are not catered for in this macro, then additional TRANWRD() statements will need to be added here.
If anyone can devise a more elegant solution than using multiple TRANWRD() statements to convert Unicode escape strings (\u9999) to printable 8-bit ASCII values, then I would welcome tested suggestions.
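For readers who want to see the conversion logic outside SAS, here is a minimal Python sketch of one alternative to chaining replacements: decode each literal \uXXXX escape with a single regular expression, then fold accented letters to their base letter and replace anything still outside printable ASCII with a placeholder. It is not the contents of %create_json_extract_script; the function names and the "?" placeholder are assumptions made for illustration, and it targets 7-bit ASCII rather than an 8-bit code page.

```python
import re
import unicodedata

# Matches a literal backslash-u escape followed by four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_escapes(text):
    """Replace every literal \\uXXXX sequence with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def to_printable_ascii(text, placeholder="?"):
    """Decode escapes, then fold the result to printable 7-bit ASCII.

    Hypothetical helper: the name and the placeholder are illustrative.
    """
    decoded = decode_escapes(text)
    # NFKD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", decoded)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent, keep the base letter
        if ch in "\n\t" or 32 <= ord(ch) < 127:
            out.append(ch)
        else:
            out.append(placeholder)  # no ASCII equivalent available
    return "".join(out)


print(to_printable_ascii(r"caf\u00e9 and \u03b1-cov"))
```

Characters with no ASCII counterpart, such as Greek letters, become the placeholder here; a lookup table of preferred substitutions could be added in front of that fallback.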
Read metadata:
Reads the CSV file containing the metadata about the research papers, including the abstract, which can be searched, and the location(s) of the related paper(s). SAS data set created = work.metadata.
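As a rough illustration of this step for non-SAS readers, the Python sketch below loads a metadata CSV into a list of dictionaries, one per paper. The column names (title, abstract, publish_time) are assumed from the public CORD-19 metadata file and may differ between dataset releases; the sample rows are invented so that the example runs without the download.

```python
import csv
import io


def read_metadata(handle):
    """Return the metadata rows as a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


# Tiny in-memory stand-in for the metadata CSV (invented rows, illustration only).
sample = io.StringIO(
    "title,abstract,publish_time\n"
    'Paper A,"Infection rate, by age, for COVID-19",2020-03-01\n'
    "Paper B,,2019-11-15\n"
)

metadata = read_metadata(sample)
print(len(metadata), metadata[0]["abstract"])
```

With the real file, the handle would come from something like open(path, newline="", encoding="utf-8"), where path points into the folder held in &_dir.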
Filter metadata 31Dec19:
Filters the extracted metadata to only include papers published in or after December 2019. SAS data set created = work.Since_31Dec19.
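The date filter can be sketched in Python as follows. This is an illustration of the idea rather than a translation of the SAS program: the publish_time column name, the two accepted date formats and the cutoff of 1 December 2019 are all assumptions, and the cutoff is passed in as a parameter so that it can be changed.

```python
from datetime import date, datetime


def parse_publish_time(value):
    """Return a date for 'YYYY-MM-DD' or 'YYYY' strings, otherwise None."""
    for fmt in ("%Y-%m-%d", "%Y"):
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue  # try the next format
    return None


def published_since(rows, cutoff):
    """Keep rows whose publish_time parses to a date on or after cutoff."""
    kept = []
    for row in rows:
        published = parse_publish_time(row.get("publish_time") or "")
        if published is not None and published >= cutoff:
            kept.append(row)
    return kept


rows = [
    {"title": "A", "publish_time": "2020-03-01"},
    {"title": "B", "publish_time": "2019-11-15"},
    {"title": "C", "publish_time": "2020"},
    {"title": "D", "publish_time": ""},
]
recent = published_since(rows, date(2019, 12, 1))
print([row["title"] for row in recent])
```

A year-only value is treated as 1 January of that year, so a paper dated just "2019" is excluded even if it appeared in December; rows with a missing or unrecognised date are dropped.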
Filter metadata xxx:
Filters the SAS data set (work.Since_31Dec19) created by Filter metadata 31Dec19 to select paper abstracts containing specific keywords. SAS data set created = work.Since_31Dec19_xxx:
xxx=infect: WHERE INDEX(lowcase(abstract), 'infect') AND INDEX(lowcase(abstract), 'rate') AND INDEX(lowcase(abstract), 'age') AND (INDEX(lowcase(abstract), 'hcov') OR INDEX(lowcase(abstract), '-cov') OR INDEX(lowcase(abstract), 'covid'));
xxx=cured: WHERE INDEX(lowcase(abstract), 'cured') AND INDEX(lowcase(abstract), 'rate') AND INDEX(lowcase(abstract), 'age') AND (INDEX(lowcase(abstract), 'hcov') OR INDEX(lowcase(abstract), '-cov') OR INDEX(lowcase(abstract), 'covid'));
xxx=fatal: WHERE INDEX(lowcase(abstract), 'fatal') AND INDEX(lowcase(abstract), 'rate') AND INDEX(lowcase(abstract), 'age') AND (INDEX(lowcase(abstract), 'hcov') OR INDEX(lowcase(abstract), '-cov') OR INDEX(lowcase(abstract), 'covid'));
xxx=recover: WHERE INDEX(lowcase(abstract), 'recover') AND INDEX(lowcase(abstract), 'rate') AND INDEX(lowcase(abstract), 'age') AND (INDEX(lowcase(abstract), 'hcov') OR INDEX(lowcase(abstract), '-cov') OR INDEX(lowcase(abstract), 'covid'));
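The four WHERE clauses above differ only in their first keyword, so the same test can be written once and parameterised. The Python sketch below mirrors that logic with case-insensitive substring checks; the function names and the abstract key are hypothetical and chosen only for this illustration.

```python
# Substrings that mark an abstract as coronavirus-related, as in the WHERE clauses.
COV_TERMS = ("hcov", "-cov", "covid")


def abstract_matches(abstract, keyword):
    """True if the abstract contains keyword, 'rate', 'age' and any COV term."""
    text = (abstract or "").lower()
    return (
        keyword in text
        and "rate" in text
        and "age" in text
        and any(term in text for term in COV_TERMS)
    )


def filter_by_keyword(rows, keyword):
    """Return the rows whose 'abstract' value passes abstract_matches()."""
    return [row for row in rows if abstract_matches(row.get("abstract"), keyword)]


rows = [
    {"abstract": "Infection rate by age in COVID-19 patients"},
    {"abstract": "Fatality rate in SARS-CoV-2"},
    {"abstract": None},
]
print(len(filter_by_keyword(rows, "infect")))
```

As with INDEX() in SAS, these are plain substring matches, so 'age' is also found inside longer words such as "percentage" or "stage". Whole-word matching would be stricter, at the cost of missing forms like "aged".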
Read json xxx:
Prints all of the papers whose abstracts passed the filter to HTML, using the metadata in the SAS data set (work.Since_31Dec19_xxx) created by Filter metadata xxx.
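To show what this step involves, here is a small Python sketch that turns one parsed paper into HTML. The JSON layout it expects (a metadata.title string and a body_text list of paragraphs, each with section and text) is assumed from the public CORD-19 files and should be checked against the release you download. The sample paper is invented, and paper_to_html is a hypothetical name, not part of the SAS flow.

```python
import html
import json


def paper_to_html(paper):
    """Render the title, section headings and paragraphs as simple HTML."""
    title = paper.get("metadata", {}).get("title", "")
    parts = ["<h1>{}</h1>".format(html.escape(title))]
    last_section = None
    for para in paper.get("body_text", []):
        section = para.get("section", "")
        # Emit a heading only when the section name changes.
        if section and section != last_section:
            parts.append("<h2>{}</h2>".format(html.escape(section)))
            last_section = section
        parts.append("<p>{}</p>".format(html.escape(para.get("text", ""))))
    return "\n".join(parts)


# Invented sample paper; note the \u00e9 escape inside the JSON text.
sample = json.loads(
    '{"paper_id": "demo", "metadata": {"title": "Caf\\u00e9 study"},'
    ' "body_text": [{"section": "Intro", "text": "Rates < 5% by age."}]}'
)
print(paper_to_html(sample))
```

Python's json module decodes \uXXXX escapes while parsing, so no separate replacement step is needed in this sketch; html.escape() then protects characters such as < and & in the output.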
This project is open both to SAS programmers and to researchers. Please download the CORD-19 dataset and my SAS Studio flow. Try it out yourself, and then see if you can improve the performance, usability, flexibility or maintenance of my SAS code.
Please send your saved SAS Studio flow containing your improved versions of the SAS programs to phil@hollandnumerics.org.uk. Anyone providing improvements that can be incorporated will be added to the credits for this project.