Making sense of the world with data

6 minute read

Data science is a very powerful tool that can guide our understanding of real world events. In times of crisis, we may get confused and emotional. Data science gives us a lense to be objective. It introduces concrete solutions to concrete material conditions. It is a science after all which follows sound logic, is repeatable, and insightful.

This article is to give you an idea of the data science process in conducting exploratory data analysis (EDA). The context is using the nCov data from DOH (found here COVID-19 Philippines) and population data from PSA (found here Total population by age group, sex, region).

This is all done in Python using a Jupyter Notebook with the basic tools of data wrangling, Pandas, Numpy, and Matplotlib.

Data from DOH is in a spreadsheet format so when you import it please make sure you have the proper Google APIs enabled.

We will not try to predict anything. We should leave it to the epidemiologists and experts whose been doing it for their entire careers. I will link some of their works below. I will also list down the efforts in the Philippines using data science to help with the Corona Crisis.

Importing data

First I used the libraries gspread and oauth2client to access the spreadsheet. You can also download the spreadsheet as a CSV and import using the standard pd.read_csv. The advantage of the method that I used is that everytime I run the code, it will fetch the latest numbers from the spreadsheet as compared to downloading the CSV every single time for an updated dataset.

1

Here are the columns indicated in the spreadsheet.

2

Asking questions of the data

Suppose that we are interested with the running total of cases in the Philippines as of March 31, 2020, we can run:

3

Now, the goal of our EDA is to breakdown this number. Let us look first at the binned age distribution.

4

The average age of nCov positive patients is around 54. The distribution is near symmetric at the 60 age mark. This is interesting to see because the population of the Philippines is young. In fact, 89.35% of Filipinos are aged 55 and below. This tells us that there is indeed a significant number of older people who gets infected by the corona virus.

Now let’s look at gender,

gender

We see that around 62% are male. We can only infer certain reasons to this. Maybe males are more exposed due to the nature of their jobs or maybe it can also be attributed to sanitation and personal hygene. We cannot claim this entirely and we need more factors and indicators to prove this.

It’s easier to interpret if gender and age group and age group are visualized together.

mix

Then again, people in statistics have a better way of doing this through a pyramid plot. In the case of the Philippines,

pyramid

The first three cases of the virus are from Chinese nationals.

china

It might be interesting to see if other nationalities have been infected with the corona virus inside the PH as well.

nation

Indeed, we see that that is the case. Although majority of the cases are now Filipinos there are foreign nationals who have been tested positive inside the Philippines.

Given these insights, that corona crisis started from foreign sources, it is logical for us to ask,

How did our domestic cases acquired Corona?

travel

We see that around 72% of those who tested positive didn’t have recen relevant travel history. This confirms the statement of the experts that community transmission is already happening.

Interestingly, in the study of disease spread, epidemiologists always factor in links. You will have a high likelihood of contracting the disease if you interacted someone who has been infected and is a carrier of the virus. Plotting the patient links in our data set,

link

An alarming 97% has no link to previous patients. This tells us that the current contact tracing is not efficient as we are not able to monitor the connections of contractions. (Note: This number includes those that are tagged as for validation so definitely this is an overestimate. But given the weight of the matter (and the clock is ticking), an overestimate is better than an underestimate as we are not already able to confirm information in real time.)

Most of the locations of the patients are concentrated in Metro Manila.

loc

But it is starting to trickle down to the provinces.

Evolutions through time

On a more optimistic note, we see the count of new cases decreasing as of the week of this posting,

new_cases

For us to see the pattern better, we can look at the cumulative count of cases,

cummul1

This tells us that although the numbers are increasing, the rate at which they increase are losing momentum which is a good indication.

We can also look at the number of deaths and recoveries per day,

rec died

Again cummulatively,

rec died cummul

Since, the Philippines is still at the onset of its experience with the corona crisis relative to China and South Korea, the number of deaths per day are more significant compared to the number of recoveries per day. This is expected as this pattern was also evident to the case of China and South Korea. We should expect however the effect of the enhanced community quarantine to reflect on the numbers after one month.

Given this scenario, the focus of the officials should be already on containing those who tested positive and empowering frontline workers by providing support and resources. We see some evidence as well that the enhance community quarantine has already reduced the number of cases in the PH. Also we need to do more tests!

What’s next?

We just imported and took a look at what’s in the PH COVID-19 data set using python. That was a quick tour of the basic EDA process that we do in Data Science. I strongly recommend you peruse the documentation of the libraries we used, learn about all of the other handy functions that these tool provides, and explore further questions that you have for the data set.

You can also apply this process to other future projects that you plan to do.

Call to Action

If you want to help out or want to access some of the efforts using tech and data science to help in the corona crisis in the Philippines, check this out:

A think tank for open source solutions and hacks to help each other survive the current pandemic Repository of resources and survival tips for COVID-19 lockdown Like-minded friends only. Absolutely no trolling & wise-ass remarks. Propose and share your projects and helpful resources. Support all your posts with photos, videos and/ links if possible.
Let's connect the FRONTLINERS to VOLUNTEERS and DONORS. Please SHARE this group so we can send help to those in need.
Let’s help each other know which businesses are open* within our local communities during the quarantine. An establishment is open (bukas) if it is operational during the quarantine. This does not pertain to opening hours.
With its corresponding dashboard from DOH.
Produced by Grab Philippines to help citizens locate stores for food, essentials, and services.
Locating what establishements have delivery services.
List of hospitals and their calls for donations.
Look for nearby stores, pharmacies, banks, and remittance centers remain operational during the Covid-19 Enhanced Community Quarantine period.
Help out daily wage workers affected by the COVID-19 quarantine.
  • [Project Moses PH]](http://www.projectmoses.ph/apps)
Portal for latest information regarding COVID-19 in the Philippines
In response to the ongoing public health, transportation and food crises, the Computer Professionals' Union is opening to the public the COVID19 PH REPORT, a web-based geographical information system application. Anyone may report on community initiatives and the National Government's and LGUs' implementations of policies to address the aforementioned crises, as well as any incident related to the enhanced community quarantine. We hope to gather as much information through this tool to sufficiently be able to provide a situationer which the general public can use to make informed decisions and actions in these trying times.

Questions? Contact me via LinkedIn. I’m also on GitHub with the username albertyumol.