Toby Geeson | Data Analyst
Introduction
This SQL project explores real-world Covid-19 data, in preparation for a seperate project to develop a dashboard into various aspects of the pandemic. The code here showcases a range of data analysis techniques, including selecting columns, performing calculations, ordering results, grouping by columns, joining tables, and creating views.
To begin the project, I obtained the dataset from https://ourworldindata.org/covid-deaths in CSV format. I then saved the information on deaths and vaccinations as two separate tables, saving each as distinct Excel Workbooks.
Next, I imported the data into Microsoft SQL Server Management Studio and created a new database named 'PortfolioProject', importing the workbooks. Finally, I began exploring the data.
What I discovered
It soon became apparent the importance of data cleaning and formatting in preparing the data for SQL queries. Throughout the project, I encountered challenges in dealing with various data types, null values, and inconsistent formatting, which led me to cast data types and filter out null values to ensure the accuracy of the results. This experience reaffirmed the importance of data preparation and organisation to obtain reliable insights from data analysis.
Here's a summary of the SQL code:
​​
-
The initial query selects data from the CovidDeaths table, sorting it by location and date.
-
The second query calculates the death percentage from Covid cases for each location, excluding rows with null continent values, and sorts the results by location and date.
-
The third query calculates the percentage of population infected with Covid for each location, sorting the results by location and date.
-
The fourth query shows the highest infection rate countries (max total cases) in relation to population, sorted by percentage of population infected.
-
The fifth query shows the highest death count for each location, excluding rows with null continent values, sorted by total death count.
-
The sixth query shows the continents with the highest death count per population, excluding rows with null continent values, sorted by total death count.
-
The seventh query calculates the global Covid cases and deaths, excluding rows with null continent values, and sorts the results by total cases and deaths.
-
The eighth query retrieves Covid deaths and vaccination data from two tables, calculates the cumulative sum of new vaccinations for each location, filters the data to only include rows with non-null continent values, and sorts the results by location and date.
-
The final query creates a view named PercentPopulationVaccinated, storing data for later visualisations, by joining data from the CovidDeaths and CovidVaccinations tables, and filtering the results to exclude rows with null continent values.
1) set up the DB



2) import tables
3) Start exploring

