World Bank Data ETL Process

Project Overview:
This project extracts, transforms, and loads (ETL) data from the World Bank's Health, Nutrition, and Population Statistics. The objective is to keep only the required information for selected countries and series and store it in a structured format that can later be loaded into a SQL database for future use.

What Was Done:
1. Extracted data from an Excel file downloaded from the World Bank website.
2. Transformed the data to focus on specific countries and series, making it easier to analyze.
3. Loaded the cleaned and structured data into a new Excel file, ready for future database storage.


How It Was Done:


1. Data Extraction:
- Downloaded the data from the [World Bank Health, Nutrition, and Population Statistics](https://databank.worldbank.org/source/health-nutrition-and-population-statistics/Type/TABLE/preview/on#) into an Excel file.
- Read the Excel file into a pandas DataFrame for processing.
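
The extraction step could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `read_worldbank_excel` is hypothetical, and it assumes the export marks missing values with `..` and that trailing note rows have no 'Country Code' value.

```python
import pandas as pd


def read_worldbank_excel(path, sheet_name=0):
    """Read a World Bank Excel export into a pandas DataFrame.

    Assumptions (not confirmed by the source): missing values appear
    as ".." and any trailing note rows lack a 'Country Code'.
    """
    # Treat ".." as missing so the year columns can hold numeric values.
    df = pd.read_excel(path, sheet_name=sheet_name, na_values=[".."])
    # Keep only rows that describe an actual country/series pair.
    return df.dropna(subset=["Country Code"]).reset_index(drop=True)
```

The path passed to the function is whatever local name the downloaded file was saved under.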


2. Data Transformation:
- Filtered the data to include only the relevant series (e.g., unemployment rates) and countries (e.g., Argentina, Armenia).
- Excluded unnecessary columns, such as 'Series Code' and 'Country Code'.
- Pivoted the data to organize it by year, making it more accessible for analysis.
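
The three transformation steps above might look like the sketch below. It assumes the DataFrame has 'Country Name' and 'Series Name' columns alongside the two code columns, and that every remaining column is a year whose label starts with four digits (for example `2000 [YR2000]`). The function name `transform_worldbank` is hypothetical.

```python
import pandas as pd


def transform_worldbank(df, series, countries):
    """Filter to the chosen series and countries, drop the code
    columns, and pivot so that each row is a year."""
    # Keep only the requested series and countries.
    mask = df["Series Name"].isin(series) & df["Country Name"].isin(countries)
    kept = df.loc[mask].drop(columns=["Series Code", "Country Code"])

    # Reshape the wide year columns into one long 'Year'/'Value' pair.
    long = kept.melt(
        id_vars=["Country Name", "Series Name"],
        var_name="Year",
        value_name="Value",
    )
    # Assumption: year labels start with the four-digit year.
    long["Year"] = long["Year"].astype(str).str[:4].astype(int)
    long["Value"] = pd.to_numeric(long["Value"], errors="coerce")

    # One row per year, one column per (series, country) pair.
    return long.pivot(
        index="Year",
        columns=["Series Name", "Country Name"],
        values="Value",
    )
```

Calling `transform_worldbank(df, [series_name], ["Argentina", "Armenia"])` would return a table indexed by year with one column per selected country for that series.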


3. Data Loading:
- Created a new Excel file and inserted the cleaned and pivoted data.
- Saved the Excel file with a descriptive name reflecting the selected series and countries.
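
A possible sketch of the loading step is shown below. The naming scheme (a `worldbank_` prefix followed by the series and country names) and the helper names are assumptions made for illustration; the source only says the file name reflects the selected series and countries.

```python
import re
from pathlib import Path

import pandas as pd


def slugify(text):
    """Lower-case the text and replace runs of non-alphanumerics with '_'."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_").lower()


def build_output_name(series, countries):
    """Build a descriptive file name from the selected series and countries."""
    parts = [slugify(s) for s in series] + [slugify(c) for c in countries]
    return "worldbank_" + "_".join(parts) + ".xlsx"


def save_to_excel(table, out_dir, series, countries):
    """Write the pivoted table to a new Excel file and return its path."""
    out_path = Path(out_dir) / build_output_name(series, countries)
    # The index is kept so the year labels are written alongside the values.
    table.to_excel(out_path)
    return out_path
```

Very long series names produce long file names, so a shortened label per series may be preferable in practice.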


Achievements:
- Successfully streamlined the ETL process for World Bank data, ensuring that only relevant information is stored in an easily accessible format.
- Facilitated future data analysis and reporting by organizing the data in a tabular layout that is ready to be loaded into a SQL database.
- Demonstrated a repeatable process for extracting, transforming, and loading large datasets into a structured and usable format.
