In this blog post I want to go over the operations of data engineering called Extract, Transform, Load (ETL) and show how they can be automated and scheduled using Apache Airflow. Airflow is a platform to schedule and monitor workflows, and in this post I will show you how to use it to extract the daily weather in New York from the OpenWeatherMap API, convert the temperature to Celsius, and load the data into a simple PostgreSQL database. You can see the source code for this project here.

Extracting data can be done in a multitude of ways, but one of the most common is to query a Web API. If the query is successful, we will receive data back from the API's server. Often the data we get back is in the form of JSON. JSON can pretty much be thought of as semi-structured data, or as a dictionary where the keys and values are strings. Since the data is a dictionary of strings, we must transform it before loading it into a database.

Let's first get started with how to query an API. To use a Web API to get data, you make a request to a remote web server and retrieve the data you need. In Python, this is done using the requests module. A great introduction to using APIs with Python can be found here. Below I wrote a module, getWeather.py, that uses a GET request to obtain the weather for Brooklyn, NY. To get a better feel for how the request works, check out the OpenWeatherMap API documentation page here.

Notice that I keep my API key in a separate file called config.py. In order to use this code yourself, you would have to obtain your own API key and either substitute it into the code directly or set a variable API_KEY = your-api-key in a config.py file. After the request has been made, I check whether it was successful by testing the status code: result.status_code == 200. Proper exception handling here is definitely something I will add in the future. If the request is successful, the weather data that is returned is dumped into a JSON file, named with the current date, using the json package.

The code is stored in a file titled getWeather.py and can be run from the command line by typing python getWeather.py from the appropriate directory. Note that this is the exact Bash command that I'll use to have Airflow collect the daily weather data.

Now, let's go over how to set up a PostgreSQL database. I went over the basics of how to use PostgreSQL in a previous blog post, so I'll just present the code I used to make one here. The code below creates a table called weather_table in a local PostgreSQL database named WeatherDB. I only take a subset of the data that is returned from OpenWeatherMap; specifically, I transform and load the following into the database.
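Finally, the Airflow piece: a DAG that runs the Bash command python getWeather.py once a day could look like the sketch below. This is a configuration sketch, not my exact DAG: import paths and scheduling arguments vary across Airflow versions (this uses the Airflow 2.x style), and the project path is a placeholder.

```python
# dags/weather_dag.py -- a sketch; operator import paths vary by Airflow version.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2.x import path

default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="get_weather_daily",
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # collect the weather once per day
    catchup=False,
) as dag:
    # The same Bash command used to run the module manually.
    get_weather = BashOperator(
        task_id="get_weather",
        bash_command="cd /path/to/project && python getWeather.py",
    )
```

Dropping this file into Airflow's dags folder is enough for the scheduler to pick it up and run the extract daily.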
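The extract step described above (a GET request, a status check, and a dump to a dated JSON file) can be sketched roughly as follows. This is a hypothetical version of getWeather.py, not my exact code: the city parameter, function names, and the config.py fallback are assumptions.

```python
# Hypothetical sketch of getWeather.py; names and defaults are assumptions.
import json
from datetime import date

import requests  # third-party HTTP library

try:
    from config import API_KEY  # keep the OpenWeatherMap key out of the code
except ImportError:
    API_KEY = "your-api-key"  # placeholder so the sketch is importable

def dated_filename():
    """JSON output file named after the current date, e.g. 2024-01-15.json."""
    return f"{date.today().isoformat()}.json"

def get_weather(city="Brooklyn"):
    """GET the current weather for `city` from OpenWeatherMap, dump it to disk."""
    url = "https://api.openweathermap.org/data/2.5/weather"
    result = requests.get(url, params={"q": city, "appid": API_KEY})

    # Check the request succeeded before touching the payload.
    if result.status_code != 200:
        raise RuntimeError(f"Request failed with status {result.status_code}")

    filename = dated_filename()
    with open(filename, "w") as f:
        json.dump(result.json(), f)
    return filename
```

Raising on a non-200 status is a stand-in for the proper exception handling mentioned above.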
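The transform step (converting the temperature to Celsius and keeping only a subset of the response) could look like the sketch below. OpenWeatherMap reports temperatures in Kelvin by default; the particular fields kept here are illustrative, not necessarily the ones I load.

```python
# Sketch of the transform step; the fields kept are an illustrative choice.
def kelvin_to_celsius(kelvin):
    """OpenWeatherMap returns temperatures in Kelvin unless told otherwise."""
    return kelvin - 273.15

def transform(payload):
    """Reduce the full API response to the handful of fields we store."""
    return {
        "city": payload["name"],
        "temp_celsius": round(kelvin_to_celsius(payload["main"]["temp"]), 2),
        "description": payload["weather"][0]["description"],
        "observed_at": payload["dt"],  # Unix timestamp of the observation
    }
```

For example, a payload with `"temp": 293.15` transforms to a row with `"temp_celsius": 20.0`.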
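The load step into weather_table might be sketched as below, using psycopg2 against a local database named WeatherDB. The column names are my assumptions, and the connection call is shown in a comment since it requires a running PostgreSQL server.

```python
# Sketch of the load step; column names are illustrative assumptions.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS weather_table (
    id SERIAL PRIMARY KEY,
    city TEXT NOT NULL,
    temp_celsius REAL,
    description TEXT,
    observed_at TIMESTAMP
);
"""

INSERT_SQL = """
INSERT INTO weather_table (city, temp_celsius, description, observed_at)
VALUES (%s, %s, %s, to_timestamp(%s));
"""

def create_table(conn):
    """Create weather_table in the connected database if it doesn't exist."""
    with conn.cursor() as cur:
        cur.execute(CREATE_TABLE_SQL)
    conn.commit()

def load_row(conn, row):
    """Insert one transformed weather record (a dict of the fields we keep)."""
    with conn.cursor() as cur:
        cur.execute(
            INSERT_SQL,
            (row["city"], row["temp_celsius"], row["description"], row["observed_at"]),
        )
    conn.commit()

# Usage (requires a running PostgreSQL server with a WeatherDB database):
#   import psycopg2
#   conn = psycopg2.connect(dbname="WeatherDB")
#   create_table(conn)
```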