paperliner.blogg.se

Astronomer apache airflow 213m insight

  1. ASTRONOMER APACHE AIRFLOW 213M INSIGHT INSTALL
  2. ASTRONOMER APACHE AIRFLOW 213M INSIGHT DOWNLOAD

ASTRONOMER APACHE AIRFLOW 213M INSIGHT INSTALL

Downloads, unlike running deployments or active users, are comparatively easy to measure. Many users install Airflow with pip install, so the number of downloads of the main Airflow Python package is interesting to us. We get this number from the PyPI public dataset in Google BigQuery and ingest it into our warehouse nightly using an Airflow DAG. It shows vibrant and growing demand for Airflow’s Python package. We also gather data on other Airflow packages: studying the distribution of package downloads tells us which provider packages people use most and, therefore, which systems people most frequently use alongside Airflow.

Since the community maintains an official Airflow Docker image, with an accompanying Helm chart and docker-compose.yaml, many users also install Airflow from that image instead of from Python packages. Fortunately, Docker Hub’s public API provides the total number of pulls for an image. We snapshot this number every day and load it into our warehouse with (you guessed it) another Airflow DAG.

Download counts are in the “hamburgers served” realm. They are fun! They boost morale and establish a “high water mark” for the community to celebrate. And we want a comprehensive metric to summarize demand, one that befits a community with the vibrancy and ubiquity of Airflow. So when we say “Airflow downloads” in our content and materials, we mean the sum of Docker Hub image pulls for apache/airflow and PyPI package downloads for apache-airflow. This is not a complete number by any means (users can set up private package repositories or image registries, install from a GitHub clone, or distribute Airflow directly within their organizations), but it is a big, growing number we can all celebrate.
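
A minimal Python sketch of the “Airflow downloads” arithmetic, under stated assumptions: that Docker Hub’s repository endpoint (https://hub.docker.com/v2/repositories/apache/airflow/) returns JSON with a cumulative `pull_count` field, and that daily PyPI counts come from a query like the hypothetical `PYPI_DAILY_QUERY` against the `bigquery-public-data.pypi.file_downloads` table. All function names here are illustrative, not our actual DAG code. Because Docker Hub reports a running total while PyPI data arrives as per-download rows, the sketch takes the latest Docker snapshot as-is and sums the PyPI daily counts.

```python
import json
import urllib.request
from datetime import date

# Assumption: Docker Hub's public repository API reports a cumulative
# "pull_count" for the image at this URL.
DOCKER_HUB_URL = "https://hub.docker.com/v2/repositories/apache/airflow/"

# Assumed BigQuery SQL for daily apache-airflow downloads from the public
# PyPI dataset; shown for reference only and never executed by this sketch.
PYPI_DAILY_QUERY = """
SELECT DATE(timestamp) AS day, COUNT(*) AS downloads
FROM `bigquery-public-data.pypi.file_downloads`
WHERE file.project = 'apache-airflow'
GROUP BY day
ORDER BY day
"""


def parse_pull_count(payload: dict) -> int:
    """Extract the cumulative pull count from a Docker Hub repository payload."""
    return int(payload["pull_count"])


def fetch_pull_count(url: str = DOCKER_HUB_URL) -> int:
    """Take one snapshot of the cumulative Docker Hub pull count (network call)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_pull_count(json.load(response))


def daily_docker_pulls(snapshots: dict[date, int]) -> dict[date, int]:
    """Difference consecutive cumulative snapshots into per-snapshot pull counts.

    The earliest snapshot has no predecessor, so it gets no entry; if a day
    was skipped, the delta covers the whole gap.
    """
    days = sorted(snapshots)
    return {
        later: snapshots[later] - snapshots[earlier]
        for earlier, later in zip(days, days[1:])
    }


def airflow_downloads(
    docker_snapshots: dict[date, int], pypi_daily: dict[date, int]
) -> int:
    """Latest cumulative Docker Hub pulls plus every PyPI daily count supplied."""
    latest_pulls = docker_snapshots[max(docker_snapshots)]
    return latest_pulls + sum(pypi_daily.values())
```

The differenced Docker series is handy for charting growth next to the PyPI series, while the headline total only needs the most recent snapshot. Note that the PyPI side only counts the days you actually queried, so the two halves of the sum may cover different time windows.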

ASTRONOMER APACHE AIRFLOW 213M INSIGHT DOWNLOAD

The “dream metrics” (the number of deployments running on a given day, the number of people who use Airflow) are unattainable. Once users download Airflow, they can use and redistribute it in many ways that cannot be studied through publicly available data sources.

Every part of our company, from engineering to customer success, operates at least one data pipeline. So, naturally, we in the community team use data to understand what’s happening in the open source communities and projects that Astronomer works with closely: Apache Airflow and OpenLineage. In this post, I’ll explore the different data sources and metrics we use to understand the current state, and the history, of Airflow in particular.

The first area we focus on is development. We study developer activity in the project for many reasons, but our primary goals are to quantify how much work is happening in the project over time, compare Astronomer’s contributions with the community norm, and gauge whether activity is speeding up or slowing down.

As development of Airflow occurs mainly on GitHub, our primary source of information on this subject is the GitHub API. The information we pull from it (on commits, pull requests, issues, stars, and forks) is ingested daily into our data warehouse via an Airflow DAG. As a starting point, we look at the number of commits in the Airflow repository over time.

Commits are interesting, but they can represent highly variable amounts of work. Fortunately, we can measure a more atomic unit of work: the pull request. We gather metrics on the number of opened, closed, and merged pull requests nightly via the same Airflow DAG and load them into our warehouse. That lets us see the number of pull requests created per week over the last six months.

In addition to GitHub, we study the conversations that occur on the Airflow mailing lists. These are especially interesting because the dev list is where much of the official Apache project governance occurs. Archives are publicly available in mbox format, and we ingest messages daily into our data warehouse via another Airflow DAG. We analyze the messages looking for threads that contain “VOTE” in the subject line, since those are the threads in which a community member calls for a vote. Using these mailbox archives, we can observe the total number of votes that have been called over time.

As you might expect, we also have a keen interest in how much demand exists for Airflow.
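
The vote-counting step from the mailing-list analysis can be sketched with Python’s standard-library `mailbox` module. This is a minimal illustration, not our actual DAG code: the function names are hypothetical, and it assumes that messages sharing the same subject (after stripping “Re:”/“Fwd:” prefixes) belong to one thread, which is a simplification of real email threading.

```python
import mailbox
import re
from collections.abc import Iterable
from email.header import decode_header, make_header
from email.message import Message

# Strip any run of reply/forward prefixes, e.g. "Re: RE: Fwd: [VOTE] ...".
REPLY_PREFIX = re.compile(r"^\s*(?:(?:re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(raw_subject) -> str:
    """Decode RFC 2047 encoded words and drop reply/forward prefixes."""
    if raw_subject is None:
        return ""
    decoded = str(make_header(decode_header(str(raw_subject))))
    return REPLY_PREFIX.sub("", decoded).strip()


def vote_threads(messages: Iterable[Message]) -> set[str]:
    """Return the distinct thread subjects that contain "VOTE".

    Assumption: messages with the same normalized subject form one thread.
    """
    threads = set()
    for message in messages:
        subject = normalize_subject(message["subject"])
        if "VOTE" in subject:
            threads.add(subject)
    return threads


def vote_threads_from_mbox(path: str) -> set[str]:
    """Scan one downloaded mbox archive file (hypothetical local path)."""
    return vote_threads(mailbox.mbox(path))
```

Counting `len(vote_threads_from_mbox(path))` per monthly archive gives a votes-over-time series. Subjects such as “[RESULT][VOTE] …” also match the substring test, so you may want to filter those out if you only want calls for votes; following the `In-Reply-To` and `References` headers would also give more robust threading than subject matching.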

At Astronomer, we work with open source data projects and deliver data solutions to data-focused organizations.
