Reports To: Engineering Resource Manager
In the role of Apache Airflow Specialist, you will be responsible for designing, expanding, automating and optimizing our data pipeline architecture, as well as optimizing data flow and collection.
You will be responsible for the Airflow architecture used to design, build and manage automated data pipelines running in AWS environments, and you will pioneer ways of working that other team members can adopt.
Using Apache Airflow to build respondent survey pipelines, you will automate components of a manual load and then add them to the pipeline. The aim is to reduce manual intervention and increase processing speed, so that we can repeatedly process more data at higher frequencies in the future. Where needed, you may choose to add manual human tasks to the automation workflow.
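As an illustration of the kind of workflow described above, here is a minimal sketch in plain Python. It uses the standard-library `graphlib` module rather than Airflow itself, and the step names, the `MANUAL_STEPS` set and the `approve` callback are all hypothetical, not part of our actual pipeline.

```python
from graphlib import TopologicalSorter

# Hypothetical steps of a survey load; each step maps to the steps it depends on.
PIPELINE = {
    "extract_responses": set(),
    "validate_responses": {"extract_responses"},
    "manual_review": {"validate_responses"},  # assumed human sign-off step
    "load_to_warehouse": {"manual_review"},
}

# Steps that need a person to approve them before the run continues (assumption).
MANUAL_STEPS = {"manual_review"}


def execution_order(pipeline):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(pipeline).static_order())


def run(pipeline, approve):
    """Walk the steps in dependency order and return the completed ones.

    `approve` is a callback that takes a step name and returns True or False;
    it is only consulted for steps listed in MANUAL_STEPS.
    """
    completed = []
    for step in execution_order(pipeline):
        if step in MANUAL_STEPS and not approve(step):
            break  # stop the run until a person signs off
        completed.append(step)
    return completed
```

Calling `run(PIPELINE, approve=lambda step: True)` completes all four steps, while an `approve` callback that returns `False` halts the run before the manual review step. In Airflow the same dependencies would be declared between tasks in a DAG.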
You will work with Data Engineers, Python Engineers and Product Managers in an Agile environment to deliver secure, performant and maintainable data automation. You will use best-practice continuous integration and continuous deployment methodologies to ensure that the build and deployment pipelines are fast, robust and secure. You will champion good code quality and architectural practices.
Main Responsibilities and Tasks:
- Advise on the extension and optimisation of our automated data ingestion platform using Apache Airflow for the automation of data loading routines
- Automate and industrialise pipelines for regular execution in business-critical scenarios
- Programmatically author, schedule and monitor workflows within Apache Airflow
- Determine custom ETL processes in the design and implementation of data pipelines
- Identify and resolve data quality and related issues arising from the ingestion of data from source systems
- Elicit requirements, describe scope, define data flow in flowcharts and be able to transform them into stories as per our internal standards
- Read existing source code and script repositories to determine the functional state of current implementations
- Document the current and functional state of existing workflows for the wider Technical team, e.g. the automation pipeline and the utilities used within it
- Treat security as a primary concern in the architecture, secure coding, build and deployment of solutions
- Collaborate with the wider team, in particular Infrastructure Engineers, to deploy automation
- Convert, parse and manipulate data files using various database programs and utilities
- Become an ambassador for an area of specialist subject matter expertise and promote a Community of Practice
- Mentor less experienced Engineers and provide a framework to support their learning
- Keep up to date with advances in Technology and best practices from the industry.
Education, Experience, Knowledge, and Skills:
- Significant experience of large-scale implementations using Apache Airflow
- Be an expert in the concepts of DAGs (Directed Acyclic Graphs) and Operators used to schedule jobs
- Working knowledge of message queuing, stream processing, and highly scalable “big data” data stores.
- Experience of manipulating, processing and extracting value from large disconnected datasets.
- Prior experience with customer data platforms.
- Experience in performing root cause analysis on internal/external data and processes.
- Prior experience with data analysis and data warehousing
- Technical expertise with data models, data mining, and segmentation techniques
- Proficiency in scripting languages (especially Python)
- Be able to investigate current data loading procedures and plan the pipelines and steps required to automate data extraction, transformation, and loading (ETL) processes
- Proficiency with GitHub for source code repositories, including maintaining the daily operation, integrity and security of source code
- Experience of conducting code reviews against acceptance criteria
- Knowledge of Amazon Web Services (AWS) infrastructure & services e.g. Redshift, EC2, RDS, S3, Lambda, EMR, Batch or Athena
- Excellent Linux scripting skills
- Experience with data modelling, data processing and ETL
- Passionate about the power of data to drive better business outcomes for our customers.
- A working knowledge of SQL, query authoring and a working familiarity with a variety of relational databases.
- Experience with Agile methodologies and change management tools (e.g. JIRA), and the ability to define technical acceptance criteria for stories
- Experience working with external partners to drive product delivery.
- Excellent problem-solving skills
- Proven ability to work effectively in a distributed working environment
- Outstanding written and verbal communication skills
- Ability to estimate the effort of your own tasks and those of others within your domain of expertise
- Organized, detail-oriented, and deadline-driven
- Strong interpersonal skills and the ability to work proactively, independently and as a team player
- Ability to work efficiently and productively in a fast-paced environment
- Willingness to learn new skills
- Be confident with numbers