Principal Data Architect, Manager
Vitoria-Gasteiz, Spain; Barcelona, Spain - Hybrid
PepsiCo operates in an environment undergoing immense and rapid change. Big data and digital technologies are driving business transformation, unlocking new capabilities and business innovations in areas like eCommerce, mobile experiences, and IoT. The key to winning in these areas is leveraging enterprise data foundations, built on PepsiCo’s global business scale, to enable business insights, advanced analytics, and new product development. PepsiCo’s Data Management and Operations team is responsible for developing quality data collection processes, maintaining the integrity of our data foundations, and enabling business leaders and data scientists across the company to have rapid access to the data they need for decision-making and innovation.
What PepsiCo Data Management and Operations does:
- Maintain a predictable, transparent, global operating rhythm that ensures always-on access to high-quality data for stakeholders across the company
- Manage day-to-day data collection, transportation, maintenance/curation of, and access to the PepsiCo corporate data asset
- Work cross-functionally across the enterprise to centralize data and standardize it for use by business, data science or other stakeholders
- Increase awareness about available data and democratize access to it across the company
As a Principal Data Architect, you will be the key technical expert overseeing PepsiCo's data product build and operations, driving a strong vision for how data engineering can proactively create a positive impact on the business. As a member of the data engineering team, you will help lead the development of very large and complex data applications in public cloud environments, directly shaping the design, architecture, and implementation of PepsiCo's flagship data products in areas such as revenue management, supply chain, manufacturing, and logistics.

The primary responsibilities of this role are to work with business users, data product owners, platform owners, enterprise architects, data management owners, and data engineering teams to ensure the data supply chain and the enterprise data products are built to high standards of performance, availability, and maintainability using current and emerging big data technologies. You will work in a hybrid environment spanning in-house, on-premises data sources as well as cloud and remote systems. You will establish data design patterns that drive flexible, scalable, and efficient data models to maximize value and reuse. You will make tactical architecture decisions to support immediate projects while serving as a key expert informing long-term data architecture strategy.
- Develop a deep understanding of the business domain and enterprise technology inventory to craft a solution roadmap that achieves business objectives and maximizes reuse.
- Design scalable patterns and architecture to support both batch and real-time data products and platforms using big data technologies such as Hadoop, SQL Data Warehouse, EMR, Spark, Databricks, Snowflake, Azure Synapse, or other cloud data warehousing technologies.
- Ensure physical and logical data models are designed with an extensible philosophy to support future, unknown use cases with minimal rework.
- Partner with IT, data engineering, and other teams on the administration and monitoring of all data platforms, ensuring the enterprise data model incorporates the key dimensions needed for proper management: business and financial policies, security, local-market regulatory rules, and consumer privacy-by-design principles (PII management), all linked across fundamental identity foundations.
- Lead collaborative reviews of design, code, data, and security-feature implementations performed by data engineers to advance data product development.
- Assist with data planning, sourcing, collection, profiling, and transformation.
- Write requirements for ETL and BI developers.
- Test the effectiveness of the database before release for business use.
- Demonstrate expertise with data at all levels: low-latency, relational, and unstructured data stores; analytical databases and data lakes; data streaming (consumption/production); and data in transit.
- Develop repeatable data patterns based on cloud-centric, code-first approaches to data management and cleansing.
- Work with product managers and data stewards within the enterprise data governance process to define and conceptualize data models across enterprise master data, transaction data, and informational data and implement those models into the enterprise data model.
- Partner with the data science team to standardize their classification of unstructured data into standard structures for data discovery and action by business customers and stakeholders.
- Design data lineage and mapping of source system data to canonical data stores for research, analysis and productization.
- Lead the way in creating next-generation talent for Tech, mentoring internal talent and helping leadership recruit external talent.
- Help with intake prioritization and decisions about what to pursue across a wide base of users/stakeholders and across products, databases, and services.
Qualifications:
- 8+ years of overall technology experience, including 6+ years of hands-on software development, data engineering, and systems architecture.
- 6+ years of experience with Data Lake Infrastructure, Data Warehousing, and Data Analytics tools.
- 6+ years of experience developing enterprise data models.
- 4+ years of cloud data engineering experience in at least one cloud (Azure, AWS, GCP).
- Experience in at least one data modeling tool (ER/Studio, Erwin).
- Experience integrating multi-cloud services with on-premises technologies.
- Experience with data modeling, data warehousing, and building high-volume ETL/ELT pipelines.
- Experience with data profiling and data quality tools like Apache Griffin, Deequ, and Great Expectations.
- Experience building/operating highly available, distributed systems of data extraction, ingestion, and processing of large data sets.
- Experience with at least one MPP database technology such as Redshift, Synapse, or Snowflake.
- Experience running and scaling applications on cloud infrastructure and containerized services like Kubernetes.
- Experience with version control systems like GitHub and with deployment and CI tools.
- Experience with Azure Data Factory, Databricks, and Azure Machine Learning is a plus.
- Experience building solutions in the retail or supply chain space is a plus.
- Understanding of metadata management, data lineage, and data glossaries is a plus.
- Working knowledge of agile development, including DevOps and DataOps concepts.
- Familiarity with business intelligence tools (such as PowerBI).
- BA/BS in Computer Science, Math, Physics, or other technical fields.
Skills, Abilities, Knowledge:
- Excellent communication skills, both verbal and written, along with the ability to influence and demonstrate confidence in communications with senior level management.
- Proven track record of leading, mentoring, hiring and scaling data teams.
- Strong change manager: comfortable with change, especially change that arises through company growth, and able to lead a team effectively through it.
- Ability to understand and translate business requirements into data and technical requirements.
- High degree of organization and ability to manage multiple, competing projects and priorities simultaneously.
- Positive and flexible attitude to enable adjusting to different needs in an ever-changing environment.
- Strong leadership, organizational and interpersonal skills; comfortable managing trade-offs.
- Foster a team culture of accountability, communication, and self-management.
- Proactively drive impact and engagement while bringing others along.
- Consistently attain/exceed individual and team goals.
- Ability to lead others without direct authority in a matrixed environment.