Description & Requirements
Introduction: A Career at HARMAN Automotive
We’re a global, multi-disciplinary team that’s putting the innovative power of technology to work and transforming tomorrow. At HARMAN Automotive, we give you the keys to fast-track your career.
Engineer intelligent systems and data-driven solutions that enhance personalization and entertainment as part of the in-cabin experience
Combine research, experimentation, and collaboration across multiple engineering disciplines
Advance analytics and machine learning capabilities that power next-generation automotive solutions
About the Role
As a Data Engineer on the Innovation Team, you will design and build an end-to-end audio data collection and consumption platform that enables advanced analytics, audio information retrieval, and personalization capabilities across the Car Audio Division.
You will establish standardized, secure, and scalable pipelines to collect, curate, and govern data from internal engineering systems, companion apps, and connected-vehicle telemetry. You will collaborate closely with Data Scientists, ML Engineers, Embedded/DSP Engineers, and Audio Experts to ensure high-quality data products are consistently available for experimentation, training, validation, analysis, and production monitoring, all while meeting strict privacy, security, reliability, and automotive compliance requirements.
You will also serve as a platform consultant, advising other departments on data architecture, data contracts, ingestion design, and downstream consumption, while adhering to strict privacy, safety, and reliability requirements.
What You Will Do
Design and implement standardized data ingestion frameworks for batch and streaming sources (internal databases, user‑preference data, vehicle telemetry).
Define and maintain data models and data contracts (schemas, semantics, versioning rules) for audio-related files, user preferences, and metadata, including quality guardrails (schema validation, anomaly detection, etc.).
Develop and maintain data lakes / lakehouse architectures
Design and implement a standardized API for data consumption.
Prepare and curate high‑quality datasets for AI/ML model training, validation, experimentation and statistical analysis.
Collaborate directly with ML Engineers and Data Scientists to optimize data formats for training performance and storage efficiency
Design cost‑aware policies for data retention, sampling, compression, and technology selection
Optimize pipeline execution times and resource usage (batch vs streaming, compute sizing, caching strategies)
Establish measurable KPIs for data cost efficiency, pipeline reliability, and performance
Ensure compliance with internal policies, OEM requirements, and regulatory constraints
Create clear documentation for pipelines, schemas, and architectural decisions
Mentor other engineers on practical data engineering, performance tuning, and cost‑efficient design
What You Need to Be Successful
Strong execution mindset with the ability to design, implement, debug, and optimize data systems end‑to‑end
Bachelor’s or Master’s degree in Computer Science, Data Engineering, Software Engineering, or a related field
5+ years of hands‑on experience building and operating large‑scale data pipelines and platforms
Strong proficiency in Python, PySpark and SQL for data processing and automation
Solid experience with cloud‑based data platforms (e.g., AWS S3, Azure, Databricks, object storage, distributed computing)
Proven ability to automate repetitive tasks and improve data hygiene
Proven experience implementing cost‑saving strategies in data platforms (storage tiering, compute optimization, pipeline tuning)
Deep understanding of data modeling, partitioning, indexing, and performance optimization
Experience handling large‑volume, high‑frequency or unstructured data (audio, signals, logs, telemetry)
Strong knowledge of data governance and privacy best practices
Working knowledge of privacy‑preserving data handling (pseudonymization, anonymization, etc.), data‑quality checks, and data‑lineage tracking.
Ability to work independently while collaborating effectively in cross-functional global teams
Strong communication skills, with the ability to work effectively in an intercultural environment
Bonus Points if You Have
Experience working in audio, image, or signal processing domains
Background working with automotive datasets or connected vehicle data
Experience deploying ML models to production (cloud or edge environments).
Experience with event-driven architectures (Kafka-like patterns, IoT ingestion patterns).
Experience programming in C++ and working with embedded systems
Familiarity with OTA update flows, artifact versioning, and safe rollback mechanisms.
Experience building reproducible working environments (Docker, virtual environments)
Experience with privacy-aware designs: PII protection, tokenization/pseudonymization, retention automation.
What Makes You Eligible
Advanced English communication skills, as you will be part of a global team
Willingness to work from the office in Queretaro as part of a hybrid flexible-work arrangement
Because the team operates globally, flexibility in working hours is required.
Successful completion of a background investigation and drug screen as a condition of employment
What We Offer
Flexible work environment
Access to employee discounts on world-class HARMAN and Samsung products (JBL, Harman Kardon, AKG, etc.)
Extensive training opportunities through our own HARMAN University
Competitive wellness benefits
Tuition reimbursement
“Be Brilliant” employee recognition and rewards program
An inclusive and diverse work environment that fosters and encourages professional and personal development