Databricks Architect - 100% Remote Role

Contract

Role: Databricks Architect
Location: Remote
Duration: 12+ Months

Must-Have Skills:
•       Databricks + AWS
•       Data Modeling & Design
•       PySpark Scripts
•       SQL Knowledge
•       Data Integration
•       Unity Catalog and Security Design
•       Identity federation
•       Auditing and observability (system tables, APIs, external tools)
•       Access control / Governance in UC
•       External locations & storage credentials
•       Personal tokens & service principals
•       Metastore & Unity Catalog concepts
•       Interactive vs production workflows
•       Policies & entitlements
•       Compute types (incl. UC & non-UC, scaling, optimization)

Job Description:
Note: Candidates should have hands-on experience in Databricks + AWS, data
modeling & design, PySpark scripting, SQL, Unity Catalog and security design,
identity federation, auditing and observability (system tables, APIs, external
tools), access control and governance in UC, external locations & storage
credentials, personal access tokens & service principals, metastore & Unity
Catalog concepts, interactive vs. production workflows, policies &
entitlements, and compute types (incl. UC & non-UC, scaling, optimization).

Key Responsibilities:
1. Data Strategy & Architecture Development
•       Define and implement a scalable, cost-effective, high-performance
data architecture aligned with business objectives.
•       Design Lakehouse solutions using Databricks on AWS, Azure, or GCP.
•       Establish best practices for Delta Lake and Lakehouse Architecture.
2. Data Engineering & Integration
•       Architect ETL/ELT pipelines using Databricks Spark, Delta Live Tables
(DLT), and Databricks Workflows.
•       Integrate data from sources such as Oracle Fusion Middleware,
webMethods, MuleSoft, and Informatica.
•       Enable real-time and batch processing using Apache Spark and Delta
Lake.
•       Ensure seamless connectivity with enterprise platforms (Salesforce,
SAP, ERP, CRM).
3. Data Governance, Security & Compliance
•       Implement governance frameworks using Unity Catalog for lineage,
metadata, and access control.
•       Ensure HIPAA, GDPR, and life sciences regulatory compliance.
•       Define and manage RBAC, Databricks SQL security, and access policies.
•       Enable self-service data stewardship and democratization.
4. Performance Optimization & Cost Management
•       Optimize Databricks compute clusters (DBU usage) for cost efficiency.
•       Leverage Photon Engine, Adaptive Query Execution (AQE), and caching
for performance tuning.
•       Monitor workspace health, job efficiency, and cost analytics.
5. AI/ML Enablement & Advanced Analytics
•       Design and manage ML pipelines using Databricks MLflow.
•       Support AI-driven analytics in genomics, drug discovery, and clinical
data.
•       Collaborate with data scientists to deploy and operationalize ML
models.
6. Collaboration & Stakeholder Engagement
•       Align data strategy with business objectives across teams.
•       Engage with platform vendors (Databricks, AWS, Azure, GCP,
Informatica, Oracle, MuleSoft).
•       Lead PoCs, drive Databricks adoption, and provide technical
leadership.
7. Data Democratization & Self-Service Enablement
•       Implement self-service analytics using Databricks SQL and BI tools
(Power BI, Tableau).
•       Foster data literacy and enable data sharing frameworks.
•       Establish robust data cataloging and lineage.
8. Migration & Modernization
•       Lead migration from legacy platforms (Informatica, Oracle, Hadoop) to
Databricks Lakehouse.
•       Design cloud modernization roadmaps ensuring minimal disruption.

Key Skills:
Databricks & Spark:
•       Databricks Lakehouse, Delta Lake, Unity Catalog, Photon Engine.
•       Apache Spark (PySpark, Scala, SQL), Databricks SQL, Delta Live Tables,
Databricks Workflows.
Cloud Platforms:
•       Databricks on AWS (preferred), Azure, or GCP.
•       Cloud storage (S3, ADLS, GCS), VPC, IAM, Private Link.
•       Infrastructure as Code: Terraform, ARM, CloudFormation.
Data Modeling & Architecture:
•       Dimensional modeling, star schema, snowflake schema, Data Vault.
•       Experience with Lakehouse, Data Mesh, and Data Fabric architectures.
•       Data partitioning, indexing, caching, query optimization.
ETL/ELT & Integration:
•       ETL/ELT development with Databricks, Informatica, MuleSoft, and
Apache tools.

To apply for this job, email your details to piyush@empowerprofessionals.com.