Data Lake Systems Engineer (Advanced Computing)


US-MA-Cambridge

 

Summary

The Data Lake Systems Engineer is a critical member of the High Performance Computing (Data Lake) Team within the Infrastructure & Operations IT organization, which supports the Biogen global research community.

 

Job Category

Information Technology

 

Requisition Number

34650BR

 

Job Description

Reporting to the Head of Data Lake and HPC, the Data Lake Systems Engineer will play a critical role in ensuring the usability, availability, and reliability of the Data Lake computational and storage infrastructure. This includes performing systems administration and maintenance tasks, fulfilling user requests, resolving incidents and outages, providing comprehensive user training, maintaining high-quality documentation, participating in the design and deployment of complex IT systems, and collaborating closely with the research community to leverage the Data Lake infrastructure more effectively.

This person will oversee a team of contractors to ensure that the key responsibilities listed below are accomplished, and will own delivery of these services to our customers.

 

Key responsibilities: 

• Implementation and administration of the Data Lake environment.

• Monitoring and managing Hadoop services on the HDP production and DR clusters.

• Maintaining and monitoring jobs in the production, UAT, and development environments.

• Deploying code changes and updates to the UAT and production environments.

• Deploying code changes to the R Shiny server per user requests.

• Implementing and monitoring Oozie-scheduled jobs for the UAT, development, and production environments.

• Patching the Data Lake environment and applying fixes provided by Hortonworks.

• Troubleshooting job failures, primarily Hive and Spark jobs, across the Data Lake environment.

• Onboarding new users to the Hadoop Data Lake environment.

• Gathering requirements for creating Hive databases and providing policy-based access management through Ranger for new POCs.

• Supporting developers in executing ad hoc jobs in Hive environments for existing POCs such as enrollment_forecaster.

• Managing HDFS home directories and enforcing schema-, table-, and column-level access policies on Hive through Ranger.

The ideal candidate will possess an accomplished professional track record combined with exceptional technical acumen, self-confidence, excellent written and verbal communication skills, and a demonstrated potential to continually grow into new responsibilities. 

She/he must be a self-motivated team player who is coachable, flexible, resilient, and comfortable working under high pressure with multiple deadlines and minimal supervision in a dynamic research environment.

 

Qualifications

• 5–8 years of progressively complex related experience in data engineering.

• 3 to 4 years of working experience with the big data stack on HDP or similar environments.

• AWS Big Data Certification is strongly preferred.

• Expertise in various data integration/ETL tools, application integration, business process, and data science tools.

• Strong knowledge of various DBMSs, including NoSQL architectures such as HBase and MongoDB.

• Strong knowledge of large-scale data applications and experience building high-volume data integrations (preferably in a pharma/biotech setting).

• Strong knowledge of Hadoop architecture and HDFS commands, and experience designing and optimizing analytical jobs and queries against data in HDFS/Hive environments.

• Strong knowledge of the Spark framework.

• Experience designing and developing API frameworks using Java, Python, Hive, Scala, and PySpark in Jupyter notebooks.

• Excellent knowledge of Hadoop integration points with enterprise BI and EDW tools (Tableau and Qlik).

• Strong experience with Hadoop cluster management and administrative operations using Oozie, YARN, Ambari, ZooKeeper, Tez, and Slider.

• Strong experience with Hadoop ETL and data ingestion (Sqoop, Flume, Hive, Spark, HBase) and with the latest developments in the Hadoop ecosystem.

• Experience with Hadoop data consumption and other components: Hive, Hue, HBase, Phoenix, Spark, and Pig.

• Experience monitoring, troubleshooting, and tuning services and applications, with operational expertise including strong troubleshooting skills and an understanding of system capacity, bottlenecks, and the basics of memory, CPU, OS, storage, and networking.

• Experience managing cluster resources by implementing the Fair and Capacity Schedulers.

• Experience scheduling jobs using Oozie workflows and crontab.

• Experience with benchmarking and with backup and disaster recovery of NameNode metadata and important sensitive data residing on the cluster.

• Experience with Bash shell scripts, UNIX utilities, and UNIX commands.

• Experience with Ansible (configuration management tool) is nice to have.

• Prior experience in the pharma industry is an added benefit.

 

Other desirable skills include:

• Knowledge of Perl, Python, or another high-level programming language.

• Strong familiarity with shell scripting in the Linux environment.

• Previous experience building and/or testing software applications, supporting software engineering environments, administering databases, or managing DevOps platforms.

• Systems automation using Ansible, Puppet, Salt, or Chef.

• High comfort level with modern database, web, and collaborative technologies.

• Familiarity with the Amazon Web Services ecosystem and Atlassian tool stack.

 

Education

BA/BS in computer science, engineering, or a related technical discipline, and/or equivalent experience, is preferred.

 

Employment Category

Full-Time Regular

 

Experience Level

Mid-Senior Level

 

