Data Lake Systems Engineer (Advanced Computing)


US-MA-Cambridge

 

Summary

The Data Lake Systems Engineer is a critical member of the High Performance Computing (Data Lake) Team within the Infrastructure & Operations IT organization, which supports the Biogen global research community.

 

Job Category

Information Technology

 

Requisition Number

34650BR

 

Job Description

Reporting to the Head of Data Lake and HPC, the Data Lake Systems Engineer will play a critical role in ensuring the usability, availability, and reliability of the Data Lake computational and storage infrastructure. This includes performing systems administration and maintenance tasks, fulfilling user requests, resolving incidents and outages, providing comprehensive user training, maintaining high-quality documentation, participating in the design and deployment of complex IT systems, and collaborating closely with the research community to leverage the Data Lake infrastructure more effectively.

This person will oversee a team of contractors to ensure that the key responsibilities listed below are accomplished, and will own delivery of these services to our customers.

 

Key responsibilities: 

• Implementation and administration of the Data Lake environment.

• Monitoring and managing Hadoop services on the HDP production and DR clusters.

• Maintaining and monitoring jobs in the production, UAT, and development environments.

• Deploying code changes and updates to the UAT and production environments.

• Deploying code changes to the R Shiny server per user requests.

• Implementing and monitoring Oozie-scheduled jobs for the UAT, development, and production environments.

• Patching the Data Lake environment and applying fixes provided by Hortonworks.

• Troubleshooting job failures, primarily Hive and Spark jobs, across the Data Lake environment.

• Onboarding new users to the Hadoop Data Lake environment.

• Gathering requirements for creating Hive databases and providing policy-based access management through Ranger for new POCs.

• Supporting developers in executing ad hoc jobs in Hive environments for existing POCs such as enrollment_forecaster.

• Managing HDFS home directories and enforcing schema-, table-, and column-level access policies on Hive through Ranger.

The ideal candidate will possess an accomplished professional track record combined with exceptional technical acumen, self-confidence, excellent written and verbal communication skills, and a demonstrated potential to continually grow into new responsibilities. 

She/he must be a self-motivated team player who is coachable, flexible, resilient, and comfortable working under high pressure with multiple deadlines and minimal supervision in a dynamic research environment.

 

Qualifications

• 5–8 years of progressively complex related experience in data engineering.

• 3 to 4 years of working experience with the big data stack on HDP or similar environments.

• AWS Big Data Certification is strongly preferred.

• Expertise in various data integration/ETL tools, application integration, business process, and data science tools.

• Strong knowledge of various DBMSs, including NoSQL architectures such as HBase and MongoDB.

• Strong knowledge of large-scale data applications and experience building high-volume data integrations (preferably in a pharma/biotech setting).

• Strong knowledge of Hadoop architecture and HDFS commands, and experience designing and optimizing analytical jobs and queries against data in HDFS/Hive environments.

• Strong knowledge of the Spark framework.

• Experience designing and developing API frameworks using Java, Python, Hive, Scala, and PySpark in Jupyter notebooks.

• Excellent knowledge of Hadoop integration points with enterprise BI and EDW tools (Tableau and Qlik).

• Strong experience with Hadoop cluster management and administrative operations using Oozie, YARN, Ambari, ZooKeeper, Tez, and Slider.

• Strong experience with Hadoop ETL and data ingestion (Sqoop, Flume, Hive, Spark, HBase) and with the latest developments in the Hadoop ecosystem.

• Experience with Hadoop data consumption and other components: Hive, Hue, HBase, Phoenix, Spark, and Pig.

• Experience monitoring, troubleshooting, and tuning services and applications, with operational expertise including strong troubleshooting skills and an understanding of system capacity, bottlenecks, and the basics of memory, CPU, OS, storage, and networking.

• Experience managing cluster resources by implementing the Fair and Capacity Schedulers.

• Experience scheduling jobs using Oozie workflows and crontab.

• Experience with benchmarking and with backup and disaster recovery of NameNode metadata and important sensitive data residing on the cluster.

• Experience with Bash shell scripts, UNIX utilities, and UNIX commands.

• Experience with Ansible (configuration management tool) is nice to have.

• Prior experience in the pharma industry is an added benefit.

 

Other desirable skills include:

• Knowledge of Perl, Python, or another high-level programming language.

• Strong familiarity with shell scripting in the Linux environment.

• Previous experience building and/or testing software applications, supporting software engineering environments, administering databases, or managing DevOps platforms.

• Systems automation using Ansible, Puppet, Salt, or Chef.

• High comfort level with modern database, web, and collaborative technologies.

• Familiarity with the Amazon Web Services ecosystem and Atlassian tool stack.

 

Education

BA/BS in computer science, engineering, or a related technical discipline, and/or equivalent experience, is preferred.

 

Employment Category

Full-Time Regular

 

Experience Level

Mid-Senior Level

 

