Big Data Engineer Resume
Jersey City, NJ
SUMMARY:
- 12+ years of experience in analysis, design, and development using Big Data, Java, and Confidential.
- Experience with Hadoop, HDFS, Hive, Pig, MapReduce, and Spark.
- Configured ZooKeeper, Flume, Kafka, and Sqoop on existing Hadoop clusters.
- Hands-on experience with Hadoop administration, configuration management, monitoring, debugging, and performance tuning.
- Experience with various databases and sources, including Oracle, Netezza, MySQL, SQL Server, DB2, Postgres, and mainframes.
- Participated in requirements analysis, reviews, and working sessions to understand requirements and system design.
- Experience developing front ends using JSF, JavaScript, HTML, XHTML, and CSS.
- Experience working with web/application servers: IBM WebSphere, Oracle WebLogic, and Apache Tomcat.
- Experience designing highly transactional web sites using J2EE technologies and handling design and implementation in Eclipse.
TECHNICAL SKILLS:
Languages: Java, Python, R, Scala
Platforms: Linux, Windows
Big Data: Hadoop, HDFS, MapReduce, Pig, Zookeeper, Hive, Sqoop, Flume, Kafka, Spark, Impala
J2SE / J2EE Technologies: Java, J2EE, JDBC, JSF, JSP, Web Services, Maven
Web Technologies: HTML, XHTML, CSS, JavaScript, JSF, AJAX, QlikView, XML, Shell Script
Cloud Technologies: AWS, EC2, S3, Redshift, Data Pipeline, EMR.
Web/Application Servers: WebSphere, WebLogic Application Server, Apache Tomcat
IDE / Tools: Eclipse, IntelliJ, RStudio
Methodologies: Agile, Scrum, Kanban
PROFESSIONAL EXPERIENCE:
Confidential
Big Data Engineer
Responsibilities:
- Used Sqoop to pull data from RDBMS sources such as Teradata, Netezza, and Oracle and store it in Hadoop.
- Created external Hive tables to store and query the loaded data.
- Loaded data monthly, weekly, or daily depending on the portfolio.
- Portfolios included retail, auto, cards, home loans, and reference data.
- Joined retail data residing in both mainframes and RDBMS sources and stored the results in a single location.
- Used Spark SQL to build structured DataFrames, querying other data sources through JDBC and Hive (see the sketch at the end of this list).
- Scrubbed historical data held in Hive tables and in files located in HDFS.
- Applied optimization techniques including partitioning and bucketing.
- Created an internal shell-script tool to compare RDBMS and Hadoop data and verify that source and target match.
- Worked with copybook files, converting them from ASCII and binary formats, storing them in HDFS, and creating Hive tables so that the mainframes could be decommissioned and Hadoop made the primary source; handled the reverse path for exports back to the mainframes.
- Wrote Pig scripts to transform data into structured formats.
- Worked with Text, Avro, and Parquet file formats, with Snappy as the default compression.
- Created Oozie workflows to automate the process in a structured manner.
- Stored data in three layers: raw, intermediate, and publish.
- Used Impala to query data in the publish layer so that other teams and business users get faster access.
- Worked on Autosys, creating JIL definitions with job dependencies so that jobs run in parallel and are fully automated.
- Used the Eclipse IDE to create new files and modify existing ones as needed.
- Used an SVN repository to check in and check out code.
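A minimal sketch of the Spark SQL pattern described above, written against Spark's Java API (the role's environment lists Scala; Java is used here only to keep all sketches in one language). Hosts, schemas, table names, and paths are placeholders, not details from the original project:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class RetailJoinJob {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                .appName("RetailJoinJob")
                .enableHiveSupport()          // lets Spark SQL see the Hive metastore
                .getOrCreate();

            // Reference data pulled from Oracle over JDBC (placeholder connection details)
            Dataset<Row> accountRef = spark.read()
                .format("jdbc")
                .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCL")
                .option("dbtable", "RETAIL.ACCOUNT_REF")
                .option("user", "etl_user")
                .option("password", "****")
                .load();
            accountRef.createOrReplaceTempView("account_ref");

            // Join against an external Hive table that Sqoop loaded into HDFS
            Dataset<Row> joined = spark.sql(
                "SELECT r.account_id, r.portfolio, t.balance " +
                "FROM account_ref r " +
                "JOIN retail_db.retail_txn t ON r.account_id = t.account_id");

            // Publish as Snappy-compressed Parquet for downstream Impala queries
            joined.write()
                .mode("overwrite")
                .option("compression", "snappy")
                .parquet("/data/publish/retail/account_balance");

            spark.stop();
        }
    }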
Environment: Hadoop, HDFS, Cloudera, Hive, Impala, Shell Script, Eclipse, SVN, Linux, Oozie, Autosys, Teradata, Netezza, Oracle, Spark, Scala
Confidential
Hadoop Engineer
Responsibilities:
- Managed several Hadoop clusters and other Hadoop ecosystem services in development and production environments.
- Worked closely with engineering teams and participated in infrastructure and framework development.
- Worked on POCs in an R&D environment on Hive 2, Spark, and Kafka before providing services to application teams.
- Automated deployment and management of Hadoop services, including implementing monitoring.
- Worked closely with the Alpide team, ensuring all issues were addressed or resolved quickly.
- Contributed to the evolving architecture of our services to meet changing requirements for scaling, reliability, performance, manageability, and price.
- Performed capacity planning of Hadoop clusters based on application requirements.
- Conducted peer reviews with application teams for their releases and ensured they maintained standards.
- Created Sentry policy files giving business users access to the required databases and tables through Impala in the dev, UAT, and prod environments.
- Migrated existing data from RDBMS sources (Netezza, Oracle, and Teradata) to Hadoop using Sqoop, and ingested server logs into HDFS using Flume.
- Created managed and external tables in Hive and implemented partitioning and bucketing for space and performance efficiency (see the sketch at the end of this list).
- Used Impala for select queries so business users could retrieve tables faster.
- Developed an Oozie shell wrapper implementing a re-run process for common workflows and sub-workflows.
- Used the Autosys scheduler to automate jobs.
- Used various file formats (Avro, Parquet, JSON, Text) with Snappy compression.
- Used a CVS repository to check in and check out code.
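A minimal sketch of the partitioned and bucketed Hive table layout described above, issued through the HiveServer2 JDBC driver from Java. The endpoint, databases, and column names are placeholders rather than details from the original cluster:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class HiveTableLayout {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            String url = "jdbc:hive2://hiveserver2-host:10000/default";  // placeholder endpoint

            try (Connection conn = DriverManager.getConnection(url, "etl_user", "");
                 Statement stmt = conn.createStatement()) {

                // External table over files Sqoop landed in HDFS (raw layer)
                stmt.execute(
                    "CREATE EXTERNAL TABLE IF NOT EXISTS staging.customer_raw (" +
                    "  customer_id BIGINT, name STRING, region STRING, load_dt STRING) " +
                    "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' " +
                    "LOCATION '/data/raw/customer'");

                // Managed table partitioned by load date and bucketed on the join key
                stmt.execute(
                    "CREATE TABLE IF NOT EXISTS curated.customer (" +
                    "  customer_id BIGINT, name STRING, region STRING) " +
                    "PARTITIONED BY (load_dt STRING) " +
                    "CLUSTERED BY (customer_id) INTO 32 BUCKETS " +
                    "STORED AS PARQUET");

                // Dynamic-partition, bucket-enforced load from raw into curated
                stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict");
                stmt.execute("SET hive.enforce.bucketing=true");
                stmt.execute(
                    "INSERT OVERWRITE TABLE curated.customer PARTITION (load_dt) " +
                    "SELECT customer_id, name, region, load_dt FROM staging.customer_raw");
            }
        }
    }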
Environment: Hadoop, HDFS, Hive, Sqoop, Impala, Flume, Python, Oozie, Autosys, Linux, Oracle, Netezza, CVS, Cloudera, Spark SQL.
Confidential, Jersey City, NJ
Big Data Developer
Responsibilities:
- Worked closely with business sponsors on architectural solutions to meet their business needs.
- Conducted information-sharing and teaching sessions to raise awareness of industry trends and upcoming initiatives, keeping business strategies and goals aligned with solution architecture designs.
- Performance-tuned the application at various layers: MapReduce, Hive, CDH, and Oracle.
- Worked on Spark SQL, creating DataFrames from Hive tables and applying schemas to data in HDFS (see the sketch at the end of this list).
- Used QlikView to create a visual interface for real-time data processing.
- Implemented partitioning, dynamic partitioning, and bucketing in Hive.
- Imported and exported data between HDFS and various databases: Netezza, Oracle, MySQL, and DB2.
- Automated the process of pulling data from source systems into Hadoop and exporting the data as JSON files to a specified location.
- Migrated Hive queries to Impala.
- Worked with various file formats (Avro, Parquet, Text) and SerDes, using Snappy compression.
- Created analysis batch job prototypes using Hadoop, Pig, Oozie, Hue, and Hive.
- Used a Git repository to check in and check out code.
- Documented operational problems following standards and procedures, using JIRA as the reporting tool.
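A minimal sketch of applying an explicit schema to raw HDFS files and joining them with a DataFrame built from a Hive table, again using Spark's Java API. Column names, paths, and the Hive table below are hypothetical:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    public class SchemaApplyJob {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                .appName("SchemaApplyJob")
                .enableHiveSupport()
                .getOrCreate();

            // Explicit schema for delimited files sitting in HDFS (hypothetical columns)
            StructType schema = new StructType()
                .add("txn_id", DataTypes.LongType)
                .add("account_id", DataTypes.LongType)
                .add("amount", DataTypes.createDecimalType(18, 2))
                .add("txn_date", DataTypes.StringType);

            Dataset<Row> rawTxns = spark.read()
                .schema(schema)
                .option("sep", "|")
                .csv("/data/raw/transactions");

            // DataFrame created directly from an existing Hive table
            Dataset<Row> accounts = spark.table("finance.accounts");

            // Join and export the result as JSON files to the target location
            rawTxns.join(accounts, "account_id")
                .write()
                .mode("overwrite")
                .json("/data/export/transactions_json");

            spark.stop();
        }
    }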
Environment: Hadoop, HDFS, MapReduce, Spark, Kafka, Hive, Impala, Pig, Sqoop, Java, Linux Shell Scripting, Oracle, Netezza, MySQL, DB2, QlikView, Git.
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Worked extensively on importing and exporting data to and from HDFS using Sqoop.
- Responsible for creating complex tables in Hive.
- Created partitioned tables in Hive for better performance and faster querying.
- Developed Oozie workflows to automate the tasks of loading data into HDFS.
- Applied professional software engineering practices across the full software development life cycle, including coding standards, code reviews, source control management, and build processes.
- Worked collaboratively with business stakeholders at all levels to architect, implement, and test Big Data analytical solutions built from disparate sources.
- Wrote MapReduce jobs in Java to implement various formulas, using partitioners, combiners, and sorting (see the sketch at the end of this list).
- Wrote multiple MapReduce programs to power data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats.
- Handled structured and unstructured data and applied ETL processes.
- Prepared developer (unit) test cases and executed developer testing.
- Created and modified shell scripts for scheduling data-cleansing scripts and the ETL loading process.
- Supported and assisted QA engineers in understanding, testing, and troubleshooting.
- Wrote build scripts using Ant and participated in the deployment of production systems.
- Provided production rollout support, monitoring the solution post go-live and resolving issues discovered by the client and client services teams.
- Documented operational problems following standards and procedures, using JIRA as the reporting tool.
- Used a Git repository to check in and check out code.
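A minimal sketch of the kind of aggregation MapReduce job described above, with the reducer reused as a combiner and an explicit hash partitioner. The input layout (account_id,amount CSV) and class names are assumptions for illustration only:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

    public class AccountTotalsJob {

        // Mapper: parse CSV lines (account_id,amount,...) and emit (account_id, amount)
        public static class TotalsMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",");
                if (fields.length >= 2) {
                    context.write(new Text(fields[0]), new DoubleWritable(Double.parseDouble(fields[1])));
                }
            }
        }

        // Reducer (also used as the combiner): sum the amounts for each account
        public static class TotalsReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
            @Override
            protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
                    throws IOException, InterruptedException {
                double total = 0;
                for (DoubleWritable v : values) {
                    total += v.get();
                }
                context.write(key, new DoubleWritable(total));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "account-totals");
            job.setJarByClass(AccountTotalsJob.class);
            job.setMapperClass(TotalsMapper.class);
            job.setCombinerClass(TotalsReducer.class);      // combiner cuts shuffle volume
            job.setReducerClass(TotalsReducer.class);
            job.setPartitionerClass(HashPartitioner.class); // default partitioner, set explicitly
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(DoubleWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }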
Environment: MapReduce, Java, Flat Files, Oracle, Netezza, Postgres, UNIX, HDFS, Sqoop, Hive, Oozie, IntelliJ, Git, Shell Scripting
Confidential, Kansas City, MO
Hadoop Developer
Responsibilities:
- Extracted data files from MySQL and Oracle through Sqoop, placed them in HDFS, and processed them.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources.
- Loaded data from the UNIX file system into HDFS (see the sketch at the end of this list).
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Ran Hadoop streaming jobs to process terabytes of XML- and CSV-format data.
- Loaded structured data from Oracle into Cassandra (NoSQL) using Sqoop.
- Worked with CSV, XML, JSON, Avro, and Parquet file formats.
- Used Snappy and bzip2 compression codecs.
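A minimal sketch of loading a file from the local UNIX file system into HDFS with the Hadoop FileSystem API. The NameNode address and paths are placeholders; on a real cluster the default file system would normally come from core-site.xml:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsLoader {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode-host:8020");  // placeholder NameNode

            try (FileSystem fs = FileSystem.get(conf)) {
                Path localFile = new Path("file:///data/extracts/orders.csv");
                Path hdfsDir = new Path("/user/etl/raw/orders/");

                fs.mkdirs(hdfsDir);
                // copyFromLocalFile keeps the local copy; moveFromLocalFile would delete it
                fs.copyFromLocalFile(localFile, hdfsDir);
            }
        }
    }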
Environment: HDFS, Hive, MapReduce, Eclipse, Oracle, MySQL, UNIX, Sqoop, Cassandra, Shell Scripting.
Confidential
Java Developer
Responsibilities:
- Used the class-responsibility-collaborator (CRC) model to identify and organize classes in the Hospital Management System.
- Used sequence diagrams to show the object interactions involved in the system's use cases.
- Involved in database design, creating data flow diagrams (process model) and ER diagrams (data model).
- Designed HTML screens with JSP for the front-end.
- Made JDBC calls from the servlets to the database (see the sketch at the end of this list).
- Designed stored procedures to extract and calculate billing information from Oracle.
- Formatted database results as HTML reports for the client.
- Used JavaScript for client-side validation.
- Used servlets as controllers and entity/session beans for business logic.
- Used WebLogic to deploy the application in local and development environments.
- Used Eclipse for building the application.
- Participated in user review meetings and used Test Director to log development issues, production problems, and bugs.
- Implemented and supported the project through development and unit testing into the production environment.
- Used CVS Version manager for source control and CVS Tracker for change control management.
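A minimal sketch of the servlet-plus-JDBC pattern described above: a controller servlet queries Oracle and formats the result as an HTML report. The connection URL, credentials, and billing schema are hypothetical:

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Controller servlet: fetches billing rows over JDBC and renders them as an HTML report
    public class BillingReportServlet extends HttpServlet {

        private static final String DB_URL = "jdbc:oracle:thin:@//db-host:1521/HOSPDB";  // placeholder
        private static final String SQL =
            "SELECT patient_id, patient_name, amount FROM billing WHERE bill_date = TRUNC(SYSDATE)";

        @Override
        protected void doGet(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            response.setContentType("text/html");
            PrintWriter out = response.getWriter();
            out.println("<html><body><h2>Daily Billing Report</h2><table border='1'>");
            out.println("<tr><th>Patient ID</th><th>Name</th><th>Amount</th></tr>");

            try (Connection conn = DriverManager.getConnection(DB_URL, "hms_user", "****");
                 PreparedStatement ps = conn.prepareStatement(SQL);
                 ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    out.println("<tr><td>" + rs.getLong("patient_id") + "</td><td>"
                            + rs.getString("patient_name") + "</td><td>"
                            + rs.getBigDecimal("amount") + "</td></tr>");
                }
            } catch (Exception e) {
                throw new ServletException("Billing query failed", e);
            }

            out.println("</table></body></html>");
        }
    }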
Environment: Java, JSP, JDBC, Java Script, HTML, WebLogic, Eclipse and CVS.
Confidential
Associate
Responsibilities:
- Worked in the criminal section, handling petitions based on the portfolios of judges and states.
- Processed new filings each day and prepared lists of existing petitions based on the judge's adjournment orders.
- Worked closely with the judge, registrars, court masters, and section officers on day-to-day case activity.
- Interacted with lower-court staff to ensure all petitions were correctly filed.
- Communicated with attorneys and clients about their petitions and hearings.
- Moved to the computer section, where the organization's entire records and corresponding portfolios are stored.
- Maintained the computers, printers, and networking for the different sections and court halls.
- Installed and configured machines, set up network privileges for users, and assigned each a unique identifier to trace problems to the user and machine.
- Upgraded slow computers with additional RAM or hard disks and cleaned up any virus infections.