
Big Data Engineer Resume


Jersey City, NJ

SUMMARY:

  • 12+ years of experience in analysis, design, and development using Big Data technologies, Java, and Confidential.
  • Experience with Hadoop, HDFS, Hive, Pig, MapReduce, and Spark.
  • Configured ZooKeeper, Flume, Kafka, and Sqoop on the existing Hadoop cluster.
  • Hands-on experience with Hadoop applications, including administration, configuration management, monitoring, debugging, and performance tuning.
  • Experience with various databases and sources, including Oracle, Netezza, MySQL, SQL Server, DB2, Postgres, and mainframes.
  • Participated in requirement analysis, reviews, and working sessions to understand requirements and system design.
  • Experience developing front ends using JSF, JavaScript, HTML, XHTML, and CSS.
  • Experience working with web/application servers: IBM WebSphere, Oracle WebLogic, and Apache Tomcat.
  • Experience designing highly transactional websites using J2EE technologies and handling design and implementation in Eclipse.

TECHNICAL SKILLS:

Languages: Java, Python, R, Scala

Platforms: LINUX, Windows

Big Data: Hadoop, HDFS, MapReduce, Pig, Zookeeper, Hive, Sqoop, Flume, Kafka, Spark, Impala

J2SE / J2EE Technologies: Java, J2EE, JDBC, JSF, JSP, Web Services, Maven

Web Technologies: HTML, XHTML, CSS, JavaScript, JSF, AJAX, QlikView, XML, Shell Script.

Cloud Technologies: AWS, EC2, S3, Redshift, Data Pipeline, EMR.

Web/Application Servers: WebSphere, WebLogic Application Server, Apache Tomcat

IDE / Tools: Eclipse, IntelliJ, RStudio

Methodologies: Agile, Scrum, Kanban

PROFESSIONAL EXPERIENCE:

Confidential

Big Data Engineer

Responsibilities:
  • Used Sqoop to pull data from RDBMS sources such as Teradata, Netezza, and Oracle and store it in Hadoop.
  • Created external Hive tables to store and query the loaded data.
  • Loaded data monthly, weekly, or daily depending on the portfolio.
  • Data covered retail, auto, cards, home loans, and reference portfolios.
  • Joined retail data split across mainframes and RDBMS sources and stored the combined result in one location.
  • Used Spark SQL to create structured data with DataFrames, querying other data sources over JDBC as well as Hive (see the sketch after this list).
  • Scrubbed historical data present in Hive and in files located in HDFS.
  • Applied optimization techniques such as partitioning and bucketing.
  • Created an internal shell-script tool for comparing RDBMS and Hadoop data to verify that source and target match.
  • Worked with copybook files, converting them from ASCII and binary formats, storing them in HDFS, and creating Hive tables so the mainframes could be decommissioned and Hadoop made the primary source; did the same for exports back to the mainframes.
  • Wrote Pig scripts to transform the data into a structured format.
  • Worked with Text, Avro, and Parquet file formats, with Snappy as the default compression.
  • Created Oozie workflows to automate the process in a structured manner.
  • Stored data in three layers: a raw layer, an intermediate layer, and a publish layer.
  • Used Impala to query data in the publish layer, giving other teams and business users faster access.
  • Worked with Autosys, creating JIL definitions with job dependencies so that jobs run in parallel and the whole flow is automated.
  • Used the Eclipse IDE to work on new files and modify existing ones as needed.
  • Used an SVN repository to check in and check out code.
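Below is a minimal sketch of the kind of Spark SQL work described above, shown here with the Spark Java API for illustration; the JDBC connection details, table names, and output table are placeholders rather than actual project objects.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PortfolioJoin {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("portfolio-join")
                .enableHiveSupport()            // lets Spark read and write Hive tables
                .getOrCreate();

        // Pull one source table over JDBC (URL, credentials, and table are placeholders)
        Dataset<Row> retail = spark.read()
                .format("jdbc")
                .option("url", "jdbc:oracle:thin:@//dbhost:1521/SERVICE")
                .option("dbtable", "RETAIL.ACCOUNTS")
                .option("user", "etl_user")
                .option("password", "etl_password")
                .load();
        retail.createOrReplaceTempView("retail_accounts");

        // Join the JDBC data with a Hive table loaded from the mainframe feed
        Dataset<Row> joined = spark.sql(
                "SELECT r.account_id, r.balance, h.portfolio "
              + "FROM retail_accounts r JOIN raw_db.home_loans h ON r.account_id = h.account_id");

        // Store the combined result in a partitioned Hive table in the publish layer
        joined.write().mode("overwrite").partitionBy("portfolio").saveAsTable("publish_db.combined_portfolios");
    }
}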

Environment: Hadoop, HDFS, Cloudera, Hive, Impala, Shell Script, Eclipse, SVN, Linux, Oozie, Autosys, Teradata, Netezza, Oracle, Spark, Scala

Confidential

Hadoop Engineer

Responsibilities:
  • Managed several Hadoop clusters and other Hadoop ecosystem services in development and production environments.
  • Worked closely with engineering teams and participated in infrastructure and framework development.
  • Worked on POCs in the R&D environment on Hive 2, Spark, and Kafka before providing these services to application teams.
  • Automated deployment and management of Hadoop services, including implementing monitoring.
  • Worked closely with the Alpide team, ensuring all issues were addressed and resolved promptly.
  • Contribute to the evolving architecture of our services to meet changing requirements for scaling, reliability, performance, manageability, and price.
  • Performed capacity planning of Hadoop clusters based on application requirements.
  • Conducted peer reviews with application teams for their releases and ensured they maintained standards.
  • Created Sentry policy files to give business users access to the required databases and tables from Impala in the dev, UAT, and prod environments.
  • Migrated existing data from RDBMS (Netezza, Oracle, and Teradata) to Hadoop using Sqoop for processing, and ingested server logs into HDFS using Flume.
  • Created managed and external tables in Hive and implemented partitioning and bucketing for space and performance efficiency (a DDL sketch follows this list).
  • Used Impala for select queries so business users could retrieve tables faster.
  • Developed an Oozie shell wrapper implementing an Oozie re-run process for common workflows and sub-workflows.
  • Used the Autosys scheduler to automate jobs.
  • Used various file formats (Avro, Parquet, JSON, Text) with Snappy compression.
  • Used a CVS repository to check in and check out code.
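The partitioned and bucketed table work above can be sketched through the HiveServer2 JDBC driver as below; the connection URL, schema, columns, bucket count, and load date are placeholders, and a real cluster would also carry Kerberos/Sentry settings on the URL.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreatePartitionedBucketedTable {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver2-host:10000/analytics", "etl_user", "");
             Statement stmt = conn.createStatement()) {

            // External table over the Sqoop landing directory, partitioned by load date
            // and bucketed on the join key for evenly sized files
            stmt.execute(
                "CREATE EXTERNAL TABLE IF NOT EXISTS analytics.customer_txn ("
              + "  customer_id BIGINT, txn_amt DECIMAL(18,2), txn_type STRING)"
              + " PARTITIONED BY (load_dt STRING)"
              + " CLUSTERED BY (customer_id) INTO 32 BUCKETS"
              + " STORED AS PARQUET"
              + " LOCATION '/data/raw/customer_txn'");

            // Register the partition produced by the latest Sqoop/Oozie load
            stmt.execute("ALTER TABLE analytics.customer_txn"
              + " ADD IF NOT EXISTS PARTITION (load_dt='2016-01-31')");
        }
    }
}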

Environment: Hadoop, HDFS, Hive, Sqoop, Impala, Flume, Python, Oozie, Autosys, Linux, Oracle, Netezza, CVS, Cloudera, Spark SQL.

Confidential, Jersey City, NJ

Big Data Developer

Responsibilities:
  • Worked closely with business sponsors on architectural solutions to meet their business needs.
  • Conducted information-sharing and teaching sessions to raise awareness of industry trends and upcoming initiatives, ensuring alignment between business strategies and goals and solution architecture designs.
  • Performance-tuned the application at various layers: MapReduce, Hive, CDH, and Oracle.
  • Worked on Spark SQL, creating DataFrames from Hive tables and applying schemas to data in HDFS (see the sketch after this list).
  • Used QlikView to create a visual interface for real-time data processing.
  • Implemented partitioning, dynamic partitioning, and bucketing in Hive.
  • Imported and exported data between HDFS and various databases: Netezza, Oracle, MySQL, and DB2.
  • Automated the process of pulling data from source systems into Hadoop and exporting it as JSON files to a specified location.
  • Migrated Hive queries to Impala.
  • Worked with various file formats (Avro, Parquet, Text) and SerDes, using Snappy compression.
  • Created analysis batch job prototypes using Hadoop, Pig, Oozie, Hue and Hive.
  • Used a Git repository to check in and check out code.
  • Documented operational problems following standards and procedures, using JIRA as the reporting tool.
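A minimal sketch of the DataFrame work described above, using the Spark Java API: an explicit schema applied to raw files in HDFS, a DataFrame created from a Hive table, and the joined result exported as JSON. The paths, table, and column names are placeholders.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ApplySchemaAndExport {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("apply-schema-export")
                .enableHiveSupport()
                .getOrCreate();

        // Explicit schema applied to raw pipe-delimited files sitting in HDFS
        StructType schema = new StructType()
                .add("account_id", DataTypes.LongType)
                .add("balance", DataTypes.createDecimalType(18, 2))
                .add("as_of_date", DataTypes.StringType);

        Dataset<Row> raw = spark.read()
                .schema(schema)
                .option("sep", "|")
                .csv("/data/raw/accounts");          // HDFS path is a placeholder

        // DataFrame created directly from an existing Hive table
        Dataset<Row> reference = spark.table("reference_db.portfolios");

        // Export the joined result as JSON files to the agreed location
        raw.join(reference, "account_id")
           .write()
           .mode("overwrite")
           .json("/data/publish/accounts_json");
    }
}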

Environment: Hadoop, HDFS, MapReduce, Spark, Kafka, Hive, Impala, Pig, Sqoop, Java, Linux shell scripting, Oracle, Netezza, MySQL, DB2, QlikView, Git.

Confidential, Charlotte, NC

Hadoop Developer

Responsibilities:
  • Worked extensively on importing and exporting data to and from HDFS using Sqoop.
  • Responsible for creating complex tables using Hive.
  • Created partitioned tables in Hive for best performance and faster querying.
  • Developed workflows in Oozie to automate the tasks of loading data into HDFS.
  • Experience with professional software engineering practices and best practices for the full software development life cycle including coding standards, code reviews, source control management and build processes.
  • Worked collaboratively with all levels of business stakeholders to architect, implement, and test Big Data-based analytical solutions from disparate sources.
  • Wrote MapReduce jobs in Java implementing various formulas, using partitioners, combiners, and sorting (a minimal example follows this list).
  • Wrote multiple MapReduce procedures for extraction, transformation, and aggregation of data from multiple file formats, including XML, JSON, CSV, and other compressed formats.
  • Handled structured and unstructured data and applied ETL processes.
  • Prepared developer (unit) test cases and executed developer testing.
  • Created and modified shell scripts for scheduling various data-cleansing scripts and the ETL loading process.
  • Supported and assisted QA engineers in understanding, testing, and troubleshooting.
  • Wrote build scripts using Ant and participated in the deployment of production systems.
  • Production Rollout Support that includes monitoring the solution post go-live and resolving any issues that are discovered by the client and client services teams.
  • Documented operational problems following standards and procedures, using JIRA as the reporting tool.
  • Used a Git repository to check in and check out code.
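A minimal sketch of the MapReduce pattern described above, with a mapper, a combiner reused as the reducer, and a custom partitioner; the CSV layout, key choice, and aggregation are placeholders rather than the actual formulas.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CsvAggregation {

    // Mapper: parse a CSV line of the form key,description,amount and emit (key, amount)
    public static class AmountMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split(",");
            if (fields.length >= 3) {
                ctx.write(new Text(fields[0]), new DoubleWritable(Double.parseDouble(fields[2])));
            }
        }
    }

    // Reducer, also reused as the combiner: sum the amounts for each key
    public static class SumReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text key, Iterable<DoubleWritable> values, Context ctx)
                throws IOException, InterruptedException {
            double total = 0;
            for (DoubleWritable v : values) {
                total += v.get();
            }
            ctx.write(key, new DoubleWritable(total));
        }
    }

    // Custom partitioner: route keys by their first character so related keys land together
    public static class FirstCharPartitioner extends Partitioner<Text, DoubleWritable> {
        @Override
        public int getPartition(Text key, DoubleWritable value, int numPartitions) {
            String k = key.toString();
            return (k.isEmpty() ? 0 : k.charAt(0)) % numPartitions;
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "csv-aggregation");
        job.setJarByClass(CsvAggregation.class);
        job.setMapperClass(AmountMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setPartitionerClass(FirstCharPartitioner.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}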

Environment: MapReduce, Java, Flat files, Oracle, Netezza, Postgres, UNIX, HDFS, Sqoop, Hive, Oozie, IntelliJ, Git, shell scripting

Confidential, Kansas City, MO

Hadoop Developer

Responsibilities:
  • Extracted data files from MySQL and Oracle using Sqoop, placed them in HDFS, and processed them.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Responsible for managing data coming from different sources.
  • Involved in loading data from the UNIX file system into HDFS (see the sketch after this list).
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Ran Hadoop streaming jobs to process terabytes of XML- and CSV-format data.
  • Loaded structured data from Oracle into Cassandra (NoSQL) using Sqoop.
  • Worked with CSV, XML, JSON, Avro, and Parquet file formats.
  • Used Snappy and bz2 compression with Avro and other formats.
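The UNIX-to-HDFS load mentioned above is usually an hdfs dfs -put, but it can also be done programmatically; below is a minimal sketch using the Hadoop FileSystem API, with placeholder paths.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalToHdfsCopy {
    public static void main(String[] args) throws Exception {
        // core-site.xml on the classpath supplies fs.defaultFS; both paths are placeholders
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Copy a flat file from the local UNIX file system into the HDFS landing directory,
        // where a Hive table or a streaming job can pick it up
        fs.copyFromLocalFile(new Path("/data/exports/orders_20150301.csv"),
                             new Path("/user/etl/raw/orders/"));

        fs.close();
    }
}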

Environment: HDFS, Hive, MapReduce, Eclipse, Oracle, MySQL, UNIX, Sqoop, Cassandra, Shell Scripting.

Confidential

Java Developer

Responsibilities:
  • Used the class-responsibility-collaborator (CRC) model to identify and organize classes in the Hospital Management System.
  • Used sequence diagrams to show the object interactions involved with the Use-Cases of a user of the system.
  • Involved in Database Design by creating Data Flow Diagram (Process Model) and ER Diagram (Data Model).
  • Designed HTML screens with JSP for the front-end.
  • Made JDBC calls from the servlets to the database (see the sketch after this list).
  • Involved in designing stored procedures to extract and calculate billing information, connecting to Oracle.
  • Formatted results from the database as HTML reports for the client.
  • Used JavaScript for client-side validation.
  • Used servlets as controllers and entity/session beans for business logic.
  • Used WebLogic to deploy the application in local and development environments.
  • Used Eclipse for building the application.
  • Participated in User review meetings and used Test Director to periodically log the development issues, production problems and bugs.
  • Implemented and supported the project through development and unit testing into the production environment.
  • Used CVS Version manager for source control and CVS Tracker for change control management.
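A minimal sketch of a controller servlet making JDBC calls and rendering a simple HTML report, as described above; the JNDI data source name, table, and columns are placeholders.

import java.io.IOException;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import javax.naming.InitialContext;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.sql.DataSource;

public class BillingReportServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();
        out.println("<html><body><table border='1'><tr><th>Patient</th><th>Amount</th></tr>");

        try {
            // JNDI name of the WebLogic connection pool is a placeholder
            DataSource ds = (DataSource) new InitialContext().lookup("jdbc/HospitalDS");
            try (Connection con = ds.getConnection();
                 PreparedStatement ps = con.prepareStatement(
                         "SELECT patient_name, bill_amount FROM billing WHERE bill_date = ?")) {
                ps.setString(1, req.getParameter("billDate"));
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // Format each row of the result set as an HTML table row
                        out.println("<tr><td>" + rs.getString(1) + "</td><td>"
                                + rs.getBigDecimal(2) + "</td></tr>");
                    }
                }
            }
        } catch (Exception e) {
            throw new ServletException("Billing report query failed", e);
        }

        out.println("</table></body></html>");
    }
}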

Environment: Java, JSP, JDBC, JavaScript, HTML, WebLogic, Eclipse, and CVS.

Confidential

Associate

Responsibilities:
  • Worked in the criminal section, handling petitions based on the portfolios of judges and states.
  • Each day, prepared a list of new filings and existing petitions based on the judge's adjournment orders.
  • Worked closely with the judge, registrars, court masters, and section officers on day-to-day case activity.
  • Interacted with lower court staff to ensure all petitions were correctly filed.
  • Communicated with attorneys and clients about their petitions and hearings.
  • Moved to the computer section, where the organization's entire records and corresponding portfolios are stored.
  • Took care of computers, printers, and networking for the different sections and court halls.
  • Installed and configured machines, set up network privileges for users, and assigned each a unique ID to identify the user and locate problems.
  • Upgraded computers (RAM, hard disks) and removed viruses when machines became slow.
