Big Data Engineer Resume
Jersey City, NJ
SUMMARY:
- 12+ years of experience in analysis, design, and development using Big Data, Java, and Confidential.
- Experience with Hadoop, HDFS, Hive, Pig, MapReduce, and Spark.
- Configured ZooKeeper, Flume, Kafka, and Sqoop on existing Hadoop clusters.
- Hands-on experience with Hadoop administration, configuration management, monitoring, debugging, and performance tuning.
- Experience with various databases and sources, including Oracle, Netezza, MySQL, SQL Server, DB2, Postgres, and mainframes.
- Participated in requirements analysis, reviews, and working sessions to understand requirements and system design.
- Experience developing front ends using JSF, JavaScript, HTML, XHTML, and CSS.
- Experience working with web/application servers: IBM WebSphere, Oracle WebLogic, and Apache Tomcat.
- Experience designing highly transactional web sites using J2EE technologies and handling design and implementation in Eclipse.
TECHNICAL SKILLS:
Languages: Java, Python, R, Scala
Platforms: Linux, Windows
Big Data: Hadoop, HDFS, MapReduce, Pig, Zookeeper, Hive, Sqoop, Flume, Kafka, Spark, Impala
J2SE / J2EE Technologies: Java, J2EE, JDBC, JSF, JSP, Web Services, Maven
Web Technologies: HTML, XHTML, CSS, JavaScript, JSF, AJAX, QlikView, XML, Shell Script
Cloud Technologies: AWS, EC2, S3, Redshift, Data Pipeline, EMR.
Web/Application Servers: WebSphere, WebLogic Application Server, Apache Tomcat
IDE / Tools: Eclipse, IntelliJ, RStudio
Methodologies: Agile, Scrum, Kanban
PROFESSIONAL EXPERIENCE:
Confidential
Big Data Engineer
Responsibilities:
- Used Sqoop to pull data from RDBMS sources such as Teradata, Netezza, and Oracle and store it in Hadoop.
- Created external Hive tables to store and query the loaded data.
- Loaded data monthly, weekly, or daily depending on the portfolio.
- Portfolios included retail, auto, cards, home loans, and reference data.
- Joined retail data residing in both mainframes and RDBMS sources and stored the results in a single location.
- Used Spark SQL to build structured DataFrames, querying other data sources through JDBC and Hive (see the sketch at the end of this list).
- Scrubbed historical data held in Hive tables and in files located in HDFS.
- Applied optimization techniques including partitioning and bucketing.
- Created an internal shell-script tool to compare RDBMS and Hadoop data and verify that source and target match.
- Worked with copybook files, converting them from ASCII and binary formats, storing them in HDFS, and creating Hive tables so that the mainframes could be decommissioned and Hadoop made the primary source; handled the reverse path for exports back to the mainframes.
- Wrote Pig scripts to transform data into structured formats.
- Worked with Text, Avro, and Parquet file formats, with Snappy as the default compression.
- Created Oozie workflows to automate the process in a structured manner.
- Stored data in three layers: raw, intermediate, and publish.
- Used Impala to query data in the publish layer so that other teams and business users get faster access.
- Worked on Autosys, creating JIL definitions with job dependencies so that jobs run in parallel and are fully automated.
- Used the Eclipse IDE to create new files and modify existing ones as needed.
- Used an SVN repository to check in and check out code.
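A minimal sketch of the Spark SQL pattern described above, written against Spark's Java API (the role's environment lists Scala; Java is used here only to keep all sketches in one language). Hosts, schemas, table names, and paths are placeholders, not details from the original project:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class RetailJoinJob {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                .appName("RetailJoinJob")
                .enableHiveSupport()          // lets Spark SQL see the Hive metastore
                .getOrCreate();

            // Reference data pulled from Oracle over JDBC (placeholder connection details)
            Dataset<Row> accountRef = spark.read()
                .format("jdbc")
                .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCL")
                .option("dbtable", "RETAIL.ACCOUNT_REF")
                .option("user", "etl_user")
                .option("password", "****")
                .load();
            accountRef.createOrReplaceTempView("account_ref");

            // Join against an external Hive table that Sqoop loaded into HDFS
            Dataset<Row> joined = spark.sql(
                "SELECT r.account_id, r.portfolio, t.balance " +
                "FROM account_ref r " +
                "JOIN retail_db.retail_txn t ON r.account_id = t.account_id");

            // Publish as Snappy-compressed Parquet for downstream Impala queries
            joined.write()
                .mode("overwrite")
                .option("compression", "snappy")
                .parquet("/data/publish/retail/account_balance");

            spark.stop();
        }
    }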
Environment: Hadoop, HDFS, Cloudera, Hive, Impala, Shell Script, Eclipse, SVN, Linux, Oozie, Autosys, Teradata, Netezza, Oracle, Spark, Scala
Confidential
Hadoop Engineer
Responsibilities:
- Managed several Hadoop clusters and other Hadoop ecosystem services in development and production environments.
- Worked closely with engineering teams and participated in infrastructure and framework development.
- Worked on POCs in an R&D environment on Hive 2, Spark, and Kafka before providing services to application teams.
- Automated deployment and management of Hadoop services, including implementing monitoring.
- Worked closely with the Alpide team, ensuring all issues were addressed or resolved quickly.
- Contributed to the evolving architecture of our services to meet changing requirements for scaling, reliability, performance, manageability, and price.
- Performed capacity planning of Hadoop clusters based on application requirements.
- Conducted peer reviews with application teams for their releases and ensured they maintained standards.
- Created Sentry policy files giving business users access to the required databases and tables through Impala in the dev, UAT, and prod environments.
- Migrated existing data from RDBMS sources (Netezza, Oracle, and Teradata) to Hadoop using Sqoop, and ingested server logs into HDFS using Flume.
- Created managed and external tables in Hive and implemented partitioning and bucketing for space and performance efficiency (see the sketch at the end of this list).
- Used Impala for select queries so business users could retrieve tables faster.
- Developed an Oozie shell wrapper implementing a re-run process for common workflows and sub-workflows.
- Used the Autosys scheduler to automate jobs.
- Used various file formats (Avro, Parquet, JSON, Text) with Snappy compression.
- Used a CVS repository to check in and check out code.
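A minimal sketch of the partitioned and bucketed Hive table layout described above, issued through the HiveServer2 JDBC driver from Java. The endpoint, databases, and column names are placeholders rather than details from the original cluster:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class HiveTableLayout {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            String url = "jdbc:hive2://hiveserver2-host:10000/default";  // placeholder endpoint

            try (Connection conn = DriverManager.getConnection(url, "etl_user", "");
                 Statement stmt = conn.createStatement()) {

                // External table over files Sqoop landed in HDFS (raw layer)
                stmt.execute(
                    "CREATE EXTERNAL TABLE IF NOT EXISTS staging.customer_raw (" +
                    "  customer_id BIGINT, name STRING, region STRING, load_dt STRING) " +
                    "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' " +
                    "LOCATION '/data/raw/customer'");

                // Managed table partitioned by load date and bucketed on the join key
                stmt.execute(
                    "CREATE TABLE IF NOT EXISTS curated.customer (" +
                    "  customer_id BIGINT, name STRING, region STRING) " +
                    "PARTITIONED BY (load_dt STRING) " +
                    "CLUSTERED BY (customer_id) INTO 32 BUCKETS " +
                    "STORED AS PARQUET");

                // Dynamic-partition, bucket-enforced load from raw into curated
                stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict");
                stmt.execute("SET hive.enforce.bucketing=true");
                stmt.execute(
                    "INSERT OVERWRITE TABLE curated.customer PARTITION (load_dt) " +
                    "SELECT customer_id, name, region, load_dt FROM staging.customer_raw");
            }
        }
    }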
Environment: Hadoop, HDFS, Hive, Sqoop, Impala, Flume, Python, Oozie, Autosys, Linux, Oracle, Netezza, CVS, Cloudera, Spark SQL.
Confidential, Jersey City, NJ
Big Data Developer
Responsibilities:
- Worked closely with business sponsors on architectural solutions to meet their business needs.
- Conducted information-sharing and teaching sessions to raise awareness of industry trends and upcoming initiatives, keeping business strategies and goals aligned with solution architecture designs.
- Performance-tuned the application at various layers: MapReduce, Hive, CDH, and Oracle.
- Worked on Spark SQL, creating DataFrames from Hive tables and applying schemas to data in HDFS (see the sketch at the end of this list).
- Used QlikView to create a visual interface for real-time data processing.
- Implemented partitioning, dynamic partitioning, and bucketing in Hive.
- Imported and exported data between HDFS and various databases: Netezza, Oracle, MySQL, and DB2.
- Automated the process of pulling data from source systems into Hadoop and exporting the data as JSON files to a specified location.
- Migrated Hive queries to Impala.
- Worked with various file formats (Avro, Parquet, Text) and SerDes, using Snappy compression.
- Created analysis batch job prototypes using Hadoop, Pig, Oozie, Hue, and Hive.
- Used a Git repository to check in and check out code.
- Documented operational problems following standards and procedures, using JIRA as the reporting tool.
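A minimal sketch of applying an explicit schema to raw HDFS files and joining them with a DataFrame built from a Hive table, again using Spark's Java API. Column names, paths, and the Hive table below are hypothetical:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    public class SchemaApplyJob {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                .appName("SchemaApplyJob")
                .enableHiveSupport()
                .getOrCreate();

            // Explicit schema for delimited files sitting in HDFS (hypothetical columns)
            StructType schema = new StructType()
                .add("txn_id", DataTypes.LongType)
                .add("account_id", DataTypes.LongType)
                .add("amount", DataTypes.createDecimalType(18, 2))
                .add("txn_date", DataTypes.StringType);

            Dataset<Row> rawTxns = spark.read()
                .schema(schema)
                .option("sep", "|")
                .csv("/data/raw/transactions");

            // DataFrame created directly from an existing Hive table
            Dataset<Row> accounts = spark.table("finance.accounts");

            // Join and export the result as JSON files to the target location
            rawTxns.join(accounts, "account_id")
                .write()
                .mode("overwrite")
                .json("/data/export/transactions_json");

            spark.stop();
        }
    }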
Environment: Hadoop, HDFS, MapReduce, Spark, Kafka, Hive, Impala, Pig, Sqoop, Java, Linux Shell Scripting, Oracle, Netezza, MySQL, DB2, QlikView, Git.
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Worked extensively on importing and exporting data to and from HDFS using Sqoop.
- Responsible for creating complex tables in Hive.
- Created partitioned tables in Hive for better performance and faster querying.
- Developed Oozie workflows to automate the tasks of loading data into HDFS.
- Applied professional software engineering practices across the full software development life cycle, including coding standards, code reviews, source control management, and build processes.
- Worked collaboratively with business stakeholders at all levels to architect, implement, and test Big Data analytical solutions built from disparate sources.
- Wrote MapReduce jobs in Java to implement various formulas, using partitioners, combiners, and sorting (see the sketch at the end of this list).
- Wrote multiple MapReduce programs to power data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats.
- Handled structured and unstructured data and applied ETL processes.
- Prepared developer (unit) test cases and executed developer testing.
- Created and modified shell scripts for scheduling data-cleansing scripts and the ETL loading process.
- Supported and assisted QA engineers in understanding, testing, and troubleshooting.
- Wrote build scripts using Ant and participated in the deployment of production systems.
- Provided production rollout support, monitoring the solution post go-live and resolving issues discovered by the client and client services teams.
- Documented operational problems following standards and procedures, using JIRA as the reporting tool.
- Used a Git repository to check in and check out code.
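A minimal sketch of the kind of aggregation MapReduce job described above, with the reducer reused as a combiner and an explicit hash partitioner. The input layout (account_id,amount CSV) and class names are assumptions for illustration only:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

    public class AccountTotalsJob {

        // Mapper: parse CSV lines (account_id,amount,...) and emit (account_id, amount)
        public static class TotalsMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",");
                if (fields.length >= 2) {
                    context.write(new Text(fields[0]), new DoubleWritable(Double.parseDouble(fields[1])));
                }
            }
        }

        // Reducer (also used as the combiner): sum the amounts for each account
        public static class TotalsReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
            @Override
            protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
                    throws IOException, InterruptedException {
                double total = 0;
                for (DoubleWritable v : values) {
                    total += v.get();
                }
                context.write(key, new DoubleWritable(total));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "account-totals");
            job.setJarByClass(AccountTotalsJob.class);
            job.setMapperClass(TotalsMapper.class);
            job.setCombinerClass(TotalsReducer.class);      // combiner cuts shuffle volume
            job.setReducerClass(TotalsReducer.class);
            job.setPartitionerClass(HashPartitioner.class); // default partitioner, set explicitly
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(DoubleWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }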
Environment: MapReduce, Java, Flat Files, Oracle, Netezza, Postgres, UNIX, HDFS, Sqoop, Hive, Oozie, IntelliJ, Git, Shell Scripting
Confidential, Kansas City, MO
Hadoop Developer
Responsibilities:
- Extracted data files from MySQL and Oracle through Sqoop, placed them in HDFS, and processed them.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources.
- Loaded data from the UNIX file system into HDFS (see the sketch at the end of this list).
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Ran Hadoop streaming jobs to process terabytes of XML- and CSV-format data.
- Loaded structured data from Oracle into Cassandra (NoSQL) using Sqoop.
- Worked with CSV, XML, JSON, Avro, and Parquet file formats.
- Used Snappy and bzip2 compression codecs.
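A minimal sketch of loading a file from the local UNIX file system into HDFS with the Hadoop FileSystem API. The NameNode address and paths are placeholders; on a real cluster the default file system would normally come from core-site.xml:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsLoader {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode-host:8020");  // placeholder NameNode

            try (FileSystem fs = FileSystem.get(conf)) {
                Path localFile = new Path("file:///data/extracts/orders.csv");
                Path hdfsDir = new Path("/user/etl/raw/orders/");

                fs.mkdirs(hdfsDir);
                // copyFromLocalFile keeps the local copy; moveFromLocalFile would delete it
                fs.copyFromLocalFile(localFile, hdfsDir);
            }
        }
    }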
Environment: HDFS, Hive, MapReduce, Eclipse, Oracle, MySQL, UNIX, Sqoop, Cassandra, Shell Scripting.
Confidential
Java Developer
Responsibilities:
- Used the class-responsibility-collaborator (CRC) model to identify and organize classes in the Hospital Management System.
- Used sequence diagrams to show the object interactions involved in the system's use cases.
- Involved in database design, creating data flow diagrams (process model) and ER diagrams (data model).
- Designed HTML screens with JSP for the front-end.
- Made JDBC calls from the servlets to the database (see the sketch at the end of this list).
- Designed stored procedures to extract and calculate billing information from Oracle.
- Formatted database results as HTML reports for the client.
- Used JavaScript for client-side validation.
- Used servlets as controllers and entity/session beans for business logic.
- Used WebLogic to deploy the application in local and development environments.
- Used Eclipse for building the application.
- Participated in user review meetings and used Test Director to log development issues, production problems, and bugs.
- Implemented and supported the project through development and unit testing into the production environment.
- Used CVS Version manager for source control and CVS Tracker for change control management.
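A minimal sketch of the servlet-plus-JDBC pattern described above: a controller servlet queries Oracle and formats the result as an HTML report. The connection URL, credentials, and billing schema are hypothetical:

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Controller servlet: fetches billing rows over JDBC and renders them as an HTML report
    public class BillingReportServlet extends HttpServlet {

        private static final String DB_URL = "jdbc:oracle:thin:@//db-host:1521/HOSPDB";  // placeholder
        private static final String SQL =
            "SELECT patient_id, patient_name, amount FROM billing WHERE bill_date = TRUNC(SYSDATE)";

        @Override
        protected void doGet(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            response.setContentType("text/html");
            PrintWriter out = response.getWriter();
            out.println("<html><body><h2>Daily Billing Report</h2><table border='1'>");
            out.println("<tr><th>Patient ID</th><th>Name</th><th>Amount</th></tr>");

            try (Connection conn = DriverManager.getConnection(DB_URL, "hms_user", "****");
                 PreparedStatement ps = conn.prepareStatement(SQL);
                 ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    out.println("<tr><td>" + rs.getLong("patient_id") + "</td><td>"
                            + rs.getString("patient_name") + "</td><td>"
                            + rs.getBigDecimal("amount") + "</td></tr>");
                }
            } catch (Exception e) {
                throw new ServletException("Billing query failed", e);
            }

            out.println("</table></body></html>");
        }
    }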
Environment: Java, JSP, JDBC, Java Script, HTML, WebLogic, Eclipse and CVS.
Confidential
Associate
Responsibilities:
- Worked in the criminal section, handling petitions based on the portfolios of judges and states.
- Processed new filings each day and prepared lists of existing petitions based on the judge's adjournment orders.
- Worked closely with the judge, registrars, court masters, and section officers on day-to-day case activity.
- Interacted with lower-court staff to ensure all petitions were correctly filed.
- Communicated with attorneys and clients about their petitions and hearings.
- Moved to the computer section, where the organization's entire records and corresponding portfolios are stored.
- Maintained the computers, printers, and networking for the different sections and court halls.
- Installed and configured machines, set up network privileges for users, and assigned each a unique identifier to trace problems to the user and machine.
- Upgraded slow computers with additional RAM or hard disks and cleaned up any virus infections.