Oracle Stream Analytics is a powerful analytic toolkit designed to work directly on data in motion – simple data correlations, complex event processing, geo-fencing, and advanced dashboards run on millions of events per second.
The Oracle Stream Analytics platform combines an easy-to-use visual façade for rapidly creating and dynamically changing Real Time Event Stream Processing (Fast Data) applications with a comprehensive run-time platform that manages and executes these solutions. The tool is business-user friendly and solves business problems while completely hiding and abstracting the underlying technology platform.
Oracle Stream Explorer has been renamed Oracle Stream Analytics and retains all the capabilities for processing both streams and events. The merged product provides users with a seamless experience.
The Oracle GoldenGate for Big Data 12c product streams transactional data into big data systems in real time, without impacting the performance of source systems. It streamlines real-time data delivery into the most popular big data solutions, including Apache Hadoop (HDFS), HBase, Hive, Flume, Kafka and Cassandra, to facilitate improved insight and timely action.
Oracle GoldenGate for Big Data provides optimized, high-performance delivery to Flume, HDFS, Hive, HBase, Kafka and Cassandra to support customers with their real-time big data analytics initiatives. Oracle GoldenGate for Big Data includes Oracle GoldenGate for Java, which enables customers to easily integrate with additional big data systems, such as Apache Storm, Apache Spark, Oracle NoSQL, MongoDB, SAP HANA, IBM PureData System for Analytics and many others.
Oracle GoldenGate for Big Data’s real-time data streaming platform also allows customers to keep their big data reservoirs, or big data lakes, up to date with their production systems.
Oracle GoldenGate for Big Data offers a high-performance, fault-tolerant, easy-to-use, and flexible real-time data streaming platform for big data environments. It easily extends customers’ real-time data integration architectures to big data systems without impacting the performance of the source systems, and it enables timely business insight for better decision making.
There are 7 steps. The GoldenGate software for MySQL is auto-installed as part of the setup.
Step 1 – Install GoldenGate for Big Data 12.3.0.1.0
Step 2 – MySQL -> MySQL
Step 3 – MySQL -> HDFS (CSV format)
Step 4 – MySQL -> Hive (Avro format)
Step 5 – MySQL -> HBase
Step 6 – MySQL -> Kafka (Json format)
Step 7 – MySQL -> Cassandra
– Create a base directory /u01
– Install Python 2.7 (required for Cassandra cqlsh)
– Copy files from the shared folder to /u01
– Enable supplemental logging in MySQL (stop and start MySQL)
– Create OS user: ggadmin/oracle
– Create MySQL databases: ggsource & ggtarget
– Create MySQL user: ggdemo/oracle
– Create empty schema tables in both MySQL databases: ggsource & ggtarget
Tables: emp, dept, salgrade
– Create HDFS base directories:
/user/ggtarget/hdfs
/user/ggtarget/hive/schema
/user/ggtarget/hive/data
– Install GG 12.2.0.1.1 for MySQL (and execute CREATE SUBDIRS from ggsci)
Step 1: Install GoldenGate 12.3.0.1.0 binaries for Big Data
To install GoldenGate, you will extract the GG binaries tar file – this file has been copied to /u01 as part of the setup. Then connect to the GoldenGate command line interface (ggsci) and run CREATE SUBDIRS to create the subdirectories in the GoldenGate home. If you would like to auto-install GoldenGate for Big Data, you can select this option.
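A minimal install session looks roughly like the sketch below (the /u01/gg4bigdata install directory and the tar file name are assumptions; use the actual path and file from your environment):

cd /u01/gg4bigdata
tar -xf /u01/<gg_for_bigdata_binaries>.tar
./ggsci
GGSCI> CREATE SUBDIRS
GGSCI> EXIT

CREATE SUBDIRS creates the working directories (dirprm, dirdat, dirrpt, dirchk, and so on) under the GoldenGate home.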
Step 2 – MySQL -> MySQL unidirectional replication
This step is intended to give you familiarity with how to configure GG for database-to-database replication. In this step we will load data into the MySQL database ‘ggsource’. The GG extract process ‘extmysql’ will capture the changes from MySQL’s binary logs and write them to the local trail file. The pump process ‘pmpmysql’ will route the data from the local trail (on the source) to the remote trail (on the target). The replicat process ‘repmysql’ will read the remote trail files and apply the changes to the MySQL database ‘ggtarget’.
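The exact parameter files are provided in the lab environment. As a rough, illustrative sketch (the process, database and user names come from the lab; the trail names, host, port and checkpoint option are assumptions), the configuration looks something like this:

-- dirprm/extmysql.prm: capture from the MySQL binary logs
EXTRACT extmysql
SOURCEDB ggsource, USERID ggdemo, PASSWORD oracle
EXTTRAIL ./dirdat/em
TABLE ggsource.*;

-- dirprm/pmpmysql.prm: pump the local trail to the target side
EXTRACT pmpmysql
RMTHOST localhost, MGRPORT 7809
RMTTRAIL ./dirdat/rm
TABLE ggsource.*;

-- dirprm/repmysql.prm: apply the changes to the target database
REPLICAT repmysql
TARGETDB ggtarget, USERID ggdemo, PASSWORD oracle
MAP ggsource.*, TARGET ggtarget.*;

The processes are then registered and started from ggsci, for example:

GGSCI> ADD EXTRACT extmysql, TRANLOG, BEGIN NOW
GGSCI> ADD EXTTRAIL ./dirdat/em, EXTRACT extmysql
GGSCI> ADD EXTRACT pmpmysql, EXTTRAILSOURCE ./dirdat/em
GGSCI> ADD RMTTRAIL ./dirdat/rm, EXTRACT pmpmysql
GGSCI> ADD REPLICAT repmysql, EXTTRAIL ./dirdat/rm, NODBCHECKPOINT
GGSCI> START EXTRACT extmysql
GGSCI> START EXTRACT pmpmysql
GGSCI> START REPLICAT repmysql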
In summary, we loaded data into the MySQL database ‘ggsource’; the GG extract process ‘extmysql’ captured the changes from the MySQL binary logs and wrote them to the local GG trail file. The pump process ‘pmpmysql’ routed the data from the local trail (on the source) to the remote trail (on the target). The replicat process ‘repmysql’ read the remote trail files and applied the changes to the MySQL database ‘ggtarget’.
Step 3 – MySQL –> HDFS (delimited text format)
In this step we will load data into the MySQL database ‘ggsource’. The GG extract process ‘extmysql’ will capture the changes from MySQL’s binary logs and write them to the local trail file. The pump process ‘pmphadop’ will route the data from the local trail (on the source) to the remote trail (on the target). The replicat process ‘rhdfs’ will read the remote trail files and write the data to the HDFS target directory /user/ggtarget/hdfs/.
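The HDFS delivery is driven by the replicat parameter file plus a Java adapter properties file. A simplified sketch, assuming the properties file is dirprm/hdfs.props and the Hadoop client libraries live under /usr/lib/hadoop (the MAP target and the classpath are assumptions):

-- dirprm/rhdfs.prm: replicat handing changes to the Big Data adapter
REPLICAT rhdfs
TARGETDB LIBFILE libggjava.so SET property=dirprm/hdfs.props
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 1000
MAP ggsource.*, TARGET ggsource.*;

# dirprm/hdfs.props: HDFS handler writing delimited text files
gg.handlerlist=hdfs
gg.handler.hdfs.type=hdfs
gg.handler.hdfs.format=delimitedtext
gg.handler.hdfs.rootFilePath=/user/ggtarget/hdfs
gg.handler.hdfs.mode=tx
gg.classpath=/usr/lib/hadoop/client/*:/etc/hadoop/conf
javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar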
In summary, we loaded data into the MySQL database ‘ggsource’; the GG extract process ‘extmysql’ captured the changes from the MySQL binary logs and wrote them to the local trail file. The pump process ‘pmphadop’ routed the data from the local trail (on the source) to the remote trail (on the target). The replicat process ‘rhdfs’ read the remote trail files and wrote the data to the HDFS target directory /user/ggtarget/hdfs/*.
The stats command displays statistics for the data that GoldenGate processed (grouped by inserts, updates, and deletes). The counts should match between source and target.
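For example, from ggsci (the same command works for any of the lab’s extract or replicat names):

GGSCI> STATS EXTRACT extmysql, TOTAL
GGSCI> STATS REPLICAT rhdfs, TOTAL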
You can also see the files created by GG from Hue:
http://127.0.0.1:8888/
Login to Hue: cloudera/cloudera
Click on File Browser (Manage HDFS) > Navigate to /user/ggtarget/hdfs
Step 4 – MySQL –> Hive (Avro format)
In this step we will load data into the MySQL database ‘ggsource’. The GG extract process ‘extmysql’ will capture the changes from MySQL’s binary logs and write them to the local trail file. The pump process ‘pmphadop’ will route the data from the local trail (on the source) to the remote trail (on the target). The replicat process ‘rhive’ will read the trail file, create the Hive tables, and write the data and the schema files (avsc) to the HDFS target directories for Hive: /user/ggtarget/hive/data/* and /user/ggtarget/hive/schema/*.
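Compared to Step 3, the main change is in the handler properties: the output format switches to Avro and the handler publishes the .avsc schema files to a separate HDFS path. A hedged sketch of a dirprm/hive.props for an rhive replicat (the lab’s Hive table-creation settings are omitted here because they vary by GoldenGate for Big Data version; the classpath is an assumption):

# dirprm/hive.props: HDFS handler writing Avro data plus schema files
gg.handlerlist=hdfs
gg.handler.hdfs.type=hdfs
gg.handler.hdfs.format=avro_row
gg.handler.hdfs.rootFilePath=/user/ggtarget/hive/data
gg.handler.hdfs.schemaFilePath=/user/ggtarget/hive/schema
gg.handler.hdfs.mode=tx
gg.classpath=/usr/lib/hadoop/client/*:/etc/hadoop/conf
javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar

The corresponding rhive.prm looks like rhdfs.prm from Step 3, with TARGETDB LIBFILE libggjava.so pointed at dirprm/hive.props instead.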
You can also see the Hive data created by GG from Hue:
Open a Browser window> http://127.0.0.1:8888/
Login to Hue: cloudera/cloudera
1- Click on Query Editor, Hive
2- Pull down on Database selection, and select ggtarget2hive_avro
3- Then hover the mouse over the emp table and click the small grey ‘preview sample data’ icon
In summary, we loaded data into the MySQL database ‘ggsource’; the GG extract process ‘extmysql’ captured the changes from the MySQL binary logs and wrote them to the local trail file. The pump process ‘pmphadop’ routed the data from the local trail (on the source) to the remote trail (on the target). The replicat process ‘rhive’ read the remote trail files, created the Hive tables, and wrote the data and the schema files (avsc) to the HDFS target directories for Hive: /user/ggtarget/hive/data/* and /user/ggtarget/hive/schema/*.
Step 5 – MySQL –> HBase
In this step we will load data into the MySQL database ‘ggsource’. The GG extract process ‘extmysql’ will capture the changes from MySQL’s binary logs and write them to the local trail file. The pump process ‘pmphadop’ will route the data from the local trail (on the source) to the remote trail (on the target). The replicat process ‘rhbase’ will read the remote trail files, create the HBase tables, and write the data to those tables.
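The rhbase replicat is wired up the same way as rhdfs (TARGETDB LIBFILE libggjava.so SET property=dirprm/hbase.props); only the handler properties change. An illustrative sketch, with the column family name and the classpath as assumptions:

# dirprm/hbase.props: HBase handler
gg.handlerlist=hbase
gg.handler.hbase.type=hbase
gg.handler.hbase.mode=tx
gg.handler.hbase.hBaseColumnFamilyName=cf
gg.classpath=/usr/lib/hbase/lib/*:/etc/hbase/conf
javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar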
You can also see the HBase data created by GG from Hue:
Open a Browser window> http://127.0.0.1:8888/
Login to Hue: cloudera/cloudera
1- Click on Data Browser, HBase
2- Click on one of the tables to browse the data
In summary, you loaded data into the MySQL database ‘ggsource’; the GG extract process ‘extmysql’ captured the changes from the MySQL binary logs and wrote them to the local trail file. The pump process ‘pmphadop’ routed the data from the local trail (on the source) to the remote trail (on the target). The replicat process ‘rhbase’ read the remote trail files, created the HBase tables, and wrote the data to those tables.
Step 6 – MySQL –> Kafka (Json format)
In this step we will load data into the MySQL database ‘ggsource’. The GG extract process ‘extmysql’ will capture the changes from MySQL’s binary logs and write them to the local trail file. The pump process ‘pmphadop’ will route the data from the local trail (on the source) to the remote trail (on the target). The replicat process ‘rkafka’ will read the remote trail files, act as a Kafka producer, and write the messages to an auto-created topic for each table in the source database.
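Again only the adapter properties change. A sketch of a JSON Kafka configuration, assuming a local broker on port 9092 (topicMappingTemplate is the 12.3-style property that derives one topic per source table; the classpath is an assumption):

# dirprm/kafka.props: Kafka handler producing JSON messages, one topic per table
gg.handlerlist=kafkahandler
gg.handler.kafkahandler.type=kafka
gg.handler.kafkahandler.KafkaProducerConfigFile=custom_kafka_producer.properties
gg.handler.kafkahandler.format=json
gg.handler.kafkahandler.topicMappingTemplate=${tableName}
gg.handler.kafkahandler.mode=op
gg.classpath=/usr/lib/kafka/libs/*

# dirprm/custom_kafka_producer.properties: standard Kafka producer settings
bootstrap.servers=localhost:9092
acks=1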
In summary, you loaded data into the MySQL database ‘ggsource’; the GG extract process ‘extmysql’ captured the changes from the MySQL binary logs and wrote them to the local trail file. The pump process ‘pmphadop’ routed the data from the local trail (on the source) to the remote trail (on the target). The replicat process ‘rkafka’ read the remote trail files, acted as a producer, and wrote the messages to an auto-created topic for each table in the source database.
Step 7 – MySQL –> Cassandra
In this step we will load data into the MySQL database ‘ggsource’. The GG extract process ‘extmysql’ will capture the changes from MySQL’s binary logs and write them to the local trail file. The pump process ‘pmphadop’ will route the data from the local trail (on the source) to the remote trail (on the target). The replicat process ‘rcass’ will read the remote trail files, create the Cassandra tables, and write the data to those tables.
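The rcass replicat follows the same pattern, pointing the adapter at a Cassandra properties file. An illustrative sketch, where the contact point, the DDL-handling values, and the driver classpath are assumptions to be checked against your GoldenGate for Big Data version:

# dirprm/cassandra.props: Cassandra handler with automatic table creation
gg.handlerlist=cassandra
gg.handler.cassandra.type=cassandra
gg.handler.cassandra.mode=op
gg.handler.cassandra.contactPoints=localhost
gg.handler.cassandra.ddlHandling=CREATE,ADD,DROP
gg.classpath=/u01/cassandra-java-driver/*
javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar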
In summary, you loaded data into the MySQL database ‘ggsource’; the GG extract process ‘extmysql’ captured the changes from the MySQL binary logs and wrote them to the local trail file. The pump process ‘pmphadop’ routed the data from the local trail (on the source) to the remote trail (on the target). The replicat process ‘rcass’ read the remote trail files, created the Cassandra tables, and wrote the data to those tables.