Description:
- Background: I am new to Flink and have no prior experience with big data. My knowledge of Java and Linux is also limited.
- Requirement: I want to use Flink CDC to run a simple table synchronization test between SQL Server databases, specifically using a YAML file instead of Java code.
- Environment Information:
| name | version |
| --- | --- |
| Flink | flink-1.19.1 |
| Flink CDC | flink-cdc-3.1.0 |
| SQL Server | SQL Server 2022 |
| JDK | OpenJDK 11.0.23 |
| Operating system | CentOS 7.9 |
Steps and Issues Encountered:
- First, I verified the environment: using the Flink SQL client, I executed CREATE TABLE and INSERT INTO statements and successfully synchronized data from TEST_FOR_FLINK.dbo.orders to DW.dbo.orders_dw.
Then I followed the documentation here:
- Extract the following archive into the corresponding directory under /flink-1.19.1:
- flink-cdc-3.1.0-bin.tar.gz
- Copy the following files to /flink-1.19.1/lib/:
- flink-connector-jdbc-3.1.2-1.18.jar
- mssql-jdbc-12.6.3.jre11.jar
- flink-sql-connector-sqlserver-cdc-3.1.0.jar
- Copy the following file to /flink-1.19.1/pipeline/:
  - mssql-to-mssql-test01.yaml
source:
  type: sqlserver-cdc
  hostname: x.x.x.x
  port: 1433
  username: sa
  password: xxxxxx
  database: TEST_FOR_FLINK
  tables: dbo.orders
  server-time-zone: UTC

sink:
  type: jdbc
  driver: com.microsoft.sqlserver.jdbc.SQLServerDriver
  url: jdbc:sqlserver://x.x.x.x:1433;databaseName=DW
  username: sa
  password: xxxxxx
  table-name: dbo.orders_dw

pipeline:
  name: Sync MSSQL Database to MSSQL
  parallelism: 2
- Execute the following command:
[root@master pipeline]# flink-cdc.sh mssql-to-mssql-test01.yaml
Error Message:
Exception in thread "main" java.lang.RuntimeException: Cannot find factory with identifier "sqlserver-cdc" in the classpath.
Available factory classes are:
at org.apache.flink.cdc.composer.utils.FactoryDiscoveryUtils.getFactoryByIdentifier(FactoryDiscoveryUtils.java:62)
at org.apache.flink.cdc.composer.flink.translator.DataSourceTranslator.translate(DataSourceTranslator.java:47)
at org.apache.flink.cdc.composer.flink.FlinkPipelineComposer.compose(FlinkPipelineComposer.java:101)
at org.apache.flink.cdc.cli.CliExecutor.run(CliExecutor.java:71)
at org.apache.flink.cdc.cli.CliFrontend.main(CliFrontend.java:71)
I hope someone can point out what I did wrong. The official documentation does not provide a YAML-based ETL example for SQL Server to SQL Server. If YAML files cannot be used for this kind of data transfer, where can I find documentation or a Java project demo that uses Java + the Table API for data synchronization?
The sqlserver-cdc pipeline connector is not supported yet; you can follow the PR https://github.com/apache/flink-cdc/pull/3445.
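
Until that PR is merged, one interim option is to drive the same synchronization from a small Java program using the Table API, which mirrors the SQL-client verification that already works in your environment. The sketch below is only an illustration, not an official example: the column names (order_id, customer, amount) are placeholders for the real schema of dbo.orders / dbo.orders_dw, and the connector options simply mirror the ones in your YAML. It assumes the jars you already copied into /flink-1.19.1/lib/ (flink-sql-connector-sqlserver-cdc-3.1.0.jar, flink-connector-jdbc-3.1.2-1.18.jar, mssql-jdbc-12.6.3.jre11.jar) are on the classpath.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class MssqlToMssqlSync {
    public static void main(String[] args) {
        // Streaming Table API environment
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // CDC source table backed by the sqlserver-cdc SQL connector.
        // Columns order_id/customer/amount are placeholders for the real schema.
        tEnv.executeSql(
                "CREATE TABLE orders_src (" +
                "  order_id INT," +
                "  customer STRING," +
                "  amount DECIMAL(10, 2)," +
                "  PRIMARY KEY (order_id) NOT ENFORCED" +
                ") WITH (" +
                "  'connector' = 'sqlserver-cdc'," +
                "  'hostname' = 'x.x.x.x'," +
                "  'port' = '1433'," +
                "  'username' = 'sa'," +
                "  'password' = 'xxxxxx'," +
                "  'database-name' = 'TEST_FOR_FLINK'," +
                "  'table-name' = 'dbo.orders'" +
                ")");

        // JDBC sink table pointing at the target database
        tEnv.executeSql(
                "CREATE TABLE orders_dw (" +
                "  order_id INT," +
                "  customer STRING," +
                "  amount DECIMAL(10, 2)," +
                "  PRIMARY KEY (order_id) NOT ENFORCED" +
                ") WITH (" +
                "  'connector' = 'jdbc'," +
                "  'driver' = 'com.microsoft.sqlserver.jdbc.SQLServerDriver'," +
                "  'url' = 'jdbc:sqlserver://x.x.x.x:1433;databaseName=DW'," +
                "  'username' = 'sa'," +
                "  'password' = 'xxxxxx'," +
                "  'table-name' = 'dbo.orders_dw'" +
                ")");

        // Continuously copy changes from the source to the sink
        tEnv.executeSql("INSERT INTO orders_dw SELECT * FROM orders_src");
    }
}
```

Package this as a job jar and submit it with flink run; with a primary key declared on the sink table, inserts, updates, and deletes captured from dbo.orders are applied to dbo.orders_dw.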