• DM 分库分表合并场景
    • 上游实例
    • 同步需求
    • 下游实例
    • 同步方案
    • 同步任务配置

    DM 分库分表合并场景

    本文介绍如何在分库分表合并场景中使用 Data Migration (DM)。使用场景中,三个上游 MySQL 实例的分库和分表数据需要同步至下游 TiDB 集群。

    上游实例

    假设上游库结构如下:

    • 实例 1

      | Schema | Tables | |:———|:———| | user | information, log_north, log_bak | | store_01 | sale_01, sale_02 | | store_02 | sale_01, sale_02 |

    • 实例 2

      | Schema | Tables | |:———|:———| | user | information, log_east, log_bak | | store_01 | sale_01, sale_02 | | store_02 | sale_01, sale_02 |

    • 实例 3

      | Schema | Tables | |:———|:———| | user | information, log_south, log_bak | | store_01 | sale_01, sale_02 | | store_02 | sale_01, sale_02 |

    同步需求

    1. 合并三个实例中的 user.information 表至下游 TiDB 中的 user.information 表。
    2. 合并三个实例中的 user.log_{north|south|east} 表至下游TiDB中的 user.log_{north|south|east} 表。
    3. 合并三个实例中的 store_{01|02}.sale_{01|02} 表至下游TiDB中的 store.sale 表。
    4. 过滤掉三个实例的 user.log_{north|south|east} 表的所有删除操作。
    5. 过滤掉三个实例的 user.information 表的所有删除操作。
    6. 过滤掉三个实例的 store_{01|02}.sale_{01|02} 表的所有删除操作。
    7. 过滤掉三个实例的 user.log_bak 表。
    8. 因为 store_{01|02}.sale_{01|02} 表带有 bigint 型的自增主键,将其合并至 TiDB 时会引发冲突。您需要有方案修改相应自增主键以避免冲突。

    下游实例

    假设同步后下游库结构如下:

    Schema Tables
    user information, log_north, log_east, log_south
    store sale

    同步方案

    • 要满足同步需求 #1 和 #2,配置 Table routing 规则 如下:

      1. routes:
      2. ...
      3. user-route-rule:
      4. schema-pattern: "user"
      5. target-schema: "user"
    • 要满足同步需求 #3,配置 table routing 规则 如下:

      1. routes:
      2. ...
      3. store-route-rule:
      4. schema-pattern: "store_*"
      5. target-schema: "store"
      6. sale-route-rule:
      7. schema-pattern: "store_*"
      8. table-pattern: "sale_*"
      9. target-schema: "store"
      10. target-table: "sale"
    • 要满足同步需求 #4 和 #5,配置 Binlog event filter 规则 如下:

      1. filters:
      2. ...
      3. user-filter-rule:
      4. schema-pattern: "user"
      5. events: ["truncate table", "drop table", "delete", "drop database"]
      6. action: Ignore

      注意:

      同步需求 #4、#5 和 #7 的操作意味着过滤掉所有对 user 库的删除操作,所以此处配置了库级别的过滤规则。但是 user 库以后加入表的删除操作也都会被过滤。

    • 要满足同步需求 #6,配置 Binlog event filter 规则 如下:

      1. filters:
      2. ...
      3. sale-filter-rule:
      4. schema-pattern: "store_*"
      5. table-pattern: "sale_*"
      6. events: ["truncate table", "drop table", "delete"]
      7. action: Ignore
      8. store-filter-rule:
      9. schema-pattern: "store_*"
      10. events: ["drop database"]
      11. action: Ignore
    • 要满足同步需求 #7,配置 Black & white table lists 如下:

      1. black-white-list:
      2. log-bak-ignored:
      3. ignore-tables:
      4. - db-name: "user"
      5. tbl-name: "log_bak"
    • 要满足同步需求 #8,配置 column mapping 规则 如下:

      1. column-mappings:
      2. instance-1-sale:
      3. schema-pattern: "store_*"
      4. table-pattern: "sale_*"
      5. expression: "partition id"
      6. source-column: "id"
      7. target-column: "id"
      8. arguments: ["1", "store", "sale", "_"]
      9. instance-2-sale:
      10. schema-pattern: "store_*"
      11. table-pattern: "sale_*"
      12. expression: "partition id"
      13. source-column: "id"
      14. target-column: "id"
      15. arguments: ["2", "store", "sale", "_"]
      16. instance-3-sale:
      17. schema-pattern: "store_*"
      18. table-pattern: "sale_*"
      19. expression: "partition id"
      20. source-column: "id"
      21. target-column: "id"
      22. arguments: ["3", "store", "sale", "_"]

    同步任务配置

    同步任务的完整配置如下。详情请参阅 Data Migration 任务配置文件。

    1. name: "shard_merge"
    2. task-mode: all
    3. meta-schema: "dm_meta"
    4. remove-meta: false
    5. target-database:
    6. host: "192.168.0.1"
    7. port: 4000
    8. user: "root"
    9. password: ""
    10. mysql-instances:
    11. -
    12. source-id: "instance-1"
    13. route-rules: ["user-route-rule", "store-route-rule", "sale-route-rule"]
    14. filter-rules: ["user-filter-rule", "store-filter-rule", "sale-filter-rule"]
    15. column-mapping-rules: ["instance-1-sale"]
    16. black-white-list: "log-bak-ignored"
    17. mydumper-config-name: "global"
    18. loader-config-name: "global"
    19. syncer-config-name: "global"
    20. -
    21. source-id: "instance-2"
    22. route-rules: ["user-route-rule", "store-route-rule", "sale-route-rule"]
    23. filter-rules: ["user-filter-rule", "store-filter-rule", "sale-filter-rule"]
    24. column-mapping-rules: ["instance-2-sale"]
    25. black-white-list: "log-bak-ignored"
    26. mydumper-config-name: "global"
    27. loader-config-name: "global"
    28. syncer-config-name: "global"
    29. -
    30. source-id: "instance-3"
    31. route-rules: ["user-route-rule", "store-route-rule", "sale-route-rule"]
    32. filter-rules: ["user-filter-rule", "store-filter-rule", "sale-filter-rule"]
    33. column-mapping-rules: ["instance-3-sale"]
    34. black-white-list: "log-bak-ignored"
    35. mydumper-config-name: "global"
    36. loader-config-name: "global"
    37. syncer-config-name: "global"
    38. # 所有实例共享的其他通用配置
    39. routes:
    40. user-route-rule:
    41. schema-pattern: "user"
    42. target-schema: "user"
    43. store-route-rule:
    44. schema-pattern: "store_*"
    45. target-schema: "store"
    46. sale-route-rule:
    47. schema-pattern: "store_*"
    48. table-pattern: "sale_*"
    49. target-schema: "store"
    50. target-table: "sale"
    51. filters:
    52. user-filter-rule:
    53. schema-pattern: "user"
    54. events: ["truncate table", "drop table", "delete", "drop database"]
    55. action: Ignore
    56. sale-filter-rule:
    57. schema-pattern: "store_*"
    58. table-pattern: "sale_*"
    59. events: ["truncate table", "drop table", "delete"]
    60. action: Ignore
    61. store-filter-rule:
    62. schema-pattern: "store_*"
    63. events: ["drop database"]
    64. action: Ignore
    65. black-white-list:
    66. log-bak-ignored:
    67. ignore-tables:
    68. - db-name: "user"
    69. tbl-name: "log_bak"
    70. column-mappings:
    71. instance-1-sale:
    72. schema-pattern: "store_*"
    73. table-pattern: "sale_*"
    74. expression: "partition id"
    75. source-column: "id"
    76. target-column: "id"
    77. arguments: ["1", "store", "sale", "_"]
    78. instance-2-sale:
    79. schema-pattern: "store_*"
    80. table-pattern: "sale_*"
    81. expression: "partition id"
    82. source-column: "id"
    83. target-column: "id"
    84. arguments: ["2", "store", "sale", "_"]
    85. instance-3-sale:
    86. schema-pattern: "store_*"
    87. table-pattern: "sale_*"
    88. expression: "partition id"
    89. source-column: "id"
    90. target-column: "id"
    91. arguments: ["3", "store", "sale", "_"]
    92. mydumpers:
    93. global:
    94. threads: 4
    95. chunk-filesize: 64
    96. skip-tz-utc: true
    97. loaders:
    98. global:
    99. pool-size: 16
    100. dir: "./dumped_data"
    101. syncers:
    102. global:
    103. worker-count: 16
    104. batch: 100
    105. max-retry: 100