    2.1 Subsumption and Unification

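    It is standard to think of feature structures as providing partial information about some object, in the sense that we can order feature structures according to how general they are. For example, (23a) is more general (less specific) than (23b), which is in turn more general than (23c).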

    (23a) [NUMBER = 74]
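The ordering by generality can be sketched in plain Python. This is only an illustration: nested dicts stand in for feature structures, the helper name `subsumes` is our own and is not the NLTK class, structure sharing is ignored, and the three example structures are assembled from features that appear in this section.

```python
# Sketch only: nested dicts model feature structures; reentrancy is ignored.
def subsumes(general, specific):
    """Return True if every feature/value pair of `general` is also in `specific`."""
    for feature, value in general.items():
        if feature not in specific:
            return False
        other = specific[feature]
        if isinstance(value, dict) and isinstance(other, dict):
            # Complex values: compare them recursively.
            if not subsumes(value, other):
                return False
        elif value != other:
            # Atomic values (or an atom against a complex value) must match.
            return False
    return True


# Illustrative structures of increasing specificity (assumed for this sketch).
fs_a = {"NUMBER": 74}
fs_b = {"NUMBER": 74, "STREET": "rue Pascal"}
fs_c = {"NUMBER": 74, "STREET": "rue Pascal", "CITY": "Paris"}

print(subsumes(fs_a, fs_b))  # True
print(subsumes(fs_b, fs_c))  # True
print(subsumes(fs_c, fs_a))  # False
```

Under this reading, a more general structure is one whose information is entirely contained in the more specific one.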

    Unification is formally defined as a (partial) binary operation: FS<sub>0</sub> ⊔ FS<sub>1</sub>. Unification is symmetric, so FS<sub>0</sub> ⊔ FS<sub>1</sub> = FS<sub>1</sub> ⊔ FS<sub>0</sub>. The same is true in Python:

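    >>> import nltk
    >>> # Assumed definitions: fs1 and fs2 are not defined in this excerpt.
    >>> fs1 = nltk.FeatStruct(NUMBER=74, STREET='rue Pascal')
    >>> fs2 = nltk.FeatStruct(CITY='Paris')
    >>> print(fs2.unify(fs1))
    [ CITY   = 'Paris'      ]
    [ NUMBER = 74           ]
    [ STREET = 'rue Pascal' ]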

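    If we unify two feature structures that stand in the subsumption relationship, the result of unification is the more specific of the two. Unification fails, by contrast, when the two structures assign distinct atomic values to the same path; in NLTK the result is then None: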

    >>> fs0 = nltk.FeatStruct(A='a')
    >>> fs1 = nltk.FeatStruct(A='b')
    >>> fs2 = fs0.unify(fs1)
    >>> print(fs2)
    None
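The properties of unification mentioned so far (symmetry, returning the more specific structure under subsumption, and failure on clashing atoms) can be checked with a small sketch. It assumes nested dicts as a stand-in for feature structures; the function `unify` below is our own simplification, not NLTK's implementation, and it ignores structure sharing and variables.

```python
# Sketch only: nested dicts model feature structures; atomic values are
# assumed never to be None, so None can signal failure.
def unify(a, b):
    """Return a merged copy of `a` and `b`, or None if they clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, b_value in b.items():
            if feature in result:
                merged = unify(result[feature], b_value)
                if merged is None:
                    return None  # clash somewhere below this feature
                result[feature] = merged
            else:
                result[feature] = b_value
        return result
    # Atomic values unify only with themselves.
    return a if a == b else None


general = {"NUMBER": 74}
specific = {"NUMBER": 74, "STREET": "rue Pascal"}

print(unify(general, specific) == specific)                  # True
print(unify(general, specific) == unify(specific, general))  # True
print(unify({"A": "a"}, {"A": "b"}))                         # None
```

Returning `None` on failure mirrors the behaviour shown in the session above, which is why the operation is called partial.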

    Now, if we look at how unification interacts with structure sharing, things become really interesting. First, let's define (21) in Python:

    >>> fs0 = nltk.FeatStruct("""[NAME=Lee,
    ...                           ADDRESS=[NUMBER=74,
    ...                                    STREET='rue Pascal'],
    ...                           SPOUSE= [NAME=Kim,
    ...                                    ADDRESS=[NUMBER=74,
    ...                                             STREET='rue Pascal']]]""")
    >>> print(fs0)
    [ ADDRESS = [ NUMBER = 74           ]               ]
    [           [ STREET = 'rue Pascal' ]               ]
    [                                                   ]
    [ NAME    = 'Lee'                                   ]
    [                                                   ]
    [           [ ADDRESS = [ NUMBER = 74           ] ] ]
    [ SPOUSE  = [           [ STREET = 'rue Pascal' ] ] ]
    [           [                                     ] ]
    [           [ NAME    = 'Kim'                     ] ]

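    What happens when we augment Kim's address with a specification for CITY? Notice that fs1 needs to include the whole path from the root of the feature structure down to CITY.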

    >>> fs1 = nltk.FeatStruct("[SPOUSE = [ADDRESS = [CITY = Paris]]]")
    >>> print(fs1.unify(fs0))
    [ ADDRESS = [ NUMBER = 74           ]               ]
    [           [ STREET = 'rue Pascal' ]               ]
    [                                                   ]
    [ NAME    = 'Lee'                                   ]
    [                                                   ]
    [           [           [ CITY   = 'Paris'      ] ] ]
    [           [ ADDRESS = [ NUMBER = 74           ] ] ]
    [ SPOUSE  = [           [ STREET = 'rue Pascal' ] ] ]
    [           [                                     ] ]
    [           [ NAME    = 'Kim'                     ] ]
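The remark that fs1 must spell out the whole path from the root down to CITY can be made concrete with a small helper. This is a sketch using nested dicts rather than NLTK objects, and `from_path` is a hypothetical name introduced only for illustration.

```python
# Sketch only: build a structure with a single branch along a given path.
def from_path(path, value):
    """Wrap `value` in one nested dict per feature in `path`, root first."""
    structure = value
    for feature in reversed(path):
        structure = {feature: structure}
    return structure


update = from_path(("SPOUSE", "ADDRESS", "CITY"), "Paris")
print(update)  # {'SPOUSE': {'ADDRESS': {'CITY': 'Paris'}}}
```

A bare `{"CITY": "Paris"}` would instead add CITY at the top level, next to NAME, which is not what the example intends.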

    By contrast, the result is very different if fs1 is unified with the structure-sharing version fs2 (also shown earlier as the graph (22)):

    >>> fs2 = nltk.FeatStruct("""[NAME=Lee, ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'],
    ...                           SPOUSE=[NAME=Kim, ADDRESS->(1)]]""")
    >>> print(fs1.unify(fs2))
    [               [ CITY   = 'Paris'      ] ]
    [ ADDRESS = (1) [ NUMBER = 74           ] ]
    [               [ STREET = 'rue Pascal' ] ]
    [                                         ]
    [ NAME    = 'Lee'                         ]
    [                                         ]
    [ SPOUSE  = [ ADDRESS -> (1)  ]           ]
    [           [ NAME    = 'Kim' ]           ]

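    Rather than just updating what was in effect Kim's "copy" of Lee's address, we have now updated both their addresses at the same time. More generally, if a unification involves specializing the value of some path π, then that unification simultaneously specializes the value of any path that is equivalent to π.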
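The difference between a copy and a shared value has a close analogue in ordinary Python object identity. The sketch below uses nested dicts and in-place assignment purely as an analogy; NLTK's unify returns a new structure rather than mutating its arguments, and the variable names are our own.

```python
# Sketch only: equal-but-separate values versus one shared value.
address = {"NUMBER": 74, "STREET": "rue Pascal"}

# No sharing: Kim's address is a separate copy of Lee's.
copied = {
    "NAME": "Lee",
    "ADDRESS": address,
    "SPOUSE": {"NAME": "Kim", "ADDRESS": dict(address)},
}
copied["SPOUSE"]["ADDRESS"]["CITY"] = "Paris"
print("CITY" in copied["ADDRESS"])  # False: Lee's address is unchanged

# Sharing: both paths lead to the very same object.
shared_address = {"NUMBER": 74, "STREET": "rue Pascal"}
shared = {
    "NAME": "Lee",
    "ADDRESS": shared_address,
    "SPOUSE": {"NAME": "Kim", "ADDRESS": shared_address},
}
shared["SPOUSE"]["ADDRESS"]["CITY"] = "Paris"
print(shared["ADDRESS"]["CITY"])                         # Paris
print(shared["ADDRESS"] is shared["SPOUSE"]["ADDRESS"])  # True
```

In the shared case the paths ADDRESS and SPOUSE/ADDRESS are equivalent, so information added along one is visible along the other.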

    As we have already seen, structure sharing can also be stated using variables such as ?x:

    >>> fs1 = nltk.FeatStruct("[ADDRESS1=[NUMBER=74, STREET='rue Pascal']]")
    >>> fs2 = nltk.FeatStruct("[ADDRESS1=?x, ADDRESS2=?x]")
    >>> print(fs2)
    [ ADDRESS1 = ?x ]
    [ ADDRESS2 = ?x ]
    >>> print(fs2.unify(fs1))
    [ ADDRESS1 = (1) [ NUMBER = 74           ] ]
    [                [ STREET = 'rue Pascal' ] ]
    [                                          ]
    [ ADDRESS2 -> (1)                          ]
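One way to picture how a variable turns into structure sharing is the following sketch. It is a deliberately simplified model, not NLTK's algorithm: structures are flat dicts at the top level, variables are strings beginning with `?`, and the names `is_variable` and `unify_with_variables` are hypothetical.

```python
# Sketch only: top-level features, '?name' strings as variables.
def is_variable(value):
    return isinstance(value, str) and value.startswith("?")


def unify_with_variables(template, data):
    """Bind variables in `template` against `data`; return None on a clash."""
    # Pass 1: bind each variable to the value that `data` gives its feature.
    bindings = {}
    for feature, value in template.items():
        if is_variable(value) and feature in data:
            if value in bindings and bindings[value] != data[feature]:
                return None  # one variable cannot take two different values
            bindings.setdefault(value, data[feature])

    # Pass 2: every occurrence of a bound variable gets the same object.
    result = dict(data)
    for feature, value in template.items():
        if is_variable(value):
            if value in bindings:
                result[feature] = bindings[value]
            elif feature not in result:
                result[feature] = value  # variable stays unbound
        elif feature in data and data[feature] != value:
            return None  # clashing non-variable values
        else:
            result[feature] = value
    return result


fs1 = {"ADDRESS1": {"NUMBER": 74, "STREET": "rue Pascal"}}
fs2 = {"ADDRESS1": "?x", "ADDRESS2": "?x"}

result = unify_with_variables(fs2, fs1)
print(result["ADDRESS2"])                        # {'NUMBER': 74, 'STREET': 'rue Pascal'}
print(result["ADDRESS1"] is result["ADDRESS2"])  # True
```

Because both occurrences of `?x` are replaced by the same object, the two features end up sharing one value, which corresponds to the (1) tag in the printed result above.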