Statistical vs. Rule-Based Machine Translation Clause Samples
Statistical vs. Rule-Based Machine Translation. Given the low volume of specialised data available in the domain of the MORMED project, no linguistic evaluation was carried out for ▇▇▇▇▇, since the results would not have been reliable. The output quality of statistical machine translation (SMT) systems improves dramatically with the amount of material on which they are trained, and the small size of the corpora available at the beginning of the project led to a decision against SMT at this stage. As explained above, statistical machine translation makes sense in two main contexts:
1. when the domain is restricted
2. when large corpora are available to train the system.
This was confirmed by a test carried out prior to MORMED in the automotive domain, in which another SMT system (Language ▇▇▇▇▇▇) was evaluated against Systran. For the test results, please refer to chapter 4.
In our case, the domain may not be very broad: within the medical domain, the diseases involved in the MORMED project are very specific. However, the limited amount of text available on the subject matter means that we must rule out the use of statistical machine translation systems, at least in the initial phases of the project.
