RESHAPE: REVERSE-EDITED SYNTHETIC HYPOTHESES FOR AUTOMATIC POST-EDITING

Blog Article

Synthetic training data has been used extensively to train Automatic Post-Editing (APE) models in many recent studies because the quantity of human-created data has been considered insufficient. However, the most widely used synthetic APE dataset, eSCAPE, does not respect the minimal-editing property of genuine data, and this defect may have limited the performance of APE models. This article proposes adapting back-translation to APE so that edit distance is constrained, while using stochastic sampling during decoding to maintain output diversity, in order to create a new synthetic APE dataset, RESHAPE.
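To make the minimal-editing property concrete, the sketch below measures how far a machine-translation hypothesis is from its post-edit using a word-level Levenshtein distance normalized by post-edit length. This is only an illustration, not the procedure used to build RESHAPE: the function names, the whitespace tokenization, and the `max_rate` threshold of 0.5 are all assumptions made for this example, and the metric is a simplified stand-in for TER (it does not model phrase shifts).

```python
from typing import List, Tuple


def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a post-edit word
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_rate(hypothesis: str, post_edit: str) -> float:
    """Edit distance divided by post-edit length (simplified TER, no shifts)."""
    hyp, ref = hypothesis.split(), post_edit.split()
    if not ref:
        return 0.0 if not hyp else float("inf")
    return word_edit_distance(hyp, ref) / len(ref)


def keep_minimally_edited(pairs: List[Tuple[str, str]],
                          max_rate: float = 0.5) -> List[Tuple[str, str]]:
    """Keep (hypothesis, post_edit) pairs whose edit rate is at most max_rate.

    max_rate is a hypothetical threshold chosen for illustration only.
    """
    return [(h, p) for h, p in pairs if edit_rate(h, p) <= max_rate]


pairs = [
    ("the cat sat on mat", "the cat sat on the mat"),  # one missing word
    ("a b c", "x y z"),                                # fully rewritten
]
filtered = keep_minimally_edited(pairs)
```

With these toy pairs, only the first one survives the filter: it needs a single insertion, whereas the second replaces every word. A pair like the second is the kind of heavily rewritten sample that the minimal-editing property rules out.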

Our experiments show that (1) RESHAPE contains more samples resembling genuine APE data than eSCAPE does, and (2) using RESHAPE as new training data improves APE models’ performance substantially over using eSCAPE.
