The authors of the pointer-generator network propose a method called the coverage mechanism.
The coverage vector is defined as the sum of the attention distributions over all previous decoder timesteps, but the coverage vector in this repository seems to sum the attention over the encoder instead!
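To make my reading of the definition concrete, here is a minimal pure-Python sketch. It assumes each attention distribution is a list of weights over the encoder (source) positions, with one such list per decoder timestep, and that the coverage vector starts at zero. The function name `coverage_vectors` and the toy weights are made up for illustration and are not taken from this repository.

```python
def coverage_vectors(attn_dists):
    """Return the coverage vector available at each decoder timestep.

    attn_dists: one attention distribution per decoder timestep, each a
    list of weights over the source (encoder) positions.
    """
    src_len = len(attn_dists[0])
    coverage = [0.0] * src_len  # before the first step nothing is covered
    coverages = []
    for attn in attn_dists:
        # Coverage at step t only includes attention from steps before t.
        coverages.append(list(coverage))
        # Accumulate this step's attention for use at later steps.
        coverage = [c + a for c, a in zip(coverage, attn)]
    return coverages


# Hypothetical attention weights: 3 decoder timesteps, 2 source tokens.
attn_dists = [[0.5, 0.5], [0.25, 0.75], [1.0, 0.0]]
covs = coverage_vectors(attn_dists)
```

In this sketch the sum runs over decoder timesteps, while each resulting vector has one entry per encoder position. If that matches what the repository does, then my confusion may only be about which axis is being summed.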
Please help me find the correct way to implement the mechanism, or tell me where I am going wrong.