Kubernetes container, "canceling" Cloud Storage notification

2/20/2020

So I have a Kubernetes container running, attached to a Google Cloud Storage bucket. The overall functionality is:

ON object change notification:
    IF this is a new file THEN
        process the file
        delete the file from the bucket
    ENDIF

It works, but if I start throwing several files into the bucket, the notification for a particular file might get triggered again, though it will promptly fail because the file has been deleted. If I throw 20 files at the bucket, this happens 4 or 5 times. So my files are getting processed correctly, but the extra errors concern me.

Is there a way, in the JavaScript code, to indicate that the file has been processed, so that the notification doesn't get triggered again?

Here is the code I'm using (trimmed for readability):

// imports (trimmed from the post above); MLS_IMAGE_BUCKET, IMAGE_FILE_SIZES,
// resizeImage() and markImageProcessed() are defined elsewhere in the module
const path        = require('path');
const { Storage } = require('@google-cloud/storage');
const storage     = new Storage();

module.exports.processListingImage = (event, context) => {
    const file = event.data;
    if (file.resourceState === 'not_exists') return Promise.resolve( `main: This is a deletion event: ${file.name}\n` );
    if (!file.name) return Promise.resolve( `main: This is a deploy event\n` );

    // FILENAME VALIDATIONS REMOVED FOR CLARITY

    // The input and output buckets
    const sourceBucket      = file.bucket;
    const destinationBucket = storage.bucket( MLS_IMAGE_BUCKET );

    // get the MLS id, MLS no, and file number from the file path
    const inputFilename = file.name;
    let mlsId           = path.dirname( inputFilename );

    let extension       = path.extname( inputFilename );
    let bareFilename    = path.basename( inputFilename, extension ); 
    let pieces          = bareFilename.split( '_' );
    let mlsNo = pieces[0];
    let fileNo = pieces[1];

    // direct bucket access to the file so we can download it
    const sourceFile = storage.bucket(sourceBucket).file(file.name);

    let imageInfo = { 'mlsId':             mlsId,
                      'mlsNo':             mlsNo,
                      'fileNo':            fileNo,
                      'fileSizeList':      JSON.parse( IMAGE_FILE_SIZES ),
                      'sourceFile':        sourceFile,
                      'destinationBucket': destinationBucket
    };

    // Invoke code for resizing the image
    return resizeImage( imageInfo )
        .then( ( stuff ) => {
            return markImageProcessed({'mlsId':mlsId, 'mlsNo':mlsNo, 'fileNo':fileNo});
        })
        .catch( (err) => {
            // note: this catch also sees rejections from resizeImage(), not only markImageProcessed()
            console.log(`markImageProcessed failed :: ${mlsId} : ${mlsNo} : ${fileNo}: `, err);
            return Promise.reject( new Error(`mark image failure: ${err}`) );  // single reason so err isn't lost
        })
        .then( ( stuff ) => {
            console.log(`main: All done, deleting original file :: ${mlsId} : ${mlsNo} : ${fileNo}:`, stuff);
            return sourceFile.delete();  // returns a promise
        })
        .catch( (err) => {
            console.log(`catchall :: ${mlsId} : ${mlsNo} : ${fileNo}: `, err);
            return Promise.reject( new Error(`imageProcess failure: ${err}`) );
        });

} // end exported function processListingImage();

Notes:

resizeImage() downloads the image from GCS, then uses GraphicsMagick to create multiple sizes of the image passed in.
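
Roughly, it has this shape (a simplified sketch, not my actual code; it assumes fileSizeList is an array of pixel widths and the output naming is made up for illustration):

const gm = require('gm');

function resizeImage( { sourceFile, destinationBucket, mlsId, mlsNo, fileNo, fileSizeList } ) {
    // pull the original image down from GCS into memory
    return sourceFile.download().then( ([original]) => {
        // one resize-and-upload per requested size
        return Promise.all( fileSizeList.map( (width) => new Promise( (resolve, reject) => {
            gm( original ).resize( width ).toBuffer( 'JPG', (err, buffer) => {
                if (err) return reject( err );
                const outName = `${mlsId}/${mlsNo}_${fileNo}_${width}.jpg`;
                destinationBucket.file( outName ).save( buffer ).then( resolve, reject );
            });
        })));
    });
}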

markImageProcessed() makes a connection to a MySQL database to record when things are finished.
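
It is roughly this shape (sketch only; the table and column names here are made up for illustration):

const mysql = require('mysql2/promise');

function markImageProcessed( { mlsId, mlsNo, fileNo } ) {
    return mysql.createConnection( {
        host:     process.env.DB_HOST,
        user:     process.env.DB_USER,
        password: process.env.DB_PASS,
        database: process.env.DB_NAME
    }).then( (conn) => {
        // stamp the row for this image as processed
        return conn.execute(
            'UPDATE listing_images SET processed_at = NOW() WHERE mls_id = ? AND mls_no = ? AND file_no = ?',
            [ mlsId, mlsNo, fileNo ]
        ).finally( () => conn.end() );
    });
}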

According to the documentation, delete() on a Google Cloud Storage File object returns a Promise.

So the output looks something like this. First, a series of messages from my code about the processing:

START: Id: <full bucket/filename>
back from resizeimage :: AK-JUNEAU : 19098 : 009
markImageProcessed all done :: AK-JUNEAU : 19098 : 009: 
main: All done, deleting original file :: AK-JUNEAU : 19098 : 009: 
200 OK

Interspersed with the invocations for deleting the file (at the end of processing):

START: <full bucket/filename>
[160] Final Status:  main: This is a deletion event: AK-JUNEAU/19098_009.jpg

But after most of the files have been processed, I start getting:

Error: No such object: idx-photos-raw-gs.ihousedev.com/AK-JUNEAU/19226_004.jpg

For some reason the system has triggered my processing code again for a file that already went through processing and was deleted at the end of it. This seems to happen 2 or 3 times for each file I process. I was wondering if there is something else I need to do to tell GCS to stop triggering my function. When this goes to production, there will be hundreds of thousands of files processed each day, so all these extra calls will probably get expensive.
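
One way to at least silence the duplicate errors in the meantime would be to check whether the object still exists before doing any work, and bail out quietly if it doesn't. A minimal sketch (hypothetical, not something in my current handler):

const { Storage } = require('@google-cloud/storage');
const storage = new Storage();

// true if the object is already gone, i.e. a previous invocation finished
// with it and this notification is a duplicate
function alreadyProcessed( bucketName, fileName ) {
    return storage.bucket( bucketName ).file( fileName ).exists()
        .then( ([exists]) => !exists );
}

// at the top of processListingImage(), something like:
// return alreadyProcessed( file.bucket, file.name ).then( (dup) => {
//     if (dup) return `main: duplicate notification, skipping: ${file.name}\n`;
//     // ... normal processing ...
// });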

-- Andy Wallace
google-cloud-storage
javascript
kubernetes

1 Answer

2/25/2020

Ok, so it seemed that my process was (often) taking a bit too long for GCS, so it would throw another notification at me, causing the errors. I switched my OpenFaaS functions to be asynchronous instead of synchronous, and everything is working fine. It turns out that my process (resizing images) takes between 2 and 4 seconds per image (producing 3 different image sizes). I used the information here:

https://docs.openfaas.com/reference/async/
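
For reference, the change boils down to invoking the function through the gateway's async route, which queues the request and returns immediately, instead of the synchronous route, which holds the connection open for the 2 to 4 seconds the resize takes. A rough sketch (the gateway URL and function name are placeholders; needs Node 18+ for the built-in fetch):

const GATEWAY = 'http://gateway.openfaas:8080';

function notifyAsync( event ) {
    // the synchronous route would be `${GATEWAY}/function/process-listing-image`
    // and would block until the resize finishes; the async route just queues it
    return fetch( `${GATEWAY}/async-function/process-listing-image`, {
        method:  'POST',
        headers: { 'Content-Type': 'application/json' },
        body:    JSON.stringify( event )
    }).then( (res) => console.log( res.status ) );   // 202 when accepted into the queue
}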

Hope this helps someone.

-- Andy Wallace
Source: StackOverflow