# PDF Parser Client Side

A lightweight, easy-to-use package for extracting text from PDF files on the client side, without any server dependency.

## How to Install

Install the package with npm or yarn:

```bash
npm i pdf-parser-client-side
```

or

```bash
yarn add pdf-parser-client-side
```





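## Include the package

```js
import extractTextFromPDF from "pdf-parser-client-side";
```

The default export, `extractTextFromPDF(file, variant)`, takes a `File` and a `variant` string and returns a promise that resolves to the extracted text.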





#### `variant` Parameter

The `variant` parameter controls how the extracted text is cleaned before it is returned. Depending on its value, different sets of characters are removed or retained, as listed below.



| `variant` Value | Description | Regular Expression (matches removed characters) | Retained Characters |
| --- | --- | --- | --- |
| `clean` | Removes all non-ASCII characters and any spaces that follow them. | `/[^\x00-\x7F]+\ *(?:[^\x00-\x7F]\| )*/g` | ASCII characters only |
| `alphanumeric` | Retains only alphanumeric characters (letters and numbers). | `/[^a-zA-Z0-9]+/g` | A-Z, a-z, 0-9 |
| `alphanumericwithspace` | Retains alphanumeric characters and spaces. | `/[^a-zA-Z0-9 ]+/g` | A-Z, a-z, 0-9, space |
| `alphanumericwithspaceandpunctuation` | Retains alphanumeric characters, spaces, and basic punctuation (`.`, `,`, `!`, `?`). | `/[^a-zA-Z0-9 .,!?]+/g` | A-Z, a-z, 0-9, space, `.` `,` `!` `?` |
| `alphanumericwithspaceandpunctuationandnewline` | Retains alphanumeric characters, spaces, basic punctuation (`.`, `,`, `!`, `?`), and newlines. | `/[^a-zA-Z0-9 .,!?\n]+/g` | A-Z, a-z, 0-9, space, `.` `,` `!` `?`, newline |
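To preview what each variant does without parsing a PDF, the regular expressions from the table can be applied to any string. The sketch below is a hypothetical re-implementation based only on the documented patterns, not the package's own source. The names `VariantName`, `VARIANT_PATTERNS`, and `applyVariant` are illustrative assumptions, and the newline variant's pattern assumes `\n` is the newline character it keeps.

```typescript
// Hypothetical sketch: NOT the package's source code. It mirrors the
// documented regular expressions so you can see what each variant keeps.

type VariantName =
  | "clean"
  | "alphanumeric"
  | "alphanumericwithspace"
  | "alphanumericwithspaceandpunctuation"
  | "alphanumericwithspaceandpunctuationandnewline";

// Each pattern matches the characters that the variant REMOVES.
const VARIANT_PATTERNS: Record<VariantName, RegExp> = {
  // Non-ASCII runs plus any spaces/non-ASCII that follow them.
  // Written with a plain space instead of "\ ", which is equivalent.
  clean: /[^\x00-\x7F]+ *(?:[^\x00-\x7F]| )*/g,
  alphanumeric: /[^a-zA-Z0-9]+/g,
  alphanumericwithspace: /[^a-zA-Z0-9 ]+/g,
  alphanumericwithspaceandpunctuation: /[^a-zA-Z0-9 .,!?]+/g,
  // Assumption: "\n" is the newline character this variant retains.
  alphanumericwithspaceandpunctuationandnewline: /[^a-zA-Z0-9 .,!?\n]+/g,
};

function applyVariant(text: string, variant: VariantName): string {
  // replace() with a global regex removes every match in the string.
  return text.replace(VARIANT_PATTERNS[variant], "");
}
```

For example, `applyVariant("Hello, World! 123", "alphanumericwithspace")` returns `"Hello World 123"`, while the `alphanumeric` variant returns `"HelloWorld123"` for the same input.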



#### Example Usage



JavaScript

```jsx
import React from "react";
import extractTextFromPDF from "pdf-parser-client-side";

export default function Test() {
  // Reads the first selected file and extracts its text using the given variant.
  const handleFileChange = async (e, variant) => {
    const file = e.target.files?.[0];
    if (file) {
      try {
        const text = await extractTextFromPDF(file, variant);
        console.log("Extracted Text:", text);
      } catch (error) {
        console.error("Error extracting text from PDF:", error);
      }
    }
  };

  return (
    <div>
      <input
        type="file"
        id="file-selector"
        accept=".pdf"
        onChange={(e) => handleFileChange(e, "clean")}
      />
    </div>
  );
}
```





TypeScript

```tsx
import React from "react";
import extractTextFromPDF, { Variant } from "pdf-parser-client-side";

export default function Test() {
  // Reads the first selected file and extracts its text using the given variant.
  const handleFileChange = async (
    e: React.ChangeEvent<HTMLInputElement>,
    variant: Variant
  ) => {
    const file = e.target.files?.[0];
    if (file) {
      try {
        const text = await extractTextFromPDF(file, variant);
        console.log("Extracted Text:", text);
      } catch (error) {
        console.error("Error extracting text from PDF:", error);
      }
    }
  };

  return (
    <div>
      <input
        type="file"
        id="file-selector"
        accept=".pdf"
        onChange={(e) => handleFileChange(e, "clean")}
      />
    </div>
  );
}
```
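The `accept=".pdf"` attribute in these examples only filters what the file picker shows by default; a user can still choose another file type. If you want to reject non-PDF files before calling `extractTextFromPDF`, one option is to check for the `%PDF-` header that PDF files begin with. The helpers below (`hasPdfSignature`, `looksLikePdf`) are hypothetical and not part of `pdf-parser-client-side`; this minimal sketch assumes the header sits at byte 0, which is stricter than some PDF readers.

```typescript
// Hypothetical helpers, NOT part of pdf-parser-client-side.
// "%PDF-" encoded as ASCII bytes.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

// True if the byte array starts with the "%PDF-" header.
function hasPdfSignature(bytes: Uint8Array): boolean {
  if (bytes.length < PDF_MAGIC.length) return false;
  return PDF_MAGIC.every((byte, i) => bytes[i] === byte);
}

// Reads only the first few bytes of a File/Blob (standard Blob.slice and
// Blob.arrayBuffer Web APIs) and checks them for the PDF header.
async function looksLikePdf(file: Blob): Promise<boolean> {
  const head = await file.slice(0, PDF_MAGIC.length).arrayBuffer();
  return hasPdfSignature(new Uint8Array(head));
}
```

Inside `handleFileChange`, you could then guard the call with `if (file && (await looksLikePdf(file)))` so that only files with a PDF header reach `extractTextFromPDF`.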

## Contributing

Feel free to contribute!

1. Fork the repository
2. Make changes
3. Submit a pull request
