Document#

Definition for Google Cloud Natural Language API documents.

A document is used to hold text to be analyzed and annotated.

class google.cloud.language.document.Annotations(sentences, tokens, sentiment, entities, language)#

Bases: tuple

Annotations for a document.

Parameters:
  • sentences (list) – List of Sentence in a document.
  • tokens (list) – List of Token from a document.
  • sentiment (Sentiment) – The sentiment of a document.
  • entities (list) – List of Entity found in a document.
  • language (str) – The language used for the annotation.
entities#

Alias for field number 3

language#

Alias for field number 4

sentences#

Alias for field number 0

sentiment#

Alias for field number 2

tokens#

Alias for field number 1
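
Because Annotations is a tuple with named fields, each value can be read by attribute or by position. The following is a minimal sketch that rebuilds an equivalent type with collections.namedtuple. The field order is taken from the aliases above; the sample values are placeholders, not real API output, and this is not the library's own definition.

```python
from collections import namedtuple

# Stand-in with the same field order as the documented class (a sketch,
# not the library's own definition).
Annotations = namedtuple(
    "Annotations",
    ["sentences", "tokens", "sentiment", "entities", "language"],
)

# Placeholder values only; a real response would hold Sentence, Token,
# Sentiment and Entity objects.
sample = Annotations(
    sentences=["s0"],
    tokens=["t0", "t1"],
    sentiment=None,
    entities=[],
    language="en",
)

# Attribute access and positional access refer to the same slots.
assert sample.tokens is sample[1]
assert sample.language == sample[4]

# Tuple unpacking follows the documented field order.
sentences, tokens, sentiment, entities, language = sample
print(len(tokens), language)
```

Unpacking is convenient when all fields are needed; attribute access is less fragile if the field order ever changes.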

class google.cloud.language.document.Document(client, content=None, gcs_url=None, doc_type='PLAIN_TEXT', language=None, encoding='UTF32')[source]#

Bases: object

Document to send to Google Cloud Natural Language API.

Represents either plain text or HTML. The content is either stored on the document itself or referenced from a Google Cloud Storage object.

Parameters:
  • client (Client) – A client which holds credentials and other configuration.
  • content (str) – (Optional) The document text content (either plain text or HTML).
  • gcs_url (str) – (Optional) The URL of the Google Cloud Storage object holding the content. Of the form gs://{bucket}/{blob-name}.
  • doc_type (str) – (Optional) The type of text in the document. Defaults to plain text. Can be one of PLAIN_TEXT or HTML.
  • language (str) – (Optional) The language of the document text. Defaults to None (auto-detect).
  • encoding (str) – (Optional) The encoding of the document text. Defaults to the system default encoding (UTF32 in the signature above; see Encoding.get_default()). Can be one of UTF8, UTF16 or UTF32.
Raises:

ValueError if both content and gcs_url are specified, or if neither is specified.
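
The rule that exactly one of content and gcs_url must be supplied can be sketched as a standalone check. The helper name build_document_payload and the dictionary it returns are hypothetical and not part of the library. The keys type, language, content and gcsContentUri are assumed to follow the REST Document resource.

```python
def build_document_payload(content=None, gcs_url=None,
                           doc_type="PLAIN_TEXT", language=None):
    """Hypothetical helper mirroring the documented ValueError rule."""
    # Exactly one source of text is allowed.
    if content is not None and gcs_url is not None:
        raise ValueError("Specify only one of content and gcs_url.")
    if content is None and gcs_url is None:
        raise ValueError("Specify one of content or gcs_url.")

    payload = {"type": doc_type}
    # Omitting the language leaves detection to the service.
    if language is not None:
        payload["language"] = language
    if content is not None:
        payload["content"] = content
    else:
        # Expected form: gs://{bucket}/{blob-name}
        payload["gcsContentUri"] = gcs_url
    return payload


inline = build_document_payload(content="Hello, world.")
stored = build_document_payload(gcs_url="gs://my-bucket/notes.txt",
                                doc_type="HTML", language="en")
print(inline)
print(stored)
```

The bucket and blob names here are placeholders chosen for illustration.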

HTML = 'HTML'#

HTML document type.

PLAIN_TEXT = 'PLAIN_TEXT'#

Plain text document type.

TYPE_UNSPECIFIED = 'TYPE_UNSPECIFIED'#

Unspecified document type.

analyze_entities()[source]#

Analyze the entities in the current document.

Finds named entities in the text (as of August 2016, only proper names), along with entity types, salience, mentions for each entity, and other properties.

See analyzeEntities.

Return type: EntityResponse
Returns: A representation of the entity response.
analyze_entity_sentiment()[source]#

Analyze the entity sentiment.

Finds entities in the text, similar to AnalyzeEntities, and analyzes the sentiment associated with each entity and its mentions.

Return type: EntitySentimentResponse
Returns: A representation of the entity sentiment response.
analyze_sentiment()[source]#

Analyze the sentiment in the current document.

See analyzeSentiment.

Return type: SentimentResponse
Returns: A representation of the sentiment response.
analyze_syntax()[source]#

Analyze the syntax in the current document.

See analyzeSyntax.

Return type: list
Returns: A list of Token returned from the API.
annotate_text(include_syntax=True, include_entities=True, include_sentiment=True)[source]#

Advanced natural language API: document syntax and other features.

Includes the full functionality of analyze_entities() and analyze_sentiment(), enabled by the flags include_entities and include_sentiment respectively.

In addition, include_syntax adds a new feature that analyzes the document for semantic and syntactic information.

Note

This API is intended for users who are familiar with machine learning and need in-depth text features to build upon.

See annotateText.

Parameters:
  • include_syntax (bool) – (Optional) Flag to enable syntax analysis of the current document.
  • include_entities (bool) – (Optional) Flag to enable entity extraction from the current document.
  • include_sentiment (bool) – (Optional) Flag to enable sentiment analysis of the current document.
Return type:

Annotations

Returns:

A tuple of the five values returned from the API: sentences, tokens, sentiment, entities and language.
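
The three include_* flags select which analyses are requested. A minimal sketch of how such flags might be turned into a feature selection follows. The function name build_features is hypothetical, and the keys extractSyntax, extractEntities and extractDocumentSentiment are assumed to match the features object of the REST annotateText request; this is not presented as the library's implementation.

```python
def build_features(include_syntax=True, include_entities=True,
                   include_sentiment=True):
    """Hypothetical mapping from the documented flags to request features."""
    features = {}
    # Only enabled analyses are listed; disabled ones are left out entirely.
    if include_syntax:
        features["extractSyntax"] = True
    if include_entities:
        features["extractEntities"] = True
    if include_sentiment:
        features["extractDocumentSentiment"] = True
    return features


everything = build_features()
entities_only = build_features(include_syntax=False, include_sentiment=False)
print(sorted(everything))
print(entities_only)
```

Under this sketch, a disabled analysis would plausibly come back as an empty or missing field in the resulting Annotations tuple, so callers should check a field before using it.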

class google.cloud.language.document.Encoding[source]#

Bases: object

The encoding type used to calculate offsets.

Represents the text encoding that the caller uses to process the output. The API provides the beginning offsets for various outputs, such as tokens and mentions.
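
The choice of encoding matters because an offset counts units of that encoding: bytes for UTF8, 16-bit code units for UTF16, and code points for UTF32. The sketch below computes the beginning offset of a substring under each convention. The helper name begin_offset and the sample text are illustrative assumptions, and the code assumes Python 3, where a str is indexed by code point.

```python
def begin_offset(text, index, encoding):
    """Offset of text[index] measured in units of the given encoding."""
    prefix = text[:index]
    if encoding == "UTF8":
        # One unit per byte.
        return len(prefix.encode("utf-8"))
    if encoding == "UTF16":
        # One unit per 16-bit code unit; "-le" avoids a byte order mark.
        return len(prefix.encode("utf-16-le")) // 2
    if encoding == "UTF32":
        # One unit per code point.
        return len(prefix)
    raise ValueError("Unsupported encoding: %r" % (encoding,))


# "é" takes two bytes in UTF-8; the emoji takes four bytes in UTF-8
# and two code units in UTF-16.
text = "caf\u00e9 \U0001F600 ok"
position = text.index("ok")

for name in ("UTF8", "UTF16", "UTF32"):
    print(name, begin_offset(text, position, name))
```

Matching the encoding to the string representation of the calling language lets the returned offsets be used directly as indices.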

NONE = 'NONE'#

Unspecified encoding type.

UTF16 = 'UTF16'#

UTF-16 encoding type.

UTF32 = 'UTF32'#

UTF-32 encoding type.

UTF8 = 'UTF8'#

UTF-8 encoding type.

classmethod get_default()[source]#

Return the appropriate default encoding on this system.

Return type: str
Returns: The correct default encoding on this system.
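
One plausible way to pick a system default is to inspect sys.maxunicode, which distinguishes narrow from wide Unicode builds of Python 2. This is an assumption about the approach, not a statement of the library's code, and guess_default_encoding is a hypothetical name. On Python 3.3 and later, sys.maxunicode is always 0x10FFFF, so this sketch returns UTF32 there.

```python
import sys

UTF16 = "UTF16"
UTF32 = "UTF32"


def guess_default_encoding():
    """Hypothetical stand-in for Encoding.get_default()."""
    # Narrow builds store strings as 16-bit code units, so UTF16 offsets
    # line up with native string indices.
    if sys.maxunicode == 0xFFFF:
        return UTF16
    # Wide builds (and Python 3.3+) index strings by code point.
    return UTF32


print(guess_default_encoding())
```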