Google Gemini (Nanobanana 2)

Overview

This skill provides instructions for generating and editing images with Nanobanana 2 via Google's Gemini models. The gemini-2.5-flash-image model provides native 4K resolution support, precise text rendering, region-based editing, and fast generation times. The base URL is https://generativelanguage.googleapis.com/v1beta. Requests authenticate with an API key from the gemini connection, sent in the x-goog-api-key header (shown as PLACEHOLDER_TOKEN in the examples).
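
The base URL and header above can be factored into a small helper so each workflow does not repeat them. This is a minimal sketch: buildGeminiRequest is a hypothetical name, not part of any SDK, and it only assembles the arguments for fetch without sending anything.

```javascript
// Base URL and header name are taken from the overview above.
const BASE_URL = 'https://generativelanguage.googleapis.com/v1beta';

// Hypothetical helper: builds the URL and fetch options for one model call.
function buildGeminiRequest(model, apiKey, body) {
  if (!apiKey) throw new Error('Missing API key');
  return {
    url: `${BASE_URL}/models/${model}:generateContent`,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-goog-api-key': apiKey
      },
      body: JSON.stringify(body)
    }
  };
}
```

The result can be passed straight to fetch(url, options).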

Restrictions

  • Free tier allows 15 requests per minute with standard resolution
  • Paid tier provides higher rate limits, 4K resolution access, and priority processing
  • Supported image formats for editing: PNG, JPEG, WebP
  • Handle rate limit errors (429) with exponential backoff
  • Images return as base64-encoded data that must be decoded and saved

Operations

Generate Image

Generate images using POST /models/gemini-2.5-flash-image:generateContent. Pass a contents array whose entry has the role "user" and a parts array containing the text prompt. Always prefix prompts with "Generate an image: " followed by the description for best results.

Basic parameters:

  • temperature: Controls creativity (0.0-2.0 scale)
    • Low (0.4-0.6): Consistent, photorealistic results
    • Medium (0.7-0.8): Balanced creativity and consistency
    • High (0.9-1.5): Creative, artistic outputs
  • maxOutputTokens: Upper limit on the number of tokens in the response (2048 standard)
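
One way to keep these settings consistent is to name the temperature bands and build generationConfig from them. The sketch below is an assumption-laden illustration: the preset names and the choice of each band's midpoint are hypothetical, not values defined by the API.

```javascript
// Midpoints of the temperature ranges listed above; preset names are hypothetical.
const TEMPERATURE_PRESETS = new Map([
  ['consistent', 0.5], // low range 0.4-0.6
  ['balanced', 0.75],  // medium range 0.7-0.8
  ['creative', 1.2]    // high range 0.9-1.5
]);

// Returns a generationConfig object for the given preset.
function buildGenerationConfig(preset = 'balanced', maxOutputTokens = 2048) {
  if (!TEMPERATURE_PRESETS.has(preset)) {
    throw new RangeError(`Unknown temperature preset: ${preset}`);
  }
  return { temperature: TEMPERATURE_PRESETS.get(preset), maxOutputTokens };
}

console.log(buildGenerationConfig('consistent'));
// prints { temperature: 0.5, maxOutputTokens: 2048 }
```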

Aspect Ratio Control

Control image dimensions by stating the aspect ratio at the start of the prompt, before the main description.

Supported ratios:

  • 1:1 - Square format
  • 16:9 - Wide landscape
  • 9:16 - Vertical portrait
  • 4:3 - Standard format

Format: "Generate an image in [ratio] aspect ratio: [description]"
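
The format above can be applied with a small function that also rejects ratios outside the supported list. This is a sketch; buildAspectRatioPrompt is a hypothetical helper name.

```javascript
// Ratios listed as supported above.
const SUPPORTED_RATIOS = ['1:1', '16:9', '9:16', '4:3'];

// Hypothetical helper: puts the aspect ratio before the description.
function buildAspectRatioPrompt(ratio, description) {
  if (!SUPPORTED_RATIOS.includes(ratio)) {
    throw new RangeError(`Unsupported aspect ratio: ${ratio}`);
  }
  return `Generate an image in ${ratio} aspect ratio: ${description}`;
}

console.log(buildAspectRatioPrompt('16:9', 'sunset over ocean, photorealistic'));
// prints "Generate an image in 16:9 aspect ratio: sunset over ocean, photorealistic"
```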

Text Rendering

Nanobanana 2 supports precise text rendering in images. Include exact text in quotes within your prompt, specify typography style, mention text placement, and include design context.

Example prompts:

  • "business card with text 'John Doe', 'CEO', 'john@example.com', clean design, white background"
  • "modern poster with text 'Innovation Summit 2025', bold typography, minimalist design, blue gradient background"
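
Prompts like the examples above can be assembled from their parts so the exact strings are always quoted. The following is a rough sketch; buildTextPrompt and its field names are hypothetical.

```javascript
// Hypothetical helper: builds a text-rendering prompt from design context,
// the exact strings to render, and optional typography/placement hints.
function buildTextPrompt({ context, texts, typography, placement, extras = [] }) {
  if (!context || !Array.isArray(texts) || texts.length === 0) {
    throw new Error('context and at least one text string are required');
  }
  // Each string is wrapped in single quotes, as in the example prompts.
  const quoted = texts.map((t) => `'${t}'`).join(', ');
  const parts = [`${context} with text ${quoted}`, typography, placement, ...extras];
  return 'Generate an image: ' + parts.filter(Boolean).join(', ');
}

console.log(buildTextPrompt({
  context: 'modern poster',
  texts: ['Innovation Summit 2025'],
  typography: 'bold typography',
  extras: ['minimalist design', 'blue gradient background']
}));
// prints "Generate an image: modern poster with text 'Innovation Summit 2025', bold typography, minimalist design, blue gradient background"
```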

Image Editing

Edit existing images using the same endpoint with multi-modal input. The parts array must contain both the input image, as an inlineData part (base64-encoded data plus its mimeType), and the text instructions. Prefix instructions with "Edit this image: [edit instructions]".

Editing capabilities:

  • Color changes and adjustments
  • Style conversions (oil painting, watercolor, sketch)
  • Text additions and overlays
  • Region-based modifications
  • Brightness, contrast, saturation adjustments

Temperature for editing:

  • 0.5-0.6: Precise edits (color corrections, text additions)
  • 0.7-0.8: Balanced edits (style transformations, lighting changes)
  • 0.9-1.2: Creative edits (artistic transformations, dramatic changes)
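
The request body for an edit can be built in one place so the image part, the prefixed instruction, and the temperature stay together. This is only a sketch: buildEditBody is a hypothetical helper, and the default temperature of 0.7 is taken from the balanced range above.

```javascript
// Formats listed as supported for editing.
const EDITABLE_MIME_TYPES = ['image/png', 'image/jpeg', 'image/webp'];

// Hypothetical helper: builds the JSON body for an image edit request.
function buildEditBody(imageBuffer, mimeType, instruction, temperature = 0.7) {
  if (!EDITABLE_MIME_TYPES.includes(mimeType)) {
    throw new TypeError(`Unsupported image format: ${mimeType}`);
  }
  return {
    contents: [{
      role: 'user',
      parts: [
        { inlineData: { mimeType, data: imageBuffer.toString('base64') } },
        { text: `Edit this image: ${instruction}` }
      ]
    }],
    generationConfig: { temperature, maxOutputTokens: 2048 }
  };
}
```

For a precise edit such as a color correction, pass a lower temperature, for example buildEditBody(buffer, 'image/png', 'make the logo red', 0.5).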

Workflows

Basic Image Generation

Generate images with structured prompts for best results:

// Node.js 18+ ES module (top-level await and global fetch)
import fs from 'node:fs';

const response = await fetch(
  'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': 'PLACEHOLDER_TOKEN'
    },
    body: JSON.stringify({
      contents: [{
        role: 'user',
        parts: [{
          text: 'Generate an image: mountain landscape, photorealistic, golden hour lighting, wide angle, 4K resolution'
        }]
      }],
      generationConfig: {
        temperature: 0.7,
        maxOutputTokens: 2048
      }
    })
  }
);

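if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const data = await response.json();
// The response can include text parts alongside the image, so locate the image part
const imagePart = data.candidates[0].content.parts.find((part) => part.inlineData);
if (!imagePart) {
  throw new Error('No image data in response');
}
const imageData = imagePart.inlineData.data;
const mimeType = imagePart.inlineData.mimeType;

// Decode and save the image
const imageBuffer = Buffer.from(imageData, 'base64');
fs.writeFileSync('output.png', imageBuffer);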

Critical steps:

  1. Structure prompt as "Generate an image: [subject], [style], [lighting], [composition], [quality]"
  2. Set temperature based on desired creativity level (0.7 typical for balanced results)
  3. Extract image data from the inlineData part of candidates[0].content.parts (it is not always the first part)
  4. Decode base64 data and save with proper file extension
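
The prompt structure from step 1 can be produced by joining named fields in a fixed order. A minimal sketch, assuming a hypothetical helper named buildGenerationPrompt:

```javascript
// Hypothetical helper: joins the prompt fields in the order
// subject, style, lighting, composition, quality, skipping empty ones.
function buildGenerationPrompt({ subject, style, lighting, composition, quality }) {
  if (!subject) throw new Error('A subject is required');
  const fields = [subject, style, lighting, composition, quality];
  return 'Generate an image: ' + fields.filter(Boolean).join(', ');
}

const prompt = buildGenerationPrompt({
  subject: 'mountain landscape',
  style: 'photorealistic',
  lighting: 'golden hour lighting',
  composition: 'wide angle',
  quality: '4K resolution'
});
console.log(prompt);
// prints "Generate an image: mountain landscape, photorealistic, golden hour lighting, wide angle, 4K resolution"
```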

Generate Image with Aspect Ratio

Control image dimensions by specifying aspect ratio:

const response = await fetch(
  'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': 'PLACEHOLDER_TOKEN'
    },
    body: JSON.stringify({
      contents: [{
        role: 'user',
        parts: [{
          text: 'Generate an image in 16:9 aspect ratio: sunset over ocean, photorealistic, golden hour lighting, wide angle, 4K resolution'
        }]
      }],
      generationConfig: {
        temperature: 0.7,
        maxOutputTokens: 2048
      }
    })
  }
);

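const data = await response.json();
// The response can include text parts alongside the image, so locate the image part
const imagePart = data.candidates[0].content.parts.find((part) => part.inlineData);
if (!imagePart) {
  throw new Error('No image data in response');
}
const imageData = imagePart.inlineData.data;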

Critical steps:

  1. Place aspect ratio specification at the beginning: "Generate an image in [ratio] aspect ratio: [description]"
  2. Supported ratios: 1:1 (square), 16:9 (wide), 9:16 (vertical), 4:3 (standard)
  3. Maintain structured prompt format after aspect ratio specification

Generate Image with Text

Create images with precise text rendering:

const response = await fetch(
  'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': 'PLACEHOLDER_TOKEN'
    },
    body: JSON.stringify({
      contents: [{
        role: 'user',
        parts: [{
          text: 'Generate an image: modern poster with text "Innovation Summit 2025", bold typography, minimalist design, blue gradient background, 4K resolution'
        }]
      }],
      generationConfig: {
        temperature: 0.6,
        maxOutputTokens: 2048
      }
    })
  }
);

Critical steps:

  1. Put exact text in quotes within the prompt
  2. Specify typography style (bold, modern, elegant, etc.)
  3. Mention text placement if needed (top, center, top-right corner, etc.)
  4. Include design context (poster, business card, banner, etc.)
  5. Use slightly lower temperature (0.6) for more consistent text rendering

Edit Existing Image

Modify existing images with region-based or general edits:

// Node.js 18+ ES module (top-level await and global fetch)
import fs from 'node:fs';

// Read and encode the input image
const imageBuffer = fs.readFileSync('input.jpg');
const base64Image = imageBuffer.toString('base64');

const response = await fetch(
  'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': 'PLACEHOLDER_TOKEN'
    },
    body: JSON.stringify({
      contents: [{
        role: 'user',
        parts: [
          {
            inlineData: {
              mimeType: 'image/jpeg',
              data: base64Image
            }
          },
          {
            text: 'Edit this image: change the sky area to sunset colors with pink and orange hues'
          }
        ]
      }],
      generationConfig: {
        temperature: 0.7,
        maxOutputTokens: 2048
      }
    })
  }
);

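const data = await response.json();
// The response can include text parts alongside the image, so locate the image part
const editedPart = data.candidates[0].content.parts.find((part) => part.inlineData);
if (!editedPart) {
  throw new Error('No image data in response');
}
const editedImageData = editedPart.inlineData.data;
const editedBuffer = Buffer.from(editedImageData, 'base64');
fs.writeFileSync('output_edited.png', editedBuffer);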

Critical steps:

  1. Read input image file and convert to base64
  2. Include both image data and edit instructions in parts array
  3. Prefix edit instructions with "Edit this image: [instructions]"
  4. Specify regions clearly (sky, background, foreground, left side, top corner)
  5. Use temperature 0.7-0.8 for balanced edits
  6. Supported formats: PNG, JPEG, WebP
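
The mimeType sent with the input image should match the file actually read. One simple approach, sketched below, derives it from the file extension; mimeTypeForFile is a hypothetical helper, and it trusts the extension without inspecting the file contents.

```javascript
// Extension-to-MIME mapping for the three supported editing formats.
const MIME_BY_EXTENSION = new Map([
  ['png', 'image/png'],
  ['jpg', 'image/jpeg'],
  ['jpeg', 'image/jpeg'],
  ['webp', 'image/webp']
]);

// Hypothetical helper: returns the mimeType for an input file path,
// or throws if the extension is not a supported editing format.
function mimeTypeForFile(filePath) {
  const dot = filePath.lastIndexOf('.');
  const extension = dot === -1 ? '' : filePath.slice(dot + 1).toLowerCase();
  const mimeType = MIME_BY_EXTENSION.get(extension);
  if (!mimeType) throw new TypeError(`Unsupported image format: ${filePath}`);
  return mimeType;
}

console.log(mimeTypeForFile('input.jpg'));
// prints "image/jpeg"
```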

Style Transformation

Apply artistic styles or effects to existing images:

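// Node.js 18+ ES module (top-level await and global fetch)
import fs from 'node:fs';

// Read and encode the input image
const imageBuffer = fs.readFileSync('photo.jpg');
const base64Image = imageBuffer.toString('base64');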

const response = await fetch(
  'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-goog-api-key': 'PLACEHOLDER_TOKEN'
    },
    body: JSON.stringify({
      contents: [{
        role: 'user',
        parts: [
          {
            inlineData: {
              mimeType: 'image/jpeg',
              data: base64Image
            }
          },
          {
            text: 'Edit this image: convert to oil painting style with visible brushstrokes'
          }
        ]
      }],
      generationConfig: {
        temperature: 0.8,
        maxOutputTokens: 2048
      }
    })
  }
);

Common style transformations:

  • Artistic styles: "oil painting", "watercolor", "sketch", "comic book", "anime"
  • Photographic effects: "vintage", "HDR", "long exposure", "tilt-shift", "black and white"
  • Filters: "sepia", "high contrast", "soft focus", "vignette"

Critical steps:

  1. Specify style transformation clearly in edit instruction
  2. Include style details (visible brushstrokes, bold outlines, etc.)
  3. Use temperature 0.8-1.0 for creative style transformations
  4. Iterate if needed - make multiple edits for best results

Prompt Patterns for Best Results

Structure prompts using these proven patterns:

Photorealistic Pattern:

text: 'Generate an image: portrait of a person, photorealistic, natural lighting, sharp focus, 8K resolution, professional photography'

Artistic Style Pattern:

text: 'Generate an image: cityscape at night, oil painting style, impressionist, vibrant colors, textured brushstrokes'

Text Integration Pattern:

text: 'Generate an image: business card with text "Jane Smith", "Designer", "jane@example.com", elegant typography, minimalist design, cream background'

Composition Control:

  • Layout: center composition, rule of thirds, symmetrical
  • Distance: close-up, medium shot, wide angle
  • Orientation: portrait, landscape
  • Angle: top-down, bird's eye view, low angle

Mood and Atmosphere:

  • Emotional tone: warm, cool, dramatic, serene, energetic
  • Brightness: dark, bright, moody, cheerful
  • Atmosphere: mysterious, inviting, tense, peaceful

Color Control:

  • Named colors: red, blue, emerald, crimson
  • Palette: pastel colors, vibrant hues, muted tones
  • Relationships: complementary colors, analogous palette
  • Gradients: blue to purple gradient, sunset colors

Lighting:

  • Type: natural light, studio lighting, golden hour, blue hour
  • Quality: soft light, hard shadows, dramatic lighting, flat lighting
  • Direction: backlit, side-lit, front-lit, rim lighting
  • Source: ambient light, directional, diffused, spotlight

Error Handling

Error codes:

  • 400 - Invalid prompt or parameters
  • 401 - Invalid API key
  • 429 - Rate limit exceeded (implement exponential backoff)
  • 500 - Generation failure (try different prompt/parameters)
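
Exponential backoff for 429 responses can be sketched as follows. The source does not specify delay values, so the 1-second base, the 32-second cap, and the limit of 5 retries are assumptions, and the helper names are hypothetical.

```javascript
// Delay before retry number `attempt` (0-based), in milliseconds.
// Doubles each time and is capped at maxMs.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 32000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Calls doRequest until it returns a status other than 429 or the retries
// run out. doRequest is any async function returning a fetch-style
// response object with a numeric `status` field.
async function withBackoff(doRequest, maxRetries = 5, sleep = defaultSleep) {
  for (let attempt = 0; ; attempt++) {
    const response = await doRequest();
    if (response.status !== 429 || attempt >= maxRetries) return response;
    await sleep(backoffDelayMs(attempt));
  }
}
```

A call such as withBackoff(() => fetch(url, options)) returns the first non-429 response, or the last 429 once the retries are exhausted, so the caller still needs to check the status.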

Best practices:

  1. Be specific and detailed in prompts
  2. Specify resolution requirements explicitly
  3. Use temperature 0.6-0.9 for balanced results
  4. Handle base64 data properly when decoding
  5. Implement retry logic for rate limits and failures
  6. Save images with proper file extensions matching MIME type
  7. For editing, provide clear and specific instructions
  8. Iterate edits for best results rather than single complex edits
  9. Place aspect ratio specifications at the beginning of prompts
  10. Use structured prompt format: Subject + Style + Details
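
Practices 4 and 6 can be combined in one step that decodes the returned data and picks a file extension from the returned MIME type. This is a sketch under the assumption that responses use one of the three MIME types below; decodeInlineImage is a hypothetical helper.

```javascript
// MIME types this sketch knows how to name; anything else is rejected.
const EXTENSION_BY_MIME = new Map([
  ['image/png', 'png'],
  ['image/jpeg', 'jpg'],
  ['image/webp', 'webp']
]);

// Hypothetical helper: takes an inlineData object ({ mimeType, data })
// and returns the decoded bytes plus a file name with a matching extension.
function decodeInlineImage(inlineData, baseName = 'output') {
  const extension = EXTENSION_BY_MIME.get(inlineData.mimeType);
  if (!extension) throw new TypeError(`Unexpected MIME type: ${inlineData.mimeType}`);
  return {
    fileName: `${baseName}.${extension}`,
    buffer: Buffer.from(inlineData.data, 'base64')
  };
}
```

The result can be written with fs.writeFileSync(fileName, buffer).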

Hosting Generated Images

To get a public URL for a generated image, use the Image Hosting skill after generation:

# After generating image to session/
- run: codeUpload Image to GitHub Pages
  args:
    - "session/generated-image.png"
    - "optional-custom-name"
    - "uiHosted Images Index"

This uploads the image to GitHub Pages and returns a permanent URL like:
https://username.github.io/media-assets/2026/01/14/custom-name.png

Use this when the user needs to share, embed, or reference the generated image by URL.