Google Gemini 1.5 Pro personal test: powerful and fragile at the same time

Author: neo yang  Time: 2024/03/17  Read: 8368

In my testing of the newly upgraded multimodal AI model Gemini 1.5 Pro, it accepted a broader range of input types (text, pictures, videos, files, and folders), but its reasoning ability did not appear to have improved significantly, especially when it came to distinguishing right from wrong. Processing video, file, and folder inputs also took a long time, and there are limits on how much data it can handle in one prompt.

Overview

Some time ago, I joined the waitlist for Gemini 1.5 Pro and then forgot about it. Today I logged into Google AI Studio and found that I now have access to Gemini 1.5 Pro, so I tested it. Later, I plan to switch from Gemini 1.0 Pro to Gemini 1.5 Pro.

Gemini 1.5 Pro accepts text, pictures, videos, files, and folders as prompt input.

Input text

Nothing special to report here.

Input picture + text

After I input the picture, Gemini 1.5 Pro took more than 30 seconds to return a result.

I deliberately told it that it was wrong, and it admitted the mistake. It seems that Gemini is not very good at distinguishing right from wrong.

Input video + text

With a video as input, Gemini 1.5 Pro took more than 200 seconds to return a result.

Input file + text

With a file as input, Gemini 1.5 Pro also took more than 200 seconds to return a result.
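
For anyone who wants to measure response times like these outside the AI Studio page, here is a minimal Python sketch of timing a single request. It only shows the measurement itself. The function send_prompt is a hypothetical placeholder for whatever client call you actually use, and fake_send_prompt is a stand-in I made up so the example runs without network access; neither is part of any Gemini API.

```python
import time
from typing import Callable, Tuple


def timed_call(send_prompt: Callable[[str], str], prompt: str) -> Tuple[str, float]:
    """Call send_prompt(prompt) and return (response, elapsed_seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    response = send_prompt(prompt)
    elapsed = time.perf_counter() - start
    return response, elapsed


def fake_send_prompt(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; it just sleeps briefly."""
    time.sleep(0.05)
    return f"echo: {prompt}"


if __name__ == "__main__":
    reply, seconds = timed_call(fake_send_prompt, "Summarize this file.")
    print(f"{seconds:.2f}s -> {reply}")
```

A single measurement like this includes upload and network time as well as model time, so it is only comparable to the figures above in a rough sense.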

Input folder + text

The folder I uploaded contained too much content. Combined with the earlier content in the same prompt, the prompt tokens exceeded the limit, so no result was returned.
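
A rough way to catch this before uploading is to estimate the folder's size in tokens and add the tokens already used by the earlier content. The Python sketch below does that under loud assumptions: the 4-characters-per-token ratio is only a common rule of thumb for English text, the function names are my own, and the token limit is a parameter you must supply yourself because I did not record the exact limit in this test. An exact count would have to come from the model's own tokenizer.

```python
import os
import tempfile

CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not the real tokenizer


def estimate_folder_tokens(folder: str) -> int:
    """Estimate the tokens in all readable text files under folder."""
    total_chars = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as f:
                    total_chars += len(f.read())
            except OSError:
                continue  # skip files that cannot be opened
    return total_chars // CHARS_PER_TOKEN


def fits_in_context(folder: str, history_tokens: int, token_limit: int) -> bool:
    """True if the folder plus earlier prompt content stays within token_limit."""
    return estimate_folder_tokens(folder) + history_tokens <= token_limit


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo_dir:
        with open(os.path.join(demo_dir, "notes.txt"), "w", encoding="utf-8") as f:
            f.write("a" * 400)  # 400 characters, about 100 estimated tokens
        print(estimate_folder_tokens(demo_dir))
        print(fits_in_context(demo_dir, history_tokens=50, token_limit=120))
```

The estimate ignores pictures and videos entirely, so it only helps for folders that are mostly text or code.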

Summary

Gemini 1.5 Pro is a large multimodal model, and its most obvious feature is that it accepts more types of input than 1.0: text, pictures, videos, files, and folders.

However, its reasoning ability does not seem to have improved significantly. At least in my test, it was still unable to distinguish right from wrong.

Tags: AIGC, AI
