[SnowNLP] Chinese natural language processing with Python 3

$ pip3 install snownlp

tokenization

from snownlp import SnowNLP

s = SnowNLP(u'今天是周六。')
print(s.words)

$ python3 snow.py
['今天', '是', '周六', '。']

Part-of-speech tagging labels each word as a noun, adverb, verb, adjective, and so on.

print(list(s.tags))

$ python3 snow.py
[('今天', 't'), ('是', 'v'), ('周六', 't'), ('。', 'w')]
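
As a small sketch of how the tag list can be used, the snippet below drops punctuation by tag. It hard-codes the output shown above instead of calling SnowNLP, and it assumes that 'w' marks punctuation, 't' a time word, and 'v' a verb, which is my reading of the tag set rather than something confirmed here.

```python
# Tagged tokens copied from the output above (not recomputed here).
tags = [("今天", "t"), ("是", "v"), ("周六", "t"), ("。", "w")]

# Assumption: the tag "w" marks punctuation.
PUNCTUATION_TAG = "w"


def content_words(tagged):
    """Return the tokens whose tag is not the punctuation tag."""
    return [word for word, tag in tagged if tag != PUNCTUATION_TAG]


print(content_words(tags))
```

With the real library, `list(s.tags)` could be passed in place of the hard-coded list.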

pinyin

print(s.pinyin)

$ python3 snow.py
['jin', 'tian', 'shi', 'zhou', 'liu', '。']

sentences

s = SnowNLP(u'在茂密的大森林里,一只饥饿的老虎逮住了一只狐狸。老虎张开大嘴就要把狐狸吃掉。"慢着"!狐狸虽然很害怕但还是装出一副很神气的样子说,"你知道我是谁吗?我可是玉皇大帝派来管理百兽的兽王,你要是吃了我,玉皇大帝是决不会放过你的"。')
print(s.sentences)

['在茂密的大森林里', '一只饥饿的老虎逮住了一只狐狸', '老虎张开大嘴就要把狐狸吃掉', '"慢着"', '狐狸虽然很害怕但还是装出一副很神气的样子说', '"你知道我是谁吗', '我可是玉皇大帝派来管理百兽的兽王', '你要是吃了我', '玉皇大帝是决不会放过你的"']
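
The output above shows that the text is split at commas as well as at sentence-final punctuation. The following is only a rough approximation of that behaviour using a regular expression. The delimiter set is my assumption, not SnowNLP's actual implementation.

```python
import re

# Assumption: these delimiters approximate the splitting seen above.
DELIMITERS = r"[,。?!;,?!;]"


def split_sentences(text):
    """Split text at the delimiters and drop empty fragments."""
    return [part.strip() for part in re.split(DELIMITERS, text) if part.strip()]


print(split_sentences("今天是周六。明天是周日,后天呢?"))
```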

keyword

print(s.keywords(5))

$ python3 snow.py
['狐狸', '大', '老虎', '大帝', '皇']

summary

print(s.summary(3))

['老虎张开大嘴就要把狐狸吃掉', '我可是玉皇大帝派来管理百兽的兽王', '玉皇大帝是决不会放过你的"']

sentiment analysis

text = SnowNLP(u'这个产品很好用,这个产品不好用,这个产品是垃圾,这个也太贵了吧,超级垃圾,是个垃圾中的垃圾')
sent = text.sentences
for sen in sent:
	s = SnowNLP(sen)
	print(s.sentiments)

$ python3 snow.py
0.7853504415636449
0.5098208142944668
0.13082804652201174
0.5
0.0954842128485538
0.04125325276132508

The score ranges from 0 to 1: the closer to 1, the more positive, and the closer to 0, the more negative.
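
To turn these scores into labels, something like the sketch below could be used. The two thresholds (0.6 and 0.4) are hypothetical values picked for illustration, not anything defined by SnowNLP, and the scores are copied from the run above.

```python
def label(score, positive_at=0.6, negative_at=0.4):
    """Map a 0..1 sentiment score to a coarse label.

    The two thresholds are hypothetical defaults, not SnowNLP values.
    """
    if score >= positive_at:
        return "positive"
    if score <= negative_at:
        return "negative"
    return "neutral"


# Scores copied from the run above.
scores = [0.7853504415636449, 0.5098208142944668, 0.13082804652201174,
          0.5, 0.0954842128485538, 0.04125325276132508]
for s in scores:
    print(f"{s:.3f} -> {label(s)}")
```

A neutral band is useful here because two of the fragments score at or near 0.5.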

[NLTK] customize sentiment analysis

import nltk

unwanted = nltk.corpus.stopwords.words("english")
unwanted.extend([w.lower() for w in nltk.corpus.names.words()])

def skip_unwanted(pos_tuple):
	word, tag = pos_tuple
	if not word.isalpha() or word in unwanted:
		return False
	if tag.startswith("NN"):
		return False
	return True

positive_words = [word for word, tag in filter(
	skip_unwanted,
	nltk.pos_tag(nltk.corpus.movie_reviews.words(categories=["pos"]))
)]
negative_words = [word for word, tag in filter(
	skip_unwanted,
	nltk.pos_tag(nltk.corpus.movie_reviews.words(categories=["neg"]))
)]

positive_fd = nltk.FreqDist(positive_words)
negative_fd = nltk.FreqDist(negative_words)

common_set = set(positive_fd).intersection(negative_fd)

for word in common_set:
	del positive_fd[word]
	del negative_fd[word]

top_100_positive = {word for word, count in positive_fd.most_common(100)}
top_100_negative = {word for word, count in negative_fd.most_common(100)}
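
The step that removes words shared by both distributions is easy to miss, so here is a minimal sketch of it on made-up data. collections.Counter stands in for nltk.FreqDist (which subclasses Counter), and the word lists are invented for illustration.

```python
from collections import Counter

# Toy word lists standing in for the movie_reviews data (made up).
positive_fd = Counter(["great", "great", "fun", "movie", "plot"])
negative_fd = Counter(["dull", "dull", "movie", "plot", "plot"])

# Words present in both distributions carry little signal, so drop them.
common_set = set(positive_fd).intersection(negative_fd)
for word in common_set:
    del positive_fd[word]
    del negative_fd[word]

top_positive = {word for word, count in positive_fd.most_common(100)}
top_negative = {word for word, count in negative_fd.most_common(100)}
print(top_positive, top_negative)
```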

unwanted = nltk.corpus.stopwords.words("english")
unwanted.extend([w.lower() for w in nltk.corpus.names.words()])

positive_bigram_finder = nltk.collocations.BigramCollocationFinder.from_words([
	w for w in nltk.corpus.movie_reviews.words(categories=["pos"])
	if w.isalpha() and w not in unwanted
])

negative_bigram_finder = nltk.collocations.BigramCollocationFinder.from_words([
	w for w in nltk.corpus.movie_reviews.words(categories=["neg"])
	if w.isalpha() and w not in unwanted
])

[NLTK] sentiment analysis

NLTK has a built-in pretrained sentiment analyzer, VADER (Valence Aware Dictionary and sEntiment Reasoner).

import nltk
from pprint import pprint
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
pprint(sia.polarity_scores("Wow, NLTK is really powerful!"))

$ python3 app.py
{'compound': 0.8012, 'neg': 0.0, 'neu': 0.295, 'pos': 0.705}

The compound score is a normalized aggregate of the word scores (not a plain average) and ranges from -1 to 1.
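
To see why compound stays between -1 and 1, here is a sketch of the squashing step. As far as I know it mirrors the normalization in VADER's reference implementation, but treat both the formula and alpha=15 as assumptions. The input totals are made up.

```python
import math


def normalize(score, alpha=15):
    """Squash an unbounded valence sum into the range -1..1.

    Assumption: this mirrors the normalization in VADER's reference
    implementation, with alpha=15 as the default.
    """
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


for total in (-8.0, -1.0, 0.0, 1.0, 8.0):
    print(f"{total:+.1f} -> {normalize(total):+.4f}")
```

Large sums approach but never exceed the bounds, which is why even very strong sentences stay inside the range.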

twitter corpus

from random import shuffle

tweets = [t.replace("://", "//") for t in nltk.corpus.twitter_samples.strings()]

def is_positive(tweet: str) -> bool:
    """True if tweet has positive compound sentiment, False otherwise."""
    return sia.polarity_scores(tweet)["compound"] > 0

shuffle(tweets)
for tweet in tweets[:10]:
    print(">", is_positive(tweet), tweet)

$ python3 app.py
> False Most Tory voters not concerned which benefits Tories will cut. Benefits don’t figure in the lives if most Tory voters. #Labour #NHS #carers
> False .@uberuk you cancelled my ice cream uber order. Everyone else in the office got it but me. 🙁
> False oh no i’m too early 🙁
> False I don’t know what I’m doing for #BlockJam at all since my schedule’s just whacked right now 🙁
> False What should i do .

BAD VS PARTY AGAIN :(((((((
> True @Shadypenguinn take care! 🙂
> True Thanks to amazing 4000 Followers on Instagram
If you´re not among them yet,
feel free to connect :-)… http//t.co/ILy03AtJ83
> False RT @mac123_m: Ed Miliband has spelt it out again. No deals with the SNP.
There’s a choice:
Vote SNP get Tories
Vote LAB and get LAB http//…
> True @gus33000 but Disk Management is same since NT4 iirc 😀
Also, what UX refinements were in zdps?
> False RT @KevinJPringle: One of many bizarre things about @Ed_Miliband’s anti-SNP stance is he doesn’t reject deal with LibDems, who imposed aust…

from random import shuffle
from statistics import mean

positive_review_ids = nltk.corpus.movie_reviews.fileids(categories=["pos"])
negative_review_ids = nltk.corpus.movie_reviews.fileids(categories=["neg"])
all_review_ids = positive_review_ids + negative_review_ids

def is_positive(review_id: str) -> bool:
	"""True if the average of all sentence compound scores is positive. """
	text = nltk.corpus.movie_reviews.raw(review_id)
	scores = [
		sia.polarity_scores(sentence)["compound"]
		for sentence in nltk.sent_tokenize(text)
	]
	return mean(scores) > 0

shuffle(all_review_ids)
correct = 0
for review_id in all_review_ids:
	if is_positive(review_id):
		if review_id in positive_review_ids:
			correct += 1
	else:
		if review_id in negative_review_ids:
			correct += 1

print(f"{correct / len(all_review_ids):.2%} correct")

It is nice that ready-made corpora are already available.
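
The counting loop above boils down to comparing predictions with known labels. A minimal stand-alone sketch of that accuracy calculation, on made-up predictions rather than the movie review data:

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


# Made-up example: True means "positive".
predicted = [True, True, False, True]
actual = [True, False, False, True]
print(f"{accuracy(predicted, actual):.2%} correct")
```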

[NLTK] Word frequency

$ pip3 install nltk

### download
NLTK can download the following resources:
– names, stopwords, state_union, twitter_samples, movie_reviews, averaged_perceptron_tagger, vader_lexicon, punkt

import nltk

nltk.download([
	"names",
	"stopwords",
	"state_union",
	"twitter_samples",
	"movie_reviews",
	"averaged_perceptron_tagger",
	"vader_lexicon",
	"punkt",
])

State of the Union corpus

words = [w for w in nltk.corpus.state_union.words() if w.isalpha()]

Removing stop words

words = [w for w in nltk.corpus.state_union.words() if w.isalpha()]
stopwords = nltk.corpus.stopwords.words("english")
words = [w for w in words if w.lower() not in stopwords]

word_tokenize()

from pprint import pprint

text = """
For some quick analysis, creating a corpus could be overkill.
If all you need is a word list,
there are simpler ways to achieve that goal.
"""
pprint(nltk.word_tokenize(text), width=79, compact=True)

most common

fd = nltk.FreqDist(words)
pprint(fd.most_common(3))

$ python3 app.py
[('must', 1568), ('people', 1291), ('world', 1128)]

specific word

fd = nltk.FreqDist(words)
pprint(fd["America"])

$ python3 app.py
1076
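
One thing to keep in mind is that FreqDist keys are plain strings, so the lookup is case-sensitive. The sketch below uses collections.Counter as a stand-in for nltk.FreqDist and a made-up token list to show the difference between an exact-case count and a case-folded one.

```python
from collections import Counter

# Made-up tokens; Counter stands in for nltk.FreqDist here.
words = ["America", "america", "AMERICA", "world", "America"]

fd = Counter(words)
lower_fd = Counter(w.lower() for w in words)

print(fd["America"])        # exact-case matches only
print(lower_fd["america"])  # all spellings folded together
```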

### concordance
Shows where a word occurs, together with its surrounding context.

text = nltk.Text(nltk.corpus.state_union.words())
text.concordance("america", lines=5)

$ python3 app.py
Displaying 5 of 1079 matches:
would want us to do . That is what America will do . So much blood has already
ay , the entire world is looking to America for enlightened leadership to peace
beyond any shadow of a doubt , that America will continue the fight for freedom
to make complete victory certain , America will never become a party to any pl
nly in law and in justice . Here in America , we have labored long and hard to

text = nltk.Text(nltk.corpus.state_union.words())
concordance_list = text.concordance_list("america", lines=2)
for entry in concordance_list:
	print(entry.line)

$ python3 app.py
would want us to do . That is what America will do . So much blood has already
ay , the entire world is looking to America for enlightened leadership to peace
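
For intuition, a concordance is a keyword-in-context listing. The sketch below is my own simplified version, not NLTK's implementation: it matches case-insensitively like the runs above, but measures context in tokens rather than characters. The sample sentence is stitched together for illustration.

```python
def concordance(tokens, target, width=3):
    """Return (left, match, right) context tuples for each occurrence of target.

    Matching is case-insensitive; width is the number of tokens of context
    on each side (a simplification: NLTK limits context by characters).
    """
    target = target.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


tokens = "That is what America will do . Here in America , we labor".split()
for left, match, right in concordance(tokens, "america"):
    print(f"{left:>20} {match} {right}")
```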

other frequency distribution

words: list[str] = nltk.word_tokenize(
"""Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.""")
text = nltk.Text(words)
fd = text.vocab()
fd.tabulate(3)

collocation

words = [w for w in nltk.corpus.state_union.words() if w.isalpha()]
finder = nltk.collocations.TrigramCollocationFinder.from_words(words)

pprint(finder.ngram_fd.most_common(2))
finder.ngram_fd.tabulate(2)

$ python3 app.py
[(('the', 'United', 'States'), 294), (('the', 'American', 'people'), 185)]
('the', 'United', 'States') ('the', 'American', 'people')
294 185
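
What the finder's ngram_fd holds is essentially a count of every run of three consecutive words. A stdlib-only sketch of that idea, using a made-up sentence and collections.Counter instead of NLTK:

```python
from collections import Counter


def trigram_counts(tokens):
    """Count every run of three consecutive tokens."""
    return Counter(zip(tokens, tokens[1:], tokens[2:]))


tokens = "the United States and the United States of America".split()
counts = trigram_counts(tokens)
print(counts.most_common(1))
```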

It is clear that NLTK is powerful.

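How to write a tags and JS that open and close a separate browser window

For when you want an application to show part of its functionality in a separate window and also provide a link that closes that window.

### Link that opens a separate window
– Write window.open directly in the a tag's onclick. Here the window is 500 wide and 400 high.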

<a href="./translate.html" onclick="window.open('./translate.html', '', 'width=500,height=400'); return false;">Translate</a>

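### Closing the window
– Close it with window.close(). Browsers only let a script close a window that a script opened, such as the one created with window.open above.

<a class="nav-link btn-magnify" href="" id="close">Close window</a>

// omitted

<script>
    let close = document.getElementById('close');
    close.addEventListener('click', (event)=>{
      event.preventDefault(); // keep the empty href from reloading the page
      window.close();
    });
  </script>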

This works quite nicely.

Changing the background color of a table tr with Bootstrap

Add a table-* class such as table-secondary to the class attribute.

<table class="table">
                  <tr>
                    <th>Inquiry date</th><th>Company</th><th>Name</th><th>Category</th><th>Inquiry</th><th>Details</th>
                  </tr>
                  @if($quotes)
                    @foreach ($quotes as $quote)
                      <tr class="{{ $quote->status == 1 ? 'table-secondary' : ''}}">
                        <td>{{ \Carbon\Carbon::parse($quote->created_at)->isoFormat("YYYY年MM月DD日(ddd) H:mm") }}</td> {{-- omitted --}}
                      </tr>
                    @endforeach
                  @endif
                </table>

[Screenshots omitted: the table before and after applying table-secondary]

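I wanted to emit the class="" attribute itself conditionally with a ternary operator, but could not get that to work, so only the attribute's value is conditional.
Still, Bootstrap is wonderfully versatile.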

[Laravel8.46.0] When the WebSocket connection works but no messages come back

bootstrap.js


import Echo from 'laravel-echo';

window.Pusher = require('pusher-js');

window.Echo = new Echo({
    broadcaster: 'pusher',
    key: process.env.MIX_PUSHER_APP_KEY,
    cluster: process.env.MIX_PUSHER_APP_CLUSTER,
    
    wsHost: window.location.hostname,
    wsPort: 8000,
    forceTLS: true
    // forceTLS: false,
});

view

<script src="/js/app.js"></script>
    <script>
    	new Vue({
    		el: '#chat',
    		data: {
    			message: '',
                messages: []
    		},
    		methods: {
                getMessages(){

                    const url = '/ajax/chat';
                    axios.get(url)
                        .then((response)=>{

                            this.messages = response.data;
                        })
                },
    			send(){

    				const url = '/ajax/chat';
    				const params = { message: this.message};
    				axios.post(url, params)
    					.then((response) => {
    						this.message = '';
    					});
    			}
    		},
            mounted(){
                this.getMessages();

                Echo.channel('chat')
                    .listen('MessageCreated', (e) =>{
                        this.getMessages();
                    });
            }
    	});
    </script>

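Sending a message produced no reaction, and I could not tell what was wrong. If the browser console shows no WebSocket "connection failed" error, the socket connection itself is working.

Rereading the source, I found that the event was broadcasting on a PrivateChannel while the view listens with Echo.channel('chat'), which is a public channel. After dropping "private" and returning a plain Channel, the messages started arriving.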
Event/MessageCreated.php

    public function broadcastOn()
    {
        return new Channel('chat');
    }
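
The mismatch can be modeled in a few lines. To my understanding, a private channel goes over the wire with a "private-" prefix, so an event broadcast on PrivateChannel('chat') never reaches a client subscribed with Echo.channel('chat'). The Python sketch below only models the name matching; the function names are hypothetical and it does not talk to Pusher.

```python
PRIVATE_PREFIX = "private-"


def broadcast_name(channel, private=False):
    """Wire name of the channel an event is broadcast on."""
    return PRIVATE_PREFIX + channel if private else channel


def subscribed_name(channel, private=False):
    """Wire name the client subscribes to (Echo.private vs Echo.channel)."""
    return PRIVATE_PREFIX + channel if private else channel


def receives(event_private, listener_private, channel="chat"):
    """True when the broadcast and the subscription use the same wire name."""
    return broadcast_name(channel, event_private) == subscribed_name(channel, listener_private)


print(receives(event_private=True, listener_private=False))   # the broken setup
print(receives(event_private=False, listener_private=False))  # after the fix
```

Switching the listener to Echo.private('chat') and authorizing the channel would be the other way to make the two sides agree.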

I agonized over this all day without being able to troubleshoot it, yet the fix took only a moment.
Now, on to building the chat front end!

[Laravel8.46.0] Implementing a chat feature with laravel-echo and pusher-js

config/app.php
L Uncomment BroadcastServiceProvider.

App\Providers\BroadcastServiceProvider::class,

Install the packages needed on the client side
$ npm install --save laravel-echo pusher-js

resources/js/bootstrap.js
L Uncomment the following

import Echo from 'laravel-echo';

window.Pusher = require('pusher-js');

window.Echo = new Echo({
    broadcaster: 'pusher',
    key: process.env.MIX_PUSHER_APP_KEY,
    cluster: process.env.MIX_PUSHER_APP_CLUSTER,
    forceTLS: true
});

.env

BROADCAST_DRIVER=pusher
// omitted
PUSHER_APP_ID=
PUSHER_APP_KEY=
PUSHER_APP_SECRET=
PUSHER_APP_CLUSTER=mt1

$ php artisan make:model Message -m

migrationfile
L Roll back and recreate the table.

    public function up()
    {
        Schema::create('messages', function (Blueprint $table) {
            $table->bigIncrements('id');
            $table->integer('user_id');
            $table->text('message');
            $table->timestamps();
        });
    }

Model
Message.php

    protected $fillable = [
        'user_id',
        'message',
    ];

    public function user(){

    	return $this->belongsTo('App\Models\User');
    }

User.php

    public function messages(){
        return $this->hasMany('App\Models\Message');
    }

$ php artisan make:controller ChatsController

route

Route::get('/post', [ChatsController::class, 'index']);
Route::get('/messages', [ChatsController::class, 'fetchMessages']);
Route::post('/messages', [ChatsController::class, 'sendMessage']);

ChatsController.php

use Illuminate\Http\Request;
use App\Models\Message;
use Illuminate\Support\Facades\Auth;
use App\Events\MessageSent;

class ChatsController extends Controller
{
    //
    public function __construct(){
    	$this->middleware('auth');
    }

    public function index(){
    	return view('chat.post');
    }

    public function fetchMessages(){
    	return Message::with('user')->get();
    }

    public function sendMessage(Request $request){
    	$user = Auth::user();

    	$message = $user->messages()->create([
    		'message' => $request->input('message')
    	]);

    	event(new MessageSent($user, $message));

    	return ['status' => 'Message Sent!'];
    }
}

$ php artisan make:event MessageSent
MessageSent.php

use App\Models\User;
use App\Models\Message;

class MessageSent implements ShouldBroadcast
{
    use Dispatchable, InteractsWithSockets, SerializesModels;
    public $user;
    public $message;

    /**
     * Create a new event instance.
     *
     * @return void
     */
    public function __construct(User $user, Message $message)
    {
        //
        $this->user = $user;
        $this->message = $message;
    }

    /**
     * Get the channels the event should broadcast on.
     *
     * @return \Illuminate\Broadcasting\Channel|array
     */
    public function broadcastOn()
    {
        return new PrivateChannel('testApp');
    }
}

routes/channels.php
L To listen on a private channel, authorize it in channels.php

Broadcast::channel('testApp', function($user){
	return Auth::check();
});

post.blade.php

<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="UTF-8">
	<meta name="viewport" content="width=device-width, initial-scale=1.0">
	<title>Document</title>
	<link href="{{ mix('css/app.css')}}" rel="stylesheet" type="text/css">
	<meta name="csrf-token" content="{{ csrf_token() }}">
</head>
<body>
	<div id="app">
		<example-component></example-component>
	</div>

	<script src="{{ mix('js/app.js')}}"></script>
</body>
</html>

$ composer require laravel/ui
$ php artisan ui vue
$ npm install && npm run dev

/resources/js/components/ExampleComponent.vue

<template>
    <div>
        <ul>
            <li v-for="(message, key) in messages" :key="key">
                <strong>{{ message.user.name }}</strong>
                {{ message.message }}
            </li>
        </ul>
        <input v-model="text" />
        <button @click="postMessage" :disabled="!textExists">Send</button>
    </div>
</template>

<script>
    export default {
        data(){
            return {
                text: "",
                messages: []
            };
        },
        computed: {
            textExists(){
                return this.text.length > 0;
            }
        },
        created() {
            this.fetchMessages();
            Echo.private("testApp").listen("MessageSent", e=>{
                this.messages.push({
                    message: e.message.message,
                    user: e.user
                });
            });
        },
        methods: {
            fetchMessages(){
                axios.get("/messages").then(response =>{
                    this.messages = response.data;
                });
            },
            postMessage(message){
                axios.post("/messages", {message: this.text}).then(response => {
                    this.text = "";
                });
            }
        }
    }
</script>

Hmm, what is going on... the component is not rendering properly.

[Laravel8.46.0] Implementing a chat feature on Laravel 8, part 1

$ php artisan --version
Laravel Framework 8.46.0
$ composer require pusher/pusher-php-server
$ php artisan make:model Message -m

migrationfile

    public function up()
    {
        Schema::create('messages', function (Blueprint $table) {
            $table->bigIncrements('id');
            $table->integer('sent_id')->index()->unsigned();
            $table->integer('recieved_id')->index()->unsigned();
            $table->text('message');
            $table->timestamps();
        });
    }

model(Message.php)

    protected $fillable = [
        'sent_id',
        'recieved_id',
        'message',
    ];

$ php artisan serve --host 192.168.33.10 --port 8000

Create about three users from the register screen.

$ php artisan make:controller HomeController

route

use App\Http\Controllers\HomeController;

Route::get('/home', [HomeController::class, 'index']);

HomeController.php

use App\Models\User;
use Illuminate\Support\Facades\Auth;

class HomeController extends Controller
{
    //
    public function __construct(){
    	$this->middleware('auth');
    }

    public function index(){
    	$user = Auth::user();
    	$users = User::where('id','<>', $user->id)->get();

    	return view('chat.user_select', compact('users'));
    }
}

layout/app.blade.php
L Add Bootstrap 5
L Add pusher.js

<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.0-beta1/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-giJF6kkoqNQ00vy+HMDP7azOuL0xtbfIcaT9wjKHr8RbDVddVHyTfAAsrekwKmP1" crossorigin="anonymous">

// omitted
            <main>
                @yield('content')
            </main>
// omitted
<script src="https://js.pusher.com/7.0.3/pusher.min.js"></script>
<script
  src="https://code.jquery.com/jquery-3.6.0.min.js"
  integrity="sha256-/xUj+3OJU5yExlq6GSYGSHk7tPXikynS7ogEvDej/m4="
  crossorigin="anonymous"></script>
@yield('script')

user_select.blade.php

@extends('layouts.app')

@section('content')
<div class="container">
	<div class="row">
		<div class="col-md-8 col-md-offset-2">
		</div>
	</div>

	<table class="table">
		<thead>
			<tr>
				<th>#</th>
				<th>Name</th>
				<th></th>
			</tr>
		</thead>
		<tbody>
		@foreach($users as $key => $user)
		<tr>
			<td>{{$loop->iteration}}</td>
			<td>{{$user->name}}</td>
			<td><a href="/chat/{{$user->id}}"><button type="button" class="btn btn-primary">Chat</button></a></td>
		</tr>
		@endforeach
		</tbody>
	</table>
</div>
@endsection

$ php artisan make:controller ChatController

route

use App\Http\Controllers\ChatController;

Route::get('/chat/{recieved_id}', [ChatController::class, 'index'])->name('chat');
Route::post('/chat/send', [ChatController::class, 'store'])->name('chatSend');

chat/chat.blade.php

@extends('layouts.app')

@section('content')
<div class="container">
	<div class="row">
		<div class="col-md-8 col-md-offset-2">
		</div>
	</div>

	<!-- Chat room -->
	<div id="room">
		@foreach($messages as $key => $message)
			@if($message->sent_id == \Illuminate\Support\Facades\Auth::id())
				<div class="send" style="text-align: right">
					<p>{{$message->message}}</p>
				</div>
			@endif

			@if($message->recieved_id == \Illuminate\Support\Facades\Auth::id())
				<div class="recieve" style="text-align: left">
					<p>{{$message->message}}</p>
				</div>
			@endif
		@endforeach
	</div>

	<form>
		<textarea name="message" style="width:100%"></textarea>
		<button type="button" id="btn_send" class="btn btn-primary">Send</button>
	</form>

	<input type="hidden" name="sent_id" value="{{$param['sent_id']}}">
	<input type="hidden" name="recieved_id" value="{{$param['recieved_id']}}">
	<input type="hidden" name="login" value="{{\Illuminate\Support\Facades\Auth::id()}}">
</div>
@endsection

@section('script')
<script>
	Pusher.logToConsole = true;

	var pusher = new Pusher('*', {
		cluster: '*',
		encrypted: true
	});

	var pusherChannel = pusher.subscribe('testApp');

	pusherChannel.bind('chat_event', function(data){

		let appendText;
		let login = $('input[name="login"]').val();

		if(data.sent_id === login){
			appendText = '<div class="send" style="text-align:right"><p>' + data.message + '</p></div> ';
		} else if(data.recieved_id === login){
			appendText = '<div class="recieve" style="text-align:left"><p>' + data.message + '</p></div> ';
		} else {
			return false;
		}

		$("#room").append(appendText);

		if(data.recieved_id === login){
			Push.create("New message",
			{
				body: data.message,
				timeout: 8000,
				onClick: function(){
					window.focus();
					this.close();
				}
			})
		}
	});

	$.ajaxSetup({
		headers : {
			'X-CSRF-TOKEN' : $('meta[name="csrf-token"]').attr('content'),
		}
	});

	$('#btn_send').on('click', function(){
		$.ajax({
			type: 'POST',
			url: '/chat/send',
			data: {
				message : $('textarea[name="message"]').val(),
                sent_id : $('input[name="sent_id"]').val(),
                recieved_id : $('input[name="recieved_id"]').val(),
			}
		}).done(function(result){
			$('textarea[name="message"]').val('');
		}).fail(function(result){

		});
	});
</script>
@endsection

$ php artisan make:event ChatMessageRecieved
ChatMessageRecieved.php

public function __construct($request)
    {
        //
        $this->request = $request;
    }

    /**
     * Get the channels the event should broadcast on.
     *
     * @return \Illuminate\Broadcasting\Channel|array
     */
    public function broadcastOn()
    {
        return new PrivateChannel('testApp');
    }

    public function broadcastWith(){
        return [
            'message' => $this->request['message'],
            'sent_id' => $this->request['sent_id'],
            'recieved_id' => $this->request['recieved_id'],
        ];
    }

    public function broadcastAs(){
        return 'chat_event';
    }

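It is not working as expected yet, but it is starting to come back to me.
With Pusher, the basic flow is: JS posts on click, the server stores the message and fires an event, and the clients then receive the new data.
But I seem to remember that the Pusher credentials belong in .env...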

Showing a Font Awesome icon in an input's value

I want to show a Font Awesome icon in a form input.

Normally the icon is specified with an i element, like this:

<a href="/cart.html"><button type="button" class="btn btn-danger"><i class="fas fa-cart-plus"></i>	Add to cart</button></a>

In a form input, add fas to the class and put the icon's code point in value as an HTML entity: "&#x" followed by "f***;". Here it is the cart icon, f217.

<form action="/cart/store" method="POST">
<input type="submit" class="btn btn-danger fas pt-2 pb-2" value="&#xf217; Add to cart">
</form>
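
To check what the entity actually is, the sketch below decodes it with Python's standard html module. It only shows that &#xf217; is a numeric character reference to a code point in Unicode's Private Use Area, which would explain why the glyph appears only when the icon font (the fas class) is applied to the input. That explanation is my inference, not something confirmed above.

```python
import html

entity = "&#xf217;"           # value used in the input above
char = html.unescape(entity)  # decode the numeric character reference

print(hex(ord(char)))
# Private Use Area of the Basic Multilingual Plane: U+E000..U+F8FF.
print(0xE000 <= ord(char) <= 0xF8FF)
```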

I understand how to use it now, but inside an input the text seems to render in a different font from Bootstrap's. Oh well, good enough.