Anyone can write Alexa Skills in PHP. But what about NLU, Natural Language Understanding? Most of the available tools are written in Python, which is the norm in machine learning environments. In this post you will see how to use RASA NLU from PHP, without integrating Python into your application.
Why not use tools from Amazon or Google?
Of course, that would make life a lot easier, but nobody really knows how long data lives in the networks of these global companies. And what about privacy? From my point of view, there is only one solution: host it yourself, building on an existing open source project. As long as your training data is not too complex, this approach does not demand much in terms of hardware.
NLU vs. NLP – a few basics
First, you should know that NLU is not NLP: Natural Language Understanding is not Natural Language Processing. If you want to know more about the differences, I recommend this blog post. In short, NLP is the complete ecosystem of human-machine interaction, while NLU is “only” the AI part that sorts unstructured data and brings it into a shape in which:
- it can be understood by machines
- an intent can be evaluated
- significant entities can be extracted
- models can be trained
Let’s go a little further into detail.
INTENT
Understanding the intent, or more simply what the other person actually means, is tricky even between humans. What intention does a person want to express? But that’s a different story.
Here we are trying to train machines to understand the intent of a sentence. Let’s take a look at the following example: “I have to stay home until Friday.”
Several intents could lie behind this sentence, which makes it hard to understand:
- The narrator has children and no nanny for the coming week.
- The narrator is sick.
- For some reason, the narrator has to work from home.
These intents have almost nothing in common except that the narrator will be away until Friday.
ENTITY
For these intents it is important to extract some useful values from the sentence. Let’s take the example from above. Which value in this sentence should we try to obtain? I would say that Friday is such a value.
Why? Depending on the intent, this information is really useful. So let’s go ahead and put it in a variable; let’s say it holds the last day of something. If we have an intent like “sick report”, this value can be used, for example, to create a Jira ticket with the subject “Sick: Maximilian Berghoff [date of last day]”.
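As a minimal sketch of that idea, the following snippet turns a weekday entity into a ticket subject. The function name, the subject layout and the date format are assumptions made for illustration; they are not taken from the example repository.

```php
<?php
declare(strict_types=1);

/**
 * Build a ticket subject from the "last day" entity of a sick report.
 * Hypothetical helper: name, subject layout and date format are assumptions.
 */
function buildSickReportSubject(string $employee, string $lastDay, DateTimeImmutable $today): string
{
    // Only accept the weekdays we expect as entity values.
    $weekdays = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday'];
    $lastDay = strtolower(trim($lastDay));
    if (!in_array($lastDay, $weekdays, true)) {
        throw new InvalidArgumentException('Unsupported weekday: ' . $lastDay);
    }

    // A bare weekday name resolves to its next occurrence,
    // or to today if today already is that weekday.
    $date = $today->modify($lastDay);

    return sprintf('Sick: %s %s', $employee, $date->format('Y-m-d'));
}

// Example call with a fixed "today" (a Monday) so the result is reproducible.
$subject = buildSickReportSubject(
    'Maximilian Berghoff',
    'friday',
    new DateTimeImmutable('2018-10-29')
);
```

Passing the current date in as a parameter keeps the helper easy to test; in an application you would pass `new DateTimeImmutable('today')`.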
MODEL
However, to get a machine to understand our intents and capture entities, we need to train it. For this blog post I used RASA NLU.
The training data looks like this:
language: "en"

pipeline:
- name: "nlp_spacy"
  model: "de"
- name: "tokenizer_spacy"
- name: "ner_crf"
- name: "intent_featurizer_spacy"
- name: "intent_classifier_sklearn"

data: |
  ## intent:report_illness_duration
  - I will stay home for [3 days](duration)
  - I can not come the next [4 days](duration)
  - I will stay in bed for the next [5 days](duration)

  ## intent:report_illness_from_to
  - I will stay home until [friday](last)
  - My doctor suggests me to stay home until [friday](last)
  - I am absent from [monday](first) to [friday](last)
  - I will be back on [friday](last)

  ## lookup:duration
  - 1 days
  - 2 days
  - a week

  ## lookup:first
  - monday
  - tuesday
  - wednesday
  - thursday
  - friday

  ## lookup:last
  - monday
  - tuesday
  - wednesday
  - thursday
  - friday
Let’s look at the data key. Here you will find the intents. I split the sick report into two intents: one to tell your colleague or boss that you are away from X to Y due to illness, and another to state the duration (number of days). Two separate intents are useful because each can have its own set of entities.
For the from-to intent, I need dates for the first and last day. At least the last day should be present; if the first day is missing, we take today as the first day.
For the duration intent, I only need the duration (and take today as the first day). In this example I have also listed, as lookup tables, which values are possible. These lists are enriched during the training process.
Although this looks like some sort of pattern or algorithm for finding intents and entities, it is not.
In the end, the machine should also recognize sentences that are not in this list. Until it does so with acceptable certainty, we need to provide more examples like the ones above.
Tools for use – RASA NLU
Other software tools can be used to run NLU, but I decided on RASA NLU. It has all the features we need for our application and is quite easy to install:
pip install rasa_nlu

# OR (to get the bleeding edge)
git clone https://github.com/RasaHQ/rasa_nlu.git
cd rasa_nlu
pip install -r requirements.txt
pip install -e .
Now you can create a configuration and start training. RASA NLU has one drawback: it is written in Python. But what if you run a PHP application? I recommend using the RASA HTTP API and integrating it into your application with simple curl requests.
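To give an idea of what such a curl request could look like in plain PHP, here is a minimal sketch. It assumes the legacy RASA NLU HTTP server is reachable at a base URL such as http://localhost:5000 and accepts a JSON body with the keys `q`, `project` and optionally `model` on its `/parse` endpoint. The function names are hypothetical and do not come from the example repository.

```php
<?php
declare(strict_types=1);

/**
 * Build the JSON body for a parse request.
 * Assumption: the server expects the keys "q", "project" and optionally "model".
 */
function buildParsePayload(string $text, string $project, ?string $model = null): string
{
    $payload = ['q' => $text, 'project' => $project];
    if ($model !== null) {
        $payload['model'] = $model;
    }

    return json_encode($payload, JSON_THROW_ON_ERROR);
}

/**
 * Send the text to the /parse endpoint and return the decoded JSON response.
 * Requires the PHP curl extension.
 */
function parseText(string $baseUrl, string $text, string $project): array
{
    $handle = curl_init(rtrim($baseUrl, '/') . '/parse');
    curl_setopt_array($handle, [
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => buildParsePayload($text, $project),
        CURLOPT_HTTPHEADER => ['Content-Type: application/json'],
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_TIMEOUT => 10,
    ]);

    $body = curl_exec($handle);
    if ($body === false) {
        throw new RuntimeException('Parse request failed: ' . curl_error($handle));
    }

    return json_decode($body, true, 512, JSON_THROW_ON_ERROR);
}
```

A call could then look like `parseText('http://localhost:5000', 'I will stay home until friday', 'illness_report')`, where host and port are assumptions about your setup. Keeping the payload builder separate from the transport makes it possible to test the request body without a running server.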
PHP Integration
For a talk at PHP Central Europe, I prepared some code showing that it is possible to integrate NLU into PHP without implementing NLU in PHP. You can look at the repository, in particular at rasa_client/lib. This code should be sufficient to meet some basic requirements and to get meaningful results back from your models. For our example (the illness report), I also added a command line application built with Symfony.
This is not strictly necessary; it is simply the fastest way for me to call the given code and get something readable.
To work with this example, start both Docker containers and open a shell in the one holding the application code:
$ cd examples/
$ docker-compose up -d
$ docker exec -it rasa-nlu-client sh
$ cd /app/src/
$ bin/console
The bin/console command should return a list that looks like this:
rasa
  rasa:nlu:parse         Parse a given text for its intents.
  rasa:nlu:remove-model  Remove a training model.
  rasa:nlu:status        return the available projects and models
  rasa:nlu:train         Train a project by a well-defined training data. For the training data, you should have a look at https://rasa.com/docs/nlu/dataformat/
This is an overview of the available commands. Now let’s get to work!
STATUS
$ bin/console rasa:nlu:status
Got following projects
Project: illness_report
=======================
currently training:0
-----------------------
Available Model
-----------------------
model_20181027-164038
model_20181027-173358
-----------------------
-----------------------
Loaded Model
-----------------------
model_20181027-173358
-----------------------
This gives an overview of the available models and of the training processes currently running. The models listed under “Loaded Model” are held in memory and therefore give you the fastest answers.
TRAIN
$ bin/console rasa:nlu:train --project=illness_report data/config_train_illness_report.yml
new model trained
=================
Created Model: model_20181029-062412
This posts a valid training data file (I used the one shown above) to a project in order to train a new model. You can also name a model with --model to train an existing model.
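Under the hood, a training call of this kind is just another HTTP request. The following sketch shows how it might be sent from PHP. It assumes that the legacy RASA NLU server exposes a `/train` endpoint that takes the project (and optionally a model name) as query parameters and the YAML training file as request body with the content type `application/x-yml`. The function names are hypothetical, and the shape of the response is deliberately left uninterpreted.

```php
<?php
declare(strict_types=1);

/**
 * Build the URL of the training endpoint.
 * Assumption: project and model are passed as query parameters.
 */
function buildTrainUrl(string $baseUrl, string $project, ?string $model = null): string
{
    $query = ['project' => $project];
    if ($model !== null) {
        $query['model'] = $model;
    }

    return rtrim($baseUrl, '/') . '/train?' . http_build_query($query);
}

/**
 * Post a YAML training file to the server and return the decoded response.
 * Training can take a while, so no request timeout is set here.
 */
function trainProject(string $baseUrl, string $project, string $trainingFile): array
{
    $yaml = file_get_contents($trainingFile);
    if ($yaml === false) {
        throw new RuntimeException('Cannot read training file: ' . $trainingFile);
    }

    $handle = curl_init(buildTrainUrl($baseUrl, $project));
    curl_setopt_array($handle, [
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => $yaml, // raw YAML, not form-encoded
        CURLOPT_HTTPHEADER => ['Content-Type: application/x-yml'],
        CURLOPT_RETURNTRANSFER => true,
    ]);

    $body = curl_exec($handle);
    if ($body === false) {
        throw new RuntimeException('Training request failed: ' . curl_error($handle));
    }

    return json_decode($body, true, 512, JSON_THROW_ON_ERROR);
}
```

With a server on http://localhost:5000 (an assumption about your setup), `trainProject('http://localhost:5000', 'illness_report', 'data/config_train_illness_report.yml')` would correspond to the console command above.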
STATUS (NEW MODEL THERE)
$ bin/console rasa:nlu:status
Got following projects
Project: illness_report
=======================
currently training:0
-----------------------
Available Model
-----------------------
model_20181027-164038
model_20181027-173358
model_20181029-062412
-----------------------
-----------------------
Loaded Model
-----------------------
model_20181027-173358
-----------------------
Because the training run did not name an existing model, a new model was created, and it now appears in the status list.
PARSE
$ bin/console rasa:nlu:parse --project=illness_report "I will be absent due to sickness until friday"
Intent: report_illness_from_to - Confidence: 0.8078944273721
============================================================
Entities found:
------ -------- ------- ----- ----------- ------------------
 Name   Value    start   end   extractor   confidence
------ -------- ------- ----- ----------- ------------------
 last   friday   20      26    ner_crf     0.93667437133644
------ -------- ------- ----- ----------- ------------------
Ranking:
------------------------- ----------------- ------------
 Pos.                      Name              Confidence
------------------------- ----------------- ------------
 report_illness_from_to    0.8078944273721
 report_illness_duration   0.1921055726279
------------------------- ----------------- ------------
Now you can start parsing text. You will receive a statistical answer: every intent comes with a calculated confidence. A value of “0.8” is pretty okay, but more training will increase the confidence. You will also get entities back if they have been defined in your training data.
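Because the answer is statistical, an application usually has to decide when a result is good enough to act on. The sketch below shows one possible way to do this. It assumes that the decoded parse response has an `intent` entry with `name` and `confidence` and an `entities` list whose items carry `entity` and `value`, mirroring the console output above. The function name and the threshold of 0.7 are arbitrary choices for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Reduce a parse response to the intent name and a name => value entity map.
 * Returns null if the intent is missing or below the confidence threshold.
 * Hypothetical helper; the response layout is an assumption.
 */
function interpretParseResult(array $result, float $minConfidence = 0.7): ?array
{
    $intent = $result['intent'] ?? null;
    if (!is_array($intent) || !isset($intent['name'])) {
        return null;
    }
    if (($intent['confidence'] ?? 0.0) < $minConfidence) {
        return null; // too uncertain to act on
    }

    $entities = [];
    foreach ($result['entities'] ?? [] as $entity) {
        // If an entity name occurs more than once, the last value wins.
        $entities[$entity['entity']] = $entity['value'];
    }

    return ['intent' => $intent['name'], 'entities' => $entities];
}

// Sample response built from the values in the console output above.
$sample = [
    'intent' => ['name' => 'report_illness_from_to', 'confidence' => 0.8078944273721],
    'entities' => [
        ['entity' => 'last', 'value' => 'friday', 'extractor' => 'ner_crf', 'confidence' => 0.93667437133644],
    ],
];

$interpreted = interpretParseResult($sample);       // accepted with the default threshold
$tooStrict = interpretParseResult($sample, 0.9);    // rejected, confidence is only about 0.81
```

Returning null for uncertain results forces the caller to handle that case explicitly, for example by asking the user to rephrase.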
Conclusion
As you can see, it is possible to add NLU to your PHP application without having to resort to Python, AWS and the like. And you do not have to rely on web services where you have no control over your data. Instead, you can use tools like RASA NLU that are easy to integrate and let you parse text with AI without leaving the PHP context.