PHP Classes

PHP NGram Comparator: Compare strings to find the level of similarity

Last updated: 2023-04-29 (11 months ago)
Ratings: Not yet rated by the users
Unique user downloads: Total: 60, This week: 1
Download rankings: All time: 10,450, This week: 571
Version: phpngram 1.0.0
License: BSD License
PHP version: 7
Categories: Algorithms, Statistics, Searching, Te..., A..., P..., P...
Description 

This package can compare strings to find the level of similarity.

It can take a string and parse it into an array of shingles or n-grams.

The package can also compare the respective ngram word arrays of two strings and return the level of similarity as a percentage.

It can also compare two strings and return the number of ngram words that match.

The package can also take arrays of words from two phrases and generate n-gram count arrays suitable for training language models.

N-grams are contiguous sequences of n items from a given sample of text.

Shingles are overlapping sequences of words.
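
As a rough illustration of the two definitions above, the following sketch builds character n-grams and word shingles with plain PHP string functions. The function names sketch_char_ngrams and sketch_word_shingles are hypothetical and are not part of the package; the class itself may normalize case or whitespace differently.

```php
<?php
// Hypothetical helpers for illustration only; not the package's own code.

// Return every contiguous run of $n characters in $text.
function sketch_char_ngrams(string $text, int $n): array
{
    $ngrams = [];
    $length = strlen($text); // byte length; mb_* functions would be needed for multibyte text
    for ($i = 0; $i + $n <= $length; $i++) {
        $ngrams[] = substr($text, $i, $n);
    }
    return $ngrams;
}

// Return every overlapping run of $size consecutive words in $text.
function sketch_word_shingles(string $text, int $size): array
{
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $shingles = [];
    for ($i = 0; $i + $size <= count($words); $i++) {
        $shingles[] = implode(' ', array_slice($words, $i, $size));
    }
    return $shingles;
}

print_r(sketch_char_ngrams('quick', 3));                 // qui, uic, ick
print_r(sketch_word_shingles('the quick brown fox', 2)); // the quick, quick brown, brown fox
```

Each shingle shares all but one word with its neighbour, which is what "overlapping" means here.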

Instructions:

The class includes the following methods:

- get_ngrams($text, $n): Takes a string and an integer n, splits the string into n-grams, and returns them as an array.

- compare_strings_ngram_pct($string1, $string2, $n): Takes two strings and an integer n, splits both strings into n-grams, and returns the percentage of n-grams that match.

- compare_strings_ngram_max_size($string1, $string2): Takes two strings, splits both into n-grams of varying lengths, and returns the size of the largest n-gram that appears in both.

- get_shingles($text, $shingle_size): Takes a string and an integer shingle_size, and returns an array of shingles of that size.

- train_ngram_model($tokenized_text=[], $n=[]): Takes an array of tokenized sentences and an integer n. It loops through each sentence, builds n-grams of length n, counts how often each one occurs, and returns an array of n-gram counts.
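
One plausible way to compute a match percentage and n-gram counts is sketched below. This is an assumption about the approach, not the package's actual code: the helper names sketch_ngram_pct and sketch_ngram_counts are hypothetical, and the percentage here is the share of the first string's character n-grams that also occur in the second string, which may differ from the formula the class uses.

```php
<?php
// Hypothetical helpers for illustration only; not the package's own code.

// Percentage of the character n-grams of $a that also appear somewhere in $b.
function sketch_ngram_pct(string $a, string $b, int $n): float
{
    $grams = function (string $s) use ($n): array {
        $out = [];
        for ($i = 0; $i + $n <= strlen($s); $i++) {
            $out[] = substr($s, $i, $n);
        }
        return $out;
    };

    $gramsA = $grams($a);
    if (count($gramsA) === 0) {
        return 0.0; // nothing to compare
    }

    $setB = array_flip($grams($b)); // n-grams of $b as array keys for fast lookup
    $matches = 0;
    foreach ($gramsA as $gram) {
        if (isset($setB[$gram])) {
            $matches++;
        }
    }
    return 100.0 * $matches / count($gramsA);
}

// Count how often each word n-gram occurs across a list of tokenized sentences.
function sketch_ngram_counts(array $sentences, int $n): array
{
    $counts = [];
    foreach ($sentences as $tokens) {
        for ($i = 0; $i + $n <= count($tokens); $i++) {
            $key = implode(' ', array_slice($tokens, $i, $n));
            $counts[$key] = ($counts[$key] ?? 0) + 1;
        }
    }
    return $counts;
}

print_r(sketch_ngram_counts([['a', 'b', 'c'], ['a', 'b', 'd']], 2)); // a b => 2, b c => 1, b d => 1
```

For the two sentences used in the package's example, this sketch returns about 95.12 with n = 3, the same figure the example quotes, although that alone does not confirm the class uses this formula.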

Innovation Award
PHP Programming Innovation award nominee
May 2023
Number 3
Humans use languages to talk to each other. They often express the same meaning with sentences that use different words.

When people ask a software application questions, the software needs to recognize the different ways people phrase the same question.

This package can parse sentences in a way that lets it determine that one question is very similar to another that asks about the same problem.

In this way, the package can serve as a base for artificial intelligence applications that need to understand what humans are asking in specific languages.

Manuel Lemos
Author: JImmy Bo
Classes: 14 packages
Country: United States
Innovation award: Nominee 8x, Winner 1x
Example

<?php

/*

Example usage of the class NgramComparator

Ngrams are a way of breaking up a string into chunks of n characters.
They allow us to compare strings for similarity, even if the strings are of different lengths,
or have some words in common but not others.

*/


require_once('class.ngram.php');


$comparator = new NgramComparator();

// Example usage of get_ngrams
$text = "The quick brown fox jumps over the lazy dog";
$ngrams = $comparator->get_ngrams($text, 3);
print_r($ngrams); // Output: Array ( [0] => The [1] => he [2] => e q [3] => qu [4] => qui [5] => uic [6] => ick [7] => ck [8] => k b [9] => br [10] => bro [11] => row [12] => ow [13] => w f [14] => fo [15] => fox [16] => ox [17] => jum [18] => ump [19] => mps [20] => ps [21] => ove [22] => ver [23] => er [24] => the [25] => he [26] => laz [27] => azy [28] => zy [29] => dog )

// Example usage of compare_strings_ngram_pct
$string1 = "The quick brown fox jumps over the lazy dog";
$string2 = "The lazy dog jumps over the quick brown fox";
$percentage_match = $comparator->compare_strings_ngram_pct($string1, $string2, 3);
echo "Percentage match: " . $percentage_match . "%\n"; // Output: Percentage match: 95.121951219512%

// Example usage of compare_strings_ngram_max_size
$string1 = "The quick brown fox jumps over the lazy dog";
$string2 = "The lazy dog jumps over the quick brown fox";
$max_matching_ngram_size = $comparator->compare_strings_ngram_max_size($string1, $string2);
echo "Max matching n-gram size: " . $max_matching_ngram_size . "\n"; // Output: Max matching n-gram size: 18

// Example usage of get_shingles
$text = "The quick brown fox jumps over the lazy dog";
$shingle_size = 2;
$shingles = $comparator->get_shingles($text, $shingle_size);
print_r($shingles); // Output: Array ( [0] => The quick [1] => quick brown [2] => brown fox [3] => fox jumps [4] => jumps over [5] => over the [6] => the lazy [7] => lazy dog )

// Example usage of train_ngram_model
$tokenized_text = [
    ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog'],
    ['The', 'lazy', 'dog', 'jumps', 'over', 'the', 'quick', 'brown', 'fox'],
];
$n = 3;
$ngram_counts = $comparator->train_ngram_model($tokenized_text, $n);
print_r($ngram_counts);
// Output as listed by the author (actual counts depend on how the class handles case and n-gram length):
// Array ( [The quick brown] => 1 [quick brown fox] => 1 [brown fox jumps] => 1 [fox jumps over] => 1 [jumps over the] => 2 [over the lazy] => 1 [the lazy dog] => 1 [lazy dog jumps] => 1 [dog jumps over] => 1 [over the quick] => 1 [the quick brown] => 1 [brown fox] => 2 [fox jumps] => 2 [jumps over] => 2 [over the] => 2 [the lazy] => 1 [lazy dog] => 1 [dog jumps] => 1 [over the] => 2 [the quick] => 1)



?>
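
The value 18 reported above for compare_strings_ngram_max_size is consistent with the longest run of characters the two sentences share ("he quick brown fox", 18 characters including spaces). A minimal sketch of that idea, assuming the measure is the longest common substring, follows. The function name sketch_longest_common_run is hypothetical, and the class may reach its result by a different route, such as testing n-grams of increasing length.

```php
<?php
// Hypothetical helper for illustration only; not the package's own code.

// Length of the longest run of characters that appears in both strings,
// computed with the standard dynamic-programming table, one row at a time.
function sketch_longest_common_run(string $a, string $b): int
{
    $lenA = strlen($a);
    $lenB = strlen($b);
    $best = 0;
    $prev = array_fill(0, $lenB + 1, 0);

    for ($i = 1; $i <= $lenA; $i++) {
        $curr = array_fill(0, $lenB + 1, 0);
        for ($j = 1; $j <= $lenB; $j++) {
            if ($a[$i - 1] === $b[$j - 1]) {
                // Extend the common run that ended at the previous characters.
                $curr[$j] = $prev[$j - 1] + 1;
                $best = max($best, $curr[$j]);
            }
        }
        $prev = $curr;
    }
    return $best;
}

echo sketch_longest_common_run(
    'The quick brown fox jumps over the lazy dog',
    'The lazy dog jumps over the quick brown fox'
), "\n"; // 18
```

The comparison is case-sensitive, which is why the run starts at "he" and does not include the leading "T" or "t".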


Files
File               Role     Description
class.ngram.php    Class    N-Gram Comparison and Shingling in PHP Class
example.ngram.php  Example  Example usage
