Description of Program:
The program aims to analyze an input text in Italian and generate a frequency table for each word present in the text. It does so by processing the text character by character, identifying words and their frequencies, and creating a matrix to store this information. The resulting frequency table is then printed and saved to a CSV file.
Here’s what to expect from the program:
- Input Text: The program expects an input text in Italian, structured in sentences terminated by ‘.’, ‘?’, or ‘!’.
- Output CSV File: The output will be a CSV file containing the frequency table. Each row in the CSV file represents a word, along with its immediately succeeding words and their frequencies. The order of rows in the CSV file is not significant.
- Handling Punctuation: Punctuation marks ‘.’, ‘?’, and ‘!’ are treated as separate words. Apostrophes are considered part of words. Other punctuation marks may be discarded.
- Case Insensitivity: The program treats uppercase and lowercase letters as equivalent. For example, ‘oggi’ is considered the same as ‘Oggi’ or ‘OGGI’.
- First Word of Sentence: The program ensures that the first word of each sentence (after ‘.’, ‘?’, or ‘!’) starts with an uppercase letter.
- Maximum Word Length: Words are assumed to be no longer than 30 printable characters.
- Memory Allocation: The program dynamically allocates memory for the matrix to store word frequencies and for the dictionary to keep track of encountered words.
Overall, the program provides a robust solution for analyzing Italian text and generating corresponding frequency tables efficiently.
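The punctuation and case rules listed above can be sketched as three small helpers. This is only an illustration: the function names are hypothetical and do not appear in the program, the text is assumed to be plain ASCII (accented Italian letters are not handled), and digits are assumed to count as word characters because the program's main loop accepts them.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper names; they are not part of the original program. */

/* True for the three sentence terminators, which count as words of their own. */
bool is_sentence_end(int c) {
    return c == '.' || c == '?' || c == '!';
}

/* True for characters that may appear inside a word:
   ASCII letters, digits (an assumption), and the apostrophe. */
bool is_word_char(int c) {
    return isalnum((unsigned char)c) || c == '\'';
}

/* Fold a character to lowercase so that "Oggi" and "OGGI" compare equal. */
int fold_case(int c) {
    return tolower((unsigned char)c);
}
```

Any character for which both predicates return false (a comma, a space, a newline) would act as a word separator and be discarded.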
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
// Structure definition for an object containing a string, its frequency, and coordinates in the matrix
typedef struct {
char String[31]; // up to 30 printable characters plus the terminating '\0'
float frequency;
int occurrence;
int x;
int y;
} Object;
// Structure definition for a matrix containing a list of objects and its dimensions
typedef struct {
Object** list;
int x;
int y;
} Matrix;
// Structure definition for the alphabet containing a sub-alphabet and a word
struct Alphabet {
struct Alphabet* sub_alphabet[31]; // asciiIndex() returns indices 0..30, so 31 slots are needed
Object *word;
};
// Definition of the Dictionary type as a pointer to struct Alphabet
typedef struct Alphabet Dictionary;
// Declaration of the recursive deallocation function
Dictionary* recursiveDeallocation(Dictionary* dict);
// Function to map a character to its slot index in sub_alphabet
int asciiIndex(char character) {
// Maps A-Z and a-z to 0-25 (case-insensitive), and !, ?, ., ' to 27-30
if ((character >= 'A' && character <= 'Z')) {
return (int)(character - 'A');
} else if ((character >= 'a' && character <= 'z')) {
return (int)(character - 'a');
} else if (character == '!') {
return 27;
} else if (character == '?') {
return 28;
} else if (character == '.') {
return 29;
} else if (character == '\'') {
return 30;
}
return 0; // Any other character (e.g. a digit) falls back to slot 0, the same slot as 'a'
}
// Function to compare two strings
bool compareStrings(char* string1, char* string2) {
// Checks if the two strings are equal, considering case insensitivity
if ((int)*string1 == (int)*string2 || ((int)*string1 + 32) == (int)*string2 || ((int)*string1 - 32) == (int)*string2) {
// The strings are equal or differ only in case
} else {
// The strings are different
return false;
}
// Stop as soon as either string ends; they are equal only if both end here
if (*string1 == '\0' || *string2 == '\0') {
return *string1 == *string2;
} else {
// Otherwise, recursively compare the next characters and propagate the result
return compareStrings((string1 + 1), (string2 + 1));
}
}
// Function for operational search of a string in the matrix
bool operationalSearch(char string[], int index, Dictionary* dict, int i, int p, Matrix* mat) {
// Initializes the current dictionary
Dictionary* currentDict = dict;
int ASCII;
// If the end of the string is reached, compare the strings or add the new word
if (string[index] == '\0') {
if (compareStrings(mat->list[i][0].String, mat->list[i][1].String) == 1) {
return true;
}
if (compareStrings(currentDict->word->String, mat->list[i][0].String) == 1) {
// If the word is already present, update occurrences and frequency
int n = 1;
int y = currentDict->word->y;
int x = currentDict->word->x;
int found = 0;
mat->list[y][0].occurrence++;
while (n <= mat->x && mat->list[y][n].x == n) {
if (compareStrings(mat->list[y][n].String, mat->list[i][p].String) == 1) {
mat->list[y][n].occurrence++;
mat->list[y][n].frequency = (float)mat->list[y][n].occurrence / (float)mat->list[y][0].occurrence;
found = 1;
}
mat->list[y][n].frequency = (float)mat->list[y][n].occurrence / (float)mat->list[y][0].occurrence;
n++;
}
if (found == 0) {
// If the word is not found, add it to the matrix
if (n > mat->x) {
mat->x = n;
}
mat->list[y] = realloc(mat->list[y], (n + 1) * sizeof(Object));
strcpy(mat->list[y][n].String, mat->list[i][p].String);
mat->list[y][n].occurrence = 1;
mat->list[y][n].frequency = (float)mat->list[y][n].occurrence / (float)mat->list[y][0].occurrence;
mat->list[y][n].x = n;
}
return true;
}
// Add the new word to the dictionary
strcpy(currentDict->word->String, string);
currentDict->word->x = p;
currentDict->word->y = i;
mat->list[i][1].x = 1;
mat->list[i][0].x = 0;
mat->list[i][1].occurrence = 1;
mat->list[i][0].occurrence = 1;
mat->list[i][1].frequency = (float)mat->list[i][1].occurrence / (float)mat->list[i][0].occurrence;
return false;
}
// The word is not present in the matrix, so add it to the dictionary
ASCII = asciiIndex(string[index]);
if (currentDict->sub_alphabet[ASCII] == NULL) {
currentDict->sub_alphabet[ASCII] = (Dictionary*)calloc(1, sizeof(Dictionary));
if (currentDict->sub_alphabet[ASCII] == NULL) {
exit(1);
}
currentDict->sub_alphabet[ASCII]->word = (Object*)malloc(sizeof(Object));
if (currentDict->sub_alphabet[ASCII]->word == NULL) {
exit(1);
}
// Initialize the allocated memory to zero
memset(currentDict->sub_alphabet[ASCII]->word, 0, sizeof(Object));
}
currentDict = currentDict->sub_alphabet[ASCII];
return operationalSearch(string, index + 1, currentDict, i, p, mat);
}
// Function to handle the end of a string
int endString(int index, char string1[30], Matrix* matrix, Dictionary* dict, int end) {
printf("%s ", string1);
// Assign the string to index 1 of the current row using strncpy()
size_t len1 = strlen(string1);
strncpy(matrix->list[matrix->y][1].String, string1, len1);
matrix->list[matrix->y][1].String[len1] = '\0'; // Ensure the string is properly terminated
matrix->list[matrix->y][1].x = 1;
matrix->list[matrix->y][0].x = 0;
// Check if the key (string) is already present in the dictionary
int check = operationalSearch(matrix->list[matrix->y][0].String, 0, dict, matrix->y, 1, matrix);
if (check == 0) {
// If not present, allocate a new row in the matrix
matrix->y++;
matrix->list = realloc(matrix->list, (matrix->y + 1) * sizeof(Object*));
matrix->list[matrix->y] = (Object*)malloc(2 * sizeof(Object));
for (int j = 0; j < 2; j++) {
memset(matrix->list[matrix->y][j].String, '\0', sizeof(matrix->list[matrix->y][j].String));
}
}
if (end == 1) {
return 0;
}
// Assign the string to index 0 of the current row using strncpy()
size_t len0 = strlen(string1);
strncpy(matrix->list[matrix->y][0].String, string1, len0);
matrix->list[matrix->y][0].String[len0] = '\0'; // Ensure the string is properly terminated
// Reset the temporary string
string1[0] = '\0';
return 0;
}
// Main function
int main() {
// Opening input and output files
FILE* file = fopen("input.txt", "r");
FILE* sheet = fopen("output.csv", "w");
if (file == NULL || sheet == NULL) {
printf("Error opening files\n");
return 1;
}
// Allocation of the dictionary and the matrix
Dictionary* dictionary = (Dictionary*)calloc(1, sizeof(Dictionary));
dictionary->word = (Object*)malloc(sizeof(Object));
Matrix matrix;
matrix.x = 1;
matrix.y = 0;
matrix.list = NULL;
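matrix.list = (Object**)malloc((matrix.y + 1) * sizeof(Object*)); // Room for row 0; with matrix.y == 0 the original malloc(0) left no space for matrix.list[0]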
matrix.list[matrix.y] = (Object*)malloc(2 * sizeof(Object));
for (int j = 0; j < 2; j++) {
memset(matrix.list[matrix.y][j].String, '\0', sizeof(matrix.list[matrix.y][j].String));
}
char string1[30];
int index = 0;
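int character; // int rather than char, so that EOF can be told apart from every valid character
strcpy(matrix.list[0][0].String, ".");
// Reading the input file until EOF
while ((character = fgetc(file)) != EOF) {
int next = fgetc(file);
if (next != EOF) {
fseek(file, -1, SEEK_CUR); // Step back only if a lookahead character was actually read
}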
// Checking punctuation characters to terminate strings
if (next == '!' || next == '?' || next == '.') {
string1[index] = character;
string1[index + 1] = '\0';
index = endString(index, string1, &matrix, dictionary,0);
string1[0] = '\0';
printf("|| ");
}
// Checking valid characters for strings
if (character == '!' || character == '?' || character == '.' ||
(character >= 'a' && character <= 'z') ||
(character >= 'A' && character <= 'Z') ||
(character >= '0' && character <= '9')) {
if (character == '!' || character == '?' || character == '.') {
string1[index - 1] = character;
fseek(file, +1, SEEK_CUR); // Checking the next of the next
int next = fgetc(file);
fseek(file, -2, SEEK_CUR);
if (next == EOF) { // If it reaches the end, force the loop to end
string1[index] = '\0';
endString(index, string1, &matrix, dictionary,1);
break;
}
} else {
string1[index] = character;
index++;
}
} else {
string1[index] = '\0';
if (character == '\'') {
int len = strlen(string1);
string1[index] = '\'';
string1[index + 1] = '\0';
}
index = endString(index, string1, &matrix, dictionary,0);
printf("| ");
}
}
printf("\n\n\n");
// Printing the matrix
for (int i = 0; i < matrix.y; i++) {
int occurrences =0;
for (int j = 0; j <= matrix.x; j++) {
if(j==0){
printf("%s ", matrix.list[i][j].String);
fprintf(sheet, "%s ", matrix.list[i][j].String); // Print the value of the element
}
else{
if(occurrences==matrix.list[i][0].occurrence){
break;
}
occurrences= occurrences + matrix.list[i][j].occurrence;
printf("%s,%.4f ", matrix.list[i][j].String,matrix.list[i][j].frequency); // Print the value of the element
fprintf(sheet, "%s,%.4f ", matrix.list[i][j].String,matrix.list[i][j].frequency); // Print the value of the element
}
}
printf("\n"); // Go to the next line at the end of each row
fprintf(sheet,"\n"); // Go to the next line at the end of each row
}
// Closing files and freeing memory
fclose(sheet);
fclose(file);
// Freeing memory
for (int i = 0; i <= matrix.y; i++) { // Rows 0..matrix.y were allocated, not a fixed three
free(matrix.list[i]);
}
free(matrix.list);
// Deallocating memory for the dictionary (words and structure)
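for (int i = 0; i < (int)(sizeof(dictionary->sub_alphabet) / sizeof(dictionary->sub_alphabet[0])); i++) { // Visit every slot, including the punctuation slots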
dictionary->sub_alphabet[i] = recursiveDeallocation(dictionary->sub_alphabet[i]);
}
free(dictionary->word);
free(dictionary);
return 0;
}
// Recursive deallocation function
Dictionary* recursiveDeallocation(Dictionary* dict) {
if (dict == NULL) {
return NULL;
}
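for (int i = 0; i < (int)(sizeof(dict->sub_alphabet) / sizeof(dict->sub_alphabet[0])); i++) { // Visit every slot, including the punctuation slots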
dict->sub_alphabet[i] = recursiveDeallocation(dict->sub_alphabet[i]);
}
free(dict->word);
free(dict);
return NULL;
}
This C program is designed to analyze a text and generate a frequency table for each word in the text. Here’s a breakdown of each function:
asciiIndex: This function maps a character to its slot in the dictionary's sub_alphabet array. Letters A-Z and a-z map to 0-25 regardless of case, and '!', '?', '.', and the apostrophe map to 27, 28, 29, and 30 respectively.
compareStrings: This function compares two strings for equality, considering case insensitivity.
operationalSearch: This function performs an operational search of a string in the matrix. It checks if the word is already present in the dictionary and updates occurrences and frequencies accordingly.
endString: This function handles the end of a string, updates the matrix with the string and its frequency, and adds new words to the dictionary.
main: The main function opens input and output files, allocates memory for the dictionary and matrix, reads the input file character by character, and processes the text. It then prints the resulting frequency table, writes it to an output file, and frees memory.
recursiveDeallocation: This function recursively deallocates memory for the dictionary.
This program provides a comprehensive solution for analyzing text and generating frequency tables efficiently. It adheres to the requirements specified in the project description.
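As a point of comparison for compareStrings, here is a minimal sketch of an iterative case-insensitive comparison built on tolower from <ctype.h>. The name equals_ignore_case is hypothetical and is not part of the program. The sketch assumes plain ASCII text and is an alternative technique, not the program's own method.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical alternative to compareStrings; the name is illustrative. */
bool equals_ignore_case(const char *a, const char *b) {
    /* Walk both strings in lockstep until one of them ends. */
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return false;
        }
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == '\0' && *b == '\0';
}
```

Folding with tolower only affects letters, so a punctuation character is never treated as equal to a letter that happens to sit 32 code points away from it, which the arithmetic check in compareStrings would allow.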
Regarding the segmentation fault error, it likely occurs in the endString function during memory reallocation. Specifically, the issue might be with the line:
matrix->list = realloc(matrix->list, (matrix->y + 1) * sizeof(Object*));
This line attempts to reallocate memory for matrix->list based on matrix->y, which might not have been properly initialized or incremented. This could lead to an invalid memory access, resulting in a segmentation fault. To address the error, ensure that matrix->y is correctly updated and initialized before using it for memory allocation.
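One way to make this kind of growth step easier to reason about is to wrap realloc in a helper that checks the result before overwriting the pointer. The sketch below is an assumption-laden illustration: Cell and grow_rows are hypothetical names standing in for the program's Object type and its inline realloc call.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the program's Object type. */
typedef struct {
    char text[31];
    int occurrence;
} Cell;

/* Hypothetical helper: resizes an array of row pointers to hold new_count rows.
   Returns 0 on success and -1 on failure; on failure *rows is left untouched. */
int grow_rows(Cell ***rows, size_t new_count) {
    if (new_count == 0) {
        return -1; /* a zero-sized request would leave no room for any row */
    }
    Cell **tmp = realloc(*rows, new_count * sizeof **rows);
    if (tmp == NULL) {
        return -1; /* the old block is still valid and still owned by the caller */
    }
    *rows = tmp;
    return 0;
}
```

Because realloc(NULL, n) behaves like malloc(n), the same helper can serve for the very first allocation, provided the requested count covers every index that will be written: storing into row y requires room for at least y + 1 rows.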
Input Text:
“What do the weather forecasts say? Today’s weather forecast: uncertain weather! Tomorrow’s forecast?”
Expected CSV Output (order of rows is not important):
., what, 1
say,?, 1
what, do, 1
do, the, 1
the, weather, 1
weather, forecasts, 0.5
forecasts, say, 0.5 uncertain, 0.5
say, today's, 1
today's, weather, 1
weather, forecast, 1
forecast, uncertain, 1
uncertain, weather, 1
weather, !, 1
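Each value in the table is a ratio: the number of times a successor follows a word, divided by the number of times the word itself occurs, which is how operationalSearch updates the frequency field. A minimal sketch of that arithmetic, with the hypothetical name successor_frequency and an assumed result of 0 for a word that never occurred:

```c
/* Hypothetical helper: frequency of one successor relative to its leading word.
   pair_count = times the successor followed the word,
   word_count = total occurrences of the word. */
float successor_frequency(int pair_count, int word_count) {
    if (word_count <= 0) {
        return 0.0f; /* assumption: avoid dividing by zero for unseen words */
    }
    return (float)pair_count / (float)word_count;
}
```

For example, a successor seen once after a word that occurs twice gets a frequency of 0.5.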
Error Description:
The segmentation fault occurs in the endString function during memory allocation for a new row in the matrix. This issue may arise due to incorrect memory allocation or manipulation within the function. A segmentation fault indicates that the program tried to access memory that it did not have permission to access, leading to a crash. This can happen when accessing memory beyond the bounds of an allocated block, dereferencing a null pointer, or other memory-related issues.