Rafik Amari1,2 and Mounir Zrigui1,2, 1University of Monastir, Faculty of Science of Monastir, Tunisia, 2University of Monastir, Research Laboratory in Algebra, Numbers Theory and Intelligent Systems RLANTIS, Tunisia
Speech recognition is considered the main task of the speech processing field. In this paper, we study the problem of discontinuous speech recognition (isolated words) for the Arabic language. Two deep learning architectures are compared in this work: the first is based on CNN networks, and the second combines CNN and LSTM networks. The “Arabic Speech Corpus for Isolated Words” (ASD) database is used for all experiments. The results demonstrate the performance and the advantage of the CNN-LSTM approach over the CNN approach.
Deep Learning, CNN, LSTM, Arabic Speech Corpus, Speech Recognition.
Saranya M1, Arockia Xavier Annie R2 and Geetha T V3, 1Computer Science and Engineering, CEG, Anna University, India, 2Assistant Professor, Computer Science and Engineering, CEG, Anna University, Chennai, India, 3UGC-BSR Faculty Fellow, Computer Science and Engineering, former Dean CEG, Anna University, Chennai, India
Nowadays, people around the world are affected by many new diseases. Developing or discovering a new drug for a newly discovered disease is an expensive and time-consuming process, and these costs could be avoided if already existing resources could be used. To identify candidates among available drugs, we need to perform text mining of a large-scale literature repository to extract the relations between chemicals, targets and diseases. Computational approaches for identifying relationships between entities in the biomedical domain are emerging as an active area of research for drug discovery, since manual curation requires considerable manpower. Currently, computational approaches for extracting biomedical relations such as drug-gene and gene-disease relationships are limited, as constructing drug-gene and gene-disease associations from unstructured biomedical documents is very hard. In this work, we propose a pattern-based bootstrapping method, a semi-supervised learning algorithm, to extract the direct relations between drug, gene and disease from biomedical documents. These direct relationships are used to infer indirect relationships between entities such as drug and disease. These indirect relationships are then used to determine new candidates for drug repositioning, which in turn will reduce both the time required and the patient’s risk.
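The pattern-based bootstrapping idea described above can be sketched in a few lines: seed entity pairs induce connecting patterns, and the patterns in turn harvest new candidate pairs. The sentences, seed pairs, and single-token patterns below are illustrative assumptions, not the paper's actual corpus or pattern grammar.

```python
# Minimal bootstrapping sketch: seed (drug, gene) pairs -> textual patterns
# -> new candidate (drug, gene) pairs. Toy data only.

def extract_patterns(sentences, seed_pairs):
    """Collect the text connecting known (drug, gene) seed pairs."""
    patterns = set()
    for s in sentences:
        for drug, gene in seed_pairs:
            if drug in s and gene in s:
                middle = s.split(drug, 1)[1].split(gene, 1)[0].strip()
                if middle:
                    patterns.add(middle)
    return patterns

def extract_pairs(sentences, patterns, drugs, genes):
    """Apply learned patterns to propose new (drug, gene) pairs."""
    pairs = set()
    for s in sentences:
        for d in drugs:
            for g in genes:
                if any(f"{d} {p} {g}" in s for p in patterns):
                    pairs.add((d, g))
    return pairs

sentences = [
    "imatinib inhibits BCR-ABL in leukemia cells",
    "gefitinib inhibits EGFR in lung tumours",
]
seeds = {("imatinib", "BCR-ABL")}
patterns = extract_patterns(sentences, seeds)   # {'inhibits'}
new = extract_pairs(sentences, patterns,
                    ["imatinib", "gefitinib"], ["BCR-ABL", "EGFR"])
```

In a full system, the new pairs would be fed back as seeds for further iterations, with pattern scoring to limit semantic drift.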
Text Mining, Drug Discovery, Drug Repositioning, Bootstrapping, Machine Learning.
Harshit Jain and Naveen Pundir, Department of Computer Science and Engineering, IIT Kanpur, India
India and many other countries like the UK, Australia, and Canada follow the ‘common law system’, which gives substantial importance to prior related cases in determining the outcome of the current case. Better similarity methods can help in finding earlier similar cases, which can help lawyers searching for precedents. Prior approaches to computing the similarity of legal judgements use a basic representation, either a bag-of-words or a dense embedding learned using only the words present in the document. They, however, either neglect or do not emphasize the vital ‘legal’ information in the judgements, e.g. citations to prior cases, act and article numbers or names, etc. In this paper, we propose a novel approach to learn the embeddings of legal documents using the citation network of documents. Experimental results demonstrate that the learned embedding is on par with the state-of-the-art methods for document similarity on a standard legal dataset.
Representation Learning, Similarity, Citation Network, Graph Embedding, Legal Judgements.
Elijah Pelofske1, Lorie M. Liebrock2, and Vincent Urias3, 1Cybersecurity Centers, New Mexico Institute of Mining and Technology, Socorro, New Mexico, USA, 2Cybersecurity Centers, New Mexico Institute of Mining and Technology, Socorro, New Mexico, USA, 3Sandia National Laboratories, Albuquerque, New Mexico, USA
In this research, we use user-defined labels from three internet text sources (Reddit, Stackexchange, Arxiv) to train 21 different machine learning models for the topic classification task of detecting cybersecurity discussions in natural text. We analyze the false positive and false negative rates of each of the 21 models in a cross-validation experiment. Then we present a Cybersecurity Topic Classification (CTC) tool, which takes the majority vote of the 21 trained machine learning models as the decision mechanism for detecting cybersecurity-related text. We also show that the majority vote mechanism of the CTC tool provides lower false negative and false positive rates on average than any of the 21 individual models. We show that the CTC tool is scalable to hundreds of thousands of documents with a wall-clock time on the order of hours.
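The majority-vote decision mechanism described above is simple to state precisely: each model casts a binary vote and the ensemble returns the majority label. In this sketch the 21 trained models are replaced by hypothetical keyword classifiers, purely as stand-ins; the voting logic is the part that mirrors the CTC tool.

```python
# Majority-vote ensemble sketch. The real CTC tool uses 21 trained ML
# models; here hypothetical keyword classifiers stand in for them.

def majority_vote(models, text):
    """Return 1 (cybersecurity-related) iff a strict majority votes 1."""
    votes = sum(model(text) for model in models)
    return 1 if votes > len(models) / 2 else 0

# Stand-in "models": each votes 1 if its keyword appears in the text.
keywords = ["malware", "exploit", "phishing", "firewall", "breach"]
models = [lambda t, k=k: int(k in t.lower()) for k in keywords]

majority_vote(models, "A phishing exploit dropped malware")  # -> 1
majority_vote(models, "the weather is nice today")           # -> 0
```

A strict-majority threshold avoids ties with an odd model count (such as 21) and, as the abstract notes, tends to average out the individual models' false positives and false negatives.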
cybersecurity, topic modeling, text classification, machine learning, neural networks, natural language processing, Stackexchange, Reddit, Arxiv, social media.
Ishika Godage, Ruvan Weerasinghe and Damitha Sandaruwan, University of Colombo School of Computing, Colombo 07, Sri Lanka
There is no doubt that communication plays a vital role in human life. There is, however, a significant population of hearing-impaired people who use non-verbal techniques for communication, which the majority of the population cannot understand. The predominant of these techniques is sign language, the main communication protocol among hearing-impaired people. In this research, we propose a method to bridge the communication gap between hearing-impaired people and others, which translates signed gestures into text. Most existing solutions, based on technologies such as Kinect, Leap Motion, computer vision, EMG and IMU, try to recognize and translate the individual signs of hearing-impaired people. The few approaches to sentence-level sign language recognition suffer from not being user-friendly or even practical owing to the devices they use. The proposed system is designed to give the user full freedom to sign an uninterrupted full sentence at a time. For this purpose, we employ two Myo armbands for gesture capturing. Using signal processing and supervised learning based on a vocabulary of 49 words and 346 sentences for training with a single signer, we were able to achieve 75-80% word-level accuracy using gestural (EMG) and spatial (IMU) features in our signer-dependent experiment.
Sign Language, Word-Level Recognition, Sentence-Level Recognition, Myo Armband, EMG, IMU, Supervised Learning.
I.V.Gomes1,2, H.Puga2 and J.L.Alves2, 1MIT Portugal, Guimarães, Portugal, 2CMEMS – Center for Microelectromechanical Systems University of Minho, Portugal
Ultrasonic-microcasting is a manufacturing technique that opens the possibility of obtaining biodegradable magnesium stents through a faster and cheaper process, while also bringing important features such as the production of devices with cross-section variation. This way, it may be feasible to tailor the expansion profile of the stent. Even so, there are still geometric constraints, essentially associated with the minimum thickness that the process allows, currently about 0.20 mm. Moreover, the nature of the material used - a magnesium alloy - also demands thicker structures, which may be harmful to stent performance. In this work, a numerical model for stent shape optimization based on cross-section variation is presented, aiming at reducing the dogboning phenomenon observed in this type of device. The model is governed by a set of optimization variables and limiting values of the design and optimization parameters, defined considering both the advantages and constraints of the ultrasonic-microcasting process. Moreover, this model suggests an optimized geometry that, despite its greater thickness, has a performance comparable to that of the most popular stent models currently in use.
Stent, Optimization, Ultrasonic-Microcasting, Dogboning.
Andrew Bloch-Hansen, Roberto Solis Oba, and Andy Yu, Department of Computer Science, Western University, Ontario, Canada
The two-dimensional strip packing problem consists of packing in a rectangular strip of width 1 and minimum height a set of n rectangles, where each rectangle has width 0 < w ≤ 1 and height 0 < h ≤ 1. We consider the high-multiplicity version of the problem, in which there are only K different types of rectangles. For the case when K = 3, we give an algorithm which produces a solution of height at most 3/2 + ε plus the height of an optimal solution, where ε is any positive constant.
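The LP-relaxation algorithm the abstract describes is involved; as a simpler point of reference for the problem itself, the classic First-Fit Decreasing Height (FFDH) shelf heuristic (not the authors' method) packs rectangles of width and height in (0, 1] into a width-1 strip as follows.

```python
# FFDH shelf heuristic for 2D strip packing (a standard baseline, not the
# paper's LP-based algorithm). Rectangles are (width, height), 0 < w,h <= 1.

def ffdh(rects):
    """Pack rectangles into shelves; return total strip height used."""
    shelves = []  # each shelf: [remaining_width, shelf_height]
    for w, h in sorted(rects, key=lambda r: -r[1]):  # by decreasing height
        for shelf in shelves:
            if shelf[0] >= w:        # first shelf with room left
                shelf[0] -= w
                break
        else:                        # no shelf fits: open a new one
            shelves.append([1.0 - w, h])
    return sum(s[1] for s in shelves)

ffdh([(0.5, 0.6), (0.5, 0.6), (0.5, 0.4), (0.5, 0.4)])  # -> 1.0
```

FFDH guarantees height at most 1.7·OPT + 1; the high-multiplicity algorithm in the paper exploits the bounded number of rectangle types to obtain its much tighter OPT + 3/2 + ε bound.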
LP-relaxation, two-dimensional strip packing, high multiplicity, approximation algorithm.
Sulaiman Adesegun Kukoyi, O.F.W. Onifade, Kamorudeen A. Amuda, Department of Computer Science, University of Ibadan, Nigeria
Voice information retrieval is a technique that provides an Information Retrieval System with the capacity to transcribe spoken queries and use the text output for information search. Collaborative Information Seeking (CIS) is a field of research that involves studying the situations, motivations, and methods of people working in collaborative groups on information seeking projects, as well as building systems to support such activities. Humans find it easier to communicate and express ideas via speech. Existing voice search tools like Google and other mainstream voice search engines do not support collaborative search. Spoken queries are passed through the ASR for feature extraction using MFCC, with HMMs and the Viterbi algorithm for pattern matching. The output of the ASR is then passed as input into the CIS system, and the results are filtered to produce an aggregate result. Simulation results show that our model achieves 81.25% transcription accuracy.
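The Viterbi decoding step mentioned above finds the most likely hidden-state sequence under an HMM. The following sketch shows the algorithm itself; the two-state silence/speech model, observation symbols, and probabilities are toy assumptions, not the paper's acoustic model.

```python
# Viterbi decoding for an HMM, as used for pattern matching in ASR.
# The model below (silence vs. speech over low/high energy frames) is a
# hypothetical toy, not the paper's trained acoustic model.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # V[t][s] = (best probability of reaching s at time t, best path to s)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        V.append({})
        for s in states:
            prob, path = max(
                (V[-2][prev][0] * trans_p[prev][s] * emit_p[s][o],
                 V[-2][prev][1] + [s])
                for prev in states
            )
            V[-1][s] = (prob, path)
    return max(V[-1].values())[1]

states = ["sil", "speech"]
start_p = {"sil": 0.8, "speech": 0.2}
trans_p = {"sil": {"sil": 0.6, "speech": 0.4},
           "speech": {"sil": 0.3, "speech": 0.7}}
emit_p = {"sil": {"low": 0.9, "high": 0.1},
          "speech": {"low": 0.2, "high": 0.8}}

viterbi(["low", "high", "high"], states, start_p, trans_p, emit_p)
# -> ['sil', 'speech', 'speech']
```

In a real ASR pipeline the observations would be MFCC feature vectors (or quantized codes derived from them) rather than symbolic energy labels, and probabilities would be computed in log space to avoid underflow.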
Information Retrieval, Collaborative Search, Collaborative Information Seeking, Automatic Speech Recognition, Feature Extraction, MFCC, Hidden Markov Model, Acoustic Model, Viterbi Algorithm.
Gunbir Singh Baveja1 and Jaspreet Singh2, 1Delhi Public School, Dwarka, New Delhi, Delhi, India, 2GD Goenka University, Sohna, Haryana, India
Earthquake prediction has been a challenging research area for many decades, in which the future occurrence of this highly uncertain calamity is predicted. In this paper, several parametric and non-parametric features were calculated, where the non-parametric features were derived from the parametric ones. Eight seismic features were calculated using the Gutenberg-Richter law, total recurrence time, and seismic energy release. Additionally, criteria such as Maximum Relevance and Minimum Redundancy were applied to choose the pertinent features. These features, along with others, were used as input for an Extreme Learning Machine (ELM) regression model. Magnitude and time data spanning five decades from the Assam-Guwahati region were used to create this model for magnitude prediction. The testing accuracy and testing speed were computed, taking Root Mean Squared Error (RMSE) as the parameter for evaluating the model. As confirmed by the results, ELM shows better scalability with much faster training and testing speed (up to a thousand times faster) than traditional Support Vector Machines. The testing RMSE came out to be . To further test the model’s robustness, magnitude-time data from California was used to calculate the seismic indicators, fed into the neural network (ELM), and tested on the Assam-Guwahati region. The model proves successful and can be implemented in early warning systems, as earthquake prediction continues to be a major part of disaster response and management.
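The speed advantage of ELM comes from its training procedure: hidden-layer weights are drawn at random and only the output weights are solved, in closed form, via a pseudoinverse. The sketch below shows that core idea on synthetic data; the paper's seismic features and dataset are not reproduced here.

```python
# Extreme Learning Machine regression sketch: random hidden layer,
# closed-form output weights. Synthetic toy data, not seismic features.
import numpy as np

def elm_fit(X, y, n_hidden=50, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
    b = rng.normal(size=n_hidden)                 # random biases
    H = np.tanh(X @ W + b)                        # hidden activations
    beta = np.linalg.pinv(H) @ y                  # closed-form solve
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy regression target: y = x1 + x2.
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(200, 2))
y = X[:, 0] + X[:, 1]
W, b, beta = elm_fit(X, y, rng=rng)
rmse = float(np.sqrt(np.mean((elm_predict(X, W, b, beta) - y) ** 2)))
```

Because training reduces to one matrix pseudoinverse instead of iterative optimization, fitting is typically orders of magnitude faster than SVM training, which matches the speedup the abstract reports.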
Earthquake Prediction, Machine Learning, Extreme Learning Machine, Seismological Features, Gutenberg-Richter Law, Support Vector Machine.
Chee Keong Wee1 and Nathan Wee2, 1Digital Application Services, eHealth Queensland, Queensland, Australia, 2Science & Engineering Faculty, Queensland University of Technology, Queensland, Australia
Database replication is ubiquitous in organizations’ IT infrastructure when data is shared across multiple systems and service uptime is critical. But complex software will eventually suffer outages under different types of circumstances, and it is important to resolve them promptly and restore services. This paper proposes an approach to resolve data replication software faults through deep reinforcement learning. Empirical results show that the new method can resolve software faults quickly with high accuracy.
Database Management, Data replication, reinforcement learning, fault resolution.
Yiqi Gao1 and Yu Sun2, 1Sage High School, Newport Coast, CA 92657, 2California State Polytechnic University, Pomona, CA, 91768
The start of 2020 marked the beginning of the deadly COVID-19 pandemic caused by the novel SARS-CoV-2 from Wuhan, China. At the time of writing, the virus has infected over 150 million people worldwide and resulted in more than 3.5 million global deaths. Accurate predictions made using machine learning algorithms can serve as a guide for hospitals and policy makers to make adequate preparations and enact effective policies to combat the pandemic. This paper takes a two-pronged approach to analyzing COVID-19. First, it utilizes machine learning algorithms such as linear regression, polynomial regression, and random forest regression to make accurate predictions of daily COVID-19 cases using combinations of a range of predictors. Then, using the feature significance of random forest regression, it compares the influence of the individual predictors on the general trend of COVID-19 against the predictions made, and highlights factors of high influence, which can then be targeted by policies for efficient pandemic response.
Covid-19 Case Prediction, Data Mining, Machine Learning Algorithm.
Santanu Ray1 and Pratik Gupta2, 1Ericsson, New Jersey, USA, 2Ericsson, Kolkata, India
A conventional test automation framework executes test cases in a sequential manner, which increases execution time. Even though the framework supports executing multiple test suites in parallel, we are unable to do so due to system limitations and infrastructure cost. Building and maintaining an automation framework is also time-consuming and costly. This paper presents the design of a scalable test automation framework, providing the test framework as a service that expedites test execution by distributing test suites across multiple services running in parallel without any extra infrastructure.
Distributed Testing, Robot Framework, Docker, Automation Framework.
Copyright © CST 2021