Leveraging Structured Data for Natural Language Understanding
Traditionally, structured data and text have been living on different planets. We help organizations bring the two worlds together using advanced AI methods, tools and technologies.
Searching and analyzing text documents and messages has come a long way since the days of simple full-text search engines. Modern machine learning techniques are being used successfully to capture more of the meaning in natural language texts, identify types of entities, find associations and disambiguate terms far more accurately than ever before.
The basic principle of these techniques is to learn from what was said before. The more often a term is used, and the more varied the contexts it appears in, the more complete the picture becomes. That is why these methods benefit greatly from very large amounts of text data.
The problem is that a lot of the domain knowledge that provides the frame of reference for employees and customers of a particular organization or the members of a particular community is not available in the form of extensive text corpora. Some of it is defined in relatively few authoritative documents such as product documentation, contracts, regulations or technical specifications. A lot of it is formally modelled (or at least somewhat structured) and stored in databases, database schemas, spreadsheets or communication and document metadata.
Enabling question answering and data analysis spanning all of these heterogeneous sources has traditionally incurred substantial data integration cost. We are developing tools to reduce that cost using advanced AI methods and a pragmatic approach of gradual enhancement, carefully avoiding many of the pitfalls of large data integration projects.
Learning Key Properties of Products and Services
Product descriptions are an amalgamation of marketing language and references to key features of products and services. They are not a detailed explanation of how the product works, what its exact purpose is or what goes into it. Product presentations are supposed to make the product appear special and unique, which sometimes makes it more difficult to understand and compare on a factual basis.
For tasks such as price comparison, market research, competitive analysis, customer sentiment analysis or automated customer support, that poses a problem, because the language used by customers, employees and suppliers depends heavily on background knowledge about the functional properties of the product and its use in specific contexts.
At Topolyte we build software to automatically learn a detailed understanding of thousands of products and product categories from various structured and unstructured sources such as product and parts catalogs, operation manuals, product websites and Wikipedia articles.
We are also creating a high-quality product knowledge base using our own software, which can be tuned, extended and customised for specific purposes. Please talk to us if you require consulting, software development, data engineering or data science services in this particular area.
PDFancyFolders iOS and Mac App
Search and extract text fragments and links from collections of PDF files sharing common formatting attributes. You can define complex filters by combining any of the following filter types with AND/OR operators: text content with wildcards (*), heading level, font family, font size, text and background colors, font style (bold/italic), page number as well as internal and external links.
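To illustrate how AND/OR combinations of such filters behave, here is a minimal Python sketch. It is not the app's implementation or file format: the Fragment class, the filter helpers and the sample data are all hypothetical, and wildcard matching is borrowed from Python's fnmatch module (which also gives special meaning to ? and [...]).

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, Optional


@dataclass(frozen=True)
class Fragment:
    """Hypothetical text fragment with a few formatting attributes."""
    text: str
    page: int
    font_family: str
    font_size: float
    bold: bool = False
    heading_level: Optional[int] = None


# A filter is any function that accepts a fragment and returns True or False.
Filter = Callable[[Fragment], bool]


def text_like(pattern: str) -> Filter:
    # '*' matches any run of characters; the comparison ignores case.
    return lambda f: fnmatchcase(f.text.lower(), pattern.lower())


def heading(level: int) -> Filter:
    return lambda f: f.heading_level == level


def is_bold() -> Filter:
    return lambda f: f.bold


def all_of(*filters: Filter) -> Filter:
    # AND: every sub-filter must accept the fragment.
    return lambda f: all(flt(f) for flt in filters)


def any_of(*filters: Filter) -> Filter:
    # OR: at least one sub-filter must accept the fragment.
    return lambda f: any(flt(f) for flt in filters)


# Made-up sample data standing in for fragments extracted from a PDF.
fragments = [
    Fragment("Installation Guide", 1, "Helvetica", 18.0, bold=True, heading_level=1),
    Fragment("Run the installer and follow the prompts.", 1, "Times", 11.0),
    Fragment("Warning: back up your data first.", 2, "Times", 11.0, bold=True),
    Fragment("Uninstalling", 3, "Helvetica", 14.0, bold=True, heading_level=2),
]

# (heading level 1 AND text contains "install") OR (bold AND text starts with "warning")
query = any_of(
    all_of(heading(1), text_like("*install*")),
    all_of(is_bold(), text_like("warning*")),
)

matches = [f.text for f in fragments if query(f)]
print(matches)
```

Because each filter is just a function, AND and OR groups can be nested to any depth, which is one simple way to model complex filter expressions.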
You can view and export search results in various formats, including list (CSV for import into Excel), plain text and HTML. Exporting is available to paying customers only.
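As a rough idea of what such exports might contain, the following Python sketch writes the same search results as CSV, plain text and HTML. The field names (page, text) and the layout of each format are assumptions made for illustration, not the app's actual export schema.

```python
import csv
import html
import io

# Made-up search results; "page" and "text" are hypothetical field names.
results = [
    {"page": 1, "text": "Installation Guide"},
    {"page": 2, "text": 'Warning: back up <all> "data" first.'},
]


def to_csv(rows):
    # The csv module quotes fields containing commas or quotes,
    # so the output can be imported into a spreadsheet.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["page", "text"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_plain_text(rows):
    return "\n".join(f"p. {r['page']}: {r['text']}" for r in rows)


def to_html(rows):
    # Escape the text so characters such as < and & display literally.
    items = "\n".join(
        f"  <li>Page {r['page']}: {html.escape(r['text'])}</li>" for r in rows
    )
    return f"<ul>\n{items}\n</ul>"


print(to_csv(results))
print(to_plain_text(results))
print(to_html(results))
```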
PDF extraction is not an exact science, to say the least. If you don't get good results, check the help pages or contact support.
The app does not support extraction of images, which makes it unsuitable for most scanned files. Password-protected and copy-protected files are not supported either.
Training, Consulting and Software Development
We provide training and consulting as well as software development services in areas related to our core data science and data engineering specialty.