The goal of semanti.ca (pronounced seh-man-tee-kah) is to make the information on the Web accessible in its pure form. We build AI-powered technology that looks at web pages and sees the information they contain. Modern web pages are noisy, and user interface fashions and technologies change constantly, but semanti.ca keeps bringing you clean, normalized, and organized information from the noisy and ever-changing Web. Whether you provide an online service that relies on web content, build data-driven business-to-business software, or do technology or competitive intelligence, you can rely on us.
semanti.ca is a scalable, AI-powered API for extracting data from web articles. To extract data, semanti.ca loads a web article in a browser and reads it, just like a human would. It accurately recognizes titles, headlines, published and updated dates, images, captions, and tags. It extracts the content text and the HTML code while ignoring advertisements, design elements, and any other text or images not related to the main content.
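To make the list of extracted fields concrete, here is a minimal sketch of how a client might model one extracted article in Python. It is an illustration only: the class names, the `parse_article` helper, and every key in the payload are assumptions made for this example, not the documented semanti.ca response format.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class ArticleImage:
    """An image from the main content, with its caption if one was found."""
    url: str
    caption: Optional[str] = None


@dataclass
class ExtractedArticle:
    """Hypothetical container for the fields described above."""
    title: str
    text: str   # main content as plain text
    html: str   # main content as HTML, without ads or design elements
    headlines: List[str] = field(default_factory=list)
    published: Optional[datetime] = None
    updated: Optional[datetime] = None
    images: List[ArticleImage] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)


def _parse_date(value: Optional[str]) -> Optional[datetime]:
    # Assumes ISO 8601 strings such as "2024-01-15T09:30:00+00:00".
    return datetime.fromisoformat(value) if value else None


def parse_article(payload: dict) -> ExtractedArticle:
    """Build an ExtractedArticle from a dict with the assumed key names."""
    return ExtractedArticle(
        title=payload["title"],
        text=payload["text"],
        html=payload["html"],
        headlines=list(payload.get("headlines", [])),
        published=_parse_date(payload.get("published")),
        updated=_parse_date(payload.get("updated")),
        images=[
            ArticleImage(url=img["url"], caption=img.get("caption"))
            for img in payload.get("images", [])
        ],
        tags=list(payload.get("tags", [])),
    )


# Made-up payload, used only to exercise the sketch.
sample = {
    "title": "Example article",
    "text": "Body text.",
    "html": "<p>Body text.</p>",
    "published": "2024-01-15T09:30:00+00:00",
    "images": [{"url": "https://example.com/a.png", "caption": "A chart"}],
    "tags": ["example"],
}
article = parse_article(sample)
```

Optional fields default to `None` or an empty list, so an article without an updated date, images, or tags still parses cleanly.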
semanti.ca is not tailored to any specific website UI or technology. It was trained on hundreds of thousands of examples and can recognize the relevant elements of a web page regardless of how the page was built. It actually "looks" at web pages and recognizes the content using a statistical model learned from data.
Furthermore, semanti.ca classifies the extracted content according to the IPTC Media Topics taxonomy and extracts key phrases from the text. This helps our customers organize the extracted content.
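As a rough illustration of how topic labels can help organize content, the sketch below groups article titles by topic. The dictionary keys (`title`, `topics`) and the topic labels are placeholders chosen for this example; they are not taken from the semanti.ca API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_topic(articles: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each topic label to the titles of the articles classified under it."""
    index = defaultdict(list)
    for article in articles:
        # Articles with no topics (missing or empty) go under "unclassified".
        for topic in article.get("topics") or ["unclassified"]:
            index[topic].append(article["title"])
    return dict(index)


# Made-up articles with placeholder topic labels.
articles = [
    {"title": "Cup final recap", "topics": ["sport"]},
    {"title": "Stadium funding vote", "topics": ["sport", "politics"]},
    {"title": "Untitled note", "topics": []},
]
by_topic = group_by_topic(articles)
```

An article with several topics appears under each of them, which is usually what a browsing or filtering interface needs.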
We believe that if you do something, you have to do it better than anyone else. Otherwise, what is the point of doing it at all?
To ensure the highest quality of our statistical models, we constantly add new training data to improve them. We monitor the data our clients send us and analyze how our AI reacts to it. If our AI could have reacted better to a client's input, believe us, we will make sure that next time it does.