Today’s professionals can take advantage of a growing range of data-related technologies. But because the digital industry moves so quickly, its terminology changes constantly and new terms emerge all the time.
So we have developed this dictionary of data-related terms to help our customers and students around the world stay up to date.
A to Z
Acid Test: A test applied to data for atomicity, consistency, isolation, and durability.
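As a rough illustration of the atomicity and consistency properties, the sketch below uses Python's built-in sqlite3 module. The accounts table, the names in it, and the transfer helper are made up for the example.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one unit: both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the consistency rule: no balance may go negative.
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # valid transfer, committed
failed = transfer(conn, "alice", "bob", 500)  # debit violates the CHECK, so the credit is undone too
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed transfer, bob's balance does not include the 500 credit, because the whole transaction was rolled back.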
Ad Targeting: The attempt to reach a specific audience with a specific message, typically by either contacting them directly or placing contextual ads on the Web.
Ad-Hoc Reporting: Reports generated for a specific and one-time need.
Algorithm: In mathematics and computer science, an algorithm is a self-contained step-by-step set of operations to be performed. Algorithms exist that perform calculation, data processing, and automated reasoning. An algorithm is an effective method that can be expressed within a finite amount of space and time and in a well-defined formal language for calculating a function. Starting from an initial state and initial input (perhaps empty), the instructions describe a computation that, when executed, proceeds through a finite number of well-defined successive states, eventually producing “output” and terminating at a final ending state. The transition from one state to the next is not necessarily deterministic; some algorithms, known as randomized algorithms, incorporate random input.
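Euclid's method for the greatest common divisor is one small, deterministic example of the idea: it starts from an initial input, passes through a finite number of well-defined states, and terminates with an output. The function name is chosen for illustration.

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm for non-negative integers."""
    # Each loop iteration moves from one well-defined state (a, b) to the next.
    while b != 0:
        a, b = b, a % b
    # Final state: b is 0 and a holds the result.
    return a

result = gcd(48, 18)  # states: (48, 18) -> (18, 12) -> (12, 6) -> (6, 0)
```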
Analytics Governance: The centralization of analytical work (the generation of findings based on objective information) in a specific area or department that operates outside the productive structure of the organization.
Analytics Platform: Software or software and hardware that provides the tools and computational power needed to build and perform many different analytical queries.
Analytics: The set of practices that combine statistics, computing platforms, and operational market research to identify findings in large volumes of data.
Anonymization: The severing of links between people in a database and their records to prevent the discovery of the source of the records.
Application: Software that is designed to perform a specific task or suite of tasks.
Automatic Identification And Data Capture (AIDC): Automatic identification and data capture (AIDC) refers to the methods of automatically identifying objects, collecting data about them, and entering that data directly into computer systems (i.e. without human involvement). Technologies typically considered as part of AIDC include bar codes, Radio Frequency Identification (RFID), biometrics, magnetic stripes, Optical Character Recognition (OCR), smart cards, and voice recognition. AIDC is also commonly referred to as “Automatic Identification,” “Auto-ID,” and “Automatic Data Capture.”
Behavioral Analytics: Using data about people’s or browser’s behavior to understand intent and predict future actions.
BI: Business Intelligence is the broad category that integrates a varied set of practices (analytics, data mining, statistics, market research, etc.) to simplify decision-making processes and improve business results.
Big Data: Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, and information privacy.
Biometrics: The use of technology to identify people by one or more of their physical traits.
Brand Monitoring: The act of monitoring your brand’s reputation online, typically by using software to automate the process.
Call Detail Record (CDR) Analysis: CDRs contain data that a telecommunications company collects about phone calls, such as the time and length of each call. This data can be used in any number of analytical applications.
Cassandra: A popular choice of columnar database for use in big data applications. It is an open source database managed by The Apache Software Foundation.
Cell (Mobile) Phone Data: Cell phones generate a tremendous amount of data, and much of it is available for use with analytical applications.
Clickstream Analytics: The analysis of users’ Web activity through the items they click on a page.
Cloud Computing: Cloud computing refers to the practice of transitioning computer services such as computation or data storage to multiple redundant offsite locations available on the Internet, which allows application software to be operated using internet-enabled devices. Clouds can be classified as public, private, and hybrid.
Cloud Storage: Cloud storage is a model of data storage where the digital data is stored in logical pools, the physical storage spans multiple servers (and often locations), and the physical environment is typically owned and managed by a hosting company. These cloud storage providers are responsible for keeping the data available and accessible, and the physical environment protected and running. People and organizations buy or lease storage capacity from the providers to store user, organization, or application data.
Cloud: A broad term that refers to any Internet-based application or service that is hosted remotely.
Columnar Database Or Column-Oriented Database: A database that stores data by column rather than by row. In a row-based database, a row might contain a name, address, and phone number. In a column-oriented database, all names are in one column, addresses in another, and so on. A key advantage of a columnar database is faster hard disk access.
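The difference between the two layouts can be sketched in memory with plain Python structures. The sample records are invented, and a real columnar database adds compression and on-disk formats that this sketch ignores.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"name": "Ada", "city": "London", "phone": "555-0100"},
    {"name": "Grace", "city": "New York", "phone": "555-0101"},
    {"name": "Linus", "city": "Helsinki", "phone": "555-0102"},
]

# Column-oriented layout: one list per column, aligned by position.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Reading a single attribute touches only that column's values.
names = columns["name"]
```

A query that needs only names reads one list in the columnar layout, whereas the row layout forces it to visit every full record.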
Competitive Monitoring: Keeping tabs on competitors’ activities on the Web using software to automate the process.
Complex Event Processing (CEP): Complex event processing, or CEP, is event processing that combines data from multiple sources to infer events or patterns that suggest more complicated circumstances. The goal of complex event processing is to identify meaningful events (such as opportunities or threats) and respond to them as quickly as possible. These events may be happening across the various layers of an organization as sales leads, orders or customer service calls. Or, they may be news items, text messages, social media posts, stock market feeds, traffic reports, weather reports, or other kinds of data.
Computer-Generated Data: Any data generated by a computer rather than a human, for example a log file.
Concurrency: The ability to execute multiple processes at the same time.
Content Management System (CMS): Software that facilitates the management and publication of content on the Web.
Cross-Channel Analytics: Analysis across multiple sales and marketing channels that can attribute sales, show average order value, or show customer lifetime value.
Crowdsourcing: Crowdsourcing, a modern business term coined in 2005, is defined by Merriam-Webster as the process of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees or suppliers.
Customer Relationship Management (CRM): An approach to managing a company’s interactions with current and future customers. It often involves using technology to organize, automate, and synchronize sales, marketing, customer service, and technical support.
Dark Data: Gartner defines dark data as the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing). Similar to dark matter in physics, dark data often comprises most organizations’ universe of information assets. Thus, organizations often retain dark data for compliance purposes only. Storing and securing data typically incurs more expense (and sometimes greater risk) than value.
Dashboard: An easy-to-read, often single-page, real-time user interface that shows a graphical presentation of the current status (snapshot) and historical trends of an organization’s or computer appliance’s key performance indicators, so that informed decisions can be made at a glance.
Data Access: The act or method of viewing or retrieving stored data.
Data Aggregation: The act of collecting data from multiple sources for the purpose of reporting or analysis.
Data Analyst: A person responsible for the tasks of modeling, preparing, and cleaning data for the purpose of deriving actionable information from it.
Data Architecture And Design: How enterprise data is structured. The actual structure or design varies depending on the eventual end result required. Data architecture has three stages or processes: 1. The conceptual representation of business entities; 2. The logical representation of the relationships among those entities; 3. The physical construction of the system to support the functionality.
Data Center: A physical facility that houses a large number of servers and data storage devices. Data centers might belong to a single organization or sell their services to many organizations.
Data Cleansing: Data scrubbing, also called data cleansing, is the process of amending or removing data in a database that is incorrect, incomplete, improperly formatted, or duplicated.
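A minimal sketch of the idea, assuming records are dictionaries with hypothetical "name" and "email" fields and treating an email without "@" as improperly formatted. Real cleansing rules depend on the data set.

```python
def cleanse(records):
    """Trim whitespace, normalize email case, and drop incomplete or duplicate rows."""
    seen = set()
    cleaned = []
    for rec in records:
        name = (rec.get("name") or "").strip()
        email = (rec.get("email") or "").strip().lower()
        if not name or "@" not in email:
            continue  # incomplete or improperly formatted
        if email in seen:
            continue  # duplicate of a record already kept
        seen.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned

raw = [
    {"name": " Ada ", "email": "ADA@example.com"},
    {"name": "Ada", "email": "ada@example.com "},
    {"name": "", "email": "nobody@example.com"},
    {"name": "Grace", "email": "not-an-email"},
]
result = cleanse(raw)
```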
Data Collection: Data collection is the process of gathering and measuring information on variables of interest, in an established systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes.
Data Custodian: The data custodian is usually the person responsible for, or the person with administrative control over, granting access to an organization’s documents or electronic files while protecting the data as defined by the organization’s security policy or its standard IT practices.
Data Exhaust: Data exhaust is the data generated as a byproduct of people’s online actions and choices. Data exhaust consists of the various files generated by web browsers and their plug-ins, such as cookies, log files, temporary internet files and .sol files (flash cookies).
Data Feed: Data feed is a mechanism for users to receive updated data from data sources. It is commonly used by real-time applications in point-to-point settings as well as on the World Wide Web.
Data Governance: Data governance is a control that ensures that the data entry by an operations team member or by an automated process meets precise standards, such as a business rule, a data definition and data integrity constraints in the data model. The data governor uses data quality monitoring against production data to communicate errors in data back to operational team members, or to the technical support team, for corrective action. Data governance is used by organizations to exercise control over processes and methods used by their data stewards and data custodians in order to improve data quality.
Data Integration: The process of combining data from different sources and presenting it in a single view.
Data Integrity: The measure of trust an organization has in the accuracy, completeness, timeliness, and validity of the data.
Data Management: Data Resource Management is the development and execution of architectures, policies, practices and procedures that properly manage the full data lifecycle needs of an enterprise (DAMA International).
Data Marketplace: A place where people can buy and sell data online.
Data Mart: The access layer of a data warehouse used to provide data to users.
Data Migration: The process of moving data between different storage types or formats, or between different computer systems.
Data Mining: Data mining, an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets, involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.
Data Modeling: Data modeling is a process used to define and analyze data requirements needed to support the business processes within the scope of corresponding information systems in organizations. Therefore, the process of data modeling involves professional data modellers working closely with business stakeholders, as well as potential users of the information system.
Data Point: An individual item on a graph or a chart.
Data Profiling: Data profiling is the process of examining the data available in an existing data source (e.g. a database or a file) and collecting statistics and information about that data. The purpose of these statistics may be to find out whether existing data can easily be used for other purposes.
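The kind of statistics a profiling pass collects can be sketched as follows. The choice of metrics (row count, missing values, distinct values) and the sample rows are assumptions made for the example.

```python
def profile(rows):
    """Collect per-column statistics: row count, missing values, distinct values."""
    stats = {}
    for row in rows:
        for col, value in row.items():
            s = stats.setdefault(col, {"count": 0, "missing": 0, "distinct": set()})
            s["count"] += 1
            if value is None or value == "":
                s["missing"] += 1
            else:
                s["distinct"].add(value)
    # Replace each set of distinct values with its size for reporting.
    return {
        col: {"count": s["count"], "missing": s["missing"], "distinct": len(s["distinct"])}
        for col, s in stats.items()
    }

sample = [
    {"id": 1, "country": "DE"},
    {"id": 2, "country": ""},
    {"id": 3, "country": "DE"},
]
report = profile(sample)
```

A column with many missing values or a single distinct value, like "country" here, is a hint that the data may need work before it can be reused.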
Data Quality: Data quality refers to how well data serves its purpose. There are many definitions of data quality, but data are generally considered high quality if “they are fit for their intended uses in operations, decision making and planning” (J. M. Juran).
Data Replication: The process of sharing information to ensure consistency between redundant sources.
Data Repository: The location of permanently stored data.
Data Science: Data science is the extraction of knowledge from large volumes of structured or unstructured data. It is a continuation of the fields of data mining and predictive analytics, also known as knowledge discovery and data mining (KDD).
Data Scientist: A data scientist represents an evolution from the business or data analyst role. The formal training is similar, with a solid foundation typically in computer science and applications, modeling, statistics, analytics and math. What sets the data scientist apart is strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge. Good data scientists will not just address business problems, they will pick the right problems that have the most value to the organization.
Data Set: A data set (or dataset) is a collection of data. Most commonly a data set corresponds to the contents of a single database table, or a single statistical data matrix, where every column of the table represents a particular variable, and each row corresponds to a given member of the data set in question.
Data Source: Any provider of data–for example, a database or a data stream.
Data Steward: A data steward is a person responsible for the management of data elements (also known as critical data elements) – both the content and metadata.
Data Structure: In computer science, a data structure is a particular way of organizing data in a computer so that it can be used efficiently. Data structures can implement one or more particular abstract data types (ADT), which are the means of specifying the contract of operations and their complexity.
Data Virtualization: The process of abstracting different data sources through a single data access layer.
Data Visualization: Data visualization refers to the techniques used to communicate data or information by encoding it as visual objects (e.g., points, lines or bars) contained in graphics. The goal is to communicate information clearly and efficiently to users. It is one of the steps in data analysis or data science.
Data Warehouse: In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis. DWs are central repositories of integrated data from one or more disparate sources.
Data-Directed Decision Making: Using data to support making crucial decisions.
Data: Data is a set of values of qualitative or quantitative variables; restated, pieces of data are individual pieces of information. Data is measured, collected and reported, and analyzed, whereupon it can be visualized using graphs or images. Data as a general concept refers to the fact that some existing information or knowledge is represented or coded in some form suitable for better usage or processing.
Database Administrator (DBA): A person, often certified, who is responsible for supporting and maintaining the integrity of the structure and content of a database.
Database As A Service (DBaaS): A database hosted in the cloud and sold on a metered basis. Examples include Heroku Postgres and Amazon Relational Database Service.
Database Management Systems (DBMS): Computer software applications that interact with the user, other applications, and the database itself to capture and analyze data. A general-purpose DBMS is designed to allow the definition, creation, querying, update, and administration of databases. Well-known DBMSs include MySQL, PostgreSQL, Microsoft SQL Server, Oracle, Sybase and IBM DB2. A database is not generally portable across different DBMSs, but different DBMSs can interoperate by using standards such as SQL and ODBC or JDBC to allow a single application to work with more than one DBMS. Database management systems are often classified according to the database model that they support; the most popular database systems since the 1980s have all supported the relational model as represented by the SQL language. Sometimes a DBMS is loosely referred to as a ‘database’.
Database: A database is an organized collection of data. It is the collection of schemas, tables, queries, reports, views and other objects. The data is typically organized to model aspects of reality in a way that supports processes requiring information, such as modelling the availability of rooms in hotels in a way that supports finding a hotel with vacancies.
De-Identification: The act of removing all data that links a person to a particular piece of information.
Deep Thunder: Deep Thunder is a research project by IBM that aims to improve short-term local weather forecasting through the use of high-performance computing. It is part of IBM’s Deep Computing initiative that also produced the Deep Blue chess computer.
Demographic Data: Data relating to the characteristics of a human population.
Distributed Cache: In computing, a distributed cache is an extension of the traditional concept of cache used in a single locale. A distributed cache may span multiple servers so that it can grow in size and in transactional capacity. It is mainly used to store application data residing in database and web session data.
Distributed Object: The term distributed objects usually refers to software modules that are designed to work together, but reside either in multiple computers connected via a network or in different processes inside the same computer.
Distributed Processing: Distributed computing is a field of computer science that studies distributed systems. A distributed system is a software system in which components located on networked computers communicate and coordinate their actions by passing messages. The components interact with each other to achieve a common goal. Three significant characteristics of distributed systems are: concurrency of components, lack of a global clock, and independent failure of components. Examples of distributed systems vary from SOA-based systems to massively multiplayer online games to peer-to-peer applications.
Document Management: The practice of tracking and storing electronic documents and scanned images of paper documents.
Drill: An open source distributed system for performing interactive analysis on large-scale datasets. It is similar to Google’s Dremel, and is managed by Apache.
Elasticsearch: Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License. Elasticsearch is the second most popular enterprise search engine.
Enterprise Resource Planning (ERP): Enterprise resource planning (ERP) is business management software, typically a suite of integrated applications, that a company can use to collect, store, manage and interpret data from many business activities, including:
- Product planning, cost
- Manufacturing or service delivery
- Marketing and sales
- Inventory management
- Shipping and payment
Event Analytics: Shows the series of steps that led to an action.
Exabyte: One million terabytes, or 1 billion gigabytes of information.
External Data Serialization: External Data Representation (XDR) is a standard data serialization format, for uses such as computer network protocols. It allows data to be transferred between different kinds of computer systems. Converting from the local representation to XDR is called encoding. Converting from XDR to the local representation is called decoding. XDR is implemented as a software library of functions which is portable between different operating systems and is also independent of the transport layer.
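Encoding and decoding can be sketched for two basic XDR types using Python's struct module. The helper names are illustrative; the layout assumed here is the standard one (big-endian 32-bit integers, and strings stored as a 4-byte length followed by the bytes, zero-padded to a multiple of four).

```python
import struct

def xdr_encode_int(value: int) -> bytes:
    """Encode a signed 32-bit integer as 4 big-endian bytes."""
    return struct.pack(">i", value)

def xdr_decode_int(buf: bytes) -> int:
    """Decode the first 4 bytes of buf as a signed 32-bit big-endian integer."""
    return struct.unpack(">i", buf[:4])[0]

def xdr_encode_string(text: str) -> bytes:
    """Encode a string as a 4-byte length plus its bytes, zero-padded to a multiple of 4."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding

encoded_int = xdr_encode_int(1)
encoded_str = xdr_encode_string("hello")
```

Because the byte order and padding are fixed by the format rather than by the host machine, the same bytes decode to the same values on any system.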
External Data: Data that exists outside of a system.
Extract, Transform, And Load (ETL): In computing, Extract, Transform and Load (ETL) refers to a process in database usage and especially in data warehousing that:
- Extracts data from homogeneous or heterogeneous data sources
- Transforms the data for storing it in the proper format or structure for querying and analysis purposes
- Loads it into the final target (a database or, more specifically, an operational data store, data mart, or data warehouse)
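The three steps above can be sketched end to end with the standard library. The CSV text, the sales table, and the function names are all invented for the example, and an in-memory SQLite database stands in for the final target.

```python
import csv
import io
import sqlite3

# Made-up CSV text standing in for a source system.
SOURCE = "name,amount\nada, 10.5\ngrace,4\n"

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: tidy names and convert amounts to numbers."""
    return [(r["name"].strip().title(), float(r["amount"])) for r in rows]

def load(conn, records):
    """Load: write the transformed records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE)))
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```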
Failover: The automatic switching to another computer or node should one fail.
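A minimal sketch of the switching logic, assuming each node is a callable that raises ConnectionError when it is down. The primary and standby functions are hypothetical stand-ins for real servers.

```python
def call_with_failover(nodes, request):
    """Try each node in order and return the first successful response."""
    for node in nodes:
        try:
            return node(request)
        except ConnectionError:
            continue  # this node failed, so switch to the next one
    raise ConnectionError(f"all {len(nodes)} nodes failed")

def primary(request):
    raise ConnectionError("primary is down")

def standby(request):
    return f"standby handled {request}"

response = call_with_failover([primary, standby], "GET /status")
```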
Federal Information Security Management Act (FISMA): A US federal law that requires all federal agencies to meet certain standards of information security across their systems.
Grid Computing: Grid computing is the collection of computer resources from multiple locations to reach a common goal. The grid can be thought of as a distributed system with noninteractive workloads that involve a large number of files.
Hadoop: Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.
HANA: A software/hardware in-memory computing platform from SAP designed for high-volume transactions and real-time analytics.
HBase: A distributed columnar NoSQL database.
High-Performance Computing (HPC): HPC systems, also called supercomputers, are often custom built from state-of-the-art technology to maximize compute performance, storage capacity and throughput, and data transfer speeds.
Hive: A SQL-like query and data warehouse engine.
In-Database Analytics: In-database processing, sometimes referred to as in-database analytics, refers to the integration of data analytics into data warehousing functionality.
In-Memory Data Grid (IMDG): The storage of data in memory across multiple servers for the purpose of greater scalability and faster access or analytics.
In-Memory Database: An in-memory database (IMDB; also main memory database system or MMDB or memory resident database) is a database management system that primarily relies on main memory for computer data storage. It is contrasted with database management systems that employ a disk storage mechanism.
Infonomics: A portmanteau of “information” and “economics”, infonomics is the theory, study and discipline of asserting economic significance to information. It provides the framework for businesses to value, manage and wield information as a real asset. Infonomics endeavors to apply both economic and asset management principles and practices to the valuation, handling and deployment of information assets.
Information Management: Information management (IM) concerns a cycle of organisational activity: the acquisition of information from one or more sources, the custodianship and the distribution of that information to those who need it, and its ultimate disposition through archiving or deletion.
Internet Of Things (IoT): The Internet of Things (IoT, sometimes Internet of Everything) is the network of physical objects or “things” embedded with electronics, software, sensors, and connectivity to enable objects to exchange data with the manufacturer, operator and/or other connected devices based on the infrastructure of International Telecommunication Union’s Global Standards Initiative. The Internet of Things allows objects to be sensed and controlled remotely across existing network infrastructure, creating opportunities for more direct integration between the physical world and computer-based systems, and resulting in improved efficiency, accuracy and economic benefit.
Kafka: An open-source distributed messaging system, originally developed at LinkedIn and now managed by the Apache Software Foundation, used to handle streams of activity events on the web.
Latency: Latency is the time interval between a stimulation and the response to it or, more generally, the time delay between the cause and the effect of some physical change in the system being observed. Latency is physically a consequence of the limited velocity with which any physical interaction can propagate. This velocity is always lower than or equal to the speed of light. Therefore every physical system that has spatial dimensions different from zero will experience some sort of latency, regardless of the nature of stimulation that it has been exposed to.
LDW: Logical Data Warehouse.
Legacy System: Any computer system, application, or technology that is obsolete, but continues to be used because it performs a needed function adequately.
Linked Data: As described by World Wide Web inventor Tim Berners-Lee, “Cherry-picking common attributes or languages to identify connections or relationships between disparate sources of data.”
Load Balancing: The process of distributing workload across a computer network or computer cluster to optimize performance.
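Round-robin is one of the simplest distribution strategies and can be sketched in a few lines. The server names are hypothetical, and production balancers usually also weigh server health and current load.

```python
import itertools
from collections import Counter

servers = ["app-1", "app-2", "app-3"]  # hypothetical pool of servers

# cycle() yields the servers in order, wrapping around indefinitely.
next_server = itertools.cycle(servers).__next__

# Assign seven incoming requests in turn.
assignments = [next_server() for _ in range(7)]
load = Counter(assignments)
```

With seven requests and three servers, no server ends up with more than one request above any other.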
Location Analytics: Location analytics brings mapping and map driven analytics to enterprise business systems and data warehouses. It allows you to associate geospatial information with datasets.
Location Data: Data that describes a geographic location.
Log File: In computing, a logfile is a file that records either events that occur in an operating system or other software runs, or messages between different users of a communication software. Logging is the act of keeping a log.
Long Data: A term coined by mathematician and network scientist Samuel Arbesman that refers to “datasets that have massive historical sweep.”
Machine Learning: Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. Machine learning explores the construction and study of algorithms that can learn from and make predictions on data.
Machine-Generated Data: Machine-generated data is information which was automatically created from a computer process, application, or other machine without the intervention of a human. However, there is some indecision as to the breadth of the term.
Mashup: A mashup, in web development, is a web page, or web application, that uses content from more than one source to create a single new service displayed in a single graphical interface. For example, a user could combine the addresses and photographs of their library branches with a Google map to create a map mashup.
Massively Parallel Processing (MPP): Refers to the use of a large number of processors (or separate computers) to perform a set of coordinated computations in parallel (simultaneously).
Master Data Management (MDM): Master data management (MDM) comprises the processes, governance, policies, standards and tools that consistently define and manage the critical data of an organization to provide a single point of reference.
Metadata: Metadata is data that describes other data. Meta is a prefix that in most information technology usages means “an underlying definition or description.” Metadata summarizes basic information about data, which can make finding and working with particular instances of data easier.
MongoDB: MongoDB (from humongous) is a cross-platform document-oriented database. Classified as a NoSQL database, MongoDB eschews the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster. Released under a combination of the GNU Affero General Public License and the Apache License, MongoDB is free and open-source software.
MPP Database: A database optimized to work in a massively parallel processing environment.
Multi-Threading: A technique by which a single set of code can be used by several processors at different stages of execution for performance purposes.
Nexus Of Forces: The Nexus of Forces is a concept developed by consultancy firm Gartner Inc. that describes how the convergence and mutual strengthening of social media, mobility, cloud computing and information patterns are creating new business opportunities.
NoSQL: A class of database management system that does not use the relational model. NoSQL databases are designed to handle very large data volumes that do not follow a fixed schema and do not require the relational model.
OLAP: Online Analytical Processing.
OLAP Cubes: An OLAP cube is a multidimensional array of data, understood in terms of its zero or more dimensions. OLAP is a computer-based technique for analyzing business data in the search for business intelligence.
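A tiny cube can be sketched as a dictionary keyed by one value per dimension. The sales figures, the three dimensions (year, region, product), and the roll_up helper are invented for the example.

```python
from collections import defaultdict

# Made-up sales facts: (year, region, product, amount).
facts = [
    ("2023", "EU", "widgets", 120),
    ("2023", "US", "widgets", 200),
    ("2024", "EU", "gadgets", 80),
    ("2024", "EU", "widgets", 150),
]

# The cube maps each combination of dimension values to a measure.
cube = defaultdict(int)
for year, region, product, amount in facts:
    cube[(year, region, product)] += amount

def roll_up(cube, axis):
    """Aggregate the cube down to one dimension (0=year, 1=region, 2=product)."""
    totals = defaultdict(int)
    for key, amount in cube.items():
        totals[key[axis]] += amount
    return dict(totals)

by_region = roll_up(cube, 1)
by_year = roll_up(cube, 0)
```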
Online Transactional Processing (OLTP): Online transaction processing, or OLTP, is a class of information systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing.
Open Data Center Alliance (ODCA): A consortium of global IT organizations whose goal is to speed the migration to cloud computing.
OpenDremel: The open source version of Google’s BigQuery Java code. It is being integrated with Apache Drill.
Operational Data Store (ODS): An operational data store (or “ODS”) is a database designed to integrate data from multiple sources for additional operations on the data. Unlike a master data store, the data is not passed back to operational systems. It may be passed for further operations and to the data warehouse for reporting.
Parallel Data Analysis: Breaking up an analytical problem into smaller components and running algorithms on each of those components at the same time. Parallel data analysis can occur within the same system or across multiple systems.
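The split, compute, and combine pattern can be sketched with Python's concurrent.futures. The function names and chunk size are assumptions. A thread pool keeps the example simple, although CPU-bound work in CPython would normally use ProcessPoolExecutor to run on several cores.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(values, size):
    """Break the data into smaller components of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]

def parallel_sum(values, workers=4, chunk_size=250):
    """Sum each chunk concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, chunked(values, chunk_size)))
    return sum(partials)

total = parallel_sum(list(range(1000)))
```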
Parallel Method Invocation (PMI): Allows programming code to call multiple functions in parallel.
Parallel Processing: In computers, parallel processing is the processing of program instructions by dividing them among multiple processors with the objective of running a program in less time. In the earliest computers, only one program ran at a time.
Parallel Query: A query that is executed over multiple system threads for faster performance.
Pattern Recognition: The classification or labelling of an identified pattern in the machine learning process.
Performance Management: Performance management is a process by which managers and employees work together to plan, monitor and review an employee’s work objectives and overall contribution to the organization.
Petabyte: One million gigabytes, or 1,000 terabytes. (In binary units, a pebibyte is 1,024 tebibytes.)
Pig: A data flow language and execution framework for parallel computation.
Predictive Analytics: Predictive analytics is the practice of extracting information from existing data sets in order to determine patterns and predict future outcomes and trends. Predictive analytics does not tell you what will happen in the future; it estimates what is likely to happen.
Predictive Modeling: Predictive modeling is a process used in predictive analytics to create a statistical model of future behavior. Predictive analytics is the area of data mining concerned with forecasting probabilities and trends.
Progressive Profiling: Progressive profiling is a marketing technique that allows digital marketers to ask for information incrementally instead of all at once. Over time leads will become more qualified because of their interaction with a website (and other digital properties) and will likewise deliver useful information to sales about demographics, firmographics, BANT criteria, etc.
Radio-Frequency Identification (RFID): A technology that uses wireless communications to send information about an object from one point to another.
Raw Data: A collection of numbers or characters that has not yet been processed. Data processing commonly occurs in stages, and the “processed data” from one stage may be considered the “raw data” of the next. Field data is raw data that is collected in an uncontrolled in situ environment. Experimental data is data that is generated within the context of a scientific investigation by observation and recording.
Real Time: A descriptor for events, data streams, or processes that have an action performed on them as they occur.
Recommendation Engine: A recommendation engine (sometimes referred to as a recommender system) is a tool that lets algorithm developers predict what a user may or may not like among a list of given items.
Records Management: Records management (RM), also known as records information management or RIM, is the professional practice or discipline of controlling and governing what are considered to be the most important records of an organization throughout the records life cycle, from the time such records are conceived through to their eventual disposal. This work includes identifying, classifying, prioritizing, storing, securing, archiving, preserving, retrieving, tracking and destroying records.
Reference Data: Data that describes an object and its properties. The object may be physical or virtual.
Report: A report or account is any informational work (usually of writing, speech, television, or film) made with the specific intention of relaying information or recounting certain events in a widely presentable form.
Risk Analysis: The application of statistical methods on one or more datasets to determine the likely risk of a project, action, or decision.
Root-Cause Analysis: The process of determining the main cause of an event or problem.
Sawzall: Google’s procedural domain-specific programming language designed to process large volumes of log records.
Scalability: The ability of a system or process to maintain acceptable performance levels as workload or scope increases.
Schema: A database schema of a database system is its structure described in a formal language supported by the database management system (DBMS) and refers to the organization of data as a blueprint of how a database is constructed (divided into database tables in the case of Relational Databases).
Search Data Structure: In computer science, a search data structure is any data structure that allows the efficient retrieval of specific items from a set of items, such as a specific record from a database.
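Two common search data structures can be sketched with Python's standard library: a hash-based index (a `dict`) for direct lookup by key, and a sorted list searched with binary search (`bisect`). The records and the `contains` helper are hypothetical and exist only for illustration.

```python
import bisect

# Hypothetical records, as they might come from a database table.
records = [
    {"id": 42, "name": "Grace"},
    {"id": 17, "name": "Ada"},
    {"id": 99, "name": "Edsger"},
]

# Hash-based index: average constant-time lookup by key.
index_by_id = {record["id"]: record for record in records}
found = index_by_id.get(42)     # the record with id 42
missing = index_by_id.get(7)    # None, since no such id exists

# Sorted keys plus binary search: logarithmic-time membership test.
sorted_ids = sorted(index_by_id)


def contains(sorted_keys, key):
    """Return True if key is present in the sorted list of keys."""
    pos = bisect.bisect_left(sorted_keys, key)
    return pos < len(sorted_keys) and sorted_keys[pos] == key
```

Without either structure, finding a record would mean scanning every item in turn.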
Search Data: Aggregated data about search terms used over time.
Search: The process of locating specific data or content using a search tool.
Semantic Web: The Semantic Web is an extension of the Web through standards by the World Wide Web Consortium (W3C). The standards promote common data formats and exchange protocols on the Web, most fundamentally the Resource Description Framework (RDF).
Semi-Structured Data: Data that is not structured by a formal data model, but provides other means of describing the data and hierarchies.
Sentiment Analysis: Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials.
Server: A physical or virtual computer that serves requests for a software application and delivers those requests over a network.
Smart Grid: The smart grid refers to the concept of adding intelligence to the world’s electrical transmission systems with the goal of optimizing energy efficiency. Enabling the smart grid will rely heavily on collecting, analyzing, and acting on large volumes of data.
Software As A Service (SaaS): Software as a service is a software licensing and delivery model in which software is licensed on a subscription basis and is centrally hosted. It is sometimes referred to as “on-demand software”. Users typically access SaaS through a thin client, usually a web browser.
Storm: An open-source distributed computation system designed for processing multiple data streams in real time.
Structured Data: Data that is organized by a predetermined structure.
Structured Query Language (SQL): A special-purpose programming language designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS).
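To give a feel for SQL in practice, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `orders` table, its columns, and the sample rows are hypothetical. It defines a small schema, inserts a few rows, and runs an aggregate query.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Define the structure (schema) of a hypothetical orders table.
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer TEXT NOT NULL,"
    " amount REAL NOT NULL)"
)

# Insert sample rows; the ? placeholders keep values separate from the SQL text.
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)],
)

# Query: total amount per customer, in alphabetical order.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders"
    " GROUP BY customer ORDER BY customer"
).fetchall()

conn.close()
```

Here `totals` holds one row per customer with the summed order amount.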
Terabyte: 1,000 gigabytes.
Text Analytics: The application of statistical, linguistic, and machine learning techniques on text-based sources to derive meaning or insight.
Transactional Data: Transactional data describes an event (the change that results from a transaction) and is usually described with verbs. It always has a time dimension and a numerical value, and it refers to one or more objects (i.e. the reference data). Typical transactions include financial ones such as orders, invoices and payments.
TRIZ: TRIZ is a problem-solving methodology based on logic, data and research, not intuition. It draws on the past knowledge and ingenuity of many thousands of engineers to accelerate the project team’s ability to solve problems creatively.
Unstructured Data: Unstructured Data (or unstructured information) refers to information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well.
V - W
Variable Pricing: Variable pricing is a pricing strategy in which the price of a product is not fixed but can change with demand, timing or negotiation. Traditional examples include auctions, stock markets, foreign exchange markets, bargaining and discounts.
Web Analytics Platform: An analytics platform that measures the behavior of users on a specific website (website-centric).